• 0 Posts
  • 42 Comments
Joined 1 year ago
Cake day: February 6th, 2024

  • That o3 does well on frontier math held-out set is impressive, no doubt

    I think there is plenty of room for doubt still. elliotglazer on reddit writes:

    Epoch’s lead mathematician here. Yes, OAI funded this and has the dataset, which allowed them to evaluate o3 in-house. We haven’t yet independently verified their 25% claim. To do so, we’re currently developing a hold-out dataset and will be able to test their model without them having any prior exposure to these problems.

    My personal opinion is that OAI’s score is legit (i.e., they didn’t train on the dataset), and that they have no incentive to lie about internal benchmarking performances. However, we can’t vouch for them until our independent evaluation is complete.

    (emphasis mine). So there is good reason to doubt that the “held-out dataset” even exists.






  • I read one of the papers. About the specific question you have: given a string of bits s, they choose to associate s with its empirical distribution, as if s were generated by an iid Bernoulli process. So if s has 10 zero bits and 30 one bits, its associated empirical distribution is Ber(3/4). This is the distribution whose entropy they’re calculating. I have no idea on what basis they are making this choice.

    The rest of the paper didn’t make sense to me - they are somehow assigning a number N of “information states” which can change over time as the memory cells fail. I honestly have no idea what it’s supposed to mean and kinda suspect the whole thing is rubbish.

    Edit: after reading the author’s quotes from the associated hype article I’m 100% sure it’s rubbish. It’s also really funny that they didn’t manage to catch the COVID-19 research hype train so they’ve pivoted to the simulation hypothesis.
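
For anyone who wants to check the Ber(3/4) example from the comment above, here is a rough sketch of the calculation as I understand it. This is not the paper's code; the function name `empirical_bernoulli_entropy` and the choice of base-2 logarithms are my own assumptions, and it only covers the "treat the string as iid Bernoulli" step, not the "information states" part.

```python
import math


def empirical_bernoulli_entropy(bits: str) -> float:
    """Shannon entropy (bits per symbol) of the empirical Bernoulli
    distribution of a 0/1 string. Hypothetical helper for illustration."""
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("expected a non-empty string of '0' and '1'")
    # Fraction of one bits: the parameter p of the empirical Ber(p).
    p = bits.count("1") / len(bits)
    # A constant string has a degenerate distribution with zero entropy.
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# The example from the comment: 10 zero bits and 30 one bits -> Ber(3/4).
s = "0" * 10 + "1" * 30
h = empirical_bernoulli_entropy(s)
print(f"p = {s.count('1') / len(s):.2f}, H = {h:.4f} bits per symbol")
# prints: p = 0.75, H = 0.8113 bits per symbol
```

Note that the result depends only on the counts of zeros and ones, so any rearrangement of the same 40 bits gives the same number.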