Hi, I’m Eric and I work at a big chip company making chips and such! I do math for a job, but it’s cold hard stochastic optimization that makes people who know names like Tychonoff and Sylow weep.

My pfp is Hank Azaria in Heat, but you already knew that.

  • 3 Posts
  • 8 Comments
Joined 2 years ago
cake
Cake day: January 22nd, 2024

help-circle



  • Deep thinker asks why?

    Thus spoketh the Yud: “The weird part is that DOGE is happening 0.5-2 years before the point where you actually could get an AGI cluster to go in and judge every molecule of government. Out of all the American generations, why is this happening now, that bare bit too early?”

    Yud, you sweet naive smol uwu babyesian boi, how gullible do you have to be to believe that a) tminus 6 months to AGI kek (do people track these dog shit predictions?) b) the purpose of DOGE is just accountability and definitely not the weaponized manifestation of techno oligarchy ripping apart our society for the copper wiring in the walls?


  • Reposting this for the new week thread since it truly is a record of how untrustworthy sammy and co are. Remember how OAI claimed that O3 had displayed superhuman levels on the mega hard Frontier Math exam written by Fields Medalist? Funny/totally not fishy story haha. Turns out OAI had exclusive access to that test for months and funded its creation and refused to let the creators of test publicly acknowledge this until after OAI did their big stupid magic trick.

    From Subbarao Kambhampati via linkedIn:

    "𝐎𝐧 𝐭𝐡𝐞 𝐬𝐞𝐞𝐝𝐲 𝐨𝐩𝐭𝐢𝐜𝐬 𝐨𝐟 “𝑩𝒖𝒊𝒍𝒅𝒊𝒏𝒈 𝒂𝒏 𝑨𝑮𝑰 𝑴𝒐𝒂𝒕 𝒃𝒚 𝑪𝒐𝒓𝒓𝒂𝒍𝒍𝒊𝒏𝒈 𝑩𝒆𝒏𝒄𝒉𝒎𝒂𝒓𝒌 𝑪𝒓𝒆𝒂𝒕𝒐𝒓𝒔” hashtag#SundayHarangue. One of the big reasons for the increased volume of “𝐀𝐆𝐈 𝐓𝐨𝐦𝐨𝐫𝐫𝐨𝐰” hype has been o3’s performance on the “frontier math” benchmark–something that other models basically had no handle on.

    We are now being told (https://lnkd.in/gUaGKuAE) that this benchmark data may have been exclusively available (https://lnkd.in/g5E3tcse) to OpenAI since before o1–and that the benchmark creators were not allowed to disclose this *until after o3 *.

    That o3 does well on frontier math held-out set is impressive, no doubt, but the mental picture of “𝒐1/𝒐3 𝒘𝒆𝒓𝒆 𝒋𝒖𝒔𝒕 𝒃𝒆𝒊𝒏𝒈 𝒕𝒓𝒂𝒊𝒏𝒆𝒅 𝒐𝒏 𝒔𝒊𝒎𝒑𝒍𝒆 𝒎𝒂𝒕𝒉, 𝒂𝒏𝒅 𝒕𝒉𝒆𝒚 𝒃𝒐𝒐𝒕𝒔𝒕𝒓𝒂𝒑𝒑𝒆𝒅 𝒕𝒉𝒆𝒎𝒔𝒆𝒍𝒗𝒆𝒔 𝒕𝒐 𝒇𝒓𝒐𝒏𝒕𝒊𝒆𝒓 𝒎𝒂𝒕𝒉”–that the AGI tomorrow crowd seem to have–that 𝘖𝘱𝘦𝘯𝘈𝘐 𝘸𝘩𝘪𝘭𝘦 𝘯𝘰𝘵 𝘦𝘹𝘱𝘭𝘪𝘤𝘪𝘵𝘭𝘺 𝘤𝘭𝘢𝘪𝘮𝘪𝘯𝘨, 𝘤𝘦𝘳𝘵𝘢𝘪𝘯𝘭𝘺 𝘥𝘪𝘥𝘯’𝘵 𝘥𝘪𝘳𝘦𝘤𝘵𝘭𝘺 𝘤𝘰𝘯𝘵𝘳𝘢𝘥𝘪𝘤𝘵–is shattered by this. (I have, in fact, been grumbling to my students since o3 announcement that I don’t completely believe that OpenAI didn’t have access to the Olympiad/Frontier Math data before hand… )

    I do think o1/o3 are impressive technical achievements (see https://lnkd.in/gvVqmTG9 )

    𝑫𝒐𝒊𝒏𝒈 𝒘𝒆𝒍𝒍 𝒐𝒏 𝒉𝒂𝒓𝒅 𝒃𝒆𝒏𝒄𝒉𝒎𝒂𝒓𝒌𝒔 𝒕𝒉𝒂𝒕 𝒚𝒐𝒖 𝒉𝒂𝒅 𝒑𝒓𝒊𝒐𝒓 𝒂𝒄𝒄𝒆𝒔𝒔 𝒕𝒐 𝒊𝒔 𝒔𝒕𝒊𝒍𝒍 𝒊𝒎𝒑𝒓𝒆𝒔𝒔𝒊𝒗𝒆–𝒃𝒖𝒕 𝒅𝒐𝒆𝒔𝒏’𝒕 𝒒𝒖𝒊𝒕𝒆 𝒔𝒄𝒓𝒆𝒂𝒎 “𝑨𝑮𝑰 𝑻𝒐𝒎𝒐𝒓𝒓𝒐𝒘.”

    We all know that data contamination is an issue with LLMs and LRMs. We also know that reasoning claims need more careful vetting than “𝘸𝘦 𝘥𝘪𝘥𝘯’𝘵 𝘴𝘦𝘦 𝘵𝘩𝘢𝘵 𝘴𝘱𝘦𝘤𝘪𝘧𝘪𝘤 𝘱𝘳𝘰𝘣𝘭𝘦𝘮 𝘪𝘯𝘴𝘵𝘢𝘯𝘤𝘦 𝘥𝘶𝘳𝘪𝘯𝘨 𝘵𝘳𝘢𝘪𝘯𝘪𝘯𝘨” (see “In vs. Out of Distribution analyses are not that useful for understanding LLM reasoning capabilities” https://lnkd.in/gZ2wBM_F ).

    At the very least, this episode further argues for increased vigilance/skepticism on the part of AI research community in how they parse the benchmark claims put out commercial entities."

    Big stupid snake oil strikes again.




  • Big Yud: You try to explain how airplane fuel can melt a skyscraper, but your calculation doesn’t include relativistic effects, and then the 9/11 conspiracy theorists spend the next 10 years talking about how you deny relativity.

    Similarly: A paperclip maximizer is not “monomoniacally” “focused” on paperclips. We talked about a superintelligence that wanted 1 thing, because you get exactly the same results as from a superintelligence that wants paperclips and staples (2 things), or from a superintelligence that wants 100 things. The number of things It wants bears zero relevance to anything. It’s just easier to explain the mechanics if you start with a superintelligence that wants 1 thing, because you can talk about how It evaluates “number of expected paperclips resulting from an action” instead of “expected paperclips * 2 + staples * 3 + giant mechanical clocks * 1000” and onward for a hundred other terms of Its utility function that all asymptote at different rates.

    The only load-bearing idea is that none of the things It wants are galaxies full of fun-having sentient beings who care about each other. And the probability of 100 uncontrolled utility function components including one term for Fun are ~0, just like it would be for 10 components, 1 component, or 1000 components. 100 tries at having monkeys generate Shakespeare has ~0 probability of succeeding, just the same for all practical purposes as 1 try.

    (If a googol monkeys are all generating using English letter-triplet probabilities in a Markov chain, their probability of generating Shakespeare is vastly higher but still effectively zero. Remember this Markov Monkey Fallacy anytime somebody talks about how LLMs are being trained on human text and therefore are much more likely up with human values; an improbable outcome can be rendered “much more likely” while still being not likely enough.)

    An unaligned superintelligence is “monomaniacal” in only and exactly the same way that you monomaniacally focus on all that stuff you care about instead of organizing piles of dust specks into prime-numbered heaps. From the perspective of something that cares purely about prime dust heaps, you’re monomaniacally focused on all that human stuff, and it can’t talk you into caring about prime dust heaps instead. But that’s not because you’re so incredibly focused on your own thing to the exclusion of its thing, it’s just, prime dust heaps are not inside the list of things you’d even consider. It doesn’t matter, from their perspective, that you want a lot of stuff instead of just one thing. You want the human stuff, and the human stuff, simple or complicated, doesn’t include making sure that dust heaps contain a prime number of dust specks.

    Any time you hear somebody talking about the “monomaniacal” paperclip maximizer scenario, they have failed to understand what the problem was supposed to be; failed at imagining alien minds as entities in their own right rather than mutated humans; and failed at understanding how to work with simplified models that give the same results as complicated models