The absolute state of AI results. - Page 81

BradZax · #**801** Yesterday, 10:42 PM

Quote:

Originally Posted by Ekco

It actually isn't, I went and scrolled the paper lol

You're misunderstanding the point of the paper.

I'm not saying these researchers are everyday users.

These are the exact senior scientists at Google DeepMind who build and evaluate Gemini.

The chart isn't supposed to show how a casual user prompts a model; it's a technical evaluation from the actual creators of the AI showing how the underlying architecture behaves.

They absolutely 'matter' because they build the tech.

Ekco · #**802** Today, 01:17 AM

My beef is with the METR graph. Neither the graph nor METR is mentioned once in the paper. It's a sensationalist Berkeley nonprofit writing disingenuous, misleading tests to show the outcome they want, working backwards from "AI in sci-fi is scary, so we should stop," just like they work backwards from "cows fart too much, so you shouldn't be allowed to have a hamburger."

If you click one of the dots, it does list times way past the 16 hours shown, but the methodology of the test and the chart itself are both misleading, set up to show scary exponential growth. Nothing changed in how the models themselves are fundamentally built; it's the stuff around the model that is improving: the harness and MoE. You can stick an earlier model in a harness and let it run for days too, but the chart doesn't show that they cap GPT-5 at 6 hours and previous models at 12-30 minutes. If you run the same Opus they have ranked the way they do, but without a harness, its scores will be completely different.

Quote:

This doesn’t mean opus 4.6 can work for ~14 hrs, it means on tasks that would take a human expert ~14 hrs, the agent successfully finishes them 50% of the time. Probably completes them way faster actually
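For anyone who wants to see what a "50% time horizon" means mechanically, here is a minimal sketch. It assumes the number comes from fitting a logistic curve of agent success against the log of how long each task takes a human, then reading off the task length where the curve crosses 50%. The task data, function names, and fitting settings below are all made up for illustration; this is not METR's actual code or data.

```python
import math

# HYPOTHETICAL data: (minutes the task takes a human expert, did the agent succeed?)
TASKS = [
    (1, 1), (2, 1), (4, 1), (8, 1), (16, 1), (32, 0),
    (64, 1), (128, 0), (256, 1), (512, 0), (1024, 0), (2048, 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit p(success) = sigmoid(a + b * (x - mean_x)) by plain gradient
    ascent on the mean log-likelihood. x is log2(human minutes)."""
    n = len(xs)
    mean_x = sum(xs) / n  # centering x keeps the gradient steps stable
    a = b = 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(a + b * (x - mean_x))
            grad_a += err
            grad_b += err * (x - mean_x)
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b, mean_x


def horizon_minutes(a, b, mean_x):
    """Task length (in human minutes) at which predicted success is 50%,
    i.e. where a + b * (x - mean_x) = 0."""
    return 2.0 ** (mean_x - a / b)


xs = [math.log2(minutes) for minutes, _ in TASKS]
ys = [ok for _, ok in TASKS]
a, b, mean_x = fit_logistic(xs, ys)
horizon = horizon_minutes(a, b, mean_x)
print(f"50% time horizon: about {horizon:.0f} human-minutes")
```

Note that the horizon is measured in human task length, not in how long the agent ran, which is the distinction the quote is drawing.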

They're either hopelessly incompetent or knowingly committing academic fraud for political/funding reasons. Nobody there even works at a frontier lab, I assume; it's just nonprofit advocacy fart huffing from what I can tell.

https://summify.io/discover/is-ai-ab...-s-not-5GezB1/ One clickbait YouTuber to counter another.

As for the Google fanfic thought experiment about the timeline of one sci-fi concept progressing into another theoretical sci-fi concept, I have no issue with that in itself.