#801
|
||||
|
Quote:
I'm not saying these researchers are everyday users. These are the exact senior scientists at Google DeepMind who build and evaluate Gemini. The chart isn't supposed to show how a casual user prompts a model; it's a technical evaluation from the actual creators of the AI showing how the underlying architecture behaves. They absolutely 'matter' because they build the tech.
|
#802
|
||||
|
My beef is with the METR graph, which isn't mentioned once in the paper, and neither is METR. It's a sensationalist Berkeley nonprofit writing disingenuous, misleading tests to show the outcome they want, working backwards from "AI in sci-fi is scary, so we should stop," just like they work backward from "cows fart too much, so you shouldn't be allowed to have a hamburger."

If you do click one of the dots, it does list times way past the 16 hours shown, but the methodology of the test itself and the chart are both misleading, built to show scary exponential growth. Nothing changed in how the models themselves are fundamentally built; it's the stuff around the model that is improving: the harness and MoE. You can stick an earlier model in a harness and let it run for days too, but the chart doesn't show that they cap GPT-5 at 6 hours while previous models get 12-30 minutes. If you run the same Opus they have ranked the way they do, but without a harness, its scores will be completely different.

Quote:
https://summify.io/discover/is-ai-ab...-s-not-5GezB1/
One clickbait YouTuber to counter another. And the Google fanfic thought experiment about the timeline of one sci-fi concept progressing into another theoretical sci-fi concept is something I have no issue with in itself.