Project 1999

Project 1999 > General Community > Off Topic

  #801  
Yesterday, 10:42 PM
BradZax
Fire Giant


Join Date: Dec 2025
Posts: 763

Quote:
Originally Posted by Ekco
It actually isn't. I went and scrolled through the paper, lol.
You're misunderstanding the point of the paper.

I'm not saying these researchers are everyday users.

These are the senior scientists at Google DeepMind who actually build and evaluate Gemini.

The chart isn't supposed to show how a casual user prompts a model; it's a technical evaluation from the actual creators of the AI showing how the underlying architecture behaves.

They absolutely 'matter' because they build the tech.
  #802  
Today, 01:17 AM
Ekco
Planar Protector

Join Date: Jan 2023
Location: Felwithe
Posts: 5,367

My beef is with the METR graph, which isn't mentioned once in the paper, and neither is METR itself. It's a sensationalist Berkeley nonprofit writing disingenuous, misleading tests to get the outcome it wants, working backwards from "AI in sci-fi is scary, so we should stop," just like others work backwards from "cows fart too much, so you shouldn't be allowed to have a hamburger."

If you do click one of the dots, it lists values well past the 16 hours shown, but the methodology of the test itself and the chart are both misleading, set up to show scary exponential growth. Nothing has changed in how the models themselves are fundamentally built; it's the stuff around the model that is improving: the harness and MoE.

You can also stick an earlier model in a harness and let it run for days, but the chart doesn't show that they cap GPT-5 at 6 hours while previous models get 12-30 minutes. If you run the same Opus they have ranked without a harness, its scores will be completely different.

Quote:
This doesn’t mean Opus 4.6 can work for ~14 hrs; it means that on tasks that would take a human expert ~14 hrs, the agent successfully finishes them 50% of the time. It probably completes them way faster, actually.
They're either turbo retarded or knowingly committing academic fraud for political/funding reasons. I assume nobody there even works at a frontier lab; it's just nonprofit advocacy fart-huffing from what I can tell.

https://summify.io/discover/is-ai-ab...-s-not-5GezB1/ (one clickbait YouTuber to counter another)

As for the Google fanfic thought experiment about the timeline of one sci-fi concept progressing into another theoretical sci-fi concept, I have no issue with that itself.
__________________
Ekco - 60 Wiz // Oshieh - 60 Dru // Tpow - 59 Nec // Kusanagi - 54 Pal // Losthawk - 52 Rng // Tiltuesday - EC mule


All times are GMT -4. The time now is 02:23 AM.


Everquest is a registered trademark of Daybreak Game Company LLC.
Project 1999 is not associated or affiliated in any way with Daybreak Game Company LLC.
Powered by vBulletin®
Copyright ©2000 - 2026, Jelsoft Enterprises Ltd.