The absolute state of AI results. - Page 81

BradZax · #**801** Yesterday, 10:42 PM

Quote:

Originally Posted by Ekco

It actually isn't; I went and scrolled through the paper lol

You're misunderstanding the point of the paper.

I'm not saying these researchers are everyday users.

These are the exact senior scientists at Google DeepMind who build and evaluate Gemini.

The chart isn't supposed to show how a casual user prompts a model; it's a technical evaluation from the actual creators of the AI showing how the underlying architecture behaves.

They absolutely 'matter' because they build the tech.

Ekco · #**802** Today, 01:17 AM

My beef is with the METR graph, which isn't mentioned once in the paper, and neither is METR. It's a sensational Berkeley nonprofit writing disingenuous, misleading tests to show the outcome they want, working backwards from "AI in sci-fi is scary, so we should stop," just like others work backwards from "cows fart too much, so you shouldn't be allowed to have a hamburger."


If you do click one of the dots, it lists values way past the 16 hours shown, but the methodology of the test itself and the chart are both set up to misleadingly show scary exponential growth. Nothing has changed in how the models themselves are fundamentally built; it's the stuff around the model that is improving: the harness and MoE. You can stick an earlier model in a harness and let it run for days too, but the chart doesn't show that they cap GPT-5 at 6 hours while previous models get 12-30 minutes. If you run the same Opus they ranked the way they ran the older models, without a harness, its score will be completely different.

Quote:

This doesn’t mean Opus 4.6 can work for ~14 hrs; it means that on tasks that would take a human expert ~14 hrs, the agent successfully finishes them 50% of the time. It probably completes them way faster, actually.

They're either turbo retarded or knowingly committing academic fraud for political/funding reasons. Nobody there even works at a frontier lab, I assume; it's just nonprofit advocacy fart-huffing from what I can tell.

https://summify.io/discover/is-ai-ab...-s-not-5GezB1/ One clickbait YouTuber to counter another.

As for the Google fanfic thought experiment about the timeline of one sci-fi concept progressing into another theoretical sci-fi concept, I have no issue with that itself.
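For anyone confused by the "50% of the time" thing in the quote above: as far as I understand METR's write-up (treat this as my assumption, check their paper), they fit a logistic curve of agent success against log2 of how long each task takes a human, then read off the task length where that curve crosses 50%. Here's a minimal Python sketch of that idea with completely made-up numbers; `fit_logistic` and `time_horizon_minutes` are hypothetical helpers I wrote, not METR's code or data.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs: list[float], ys: list[float], iters: int = 25) -> tuple[float, float]:
    """Maximum-likelihood fit of p(success) = sigmoid(b0 + b1 * x) via Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # Fisher information (negative Hessian)
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: (b0, b1) += H^-1 g, with the 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def time_horizon_minutes(b0: float, b1: float, level: float = 0.5) -> float:
    """Task length (minutes) where predicted success equals `level`; x = log2(minutes)."""
    logit = math.log(level / (1.0 - level))
    return 2.0 ** ((logit - b0) / b1)

# Made-up data: 4 tasks at each human task length of 2**k minutes (k = 0..10),
# and how many of those 4 the agent finished.
successes = [4, 4, 3, 3, 3, 2, 1, 1, 1, 0, 0]
xs: list[float] = []
ys: list[float] = []
for k, s in enumerate(successes):
    for i in range(4):
        xs.append(float(k))             # log2 of human time in minutes
        ys.append(1.0 if i < s else 0.0)

b0, b1 = fit_logistic(xs, ys)
print(round(time_horizon_minutes(b0, b1)))  # 32
```

So a "32 minute horizon" here just means the fitted curve crosses 50% at 32-minute tasks, not that the agent ran for 32 minutes; asking for 80% reliability on the same data gives a shorter horizon.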

BradZax · #**803** Today, 01:39 PM

I posted a document from the lead developers on Gemini, not a YouTuber.

Quote:

Originally Posted by Ekco

the machine god weighs in

Quote:

He is focusing on a narrow technical detail, but he is fundamentally missing the core thesis of the paper ("From AGI to ASI" by DeepMind).

Why His Argument Fails

1. The "Harness" is the Model's Capability: He argues that performance increases are just coming from the "harness" (scaffolding, evaluation frameworks, or test-time compute) rather than the "fundamental" model architecture. This is a false dichotomy. Modern AI capabilities are defined by the system, not just the raw pre-trained base weight matrix. If wrapping a model in a test harness or a Mixture of Experts (MoE) architecture allows it to use test-time compute to solve harder problems, that is a legitimate, scalable expansion of capability.

2. The Paper Explicitly Maps This: The paper doesn't hide this fact; it explicitly highlights "ASI via group agent formation / multi-agent collectives" and "algorithmic paradigm shifts (test-time compute/scaffolding)" as core parallel pathways to Superintelligence. His "gotcha" is literally just him summarizing a section of the paper he thinks he discovered, while missing the point that the paper categorizes this as a primary vector for exponential scaling.

3. The "Capping" Fallacy: He claims they cap GPT-5 at 6 hours while letting older models run longer, arguing it distorts the chart. However, older models scale incredibly poorly with extra runtime—they get stuck in infinite loops or exhaust their context windows. Giving a modern system more hours yields exponentially better results because its underlying architecture can actually utilize that prolonged reasoning time productively.

Here, argue with AI about it.

This is the part I liked anyway.

Quote:

Originally Posted by BradZax

This will validate all the AI haters so much, so enjoy.


Ekco · #**804** Today, 02:04 PM

Quote:

My beef is with the METR graph which isn't mentioned once in the paper

Quote:

Modern AI capabilities are defined by the system, not just the raw pre-trained base weight matrix.

That's literally what the graph is: knowingly and on purpose comparing raw models vs. models with harnesses. It's apples and oranges, and they graphed it.

Quote:

In early 2026, tech analysts and AI researchers heavily panned METR’s capability timelines. Critics pointed out that METR's data is plagued by basic errors

Maybe ask AI if that's a fair comparison. Point 3 is wrong too: you can totally put 5.1 in a harness and it would score higher. As for point 1:

Quote:

He argues that performance increases are just coming from the "harness"

lol wut, when did I say the harness is where the performance is coming from? This entire output is garbage. I can only imagine what kind of fucked-up prompt you put in to get it to spit this out and be this confused. Try Claude or ChatGPT, not Grok or Google's AI Overview.

Quote:

His "gotcha" is literally just him summarizing a section of the paper he thinks he discovered

I didn't even read the paper; this is just common AI knowledge from articles and current debates. Especially in the AGI/ASI debate, people at Google disagree with other people at Google about this.

This is the same company that had a mustard tiger named Blake Lemoine who thought a now-obsolete chatbot from years ago had a fucking soul because of his religious views. Smart people can be retarded and have views on super AI gonna kill us all or turn the entire universe into paperclips. One dipshit who was taken seriously until recently was even afraid of the concept of an ASI in the future with time-traveling capabilities that would torture him for eternity for not working on AI. These are just thought experiments, same as AGI/ASI.

BradZax · #**805** Today, 02:16 PM

Quote:

I didn't even read the paper

Then who are you talking to?

Quote:

That's literally what the graph is

No, it's what someone who didn't read the paper would think, though.

Quote:

this is the same company

That company's stock, accounting for splits, is roughly $7,300 per share today.

Quote:

smart people can be retarded and have views on super ai gonna kill us all or turn the entire universe into paperclips,

Cool, but nobody in this context is claiming that.

If anything, I pointed out that they are going to cause the end of the world by destabilizing politics in the USA and globally, if anyone thought that was true and wanted to stop them from developing AGI.


BradZax · #**806** Today, 02:40 PM

Quote:

Originally Posted by Ekco

Maybe ask ai if that's a fair comparison, point 3 is wrong also, you can totally put 5.1 in a harness and it would score higher point 1 lol wut when did I say the harness is where the performance is coming from, this entire output is garbage I can only imagine what kind of fucked up prompt you put in to get this to be spit out and be this confused, try Claude or Chatgpt not grok or Google overview

OK: "read [this discussion] and [this document] and respond in kind."

Quote:

The METR chart isn't misleading at all, and you're fundamentally misinterpreting the entire thesis of the DeepMind paper. You are trying to separate the core model from the software harness, but in the real world, they are the same system.

The paper explicitly states that raw base model scaling is slowing down, which is why the industry has shifted to test-time compute, software scaffolding, and multi-agent systems. The harness and the tools are the new scaling paradigm.

Furthermore, your claim that you can just stick an older model like Opus in a modern harness for days to get the same results is technically wrong. Older models lack the context windows and the architectural stability required to handle long-horizon reasoning tasks. If you run them that long, they suffer from compounding errors, hallucinate, and crash. The ability to effectively utilize extended runtime and scaffolding is a direct capability of the newer model architectures. The chart isn't a trick; it's showing the reality of how AI systems scale now.

Enjoy arguing with AI while you say that AI isn't going to become AGI/ASI anytime soon.

I'll be kicking back playing EverQuest.