Project 1999 - View Single Post

Ekco · #797 · Yesterday, 02:30 AM

That graph, now that I've actually looked at it, is just about time on task, and it stops at 16 hours due to an "unreliable test suite", whatever the fuck that means. Write a better test. Either way it's outdated as fuck; Claude runs for days now.

Quote:

Claude can run continuously for days on massive coding projects thanks to dynamic, agentic harnesses like the Claude Agent SDK or Claude Code

So whoever put that chart together doesn't know how people have been using these models for months now: in parallel, with an overseer agent and dozens of sub-agents and all that bullshit running in an agentic self-correction loop.

The thing that actually matters is capability, and the plateau is way more pronounced in those charts for the models released in the last year. The big gains are in reducing the time it takes to complete a task successfully: some giant codebase can take ChatGPT 5.5 three days to work on, and Mythos supposedly did the work correctly in like 10 hours or something.

We've hit that ceiling. Take the whole "is AGI possible just by making a 1-trillion-parameter model" idea: the answer is no, and charts like this are pointless now because of the diminishing returns of just training larger and larger single models. Open-source Qwen is at like 350B parameters, but that's probably just adding up all the separate experts in the MoE.

Quote:

While frontier closed-source providers (Anthropic, OpenAI, Google) no longer disclose exact parameter counts for models like Claude Fable 5, Claude Opus 4.8, or GPT-5.5 Pro, the landscape for open-weights and verifiable models has scaled significantly heading into mid-2026.

Frontier Open MoE: Qwen3-235B-A22B / Qwen3.5-397B-A17B, 235B–397B total (17B–22B active per token)

Yeah, they stopped reporting the parameters because "number no longer going up" = scary for investors interested in two companies about to IPO.

Quote:

Major Paradigm Shifts Since Late 2024:

Active vs. Total Parameters (MoE Dominance): Large open-weights models have shifted aggressively toward Mixture-of-Experts (MoE) architectures (seen in the Qwen3 and DeepSeek-V4 series). A model may have up to 397B total parameters sitting in storage, but only routes ~17B to 22B active parameters per token, drastically reducing inference latency while preserving massive knowledge depth.

The "Medium" Sweet Spot Shift: The traditional 7B baseline has moved up. Architectures like Gemma 3 (12B) and Qwen3.5 (9B) maximize dense compute efficiency, effectively rendering the old 3B–7B performance tier obsolete for complex multistep coding or agentic loops.
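The active-vs-total split described in the quote can be illustrated with a toy top-k router. This is a minimal sketch, not how Qwen or DeepSeek actually implement it: the sizes (`D`, `N_EXPERTS`, `TOP_K`) are made up, each "expert" is a single matrix instead of a full feed-forward block, and the weights are random. It only shows why most of the stored weights sit idle for any given token.

```python
import math
import random

# Toy Mixture-of-Experts layer. All sizes are hypothetical and tiny;
# they are NOT the dimensions of any real Qwen or DeepSeek model.
D = 8           # hidden size
N_EXPERTS = 16  # experts stored in memory
TOP_K = 2       # experts actually run per token

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

# Each expert is one D x D matrix; the router maps a token to one score per expert.
experts = [rand_matrix(D, D) for _ in range(N_EXPERTS)]
router = rand_matrix(N_EXPERTS, D)

def moe_forward(token):
    """Route one token vector to its TOP_K experts and mix their outputs."""
    logits = matvec(router, token)
    top = sorted(range(N_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]
    # Softmax over the selected experts only (max-subtracted for stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    total = sum(weights)
    out = [0.0] * D
    for w, i in zip(weights, top):
        y = matvec(experts[i], token)  # only the chosen experts do any work
        out = [o + (w / total) * yi for o, yi in zip(out, y)]
    return out, top

# Every expert is stored, but only TOP_K of them (plus the router) run per token.
total_params = N_EXPERTS * D * D + N_EXPERTS * D
active_params = TOP_K * D * D + N_EXPERTS * D

out, chosen = moe_forward([rng.gauss(0.0, 1.0) for _ in range(D)])
print("experts used:", chosen)
print(f"total params: {total_params}, active per token: {active_params}")
print(f"active fraction: {active_params / total_params:.1%}")

# Same ratio for the quoted Qwen3.5-397B-A17B figures.
print(f"17B / 397B = {17 / 397:.1%}")
```

With these toy numbers about 22% of the weights run per token; plugging in the quoted 397B-total / 17B-active figures gives roughly 4.3%, which is the whole "massive knowledge depth, low latency" trade the quote is describing.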

So the sweet spot is the same model I'm using for Kaia, on a GPU from 2021 that costs like 250 bucks. Considering what most people actually use these chatbots for, a model running LOCALLY on your cell phone has enough juice for 99.9% of user queries, if it's built right to use tools like let_me_fucking_google_that_for_you.py.

So not only is AGI/ASI not going to happen, all these companies are going to go bankrupt and cause a deep recession, because the business plan we started this journey with doesn't make any sense anymore.

Open-source models win on both ends of the spectrum: locally run open source wins a non-trivial chunk of the consumer/enthusiast market, and enterprise coding just got something that costs 1/5th of a Claude or ChatGPT seat dropped in its lap, thanks to China.