Flo Crivello says an open-source model has beaten Claude Sonnet 4.6 on his team's evals for the first time, and that the team is moving on to vibe testing next.
Okay this one seems real.
First time ever an OSS model beats Sonnet 4.6(!!) on our evals.
Now begins vibe testing, but this is promising.