🥤 Pop open your daily can of SOTA
The freshest frontier AI news, leaderboards, and deep dives — served daily. Hacker News meets Morning Brew, for the people actually building AI.
Popping open a fresh can… 🥤
Fetching the latest AI news.
Harness Reliability Index
Ranks agent harnesses on task completion, tool-use stability, context fidelity, and cost efficiency across a 120-task eval suite. Methodology published openly.
| # | Harness | Score | Completion | $/task |
|---|---|---|---|---|
| 1 | Claude Code | 92.4% | $0.18 | |
| 2 | Codex (OpenAI) | 87.1% | $0.22 | |
| 3 | Gemini CLI | 83.6% | $0.12 | |
| 4 | GHCP CLI | 79.2% | $0.16 | |
| 5 | Cursor Agent | 74.8% | $0.21 | |
| 6 | Aider | 68.3% | $0.09 | |
| 7 | AutoGen Studio | 61.7% | $0.28 |
External Leaderboards
The canonical benchmarks and rankings maintained by the community and research labs. Links open the primary source.
Market Maps
Market maps are fermenting 🫧
The frontier AI ecosystem — harness stacks, agent frameworks, orchestration infra, verticals.
Quarterly · First drop May 2026Fresh explainers on the way
"WTF is a harness?" · "Context engineering explained" · "The IMPACT framework" · "Prompt eng is dead, long live harness eng"
Explainers · Weekly · First can cracks open in 1 weekTall cans in the works
Long-form deep dives on AI in healthcare, insurance, legal, finance, dev tools. The verticals where harness engineering actually ships.
Tall Can · MonthlyThe Bigger Picture — coming soon
Long-form essays on what AI actually means for economics, history, and society. Labour markets, institutional change, historical analogues, the sociology of disruption. Less hype, more signal.
Essays · Irregular · First piece May 2026Six-packs coming soon
Conversations with builders at NVIDIA, Anthropic, OpenAI, LangChain, and open-source maintainers shaping the harness layer.
Six-Pack · Biweekly · First episode June 2026Jobs on the frontier
SLM Playground
Chat with a small language model running locally — no API key, no server. Tweak the system prompt to see how it changes behaviour. Model weights download once and are cached.