🥤 Pop open your daily can of SOTA

The freshest frontier AI news, leaderboards, and deep dives, served three times a week. Hacker News meets Morning Brew, for the people actually building AI.

Free. 3x per week. Unsubscribe anytime.

Fresh Batch · April 2026

Harness Reliability Index

Ranks agent harnesses on task completion, tool-use stability, context fidelity, and cost efficiency across a 120-task eval suite. Methodology published openly.

Harness Reliability Index — v1.3
Synthetic placeholder
Updated Apr 16 · 7 harnesses · 120 tasks
# · Harness · Completion · $/task
1 · Claude Code · 92.4% · $0.18
2 · Codex (OpenAI) · 87.1% · $0.22
3 · Gemini CLI · 83.6% · $0.12
4 · GHCP CLI · 79.2% · $0.16
5 · Cursor Agent · 74.8% · $0.21
6 · Aider · 68.3% · $0.09
7 · AutoGen Studio · 61.7% · $0.28
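The index lists completion rate and cost per task side by side. One illustrative way to combine them is expected spend per completed task. If $/task is the mean spend over all attempted tasks (an assumption; the page does not say), then spend per completed task is $/task divided by the completion rate. The sketch below applies that arithmetic to the synthetic placeholder figures in the table. It is not the index's own cost-efficiency metric, and the function names are hypothetical.

```python
# Illustrative only: figures copied from the synthetic placeholder table.
# Assumption: "$/task" is the mean spend over all attempted tasks, so
# spend per *completed* task = mean spend / completion rate.
HARNESSES = [
    ("Claude Code", 0.924, 0.18),
    ("Codex (OpenAI)", 0.871, 0.22),
    ("Gemini CLI", 0.836, 0.12),
    ("GHCP CLI", 0.792, 0.16),
    ("Cursor Agent", 0.748, 0.21),
    ("Aider", 0.683, 0.09),
    ("AutoGen Studio", 0.617, 0.28),
]


def cost_per_completed_task(completion_rate: float, cost_per_task: float) -> float:
    """Expected spend for each task the harness actually completes (hypothetical helper)."""
    if not 0 < completion_rate <= 1:
        raise ValueError("completion_rate must be in (0, 1]")
    return cost_per_task / completion_rate


def rank_by_cost_per_completion(rows):
    """Return (name, dollars per completed task) pairs, cheapest first."""
    scored = [(name, cost_per_completed_task(rate, cost)) for name, rate, cost in rows]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    for name, dollars in rank_by_cost_per_completion(HARNESSES):
        print(f"{name:<16} ${dollars:.3f} per completed task")
```

On these placeholder numbers, Aider (about $0.132) and Gemini CLI (about $0.144) come out cheapest per completed task. Claude Code's higher completion rate puts it third at about $0.195.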

External Leaderboards

The canonical benchmarks and rankings maintained by the community and research labs. Links open the primary source.

Chatbot Arena
Human-preference Elo rankings of models from blind A/B chat comparisons
LMSYS · General / Chat
Open LLM Leaderboard
Open-weight model rankings across IFEval, MATH, GPQA, MuSR, BBH, MMLU-Pro
Hugging Face · General / Open-weight
SWE-bench
Resolution of real GitHub issues; the standard benchmark for coding agents
Princeton NLP · Coding / Agents
ARC-AGI
Abstraction and reasoning tasks designed to resist memorization, aimed at measuring progress toward general intelligence
ARC Prize · Reasoning / AGI
LiveBench
Contamination-resistant benchmark updated monthly with new questions from recent sources
LiveBench.ai · General / Comprehensive
HELM
Holistic evaluation across accuracy, calibration, robustness, fairness, efficiency, and more
Stanford CRFM · General / Comprehensive

Market Maps

Market maps are fermenting 🫧

The frontier AI ecosystem — harness stacks, agent frameworks, orchestration infra, verticals.

Quarterly · First drop May 2026

Fresh explainers on the way

"WTF is a harness?" · "Context engineering explained" · "The IMPACT framework" · "Prompt eng is dead, long live harness eng"

Explainers · Weekly · First can cracks open in 1 week

Tall cans in the works

Long-form deep dives on AI in healthcare, insurance, legal, finance, and dev tools: the verticals where harness engineering actually ships.

Tall Can · Monthly

The Bigger Picture — coming soon

Long-form essays on what AI actually means for economics, history, and society. Labour markets, institutional change, historical analogues, the sociology of disruption. Less hype, more signal.

Essays · Irregular · First piece May 2026

Six-packs coming soon

Conversations with builders at NVIDIA, Anthropic, OpenAI, LangChain, and open-source maintainers shaping the harness layer.

Six-Pack · Biweekly · First episode June 2026

Jobs on the frontier

Staff Harness Engineer 🔥
Anthropic · San Francisco / Remote
$320K – $480K
Agent Product Lead
OpenAI · San Francisco
$280K – $420K
Senior AI Safety Researcher
DeepMind · Remote
$260K – $400K
Founding Harness Engineer
Morph (Series B) · NYC / Remote
$220K – $340K + equity
Context Engineering Lead
Parallel.ai · Remote
$200K – $310K + equity

Experimental · Runs entirely in your browser

SLM Playground

Chat with a small language model running locally — no API key, no server. Tweak the system prompt to see how it changes behaviour. Model weights download once and are cached.