🥤 Pop open your daily can of SOTA

The freshest frontier AI news, leaderboards, and deep dives. Hacker News meets Morning Brew, for the people actually building AI.

Free. 3x per week. Unsubscribe anytime.

Fresh Batch · April 2026

Harness Reliability Index

Ranks agent harnesses by task completion, tool-use stability, context fidelity, and cost efficiency across a 120-task evaluation suite. The methodology is published openly.

Harness Reliability Index — v1.3

Synthetic placeholder data

Updated Apr 16 · 7 harnesses · 120 tasks

#	Harness	Completion rate	Cost per task (USD)
1	Claude Code	92.4%	$0.18
2	Codex (OpenAI)	87.1%	$0.22
3	Gemini CLI	83.6%	$0.12
4	GHCP CLI	79.2%	$0.16
5	Cursor Agent	74.8%	$0.21
6	Aider	68.3%	$0.09
7	AutoGen Studio	61.7%	$0.28

External Leaderboards

The canonical benchmarks and rankings maintained by the community and research labs. Links open the primary source.

Chatbot Arena ↗

Human-preference Elo ratings across models, collected via blind A/B chat comparisons

LMSYS · General / Chat

Open LLM Leaderboard ↗

Open-weight model rankings across IFEval, MATH, GPQA, MuSR, BBH, MMLU-Pro

Hugging Face · General / Open-weight

SWE-bench ↗

Resolution of real GitHub issues; the de facto standard benchmark for coding agents

Princeton NLP · Coding / Agents

ARC-AGI ↗

Abstraction and reasoning tasks designed to resist memorization; widely used to gauge progress toward general intelligence

ARC Prize · Reasoning / AGI

LiveBench ↗

Contamination-resistant benchmark updated monthly with new questions from recent sources

LiveBench.ai · General / Comprehensive

HELM ↗

Holistic evaluation across accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency

Stanford CRFM · General / Comprehensive

Market Maps

Market maps are fermenting 🫧

The frontier AI ecosystem — harness stacks, agent frameworks, orchestration infra, verticals.

Quarterly · First drop May 2026

Fresh explainers on the way

"WTF is a harness?" · "Context engineering explained" · "The IMPACT framework" · "Prompt eng is dead, long live harness eng"

Explainers · Weekly · First can cracks open in 1 week

Tall cans in the works

Long-form deep dives on AI in healthcare, insurance, legal, finance, and dev tools: the verticals where harness engineering actually ships.

Tall Can · Monthly

The Bigger Picture — coming soon

Long-form essays on what AI actually means for economics, history, and society. Labour markets, institutional change, historical analogues, the sociology of disruption. Less hype, more signal.

Essays · Irregular · First piece May 2026

Six-packs coming soon

Conversations with builders at NVIDIA, Anthropic, OpenAI, and LangChain, and with the open-source maintainers shaping the harness layer.

Six-Pack · Biweekly · First episode June 2026

Jobs on the frontier

Staff Harness Engineer 🔥

Anthropic · San Francisco / Remote

$320K – $480K

Agent Product Lead

OpenAI · San Francisco

$280K – $420K

Senior AI Safety Researcher

DeepMind · Remote

$260K – $400K

Founding Harness Engineer

Morph (Series B) · NYC / Remote

$220K – $340K + equity

Context Engineering Lead

Parallel.ai · Remote

$200K – $310K + equity

Experimental · Runs entirely in your browser

SLM Playground

Chat with a small language model running locally — no API key, no server. Tweak the system prompt to see how it changes behaviour. Model weights download once and are cached.