Blog

On building things and learning from it.

Apr 8, 2026

Kensa: An Open Source Agent Eval Harness

2 min

Apr 2, 2026

Why Build the Agent Verification Layer

3 min

Mar 15, 2026

Physician Disagreement in Healthcare Evals

5 min

Mar 9, 2026

The Half-life of Benchmarks

5 min

Feb 16, 2026

Blurt: Talk to your Agents

2 min

Feb 12, 2026

Human Review is the Bottleneck

5 min

Jan 14, 2026

10x Coding Agents

7 min

Sep 12, 2025

Directive for Claude

2 min

Jun 30, 2025

Multimodal RAG

8 min

Jun 23, 2025

The Paradox of Infinite Context

4 min