AI & Tech

Insights from the work, written by the people doing it

Every piece below is written by someone on our team who shipped what they're describing. No marketing edits. No ghostwriters.

Featured

Why most agentic deployments stall at 60% accuracy — and what we did to break through
Field report · November 2026 · 14 min read

A pattern we've seen at four separate clients: agents reach 60% accuracy in a week, then plateau for months. The cause is almost never the model. We unpack the four most common stalls and the playbook for each.

Read the field report →

Recent writing

Latest from the team

The eval harness comes first: a working pattern for agent reliability
Methodology · 9 min

Treating evaluation as the spec — not the QA step — changes how teams build agents. Here's the architecture we've standardized on.

October 2026

What we learned shipping prior authorization agents into a regulated payer
Industry · 11 min

Eighteen months of operations data from a health plan with 3 million members. Where the agents excelled, where they failed, and how we adapted.

September 2026

Tool-use reliability across the major frontier models, late 2026 edition
Research · 15 min

A reproducible benchmark across enterprise tool-use scenarios — JSON conformance, error recovery, retrieval-grounded answering. Numbers, not vibes.

August 2026

Drafting a model risk management policy that covers AI agents
Governance · 7 min

SR 11-7, the Federal Reserve's supervisory guidance on model risk management, wasn't written for agents, but it doesn't need to be. We share the policy template our financial services clients have adapted.

August 2026

On-call for AI agents: an SRE's guide to running production autonomy
Operations · 8 min

Lessons from running 47 production agent workflows across our managed operations practice: what should page someone, what shouldn't, and how we tune the alerts.

July 2026

Workflow archaeology: how we map a process before automating it
Methodology · 12 min

A two-week protocol for understanding an existing workflow before designing the agent that will replace parts of it. Templates included.

June 2026

Subscribe

Once a month, in your inbox

One email each month covers the most useful piece we published and the most important thing we learned in the field. No promotional content, ever.

Have an idea for a field report?

We sometimes co-publish with clients on engagements that produced unusually clean lessons.

Get in touch →