At QCon San Francisco 2025, Adam Wolff described how Claude Code at Anthropic is built with an AI coding assistant at the center of the workflow. He reported that about ninety percent of the production code for the tool is written with or by Claude Code. The team ships continuously to internal users and targets weekday releases for external users.
With an assistant that can generate and refactor code plus tests quickly, he said planning loses some of its former central role. The limiting factor becomes how fast teams can ship, observe behavior in production, and update their understanding of the requirements.
“Implementation used to be the expensive part of the loop. With AI in the workflow, the limiting factor is how fast you can collect and respond to feedback.”
Claude Code needs rich terminal input, including slash commands, file mentions, and keystroke-specific behavior. Conventional advice says not to rebuild text input because users expect a large set of editing shortcuts. The team decided to build its own input handling anyway because it needed full control over every keystroke. Wolff described this decision as a bet that could only be evaluated after shipping the first version and presented three stories from Claude Code development.
In the first story, they introduced a virtual Cursor class that models the text buffer and cursor position as an immutable value. The initial implementation was a few hundred lines of TypeScript supported by a substantial test suite. Later, another engineer added Vim mode on top in a single pull request, with hundreds of lines of logic and tests generated with Claude Code.
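The immutable-value idea can be sketched briefly in TypeScript, the language the component is written in. The Cursor name comes from the talk; the specific methods and signatures below are illustrative assumptions rather than the actual API.

```typescript
// A minimal sketch of an immutable cursor value; the class name Cursor is from
// the talk, the methods shown here are illustrative assumptions.
class Cursor {
  constructor(
    readonly text: string,
    readonly offset: number,
  ) {}

  // Every editing operation returns a new Cursor instead of mutating state,
  // which keeps keystroke handling easy to test as pure input/output pairs.
  insert(chars: string): Cursor {
    const text = this.text.slice(0, this.offset) + chars + this.text.slice(this.offset);
    return new Cursor(text, this.offset + chars.length);
  }

  left(): Cursor {
    return new Cursor(this.text, Math.max(0, this.offset - 1));
  }

  backspace(): Cursor {
    if (this.offset === 0) return this;
    const text = this.text.slice(0, this.offset - 1) + this.text.slice(this.offset);
    return new Cursor(text, this.offset - 1);
  }
}

// Usage: a sequence of keystrokes folds into a chain of immutable values.
const state = new Cursor("", 0).insert("helo").left().insert("l");
console.log(state.text); // "hello"
```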
As adoption grew across languages, Unicode-related issues began to surface. The team added grapheme clustering, and a later refactor reduced worst-case latency from several seconds per keystroke to a few milliseconds by deferring work and using more efficient search strategies. Wolff treats this story as an example of a successful experiment in which the pain of additional complexity decreased over time and the architecture continued to support fast changes.
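As a rough illustration of what grapheme-aware cursor movement involves, the sketch below uses the standard Intl.Segmenter API available in modern Node.js; the helper function is an illustrative assumption, not code from the talk.

```typescript
// Grapheme-aware boundary search with Intl.Segmenter (modern Node.js / V8).
const graphemes = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Returns the offset of the previous grapheme boundary, so one "left" keypress
// skips an entire emoji or combining sequence rather than a single code unit.
function previousGraphemeOffset(text: string, offset: number): number {
  let previous = 0;
  for (const { index } of graphemes.segment(text)) {
    if (index >= offset) break;
    previous = index;
  }
  return previous;
}

// "👍🏽" is two code points (four UTF-16 code units) but one grapheme cluster.
console.log(previousGraphemeOffset("a👍🏽b", 5)); // 1
```

Scanning the whole buffer on every keystroke is exactly the kind of work that can be deferred or replaced with more targeted searches to keep worst-case latency low.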
In the second story, Wolff examined how Claude interacts with the shell. The first design was a PersistentShell class that managed one long-running shell process behind a queue of commands. This preserved natural shell semantics for working directory and environment variables, since each command ran in the same process. The implementation was several hundred lines of code with logic for queueing, recovery, and pseudo-terminal handling.
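A simplified sketch of that first design: one long-lived bash process with commands serialized behind a promise chain. The PersistentShell name is from the talk; the sentinel-based output capture and the rest of the details are illustrative assumptions, and the real implementation also handled recovery and pseudo-terminal behavior.

```typescript
import { spawn, ChildProcessWithoutNullStreams } from "node:child_process";

class PersistentShell {
  private shell: ChildProcessWithoutNullStreams;
  // The queue serializes commands so they all share one process, preserving
  // working directory and environment changes across calls.
  private queue: Promise<unknown> = Promise.resolve();

  constructor() {
    this.shell = spawn("bash", [], { stdio: "pipe" });
  }

  run(command: string): Promise<string> {
    const result = this.queue.then(() => this.exec(command));
    this.queue = result.catch(() => undefined);
    return result;
  }

  // Writes the command to the shared shell and reads output until a sentinel
  // line marks completion.
  private exec(command: string): Promise<string> {
    return new Promise((resolve) => {
      const sentinel = `__DONE_${Date.now()}__`;
      let output = "";
      const onData = (chunk: Buffer) => {
        output += chunk.toString();
        if (output.includes(sentinel)) {
          this.shell.stdout.off("data", onData);
          resolve(output.slice(0, output.indexOf(sentinel)));
        }
      };
      this.shell.stdout.on("data", onData);
      this.shell.stdin.write(`${command}\necho ${sentinel}\n`);
    });
  }
}
```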
Problems appeared when the team introduced a batch tool that let the model run many commands at once. The queue inside PersistentShell serialized these calls and became a bottleneck for agent behavior. The team replaced it with a design where each command starts a fresh shell process. After shipping this change and receiving complaints, they settled on a snapshot approach that captures aliases and functions once in the user shell and sources that script before each transient command. Wolff observed that “you do not plan this kind of design, you discover it through experimentation.”
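The snapshot approach can be sketched roughly as follows: capture the user's aliases and functions once, then source that script at the start of every fresh shell. The file name and helper functions below are illustrative assumptions.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";
import { writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

const run = promisify(execFile);

// Capture aliases and function definitions from the user's shell once.
async function captureSnapshot(): Promise<string> {
  const { stdout } = await run("bash", ["-i", "-c", "alias; declare -f"]);
  const snapshotPath = join(tmpdir(), "shell-snapshot.sh");
  await writeFile(snapshotPath, stdout);
  return snapshotPath;
}

// Every command runs in its own short-lived shell, so many commands can run
// in parallel; sourcing the snapshot first restores the user's environment.
async function runTransient(snapshotPath: string, command: string): Promise<string> {
  const { stdout } = await run("bash", ["-c", `source ${snapshotPath}\n${command}`]);
  return stdout;
}
```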
“Shipping small changes frequently, and being willing to unship when needed, is central to our use of AI in development. The loop of build, ship, observe, and adjust is where most of the value appears.”
The third story focused on persistence for Claude Code conversations. The initial implementation used append-only JSONL files on disk, which required no external services and had no special installation requirements. This design already worked for production users. Wolff still wanted stronger query capabilities and structured migrations, so the team decided to adopt SQLite with a type-safe ORM.
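The JSONL baseline that the SQLite work replaced is simple enough to sketch in a few lines; the record shape and file layout here are illustrative assumptions rather than the actual format.

```typescript
import { appendFile, readFile } from "node:fs/promises";

// One conversation per file, one JSON object per line.
interface MessageRecord {
  role: "user" | "assistant";
  content: string;
  timestamp: string;
}

// Appending a line needs no external service, no native driver, and no
// migration step; a crash mid-write at worst truncates the final line.
async function appendMessage(path: string, record: MessageRecord): Promise<void> {
  await appendFile(path, JSON.stringify(record) + "\n", "utf8");
}

async function loadConversation(path: string): Promise<MessageRecord[]> {
  const raw = await readFile(path, "utf8");
  return raw
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as MessageRecord);
}
```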
After the database-backed version shipped, problems appeared in rapid succession. The native SQLite driver caused install failures on some systems, especially with package managers that handle native binaries in strict ways, and Wolff commented that “native dependencies basically do not work for this distribution model” in this context. Locking behavior in SQLite under concurrent access did not match the expectations of developers used to row-level locks in other databases. Within fifteen days the team removed the SQLite layer and returned to the simpler JSONL storage.
Across the three stories Wolff returned to the question of what shipping reveals that planning does not. He framed the core distinction between detours and dead ends in terms of how the pain evolved. When each iteration reduced bugs and improved behavior, as in the cursor case, the team stayed on the path. When effort uncovered new composition techniques, as with transient shells and snapshots, the result was a productive failure that yielded a better structure. When work increased fragility and user impact without a clear path to improvement, as with the SQLite experiment, the best decision was to undo the change.
Developers who want to learn more can watch InfoQ in the coming weeks for the video recording and can view the slides here.
