Evalite, a TypeScript-native eval runner from Matt Pocock, provides a purpose-built test harness for AI-powered applications, letting developers write reproducible evals, capture traces, and iterate locally with a web UI. The project reached a v1 beta milestone and is positioning itself as the equivalent of Vitest or Jest for apps that rely on LLMs, with tooling tailored for scoring, tracing, and cost-aware iteration.
Evalite’s model treats an eval like a test suite, but with richer outputs. Instead of simple pass or fail results, Evalite runs .eval.ts files where each data point becomes a scored case, and provides first-class scorers and trace capture so teams can inspect model outputs, chain calls, and judge behaviour programmatically. It runs a local dev server with live reload and an interactive UI for exploring traces, and because it builds on Vitest you can reuse familiar test ergonomics such as mocks and lifecycle hooks.
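The scored-case model can be sketched in plain TypeScript. The names below (EvalCase, Scorer, runEval, exactMatch) are illustrative stand-ins, not Evalite's actual API:

```typescript
// Illustrative sketch of the scored-case model; these names are
// not Evalite's API, just a plain-TypeScript approximation.
type EvalCase = { input: string; expected: string };
type Scorer = (output: string, expected: string) => number; // 0..1

// A scorer encodes "how close is the output to what we wanted"
const exactMatch: Scorer = (output, expected) =>
  output === expected ? 1 : 0;

async function runEval(
  cases: EvalCase[],
  task: (input: string) => Promise<string>,
  scorers: Scorer[],
): Promise<number[]> {
  const scores: number[] = [];
  for (const c of cases) {
    const output = await task(c.input);
    // Each data point becomes a scored case, not a pass/fail assertion
    scores.push(
      scorers.reduce((sum, s) => sum + s(output, c.expected), 0) /
        scorers.length,
    );
  }
  return scores;
}

// Example: a trivial task standing in for an LLM call
runEval(
  [{ input: "Hello", expected: "Hello World!" }],
  async (input) => input + " World!",
  [exactMatch],
).then((scores) => console.log(scores)); // logs [ 1 ]
```

The key difference from a conventional test runner is the return type: a numeric score per case, which can be aggregated and tracked over time rather than collapsed into pass/fail.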
The v1 beta focuses on developer ergonomics and iteration speed. Quickstart instructions show how to install Evalite, add an eval:dev npm script, and author a simple eval that uses an off-the-shelf scorer such as autoevals. Evalite can also run programmatically, expose different run modes including watch and run-once, and persist results to custom storage backends so teams can track eval trends over time.
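A setup along the lines of the quickstart looks roughly like the following; the exact package set and script contents may differ from the current documentation:

```shell
# Based on the Evalite quickstart (exact package set may vary):
npm install -D evalite vitest autoevals

# Then add an npm script to package.json, e.g.:
#   "scripts": { "eval:dev": "evalite watch" }
# and run it to start the watcher and local UI:
npm run eval:dev
```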
Under the hood Evalite exposes features aimed at production workflows. Built-in scorers and support for custom scorers let teams encode domain specific success metrics. Evalite’s trace system captures inputs, LLM calls, and intermediate state so debugging and root cause analysis become more deterministic.
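A custom scorer is conceptually just a function from model output to a score. The sketch below shows one way to encode a domain-specific metric in plain TypeScript; the interface shape is illustrative and may differ from Evalite's actual scorer contract:

```typescript
// Illustrative domain-specific scorer: checks that a model's answer
// cites the expected sources. Not Evalite's API shape.
interface ScoreResult {
  name: string;
  score: number; // 0..1, rather than a binary pass/fail
}

function citesExpectedSources(
  output: string,
  expectedSources: string[],
): ScoreResult {
  const hits = expectedSources.filter((src) => output.includes(src));
  return {
    name: "cites-expected-sources",
    // Fraction of expected sources actually cited
    score:
      expectedSources.length === 0 ? 1 : hits.length / expectedSources.length,
  };
}

const result = citesExpectedSources(
  "See RFC 9110 for the semantics of HTTP caching.",
  ["RFC 9110", "RFC 9111"],
);
console.log(result.score); // 0.5 (one of two expected sources cited)
```

Because the score is a fraction rather than a boolean, partial credit is preserved, which is what makes trend tracking across eval runs meaningful.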
Recently, it was announced that Evalite can cache AI SDK models, which drew positive reactions from users, with one commenter calling the feature a game changer for speed and iteration.
Community reaction has been very positive. The project's GitHub repository has attracted strong interest, with over a thousand stars and an active release cadence, and the author's v1 beta announcement on X drew quick engagement from early adopters praising the release, with one commenter saying they'll start using it tomorrow for a real-world project. Another user explained why they think the project exists:
There are plenty of other eval runners out there… Evalite is different. It’s local-only, runs on your machine and you stay in complete control over your data.
As the project is still in active development, early issues are expected, and some have been spotted, for example one raised recently concerning dependency declarations. That issue has since been fixed, and the author has indicated they are actively addressing bugs reported by early adopters.
Evalite is open source under the MIT license and intentionally avoids vendor lock-in by working with any LLM and by providing pluggable storage and scorer integrations. As organizations build more agentic and LLM-driven features, Evalite aims to make evaluation reproducible, type-safe, and fast enough to be part of everyday developer workflows. Early adopters should expect active iteration, but the tooling already offers a compelling, TypeScript-first path to testing AI-powered apps.
