Test-Driven Development (TDD) is widely accepted as the gold standard for producing robust, reliable, and refactorable software.
We also know the reality: TDD is exhausting.
In the heat of a sprint, when a deadline is looming, TDD is often the first casualty. Why? Because TDD requires you to constantly switch cognitive gears. You have to wear the “adversarial tester” hat to define the requirements, and then immediately switch to the “problem-solver” hat to write the implementation. Doing both simultaneously drains mental energy fast.
As a result, many teams revert to “Test-After Development” (TAD), writing tests only after the feature works “on my machine.” This leads to brittle tests that often just confirm the biases of the implementation code already written.
## The AI Pivot: A New Workflow
We are entering an era where generative AI is surprisingly good at writing boilerplate implementation code, but still mediocre at deep, contextual system design and understanding nuanced business requirements.
So, let’s play to our strengths and outsource our weaknesses.
The proposed workflow is simple but transformative:
- Human: Writes the Unit Tests (The “Red” phase).
- AI: Writes the Implementation to pass those tests (The “Green” phase).
- Human: Reviews the code and refactors if necessary (The “Refactor” phase).
## Why This Works
The hardest part of programming isn’t remembering syntax; it’s defining exactly what the software should do.
When you write tests first, you are forced to crystallize the requirements before a single line of production code exists. You are defining the API surface area, expected inputs, and required outputs. This is high-value cognitive work that requires human context.
Once those constraints are codified in a test suite, the actual implementation is often just “connect-the-dots” logic. LLMs (like GPT-4, Claude, or Copilot) excel at connect-the-dots.
## A Practical Example: The Discount Calculator
Let’s look at a simple, real-world scenario. We need a function that calculates the total price of a shopping cart, applying a 10% discount if the total is over $100. Let’s use JavaScript and Jest.
### Step 1: The Human writes the tests
I am not thinking about how to write the loop or the if-statement yet. I am only thinking about the desired behavior.
```javascript
// cartCalculator.test.js
const { calculateTotal } = require('./cartCalculator');

describe('Shopping Cart Calculator', () => {
  test('should return 0 for an empty cart', () => {
    expect(calculateTotal([])).toBe(0);
  });

  test('should calculate total correctly for carts under $100 (no discount)', () => {
    const cart = [
      { name: 'Book', price: 30 },
      { name: 'Coffee', price: 20 }
    ];
    // Total is 50. No discount.
    expect(calculateTotal(cart)).toBe(50);
  });

  test('should apply a 10% discount for carts over $100', () => {
    const cart = [
      { name: 'Keyboard', price: 80 },
      { name: 'Mouse', price: 40 }
    ];
    // Total is 120. 10% discount is $12. Final price should be 108.
    expect(calculateTotal(cart)).toBe(108);
  });

  test('should handle edge case where total is exactly $100 (no discount)', () => {
    const cart = [
      { name: 'Item A', price: 50 },
      { name: 'Item B', price: 50 }
    ];
    // Total is exactly 100. Discount threshold is > 100.
    expect(calculateTotal(cart)).toBe(100);
  });
});
```
If I run this now, every test fails. Red. Perfect.
### Step 2: The AI writes the implementation
Now, I switch to my LLM of choice. I don’t need to explain the business logic to it in English prose. The tests are the explanation.
My Prompt:
“I have a JavaScript test file below for a shopping cart calculator. Please write the implementation file cartCalculator.js so that all these tests pass. Use clean, modern JavaScript syntax.”
[Paste the test code from Step 1 here]
The AI Output (GPT-4 in this instance):
```javascript
// cartCalculator.js

/**
 * Calculates the total price of a cart, applying a 10% discount
 * if the gross total exceeds $100.
 * @param {Array<{name: string, price: number}>} cartItems
 * @returns {number}
 */
function calculateTotal(cartItems) {
  if (!cartItems || cartItems.length === 0) {
    return 0;
  }

  const grossTotal = cartItems.reduce((sum, item) => sum + item.price, 0);

  const DISCOUNT_THRESHOLD = 100;
  const DISCOUNT_RATE = 0.10;

  if (grossTotal > DISCOUNT_THRESHOLD) {
    const discountAmount = grossTotal * DISCOUNT_RATE;
    return grossTotal - discountAmount;
  }

  return grossTotal;
}

module.exports = { calculateTotal };
```
### Step 3: Verify and Refactor
I save the AI’s code and run my test suite.
```
PASS ./cartCalculator.test.js
  ✓ should return 0 for an empty cart (2 ms)
  ✓ should calculate total correctly for carts under $100 (no discount) (1 ms)
  ✓ should apply a 10% discount for carts over $100 (1 ms)
  ✓ should handle edge case where total is exactly $100 (no discount) (1 ms)
```
It’s Green.
I review the code. It's actually quite good. It used reduce, handles the empty-array check correctly, and even pulled the magic numbers (the threshold and the discount rate) into named constants. I might rename a variable or two to fit team style guides, but the heavy lifting is done.
## The Benefits of AI-Driven TDD
1. **Guaranteed Test Coverage.** In principle, every line of code the AI writes exists solely to satisfy a test you wrote. You can't "forget" to test a branch condition if the code for that branch only exists because a test demanded it.
2. **Better Requirements Gathering.** If you write a vague test, the AI will write vague code. This workflow forces you to be extremely precise about edge cases (like the "exactly $100" example above) before implementation begins.
3. **Mental Energy Conservation.** You stay focused on the "What" (the tests). You outsource the "How" (the implementation syntax) to the AI, treating it like a very fast junior developer parked next to you.
## The Pitfall: Garbage In, Garbage Out
This workflow is not magic. It relies entirely on the quality of your tests.
If you write lazy tests that don’t cover edge cases, the AI will write lazy code that breaks in production. If your tests are tightly coupled to implementation details rather than behavioral outcomes, the AI’s output will be brittle.
The human remains the architect and the gatekeeper of quality. The AI is just the contractor laying the bricks based on your blueprints.
## Conclusion
TDD is difficult to sustain because it requires discipline and constant context-switching. By using AI to handle the implementation phase, we can lower the barrier to entry for true TDD.
Don’t ask AI to write code and then try to figure out how to test it later. Write the tests first, and force the AI to earn its keep by passing them.
