AWS recently announced the launch of Durable Functions for Lambda, a new capability that enables developers to use standard Lambda functions to build complex, multi-step applications (workflows).
Durable functions track their progress, automatically retry on failures, and can suspend execution for up to one year at defined points, without incurring idle compute costs during waits.
The design goal is to allow developers to express stateful application logic entirely within the function’s code, abstracting away the underlying state machines. Principal Developer Advocate Donnie Prakoso from AWS explains:
After enabling a function for durable execution, you add the new open source durable execution SDK to your function code. You then use SDK primitives like “steps” to add automatic checkpointing and retries to your business logic and “waits” to efficiently suspend execution without compute charges.
Lambda durable functions introduce two core primitives that handle state management and recovery. The first is the step, exposed through the context.step() method, which adds automatic retries and checkpointing to the business logic that a developer writes; once a step has completed, it is skipped during replay. The second is the wait, exposed through the context.wait() method, which pauses execution for a specified duration: the function terminates, and execution is suspended and later resumed without compute charges for the time spent waiting.
An example of a durable function can look like this:
import { DurableContext, withDurableExecution } from "@aws/durable-execution-sdk-js";

// inventoryService, paymentService, and shippingService are application-defined clients (not shown).
export const handler = withDurableExecution(
  async (event: any, context: DurableContext) => {
    const { orderId, amount, items } = event;

    // Reserve inventory across multiple warehouses
    const inventory = await context.step("reserve-inventory", async () => {
      return await inventoryService.reserve(items);
    });

    // Process payment
    const payment = await context.step("process-payment", async () => {
      return await paymentService.charge(amount);
    });

    // Create shipment
    const shipment = await context.step("create-shipment", async () => {
      return await shippingService.createShipment(orderId, inventory);
    });

    return { orderId, status: 'completed', shipment };
  }
);
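To make the checkpoint-and-replay behavior of steps more concrete, the following is a minimal toy model, not the AWS SDK. It assumes checkpoints are kept in an in-memory Map and that a step is retried up to three times; the names makeContext, workflow, store, and calls are hypothetical, and the toy is synchronous for brevity, whereas the real steps are awaited.

```typescript
// Toy model only: NOT the AWS SDK. Checkpoints live in a plain Map here;
// a real durable runtime persists them outside the function.
type Checkpoints = Map<string, unknown>;

function makeContext(checkpoints: Checkpoints, maxAttempts: number = 3) {
  return {
    // Run fn at most once per step name; on replay, reuse the stored result.
    step<T>(stepName: string, fn: () => T): T {
      if (checkpoints.has(stepName)) {
        return checkpoints.get(stepName) as T; // already completed: skip
      }
      let lastError: unknown = undefined;
      for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          const result = fn();
          checkpoints.set(stepName, result); // checkpoint on success
          return result;
        } catch (err) {
          lastError = err; // fall through and retry
        }
      }
      throw lastError; // attempts exhausted
    },
  };
}

// Hypothetical workflow: the charge succeeds, the shipment fails twice first.
const calls = { charge: 0, ship: 0 };

function workflow(checkpoints: Checkpoints): string {
  const ctx = makeContext(checkpoints);
  const payment = ctx.step("process-payment", () => {
    calls.charge += 1;
    return "paid";
  });
  const shipment = ctx.step("create-shipment", () => {
    calls.ship += 1;
    if (calls.ship < 3) throw new Error("transient shipping error");
    return "shipped";
  });
  return `${payment}/${shipment}`;
}

const store: Checkpoints = new Map();
const firstRun = workflow(store);  // executes both steps, retrying the second
const replayRun = workflow(store); // replay: both steps are skipped
```

In this sketch the second call to workflow() returns the same result without invoking either step body again, which is the property that lets a durable runtime re-run a handler from the top without repeating side effects.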
The announcement immediately sparked comparisons to AWS Step Functions, which has traditionally handled complex serverless orchestration through declarative state machine definitions written in JSON or YAML. AJ Stuyvenberg, a Staff Engineer at Datadog and AWS Hero, posted on LinkedIn:
Durable functions are very similar to Step Functions you already know, but entirely expressed as code instead of yaml step definitions. You can transition between states, retry failures, and even suspend/resume for up to a year – all while still using the Lambda event model you already know (and may or may not love).
This perspective was echoed by the community, with a respondent on a Hacker News thread stating:
This is basically just an application with steps that are checkpointed when they progress in a shared database (that’s abstracted away from you). It’s considerably simpler, less magical, and cheaper than the equivalent Step Function-style implementation would be.
Furthermore, durable functions provide other operations for more complex patterns: create_callback(), which creates a callback that developers can use to await results from external events such as API responses or human approvals; wait_for_condition(), which pauses until a specific condition is met, for example by polling a REST API for process completion; and parallel() and map() operations for advanced concurrency use cases.
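The announcement does not show the signatures of these operations, so the following is only a toy sketch of the polling idea behind a wait-for-condition operation, not the SDK's API. The function waitForCondition, its options, and the simulated job are all hypothetical; where the toy merely adds up the interval, a durable runtime would suspend the function between polls.

```typescript
// Toy model only: NOT the AWS SDK. Polls a check function and accounts for
// the time that would be spent waiting between polls.
type PollResult<T> = { done: boolean; value: T };

function waitForCondition<T>(
  check: (attempt: number) => PollResult<T>,
  options: { intervalSeconds: number; maxAttempts: number },
): { value: T; attempts: number; waitedSeconds: number } {
  let waitedSeconds = 0;
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    const result = check(attempt);
    if (result.done) {
      return { value: result.value, attempts: attempt, waitedSeconds };
    }
    // A durable runtime would suspend here instead of holding compute.
    waitedSeconds += options.intervalSeconds;
  }
  throw new Error(`condition not met after ${options.maxAttempts} attempts`);
}

// Hypothetical job that reports completion on the third poll.
const polled = waitForCondition(
  (attempt) => ({
    done: attempt >= 3,
    value: attempt >= 3 ? "SUCCEEDED" : "RUNNING",
  }),
  { intervalSeconds: 30, maxAttempts: 5 },
);
```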
Mike Roberts noted how this solves a long-standing architectural challenge:
Before today’s announcement, I’d typically recommend using a Step Function. But often that’s annoying if most of what you’re doing is already in your app code. Durable functions solve this by being re-entrant (AWS will run the Lambda function multiple times for the same request), and also handling the state from previous passes to get your code back to where it needs to be. So now, one request can take longer than 15 minutes, as long as the Lambda function isn’t “active” for more than 15 minutes per call.
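The re-entrant behavior described in this quote can be illustrated with a small toy model, which is not the AWS SDK. It assumes a fake clock, an in-memory state object, and a driver loop standing in for the platform; makeReentrantContext, handler, runToCompletion, and the step names are hypothetical.

```typescript
// Toy model only: NOT the AWS SDK. A wait suspends the handler by throwing,
// and a driver re-invokes the handler once the fake clock reaches the timer.
type Suspend = { kind: "suspend"; resumeAt: number };
type RunState = { results: Map<string, unknown>; timers: Map<string, number> };

function isSuspend(e: unknown): e is Suspend {
  return typeof e === "object" && e !== null && (e as Suspend).kind === "suspend";
}

function makeReentrantContext(state: RunState, now: number) {
  return {
    // Completed steps are replayed from stored results.
    step<T>(stepName: string, fn: () => T): T {
      if (!state.results.has(stepName)) state.results.set(stepName, fn());
      return state.results.get(stepName) as T;
    },
    // First pass: record a wake-up time and suspend. Later passes: continue
    // once the fake clock has reached that time.
    wait(waitName: string, seconds: number): void {
      if (!state.timers.has(waitName)) state.timers.set(waitName, now + seconds);
      const resumeAt = state.timers.get(waitName) as number;
      if (now < resumeAt) {
        const signal: Suspend = { kind: "suspend", resumeAt };
        throw signal;
      }
    },
  };
}

const log: string[] = [];

function handler(state: RunState, now: number): string {
  const ctx = makeReentrantContext(state, now);
  ctx.step("request-approval", () => { log.push("requested"); return true; });
  ctx.wait("cool-off", 3600); // the handler is not running while suspended
  return ctx.step("finalize", () => { log.push("finalized"); return "done"; });
}

// Driver standing in for the platform: re-invoke until the handler returns.
function runToCompletion(): { output: string; invocations: number } {
  const state: RunState = { results: new Map(), timers: new Map() };
  let now = 0;
  let invocations = 0;
  while (true) {
    invocations += 1;
    try {
      return { output: handler(state, now), invocations };
    } catch (e) {
      if (!isSuspend(e)) throw e;
      now = e.resumeAt; // jump the fake clock to the wake-up time
    }
  }
}

const outcome = runToCompletion();
```

In this sketch the handler is invoked twice for a single logical request, and each invocation is short even though an hour of simulated time passes between them.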
In addition, Alexey Vidanov, a Senior Consultant at Reply, mentions that with Lambda durable functions, developers can now run slow or chained LLM steps inside Lambda without waiting costs, starting containers, or managing extra compute paths:
Lambda Durable Functions bring orchestration directly into Lambda. This removes the time-based tax and unlocks a cleaner model for LLM, ML, and agent workflows.
It is worth noting that competitors, most prominently Microsoft Azure, have offered a similar capability, Azure Durable Functions, for several years. Jason Miles mentioned this context in a LinkedIn post:
Durable functions have been a thing in Azure for a while. They’ve certainly got their place, but you really should evaluate whether a 15-minute run is the right solution. Occasionally, sure, but if you’re regularly waiting, one of the most frequent situations, you might need further decomposition.
Currently, Lambda durable functions are available in the US East (Ohio) AWS Region and support JavaScript/TypeScript (Node.js 22/24) and Python (3.13/3.14). The pricing details are on a dedicated pricing page.
