Key Takeaways
- Lambda extensions let you do post-response work by registering with /extension/register and using the blocking /extension/event/next call to decide when Lambda can freeze the environment.
- Put NextEvent() in one place and do not call it again until the flush finishes, so you do not signal readiness while cleanup from the previous invoke is still running.
- Return the API response as soon as the handler completes, then flush telemetry afterwards so exporter stalls do not sit on the request path.
- Use Go concurrency primitives like goroutines, channels, and context.WithTimeout to coordinate the handoff cleanly and to cap how long flush can run.
- Validate the change under sustained traffic by comparing API Gateway latency outliers with the Honeycomb telemetry.flush_trace duration to verify that post-response flushing has been removed from the critical path and is no longer what drives requests into the ten-second gateway timeout.
Context
At Lead Bank, we run our API infrastructure on AWS Lambda behind API Gateway. Our Lambda functions power critical payment endpoints (e.g., wires, checks, and ACH) as well as core primitive-creation endpoints for objects like balances, accounts, cards, and entities. Because these are user-facing and operationally critical APIs, response time matters, and we rely heavily on observability during on-call incidents. Our Lambdas use the AWS Distro for OpenTelemetry (ADOT) layer, which runs a local OpenTelemetry Collector alongside the function. The code exports telemetry to that local collector and the collector forwards it to Honeycomb.
We chose ADOT over Lambda Powertools or native CloudWatch because we wanted vendor-neutral instrumentation via OpenTelemetry and the flexibility to route signals to Honeycomb for its querying capabilities. That said, the pattern described in this article is not ADOT-specific. Any setup where telemetry is exported through a collector or external sink can hit the same flush latency problem, and if you’re using Lambda Powertools or CloudWatch EMF, the same extension mechanism applies.
We send traces, metrics, and structured logs, and we care about the reliability of those signals because they directly affect how quickly we can triage and mitigate incidents. Here, p50, p95, and p99 refer to latency percentiles: p50 is the median, while p95 and p99 capture the slower tail of requests. We report latency in milliseconds or seconds; 504 refers to the HTTP Gateway Timeout error returned when a request exceeds the configured limit. When we measured our entity endpoints, p50 stayed low, but p99 was not great. A small fraction of requests would occasionally spike and hit the ten-second API Gateway timeout that we configured, which showed up as HTTP 504 responses.
This article is about a specific failure mode we ran into, where synchronous telemetry flushing turned intermittent exporter stalls into user-visible timeouts, and how we moved flushing off the client response path while keeping telemetry intact, using Go synchronization primitives and the Lambda Extensions API provided by AWS Lambda.
The Failure Mode
The Lambda handler itself was not consistently slow. The problem was that we were flushing telemetry data synchronously before returning the response. Most flushes completed quickly, but occasionally the exporter path would stall due to network variance, downstream backpressure, or retries. Because the flush was on the critical path, those rare stalls translated directly into user-facing timeouts at the gateway.
A simplified version of the original pattern looked like this code snippet.
func handler(ctx context.Context, request Request) (Response, error) {
    response, err := processRequest(ctx, request) // typically 25–200ms

    // ForceFlush was used to reduce telemetry loss when Lambda freezes the environment.
    // Most of the time this was quick, but occasionally it would stall for 10 seconds.
    flushCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    if flushErr := otelProvider.ForceFlush(flushCtx); flushErr != nil {
        // Logged and forwarded to Sentry via the logger integration
        logger.Error("Error flushing telemetry", zap.Error(flushErr))
    }

    return response, err
}
When a request timed out at API Gateway, it often looked like the handler had finished its business logic, then spent the remainder of the ten seconds waiting on flush-related work. That meant we were timing out in a part of the invocation that should have been operationally helpful but invisible to the user.
Why Lambda Makes This Approach Tricky
In a traditional server process, you can return the HTTP response and keep working in the background, because the process stays alive. Lambda is different. The runtime environment can be frozen quickly after the handler returns, and any background work you started is not guaranteed to finish unless you keep the environment alive.
If we simply removed ForceFlush(), we would intermittently lose telemetry when the environment froze, which made the long tail harder to understand and incidents harder to debug. If we kept flushing synchronously, we preserved telemetry, but accepted that occasional exporter stalls could breach the API Gateway deadline.
We needed a way to return the response when the handler was done, then flush telemetry after the response without losing the chance to complete the flush.
Lambda’s Extension API
Lambda extensions give you a lifecycle hook that can keep the environment from freezing until the extension signals readiness for the next event. At a high level, an extension registers with Lambda, then blocks on an Extensions API call that delivers lifecycle events. Lambda will not freeze the environment until the runtime and all extensions are in a state where they are ready to proceed.
Extensions communicate through an HTTP API. The contract:
- The extension registers via a POST to /extension/register
- The extension makes a blocking GET to /extension/event/next
- Lambda sends INVOKE events through this connection
- The extension calls /extension/event/next again when ready
Lambda checks whether every registered extension has called /extension/event/next and is waiting. If any extension is not yet waiting, the environment stays active. We can therefore process the invocation, return the response, and delay calling /extension/event/next until telemetry flushes.
Extensions are typically separate processes deployed via a layer or container image under /opt/extensions, while internal extensions run in-process with the runtime.
Figure 1. Initial post-response flush flow using a Lambda extension
Initial Approach and Failure Under Warm Reuse
First Naive Attempt
We started by registering as an extension and spawning a goroutine to poll for events:
eventChannel := make(chan *Event, 1)
go func() {
    for {
        event, err := extensionClient.NextEvent(ctx)
        if err != nil {
            // In production, log and count this; retry with backoff.
            continue
        }
        // Deliver the INVOKE event to the handler wrapper.
        eventChannel <- event
        // The loop continues immediately and calls NextEvent() again.
        // That can happen while the deferred flush from the current invoke is still running.
    }
}()
The handler wrapper reads from the channel and spawns a background flush:
func wrappedHandler(ctx context.Context, input Input) (Output, error) {
    event := <-eventChannel // Instant - already fetched
    output, err := handler.Handler(ctx, input)
    go func() {
        // Flush after the handler completes.
        // This does not block the response path, but it can overlap with the next invoke.
        // Timeout omitted for brevity; see below.
        otelProvider.ForceFlush(context.Background())
    }()
    return output, err
}
This code worked fine for the first request. However, under realistic traffic (warm reuse plus overlap), we started seeing intermittent hangs.
The Timing Problem
After delivering the current event to the handler through the channel, the polling goroutine immediately looped back and called NextEvent() again, before the flush goroutine had finished. Issuing the blocking /extension/event/next request puts the extension back into a waiting state, which Lambda interprets as the extension being ready to proceed to the next lifecycle event.
This timing creates two failure scenarios. The first is the environment freezing mid-flush: if there is no immediate follow-on work, the environment can become eligible to freeze while the flush goroutine is still running, interrupting the flush and dropping telemetry from the invocation. The second occurs when the next invocation starts before the flush completes: under sustained traffic, the environment can be reused quickly, and invocation N+1 can start while the flush from invocation N is still running. If this pattern repeats, flush goroutines accumulate and contention increases. In load tests at 100 requests per second (RPS), we saw API Gateway timeouts (HTTP 504) and Lambda function timeouts. Correlating the Lambda duration metrics with the goroutine lifecycle pointed us to the root cause: NextEvent() was being called too early.
In both cases, the root cause is the same. We were calling NextEvent() too early, before the previous flush had completed, which signaled readiness to Lambda while post-response work was still in flight.

Figure 2. Failure mode under warm reuse when NextEvent() is called too early.
Revised Design
The Fix: Goroutine Chaining
We needed only one goroutine calling NextEvent() at any time. The solution was single-shot goroutines that exit after handling one event.
type ExtensionRunner struct {
    client            *ExtensionClient
    nextEventReceived chan *Event
}

func (r *ExtensionRunner) fetchNextEvent() {
    event, err := r.client.NextEvent(context.Background())
    if err != nil {
        // Signal the handler that the extension is unavailable.
        // The handler falls back to synchronous flushing for this invocation when the event is nil.
        r.nextEventReceived <- nil
        return
    }
    r.nextEventReceived <- event
}
After flushing completes, spawn the next goroutine:
func deferredFlush(extensionRunner *ExtensionRunner) {
    flushCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    otelProvider.ForceFlush(flushCtx)
    go extensionRunner.fetchNextEvent() // Spawn the next single-shot goroutine
}
The lifecycle becomes a chain as visualized in the sequence diagram below:

Figure 3. Revised design using goroutine chaining.
Each goroutine handles one event and dies. The next spawns only after the previous exits. This approach ensures NextEvent is never called before the flush completes, so Lambda never receives a premature readiness signal.
Guaranteeing Execution Order
One question came up during code review. How do we ensure the flush happens after the handler completes?
func wrappedHandler(ctx context.Context, input Input) (Output, error) {
    output, err := handler.Handler(ctx, input) // Line 1
    go func() { ForceFlush() }()               // Line 2: force flush simplified for brevity
    return output, err                         // Line 3
}
The go statement on Line 2 executes in program order like any other statement; only the code inside the goroutine runs concurrently. The handler call on Line 1 therefore completes before the goroutine is spawned, and the response is returned microseconds later while the flush runs in the background.
How Lambda Determines When to Wait
One question came up during code review. If the event is sitting in a channel, how does Lambda know not to freeze?
Lambda doesn’t look at channels. It looks at HTTP connections. When our goroutine makes the blocking call to /extension/event/next, that HTTP connection stays open until Lambda sends an event. From Lambda’s perspective, as long as it hasn’t sent a response to that HTTP request, the extension is busy.

Figure 4. How Lambda decides whether the execution environment can freeze.
When we spawn the deferred flush goroutine, we don’t call NextEvent() yet. If the extension has not issued the next blocking /extension/event/next call, Lambda treats the extension as not ready, so the environment remains active. After flush completes, the extension issues the next blocking wait; once the invocation is complete and all extensions are in a waiting/ready state, the environment becomes eligible to freeze.
Validation and Results
Production Results
We deployed the change to our entity endpoint first, which serves approximately five requests per second. Typical request latency before the change was in the 25ms to 200ms range, but a small fraction of requests would occasionally hit our 10s API Gateway timeout. When we correlated those outliers in Honeycomb, the requests that timed out had unusually large telemetry.flush_trace durations, which pointed to synchronous flushing as a contributor to the long tail.
After moving flush work off the response path and gating extension readiness on flush completion, API Gateway latency stabilized:
- p50: 20ms
- p95: 150ms
- p99: 200ms
We monitored the entity endpoint for two weeks before rolling the pattern out to other endpoints. We deployed directly and relied on existing alerts and paging to catch any regressions. We then rolled the pattern out to other endpoints that had similar tail behavior. To validate telemetry integrity, we monitored trace volume and telemetry.flush_trace distributions during rollout and did not see a coverage regression.
Two failure modes are worth addressing explicitly. First, let’s look at a runaway flush. We cap the deferred flush with a context.WithTimeout of ten seconds. If the exporter stalls beyond that, the context is cancelled and the next invocation can proceed rather than hanging indefinitely.
flushCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
otelProvider.ForceFlush(flushCtx)
Second, there are silent flush failures. Because the deferred goroutine runs outside the request context, errors don’t propagate back to the caller. We log flush errors as errors on each provider individually so a single provider failure doesn’t abort the others, Sentry captures any exceptions from the flush goroutine, and we monitor telemetry.flush_trace duration distributions in Honeycomb to catch sustained degradation before it becomes an incident.
Cost Implications
An important point to consider: This approach doesn’t reduce Lambda costs. Whether you flush synchronously or asynchronously, Lambda bills you for the total execution time. For the sake of this example, let’s use p95 numbers for handler latency (i.e., 150ms).
Before: Handler (150ms) + Flush (10s) = 10.15s billed
After: Handler (150ms) + Flush (10s) = 10.15s billed
The benefit is user-facing latency, not cost. Your AWS bill stays the same, but clients’ responses will no longer occasionally time out. For user-facing APIs, this trade-off makes sense. For background jobs where latency doesn’t matter, synchronous flushing is simpler.
When to Use This Pattern
We applied this pattern to all of our Lambda functions that were serving APIs. The others that are part of asynchronous flows still use synchronous flushing.
Use deferred post-response work when:
- Serving synchronous API requests
- Response time impacts user experience
- Handler is fast (under 500ms)
- Post-response work adds meaningful overhead (100ms+)
A concrete example beyond telemetry: our wire validation API accepts a wire request, generates a validation ID, persists the record to the database, and immediately returns a 202 Accepted response without blocking on the full validation pipeline. The deferred action then triggers the processing post-response: running checks, handling approvals, and recording outcomes. The extension keeps the environment alive long enough for that work to complete, without the caller ever waiting. Unlike the telemetry flush, which uses goroutine chaining, the wire validation deferred work collects actions into a list during the request and executes them sequentially in a single goroutine after the response is returned. This approach suits multi-step pipelines where ordering matters, and it uses the same extension mechanism: withholding the NextEvent call keeps the environment open until all actions complete.
Situations Inappropriate for This Pattern
Lambda Timeouts
If your function timeout is set to three to five seconds, the deferred flush has very little runway to complete. The context.WithTimeout guard helps, but if flush regularly takes longer than the time remaining after the handler completes, you’ll either lose telemetry or hit the function timeout anyway. The same applies to other post-response tasks: if they cannot be guaranteed to finish within the timeout window, a more robust async job framework is a better fit. Because we use an internal extension, Lambda manages shutdown automatically when the function timeout is reached. The flush context is not automatically cancelled, but Lambda will force-kill the process when the timeout expires. That is why the context.WithTimeout on the flush matters, and why setting your function timeout comfortably above your handler duration plus the flush cap reduces the risk of being killed mid-flush.
Your Handler is Already Slow
If your business logic takes multiple seconds, deferring flush saves relatively little. Worse still, the extension now keeps the environment alive even longer after each invocation, which means Lambda bills you for more time without a meaningful latency improvement for the caller. The math only works when the handler itself is fast and the deferred work is the outlier.
Flush Failures Can Silently Accumulate
Deferred work runs outside the request context, so errors don’t propagate back to the caller. If your telemetry exporter fails repeatedly, those failures won’t surface as 500s; they’ll show up as gaps in your observability data at the worst possible time, such as during an incident. You need explicit error logging and alerting on flush failures, or you will lose the visibility you were trying to protect.
Processing Background Jobs or Async Tasks
If the function isn’t serving a synchronous user request, there is no latency to protect. Synchronous flushing is simpler and easier to reason about.
Operational Considerations
Burst Traffic
Under burst traffic, Lambda scales by spinning up new environments in parallel, each with its own independent state. There is a caveat: If you have reserved concurrency set to a low value, Lambda won’t spin up additional environments and requests will queue or get throttled instead. In that case, the effective throughput of each environment is reduced because flush time adds to the total time the environment is busy between invocations.
Container Recycling
Lambda will not recycle the environment while a flush is in progress because the extension will not yet have called NextEvent. Lambda waits for that signal before freezing or recycling. The only scenario where telemetry could be dropped is if the flush hangs indefinitely, which is exactly what the ten-second context.WithTimeout guards against.
Cold Starts
Registering an internal extension adds negligible latency. The /extension/register call is a localhost socket round-trip measured in microseconds. The real cold start cost is any work the extension does before its first /event/next poll. Our extension registers and immediately blocks on NextEvent, so the overhead is minimal.
Lessons learned
- Extensions are less commonly used than layers or in-process libraries, but they solve real problems for post-response work beyond just telemetry.
- Timing bugs are easy to introduce and hard to debug. Our initial approach seemed correct but broke under concurrent load: it called NextEvent too early, signaling readiness to Lambda before the flush completed. Single-shot goroutines removed a class of lifecycle timing bugs.
- Sequential execution provides safety without locks. By ordering code carefully and understanding Go’s execution model, we get correct behavior without employing complex synchronization primitives.
- We didn’t want to reduce telemetry or sampling just to win latency back, so we changed where the flush happens in the lifecycle.
Conclusion
Lambda’s execution model assumes your function returns when work is done. When you need post-response work, the Extensions API provides the mechanism. By treating telemetry flushing as background work, we removed flush-driven gateway timeouts while maintaining full observability. The pattern applies beyond telemetry. This approach is useful when post-response work like cleanup, logging, metrics, or async jobs must complete before the Lambda freezes. The implementation requires careful concurrency handling, but the payoff is substantial. For user-facing Lambda APIs where response time matters, deferred flushing with the Extensions API is a worthwhile optimization.
