AWS recently announced an enhancement to its Lambda response streaming capability, raising the default maximum response payload size from 20 MB to 200 MB, which allows developers to stream larger, data-intensive responses directly from a serverless function.
Response streaming is an invocation mode that allows a Lambda function to send partial results back to a client as they are produced, rather than buffering the entire response before sending it. This mechanism is particularly beneficial for improving Time to First Byte (TTFB) performance, as it enables the client to start rendering or processing data immediately.
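In the Node.js runtime, this mode is expressed by wrapping the handler in `awslambda.streamifyResponse` and writing to a response stream. The sketch below illustrates the pattern; since the `awslambda` global only exists inside the Lambda runtime, it falls back to a local identity-wrapper stub (an assumption for illustration) so it can run anywhere:

```javascript
// In Lambda's Node.js runtime, `awslambda.streamifyResponse` is a global.
// Outside Lambda we fall back to an identity wrapper (a local stub for
// illustration) so this sketch stays runnable.
const streamifyResponse =
  globalThis.awslambda?.streamifyResponse ?? ((fn) => fn);

const handler = streamifyResponse(async (event, responseStream) => {
  // Write partial results as they are produced instead of buffering the
  // full payload -- the client sees the first bytes immediately.
  for (let i = 1; i <= 3; i++) {
    responseStream.write(`partial result ${i}\n`);
  }
  responseStream.end(); // signal that the response is complete
});

exports.handler = handler;
```

In Lambda, `responseStream` is a Node.js writable stream; locally, any `Writable` (even `process.stdout`) can stand in for it when exercising the handler.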
Previously, handling large responses that exceeded the 20 MB limit required developers to implement complex workarounds, such as compressing payloads or using intermediary services like Amazon S3.
Tobias Smidt, a freelance consultant, wrote in a LinkedIn post:
Before, if you needed to send more than 20 MB, you had to compress, chunk, or offload to S3 via presigned URLs. More moving parts, more latency (and more code to maintain! Probably the worst trade-off before). Now you can stream up to 200 MB straight from Lambda. No more S3 handoff for every oversized response.
The new 200 MB limit allows developers to process and stream large datasets, image-heavy PDF files, or even music files directly within Lambda, the company wrote in the announcement. Jin Tan Ruan, an AI engineer, wrote in a LinkedIn article:
For generative AI and other data-intensive workloads, this is a game-changer. 200 MB is a substantial amount of data – roughly equivalent to ~200,000 pages of text or dozens of high-resolution images. By handling this size of payload in one go, Lambda can now return rich AI-generated content directly to users.
However, a respondent asked on a Reddit thread:
What about the API gateway?
Another user responded:
It’s sadly not yet supported – only Lambda function URLs currently.
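Since streaming is currently exposed through function URLs, the client consumes the response incrementally as it arrives, which is where the TTFB benefit shows up. A hedged sketch of such a client, where the function URL is a placeholder and `onChunk` is an illustrative callback:

```javascript
// Sketch: consume a streamed response chunk by chunk as it arrives.
// The URL passed in is a placeholder for a Lambda function URL; any
// endpoint that streams its body behaves the same way here.
async function readStreamed(url, onChunk) {
  const res = await fetch(url);
  const decoder = new TextDecoder();
  let body = "";
  // In Node.js 18+, res.body is a web ReadableStream and is async-iterable.
  for await (const chunk of res.body) {
    const text = decoder.decode(chunk, { stream: true });
    if (onChunk) onChunk(text); // process partial data immediately
    body += text;
  }
  return body;
}
```

A caller might use it as `readStreamed("https://<url-id>.lambda-url.<region>.on.aws/", (t) => process.stdout.write(t))`, rendering each partial chunk rather than waiting for the full 200 MB payload.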
Also, Ivo Pinto, a Principal Cloud Architect, pointed out in another LinkedIn post:
What 20 MB → 200 MB unlocks:
Text: ~5M → 50M characters (~20K → 200K typical LLM tokens)
PDFs: ~200 → 2,000 pages with images
Images: ~20 → 200 high-res processed results
Audio: ~3 → 30 minutes of processed/enhanced audio files
Basically, it eliminates complex chunking logic for outputs exceeding 20 MB.
But remember, Lambda still has a 15-minute execution limit.
Lastly, Lambda response streaming supports Node.js managed runtimes as well as custom runtimes. In addition, the 200 MB response streaming payload limit is the default in all AWS Regions where Lambda response streaming is available.