Some years back, I remember feeling fancy about my AWS Lambda and RDS setup until we had to process a large batch of data from a CSV. In summary, it was a bad day. Database memory usage flew up, database connections flew up, my heart rate flew up, and the whole thing crashed in my face.
Yes, you can assume load testing was never invented.
It’s been a long time since then, and the issue is long gone. However, I still meet developers struggling with the same problem to this day, so I decided to share my experience and how we resolved it.
Context
My team needed to process data from CSV files periodically, which required extracting the data, computing the data, updating the database, and communicating with other services.
For extracting and computing, we used separate Lambdas and communicated via SQS/SNS.
For the database, we went with Aurora Serverless v1 because the processing was periodic. Or that’s what we want you to think. We are cheap people, and we’ll always go for the cheapest best option!
Aurora Serverless V1
Aurora Serverless is an AWS-managed database that scales on demand. Under heavy, sustained traffic it is normally more expensive than regular RDS, but its ability to scale down to zero during inactivity does magic. That means you don't pay when the database is idle.
| Pros | Cons |
|---|---|
| Auto-scaling (eventually) | Cold start delays up to 30s |
| Can shut down when idle | Dropped transactions during cold boots |
| No instance to manage | Scaling latency under load |
Why did it fail?
The major issue was the difference in scaling speed between AWS Lambda and RDS.
AWS Lambda is widely known for its scalability, and this is no joke. AWS Lambda functions scale horizontally in seconds, with each synchronously invoked function able to scale by 1,000 concurrent executions every 10 seconds.
Each new invocation may spin up a new environment (cold start) or reuse an existing environment (warm start). In most cases, connections are not shared across instances, and that means N invocations produce an equivalent of N environments, which produces N connections to the database.
Aurora Serverless can't scale nearly as fast. So the connections pile up, and eventually we hit the dreaded “Too many connections” error from the database.
Fixes We Attempted
- Retry Mechanism
We added retries with exponential backoff. It helped… barely.
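A minimal sketch of what our retry wrapper looked like, assuming a generic callable (the function name, delays, and the broad `except Exception` are illustrative, not our production code). The jitter matters: without it, a swarm of Lambdas all retry on the same schedule and hammer the database in waves.

```python
import random
import time

def with_backoff(fn, max_attempts=5, base_delay=0.1, max_delay=5.0):
    """Call fn, retrying on failure with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # Sleep 2^attempt * base (capped), scaled by random jitter so
            # concurrent Lambdas don't all retry at the same instant.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))
```

Backoff only spreads the load out in time; it doesn't reduce the total number of connections, which is why it helped only barely.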
- Connection Pooling (Don’t Do It)
We tried connection pooling inside Lambda. It never worked. As explained earlier, each new invocation may create a new execution environment, and pools can't be shared across environments, so a pool per environment buys you nothing.
- Leveraged Lambda Execution Freezing
Lambda tries to be helpful by freezing (sort of caching) the execution environment after a run. If Lambda is invoked again soon, AWS does a warm start, “thaws” the function, and reuses the environment.
Among the things AWS freezes are global variables. You can take advantage of this to reuse DB connections between invocations (if you're lucky enough to get a warm start).
Don’t Do This
```python
import os
import db  # placeholder for your database client library

def book_handler(event, context):
    db_url = os.getenv("DB_URL")
    db_client = db.connect(db_url)  # opens a new connection on every invocation
    book = db_client.get(book_id=event["book_id"])
    return book
```
Do This Instead
```python
import os
import db  # placeholder for your database client library

# Module scope: runs once per execution environment, so the connection
# is reused across invocations while the environment stays warm.
db_url = os.getenv("DB_URL")
db_client = db.connect(db_url)

def book_handler(event, context):
    book = db_client.get(book_id=event["book_id"])
    return book
```
This minimizes connection churn and keeps the database happy. However, under a large burst of invocations, this alone isn't enough.
Solution At Last
RDS Proxy
AWS announced RDS Proxy in late 2019, and it reached general availability in mid-2020. It was a game changer.
Amazon RDS Proxy is a fully managed database proxy for Amazon RDS. It acts as a connection pool between your application and the database, reducing the stress on database resources and improving application performance.
Why it’s your friend:
RDS Proxy provides lots of advantages that will help you sleep at night:
- Manages connection pooling for you
- Works across multiple Lambda instances
- Handles auth and failover
- Reduces DB memory pressure
Caveats:
- Adds a small amount of latency (typically a few milliseconds)
- Not free, but worth it
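From the application's side, adopting RDS Proxy is mostly a configuration change: you point your client at the proxy endpoint instead of the cluster endpoint, and the proxy multiplexes the many Lambda connections onto a small pool of real database connections. A hedged sketch (the env var names and endpoint are hypothetical, and the commented-out `pymysql` call assumes a MySQL-compatible engine):

```python
import os

def proxy_conn_params():
    """Assemble connection kwargs pointing at the RDS Proxy endpoint
    instead of the database cluster endpoint. Env var names here are
    illustrative; use whatever your deployment defines."""
    return {
        "host": os.environ["PROXY_ENDPOINT"],  # e.g. my-proxy.proxy-abc.us-east-1.rds.amazonaws.com
        "user": os.environ["DB_USER"],
        "password": os.environ["DB_PASSWORD"],
        "database": os.environ["DB_NAME"],
        "connect_timeout": 5,
    }

# For a MySQL-compatible engine, the connection itself might look like:
# db_client = pymysql.connect(**proxy_conn_params())
```

Everything else about your queries stays the same; the pooling happens on the proxy's side.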
Unfortunately, it doesn't support every engine, and Aurora Serverless V1 is among the unsupported.
Aurora Serverless V2 to the Rescue
When Aurora Serverless V2 came out, things improved significantly.
- It scaled faster than v1
- No cold start delays
- Still cost-effective for bursty traffic
- And best of all, it supports RDS Proxy!
TL;DR: The Golden Combo
Here’s the stack I now recommend, especially for periodic processing:
- Lambda (use warm connections wisely)
- Aurora Serverless V2
- RDS Proxy
- Retry logic with backoff
- SNS/SQS for decoupling workloads
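To make the last item concrete, here's a sketch of the decoupling step: the extracting Lambda splits CSV rows into bounded chunks and enqueues each chunk, so every downstream invocation (and its DB connection) handles a bounded amount of work. The queue URL and chunk size are made-up examples, and `sqs_client` would be a `boto3.client("sqs")` in practice:

```python
import json

def chunk_rows(rows, size=100):
    """Split rows into batches so each SQS message, and each downstream
    Lambda invocation, processes a bounded amount of work."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def enqueue_chunks(sqs_client, queue_url, rows, size=100):
    # queue_url is a hypothetical example; in real code,
    # sqs_client = boto3.client("sqs")
    for batch in chunk_rows(rows, size):
        sqs_client.send_message(QueueUrl=queue_url,
                                MessageBody=json.dumps(batch))
```

Capping chunk size also caps the burst of concurrent consumers, which keeps the connection count the proxy has to juggle predictable.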
Final Thoughts
Using Lambda with RDS used to feel like mediating a US-China trade negotiation, but with RDS Proxy and Aurora Serverless V2, the dream of serverless with a relational DB is now actually viable.
That said, don’t blindly go all in. For long-running or batch-heavy DB jobs, sometimes containers or Fargate are a better fit, and DynamoDB (NoSQL) does a better job at scaling.
But if you're sticking with Lambda (and I don't blame you), just remember:
- Don’t open DB connections inside the handler
- Use Aurora Serverless V2 or DynamoDB
- Prefer RDS Proxy for RDS
- Load test your setup
- And don’t trust blog posts too easily, especially this one. Test for your use case.