Some years back, I remember feeling fancy about my AWS Lambda and RDS setup until we had to process a large batch of data from a CSV. In summary, it was a bad day. Database memory usage flew up, database connections flew up, my heart rate flew up, and the whole thing crashed in my face.
Yes, you can assume load testing was never invented.
It’s been a long time since then, and the issue is long gone. However, I still meet developers struggling with the same problem to this day, so I decided to share my experience and how we resolved it.
Context
My team needed to process data from CSV files periodically, which required extracting the data, computing the data, updating the database, and communicating with other services.
For extracting and computing, we used separate Lambdas and communicated via SQS/SNS.
For the database, we went with Aurora Serverless v1 because the processing was periodic. Or that’s what we want you to think. We are cheap people, and we’ll always go for the cheapest best option!
Aurora Serverless V1
Aurora Serverless is an AWS-managed database that scales on demand. Under heavy, sustained traffic it is normally more expensive than regular RDS, but its ability to scale down to zero during inactivity does magic. That means you don't pay when the database is idle.
| Pros | Cons |
|---|---|
| Auto-scaling (eventually) | Cold start delays up to 30s |
| Can shut down when idle | Dropped transactions during cold boots |
| No instance to manage | Scaling latency under load |
Why did it fail?
The major issue was the difference in scaling speed between AWS Lambda and RDS.
AWS Lambda is widely known for its scalability, and this is no joke. AWS Lambda functions scale horizontally in seconds, with each synchronously invoked function able to scale by 1,000 concurrent executions every 10 seconds.
Each new invocation may spin up a new environment (cold start) or reuse an existing environment (warm start). In most cases, connections are not shared across instances, and that means N invocations produce an equivalent of N environments, which produces N connections to the database.
Aurora Serverless can't scale nearly as fast. So the connections pile up, and eventually we hit the dreaded “Too many connections” error from the database.
Fixes We Attempted
- Retry Mechanism
We added retries with exponential backoff. It helped… barely.
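A minimal sketch of what our retry wrapper looked like, assuming a generic callable (the function name, delays, and the broad `except Exception` are illustrative, not our production code). The jitter matters: without it, a swarm of Lambdas all retry on the same schedule and hammer the database in waves.

```python
import random
import time

def with_backoff(fn, max_attempts=5, base_delay=0.1, max_delay=5.0):
    """Call fn, retrying on failure with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # Sleep 2^attempt * base (capped), scaled by random jitter so
            # concurrent Lambdas don't all retry at the same instant.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))
```

Backoff only spreads the load out in time; it doesn't reduce the total number of connections, which is why it helped only barely.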
- Connection Pooling (Don’t Do It)
We tried connection pooling inside Lambda. It never worked. As explained earlier, each new invocation may create a new execution environment, and pools can't be shared across environments, so a pool per environment buys you nothing.
- Leveraged Lambda Execution Freezing
Lambda tries to be helpful by freezing (sort of caching) the execution environment after a run. If Lambda is invoked again soon, AWS does a warm start, “thaws” the function, and reuses the environment.
Among the things AWS freezes are global variables. You can take advantage of this to reuse DB connections between invocations (if you're lucky enough to get a warm start).
Don’t Do This
```python
import os
import db  # placeholder for your database client library

def book_handler(event, context):
    db_url = os.getenv("DB_URL")
    db_client = db.connect(db_url)  # opens a new connection on every invocation
    book = db_client.get(book_id=event["book_id"])
    return book
```
Do This Instead
```python
import os
import db  # placeholder for your database client library

# Module scope: runs once per execution environment, so the connection
# is reused across invocations while the environment stays warm.
db_url = os.getenv("DB_URL")
db_client = db.connect(db_url)

def book_handler(event, context):
    book = db_client.get(book_id=event["book_id"])
    return book
```
This minimizes connection churn and keeps the database happy. However, under a large burst of invocations, this alone isn't enough.
Solution At Last
RDS Proxy
AWS announced RDS Proxy in late 2019, and it reached general availability in mid-2020. It was a game changer.
Amazon RDS Proxy is a fully managed database proxy for Amazon RDS. It acts as a connection pool between your application and the database, reducing the stress on database resources and improving application performance.
Why it’s your friend:
RDS Proxy provides lots of advantages that will help you sleep at night:
- Manages connection pooling for you
- Works across multiple Lambda instances
- Handles auth and failover
- Reduces DB memory pressure
Caveats:
- Adds a small amount of latency (typically a few milliseconds)
- Not free, but worth it
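From the application's side, adopting RDS Proxy is mostly a configuration change: you point your client at the proxy endpoint instead of the cluster endpoint, and the proxy multiplexes the many Lambda connections onto a small pool of real database connections. A hedged sketch (the env var names and endpoint are hypothetical, and the commented-out `pymysql` call assumes a MySQL-compatible engine):

```python
import os

def proxy_conn_params():
    """Assemble connection kwargs pointing at the RDS Proxy endpoint
    instead of the database cluster endpoint. Env var names here are
    illustrative; use whatever your deployment defines."""
    return {
        "host": os.environ["PROXY_ENDPOINT"],  # e.g. my-proxy.proxy-abc.us-east-1.rds.amazonaws.com
        "user": os.environ["DB_USER"],
        "password": os.environ["DB_PASSWORD"],
        "database": os.environ["DB_NAME"],
        "connect_timeout": 5,
    }

# For a MySQL-compatible engine, the connection itself might look like:
# db_client = pymysql.connect(**proxy_conn_params())
```

Everything else about your queries stays the same; the pooling happens on the proxy's side.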
Unfortunately, it doesn't support every engine, and Aurora Serverless V1 is among the unsupported.
Aurora Serverless V2 to the Rescue
When Aurora Serverless V2 came out, things improved significantly.
- It scaled faster than v1
- No cold start delays
- Still cost-effective for bursty traffic
- And best of all, it supports RDS Proxy!
TL;DR: The Golden Combo
Here’s the stack I now recommend, especially for periodic processing:
- Lambda (use warm connections wisely)
- Aurora Serverless V2
- RDS Proxy
- Retry logic with backoff
- SNS/SQS for decoupling workloads
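To make the last item concrete, here's a sketch of the decoupling step: the extracting Lambda splits CSV rows into bounded chunks and enqueues each chunk, so every downstream invocation (and its DB connection) handles a bounded amount of work. The queue URL and chunk size are made-up examples, and `sqs_client` would be a `boto3.client("sqs")` in practice:

```python
import json

def chunk_rows(rows, size=100):
    """Split rows into batches so each SQS message, and each downstream
    Lambda invocation, processes a bounded amount of work."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def enqueue_chunks(sqs_client, queue_url, rows, size=100):
    # queue_url is a hypothetical example; in real code,
    # sqs_client = boto3.client("sqs")
    for batch in chunk_rows(rows, size):
        sqs_client.send_message(QueueUrl=queue_url,
                                MessageBody=json.dumps(batch))
```

Capping chunk size also caps the burst of concurrent consumers, which keeps the connection count the proxy has to juggle predictable.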
Final Thoughts
Using Lambda with RDS used to feel like mediating a US-China trade negotiation, but with RDS Proxy and Aurora Serverless V2, the dream of serverless with a relational DB is now actually viable.
That said, don’t blindly go all in. For long-running or batch-heavy DB jobs, sometimes containers or Fargate are a better fit, and DynamoDB (NoSQL) does a better job at scaling.
But if you're sticking with Lambda (and I don't blame you), just remember:
- Don’t open DB connections inside the handler
- Use Aurora Serverless V2 or DynamoDB
- Prefer RDS Proxy for RDS
- Load test your setup
- And don’t trust blog posts too easily, especially this one. Test for your use case.