Transcript
Mao: My name is George. I am currently a Senior Distinguished Engineer at Capital One. I lead a lot of our AWS serverless technology implementations, and I’m responsible for helping our teams implement best practices in everything we do on AWS. Before I joined Capital One, I was the tech leader at AWS for serverless computing, so I’ve spent a lot of time in this space, basically since the beginning of 2015, when serverless was first created at Amazon.
Capital One is one of the largest banks in the United States; we generally rank somewhere around 10th or 11th. We’re not that big internationally. We do have a pretty good presence in the UK, but that’s about it. What’s unique about us is we’re mostly structured like a tech organization, so we have about 9,000 software engineers. In 2020, we completed our all-in migration into AWS. As far as I know, we’re one of the only major banks in the world that has ever done an all-in migration like this. Now what we’re trying to do is modernize our entire tech stack running in the cloud. What that means is becoming more cloud-native, taking advantage of all of the AWS managed services, and becoming more efficient in the cloud.
Outline
This is what we’re going to talk about. I’ll cover why we decided to make this journey. In chapter 2, we’ll talk about some of the lessons that we’ve learned, and I’ll share them with you so that you don’t run into some of the trouble that we ran into. Then we’ll go through a bunch of best practices that you can take home and implement in your organizations.
Chapter 1: Why Did Capital One Adopt a Serverless-First Approach?
Why did Capital One adopt a serverless-first approach? Many of you are in the financial industry, in banking, or in related industries. Capital One has a ton of regulations and a ton of things that we have to follow to meet our auditing and compliance needs. A lot of that stems from vulnerability assessments and from problems and issues we find that have to be addressed immediately. An example: every 60 to 90 days, we have to rehydrate an EC2 instance, regardless of what we’re doing with that instance. By our measurements, on an average team of 5 engineers, that team spends 20% of its time simply working on EC2, delivering things that don’t really add value but that we have to do because of the industry that we’re in. This is basically the gold standard of a traditional architecture that Amazon tells us to implement.
For high availability, you would deploy EC2 instances across multiple availability zones, at least two; at Capital One we do at least three. Then you create autoscaling groups so that instances can spin up and down as needed. The goal here is to allow Amazon to handle the scaling of your instances based on metrics or failure. Then you have load balancers and NAT gateways in front of them so that they can front your traffic and spread load across your clusters. When you have an environment like this, think about the things that you have to maintain. This is just a small list. We have to maintain the EC2 infrastructure, the networking behind it, all the IP addresses, the VPC subnets, the AMIs that go onto the instances, updates, patches, scaling policies; everything that is in that picture, some engineer has to touch. What you’ll notice is that none of this adds any value to your customers. All of it is basic plumbing that you have to deliver to make your applications work in a traditional architecture.
Pre-serverless, our responsibility looked like this. We would deploy stuff to the cloud: we’d deploy infrastructure to AWS, and what that really means is EC2 compute. We’d choose operating systems that go on the EC2 instances. Then, generally, we containerize our applications; I think that’s becoming the standard these days. Then we run app servers in those containers. This is a tried-and-true method that most enterprises run today. Then we deploy our business apps on top of them. When you go to capitalone.com, all of the stuff that the customers see goes top-down through this stack. Everything below business apps is what we call run-the-engine tasks, things that are necessary behind the scenes to even begin deploying applications on top. If you talk to AWS, they’ll use a term called undifferentiated heavy lifting.
If anybody has spoken to AWS people, they like to say that a lot. It’s basically the things that your developers hate doing. I don’t know how to do any of this stuff. I know how to write app code. I’m not an EC2 engineer. When you move into serverless, your architectures generally are event-based, and they really fall into one of three types. Synchronous: an example would be a REST API you create. Requests come through API Gateway, and API Gateway drives requests to your Lambda functions. Maybe you have an order submitted on your website; that’s an event, but it’s a synchronous event because it needs to return an order ID to the customer who is waiting for that confirmation. If you can do asynchronous workloads, that’s even better, because then you can decouple the work that’s happening at the frontend from what’s happening at the backend. Has anybody purchased something from amazon.com before? I have a package arriving every other day or something at my garage.
All the orders are asynchronous. You click order, your credit card isn’t charged immediately; they have an order processing system. They can take hundreds of millions of orders without even having a system up on the backend that’s processing them. It’s decoupled and asynchronous. That’s actually the best way to write serverless applications. The last piece is poll-based. One of the best and least-known features of AWS is something called the poller system. The poller system is a fleet of workers that will poll certain event sources on your behalf and deliver records from those event sources to your Lambda functions. You don’t have to do any of that work. Examples are DynamoDB, Kinesis, and SQS: anything that’s in those data sources, AWS will poll and deliver to you. That removes all of the scaling and polling work you would otherwise have to do in order to process those events.
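As a rough illustration of the poll-based model, here is a minimal Node.js handler sketch receiving a batch that AWS's poller has pulled from SQS; the message fields and order-processing logic are hypothetical.

```javascript
// Minimal sketch: AWS's poller reads the SQS queue for you and invokes
// this handler with a batch of records -- no polling code of our own.
export const handler = async (event) => {
  for (const record of event.Records) {
    // Each record carries the original SQS message body as a string.
    const order = JSON.parse(record.body);
    console.log('Processing order', order.orderId); // hypothetical field
    // ... business logic here ...
  }
  // Returning normally tells Lambda the whole batch succeeded,
  // so the poller deletes those messages from the queue.
  return {};
};
```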
If you look at serverless architectures, generally, all of that stuff at the bottom is just handled by AWS. We don’t have to do any of it. We just decide whether we want to run Lambda, which is Functions as a Service, or Fargate, which is Containers as a Service. Then we write our business logic right on top of that. Our engineers are basically only working with that top box. The first thing they do is write application code. They don’t have to worry about patching and operating systems and all that stuff. Engineers love this; our developers really like this type of development. That means there’s no more of that burden on our developers. All of that time spent doing those EC2 activities is just entirely gone. We all know that human costs are generally the most expensive piece of any application team. That’s why we moved into serverless. Today, we are trying to be serverless first, everywhere possible. That’s our goal, and we’re still pushing forward into that space.
Chapter 2: Lessons Learned, and the Launch of Our Serverless Center of Excellence
We’ve learned a lot of lessons, and I’ll share some with you, so that if you’re doing this exercise, you won’t run into some of the challenges that we did. There is going to be a learning curve. A beginner serverless developer generally will write Lambda functions in the console. Who’s done this before? You can write app code directly in the console. It’s really cool because you can save it and execute that function immediately. The bad news is there’s no CI/CD, this goes right out to production, and you can change it at any time without any version control. You also can’t debug or trace a Lambda function in the console.
For those who have worked on Lambda, there is no way to debug or trace in the console. What do you do? Basically, you write print statements everywhere. Don’t copy this code and put it into production, but all it’s doing is writing print statements so that I can see the values of the variables that I have. Back when Lambda was first released, this was the only way to test functions, so everybody did it. Today, there’s a tool called SAM, the Serverless Application Model. It comes in two pieces. One is the CLI, which you install locally on your machine. It installs an image of the Lambda container on your machine as a Docker image, which allows you to run your Lambda functions locally exactly as they would run in the AWS environment. That means you’ll see log generation; you’ll see exactly the same thing you would see if you ran it live in AWS.
Second, you can use SAM to perform your CI/CD deployment. It’ll do deploys. It’ll do code synchronization. It’ll do everything you need to push through your development stack. If anybody has used CloudFormation, it’s pretty verbose; you can have a 50-page template for your application. That’s not great. What Amazon has done is create a shorthand syntax for serverless components that makes it a lot more concise. Here’s an example. I’m writing two Lambda functions. The first one is called FooFunction, the second is called BarFunction. They’re both Node.js 16 based, both memory size 128. Entry points are defined by the handler property. With just five lines of code for each function, this will deploy into AWS without a massive CloudFormation template. On the backend, AWS translates this to a real CloudFormation template. You don’t have to worry about any of that translation. We use this everywhere, and I encourage all of our engineers to move to this method because you can test applications really easily.
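A sketch of what that shorthand looks like; the function names, runtime, and memory mirror the example above, while the handler and CodeUri paths are assumptions.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  FooFunction:
    Type: AWS::Serverless::Function   # SAM shorthand, expanded to full CloudFormation on deploy
    Properties:
      Handler: foo.handler            # entry point (assumed file/export names)
      Runtime: nodejs16.x
      MemorySize: 128
      CodeUri: src/foo/               # assumed source location

  BarFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: bar.handler
      Runtime: nodejs16.x
      MemorySize: 128
      CodeUri: src/bar/
```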
The next thing that was new for us is that the unit of scale for Lambda is concurrency. That’s a brand-new concept to almost everybody touching serverless. The traditional unit of scale is TPS, RPS, transactions per second, requests per second. That drives how wide you need to scale your EC2 cluster. With Lambda, it’s a little bit different. Concurrency is the number of in-flight requests that your Lambda functions are processing at any given second. Lambda only bills us when we run them. If you’re not running anything, there’s no cost. That’s really cool. What that means is when you’re not running anything, there are no environments available to run your functions. The very first time you have to run your function, it goes through something called a cold start.
A cold start is all of the work Amazon has to do to bring your code into memory, initialize the runtime, and then execute your code. That pink box right there is all of the overhead that happens before your function can begin executing. Once your function’s warm, the second time it’s invoked, it doesn’t have to go through that process. It’s going to be warm, and that’s what Amazon people describe as warm starts. The second invoke is going to be really fast. This drives your concurrency across your entire fleet of Lambda functions. You could have 1,000 concurrent functions that you need to scale to 2,000; all of those new containers are going to go through this cold start. Keep that in mind; it’s usually the first thing Lambda engineers run into. I talk about this formula all the time with our engineers. It’s the formula that Amazon uses to measure concurrency: average requests per second (TPS) driven to Lambda, multiplied by the average duration in seconds.
If you look at these three examples here, we’re all driving 100 TPS, RPS. These Lambda functions run at about half a second, so 500 milliseconds. That means your concurrency needs are going to be 50. It actually drives down your concurrency needs because you’re running for under a second. If you double your duration to 1 full second, your concurrency now is going to be 100. If you double that again, same TPS, but now you’re running for 2 seconds, your concurrency needs are 200. You’re going to need 200 warm containers serving all of this traffic, and you have to be able to scale into that. This is a concept that you’ll likely have to work through as you walk into your serverless journey.
The next thing here is, before we started working on serverless, our infrastructure costs were generally managed by our infrastructure team, and our developers were not really concerned with cost. With Lambda, everybody is responsible for cost. At re:Invent 2023, one of the top 10 tenets that Amazon gave us was that everybody is responsible for cost, and that’s 100% true when you move into serverless. Lambda has two pricing components.
First is the number of invocations per month, and it’s tiny, 20 cents per million. We don’t even look at this; this first component of the formula we just ignore, because it amounts to a few dollars. The second is compute. Compute is measured in gigabyte-seconds, and that sounds complicated, but gigabyte-seconds is just the memory allocated to your function multiplied by the duration that function runs for: memory allocated in megabytes times the milliseconds that the function runs for. The bottom line is, just focus on the compute cost. The number of invocations is still relevant in one way: you can run 1 million invokes for free on every account, forever, so if you’re under that, you can run Lambda very cheaply. Along the same lines, every Lambda function generates a report structure in CloudWatch Logs every single time it’s invoked. There’s always going to be a start line, always an end line, and always a report line. The report line is the most important line that you should be aware of.
At the bottom, in the report line, they give you all of the metrics that you need to understand how your function executed. One of the most important is duration. This function ran for 7.3 milliseconds (it’s a little small on the slide), and it was billed for 8 milliseconds. Anybody know why? Lambda rounds us up to the nearest 1 millisecond. It’s the most granular service that AWS, or I think any cloud provider, offers; everybody else is at either 1 second or 100 milliseconds. This really represents pay-for-use; it’s the best service that we can find that’s pay-for-use. I configured this function at 256 megs, and max memory used is 91 megs. Remember, Amazon bills us on memory configured, not memory used. This is a point of confusion that my engineers run into a lot. It doesn’t matter if you use 1 meg out of a gig, Amazon’s going to bill you for a gig of memory. We’ll get into that; sometimes there’s a great reason why you might overprovision memory.
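The log lines for a single invoke look roughly like this (duration and memory values taken from the example above; the request ID is made up):

```
START RequestId: 8f5f0e2c-... Version: $LATEST
END RequestId: 8f5f0e2c-...
REPORT RequestId: 8f5f0e2c-...  Duration: 7.30 ms  Billed Duration: 8 ms  Memory Size: 256 MB  Max Memory Used: 91 MB
```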
At Capital One, we operate a lot of accounts, over 1,000 of them, with tens of thousands of Lambda functions spread across those accounts. That means we have to be able to handle compliance, we have to be able to control these functions, and we have to have standards so we can do these things. For metrics and logs, we have to understand how long to keep them, and we have to be able to maintain all of these functions.
In order to do that, we learned that we needed to create a center of excellence, because what we were doing before was making isolated decisions within single lines of business that would affect other lines of business. That creates tech debt, and it creates decisions that have to be unwound. We created a center of excellence, and now we use it to talk to representatives from every line of business so that we can make the correct decision. I’ll talk through some examples that we’ve worked on.
Some of the things that our center of excellence leads range from Lambda defaults, what a Lambda default should be, which programming languages we even allow, what the naming conventions are, and which default memory settings we choose, to runtime deprecation. AWS regularly deprecates runtimes; Java 8, for example, is deprecated because they don’t want to support it anymore. We also talk about how we want to deprecate our own runtimes, because if we wait too long and Amazon has deprecated theirs, we’re not going to be able to deploy on those deprecated runtimes anymore. The center of excellence also handles something really important, which is training and enablement. We host a serverless tech summit twice a year, we have internal certifications on serverless, and we have continuous enablement to educate our engineers on a regular basis.
Here’s an example of a development standard. You can create an alias that points to a Lambda function, and that alias is just like a pointer; you can use it to invoke your function. We mandate that every development team uses a standard alias called LIVE_TRAFFIC, and that is the only entry point for the function. What this does is allow me to jump across any development team and understand where a function is executed from and what all the permissions are. I work across every dev team that exists at Capital One, and this helps me a lot. It also means people transitioning from one team to another can onboard really quickly. Another thing that we standardize is versioned rollouts: we require them for all Lambda functions so that we can roll back if there’s a problem. We also require encryption on our environment variables, because we don’t want sensitive data exposed there.
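As an illustration of that alias standard, here's a minimal sketch using the AWS SDK for JavaScript v3; the function name and version number are hypothetical, and LIVE_TRAFFIC is the standardized alias name from the talk.

```javascript
import { LambdaClient, CreateAliasCommand, InvokeCommand } from '@aws-sdk/client-lambda';

const lambda = new LambdaClient({});

async function main() {
  // Point the standard alias at a specific published version (hypothetical values).
  await lambda.send(new CreateAliasCommand({
    FunctionName: 'order-processor',   // hypothetical function name
    Name: 'LIVE_TRAFFIC',              // the standardized alias
    FunctionVersion: '7',              // the published version that should serve traffic
  }));

  // Callers always go through the alias, never a raw version or $LATEST.
  await lambda.send(new InvokeCommand({
    FunctionName: 'order-processor',
    Qualifier: 'LIVE_TRAFFIC',
    Payload: JSON.stringify({ ping: true }),
  }));
}

main().catch(console.error);
```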
The other thing is, if you’re working in AWS, you can tag nearly every resource out there; it’s just a key-value pair that gives you some metadata. We have a set of standardized tags that help us understand who owns an application, who to contact if there’s a problem, and who gets paged, essentially. Some other things here: for IAM, we have standardized rules on what you can and can’t do, mostly no wildcards anywhere in your IAM policies.
Then, we have open-sourced an auditing tool called Cloud Custodian; it’s at cloudcustodian.io, and we actually use it to audit all of these rules that we’re putting in place. If anybody deploys anything that doesn’t meet these standards, it immediately gets caught. Also, I highly encourage you to use multi-account strategies. What we do is deploy an account per application group, and then we give that application group multiple accounts representing each development tier, dev all the way through prod. That allows you to separate blast radius, but it also gives you separate AWS limits on every account.
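To give a feel for how such audits can be expressed, here's a minimal Cloud Custodian policy sketch that flags Lambda functions on runtimes outside an allowed list; the policy name and runtime allow-list are assumptions, not the real Capital One policy.

```yaml
policies:
  - name: lambda-disallowed-runtime        # hypothetical policy name
    resource: aws.lambda
    filters:
      - type: value
        key: Runtime
        op: not-in
        value:                             # assumed allow-list
          - nodejs20.x
          - python3.12
          - java21
```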
Chapter 3: Best Practices for All – Development Best Practices
We’re going to talk about best practices that I’ve learned throughout 10 years of working with serverless. We’ll start with development best practices. Here’s a piece of sample code. The concept here is, don’t load code until you need it, so lazy load when you can. If you look at the top, the very first line is a static load of the AWS SDK, just the DynamoDB client, and some SDK code allowing me to list tables in my account. It’s going to do that up front, regardless of which code path the invocation ends up taking. But if you look at the handler method down below, there are two code paths. The first code path actually uses this SDK; it does this interaction with Dynamo. The second code path doesn’t do anything with Dynamo.
However, any cold start of this function is going to load in this SDK. In this case, 50% of my invocations are going through a longer cold start because I’m pulling in bigger libraries and more things than I need. A really good strategy here is to lazy load. In the same example, you define the same variables up top in the global scope, but you don’t initialize them. Down in the handler method, you initialize those SDK clients only when you need them, so on the first code path, right there in that first if statement. What you need to do is check whether those variables are already initialized.
If they’re already initialized, don’t do it again. This avoids extra initialization, and 50% of the time, invocations just go down the second code path. Look at the profile and anatomy of your function and see which code paths your applications are following. If you have anything with separate paths like this, I highly encourage you to lazy load what you can, not just the SDK, but anything else that you might be pulling in as a dependency.
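A minimal sketch of the pattern in Node.js with the v3 SDK; the two code paths (keyed off a hypothetical event.action field) are simplified stand-ins for whatever branches your handler actually has.

```javascript
// Declared in global scope, but NOT initialized during the cold start.
let dynamoClient;
let listTables;

export const handler = async (event) => {
  if (event.action === 'listTables') {          // first code path: actually needs DynamoDB
    if (!dynamoClient) {
      // Lazy load: pay the import/initialization cost only on this path,
      // and only once per warm container.
      const { DynamoDBClient, ListTablesCommand } =
        await import('@aws-sdk/client-dynamodb');
      dynamoClient = new DynamoDBClient({});
      listTables = () => dynamoClient.send(new ListTablesCommand({}));
    }
    return listTables();
  }

  // Second code path: never touches DynamoDB, so it never loads the SDK.
  return { message: 'no database work needed' };
};
```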
The next concept is, use the right AWS SDK. If you look at Java, version 1 of the Java SDK was created before Lambda even existed, which meant the SDK team had no idea they needed to optimize for Lambda. That SDK is 30-plus megs, so if you use the version 1 Java SDK, you’re going to have 30 megs of dependencies. Use the latest SDKs: for Java, you want version 2, which lets you modularize and pull in only the pieces that you need. Same thing with Node. For those who are using Python, you’re lucky: Boto3 upgrades in place, so you don’t have to do anything; we continue to use Boto3. The next thing is, try to upgrade to the latest runtimes for Lambda, because Amazon upgrades the images they use behind the scenes. The latest runtimes, Node 20, Java 21, and Python 3.12 and beyond, use what Amazon calls Amazon Linux 2023. That image is only 40 megs. Everything before that uses AL2, Amazon Linux 2, which is 120 megs.
Behind the scenes, it’s just a lot more efficient; you’re going to cold start better and perform a lot better. I know you have Java 8 running around everywhere. We did, and we still do. If you can get off it, simply moving from Java 8 to Java 17 gives you a 15% to 20% performance boost. That’s a free upgrade if you can get there. Next, import only what you need. Don’t import extra things like documentation, sample code, and extra libraries, because when you’re running in AWS they’re not going to be useful; you’re not going to be able to read them.
Here’s an example: this is a Node package.json. I accidentally imported mocha, which is my test suite, and esbuild. None of those things are useful when my Lambda function is running; all they do is add to the package size. Lambda actually has a package size limit: you can only deploy a 50-meg zip, or 250 megs uncompressed. If you have too many libraries, you’ll run into this limit and you won’t be able to deploy.
One of Gregor Hohpe’s main concepts is to use AWS configuration and integration instead of writing your own code wherever possible. Think about this piece of architecture: your Lambda function needs to write a record to Dynamo, and there’s some other resource waiting to process that record. We could do it like this: the Lambda function first writes to Dynamo, waits for the committed response, and then publishes a notification to SNS or SQS telling the downstream service, ok, we’re done and ready to process.
That downstream service may live on Lambda or EC2, wherever, and it goes and queries the Dynamo table and processes the work. This is a fully functional app and it’ll work, but we can do better. What I would do is take advantage of out-of-the-box AWS features. You write to Dynamo, and within Dynamo there’s a feature called DynamoDB Streams, which is basically a stream of changes that have happened on that table. You can set up Lambda to listen to that stream, so you don’t even have to poll it yourself. All you really have in this example is two Lambda functions: one writing, one receiving events. You’re not even polling. This will be cheaper, faster, and easier to scale. In general, think about your application architectures and try to move towards this type of design. Use Lambda to transform data, not to move data; that’s the key principle that we have.
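A sketch of the downstream side of that pattern: a Node.js function that Lambda invokes with batches of change records from the DynamoDB stream. The table attribute names and the INSERT-only filter are made-up illustrations.

```javascript
// Lambda's poller reads the DynamoDB stream and hands us batches of changes;
// there is no polling or SNS/SQS notification code anywhere in the app.
export const handler = async (event) => {
  for (const record of event.Records) {
    if (record.eventName !== 'INSERT') continue;   // only react to new items

    // NewImage holds the item as written, in DynamoDB's attribute format.
    const newImage = record.dynamodb.NewImage;
    const orderId = newImage.orderId?.S;           // hypothetical attribute
    console.log('New order written to the table:', orderId);
    // ... downstream processing here ...
  }
};
```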
The last development tip I have here is establish and reuse: objects that are going to be used more than once should only be loaded once, globally. Every Lambda function has an entry point, called the handler method, right there in the middle. Everything outside of that is global scope. During a cold start, everything above the handler is executed. During a warm start, execution begins right at the handler method, and all of the global-scope stuff is already held in memory and ready to go. A lot of times, we have to pull secrets in order to hit some downstream system. Secrets don’t change that often, so you can load one once in global scope and reuse it on every warm invoke down below. Just make sure you check that the warm secret is available, not expired, and ok to use. You can apply the same concept to pretty much anything that can be reused across Lambda invocations.
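A sketch of that reuse pattern for a secret, assuming Secrets Manager and a simple time-based expiry; the secret name and the ten-minute TTL are arbitrary illustrative choices, not a stated standard.

```javascript
import { SecretsManagerClient, GetSecretValueCommand } from '@aws-sdk/client-secrets-manager';

const secrets = new SecretsManagerClient({});

// Held in global scope so warm invokes can reuse it.
let cachedSecret;
let cachedAt = 0;
const TTL_MS = 10 * 60 * 1000;   // arbitrary 10-minute refresh window

async function getDbPassword() {
  const stale = Date.now() - cachedAt > TTL_MS;
  if (!cachedSecret || stale) {
    const res = await secrets.send(new GetSecretValueCommand({
      SecretId: 'prod/db-password',   // hypothetical secret name
    }));
    cachedSecret = res.SecretString;
    cachedAt = Date.now();
  }
  return cachedSecret;
}

export const handler = async () => {
  const password = await getDbPassword();   // fetched once per container / TTL window
  // ... use the secret to call the downstream system ...
  return { ok: true, secretLength: password.length };
};
```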
Build and Deploy Tricks
The next part is some tips on how to build and deploy your Lambda functions. We talked a little bit about this: make sure you’re deploying small packages, as small as possible. Minify, optimize, and remove everything that you don’t need. Here’s a Node-based Lambda function, written in SAM, which we talked about earlier. It’s named first function, and it’s pretty basic: a Node.js function, memory size 256, using something called arm64 as the CPU architecture; we’ll talk a little bit about that. This is a strategy for how I can build a really optimized function: I’m using esbuild.
For those who are doing Node, esbuild is a very common build tool. When I use esbuild, it generates a single minified file for deployment, combining all dependencies and all source code into one file. It’s not human-readable, which doesn’t really matter, because you can’t debug in production anyway. I’m formatting it as an ES module and letting esbuild produce the output. With esbuild, this function is 3.3 megs in size; it’s got the AWS SDK in it, and it’s tiny. If I don’t use esbuild, it’s a 24-meg package as a standard zip, zipped and compressed with the AWS SDK. I have almost no source code in this, and it’s 24 megs. The largest I can get to is 50, so I’m already almost halfway there just because I included the AWS SDK. If we look at performance, this is a screenshot of a service called AWS X-Ray. X-Ray gives me a trace of the entire lifecycle of my function’s invocation, and you can read it top-down.
The first line is initialization, the cold start time my function took to become ready to run. This is my esbuild function, and it took 735 milliseconds to cold start. The total runtime was 1.2 seconds, so 1.2 seconds minus 735 milliseconds is the actual invocation of my function. If we look at the standard zip build of that function, its cold start was over 1,000 milliseconds, so 300 milliseconds slower. That’s basically 40% faster because I used esbuild, simply by changing how I build my application. This type of optimization exists for pretty much every language out there, but Node is my default programming language, so this is the example that I have. The next thing is, remove stuff that you don’t need, or turn off things that you don’t want.
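For reference, this is roughly how an esbuild-bundled function can be declared inside the Resources section of a SAM template. The entry point, runtime version, and handler path are assumptions, and the Format/esm option needs a reasonably recent SAM CLI.

```yaml
  FirstFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler
      Runtime: nodejs20.x          # assumed runtime
      MemorySize: 256
      Architectures: [arm64]
      CodeUri: src/
    Metadata:
      BuildMethod: esbuild          # sam build runs esbuild for this function
      BuildProperties:
        Minify: true                # single minified bundle, dependencies included
        Format: esm                 # emit as an ES module (newer SAM CLI versions)
        Target: es2022
        EntryPoints:
          - app.ts                  # assumed entry point
```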
In Java, there is a two-tier compilation process, and by default it goes through both tiers: tier one is standard compilation, tier two is optimization. Lambda functions generally don’t live long enough for tier two to have any real effect, so you can just turn it off. There’s an environment variable called JAVA_TOOL_OPTIONS; set it, and it turns tier two off. I think 90% of the time you’ll see cold start performance improvements when you do this.
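In a SAM template, that tweak looks roughly like the sketch below; the function name, handler, and memory are hypothetical, and the flag value shown is the commonly documented one that stops the JVM at tier one, which is what "turning off" tier two means in practice.

```yaml
  MyJavaFunction:                      # hypothetical function name
    Type: AWS::Serverless::Function
    Properties:
      Handler: com.example.Handler::handleRequest
      Runtime: java17
      MemorySize: 512
      Environment:
        Variables:
          # Stop the JVM at tier-one (C1) compilation; Lambda functions rarely
          # run long enough for tier-two optimization to pay off.
          JAVA_TOOL_OPTIONS: "-XX:+TieredCompilation -XX:TieredStopAtLevel=1"
```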
Optimize Everything
Optimize everything: memory allocation controls CPU allocation. What that means is there’s a directly proportional relationship between memory and CPU. You may have noticed that you can’t specify CPU on your Lambda function, only memory. If you have a 256-meg function and you drop it to 128, that cuts your CPU allocation in half. Same thing at 512: if you double that to a gig, you get double the CPU power. Think about this: if you double the memory for your functions, can you run twice as fast in all scenarios? Is that fact or fiction? The answer is, it depends. It depends on your code, and on whether you’ve multithreaded your code to take advantage of the number of vCPUs Amazon gives you. It’s all dependent on the use case. The principle here is, you must test your application.
The best way to do that, that I found, is using Lambda Power Tuner. It’s an open-source tool. It’s going to generate a graph by creating multiple versions of your Lambda function at many different memory settings, and it’ll show you exactly which one is the best. Red line here represents performance. Basically, invocation time, lower the better. Blue line represents cost. Also, lower the better, but we’ll walk through this.
At 256 megs, we can see the cost is ok, pretty low, but performance is really bad, upwards of 10,000 milliseconds. If we move this function to 512, cost actually drops a little bit, and performance improves drastically; time drops by a factor of two. If we continue increasing to 1 gig, we see more performance improvements at almost no extra cost. Go to 1.5 gigs, and we start seeing some increase in invocation cost, and past that, we’re basically wasting money. Every single Lambda function is going to perform differently based on your use case, your memory, and your runtime. Make sure you run this tool against your functions as you go through your QA and performance tests.
Billing basics. For Lambda pricing, remember, the formula is always memory configured times the duration it runs for. If you look at this chart, it’s very interesting. We have three Lambda functions, all running 1 million invokes per month. The first one, at 128 megs, running for 1,000 milliseconds, is going to cost you $2.28. That same function bumped up to 256 megs, if it runs twice as fast, costs exactly the same amount. However, if you bump it to 512, so you 4x the memory but don’t improve performance, then, going back to that chart we saw, you get roughly a 4x increase in cost. Anytime you’re thinking about the performance and cost tradeoff, it’s a directly proportional relationship on both sides of this formula. We talked a little bit about ARM. ARM is the little chip that’s in all of our mobile phones. It’s faster, more cost-efficient, and more power-efficient, and it’s generally about 20% cheaper from AWS. Try to move to ARM if you can; it’s free to move, it doesn’t cost us anything.
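Here's the arithmetic behind those numbers as a quick sketch; the per-GB-second and per-request rates are illustrative public us-east-1 x86 list prices, and the free tier is ignored for simplicity.

```javascript
// Rough Lambda cost model: compute (GB-seconds) plus the per-request charge.
// Prices are illustrative us-east-1 x86 list prices; check current pricing,
// and note the monthly free tier is not subtracted here.
const PRICE_PER_GB_SECOND = 0.0000166667;
const PRICE_PER_MILLION_REQUESTS = 0.20;

function monthlyCost({ memoryMb, durationMs, invocationsPerMonth }) {
  const gbSeconds = (memoryMb / 1024) * (durationMs / 1000) * invocationsPerMonth;
  const compute = gbSeconds * PRICE_PER_GB_SECOND;
  const requests = (invocationsPerMonth / 1_000_000) * PRICE_PER_MILLION_REQUESTS;
  return compute + requests;
}

// 128 MB for 1,000 ms: ~$2.28/month.
console.log(monthlyCost({ memoryMb: 128, durationMs: 1000, invocationsPerMonth: 1_000_000 }));
// 256 MB running twice as fast (500 ms): same GB-seconds, so the same cost.
console.log(monthlyCost({ memoryMb: 256, durationMs: 500, invocationsPerMonth: 1_000_000 }));
// 512 MB with no speedup: roughly 4x the compute cost.
console.log(monthlyCost({ memoryMb: 512, durationMs: 1000, invocationsPerMonth: 1_000_000 }));
```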
Then, logs cost money. Logs are pretty expensive: 50 cents per gigabyte to ingest and 3 cents per gigabyte to store, and you get charged for storage every month, forever. I’ve seen applications where logging costs more than the Lambda compute itself, and when somebody from finance finds that, it’s generally not a fun conversation. Reduce logging where you can. Think about swapping to the infrequent access log class, which cuts ingestion cost by 50%; the tradeoff is you won’t be able to use live subscription features on those logs. You can set a retention policy as well and age out these logs based on your data retention policy. I like to use a guide with different retention per environment level, so you don’t keep logs around too long.
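A sketch of both levers as a CloudFormation resource: an explicitly owned log group with a retention policy and the infrequent-access log class. The function name and 30-day retention are illustrative, not a mandate.

```yaml
  MyFunctionLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      # Own the log group explicitly instead of letting Lambda create it,
      # so retention and log class stay under our control.
      LogGroupName: /aws/lambda/order-processor   # hypothetical function name
      RetentionInDays: 30                         # e.g. shorter in dev, longer in prod
      LogGroupClass: INFREQUENT_ACCESS            # ~50% cheaper ingestion; no live subscription features
```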
Observe Everything
The last area we’re going to talk about is observability. If you’re in AWS, there are tons of metrics out there, and it really gets confusing. One of the most important ones at the account level is a metric called ClaimedAccountConcurrency. This is really just the sum of all the Lambda configurations that are actively claiming concurrency in your account. By default, AWS only gives you 1,000 concurrent Lambda executions as a cap. It’s a soft cap; you can ask for more. Your goal here is to create an alarm off this metric so that your SRE team is warned when you’re approaching the cap and can ask for it to be lifted before you hit it.
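A sketch of that alarm as a CloudFormation resource; the alarm name, the 80%-of-default threshold, and the SNS topic reference are placeholders.

```yaml
  ApproachingAccountConcurrencyAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: lambda-claimed-account-concurrency-high   # hypothetical name
      Namespace: AWS/Lambda
      MetricName: ClaimedAccountConcurrency
      Statistic: Maximum
      Period: 60
      EvaluationPeriods: 5
      Threshold: 800            # e.g. 80% of the default 1,000 account limit
      ComparisonOperator: GreaterThanOrEqualToThreshold
      AlarmActions:
        - !Ref SreAlertTopic    # placeholder SNS topic for the SRE team
```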
The next thing is at the function level. We talked about Lambda operating a poller and delivering those records to your functions on your behalf. There’s no metric that AWS gives us for that; I don’t know why, but there isn’t. If SQS is delivering 5, 10, 20, or 100 messages per second to your function, there’s no way for you to tell how many you’re getting. Make sure you create a metric of your own. What I would use for that is Lambda Powertools. It’s a free, open-source SDK. Here’s an example in Node of how to do that, and it’s really easy. You can use something called EMF, the embedded metric format; it looks just like that. It writes a JSON log into CloudWatch Logs, which gets auto-ingested by AWS and creates that metric for you.
That’s basically the cheapest way to create metrics. It’s much cheaper than making PutMetricData calls; those are really expensive, so try to avoid that API call at all costs. It’s also really nice because it’s all asynchronous, so there’s no impact on your Lambda performance.
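A minimal sketch with Powertools for AWS Lambda (TypeScript); the namespace, service, and metric names are made up, and the exact import names can differ slightly between Powertools versions.

```javascript
import { Metrics, MetricUnit } from '@aws-lambda-powertools/metrics';

// Namespace and service name are hypothetical.
const metrics = new Metrics({ namespace: 'OrderPlatform', serviceName: 'order-processor' });

export const handler = async (event) => {
  // Count how many records the poller actually delivered in this batch.
  metrics.addMetric('RecordsReceived', MetricUnit.Count, event.Records?.length ?? 0);

  // ... process the batch ...

  // Flushes the buffered metrics as a single EMF JSON log line to stdout;
  // CloudWatch ingests it asynchronously and materializes the metric.
  metrics.publishStoredMetrics();
};
```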
Then these are the top things we’ve put together that have caused us a lot of pain. Be careful about setting maximum configurations for your Lambda functions; usually that results in high bills. You want to set lower configs, and you want your functions to error and time out rather than allowing them to expand to the largest possible setting. Number two, don’t use PutMetricData; that’s really expensive. Number three, there’s a mode called provisioned concurrency, where you can tell AWS to warm your functions up when you need them and keep them warm. The downside is that it costs you money even if you don’t use that concurrency. Be careful about setting it too high, and be careful about setting provisioned concurrency close to your account concurrency, because that will cause other functions to brown out. Then, just think through the rest here.
The very last one I’ll talk a little bit about is, don’t use the wrong CPU architecture. When we talked about moving to ARM, I noted that not every workload performs better on ARM. Think about your mobile phones: we can watch videos and send messages, and it barely uses any power. If you go to your desktop computer at home and watch some YouTube video, it consumes a gigantic amount of power because it’s running on an x86 architecture. Your use case has a heavy impact on the right CPU architecture. Use the right libraries compiled for the right CPU architecture. A lot of us are doing things like compression, which is a good example, or image manipulation; all of those libraries have builds compiled for ARM and for x86, so make sure you’re using the right one in the right place.
Questions and Answers
Participant 1: What’s the incentive for Amazon to provide decent performance? If the metric is time times memory, then why wouldn’t they just give all the serverless workloads cheap, rubbish CPUs that don’t perform very well?
Mao: If you think about how Lambda functions work, they’re not magic. Behind the scenes, when you want to invoke a Lambda function, that function has to be placed on an EC2 instance somewhere. What Amazon wants to do is optimize the placement of that container in their EC2 fleet so that they can optimize the usage of a single EC2 instance. If you think about an EC2 instance, it may have 3 gigs of memory. If I have a 1 gig function that I run for a long amount of time, and you’re not doing anything else, I might get placed on that 3-gig instance, and the rest of that instance is empty. That’s extremely wasteful for AWS. They don’t want to do that. What they actually want to do is they want to pack that instance as much as possible so that they can have high utilization and then pass on the EC2 savings to the rest of AWS. They’re incentivized for us to improve performance.
The worst-case scenario for them is I create a Lambda function and I run it once and never again, because they have to allocate that environment, and based on your memory setting, they have to decide what to do. There’s a gigantic data science team behind the scenes at Amazon that’s handling all of this. I don’t know the details anymore, but that’s what they’re paid to do.
Participant 2: Can you talk more about how Capital One does automated testing with so many Lambdas? You mentioned you use, I think it was called SAM. Do you use that in your CI pipelines as well for testing?
Mao: Every release that goes out there, basically every merge or commit into main ends up running our entire test suite and we use SAM to do most of that stuff. SAM is integrated right into our pipeline, so it executes all of our unit tests and acceptance tests right in the pipeline. We customize all of it to work for SAM, but at the beginning, none of this existed, because EC2 doesn’t have any of this. We had to upgrade our entire pipeline suite to handle all of that.
Participant 3: Lambda functions can now support containers with way higher resources; you can have bigger container images. My question is about performance, especially cold starts. Have you tested using containers for Lambda functions, and did it have any implications for performance, especially cold starts?
Mao: Remember I said Lambda functions are packaged as zip files, a 50-meg zip, 250 uncompressed. There’s a secondary packaging mechanism, which is containers. You can package your function as a Docker image, and that allows you to get to 10-gig functions if you need a lot of dependencies. I don’t recommend defaulting to that because there are a lot of caveats once you go there; you lose a lot of features with Lambda.
For example, you can’t use Lambda layers. Behind the scenes, it’s a packaging format, not an execution format. What AWS is doing is taking that container, extracting its contents, loading them into the Lambda environment, and running that, just like your zip is run. You’re not really getting any of the benefits of a container, and you’re going to end up with container vulnerabilities. I recommend using it only if you have a large use case where you can’t fit under 50 or 250 megabytes. Generally, I see that when you’re running large AI/ML models that can’t fit in the 50-meg package, or you just have a lot of libraries that get put together, like when you’re talking to a relational database like Oracle, or maybe Snowflake, and you need a ton of libraries that won’t fit. I recommend staying with zip if you can. If you can’t, then look at containers.
Participant 4: Following up on the testing question from earlier. Lambda function tends to be almost like an analogy of a Unix tool, a small unit of work. It might talk to a Dynamo SNS, SQS. One of the challenges I’ve at least encountered is that it’s hard to mock all of that out. As far as I know, SAM doesn’t mock the whole AWS ecosystem. There are tools that can try to do that, like LocalStack. How do you do local development at Capital One given so many integrations with other services?
Mao: I get this question from our engineers all the time. SAM only mocks three services, I think: Lambda itself; API Gateway, which is the REST endpoint; and it can integrate with Step Functions Local and DynamoDB Local. Everything else, if you’re doing SQS or SNS, it cannot simulate locally, and AWS is not interested in investing more effort in adding more simulation. LocalStack is an option. If you use LocalStack, you can stand up mocks of basically all of these services; what you’ll have to do on the SAM side is configure the endpoints so they’re all talking to the local ones. What I usually recommend our teams do is use SAM’s ability to generate payload events for almost every AWS service. You can do sam local generate-event, then the service, like sqs, and then the event type.
Then you can invoke your function using the payload that it generates and simulate what it would look like if you got a real event from one of those sources. That’s usually the best place to start. LocalStack is good as well. We also just test by integrating into development, so your local SAM might talk to a development SQS resource. That’s really the best way to test.
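For completeness, those commands look roughly like this; the function logical ID is hypothetical.

```
# Generate a sample SQS payload and save it
sam local generate-event sqs receive-message > sqs-event.json

# Invoke the function locally with that payload
sam local invoke OrderProcessorFunction --event sqs-event.json
```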
Ellis: You’ve done a lot already. What’s on your to-do list? What’s the most interesting thing that you think you’re going to get to in the next year?
Mao: Right now, we’ve been focused on compute, moving our compute away from EC2. I think the next thing is data. On our data platforms, we do a lot of ETL; I think everybody does a lot of ETL. We use a lot of EMR, and we’d like to move away from that. EMR is one of the most expensive services that you can put into production at AWS: you pay for EC2, you pay for the EMR service, and then you pay for your own staff to manage the whole thing. We want to move to more managed services in general, like Glue and other things that don’t require managing EC2. I think data transformation, or data modernization, is definitely big.