Transcript
George Mao: The title of this talk is how to build a planet scale, global architecture for modern apps. There’s a bunch of buzzwords in there, but we’ll walk through it. My name is George. I currently work at Google Cloud. I look after a group of specialist architects. I help customers onboard into the cloud and use all of our cool technologies. In the past, I was the tech leader at AWS for serverless computing, and I’ve had a chance to actually work with a lot of you out in the field. Then I spent some time at Capital One as a distinguished engineer. If you’re familiar with Capital One, they’re the first major bank that has gone all-in on any cloud provider, and they decided to go all serverless.
Outline
This talk will be at a fairly intermediate technical level. You don't really need to have cloud experience. If you don't have cloud experience, actually, this might be a very good topic for you. Then, I'm going to try to do a quick demo. I'll show you one option for how to automate the deployment of some of these architectures. Then I will walk you through both AWS and Google Cloud technologies to make some of these architectures happen.
The Five Stages of Latte Art Maturity
I just want to start with something fun. Has anybody tried Latte Art before? I got really into Latte Art in 2022, when we were all at home 99% of the time. I just want to throw it out there that I think Latte Art might be harder than cloud architecture. There are five stages of Latte Art maturity. I started in May of 2022, and really, stage 1, I call it nothing. It's really hard. I've actually gained a newfound appreciation for baristas, for the skill that they have in creating that amazing espresso for you. Really, pouring into that cup, I have no idea what I'm doing. Stage 2, you can get some shapes, and notice this is in November 2022, so it's taken me six months to figure out how to pour any semblance of a shape into a cup. Of course, if you're a barista, you would pour hundreds of cups a day.
At home, you’re maybe pouring two cups a day. Stage 3, you can look, get a little bit of flower-like shapes. This is February 2023. About a year for me to get anywhere close to this. Stage 4, I think this might be the first time I would call it Latte Art. It looks like a flower or tulip, whatever you want to call it. Then, stage 5, this is basically today. I can do cool little flower designs. If I ever get bored of cloud architecture, I can probably go be a barista. I thought this was cool. We have five stages of maturity in Latte Art, which is similar to what we’re going to talk about today. Just for the record, I don’t think Latte Art is harder than cloud architecture. The maturity you have to go through to get to the end is similar.
Five Stages of Maturity when Designing a Globally Scalable Compute Architecture
I want to start with this. How would you build an architecture that serves your first 100 users? Just 100. Think about that. I've done this before. When I was at Amazon, we put a little server box underneath our desk to serve something like this, even though we're a cloud provider. A hundred users: you should be able to run any kind of application for them on any one of these laptops. You could probably run it on one of the most powerful smartphones today. It's really easy to do. You probably wouldn't want to build a globally scalable architecture with global presence and massive scale, because it's going to cost you an insane amount of money that nobody would approve. Let's think about our compute choices for hosting something like this. The traditional way, if you're in the cloud, is to use VMs.
If you’re in AWS or Google, you might use EC2 or Google Compute Engine. Has anybody hosted a web app, any kind of app like this before using VM technology? I think it’s still the most popular way to do it today. Very common. Very tried and true. A lot of people today are transitioning into what I call the more modern way, which is containers. Containers, you might think of Kubernetes as the main container platform. If you’re in AWS, you might be using ECS, the Elastic Container Service. Or if you’re in Google, you’ll use Kubernetes Engine, Autopilot, potentially. My favorite way is actually to do it the cloud-native way: hosting your compute, fully serverless, cloud native. If you’re on AWS, my favorite way was Lambda.
Then if you’re on Google Cloud, a very equivalent option for Lambda is Google Cloud Run. These are fully serverless compute offerings that none of us have to manage the operating systems. We don’t have to patch, scale. The cloud provider handles all of that for us. Today we’re going to talk about building this system using the fully serverless option because we want to be cloud native.
Stage 1 Maturity – Basic Architecture
Stage 1, if we’re going to build this architecture, we’re going to start with our basic architecture, our stage 1 maturity. All you need is some kind of HTTP service that your compute serves up. Pretty much every architecture you deploy will look just like that diagram on the bottom. Has anybody used Cloud Run before? How about AWS Lambda? Lambda was really the pioneer in serverless computing. When we launched Lambda, when I was at AWS, I had a chance to launch it from basically zero to where it is today. Cloud Run is a Google offering, and it is an enterprise-grade serverless platform. You can just deploy code to it. We handle the scaling of it just like AWS Lambda. There’s no patching, no operating systems. You don’t remote desktop into it. When traffic arrives at the compute, we scale and operate and manage everything for you.
As a developer, I love that. The only thing I have to do is write code and deploy it to the service. I don't have to choose instance types. I don't have to add persistence. There are no operating system patch levels and vulnerabilities to manage. It's cool, because you can deploy nearly any code language that you're familiar with. Any one of the popular common languages: Python, Java, Node, Go, C#. Almost anything. This is the architecture that most people will start with. Cloud Run will be deployed to a single region. If you're using cloud computing, you know that the main unit of scaling is the region. There will be multiple regions across multiple different areas to serve the compute for the customers in each area. Cloud Run will give you a publicly addressable HTTP endpoint right out of the box. It will be something Google gives you: a deterministic URL, in one of those formats down at the bottom. It will be publicly addressable. All of your customers can just go straight to this URL, and Cloud Run will service it.
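To make that concrete, here's a minimal sketch of the kind of HTTP service Cloud Run runs. The only real contract is that your container listens on the port in the PORT environment variable; the response text is just a placeholder:

```typescript
import * as http from "http";

// Cloud Run injects the port to listen on via the PORT env var (defaults to 8080).
const port = Number(process.env.PORT) || 8080;

http
  .createServer((req, res) => {
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("Hello from Cloud Run\n"); // placeholder response
  })
  .listen(port, () => console.log(`listening on ${port}`));
```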
The benefits of this architecture: it's serverless. Easy to deploy. You can manage this on your own. It's simple. It autoscales. Autoscaling is one of the most difficult things we have to do when running on VMs. Managing utilization of your cluster is a very difficult task. Do you overprovision your cluster and spend more money, or do you underserve it and then scale it up as your traffic comes? There's a very small sweet spot that most customers will have to end up finding. This is a very simple architecture.
If you’re in AWS, it’s remarkably similar. It’s the same architecture. You put Lambda on the backend. It will autoscale on the backend on your behalf as traffic arrives at the HTTPS endpoint. Lambda calls it lambda-urls. It’s the same thing. They give you a publicly addressable URL. You just use the cloud-native IAM, or Identity Access Management, to control who can access the endpoint. Then you also get a deterministic URL. It’s just whatever, GUID, .lambda-url and then .aws.
Think for just a minute about the issues with this architecture. If you were to deploy something like this for your customers, even though you're only serving 100 people, there are a bunch of issues that you will face as soon as you go live. Let's think about that. This architecture lacks security. We didn't put any kind of security on the front. We just rely on the Identity and Access Management native to Google or AWS. You can't manage traffic. There's no throttling. There's no way for you to protect yourself against DDoS attacks, or bots, or anything like that.
On top of that, these architectures run in a single region. Both of these services are regional services. That means if you need to operate in multiple regions, say you have customers in Asia and Europe and other locations in the United States, this is only going to operate in a single spot, which means you can't do disaster recovery properly. When I worked for Capital One, one of our biggest challenges was that for the most resilient applications, like capitalone.com, we could not be down. If you were down, there would be penalties from the regulators. We had to operate in multiple regions for some of our applications. This architecture would never work. Also, finally, content delivery is expensive. Has anybody used CDNs? CDNs help us with content delivery. Using compute to deliver static assets is probably the worst way to go in terms of cost efficiency.
Stage 2 Maturity – Secured Architecture
Let’s tackle security first. Security is likely to be the most important topic at almost any enterprise. How do you improve security and add traffic management at the same time to this architecture? You might have heard the term, if you’re in the security world, defense in depth. Really, that just means adding security at every checkpoint, in every layer in your architecture. If you look at this, stage 2 of maturity, when I usually talk to customers, is about making sure your architectures are secure and ready to be protected against all kinds of vulnerabilities and attacks. You want to do this before you scale out. Because once you scale out, your architecture is going to be much more difficult to protect. In stage 2, you could go forward with an architecture like this. Remember our compute, which was Google Cloud Run, or in the AWS world, Lambda. Those are public services. What I would do is turn off public access. They no longer have any publicly addressable HTTP endpoint.
In the Google world, we’re going to front it with a global load balancer. A global load balancer will be able to accept traffic from all sources across the world. The global load balancer presents a single Anycast IP. Anycast IP is going to be the same IP across every single region in every part of the world. When you look up the DNS for that global load balancer, you get a single IP that’s routable from any part of the world. On top of that, most cloud providers will have the ability to enable layer 7 protection, so a WAF, Web Application Firewall. I think one of the talks talked about debugging a web application firewall. In Google, we call it Cloud Armor. Cloud Armor allows you to automatically enable 50-plus different OWASP web vulnerabilities. If you’re familiar with OWASP, they define things like SQL injection, cross-site scripting, and all of the major vulnerabilities that exist in web apps. You can customize these rules. Then it also allows you to deploy DNS or DDoS protection. You can rate limit. Make sure nobody is attacking your applications.
Then, finally, who’s used reCAPTCHA before? It’s that little checkmark you have to click, when you go to a website, when you hit submit. ReCAPTCHA protects us against bots. There’s a new version, version 3. Version 3, you don’t have to check every box that has a car in it. It’s super annoying. Sometimes you don’t know if there’s a box that has a car. Version 3 is totally transparent. None of that stuff needs to happen anymore. If you’re using reCAPTCHA to protect your websites, think about upgrading from version 2 to version 3. The benefits here of this architecture is you can secure it, protects you against traffic, DDoS attacks, network attacks, defends against bots. There is zero ability for anyone to log in, SSH, RDP, VDI, anything into any piece of this architecture. Nobody can log in. There’s no vulnerability from that perspective.
Then, finally, we’re going to use DNS to prove authority that you own the service. You’re going to get a real DNS cert, and you’re no longer going to rely on a cloud provider to provide their generic cert. In AWS, again, I think this is very similar. Has anybody done this before with AWS? If you’re running a serverless application, it is very common to front that with Amazon API Gateway and then enable WAF. WAF is their version of the web application firewall. Same thing. You can deploy OWASP rules, custom rules, DDoS protection, rate limiting. The point I’m trying to make is really most cloud architectures are going to be very similar between cloud providers. I don’t have any experience with Microsoft, but in Amazon and GCP, so far what I’ve seen is most of the major components that you link together are going to be similar.
Again, let’s think about the issues that exist with this architecture. Let’s say you’ve evolved your architecture to stage 2, and you’re either in AWS or GCP, and you’ve deployed a regional service. You are protected. It is secure, but there’s still multiple issues that we want to address before we would want to go wider. It’s still single region. These services are regional. If there is a regional problem in any one of these cloud providers that does affect any one of these services, you might have a problem. I don’t know if you all follow AWS, but I think maybe 3 or 4 years ago, there was a DynamoDB outage that probably took out 50% of the services I use at home. My home’s smart lock stopped working. My kid goes to KinderCare. I think the KinderCare app just stopped working. I could no longer get updates from our daycare.
The goal here is to be able to load balance out of a broken region or serve active-active in multiple regions. That way you have multi-region reliability and availability. In a world like Capital One, where I used to work, that was not optional. We had to have that as part of our architecture. The content delivery problem still exists, so let's address the availability and reliability problem first. How do we improve both of those at the same time? We want to serve users from all over the world. The hint here is that instead of deploying regional architectures, we're going to expand globally to improve our resilience and availability, and add disaster recovery.
Stage 3 Maturity – Global HA Architecture
Stage 3, when I talk to customers, is about enabling a global, highly available architecture. A global, highly available architecture in the Google world might look like this. Remember, in stage 2, we added a global load balancer in front of a single regional Cloud Run deployment. Now, we're going to put multiple Cloud Run deployments, one in every region that we're trying to serve, behind this global load balancer. We still have Cloud DNS applied at the top. You might have myapp.yourcompany.com, which will route to a single Anycast IP address that is served by the global load balancer.
Then, behind the scenes, the global load balancer will route the incoming traffic, based on latency or any other configuration that you choose, to the correct regional service to serve the users that are coming to your application. The beauty of this architecture is that it is fully autoscaled on the backend. The Cloud Run services will scale as wide as you need them to, and if you have no users overnight, for example, they can scale back down. Right now, this example is deployed into us-central1, us-east, and so on.
Now, if you’re a global company, your peak traffic follows the sun, likely. That sun moves, and it’s different times of day, depending on where you are. The backends can scale up and down automatically in response to business hours as your users arrive and leave. We’re going to attach all of the regional deployments to an external global load balancer, and then configure the routing policy based on whatever you need. You can disable broken regions. You can enable new regions. This all happens pretty much with configuration and deployment options.
This architecture is pretty cool. It's enabled globally. You can have automatic failover or manual failover. It's globally highly available. You can have disaster recovery, however you want to choose your options. It's 100% fully autoscaled. You can control how wide you want it to autoscale, which I definitely recommend. Let's look at AWS. It's largely the same. AWS has Amazon Route 53, which is their DNS service. Route 53 can be configured to point at multiple regional Amazon API Gateway deployments. Amazon API Gateway is a regional service, just like Lambda. These are all hooked up in their own lanes in the correct regions. You can add as many or as few as you want. Route 53 will route traffic based on geo-latency, round robin, or whatever configuration you choose.
Let’s think about the issues with this iteration of our architecture. Is anybody familiar with serverless pricing, how all the cloud vendors charge us for serverless compute? In the VM world, what was a unit of pricing? It was uptime. You could deploy 100 instances, and you had zero traffic against them. You pay the same cost as 100 instances fully spiked at 100% utilization.
In the serverless world, every cloud provider has transitioned away from that into what they call pay-for-use. It's actually the exact same formula for Google and for AWS. It's called gigabyte-seconds: how long the compute runs for and how many resources it is allocated. In the serverless world, generally, the formula is the CPU allocation of your Cloud Run or Lambda service, plus the memory configuration. Maybe you have 4 CPUs and 16 gigs of memory. Multiply that by how long each request runs for. Every request that arrives might run for 100 milliseconds, and that gets multiplied by the resource allocation, not consumption. That's important. It doesn't matter if your Lambda function has 8 gigs of memory and only uses 1, Amazon will charge you for that 8 gigs. That arrives at a cost. The formula measures gigabyte-seconds. It's fairly complex, so I would recommend using the pricing calculators from both cloud vendors to understand what the actual cost will be.
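As a rough illustration of the gigabyte-seconds math (the per-unit rate below is a placeholder, not a real list price; real pricing has separate CPU, memory, and request dimensions plus free tiers, so always check the provider's calculator):

```typescript
// Illustrative gigabyte-seconds calculation for the example later in the talk:
// 128 MB of memory, 100 ms average duration, 10 million requests per month.
const memoryGb = 0.128;              // memory *allocated*, not consumed
const durationSec = 0.1;             // 100 ms per request
const requestsPerMonth = 10_000_000;

const gbSeconds = memoryGb * durationSec * requestsPerMonth; // 128,000 GB-seconds

// Hypothetical rate purely for illustration.
const PLACEHOLDER_RATE_PER_GB_SECOND = 0.0000025;
console.log(gbSeconds * PLACEHOLDER_RATE_PER_GB_SECOND); // rough monthly memory cost
```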
If you’re using compute like this to deliver a static asset, for example, an image, a text file, whatever it is, an MP3 file, versus actually doing real computational effort, it’s actually the same cost. It’s just how long you’re running for. It doesn’t matter what you’re doing on there. It doesn’t matter what percentage of CPU utilization or memory it is. We’re using compute resources to deliver static assets. That’s really not going to scale as soon as we go wide. It will be very cheap when you’re only serving 100 users. As we start going to 1,000, 10,000, a million, your finance team is going to be unhappy. We’re not using edge caching, which is the terminology I think most CDN providers use.
Stage 4 Maturity – Optimized, Global Content Delivery
Our next job is to optimize our data transfer costs so we can deliver content efficiently across the world, faster and more cost-effectively. The solution really is just to use the right technology to serve the right resources. When I talk to customers, our fourth stage of maturity is usually being able to deliver content optimized globally across the world. In this world, we take our architecture and add a CDN provider on top. If you're in the Google world, it would be Cloud CDN. Then we put all of the static assets into a bucket. A Google Cloud Storage bucket is conceptually the same thing as an S3 bucket. You put all of your static assets in that bucket. It gets cached by the CDN service, and then served to your users close to where they are. If I'm a user here in Boston, I log into the service and it's going to be served from a local data center around me. If I'm on the West Coast, I'll be served from the West Coast instead. This shifts all of the pressure and cost away from that backend Cloud Run service.
If I have a simple webpage that needs to load, instead of having Cloud Run spin up and execute some compute, I just have regional caches on the CDN doing that. We're going to offload all of those static assets, all of those requests, to the CDN. Think about an event like the Super Bowl. The Super Bowl is really the worst type of event for a cloud provider, because it's a short, spiky event. Typically, every commercial break is when traffic spikes to all of the web applications that the commercials are driving people to. Then, after the event's over, it's done. Never used again. That's why you want a scalable architecture that drives traffic through Cloud CDN instead of hitting your compute services on the backend. In the AWS world, Amazon CloudFront is the same technology for CDN. You would put your static assets into an S3 bucket, and that would be cached and served from CloudFront instead of through AWS Lambda.
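A hedged sketch of the Google side of that, with Pulumi: a bucket of static assets exposed through a CDN-enabled backend bucket on the load balancer (names are illustrative):

```typescript
import * as gcp from "@pulumi/gcp";

// Bucket holding the compiled static assets (HTML, JS, images).
const assets = new gcp.storage.Bucket("static-assets", { location: "US" });

// Backend bucket with Cloud CDN enabled; the global load balancer
// routes static paths here instead of to Cloud Run.
const cdnBackend = new gcp.compute.BackendBucket("assets-cdn", {
  bucketName: assets.name,
  enableCdn: true,
});
```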
Here’s an exercise we’ll run through. You can hit this with a QR code. This will bring you to a pricing calculator for Google. Amazon has the same thing. Let’s say we were to deliver 10 million requests per month to the service. If you were using Cloud Run over here on the left side, this side, using the smallest possible configuration for Cloud Run, which is 128 megs of memory and 0.83 virtual CPUs with less than 100-millisecond average response time. Remember, we said the formula was resource configuration multiplied by duration of execution. This is the smallest possible average execution we can have for a serverless compute. Data transfer, let’s say it’s 250K per request, very small. 2.5 gigs of total data transfer per month. It costs you $50 a month.
If we offload all of these static requests to a Cloud CDN provider, we have 10 million cache lookups. A cache lookup happens on every request the CDN serves; when the content isn't cached yet, the Cloud CDN service goes to the backend bucket to fetch the actual content and then caches it. Ten million lookups and the same amount of data transfer, 2.5 gigs, will cost us about $8 a month, $7.70. It's almost an order of magnitude cheaper. Imagine if you were to go wider, and this is $1,000 a month on Cloud Run. What would that cost be on CDN? How much cheaper would it be? Make sure you're running these cost exercises. This link, https://bit.ly/qconboston25-content-costs, is a Google calculator. Amazon has the exact same kind of cost calculator you can run.
We still have to solve one problem: now that we're global, we haven't really tackled how to persist data consistently, and strongly, across the world. We have to choose a database that provides multi-region strong consistency. We'll talk about what strong consistency is. Option one is to use a relational database. Relational databases are amazing at their jobs. They've been around probably the longest. Some examples are over here on the left: Postgres, Oracle, SQL Server. The cost model for relational databases is, in general, also uptime. You deploy an instance of your database. Maybe you have read replicas and write replicas. Then you pay for uptime. You choose the instance type that you want, based on memory and CPU, for your workload.
Then the resource model is typically capacity planning. You generally need to know how much storage your application is going to need. That's how you choose: do you need a terabyte, 2 terabytes, how much storage on the backend. Deployments are almost always regional. I've not seen a global relational database except for a couple of databases with very unique use cases, like Google Spanner. I think I saw TiDB out there as well. The way to go global with a relational database is read replicas. You would replicate your data across multiple regions. There's going to be lag. There are going to be other considerations to worry about. The structure is always joins: you have multiple tables, and you have to join tables together to build your data. Option two is a NoSQL database: MongoDB, Cassandra. DynamoDB and Firestore are the two from AWS and Google. The cost model is usually pay-for-use. In DynamoDB, you pay for on-demand or provisioned throughput.
In Firestore, you pay similarly, per operation. The resource model is that you typically plan out your queries before you design your database in a NoSQL database, because most of your storage is schema-less. You're storing documents or key-values. Deployments can be multi-regional. That's really the most powerful part. Which option do you think we would go with? I think we would go with option two, a NoSQL database, if you're trying to build an architecture like this.
Today, we’re going to focus on just the two from the two cloud providers, DynamoDB and Firestore. Obviously, if you’re on-prem or if you have other requirements, make sure you evaluate the right database for your use case.
On the left there, DynamoDB is eventually consistent. What does eventually consistent mean? DynamoDB is a distributed datastore. When you write to Dynamo, it's going to persist your data across three availability zones, three physically separate data centers in one region. If you're writing really fast, which happens a lot (think about the mobile games you play, or all of the statistics that happen behind the scenes for any social media site), and you read immediately after, there's a period of time where the writes have not persisted across all three availability zones, and you're going to get a stale read. You're going to get a stale, out-of-date piece of data. You can do a strongly consistent read, but that's going to cost you time. You cannot do a strongly consistent write into Dynamo.
The database service will handle the write for you, but you have to be aware that reading is eventually consistent. When I was at Capital One, think about if I were to have this for my bank account: a bunch of writes into the bank account and a bunch of reads. The order is very important. Strongly consistent data is important. Dynamo does have multi-region replication. You can serve it across multiple regions. You can write to any region, but you have to be aware of the eventually consistent design pattern. Google Firestore is similar in role, but it has strongly consistent multi-region replication. This problem is solved for you. You can do this on Google Cloud without the worries of DynamoDB. We're going to go with Firestore for our solution, just because it's easy.
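For reference, on the DynamoDB side, opting into a strongly consistent read looks like this with the AWS SDK for JavaScript v3 (the table and key names are hypothetical):

```typescript
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";

const client = new DynamoDBClient({ region: "us-east-1" });

// Default reads are eventually consistent and may be stale right after a write.
// ConsistentRead: true forces a strongly consistent read, at extra read-capacity
// cost and slightly higher latency.
const result = await client.send(
  new GetItemCommand({
    TableName: "Accounts",              // hypothetical table
    Key: { accountId: { S: "12345" } }, // hypothetical key
    ConsistentRead: true,
  })
);
console.log(result.Item);
```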
Stage 5 Maturity – “The Reference Architecture”, Global Persistence
The last evolution of our architecture is what I call global persistence. This adds the ability to store and process data on the backend. This is our reference architecture. At the very top, we see our globally available, distributed, fully serverless architecture. On the backend, Cloud Run writes to Firestore. Firestore is a fully serverless NoSQL database. You don't pay for uptime. Then on the bottom there, we're distributing and serving our static assets through a Cloud CDN provider. Then you would add reCAPTCHA version 3, if you can, on the frontend to protect against bots. React is a very common web frontend framework that we can use to build and serve our app.
Let’s look at this architecture. It’s 100% serverless. That means you are not managing a single application or a single operating system, a single server, anywhere. No service to patch. Nothing to optimize. Nothing to have any operating system vulnerabilities. You pay for actual use of these services. You don’t pay for uptime. You don’t have to worry about, ok, at the middle of the night, there’s 1% of traffic. I need to scale it down. No idle and uptime charges. I have an asterisk there. There is a single service that does have idle uptime charges, which is the load balancer.
We cannot turn off the load balancer. It needs to be on all the time. It's secure. We talked about this before. There is native security at every single layer in the application. There is zero surface area to attack. You cannot log in or SSH. You cannot do anything on any one of these cloud-native services. It's built on top of either Google's or AWS's global network, which means you're going to have really reliable network connections that are faster than the public internet. It's self-healing. If one of your Cloud Run instances has a problem and goes down, Google brings up the next one. You don't have to worry about that. The heart on the slide is probably the most important one: it's fully optimized, delivering content close to your users, and it scales with traffic.
Building this Architecture – Tech Details
We walked through five stages of architecture evolution. Number one, basic architecture. Number two, enterprise-grade security. Number three, global reliability. Number four, optimization, both for cost and performance. Then, finally, a globally consistent persistence layer. I want to show you a little bit about how easy it is to build something like this. If you haven't done it before, you might think it's difficult. I actually think it's fairly simple. Think about this to yourself: how many staff do you think it would take to build and deploy the architecture from a couple of slides ago? One, 2, 3, 4, 10, 100? In the cloud world, I think you could do this with one, just a single person. You could do it all by yourself. I'm not saying you should. You probably will not operate it with one person, but you can deploy and build it with one. If you're on-prem, this is almost impossible. We couldn't do this before.
Ten years ago, to build a system like this, there's no possible way you could have done it by yourself. You couldn't even do it with one physical data center. Who's used React before, or anything similar, like Vue.js or Angular? You know how this works. It is typically JavaScript or TypeScript that you write, compile, and deploy, server-side or frontend. If you're doing this in the cloud world, all these compiled assets end up in your cloud storage bucket. Your frontend lives in the cloud storage bucket. It eventually gets cached and served from your CDN provider. These are some commands. npm run build is really the command for React to build and compile.
Then, there’s an interesting step here. Once you publish all of your assets to your cloud storage bucket, you have to invalidate the cache. Remember, a CDN provider operates fast because it is caching the data that’s in your origins, which is a cloud bucket in this case. It’s not reaching out to your origin when it doesn’t have to. When you publish new, updated content to it, a lot of people miss this step, which is, you must do a cache invalidation. I think Akamai calls it cache busting. It’s a simple command. You can do it in the console. You can do it using the CLI. Same exact process you would do if you’re on AWS. What’s really cool is, in the serverless world, even your CI/CD is serverless. Remember, our production application has zero servers, not a single one, so does our CI/CD.
From source code, in my example using Node.js 22, Cloud Build executes the build, which produces a Docker image. That Docker image lives in Artifact Registry, which is very similar to Docker Hub; it's just a private registry for your organization. It gets pushed to Cloud Run, and then Google serves it up. In this world, even your CI/CD operates entirely without you having to manage, scale, or patch any kind of CI/CD pipeline.
Demo
Let’s do a demo. I will show you just a little bit of the process. This is Visual Studio Code. I think Visual Studio Code has somehow become the number one choice for code editing. I have my React application right here. It’s just my-next-react application. The source directory is where all of the app lives. I have a page, a layout. None of this really matters for this presentation. I’m just showing you boilerplate React code.
We’re going to run that same command that we saw earlier, which is npm run build. This is compiling the source code that you wrote using TypeScript or JavaScript. It’s going to output a bunch of stuff into the out directory. I have an out directory right here. This is the compiled version of my application. I can take this and serve it anywhere. I can put this on a nginx server, I can put it on a VM, and it will host this service. In the Google Cloud console, we’ll take a look at a couple things. This is the Cloud CDN console. The Cloud CDN console, I already have an origin defined. This origin, I just call it my name-web-app. Origins can be anything. Origins can be VMs, can be buckets, can be load balancers, can be HTTP endpoints. It’s really where to fetch content from. Click into the origin. I can see there’s a tab for cache invalidation. This is what I was talking about. When you have uploaded new content to your origins, you have to run a cache invalidation.
Terminology is a little bit different: cache bust, cache invalidation, it's all the same. In the Google world, you would just pick the load balancer, then pick the path that you want to invalidate, so /* would just invalidate everything at this load balancer. On the backend, I have a Google Cloud Storage bucket, which is the same as an S3 bucket. The bucket is called gmao-web-app. It hosts all of the content that was pre-compiled from a previous deployment.
If you notice here, I have a 404 page, an index file, and a start.html, which are all the same exact files that were output by this compilation process. What I would do then, using a script or a CLI command, is upload that entire set of content right out to the bucket, overwriting everything, and then come back and invalidate the cache. Invalidation is generally a low-single-digit-minute process, many times faster. You do have to wait until the invalidation is finished; then you'll be able to see the new, updated content. I believe that QR code will take you to the actual application that's running. If you go and check it out yourself, this is being served across multiple regions, delivered by Cloud CDN. All the static assets are being served close to the user.
Deployment Automation – Infrastructure as Code
I do want to leave you with one thing here: in the enterprise world, we wouldn't do what we just did, which is manually copying and pasting. You wouldn't want to do that. You would automate it. You would probably have a pull request that kicks off the process if you're using GitHub. Then it will end up triggering a CI/CD process, so no human is actively copying and pasting stuff. That creates errors. Every cloud provider has their own IaC tool, Infrastructure as Code. Google uses Terraform. AWS has their own first-party CloudFormation service. I actually really like Pulumi. Pulumi is a third-party tool that works with multiple cloud providers. I like it because if you have to deal with multiple cloud providers, it will work across all of them. I really like it because you write code just like you would write your normal code, instead of writing a weird proprietary syntax. Has anybody written CloudFormation YAML files?
Those are multi-page, 1,000-line documents where, if you have to change one thing or track down a little bug, it could take you a day to find it. Pulumi is like the Amazon CDK service. I'll walk you through exactly how we would do what we just did on the console. Remember, we're just writing straight code that you all know how to write. You can do it in multiple programming languages. This is Node. The top line there, we're creating a storage bucket, just creating a bucket and giving it a name. Buckets have to have a regional location.
Then, in the second set of code at the bottom there, we are pushing the source code for my Cloud Run function. The source code gets pushed to the bucket, so the bucket now just holds the source code for my function. Then I execute the deployment of that function. It's really just creating a new function using the Pulumi SDK. They call it cloudfunctions version 2, and we push this out. This will trigger a full CI/CD pipeline using Cloud Build and Artifact Registry, all the way down, and then push it out to your actual service. There are a bunch of options being set here, like memory at 256 in this case, the Node runtime for this one, and where the source is stored, which is the name of the bucket that we just created.
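The transcript doesn't include the slide's code, but a hedged reconstruction of what that Pulumi program might look like follows (names, the source path, and the entry point are illustrative):

```typescript
import * as pulumi from "@pulumi/pulumi";
import * as gcp from "@pulumi/gcp";

// Bucket that holds the zipped function source.
const bucket = new gcp.storage.Bucket("source-bucket", { location: "US-CENTRAL1" });

// Push the local source directory (illustrative path) into the bucket.
const sourceObject = new gcp.storage.BucketObject("function-source", {
  bucket: bucket.name,
  source: new pulumi.asset.FileArchive("./function-source"),
});

// Deploy a 2nd-gen Cloud Function (served by Cloud Run under the hood).
const fn = new gcp.cloudfunctionsv2.Function("my-fn", {
  location: "us-central1",
  buildConfig: {
    runtime: "nodejs22",
    entryPoint: "handler", // assumed exported handler name
    source: { storageSource: { bucket: bucket.name, object: sourceObject.name } },
  },
  serviceConfig: { availableMemory: "256M" },
});
```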
Once that’s all done, it will push that resource out. You would repeat this in any region that you want to deploy it in. If you want 10 regions, you would do this 10 times. Then, here’s how you would create a Firestore database. Simple, exactly the same. You just use a different API call of gcp.firestore.database. I’m calling it pulumi-people database, giving it a location. In this case, the location is just all of North America. This database is multi-region, across all of North America.
The second set of code just publishes some sample documents to preload the database. I'm creating a new document, giving it a bunch of names, and then adding the fields through JSON. This just loads my database with test data. You can see, this is super easy. You just run pulumi up as a CLI command, and it will push everything. It'll first give you a preview of what's going to happen. It's saying it's creating all of these resources. If these resources already existed, or if you're adding a few resources, it'll say update, or it might say destroy; some resources need to be destroyed and recreated instead of updated. At the end, you just say yes. If you're doing this through a CI/CD pipeline, none of this is visible. It would just be in log files.
Resources
I have a blog in the yellow QR code, https://bit.ly/qconboston25-planet-scale-arch, that will walk you through a full creation of this on your own. Also, in the blue QR code, https://bit.ly/qconboston25-code-global, is a GitHub repo where you can just pull sample code that you can use to help you understand how to build something like this.
