Amazon recently introduced the preview of garbage collection in the AWS CDK. The new feature automatically deletes old assets in bootstrapped S3 buckets and ECR repositories, reducing maintenance and deployment costs.
The new cdk gc command performs garbage collection on unused assets stored in the resources of the bootstrap stack, allowing developers to view, manage, and delete assets that are no longer needed. Kaizen Conroy, software engineer at AWS, and Adam Keller, senior cloud architect at AWS, explain:
For CDK developers that leverage assets at scale, they may notice over time that the bootstrapped bucket or repository accumulated old or unused data. If users wanted to clean this data on their own, CDK didn’t provide a clear way of determining which data is safe to delete. (…) We expect CDK Garbage Collection to help AWS CDK customers save on storage costs associated with using the product while not affecting how customers use CDK.
Source: AWS GitHub account.
The AWS Cloud Development Kit (CDK) is an open source framework that provides higher-level abstractions and enables developers to define cloud infrastructure using TypeScript, JavaScript, Python, Java, C#/.NET, and Go. Developers define reusable cloud components known as constructs that can be composed together into stacks and apps. The garbage collection feature has been a long-standing request by the community, with Janne Sinivirta, principal DevOps consultant at Polar Squad, highlighting the issue as far back as 2019:
Each cdk build creates a new assets folder under cdk.out. If this includes node_modules, the total size of the cdk.out folder can add up pretty quickly (mine was over 10Gb)!
According to the documentation, the cdk gc command is still in preview: while its current features are considered production-ready and safe to use, the scope of the command and its features might change. For example, the current version of garbage collection is scoped to an individual account and region, but there is a feature request to scope it to each stack instead. Developers must explicitly opt in by providing the --unstable=gc option.
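As a rough illustration of the opt-in, the sketch below assembles a cdk gc invocation in Python. The helper name build_gc_command is hypothetical and not part of the CDK; only the cdk gc command and the --unstable=gc option come from the article. The command is printed rather than executed.

```python
import shlex


def build_gc_command(extra_args=()):
    """Return the argument list for a `cdk gc` run with the preview opt-in.

    Hypothetical helper for illustration; it is not part of the AWS CDK.
    """
    # --unstable=gc is the explicit opt-in required while the command is in preview.
    return ["cdk", "gc", "--unstable=gc", *extra_args]


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is deleted by accident.
    print(shlex.join(build_gc_command()))
```

Passing the resulting list to a process runner such as subprocess.run would start the actual collection, so it is worth reviewing the printed command first.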
CDK Garbage Collection exposes parameters that let developers control how aggressive the garbage collection should be: --rollback-buffer-days sets the number of days an asset must have been marked as isolated before it is eligible for deletion, and --created-buffer-days sets the number of days an asset must have existed before it is eligible for deletion. Conroy and Keller clarify:
Rollback Buffer Days should be considered when you are not using cdk deploy and instead use a deployment method that operates on templates only, like a pipeline. If your pipeline can rollback without any involvement of the CDK CLI, this parameter will help ensure that assets are not prematurely deleted.
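To make the interplay of the two buffers concrete, here is a minimal Python sketch of the eligibility rule as the article describes it. It is not the CDK's actual implementation: the Asset record, its field names, and the function name are assumptions made for illustration, and both buffer values are passed explicitly rather than assuming any defaults.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Asset:
    """Hypothetical record for one asset in the bootstrap bucket or repository."""
    created_at: datetime
    # None while the asset is still in use; otherwise the time it was marked isolated.
    isolated_since: Optional[datetime]


def is_eligible_for_deletion(
    asset: Asset,
    now: datetime,
    rollback_buffer_days: int,
    created_buffer_days: int,
) -> bool:
    """Apply both buffers: an asset must pass each one to be eligible."""
    if asset.isolated_since is None:
        # Assets that are still in use are never eligible.
        return False
    isolated_long_enough = (
        now - asset.isolated_since >= timedelta(days=rollback_buffer_days)
    )
    old_enough = now - asset.created_at >= timedelta(days=created_buffer_days)
    return isolated_long_enough and old_enough


if __name__ == "__main__":
    now = datetime(2024, 11, 1, tzinfo=timezone.utc)
    asset = Asset(
        created_at=now - timedelta(days=40),
        isolated_since=now - timedelta(days=10),
    )
    # Isolated for 10 days: passes a 7-day rollback buffer, fails a 30-day one.
    print(is_eligible_for_deletion(asset, now, 7, 1))
    print(is_eligible_for_deletion(asset, now, 30, 1))
```

In this model, a larger rollback buffer keeps recently isolated assets around for longer, which is the protection the quote describes for pipelines that roll back without the CDK CLI.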
Keller summarizes on LinkedIn:
This was a pain point for a lot of folks as they were required to clean up resources on their own without any sort of intervention from the CDK. With the new garbage collection feature in the CDK toolkit, old, unused assets can be cleaned up with ease.
CDK Garbage Collection is available starting with AWS CDK version 2.165.0.