AWS has announced native support for Apache Avro and Protocol Buffers (Protobuf) formatted events in AWS Lambda when leveraging Apache Kafka event source mapping (ESM) in Provisioned Mode. According to the company, this enhancement significantly simplifies the processing of efficient binary Kafka events by integrating directly with popular schema registries, including AWS Glue Schema Registry (GSR), Confluent Cloud Schema Registry (CCSR), and self-managed Confluent Schema Registry (SCSR).
Previously, organizations using Avro and Protobuf for their Kafka data (formats valued for their compact message sizes, fast serialization and deserialization, and robust schema evolution) had to write custom code within their Lambda functions to validate, deserialize, and filter these events. With this new capability, Lambda’s ESM now natively handles these complexities, moving the schema registry integration logic from the application layer to the managed service. Rajesh Pandey, a Principal Engineer at AWS Lambda, emphasized this simplification in a LinkedIn post:
No more wiring up complex deserializers or juggling schema resolution inside your function. Just configure it, and Lambda takes care of the rest – schema fetching, validation, decoding – before your code even runs.
The built-in integration means that incoming JSON Schema, Avro, and Protobuf records are automatically validated against their registered schemas, which allows developers to consume and filter these more efficient binary formats while centralizing and consistently sharing data schemas. In addition, developers can now build their functions using Kafka’s open-source ConsumerRecords interface, and, with the help of Powertools for AWS Lambda, directly access Avro or Protobuf-generated business objects without writing custom deserialization code.
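As a minimal sketch of that approach, the handler below uses the Kafka consumer utility in Powertools for AWS Lambda (Python). The decorator and class names (kafka_consumer, ConsumerRecords, SchemaConfig), the inline Avro schema, and the field access pattern are assumptions for illustration and should be checked against the current Powertools documentation:

```python
from aws_lambda_powertools.utilities.kafka import ConsumerRecords, SchemaConfig, kafka_consumer
from aws_lambda_powertools.utilities.typing import LambdaContext

# Assumed Avro schema for the record value; in practice this matches the schema
# registered in Glue Schema Registry or Confluent Schema Registry.
ORDER_SCHEMA = """
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
"""

schema_config = SchemaConfig(value_schema_type="AVRO", value_schema=ORDER_SCHEMA)


@kafka_consumer(schema_config=schema_config)
def lambda_handler(event: ConsumerRecords, context: LambdaContext):
    # The utility exposes already-deserialized record values, so the function
    # body works with business objects instead of raw Avro bytes.
    for record in event.records:
        order = record.value
        print(order["order_id"], order["amount"])
```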
Lambda functions can also receive clean, validated JSON data regardless of the original serialization format, further streamlining development. Event filtering rules can also be set up upstream to discard irrelevant events before function invocations, optimizing compute costs. Yan Cui, an AWS Serverless Hero, further highlighted this benefit on LinkedIn, stating:
But the big thing is that it allows you to filter events at the ESM level (instead of inside your code), so it should lead to some cost savings from fewer unnecessary Lambda invocations.
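To illustrate, the following sketch attaches a filter to an existing Kafka ESM with boto3; the status field and the placeholder ESM UUID are hypothetical, and because the ESM deserializes the record value, the pattern can reference fields of the decoded payload rather than raw bytes:

```python
import json

import boto3

lambda_client = boto3.client("lambda")

# Hypothetical filter: only invoke the function for orders with status "CONFIRMED".
pattern = json.dumps({"value": {"status": ["CONFIRMED"]}})

lambda_client.update_event_source_mapping(
    UUID="<esm-uuid>",  # placeholder for the Kafka event source mapping ID
    FilterCriteria={"Filters": [{"Pattern": pattern}]},
)
```

Events that do not match the pattern are dropped by the ESM and never reach the function.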
Configuration requires enabling Provisioned Mode for the Kafka ESM and specifying schema registry settings (endpoint, authentication, and validation fields) via the AWS Management Console, AWS CLI, SDKs, or Infrastructure as Code (IaC) tools, such as the AWS Serverless Application Model (AWS SAM) or the AWS Cloud Development Kit (CDK). As Julian Wood and Nihar Sheth wrote in an AWS Compute blog post:
This new capability works with Amazon Managed Streaming for Apache Kafka (Amazon MSK), Confluent Cloud, and self-managed Kafka clusters. To get started, update your existing Kafka ESM to Provisioned Mode and add schema registry configuration, or create a new ESM in Provisioned Mode with schema registry integration enabled.
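A rough boto3 sketch of that update is shown below; the nested SchemaRegistryConfig field names follow the announcement, and the registry ARN, ESM UUID, and poller counts are placeholders to verify against the current Lambda API reference:

```python
import boto3

lambda_client = boto3.client("lambda")

lambda_client.update_event_source_mapping(
    UUID="<esm-uuid>",  # placeholder for the existing Kafka event source mapping
    # Schema registry support requires the ESM to run in Provisioned Mode.
    ProvisionedPollerConfig={"MinimumPollers": 1, "MaximumPollers": 3},
    AmazonManagedKafkaEventSourceConfig={
        "SchemaRegistryConfig": {
            # Glue Schema Registry ARN; a Confluent registry would use its HTTPS
            # endpoint plus an authentication entry instead.
            "SchemaRegistryURI": "arn:aws:glue:us-east-1:123456789012:registry/orders-registry",
            "EventRecordFormat": "JSON",  # deliver decoded JSON; "SOURCE" keeps Avro/Protobuf bytes
            "SchemaValidationConfigs": [{"Attribute": "VALUE"}],
        }
    },
)
```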
The ESM handles schema evolution automatically by detecting updated schema IDs and fetching the latest definitions. For error handling, events that fail validation or deserialization can be routed to configured failure destinations, such as Amazon SQS, SNS, or S3, for debugging purposes.
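A failure destination is set on the same event source mapping; a brief boto3 sketch with a placeholder SQS queue ARN:

```python
import boto3

lambda_client = boto3.client("lambda")

lambda_client.update_event_source_mapping(
    UUID="<esm-uuid>",  # placeholder for the Kafka event source mapping
    # Records that fail schema validation or deserialization are routed here
    # (an SNS topic or S3 bucket ARN works as well) for later inspection.
    DestinationConfig={
        "OnFailure": {"Destination": "arn:aws:sqs:us-east-1:123456789012:kafka-dlq"}
    },
)
```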
Support for Apache Avro and Protocol Buffers (Protobuf) formatted events in AWS Lambda is generally available in all AWS Commercial Regions where Lambda's Kafka ESM is available, except Israel (Tel Aviv), Asia Pacific (Malaysia), and Canada West (Calgary).