Recall.ai, which runs a platform for building and managing meeting bots on AWS, recently shared how it discovered that using WebSockets to move video between processes was adding an extra 1M USD per year in costs. The team describes the high-bandwidth, low-latency inter-process communication (IPC) solution it developed to eliminate the overhead.
Providing an API for meeting bots on platforms such as Zoom, Google Meet, and Microsoft Teams, Recall.ai relies on real-time video processing within its AWS deployment. Elliot Levin, engineering team lead at Recall.ai, writes:
IPC is something that is rarely top-of-mind when it comes to optimizing cloud costs. But it turns out that if you IPC 1TB of video per second on AWS it can result in enormous bills when done inefficiently.
While profiling a sample of bots, the team expected the majority of CPU usage to come from video encoding and decoding. However, they discovered that the largest contributors by far were the Python WebSocket client, which was receiving the data, followed by Chromium’s WebSocket implementation, which was sending the data. Levin explains:
WebSocket seemed like a decent fit for our needs. It was “fast” as far as web APIs go, convenient to access from within the JS runtime, supported binary data, and most importantly was already built-in to Chromium.
Looking for a more cost-effective transport layer, the Recall.ai team considered three options: raw TCP/IP, Unix domain sockets, and shared memory. While shared memory offered no standard interface for transporting data, both TCP/IP and Unix domain sockets would require, at a minimum, copying data between user space and kernel space on every transfer. The team ultimately decided to design a custom shared-memory transport, opting for a ring buffer as the high-level transport structure.
Source: Recall.ai blog
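The post describes the design at a high level rather than in code, but the idea of a ring-buffer transport over shared memory can be sketched briefly. The Rust snippet below is a minimal, hypothetical single-producer, single-consumer ring: monotonic read and write counters are published with atomics, so handing a frame to the other side costs a single user-space copy and no syscalls. For simplicity the buffer lives in process memory behind an Arc; in a real deployment it would be laid out in a shared-memory mapping (for example, a memfd mapped into both processes). All names and layout choices here are illustrative assumptions, not Recall.ai's actual implementation.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical SPSC byte ring. `head` and `tail` are monotonically
// increasing byte counters (usize wrap-around is ignored in this
// sketch); `head - tail` is the number of unread bytes in the ring.
struct Ring {
    buf: UnsafeCell<Box<[u8]>>, // stand-in for a shared-memory mapping
    head: AtomicUsize,          // total bytes written (producer-owned)
    tail: AtomicUsize,          // total bytes read (consumer-owned)
}

// Safe for one producer and one consumer: each side mutates only its
// own counter, and the byte ranges they touch never overlap.
unsafe impl Sync for Ring {}

impl Ring {
    fn new(capacity: usize) -> Arc<Self> {
        assert!(capacity.is_power_of_two(), "mask indexing needs a power of two");
        Arc::new(Self {
            buf: UnsafeCell::new(vec![0u8; capacity].into_boxed_slice()),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        })
    }

    fn capacity(&self) -> usize {
        unsafe { (*self.buf.get()).len() }
    }

    // Producer side: copy `data` into the ring if there is room.
    // One user-space copy, no syscalls, no locks.
    fn try_write(&self, data: &[u8]) -> bool {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        let cap = self.capacity();
        if data.len() > cap - (head - tail) {
            return false; // ring full: caller applies back-pressure
        }
        let buf = unsafe { &mut *self.buf.get() };
        for (i, &b) in data.iter().enumerate() {
            buf[(head + i) & (cap - 1)] = b; // real code would bulk-copy
        }
        // Publish the new head so the consumer sees the bytes.
        self.head.store(head + data.len(), Ordering::Release);
        true
    }

    // Consumer side: drain whatever bytes are currently available.
    fn read_all(&self) -> Vec<u8> {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        let cap = self.capacity();
        let buf = unsafe { &*self.buf.get() };
        let out: Vec<u8> = (tail..head).map(|i| buf[i & (cap - 1)]).collect();
        // Release the space back to the producer.
        self.tail.store(head, Ordering::Release);
        out
    }
}

fn main() {
    let ring = Ring::new(1 << 20); // 1 MiB ring
    let producer = {
        let ring = Arc::clone(&ring);
        thread::spawn(move || {
            let frame = vec![0xAB_u8; 4096]; // stand-in for a video frame
            while !ring.try_write(&frame) {} // spin until there is room
        })
    };
    producer.join().unwrap();
    println!("read {} bytes", ring.read_all().len());
}
```

A production version would replace the spin loop with a proper wakeup primitive (an eventfd or futex, for instance), copy in bulk rather than byte by byte, and frame the byte stream so the consumer can recover message boundaries; the point of the sketch is only that, unlike TCP/IP or Unix domain sockets, nothing here crosses the user/kernel boundary per frame.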
On Hacker News, some developers questioned the technology stack and the choice of video codecs. User IX-103 writes:
Chromium already has a zero-copy IPC mechanism that uses shared memory built-in. It’s called Mojo. That’s how the various browser processes talk to each other. They could just have passed mojo::BigBuffer messages to their custom process and not had to worry about platform-specific code. But writing a custom ring buffer implementation is also nice, I suppose.
While building real-time applications on AWS with WebSockets is a common approach, Allen Helton, ecosystem engineer at Momento, recently warned:
You don’t want WebSockets, you want PubSub. I’ve been trying out AppSync Events recently, and I’m learning that even when abstracted at a super high level, WebSockets are still hard. I’ve worked with real-time communication for years and the only way I’ve seen that makes it easy is to abstract the protocol away completely.
Focusing instead on cost optimization, Corey Quinn, chief cloud economist at The Duckbill Group, comments:
Clickbait headline aside, "How WebSockets cost us $1M on our AWS bill" is a good example of how it doesn’t make sense to go delving into the deep weeds of your application architecture for cost or performance reasons–most of the time. Except in circumstances like this, it absolutely does.
According to Levin, implementing and deploying the shared-memory ring buffer optimized IPC for CPU efficiency, reducing the CPU usage of Recall.ai's bots by up to 50% and cutting the annual AWS bill by over 1M USD.