Using an in-memory side cache to improve response times and alleviate the load on a database is one of the most common design patterns in web applications. This design scales well and has low setup and maintenance costs. A side cache can be added to an existing web application backend to speed up the loading of all types of data, from database records to larger chunks like complete JSON or HTML responses. By setting an expiration time, “lazy caching” (or “lazy loading”) can be implemented, meeting users’ tolerance for stale data without the complexities of cache invalidation. Many excellent open-source solutions are available both on-premises and in the cloud, with Memcached and Redis being the most popular. A caching scheme like this has been used in every project I have worked on.
As system load increases, challenges inevitably emerge. Imagine your web application processing millions of requests daily. During peak traffic hours, users experience significant delays and occasional errors. To mitigate these issues, you expand your caching infrastructure by adding additional instances, expecting the increased cache size to resolve the bottlenecks. However, the performance issues persist, and your application continues to struggle under the load. This scenario, which I encountered repeatedly in the projects I worked on, necessitates a careful re-evaluation of our approach to caching under high demand.
Large technology companies often address such challenges with sophisticated software and intricate architectural changes. However, these solutions are not always accessible or cost-effective for smaller teams or organizations. Techniques such as implementing middleware proxies for request batching or adopting customized communication protocols may not align with your system’s constraints. This article presents a practical method that my colleagues and I successfully employed to address similar challenges in large-scale projects.
Latency spikes at top percentiles
One frequent problem we encountered was high response times at the top percentiles, such as p99, while the mean and median looked great. These spikes were caused by database reads whenever a cache key was missing.
I created a simulation to represent a typical workload for a web application we used to run (see the Simulation section below for more details). This model application handles 100 requests per second for a single type of abstract key. The keys’ popularity follows a Pareto distribution, with two-thirds of user requests concentrated among the first 1,000 keys out of a total of 1 million keys.
The mean database read time is about 50 times higher than the mean cache read time. The cache expiration time is 20 minutes.
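For illustration, here is a rough sketch of how such a key popularity pattern can be sampled in plain Python. The Pareto shape parameter and the resampling of out-of-range ranks are my own illustrative choices, not the exact setup used in the original simulation:

import math
import random

NUM_KEYS = 1_000_000
TOP_KEYS = 1_000
# shape chosen so that roughly two-thirds of the raw Pareto mass falls on
# ranks <= 1,000; resampling out-of-range ranks pushes the share a bit higher
PARETO_SHAPE = math.log(3) / math.log(TOP_KEYS)  # ~0.16

def sample_key():
    # paretovariate() returns heavy-tailed values >= 1, so low key ranks
    # are requested far more often than high ones
    while True:
        rank = int(random.paretovariate(PARETO_SHAPE))
        if rank <= NUM_KEYS:
            return rank - 1  # 0-based key index

samples = [sample_key() for _ in range(100_000)]
top_share = sum(k < TOP_KEYS for k in samples) / len(samples)
print(f"share of requests hitting the top {TOP_KEYS:,} keys: {top_share:.0%}")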
The p99 response times for these top 1,000 keys are extremely high (top left chart), while the mean and median response times stay within an acceptable range. The long tail of less popular keys isn’t considered here: their hit ratio is significantly lower, which makes caching much less effective for them in the first place.
But are high values at the 99th percentile for popular keys really that bad? After all, it means only about 1% of key reads are slow. Though it depends on your application and its usage patterns, this is generally unfavourable. Consider the following real-world scenarios:
- A single user’s request fetches hundreds of keys, a typical case for social and news feeds. Even though only about 1 in 100 key reads is slow, almost every user request becomes slow because of it;
- Hundreds of users request the same key at the same moment, a common pattern for any popular site. When this key expires, all these requests go straight to the database, multiplying its load and slowing down the whole backend.
I want to emphasize that this happens not to “some requests” but to the most popular ones, which represent the core of your users’ demand.
Setting key expiration times to infinity could solve the problem of these latency spikes. Unfortunately, it also rules out lazy caching and requires explicit cache invalidation mechanisms. Is it possible to eliminate the issue of key expiration while still serving data that is acceptably up to date?
Dynamically extending key expiration times
The desire to serve fresh data while maintaining low response times and database load is inherently contradictory. The ideal solution would be to know in advance whether a key retrieved from the cache is about to expire and, if so, update it, preferably in the background. Here is an example solution in Python-like pseudocode (cache and database stand in for your cache and database clients):
from random import random

ALPHA = 0.05  # refresh probability (the α discussed below)

def read_key(key):
    cache_response = cache.gets(key)  # gets() also returns a CAS token
    if cache_response is not None:
        if random() < ALPHA:
            # run the refresh in the background so it does not delay the response
            run_async(refresh_key, key, cache_response.cas_token)
        # returning the cached value to the user
        return cache_response.value
    # on a cache miss, read from the database and populate the cache as usual

def refresh_key(key, cas_token):
    db_response = database.get(key)
    if db_response is not None:
        # the CAS token ensures we don't overwrite a newer value (see below)
        cache.cas(key, db_response.value, cas_token)
We read a key from the cache and, if successful, with a small probability write its value back, which also extends its expiration time. The refresh involves reading the key from the database to pick up any changes.
To avoid race conditions, CAS (check-and-set) must be used. CAS ensures that the cache update happens only if the data hasn’t been modified by another process since it was last read, maintaining consistency.
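Memcached exposes this directly: gets returns a CAS token, and cas writes only if that token still matches. Redis has no CAS variant of SET, but its WATCH/MULTI/EXEC transactions provide the same check-and-set semantics via optimistic locking. Below is a minimal sketch using redis-py; the client setup, the function name, and the 20-minute TTL are assumptions for illustration:

import redis

r = redis.Redis()

def cas_refresh(key, fresh_value, ttl_seconds=20 * 60):
    # optimistic locking: the transaction fails if the watched key is
    # modified by another client between WATCH and EXEC
    with r.pipeline() as pipe:
        try:
            pipe.watch(key)
            pipe.multi()
            pipe.set(key, fresh_value, ex=ttl_seconds)  # also resets the TTL
            pipe.execute()
            return True
        except redis.WatchError:
            # another process updated the key first; skip this refresh
            return False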
The key update should be done asynchronously to avoid delaying user responses. Even if that isn’t possible, it may still be beneficial to update the key synchronously, at the cost of blocking that one user’s request.
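One simple way to implement the run_async() helper from the pseudocode above, assuming a threaded backend, is a small fire-and-forget worker pool (the pool size here is an arbitrary choice):

from concurrent.futures import ThreadPoolExecutor

# a few background workers are enough, since refreshes are rare and short-lived
refresh_pool = ThreadPoolExecutor(max_workers=4)

def run_async(fn, *args, **kwargs):
    # fire-and-forget: the user response is returned without waiting for fn
    refresh_pool.submit(fn, *args, **kwargs)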
The optimal probability α depends on your traffic pattern; values between 1% and 10% are generally a good choice. Higher values increase database load without a noticeable benefit for popular keys.
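To get a feel for the numbers: with the simulated 100 requests per second, α = 5%, and an assumed 95% cache hit ratio, background refreshes add roughly 0.05 × 0.95 × 100 ≈ 5 extra database reads per second, comparable to the ~5 reads per second already caused by cache misses.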
In systems where keys are not frequently updated, expiration times are longer, and request rates are high, an additional refinement can be applied: refresh keys with probability α, but only if their remaining expiration time is below a threshold t.
This technique requires the ability to read a key’s remaining expiration time. Redis provides the TTL command for this purpose, while Memcached does not; this can be worked around by storing the expiration time alongside the value, for example packed into JSON.
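As a sketch of that combined check for Redis (the probability and the threshold are illustrative values, and the client setup is assumed):

import random
import redis

r = redis.Redis()

ALPHA = 0.05         # refresh probability α, illustrative
TTL_THRESHOLD = 120  # only refresh keys with less than 2 minutes left, illustrative

def should_refresh(key):
    remaining = r.ttl(key)  # seconds until expiration; -1 = no TTL, -2 = missing key
    return 0 <= remaining < TTL_THRESHOLD and random.random() < ALPHA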
Simulation
Modifying the caching behavior of a production system is risky, to say the least, so it’s safer to apply changes gradually. However, due to the unpredictable nature of incoming traffic, distinguishing small improvements from noise requires complex data analysis. Development and staging environments proved not much better in my case, as their traffic is far lower.
To tackle this, I used discrete-event simulation for ad-hoc experiments. This approach let me safely test any system behavior I wanted to investigate, much faster than running the same experiments in a development environment.
While composing this article, I consolidated the numerous code snippets used in these experiments into a single tool and made it available in a GitHub repository: https://github.com/whisk/cachestudy.
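To give a flavor of what such a simulation can look like, here is a heavily simplified, self-contained sketch (not the actual code from the repository): keys expire after 20 minutes, a cache hit is modeled as 1 ms and a miss as 50 ms to reflect the 50x ratio above (the absolute latencies and the 4-hour duration are arbitrary), and the p99 latency of requests for the popular keys is measured at the end:

import random
from statistics import quantiles

NUM_KEYS, TOP_KEYS = 1_000_000, 1_000
PARETO_SHAPE = 0.16            # roughly the shape from the workload sketch above
TTL = 20 * 60                  # 20-minute expiration, in simulated seconds
HIT_MS, MISS_MS = 1.0, 50.0    # illustrative latencies with a 50x ratio
RPS, DURATION = 100, 4 * 3600  # 100 requests per second for 4 simulated hours

expires_at = {}                # key -> expiration timestamp
latencies = []                 # response times of requests for the top keys

for tick in range(RPS * DURATION):
    now = tick / RPS                                   # simulated time, seconds
    key = min(int(random.paretovariate(PARETO_SHAPE)), NUM_KEYS) - 1
    if expires_at.get(key, 0.0) > now:                 # cache hit
        latency = HIT_MS
    else:                                              # miss: hit the database
        latency = MISS_MS
        expires_at[key] = now + TTL                    # repopulate the cache
    if key < TOP_KEYS:
        latencies.append(latency)

p99 = quantiles(latencies, n=100)[98]  # 99th percentile
print(f"p99 for the top {TOP_KEYS:,} keys: {p99:.1f} ms")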
Conclusion
The expiration time extension technique described in this article shows that improving web application performance doesn’t always require expensive tools or complex changes. By addressing common caching challenges, developers can significantly boost efficiency and handle high traffic with simple, cost-effective solutions.
However, these improvements must be carefully tested in production to ensure they work as intended and don’t create new problems. Simulations and pre-production environments often lack the complexity and scale of real-world traffic; only by testing under actual conditions can you confirm that your optimizations provide tangible gains.
Misc
Cover image generated by DALL·E.