Let’s start. I had this article in my head for two years and never wrote it, worried that it would not be perfect. The technology landscape keeps changing, and the tools you use today may be obsolete five years from now. That is why I’m writing down the core concepts of system design here rather than tool-specific advice. You can always fine-tune them based on your infrastructure and requirements. If I come across another framework, I will write it up as a sequel to this article.
System design is an essential part of software development. You need a 30,000 ft view before you dig into specifics. General-purpose programming languages can be used almost anywhere, but you first need to know where and how to use them. This is where system design comes into the picture.
The following are the important steps I came across during my research (a few topics are not explained in detail since they deserve their own articles):
- UX Design & Components
Since you will most likely not be developing a CLI, good UX/UI is important. Identify the key components first and improve upon them as you go. Just the important ones are sufficient. For example:
- A Twitter clone will have a component where tweets are drafted. It will have user profiles, followers, privacy settings, etc.
- An Uber clone will have maps integration.
- Chat messengers like WhatsApp will have person-to-person messages and group messages, forwarding and sharing options, etc.
Try to categorize the apps you are designing and observe their design. All social media apps will fall under one umbrella, all video streaming platforms/OTT in another, and all movie booking/reservation apps will fall under another.
╔═════════════════════════════════════╗
║ TWITTER CLONE UI COMPONENTS ║
╠═════════════════════════════════════╣
│ APPLICATION UI │
├─────────────────────────────────────┤
│ ┌──────────┐ ┌──────────┐ │
│ │ Profile │ │ Tweet │ │
│ │Component │ │Component │ │
│ └──────────┘ └──────────┘ │
│ ┌──────────┐ ┌──────────┐ │
│ │ Settings │ │ Messages │ │
│ │Component │ │Component │ │
│ └──────────┘ └──────────┘ │
└─────────────────────────────────────┘
- Classes/Subclasses
If you are thinking of an object-oriented design, it is easier if you list out all the various classes you may require in your system design. This helps chalk out all the objects, and then you can proceed to define methods pertaining to these classes. For example, a Twitter clone will have a class “User Profiles” with attributes such as “Name,” “Location,” etc.
╔═════════════════════════════════════╗
║ OBJECT-ORIENTED CLASS DESIGN ║
╚═════════════════════════════════════╝
┌─────────────────┐
│ User │
├─────────────────┤
│ - name │
│ - location │
│ - email │
├─────────────────┤
│ + post() │
│ + follow() │
└────────┬────────┘
│
┌────────┴────────┐
│ │
┌───▼────┐ ┌─────▼───┐
│Premium │ │ Basic │
│ User │ │ User │
└────────┘ └─────────┘
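The class diagram above could be sketched in code roughly like this. It is only a minimal illustration in Python: the `max_length` attribute, its values, and the method bodies are assumptions I made up for the example, not how any real product works.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Base profile with the attributes shown in the diagram."""
    name: str
    location: str
    email: str
    posts: list[str] = field(default_factory=list)
    following: set[str] = field(default_factory=set)

    # Hypothetical limit; a real product would pick its own value.
    max_length = 280

    def post(self, text: str) -> None:
        """Store a post, rejecting text longer than the class limit."""
        if len(text) > self.max_length:
            raise ValueError(f"post exceeds {self.max_length} characters")
        self.posts.append(text)

    def follow(self, other: "User") -> None:
        """Remember the name of the user being followed."""
        self.following.add(other.name)


class BasicUser(User):
    """Inherits everything from User unchanged."""


class PremiumUser(User):
    """Overrides only what differs from the base class."""
    max_length = 4000  # assumed value, for illustration only
```

Listing the classes first makes it obvious which behavior is shared (`post`, `follow`) and which differs per subclass (here, just one limit).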
- Design Patterns
Once you have the class design, you can select a design pattern that fits your needs. For example, if your application has a news feed, you can consider the Publisher-Subscriber pattern, where publishers post updates to a broker and subscribers are notified without either side knowing about the other.
╔═════════════════════════════════════╗
║ PUBLISHER-SUBSCRIBER PATTERN ║
╚═════════════════════════════════════╝
┌──────────────┐
│ Publisher │
│ (News Feed) │
└──────┬───────┘
│
│ publish()
│
┌───▼────────────────┐
│ Message Broker │
└───┬────────────────┘
│
│ notify()
│
┌───┴───┬───────┬────────┐
│ │ │ │
┌──▼──┐ ┌──▼──┐ ┌──▼──┐ ┌───▼─┐
│Sub1 │ │Sub2 │ │Sub3 │ │Sub4 │
└─────┘ └─────┘ └─────┘ └─────┘
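As a rough sketch of the flow in the diagram, here is an in-memory broker in Python. The class name `MessageBroker` and the topic names are hypothetical; a production system would normally use a dedicated message broker rather than a dictionary in one process.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[str], None]


class MessageBroker:
    """In-memory broker: publishers and subscribers only share topic names."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a callback to be invoked for every message on the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to each subscriber; return how many were notified."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)
```

The publisher never holds a reference to a subscriber, which is the point of the pattern: you can add or remove feed consumers without touching the code that posts updates.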
- CDN/Serverless/API/Cache
- A CDN caches static content (HTML, CSS, JavaScript, images) on servers close to your users, which lowers latency. You can incorporate it in your infrastructure to reduce response time.
- You can reduce costs by using serverless architecture for simple grunt work, automation, or cron jobs.
- APIs help a lot in connecting disconnected systems and enabling separation of concerns.
- A cache stores temporary, frequently accessed data, but the catch is that fast in-memory cache storage costs far more per gigabyte than disk. So, you must have an optimized cache strategy.
╔═════════════════════════════════════╗
║ CDN/SERVERLESS/API/CACHE STACK ║
╚═════════════════════════════════════╝
┌─────────┐
│ Client │
└────┬────┘
│
▼
┌─────────────┐ ┌──────────┐
│ CDN │◄───│ Static │
│ (Static) │ │ Content │
└────┬────────┘ └──────────┘
│
▼
┌─────────────┐ ┌──────────┐
│ Cache │◄───│ Hot │
│ │ │ Data │
└────┬────────┘ └──────────┘
│
▼
┌─────────────┐ ┌──────────┐
│ API │◄───│Serverless│
│ Gateway │ │Functions │
└────┬────────┘ └──────────┘
│
▼
┌─────────────┐
│ Backend │
│ Server │
└─────────────┘
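To make "an optimized cache strategy" a little more concrete, here is a small cache-aside sketch in Python with a size cap and an expiry time, two common ways to keep an expensive cache small. The class name `TTLCache`, its parameters, and the eviction rule are assumptions chosen for illustration.

```python
import time
from typing import Callable


class TTLCache:
    """Tiny cache-aside helper with a size cap and per-entry expiry."""

    def __init__(self, max_items: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_items = max_items
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behavior can be tested
        self._data: dict[str, tuple[float, str]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: str, loader: Callable[[str], str]) -> str:
        """Return the cached value, or call the loader (the backend) on a miss."""
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit, still fresh
        value = loader(key)  # miss or expired: go to the backend
        if key not in self._data and len(self._data) >= self._max_items:
            # Evict the entry that was inserted first (dicts keep insertion order).
            self._data.pop(next(iter(self._data)))
        self._data[key] = (now + self._ttl, value)
        return value
```

The size cap bounds how much expensive memory you pay for, and the expiry bounds how stale the hot data can get.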
- Eventual Consistency vs. Transactional Consistency
- In eventual consistency, a write is accepted right away, but it may take a short while before every replica reflects it. This suits applications where a little delay does not affect the overall experience. For example, a WhatsApp message may be delivered with a delay of 2-3 seconds, and that should not impact the overall user experience. Many NoSQL databases offer this model.
- In transactional consistency, a transaction is either committed in full or not at all, and later reads see the committed result. For example, in a money transfer application, the debit and the credit must be committed together to reduce the risk of double spending. Relational databases with ACID transactions are the usual choice here.
╔═════════════════════════════════════════════════════════════╗
║ EVENTUAL vs TRANSACTIONAL CONSISTENCY ║
╚═════════════════════════════════════════════════════════════╝
EVENTUAL CONSISTENCY TRANSACTIONAL CONSISTENCY
┌──────────────────┐ ┌──────────────────┐
│ Write Request │ │ Write Request │
└────────┬─────────┘ └────────┬─────────┘
│ │
▼ ▼
┌─────────┐ ┌─────────┐
│ NoSQL │ │ SQL │
│ DB │ │ DB │
└────┬────┘ └────┬────┘
│ │
┌────▼────┐ ┌────▼────┐
│ Delay │ │Immediate│
│ 2-3 sec │ │ Commit │
└────┬────┘ └────┬────┘
│ │
▼ ▼
[Eventually [Immediately
Consistent] Consistent]
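The money transfer case can be sketched with Python's built-in `sqlite3` module. The `accounts` table, the account names, and the helper `open_demo_db` are made up for this example; the point is only that both updates commit together or not at all.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move money atomically: both updates commit, or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Raising here undoes both updates above.
            raise ValueError("insufficient funds")


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
    )
    conn.commit()
    return conn
```

If the second transfer would overdraw the account, the rollback leaves both balances exactly as they were, so the same money cannot be spent twice.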
- Event Handling
Event handling is essential when you have different components trying to talk to each other. From the initiation of the request to the fulfillment of the response, you must clearly know the routes data packets will take.
╔═════════════════════════════════════╗
║ EVENT HANDLING FLOW ║
╚═════════════════════════════════════╝
┌──────────┐ Event ┌──────────┐
│Component │─────────────►│ Event │
│ A │ Fired │ Queue │
└──────────┘ └─────┬────┘
│
┌─────▼─────┐
│ Handler │
└─────┬─────┘
│
┌───────────┼───────────┐
│ │ │
┌───▼───┐ ┌───▼───┐ ┌───▼───┐
│Comp B │ │Comp C │ │Comp D │
└───────┘ └───────┘ └───────┘
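One way to picture the queue and handler in the diagram is the following Python sketch. The names `EventLoop`, `on`, `emit`, and `run`, and the event shape (a dict with `type` and `payload` keys) are assumptions for illustration only.

```python
from collections import deque
from typing import Callable

Event = dict  # assumed shape: {"type": "...", "payload": ...}


class EventLoop:
    """Components enqueue events; one dispatcher routes them to handlers."""

    def __init__(self) -> None:
        self._queue: deque = deque()
        self._routes: dict[str, list[Callable[[Event], None]]] = {}

    def on(self, event_type: str, handler: Callable[[Event], None]) -> None:
        """Declare which handler receives which event type."""
        self._routes.setdefault(event_type, []).append(handler)

    def emit(self, event: Event) -> None:
        """Called by the component that fires the event."""
        self._queue.append(event)

    def run(self) -> list[str]:
        """Drain the queue in FIFO order; return event types nobody handled."""
        unhandled = []
        while self._queue:
            event = self._queue.popleft()
            handlers = self._routes.get(event["type"], [])
            if not handlers:
                unhandled.append(event["type"])
            for handler in handlers:
                handler(event)
        return unhandled
```

Because every route is registered explicitly, you can read off which components a given event will reach, and the list returned by `run` tells you about events that had nowhere to go.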
- SQL/NoSQL/Polyglot Persistence
SQL databases are used to store structured data. In distributed SQL systems, during network partitions, you typically prioritize consistency over availability.
NoSQL databases can be used to store semi-structured data, key-value pairs, images, and videos. Graph databases and search engine databases also come under NoSQL databases. In distributed NoSQL systems, you typically prioritize availability and partition tolerance, accepting eventual consistency.
According to the CAP theorem, in a distributed system experiencing network partitions, you must choose between consistency and availability. Partition tolerance is essential for distributed systems.
Using a combination of SQL and NoSQL databases, each for the kind of data it handles best, is called polyglot persistence.
╔═════════════════════════════════════╗
║ SQL/NoSQL/POLYGLOT PERSISTENCE ║
╚═════════════════════════════════════╝
┌─────────────────────────────────────┐
│ Application Layer │
└──────────────┬──────────────────────┘
│
┌───────┴────────┐
│ │
┌──────▼──────┐ ┌─────▼────────────┐
│ SQL │ │ NoSQL │
│ Database │ │ Database │
├─────────────┤ ├──────────────────┤
│ - ACID │ │ - Key-Value │
│ - Structured│ │ - Document │
│ - CA (CP) │ │ - Graph │
│ │ │ - AP │
└─────────────┘ └──────────────────┘
└────────┬────────┘
│
POLYGLOT PERSISTENCE
- Client-Side vs. Server-Side Data Storage
Sometimes the app uses the storage of the mobile device it is installed upon. For example, WhatsApp messages are saved on the local storage of your smartphone, with optional cloud backup.
Sometimes, all the data is stored on the servers. For example, all the messages in Facebook Messenger are saved on Facebook servers.
Before you start the development cycle, you need to clearly define which components would use the server’s resources and which ones would use the local resources of the device.
╔═══════════════════════════════════════════════════╗
║ CLIENT-SIDE vs SERVER-SIDE STORAGE ║
╚═══════════════════════════════════════════════════╝
CLIENT-SIDE SERVER-SIDE
┌──────────────┐ ┌──────────────┐
│ Mobile │ │ Server │
│ Device │ │ Cluster │
├──────────────┤ ├──────────────┤
│ ┌──────────┐ │ │ ┌──────────┐ │
│ │ WhatsApp │ │ │ │ Facebook │ │
│ │ Messages │ │ │ │Messenger │ │
│ │ (Local) │ │ │ │ Messages │ │
│ └──────────┘ │ │ └──────────┘ │
│ │ │ │
│ + Optional │ │ + Backup │
│ Cloud │ │ + Sync │
│ Backup │ │ + Access │
└──────────────┘ └──────────────┘
- Optimized Data Structures
Every line of code you write taxes the server. The more the tax, the more you have to pay to maintain it. This is why it is good practice to write optimized software code and use data structures that have lower time and space complexities.
╔═════════════════════════════════════╗
║ OPTIMIZED DATA STRUCTURES ║
╚═════════════════════════════════════╝
Time Complexity
O(1) ───────── Best (Hash table lookup, average case)
O(log n) ────── Good (Balanced binary search tree lookup)
O(n) ───────── Acceptable (Array scan)
O(n²) ───────── Poor (Nested Loops)
O(2ⁿ) ───────── Worst (Exponential)
┌─────────────────────────────┐
│ Choose Lower Complexity │
│ = Less Server Tax │
│ = Lower Costs │
└─────────────────────────────┘
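To see the difference between O(n) and O(log n) in actual work done, here is a short Python sketch that counts comparisons for a linear scan and a binary search. The function names are my own; the counting is there purely to make the cost visible.

```python
def linear_search(items: list[int], target: int) -> tuple[bool, int]:
    """O(n): check each element in turn. Returns (found, comparisons)."""
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == target:
            return True, comparisons
    return False, comparisons


def binary_search(sorted_items: list[int], target: int) -> tuple[bool, int]:
    """O(log n): halve the search range each step. Input must be sorted."""
    lo, hi, comparisons = 0, len(sorted_items), 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_items[mid] < target:
            lo = mid + 1  # target can only be to the right of mid
        else:
            hi = mid      # target is at mid or to its left
    found = lo < len(sorted_items) and sorted_items[lo] == target
    return found, comparisons
```

On a sorted list of 1,024 numbers, finding the last element takes 1,024 comparisons with the linear scan and about 10 with the binary search. Multiply that gap by every request your servers handle and the "server tax" becomes very real.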
- Privacy Settings
Users of your application must have options to safeguard themselves on the internet, so provide good privacy options. For example, in WhatsApp, you can hide your profile photo, read receipts, and last seen. You can even block someone you think is a spammer.
+-----------------------------------------+
| PRIVACY SETTINGS |
+-----------------------------------------+
+-----------------------------------------+
| User Privacy Panel |
+-----------------------------------------+
| [ ] Show Profile Photo |
| [X] Hide Last Seen |
| [X] Hide Read Receipts |
| [ ] Show Online Status |
| [X] Block User |
| |
| [Save Settings] |
+-----------------------------------------+
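A privacy panel like the one above usually ends up as a small settings object that the backend consults before showing anything to another user. Here is a minimal Python sketch; the field names and the `visible_profile` function are hypothetical and not taken from any real app.

```python
from dataclasses import dataclass, field


@dataclass
class PrivacySettings:
    """One user's privacy choices; defaults here are assumptions."""
    show_profile_photo: bool = True
    show_last_seen: bool = True
    blocked_users: set[str] = field(default_factory=set)


def visible_profile(owner: dict, settings: PrivacySettings, viewer: str) -> dict:
    """Return only the profile fields this viewer is allowed to see."""
    if viewer in settings.blocked_users:
        return {}  # blocked viewers see nothing
    view = {"name": owner["name"]}
    if settings.show_profile_photo:
        view["photo_url"] = owner["photo_url"]
    if settings.show_last_seen:
        view["last_seen"] = owner["last_seen"]
    return view
```

Filtering on the server side, as sketched here, means a hidden field never leaves your backend, instead of being sent to the client and merely not displayed.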
- Compliance
Your app should be compliant with required regulations. For example:
- If your application handles Protected Health Information (PHI) for covered entities, it should be HIPAA compliant.
- If your application processes the personal data of people in the EU, it should be GDPR compliant.
╔═════════════════════════════════════╗
║ COMPLIANCE REQUIREMENTS ║
╚═════════════════════════════════════╝
┌──────────────────────────────────┐
│ Your Application │
└────────────┬─────────────────────┘
│
┌──────┴──────┐
│ │
┌─────▼─────┐ ┌────▼──────┐
│ HIPAA │ │ GDPR │
│Compliance │ │Compliance │
├───────────┤ ├───────────┤
│ Medical │ │ EU │
│ PHI │ │ Privacy │
│ Protected │ │ Rules │
└───────────┘ └───────────┘
- Security Aspects
While security is a very broad topic, your initial analysis must at least consider protection from spam and DDoS attacks. You need a proper identity and access management (IAM) structure and strong password policies. Also consider enabling two-factor authentication (2FA) for extra security.
╔═════════════════════════════════════╗
║ SECURITY LAYERS ║
╚═════════════════════════════════════╝
┌───────────────────────────────────┐
│ Security Layers │
├───────────────────────────────────┤
│ Layer 5: 2FA Authentication │
├───────────────────────────────────┤
│ Layer 4: IAM & Access Control │
├───────────────────────────────────┤
│ Layer 3: DDoS Protection │
├───────────────────────────────────┤
│ Layer 2: Spam Filtering │
├───────────────────────────────────┤
│ Layer 1: Password Policy │
└───────────────────────────────────┘
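One common building block against spam and request floods is rate limiting. The token bucket below is a minimal Python sketch of that idea, not a complete DDoS defense; the class name, the parameters, and passing the current time in explicitly are all choices I made to keep the example small and testable.

```python
class TokenBucket:
    """Allow short bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # start with a full bucket
        self.updated_at = now

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = max(0.0, now - self.updated_at)
        # Refill for the time that has passed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = max(self.updated_at, now)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice you would keep one bucket per user or per IP address, so a single noisy client runs out of tokens without affecting anyone else.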
- Services Available in the Area of Deployment
As the title suggests, all the services you are considering must be available in the geographical region where you intend to launch your application. For example, if you are planning to launch an app in China, you should not consider a Facebook API in your application design since it would not work.
+---------------------------------------+
| GEOGRAPHICAL SERVICE AVAILABILITY |
+---------------------------------------+
GLOBAL SERVICE MAP
+--------------+ +--------------+
| USA | | China |
+--------------+ +--------------+
| [+] Facebook | | [-] Facebook |
| [+] Google | | [-] Google |
| [+] AWS | | [+] Alibaba |
| [+] WhatsApp | | [+] WeChat |
+--------------+ +--------------+
Check Service Availability Before Launch!
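A check like this is easy to automate early in the design phase. The Python sketch below uses the example data from the diagram above; treat the table as illustrative only and verify it against each provider's current documentation.

```python
# Illustrative data copied from the diagram above, not an authoritative list.
AVAILABILITY = {
    "USA": {"Facebook", "Google", "AWS", "WhatsApp"},
    "China": {"Alibaba", "WeChat"},
}


def unavailable_services(required: set[str], region: str) -> set[str]:
    """Return the required services that are not offered in the region."""
    # An unknown region offers nothing, so every requirement is flagged.
    return required - AVAILABILITY.get(region, set())
```

Running the dependency list of your design through a function like this for every launch region gives you the list of integrations that need a local alternative.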
- Cost Analysis
Without compromising on efficiency and performance, you should lower costs as much as possible. When considering options, try to select open-source frameworks with good developer community support.
╔═════════════════════════════════════╗
║ COST OPTIMIZATION ║
╚═════════════════════════════════════╝
┌────────────────────────────────────┐
│ Cost Optimization │
├────────────────────────────────────┤
│ │
│ Open Source >>> Proprietary │
│ (Lower Cost) (Higher Cost) │
│ │
│ Serverless >>> Always-On │
│ (Pay per use) (Fixed Cost) │
│ │
│ Cache >>> DB Query │
│ (Expensive) (Cheaper) │
│ │
└────────────────────────────────────┘
Balance Cost vs Performance
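The serverless versus always-on trade-off comes down to simple arithmetic. In the Python sketch below, the prices are invented placeholders, so plug in your provider's real numbers before drawing conclusions.

```python
import math


def monthly_serverless_cost(requests: int, price_per_million: float) -> float:
    """Pay-per-use cost for a month, given a price per million requests."""
    return requests / 1_000_000 * price_per_million


def break_even_requests(fixed_monthly_cost: float, price_per_million: float) -> int:
    """Monthly requests at which serverless spend reaches the fixed server cost."""
    return math.ceil(fixed_monthly_cost / price_per_million * 1_000_000)
```

With a hypothetical always-on server at 50 per month and serverless at 2 per million requests, the break-even point is 25 million requests a month. Below that, pay-per-use is cheaper; above it, the fixed-cost server starts to win.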
- On-Prem/Cloud/Hybrid
Decide whether your data will reside on-prem, entirely in the cloud, or partly in the cloud and partly on-prem (hybrid). Here, you also need to consider concepts such as a virtual private cloud (VPC).
╔═════════════════════════════════════╗
║ DEPLOYMENT OPTIONS ║
╚═════════════════════════════════════╝
┌──────────────────────────────────────┐
│ Deployment Options │
└───────────┬──────────────────────────┘
│
┌───────┼───────┐
│ │ │
┌───▼───┐ ┌─▼────┐ ┌▼──────┐
│On-Prem│ │Cloud │ │Hybrid │
├───────┤ ├──────┤ ├───────┤
│Local │ │ AWS │ │On-Prem│
│Servers│ │Azure │ │ + │
│ Data │ │ GCP │ │ Cloud │
│Center │ │ │ │ Mix │
└───────┘ └──────┘ └───────┘
│
┌───▼────┐
│ VPC │
└────────┘
- Horizontal Scaling/Vertical Scaling & Load Balancers
In horizontal scaling, you add replicas of the current DB server/app server and use a load balancer to distribute the traffic between these instances.
In vertical scaling, you improve the system configuration of the instance. There is a limit to how far a single machine can be upgraded. There are other considerations too. For example, if you increase the compute, you have to increase the RAM accordingly. A wise bet would be to have a fair mix of horizontal and vertical scaling.
Load balancers are used to distribute traffic between various instances. They also help reduce system downtime, because traffic can be routed away from an instance that fails.
+-------------------------------------------------------------+
| HORIZONTAL & VERTICAL SCALING WITH LOAD BALANCERS |
+-------------------------------------------------------------+
HORIZONTAL SCALING VERTICAL SCALING
+---------------+ +---------------+
| Load Balancer | | Upgrade |
+-------+-------+ | CPU + RAM |
| +-------+-------+
| |
+-------------+-------------+ |
| | | |
v v v v
+---+ +---+ +---+ +---------------+
|DB1| |DB2| |DB3|... | Powerful |
+---+ +---+ +---+ | Server |
+---------------+
Add more servers Improve single server
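The simplest load-balancing strategy is round robin: hand each request to the next server in the list, skipping any that are down. Here is a minimal Python sketch; the class and method names are hypothetical, and real load balancers add health checks, weights, and connection tracking on top of this.

```python
class RoundRobinBalancer:
    """Rotate through servers in order, skipping those marked as down."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._healthy = set(servers)
        self._next = 0  # index of the server to try first on the next pick

    def mark_down(self, server: str) -> None:
        self._healthy.discard(server)

    def mark_up(self, server: str) -> None:
        if server in self._servers:
            self._healthy.add(server)

    def pick(self) -> str:
        """Return the next healthy server in rotation."""
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")
```

Scaling horizontally then just means adding another entry to the server list, and a failed instance costs you capacity rather than uptime.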
- Customer Acquisition and Customer Retention
Incorporate analytics such as funnel analysis and A/B testing to see where users drop off and which changes actually help you grow and retain your customer base.
╔═════════════════════════════════════╗
║ CUSTOMER ACQUISITION & RETENTION ║
╚═════════════════════════════════════╝
┌─────────────────────────────────────┐
│ Analytics Funnel │
├─────────────────────────────────────┤
│ │
│ 1000 Users ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ │
│ │ │
│ 500 Signups ▓▓▓▓▓▓▓▓▓ │
│ │ │
│ 200 Active ▓▓▓▓ │
│ │ │
│ 100 Paid ▓▓ │
│ │
├─────────────────────────────────────┤
│ A/B Testing │
│ ┌─────────┐ ┌─────────┐ │
│ │Version A│ │Version B│ │
│ │ 50% │ │ 50% │ │
│ └────┬────┘ └─────┬───┘ │
│ └──────┬────────┘ │
│ Compare │
└─────────────────────────────────────┘
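Both ideas in the diagram reduce to a few lines of code. The Python sketch below computes step-to-step conversion for a funnel and assigns users to an A/B variant by hashing their ID; the function names and the hashing scheme are assumptions for illustration.

```python
import hashlib


def funnel_conversion(stages: list[tuple[str, int]]) -> list[tuple[str, float]]:
    """Conversion rate of each stage relative to the stage before it."""
    rates = []
    for (_, prev_count), (name, count) in zip(stages, stages[1:]):
        rates.append((name, count / prev_count if prev_count else 0.0))
    return rates


def ab_variant(user_id: str, experiment: str) -> str:
    """Deterministically assign a user to 'A' or 'B', roughly half each."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    return "A" if digest[0] % 2 == 0 else "B"
```

For the numbers in the diagram (1000 users, 500 signups, 200 active, 100 paid), the step conversions are 50%, 40%, and 50%, so the signup-to-active step is the first place to look. Hashing the user ID keeps each user in the same variant on every visit without storing the assignment.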
Summary:
================================
SYSTEM DESIGN IN A NUTSHELL
17 Core Concepts
================================
START HERE
|
↓
1. UX DESIGN & COMPONENTS
→ Design UI elements
→ Profile, messages, feeds
|
↓
2. CLASSES & SUBCLASSES
→ Define objects
→ User profiles with attributes
|
↓
3. DESIGN PATTERNS
→ Publisher-Subscriber
→ For news feeds
|
↓
4. INFRASTRUCTURE STACK
→ CDN for static content
→ Serverless functions
→ API Gateway
→ Cache strategy
|
↓
5. CONSISTENCY MODELS
→ Eventual (NoSQL)
→ Transactional (SQL)
|
↓
6. EVENT HANDLING
→ Request to response
→ Data packet routes
|
↓
7. DATABASE STRATEGY
→ SQL: Structured data
→ NoSQL: Key-value, docs
→ CAP Theorem considerations
→ Polyglot Persistence
|
↓
8. DATA STORAGE
→ Client-side (local)
→ Server-side (cloud)
→ Hybrid approach
|
↓
9. OPTIMIZED DATA STRUCTURES
→ Lower time complexity
→ Lower space complexity
→ Reduce server tax
|
↓
10. PRIVACY SETTINGS
→ Hide profile info
→ Block users
→ Read receipts control
|
↓
11. COMPLIANCE
→ HIPAA for medical PHI
→ GDPR for EU users
|
↓
12. SECURITY LAYERS
→ DDoS protection
→ IAM structure
→ 2FA authentication
→ Password policies
→ Spam filtering
|
↓
13. GEOGRAPHICAL SERVICES
→ Check availability
→ Regional restrictions
→ Example: Facebook in China
|
↓
14. COST OPTIMIZATION
→ Open source frameworks
→ Serverless options
→ Balance cost vs performance
|
↓
15. DEPLOYMENT OPTIONS
→ On-Premises
→ Cloud (AWS/Azure/GCP)
→ Hybrid approach
→ VPC considerations
|
↓
16. SCALING STRATEGIES
→ Horizontal: Add replicas
→ Vertical: Upgrade config
→ Load balancers
→ Mix both approaches
|
↓
17. CUSTOMER ANALYTICS
→ Funnel analytics
→ A/B testing
→ Acquisition & retention
================================
KEY TAKEAWAYS
================================
★ 30,000 ft view first
★ Design before coding
★ Consider all 17 aspects
★ Balance trade-offs
★ Optimize for scale
★ Security & compliance
★ Cost-effective choices
================================
This framework should also help you with entry-level system design interview questions at top MNCs. When I started writing this article, I had much more in mind. I had to trim the content since many of the concepts mentioned here deserve separate articles. Covering them there will do them more justice.
Thank you for reading. Stay tuned.
If you have any questions, please feel free to send me an email. You can also contact me via LinkedIn or follow me on X.
