A Story About How to Turn Unpredictable Upgrades Into Steady-State Transitions
Most of us remember our first big Linux upgrade, and how unpleasant it was. Mine happened late on a Friday night, when the office was empty, the fluorescent lights hummed too loudly, and I was staring at a terminal window, hoping the system wouldn't crash in the middle of a package update.
Anyone who has ever done an in-place upgrade knows what it’s like to have to rebuild a whole machine because of a single wrong dependency, an outdated library, or a service that isn’t working properly.
After enough of these experiences, I started asking a simple question:
Why do we upgrade the OS on the same partition that’s running our production system?
It felt like renovating a house while people were still living in it.
That question eventually led me to a cleaner, calmer approach—one that treats OS upgrades the way modern DevOps teams treat application deployments. Instead of tearing down the running system, you build the new one beside it. You test it. You rehearse the cutover. And only when you’re ready, you flip the switch.
This is the idea behind dual root partitions.
The Problem With Traditional Upgrades
Most Linux migrations fall into familiar patterns:
- Upgrade everything in place and hope nothing breaks
- Reinstall from scratch and accept the downtime
- Rebuild the system virtually (in a VM or new instance) and pray the environment matches production
All of these approaches share the same flaw: they modify the system you’re depending on.
If something goes wrong, you’re stuck repairing a half‑upgraded OS or scrambling to restore from backups. Rollback becomes a rescue mission instead of a simple decision.
I wanted something different—something predictable.
The Dual‑Root Idea
We had a breakthrough when we started thinking about OS updates the same way we think about application rollouts.
In modern DevOps, you don’t overwrite the running code. You deploy a second version, test it, then switch traffic over. If something breaks, you instantly revert. No drama. No data loss. No 3am panic.
So why not do the same with Linux itself?
A dual-root layout looks something like this:
/dev/sda1 /boot (shared - contains kernels for both OS versions)
/dev/sda2 Active OS (currently running - Ubuntu 20.04)
/dev/sda3 Staged OS (prepared upgrade - Ubuntu 22.04)
/dev/sda4 Shared data (/home, /var, application data)
Visual representation:
┌─────────────────────────────────────────────────┐
│                 /boot (shared)                  │
│ ├── vmlinuz-5.4.0  (old kernel)                 │
│ └── vmlinuz-5.15.0 (new kernel)                 │
└─────────────────────────────────────────────────┘
         ▼                         ▼
┌──────────────────┐      ┌──────────────────┐
│      Root A      │      │      Root B      │
│   (Active OS)    │      │   (Staged OS)    │
│   Ubuntu 20.04   │ ◄──► │   Ubuntu 22.04   │
│                  │      │  (being tested)  │
└──────────────────┘      └──────────────────┘
         ▼                         ▼
┌─────────────────────────────────────────────────┐
│   /home, /var, /opt (shared data partition)     │
│   User data, databases, application state       │
└─────────────────────────────────────────────────┘
GRUB Bootloader chooses which root to boot
The visual representation shows two complete operating systems side by side. One running. One waiting.
The beauty of this setup is that the new OS can be built, inspected, and validated without touching the live environment. If it passes your checks, you promote it. If not, you simply reboot into the original system and carry on.
Everything is calm. Full availability. No late-night firefighting.
How the Migration Actually Works
The workflow is surprisingly straightforward. Here’s the actual process:
Step 1: Prepare the Staging Partition
# Format the staging partition (assuming /dev/sda3)
mkfs.ext4 -L "root-staging" /dev/sda3
# Mount it
mkdir -p /mnt/newroot
mount /dev/sda3 /mnt/newroot
Step 2: Extract the New Root Filesystem
# Download and extract base system (example: Ubuntu)
cd /mnt/newroot
wget http://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-amd64-root.tar.xz
tar xf ubuntu-22.04-server-cloudimg-amd64-root.tar.xz
rm ubuntu-22.04-server-cloudimg-amd64-root.tar.xz
# Or use debootstrap for Debian-based systems
debootstrap jammy /mnt/newroot http://archive.ubuntu.com/ubuntu/
Step 3: Bind Mount System Directories
This makes the staged OS “see” the running system’s devices and processes:
mount --bind /dev /mnt/newroot/dev
mount --bind /proc /mnt/newroot/proc
mount --bind /sys /mnt/newroot/sys
mount --bind /run /mnt/newroot/run
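One thing that's easy to miss with the shared /boot in this layout: mount the boot partition inside the staging root now, so the kernels you install in Step 5 land on the partition GRUB actually reads. A small sketch, assuming the partition layout shown earlier:
# Mount the shared /boot inside the staged root so kernel packages
# installed in the chroot end up on the shared boot partition
mkdir -p /mnt/newroot/boot
mount /dev/sda1 /mnt/newroot/boot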
Step 4: Enter the New Environment
# Now you're "inside" the new OS
chroot /mnt/newroot /bin/bash
# Set your hostname
echo "production-server" > /etc/hostname
# Configure network (example)
cat > /etc/netplan/01-netcfg.yaml << EOF
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: true
EOF
Step 5: Install and Configure Everything
Still inside the chroot:
# Update package lists
apt update
# Install kernel and essential packages
apt install -y linux-image-generic grub-pc
# Install your required services
apt install -y nginx postgresql redis-server
# Configure fstab for the new root
cat > /etc/fstab << EOF
/dev/sda3 / ext4 defaults 0 1
/dev/sda1 /boot ext4 defaults 0 2
/dev/sda4 /home ext4 defaults 0 2
EOF
Step 6: Update GRUB Bootloader
# Still in chroot - install GRUB
grub-install /dev/sda
update-grub
# Exit chroot
exit
# Back on the host system - update its GRUB too
update-grub
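The GRUB commands above assume a legacy BIOS/MBR machine. On a UEFI system (not covered in this walkthrough, so treat the package name and paths below as assumptions about your firmware and ESP layout), the rough equivalent inside the chroot would look more like:
# UEFI variant: install GRUB into the EFI System Partition instead of the MBR
# (assumes the ESP is mounted at /boot/efi inside the chroot)
apt install -y grub-efi-amd64
grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=ubuntu
update-grub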
Now when you reboot, GRUB will show both OS options:
Ubuntu 22.04 (on /dev/sda3) ← New OS
Ubuntu 20.04 (on /dev/sda2) ← Current OS (fallback)
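os-prober usually detects the other root automatically, but if the staged OS doesn't show up in the menu, you can add an explicit entry. A minimal sketch for /etc/grub.d/40_custom, assuming BIOS boot, the partition layout above, and a placeholder kernel version:
# Append below the existing header in /etc/grub.d/40_custom, then re-run update-grub
menuentry "Ubuntu 22.04 (staged, /dev/sda3)" {
    # (hd0,1) is GRUB's name for /dev/sda1, the shared /boot partition
    set root=(hd0,1)
    linux /vmlinuz-5.15.0-generic root=/dev/sda3 ro
    initrd /initrd.img-5.15.0-generic
}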
Step 7: Test Boot
# Reboot and select the new OS
reboot
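If you'd rather not sit at the console picking an entry, grub-reboot gives you a one-shot test boot: the staged OS is used for the next boot only, and a failed boot falls back to the current default on the following power cycle. A sketch (the entry title is a placeholder; match whatever update-grub actually generated):
# Boot the staged OS exactly once; the saved default entry is untouched
grub-reboot "Ubuntu 22.04 (on /dev/sda3)"
reboot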
Step 8: Validate Everything
Once booted into the new OS, run your validation checklist:
#!/bin/bash
# validation-checklist.sh
echo "Checking network connectivity..."
ping -c 3 8.8.8.8 || echo "Network failed"
echo "Checking disk mounts..."
df -h | grep -E "/home|/boot" || echo "Mounts failed"
echo "Checking critical services..."
systemctl is-active nginx || echo "nginx not running"
systemctl is-active postgresql || echo "postgresql not running"
echo "Checking application endpoints..."
curl -f http://localhost/ || echo "Web server failed"
echo "All checks passed - safe to promote!"
Step 9: Promote or Rollback
If everything works:
# Make the new OS the default boot option
grub-set-default 0
update-grub
# Optional: You can now repurpose the old partition
# But keep it around for a few weeks just in case
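One prerequisite worth checking before you rely on grub-set-default: GRUB only honors the saved entry when it is configured to do so.
# grub-set-default records the choice in /boot/grub/grubenv, which GRUB
# only consults when GRUB_DEFAULT=saved in /etc/default/grub
grep ^GRUB_DEFAULT /etc/default/grub
# If it isn't "saved", either set GRUB_DEFAULT=saved or edit the entry
# number/title directly, then run update-grub again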
If something’s wrong:
# Just reboot and select the old OS from GRUB
# No reinstall. No restore. Just boot the other partition.
reboot
Common Pitfalls (And How to Avoid Them)
Pitfall 1: Forgetting to Update fstab
Problem: The new OS boots but can’t mount shared partitions.
Solution: Always verify /etc/fstab in the staged OS:
# Inside chroot, check fstab
cat /etc/fstab
# Make sure it references the correct partition UUIDs
blkid /dev/sda3   # Get UUID of new root
blkid /dev/sda4   # Get UUID of shared data
# Update fstab with actual UUIDs
vim /etc/fstab
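For reference, a UUID-based fstab for the staged root might end up looking like this (the UUIDs below are placeholders; substitute the values blkid printed on your machine):
# /etc/fstab in the staged OS, keyed by UUID instead of device name
UUID=aaaaaaaa-1111-2222-3333-444444444444  /      ext4  defaults  0 1
UUID=bbbbbbbb-1111-2222-3333-444444444444  /boot  ext4  defaults  0 2
UUID=cccccccc-1111-2222-3333-444444444444  /home  ext4  defaults  0 2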
Pitfall 2: Network Configuration Conflicts
Problem: The new OS boots with the wrong IP or no network.
Solution: Before finalizing, copy and verify the network configuration.
# Copy current network settings to staged OS
cp /etc/netplan/* /mnt/newroot/etc/netplan/
cp /etc/network/interfaces /mnt/newroot/etc/network/
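A quick sanity check after the copy, assuming the system uses netplan:
# Confirm the staged OS carries the same network config as the live one
diff -r /etc/netplan /mnt/newroot/etc/netplan && echo "netplan configs match"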
Pitfall 3: Missing Boot Partition Entry
Problem: GRUB can’t find kernels for the new OS.
Solution: Ensure /boot is properly mounted in both systems:
# In the staged OS fstab:
/dev/sda1 /boot ext4 defaults 0 2
# Verify kernel is installed
chroot /mnt/newroot ls -la /boot/vmlinuz*
Pitfall 4: Service Configuration Drift
Problem: Services start but aren’t configured correctly.
Solution: Use configuration management:
# Apply Ansible/Salt/Chef configs to staged OS
ansible-playbook -i staging-inventory configure-services.yml
# Or manually verify critical configs
diff /etc/nginx/nginx.conf /mnt/newroot/etc/nginx/nginx.conf
Why This Approach Feels So Different
Once you try this method, it’s hard to go back. It changes the emotional tone of upgrades.
- Upgrades stop being emergencies—you’re no longer gambling with a running system.
- Rollback becomes instant—there’s no reinstalling or restoring—just a bootloader choice.
- Downtime shrinks to a single reboot, because all the heavy lifting happens while the system is still serving traffic.
- Testing happens before the cutover, not during it. You validate safely and promote with confidence.
It’s the same mindset we use for microservices, CI/CD pipelines, and cloud deployments—just applied to the operating system itself.
| Aspect | Traditional In-Place | Dual-Root Partition |
|---|---|---|
| Risk Level | High (modifies the running system) | Low (staged separately) |
| Rollback Time | Hours (reinstall or restore) | Seconds (reboot) |
| Downtime | Hours during upgrade | Minutes (just reboot) |
| Testing | Production is the test | Full testing before commit |
| Dependency Conflicts | Breaks the live system | Isolated to staging |
| Mental Stress | High | Low |
| Disk Space Required | Minimal | 2x root partition size |
| Complexity | Low (but risky) | Medium (but safe) |
When This Method Shines
This approach is especially useful when:
- You’re upgrading between major distro releases.
- You’re switching distributions entirely.
- You’re maintaining critical systems with tiny maintenance windows.
- You need a safe validation environment before committing.
It may not be the best fit for small devices or systems with strict Secure Boot policies, but for most servers and workstations it makes major upgrades dramatically safer.
A Real Migration Story
Let me share a concrete example from production.
Scenario: Upgrading 50 Ubuntu 18.04 servers to 22.04 across multiple data centers.
The Old Way (What We Used to Do):
- Schedule 4-hour maintenance window per server
- Run do-release-upgrade and pray
- Watch nervously as packages download over WAN
- Hit dependency conflicts on 12 servers
- Spend 2 days fixing broken systems
- Total time: 3 weeks of stressful nights
The Dual-Root Way (What We Do Now):
# Automated script runs during business hours
for server in server{01..50}; do
  ssh $server 'bash -s' < prepare-dual-root.sh
  ssh $server 'bash -s' < install-new-os.sh
  ssh $server 'bash -s' < configure-new-os.sh
done
# Each server now has two OS versions
# Test 5 canary servers first
for canary in server{01..05}; do
  ssh $canary 'grub-reboot 0 && reboot'
  sleep 300  # Wait 5 minutes
  ssh $canary 'run-validation-suite.sh'
done
# If canaries pass, roll out to remaining servers
# Each server reboots into new OS during maintenance window
# Rollback is just selecting old GRUB entry
Results:
- Zero failed upgrades
- Each server tested before commitment
- 15-minute maintenance window per server (just a reboot)
- Instant rollback capability for 2 weeks
- Total time: 1 week, zero stress
The difference isn’t just technical—it’s psychological. You stop fearing upgrades.
Final Thoughts
Linux upgrades don’t have to be nerve-wracking. With dual root partitions, you get a controlled, reversible, and almost elegant migration path. You can rehearse the upgrade before it ever touches production. And if something goes wrong, you’re one reboot away from safety.
It’s a simple idea—two root partitions instead of one—but it transforms the entire experience.
The mindset shift is profound:
- Upgrades become routine instead of risky
- Testing happens before commitment, not during
- Rollbacks are instantaneous, not rescue missions
- You sleep better during maintenance windows
And once you’ve done it this way, you’ll wonder why we ever upgraded Linux any other way.
Try It Yourself
Start small:
- Spin up a VM with dual partitions (see the partitioning sketch after this list)
- Install Ubuntu 20.04 on one and 22.04 on the other
- Practice the chroot workflow
- Get comfortable with GRUB switching
- Then try it on a non-critical server
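For that first item, carving up the practice VM's disk might look something like this (a sketch; the device name, sizes, and labels are placeholders for your environment):
# Hypothetical layout for a 40 GB practice disk (/dev/vda in a VM)
parted -s /dev/vda mklabel msdos
parted -s /dev/vda mkpart primary ext4 1MiB 1GiB     # shared /boot
parted -s /dev/vda mkpart primary ext4 1GiB 15GiB    # root A (active OS)
parted -s /dev/vda mkpart primary ext4 15GiB 29GiB   # root B (staged OS)
parted -s /dev/vda mkpart primary ext4 29GiB 100%    # shared data
mkfs.ext4 -L boot   /dev/vda1
mkfs.ext4 -L root-a /dev/vda2
mkfs.ext4 -L root-b /dev/vda3
mkfs.ext4 -L data   /dev/vda4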
The first time takes an hour to set up. After that, the process becomes instinctive.
Have you tried dual-root partitions? What’s your upgrade horror story? Drop a comment below—I’d love to hear how others are handling this challenge.
Want to dive deeper? Check out these resources:
- GRUB bootloader documentation
- Debootstrap guide
- Chroot best practices
About the Author
Srinivas Thotakura is a DevOps engineer and infrastructure architect who has spent the better part of a decade eliminating unnecessary complexity from Linux operations.
After one too many failed Friday night upgrades, he became obsessed with making infrastructure changes boring (in the best possible way). His work focuses on lifecycle management, configuration automation, and reliability engineering across large-scale distributed systems.
Srinivas automates things that should be automated, and shares hard-won lessons through technical writing and community engagement. He believes the best infrastructure is the kind you don’t have to think about.
Connect: https://www.linkedin.com/in/sri-h2o/
