By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
World of SoftwareWorld of SoftwareWorld of Software
  • News
  • Software
  • Mobile
  • Computing
  • Gaming
  • Videos
  • More
    • Gadget
    • Web Stories
    • Trending
    • Press Release
Search
  • Privacy
  • Terms
  • Advertise
  • Contact
Copyright © All Rights Reserved. World of Software.
Reading: The Anatomy of a Write Operation | HackerNoon
Share
Sign In
Notification Show More
Font ResizerAa
World of SoftwareWorld of Software
Font ResizerAa
  • Software
  • Mobile
  • Computing
  • Gadget
  • Gaming
  • Videos
Search
  • News
  • Software
  • Mobile
  • Computing
  • Gaming
  • Videos
  • More
    • Gadget
    • Web Stories
    • Trending
    • Press Release
Have an existing account? Sign In
Follow US
  • Privacy
  • Terms
  • Advertise
  • Contact
Copyright © All Rights Reserved. World of Software.
World of Software > Computing > The Anatomy of a Write Operation | HackerNoon
Computing

The Anatomy of a Write Operation | HackerNoon

News Room
Last updated: 2025/11/28 at 8:42 AM
News Room Published 28 November 2025
Share
The Anatomy of a Write Operation | HackerNoon
SHARE

When your Python program writes to a file, the return of that function call is not a guarantee of storage; it is merely an acknowledgment of receipt. As developers, we rely on high-level abstractions to mask the complex realities of hardware. We write code that feels deterministic and instantaneous, often assuming that a successful function call equates to physical permanence.

Consider this simple Python snippet serving a role in a transaction processing system:

transaction_id = "TXN-987654321"
# Open a transaction log in text mode
with open("/var/log/transactions.log", "a") as log_file:
    # Write the commitment record
    log_file.write(f"COMMIT: {transaction_id}n")
    print("Transaction recorded")

When that print statement executes, the application resumes, operating under the assumption that the data is safe. However, the data has not hit the disk. It hasn’t even hit the filesystem. It has merely begun a complex relay race across six distinct layers of abstraction, each with its own buffers and architectural goals.

In this article, we will describe the technical lifecycle of that data payload namely, the string "COMMIT: TXN-987654321n" as it moves from Python user space down to the silicon of the SSD.

[Layer 1]: User Space (Python & Libc)

The Application Buffer

Our journey begins in the process memory of the Python interpreter. When you call file.write() on a file opened in text mode, Python typically does not immediately invoke a system call. Context switches to the kernel are expensive. Instead, Python employs a user-space buffer to accumulate data. By default, this buffer is 8KB in size, chosen specifically to align with the memory page size of the underlying operating system.

Our data payload sits in this RAM buffer. It is owned entirely by the Python process. If the application terminates abruptly, perhaps due to a SIGKILL signal or a segmentation fault, the data is lost instantly. It never left the application’s memory space.

The Flush and The Libc Wrapper

The with statement concludes and triggers an automatic .close(). This subsequently triggers a .flush(). Python now ejects this data and passes the payload down to the system’s C standard library, such as glibc on Linux. libc acts as the standardized interface for the kernel. While C functions like fwrite manage their own user-space buffers, Python’s flush operation typically calls the lower-level write(2) function directly. libc sets up the CPU registers with the file descriptor number, the pointer to the buffer, and the payload length. It then executes a CPU instruction, such as SYSCALL on x86-64 architectures, to trap into the kernel.

At this point, we cross the boundary from User Space into Kernel Space.

[Layer 2]: The Kernel Boundary (VFS)

The CPU switches to privileged mode. The Linux kernel handles the interrupt, checks the CPU registers, and identifies a request to write to a file descriptor. It hands the request to the Virtual File System (VFS). The VFS serves as the kernel’s unification layer. It provides a consistent API for the system regardless of whether the underlying storage is Ext4, XFS, NFS, or a RAM disk.

The VFS performs initial validity checks, such as verifying permissions and file descriptor status. It then uses the file descriptor to locate the specific filesystem driver responsible for the path, which in this case is Ext4. The VFS invokes the write operation specific to that driver.

[Layer 3]: The Page Cache (Optimistic I/O)

We have arrived at the performance center of the Linux storage stack: the Page Cache.

In Linux, file I/O is fundamentally memory-mapped. When the Ext4 driver receives the write request, it typically does not initiate immediate communication with the disk. Instead, it prepares to write to the Page Cache. The Page Cache is a section of system RAM dedicated to caching file data. It should be noted that Ext4 generally delegates the actual Page Cache related memory operations back to the generic kernel memory management subsystem. What happens next is

  1. The kernel manages memory in fixed-size units called pages (typically 4KB on standard Linux configurations). Because our transaction log payload is small ("COMMIT: TXN-987654321n"), it fits entirely within a single page. The kernel allocates (or locates) the specific 4KB page of RAM that corresponds to the file’s current offset.
  2. It copies the data payload into this memory page.
  3. It marks this page as “dirty”. A dirty page implies that the data in RAM is newer than the data on the persistent storage.

The Return: Once the data is copied into RAM, the write(2) system call returns SUCCESS to libc, which returns to Python. Crucially, the application receives a success signal before any physical I/O has occurred. The kernel prioritizes throughput and latency over immediate persistence, deferring the expensive disk operation to a background process. The data is currently vulnerable to a kernel panic or power loss.

[Layer 4]: The Filesystem (Ext4 & JBD2)

The data may reside in the page cache for a significant duration. Linux default settings allow dirty pages to persist in RAM for up to 30 seconds. Eventually, a background kernel thread initiates the writeback process to clean these dirty pages. The Ext4 filesystem must now persist the data. It must also update the associated metadata, such as the file size and the pointers to the physical blocks on the disk. These metadata structures initially exist only in the system memory. To prevent corruption during a crash, Ext4 employs a technique called Journaling.

Before the filesystem permanently updates the file structure, Ext4 interacts with its journaling layer, the JBD2 (Journaling Block Device). Ext4 typically operates in a mode called “ordered journaling.” It orchestrates the operation by submitting distinct write requests to the Block Layer (Layer 5 – next section) in a specific sequence.

  • Step 1: The Data Write. First, Ext4 submits a request to write the actual data content to its final location on the disk. This ensures that the storage blocks contain valid information before any metadata pointers reference them.
  • Step 2: The Journal Commit. Once the data write is finished, JBD2 submits a write request for the metadata. It writes a description of the changes to a reserved circular buffer on the disk called the journal. This entry acts as a “commitment” that the file structure is effectively updated.
  • Step 3: The Checkpoint. Finally, the filesystem flushes the modified metadata from the system memory to its permanent home in the on-disk inode tables. If the system crashes before this step, the operating system can replay the journal to restore the filesystem to a consistent state.

[Layer 5]: The Block Layer & I/O Scheduler

The filesystem packages its pending data into a structure known as a bio (Block I/O). It then submits this structure to the Block Layer. The Block Layer serves as the traffic controller for the storage subsystem. It optimizes the flow of requests before they reach the hardware using an I/O Scheduler, such as MQ-Deadline or BFQ. If the system is under heavy load with thousands of small, random write requests, the scheduler intercepts them to improve efficiency. It generally performs two key operations.

  • Merging Requests. The scheduler attempts to combine adjacent requests into fewer, larger operations. By merging several small writes that target contiguous sectors on the disk, the system reduces the number of individual commands it must send to the device.
  • Reordering Requests. The scheduler also reorders the queue. It prioritizes requests to maximize the throughput of the device or to ensure fairness between different running processes.

Once the scheduler organizes the queue, it passes the request to the specific device driver, such as the NVMe driver. This driver translates the generic block request into the specific protocol required by the hardware, such as the NVMe command set transmitted over the PCIe bus.

[Layer 6]: The Hardware (The SSD Controller)

The payload traverses the PCIe bus and reaches the SSD. However, even within the hardware, buffering plays a critical role. Modern Enterprise SSDs function as specialized computers. They run proprietary firmware on multi-core ARM processors to manage the complex physics of data storage.

The DRAM Cache and Acknowledgment.

To hide the latency of NAND flash, which is slow to write compared to reading, the SSD controller initially accepts the data into its own internal DRAM cache. Once the data reaches this cache, the controller sends an acknowledgment back to the operating system that the write is complete. At this precise nanosecond, the data is still in volatile memory. It resides on the drive’s printed circuit board rather than the server’s motherboard. High-end enterprise drives contain capacitors to flush this cache during a sudden power loss, but consumer drives often lack this safeguard.

Flash Translation & Erasure

The SSD’s Flash Translation Layer (FTL) now takes over. Because NAND flash cannot be overwritten directly, it must be erased in large blocks first. The FTL determines the optimal physical location for the data to ensure even wear across the drive, a process known as wear leveling.

Physical Storage

Finally, the controller applies voltage to the transistors in the NAND die. This changes their physical state to represent the binary data.

Only after this physical transformation is the ==data truly persistent==.

Conclusion: Understanding the Durability Contract

The journey of a write highlights the explicit trade-off operating systems make between performance and safety. By allowing layers to buffer and defer work, systems achieve high throughput, but the definition of “written” becomes fluid. If an application requires strict data durability at the moment of completion where data loss is unacceptable, developers cannot rely on the default behavior of a write() call at the application layer.

To guarantee persistence, one must explicitly pierce these abstraction layers using os.fsync(fd). This Python call invokes the fsync system call (in Linux based systems) which forces a flush of the dirty pages to the filesystem, commits the journal, dispatches the block I/O, and issues a standard “Flush Cache” command to the storage controller, demanding the hardware empty its volatile buffers onto the NAND. Only when fsync returns has the journey truly ended.

Sign Up For Daily Newsletter

Be keep up! Get the latest breaking news delivered straight to your inbox.
By signing up, you agree to our Terms of Use and acknowledge the data practices in our Privacy Policy. You may unsubscribe at any time.
Share This Article
Facebook Twitter Email Print
Share
What do you think?
Love0
Sad0
Happy0
Sleepy0
Angry0
Dead0
Wink0
Previous Article Reliable Data Flows and Scalable Platforms: Tackling Key Data Challenges Reliable Data Flows and Scalable Platforms: Tackling Key Data Challenges
Next Article These Are The Best Black Friday TV Deals This Year These Are The Best Black Friday TV Deals This Year
Leave a comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Stay Connected

248.1k Like
69.1k Follow
134k Pin
54.3k Follow

Latest News

iRobot’s Black Friday deals drop feature-packed robot vacs to as low as 9
iRobot’s Black Friday deals drop feature-packed robot vacs to as low as $149
News
Asia-Pacific 5G subscriptions set to reach 4.6 billion by 2030 | Computer Weekly
Asia-Pacific 5G subscriptions set to reach 4.6 billion by 2030 | Computer Weekly
News
Android is adding a new Wi-Fi hotspot option that maximizes speed and compatibility
Android is adding a new Wi-Fi hotspot option that maximizes speed and compatibility
Gadget
Top 5 Chinese Manufacturers of Professional-Grade Scuba Gear for Recreational and Commercial Diving
Top 5 Chinese Manufacturers of Professional-Grade Scuba Gear for Recreational and Commercial Diving
Gadget

You Might also Like

Blending Worlds: How to Integrate External IPs Without Losing Your Game’s DNA | HackerNoon
Computing

Blending Worlds: How to Integrate External IPs Without Losing Your Game’s DNA | HackerNoon

12 Min Read
Three features I would add If I was a Product Manager at WhatsApp | HackerNoon
Computing

Three features I would add If I was a Product Manager at WhatsApp | HackerNoon

6 Min Read
7 African startups redefining research, regulation, retail, and recreation |
Computing

7 African startups redefining research, regulation, retail, and recreation |

19 Min Read
Lessons From The Night I Met Dbt on Databricks | HackerNoon
Computing

Lessons From The Night I Met Dbt on Databricks | HackerNoon

15 Min Read
//

World of Software is your one-stop website for the latest tech news and updates, follow us now to get the news that matters to you.

Quick Link

  • Privacy Policy
  • Terms of use
  • Advertise
  • Contact

Topics

  • Computing
  • Software
  • Press Release
  • Trending

Sign Up for Our Newsletter

Subscribe to our newsletter to get our newest articles instantly!

World of SoftwareWorld of Software
Follow US
Copyright © All Rights Reserved. World of Software.
Welcome Back!

Sign in to your account

Lost your password?