In addition to the VFS changes merged yesterday that let multi-device file-systems better cope with losing a disk, another notable change in the VFS pull requests for Linux 6.17 allows a range to be zeroed out more efficiently on modern NVMe SSDs and SCSI drives.
The fallocate changes submitted for Linux 6.17 introduce a new “FALLOC_FL_WRITE_ZEROES” flag for more efficiently zeroing out an area on modern storage devices. The pull request that was merged on Monday to Linux Git explains:
“fallocate() currently supports creating preallocated files efficiently. However, on most filesystems fallocate() will preallocate blocks in an unwritten state even if FALLOC_FL_ZERO_RANGE is specified.
The extent state must later be converted to a written state when the user writes data into this range, which can trigger numerous metadata changes and journal I/O. This may lead to significant write amplification and performance degradation in synchronous write mode.
At the moment, the only method to avoid this is to create an empty file and write zero data into it (for example, using ‘dd’ with a large block size). However, this method is slow and consumes a considerable amount of disk bandwidth.
Now that more and more flash-based storage devices are available, it is possible to efficiently write zeros to SSDs using the unmap write zeroes command if the devices do not write physical zeroes to the media.
For example, if SCSI SSDs support the UNMAP bit or NVMe SSDs support the DEAC bit[1], the write zeroes command does not write actual data to the device; instead, NVMe converts the zeroed range to a deallocated state, which works fast and consumes almost no disk write bandwidth.
This series implements the BLK_FEAT_WRITE_ZEROES_UNMAP feature and BLK_FLAG_WRITE_ZEROES_UNMAP_DISABLED flag for SCSI, NVMe and device-mapper drivers, and adds the FALLOC_FL_WRITE_ZEROES and STATX_ATTR_WRITE_ZEROES_UNMAP support for ext4 and raw bdev devices.
fallocate() is subsequently extended with the FALLOC_FL_WRITE_ZEROES flag. FALLOC_FL_WRITE_ZEROES zeroes a specified file range in such a way that subsequent writes to that range do not require further changes to the file mapping metadata. This flag is beneficial for subsequent pure overwriting within this range, as it can save on block allocation and, consequently, significant metadata changes.”
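For readers who want to try the new flag from user space, the following is a minimal sketch rather than a reference implementation. It assumes 64-bit Linux with glibc and that FALLOC_FL_WRITE_ZEROES has the value 0x80 from the kernel's falloc.h UAPI header (Python's os module does not expose the constant). The zero_range() helper name is made up for this illustration. When the kernel, file-system, or device rejects the flag, the sketch falls back to the slower approach of writing zero-filled buffers that the pull request describes.

```python
import ctypes
import ctypes.util
import errno
import os

# Assumed value of the new flag, taken from include/uapi/linux/falloc.h
# in Linux 6.17; verify against your kernel headers.
FALLOC_FL_WRITE_ZEROES = 0x80


def zero_range(fd, offset, length):
    """Zero [offset, offset + length) in fd; return the method that was used.

    Hypothetical helper for illustration only.
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6",
                       use_errno=True)
    fallocate = getattr(libc, "fallocate", None)
    if fallocate is not None:
        # int fallocate(int fd, int mode, off_t offset, off_t len);
        # off_t is assumed to be 64 bits wide here.
        fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                              ctypes.c_int64, ctypes.c_int64]
        fallocate.restype = ctypes.c_int
        if fallocate(fd, FALLOC_FL_WRITE_ZEROES, offset, length) == 0:
            return "write_zeroes"
        err = ctypes.get_errno()
        # Older kernels and unsupported file-systems or devices reject the
        # flag; any other error is treated as a real failure.
        if err not in (errno.EOPNOTSUPP, errno.ENOSYS, errno.EINVAL):
            raise OSError(err, os.strerror(err))

    # Fallback: explicitly write zero-filled buffers, 1 MiB at a time.
    chunk = bytes(min(length, 1 << 20))
    done = 0
    while done < length:
        n = min(len(chunk), length - done)
        done += os.pwrite(fd, chunk[:n], offset + done)
    return "fallback"
```

Calling zero_range(fd, 0, 1 << 30) on a file descriptor opened for writing would zero the first 1 GiB of the file, and the return value shows whether the efficient path was actually taken.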
With these patches, in addition to the block subsystem changes and adding the FALLOC_FL_WRITE_ZEROES flag to fallocate, the plumbing work also introduces FALLOC_FL_WRITE_ZEROES support for the EXT4 file-system as a working example.
The /sys/block/[disk]/queue/write_zeroes_unmap sysfs attribute indicates to users whether a disk supports the efficient unmap write zeroes operation.
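A script could check that attribute before relying on the fast path. The sketch below is illustrative only: the supports_unmap_write_zeroes() helper and its sysfs_root parameter are hypothetical, the attribute name is taken from the path given above, and it assumes the file holds a single number where any non-zero value means the operation is supported. The exact name and format may differ on a given kernel, so a missing or unparsable file is reported as unknown.

```python
from pathlib import Path


def supports_unmap_write_zeroes(disk, sysfs_root="/sys/block"):
    """Return True/False for the disk, or None if the attribute is unavailable.

    Hypothetical helper; assumes the attribute contains one integer and
    that a non-zero value means "supported".
    """
    attr = Path(sysfs_root) / disk / "queue" / "write_zeroes_unmap"
    try:
        text = attr.read_text().strip()
    except OSError:
        # Attribute absent (older kernel, no such disk) or unreadable.
        return None
    try:
        return int(text) != 0
    except ValueError:
        # Unexpected format; do not guess.
        return None
```

For example, supports_unmap_write_zeroes("nvme0n1") would report on the first NVMe namespace, if one is present on the system.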