Linus Torvalds tonight reverted code in the mainline Linux kernel that had inadvertently broken support for filenames containing ❤️ and other special Unicode characters on file-systems with case-folding (optional case-insensitive file/folder name) support.
Merged to the Linux kernel last month was a change to the kernel’s Unicode handling to no longer special-case ignorable code points. The commit, which stripped around 3k lines of kernel code, left the ignorable code points to decompose/casefold to themselves rather than being dropped. Unfortunately, this ended up breaking things for file-systems with Unicode case-folding support for case-insensitive file/folder handling, like F2FS. In turn, those running new Linux kernels were no longer able to read files whose names contain special characters, such as the ❤️ emoji.
A kernel bug report raised the issue of certain files no longer being found on an F2FS file-system after that Unicode change.
With the change clearly causing problems and breaking user-space access to existing files, of all things, Linus Torvalds promptly reverted the problematic code.
Linus Torvalds commented in the revert:
“It turns out that we can’t do this, because while the old behavior of ignoring ignorable code points was most definitely wrong, we have case-folding filesystems with on-disk hash values with that wrong behavior.
So now you can’t look up those names, because they hash to something different.
Of course, it’s also entirely possible that in the meantime people have created *new* files with the new (“more correct”) case folding logic, and reverting will just make other things break.
The correct solution is to not do case folding in filesystems, but sadly, people seem to never really understand that. People still see it as a feature, not a bug.”
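The hash mismatch described in the revert comment can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `IGNORABLE` is a tiny hand-picked subset of Unicode's default-ignorable code points, `fold_old` and `fold_new` are hypothetical stand-ins for the two behaviors, and SHA-256 merely stands in for whatever directory hash a given file-system actually uses. It is not the kernel's implementation.

```python
import hashlib
import unicodedata

# Illustrative subset of "default ignorable" code points (assumption, not the kernel's table):
# soft hyphen, zero-width joiner, variation selector-16 (the emoji-style selector in "❤️").
IGNORABLE = {"\u00ad", "\u200d", "\ufe0f"}

def fold_old(name: str) -> str:
    """Sketch of the old behavior: drop ignorable code points, then normalize and casefold."""
    kept = "".join(ch for ch in name if ch not in IGNORABLE)
    return unicodedata.normalize("NFD", kept).casefold()

def fold_new(name: str) -> str:
    """Sketch of the changed behavior: ignorable code points are kept as-is."""
    return unicodedata.normalize("NFD", name).casefold()

def dir_hash(folded: str) -> str:
    """Stand-in for an on-disk directory hash computed over the folded name."""
    return hashlib.sha256(folded.encode("utf-8")).hexdigest()

# U+2764 (heavy black heart) followed by U+FE0F (variation selector-16).
name = "I \u2764\ufe0f Linux.txt"

stored_hash = dir_hash(fold_old(name))   # written to disk by an older kernel
lookup_hash = dir_hash(fold_new(name))   # computed at lookup time after the change

print(stored_hash == lookup_hash)
```

Under these assumptions the two hashes differ, so a lookup keyed on the new hash misses an entry stored under the old one, while a name without any ignorable code points folds and hashes identically either way.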
At least if you aren’t using case-folding on a supported file-system while running a very recent kernel, you have nothing to worry about, especially if you don’t typically toss special characters into your filenames. In any case, one more interesting/unique Linux kernel regression is now resolved.