There is some tension on the Linux kernel mailing list over late Bcachefs feature work sent in as part of “fixes” for the ongoing Linux 6.16 kernel cycle. Established rules call for new feature code to be introduced only during the kernel merge window, which ended nearly two weeks ago for Linux 6.16, but Bcachefs wants an exemption so that new feature code can still land this cycle in the name of data safety.
Sent out yesterday was the latest round of Bcachefs code changes, intended for Linux 6.16-rc3 this weekend. It contains some small check/repair fixes, fixes for some Linux 6.16 regressions, and a new feature called “journal_rewind”. The new Bcachefs journal_rewind code allows the file-system to be reset to an earlier point in time and aims to serve as a disaster recovery tool. The tool isn’t yet complete and has some known “major caveats”, but the hope is that it’s a step forward for recovering from Bcachefs file-system problems.
In response to the Bcachefs pull request, Linus Torvalds commented:
“You seem to have forgotten what the point of the merge window was again.
We don’t start adding new features just because you found other bugs.
I remain steadfastly convinced that anybody who uses bcachefs is expecting it to be experimental. They had better.
Make the -rc fixes be pure fixes.”
Bcachefs lead developer Kent Overstreet responded to Torvalds:
“The goal is to get users _code that works_, is it not?
…
Honestly, most of the people using bcachefs from what I’ve seen just want something that works. There are a _lot_ of people who’ve been burned by btrfs. I’ve even been seeing more and more people in recent discussions talking about unrecoverable filesystems with XFS (!).
That last one has been a surprise to me (and I don’t think it’s anything to do with the quality of the code), but it honestly should serve as a wakeup call as to how much is falling through the cracks and how badly we’ve been failing.
There are still a lot of people who don’t want to move off ext4… and I can’t really blame them.
If you go looking, you won’t find those stories about bcachefs – except from me, when I’m telling people what to watch out for.
And that’s because of a lot of hard work, and because I’m dead set on not repeating past mistakes; I actively hunt down bug reports and I frequently tell people – “I don’t care if you think it’s a hardware issue or pebcak, it’s the filesystem’s job to not lose data; get me the info I need and I’ll get it sorted and get it working again”.
That’s the goal here, delivering something that users can trust and rely on.”
Overstreet commented further in reply to another message pointing out that Linux kernel merge window periods are well known:
“That’s an easy rule for the rest of the kernel, where all your mistakes are erased at a reboot. Filesystems don’t have that luxury.
In the past, I’ve had to rush entire new on disk format features in response to issues I saw starting to arise – I think more than once, but the btree bitmap in the member info section was the big one that sticks in my mind; that one was very hectic, but 100% proved its worth.
Thankfully, we’re well past that. This time, we’re just talking about a ~70 line patch that just picks overwrites instead of updates from the journal and sorts them in reverse order.
So your next question might be – why not leave that in a branch for the users that need it until the next merge window?
For a lot of users, compiling a kernel from some random git repository is a lot to ask. I spend a lot of time doing what amounts to support; that’s just how it is these days. But rc kernels are packaged by most kernels, and we absolutely do not want to wait an additional 3 months for it to show up in a release kernel -
For something that might be the difference between losing a filesystem and getting it back.
There’s also a couple patches for tracepoints and introspection improvements; I don’t know if Linus was referring to those. But those are important too.
I think at least as much about “how do I make this codebase easy to debug; how do I make it _practical_ to support and QA when it’s running on random user machines in the wild” as I do about the debugging itself. That’s at least as important as the debugging; making it maintainable.
Partly that’s about maintaining a quick feedback cycle between myself and the users reporting issues; that builds trust, brings people into the community, turns into opportunities to teach them more about testing and QA and bug reporting.
I also never cease to be amazed how often I add some bit of logging or improve a tracepoint or some introspection – and then a week later I’m working on something else and it’s exactly the thing I need.
IOW – it’s not just about fixing the bugs, it’s about how we fix the bugs.
Tools to repair, tools to debug, it’s all just tools, all the way down…”
He then elaborated further:
“There is a time and a place for rules, and there is a time and a place for using your head and exercising some common sense and judgement.
I’m the one who’s responsible for making sure that bcachefs users have a working filesystem. That means reading and responding to every bug report and keeping track of what’s working and what’s not in fs/bcachefs/. Not you, and not Linus.
There’s no need for any of this micromanaging, which is what this has turned into. All it’s been doing is generating conflict and drama.”
As of writing, Linus Torvalds has pulled in this latest Bcachefs pull request, including the new journal_rewind functionality.
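For readers wondering what Overstreet's description of the patch ("picks overwrites instead of updates from the journal and sorts them in reverse order") might amount to, here is a toy Python sketch of the general idea. It is not Bcachefs code, which is kernel C, and it makes no claim about how the real patch is structured. The `JournalEntry` type, its fields, and the `rewind` function are all hypothetical names invented for illustration, and the sketch assumes every journal entry records the value it overwrote.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class JournalEntry:
    """Hypothetical journal record; not the bcachefs on-disk format."""
    seq: int              # journal sequence number
    key: str              # key that was updated
    new: Optional[str]    # value written by this update (None = deletion)
    old: Optional[str]    # value that was overwritten (None = key did not exist)


def rewind(state: Dict[str, str], journal: List[JournalEntry],
           target_seq: int) -> Dict[str, str]:
    """Return a copy of `state` as it looked right after `target_seq`."""
    result = dict(state)
    # Only entries newer than the target need to be undone.
    to_undo = [e for e in journal if e.seq > target_seq]
    # Walk newest-first so that, for a key updated several times,
    # the oldest overwritten value is the one left in place.
    for entry in sorted(to_undo, key=lambda e: e.seq, reverse=True):
        if entry.old is None:
            result.pop(entry.key, None)   # key did not exist before this update
        else:
            result[entry.key] = entry.old  # put the overwritten value back
    return result


journal = [
    JournalEntry(1, "a", "v1", None),
    JournalEntry(2, "b", "v1", None),
    JournalEntry(3, "a", "v2", "v1"),
    JournalEntry(4, "b", None, "v1"),  # deletion of "b"
]
state = {"a": "v2"}  # state after all four entries were applied

print(rewind(state, journal, 2))
```

The reverse ordering is the part that matters in this sketch: undoing entries oldest-first would leave a key that was updated more than once holding an intermediate value rather than the one it had at the target point.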