Several years ago, Google engineers began exploring address space isolation (ASI) for the Linux kernel and ultimately proposed Linux ASI as a better way of dealing with CPU speculative execution attacks. While the hope was that it would better cope with the ever-growing list of CPU speculative execution vulnerabilities, the effort was initially thwarted by a 70% hit to I/O throughput. That level of performance cost was unsustainable, but the I/O overhead has now been reduced to just 13%.
Google engineer Brendan Jackman is once again raising ASI with Linux kernel developers, now that “ASI is fast again…I’ve now prepared an up-to-date ASI branch that demonstrates a technique for solving the page cache performance devastation…The goal of this prototype is to increase confidence that ASI is viable as a broad solution for CPU vulnerabilities. (If the community still has to develop and maintain new mitigations for every individual vuln, because ASI only works for certain use-cases, then ASI isn’t super attractive given its complexity burden). The biggest gap for establishing that confidence was that Google’s deployment still only uses ASI for KVM workloads, not bare-metal processes. And indeed the page cache turned out to be a massive issue that Google just hasn’t run up against yet internally.”
Random reads with FIO are still hit by a 13% regression, but that is at least far better than 70%. In its current form, ASI also increases Linux kernel compilation times by 6 to 7%. Jackman added:
“Despite my title these numbers are kinda disappointing to be honest, it’s not where I wanted to be by now, but it’s still an order-of-magnitude better than where we were for native FIO a few months ago. I believe almost all of this remaining slowdown is due to unnecessary ASI exits, the key areas being:
– On every context_switch(). Google’s internal implementation has fixed this (we only really need it when switching mms).
– Whenever zeroing sensitive pages from the allocator. This could potentially be solved with the ephmap but requires a bit of care to avoid opening CPU attack windows.
– In copy-on-write for user pages. The ephmap could also help here but the current implementation doesn’t support it (it only allows one allocation at a time per context).”
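To make the first item in that list concrete, here is a toy Python model, not kernel code, of why exiting ASI only when the address space changes can cut the number of exits. The `Task` class, the integer `mm` identifiers, and `count_asi_exits` are all hypothetical names invented for this illustration; the real logic lives in the kernel's context-switch path and is not shown in the thread excerpt above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    mm: int  # hypothetical identifier for the task's address space


def count_asi_exits(schedule, only_on_mm_switch):
    """Count modeled ASI exits across consecutive context switches."""
    exits = 0
    for prev, nxt in zip(schedule, schedule[1:]):
        # Naive policy: exit on every switch.
        # Filtered policy: exit only when the address space changes.
        if not only_on_mm_switch or prev.mm != nxt.mm:
            exits += 1
    return exits


# Two threads of one process (shared mm) plus one separate process.
a1 = Task("a-thread-1", mm=1)
a2 = Task("a-thread-2", mm=1)
b = Task("b", mm=2)
schedule = [a1, a2, a1, b, a2, a1]

naive = count_asi_exits(schedule, only_on_mm_switch=False)
filtered = count_asi_exits(schedule, only_on_mm_switch=True)
print(naive, filtered)
```

For this made-up schedule, the model counts five exits under the naive policy and two when exits are limited to switches between different address spaces. Real workloads will differ; the point is only that switches between threads sharing an mm need not pay the cost.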
With this LKML thread, the hope now is to figure out whether things have improved enough for the ASI work to move forward toward potential upstreaming into the Linux kernel.
“So, x86 folks: Does this feel like “line of sight” to you? If not, what would that look like, what experiments should I run?”
We’ll see what comes of Linux ASI.