The Linux 6.18 merge window is winding down this weekend ahead of Linux 6.18-rc1, expected on Sunday. Merged today were some remaining x86 core updates, which include a Retpoline optimization patch intended to help Intel E-core CPUs.
Return trampolines, or “Retpolines”, are used to mitigate Spectre Variant Two. Intel engineer Peter Zijlstra landed a patch optimizing the x86 patch_retpoline() code within the kernel. He explains in the patch:
Currently the very common retpoline: “CS CALL __x86_indirect_thunk_r11” is transformed into “CALL *R11; NOP3” for eIBRS/BHI_NO parts.
Similarly, paranoid fineibt has: “CALL *R11; NOP”.
Recognise that CS stuffing can avoid the extra NOP. However, due to prefix decode penalties, make sure to not emit too many CS prefixes. Notably: “CS CALL __x86_indirect_thunk_rax” must not become “CS CS CS CS CALL *RAX”. Prefix decode penalties are typically many more cycles than decoding an extra NOP.
Additionally, if the retpoline is a tail-call, the “JMP *%reg” should be followed by INT3 for straight-line-speculation mitigation, since emit_indirect() now has a length argument, move this into emit_indirect() such that other users (paranoid-fineibt) also do this.
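To make the byte-level effect concrete, here is a minimal C sketch of the idea described in the commit message. It is not the kernel's emit_indirect(): the function name emit_indirect_sketch(), its parameters, and the MAX_CS_PREFIXES constant are hypothetical. It also assumes that when the slack at a call site exceeds three bytes, no CS prefixes are stuffed at all and the remainder is left for NOP padding by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CS_PREFIX       0x2e  /* CS segment-override prefix */
#define REX_B           0x41  /* REX.B, needed to address r8..r15 */
#define OPCODE_GRP5     0xff  /* opcode shared by CALL r/m64 and JMP r/m64 */
#define INT3            0xcc  /* breakpoint, used as a speculation stop */
#define MAX_CS_PREFIXES 3     /* assumed budget, from the "3 prefixes" rule of thumb */

/*
 * Hypothetical helper for illustration only.
 * Writes an indirect CALL (is_call != 0) or JMP through register `reg`
 * (0 = rax ... 11 = r11 ... 15 = r15) into `out`, for a patch site that is
 * `site_len` bytes long. Returns the number of bytes written; the caller
 * is expected to pad whatever remains of the site.
 */
size_t emit_indirect_sketch(int is_call, int reg, uint8_t *out, size_t site_len)
{
	size_t insn_len = 2 + (reg >= 8);  /* opcode + ModRM, plus REX for r8..r15 */
	size_t excess, i = 0;

	assert(reg >= 0 && reg < 16 && site_len >= insn_len);
	excess = site_len - insn_len;

	/* CALL: absorb the slack with CS prefixes, but only within the budget. */
	if (is_call && excess <= MAX_CS_PREFIXES)
		for (size_t n = 0; n < excess; n++)
			out[i++] = CS_PREFIX;

	if (reg >= 8)
		out[i++] = REX_B;
	out[i++] = OPCODE_GRP5;
	/* ModRM: mod = 3 (register direct), /2 for CALL or /4 for JMP, low 3 bits of reg */
	out[i++] = (uint8_t)(0xc0 | (is_call ? 0x10 : 0x20) | (reg & 7));

	/* Tail-call JMP: follow with INT3 when the site has room for it. */
	if (!is_call && excess > 0)
		out[i++] = INT3;

	return i;
}
```

Under these assumptions, a 6-byte call site for r11 becomes 2e 2e 2e 41 ff d3 (three CS prefixes, REX.B, then CALL *R11) with no NOP needed, while the same site for rax would need four prefixes and so falls back to the bare ff d0, leaving the remaining four bytes to be padded.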
The original mailing list post for the patch adds more context:
“Finding the exact prefix decode penalties for uarchs that have eIBRS/BHI_NO is not a fun time. I’ve stuck to the general wisdom that 3 prefixes is mostly good (notably, the instruction at hand has no 0x0f escape which is sometimes counted towards the prefix budget — it can have a REX prefix, but those are generally not counted towards the prefix budget).
In general Intel P-cores do not have prefix decode penalties, but the E-cores (or rather the Atom line) generally does. And since this all runs on hybrid cores, the code must accommodate them.
I hate all this.”
The patch was merged into Linux Git today via the x86/core pull, ahead of Linux 6.18-rc1 tomorrow.