Qualcomm's latest work on the RISC-V CPU architecture is its first non-RFC patch series for enabling Reliability, Availability and Serviceability (RAS) support by making use of the RISC-V RERI specification. This RISC-V RAS support is useful for conveying hardware errors to users and will be especially important for future RISC-V Linux servers.
The RISC-V RERI specification defines the RAS Error-record Register Interface, which standardizes the logging and reporting of errors through a memory-mapped register interface. RISC-V RERI is flexible enough to handle hardware errors from PCIe, CXL, and other device types and interfaces.
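To give a rough feel for what reading error records from a register bank involves, here is a minimal Python sketch. It is only an illustration: the record size, the field order, and the `VALID_BIT` / `CORRECTED_BIT` positions are hypothetical placeholders and are not taken from the RERI specification, and the "bank" is just a byte string standing in for a snapshot of memory-mapped registers.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout for illustration only: the record size, field order,
# and bit positions below are NOT taken from the RERI specification.
RECORD_SIZE = 16          # bytes per record: 64-bit status + 64-bit address
VALID_BIT = 1 << 0        # record currently holds a logged error
CORRECTED_BIT = 1 << 1    # error was corrected by hardware


@dataclass
class ErrorRecord:
    index: int
    corrected: bool
    address: int


def scan_error_bank(bank: bytes) -> list[ErrorRecord]:
    """Return every valid record found in a snapshot of the register bank."""
    records = []
    for index in range(len(bank) // RECORD_SIZE):
        # Each record is two little-endian 64-bit words in this toy layout.
        status, address = struct.unpack_from("<QQ", bank, index * RECORD_SIZE)
        if status & VALID_BIT:
            records.append(
                ErrorRecord(index, bool(status & CORRECTED_BIT), address)
            )
    return records


# Simulated bank: record 0 is empty, record 1 holds a corrected error.
bank = struct.pack("<QQ", 0, 0) + struct.pack(
    "<QQ", VALID_BIT | CORRECTED_BIT, 0x8000_1000
)
for rec in scan_error_bank(bank):
    print(rec)
```

In a real system the firmware or kernel driver would read these registers from the device's memory-mapped region and follow the field definitions in the specification; the sketch only shows the general pattern of walking a bank of records and picking out the valid ones.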
Himanshu Chauhan of Qualcomm sent out the patches today for enabling RAS support on RISC-V, succeeding the earlier “request for comments” patches. The code relies on the highest-priority Supervisor Software Events and is already supported by OpenSBI. RISC-V RAS can already be tested with the likes of QEMU paired with the latest OpenSBI and EDK2 plus these kernel patches.
The RISC-V RAS support has been successfully tested with this patch series, using error injection to artificially trigger errors.
