I'm currently debugging a nasty issue with a kernel and disk controller combination.
It seems that when a PCIe NVMe device sits behind a Broadcom SAS38xx controller (currently a model from SMC, running its latest firmware), the controller firmware eventually hangs and writes garbage to the disks.
How does this show up in dmesg?
megaraid_sas 0000:xx:00.0: [150]waiting for 2 commands to complete for scsi0
megaraid_sas 0000:xx:00.0: Trigger snap dump
megaraid_sas 0000:xx:00.0: FW in FAULT state Fault code:0x10000 subcode:0x0 func:megasas_trigger_snap_dump
megaraid_sas 0000:xx:00.0: resetting fusion adapter scsi0.
megaraid_sas 0000:xx:00.0: Outstanding fastpath IOs: 0
megaraid_sas 0000:xx:00.0: Waiting for FW to come to ready state
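To spot these events in a longer kernel log, a small filter can help. The following is a minimal Python sketch. It assumes the fault message keeps exactly the wording shown above, and other megaraid_sas driver versions may format it differently. `find_fw_faults` and `FwFault` are hypothetical helper names, not part of any existing tool. For each "FW in FAULT state" line, the sketch extracts the PCI address, the fault code, the subcode, and the reporting function.

```python
import re
from typing import List, NamedTuple, Optional

# Pattern built from the dmesg excerpt above (assumed format; may differ
# between driver versions). re.search is used so a leading dmesg timestamp
# such as "[ 1234.567890] " does not prevent a match.
FAULT_RE = re.compile(
    r"megaraid_sas (?P<dev>\S+): FW in FAULT state "
    r"Fault code:(?P<code>0x[0-9a-fA-F]+) subcode:(?P<subcode>0x[0-9a-fA-F]+)"
    r"(?: func:(?P<func>\S+))?"
)


class FwFault(NamedTuple):
    """One firmware fault event parsed from the kernel log (hypothetical helper type)."""
    device: str
    code: int
    subcode: int
    func: Optional[str]


def find_fw_faults(log_text: str) -> List[FwFault]:
    """Return every megaraid_sas firmware FAULT event found in log_text."""
    faults = []
    for line in log_text.splitlines():
        m = FAULT_RE.search(line)
        if m:
            faults.append(
                FwFault(
                    device=m["dev"],
                    code=int(m["code"], 16),        # int() accepts the 0x prefix with base 16
                    subcode=int(m["subcode"], 16),
                    func=m["func"],                 # None if the func: field is absent
                )
            )
    return faults


# The excerpt from above, used as sample input.
SAMPLE_LOG = """\
megaraid_sas 0000:xx:00.0: [150]waiting for 2 commands to complete for scsi0
megaraid_sas 0000:xx:00.0: Trigger snap dump
megaraid_sas 0000:xx:00.0: FW in FAULT state Fault code:0x10000 subcode:0x0 func:megasas_trigger_snap_dump
megaraid_sas 0000:xx:00.0: resetting fusion adapter scsi0.
megaraid_sas 0000:xx:00.0: Outstanding fastpath IOs: 0
megaraid_sas 0000:xx:00.0: Waiting for FW to come to ready state
"""

if __name__ == "__main__":
    for f in find_fw_faults(SAMPLE_LOG):
        # prints "0000:xx:00.0: fault code 0x10000, subcode 0x0, func megasas_trigger_snap_dump"
        print(f"{f.device}: fault code {f.code:#x}, subcode {f.subcode:#x}, func {f.func}")
```

In practice you would feed it the output of `dmesg` (or `journalctl -k`) instead of the hardcoded excerpt, for example to count how often the controller faults between reboots.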