Szymon Niedźwiedź 2023/03/04
Content of this post is licensed under a Creative Commons Attribution 4.0 International License except for code sections which are licensed under CC0 1.0

Regression in kernel 6.*

The newest 6.x kernels no longer support the old way of loading the vfio-pci driver. The symptom is a freeze while udevd is loading during the initramfs stage.

Definitions and assumptions

Easy solution

Stay on the old 5.19.13 Linux kernel for some time until the ArchWiki gets updated by someone else. This somewhat contradicts the philosophy of using a rolling distro like Arch Linux, and it just does not feel good (at least to me). It also leaves you on an outdated kernel without cutting-edge features, and running this less popular, less tested combination of packages might eventually cause issues.

# file: /etc/default/grub
GRUB_CMDLINE_LINUX=" [...] vfio-pci.ids=10de:13c2,10de:0fbb"

If you still want to stay on the older kernel, the script below may help.

# file: download.sh [755]
#!/usr/bin/bash
set -e

# received from link below
# https://archive.archlinux.org/repos/2022/10/13/core/os/x86_64/
LINKS=(linux-5.19.13.arch1-1-x86_64.pkg.tar.zst
linux-5.19.13.arch1-1-x86_64.pkg.tar.zst.sig
linux-api-headers-5.18.15-1-any.pkg.tar.zst
linux-api-headers-5.18.15-1-any.pkg.tar.zst.sig
linux-docs-5.19.13.arch1-1-x86_64.pkg.tar.zst
linux-docs-5.19.13.arch1-1-x86_64.pkg.tar.zst.sig
linux-firmware-20220913.f09bebf-1-any.pkg.tar.zst
linux-firmware-20220913.f09bebf-1-any.pkg.tar.zst.sig
linux-firmware-bnx2x-20220913.f09bebf-1-any.pkg.tar.zst
linux-firmware-bnx2x-20220913.f09bebf-1-any.pkg.tar.zst.sig
linux-firmware-liquidio-20220913.f09bebf-1-any.pkg.tar.zst
linux-firmware-liquidio-20220913.f09bebf-1-any.pkg.tar.zst.sig
linux-firmware-marvell-20220913.f09bebf-1-any.pkg.tar.zst
linux-firmware-marvell-20220913.f09bebf-1-any.pkg.tar.zst.sig
linux-firmware-mellanox-20220913.f09bebf-1-any.pkg.tar.zst
linux-firmware-mellanox-20220913.f09bebf-1-any.pkg.tar.zst.sig
linux-firmware-nfp-20220913.f09bebf-1-any.pkg.tar.zst
linux-firmware-nfp-20220913.f09bebf-1-any.pkg.tar.zst.sig
linux-firmware-qcom-20220913.f09bebf-1-any.pkg.tar.zst
linux-firmware-qcom-20220913.f09bebf-1-any.pkg.tar.zst.sig
linux-firmware-qlogic-20220913.f09bebf-1-any.pkg.tar.zst
linux-firmware-qlogic-20220913.f09bebf-1-any.pkg.tar.zst.sig
linux-firmware-whence-20220913.f09bebf-1-any.pkg.tar.zst
linux-firmware-whence-20220913.f09bebf-1-any.pkg.tar.zst.sig
linux-headers-5.19.13.arch1-1-x86_64.pkg.tar.zst
linux-headers-5.19.13.arch1-1-x86_64.pkg.tar.zst.sig)

DOWNLOAD_DIR="$(dirname "$(realpath "$0")")/packages"
mkdir -p "$DOWNLOAD_DIR"
cd "$DOWNLOAD_DIR"
for i in "${LINKS[@]}"; do
  # quote the URL so the file name is never split or globbed by the shell
  wget "https://archive.archlinux.org/repos/2022/10/13/core/os/x86_64/$i"
done
echo "packages downloaded successfully"

Manually downgrading the packages and adding the kernel (and related packages) to the IgnorePkg option in /etc/pacman.conf should solve this issue.

# file: /etc/pacman.conf
# assumes the non-LTS kernel
# keep in mind that you may have even more kernel dependent packages
[...]
IgnorePkg   = linux linux-*
[...]
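
As a rough sketch of the manual downgrade, assuming the packages were downloaded into a packages directory as in download.sh above: the hypothetical helper below (build_downgrade_cmd is not part of pacman) only prints a pacman -U command for review instead of running it. You will likely want to trim the list to the packages that are actually installed on your system.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a pacman -U command covering every
# package archive in a directory. Detached .sig files are skipped because
# they do not match the *.pkg.tar.zst glob.
build_downgrade_cmd() {
    dir="${1:-packages}"
    cmd="pacman -U"
    for f in "$dir"/*.pkg.tar.zst; do
        [ -e "$f" ] || continue        # glob matched nothing
        cmd="$cmd $f"
    done
    printf '%s\n' "$cmd"
}

build_downgrade_cmd "${1:-packages}"
```

Review the printed command and, if it looks right, run it as root.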

Good solution

The main requirement for this solution was that it be simple enough and compatible with the latest kernel.

Method of finding a solution

I tried various solutions posted on the internet and debugged them inside the initramfs using the break=postmount kernel command-line parameter. I discovered that loading vfio-pci inside the initrd causes the same hang as the old vfio-pci.ids=[...] command-line option, while the same command, modprobe -i vfio-pci, works flawlessly in late userspace.

Solution description

Let’s force the initramfs to run a hook before anything else (most importantly before udevd, to avoid surprises) and load the module later from a systemd service in late userspace. This is achieved by adding a script file to the initramfs (controlled by the mkinitcpio.conf file), adding a hook, and adding a systemd service (systemd runs after the initramfs has done its job). This solution is almost the same as the one on the ArchWiki, with two small differences: modprobe -i vfio-pci is executed in late userspace, and an early hook sets the driver override instead of kernel module aliasing in /etc/modprobe.d.

Prerequisites
Important remarks

Using this method

Steps

  1. Create /sbin/vfio-pci-override-vga.sh file.

Get the PCI bus addresses of the devices in the IOMMU group

lspci
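
To see which devices share an IOMMU group, the sysfs tree can be walked directly. This is a minimal sketch, assuming the IOMMU is enabled so that /sys/kernel/iommu_groups is populated; the function name and its optional root argument are hypothetical and exist only to keep the example self-contained.

```shell
#!/bin/sh
# Print "group N: <PCI address>" for every device in every IOMMU group.
# The optional first argument overrides the sysfs root (illustration only).
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # nothing matched: IOMMU likely disabled
        group="${dev#"$root"/}"        # strip the root prefix
        group="${group%%/*}"           # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}

list_iommu_groups "$@"
```

Each printed address can be passed to lspci -nns to get the device description and its vendor:device IDs.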

Put these addresses in the DEVS variable

# file: /sbin/vfio-pci-override-vga.sh [755]
#!/bin/sh
DEVS="0000:01:00.0 0000:01:00.1"
if [ -z "$(ls -A /sys/class/iommu)" ]; then
    exit 0
fi
for DEV in $DEVS; do
    echo "vfio-pci" > "/sys/bus/pci/devices/$DEV/driver_override"
done

Make script executable

chmod +x /sbin/vfio-pci-override-vga.sh 
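
Plain lspci usually prints addresses without the PCI domain (for example 01:00.0), while the sysfs paths used by the script need the full form (0000:01:00.0). Running lspci -D prints the domain directly. Alternatively, a small helper such as the hypothetical to_sysfs_addr below normalises an address; it assumes domain 0000, which is typical for desktop machines but is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: turn an lspci-style address into the form used under
# /sys/bus/pci/devices. Assumes PCI domain 0000 when none is given.
to_sysfs_addr() {
    case "$1" in
        ????:??:??.?) printf '%s\n' "$1" ;;         # domain already present
        ??:??.?)      printf '0000:%s\n' "$1" ;;    # prepend assumed domain
        *)            echo "unrecognised PCI address: $1" >&2
                      return 1 ;;
    esac
}

to_sysfs_addr 01:00.0
```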
  2. Add initcpio hook
/etc/initcpio
├── hooks
│   └── vfio
└── install
    └── vfio

Hook script will run inside initramfs

# file: /etc/initcpio/hooks/vfio [644]
#!/usr/bin/ash

run_hook() {
    /sbin/vfio-pci-override-vga.sh
}

# vim: set ft=sh ts=4 sw=4 et:

A build hook must also be created to copy the hook file into the initramfs image; otherwise the hook would not work, as it would not be seen by mkinitcpio.

# file: /etc/initcpio/install/vfio [644]
#!/usr/bin/env bash

build() {
    add_runscript
}

help() {
    cat <<HELPEOF
vfio hook help
HELPEOF
}

The hook is named vfio here, but you may choose a different name as long as both files and the HOOKS entry use the same one.

  3. Add the hook to /etc/mkinitcpio.conf and also add the file /sbin/vfio-pci-override-vga.sh to the FILES array so that it is copied into the initramfs.
FILES=(/sbin/vfio-pci-override-vga.sh)
[...]
# place vfio after base and before udev
HOOKS=(base vfio udev [...])
[...]
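
Since the ordering of vfio relative to udev is the whole point of this step, a quick sanity check before rebuilding the image may be worthwhile. This is a sketch under the assumption that HOOKS is written on a single uncommented line; check_hook_order is a hypothetical helper, not part of mkinitcpio.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if "vfio" is listed before "udev" in the
# last uncommented single-line HOOKS=(...) entry of the given config file.
check_hook_order() {
    conf="${1:-/etc/mkinitcpio.conf}"
    hooks="$(sed -n 's/^HOOKS=(\(.*\)).*$/\1/p' "$conf" | tail -n 1)"
    seen_vfio=no
    for h in $hooks; do
        case "$h" in
            vfio) seen_vfio=yes ;;
            udev) [ "$seen_vfio" = yes ] && return 0
                  return 1 ;;
        esac
    done
    return 1    # udev not found, so the order cannot be confirmed
}

# Usage:
#   check_hook_order /etc/mkinitcpio.conf && echo "vfio runs before udev"
```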
  4. Add a systemd service which loads the vfio-pci driver in late userspace.
# file: /etc/systemd/system/vfio-load.service [644]
[Unit]
Description=Insert vfio-pci driver

[Service]
Type=oneshot
ExecStart=/usr/bin/modprobe -i vfio-pci

[Install]
WantedBy=multi-user.target

Remember to enable the service

# systemctl daemon-reload
# systemctl enable vfio-load.service
  5. Apply the changes from /etc/mkinitcpio.conf by generating a new initramfs:
sudo mkinitcpio -P 
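
Before rebooting, it may be worth confirming that both the hook and the override script ended up in the image. The sketch below reads a file listing on standard input, for example from lsinitcpio; the helper name is hypothetical, and the image path in the usage comment assumes the default Arch kernel package.

```shell
#!/bin/sh
# Hypothetical helper: read an initramfs file listing on stdin and succeed
# only if both the runtime hook and the override script are present.
image_has_vfio_files() {
    listing="$(cat)"
    printf '%s\n' "$listing" | grep -q 'hooks/vfio$' || return 1
    printf '%s\n' "$listing" | grep -q 'vfio-pci-override-vga\.sh$' || return 1
}

# Usage (image path is an assumption):
#   lsinitcpio /boot/initramfs-linux.img | image_has_vfio_files && echo ok
```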
  6. Reboot your computer and verify that it worked by checking whether the line Kernel driver in use: vfio-pci is present.
$ lspci -nnk
[...]
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation [REDACTED]
	Subsystem: Lenovo Device [17aa:[REDACTED]]
	Kernel driver in use: vfio-pci
	Kernel modules: nouveau
01:00.1 Audio device [0403]: NVIDIA Corporation [REDACTED]
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel
[...]
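
The same check can be scripted by reading the driver symlink that the kernel exposes for each PCI device. This is a minimal sketch; check_vfio_bound and the SYSFS_PCI override variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the bound driver of each given PCI address and
# succeed only if every one of them is bound to vfio-pci.
# SYSFS_PCI may override the sysfs location (illustration only).
check_vfio_bound() {
    sysfs="${SYSFS_PCI:-/sys/bus/pci/devices}"
    status=0
    for dev in "$@"; do
        link="$sysfs/$dev/driver"
        if [ -L "$link" ]; then
            drv="$(basename "$(readlink "$link")")"
        else
            drv="none"                 # no driver bound, or unknown device
        fi
        printf '%s: %s\n' "$dev" "$drv"
        [ "$drv" = "vfio-pci" ] || status=1
    done
    return "$status"
}

# Usage, with the addresses from the DEVS list:
#   check_vfio_bound 0000:01:00.0 0000:01:00.1
```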

Debugging

Conclusion

I hope it helps. Feel free to suggest or point out any improvements, mistakes or bug fixes. And most importantly - Enjoy your awesome VMs!
