I have an Optimus laptop, and after the update to KDE 6, optimus-manager stopped working. I needed a second display, and all my display outputs are wired to the NVIDIA GPU, so I needed to switch. I tried many different X11 configs, then EnvyControl, then more X11 configs, but I couldn’t get it working right: it would only drive either the internal display or the external one, never both. After a few hours I gave up and tried optimus-manager again. This time I checked the error log and saw it was failing to load the nvidia module. I tried loading it manually but got a “No such device” error, which is where the title of the post comes in. My GPU has disappeared from Linux: it won’t show up in lspci, lshw, nvidia-smi, or anything else it should. The only references to it I can find in dmesg are:

[    0.216410] pci 0000:01:00.0: [10de:1ba1] type 00 class 0x030000
[    0.216419] pci 0000:01:00.0: reg 0x10: [mem 0xde000000-0xdeffffff]
[    0.216427] pci 0000:01:00.0: reg 0x14: [mem 0xc0000000-0xcfffffff 64bit pref]
[    0.216435] pci 0000:01:00.0: reg 0x1c: [mem 0xd0000000-0xd1ffffff 64bit pref]
[    0.216440] pci 0000:01:00.0: reg 0x24: [io  0xe000-0xe07f]
[    0.216445] pci 0000:01:00.0: reg 0x30: [mem 0xdf000000-0xdf07ffff pref]
[    0.216460] pci 0000:01:00.0: Enabling HDA controller
[    0.257300] pci 0000:01:00.0: vgaarb: bridge control possible
[    0.257300] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.270521] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0

and then nothing; it doesn’t even seem to try to load the nvidia module. I tried booting into Windows and it shows up fine there, so the GPU didn’t randomly die.
As far as I can tell, I’ve rolled back everything I did, going through my histfile back to the point where it stopped working. The only other thing I can think of is that I upgraded my kernel from 6.6.10 to 6.7.9; could that have caused it? I also tried adding pcie_port_pm=off to the kernel parameters, as suggested on the ArchWiki, but still nothing. I’m just at a loss here. Anyone have any ideas?

EDIT: I’m using the nvidia-dkms package
EDIT2: one kernel downgrade later and it’s still not appearing, so that’s not it.
EDIT3: fixed, see comments

  • taaz@biglemmowski.win
    8 months ago

    I think this happened to me once, and it was something really dumb, but I can’t remember what.

    @thomasdouwes@sopuli.xyz just for the sake of trying everything, you could rebuild the DKMS modules and the initramfs, then reboot:

    dkms autoinstall -F -a kernel-6.8.5-arch1 # change the kernel version to match what you have now (read it from uname -a)
    mkinitcpio -P
    

    E: Exhaustive list of what I would try:

    • check if drivers and modprobe blacklists make sense (this one is broad and requires digging into the Arch Wiki, but the Optimus laptop I had required blacklisting some drivers from early loading, afaik)
    • fiddle with rescans and power states in the sysfs PCI device folders for the GPU
    • check that my mkinitcpio config makes sense; additionally look for a .pacnew (/etc/mkinitcpio.conf.pacnew) and see if its changes might affect the system
    • downgrade kernel - already tried
    • downgrade dkms packages
    • update BIOS and firmware from Windows
    • cold boot the laptop (shut down, remove AC and battery, leave it for a few seconds)
    • on Windows, look into ROG Armoury/MSI Center for any kind of toggles that could have an impact on the GPUs (iGPU/dGPU): stuff like power states, optimizations, etc.
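For the rescan and power-state item in the list above, here is a minimal sketch of what poking at sysfs could look like. The helper name gpu_sysfs_report and the SYSFS_ROOT override are made up for illustration (the override only exists so the function can be pointed at a fake tree); 0000:01:00.0 is the address from the dmesg output in the post.

```shell
# Hypothetical helper (not part of any package): print what sysfs knows
# about one PCI device, or report that it is missing from the bus.
# SYSFS_ROOT is an assumed override for testing; it defaults to /sys.
gpu_sysfs_report() {
    addr="$1"
    root="${SYSFS_ROOT:-/sys}"
    dev="$root/bus/pci/devices/$addr"

    if [ ! -d "$dev" ]; then
        echo "$addr: not present on the PCI bus"
        return 1
    fi

    # Print only the attributes that exist and are readable.
    for attr in vendor class power/control power/runtime_status; do
        if [ -r "$dev/$attr" ]; then
            printf '%s: %s\n' "$attr" "$(cat "$dev/$attr")"
        fi
    done
}

# Usage (the rescan needs root):
#   gpu_sysfs_report 0000:01:00.0 || echo 1 > /sys/bus/pci/rescan
```

Writing 1 to /sys/bus/pci/rescan asks the kernel to re-enumerate the bus. If something (a udev rule, for example) removes the device whenever it is added, it will probably just vanish again after the rescan, which is itself a useful hint.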
    • Thomas Douwes (OP)
      8 months ago

      Looks like you were right about the udev rules earlier. I ran a pacman command to find all untracked files in /usr and found that /usr/lib/udev/rules.d/50-remove-nvidia.rules was there. Contents:

      # Automatically generated by EnvyControl
      
      # Remove NVIDIA USB xHCI Host Controller devices, if present
      ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c0330", ATTR{power/control}="auto", ATTR{remove}="1"
      
      # Remove NVIDIA USB Type-C UCSI devices, if present
      ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c8000", ATTR{power/control}="auto", ATTR{remove}="1"
      
      # Remove NVIDIA Audio devices, if present
      ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x040300", ATTR{power/control}="auto", ATTR{remove}="1"
      
      # Remove NVIDIA VGA/3D controller devices
      ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x03[0-9]*", ATTR{power/control}="auto", ATTR{remove}="1"
      
      

      Looks like EnvyControl left some extra files behind after uninstalling.
      Personally, I think it’s pretty weird that it put runtime files in /usr/lib; if they were in /etc I would have found them quickly.
      The GPU is back on the bus now and I can run optimus-manager to get my extra screen. Thank you for the help troubleshooting this issue.
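For anyone landing here with the same symptom, a rough sketch of how one might search for leftover rules like the one above. The function name find_nvidia_remove_rules is invented for this example, and the pattern only matches rules shaped like the EnvyControl ones quoted above (NVIDIA vendor ID 0x10de followed by ATTR{remove}="1" on the same line), so treat it as a starting point rather than a complete check.

```shell
# Hypothetical helper: list udev rule files in the given directories that
# contain a PCI "remove" action for NVIDIA's vendor ID (0x10de).
find_nvidia_remove_rules() {
    for dir in "$@"; do
        # Skip directories that do not exist on this system.
        [ -d "$dir" ] || continue
        # -l prints only the names of matching files; errors from an
        # unmatched glob are silenced.
        grep -l -E 'ATTR\{vendor\}=="0x10de".*ATTR\{remove\}="1"' \
            "$dir"/*.rules 2>/dev/null
    done
}

# Usage:
#   find_nvidia_remove_rules /etc/udev/rules.d /usr/lib/udev/rules.d
#   pacman -Qo /usr/lib/udev/rules.d/50-remove-nvidia.rules   # does any package own it?
```

If pacman -Qo reports that no package owns the file, it is probably a leftover like the one above. After deleting it, a reboot is the simplest way to get the device re-added; running udevadm control --reload and then rescanning the PCI bus as root may also work without rebooting.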

    • Thomas Douwes (OP)
      8 months ago

      I don’t seem to have an -F on my dkms? When I ran it without that flag, it didn’t rebuild all the DKMS modules for some reason, just bbswitch and evdi.

      • taaz@biglemmowski.win
        8 months ago

        Ah, the -F might be wrong then, actually. I was playing with custom kernels recently and my dkms is a mess, so I wouldn’t worry about that option.

      • Thomas Douwes (OP)
        8 months ago

        dkms status doesn’t even list half of my DKMS modules for some reason