My work computer is a ThinkPad Z13. It’s on most of the time, including overnight and during the weekend. I’m one of those horrible people who like to just wiggle their mouse, unlock, and get working. I often leave a ton of windows open, so I quite like to sit down and start working without having to wait for boot up, and subsequent app launch.
So when I arrive at my desk on a Monday and discover my GPU has crashed, that’s a poor start to the week. The GPU crashing doesn’t completely kill the machine, just my desktop session and all the applications that were open. 😭
I see this kind of thing in the output of
dmesg -Tw | grep amdgpu.
[Mon Aug 14 08:06:06 2023] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=5346515, emitted seq=5346517 [Mon Aug 14 08:06:06 2023] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 3456 thread Xorg:cs0 pid 3464 [Mon Aug 14 08:06:06 2023] amdgpu 0000:63:00.0: amdgpu: GPU reset begin!
I use Xorg instead of Wayland on my laptop. I’ve tried Wayland, but it’s never been great for the software I use on a daily basis, and the hardware combination I’m using. I use two external monitors, attached via a USB-C docking thing. So my desk looks a bit like this.
Although, more accurately, like this, when the GPU driver dies.
This crash happened on a second Monday morning in succession. So I figured it was time to file a bug. I ran
ubuntu-bug linux and followed the prompts. That got me a bug 2031289.
“There are some AMD FW updates in lunar-proposed linux-firmware 20230323.gitbcdcfbcf-0ubuntu1.6. Can you give that a try?”
It’s not a good idea to enable the proposed pocket. So instead, I just grabbed the deb via packages.ubuntu.com, then did the old
sudo apt install ./linux-firmware_20230323.gitbcdcfbcf-0ubuntu1.6_all.deb dance to install it.
Four days later, the following Monday, I arrived at the office with all my fingers and toes crossed.
Great success 🥳
I’m awarding one hundred Internet points to Juerg for the quick and friendly bug interaction. Plus more points for doing the upload of that package in the first place, according to the changelog.
As I understand it, what I have done here is update a binary blob of GPU firmware on my machine, in the hope that it fixed a crasher. I always understood that the bad, evil, horrible people at nVidia made nasty binary blobs, but the Godlike do-no-wrong people at AMD only made saintly open source stuff.
Seems we still need that horrid non-free stuff, even for the “good” kind of GPU. I went looking for more info and found a thread on Reddit (spit!) from the past, with a post from an AMD person, explaining this situation.
Today, I learned.