Plasma Crash Course - KCrash

Tuesday, 20 August 2024

A while ago a colleague of mine asked about our crash infrastructure in Plasma and whether I could give some overview on it. This seems very useful to others as well, I thought. Here I am, telling you all about it!

Our crash infrastructure is comprised of a number of different components.

KCrash: a KDE Framework performing crash interception and prepartion for handover to…
coredumpd: a systemd component performing process core collection and handover to…
DrKonqi: a GUI for crashes sending data to…
Sentry: a web service and UI for tracing and presenting crashes for developers

We will look at them in turn. This post introduces KCrash.

KCrash

KCrash, as the name suggests, is our KDE framework for crash handling. While it is a mid-tier framework and could be used by outside projects, it mostly doesn’t make sense to, because some behavior is very KDE-specific.

It installs POSIX signal handlers to intercept crash signals and then prepares the crashed process for handover to coredumpd and DrKonqi. More on these two in another post. Once prepared it sends the crash signal into the next higher level crash handler until the signal eventually reaches the default handler and cause the kernel to invoke the core pattern.

Before that can happen, a bunch of work needs doing inside KCrash. Most of it quite boring, but also somewhat challenging.

You see, when handling a signal you need to only use signal-safe functions. The manpage explains very well why. This proves quite challenging at the level we usually are at (i.e. Qt) because it is entirely unclear what is and isn’t ultimately signal-safe under the hood. Additionally, since we are dealing with crash scenarios, we must not trigger new memory allocation, because the heap management may have had an accident.

To that end, KCrash has to use fairy low-level API. To make that easier to work with, there are actually two parts to KCrash:

The Initialization Stage
The Crash Stage

The Initialization Stage

Initialization is generally triggered by calling KCrash::initialize. You may already wonder what kind of initialization KCrash could possibly need. Well, the obvious one is setting up the signal handling. But beyond that the init stage is also used to prepare us for the crash stage. I’ve already mentioned the serious constraints we will encounter once the signal hits, so we had best be prepared for that. In particular we’ll do as much of the work as possible during initialization. This most important includes copying QString content into pre-allocated char * instances such that we later only need to read existing memory. The second most important aspect is the metadata file preparation for use in…

The Crash Stage

Once initialization has happened, we are ready for crashes. Ideally the application doesn’t crash, of course. 😉

But if it does the biggest task is rescuing our data!

Metadata

Inside KCrash we have the concept of Metadata: everything we know about the crashed application: the signal, process ID, executable, used graphics device… and so on and so forth. All this data is collected into a metadata file on-disk in ~/.cache/kcrash-metadata at the time of crash.

Here’s an example file:

[KCrash]
exe=/usr/bin/kwin_wayland
glrenderer=
platform=wayland
appname=kwin_wayland
apppath=/usr/bin
signal=11
pid=1353
appversion=6.1.80
programname=KWin
bugaddress=submit@bugs.kde.org

The actual fields vary depending on what is available for any given application, but it’s generally more or less what is shown in the example.

This metadata file will later be consumed by DrKonqi in an effort to obtain information that only existed at runtime inside the application - such as the version that was running, or whether it was running in legacy X11 mode.

Handoff

Once the metadata is safely saved to disk, KCrash simply calls raise(). This re-raises the signal into the default handler, and through that causes a core dump.

What happens next is up to the system configuration as per the core manpage.

The recommended setup for distributions is that a crash handler be configured as core_pattern and that this handler consumes the crash. We recommend an implementation of the coredumpd and journald interfaces as that will then allow our crash handler to come in and log the crash with KDE.

So that was KCrash, the first in our four-step crash-handling pipeline. In the next blog post I’ll tell you all about the next one: coredumpd.