regreSSHion (CVE-2024-6387): pre-auth RCE in sshd, with asterisks

1 July 2024. Qualys publishes an advisory with a name as catchy as its headline: regreSSHion, a race condition in sshd’s SIGALRM signal handler that opens a pre-auth RCE as root. CVSS 8.1; NVD publishes the same day. OpenSSH 9.8p1 is released that same day with the patch. The news bounces across every feed: “pre-auth SSH RCE, no credentials, as root, on millions of exposed servers”.

The detail that Qualys calmly explains in the advisory, and that the news cycle cuts off, puts the story back in its place. Exploitation requires glibc, is more tractable on 32-bit x86 (i386) than on amd64, and takes ~10,000 connections and 6 to 8 hours of racing to hit the right heap layout. It doesn’t work on OpenBSD. A low MaxStartups stretches the attack even further. And the bug being exploited dates from 2006: patched then, reintroduced in 2020, and unnoticed for four years.

Lab: Debian image with OpenSSH 9.6p1 and glibc 2.36, Qualys public PoC. Server with no countermeasures, full client control. Not sent against external hosts.

The bug in one sentence

In sshd, when a client opens a connection and doesn’t complete authentication within LoginGraceTime (120 seconds by default), the kernel delivers SIGALRM to the process. The signal handler calls sigdie(), which internally calls syslog() to record the incident. syslog() is not async-signal-safe: in glibc it calls malloc() and free(), and inside a signal handler that opens Pandora’s box.

Canonical list of async-signal-safe functions: man 7 signal-safety on Linux. _exit, signal, write, read, kill, pause, alarm and about seventy more. syslog() is not on it. malloc() / free() either. Any handler that calls something outside that list is a latent bug waiting for a signal at the wrong moment.
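
A check against that list can be sketched in a few lines. This is a minimal illustration, not a real auditing tool: the allowlist is a small subset of signal-safety(7), the handler body is a hypothetical sample written for this example (not OpenSSH source), and the regex only sees direct calls, so a chain like sigdie() → do_log() → syslog() would need each hop checked in turn.

```python
import re

# Partial allowlist taken from signal-safety(7); the real list is longer.
ASYNC_SIGNAL_SAFE = {
    "_exit", "alarm", "close", "kill", "pause", "read",
    "sigaction", "signal", "write",
}
# C keywords that look like calls to the regex below.
C_KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof"}

CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def unsafe_calls(handler_body: str) -> list[str]:
    """Return directly called names that are not on the allowlist,
    in order of first appearance."""
    seen = []
    for name in CALL_RE.findall(handler_body):
        if name in C_KEYWORDS or name in ASYNC_SIGNAL_SAFE:
            continue
        if name not in seen:
            seen.append(name)
    return seen


# Hypothetical handler body, written for this example.
SAMPLE_HANDLER = """
    if (pid > 0)
        kill(pid, SIGTERM);
    syslog(LOG_CRIT, "timeout");
    _exit(1);
"""

print(unsafe_calls(SAMPLE_HANDLER))  # ['syslog']
```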

If the signal arrives exactly when the sshd process is in the middle of a malloc() (for example, building the PAM structure after receiving the client’s username), the handler’s own malloc() runs against a heap arena left in an inconsistent state and corrupts it. From there, with patience, the heap can be groomed so that a later call through a corrupted pointer jumps to attacker-controlled code. The process is root (pre-fork, before the privilege drop). RCE.

The regression that took four years

This is the interesting part. In 2006 CVE-2006-5051 was reported, with the same shape: a signal handler calling non-async-signal-safe functions. The fix at the time wrapped the logging inside sigdie() in a conditional guard, controlled by a DO_LOG_SAFE_IN_SIGHAND macro. The macro is only defined on platforms where logging from a handler is safe. On glibc it is not defined, so the problematic calls are compiled out and sigdie() goes straight to _exit(1): the process terminates without calling any destructor, without touching the heap, without syslog.

In October 2020, commit 752250c, which ships in OpenSSH 8.5p1, refactors the logging code. In that commit, the #ifdef DO_LOG_SAFE_IN_SIGHAND protection vanishes. It isn’t a malicious or controversial change: the refactor passes code review, the tests stay green (there was no test for this class of race), and no one notices the guard has fallen.

The relevant change in log.c, trimmed from the commit:

-void
-sigdie(const char *fmt,...)
-{
-#ifdef DO_LOG_SAFE_IN_SIGHAND
-	va_list args;
-
-	va_start(args, fmt);
-	do_log(SYSLOG_LEVEL_FATAL, fmt, args);
-	va_end(args);
-#endif
-	_exit(1);
-}
+void
+sshsigdie(const char *file, const char *func, int line, const char *fmt, ...)
+{
+	va_list args;
+
+	va_start(args, fmt);
+	sshlogv(file, func, line, 0, SYSLOG_LEVEL_FATAL, fmt, args);
+	va_end(args);
+	_exit(1);
+}

DO_LOG_SAFE_IN_SIGHAND isn’t defined on glibc systems, so before the refactor the original sigdie() did a direct _exit(1) and never touched syslog. After it, the new sshsigdie() unconditionally calls sshlogv → do_log → syslog. The guard is gone. On glibc, every SIGALRM timeout tries to log from the handler.

Four years later, Bharat Jogi and the Qualys team audit signal handler code after an internal project on historical bug classes and find the protection has been lost. They reproduce the attack. They report it. Damien Miller and Theo de Raadt confirm; the patch ships.

Timeline in a table:

Date	Event
2006	CVE-2006-5051 reported. Fix: `#ifdef DO_LOG_SAFE_IN_SIGHAND` guard in `sigdie()`, leaving a bare `_exit(1)` on glibc
2006	OpenSSH 4.4p1 ships the patch
Oct 2020	Commit `752250c` in OpenSSH 8.5p1 removes the guard. Unreported regression
2020 – 2024	Bug present, unreported
May 2024	Qualys identifies the regression during audit
1 Jul 2024	Qualys publishes advisory; OpenSSH 9.8p1 released with the fix

Affected versions are OpenSSH 8.5p1 to 9.7p1 on glibc systems. Versions 4.4p1 through 8.4p1 include the original patch and are not affected. Versions before 4.4p1 are also vulnerable unless they were patched for CVE-2006-5051.

Real exploitability, no headlines

The Qualys advisory is honest about the conditions:

glibc. The bug exists because glibc’s syslog() calls malloc(). In musl, in bionic, in OpenBSD libc, syslog() is implemented differently and doesn’t touch the heap. musl-based distros (Alpine without glibc) aren’t exploitable via this route.
i386 easier than amd64. On 64-bit, ASLR entropy is much larger and heap position less predictable. Qualys reports a stable exploit on 32-bit; on amd64 the initial advisory describes it as “harder” without full demos.
MaxStartups and LoginGraceTime. For the race to fire, you need to open many connections and let each one reach the 120-second limit without authenticating. The default MaxStartups 10:30:100 starts dropping new unauthenticated connections at random once 10 are pending and refuses all of them at 100; with those parameters and Qualys’s model, a successful exploitation takes about 6 to 8 hours of sustained traffic against the server. Setting LoginGraceTime to 0 disables the vulnerable handler entirely; it’s the official mitigation for those who can’t patch immediately.
Success rate. ~10,000 attempts per successful exploitation per the model. Each attempt can take seconds or minutes depending on load.
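
The 6 to 8 hour figure follows from simple pacing arithmetic. The sketch below reproduces it under assumptions that are not spelled out in the list above: 100 connections accepted per 120-second grace window (the MaxStartups ceiling) and an address guess that is right one time in two because of ASLR. Treat both numbers as inputs to a model, not as measurements.

```python
def expected_hours(tries: int, conns_per_window: int,
                   window_s: int, guess_rate: float) -> float:
    """Expected wall-clock hours: race attempts are paced by the grace
    window, then scaled by how often the address guess is right."""
    windows = tries / conns_per_window
    return windows * window_s / guess_rate / 3600


# Assumed inputs: 10,000 tries, 100 connections per 120 s window.
race_only = expected_hours(10_000, 100, 120, 1.0)
with_aslr = expected_hours(10_000, 100, 120, 0.5)

print(f"{race_only:.1f} h to win the race, "
      f"{with_aslr:.1f} h with a 1-in-2 address guess")
# 3.3 h to win the race, 6.7 h with a 1-in-2 address guess
```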

In practice, a public SSH server with logging enabled and monitored should detect the attack within the first half hour. Ten thousand pre-authentication timeouts against a host is noise any reasonable SIEM jumps on. In-the-wild exploitation after publication has been limited, based on what’s been made public, to internal networks with low monitoring or exposed servers with unreviewed logging.

This isn’t to minimise it: it’s pre-auth RCE as root in sshd and that is serious. But the 1 July panic (“everyone is going to fall today”) doesn’t materialise. What materialises is a fast, legitimate wave of patching and the rediscovery that LoginGraceTime 0 exists.

Lab: reproduce the setup without reproducing the exploit

To confirm the affected version and the signal handler behaviour without building the full exploit, a Docker container is enough:

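# Dockerfile: vulnerable sshd, no exploit
FROM debian:bookworm-slim
# The 9.6p1 pin assumes the configured mirror still serves that build. If apt
# cannot resolve it, pin any 8.5p1 to 9.7p1 package the archive or a snapshot
# provides. Distro packages in that range may already carry the backported fix.
RUN apt-get update && \
    apt-get install -y openssh-server=1:9.6p1-* && \
    mkdir /run/sshd
# Throwaway lab account
RUN useradd -m -s /bin/bash lab && echo 'lab:lab' | chpasswd
# Short grace time and verbose logging for the test
RUN sed -i 's/^#LoginGraceTime.*/LoginGraceTime 30/' /etc/ssh/sshd_config && \
    sed -i 's/^#LogLevel.*/LogLevel DEBUG2/' /etc/ssh/sshd_config
EXPOSE 22
# -D: stay in the foreground, -e: log to stderr so docker logs can show it
CMD ["/usr/sbin/sshd", "-D", "-e"]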

LoginGraceTime 30 speeds up the test. Build and run the container:

docker build -t sshd-vuln . && docker run -d --name sshd-vuln -p 2222:22 sshd-vuln
docker exec sshd-vuln sshd -V 2>&1
# OpenSSH_9.6p1 Debian-...

Connect and leave the session half-open (without sending username) for 30 seconds. In docker logs output:

sshd[24]: Timeout, client not responding from user-not-yet-authenticated
sshd[24]: fatal: Timeout before authentication for ::ffff:172.17.0.1 port ...

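That fatal: line is printed by sigdie() logging from inside the SIGALRM handler. Because the container starts sshd with -e, the message goes to stderr, which is why docker logs shows it; without -e the same call path ends in syslog(). That’s the bug. The race between that moment and a concurrent malloc() is what the exploit leverages.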

Apply the official mitigation without patching:

docker exec sshd-vuln sed -i 's/^LoginGraceTime.*/LoginGraceTime 0/' /etc/ssh/sshd_config
docker exec sshd-vuln pkill -HUP sshd

With LoginGraceTime 0 no SIGALRM is delivered, sigdie() isn’t invoked by timeout, race closed.
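
To confirm the mitigation is in place, the config can be checked mechanically. The sketch below is a simplified reader under stated assumptions: it takes the first LoginGraceTime it finds as the effective one, falls back to the 120-second default, and ignores Include directives. On a real host, `sshd -T` is the authoritative source for the effective configuration; the function names here are illustrative.

```python
def login_grace_time(config_text: str, default: str = "120") -> str:
    """Return the first LoginGraceTime value in an sshd_config text,
    or the default when the keyword is absent or commented out."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.replace("=", " ").split()
        # Keywords are case-insensitive.
        if parts[0].lower() == "logingracetime" and len(parts) > 1:
            return parts[1]
    return default


def handler_disabled(config_text: str) -> bool:
    """True when the grace timer, and with it the SIGALRM path, is off."""
    return login_grace_time(config_text) == "0"


before = "#LoginGraceTime 2m\nLoginGraceTime 30\nLogLevel DEBUG2\n"
after = "#LoginGraceTime 2m\nLoginGraceTime 0\nLogLevel DEBUG2\n"

print(handler_disabled(before), handler_disabled(after))  # False True
```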

Lessons from the bug

A guard lost in a refactor goes unnoticed for four years. The code review that let the 2020 commit through was reasonable on a human level: the refactor is legitimate, tests passed. What was missing was a specific test for signal-handler safety. The bug classes that get reintroduced are the ones no test covers because “it was already fixed”.
man 7 signal-safety exists and no one reads it. The list of async-signal-safe functions is short and known; syslog() isn’t on it, malloc() either. Any signal handler audit that had looked at the list would have caught the regression in 2020.
CVSS without context misleads. “Pre-auth RCE as root” in the abstract scores CVSS 9.8. With the glibc + x86 + 10,000 attempts + hours constraints, NVD gives it 8.1 (AC:H, attack complexity high). That H matters for prioritisation.
OpenBSD benefits from its own paranoia. The bug doesn’t reproduce on OpenBSD because its handler logs through syslog_r(), an async-signal-safe variant of syslog() that doesn’t touch the heap. That isn’t luck: OpenBSD invests in auditing and hardening its standard C environment, so the same regression in the same code is harmless there.
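
The gap between 9.8 and 8.1 comes from a single metric, and the CVSS v3.1 base-score formula shows it. The sketch below assumes the vector AV:N/PR:N/UI:N/S:U/C:H/I:H/A:H and varies only attack complexity; the weights and the rounding rule are those of the v3.1 specification, while the function and constant names are illustrative.

```python
import math

# CVSS v3.1 metric weights for the values used here (scope unchanged).
AV_NETWORK = 0.85
AC = {"L": 0.77, "H": 0.44}
PR_NONE = 0.85
UI_NONE = 0.85
CIA_HIGH = 0.56


def roundup(x: float) -> float:
    """Round up to one decimal, as defined in the CVSS v3.1 specification."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score(attack_complexity: str) -> float:
    """Base score for AV:N/PR:N/UI:N/S:U/C:H/I:H/A:H with the given AC."""
    iss = 1 - (1 - CIA_HIGH) ** 3
    impact = 6.42 * iss
    exploitability = (8.22 * AV_NETWORK * AC[attack_complexity]
                      * PR_NONE * UI_NONE)
    return roundup(min(impact + exploitability, 10))


print(base_score("L"))  # 9.8
print(base_score("H"))  # 8.1
```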

Mitigations, in order

Patch to OpenSSH 9.8p1+. Distros publish backports the same 1 July:

Platform	Advisory	Status
Debian 12 (bookworm)	DSA-5724-1	patched
Ubuntu 22.04 / 24.04	USN-6859-1	patched
RHEL 9	RHSA-2024:4312	patched
RHEL 7 / 8	—	not affected (OpenSSH < 8.5p1)
SUSE / openSUSE Leap 15.6	SUSE-SU-2024:2304-1	patched
FreeBSD 13.x / 14.x	FreeBSD-SA-24:04.openssh	patched
Alpine	—	not affected (musl libc)
OpenBSD	—	not affected

LoginGraceTime 0 as a bridge while patching isn’t possible. It disables the vulnerable handler. Downside: connections opened without authenticating stay open indefinitely, so pair it with a firewall that limits concurrent connections per IP.
Audit logs. The attack is noisy. Timeout before authentication repeated massively from one IP is the signature.
If the distro uses musl (Alpine without glibc, for example), the bug doesn’t apply. But patching anyway costs nothing and is good hygiene.

Quick detection

Local version compared against the vulnerable range (8.5p1 to 9.7p1):

ssh -V
# OpenSSH_8.5p1 to OpenSSH_9.7p1, on glibc  → vulnerable
# OpenSSH_4.4p1 to OpenSSH_8.4p1            → not affected by this CVE
# earlier than OpenSSH_4.4p1                → vulnerable unless patched for CVE-2006-5051
# OpenSSH_9.8p1 or later                    → patched
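
The same comparison can be scripted for a fleet. A minimal sketch, assuming the version string has the usual `OpenSSH_X.Yp1` shape; the function names and labels are illustrative. A version string cannot see distro backports, so a package in the 8.5 to 9.7 range may already be fixed: check the distro advisory before trusting the label.

```python
import re


def parse_version(banner: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like 'OpenSSH_9.6p1 Debian-...'."""
    m = re.search(r"OpenSSH_(\d+)\.(\d+)", banner)
    if not m:
        raise ValueError(f"not an OpenSSH version string: {banner!r}")
    return int(m.group(1)), int(m.group(2))


def classify(banner: str) -> str:
    """Map a version string onto the ranges for CVE-2024-6387."""
    version = parse_version(banner)
    if version < (4, 4):
        return "vulnerable (predates the 2006 fix)"
    if version < (8, 5):
        return "not affected"
    if version < (9, 8):
        return "vulnerable on glibc unless the fix was backported"
    return "patched"


for banner in ("OpenSSH_9.6p1 Debian-...", "OpenSSH_8.4p1", "OpenSSH_9.8p1"):
    print(banner, "->", classify(banner))
```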

Top 20 IPs with SSH pre-auth timeouts in the last 24h (system with systemd):

journalctl -u ssh --since "1 day ago" \
  | grep "Timeout before authentication" \
  | grep -oE 'for [0-9a-fA-F:.]+ port' | awk '{print $2}' \
  | sort | uniq -c | sort -rn | head -20
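
Where the logs are already exported to files, the same count can be done without the shell pipeline. A minimal sketch: it keys on the `Timeout before authentication for <address> port <n>` message shown in the lab output, and the sample lines use documentation addresses made up for this example.

```python
import re
from collections import Counter

# Matches the sshd pre-auth timeout message; the address may be IPv4 or IPv6.
TIMEOUT_RE = re.compile(r"Timeout before authentication for (\S+) port \d+")


def top_timeout_sources(lines, n=20):
    """Count pre-auth timeouts per source address, most frequent first."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts.most_common(n)


# Illustrative log lines, not captured from a real host.
SAMPLE = [
    "sshd[24]: fatal: Timeout before authentication for 203.0.113.7 port 40022",
    "sshd[25]: fatal: Timeout before authentication for ::ffff:172.17.0.1 port 51514",
    "sshd[26]: Failed password for lab from 198.51.100.9 port 40100 ssh2",
    "sshd[27]: fatal: Timeout before authentication for 203.0.113.7 port 40023",
]

print(top_timeout_sources(SAMPLE))
# [('203.0.113.7', 2), ('::ffff:172.17.0.1', 1)]
```

Passing `sys.stdin` as `lines` lets it sit at the end of a `journalctl` pipe instead.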

Detection pattern, Sigma sketch:

title: Possible regreSSHion (CVE-2024-6387) probing
logsource:
  service: sshd
detection:
  selection:
    Message|contains: 'Timeout before authentication'
  # src_ip must be extracted from the message by the log pipeline
  timeframe: 10m
  condition: selection | count() by src_ip > 20
falsepositives:
  - clients with unstable networks
  - authenticated scanners with low MaxStartups
level: medium

Unlike a normal brute-force, no correlated Failed password or Failed publickey shows up here, only repeated timeouts. It’s the signature of an exploit that opens sessions and lets them time out until it hits the race.
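
The threshold logic behind that rule can be sketched as a sliding window per source. This is an illustration of the counting only, assuming the events have already been parsed into (timestamp, address) pairs and sorted by time; the 20-in-10-minutes threshold mirrors the Sigma sketch, and the addresses are documentation examples.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def noisy_sources(events, threshold=20, window=timedelta(minutes=10)):
    """events: iterable of (datetime, ip) for pre-auth timeouts, sorted by
    time. Returns the set of IPs with more than `threshold` timeouts
    inside any single window."""
    recent = defaultdict(deque)
    flagged = set()
    for ts, ip in events:
        q = recent[ip]
        q.append(ts)
        # Drop timestamps that have fallen out of the window.
        while ts - q[0] > window:
            q.popleft()
        if len(q) > threshold:
            flagged.add(ip)
    return flagged


start = datetime(2024, 7, 1, 12, 0, 0)
# 21 timeouts in under seven minutes versus 21 spread over 21 hours.
burst = [(start + timedelta(seconds=20 * i), "203.0.113.7") for i in range(21)]
slow = [(start + timedelta(hours=i), "198.51.100.9") for i in range(21)]

print(noisy_sources(sorted(burst + slow)))  # {'203.0.113.7'}
```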