Kernel panic in ext4_es_scan / kswapd freezes entire VM (linuxkit 6.12.76 / Docker Desktop 4.73.0) #7877

@nosenuggetz

Summary

The Linux VM inside Docker Desktop panicked during memory reclaim, freezing every container and leaving the Docker daemon unreachable from the host. The macOS-side com.docker.backend process kept running, but every API request returned HTTP 500 Internal Server Error because connections to the VM at 192.168.65.7:2376 failed with "no route to host". Only a full Docker Desktop restart recovered the system.

Root frame: rb_erase invoked from ext4_es_scan via kswapd → write through an invalid pointer (fault address 0x88468846) → kernel panic.

Environment

Docker Desktop: 4.73.0
Docker Engine: 29.4.3
Linux kernel (VM): 6.12.76-linuxkit #1 SMP PREEMPT_DYNAMIC Thu Apr 30 11:25:59 UTC 2026 x86_64
macOS: 14.8.2 (23J126) Sonoma
Host CPU: Intel Xeon E5-1650 v3 @ 3.50 GHz (12 cores)
Host RAM: 64 GiB
VM allocation at time of crash: 8192 MiB (default) / 2 CPUs / 1 GiB swap
Virtualisation backend: Apple Virtualization framework
Loaded modules at panic: shiftfs(O) rosetta(O) grpcfuse(O) fakeowner(O) selfowner(O) vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common vsock

Workload at time of panic

Long-running container stack hosting two WordPress sites behind a Cloudflare tunnel:

  • 2× mysql:8.0
  • 2× wordpress:php8.3-apache (with WP-cron sidecar)
  • 1× redis:alpine
  • 1× nginx:alpine
  • 1× cloudflare/cloudflared:latest

Total RSS across containers at the time: ~1.6 GiB. Free memory was not exhausted, so this is not an OOM; it looks like a race leading to a bad-pointer dereference in the EXT4 extent-status reclaim path under normal reclaim pressure.

Timing

  • VM start: 2026-05-14 21:38:52 UTC
  • VM uptime at panic: 156104.95 s (≈ 43.4 hours)
  • VM totally unreachable from host until manual Docker Desktop restart.
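
The two timing figures above can be combined into an approximate wall-clock time for the panic. This is a minimal sketch, assuming the kernel log timestamps count seconds from VM boot and that the recorded VM start time coincides with kernel boot (clock drift is ignored). The function name `panic_wall_clock` is illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the Timing list above.
VM_START = datetime(2026, 5, 14, 21, 38, 52, tzinfo=timezone.utc)
UPTIME_AT_PANIC_S = 156104.95  # seconds of VM uptime at the first panic line


def panic_wall_clock(vm_start: datetime, uptime_s: float) -> datetime:
    """Return the approximate UTC wall-clock time of the panic.

    Assumption: uptime is measured from the same instant as vm_start.
    """
    return vm_start + timedelta(seconds=uptime_s)


panic_at = panic_wall_clock(VM_START, UPTIME_AT_PANIC_S)
print(panic_at.strftime("%Y-%m-%d %H:%M:%S UTC"))   # 2026-05-16 17:00:36 UTC
print(f"uptime: {UPTIME_AT_PANIC_S / 3600:.1f} h")  # uptime: 43.4 h
```

Under those assumptions the panic falls at roughly 17:00 UTC on 2026-05-16, which is consistent with the host-side "no route to host" line at 17:26:13 UTC quoted under Host-side symptom.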

Panic trace (verbatim from VM console log)

[156104.949323] BUG: unable to handle page fault for address: 0000000088468846
[156104.949401] #PF: supervisor write access in kernel mode
[156104.949500] #PF: error_code(0x0002) - not-present page
[156104.949550] PGD 800000013b695067 P4D 800000013b695067 PUD 0
[156104.949628] Oops: Oops: 0002 [#1] PREEMPT SMP PTI
[156104.949672] CPU: 4 UID: 0 PID: 114 Comm: kswapd0 Tainted: G           O       6.12.76-linuxkit #1
[156104.949710] Tainted: [O]=OOT_MODULE
[156104.949741] RIP: 0010:rb_erase+0x2a2/0x380
[156104.949777] Code: 89 16 48 8b 11 48 89 10 48 89 01 48 83 fa 03 76 6a 48 83 e2 fc 48 3b 4a 10 74 2f 48 89 42 08 48 89 f0 e9 92 fe ff ff 48 8b 07 <48> 89 02 48 83 f8 03 76 1d 48 83 e0 fc 48 3b 78 10 0f 84 ac 00 00
[156104.949847] RSP: 0018:ffff8d58c10f7a18 EFLAGS: 00010246
[156104.950004] RAX: 0000000000000001 RBX: ffff8d58c10f7abc RCX: 0000000000000001
[156104.950054] RDX: 0000000088468846 RSI: 0000000000000000 RDI: ffff8d57fe0a2d20
[156104.950119] RBP: 00000000ffffffff R08: ffff8d59855ccc48 R09: ffffffff994fc953
[156104.950176] R10: 000000000066005c R11: 0000000000000000 R12: ffff8d58c10f7a6c
[156104.950226] R13: ffff8d59855cc880 R14: 0000000000000000 R15: ffff8d57fe0a2d20
[156104.950263] FS:  0000000000000000(0000) GS:ffff8d59f7b00000(0000) knlGS:0000000000000000
[156104.950328] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[156104.950366] CR2: 0000000088468846 CR3: 000000011197a006 CR4: 00000000001706b0
[156104.950391] Call Trace:
[156104.950411]  <TASK>
[156104.950444]  es_do_reclaim_extents+0xa6/0xf0
[156104.950531]  es_reclaim_extents+0x5c/0xf0
[156104.950575]  ext4_es_scan+0xa6/0x3c0
[156104.950615]  do_shrink_slab+0x13d/0x340
[156104.950659]  shrink_slab+0xd8/0x3a0
[156104.950698]  ? try_to_shrink_lruvec+0x22d/0x320
[156104.950747]  shrink_one+0x121/0x1f0
[156104.950782]  shrink_node+0xa52/0xbe0
[156104.950810]  balance_pgdat+0x455/0x920
[156104.950848]  ? hrtimer_try_to_cancel.part.0+0x52/0x100
[156104.950891]  ? dequeue_entities+0x2e8/0x6a0
[156104.950933]  kswapd+0x1f8/0x3b0
[156104.950964]  ? __pfx_autoremove_wake_function+0x10/0x10
[156104.950996]  ? __pfx_kswapd+0x10/0x10
[156104.951023]  kthread+0xd2/0x100
[156104.951045]  ? __pfx_kthread+0x10/0x10
[156104.951068]  ret_from_fork+0x34/0x50
[156104.951104]  ? __pfx_kthread+0x10/0x10
[156104.951138]  ret_from_fork_asm+0x1a/0x30
[156104.951172]  </TASK>
[156104.951210] Modules linked in: shiftfs(O) rosetta(O) grpcfuse(O) fakeowner(O) selfowner(O) vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common vsock
[156104.951257] CR2: 0000000088468846
[156104.951288] ---[ end trace 0000000000000000 ]---
[156104.951876] Kernel panic - not syncing: Fatal exception
[156104.952759] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[156104.952822] ---[ end Kernel panic - not syncing: Fatal exception ]---
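
For readers triaging the trace, the `error_code(0x0002)` line can be decoded bit by bit. The sketch below covers only the five low bits of the x86 page-fault error code; the table and the function name `decode_pf_error_code` are illustrative, not taken from kernel source.

```python
# Low bits of the x86 page-fault error code, as (mask, meaning if set,
# meaning if clear). Higher bits are deliberately left out of this sketch.
PF_BITS = [
    (0x01, "protection violation", "not-present page"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor (kernel) mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(code: int) -> list[str]:
    """Return a human-readable description of each decoded bit."""
    described = []
    for mask, if_set, if_clear in PF_BITS:
        text = if_set if code & mask else if_clear
        if text is not None:
            described.append(text)
    return described


print(decode_pf_error_code(0x0002))
# ['not-present page', 'write access', 'supervisor (kernel) mode']
```

This matches the two `#PF:` lines in the trace: a kernel-mode write to an unmapped page. The faulting address in CR2 (0x88468846) equals RDX in the register dump, and the marked bytes `<48> 89 02` appear to decode as a 64-bit store of RAX to the address held in RDX, so the bad value looks like a corrupted pointer rather than a NULL one.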

Full panic file (~233 KB) available on request; paths are listed under "Diagnostic files available" below.

Host-side symptom

com.docker.backend continued running but every API call routed to the VM failed:

[2026-05-16T17:26:13.419775000Z][com.docker.backend.apiproxy] still dialing 192.168.65.7:2376 after 1.000760104s: connect tcp 192.168.65.7:2376: no route to host
[...]
{"component":"apiproxy","level":"info","msg":"<< GET /containers/json?all=true&filters= Internal Server Error: context deadline exceeded (10.000129545s)"}
{"component":"apiproxy","level":"info","msg":"<< GET /networks Internal Server Error: context deadline exceeded (10.000234719s)"}

docker ps, docker info, docker version (server) all returned HTTP 500 Internal Server Error for API route….
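
To pull the failed requests out of a larger backend log, the apiproxy JSON lines can be parsed as sketched below. The message layout (`<< METHOD PATH STATUS: REASON (SECONDSs)`) is inferred from the two lines quoted above only, so the regular expression is an assumption and may not cover other message shapes; `parse_failure` is a hypothetical helper name.

```python
import json
import re

# The two apiproxy lines quoted above, verbatim.
LOG_LINES = [
    '{"component":"apiproxy","level":"info","msg":"<< GET /containers/json?all=true&filters= Internal Server Error: context deadline exceeded (10.000129545s)"}',
    '{"component":"apiproxy","level":"info","msg":"<< GET /networks Internal Server Error: context deadline exceeded (10.000234719s)"}',
]

# Assumed layout: "<< METHOD PATH STATUS TEXT: REASON (SECONDSs)".
MSG_RE = re.compile(
    r"^<< (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>.+?): "
    r"(?P<reason>.+) \((?P<seconds>[\d.]+)s\)$"
)


def parse_failure(line: str):
    """Return a dict describing a failed API response, or None if no match."""
    record = json.loads(line)
    if record.get("component") != "apiproxy":
        return None
    match = MSG_RE.match(record.get("msg", ""))
    if match is None:
        return None
    fields = match.groupdict()
    fields["seconds"] = float(fields["seconds"])
    return fields


for line in LOG_LINES:
    failure = parse_failure(line)
    print(failure["method"], failure["path"], failure["status"], failure["seconds"])
```

Both quoted requests time out after about 10 seconds, which suggests a proxy-side deadline rather than an error returned by the daemon itself.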

Reproducer

Not deterministic. The panic occurred after ~43 hours of normal VM uptime running the workload described above. Memory was not exhausted at the time; this appears to be a race in ext4_es_scan's red-black tree handling during normal kswapd reclaim activity.

A similar crash class (null deref in rb_erase reached via es_do_reclaim_extents) has been reported on mainline kernels. This appears to be a pre-existing EXT4 extent-status shrinker race, surfaced by the memory-pressure profile that LinuxKit's default 8 GiB allocation creates when running a persistent multi-container workload.

What I tried

  1. Manual recovery (only option once panicked): pkill -9 -f "Docker Desktop\|com.docker.backend" then open -a "Docker Desktop". Daemon back in ~30 s, all containers restarted cleanly.
  2. Mitigations applied locally to reduce panic odds:
    • Bumped VM MemoryMiB 8192 → 16384 (host has 64 GiB of RAM) to reduce kswapd activity.
    • Installed a launchd watchdog that probes the daemon every 2 min and restarts Docker Desktop if unreachable, so a future panic auto-recovers in ~3 min instead of waiting for me to notice.

Neither fixes the kernel bug. Filing this so your team has the panic trace.
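
For anyone wanting a similar stopgap, the watchdog mitigation described above can be sketched roughly as follows. This is not the script actually installed; it is a minimal illustration that assumes `docker info` is an adequate liveness probe and that launchd runs the script on a fixed interval (for example via a `StartInterval` of 120 seconds). The function names, the timeout, and the split into two `pkill` calls are all assumptions.

```python
import subprocess
import sys

# Probe and recovery commands. The recovery steps mirror the manual
# recovery in step 1; splitting pkill into two calls is an assumption.
PROBE = ["docker", "info"]
RESTART = [
    ["pkill", "-9", "-f", "Docker Desktop"],
    ["pkill", "-9", "-f", "com.docker.backend"],
    ["open", "-a", "Docker Desktop"],
]


def daemon_reachable(probe_cmd, timeout_s=15.0) -> bool:
    """Return True if the probe exits 0 within timeout_s seconds."""
    try:
        result = subprocess.run(
            probe_cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung probe or a missing binary both count as unreachable.
        return False
    return result.returncode == 0


def check_and_recover(probe_cmd, restart_cmds, timeout_s=15.0,
                      run=subprocess.run) -> bool:
    """Probe once; run the restart commands if the daemon is unreachable.

    Returns True if a restart was triggered.
    """
    if daemon_reachable(probe_cmd, timeout_s):
        return False
    for cmd in restart_cmds:
        run(cmd, check=False)
    return True


# `open -a` is macOS-only, so the real commands only run on macOS.
if __name__ == "__main__" and sys.platform == "darwin":
    restarted = check_and_recover(PROBE, RESTART)
    print("restart triggered" if restarted else "daemon reachable")
```

The probe timeout matters because a panicked VM makes API calls hang until a deadline rather than fail fast, so a probe without a timeout could stall the watchdog itself.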

Diagnostic files available

If your team would like the raw bundle, the following files are still present:

  • ~/Library/Containers/com.docker.docker/Data/log/vm/console.log.20260517-112856.549: VM kernel/console log containing the panic
  • ~/Library/Containers/com.docker.docker/Data/log/host/com.docker.backend.log.20260517-032716.530: Host-side backend log from the unreachable period
  • ~/Library/Containers/com.docker.docker/Data/log/host/monitor.log.20260517-033122.650: Host monitor log over the failure window
  • ~/Library/Containers/com.docker.docker/Data/log/host/com.docker.virtualization.log: VM lifecycle / resource allocation log

Happy to upload a full Docker Desktop diagnostics bundle on request.
