
Darwin: fix sockaddr_un sun_len causing silent WRITE_START drops #789

Open
jverkoey wants to merge 1 commit into drolbr:master from ClutchEngineering:pr-darwin-sun-len
@jverkoey jverkoey commented Apr 19, 2026

On Darwin, struct sockaddr_un has a sun_len field (the byte that precedes sun_family on BSD-derived socket APIs). The kernel reads it as the total length of the sockaddr struct currently in use, not the length of the path. The current value, socket_name.size() + 1, is 1 byte short. The Darwin layout is:

  uint8_t     sun_len;     // 1 byte
  sa_family_t sun_family;  // 1 byte
  char        sun_path[];  // strlen(path) bytes in use (declared as char[104])

so the portable SUN_LEN macro (defined in sys/un.h on Darwin, *BSD, and Linux) expands to strlen(path) + 2.
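
The arithmetic above can be checked with a small sketch. The helper names `sun_len_for` and `sun_header_size` are hypothetical and not part of osm-3s; the only assumption is that `SUN_LEN` is available from sys/un.h, as stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/un.h>

// Hypothetical helper (not from osm-3s): the length SUN_LEN reports
// for a sockaddr_un whose sun_path holds `path`.
std::size_t sun_len_for(const std::string& path)
{
  sockaddr_un addr;
  std::memset(&addr, 0, sizeof(addr));
  // The path plus its terminating NUL must fit into sun_path.
  assert(path.size() < sizeof(addr.sun_path));
  std::memcpy(addr.sun_path, path.c_str(), path.size() + 1);
  return SUN_LEN(&addr);
}

// Hypothetical helper: number of header bytes that precede sun_path
// (sun_len + sun_family on Darwin).
std::size_t sun_header_size()
{
  return offsetof(sockaddr_un, sun_path);
}
```

For a path such as "/tmp/x", `sun_len_for` should return `sun_header_size()` plus the path length, which is one more than the hand-computed `size() + 1` when the header is 2 bytes.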

Symptom

update_from_dir's Dispatcher_Client::write_start() blocks in select() indefinitely, waiting for a WRITE_START ack from the dispatcher. The dispatcher never logs receiving the request: the kernel silently drops the message because the sockaddr length header is inconsistent. Reads via the shm + query connections still work because the read protocol does less work on the sockaddr and the off-by-one isn't fatal there. Only writes trigger the breakage.

Fix

Use the portable SUN_LEN(&local) macro instead of a hand-computed length. The Darwin header defines it as:

  #define SUN_LEN(su) (sizeof(*(su)) - sizeof((su)->sun_path) + strlen((su)->sun_path))

On Darwin that expands to strlen(path) + 2, matching what the kernel expects.
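
As a rough illustration of the pattern, here is a minimal sketch of filling the address with `SUN_LEN`. The helper `fill_unix_addr` is hypothetical and is not the code in this patch, and guarding the `sun_len` assignment with `__APPLE__` is an assumption made so the sketch also compiles on platforms that have no `sun_len` field.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/un.h>

// Hypothetical helper (not from osm-3s): fill `addr` for `socket_name`
// and return the in-use length computed by SUN_LEN.
socklen_t fill_unix_addr(sockaddr_un& addr, const std::string& socket_name)
{
  // The path plus its terminating NUL must fit into sun_path.
  assert(socket_name.size() < sizeof(addr.sun_path));
  std::memset(&addr, 0, sizeof(addr));
  addr.sun_family = AF_UNIX;
  std::memcpy(addr.sun_path, socket_name.c_str(), socket_name.size() + 1);

  const socklen_t len = static_cast<socklen_t>(SUN_LEN(&addr));
#if defined(__APPLE__)
  // Assumption: only set sun_len where the field exists (Darwin here).
  addr.sun_len = static_cast<unsigned char>(len);
#endif
  return len;
}
```

Deriving the length from the struct itself keeps it correct even if the header layout differs between platforms, which a hand-written `size() + 1` cannot do.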

Reproducer

Native build of osm-3s on Apple Silicon (macOS 26). Dispatcher starts, serves read queries fine. Any invocation of update_from_dir (including via apply_osc_to_db.sh) hangs on startup. Dispatcher's database.log shows no write_start of process N line for the caller.

Verified fixed by applying this patch: update_from_dir completes cleanly and the dispatcher immediately logs write_start of process N on the next attempt.

Companion PRs: #788 (off64_t alias), #790 (Mmap → pread).

The AF_UNIX sun_len field on Darwin is the *total sockaddr struct*
length used, not the path length. The current value
'socket_name.size() + 1' is still 1 byte short -- on Darwin the
struct layout is:

  uint8_t     sun_len;     // 1 byte
  sa_family_t sun_family;  // 1 byte
  char        sun_path[];  // strlen(path) bytes

so the portable SUN_LEN macro expands to strlen(path) + 2.

Symptom of the off-by-one: update_from_dir's
Dispatcher_Client::write_start() would block in select() forever
waiting for a WRITE_START ack from the dispatcher. The dispatcher
never logged receiving the request -- the kernel silently dropped
the message because the sockaddr length header was inconsistent.
Reads via the shm + query connections worked because the read
protocol does less work on the sockaddr and the mismatch wasn't
fatal there.

Use the portable SUN_LEN(&local) macro:
  sizeof(sockaddr_un) - sizeof(sun_path) + strlen(sun_path)

On Darwin that expands to strlen(path) + 2, matching what the kernel
expects. SUN_LEN is defined in sys/un.h on Darwin, *BSD, and Linux.