Skip to content

server: unlink unix socket file on SIGTERM/SIGINT and process.exit#29168

Open
robobun wants to merge 12 commits intomainfrom
farm/57acaa15/unix-socket-signal-cleanup
Open

server: unlink unix socket file on SIGTERM/SIGINT and process.exit#29168
robobun wants to merge 12 commits intomainfrom
farm/57acaa15/unix-socket-signal-cleanup

Conversation

@robobun
Copy link
Copy Markdown
Collaborator

@robobun robobun commented Apr 11, 2026

Repro

// bun /tmp/EADDRINUSE.js
Bun.serve({
  unix: '/tmp/EADDRINUSE.sock',
  fetch: _ => new Response('Hello'),
});

Run once, Ctrl+C / kill %1, run again → EADDRINUSE: address already in use.

Cause

PR #28798 made clean server.stop() / listener.stop() unlink the socket file, matching Node.js/libuv semantics. But the unlink only runs on the clean-stop path. If the process is killed by SIGINT/SIGTERM, or exits without calling .stop() (the common case — Bun.serve keeps the loop alive indefinitely, so Ctrl+C is how you stop it), the socket file is left on disk and the next run EADDRINUSEs.

Fix

New src/bun.js/api/server/UnixSocketCleanup.zig — a process-global wait-free linked list of live unix-listener paths. Registered in onListen, cleared on clean stop. Walked from:

  • an atexit / at_quick_exit hook (covers process.exit() and implicit end-of-script termination).
  • the SIGTERM / SIGINT handler in c-bindings.cpp, now installed unconditionally (previously only if stdio was a TTY). unlink(2) is on the POSIX async-signal-safe list, and the registry uses no allocation or locking on the walk path, so calling it from the signal handler is safe.

Abstract sockets (Linux, path[0] == 0) and Windows named pipes are skipped — they have no filesystem entry.

Verification

New regression test test/regression/issue/29166.test.ts spawns a child running the user's exact script, kills it with SIGTERM / SIGINT, and asserts the socket file is gone. Also covers Bun.listen and the process.exit(0) path. Before this commit all four fail; after, all four pass. Existing test/js/bun/net/unix-socket-unlink.test.ts (9 tests) still pass.

Fixes #29166.

PR #28798 made `server.stop()` and `listener.stop()` remove the unix
socket file, matching Node.js/libuv semantics. But it only unlinks on
clean shutdown — if the process is killed by SIGINT/SIGTERM or exits
without calling `.stop()` (the common case for `Bun.serve({ unix })`),
the file is left on disk and the next run fails with `EADDRINUSE`.

Adds a process-global registry (`UnixSocketCleanup`) tracking the paths
of live unix listeners. Registered in `onListen` and cleared on a clean
stop. Walked from:

  - an `atexit` / `at_quick_exit` hook, covering `process.exit` and
    implicit end-of-script termination
  - the SIGTERM / SIGINT handler in `c-bindings.cpp`, now installed
    unconditionally (previously only on TTY stdio). `unlink(2)` is on
    the POSIX async-signal-safe list, and the registry walks a wait-free
    linked list with no allocation or locking, so it is safe to call
    from the signal handler.

Abstract sockets (Linux, `path[0] == 0`) and Windows named pipes are
skipped — they have no filesystem entry.

Fixes #29166.
@robobun
Copy link
Copy Markdown
Collaborator Author

robobun commented Apr 11, 2026

Updated 9:17 PM PT - Apr 11th, 2026

@robobun, your commit d78ce3f has 2 failures in Build #45231 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 29168

That installs a local version of the PR into your bun-29168 executable, so you can run:

bun-29168 --bun

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a process-global, lock-free Unix-domain socket registry and wires it into Listener and Server lifecycles; registers cleanup on normal exit and POSIX signals via C bindings. Includes regression tests verifying unix-socket file unlinking after abrupt termination.

Changes

Cohort / File(s) Summary
Unix Socket Cleanup Registry
src/bun.js/api/server/UnixSocketCleanup.zig
New module implementing an append-only, atomic registry of unix socket paths with pub const Handle, pub fn register(path) ?Handle, pub fn unregister(handle), pub fn cleanupAll(), and exported Bun__cleanupUnixSocketPaths(); performs best-effort unlink of registered paths and tombstones nodes when handled.
Server integration
src/bun.js/api/server.zig
Imports UnixSocketCleanup; adds #unix_cleanup_handle: ?UnixSocketCleanup.Handle; on .unix in onListen() registers path and stores handle; in stopListening() unregisters and clears handle before unlinking.
Listener integration
src/bun.js/api/bun/socket/Listener.zig
Adds private field #unix_cleanup_handle: ?UnixSocketCleanup.Handle = null, local alias UnixSocketCleanup; listen() registers .unix paths and saves handle; unlinkUnixSocketPath changed to unlinkUnixSocketPath(this: *Listener) void and now unregisters/clears handle before performing unlink logic.
C bindings / process exit
src/bun.js/bindings/c-bindings.cpp
Adds extern "C" void Bun__cleanupUnixSocketPaths(); and extern "C" void Bun__atexit(void (*func)(void));; registers Bun__cleanupUnixSocketPaths with atexit() on non-Windows and invokes it from the POSIX signal handler; makes signal-handler installation unconditional.
Tests
test/regression/issue/29166.test.ts
Adds regression tests (skipped on Windows) that spawn child Bun processes using Bun.serve/Bun.listen with unix sockets, signal or exit the child, and assert the socket file is removed and reruns succeed without EADDRINUSE.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: ensuring unix socket files are unlinked on SIGTERM/SIGINT and process.exit, which is the core functionality added in this PR.
Description check ✅ Passed The description provides a clear problem statement (Repro), root cause analysis, the solution approach, and verification steps. All key sections are well-documented and directly address the issue.
Linked Issues check ✅ Passed The PR fully addresses the requirement from issue #29166: it implements automatic cleanup of unix socket files on process exit via SIGINT/SIGTERM/process.exit(), resolving the EADDRINUSE error on subsequent runs.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing unix socket cleanup on abnormal termination and process exit. No extraneous modifications were introduced outside the stated objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/bun.js/api/server.zig (1)

1557-1563: ⚠️ Potential issue | 🟠 Major

Unregister the cleanup handle before unlinking the socket path.

This ordering leaves a race where SIGINT/SIGTERM cleanup can still see the node as live after the file is removed. If another process binds the same path in that gap, the later cleanup walk can unlink the new socket file.

Suggested fix
             if (this.config.address == .unix) {
-                const path = this.config.address.unix;
-                if (path.len > 0 and path[0] != 0) {
-                    _ = bun.sys.unlink(path);
-                }
                 UnixSocketCleanup.unregister(this.unix_cleanup_handle);
                 this.unix_cleanup_handle = null;
+                const path = this.config.address.unix;
+                if (path.len > 0 and path[0] != 0) {
+                    _ = bun.sys.unlink(path);
+                }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/api/server.zig` around lines 1557 - 1563, The cleanup currently
unlinks the UNIX socket path then calls UnixSocketCleanup.unregister, which can
race with signal handlers; change the order so you call
UnixSocketCleanup.unregister(this.unix_cleanup_handle) and set
this.unix_cleanup_handle = null before invoking bun.sys.unlink(path) (use
this.config.address.unix, UnixSocketCleanup.unregister,
this.unix_cleanup_handle, and bun.sys.unlink in that function) to ensure the
cleanup handle is removed prior to removing the socket file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/api/server/UnixSocketCleanup.zig`:
- Around line 38-45: The register function currently swallows allocator failures
by returning null from both bun.default_allocator.create(Node) and
bun.default_allocator.dupeZ(u8, path), making OOM indistinguishable from an
intentional “no filesystem entry” case; change both catch handlers to call
bun.handleOom() when the error is error.OutOfMemory and only treat other errors
normally (cleanup node when dupeZ fails and then return null). Specifically,
update the create(Node) catch to detect error.OutOfMemory -> bun.handleOom(),
otherwise return null, and update the dupeZ(u8, path) catch to detect
error.OutOfMemory -> bun.handleOom(), else destroy the created Node and return
null so non-OOM failures remain recoverable while OOM aborts.

In `@test/regression/issue/29166.test.ts`:
- Around line 108-114: The test should not use a broad stderr assertion and
should assert the socket disappearance before the exit code; in the block
dealing with proc, remove or narrow the generic
expect(stderr).not.toContain("Error"), then reorder the assertions so you first
assert expect(existsSync(sock)).toBe(false) (using the existing sock and
existsSync), then assert expect(exitCode).toBe(0) (using exitCode from
proc.exited); keep proc.stderr.text() available only if you add a targeted
stderr check later.

---

Outside diff comments:
In `@src/bun.js/api/server.zig`:
- Around line 1557-1563: The cleanup currently unlinks the UNIX socket path then
calls UnixSocketCleanup.unregister, which can race with signal handlers; change
the order so you call UnixSocketCleanup.unregister(this.unix_cleanup_handle) and
set this.unix_cleanup_handle = null before invoking bun.sys.unlink(path) (use
this.config.address.unix, UnixSocketCleanup.unregister,
this.unix_cleanup_handle, and bun.sys.unlink in that function) to ensure the
cleanup handle is removed prior to removing the socket file.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: dd17ae54-422e-437c-83ce-75f0dd9ff1a8

📥 Commits

Reviewing files that changed from the base of the PR and between 4b1d889 and 75b58b4.

📒 Files selected for processing (5)
  • src/bun.js/api/bun/socket/Listener.zig
  • src/bun.js/api/server.zig
  • src/bun.js/api/server/UnixSocketCleanup.zig
  • src/bun.js/bindings/c-bindings.cpp
  • test/regression/issue/29166.test.ts

Comment thread src/bun.js/api/server/UnixSocketCleanup.zig Outdated
Comment thread test/regression/issue/29166.test.ts Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/api/bun/socket/Listener.zig`:
- Around line 17-20: Rename the Listener field unix_cleanup_handle to a private
field by adding the '#' prefix (i.e., change unix_cleanup_handle:
?UnixSocketCleanup.Handle = null to `#unix_cleanup_handle`:
?UnixSocketCleanup.Handle = null) and update all usages and assignments of that
field to use the private-field syntax (replace references like
this.unix_cleanup_handle with this.#unix_cleanup_handle) to match the project's
Zig private-field convention for Listener and UnixSocketCleanup.Handle.
- Around line 324-329: After bind succeeds in listenUnix(), register the unix
socket cleanup immediately (call UnixSocketCleanup.register with the bound path)
and pass that handle into the Listener during its initialization instead of
setting this.unix_cleanup_handle after initialization; update Listener
construction/initializer to accept an unix_cleanup_handle parameter and store it
(replacing the post-return assignment of unix_cleanup_handle), so the cleanup
handle is captured before the post-bind signal window where SIGINT/SIGTERM could
otherwise miss registering the socket.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7677f428-e9fa-498e-ab37-68d338ca9904

📥 Commits

Reviewing files that changed from the base of the PR and between 75b58b4 and a5818e1.

📒 Files selected for processing (1)
  • src/bun.js/api/bun/socket/Listener.zig

Comment thread src/bun.js/api/bun/socket/Listener.zig
Comment thread src/bun.js/api/bun/socket/Listener.zig Outdated
…anup

- UnixSocketCleanup.register: use bun.handleOom instead of catching and
  returning null. Allocator failures here are OOM-only, and silently
  starting a listener missing from the registry would be indistinguishable
  from the intentional abstract-socket path.

- Server / Listener: rename unix_cleanup_handle to #unix_cleanup_handle
  (private Zig field convention for new fields in src/**/*.zig).

- Listener.listen / Server.onListen: register the path as early as
  possible after the bind succeeds. A signal delivered between bind
  returning and the register call would otherwise leave a stale socket
  file on disk with no registry entry. In Listener, the register now
  happens before the Listener struct is built and the handle is threaded
  into the initial struct literal. In Server, the register is now the
  first statement inside onListen after the null-check.

- Server.stopListening: already reordered to tombstone before unlink so
  a signal landing mid-call can't re-unlink a successor's socket.

- test/29166: drop the broad stderr.not.toContain('Error') assertion —
  it would trip on unrelated subprocess diagnostics. Assert socket file
  is gone first, exit code last.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 11, 2026

No merge conflicts detected when merging into main.

Your branch is good to go!

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/api/server.zig`:
- Around line 1821-1828: The unix-socket cleanup registration in Bun.serve is
happening too late (after onListen()); move the UnixSocketCleanup.register call
out of ThisServer or the post-bind onListen callback and into the lower-level
listen wrapper that performs the successful unix bind (mirror the pattern used
by Listener.zig), so the registry entry is created immediately after bind
succeeds; then propagate the returned cleanup handle into ThisServer (e.g., via
constructor or a setter) and remove the late register call in ThisServer so the
socket path cannot be missed by a SIGINT/SIGTERM between bind and onListen.

In `@src/bun.js/api/server/UnixSocketCleanup.zig`:
- Around line 82-92: The cleanupAll() function (invoked by
Bun__cleanupUnixSocketPaths() in the signal handler) must not call
bun.sys.unlink() because that function logs and is not async-signal-safe;
replace the call to bun.sys.unlink(node.path) with a signal-safe unlink variant
(either call the raw POSIX unlink(2) via a C-import wrapper or introduce a
dedicated no-log helper in src/sys.zig that calls unlink without any
logging/alloc/stdio), keeping the rest of the loop logic (head, node.next,
node.tombstoned.store(true)) unchanged so the tombstone flip remains a no-op on
repeated invocations.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 45b020b3-c4b9-4743-840f-b6d602b55188

📥 Commits

Reviewing files that changed from the base of the PR and between a5818e1 and 16404c7.

📒 Files selected for processing (4)
  • src/bun.js/api/bun/socket/Listener.zig
  • src/bun.js/api/server.zig
  • src/bun.js/api/server/UnixSocketCleanup.zig
  • test/regression/issue/29166.test.ts

Comment thread src/bun.js/api/server.zig
Comment thread src/bun.js/api/server/UnixSocketCleanup.zig Outdated
Comment thread src/bun.js/api/server/UnixSocketCleanup.zig Outdated
robobun added 2 commits April 11, 2026 12:44
The previous review commit reordered stopListening() to call unregister
BEFORE unlink (to close the signal-vs-unlink race), but the docstrings
on `unregister` and the module-level comment still described the old
order. A future maintainer trusting the docstring could swap the call
sites back and reintroduce the race.

- Module doc: `unregister` does NOT call `unlink`. Note that the caller
  owns the `bun.sys.unlink` call and must do it AFTER `unregister`.
- `unregister` docstring: call-order is load-bearing; explain why.
bun.sys.unlink logs via a scoped logger that holds a mutex and writes
to stdio, which is not async-signal-safe. On debug builds where logs
are enabled, a signal delivered while the main thread holds the logger
mutex (or is mid-flush) would deadlock inside cleanupAll().

Switch the signal-handler path to std.posix.system.unlink, which is
a direct libc call on Linux/macOS and is on the POSIX async-signal-safe
list. The behavior is otherwise identical — best-effort, no retry on
EINTR (the process is exiting anyway).
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/bun.js/api/server/UnixSocketCleanup.zig (1)

91-103: ⚠️ Potential issue | 🟠 Major

bun.sys.unlink() may not be async-signal-safe.

cleanupAll() is called from Bun__cleanupUnixSocketPaths() which runs inside the SIGINT/SIGTERM handler (per c-bindings.cpp:435-445). The comment in C++ assumes raw unlink(2) is used, but bun.sys.unlink() may perform logging that is not async-signal-safe, potentially causing deadlock or re-entrancy issues in stdio/allocator state.

Consider using a direct syscall wrapper that bypasses any logging:

// Example: use std.posix.unlink directly for signal safety
_ = std.posix.unlink(node.path);

Or introduce a dedicated no-log helper in src/sys.zig for signal-handler contexts.

#!/bin/bash
# Description: Check if bun.sys.unlink has logging that would make it async-signal-unsafe
# Expected: Find logging calls (Output, log, print) in the unlink implementation

rg -n -A 10 -B 2 'pub fn unlink' src/sys.zig | head -50
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/api/server/UnixSocketCleanup.zig` around lines 91 - 103,
cleanupAll() is calling bun.sys.unlink() from a signal handler (via
Bun__cleanupUnixSocketPaths) which may perform non-async-signal-safe work
(logging/allocations); replace the call with a direct async-signal-safe unlink
(e.g., call std.posix.unlink or a new low-level syscall wrapper) or add a
dedicated no-log unlink helper in src/sys.zig and call that from cleanupAll() to
avoid any logging/stdio/allocator usage; ensure you reference cleanupAll,
bun.sys.unlink, and Bun__cleanupUnixSocketPaths when making the change so the
signal-path uses the new safe helper.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/bun.js/api/server/UnixSocketCleanup.zig`:
- Around line 91-103: cleanupAll() is calling bun.sys.unlink() from a signal
handler (via Bun__cleanupUnixSocketPaths) which may perform
non-async-signal-safe work (logging/allocations); replace the call with a direct
async-signal-safe unlink (e.g., call std.posix.unlink or a new low-level syscall
wrapper) or add a dedicated no-log unlink helper in src/sys.zig and call that
from cleanupAll() to avoid any logging/stdio/allocator usage; ensure you
reference cleanupAll, bun.sys.unlink, and Bun__cleanupUnixSocketPaths when
making the change so the signal-path uses the new safe helper.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1b6cf0dd-c397-4eda-85b9-813113e0e91b

📥 Commits

Reviewing files that changed from the base of the PR and between 16404c7 and 66ef866.

📒 Files selected for processing (1)
  • src/bun.js/api/server/UnixSocketCleanup.zig

Comment thread src/bun.js/api/server.zig
Comment thread src/bun.js/api/server/UnixSocketCleanup.zig
@robobun
Copy link
Copy Markdown
Collaborator Author

robobun commented Apr 11, 2026

The darwin-x64 build-bun link step failed with exit -1 (runner killed), but both build-cpp and build-zig passed on the same commit and same target, and darwin-aarch64-build-bun linked fine in 2m40s. Looks like infra flake on the darwin-x64 runner. Pushing an empty commit to retry.

Comment thread src/bun.js/bindings/c-bindings.cpp
Comment thread src/bun.js/api/server/UnixSocketCleanup.zig
robobun added 2 commits April 11, 2026 13:19
Two review fixes:

- Global.Bun__atexit: replace `catch {}` with `bun.handleOom()`. The
  caller is `void` with no signal channel, so silently swallowing OOM
  here would drop the cleanup callback registered for #29166 and
  reintroduce the stale-socket bug on `process.exit()` paths.

- UnixSocketCleanup.cleanupAll: replace the load-then-store tombstone
  guard with `cmpxchgStrong(false, true, .acq_rel, .acquire)`. POSIX
  doesn't block signals during atexit callbacks, so the SIGTERM/SIGINT
  handler can preempt an atexit walk mid-iteration and both paths then
  race to call `unlink(2)` on the same node. The CAS means only one
  caller per node wins and issues the unlink; the other skips. The
  double-unlink outcome was benign in the typical case (ENOENT
  swallowed) but could theoretically remove another process's
  newly-bound socket in the tight window between the two unlinks.
@robobun
Copy link
Copy Markdown
Collaborator Author

robobun commented Apr 11, 2026

CI build-cpp is flaking on different arches each run (1m33s failures on darwin-aarch64 / linux-aarch64, while linux-x64-build-cpp passes cleanly in 7m53s on the same commit). Latest commit is an empty retry — code didn't change between 443be66 and 1782442. Looks like dep-fetch / runner setup flake, not a code issue. Not spamming more retry commits; the gate and human review will catch anything real.

@robobun
Copy link
Copy Markdown
Collaborator Author

robobun commented Apr 11, 2026

Build #45108: all 36 build jobs passed, 15/17 test jobs passed. The only two failures are darwin-13-aarch64-test-bun and darwin-14-aarch64-test-bun with status Expired — buildkite queue timeout waiting for a macOS aarch64 agent to pick up the job, not a test failure. Every test lane that got a runner (including darwin-13/14 x64, all linux/windows aarch64 lanes) passed. Not pushing further retry commits; this needs more darwin-aarch64 test capacity, not a code change.

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — all previously raised concerns (async-signal-safety, atomic tombstone CAS, Bun__atexit OOM handling, docstring accuracy) have been addressed in subsequent commits.

Extended reasoning...

Overview

This PR adds a process-global, wait-free linked list () tracking unix socket paths owned by /, and wires cleanup into the SIGTERM/SIGINT signal handler and the / hook. It touches , , , , and adds a new module and regression test.

Security Risks

No security-sensitive code is touched. The signal handler uses only async-signal-safe operations (raw via , a wait-free list walk with no allocation or locking). The cleanup is best-effort and only removes socket files this process owns.

Level of Scrutiny

This is production-quality systems code touching signal handlers and process lifecycle, which warrants careful review. However, the implementation is well-reasoned, the previous review rounds were thorough, and all raised concerns have been addressed: OOM handling uses throughout, the tombstone uses to prevent duplicate unlinks, and the async-signal-safety concern (using which logs and takes a mutex) was fixed by switching to the raw syscall. The author also correctly justified keeping the registration in (it is invoked synchronously on the same call stack as the C++ ).

Other Factors

The regression test covers all four scenarios (SIGTERM, SIGINT, process.exit, Bun.listen). No bugs were found by the automated bug-hunting system. The existing unix-socket-unlink test suite (9 tests) should continue to pass. CI passed on all lanes that got a runner; the two failures are infra queue timeouts, not test failures.

@robobun
Copy link
Copy Markdown
Collaborator Author

robobun commented Apr 11, 2026

Build #45157: 2 real test failures, both unrelated to this PR and both intermittent:

  • test/js/web/fetch/fetch.upgrade.test.ts on ubuntu-25.04-aarch64 and debian-13-aarch64 — websocket upgrade test on TCP port: 0. My onListen changes are guarded by if (this.config.address == .unix) so they are a no-op for TCP servers. Same test passed on linux aarch64 in build #45108 with identical code (retry commits are empty).
  • test/bundler/bundler_defer.test.ts on windows-2019-x64-baseline — unrelated bundler test. Already auto-retried on buildkite and now showing SUCCESS.

Plus the usual darwin-13/14 aarch64 Expired queue-timeouts. Not pushing another retry commit — the flake pattern points at aarch64 test infra on this repo, not my code.

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — all prior review concerns resolved across the follow-up commits.

Extended reasoning...

Overview

Adds a process-global wait-free linked list (UnixSocketCleanup.zig) tracking active unix-socket paths, wired into Bun.serve and Bun.listen lifecycles. Cleanup is triggered from both an atexit hook (covering process.exit()) and the SIGTERM/SIGINT handler installed unconditionally in bun_initialize_process.

Security risks

None. The cleanup removes only paths the current process registered; abstract sockets and Windows named pipes are excluded. Signal-handler safety is achieved via raw std.posix.system.unlink (POSIX async-signal-safe) and lock-free list traversal.

Level of scrutiny

This touches signal handling, process exit paths, and a new C-Zig interop module — all areas that warrant careful review. The logic is well-documented, the concurrent-unlink race is closed with cmpxchgStrong, OOM is handled with bun.handleOom at both allocation sites, and the docstrings now accurately describe call ordering.

Other factors

All issues raised in review (OOM swallowing, signal-unsafe bun.sys.unlink, incorrect docstring, tombstone race, catch {} in Bun__atexit) were fixed in subsequent commits before this approval. The regression test exercises SIGTERM, SIGINT, and process.exit(0) paths for both Bun.serve and Bun.listen. Build failures in CI are unrelated flakes on aarch64 infra.

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — all review issues from multiple passes have been addressed.

Extended reasoning...

Overview

Fixes the EADDRINUSE regression introduced by #28798: unix socket files created by Bun.serve({ unix }) and Bun.listen({ unix }) are now unlinked on SIGTERM/SIGINT (via a signal handler) and on process.exit() / end-of-script (via an atexit hook). The core mechanism is a new UnixSocketCleanup.zig module: an append-only, wait-free linked list of live paths walked from both contexts.

Security risks

None identified. The cleanup is best-effort (errors swallowed on the termination path) and uses only async-signal-safe operations in the signal handler. The tombstone-before-unlink ordering prevents the cleanup walk from re-unlinking a path that another process has already re-bound. Abstract sockets (Linux path[0] == 0) and Windows named pipes are correctly skipped.

Level of scrutiny

High — the PR touches signal handling, process lifecycle (atexit hooks), and C/C++ bindings. It received thorough iterative review: CodeRabbit flagged several issues (OOM handling, async-signal-safety, CAS atomicity), I independently found the same issues plus a docstring inaccuracy, and the author (robobun) addressed every point across five commits (16404c7, e91d85c, 66ef866, 443be66).

Other factors

The only open item from the bug-hunter is a Nit: the four regression tests use test.skipIf(isWindows) without .concurrent, contrary to the CLAUDE.md convention for tests that spawn subprocesses. Each test uses a unique temp directory with no shared state, so they could safely run concurrently. This is a style/performance issue only — the tests are correct as written. It does not warrant blocking the fix.

CI shows only infrastructure flakes (darwin-aarch64 queue timeouts, unrelated bundler/fetch test intermittents); no test failures attributable to this PR.

Comment on lines +55 to +75
test.skipIf(isWindows)("#29166 Bun.serve(unix) unlinks socket file on SIGTERM", async () => {
using dir = tempDir("issue-29166-serve-sigterm", {});
const sock = join(String(dir), "EADDRINUSE.sock");

await spawnChildAndKill(sock, "serve", "SIGTERM");

// The cleanup walk should have removed the socket file. (Before the fix
// this was still present and the next run hit EADDRINUSE.)
expect(existsSync(sock)).toBe(false);

// Re-running the exact same script must succeed — this was the user's
// original reproducer.
await spawnChildAndKill(sock, "serve", "SIGTERM");
expect(existsSync(sock)).toBe(false);
});

test.skipIf(isWindows)("#29166 Bun.serve(unix) unlinks socket file on SIGINT", async () => {
using dir = tempDir("issue-29166-serve-sigint", {});
const sock = join(String(dir), "EADDRINUSE.sock");

await spawnChildAndKill(sock, "serve", "SIGINT");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The four tests in 29166.test.ts each spawn Bun subprocesses and write to unique temp directories, but none are marked test.concurrent, violating the test/CLAUDE.md convention that says to prefer concurrent tests when multiple tests in the same file spawn processes or write files. The fix is trivial: use test.concurrent.skipIf(isWindows) (already used elsewhere in the codebase, e.g. test/cli/install/symlink-path-traversal.test.ts).

Extended reasoning...

What the violation is: test/CLAUDE.md explicitly states "Prefer concurrent tests over sequential tests: When multiple tests in the same file spawn processes or write files, make them concurrent with test.concurrent or describe.concurrent unless it is very difficult to make them concurrent." All four tests in 29166.test.ts spawn Bun subprocesses, yet none are marked test.concurrent or wrapped in a describe.concurrent block.

The four tests and their isolation: Each test uses a distinct tempDir call with a unique name (issue-29166-serve-sigterm, issue-29166-serve-sigint, issue-29166-listen-sigterm, issue-29166-serve-exit), so there is no shared state between them. The only sequential dependency is internal to the SIGTERM test, which calls spawnChildAndKill twice in sequence within a single test body — that is sequential within that one test, not a cross-test dependency. All four tests can run concurrently with each other.

Why the refutation is not persuasive: The refutation claims there is no established pattern of combining test.skipIf(isWindows) with .concurrent. But three verifiers independently found that describe.concurrent.skipIf is already used in the codebase (test/cli/install/symlink-path-traversal.test.ts and test/js/bun/s3/s3.test.ts), so the combination is supported and precedented. The fix is genuinely trivial.

Other regression tests follow the convention: Tests such as 28948.test.ts, 03216.test.ts, and 03830.test.ts in the same directory use test.concurrent or describe.concurrent for the exact same pattern (spawning subprocesses in unique temp dirs).

Impact: No functional bug — the tests work correctly as written. The only consequence is that the four tests run sequentially when they could run concurrently, unnecessarily slowing the test suite.

Proof via step-by-step example: Each of the four tests takes ~1-2s to spawn, kill, and assert. Running sequentially: ~6-8s total. Running concurrently: ~2s. No shared socket paths (each uses a distinct temp directory), so there is no EADDRINUSE risk between tests. The two sequential spawnChildAndKill calls within the SIGTERM test are internal to that single test and are unaffected by making the test itself concurrent with the other three.

Fix: Change each test.skipIf(isWindows) to test.concurrent.skipIf(isWindows), or wrap all four in a single describe.concurrent.skipIf(isWindows) block. Either approach matches existing codebase patterns.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bun.serve does not clean up the unix socket file after closing

1 participant