Skip to content

fix(url): reject malformed percent-encoding in fileURLToPath#29176

Open
robobun wants to merge 5 commits intomainfrom
farm/ec839e12/validate-fileurltopath-percent-encoding
Open

fix(url): reject malformed percent-encoding in fileURLToPath#29176
robobun wants to merge 5 commits intomainfrom
farm/ec839e12/validate-fileurltopath-percent-encoding

Conversation

@robobun
Copy link
Copy Markdown
Collaborator

@robobun robobun commented Apr 11, 2026

Fixes #29174.

Reproduction

import { fileURLToPath } from 'node:url';
fileURLToPath('file:///tmp/%%20users.txt');

Node.js: throws URIError: URI malformed.
Bun (before): silently returned /tmp/% users.txt.

Cause

functionFileURLToPath in src/bun.js/bindings/BunObject.cpp was handing the URL pathname to WebKit's WTF::URL::fileSystemPath(), which uses a lenient percent-decoder that treats invalid sequences as literal characters. Node, in contrast, implements fileURLToPath via decodeURIComponent(pathname) which throws URIError: URI malformed for any % not followed by exactly two hex digits.

Fix

Before the call to url.fileSystemPath(), iterate the pathname and throw ERR_INVALID_URI (a URIError with message URI malformed, matching Node) if any % is not followed by two hex digits. The existing encoded-slash check (%2F, %5C) still runs first, preserving Node's error precedence.

Verification

test/regression/issue/29174.test.ts — 16 cases covering:

  • malformed sequences: %%20, %GG, %2., %., lone trailing %, lone /%
  • valid encoding still decodes: %20, %7E
  • unencoded paths untouched
  • encoded slash still throws ERR_INVALID_FILE_URL_PATH (unchanged)

Each case was cross-checked against Node.js — same error name (URIError), code, and message.

fileURLToPath was delegating to WebKit's lenient percent-decoder, which
silently interpreted malformed sequences like `%%20` as a literal `%`
followed by a space. Node.js throws `URIError: URI malformed` for any
`%` not followed by two hex digits.

Validate the URL pathname strictly before handing it to `fileSystemPath()`,
matching Node's `decodeURIComponent` semantics.

Fixes #29174
@robobun
Copy link
Copy Markdown
Collaborator Author

robobun commented Apr 11, 2026

Updated 12:22 PM PT - Apr 11th, 2026

@robobun, your commit 60792ec has 2 failures in Build #45134 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 29176

That installs a local version of the PR into your bun-29176 executable, so you can run:

bun-29176 --bun

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 11, 2026

Walkthrough

Adds strict validation for percent-encoded sequences in file:// URL pathnames: every % must be followed by two ASCII hex digits and decoded bytes must be valid UTF‑8. On invalid sequences the function throws URI malformed (ERR_INVALID_URI) and returns before converting the URL to a filesystem path. Includes regression tests.

Changes

Cohort / File(s) Summary
URL pathname validation
src/bun.js/bindings/BunObject.cpp
Added #include <wtf/ASCIICType.h>; added strict percent-sequence validator (every % must be followed by two ASCII hex digits), percent-decode to a byte buffer, validate UTF‑8 via simdutf::validate_utf8, and throw ERR_INVALID_URI (URI malformed) on failures before calling url.fileSystemPath().
Regression tests
test/regression/issue/29174.test.ts
New test suite asserting fileURLToPath throws URIError with code: "ERR_INVALID_URI" and message URI malformed for malformed percent-encodings (invalid hex, truncated %, %%20, and invalid UTF‑8), plus platform-scoped assertions for valid decodings and %2F behavior.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix(url): reject malformed percent-encoding in fileURLToPath' clearly and concisely describes the main change: adding validation to reject malformed percent-encoding sequences in the fileURLToPath function.
Description check ✅ Passed The PR description comprehensively covers the template requirements: explains what the PR does (fixes malformed percent-encoding validation), provides reproduction steps, identifies the root cause, describes the fix, and includes verification details with 16 test cases.
Linked Issues check ✅ Passed The PR fully addresses issue #29174 by implementing strict percent-encoding validation that throws URIError for malformed sequences, matching Node.js behavior. Changes include validation logic in BunObject.cpp and comprehensive regression tests.
Out of Scope Changes check ✅ Passed All changes are directly related to the objective of fixing #29174: the C++ implementation adds percent-encoding validation, and the test file validates the fix with 16 regression test cases. No unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/regression/issue/29174.test.ts`:
- Around line 52-60: The tests hardcode POSIX paths; on Windows fileURLToPath
returns Windows-formatted paths, so update the three assertions that call
fileURLToPath to branch by platform: if process.platform === "win32" compute
expected values with path.win32.join("\\", "tmp", ...) (e.g.
path.win32.join("\\", "tmp", " space.txt"), path.win32.join("\\", "tmp",
"a~b.txt"), path.win32.join("\\", "tmp", "~.txt")), otherwise keep the POSIX
literals ("/tmp/..."); apply the same platform branch for the plain.txt test,
and import/use the path module where needed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 20291d0f-f4df-4767-a5d2-29949d2389c2

📥 Commits

Reviewing files that changed from the base of the PR and between 4b1d889 and 74b524b.

📒 Files selected for processing (2)
  • src/bun.js/bindings/BunObject.cpp
  • test/regression/issue/29174.test.ts

Comment thread test/regression/issue/29174.test.ts Outdated
Addresses coderabbit review — on Windows, fileURLToPath('/tmp/...')
rejects the path as non-absolute. Skip the posix assertions on Windows
and mirror them with drive-letter variants under test.if(isWindows).
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/regression/issue/29174.test.ts`:
- Around line 2-6: Remove the multi-line bug-history prose comment block that
follows the single-line issue URL at the top of the test file and leave only the
single-line issue URL comment; in other words, delete the descriptive lines
about Bun's fileURLToPath/WebKit decoding while keeping the one-line issue
reference so the test header follows the standard single-line format used in
test/regression/issue/*.test.ts.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 50d84c21-9624-4f21-a173-8a405170b1de

📥 Commits

Reviewing files that changed from the base of the PR and between 74b524b and 213ed81.

📒 Files selected for processing (1)
  • test/regression/issue/29174.test.ts

Comment thread test/regression/issue/29174.test.ts Outdated
Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — small, correct fix with good test coverage.

Extended reasoning...

Overview

Adds a strict percent-encoding validation pass in functionFileURLToPath (BunObject.cpp) before delegating to WTF's lenient fileSystemPath(). Only two files changed: the C++ fix and a regression test.

Security risks

None. The change makes validation stricter, not looser.

Level of scrutiny

Low — the logic is a simple linear scan with short-circuit bounds checking. The condition i + 2 >= length correctly detects trailing % and single-char sequences, and C++ short-circuit evaluation prevents any out-of-bounds access on p[i+1]/p[i+2].

Other factors

No bugs found by the bug hunting system. The fix restores Node.js compatibility for a well-defined edge case, and the 16 test cases cover malformed sequences, valid encoding, and the unchanged encoded-slash error path.

Use explicit unsigned loop counter matching StringView::length()'s return
type, and bind the two hex-digit chars to named locals before calling
isASCIIHexDigit. The previous size_t loop + templated call through
operator[] failed to compile on Windows baseline.
Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — simple, correct fix with comprehensive test coverage; the multi-line comment nit is already flagged inline.

Extended reasoning...

Overview

The PR modifies in to reject malformed percent-encoded sequences. It adds a single loop that scans the URL pathname for characters not followed by exactly two ASCII hex digits, throwing before delegating to WTF's lenient decoder. A new regression test () covers 16 cases: malformed inputs, valid encodings, platform variants, and the existing encoded-slash check.

Security risks

None. This is a correctness/compatibility fix that makes Bun stricter — it can only cause previously-silent bad input to throw, which is the desired behavior.

Level of scrutiny

Low. The change is small (one loop, ~10 lines), touches a single well-isolated code path, uses existing helpers (), and has no side effects on the happy path. The boundary check correctly handles all edge cases (lone trailing , single-char remainder).

Other factors

The Windows path assertion issue raised by CodeRabbit was addressed in commit 213ed81 using / . The only remaining open item is the multi-line bug-history prose comment (lines 2–6 of the test file), which is a trivial style nit already posted as an inline comment — it has no functional impact.

Comment thread test/regression/issue/29174.test.ts Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/bindings/BunObject.cpp`:
- Around line 933-951: The current loop only checks percent-encoding hex digits
but not UTF-8 validity; update the logic in BunObject.cpp (around the
fileURLToPath-related validation that inspects p and uses isASCIIHexDigit) to
either (A) perform UTF-8 sequence validation on the percent-decoded bytes and
throw ERR_INVALID_URI ("URI malformed") for invalid UTF-8 sequences, or (B)
after calling WTF::URL::fileSystemPath(), check its return (e.g., null/empty or
an error indicator) and throw ERR_INVALID_URI if it indicates invalid UTF-8;
locate the code that builds/uses p and the call to WTF::URL::fileSystemPath()
and add the chosen validation/ null-check to ensure inputs like "%E0%A4%A" are
rejected consistently with Node's decodeURIComponent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7b65e470-272b-4452-95b1-0b2b0723d508

📥 Commits

Reviewing files that changed from the base of the PR and between 213ed81 and 3303760.

📒 Files selected for processing (1)
  • src/bun.js/bindings/BunObject.cpp

Comment thread src/bun.js/bindings/BunObject.cpp
robobun added 2 commits April 11, 2026 15:54
Node's decodeURIComponent rejects percent sequences that are syntactically
valid but decode to invalid UTF-8 (e.g. truncated multi-byte sequences,
overlong encodings, lone continuation bytes). WTF's fileSystemPath decoder
is lenient on both counts, so decode the pathname bytes ourselves in the
same pass that validates the hex syntax and run simdutf::validate_utf8
over the result.
Per repo convention in test/regression/issue/*.test.ts, only the
single-line issue URL comment should live at the top of the file.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/regression/issue/29174.test.ts`:
- Around line 81-88: Extend the existing test "encoded slash still throws
ERR_INVALID_FILE_URL_PATH (unchanged)" to assert that Windows-encoded
backslashes (%5C and %5c) also throw the same error by calling
fileURLToPath("file:///tmp/%5Chack") and fileURLToPath("file:///tmp/%5chack")
when running on Windows; wrap those two additional expect(...) assertions in a
platform check (process.platform === "win32") so they only run on Windows, and
keep the same expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" })
matcher as the existing assertions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 169132c8-9f03-4963-b9cd-004a2a7e3e57

📥 Commits

Reviewing files that changed from the base of the PR and between 3303760 and 60792ec.

📒 Files selected for processing (2)
  • src/bun.js/bindings/BunObject.cpp
  • test/regression/issue/29174.test.ts

Comment on lines +81 to +88
test("encoded slash still throws ERR_INVALID_FILE_URL_PATH (unchanged)", () => {
expect(() => fileURLToPath("file:///tmp/%2Fhack")).toThrow(
expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }),
);
expect(() => fileURLToPath("file:///tmp/%2fhack")).toThrow(
expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }),
);
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Add Windows %5C/%5c encoded-backslash assertions for completeness.

This test currently validates encoded / only. Adding encoded \ checks on Windows would fully lock in the unchanged platform-specific behavior.

Suggested test extension
   test("encoded slash still throws ERR_INVALID_FILE_URL_PATH (unchanged)", () => {
     expect(() => fileURLToPath("file:///tmp/%2Fhack")).toThrow(
       expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }),
     );
     expect(() => fileURLToPath("file:///tmp/%2fhack")).toThrow(
       expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }),
     );
   });
+
+  test.if(isWindows)("encoded backslash still throws ERR_INVALID_FILE_URL_PATH (unchanged)", () => {
+    expect(() => fileURLToPath("file:///C:/tmp/%5Chack")).toThrow(
+      expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }),
+    );
+    expect(() => fileURLToPath("file:///C:/tmp/%5chack")).toThrow(
+      expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }),
+    );
+  });
 });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
test("encoded slash still throws ERR_INVALID_FILE_URL_PATH (unchanged)", () => {
expect(() => fileURLToPath("file:///tmp/%2Fhack")).toThrow(
expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }),
);
expect(() => fileURLToPath("file:///tmp/%2fhack")).toThrow(
expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }),
);
});
test("encoded slash still throws ERR_INVALID_FILE_URL_PATH (unchanged)", () => {
expect(() => fileURLToPath("file:///tmp/%2Fhack")).toThrow(
expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }),
);
expect(() => fileURLToPath("file:///tmp/%2fhack")).toThrow(
expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }),
);
});
test.if(isWindows)("encoded backslash still throws ERR_INVALID_FILE_URL_PATH (unchanged)", () => {
expect(() => fileURLToPath("file:///C:/tmp/%5Chack")).toThrow(
expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }),
);
expect(() => fileURLToPath("file:///C:/tmp/%5chack")).toThrow(
expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }),
);
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/regression/issue/29174.test.ts` around lines 81 - 88, Extend the
existing test "encoded slash still throws ERR_INVALID_FILE_URL_PATH (unchanged)"
to assert that Windows-encoded backslashes (%5C and %5c) also throw the same
error by calling fileURLToPath("file:///tmp/%5Chack") and
fileURLToPath("file:///tmp/%5chack") when running on Windows; wrap those two
additional expect(...) assertions in a platform check (process.platform ===
"win32") so they only run on Windows, and keep the same
expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }) matcher as the
existing assertions.

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — all review feedback addressed.

Extended reasoning...

Overview

This PR fixes fileURLToPath in src/bun.js/bindings/BunObject.cpp to reject malformed percent-encodings (invalid hex digits, truncated sequences) and invalid UTF-8 byte sequences, matching Node.js behavior. Changes touch two files: the C++ binding and a new regression test.

Security risks

None. The change adds stricter input validation (rejecting bad input rather than accepting it), which is strictly safer than the prior lenient behavior.

Level of scrutiny

Moderate. The C++ change adds a pathname-scan loop and a simdutf UTF-8 validation call before delegating to WTF::URL::fileSystemPath(). The logic is straightforward and follows the existing error-throwing pattern in the file. No memory safety concerns — Vector<uint8_t, 256> with reserveInitialCapacity is a standard WTF pattern.

Other factors

Three rounds of reviewer feedback were raised and all addressed: (1) Windows path assertions scoped with test.skipIf/test.if (commit 213ed81), (2) UTF-8 validation added via simdutf (commit 47a7aee), (3) multi-line prose comment trimmed to single-line issue URL (commit 60792ec). No bugs were found by the automated bug hunting system. The remaining CodeRabbit comment (add %5C Windows backslash tests) is a trivial enhancement, not a correctness issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Inconsistent validation of percent-encoded file URLs compared to Node.js

1 participant