fix(url): reject malformed percent-encoding in fileURLToPath#29176
fix(url): reject malformed percent-encoding in fileURLToPath#29176
Conversation
fileURLToPath was delegating to WebKit's lenient percent-decoder, which silently interpreted malformed sequences like `%%20` as a literal `%` followed by a space. Node.js throws `URIError: URI malformed` for any `%` not followed by two hex digits. Validate the URL pathname strictly before handing it to `fileSystemPath()`, matching Node's `decodeURIComponent` semantics. Fixes #29174
|
Updated 12:22 PM PT - Apr 11th, 2026
❌ @robobun, your commit 60792ec has 2 failures in
🧪 To try this PR locally: bunx bun-pr 29176That installs a local version of the PR into your bun-29176 --bun |
WalkthroughAdds strict validation for percent-encoded sequences in file:// URL pathnames: every Changes
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/regression/issue/29174.test.ts`:
- Around line 52-60: The tests hardcode POSIX paths; on Windows fileURLToPath
returns Windows-formatted paths, so update the three assertions that call
fileURLToPath to branch by platform: if process.platform === "win32" compute
expected values with path.win32.join("\\", "tmp", ...) (e.g.
path.win32.join("\\", "tmp", " space.txt"), path.win32.join("\\", "tmp",
"a~b.txt"), path.win32.join("\\", "tmp", "~.txt")), otherwise keep the POSIX
literals ("/tmp/..."); apply the same platform branch for the plain.txt test,
and import/use the path module where needed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 20291d0f-f4df-4767-a5d2-29949d2389c2
📒 Files selected for processing (2)
src/bun.js/bindings/BunObject.cpptest/regression/issue/29174.test.ts
Addresses coderabbit review — on Windows, fileURLToPath('/tmp/...')
rejects the path as non-absolute. Skip the posix assertions on Windows
and mirror them with drive-letter variants under test.if(isWindows).
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/regression/issue/29174.test.ts`:
- Around line 2-6: Remove the multi-line bug-history prose comment block that
follows the single-line issue URL at the top of the test file and leave only the
single-line issue URL comment; in other words, delete the descriptive lines
about Bun's fileURLToPath/WebKit decoding while keeping the one-line issue
reference so the test header follows the standard single-line format used in
test/regression/issue/*.test.ts.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 50d84c21-9624-4f21-a173-8a405170b1de
📒 Files selected for processing (1)
test/regression/issue/29174.test.ts
There was a problem hiding this comment.
LGTM — small, correct fix with good test coverage.
Extended reasoning...
Overview
Adds a strict percent-encoding validation pass in functionFileURLToPath (BunObject.cpp) before delegating to WTF's lenient fileSystemPath(). Only two files changed: the C++ fix and a regression test.
Security risks
None. The change makes validation stricter, not looser.
Level of scrutiny
Low — the logic is a simple linear scan with short-circuit bounds checking. The condition i + 2 >= length correctly detects trailing % and single-char sequences, and C++ short-circuit evaluation prevents any out-of-bounds access on p[i+1]/p[i+2].
Other factors
No bugs found by the bug hunting system. The fix restores Node.js compatibility for a well-defined edge case, and the 16 test cases cover malformed sequences, valid encoding, and the unchanged encoded-slash error path.
Use explicit unsigned loop counter matching StringView::length()'s return type, and bind the two hex-digit chars to named locals before calling isASCIIHexDigit. The previous size_t loop + templated call through operator[] failed to compile on Windows baseline.
There was a problem hiding this comment.
LGTM — simple, correct fix with comprehensive test coverage; the multi-line comment nit is already flagged inline.
Extended reasoning...
Overview
The PR modifies in to reject malformed percent-encoded sequences. It adds a single loop that scans the URL pathname for characters not followed by exactly two ASCII hex digits, throwing before delegating to WTF's lenient decoder. A new regression test () covers 16 cases: malformed inputs, valid encodings, platform variants, and the existing encoded-slash check.
Security risks
None. This is a correctness/compatibility fix that makes Bun stricter — it can only cause previously-silent bad input to throw, which is the desired behavior.
Level of scrutiny
Low. The change is small (one loop, ~10 lines), touches a single well-isolated code path, uses existing helpers (), and has no side effects on the happy path. The boundary check correctly handles all edge cases (lone trailing , single-char remainder).
Other factors
The Windows path assertion issue raised by CodeRabbit was addressed in commit 213ed81 using / . The only remaining open item is the multi-line bug-history prose comment (lines 2–6 of the test file), which is a trivial style nit already posted as an inline comment — it has no functional impact.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/bun.js/bindings/BunObject.cpp`:
- Around line 933-951: The current loop only checks percent-encoding hex digits
but not UTF-8 validity; update the logic in BunObject.cpp (around the
fileURLToPath-related validation that inspects p and uses isASCIIHexDigit) to
either (A) perform UTF-8 sequence validation on the percent-decoded bytes and
throw ERR_INVALID_URI ("URI malformed") for invalid UTF-8 sequences, or (B)
after calling WTF::URL::fileSystemPath(), check its return (e.g., null/empty or
an error indicator) and throw ERR_INVALID_URI if it indicates invalid UTF-8;
locate the code that builds/uses p and the call to WTF::URL::fileSystemPath()
and add the chosen validation/ null-check to ensure inputs like "%E0%A4%A" are
rejected consistently with Node's decodeURIComponent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7b65e470-272b-4452-95b1-0b2b0723d508
📒 Files selected for processing (1)
src/bun.js/bindings/BunObject.cpp
Node's decodeURIComponent rejects percent sequences that are syntactically valid but decode to invalid UTF-8 (e.g. truncated multi-byte sequences, overlong encodings, lone continuation bytes). WTF's fileSystemPath decoder is lenient on both counts, so decode the pathname bytes ourselves in the same pass that validates the hex syntax and run simdutf::validate_utf8 over the result.
Per repo convention in test/regression/issue/*.test.ts, only the single-line issue URL comment should live at the top of the file.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/regression/issue/29174.test.ts`:
- Around line 81-88: Extend the existing test "encoded slash still throws
ERR_INVALID_FILE_URL_PATH (unchanged)" to assert that Windows-encoded
backslashes (%5C and %5c) also throw the same error by calling
fileURLToPath("file:///tmp/%5Chack") and fileURLToPath("file:///tmp/%5chack")
when running on Windows; wrap those two additional expect(...) assertions in a
platform check (process.platform === "win32") so they only run on Windows, and
keep the same expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" })
matcher as the existing assertions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 169132c8-9f03-4963-b9cd-004a2a7e3e57
📒 Files selected for processing (2)
src/bun.js/bindings/BunObject.cpptest/regression/issue/29174.test.ts
| test("encoded slash still throws ERR_INVALID_FILE_URL_PATH (unchanged)", () => { | ||
| expect(() => fileURLToPath("file:///tmp/%2Fhack")).toThrow( | ||
| expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }), | ||
| ); | ||
| expect(() => fileURLToPath("file:///tmp/%2fhack")).toThrow( | ||
| expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }), | ||
| ); | ||
| }); |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Add Windows %5C/%5c encoded-backslash assertions for completeness.
This test currently validates encoded / only. Adding encoded \ checks on Windows would fully lock in the unchanged platform-specific behavior.
Suggested test extension
test("encoded slash still throws ERR_INVALID_FILE_URL_PATH (unchanged)", () => {
expect(() => fileURLToPath("file:///tmp/%2Fhack")).toThrow(
expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }),
);
expect(() => fileURLToPath("file:///tmp/%2fhack")).toThrow(
expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }),
);
});
+
+ test.if(isWindows)("encoded backslash still throws ERR_INVALID_FILE_URL_PATH (unchanged)", () => {
+ expect(() => fileURLToPath("file:///C:/tmp/%5Chack")).toThrow(
+ expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }),
+ );
+ expect(() => fileURLToPath("file:///C:/tmp/%5chack")).toThrow(
+ expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }),
+ );
+ });
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| test("encoded slash still throws ERR_INVALID_FILE_URL_PATH (unchanged)", () => { | |
| expect(() => fileURLToPath("file:///tmp/%2Fhack")).toThrow( | |
| expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }), | |
| ); | |
| expect(() => fileURLToPath("file:///tmp/%2fhack")).toThrow( | |
| expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }), | |
| ); | |
| }); | |
| test("encoded slash still throws ERR_INVALID_FILE_URL_PATH (unchanged)", () => { | |
| expect(() => fileURLToPath("file:///tmp/%2Fhack")).toThrow( | |
| expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }), | |
| ); | |
| expect(() => fileURLToPath("file:///tmp/%2fhack")).toThrow( | |
| expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }), | |
| ); | |
| }); | |
| test.if(isWindows)("encoded backslash still throws ERR_INVALID_FILE_URL_PATH (unchanged)", () => { | |
| expect(() => fileURLToPath("file:///C:/tmp/%5Chack")).toThrow( | |
| expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }), | |
| ); | |
| expect(() => fileURLToPath("file:///C:/tmp/%5chack")).toThrow( | |
| expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }), | |
| ); | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/regression/issue/29174.test.ts` around lines 81 - 88, Extend the
existing test "encoded slash still throws ERR_INVALID_FILE_URL_PATH (unchanged)"
to assert that Windows-encoded backslashes (%5C and %5c) also throw the same
error by calling fileURLToPath("file:///tmp/%5Chack") and
fileURLToPath("file:///tmp/%5chack") when running on Windows; wrap those two
additional expect(...) assertions in a platform check (process.platform ===
"win32") so they only run on Windows, and keep the same
expect.objectContaining({ code: "ERR_INVALID_FILE_URL_PATH" }) matcher as the
existing assertions.
There was a problem hiding this comment.
LGTM — all review feedback addressed.
Extended reasoning...
Overview
This PR fixes fileURLToPath in src/bun.js/bindings/BunObject.cpp to reject malformed percent-encodings (invalid hex digits, truncated sequences) and invalid UTF-8 byte sequences, matching Node.js behavior. Changes touch two files: the C++ binding and a new regression test.
Security risks
None. The change adds stricter input validation (rejecting bad input rather than accepting it), which is strictly safer than the prior lenient behavior.
Level of scrutiny
Moderate. The C++ change adds a pathname-scan loop and a simdutf UTF-8 validation call before delegating to WTF::URL::fileSystemPath(). The logic is straightforward and follows the existing error-throwing pattern in the file. No memory safety concerns — Vector<uint8_t, 256> with reserveInitialCapacity is a standard WTF pattern.
Other factors
Three rounds of reviewer feedback were raised and all addressed: (1) Windows path assertions scoped with test.skipIf/test.if (commit 213ed81), (2) UTF-8 validation added via simdutf (commit 47a7aee), (3) multi-line prose comment trimmed to single-line issue URL (commit 60792ec). No bugs were found by the automated bug hunting system. The remaining CodeRabbit comment (add %5C Windows backslash tests) is a trivial enhancement, not a correctness issue.
Fixes #29174.
Reproduction
Node.js: throws
URIError: URI malformed.Bun (before): silently returned
/tmp/% users.txt.Cause
functionFileURLToPathinsrc/bun.js/bindings/BunObject.cppwas handing the URL pathname to WebKit'sWTF::URL::fileSystemPath(), which uses a lenient percent-decoder that treats invalid sequences as literal characters. Node, in contrast, implementsfileURLToPathviadecodeURIComponent(pathname)which throwsURIError: URI malformedfor any%not followed by exactly two hex digits.Fix
Before the call to
url.fileSystemPath(), iterate the pathname and throwERR_INVALID_URI(aURIErrorwith messageURI malformed, matching Node) if any%is not followed by two hex digits. The existing encoded-slash check (%2F,%5C) still runs first, preserving Node's error precedence.Verification
test/regression/issue/29174.test.ts— 16 cases covering:%%20,%GG,%2.,%., lone trailing%, lone/%%20,%7EERR_INVALID_FILE_URL_PATH(unchanged)Each case was cross-checked against Node.js — same error name (
URIError),code, andmessage.