Skip to content

Fix path traversal in local file fetch#84

Merged
campoy merged 5 commits intomasterfrom
fix/path-traversal
Apr 11, 2026
Merged

Fix path traversal in local file fetch#84
campoy merged 5 commits intomasterfrom
fix/path-traversal

Conversation

@campoy
Copy link
Copy Markdown
Owner

@campoy campoy commented Apr 11, 2026

Summary

  • Extract a checkPath function in embedmd/embedmd.go that verifies a local path stays within the base directory using filepath.Abs on both sides of the comparison.
  • Call it from embedder.runCommand before delegating to e.Fetch, so the check applies to every embed directive regardless of which Fetcher implementation is in use — including custom fetchers supplied via WithFetcher.
  • The check is intentionally not inside fetcher.Fetch in content.go. Placing it there would only protect the default fetcher; any WithFetcher(...) caller would bypass it. Centralising it in runCommand means it runs once for every command, unconditionally.
  • mixedContentProvider.Fetch in the test file no longer needs to duplicate the logic either.

A directive like [embedmd]:# (../../../etc/passwd) previously allowed reading arbitrary files accessible to the running user.

Test plan

  • path traversal rejected when base dir is set../secret.go with a base dir returns an error
  • deeply nested path traversal rejected../../../secret.go with a nested base dir returns an error
  • normal relative path within base dir still workscode.go with a matching base dir succeeds (regression)
  • Integration test passes — embedmd docs.md in sample/ correctly resolves hello.go relative to .
  • All existing tests continue to pass (go test ./...)

Closes #80

Francesc Campoy added 2 commits April 10, 2026 18:19
Verify that the resolved path stays within the base directory before
reading it. A directive like [embedmd]:# (../../../etc/passwd) would
previously read arbitrary files accessible to the running user.

Apply the same bounds check to the test fake so tests exercise the
real restriction.

Closes #80
The original bounds check used filepath.Clean which doesn't expand
relative components, so a base dir of "." produced the prefix "./"
which never matched paths like "hello.go". Use filepath.Abs on both
sides so the comparison always works regardless of how the base dir
is expressed.
Comment thread embedmd/content.go Outdated
Francesc Campoy added 2 commits April 10, 2026 18:30
The bounds check was duplicated between fetcher.Fetch (production) and
mixedContentProvider.Fetch (test fake). Any other Fetcher implementation
would have needed to copy it a third time.

Extract it into checkPath in embedmd.go and call it from runCommand
before delegating to the Fetcher. The check now runs once, centrally,
regardless of which Fetcher is in use.
Copy link
Copy Markdown
Owner Author

@campoy campoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix is sound and well-structured. Here is a detailed assessment.

Correctness of checkPath

The core logic is correct. Appending string(os.PathSeparator) to both sides before the HasPrefix comparison is the right technique — it prevents a directory like /tmp/docs from being fooled by a sibling path like /tmp/docs-evil. Using filepath.Abs (rather than just filepath.Clean) correctly handles relative dir values by anchoring to the real working directory.

One gap worth calling out: the check is skipped when dir == "" (line 100–102). This is the default when no WithBaseDir option is passed. In that case, embedmd resolves paths relative to the process working directory, and a traversal directive like [embedmd]:# (../../etc/passwd) will still read arbitrary files. This is arguably the same behaviour as before the fix (no regression), but it is worth documenting explicitly in the function comment. Something like:

// When dir is empty, no base-directory constraint is in force and any relative
// path is resolved against the process working directory; no check is performed.

Symlinks are not covered. filepath.Abs calls filepath.Clean under the hood; it does not resolve symlinks. A symlink inside the base directory that points outside it will pass the check. Fixing this properly requires filepath.EvalSymlinks, but that comes with its own portability and error-handling costs. This is a known limitation of the approach (not unique to this PR), and it is acceptable to leave it for a follow-up — but it should be documented.

Edge cases

Case Behaviour Correct?
../secret.go with dir="sample" Blocked
../../etc/passwd with dir="sample" Blocked
../../../secret.go with dir="a/b/c" Blocked
Absolute path /etc/passwd with relative dir Allowed — filepath.Join("sample", "/etc/passwd") on Unix strips the leading slash, producing sample/etc/passwd ✓ correct
Absolute path /etc/passwd with absolute dir Allowed — filepath.Join("/tmp/docs", "/etc/passwd") produces /tmp/docs/etc/passwd, safely inside ✓ correct
dir="" + any traversal Allowed (early return) ⚠ acceptable but undocumented
Symlink inside base dir pointing outside Allowed ⚠ known limitation
dir is exact string prefix of escaped path (sam vs sample) Blocked correctly — separator-suffix trick works

Placement of the check in runCommand

The rationale in the PR description is sound. Placing the guard in runCommand rather than in fetcher.Fetch means it is enforced for every Fetcher implementation, including custom ones supplied via WithFetcher. This is the right level of abstraction.

Test quality

Three new table-driven cases are added to TestProcess:

  • path traversal rejected when base dir is set — covers the basic reported case.
  • deeply nested path traversal rejected — covers deeper nesting.
  • normal relative path within base dir still works — regression guard.

These are good, but there is no direct unit test for checkPath itself. Since checkPath is an unexported function containing the entire security logic, a TestCheckPath table-driven test would make it much easier to exhaustively cover edge cases (empty dir, absolute path, the separator-trick, etc.) without going through the full Process pipeline. This is not a blocker but it is a meaningful coverage gap.

One minor observation: the "deeply nested path traversal rejected" test case has no files map. This is intentional and correct — the error fires in checkPath before the fetcher is ever called — but a brief comment to that effect would help future readers.

Duplication / design

The refactor of mixedContentProvider.Fetch is a good cleanup. Renaming path = filepath.Join(...) to resolved := filepath.Join(...) avoids shadowing the parameter, which was a latent readability issue.

The content.go change (collapsing two lines into one) is fine and removes a now-unnecessary intermediate variable.

Error messages

path %q escapes base directory is clear and actionable. The wrapping in runCommand produces could not read ../secret.go: path "../secret.go" escapes base directory — the path appears twice, which is slightly redundant, but it is consistent with how all other errors in runCommand are wrapped and not worth changing in isolation.

Summary

The security fix is correct for the common case and the implementation is clean. Two things are worth addressing before or shortly after merge:

  1. Document the dir == "" case in the checkPath comment — either explicitly accepting that traversal is unconstrained without WithBaseDir, or deciding to treat the process cwd as the implicit base and block traversal unconditionally.
  2. Add a TestCheckPath unit test to give the security-critical function its own focused coverage.

Neither is a blocker for the fix itself. The change is a clear improvement over the status quo.

@campoy campoy merged commit 2cd71c5 into master Apr 11, 2026
3 checks passed
@campoy campoy deleted the fix/path-traversal branch April 11, 2026 02:41
campoy pushed a commit that referenced this pull request Apr 11, 2026
Without a base directory there is no boundary to enforce for path
traversal checks, so any embed directive passed via stdin can read
arbitrary files on the host. Additionally, stdin mode has little
practical utility because local file paths embedded in a document have
no anchor without a known base directory.

This commit removes the stdin-reading path entirely. Calling embedmd
with no file arguments now exits with an error instead of blocking on
stdin. The `stdin` package-level variable and all associated branches
in `embed` are deleted; `TestEmbedStreams` is replaced by
`TestEmbedNoPaths` which verifies the new error behaviour.

Follow-up to #84.
campoy added a commit that referenced this pull request Apr 11, 2026
Without a base directory there is no boundary to enforce for path
traversal checks, so any embed directive passed via stdin can read
arbitrary files on the host. Additionally, stdin mode has little
practical utility because local file paths embedded in a document have
no anchor without a known base directory.

This commit removes the stdin-reading path entirely. Calling embedmd
with no file arguments now exits with an error instead of blocking on
stdin. The `stdin` package-level variable and all associated branches
in `embed` are deleted; `TestEmbedStreams` is replaced by
`TestEmbedNoPaths` which verifies the new error behaviour.

Follow-up to #84.

Co-authored-by: Francesc Campoy <francesc@campoy.cat>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Path traversal: embed directives can read arbitrary files via ../ sequences

1 participant