Conversation
Verify that the resolved path stays within the base directory before reading it. A directive like [embedmd]:# (../../../etc/passwd) would previously read arbitrary files accessible to the running user. Apply the same bounds check to the test fake so tests exercise the real restriction. Closes #80
The original bounds check used filepath.Clean which doesn't expand relative components, so a base dir of "." produced the prefix "./" which never matched paths like "hello.go". Use filepath.Abs on both sides so the comparison always works regardless of how the base dir is expressed.
The bounds check was duplicated between fetcher.Fetch (production) and mixedContentProvider.Fetch (test fake). Any other Fetcher implementation would have needed to copy it a third time. Extract it into checkPath in embedmd.go and call it from runCommand before delegating to the Fetcher. The check now runs once, centrally, regardless of which Fetcher is in use.
campoy
left a comment
There was a problem hiding this comment.
The fix is sound and well-structured. Here is a detailed assessment.
Correctness of checkPath
The core logic is correct. Appending string(os.PathSeparator) to both sides before the HasPrefix comparison is the right technique — it prevents a directory like /tmp/docs from being fooled by a sibling path like /tmp/docs-evil. Using filepath.Abs (rather than just filepath.Clean) correctly handles relative dir values by anchoring to the real working directory.
One gap worth calling out: the check is skipped when dir == "" (line 100–102). This is the default when no WithBaseDir option is passed. In that case, embedmd resolves paths relative to the process working directory, and a traversal directive like [embedmd]:# (../../etc/passwd) will still read arbitrary files. This is arguably the same behaviour as before the fix (no regression), but it is worth documenting explicitly in the function comment. Something like:
// When dir is empty, no base-directory constraint is in force and any relative
// path is resolved against the process working directory; no check is performed.Symlinks are not covered. filepath.Abs calls filepath.Clean under the hood; it does not resolve symlinks. A symlink inside the base directory that points outside it will pass the check. Fixing this properly requires filepath.EvalSymlinks, but that comes with its own portability and error-handling costs. This is a known limitation of the approach (not unique to this PR), and it is acceptable to leave it for a follow-up — but it should be documented.
Edge cases
| Case | Behaviour | Correct? |
|---|---|---|
../secret.go with dir="sample" |
Blocked | ✓ |
../../etc/passwd with dir="sample" |
Blocked | ✓ |
../../../secret.go with dir="a/b/c" |
Blocked | ✓ |
Absolute path /etc/passwd with relative dir |
Allowed — filepath.Join("sample", "/etc/passwd") on Unix strips the leading slash, producing sample/etc/passwd |
✓ correct |
Absolute path /etc/passwd with absolute dir |
Allowed — filepath.Join("/tmp/docs", "/etc/passwd") produces /tmp/docs/etc/passwd, safely inside |
✓ correct |
dir="" + any traversal |
Allowed (early return) | ⚠ acceptable but undocumented |
| Symlink inside base dir pointing outside | Allowed | ⚠ known limitation |
dir is exact string prefix of escaped path (sam vs sample) |
Blocked correctly — separator-suffix trick works | ✓ |
Placement of the check in runCommand
The rationale in the PR description is sound. Placing the guard in runCommand rather than in fetcher.Fetch means it is enforced for every Fetcher implementation, including custom ones supplied via WithFetcher. This is the right level of abstraction.
Test quality
Three new table-driven cases are added to TestProcess:
path traversal rejected when base dir is set— covers the basic reported case.deeply nested path traversal rejected— covers deeper nesting.normal relative path within base dir still works— regression guard.
These are good, but there is no direct unit test for checkPath itself. Since checkPath is an unexported function containing the entire security logic, a TestCheckPath table-driven test would make it much easier to exhaustively cover edge cases (empty dir, absolute path, the separator-trick, etc.) without going through the full Process pipeline. This is not a blocker but it is a meaningful coverage gap.
One minor observation: the "deeply nested path traversal rejected" test case has no files map. This is intentional and correct — the error fires in checkPath before the fetcher is ever called — but a brief comment to that effect would help future readers.
Duplication / design
The refactor of mixedContentProvider.Fetch is a good cleanup. Renaming path = filepath.Join(...) to resolved := filepath.Join(...) avoids shadowing the parameter, which was a latent readability issue.
The content.go change (collapsing two lines into one) is fine and removes a now-unnecessary intermediate variable.
Error messages
path %q escapes base directory is clear and actionable. The wrapping in runCommand produces could not read ../secret.go: path "../secret.go" escapes base directory — the path appears twice, which is slightly redundant, but it is consistent with how all other errors in runCommand are wrapped and not worth changing in isolation.
Summary
The security fix is correct for the common case and the implementation is clean. Two things are worth addressing before or shortly after merge:
- Document the
dir == ""case in thecheckPathcomment — either explicitly accepting that traversal is unconstrained withoutWithBaseDir, or deciding to treat the process cwd as the implicit base and block traversal unconditionally. - Add a
TestCheckPathunit test to give the security-critical function its own focused coverage.
Neither is a blocker for the fix itself. The change is a clear improvement over the status quo.
Without a base directory there is no boundary to enforce for path traversal checks, so any embed directive passed via stdin can read arbitrary files on the host. Additionally, stdin mode has little practical utility because local file paths embedded in a document have no anchor without a known base directory. This commit removes the stdin-reading path entirely. Calling embedmd with no file arguments now exits with an error instead of blocking on stdin. The `stdin` package-level variable and all associated branches in `embed` are deleted; `TestEmbedStreams` is replaced by `TestEmbedNoPaths` which verifies the new error behaviour. Follow-up to #84.
Without a base directory there is no boundary to enforce for path traversal checks, so any embed directive passed via stdin can read arbitrary files on the host. Additionally, stdin mode has little practical utility because local file paths embedded in a document have no anchor without a known base directory. This commit removes the stdin-reading path entirely. Calling embedmd with no file arguments now exits with an error instead of blocking on stdin. The `stdin` package-level variable and all associated branches in `embed` are deleted; `TestEmbedStreams` is replaced by `TestEmbedNoPaths` which verifies the new error behaviour. Follow-up to #84. Co-authored-by: Francesc Campoy <francesc@campoy.cat>
Summary
checkPathfunction inembedmd/embedmd.gothat verifies a local path stays within the base directory usingfilepath.Abson both sides of the comparison.embedder.runCommandbefore delegating toe.Fetch, so the check applies to every embed directive regardless of whichFetcherimplementation is in use — including custom fetchers supplied viaWithFetcher.fetcher.Fetchincontent.go. Placing it there would only protect the default fetcher; anyWithFetcher(...)caller would bypass it. Centralising it inrunCommandmeans it runs once for every command, unconditionally.mixedContentProvider.Fetchin the test file no longer needs to duplicate the logic either.A directive like
[embedmd]:# (../../../etc/passwd)previously allowed reading arbitrary files accessible to the running user.Test plan
path traversal rejected when base dir is set—../secret.gowith a base dir returns an errordeeply nested path traversal rejected—../../../secret.gowith a nested base dir returns an errornormal relative path within base dir still works—code.gowith a matching base dir succeeds (regression)embedmd docs.mdinsample/correctly resolveshello.gorelative to.go test ./...)Closes #80