Summary
The extract function in embedmd/embedmd.go compiles POSIX regular expressions taken directly from embedmd directives in markdown files. There is no validation of the pattern before it is handed to regexp.CompilePOSIX. While Go's RE2-based engine prevents catastrophic backtracking, this still enables regex injection: a crafted pattern can match unintended content and silently embed wrong or sensitive data from the target file.
Vulnerable code
embedmd/embedmd.go:123:
re, err := regexp.CompilePOSIX(s[1 : len(s)-1])
The string s comes directly from the parsed embed directive with only the leading and trailing / stripped. No further sanitisation is applied before compilation.
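To make the compile step concrete, here is a minimal sketch of the behaviour described above. The function name compileDirectivePattern is hypothetical, and the slash check is an assumption inferred from the "missing slashes" error text quoted later in this report. Only the regexp.CompilePOSIX call is taken from the reported line.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileDirectivePattern is a hypothetical stand-in for the step described
// above: it checks for the surrounding slashes (assumed), strips them, and
// compiles whatever is left without any further inspection.
func compileDirectivePattern(s string) (*regexp.Regexp, error) {
	if len(s) <= 2 || s[0] != '/' || s[len(s)-1] != '/' {
		return nil, fmt.Errorf("missing slashes (/) around %q", s)
	}
	return regexp.CompilePOSIX(s[1 : len(s)-1])
}

func main() {
	// Any syntactically valid pattern is accepted, however broad it is.
	for _, arg := range []string{"/const APIKey/", "/.*Token.*/", "/[invalid/"} {
		re, err := compileDirectivePattern(arg)
		if err != nil {
			fmt.Printf("%-18s rejected: %v\n", arg, err)
			continue
		}
		fmt.Printf("%-18s accepted as %q\n", arg, re.String())
	}
}
```

The only patterns rejected here are those that fail to parse. Nothing limits how much of the target file a valid pattern may match.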
Exploit: silent content substitution
Consider a source file secrets.go that contains both public API constants and internal credentials:
const APIKey = "public-key-abc"
// internal
const InternalToken = "super-secret-token-xyz"
A legitimate embed directive in README.md intends to show only the public key:
[embedmd]:# (secrets.go go /const APIKey/)
An attacker with write access to the markdown file (e.g. via a pull-request modifying docs) substitutes:
[embedmd]:# (secrets.go go /.*Token.*/)
Running embedmd -w README.md will silently replace the intended snippet with the line containing the internal token. If embedmd is run in a later commit or CI step, that diff shows only the code block changing, not the directive, so reviewers focusing on logic changes may miss it.
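The substitution can be reproduced outside embedmd with the standard library alone. The sketch below assumes the three-line secrets.go shown above and uses a hypothetical helper, firstMatch. It illustrates what each pattern selects and does not reproduce embedmd's own extraction logic.

```go
package main

import (
	"fmt"
	"regexp"
)

// source stands in for the secrets.go file from the example above.
const source = `const APIKey = "public-key-abc"
// internal
const InternalToken = "super-secret-token-xyz"
`

// firstMatch (hypothetical helper) compiles pattern as a POSIX regular
// expression and returns its first match in src, or "" if there is none.
func firstMatch(pattern, src string) (string, error) {
	re, err := regexp.CompilePOSIX(pattern)
	if err != nil {
		return "", err
	}
	return re.FindString(src), nil
}

func main() {
	for _, p := range []string{`const APIKey`, `.*Token.*`} {
		m, err := firstMatch(p, source)
		if err != nil {
			fmt.Println("compile error:", err)
			continue
		}
		fmt.Printf("pattern %q selects %q\n", p, m)
	}
}
```

Because "." does not match a newline under CompilePOSIX, the second pattern selects exactly the line that declares InternalToken, including its value.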
Exploit: error message leaks path information
An invalid pattern causes regexp.CompilePOSIX to return an error that is propagated as-is:
[embedmd]:# (main.go go /[invalid/)
could not extract content from main.go: error parsing regexp: missing closing ]: `[invalid`
The error message includes the literal pattern and the file name, which could be useful for enumerating project structure in automated tooling.
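A short check of the claim that the pattern ends up in the error text, using only the standard library. The helper name leaksPattern is hypothetical. The file-name prefix is added by the caller and is not reproduced here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// leaksPattern (hypothetical helper) reports whether the error returned by
// regexp.CompilePOSIX for pattern contains the pattern text itself.
// It returns false when the pattern compiles without error.
func leaksPattern(pattern string) bool {
	_, err := regexp.CompilePOSIX(pattern)
	return err != nil && strings.Contains(err.Error(), pattern)
}

func main() {
	pattern := "[invalid"
	_, err := regexp.CompilePOSIX(pattern)
	fmt.Println("error:", err)
	fmt.Println("error text contains pattern:", leaksPattern(pattern))
}
```

Any caller that forwards this error unchanged to logs or CI output therefore forwards the pattern as well.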
Suggested fix
- Validate that the pattern compiles before any output is produced (already done implicitly, but errors should be surfaced with more context).
- In environments where markdown is treated as trusted input this is lower priority; document the trust assumption explicitly.
- If untrusted markdown processing is a supported use-case, consider an allow-list of safe regex features or a sandbox for pattern evaluation.
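One way the allow-list idea could look is sketched below, assuming the check runs before the pattern reaches regexp.CompilePOSIX. The names allowedOps, checkOps and validatePattern are hypothetical, and the particular set of permitted operators is an illustrative choice, not a recommendation from the embedmd project. The sketch parses the pattern with regexp/syntax and rejects any construct that is not explicitly listed, which excludes wildcards and repetition.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// allowedOps is a hypothetical allow-list: literals, character classes,
// grouping, alternation and anchors. Wildcards (.) and repetition
// (*, +, ?, {n,m}) are deliberately absent.
var allowedOps = map[syntax.Op]bool{
	syntax.OpLiteral:   true,
	syntax.OpCharClass: true,
	syntax.OpConcat:    true,
	syntax.OpAlternate: true,
	syntax.OpCapture:   true,
	syntax.OpBeginLine: true,
	syntax.OpEndLine:   true,
	syntax.OpBeginText: true,
	syntax.OpEndText:   true,
}

// checkOps walks the parsed expression and fails on the first node whose
// operator is not on the allow-list.
func checkOps(re *syntax.Regexp) error {
	if !allowedOps[re.Op] {
		return fmt.Errorf("regex feature %v is not on the allow-list", re.Op)
	}
	for _, sub := range re.Sub {
		if err := checkOps(sub); err != nil {
			return err
		}
	}
	return nil
}

// validatePattern parses pattern with POSIX syntax and applies checkOps.
func validatePattern(pattern string) error {
	re, err := syntax.Parse(pattern, syntax.POSIX)
	if err != nil {
		return err
	}
	return checkOps(re)
}

func main() {
	for _, p := range []string{`const APIKey`, `^func main`, `.*Token.*`} {
		if err := validatePattern(p); err != nil {
			fmt.Printf("%-14q rejected: %v\n", p, err)
			continue
		}
		fmt.Printf("%-14q allowed\n", p)
	}
}
```

A restriction like this narrows what a directive can select but does not remove the trust problem: a literal pattern naming a sensitive identifier would still pass.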