
User-supplied regex compiled without validation enables regex injection #83

@campoy

Summary

The extract function in embedmd/embedmd.go compiles POSIX regular expressions taken directly from embedmd directives in markdown files. There is no validation of the pattern before it is handed to regexp.CompilePOSIX. While Go's RE2-based engine prevents catastrophic backtracking, this still enables regex injection: a crafted pattern can match unintended content and silently embed wrong or sensitive data from the target file.

Vulnerable code

embedmd/embedmd.go:123:

re, err := regexp.CompilePOSIX(s[1 : len(s)-1])

The string s comes directly from the parsed embed directive with only the leading and trailing / stripped. No further sanitisation is applied before compilation.
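
A minimal sketch of the compile step described above, for illustration only. compileDirectivePattern is a hypothetical helper, and its slash check is an assumption about how the directive argument is handled rather than a copy of embedmd's source. It shows that the only gate is syntactic validity, so an over-broad pattern is accepted as readily as the intended one.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileDirectivePattern is a hypothetical helper mirroring the behaviour
// described in this report: check that the argument is wrapped in slashes,
// strip them, and hand the rest straight to regexp.CompilePOSIX.
func compileDirectivePattern(s string) (*regexp.Regexp, error) {
	if len(s) <= 2 || s[0] != '/' || s[len(s)-1] != '/' {
		return nil, fmt.Errorf("missing slashes (/) around %q", s)
	}
	return regexp.CompilePOSIX(s[1 : len(s)-1])
}

func main() {
	// The intended pattern, an injected one and a match-everything pattern
	// all compile; nothing inspects what they will select.
	for _, s := range []string{"/const APIKey/", "/.*Token.*/", "/.*/"} {
		_, err := compileDirectivePattern(s)
		fmt.Printf("%-16s accepted=%v\n", s, err == nil)
	}
}
```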

Exploit: silent content substitution

Consider a source file secrets.go that contains both public API constants and internal credentials:

const APIKey = "public-key-abc"
// internal
const InternalToken = "super-secret-token-xyz"

A legitimate embed directive in README.md intends to show only the public key:

[embedmd]:# (secrets.go go /const APIKey/)

An attacker with write access to the markdown file (e.g. via a pull request modifying docs) substitutes:

[embedmd]:# (secrets.go go /.*Token.*/)

Running embedmd -w README.md will silently replace the intended snippet with the internal token. The directive itself is a Markdown link reference definition that does not appear in the rendered file, so in a rendered diff only the code block appears to change, and reviewers focusing on logic changes may miss it.
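
The substitution can be reproduced with the standard library alone. The sketch below assumes the embedded snippet is the text the start pattern matches; firstMatch is a hypothetical helper, not embedmd's API. Because CompilePOSIX uses leftmost-longest matching and "." does not match a newline here, /.*Token.*/ selects exactly the whole InternalToken line.

```go
package main

import (
	"fmt"
	"regexp"
)

// secretsGo holds the example secrets.go from this report.
const secretsGo = `const APIKey = "public-key-abc"
// internal
const InternalToken = "super-secret-token-xyz"
`

// firstMatch is a hypothetical helper: it compiles pattern with POSIX
// (leftmost-longest) semantics and returns the first match in src.
func firstMatch(pattern, src string) (string, error) {
	re, err := regexp.CompilePOSIX(pattern)
	if err != nil {
		return "", err
	}
	loc := re.FindStringIndex(src)
	if loc == nil {
		return "", fmt.Errorf("could not match %q", pattern)
	}
	return src[loc[0]:loc[1]], nil
}

func main() {
	for _, p := range []string{"const APIKey", ".*Token.*"} {
		m, err := firstMatch(p, secretsGo)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("/%s/ -> %q\n", p, m)
	}
}
```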

Exploit: error message leaks path information

An invalid pattern causes regexp.CompilePOSIX to return an error that is propagated as-is:

[embedmd]:# (main.go go /[invalid/)
could not extract content from main.go: error parsing regexp: missing closing ]: `[invalid`

The error message includes the literal pattern and the file name, which could be useful for enumerating project structure in automated tooling.

Suggested fix

  • Validate that the pattern compiles before any output is produced (already done implicitly, but errors should be surfaced with more context).
  • In environments where markdown is treated as trusted input this is lower priority; document the trust assumption explicitly.
  • If untrusted markdown processing is a supported use-case, consider an allow-list of safe regex features or a sandbox for pattern evaluation.
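
One possible shape for the first and third suggestions, sketched under stated assumptions: validateDirectivePattern, checkOps and the allowedOps policy are hypothetical and not part of embedmd. The pattern is parsed with regexp/syntax using syntax.POSIX, the flag set regexp.CompilePOSIX parses with. Every operator in the parse tree is then checked against an allow-list, and each error is wrapped with the file name and pattern.

```go
package main

import (
	"fmt"
	"regexp"
	"regexp/syntax"
)

// allowedOps is an example allow-list policy (an assumption, not embedmd
// behaviour): literals, character classes, anchors and concatenation pass;
// repetition, wildcards, alternation and groups are rejected.
var allowedOps = map[syntax.Op]bool{
	syntax.OpLiteral:   true,
	syntax.OpCharClass: true,
	syntax.OpBeginLine: true,
	syntax.OpEndLine:   true,
	syntax.OpBeginText: true,
	syntax.OpEndText:   true,
	syntax.OpConcat:    true,
}

// checkOps walks the parse tree and returns an error for the first operator
// that is not in allowedOps.
func checkOps(re *syntax.Regexp) error {
	if !allowedOps[re.Op] {
		return fmt.Errorf("disallowed regex operator %v", re.Op)
	}
	for _, sub := range re.Sub {
		if err := checkOps(sub); err != nil {
			return err
		}
	}
	return nil
}

// validateDirectivePattern is a hypothetical stand-in for the bare compile
// step: it checks the slashes, enforces the allow-list and wraps every error
// with the file name and the offending pattern.
func validateDirectivePattern(file, s string) (*regexp.Regexp, error) {
	if len(s) <= 2 || s[0] != '/' || s[len(s)-1] != '/' {
		return nil, fmt.Errorf("%s: pattern %q must be wrapped in slashes", file, s)
	}
	expr := s[1 : len(s)-1]
	// syntax.POSIX is the flag set regexp.CompilePOSIX parses with.
	tree, err := syntax.Parse(expr, syntax.POSIX)
	if err != nil {
		return nil, fmt.Errorf("%s: invalid pattern %s: %w", file, s, err)
	}
	if err := checkOps(tree); err != nil {
		return nil, fmt.Errorf("%s: pattern %s rejected: %w", file, s, err)
	}
	return regexp.CompilePOSIX(expr)
}

func main() {
	for _, s := range []string{"/const APIKey/", "/^}/", "/.*Token.*/", "/[invalid/"} {
		if _, err := validateDirectivePattern("secrets.go", s); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", s)
	}
}
```

This example policy rejects repetition, so /.*Token.*/ fails while literal patterns such as /const APIKey/ and anchored ones such as /^}/ pass. A real policy would need tuning to the project's directives, and it narrows rather than eliminates what a modified directive can select, so directive changes still need review.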
