fix(plugin-git): support submodule#621
Conversation
- add helper to detect git repository root via `git rev-parse --show-toplevel` - normalize/resolve target file path to repo-relative before calling `git log` - run `git log` from repository root fallback to current cwd w
There was a problem hiding this comment.
The Git plugin is slow enough. I believe a directory cache can be stored here. A directory shall be resolved only once.
e.g.: /path/a.md triggers a check for the /path/ directory, and /path/b.md shall reuse the result from the cache later
Since extendsPage hooks are async, you may need to handle promises here in case the commands are still executed several times due to the delayed return of the first one.
Adding extra memory here shall be worthy for a cache, and the cache map shall be cleared in the onPrepared stage (to save build memory usage), as at this moment we can no longer add pages.
There was a problem hiding this comment.
Pull request overview
Fixes @vuepress/plugin-git commit collection when VuePress runs from a subdirectory / monorepo setup by ensuring git commands run from the actual repository root and file paths are resolved relative to that root.
Changes:
- Add a helper to detect the git repository root via
git rev-parse --show-toplevel. - Convert the target file path to repo-relative form and run
git logfrom the detected repo root.
| const format = getGitLogFormat(options) | ||
| const gitRoot = await getGitRepoRoot(filepath, cwd) | ||
|
|
There was a problem hiding this comment.
getGitRepoRoot() is executed for every file in getRawCommits(), which adds an extra git spawn per file and can noticeably slow down builds on large sites (especially since getCommits() runs for every page). Consider memoizing repo roots by directory (e.g. a Map keyed by dir/cwd) or computing the repo root once per getCommits() call and reusing it for all filepaths.
| if (code === 0) { | ||
| resolve(path.normalize(stdoutData.trim())) | ||
| } else { | ||
| logger.error( |
There was a problem hiding this comment.
getGitRepoRoot() logs an error for any non-zero git rev-parse exit, but git rev-parse --show-toplevel can fail in expected situations (e.g. included files outside the repo / wrong directory). Logging at error level here can create noisy, duplicated output (this function then returns null and git log may fail and be logged again). Consider treating common non-repo failures as a non-error (return null silently or at debug level) and only surface an error once with full context when commit collection ultimately fails.
| logger.error( | |
| logger.debug( |
|
It seems that copilot shares the same view with me. A cache is necessary |
|
Thanks for the update, will review this later. |
| const format = getGitLogFormat(options) | ||
| const gitRoot = await getGitRepoRoot(filepath, cwd) | ||
|
|
||
| try { | ||
| let repoRelativeFilePath = filepath | ||
| if (gitRoot) { | ||
| // Resolve to absolute path first, then convert to repo-relative path | ||
| const absFilePath = path.isAbsolute(repoRelativeFilePath) | ||
| ? repoRelativeFilePath | ||
| : path.resolve(cwd, repoRelativeFilePath) | ||
|
|
||
| repoRelativeFilePath = path.relative(gitRoot, absFilePath) | ||
| } else { | ||
| logger.warn( | ||
| `Failed to resolve git repo root for "${filepath}" under cwd "${cwd}", falling back to cwd; git history may be incomplete.`, | ||
| ) | ||
| } | ||
|
|
||
| const stdout = await runGitLog( | ||
| [ | ||
| '--max-count=-1', | ||
| `--format=${format}${COMMIT_SPLITTER}`, | ||
| '--date=unix', | ||
| '--follow', | ||
| '--', | ||
| filepath, | ||
| repoRelativeFilePath, | ||
| ], | ||
| cwd, | ||
| gitRoot || cwd, |
There was a problem hiding this comment.
getCommits() can now fetch commit histories from different git roots within a single call (because getGitRepoRoot() is computed per filepath). However, mergeRawCommits() still merges commits solely by hash, which assumes all commits come from the same repository. If filepaths ever span multiple repos (e.g. via gitInclude pointing outside a submodule), identical hashes across repos could be merged incorrectly. Consider either enforcing that all filepaths resolve to the same gitRoot (and warn/skip otherwise), or include the repo identity (e.g. gitRoot) in the merge key / raw commit data so merges remain correct across repos.
| if (cachedResult !== undefined || gitRepoRootResultCache.has(dir)) | ||
| return Promise.resolve(cachedResult ?? null) |
There was a problem hiding this comment.
The cache hit check is overly complex: gitRepoRootResultCache stores only string | null, so get(dir) returning undefined already means “not cached”. The extra || gitRepoRootResultCache.has(dir) branch is redundant and makes it harder to reason about caching null vs “missing”. Consider rewriting this to a single has(dir) check (or a single cachedResult !== undefined check) for clarity.
| if (cachedResult !== undefined || gitRepoRootResultCache.has(dir)) | |
| return Promise.resolve(cachedResult ?? null) | |
| if (cachedResult !== undefined) return Promise.resolve(cachedResult) |
| logger.error( | ||
| `git rev-parse failed (code=${code}): ${stderrData.trim()}`, | ||
| ) | ||
| resolve(null) |
There was a problem hiding this comment.
When git rev-parse --show-toplevel fails, the error log currently doesn’t include which directory it was executed in. Including dir (or the original filePath) in the logged message would make it much easier to diagnose submodule / monorepo edge cases, especially since failures may be intermittent depending on which file is being processed.
Mister-Hope
left a comment
There was a problem hiding this comment.
I am not sure how the logger.error will be triggered, can you plz provide some examples? Could it be too noisy in some cases?
|
@pengzhanbo Please review it. I think the changelog might be broken after this is merged with submodules, but previously it will be broken too. We might need another PR to fix that. |
|
The related Vue component of the plugin renders a link pointing to the repository's commit URL, which consists of the repository address + hash. If the file belongs to a sub-repository, this commit record will lead to an incorrect main repository commit link. |
During the document compilation process, the version repository was corrupted (deleting the .git directory during compilation can generate logs). Of course, this is an extreme case, and there might also be noise. If necessary, I can delete this part of the log and keep only the error-handling section. |
|
Can you try to fix changelog in the same PR? |
|
I’ve tried to fix the link issue. @Mister-Hope @pengzhanbo Please review it when convenient. Thanks! |
|
Appreciated, I am not an expert at Git, so let me review it carefully across different agents with their explanations about these commands. |
|
@pengzhanbo Local security check suggests git plugin is now dangerous as filePath are passed to command args without validating. It would be dangerous if the filePath is rewritten to something like Could you check this part once this PR is merged? I will currently leave these security issue here. |
Before submitting the PR, please make sure you do the following
close #123).What is the purpose of this pull request?
Description
When VuePress is built from a subdirectory or within a monorepo, plugin-git may fail to locate files correctly.
In my case, the documentation content of the site is maintained in an independent private repository and linked via Git submodules (the site itself is public, but the content repository is private). In this setup, the Git plugin is unable to correctly retrieve commit metadata such as author and date.
This PR improves the behavior with the following workflow:
1. Detect the Git repository root using:
git rev-parse --show-toplevel2. Resolve the target file path:
• Convert to absolute path first
• Then transform it into a path relative to the detected repository root
3. Execute git log from the repository root directory instead of the current working directory
Screenshots
Before
After