Include non-code files in dependency-based analysis via FILE_LEVEL fallback #11
Draft
Copilot wants to merge 14 commits into
Copilot (AI) changed the title from "[WIP] Add support for non-code files in dependency analysis" to "Include non-code files in dependency-based analysis via FILE_LEVEL fallback" on Apr 28, 2026
Agent-Logs-Url: https://github.com/se-sic/codeface/sessions/4b62fa28-3730-42ad-b839-2b9a534b6cb4 Co-authored-by: Thomas Bock <bockthom@cmu.edu>
…CLI flag Agent-Logs-Url: https://github.com/se-sic/codeface/sessions/728f8c8c-7f13-402c-9f1d-78a97d55d181 Co-authored-by: Thomas Bock <bockthom@cmu.edu>
Agent-Logs-Url: https://github.com/se-sic/codeface/sessions/5e83c305-e4d2-4110-9c07-9115f74e779c Co-authored-by: Thomas Bock <bockthom@cmu.edu>
…ll.ts Agent-Logs-Url: https://github.com/se-sic/codeface/sessions/8ac8d730-d57e-4a93-8ae2-9913e965f113 Co-authored-by: Thomas Bock <bockthom@cmu.edu>
…s have no commits Agent-Logs-Url: https://github.com/se-sic/codeface/sessions/71ad0181-6804-4401-bd54-7c41a20cee1b Co-authored-by: Thomas Bock <bockthom@cmu.edu>
Signed-off-by: Thomas Bock <bockthom@cs.uni-saarland.de>
Signed-off-by: Thomas Bock <bockthom@cs.uni-saarland.de>
…ement Agent-Logs-Url: https://github.com/se-sic/codeface/sessions/54ae5ce0-944c-4d66-ad07-b966169cd4a8 Co-authored-by: Thomas Bock <bockthom@cmu.edu>
Agent-Logs-Url: https://github.com/se-sic/codeface/sessions/940b6e95-31b6-40b3-a3b5-717dd395cf15 Co-authored-by: Thomas Bock <bockthom@cmu.edu>
Agent-Logs-Url: https://github.com/se-sic/codeface/sessions/8fa45b90-96b2-4b56-9e28-b373eb50240a Co-authored-by: Thomas Bock <bockthom@cmu.edu>
…windows Agent-Logs-Url: https://github.com/se-sic/codeface/sessions/a5144f38-53d6-465b-9909-f81152ebd84c Co-authored-by: bockthom <5575176+bockthom@users.noreply.github.com>
…ommit repo Agent-Logs-Url: https://github.com/se-sic/codeface/sessions/2e8961f6-a26f-49dc-b1c5-336bcfa5b48e Co-authored-by: bockthom <5575176+bockthom@users.noreply.github.com>
Agent-Logs-Url: https://github.com/se-sic/codeface/sessions/9eebcada-84a1-42b8-875a-d4f5cca423b0 Co-authored-by: bockthom <5575176+bockthom@users.noreply.github.com>
…and ls-tree Prevent Python 2 UnicodeDecodeError in execute_command when filenames contain non-ASCII bytes Agent-Logs-Url: https://github.com/se-sic/codeface/sessions/4586695e-7500-4873-aff0-bca899936fd8 Co-authored-by: Thomas Bock <bockthom@cmu.edu>
- Fix `UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0` in `execute_command`
- Update `execute_command` to safely build `jcmd` in Python 2 when `cmd` contains mixed unicode and bytes elements (repo path from PyYAML is unicode, filenames from git output are bytes)

Original prompt
Create a pull request in `se-sic/codeface` targeting branch `infosaar-updates` to include non-code files (e.g., Markdown) in the dependency-based analysis pipeline so that `commit_dependency` is not empty for repos that primarily contain non-code files.

Context:
- `gen.full.ts` (`codeface/R/analyse_ts.r`) returns NULL when there are no commits represented in `commit_dependency`.
- For `.md` files, function extraction via Doxygen/ctags yields no functions, so no dependencies are computed and nothing is inserted into `commit_dependency`.

Proposed fix (minimal-impact):
- In `codeface/VCS.py`, within `gitVCS._getFunctionLines(self, file_layout_src, file_commit)`, after the existing ctags fallback (`if not func_lines: func_lines = self._parseSrcFileCtags(...)`), add a second fallback:
  - If `func_lines` is still empty, synthesize a single file-level artifact for the whole file, e.g. `func_lines = {0: "FILE_LEVEL"}`.
  - Set `file_commit.artefact_line_range = True` for this synthetic artifact.
  - … `commit_dependency`.
- Update `gitVCS.addFiles4Analysis()` (same file `codeface/VCS.py`) to include `.md` (and optionally `.markdown` if appropriate).
- … `.md` changes so `commit_dependency` is populated with `FILE_LEVEL` entries. If creating a new fixture is too heavy, add a focused unit test around `_getFunctionLines` to assert the fallback behavior for `.md` content.

PR Requirements:

Files likely to change:
- `codeface/VCS.py`
- `codeface/test/...` (choose appropriate location)

Notes:
- `FILE_LEVEL` is already used as a special entityId in the repo (e.g., R experiments), so reuse it.

The following is the prior conversation context from the user's chat exploration (may be truncated):
User:
```
gen.full.ts <- function(conf) {
  boundaries <- conf$boundaries
  full.series <- vector("list", dim(boundaries)[1])
  tstamps <- conf$tstamps.release
  subset <- c("commitDate", "AddedLines", "DeletedLines")
  ts <- get.commits.by.ranges(conf, subset, make.index.unique)
  if (dim(boundaries)[1] != length(ts)) {
    stop("Internal error: Release boundaries don't match ts list length")
  }
  for (i in 1:length(ts)) {
    ts[[i]]$ChangedLines <- ts[[i]]$AddedLines + ts[[i]]$DeletedLines
    full.series[[i]] <- na.omit(xts(ts[[i]]$ChangedLines,
                                    order.by=ts[[i]]$commitDate))
    full.series[[i]] <- trim.series(full.series[[i]], boundaries$date.start[i],
                                    boundaries$date.end[i])
  }
  full.series <- full.series[sapply(full.series, length)!=0]
  full.series <- do.call(c, full.series)
  return (full.series)
}
```
Assistant: If no `full.series[[i]]` has `length(...) != 0`, then `full.series` becomes an empty list (`list()`). In R, `do.call(c, list())` evaluates to `NULL`, so the function returns `NULL`.

Typical reasons all series become empty
- `get.commits.by.ranges(...)` returns data for each range, but after `trim.series(...)` the xts objects contain no points (e.g., commit timestamps fall outside `boundaries$date.start[i]`..`date.end[i]`, or boundaries are wrong).
- `xts(...)` construction + `na.omit(...)` produces a 0-length series for every range (e.g., all `ChangedLines` are `NA`, or there are no commits at all in each range).

When it doesn’t return `NULL`
If at least one release range yields a non-empty trimmed xts series, `do.call(c, ...)` returns an xts object (possibly with only that one segment), not `NULL`.

User: how can it be that all series become empty? There are definitely commits in the repo, but none that are part of `commit_dependency`.
Assistant: All series can become empty even if the repository has plenty of commits, because `gen.full.ts()` does not look at “all commits in the repo”. It only...

This pull request was created from Copilot chat.
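The fallback chain described in the original prompt can be sketched in isolation as follows. This is a minimal, hypothetical illustration and not the code in `codeface/VCS.py`: `FileCommit`, `get_function_lines`, and the `parsers` argument are stand-ins invented here. Only the `{0: "FILE_LEVEL"}` mapping and the `artefact_line_range` flag are taken from the prompt.

```python
# Hypothetical stand-ins for illustration; the real logic would live in
# gitVCS._getFunctionLines in codeface/VCS.py.
FILE_LEVEL = "FILE_LEVEL"


class FileCommit(object):
    """Minimal stand-in for the per-file commit record (assumption)."""

    def __init__(self):
        self.artefact_line_range = False
        self.func_lines = {}


def get_function_lines(src_lines, file_commit, parsers):
    """Try each parser in order; fall back to one file-level artifact.

    `parsers` is a list of callables mapping source lines to a dict of
    {start_line: function_name}. An empty dict means "nothing found".
    """
    func_lines = {}
    for parse in parsers:
        func_lines = parse(src_lines)
        if func_lines:
            break

    if not func_lines:
        # No parser found any function (e.g. a Markdown file): treat the
        # whole file as a single artifact starting at line 0.
        func_lines = {0: FILE_LEVEL}
        file_commit.artefact_line_range = True

    file_commit.func_lines = func_lines
    return func_lines


def parse_nothing(src_lines):
    """Stand-in for a parser that finds no functions."""
    return {}


markdown_commit = FileCommit()
markdown_result = get_function_lines(
    ["# Title", "Some text."], markdown_commit, [parse_nothing, parse_nothing])
```

Because the fallback only triggers when every parser returns an empty mapping, files for which function extraction already works keep their existing function-level artifacts.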