Skip to content

feat: Filter copr_build jobs based on paths and files changed#2780

Open
LecrisUT wants to merge 4 commits intopackit:mainfrom
LecrisUT:feat/paths
Open

feat: Filter copr_build jobs based on paths and files changed#2780
LecrisUT wants to merge 4 commits intopackit:mainfrom
LecrisUT:feat/paths

Conversation

@LecrisUT
Copy link
Copy Markdown
Collaborator

@LecrisUT LecrisUT commented Apr 16, 2025

TODO:

  • Skip the check if /packit command is used
  • Actually check the files changed
  • Write new tests or update the old ones to cover new functionality.
  • Update or write new documentation in packit/packit.dev.

Depends-on: packit/ogr#921

Fixes packit/packit#1997
Fixes #2006

RELEASE NOTES BEGIN

copr_build jobs are triggered only if there are changed files under the paths field

RELEASE NOTES END

Comment thread packit_service/worker/checker/copr.py Outdated
@softwarefactory-project-zuul
Copy link
Copy Markdown
Contributor

@softwarefactory-project-zuul
Copy link
Copy Markdown
Contributor

@softwarefactory-project-zuul
Copy link
Copy Markdown
Contributor

This change depends on a change that failed to merge.

Change packit/ogr#921 is needed.

Comment thread packit_service/worker/checker/copr.py Outdated
Comment on lines +180 to +184
for changed_file in self.get_files_changed():
# Check if any of the files changed are under the paths that are being tracked
if any(changed_file.is_relative_to(p) for p in paths):
return True
return False
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, what do we do if we have an empty commit, like the Fedora rebuild commits? Should it be a special case to skip this check?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like the Fedora rebuild commits?

those wouldn't be happening in upstream, so I don't think we have to think about that, or am I missing something?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean if they want to have that behaviour. I occasionally do that 1, but if it's documented that it is or it is not supported, than that's fine as well.

Footnotes

  1. https://github.com/LecrisUT/FedoraRPM-atuin/commit/99555ccec9f6ed94fe474767e3b88f18aa835387

Comment thread packit_service/events/github/push.py Outdated
@softwarefactory-project-zuul
Copy link
Copy Markdown
Contributor

@lbarcziova
Copy link
Copy Markdown
Member

I briefly looked into the code and I am thinking if it wouldn't be more fitting to have the logic for checking this in this method, which would mean it would work generally, not just for Copr build jobs.

@LecrisUT
Copy link
Copy Markdown
Collaborator Author

I briefly looked into the code and I am thinking if it wouldn't be more fitting to have the logic for checking this in this method, which would mean it would work generally, not just for Copr build jobs.

Probably, but it's not as extendable as with the Checker, needing another place to be aware of that check are being done. I see that there already are checks like pr_labels_match_configuration. What about extending the Checker design (or rather reimplement a similar/simpler class) or at least aggregate the checks in a list of lambdas?

@lbarcziova lbarcziova moved this from new to in-progress in Packit Kanban Board May 29, 2025
@lbarcziova
Copy link
Copy Markdown
Member

@LecrisUT can you elaborate more on what you are suggesting? I agree the code I linked could be refactored, but it is something different than the Checker's logic, as that is tight to the handlers, while the checks I linked are meant for actually getting the relevant job configurations to the particular event.

@LecrisUT
Copy link
Copy Markdown
Collaborator Author

LecrisUT commented May 30, 2025

@LecrisUT can you elaborate more on what you are suggesting? I agree the code I linked could be refactored, but it is something different than the Checker's logic, as that is tight to the handlers, while the checks I linked are meant for actually getting the relevant job configurations to the particular event.

Indeed, not using the same Checker, but mainly not having all the checks implemented in the function, i.e. something like

    def get_jobs_matching_event(self) -> list[JobConfig]:
        jobs_matching_trigger = []
        checkers = JobChecker.get_all_checkers()
        for job in self.event.packages_config.get_job_views():
            if all(c.check(job) for c in checkers):
                jobs_matching_trigger.append(job)

The Checker design that I like is being able to just create a class and boom a new check is introduced, but I would design it a bit simpler, like

class JobChecker:
    all_checkers: Final[ClassVar[list[type[JobChecker]]]] = []

    def __init_subclass__(cls):
        # Register all checkers when the class is created
        cls.all_checkers.append(cls)

    @classmethod
    def get_all_checkers(cls) -> list[JobChecker]:
        return  [checker_cls() for checker_cls in cls.all_checkers]

    def check(job: JobView) -> bool: ...

(Of course with a better interface, constructor, etc.)

@lbarcziova
Copy link
Copy Markdown
Member

@LecrisUT understood, thanks for the explanation! Yes, that sounds like it could improve the readability and extensibility.

@lachmanfrantisek
Copy link
Copy Markdown
Member

Hi @LecrisUT , thanks for all the work so far. Do you think it's real to finish this? (Asking for the Copr team that would hugely benefit from this..) Anything we can help with?

@LecrisUT
Copy link
Copy Markdown
Collaborator Author

I can try to make some progress this week.

One question to be answered is #2780 (comment). There are a few ways to approach this:

  • Use ogr and navigate to the PR and get the files changed from there
  • Use the webhook metadata directly
  • Teach ogr to give compare patches 1 and use the before-after that are already in the webhook payload

I am inclined to go with the 3rd option, just because it will come in handy for the future.

Footnotes

  1. https://github.com/PyGithub/PyGithub/blob/2d4785dd08374147a8eab866eeb704508558336a/github/Repository.py#L1342

@lachmanfrantisek
Copy link
Copy Markdown
Member

Nice, thanks @LecrisUT !

I don't have a strong opinion but using webhook might avoid extra API calls (compared to OGR approach). But on the other hand, not sure if this would be same for other git forges. (With OGR, we might be able to find a way more easily.)

@lbarcziova
Copy link
Copy Markdown
Member

My suggestion would be to try to use webhooks directly as it seems the most straightforward (we have also already been having issues with API rate limits).

@LecrisUT
Copy link
Copy Markdown
Collaborator Author

Where do I get the push event? Otherwise, it seems we still need the ogr changes for the pull-request trigger, the webhook does not have equivalent data.

@softwarefactory-project-zuul
Copy link
Copy Markdown
Contributor

Copy link
Copy Markdown
Contributor

@betulependule betulependule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. It looks good to me so far. Do you think you'll be able to make progress with this soon?

@betulependule
Copy link
Copy Markdown
Contributor

Where do I get the push event? Otherwise, it seems we still need the ogr changes for the pull-request trigger, the webhook does not have equivalent data.

Could you please elaborate what you mean?

Comment thread packit_service/worker/checker/copr.py Outdated
@LecrisUT
Copy link
Copy Markdown
Collaborator Author

Nice work. It looks good to me so far. Do you think you'll be able to make progress with this soon?

Yes, it's just one blocker to figure out and adding the relevant tests

Where do I get the push event? Otherwise, it seems we still need the ogr changes for the pull-request trigger, the webhook does not have equivalent data.

Could you please elaborate what you mean?

Sure, when I was looking at this, I couldn't find the appropriate mixin that can be used with Checker so that we have the Commit class that would now include the list of commits that is added in f57d857. We have GetPagurePullRequestMixin but I couldn't find the equivalent that would do for upstream PRs.

@betulependule
Copy link
Copy Markdown
Contributor

Sure, when I was looking at this, I couldn't find the appropriate mixin that can be used with Checker so that we have the Commit class that would now include the list of commits that is added in f57d857. We have GetPagurePullRequestMixin but I couldn't find the equivalent that would do for upstream PRs.

Yes, you're right that there are no equivalents for upstream PRs (as far as I can tell). I suppose we'll need a new Mixin class for this, then? @lbarcziova, what do you think? I'm still rather new to this project so I don't understand this part too much just yet 😅.

Comment thread packit_service/worker/checker/copr.py Outdated
Comment thread packit_service/worker/checker/copr.py Outdated
Comment on lines +157 to +158
if self.job_config.trigger == JobConfigTriggerType.pull_request:
changes = self.copr_build_helper.pull_request_object.changes
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked the definitions of GitHub and GitLab definitions of the MR / PR webhook events and you are definitely right that the information related to what files were changed is not present.

Do I understand correctly that your current approach is the following?

  1. In parser.py, once a PR / MR webhook event is received, use ogr to get a list of commits (cast as CommitInfo) contained in this PR / MR.
  2. Inside the AreFilesChanged class, take the list of commits and aggregate the three lists of files added, modified and removed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current implementation, it does not go through the ogr and with the changes in the parser.py (parse_github_push_event) it would be materializing the webhook data to a more complete python object with the included files changed (but I did not properly test it, the failing CI should give us some reference for where this is connected).

2. Inside the AreFilesChanged class, take the list of commits and aggregate the three lists of files added, modified and removed.

That is what it is supposed to happen (replacing self.copr_build_helper.pull_request_object.changes, that is still from the ogr required change, I will drop it to make it clearer). As soon as we can get the output of parse_github_push_event in this checker class using a mixin, then that would unlock everything that we were missing here.

@softwarefactory-project-zuul
Copy link
Copy Markdown
Contributor

Comment thread packit_service/worker/checker/copr.py Outdated
if not isinstance(pr_event, (github.pr.Action, gitlab.mr.Action)):
# TODO: What about comments?
raise NotImplementedError()
# TODO: How to handle PRs?
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we create another Mixin class, which will use the existing get_pr_files_diff function in ogr to retrieve the files diff? It should return a dictionary, so we would need to process it into a list of files changed.

// dictionary representing files diff
files_changed = project.get_pr_files_diff(pr_id=id_from_pr_event) 

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not seem to be available for github and gitlab? https://github.com/search?q=repo%3Apackit%2Fogr%20get_pr_files_diff&type=code And I think the internal API would not make it efficient to query that format. Maybe we go through packit/ogr#921?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you're right. In that case using packit/ogr#921 sounds good to me.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Shall we just try-except this part and make it behave as previously until packit/ogr#921 lands? Or we wait for that first?

Thing is I don't know how raise is handled and if it is safe to use.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh neat, the depends-on should actually work according to zuul below 🤔. I guess we can test it with that cross-dependency

Signed-off-by: Cristian Le <git@lecris.dev>
Signed-off-by: Cristian Le <git@lecris.dev>
@centosinfra-prod-github-app
Copy link
Copy Markdown
Contributor

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/packit/ogr for 921,9f0bec667f0c9634e7a3c8581d8ec3a31987de1e

@LecrisUT LecrisUT marked this pull request as ready for review April 15, 2026 14:42
@LecrisUT LecrisUT requested review from a team, majamassarini and mfocko as code owners April 15, 2026 14:42
@LecrisUT LecrisUT changed the title [WIP] feat: Filter copr_build jobs based on paths and files changed feat: Filter copr_build jobs based on paths and files changed Apr 15, 2026
Signed-off-by: Cristian Le <git@lecris.dev>
@centosinfra-prod-github-app
Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: in-progress

Development

Successfully merging this pull request may close these issues.

[RFE] Filter copr builds on path change When using monorepo, react only on changes done to configured paths

6 participants