Integrate BugsInPy #184

Open
t-sorger wants to merge 52 commits into master from BugsInPy

Conversation

@t-sorger
Collaborator

WIP: Issue #178

@andre15silva
Member

Hi @t-sorger,

Any update here?

FYI, it would probably be a good idea to rebase with master, I made some updates recently.

@t-sorger
Collaborator Author

Hi @andre15silva,

I am still encountering issues with BugsInPy and how it works.
I will send you an email so we can arrange a meeting to discuss the problems I’m facing.

Thanks for the hint, I will rebase with master as soon as possible!

@monperrus
Contributor

hi all, how can we complete this important task?

@andre15silva
Member

> hi all, how can we complete this important task?

The major blocker right now is the execution of samples since executing them locally requires installing a lot of dependencies.

@t-sorger is working on dockerizing BugsInPy so the samples execute in an isolated environment with their own set of dependencies
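The isolation idea could be sketched roughly like this; note that the image name `bugsinpy-runner` and the in-container `bugsinpy-test` command are hypothetical placeholders, not the actual setup from this PR:

```python
# Rough sketch of running one sample inside a container so its dependencies
# stay off the host. Image name and in-container command are assumptions.
import subprocess

def container_cmd(bug_id: str, image: str = "bugsinpy-runner") -> list[str]:
    # Command construction is kept separate from execution so the exact
    # docker invocation is easy to inspect and test.
    return ["docker", "run", "--rm", image, "bugsinpy-test", bug_id]

def run_in_container(bug_id: str, timeout: int = 600) -> bool:
    # Returns True when the containerized test run exits cleanly.
    proc = subprocess.run(container_cmd(bug_id),
                          capture_output=True, text=True, timeout=timeout)
    return proc.returncode == 0
```

`--rm` discards the container after each run, so every sample starts from the same clean image.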

@monperrus
Contributor

monperrus commented Mar 24, 2025 via email

@andre15silva
Member

Hey @t-sorger , what is the status of this PR?

@t-sorger
Collaborator Author

Hi @andre15silva, I've been very busy lately finishing my thesis. Therefore, I will continue working on this PR after my defence, which is scheduled for mid-June.

@andre15silva
Member

Thanks for the updates @t-sorger ! What are the current blockers here?

@t-sorger
Collaborator Author

t-sorger commented Jul 1, 2025

Hi @andre15silva, I still need to check why the tests are running locally but not here; I haven’t had a chance to look into it yet.
Other than that, the tests for core should be more or less done. There were some internal dependency issues that I need to double-check and figure out which bugs are affected and how to resolve them.
Next step (please correct me if I’m wrong) would be to continue/start writing the tests for sampling and evaluation.

@andre15silva
Member

Got it.

As for the next steps, yes that's about it. The list we had defined is still valid: #178 (comment)

@t-sorger
Collaborator Author

Hi @andre15silva,
I’d like to run a test across the entire benchmark now to identify unreproducible bugs and exclude them.
From my understanding, this can be done by running test_checkout_all_bugs and test_run_all_bugs and then checking which ones fail and why.
Is there anything else I should keep in mind?
Thanks!

@andre15silva
Member

> Hi @andre15silva, I’d like to run a test across the entire benchmark now to identify unreproducible bugs and exclude them. From my understanding, this can be done by running test_checkout_all_bugs and test_run_all_bugs and then checking which ones fail and why. Is there anything else I should keep in mind? Thanks!

Hi @t-sorger !

- test_checkout_all_bugs is just to ensure they can all be checked out
- test_run_all_bugs should check out each bug and run both the buggy and fixed versions.

For identifying the flaky ones, you want to run them several times. One solution is to add a for loop in test_run_all_bugs to confirm, e.g. 5 times, that the results are as expected for both versions of each sample.
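A minimal sketch of that repetition idea; `checkout_bug` and `run_bug` are stand-ins for whatever helpers the benchmark actually exposes, not its real API:

```python
# Hypothetical sketch for flagging flaky samples by repeated execution.
# `checkout_bug(bug)` checks out a sample; `run_bug(bug, fixed=...)` runs
# its test suite and returns True when the tests pass. Both are assumptions.

NUM_REPEATS = 5

def is_flaky(bug, checkout_bug, run_bug, repeats=NUM_REPEATS):
    """Run both versions of a bug several times; flag it if results vary
    or differ from the expected (buggy fails, fixed passes) pattern."""
    outcomes = set()
    for _ in range(repeats):
        checkout_bug(bug)
        buggy_ok = run_bug(bug, fixed=False)  # buggy version: tests should fail
        fixed_ok = run_bug(bug, fixed=True)   # fixed version: tests should pass
        outcomes.add((buggy_ok, fixed_ok))
    # A deterministic, reproducible bug always yields exactly (False, True).
    return outcomes != {(False, True)}
```

Any sample whose outcomes vary across the 5 iterations, or never match the expected pattern, would be a candidate for exclusion.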

@t-sorger
Collaborator Author

t-sorger commented Oct 1, 2025

test_checkout_all_bugs runs fine for all bugs (it takes around 3 hours).
I also started running test_run_all_bugs. They take quite a while, so I let them run for a few hours (~130 bugs in so far). It looks like many of them fail; some, which I double-checked manually, fail with a command not found error. I’m not sure if I’m missing a dependency, but the library’s documentation doesn’t mention any additional installation steps or external dependencies.

@andre15silva
Member

> test_checkout_all_bugs runs fine for all bugs (it takes around 3 hours). I also started running test_run_all_bugs. They take quite a while, so I let them run for a few hours (~130 bugs in so far). It looks like many of them fail; some, which I double-checked manually, fail with a command not found error. I’m not sure if I’m missing a dependency, but the library’s documentation doesn’t mention any additional installation steps or external dependencies.

How many failed to run the tests, and which command fails to run? It would be nice to have statistics on this and a list of the common errors.
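For instance, assuming the per-bug outcomes get collected as (bug_id, error_message) pairs (the data shape here is an assumption), a small Counter-based summary could look like:

```python
# Hypothetical sketch for summarizing failure statistics across the benchmark.
# Each result is assumed to be a (bug_id, error_message) pair, with
# error_message set to None when the bug ran as expected.
from collections import Counter

def summarize_failures(results):
    """Bucket results into pass / 'command not found' / other failures."""
    counts = Counter()
    for bug_id, error in results:
        if error is None:
            counts["passed"] += 1
        elif "command not found" in error:
            counts["command not found"] += 1
        else:
            counts["other failure"] += 1
    return counts
```

The buckets could be refined further (e.g. per missing command, per project) once the actual log messages are known.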

@monperrus
Contributor

ping @t-sorger for completion. thanks!

@t-sorger
Collaborator Author

After running the tests on all the bugs, I got the following results.

Is there a specific project I should prioritise to analyse why they fail or why the command not found errors occur? I remember using ansible and pysnooper to go through the setup process manually to understand the flow, so I may have installed some dependencies when the setup failed. I assume some more dependencies are missing for the other projects as well.

@monperrus
Contributor

you can debug with any project; hopefully the root cause and its fix will carry over to the other ones.

@t-sorger
Collaborator Author

t-sorger commented Mar 5, 2026

I added more extensive logging and ran it over the entire dataset again, so I can now investigate the log files.
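A minimal sketch of what such per-bug logging could look like, assuming each bug writes to its own file under a `logs/` directory (the directory and `bugsinpy.<id>` naming convention are assumptions, not the PR's actual implementation):

```python
# Sketch: one log file per bug so each failure can be inspected in isolation.
# The log directory and logger naming scheme are hypothetical conventions.
import logging
from pathlib import Path

def get_bug_logger(bug_id: str, log_dir: str = "logs") -> logging.Logger:
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(f"bugsinpy.{bug_id}")
    if not logger.handlers:  # avoid attaching duplicate handlers on reuse
        handler = logging.FileHandler(Path(log_dir) / f"{bug_id}.log")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
    return logger
```

With one file per bug, a failed run leaves a self-contained trace that can be grepped for the "command not found" errors afterwards.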
