4 different benchmark sets focused around str.replace_all#7
Conversation
|
Thank you for the submission! However, I am worried about the diversity of the benchmark families. Especially, the second set with its 3170 benchmarks is large. I would also suggest merging the sets into one family and using sub folders to structure then. Finally, can you insert white paces between the solvers in the |
|
I removed some of the benchmarks. There might be some merit in the future to have all 3170 benchmarks or find a more refined subset but right now they all seem equally hard to solve for string solvers. |
|
Thank you for the update and the submission in general. From our perspective, the size of a benchmark set is a difficult question. In the past we accepted some large, but not very diverse benchmark sets. Those are imho not very useful. I suspect that the subset you selected already give the SMT solver developers some hard nuts to crack. |
This pull request introduces four new benchmark sets aimed at evaluating and challenging string solvers on diverse and difficult problem instances focusing on the operator str.replace_all.
Benchmark Sets
1. pcp-3-3-random (1000 benchmarks)
2. pcp-3-4-hard (3170 benchmarks)
Benchmarks encoding PCP[3,4] that were considered hard to solve by the paper "Creating Difficult Instances of the Post Correspondence Problem". The instances still have 3 tiles but strings up to length 4. Now the length is not exact but at most 4.
The benchmarks were originally considered hard; many have later been solved using alternative techniques such as Parikh Automata and Model Checking. However, string solvers have not been extensively tested on these benchmarks. Current experiments show that none of those benchmarks are solved by string solvers.
3. rna-sat and rna-unsat (500+500 benchmarks)
These benchmarks model a reverse transcription process inspired by bioinformatics.
An unknown RNA string
yis converted into a DNA string by applying a series ofstr.replace_alloperations that simulate nucleotide base pairing.