feat: Proposed SIMBAUQ Sampling Strategy by radum2275 · Pull Request #785 · generative-computing/mellea

radum2275 · 2026-04-03T16:39:57Z

Sampling Strategy PR

Use this template when adding or modifying sampling strategies in mellea/stdlib/sampling/.

Description

Link to Issue: Fixes Proposal: Integrating Similarity‑Based Aggregation for Uncertainty Quantification into Mellea #718

Implementation Checklist

Base Class

Extends appropriate base class:
- BaseSamplingStrategy if your changes are mostly modifying the repair and/or select_from_failure functions
- SamplingStrategy if your changes involve a new sample method
- Other defined sampling strategies if your implementation is similar to existing implementations

Return Value

Returns a properly typed SamplingResult. Specifically, this means:
- ModelOutputThunks in sample_generations are properly typed from the Component and the parsed_repr is the expected type.

Integration

Strategy exported in mellea/stdlib/sampling/__init__.py

Testing

Tests added to tests/sampling/
New code has 100% coverage
Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>

github-actions · 2026-04-03T16:40:10Z

The PR description has been updated. Please fill out the template for your PR to be reviewed.

planetf1

Also noticed we don't export SOFAISamplingStrategy in __all__ - not an issue from this PR, but observed

radum2275 · 2026-04-09T09:35:01Z

@planetf1 @jakelorocco

I made all the required changes. I also replaced the RITS backend in my example with the ollama one.

However, I need some guidance with the following: the "classifier" confidence estimation method we developed requires a probabilistic (sklearn) classifier, which we either receive from the user or train on the fly from the training examples the user provides. I'd like to pre-train one using the datasets we already collected for our paper and have it as the default option, but it needs to live somewhere in the package as a serialised object (e.g., pickle file). What would be the best way to do that without disturbing the package structure too much? Thanks.

psschwei · 2026-04-09T11:00:39Z

it needs to live somewhere in the package as a serialised object (e.g., pickle file)

Do you have an estimate on how large this file would be? If it's tens of MBs that's probably not a problem, but if we're looking at hundreds of MBs or a GB+ then could be a different story.

radum2275 · 2026-04-09T11:11:39Z

@psschwei it's actually not that big. the one we trained for our paper was about 250KB. it's a basic sklearn RandomForestClassifier.

psschwei · 2026-04-09T11:15:50Z

about 250KB

cool, I don't think that will be a problem

psschwei · 2026-04-09T11:26:53Z

I'm assuming then we would need to add sklearn as another dependency?

If we're pickling in a file, we probably should consider how to best do so safely.

radum2275 · 2026-04-09T11:51:01Z

@psschwei yes, added scikit-learn as a required dependency in pyproject.toml

radum2275 · 2026-04-09T11:53:48Z

I'm assuming then we would need to add sklearn as another dependency?

If we're pickling in a file, we probably should consider how to best do so safely.

sure, any suggestion is welcome :). i usually do this:

pickle.dump(model, open(filename, 'wb'))
# some time later...
loaded_model = pickle.load(open(filename, 'rb'))
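On the safety question quoted above, one possible direction is sketched here. It assumes the pickle ships inside the package and that a SHA-256 digest can be pinned next to the loader; the helper names `save_pickle` and `load_trusted_pickle` are hypothetical and not part of mellea. A checksum only detects a swapped or corrupted file. It does not make unpickling untrusted data safe.

```python
import hashlib
import pickle
from pathlib import Path


def save_pickle(obj, path: Path) -> str:
    """Write `obj` to `path` and return the SHA-256 digest to pin."""
    data = pickle.dumps(obj)
    path.write_bytes(data)
    return hashlib.sha256(data).hexdigest()


def load_trusted_pickle(path: Path, expected_sha256: str):
    """Unpickle `path` only if its SHA-256 digest matches the pinned value."""
    data = path.read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"checksum mismatch for {path}: got {digest}")
    # pickle.loads can still execute arbitrary code, so the pinned digest
    # must itself come from a trusted source (e.g. committed with the loader).
    return pickle.loads(data)
```

Reading and writing through `Path.read_bytes` / `Path.write_bytes` also closes the file handles that the bare `open(...)` calls above leave open.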

planetf1 · 2026-04-10T15:57:00Z

+            except ImportError:
+                msg = (
+                    "scipy is required for harmonic mean aggregation. "
+                    "Please install with `pip install scipy`."


correct command, but if this is base library should we have scipy as a core dependency? And what's the relationship to granite-retriever?

I added scipy to the core dependencies next to numpy and scikit-learn required by this sampling strategy. However, as far as I know scikit-learn requires scipy, so probably we only need numpy and scikit-learn. Please advise.

The granite_retriever dependency group defined in pyproject.toml already contains the sentence-transformers required by the sbert similarity metric. Should we move sentence-transformers to the core dependencies?
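As a point of reference for the scipy question, a harmonic mean can be computed with the Python standard library alone. The sketch below is illustrative only: `aggregate_harmonic` is a hypothetical name, and it assumes the input is a flat list of non-negative similarity scores, which may not match the strategy's actual aggregation code.

```python
from statistics import harmonic_mean


def aggregate_harmonic(similarities: list[float]) -> float:
    """Harmonic mean of one response's similarities to the other samples."""
    if not similarities:
        raise ValueError("need at least one similarity score")
    # statistics.harmonic_mean returns 0 if any value is 0 and raises
    # StatisticsError on negative input.
    return harmonic_mean(similarities)
```

If the scores already live in a numpy array, `len(x) / (1.0 / x).sum()` would be the equivalent expression, with zero scores needing separate handling.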

planetf1 · 2026-04-10T15:58:39Z

I'm assuming then we would need to add sklearn as another dependency?

If we're pickling in a file, we probably should consider how to best do so safely.

Apologies noticed this after doing a per-line review.
There's no sklearn any more -- it's scikit-learn

Also I'm not 100% sure of the intent -- if this is stdlib is it core function? If so shouldn't all dependencies be in our core dependencies.

psschwei · 2026-04-10T16:58:52Z

Also I'm not 100% sure of the intent -- if this is stdlib is it core function? If so shouldn't all dependencies be in our core dependencies.

Agree, deps should be in the core deps if this is going into core.
I could also see a case for putting this in contribs rather than core stdlib (numpy/sklearn (old habits die hard)/scipy aren't huge like pytorch but they do add some size)

planetf1 · 2026-04-13T10:33:37Z

Also I'm not 100% sure of the intent -- if this is stdlib is it core function? If so shouldn't all dependencies be in our core dependencies.

Agree, deps should be in the core deps if this is going into core. I could also see a case for putting this in contribs rather than core stdlib (numpy/sklearn (old habits die hard)/scipy aren't huge like pytorch but they do add some size)

So the key decision here is where this code belongs. What do you think @jakelorocco -- is this core stdlib, or an optional, additional strategy?

I can see there might be three options
a) core
b) shipped with mellea, but not core & in an optional package (we may not have precedent for this, but it's an option)
c) contrib

once we've agreed on this we can nail the actual dependency config needed.

radum2275 · 2026-04-17T18:43:08Z

@planetf1 either option is fine with me.

Another question: I’ve trained an sklearn classifier on the datasets from our paper so it can be used with the classifier confidence estimation method. It’s stored as a standard pickle file. Would it be possible to ship it with mellea?

planetf1 · 2026-04-20T10:04:58Z

@planetf1 either option is fine with me.

Another question: I’ve trained an sklearn classifier on the datasets from our paper so it can be used with the classifier confidence estimation method. It’s stored as a standard pickle file. Would it be possible to ship it with mellea?

You could use joblib to be consistent with library overall?
On pickle - I guess if it's small it's ok - but there's no provenance? Is it desirable to ship the training process/data?

@jakelorocco @HendrikStrobelt any thoughts on where this code should go. mellea (core, optional), contrib?

radum2275 · 2026-04-21T08:18:46Z

@planetf1 Actually I realised that providing a trained classifier (i.e., pickle file) with the library is not really a good idea. The two distributions (target domain and classifier) shouldn't be too different in order to get good performance. Instead, I extended the example script docs/examples/simbauq/simbauq_example.py with an example of a standalone classifier trained on data sampled from a HF dataset (that's basically what we did in our paper).

nrfulton · 2026-05-13T18:23:07Z

Is there a reason this is proposed for stdlib?

radum2275 · 2026-05-14T09:09:07Z

@jakelorocco suggested adding it there when i first proposed it. i don't think the exact location matters, as long as it's usable downstream.

@nrfulton

planetf1

Thanks for the continued work on this, @radum2275 — a few things not caught in previous rounds below.

On @nrfulton's question about whether this belongs in stdlib: I think it does conceptually — it's a sampling algorithm that sits naturally alongside sofai.py and majority_voting.py. The dependency question is separable though. The existing pattern in this repo is optional extras (see granite_retriever for sentence-transformers). A simbauq extra for scikit-learn would keep the strategy in mellea.stdlib.sampling without forcing the ML deps on everyone:

[project.optional-dependencies]
simbauq = ["scikit-learn", "numpy<=2.2"]

scipy can be dropped entirely (see inline comment on _aggregate), and sentence-transformers is already behind granite_retriever. The import guards are already in place — they'd just need the install hint updated to point at the new extra.
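A minimal sketch of what an import guard pointing at the proposed extra might look like. The helper `require_optional` is hypothetical, and the `simbauq` extra exists only as the suggestion above, so the install hint is an assumption rather than a command that works today.

```python
import importlib
from types import ModuleType


def require_optional(
    module_name: str, extra: str, package: str = "mellea"
) -> ModuleType:
    """Import an optional dependency or fail with an install hint for its extra."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        msg = (
            f"{module_name} is required for this feature. "
            f'Please install with `pip install "{package}[{extra}]"`.'
        )
        raise ImportError(msg) from exc
```

Under that assumption, the strategy would call something like `require_optional("sklearn", "simbauq")` at the point where the classifier is first needed, so users who never touch the classifier path would not need scikit-learn installed.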

planetf1 · 2026-05-14T11:41:19Z

+            result_mot.parsed_repr = action.parse(result_mot)
+            all_mots.append(result_mot)
+            all_contexts.append(result_ctx)
+            all_actions.append(action)


This appends the same action reference N times — the deep copies created for each generation task are discarded here. So SamplingResult.sample_actions[0] through [N-1] all point at the same object. Either keep the per-task copies or deepcopy(action) here.
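The aliasing described in this comment can be reproduced in isolation. In the sketch below, `Action` is a hypothetical stand-in for the real component type, and the two collector functions only mimic the append pattern from the diff. They are not the strategy's code.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Action:
    """Hypothetical stand-in for the component being sampled."""

    meta: dict = field(default_factory=dict)


def collect_shared(action: Action, n: int) -> list[Action]:
    # Bug pattern: every slot holds a reference to the same object.
    return [action for _ in range(n)]


def collect_copies(action: Action, n: int) -> list[Action]:
    # Fix: each slot gets its own independent copy.
    return [deepcopy(action) for _ in range(n)]
```

With `collect_shared`, mutating one entry's `meta` is visible through every other entry. With `collect_copies`, the entries stay independent, which is the behaviour a per-sample list would normally be expected to have.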

Radu Marinescu added 4 commits April 2, 2026 14:12

feat: initial commit for the SIMBAUQSamplingStrategy

c5236f0

Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>

chore: added a separate filed to mot.meta for the similarity matrix

ea51043

Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>

chore: added a second aggregation by classification CE algorithm

5c23a58

Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>

refactor: revised and moved the SIMBAUQSamplingStrategy in docs/examples

d7f3b6a

Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>

radum2275 requested a review from a team as a code owner April 3, 2026 16:39

jakelorocco changed the title ~~Proposed SIMBAUQ Sampling Strategy~~ feat: Proposed SIMBAUQ Sampling Strategy Apr 3, 2026

github-actions Bot added the enhancement New feature or request label Apr 3, 2026

planetf1 requested changes Apr 7, 2026

Update test/stdlib/sampling/test_simbauq.py

908258c

Co-authored-by: Nigel Jones <nigel.l.jones+git@gmail.com>

radum2275 requested a review from a team as a code owner April 7, 2026 18:50

radum2275 requested review from HendrikStrobelt and nrfulton April 7, 2026 18:50

radum2275 and others added 10 commits April 7, 2026 19:50

Update docs/examples/simbauq/simbauq_example.py

8b8c336

Co-authored-by: Nigel Jones <nigel.l.jones+git@gmail.com>

Update .gitignore

865e85f

Co-authored-by: Nigel Jones <nigel.l.jones+git@gmail.com>

Update docs/examples/simbauq/README.md

a6b356a

Co-authored-by: Nigel Jones <nigel.l.jones+git@gmail.com>

Update docs/examples/simbauq/README.md

cbae30c

Co-authored-by: Nigel Jones <nigel.l.jones+git@gmail.com>

Update mellea/stdlib/sampling/simbauq.py

a3c51a8

Co-authored-by: Nigel Jones <nigel.l.jones+git@gmail.com>

Update mellea/stdlib/sampling/simbauq.py

e9b05f1

Co-authored-by: Nigel Jones <nigel.l.jones+git@gmail.com>

Update mellea/stdlib/sampling/simbauq.py

372046a

Co-authored-by: Nigel Jones <nigel.l.jones+git@gmail.com>

refactor: refactored the simbauq sampling strategy

af55899

Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>

fix: added the ollama backend in simbauq example

da1440d

Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>

chore: set aggregation by mean in simbauq example

11b180f

Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>

psschwei reviewed Apr 9, 2026

Comment thread docs/examples/simbauq/README.md Outdated

chore: fixed a typo in the simbauq README.md file

6c6c099

Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>

chore: added scikit-learn as required dependency for simbauq strategy

78fe6c7

Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>

planetf1 requested changes Apr 10, 2026

radum2275 and others added 5 commits April 10, 2026 17:31

Update test/stdlib/sampling/test_simbauq.py

65a1268

Co-authored-by: Nigel Jones <nigel.l.jones+git@gmail.com>

Update test/stdlib/sampling/test_simbauq.py

41728a5

Co-authored-by: Nigel Jones <nigel.l.jones+git@gmail.com>

Update mellea/stdlib/sampling/simbauq.py

f90a466

Co-authored-by: Nigel Jones <nigel.l.jones+git@gmail.com>

Update mellea/stdlib/sampling/simbauq.py

1cd588c

Co-authored-by: Nigel Jones <nigel.l.jones+git@gmail.com>

Update mellea/stdlib/sampling/simbauq.py

c8bd228

Co-authored-by: Nigel Jones <nigel.l.jones+git@gmail.com>

chore: revised the dependencies for simbauq strategy

e0b5952

Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>

refactor: added two more similarity metrics for simbauq strategy

9321c44

Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>

chore: extended simbauq example with classifier trained on HF dataset

40641c3

Signed-off-by: Radu Marinescu <radu.marinescu@ie.ibm.com>

planetf1 requested changes May 14, 2026
