Proposal: v3 special constraints proto schema#841

Draft
termoshtt wants to merge 1 commit into main from
proposal/special-constraints-v3

@termoshtt
Member

Summary

  • Drafts a proto schema proposal for OMMX v3 special constraints, replacing the v2 ConstraintHints-based representation with first-class messages for OneHot / SOS1 / Indicator.
  • Unifies the renamed RegularConstraint and the three special-constraint types under a single shape: ID held by the enclosing collection (map<uint64, T>), inline optional RemovedReason, serialized Provenance chain, extracted ConstraintMetadata.
  • Bumps format_version 0 → 1; v3 readers keep loading v2 data via the existing convert_hints_to_collections path; v2 readers refuse v3 data via the existing format_version check.
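
The version gate in the last bullet can be sketched as follows. This is a minimal illustration, not the OMMX implementation: `Reader`, `LoadError`, and `check` are hypothetical names, and the only assumption taken from the text is that a reader refuses data whose `format_version` is newer than the one it supports.

```rust
/// Hypothetical reader configuration: the highest `format_version` it understands.
/// A v2-era reader would use 0, a v3-era reader would use 1.
#[derive(Debug, Clone, Copy)]
pub struct Reader {
    pub max_supported_format_version: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LoadError {
    /// The data was written with a newer schema than this reader knows.
    UnsupportedFormatVersion { found: u32, max_supported: u32 },
}

impl Reader {
    /// Accept data at or below the supported version, refuse anything newer.
    pub fn check(&self, found: u32) -> Result<(), LoadError> {
        if found > self.max_supported_format_version {
            Err(LoadError::UnsupportedFormatVersion {
                found,
                max_supported: self.max_supported_format_version,
            })
        } else {
            Ok(())
        }
    }
}
```

Under this sketch a reader with `max_supported_format_version: 0` rejects version-1 data, while a reader with `max_supported_format_version: 1` accepts both versions, which is the asymmetry the summary describes.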

This PR contains the design doc only (SPECIAL_CONSTRAINTS_V3.md). Proto and Rust SDK changes will land in follow-up PRs once the design is agreed.

Open questions

See the "Open questions" section at the bottom of the doc:

  1. Naming of the renamed regular constraint (RegularConstraint / ScalarConstraint / GeneralConstraint / keep Constraint and rename v2 to LegacyConstraint).
  2. Granularity of the [deprecated = true] markers on ConstraintHints / OneHot / SOS1.
  3. SampledActiveVariable shared between OneHot and SOS1 vs. per-type wrappers.
  4. Solver-adapter migration plan (out of scope for the proto change itself, but downstream adapters touching instance.constraints directly will need updates).

Test plan

  • Design review on the doc — approve naming, deprecation markers, and the inline RemovedReason choice.
  • Sign off on format_version 0 → 1 timing (this PR vs. the follow-up implementation PR).

🤖 Generated with Claude Code

Captures the v3 design discussion: first-class proto messages for
OneHot / SOS1 / Indicator special constraints, unified shape
(ID owned by collection, inline RemovedReason, serialized Provenance,
extracted ConstraintMetadata) shared with a renamed RegularConstraint,
and the corresponding format_version bump and v2 backward-compat path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
termoshtt added a commit that referenced this pull request Apr 27, 2026
- Reframe the proposal as a prerequisite for #841: the proto wire shape of
  ConstraintMetadata (inline vs. top-level map) is deferred to #841 and
  decided based on the parse / serialize boundary defined by Stage 2 here.
- Add a 3-stage migration diagram (Python API → Rust SoA → proto wire shape)
  and a "Why not Rust-first" subsection acknowledging the alternative order.
- Add ParametricInstance to the Rust SDK design (also owns
  decision_variables: BTreeMap<...> directly).
- Add a section on Python modeling-chain impact for standalone constraints
  ((x[0] + x[1] == 1).add_name("c") chains) with two options and a
  recommendation; flagged as Stage 2 concern.
- Add a "Boundary changes" subsection covering From<v1::*> parse /
  serialize boundaries moving to the collection level, and an "Other types
  affected" subsection for LogicalMemoryProfile derive and pyo3-stub-gen.
- Switch Python examples from `merge(on="id")` to `join()` to reflect that
  entries_to_dataframe sets `id` as the DataFrame index.
- Move removed_reason to a separate long-format DataFrame in the Target API
  block (matching Solution's existing removed_reasons_df pattern).
- Add concrete recommendations to all 6 open questions and add a 7th for
  the Stage 2 modeling-chain choice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
termoshtt added a commit that referenced this pull request Apr 27, 2026
Replace the Stage 1 / Stage 2 / Stage 3 split with a single connected
v3-alpha redesign covering Rust runtime layout and Python API surface.

Major content changes:

- Reframe the goal around "the same fact lives in 3 places (Rust struct,
  Python dict accessor, Python wide df)" and target a single source of
  truth on the Rust side.
- Introduce Series-based collection accessors: `instance.constraints`,
  `decision_variables`, and the special-constraint accessors all become
  `pandas.Series[ID -> Object]` instead of dict / list.
- Make `*_df` methods explicitly derived views: type-specific core
  columns extracted from the Series, joined with sidecar metadata /
  parameters / provenance / removed_reasons dfs.
- Remove all `Constraint.name` / `.subscripts` / `.parameters` /
  `.description` getters from Python wrappers — the only path to
  metadata is the metadata df.
- Drop the Stage 1 / 2 / 3 ordering discussion and the "Why not
  Rust-first" subsection; replace with a Breaking changes section.
- Keep the proto deferral to #841, the Rust SoA design, and the open
  questions intact; add an open question for Series-dtype semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
termoshtt added a commit that referenced this pull request Apr 27, 2026
## Summary

- Drafts the runtime / Python-API design for v3 metadata storage.
**Prerequisite for #841**: the proto wire shape of `ConstraintMetadata`
(inline per message vs. top-level columnar map) cannot be finalized
until the runtime / Python-API direction here is settled.
- A single connected redesign — the document describes the target shape;
phasing of the implementation across PRs is decided in the
implementation issues, not here.

## Target shape

- **Rust SDK**: metadata moves into ID-keyed Struct-of-Arrays stores.
`ConstraintCollection<T>` owns `ConstraintMetadataStore<T::ID>`;
`Instance` and `ParametricInstance` own `VariableMetadataStore`
directly. `DecisionVariable` and `Constraint<S>` (and the
special-constraint variants) lose their `metadata` field. Parse /
serialize boundaries move from per-element to per-collection. Internal
call sites that read `c.metadata.*` (e.g. `sample_set/extract.rs`)
switch to `collection.metadata()` accessors.
- **Python SDK**:
- `instance.constraints`, `decision_variables`, and the
special-constraint accessors become `pandas.Series[ID -> Object]`
instead of `dict` / `list`.
- `*_df` methods (`constraints_df(kind=..., include=...)`,
`decision_variables_df(include=...)`, and Solution / SampleSet
counterparts) take an `include=` parameter selecting which sidecars
(`"metadata"`, `"parameters"`, `"removed_reasons"`) to fold in.
**Default `include=("metadata","parameters")` reproduces v2's
wide-DataFrame shape** — v2 user code keeps working with just a
`kind=...` argument added.
- Long-format sidecar dfs (`constraint_metadata_df`,
`constraint_parameters_df`, `constraint_provenance_df`,
`constraint_removed_reasons_df`, `variable_metadata_df`,
`variable_parameters_df`) are bulk-built from the SoA store.
`provenance` is intentionally not included via `include=`
(variable-length chains pivot poorly) and is only available as the
long-format df.
- Per-kind `indicator_constraints_df` / `one_hot_constraints_df` /
`sos1_constraints_df` collapse into the single
`constraints_df(kind=...)` overload set, dispatched via `Literal` +
`@overload` so the IDE / type checker still sees kind-specific column
schemas.
- **Wrapper objects with back-reference**: PyO3 wrappers stay rich. They
run in two modes — Standalone (modeling chain, owns a staging bag) or
Attached (collection-derived, holds `Py<Instance>` + id and reads /
writes the SoA store via back-reference). Getters `.name`,
`.subscripts`, `.parameters`, `.description` are preserved; they switch
from owning data to reading the store. The modeling chain `(x[0] + x[1]
== 1).add_name("c")` keeps working through the staging bag, which drains
into the SoA store on insertion.
- **Cross-ID-space JOIN safety**: each constraint kind plus decision
variables uses a kind-qualified index name (`variable_id`,
`regular_constraint_id`, `indicator_constraint_id`,
`one_hot_constraint_id`, `sos1_constraint_id`). The default `include=`
covers most "I want a wide table" cases without manual join, removing
the most common opportunity for a wrong-kind merge.
- **Proto**: deferred to #841, picked once the parse / serialize
boundary here is concrete.
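
To make the staging-bag idea concrete, here is a rough Rust sketch of a standalone object that owns its metadata until it is inserted, at which point the metadata drains into per-field maps on the collection. All names (`Standalone`, `Collection`, `Metadata`) are hypothetical, the constraint is reduced to a string, and the PyO3 / `Py<Instance>` side is not modeled.

```rust
use std::collections::BTreeMap;

/// Hypothetical metadata bag carried by a standalone constraint.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Metadata {
    pub name: Option<String>,
    pub description: Option<String>,
}

/// Standalone mode: not yet in any collection, so the object owns its metadata.
#[derive(Debug)]
pub struct Standalone {
    pub expr: String, // stand-in for the real constraint body
    staged: Metadata,
}

impl Standalone {
    pub fn new(expr: &str) -> Self {
        Self { expr: expr.to_string(), staged: Metadata::default() }
    }

    /// Builder-style setter, mirroring the `.add_name("c")` modeling chain.
    pub fn add_name(mut self, name: &str) -> Self {
        self.staged.name = Some(name.to_string());
        self
    }
}

/// Collection with one ID-keyed map per metadata field (Struct-of-Arrays).
#[derive(Debug, Default)]
pub struct Collection {
    pub exprs: BTreeMap<u64, String>,
    pub names: BTreeMap<u64, String>,
    pub descriptions: BTreeMap<u64, String>,
}

impl Collection {
    /// Insertion consumes the standalone object and drains its staging bag
    /// into the per-field maps; unset fields leave no entry behind.
    pub fn insert(&mut self, id: u64, c: Standalone) {
        let Standalone { expr, staged } = c;
        self.exprs.insert(id, expr);
        if let Some(name) = staged.name {
            self.names.insert(id, name);
        }
        if let Some(description) = staged.description {
            self.descriptions.insert(id, description);
        }
    }
}
```

With these definitions, `Standalone::new("x0 + x1 == 1").add_name("c")` followed by `collection.insert(7, c)` leaves the name under ID 7 in `names` and no entry in `descriptions`.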

## Relationship to #841

#842 first, #841 second. The proto-schema work in #841 currently
sketches `ConstraintMetadata` inline per constraint message but defers
the wire-shape decision (inline AoS vs. top-level `map<uint64,
ConstraintMetadata>` per collection). That choice becomes concrete only
after this proposal lands and the parse / serialize boundary has a clean
shape.

## Open questions (with recommendations)

See the bottom of the doc for full reasoning.

1. Constraint kind dispatch in Python — **rec: single method with
`Literal` + `@overload`**.
2. `removed_reason` placement — **rec: separate long-format df** (and
`"removed_reasons"` opt-in via `include=` for the wide form).
3. Builder-style metadata setter — **rec: add `insert_with` on the Rust
side** (independent of the Python staging bag).
4. `parameters` Rust-side storage — **rec: nested `FnvHashMap<ID,
FnvHashMap<…>>`**.
5. Optional `subscripts_df` long format — **rec: defer**.
6. Polars as primary in Python — **rec: pandas stays primary for v3**.
7. `drop_constraint` / wrapper invalidation — **rec: do not add
`drop_constraint` in v3**; defer the invalidation semantics until it's
actually needed.
8. Attached wrapper `Py<Instance>` cycles — **rec: documented behavior,
no code-level mitigation**.
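
Recommendation 4 and the long-format parameters df can be pictured with a small sketch. It swaps in `std::collections::BTreeMap` for the `FnvHashMap` named above (to stay dependency-free and get deterministic ordering), and `Parameters`, `set_parameter`, and `long_rows` are illustrative names rather than OMMX API.

```rust
use std::collections::BTreeMap;

/// Nested storage: constraint ID -> (parameter key -> parameter value).
/// IDs without parameters simply have no outer entry.
pub type Parameters = BTreeMap<u64, BTreeMap<String, String>>;

/// Set one parameter, creating the inner map on first use.
pub fn set_parameter(store: &mut Parameters, id: u64, key: &str, value: &str) {
    store
        .entry(id)
        .or_default()
        .insert(key.to_string(), value.to_string());
}

/// Flatten to long format: one `(id, key, value)` row per parameter,
/// borrowing the strings from the store instead of cloning them.
pub fn long_rows(store: &Parameters) -> Vec<(u64, &str, &str)> {
    store
        .iter()
        .flat_map(|(id, params)| {
            params
                .iter()
                .map(move |(k, v)| (*id, k.as_str(), v.as_str()))
        })
        .collect()
}
```

The nested layout keeps per-ID lookup cheap, and a bulk pass such as `long_rows` yields exactly the row shape a long-format sidecar table needs.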

## Test plan

- [ ] Design review on the doc — approve the SoA-on-collection
placement, the Standalone/Attached wrapper architecture with
back-reference, the Series + `*_df(include=...)` Python API, and the
v2-compatible default.
- [ ] Sign off on each of the 8 open-question recommendations (or push
back).
- [ ] Sign off on the v3-alpha breaking-change window for the Python
`dict` → `Series` migration and the `*_df` `include=` reshape.
- [ ] Coordinate with #841 on the proto wire shape once the parse /
serialize boundary is concrete.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
termoshtt added a commit that referenced this pull request Apr 28, 2026
…oA stores (#843)

## Summary

Moves per-constraint and per-variable auxiliary metadata (`name`,
`subscripts`, `parameters`, `description`, `provenance`) off per-element
structs and onto Struct-of-Arrays stores at the collection / Instance
layer (METADATA_STORAGE_V3.md). The Rust SDK and the PyO3 bindings are
both fully migrated; `cargo test` and `task python:test` pass
end-to-end.

This PR is the runtime / Python-API prerequisite for #841
(special-constraint proto v3). The proto-schema decision in #841 —
`ConstraintMetadata` inline per message vs. top-level columnar map —
needed the runtime / Python-API direction settled first; that's what
this PR establishes.

## End state

### Rust SDK

- New `ConstraintMetadataStore<ID>` (generic over the four constraint ID
types: regular / indicator / one-hot / SOS1) and `VariableMetadataStore`
under `rust/ommx/src/{constraint,decision_variable}/metadata_store.rs`.
Sparse `FnvHashMap`-per-field representation with per-field borrowing
getters (`name(id) -> Option<&str>`, `subscripts(id) -> &[i64]`,
`parameters(id) -> &FnvHashMap<…>`, …) and a shared empty-sentinel for
the collection-shaped fields. Owned exchange via `insert(id,
ConstraintMetadata)` / `remove(id)` / `collect_for(id)`.
- `ConstraintCollection<T>` / `EvaluatedCollection<T>` /
`SampledCollection<T>` each carry a `metadata` field with `metadata()` /
`metadata_mut()` accessors and `with_metadata(...)` constructors.
`insert_with(id, c, metadata)` performs atomic insert + metadata write.
- `Instance` and `ParametricInstance` gain a `variable_metadata` field
with the same accessor pattern, plus narrow per-kind metadata accessors
(`variable_metadata_mut()`, `constraint_metadata_mut()`,
`indicator_constraint_metadata_mut()`,
`one_hot_constraint_metadata_mut()`, `sos1_constraint_metadata_mut()`
and their immutable siblings) so callers can drain metadata into the SoA
stores after `builder().build()` without exposing invariant-breaking
access to the underlying collections.
- The per-element `metadata: ConstraintMetadata` field is removed from
`Constraint<S>` / `IndicatorConstraint<S>` / `OneHotConstraint<S>` /
`Sos1Constraint<S>` / `DecisionVariable` / `EvaluatedDecisionVariable` /
`SampledDecisionVariable`.
- Parse / serialize boundary moves to the collection layer. `Parse`
impls now produce `(map, ConstraintMetadataStore)` / `(map,
VariableMetadataStore)` pairs at the collection level; the
collection-level serializers (`From<Instance> for v1::Instance`,
`From<ParametricInstance> for v1::ParametricInstance`, `From<Solution>
for v1::Solution`, `From<SampleSet> for v1::SampleSet`) drain the SoA
stores and overlay metadata onto each per-element proto via the explicit
`*_to_v1` helpers.
- The default-metadata `From<X> for v1::Y` impls (`impl
From<DecisionVariable> for v1::DecisionVariable` and the Evaluated /
Sampled / Constraint siblings) are removed entirely. Per-element
conversion would have to default every metadata field, which silently
drops any caller-supplied metadata; making the helpers `pub(crate)` and
requiring an explicit `metadata: ConstraintMetadata |
DecisionVariableMetadata` argument forces the silent path to surface as
a type error.
- Bare-element bytes round-trips (`DecisionVariable::to_bytes` /
`from_bytes` and the `Constraint` / `EvaluatedConstraint` /
`SampledConstraint` / `EvaluatedDecisionVariable` /
`SampledDecisionVariable` siblings) are removed. Top-level container
`to_bytes` / `from_bytes` (`Instance`, `ParametricInstance`, `Solution`,
`SampleSet`, plus the DTOs `State`, `Samples`, `Parameters`) preserve
full metadata.
- Evaluate path threads metadata through: `Instance::evaluate` and
`Instance::evaluate_samples` clone `variable_metadata` into the produced
`Solution` / `SampleSet`; `SampleSet::get` carries it forward into the
per-sample `Solution` along with all four constraint-kind metadata
stores. Constraint-side metadata flows through
`ConstraintCollection::evaluate` into `EvaluatedCollection` /
`SampledCollection` automatically.
- Mutation sites in
`instance/{slack,sos1,one_hot,log_encode,indicator,evaluate}.rs` write
through `variable_metadata_mut()` and the per-kind `*_metadata_mut()`
accessors. Special-constraint promotion paths (one-hot → constraint,
indicator → constraint, SOS1 → constraint) carry metadata via
`insert_with` and `push_provenance`.
- MPS / QPLIB readers populate the SoA stores after `Instance::new`,
replacing the previous per-element metadata writes.
- Memory profile snapshots updated to account for the collection-level
visits.
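
The store described in the first bullet can be approximated as below. This is a simplified sketch under stated assumptions: it uses `std::collections::HashMap` instead of `FnvHashMap`, models only `name` and `subscripts`, and the type name `MetadataStore` is hypothetical. The method names follow the getters and exchange methods listed above, but the bodies are guesses at the behavior, not the crate's code.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Owned, per-element view used when exchanging metadata with the store.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ConstraintMetadata {
    pub name: Option<String>,
    pub subscripts: Vec<i64>,
}

/// Sparse Struct-of-Arrays store: one ID-keyed map per field.
#[derive(Debug)]
pub struct MetadataStore<ID> {
    name: HashMap<ID, String>,
    subscripts: HashMap<ID, Vec<i64>>,
}

impl<ID> Default for MetadataStore<ID> {
    fn default() -> Self {
        Self { name: HashMap::new(), subscripts: HashMap::new() }
    }
}

impl<ID: Eq + Hash + Copy> MetadataStore<ID> {
    /// Borrowing getter: no allocation, `None` when the field is unset.
    pub fn name(&self, id: ID) -> Option<&str> {
        self.name.get(&id).map(String::as_str)
    }

    /// Missing entries fall back to a shared empty slice.
    pub fn subscripts(&self, id: ID) -> &[i64] {
        self.subscripts.get(&id).map(Vec::as_slice).unwrap_or(&[])
    }

    /// Replace everything stored for `id`; empty fields leave no entry.
    pub fn insert(&mut self, id: ID, m: ConstraintMetadata) {
        self.remove(id);
        if let Some(name) = m.name {
            self.name.insert(id, name);
        }
        if !m.subscripts.is_empty() {
            self.subscripts.insert(id, m.subscripts);
        }
    }

    /// Take everything stored for `id` out of the store.
    pub fn remove(&mut self, id: ID) -> ConstraintMetadata {
        ConstraintMetadata {
            name: self.name.remove(&id),
            subscripts: self.subscripts.remove(&id).unwrap_or_default(),
        }
    }

    /// Clone everything stored for `id` into an owned snapshot.
    pub fn collect_for(&self, id: ID) -> ConstraintMetadata {
        ConstraintMetadata {
            name: self.name(id).map(str::to_owned),
            subscripts: self.subscripts(id).to_vec(),
        }
    }
}
```

An ID that was never inserted costs nothing in any map, and reading it yields `None` or an empty slice rather than an error. For map-shaped fields such as `parameters`, the same fallback would need a shared empty map instead of `&[]`.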

### PyO3 wrappers (snapshot model)

- Each wrapper struct now holds its own metadata snapshot:
  ```rust
  Constraint(pub ommx::Constraint, pub ommx::ConstraintMetadata)
  DecisionVariable(pub ommx::DecisionVariable, pub ommx::DecisionVariableMetadata)
  IndicatorConstraint(...), EvaluatedConstraint(...), SampledConstraint(...),
  EvaluatedDecisionVariable(...), SampledDecisionVariable(...)
  ```
Standalone construction starts with `Default::default()` metadata.
Reading from an `Instance` fills the snapshot from the SoA store via
`from_parts(inner, metadata.collect_for(id))`.
`Instance.from_components(...)` and
`ParametricInstance.from_components(...)` drain each wrapper's metadata
back into the instance's SoA stores. Mutations on a wrapper retrieved
from an instance therefore do not propagate back; the caller must re-add
the constraint / variable to apply changes (matches the prior
`clone()`-based semantics).
- `pandas.rs` introduces a `WithMetadata<'a, T, M>` wrapper.
`ToPandasEntry` impls that previously read `self.metadata.X` now consume
`WithMetadata<'_, T, ConstraintMetadata | DecisionVariableMetadata>`;
call sites in `Instance` / `ParametricInstance` / `Solution` /
`SampleSet` pre-snapshot the SoA store and zip the metadata in alongside
each item before handing the iterator to `entries_to_dataframe`.
- `from_bytes` / `to_bytes` removed from non-top-level Python wrappers
(Linear, Quadratic, Polynomial, Function, DecisionVariable,
EvaluatedDecisionVariable, SampledDecisionVariable, NamedFunction,
EvaluatedNamedFunction, SampledNamedFunction, Parameter). Only the
top-level types (`Instance`, `ParametricInstance`, `Solution`,
`SampleSet`) plus the cross-evaluate DTOs (`State`, `Samples`,
`Parameters`) keep them. `__init__.pyi` regenerated via `task
python:stubgen` to match.
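
The snapshot semantics of the first bullet can be demonstrated without PyO3. The sketch below is plain Rust with hypothetical names (`Instance`, `ConstraintWrapper`, `add`, `get`); it only models the rule that a wrapper read from an instance carries a copy of the metadata, so edits take effect only when the wrapper is added back.

```rust
use std::collections::HashMap;

#[derive(Debug, Default, Clone, PartialEq)]
pub struct Metadata {
    pub name: Option<String>,
}

/// Wrapper holding the inner value plus its own copy of the metadata.
#[derive(Debug, Clone)]
pub struct ConstraintWrapper {
    pub id: u64,
    pub inner: String, // stand-in for the real constraint
    pub metadata: Metadata,
}

/// Instance side: constraint bodies and an ID-keyed name store.
#[derive(Debug, Default)]
pub struct Instance {
    constraints: HashMap<u64, String>,
    names: HashMap<u64, String>,
}

impl Instance {
    /// Adding drains the wrapper's snapshot into the store, replacing what was there.
    pub fn add(&mut self, w: ConstraintWrapper) {
        self.constraints.insert(w.id, w.inner);
        match w.metadata.name {
            Some(name) => {
                self.names.insert(w.id, name);
            }
            None => {
                self.names.remove(&w.id);
            }
        }
    }

    /// Reading builds a fresh wrapper whose metadata is copied from the store.
    pub fn get(&self, id: u64) -> Option<ConstraintWrapper> {
        let inner = self.constraints.get(&id)?.clone();
        let metadata = Metadata { name: self.names.get(&id).cloned() };
        Some(ConstraintWrapper { id, inner, metadata })
    }
}
```

Renaming a wrapper obtained from `get` leaves the instance unchanged until the wrapper is passed to `add` again, which is the "re-add to apply changes" rule stated above.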

> The originally-proposed Standalone / Attached two-mode design with
`Py<Instance>` back-references (write-through getters, live shared state
across wrappers pointing at the same id) is intentionally **not**
implemented in this PR. The snapshot model preserves the v2 semantics
with minimum surface change; the two-mode design lands together with the
Series / `include=` work in the next wave.

### Public API surface

The `ommx` crate's public surface matches `main` plus the new SoA
accessors required by this refactor:

- `pub struct ConstraintMetadataStore<ID>` and `pub struct
VariableMetadataStore` (returned by the metadata accessors below).
- `pub fn variable_metadata() / variable_metadata_mut() /
constraint_collection() / constraint_metadata() /
constraint_metadata_mut() / indicator_constraint_collection() /
indicator_constraint_metadata() / indicator_constraint_metadata_mut() /
one_hot_constraint_collection() / one_hot_constraint_metadata() /
one_hot_constraint_metadata_mut() / sos1_constraint_collection() /
sos1_constraint_metadata() / sos1_constraint_metadata_mut()` on
`Instance` and `ParametricInstance`. The mutable surface is
intentionally narrowed to metadata only — metadata is outside the
constraint-collection invariants (a sparse ID-keyed store), so `&mut` is
safe; full `&mut ConstraintCollection<T>` would expose `active_mut()` /
`removed_mut()` / `insert_with()` and let callers register constraints
that reference unknown variable IDs.
- `pub fn metadata() / metadata_mut() / with_metadata() / insert_with()`
on `ConstraintCollection<T>` / `EvaluatedCollection<T>` /
`SampledCollection<T>`.
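
The reasoning behind the narrowed mutable surface can be illustrated with a toy model. Everything here is hypothetical and simplified: a constraint is just the list of variable IDs it references, metadata is just a name, and `add_constraint` is an invented validated entry point. In a real crate the types would sit in separate modules so that field privacy is actually enforced.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Default)]
pub struct MetadataStore {
    names: BTreeMap<u64, String>,
}

impl MetadataStore {
    pub fn set_name(&mut self, id: u64, name: &str) {
        self.names.insert(id, name.to_string());
    }
    pub fn name(&self, id: u64) -> Option<&str> {
        self.names.get(&id).map(String::as_str)
    }
}

/// A constraint reduced to the variable IDs it references.
#[derive(Debug, Clone)]
pub struct Constraint {
    pub variable_ids: Vec<u64>,
}

#[derive(Debug, Default)]
pub struct Collection {
    active: BTreeMap<u64, Constraint>,
    metadata: MetadataStore,
}

impl Collection {
    /// Insert the constraint and its metadata in one step.
    pub fn insert_with(&mut self, id: u64, c: Constraint, name: &str) {
        self.active.insert(id, c);
        self.metadata.set_name(id, name);
    }
    pub fn len(&self) -> usize {
        self.active.len()
    }
}

#[derive(Debug, Default)]
pub struct Instance {
    variables: BTreeSet<u64>,
    constraints: Collection,
}

impl Instance {
    pub fn with_variables(ids: &[u64]) -> Self {
        Self { variables: ids.iter().copied().collect(), constraints: Collection::default() }
    }

    /// Validated path: every referenced variable must be known.
    pub fn add_constraint(&mut self, id: u64, c: Constraint, name: &str) -> Result<(), String> {
        if let Some(bad) = c.variable_ids.iter().find(|v| !self.variables.contains(*v)) {
            return Err(format!("unknown variable id {}", bad));
        }
        self.constraints.insert_with(id, c, name);
        Ok(())
    }

    /// Read-only view of the collection.
    pub fn constraint_collection(&self) -> &Collection {
        &self.constraints
    }
    pub fn constraint_metadata(&self) -> &MetadataStore {
        &self.constraints.metadata
    }
    /// Mutable access is limited to metadata, which cannot break the invariant.
    pub fn constraint_metadata_mut(&mut self) -> &mut MetadataStore {
        &mut self.constraints.metadata
    }
}
```

A caller holding `&mut Instance` can rename constraint 10 through `constraint_metadata_mut()`, but has no route to `insert_with` that bypasses the variable check in `add_constraint`.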

The `*_to_v1` helpers and the `parse` submodules stay `pub(crate)`.
Module visibility (`mod constraint`, `mod decision_variable`, `mod
instance`, `mod indicator_constraint` in `lib.rs`) is unchanged — `git
diff origin/main...HEAD -- rust/ommx/src/lib.rs` is empty.
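
As an illustration of why the `*_to_v1` helpers take metadata explicitly (see the Rust SDK notes above), here is a hedged sketch with stand-in types. `WireDecisionVariable`, `decision_variable_to_wire`, and `variables_to_wire` are hypothetical names; the actual helper signatures in the crate are not shown in this PR description.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Default, PartialEq)]
pub struct Metadata {
    pub name: Option<String>,
}

/// Runtime element: carries no metadata of its own.
#[derive(Debug, Clone, PartialEq)]
pub struct DecisionVariable {
    pub id: u64,
}

/// Stand-in for the wire message, which stores metadata inline.
#[derive(Debug, Clone, PartialEq)]
pub struct WireDecisionVariable {
    pub id: u64,
    pub name: Option<String>,
}

// Deliberately no `impl From<DecisionVariable> for WireDecisionVariable`:
// such an impl would have to default `name` and silently drop stored metadata.

/// Per-element helper: the caller must supply the metadata, so forgetting it
/// is a compile error rather than silent data loss.
pub fn decision_variable_to_wire(v: DecisionVariable, metadata: Metadata) -> WireDecisionVariable {
    WireDecisionVariable { id: v.id, name: metadata.name }
}

/// Collection-level serializer: look up each element's metadata in the
/// ID-keyed store and overlay it onto the wire message.
pub fn variables_to_wire(
    vars: Vec<DecisionVariable>,
    names: &HashMap<u64, String>,
) -> Vec<WireDecisionVariable> {
    vars.into_iter()
        .map(|v| {
            let metadata = Metadata { name: names.get(&v.id).cloned() };
            decision_variable_to_wire(v, metadata)
        })
        .collect()
}
```

Passing `Metadata::default()` is still possible, but it has to be written out at the call site, which makes a dropped-metadata path visible in review.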

### v1 wire-format limitations

`v1::Solution` and `v1::SampleSet` only have a single
`evaluated_constraints` / `constraints` field for regular constraints —
they have no fields for indicator / one-hot / sos1 evaluated/sampled
constraints. The in-memory Rust types carry those four collections
separately (with their own metadata stores), but `to_bytes` /
`from_bytes` are lossy for the three special kinds. This is a
pre-existing wire-format limitation (the matching `Parse` impls have
always initialized those collections to `Default::default()`) and is
documented on the `From<Solution> for v1::Solution` and `From<SampleSet>
for v1::SampleSet` impls. Wire-shape resolution is the subject of #841.
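
The lossy round trip can be reproduced with a toy model. The types below are stand-ins, not the real `Solution` or `v1::Solution`: each collection is reduced to a map from constraint ID to evaluated value, and the wire type has a single field, mirroring the limitation described above.

```rust
use std::collections::BTreeMap;

/// In-memory solution: four constraint kinds kept separately.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Solution {
    pub regular: BTreeMap<u64, f64>,
    pub indicator: BTreeMap<u64, f64>,
    pub one_hot: BTreeMap<u64, f64>,
    pub sos1: BTreeMap<u64, f64>,
}

/// Stand-in for the wire message: only regular constraints have a field.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct WireSolution {
    pub evaluated_constraints: BTreeMap<u64, f64>,
}

impl From<Solution> for WireSolution {
    /// Serialization keeps the regular constraints and drops the other three kinds.
    fn from(s: Solution) -> Self {
        WireSolution { evaluated_constraints: s.regular }
    }
}

impl From<WireSolution> for Solution {
    /// Parsing has nothing to restore the special kinds from, so they default to empty.
    fn from(w: WireSolution) -> Self {
        Solution { regular: w.evaluated_constraints, ..Default::default() }
    }
}
```

A solution with an indicator entry therefore comes back from a round trip with `regular` intact and `indicator` empty, so it no longer compares equal to the original.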

### Docs

`METADATA_STORAGE_V3.md` updated: status header reflects the two-wave
landing, each section now carries an explicit **(landed)** /
**(deferred)** tag, and the breaking-changes list is split between the
two waves. The deferred sections are kept verbatim so the follow-up PR
has a concrete spec to land against.

## Test plan

- [x] `cargo test -p ommx --lib` — **474 tests passing**, including four
bytes-round-trip regression tests:
  - `instance::parse::tests::test_parametric_instance_roundtrip_preserves_metadata`
  - `sample_set::tests::test_sample_set_get_preserves_metadata`
  - `solution::parse::tests::test_solution_roundtrip_preserves_metadata`
  - `sample_set::parse::tests::test_sample_set_roundtrip_preserves_metadata`
- [x] `cargo test` (workspace, examples included) — clean compile, all
green
- [x] `task python:test` — full Python suite including ommx + adapter
tests (highs / pyscipopt / python-mip / openjij) passes
- [x] Snapshot review for the collection-level memory profile (accepted
via `cargo insta accept --workspace`)
- [x] `__init__.pyi` regenerated via `task python:stubgen`

## Out of scope (follow-up PRs)

- **#841 — special-constraint proto v3**: picks the `ConstraintMetadata`
wire shape (inline per message vs. top-level columnar map) on top of the
runtime SoA stores landed here.
- Tighten `ConstraintCollection<T>::active_mut()` / `removed_mut()` /
`insert_with()` to `pub(crate)` (or smaller) — these are heavily used
inside the crate but should not be on the public API surface either, per
the same invariant-safety rationale that motivated narrowing the
`Instance` accessors here.
- Conversion of `instance.constraints` / `decision_variables` /
`*_constraints` Python accessors from `dict` / `list` to
`pandas.Series[ID -> Object]`.
- The `*_df` API reshape with the `include=` parameter and the
long-format sidecar dfs (`constraint_metadata_df` /
`constraint_parameters_df` / `constraint_provenance_df` /
`constraint_removed_reasons_df` / `variable_metadata_df` /
`variable_parameters_df`).
- Standalone / Attached two-mode wrappers with `Py<Instance>`
back-references (write-through metadata mutation).
- Doc reorganization (METADATA_STORAGE_V3.md → SDK docs).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
