Proposal: v3 special constraints proto schema #841
Draft
Captures the v3 design discussion: first-class proto messages for OneHot / SOS1 / Indicator special constraints, unified shape (ID owned by collection, inline RemovedReason, serialized Provenance, extracted ConstraintMetadata) shared with a renamed RegularConstraint, and the corresponding format_version bump and v2 backward-compat path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
termoshtt added a commit that referenced this pull request Apr 27, 2026
- Reframe the proposal as a prerequisite for #841: the proto wire shape of ConstraintMetadata (inline vs. top-level map) is deferred to #841 and decided based on the parse / serialize boundary defined by Stage 2 here.
- Add a 3-stage migration diagram (Python API → Rust SoA → proto wire shape) and a "Why not Rust-first" subsection acknowledging the alternative order.
- Add ParametricInstance to the Rust SDK design (also owns `decision_variables: BTreeMap<...>` directly).
- Add a section on Python modeling-chain impact for standalone constraints (`(x[0] + x[1] == 1).add_name("c")` chains) with two options and a recommendation; flagged as Stage 2 concern.
- Add a "Boundary changes" subsection covering `From<v1::*>` parse / serialize boundaries moving to the collection level, and an "Other types affected" subsection for LogicalMemoryProfile derive and pyo3-stub-gen.
- Switch Python examples from `merge(on="id")` to `join()` to reflect that entries_to_dataframe sets `id` as the DataFrame index.
- Move removed_reason to a separate long-format DataFrame in the Target API block (matching Solution's existing removed_reasons_df pattern).
- Add concrete recommendations to all 6 open questions and add a 7th for the Stage 2 modeling-chain choice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
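The collection-level parse / serialize boundary named in the "Boundary changes" item can be pictured with a small sketch. Everything in it is an assumption for illustration: `WireConstraint`, `Constraint`, `MetadataStore`, `parse_collection`, and `serialize_collection` are hypothetical stand-ins rather than ommx types, and a std `BTreeMap` stands in for whatever containers the crate uses.

```rust
use std::collections::BTreeMap;

/// Hypothetical wire-level element: core data and metadata travel together.
#[derive(Debug, Clone)]
pub struct WireConstraint {
    pub id: u64,
    pub rhs: f64,
    pub name: Option<String>,
}

/// Hypothetical in-memory element: it carries no metadata field.
#[derive(Debug, Clone, PartialEq)]
pub struct Constraint {
    pub rhs: f64,
}

/// Hypothetical metadata store: one ID-keyed map per metadata field.
#[derive(Debug, Default)]
pub struct MetadataStore {
    pub names: BTreeMap<u64, String>,
}

/// Parse a whole collection at once, splitting every wire element into
/// its core part (the map) and its metadata part (the store).
pub fn parse_collection(
    wire: Vec<WireConstraint>,
) -> (BTreeMap<u64, Constraint>, MetadataStore) {
    let mut map = BTreeMap::new();
    let mut store = MetadataStore::default();
    for w in wire {
        if let Some(name) = w.name {
            store.names.insert(w.id, name);
        }
        map.insert(w.id, Constraint { rhs: w.rhs });
    }
    (map, store)
}

/// Serialize the collection, overlaying stored metadata onto each element.
pub fn serialize_collection(
    map: &BTreeMap<u64, Constraint>,
    store: &MetadataStore,
) -> Vec<WireConstraint> {
    map.iter()
        .map(|(id, c)| WireConstraint {
            id: *id,
            rhs: c.rhs,
            name: store.names.get(id).cloned(),
        })
        .collect()
}
```

In this sketch a per-element conversion has no way to see the store, which is why the boundary has to sit at the collection level once metadata leaves the element structs.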
termoshtt added a commit that referenced this pull request Apr 27, 2026
Replace the Stage 1 / Stage 2 / Stage 3 split with a single connected v3-alpha redesign covering Rust runtime layout and Python API surface.

Major content changes:

- Reframe the goal around "the same fact lives in 3 places (Rust struct, Python dict accessor, Python wide df)" and target a single source of truth on the Rust side.
- Introduce Series-based collection accessors: `instance.constraints`, `decision_variables`, and the special-constraint accessors all become `pandas.Series[ID -> Object]` instead of dict / list.
- Make `*_df` methods explicitly derived views: type-specific core columns extracted from the Series, joined with sidecar metadata / parameters / provenance / removed_reasons dfs.
- Remove all `Constraint.name` / `.subscripts` / `.parameters` / `.description` getters from Python wrappers; the only path to metadata is the metadata df.
- Drop the Stage 1 / 2 / 3 ordering discussion and the "Why not Rust-first" subsection; replace with a Breaking changes section.
- Keep the proto deferral to #841, the Rust SoA design, and the open questions intact; add an open question for Series-dtype semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
termoshtt added a commit that referenced this pull request Apr 27, 2026
## Summary

- Drafts the runtime / Python-API design for v3 metadata storage. **Prerequisite for #841**: the proto wire shape of `ConstraintMetadata` (inline per message vs. top-level columnar map) cannot be finalized until the runtime / Python-API direction here is settled.
- A single connected redesign: the document describes the target shape; phasing of the implementation across PRs is decided in the implementation issues, not here.

## Target shape

- **Rust SDK**: metadata moves into ID-keyed Struct-of-Arrays stores. `ConstraintCollection<T>` owns `ConstraintMetadataStore<T::ID>`; `Instance` and `ParametricInstance` own `VariableMetadataStore` directly. `DecisionVariable` and `Constraint<S>` (and the special-constraint variants) lose their `metadata` field. Parse / serialize boundaries move from per-element to per-collection. Internal call sites that read `c.metadata.*` (e.g. `sample_set/extract.rs`) switch to `collection.metadata()` accessors.
- **Python SDK**:
  - `instance.constraints`, `decision_variables`, and the special-constraint accessors become `pandas.Series[ID -> Object]` instead of `dict` / `list`.
  - `*_df` methods (`constraints_df(kind=..., include=...)`, `decision_variables_df(include=...)`, and Solution / SampleSet counterparts) take an `include=` parameter selecting which sidecars (`"metadata"`, `"parameters"`, `"removed_reasons"`) to fold in. **Default `include=("metadata","parameters")` reproduces v2's wide-DataFrame shape**: v2 user code keeps working with just a `kind=...` argument added.
  - Long-format sidecar dfs (`constraint_metadata_df`, `constraint_parameters_df`, `constraint_provenance_df`, `constraint_removed_reasons_df`, `variable_metadata_df`, `variable_parameters_df`) are bulk-built from the SoA store. `provenance` is intentionally not included via `include=` (variable-length chains pivot poorly) and is only available as the long-format df.
  - Per-kind `indicator_constraints_df` / `one_hot_constraints_df` / `sos1_constraints_df` collapse into the single `constraints_df(kind=...)` overload set, dispatched via `Literal` + `@overload` so the IDE / type checker still sees kind-specific column schemas.
- **Wrapper objects with back-reference**: PyO3 wrappers stay rich. They run in two modes: Standalone (modeling chain, owns a staging bag) or Attached (collection-derived, holds `Py<Instance>` + id and reads / writes the SoA store via back-reference). Getters `.name`, `.subscripts`, `.parameters`, `.description` are preserved; they switch from owning data to reading the store. The modeling chain `(x[0] + x[1] == 1).add_name("c")` keeps working through the staging bag, which drains into the SoA store on insertion.
- **Cross-ID-space JOIN safety**: each constraint kind plus decision variables uses a kind-qualified index name (`variable_id`, `regular_constraint_id`, `indicator_constraint_id`, `one_hot_constraint_id`, `sos1_constraint_id`). The default `include=` covers most "I want a wide table" cases without manual join, removing the most common opportunity for a wrong-kind merge.
- **Proto**: deferred to #841, picked once the parse / serialize boundary here is concrete.

## Relationship to #841

#842 first, #841 second. The proto-schema work in #841 currently sketches `ConstraintMetadata` inline per constraint message but defers the wire-shape decision (inline AoS vs. top-level `map<uint64, ConstraintMetadata>` per collection). That choice becomes concrete only after this proposal lands and the parse / serialize boundary has a clean shape.

## Open questions (with recommendations)

See the bottom of the doc for full reasoning.

1. Constraint kind dispatch in Python: **rec: single method with `Literal` + `@overload`**.
2. `removed_reason` placement: **rec: separate long-format df** (and `"removed_reasons"` opt-in via `include=` for the wide form).
3. Builder-style metadata setter: **rec: add `insert_with` on the Rust side** (independent of the Python staging bag).
4. `parameters` Rust-side storage: **rec: nested `FnvHashMap<ID, FnvHashMap<…>>`**.
5. Optional `subscripts_df` long format: **rec: defer**.
6. Polars as primary in Python: **rec: pandas stays primary for v3**.
7. `drop_constraint` / wrapper invalidation: **rec: do not add `drop_constraint` in v3**; defer the invalidation semantics until it's actually needed.
8. Attached wrapper `Py<Instance>` cycles: **rec: documented behavior, no code-level mitigation**.

## Test plan

- [ ] Design review on the doc: approve the SoA-on-collection placement, the Standalone/Attached wrapper architecture with back-reference, the Series + `*_df(include=...)` Python API, and the v2-compatible default.
- [ ] Sign off on each of the 8 open-question recommendations (or push back).
- [ ] Sign off on the v3-alpha breaking-change window for the Python `dict` → `Series` migration and the `*_df` `include=` reshape.
- [ ] Coordinate with #841 on the proto wire shape once the parse / serialize boundary is concrete.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
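The kind-qualified index names guard pandas joins on the Python side; a store that is generic over the ID type, as in `ConstraintMetadataStore<T::ID>`, can give a comparable guarantee at compile time on the Rust side. The following is a minimal sketch under assumed names: `RegularConstraintId`, `IndicatorConstraintId`, and `NameStore` are hypothetical, and a std `HashMap` is used in place of the crate's map type.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical ID newtype for the regular-constraint ID space.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct RegularConstraintId(pub u64);

/// Hypothetical ID newtype for the indicator-constraint ID space.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct IndicatorConstraintId(pub u64);

/// Hypothetical name store, generic over the ID space it belongs to.
#[derive(Debug)]
pub struct NameStore<Id> {
    names: HashMap<Id, String>,
}

impl<Id: Eq + Hash> NameStore<Id> {
    /// Create an empty store.
    pub fn new() -> Self {
        Self { names: HashMap::new() }
    }

    /// Record a name for `id`, replacing any previous one.
    pub fn set_name(&mut self, id: Id, name: &str) {
        self.names.insert(id, name.to_string());
    }

    /// Borrow the name for `id`, if one was recorded.
    pub fn name(&self, id: Id) -> Option<&str> {
        self.names.get(&id).map(String::as_str)
    }
}
```

With this shape, the raw number 1 can name two different things in two stores, and a call such as `regular.name(IndicatorConstraintId(1))` on a `NameStore<RegularConstraintId>` is rejected by the type checker instead of silently returning the wrong row.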
5 tasks
termoshtt added a commit that referenced this pull request Apr 28, 2026
…oA stores (#843)

## Summary

Moves per-constraint and per-variable auxiliary metadata (`name`, `subscripts`, `parameters`, `description`, `provenance`) off per-element structs and onto Struct-of-Arrays stores at the collection / Instance layer (METADATA_STORAGE_V3.md). The Rust SDK and the PyO3 bindings are both fully migrated; `cargo test` and `task python:test` pass end-to-end.

This PR is the runtime / Python-API prerequisite for #841 (special-constraint proto v3). The proto-schema decision in #841 (`ConstraintMetadata` inline per message vs. top-level columnar map) needed the runtime / Python-API direction settled first; that's what this PR establishes.

## End state

### Rust SDK

- New `ConstraintMetadataStore<ID>` (generic over the four constraint ID types: regular / indicator / one-hot / SOS1) and `VariableMetadataStore` under `rust/ommx/src/{constraint,decision_variable}/metadata_store.rs`. Sparse `FnvHashMap`-per-field representation with per-field borrowing getters (`name(id) -> Option<&str>`, `subscripts(id) -> &[i64]`, `parameters(id) -> &FnvHashMap<…>`, …) and a shared empty-sentinel for the collection-shaped fields. Owned exchange via `insert(id, ConstraintMetadata)` / `remove(id)` / `collect_for(id)`.
- `ConstraintCollection<T>` / `EvaluatedCollection<T>` / `SampledCollection<T>` each carry a `metadata` field with `metadata()` / `metadata_mut()` accessors and `with_metadata(...)` constructors. `insert_with(id, c, metadata)` performs atomic insert + metadata write.
- `Instance` and `ParametricInstance` gain a `variable_metadata` field with the same accessor pattern, plus narrow per-kind metadata accessors (`variable_metadata_mut()`, `constraint_metadata_mut()`, `indicator_constraint_metadata_mut()`, `one_hot_constraint_metadata_mut()`, `sos1_constraint_metadata_mut()` and their immutable siblings) so callers can drain metadata into the SoA stores after `builder().build()` without exposing invariant-breaking access to the underlying collections.
- The per-element `metadata: ConstraintMetadata` field is removed from `Constraint<S>` / `IndicatorConstraint<S>` / `OneHotConstraint<S>` / `Sos1Constraint<S>` / `DecisionVariable` / `EvaluatedDecisionVariable` / `SampledDecisionVariable`.
- Parse / serialize boundary moves to the collection layer. `Parse` impls now produce `(map, ConstraintMetadataStore)` / `(map, VariableMetadataStore)` pairs at the collection level; the collection-level serializers (`From<Instance> for v1::Instance`, `From<ParametricInstance> for v1::ParametricInstance`, `From<Solution> for v1::Solution`, `From<SampleSet> for v1::SampleSet`) drain the SoA stores and overlay metadata onto each per-element proto via the explicit `*_to_v1` helpers.
- The default-metadata `From<X> for v1::Y` impls (`impl From<DecisionVariable> for v1::DecisionVariable` and the Evaluated / Sampled / Constraint siblings) are removed entirely. Per-element conversion would have to default every metadata field, which silently drops any caller-supplied metadata; making the helpers `pub(crate)` and requiring an explicit `metadata: ConstraintMetadata | DecisionVariableMetadata` argument forces the silent path to surface as a type error.
- Bare-element bytes round-trips (`DecisionVariable::to_bytes` / `from_bytes` and the `Constraint` / `EvaluatedConstraint` / `SampledConstraint` / `EvaluatedDecisionVariable` / `SampledDecisionVariable` siblings) are removed. Top-level container `to_bytes` / `from_bytes` (`Instance`, `ParametricInstance`, `Solution`, `SampleSet`, plus the DTOs `State`, `Samples`, `Parameters`) preserve full metadata.
- Evaluate path threads metadata through: `Instance::evaluate` and `Instance::evaluate_samples` clone `variable_metadata` into the produced `Solution` / `SampleSet`; `SampleSet::get` carries it forward into the per-sample `Solution` along with all four constraint-kind metadata stores. Constraint-side metadata flows through `ConstraintCollection::evaluate` into `EvaluatedCollection` / `SampledCollection` automatically.
- Mutation sites in `instance/{slack,sos1,one_hot,log_encode,indicator,evaluate}.rs` write through `variable_metadata_mut()` and the per-kind `*_metadata_mut()` accessors. Special-constraint promotion paths (one-hot → constraint, indicator → constraint, SOS1 → constraint) carry metadata via `insert_with` and `push_provenance`.
- MPS / QPLIB readers populate the SoA stores after `Instance::new`, replacing the previous per-element metadata writes.
- Memory profile snapshots updated to account for the collection-level visits.

### PyO3 wrappers (snapshot model)

- Each wrapper struct now holds its own metadata snapshot:

  ```rust
  Constraint(pub ommx::Constraint, pub ommx::ConstraintMetadata)
  DecisionVariable(pub ommx::DecisionVariable, pub ommx::DecisionVariableMetadata)
  IndicatorConstraint(...), EvaluatedConstraint(...), SampledConstraint(...),
  EvaluatedDecisionVariable(...), SampledDecisionVariable(...)
  ```

  Standalone construction starts with `Default::default()` metadata. Reading from an `Instance` fills the snapshot from the SoA store via `from_parts(inner, metadata.collect_for(id))`. `Instance.from_components(...)` and `ParametricInstance.from_components(...)` drain each wrapper's metadata back into the instance's SoA stores. Mutations on a wrapper retrieved from an instance therefore do not propagate back; the caller must re-add the constraint / variable to apply changes (matches the prior `clone()`-based semantics).
- `pandas.rs` introduces a `WithMetadata<'a, T, M>` wrapper. `ToPandasEntry` impls that previously read `self.metadata.X` now consume `WithMetadata<'_, T, ConstraintMetadata | DecisionVariableMetadata>`; call sites in `Instance` / `ParametricInstance` / `Solution` / `SampleSet` pre-snapshot the SoA store and zip the metadata in alongside each item before handing the iterator to `entries_to_dataframe`.
- `from_bytes` / `to_bytes` removed from non-top-level Python wrappers (Linear, Quadratic, Polynomial, Function, DecisionVariable, EvaluatedDecisionVariable, SampledDecisionVariable, NamedFunction, EvaluatedNamedFunction, SampledNamedFunction, Parameter). Only the top-level types (`Instance`, `ParametricInstance`, `Solution`, `SampleSet`) plus the cross-evaluate DTOs (`State`, `Samples`, `Parameters`) keep them. `__init__.pyi` regenerated via `task python:stubgen` to match.

> The originally-proposed Standalone / Attached two-mode design with `Py<Instance>` back-references (write-through getters, live shared state across wrappers pointing at the same id) is intentionally **not** implemented in this PR. The snapshot model preserves the v2 semantics with minimum surface change; the two-mode design lands together with the Series / `include=` work in the next wave.

### Public API surface

The `ommx` crate's public surface matches `main` plus the new SoA accessors required by this refactor:

- `pub struct ConstraintMetadataStore<ID>` and `pub struct VariableMetadataStore` (returned by the metadata accessors below).
- `pub fn variable_metadata() / variable_metadata_mut() / constraint_collection() / constraint_metadata() / constraint_metadata_mut() / indicator_constraint_collection() / indicator_constraint_metadata() / indicator_constraint_metadata_mut() / one_hot_constraint_collection() / one_hot_constraint_metadata() / one_hot_constraint_metadata_mut() / sos1_constraint_collection() / sos1_constraint_metadata() / sos1_constraint_metadata_mut()` on `Instance` and `ParametricInstance`. The mutable surface is intentionally narrowed to metadata only: metadata is outside the constraint-collection invariants (a sparse ID-keyed store), so `&mut` is safe; full `&mut ConstraintCollection<T>` would expose `active_mut()` / `removed_mut()` / `insert_with()` and let callers register constraints that reference unknown variable IDs.
- `pub fn metadata() / metadata_mut() / with_metadata() / insert_with()` on `ConstraintCollection<T>` / `EvaluatedCollection<T>` / `SampledCollection<T>`.

The `*_to_v1` helpers and the `parse` submodules stay `pub(crate)`. Module visibility (`mod constraint`, `mod decision_variable`, `mod instance`, `mod indicator_constraint` in `lib.rs`) is unchanged: `git diff origin/main...HEAD -- rust/ommx/src/lib.rs` is empty.

### v1 wire-format limitations

`v1::Solution` and `v1::SampleSet` only have a single `evaluated_constraints` / `constraints` field for regular constraints; they have no fields for indicator / one-hot / sos1 evaluated/sampled constraints. The in-memory Rust types carry those four collections separately (with their own metadata stores), but `to_bytes` / `from_bytes` are lossy for the three special kinds. This is a pre-existing wire-format limitation (the matching `Parse` impls have always initialized those collections to `Default::default()`) and is documented on the `From<Solution> for v1::Solution` and `From<SampleSet> for v1::SampleSet` impls. Wire-shape resolution is the subject of #841.

### Docs

`METADATA_STORAGE_V3.md` updated: status header reflects the two-wave landing, each section now carries an explicit **(landed)** / **(deferred)** tag, and the breaking-changes list is split between the two waves. The deferred sections are kept verbatim so the follow-up PR has a concrete spec to land against.
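The sparse per-field store with borrowing getters described under "Rust SDK" can be sketched roughly as below. This is an illustration under assumptions, not the crate's code: `MetadataStore` here is a hypothetical two-field reduction of `ConstraintMetadataStore<ID>`, it uses std `HashMap` and plain `u64` IDs instead of `FnvHashMap` and typed IDs, and the empty slice literal plays the role of the shared empty sentinel.

```rust
use std::collections::HashMap;

/// Owned per-element metadata, used only to move data in and out of the store.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ConstraintMetadata {
    pub name: Option<String>,
    pub subscripts: Vec<i64>,
}

/// Hypothetical sparse store: an ID is present in a field's map only
/// when that field is non-empty for the ID.
#[derive(Debug, Default)]
pub struct MetadataStore {
    name: HashMap<u64, String>,
    subscripts: HashMap<u64, Vec<i64>>,
}

impl MetadataStore {
    /// Store `m` for `id`, replacing whatever was there before.
    pub fn insert(&mut self, id: u64, m: ConstraintMetadata) {
        self.remove(id);
        if let Some(n) = m.name {
            self.name.insert(id, n);
        }
        if !m.subscripts.is_empty() {
            self.subscripts.insert(id, m.subscripts);
        }
    }

    /// Take the metadata for `id` out of the store as an owned value.
    pub fn remove(&mut self, id: u64) -> ConstraintMetadata {
        ConstraintMetadata {
            name: self.name.remove(&id),
            subscripts: self.subscripts.remove(&id).unwrap_or_default(),
        }
    }

    /// Borrowing getter for the scalar field.
    pub fn name(&self, id: u64) -> Option<&str> {
        self.name.get(&id).map(String::as_str)
    }

    /// Borrowing getter for a collection-shaped field; unknown IDs
    /// borrow an empty slice instead of allocating.
    pub fn subscripts(&self, id: u64) -> &[i64] {
        self.subscripts.get(&id).map(Vec::as_slice).unwrap_or(&[])
    }

    /// Clone the metadata for `id` into an owned snapshot.
    pub fn collect_for(&self, id: u64) -> ConstraintMetadata {
        ConstraintMetadata {
            name: self.name(id).map(str::to_owned),
            subscripts: self.subscripts(id).to_vec(),
        }
    }
}
```

For a map-shaped field such as `parameters`, an empty slice literal is not available, so a real store would presumably need an explicit shared empty map to hand out references to; that detail is left out of the sketch.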
## Test plan

- [x] `cargo test -p ommx --lib`: **474 tests passing**, including four bytes-round-trip regression tests:
  - `instance::parse::tests::test_parametric_instance_roundtrip_preserves_metadata`
  - `sample_set::tests::test_sample_set_get_preserves_metadata`
  - `solution::parse::tests::test_solution_roundtrip_preserves_metadata`
  - `sample_set::parse::tests::test_sample_set_roundtrip_preserves_metadata`
- [x] `cargo test` (workspace, examples included): clean compile, all green
- [x] `task python:test`: full Python suite including ommx + adapter tests (highs / pyscipopt / python-mip / openjij) passes
- [x] Snapshot review for the collection-level memory profile (accepted via `cargo insta accept --workspace`)
- [x] `__init__.pyi` regenerated via `task python:stubgen`

## Out of scope (follow-up PRs)

- **#841 (special-constraint proto v3)**: picks the `ConstraintMetadata` wire shape (inline per message vs. top-level columnar map) on top of the runtime SoA stores landed here.
- Tighten `ConstraintCollection<T>::active_mut()` / `removed_mut()` / `insert_with()` to `pub(crate)` (or smaller): these are heavily used inside the crate but should not be on the public API surface either, per the same invariant-safety rationale that motivated narrowing the `Instance` accessors here.
- Conversion of `instance.constraints` / `decision_variables` / `*_constraints` Python accessors from `dict` / `list` to `pandas.Series[ID -> Object]`.
- The `*_df` API reshape with the `include=` parameter and the long-format sidecar dfs (`constraint_metadata_df` / `constraint_parameters_df` / `constraint_provenance_df` / `constraint_removed_reasons_df` / `variable_metadata_df` / `variable_parameters_df`).
- Standalone / Attached two-mode wrappers with `Py<Instance>` back-references (write-through metadata mutation).
- Doc reorganization (METADATA_STORAGE_V3.md → SDK docs).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
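The snapshot semantics described for the PyO3 wrappers in the commit message above (a wrapper copies metadata out of the store, and edits only take effect when the element is re-added) can be illustrated in plain Rust. The sketch assumes hypothetical types: `Instance`, `ConstraintWrapper`, and `Metadata` here are simplified stand-ins, with `get` playing the role of `from_parts(inner, metadata.collect_for(id))` and `add` the role of draining a wrapper back into the store.

```rust
use std::collections::HashMap;

/// Hypothetical owned metadata snapshot.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Metadata {
    pub name: Option<String>,
}

/// Hypothetical instance: core data and metadata live in separate maps.
#[derive(Debug, Default)]
pub struct Instance {
    rhs: HashMap<u64, f64>,
    names: HashMap<u64, String>,
}

/// Hypothetical wrapper: core value plus its own copy of the metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct ConstraintWrapper {
    pub id: u64,
    pub rhs: f64,
    pub metadata: Metadata,
}

impl Instance {
    /// Add or replace an element, draining the wrapper's snapshot into the store.
    pub fn add(&mut self, w: ConstraintWrapper) {
        self.rhs.insert(w.id, w.rhs);
        match w.metadata.name {
            Some(n) => {
                self.names.insert(w.id, n);
            }
            None => {
                self.names.remove(&w.id);
            }
        }
    }

    /// Build a wrapper whose metadata is a copy of the store's current state.
    pub fn get(&self, id: u64) -> Option<ConstraintWrapper> {
        let rhs = *self.rhs.get(&id)?;
        Some(ConstraintWrapper {
            id,
            rhs,
            metadata: Metadata { name: self.names.get(&id).cloned() },
        })
    }
}
```

Renaming a wrapper returned by `get` leaves the instance untouched until the wrapper is passed to `add` again, which is the behavior the deferred Attached mode with write-through getters would change.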
## Summary

- […] `ConstraintHints`-based representation with first-class messages for OneHot / SOS1 / Indicator.
- […] `RegularConstraint` and the three special-constraint types under a single shape: ID held by the enclosing collection (`map<uint64, T>`), inline `optional RemovedReason`, serialized `Provenance` chain, extracted `ConstraintMetadata`.
- […] `format_version` 0 → 1; v3 readers keep loading v2 data via the existing `convert_hints_to_collections` path; v2 readers refuse v3 data via the existing `format_version` check.

This PR contains the design doc only (`SPECIAL_CONSTRAINTS_V3.md`). Proto and Rust SDK changes will land in follow-up PRs once the design is agreed.

## Open questions

See the "Open questions" section at the bottom of the doc:

- […] `RegularConstraint` / `ScalarConstraint` / `GeneralConstraint` / keep `Constraint` and rename v2 to `LegacyConstraint`).
- […] `[deprecated = true]` markers on `ConstraintHints` / `OneHot` / `SOS1`.
- […] `SampledActiveVariable` shared between OneHot and SOS1 vs. per-type wrappers.
- […] `instance.constraints` directly will need updates).

## Test plan

- […] `RemovedReason` choice.
- […] `format_version` 0 → 1 timing (this PR vs. the follow-up implementation PR).

🤖 Generated with Claude Code
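The `format_version` compatibility rule from the Summary (v3 readers accept both versions and convert v2 data on load, v2 readers refuse v3 data) amounts to a small version gate. The sketch below is an assumption about how such a gate could look, not the existing check: `check_format_version`, `v3_load_path`, `LoadPath`, and `UnsupportedVersion` are hypothetical names, and the rule "reject anything newer than the reader supports" is inferred from the description.

```rust
/// Which path a hypothetical v3 reader takes for a given file.
#[derive(Debug, PartialEq)]
pub enum LoadPath {
    /// format_version 0: v2 data, converted from hints on load.
    ConvertLegacyHints,
    /// format_version 1: v3 data, read as-is.
    Native,
}

/// Error returned when the data is newer than the reader.
#[derive(Debug, PartialEq)]
pub struct UnsupportedVersion {
    pub found: u32,
    pub supported: u32,
}

/// Assumed gate: a reader refuses any version above the one it supports.
pub fn check_format_version(found: u32, supported: u32) -> Result<(), UnsupportedVersion> {
    if found > supported {
        Err(UnsupportedVersion { found, supported })
    } else {
        Ok(())
    }
}

/// A v3 reader supports format_version 1 and still loads version 0.
pub fn v3_load_path(found: u32) -> Result<LoadPath, UnsupportedVersion> {
    check_format_version(found, 1)?;
    Ok(match found {
        0 => LoadPath::ConvertLegacyHints,
        _ => LoadPath::Native,
    })
}
```

Under these assumptions a v2 reader corresponds to `check_format_version(found, 0)`, which rejects v3 data, while `v3_load_path(0)` routes v2 data through the conversion step.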