feat(qdp): Encoding + Dtype enums, static encoder dispatch by rich7420 · Pull Request #1276 · apache/mahout

rich7420 · 2026-04-19T08:54:02Z

Related Issues

Changes

Bug fix
New feature
Refactoring
Documentation
Test
CI/CD pipeline
Other

Why

"amplitude" was parsed and lowercased at 6 independent sites across the stack, causing silent divergence (e.g.
encode_from_parquet(…, "Amplitude") silently failed while encode_batch accepted it). get_encoder heap-allocated
a Box per batch for zero-sized unit structs. float32_pipeline: bool was scattered with
inconsistent defaults, making the published baseline ambiguous.

How
- Added Encoding enum and Dtype alias in qdp-core/src/types.rs — single parse at every API boundary, no heap
  allocation in dispatch
- Replaced get_encoder(&str) → Box with Encoding::encoder() → &'static dyn QuantumEncoder
  via OnceLock statics for IQP variants
- PipelineConfig: encoding_method: String → encoding: Encoding, float32_pipeline: bool → dtype: Dtype
- Removed encoding_supports_f32 and vector_len free functions; replaced by methods on Encoding
- Fixed streaming Parquet dispatcher case-sensitivity gap in encoding/mod.rs
- PyO3 boundary (engine.rs, loader.rs, lib.rs, pytorch.rs): strings consumed at boundary, enum passed
  internally
- qumat_qdp/loader.py: added _VALID_ENCODINGS frozenset, early validation in _validate_loader_args and
  .encoding() setter

Checklist

Added or updated unit tests for all changes
Added or updated documentation for all changes

Copilot

Pull request overview

This PR centralizes encoding/dtype parsing and encoder dispatch in qdp-core by introducing canonical Encoding/Dtype domain types, and updates Rust + PyO3/Python boundaries to pass those typed values internally (reducing duplicated string handling and avoiding per-call heap allocation for encoder selection).

Changes:

Add qdp_core::Encoding + Dtype and replace string-based encoder selection with static encoder dispatch (incl. OnceLock for IQP variants).
Refactor pipeline configuration to use encoding: Encoding and dtype (vs encoding_method: String / float32_pipeline: bool) and update pipeline runner dispatch accordingly.
Update Python/Rust boundaries (PyO3 + pure Python loader) to validate/parse encodings at the boundary and add regression tests for case-insensitive parsing.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
qdp/qdp-core/src/types.rs	Introduces `Encoding` + `Dtype` and static encoder dispatch; adds parsing/helpers
qdp/qdp-core/src/lib.rs	Switches engine APIs from `get_encoder(&str)` to `Encoding` parsing + static dispatch
qdp/qdp-core/src/pipeline_runner.rs	Pipeline config now uses typed encoding/dtype and avoids string parsing in hot paths
qdp/qdp-core/src/gpu/encodings/mod.rs	Removes `get_encoder` factory and tightens `QuantumEncoder` to `'static`
qdp/qdp-core/src/gpu/encodings/iqp.rs	Adds `OnceLock`-backed shared IQP encoder instances for static dispatch
qdp/qdp-core/src/encoding/mod.rs	Fixes streaming parquet dispatch to be case-insensitive via `Encoding::from_str_ci`
qdp/qdp-python/src/engine.rs	Parses `Encoding`/`Dtype` at the boundary and passes typed values internally
qdp/qdp-python/src/pytorch.rs	Uses `Encoding` for CUDA tensor validation logic
qdp/qdp-python/qumat_qdp/loader.py	Adds early encoding validation against a canonical set
qdp/qdp-core/tests/types.rs	Adds tests for encoding/dtype parsing, vector_len, and static dispatch stability

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-19T09:05:27Z

+            Self::Angle => n,
+            Self::Basis => 1,
+            Self::Amplitude | Self::Iqp | Self::IqpZ | Self::Phase => 1 << n,


Encoding::vector_len is documented and used as the per-sample input feature dimension (it drives PipelineIterator file sample_size validation and synthetic batch generation), but the mapping for Iqp/IqpZ/Phase currently returns 1 << num_qubits. That contradicts the encoders’ own input validation (e.g., PhaseEncoder expects sample_size == num_qubits, and IqpEncoder expects num_qubits or num_qubits + num_qubits*(num_qubits-1)/2). This will make pipeline/file loading for these encodings reject valid inputs (or generate invalid synthetic batches). Update vector_len for these variants to match the encoder input lengths.

Suggested change

Self::Angle => n,

Self::Basis => 1,

Self::Amplitude | Self::Iqp | Self::IqpZ | Self::Phase => 1 << n,

Self::Amplitude => 1 << n,

Self::Angle | Self::Iqp | Self::IqpZ | Self::Phase => n,

Self::Basis => 1,

Copilot · 2026-04-19T09:05:28Z

        validate_cuda_tensor_for_encoding(data, self.engine.device().ordinal(), encoding_method)?;

+        let encoding = Encoding::from_str_ci(encoding_method)
+            .map_err(|e| PyRuntimeError::new_err(e.to_string()))?;
        let dtype = data.getattr("dtype")?;
        let dtype_str: String = dtype.str()?.extract()?;
-        let dtype_str_lower = dtype_str.to_ascii_lowercase();
-        let is_f32 = dtype_str_lower.contains("float32");
-        let method = encoding_method.to_ascii_lowercase();
+        let is_f32 = dtype_str.to_ascii_lowercase().contains("float32");


validate_cuda_tensor_for_encoding already parses encoding_method into an Encoding, but _encode_from_cuda_tensor parses the same string again immediately after validation. To avoid duplicated parsing (and the risk of future divergence), consider changing the validator to return the parsed Encoding (or accept Encoding as an argument) so the caller can reuse it.

Copilot · 2026-04-19T09:05:28Z

+/// Dtype for pipeline configuration (alias of [`crate::gpu::memory::Precision`]).
+pub type Dtype = crate::gpu::memory::Precision;
+
+impl Dtype {
+    /// Parse dtype from a short user string (case-insensitive, trimmed).
+    pub fn from_str_ci(s: &str) -> Result<Self> {
+        let t = s.trim();
+        if t.eq_ignore_ascii_case("f32")
+            || t.eq_ignore_ascii_case("float32")
+            || t.eq_ignore_ascii_case("float")
+        {
+            Ok(Self::Float32)
+        } else if t.eq_ignore_ascii_case("f64")
+            || t.eq_ignore_ascii_case("float64")
+            || t.eq_ignore_ascii_case("double")
+        {
+            Ok(Self::Float64)
+        } else {
+            Err(MahoutError::InvalidInput(format!(
+                "Unknown dtype: {s}. Use 'f32' or 'f64'."
+            )))
+        }
+    }


Dtype is declared as a type alias to crate::gpu::memory::Precision, but this file adds an inherent impl Dtype { ... }. Rust does not allow inherent impls on type aliases, so this will fail to compile. Consider moving these methods onto Precision (e.g., impl Precision { ... }) and re-exporting it as Dtype, or changing Dtype into a newtype wrapper if you need a distinct type name.

feat(qdp): Encoding + Dtype enums, static encoder dispatch

b812a78

rich7420 requested review from 400Ping, guan404ming and ryankert01 as code owners April 19, 2026 08:54

rich7420 requested a review from Copilot April 19, 2026 08:59

Copilot started reviewing on behalf of rich7420 April 19, 2026 09:00 View session

Copilot AI reviewed Apr 19, 2026

View reviewed changes

update and improve

152e31a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(qdp): Encoding + Dtype enums, static encoder dispatch#1276

feat(qdp): Encoding + Dtype enums, static encoder dispatch#1276
rich7420 wants to merge 2 commits intoapache:mainfrom
rich7420:encode-types

rich7420 commented Apr 19, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

rich7420 commented Apr 19, 2026

Related Issues

Changes

Why

How

Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants