feat(qdp): Encoding + Dtype enums, static encoder dispatch#1276
feat(qdp): Encoding + Dtype enums, static encoder dispatch#1276rich7420 wants to merge 2 commits intoapache:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR centralizes encoding/dtype parsing and encoder dispatch in qdp-core by introducing canonical Encoding/Dtype domain types, and updates Rust + PyO3/Python boundaries to pass those typed values internally (reducing duplicated string handling and avoiding per-call heap allocation for encoder selection).
Changes:
- Add
qdp_core::Encoding+Dtypeand replace string-based encoder selection with static encoder dispatch (incl.OnceLockfor IQP variants). - Refactor pipeline configuration to use
encoding: Encodinganddtype(vsencoding_method: String/float32_pipeline: bool) and update pipeline runner dispatch accordingly. - Update Python/Rust boundaries (PyO3 + pure Python loader) to validate/parse encodings at the boundary and add regression tests for case-insensitive parsing.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| qdp/qdp-core/src/types.rs | Introduces Encoding + Dtype and static encoder dispatch; adds parsing/helpers |
| qdp/qdp-core/src/lib.rs | Switches engine APIs from get_encoder(&str) to Encoding parsing + static dispatch |
| qdp/qdp-core/src/pipeline_runner.rs | Pipeline config now uses typed encoding/dtype and avoids string parsing in hot paths |
| qdp/qdp-core/src/gpu/encodings/mod.rs | Removes get_encoder factory and tightens QuantumEncoder to 'static |
| qdp/qdp-core/src/gpu/encodings/iqp.rs | Adds OnceLock-backed shared IQP encoder instances for static dispatch |
| qdp/qdp-core/src/encoding/mod.rs | Fixes streaming parquet dispatch to be case-insensitive via Encoding::from_str_ci |
| qdp/qdp-python/src/engine.rs | Parses Encoding/Dtype at the boundary and passes typed values internally |
| qdp/qdp-python/src/pytorch.rs | Uses Encoding for CUDA tensor validation logic |
| qdp/qdp-python/qumat_qdp/loader.py | Adds early encoding validation against a canonical set |
| qdp/qdp-core/tests/types.rs | Adds tests for encoding/dtype parsing, vector_len, and static dispatch stability |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Self::Angle => n, | ||
| Self::Basis => 1, | ||
| Self::Amplitude | Self::Iqp | Self::IqpZ | Self::Phase => 1 << n, |
There was a problem hiding this comment.
Encoding::vector_len is documented and used as the per-sample input feature dimension (it drives PipelineIterator file sample_size validation and synthetic batch generation), but the mapping for Iqp/IqpZ/Phase currently returns 1 << num_qubits. That contradicts the encoders’ own input validation (e.g., PhaseEncoder expects sample_size == num_qubits, and IqpEncoder expects num_qubits or num_qubits + num_qubits*(num_qubits-1)/2). This will make pipeline/file loading for these encodings reject valid inputs (or generate invalid synthetic batches). Update vector_len for these variants to match the encoder input lengths.
| Self::Angle => n, | |
| Self::Basis => 1, | |
| Self::Amplitude | Self::Iqp | Self::IqpZ | Self::Phase => 1 << n, | |
| Self::Amplitude => 1 << n, | |
| Self::Angle | Self::Iqp | Self::IqpZ | Self::Phase => n, | |
| Self::Basis => 1, |
| validate_cuda_tensor_for_encoding(data, self.engine.device().ordinal(), encoding_method)?; | ||
|
|
||
| let encoding = Encoding::from_str_ci(encoding_method) | ||
| .map_err(|e| PyRuntimeError::new_err(e.to_string()))?; | ||
| let dtype = data.getattr("dtype")?; | ||
| let dtype_str: String = dtype.str()?.extract()?; | ||
| let dtype_str_lower = dtype_str.to_ascii_lowercase(); | ||
| let is_f32 = dtype_str_lower.contains("float32"); | ||
| let method = encoding_method.to_ascii_lowercase(); | ||
| let is_f32 = dtype_str.to_ascii_lowercase().contains("float32"); |
There was a problem hiding this comment.
validate_cuda_tensor_for_encoding already parses encoding_method into an Encoding, but _encode_from_cuda_tensor parses the same string again immediately after validation. To avoid duplicated parsing (and the risk of future divergence), consider changing the validator to return the parsed Encoding (or accept Encoding as an argument) so the caller can reuse it.
| /// Dtype for pipeline configuration (alias of [`crate::gpu::memory::Precision`]). | ||
| pub type Dtype = crate::gpu::memory::Precision; | ||
|
|
||
| impl Dtype { | ||
| /// Parse dtype from a short user string (case-insensitive, trimmed). | ||
| pub fn from_str_ci(s: &str) -> Result<Self> { | ||
| let t = s.trim(); | ||
| if t.eq_ignore_ascii_case("f32") | ||
| || t.eq_ignore_ascii_case("float32") | ||
| || t.eq_ignore_ascii_case("float") | ||
| { | ||
| Ok(Self::Float32) | ||
| } else if t.eq_ignore_ascii_case("f64") | ||
| || t.eq_ignore_ascii_case("float64") | ||
| || t.eq_ignore_ascii_case("double") | ||
| { | ||
| Ok(Self::Float64) | ||
| } else { | ||
| Err(MahoutError::InvalidInput(format!( | ||
| "Unknown dtype: {s}. Use 'f32' or 'f64'." | ||
| ))) | ||
| } | ||
| } |
There was a problem hiding this comment.
Dtype is declared as a type alias to crate::gpu::memory::Precision, but this file adds an inherent impl Dtype { ... }. Rust does not allow inherent impls on type aliases, so this will fail to compile. Consider moving these methods onto Precision (e.g., impl Precision { ... }) and re-exporting it as Dtype, or changing Dtype into a newtype wrapper if you need a distinct type name.
Related Issues
Changes
Bug fix
New feature
Refactoring
Documentation
Test
CI/CD pipeline
Other
Why
"amplitude" was parsed and lowercased at 6 independent sites across the stack, causing silent divergence (e.g.
encode_from_parquet(…, "Amplitude") silently failed while encode_batch accepted it). get_encoder heap-allocated
a Box per batch for zero-sized unit structs. float32_pipeline: bool was scattered with
inconsistent defaults, making the published baseline ambiguous.
How
allocation in dispatch
via OnceLock statics for IQP variants
internally
.encoding() setter
Checklist