Skip to content

feat(qdp): Encoding + Dtype enums, static encoder dispatch#1276

Open
rich7420 wants to merge 2 commits intoapache:mainfrom
rich7420:encode-types
Open

feat(qdp): Encoding + Dtype enums, static encoder dispatch#1276
rich7420 wants to merge 2 commits intoapache:mainfrom
rich7420:encode-types

Conversation

@rich7420
Copy link
Copy Markdown
Contributor

Related Issues

Changes

  • Bug fix

  • New feature

  • Refactoring

  • Documentation

  • Test

  • CI/CD pipeline

  • Other

    Why

    "amplitude" was parsed and lowercased at 6 independent sites across the stack, causing silent divergence (e.g.
    encode_from_parquet(…, "Amplitude") silently failed while encode_batch accepted it). get_encoder heap-allocated
    a Box per batch for zero-sized unit structs. float32_pipeline: bool was scattered with
    inconsistent defaults, making the published baseline ambiguous.

    How

    • Added Encoding enum and Dtype alias in qdp-core/src/types.rs — single parse at every API boundary, no heap
      allocation in dispatch
    • Replaced get_encoder(&str) → Box with Encoding::encoder() → &'static dyn QuantumEncoder
      via OnceLock statics for IQP variants
    • PipelineConfig: encoding_method: String → encoding: Encoding, float32_pipeline: bool → dtype: Dtype
    • Removed encoding_supports_f32 and vector_len free functions; replaced by methods on Encoding
    • Fixed streaming Parquet dispatcher case-sensitivity gap in encoding/mod.rs
    • PyO3 boundary (engine.rs, loader.rs, lib.rs, pytorch.rs): strings consumed at boundary, enum passed
      internally
    • qumat_qdp/loader.py: added _VALID_ENCODINGS frozenset, early validation in _validate_loader_args and
      .encoding() setter

Checklist

  • Added or updated unit tests for all changes
  • Added or updated documentation for all changes

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR centralizes encoding/dtype parsing and encoder dispatch in qdp-core by introducing canonical Encoding/Dtype domain types, and updates Rust + PyO3/Python boundaries to pass those typed values internally (reducing duplicated string handling and avoiding per-call heap allocation for encoder selection).

Changes:

  • Add qdp_core::Encoding + Dtype and replace string-based encoder selection with static encoder dispatch (incl. OnceLock for IQP variants).
  • Refactor pipeline configuration to use encoding: Encoding and dtype (vs encoding_method: String / float32_pipeline: bool) and update pipeline runner dispatch accordingly.
  • Update Python/Rust boundaries (PyO3 + pure Python loader) to validate/parse encodings at the boundary and add regression tests for case-insensitive parsing.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
qdp/qdp-core/src/types.rs Introduces Encoding + Dtype and static encoder dispatch; adds parsing/helpers
qdp/qdp-core/src/lib.rs Switches engine APIs from get_encoder(&str) to Encoding parsing + static dispatch
qdp/qdp-core/src/pipeline_runner.rs Pipeline config now uses typed encoding/dtype and avoids string parsing in hot paths
qdp/qdp-core/src/gpu/encodings/mod.rs Removes get_encoder factory and tightens QuantumEncoder to 'static
qdp/qdp-core/src/gpu/encodings/iqp.rs Adds OnceLock-backed shared IQP encoder instances for static dispatch
qdp/qdp-core/src/encoding/mod.rs Fixes streaming parquet dispatch to be case-insensitive via Encoding::from_str_ci
qdp/qdp-python/src/engine.rs Parses Encoding/Dtype at the boundary and passes typed values internally
qdp/qdp-python/src/pytorch.rs Uses Encoding for CUDA tensor validation logic
qdp/qdp-python/qumat_qdp/loader.py Adds early encoding validation against a canonical set
qdp/qdp-core/tests/types.rs Adds tests for encoding/dtype parsing, vector_len, and static dispatch stability

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread qdp/qdp-core/src/types.rs Outdated
Comment on lines +123 to +125
Self::Angle => n,
Self::Basis => 1,
Self::Amplitude | Self::Iqp | Self::IqpZ | Self::Phase => 1 << n,
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Encoding::vector_len is documented and used as the per-sample input feature dimension (it drives PipelineIterator file sample_size validation and synthetic batch generation), but the mapping for Iqp/IqpZ/Phase currently returns 1 << num_qubits. That contradicts the encoders’ own input validation (e.g., PhaseEncoder expects sample_size == num_qubits, and IqpEncoder expects num_qubits or num_qubits + num_qubits*(num_qubits-1)/2). This will make pipeline/file loading for these encodings reject valid inputs (or generate invalid synthetic batches). Update vector_len for these variants to match the encoder input lengths.

Suggested change
Self::Angle => n,
Self::Basis => 1,
Self::Amplitude | Self::Iqp | Self::IqpZ | Self::Phase => 1 << n,
Self::Amplitude => 1 << n,
Self::Angle | Self::Iqp | Self::IqpZ | Self::Phase => n,
Self::Basis => 1,

Copilot uses AI. Check for mistakes.
Comment thread qdp/qdp-python/src/engine.rs Outdated
Comment on lines +480 to +486
validate_cuda_tensor_for_encoding(data, self.engine.device().ordinal(), encoding_method)?;

let encoding = Encoding::from_str_ci(encoding_method)
.map_err(|e| PyRuntimeError::new_err(e.to_string()))?;
let dtype = data.getattr("dtype")?;
let dtype_str: String = dtype.str()?.extract()?;
let dtype_str_lower = dtype_str.to_ascii_lowercase();
let is_f32 = dtype_str_lower.contains("float32");
let method = encoding_method.to_ascii_lowercase();
let is_f32 = dtype_str.to_ascii_lowercase().contains("float32");
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate_cuda_tensor_for_encoding already parses encoding_method into an Encoding, but _encode_from_cuda_tensor parses the same string again immediately after validation. To avoid duplicated parsing (and the risk of future divergence), consider changing the validator to return the parsed Encoding (or accept Encoding as an argument) so the caller can reuse it.

Copilot uses AI. Check for mistakes.
Comment thread qdp/qdp-core/src/types.rs Outdated
Comment on lines +35 to +57
/// Dtype for pipeline configuration (alias of [`crate::gpu::memory::Precision`]).
pub type Dtype = crate::gpu::memory::Precision;

impl Dtype {
/// Parse dtype from a short user string (case-insensitive, trimmed).
pub fn from_str_ci(s: &str) -> Result<Self> {
let t = s.trim();
if t.eq_ignore_ascii_case("f32")
|| t.eq_ignore_ascii_case("float32")
|| t.eq_ignore_ascii_case("float")
{
Ok(Self::Float32)
} else if t.eq_ignore_ascii_case("f64")
|| t.eq_ignore_ascii_case("float64")
|| t.eq_ignore_ascii_case("double")
{
Ok(Self::Float64)
} else {
Err(MahoutError::InvalidInput(format!(
"Unknown dtype: {s}. Use 'f32' or 'f64'."
)))
}
}
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dtype is declared as a type alias to crate::gpu::memory::Precision, but this file adds an inherent impl Dtype { ... }. Rust does not allow inherent impls on type aliases, so this will fail to compile. Consider moving these methods onto Precision (e.g., impl Precision { ... }) and re-exporting it as Dtype, or changing Dtype into a newtype wrapper if you need a distinct type name.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants