Skip to content

Support both oci-dir and oci-archive formats for OMMX Local Registry storage#639

Draft
Copilot wants to merge 41 commits intomainfrom
copilot/fix-fb20201e-0bd6-464a-9e1b-49c3a488a3b9
Draft

Support both oci-dir and oci-archive formats for OMMX Local Registry storage#639
Copilot wants to merge 41 commits intomainfrom
copilot/fix-fb20201e-0bd6-464a-9e1b-49c3a488a3b9

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Oct 2, 2025

Overview

This PR implements a complete refactoring of the OMMX Artifact API, replacing the parametric type-based implementation with a unified enum-based design that provides transparent dual-format support for OMMX Local Registry.

Problem

The original implementation had two critical issues:

1. Cloud Storage Incompatibility (Issue #638)

Previously, OMMX Local Registry only supported the oci-dir format, which stores artifacts as directory structures. This created challenges for cloud object storage:

  • Multiple API calls required to upload/download directory structures
  • Complex synchronization and consistency issues
  • Higher costs due to numerous small file operations
  • Difficulty with atomic operations and backups

2. API Design Issues with Dual Format Support

After adding dual-format support in early commits, the parametric type API (Artifact<T: Image>) became problematic:

  • Type parameter T didn't reflect actual storage format behavior
  • load()/pull() always saved as oci-dir despite type suggesting oci-archive
  • Format detection logic conflicted with type-based design
  • Inconsistent behavior between type annotations and runtime operations

Solution

This PR implements a two-phase solution:

Phase 1: Dual Format Support (Early Commits)

Added oci-archive format alongside existing oci-dir format with unified path API.

Phase 2: Artifact API Refactoring (Later Commits)

Completely replaced Artifact<T: Image> with a new enum-based API:

pub enum Artifact {
    Archive(OciArtifact<OciArchive>),
    Dir(OciArtifact<OciDir>),
    Remote(Box<OciArtifact<Remote>>),
}

Key Advantages:

  • Format is runtime property, not type parameter
  • Transparent format detection and conversion
  • Consistent behavior across all operations
  • Methods can change variant (e.g., pull() transitions Remote → Archive)

Architecture & Implementation

Storage Format Support

OCI Archive Format (New Default)

  • Single .ommx file (tar archive)
  • Optimized for cloud storage
  • Atomic operations with integrity checks

OCI Directory Format (Legacy Support)

  • Directory-based OCI layout
  • Full backward compatibility
  • Continues to work unchanged

Unified API Design

Rust API:

use ommx::artifact::{Artifact, Builder};

// Load (automatic format detection)
let artifact = Artifact::load(&image_name)?;
let artifact = Artifact::from_oci_archive(path)?;
let artifact = Artifact::from_oci_dir(path)?;
let mut artifact = Artifact::from_remote(image_name)?;

// Save (format control)
artifact.save()?;  // Default: oci-archive to local registry
artifact.save_as_archive(path)?;
artifact.save_as_dir(path)?;

// Pull (auto-saves as oci-archive)
artifact.pull()?;

// Build
let mut builder = Builder::new_archive(path, image_name)?;
let mut builder = Builder::for_github("org", "repo", "name", "tag")?;
builder.add_instance(instance, annotations)?;
let artifact = builder.build()?;

Python API (Unchanged - Full Backward Compatibility):

from ommx.artifact import Artifact, ArtifactBuilder

# Load (transparent format handling)
artifact = Artifact.load("image-name:tag")

# Build (defaults to oci-archive)
builder = ArtifactBuilder.new("image-name:tag")
builder.add_instance(instance)
artifact = builder.build()

Path Management

Single format-agnostic path function:

pub fn get_local_registry_path(image_name: &ImageName) -> PathBuf

Returns base path. Format determined by:

  • {base_path}.ommx exists → oci-archive
  • {base_path}/ exists → oci-dir

Path Escaping: Tags with : escaped to __

  • Example: ghcr.io/jij-inc/ommx/qplib:3734
    • Base: <ROOT>/ghcr.io/jij-inc/ommx/qplib/__3734
    • Archive: <ROOT>/ghcr.io/jij-inc/ommx/qplib/__3734.ommx

Priority: When both formats exist, oci-archive takes precedence.

Key Changes

Rust Implementation

  • Removed: Old Artifact<T: Image> parametric type (~225 lines)
  • Removed: Old Builder<T: ImageBuilder> parametric type
  • Added: New Artifact enum with dynamic format handling
  • Added: New Builder enum with format-specific variants
  • Added: Comprehensive I/O methods in artifact/io.rs
  • Added: Layer access methods in artifact/layers.rs
  • Migrated: All usage sites (CLI, dataset loaders, examples)

Python Bindings

  • Removed: ArtifactArchive, ArtifactDir wrapper classes
  • Removed: ArtifactArchiveBuilder, ArtifactDirBuilder
  • Simplified: Single Artifact class wrapping enum variant
  • Maintained: Full API compatibility (no breaking changes)

Module Structure

  • Promoted experimental::artifactommx::artifact
  • Removed experimental module entirely
  • Integrated all artifact code into unified module

Documentation

  • Updated ARTIFACT.md with storage format specifications
  • Updated CLAUDE.md with API usage examples
  • Removed TODO.md (all tasks completed)

Migration Impact

For Rust Users

Breaking: Old Artifact<T> API removed. Update to new enum API:

// Before
let artifact = Artifact::<OciArchive>::from_oci_archive(path)?;

// After
let artifact = Artifact::from_oci_archive(path)?;

For Python Users

Non-Breaking: No code changes needed. Internal implementation updated transparently.

For Existing Artifacts

Non-Breaking: All existing oci-dir artifacts continue to work. No migration required.

Testing

Rust:

  • 309 tests passing ✅
  • Comprehensive format conversion tests
  • All CLI operations tested
  • Dataset loader integration tests

Python:

  • 93 unit tests + 1 doctest + 37 benchmarks ✅
  • All adapter tests passing ✅
  • Dual format interoperability tests
  • Backward compatibility verified

Performance & Quality

  • Clippy clean: Zero warnings
  • Memory optimized: Remote variant boxed to reduce enum size (280 → 48 bytes)
  • No dead code: Removed duplicate implementations
  • Type safe: Compile-time format guarantees removed in favor of runtime flexibility

Cloud Storage Benefits

The oci-archive format provides:

  • Single file operations (simpler S3/GCS transfers)
  • Atomic transactions (no partial states)
  • Cost optimization (fewer API calls)
  • Better performance (efficient caching)
  • Standard OCI distribution compatibility

Commits Summary

  1. Initial dual format support (commits 1-12)
  2. Experimental artifact API implementation (commits 13-15)
  3. Migration to experimental API (commits 16-18)
  4. Promotion to stable API (commits 19-20)
  5. Final cleanup (commits 21-22)

Fixes #638

Copilot AI and others added 4 commits October 2, 2025 11:47
Co-authored-by: termoshtt <1238153+termoshtt@users.noreply.github.com>
Co-authored-by: termoshtt <1238153+termoshtt@users.noreply.github.com>
Co-authored-by: termoshtt <1238153+termoshtt@users.noreply.github.com>
Co-authored-by: termoshtt <1238153+termoshtt@users.noreply.github.com>
Copilot AI changed the title [WIP] Support both oci-dir and oci-archive formats for OMMX Local Registry storage Support both oci-dir and oci-archive formats for OMMX Local Registry storage Oct 2, 2025
Copilot AI requested a review from termoshtt October 2, 2025 12:06
termoshtt and others added 19 commits October 3, 2025 15:03
- Fix syntax error in ArtifactDirBuilder impl block
- Add parent directory creation in new_for_local_registry
- Export get_artifact_path function to Python module
- Add Path wrapper for get_artifact_path return type
- Fix test fixture scope and remove unused variables
- Regenerate Python type stubs

All tests now pass successfully.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add detailed documentation on:
- Directory structure for both oci-dir and oci-archive formats
- Path escaping rules (: converted to __)
- Concrete example with ghcr.io/jij-inc/ommx/qplib:3734
- Format priority when both formats exist (oci-archive takes precedence)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add conftest.py with session-scoped setup_local_registry fixture
- Move set_local_registry_root to conftest.py (OnceLock constraint)
- Simplify test_artifact_dual_format.py from 5 tests to 3 tests
- Extract test_instance creation to pytest fixture
- Consolidate redundant tests into test_dual_format_interoperability
- Fix type checking for optional Path returns

Tests reduced from 5 to 3 while maintaining full coverage:
1. test_new_artifacts_default_to_archive_format
2. test_legacy_oci_dir_format_still_works
3. test_dual_format_interoperability

All 93 tests pass successfully.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
….new()

- Add get_local_registry_archive_path() function to generate oci-archive paths
- Remove new_for_local_registry() from ArtifactArchiveBuilder (Rust & Python)
- Move directory creation logic from Rust to Python (ArtifactBuilder.new())
- ArtifactArchiveBuilder and ArtifactDirBuilder restored to pre-PR state
- All format selection logic now consolidated in ArtifactBuilder.new()

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Replace get_local_registry_archive_path and get_artifact_path with single get_local_registry_path
- get_local_registry_path returns base path without format-specific extensions
- Caller determines format: {path}.ommx for oci-archive, {path}/ for oci-dir
- Move format detection responsibility to Artifact.load() for clearer separation of concerns
- Update tests to use new API with explicit format checking

This design makes path generation format-agnostic and centralizes format
detection logic in the appropriate place (Artifact.load).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Removed two tests that provide no meaningful value:
- test_get_artifact_path_none_when_not_exists: Tests non-existent function
- test_gather_artifacts_empty_dir: Self-evident behavior, covered by other tests

Remaining 4 tests provide comprehensive coverage:
- Extension-based format detection
- OCI-dir format detection
- OCI-archive format detection
- Mixed format handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Mark get_image_dir() as deprecated since 2.1.0
- Update ommx CLI tool to detect and handle both oci-dir and oci-archive formats
- Fix ImageNameOrPath::parse() to check for both formats with oci-archive priority
- Update inspect, push, and save commands to work with both formats
- Add backward compatibility support for ImageDirectory command

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add new 'local-registry-path' command that uses get_local_registry_path()
- Deprecate 'image-directory' command with runtime warning (log::warn)
- Keep image-directory for backward compatibility using get_image_dir()

The new command returns the base path for dual format support, while the
deprecated command maintains the old behavior and shows a warning when used.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Update Artifact<OciArchive>::load() and Artifact<Remote>::pull() to use
get_local_registry_path() instead of deprecated get_image_dir().

This eliminates all deprecation warnings while maintaining backward
compatibility for oci-dir format in these methods.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Document the plan to refactor Artifact API for proper dual format support:
- Replace static Artifact<T> with dynamic Artifact type
- Remove Python's ArtifactArchive/ArtifactDir from public API
- Unify format handling with ArtifactInner enum
- Default to oci-archive format for new artifacts

This is a breaking change but necessary for clean dual format support.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Include information about:
- Original problem from Issue #638 (cloud storage challenges)
- PR #639's solution (dual format support)
- What has been implemented so far
- Remaining problems that need to be addressed

This provides complete context for the planned API refactoring.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Introduce new experimental Artifact API in `ommx::experimental::artifact`
as part of the Artifact API refactoring plan (see TODO.md).

The new API replaces the parametric `Artifact<T: Image>` with a simpler
enum-based design:

```rust
pub enum Artifact {
    Archive(OciArtifact<OciArchive>),
    Dir(OciArtifact<OciDir>),
    Remote(OciArtifact<Remote>),
}
```

Phase 1 Implementation:
- Create `ommx::experimental` module structure
- Implement new Artifact enum with dynamic format handling
- Add basic methods:
  - `from_oci_archive()`, `from_oci_dir()`, `from_remote()`
  - `image_name()`, `get_manifest()`, `get_blob()`, `get_layer()`
  - `get_config()`, `get_solution()`, `get_instance()`
  - `get_solutions()`, `get_instances()`

The experimental API coexists with the current `ommx::artifact` API,
allowing gradual migration and testing before full replacement.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
termoshtt and others added 17 commits October 3, 2025 21:05
Implement all remaining methods and features for the experimental
Artifact enum, completing TODO.md Phase 1-3:

Phase 1 completions:
- Add annotations() method to get manifest annotations
- Add layers() method to get layer descriptors
- Add Descriptor import

Phase 2-3 (already implemented):
- Builder enum with Archive/Dir variants
- I/O operations: load, save, save_as_archive, save_as_dir
- Remote operations: pull, push with basic auth support
- Format conversion and fallback logic

Tests:
- Add test_annotations() and test_layers()
- Add comprehensive tests for save/load operations
- All 12 tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add support for setting manifest annotations in the experimental
Builder API:

- Add add_annotation() method to Builder enum
- Update test helpers to use experimental Builder correctly
- Add test_builder_add_annotation test case

All 13 tests passing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Update TODO.md to reflect completion of Phase 1-3:

Phase 1 (完了):
- Artifact enum with Archive/Dir/Remote variants
- All basic methods including annotations() and layers()
- Additional layer access methods

Phase 2 (完了):
- All loading methods: from_oci_archive, from_oci_dir, from_remote, load
- Automatic format detection and fallback logic

Phase 3 (完了):
- All saving methods: save, save_as_archive, save_as_dir, pull, push
- Builder enum with add_annotation support
- 13 test cases covering all features

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Change Phase 4+ strategy from breaking changes to gradual migration:

Key changes:
- Keep ommx::artifact::Artifact<T> intact
- Gradually migrate usage to ommx::experimental::artifact
- Maintain full backward compatibility
- Minimize breaking changes

New Phase 4-7 plan:
- Phase 4: Migrate internal Rust code (CLI, tests)
- Phase 5: Migrate Python bindings (internal only)
- Phase 6: Documentation and integration tests
- Phase 7: Promote to stable (future)

Benefits:
- Lower risk of regressions
- Easier rollback if issues found
- Time to validate experimental API stability
- No immediate breaking changes for users

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Migrate Python Artifact class to use experimental::artifact API while
maintaining full backward compatibility:

Rust PyO3 bindings:
- Add PyArtifact class wrapping experimental::Artifact
- Implement all methods: load, from_oci_archive, from_oci_dir, save, push, pull
- Export PyArtifact to Python as _PyArtifact

Python API changes:
- Update Artifact class to use _PyArtifact internally
- Replace _base with _rust for internal artifact
- Maintain all existing methods and properties
- No breaking changes to public API

Testing:
- All 93 unit tests pass
- All 1 doctest passes
- All 37 benchmark tests pass

This completes Phase 5 (Python API migration to experimental).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Changed PyArtifactBuilder to use Option<ExperimentalBuilder> instead of Mutex for proper PyO3 ownership handling
- Updated add_layer() to accept &Bound<PyBytes> for stubgen compatibility
- Added ArtifactBuilder.new_dir() method to Python API for oci-dir format support
- Regenerated Python type stubs with stubgen
- Updated test_artifact_dual_format.py to use new API
- Removed unused imports (dataclass, ABC, abstractmethod)
- All 93 unit tests + 1 doctest + 37 benchmark tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Removed ArtifactArchive and ArtifactDir from artifact.rs
- Deleted builder.rs containing ArtifactArchiveBuilder and ArtifactDirBuilder
- Updated lib.rs to only register experimental Artifact API classes
- Cleaned up unused imports (Image, OciArchive, OciDir, Artifact, derive_more)
- Regenerated Python type stubs

All tests passing (93 unit + 1 doctest + 37 benchmark + all adapters)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Python側のAPI整理が完了:
- PyArtifactとPyArtifactBuilderの実装
- 古いArtifact関連クラスの削除
- 全テスト通過
- stubgen完了

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Changed import from ommx::artifact::Artifact to ommx::experimental::artifact::Artifact
- Updated get_manifest() calls to use &mut self
- Changed save() call in Load command (experimental API uses save() for local registry)
- Changed save() to save_as_archive() in Save command
- Added comments to clarify pull() and save() behavior
- All 313 Rust tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
CLIツールのexperimental::artifactへの移行が完了:
- ommx::experimental::artifact::Artifactを使用
- 全Rustテスト通過(313テスト)
- Phase 4 & 5の完了基準を全て達成

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit completes the migration from the parametric Artifact<T> implementation
to the enum-based experimental::artifact API.

Changes:
- Added Builder::for_github() to experimental::artifact for dataset packaging
- Migrated all usage of ommx::artifact::{Artifact, Builder} to experimental versions:
  * rust/dataset/src/qplib.rs
  * rust/dataset/src/miplib2017.rs
  * rust/ommx/examples/pull_artifact.rs
  * rust/ommx/src/dataset/{qplib,miplib2017}.rs (load functions)
- Removed old Artifact<T> implementation and Builder from ommx::artifact
- Updated get_images() to use experimental::artifact::Artifact
- Deleted rust/ommx/src/artifact/builder.rs

The old parametric type implementation had issues with Local Registry behavior
after adding dual-format (oci-dir/oci-archive) support. The experimental
enum-based API properly handles format detection and switching.

Tests: All 313 Rust tests + all Python tests passing
This commit completes the migration by promoting the experimental artifact API
to the stable ommx::artifact module.

Changes:
- Moved experimental/artifact/* files to artifact/ directory
  * builder.rs, io.rs, layers.rs, tests.rs
- Integrated Artifact enum definition into artifact.rs
- Updated artifact.rs with new module declarations
- Updated all imports from experimental::artifact to artifact:
  * Rust code (dataset, examples, bin)
  * Python bindings (artifact.rs)
- Removed experimental module entirely
- Commented out experimental module in lib.rs

The new unified Artifact enum API properly handles dual-format (oci-dir/oci-archive)
support and replaces the old parametric Artifact<T> implementation.

Tests: All 309 Rust tests + all Python tests passing
- ARTIFACT.md: Clarified format migration section to focus on external specification
- CLAUDE.md: Added Artifact API usage examples for both Rust and Python
- Removed TODO.md (completed tasks, details now in documentation)

The unified Artifact enum API provides transparent dual-format support
(oci-dir/oci-archive) with automatic detection and conversion.
- Removed duplicate auth_from_env() function (already exists in artifact/io.rs)
- Removed unused import 'bail'
- Box Remote variant to reduce enum size (280 bytes -> ~48 bytes)
- Updated image_name() to use as_mut() for boxed Remote variant

All tests passing (309 Rust tests)
- Add `add_source()` and `add_description()` convenience methods to Builder
- Fix layer filtering in pull_artifact.rs to use correct media type comparison
- All 309 Rust tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support both oci-dir and oci-archive formats for OMMX Local Registry storage

2 participants