#minor: Implement Modal sandbox provider #66
Open
tthuwng wants to merge 1 commit into
Fill in the Modal adapter behind the provider-selection interface on current main.

- Add Modal SDK dependency and provider config for env-owned or request/header-provided credentials.
- Implement create/get/delete/list, exec/streaming command, and file transfer operations.
- Reject snapshot sources explicitly, document Docker/disk limitations, and keep deployment-owned Modal credentials as the default path.
- Add provider-level retry coverage for transient Modal connection errors and not-found mapping for removed sandboxes.
- Add focused Modal provider tests plus README usage/compatibility notes.
Summary
Implements the Modal adapter behind the provider-selection interface that is now on `main` after #65. This PR is rebased/retargeted from the old `jf/provider-selection-clean` stack to current `main`.

Refs vals-ai/valkyrie-ticket-library#21.
What changed
- Adds `modal>=1.4.2` and fills in `ModalProviderConfig` / `ModalSandboxProvider`.
- Implements `create_sandbox`, `get_sandbox`, `delete_sandbox`, `list_sandboxes`, `exec`, streaming `command`, `upload_file`, and `download_file`.
- `ImageSource` -> `Image.from_registry(...)`
- `auto_stop_interval` -> Modal `idle_timeout`
- `timeout`, matching Daytona's exit-code behavior
- Uses `sandbox.filesystem` APIs for upload/download instead of deprecated `sandbox.open` / `sandbox.mkdir` file APIs.
- Rejects `SnapshotSource` before connecting to Modal because Daytona snapshots do not have a Modal equivalent.
- `Resources.enable_docker` (default `false`). When true, the Modal provider passes `experimental_options={"enable_docker": True}`. Daytona remains generic/unchanged; benchmarks still own their Docker-capable image, dockerd startup, compose flow, and cleanup.
- `README.md`.

Compatibility with Valkyrie provider-selection work
Jarett's Valkyrie provider-selection PR resolves provider config from AWS Secrets Manager as a JSON object and calls `sandbox_provider_config_from_mapping({**secret, "type": provider_type})`.

This PR supports that path for Modal by accepting Secrets Manager-style keys:

{ "MODAL_TOKEN_ID": "...", "MODAL_TOKEN_SECRET": "...", "MODAL_ENVIRONMENT": "..." }

The lowercase request-body form remains supported too: `token_id`, `token_secret`, and `environment`. `environment` is optional and should only be set when that Modal environment exists.

For benchmarks that need nested Docker, the benchmark service can return:

Valkyrie's provider-selection flow passes `task_data.resources` into `SandboxCreateRequest`, so this stays provider-uniform instead of adding a VCB-specific provider type or Modal-only config blob.

Simplification pass
After review, this PR removed non-essential Modal-specific surface area:
- `sandbox_provider` config / Secrets Manager mapping.
- `type: "vcb"` special case and no CBS-owned dockerd startup.

Decisions documented
Modal credentials ownership path
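As a rough sketch of the resolution order this section describes (explicit request-body or Secrets Manager values first, deployment-owned environment variables otherwise): the helper `resolve_modal_credentials` and the `ModalCredentials` dataclass below are hypothetical names chosen for illustration and are not taken from the PR, whose `ModalProviderConfig` may be structured differently.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical alias table: lowercase request-body key, then Secrets Manager-style key.
_KEY_ALIASES = {
    "token_id": ("token_id", "MODAL_TOKEN_ID"),
    "token_secret": ("token_secret", "MODAL_TOKEN_SECRET"),
    "environment": ("environment", "MODAL_ENVIRONMENT"),
}


@dataclass(frozen=True)
class ModalCredentials:
    """Illustrative container; not the PR's actual config class."""

    token_id: Optional[str] = None
    token_secret: Optional[str] = None
    environment: Optional[str] = None


def resolve_modal_credentials(
    mapping: Mapping[str, object],
    env: Optional[Mapping[str, str]] = None,
) -> ModalCredentials:
    """Prefer explicit mapping values, then fall back to the deployment environment."""
    env = os.environ if env is None else env
    resolved = {}
    for name, (lower, upper) in _KEY_ALIASES.items():
        # Assumed precedence: request body, Secrets Manager key, then process env.
        value = mapping.get(lower) or mapping.get(upper) or env.get(upper)
        resolved[name] = str(value) if value else None
    return ModalCredentials(**resolved)
```

With an empty provider mapping such as `{"type": "modal"}`, this sketch simply returns whatever the deployment environment provides, which mirrors the default path described in this section.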
Hosted benchmark services should normally let the service deployment own `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET`. In practice that means the registry/deployment environment owns Modal credentials rather than every Valkyrie request carrying them.

The adapter still supports optional request-body or Secrets-Manager-sourced credentials for development, self-hosted use, or explicit overrides:

- `{"type": "modal", "token_id": "...", "token_secret": "...", "environment": "..."}`
- `MODAL_TOKEN_ID`, `MODAL_TOKEN_SECRET`, optional `MODAL_ENVIRONMENT`

Docker-in-sandbox support
Docker-in-sandbox is not enabled by default.
Benchmarks that require dockerd / nested Docker, such as VCB or ProgramBench-style flows, opt in through the generic `Resources.enable_docker` field. The provider only grants the underlying sandbox capability. The benchmark service is still responsible for using a Docker-capable image, starting dockerd with provider-compatible flags, running compose, and cleaning up containers/volumes.

This keeps the CBS contract uniform across providers and avoids benchmark-specific provider types.
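The opt-in can be sketched roughly as follows. The `Resources` dataclass here is a stand-in for the PR's generic model (only the flag is shown), and `modal_create_kwargs` is a hypothetical helper name; the only detail taken from the description is that `experimental_options={"enable_docker": True}` is passed when, and only when, the flag is true.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Resources:
    # Stand-in for the generic Resources model; other fields are omitted.
    enable_docker: bool = False


def modal_create_kwargs(resources: Resources) -> Dict[str, Any]:
    """Build extra keyword arguments for Modal sandbox creation (hypothetical helper)."""
    kwargs: Dict[str, Any] = {}
    if resources.enable_docker:
        # The provider only grants the capability; the benchmark still owns the
        # Docker-capable image, dockerd startup, compose flow, and cleanup.
        kwargs["experimental_options"] = {"enable_docker": True}
    return kwargs
```

Keeping the flag on the generic `Resources` model means a provider that has no equivalent capability can ignore it without any benchmark-specific branching.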
Retry semantics parity with Daytona
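A minimal sketch of the retry policy this section describes, assuming "3 attempts" means three calls in total with a fixed 2s pause between them. `retry_transient` and the local `SandboxConnectionError` class are illustrative stand-ins, not the PR's implementation, which may use a retry library instead.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class SandboxConnectionError(Exception):
    """Stand-in for the provider's transient connection error type."""


def retry_transient(
    operation: Callable[[], T],
    attempts: int = 3,
    wait_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying only SandboxConnectionError with a fixed wait."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except SandboxConnectionError:
            if attempt == attempts:
                raise  # Out of attempts: surface the transient error to the caller.
            sleep(wait_seconds)
    raise RuntimeError("attempts must be at least 1")
```

Any other exception type, including one representing a nonzero command exit, propagates on the first call because only `SandboxConnectionError` is caught. The injectable `sleep` parameter is there so tests can avoid real waits.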
Transient Modal connection errors now map to `SandboxConnectionError` and are retried for up to 3 attempts with a fixed 2s wait on provider operations and process/file startup paths. Non-transient Modal errors are not retried, and nonzero command exits still surface as `SandboxCommandError` with the original exit code.

Validation
Local validation on this branch after the latest generic Docker opt-in pass:
- `uv run ruff check .`: passed
- `uv run pytest tests/test_modal_sandbox.py tests/test_client.py -q`: 52 passed
- `uv run pytest -q`: 233 passed, 1 Starlette/httpx deprecation warning
- `uv run basedpyright .`: 0 errors, 0 warnings

Coverage added/verified:
- `Resources.enable_docker` parsing from retrieved task metadata.
- `experimental_options={"enable_docker": True}` only when `resources.enable_docker` is true.
- `sandbox_provider: {"type": "modal"}` and routes through the provider-selection contract.

Live Modal smoke status:
- `valsai` without printing token values.
- `BenchmarkServiceApp` websockets: create Modal sandbox, call `/ws/setup-task` with `sandbox_provider: {"type": "modal"}`, upload/exec inside the benchmark setup hook, call `/ws/evaluate-instance`, download/exec inside the benchmark evaluation hook, and delete the sandbox.
- `Resources(enable_docker=True)`: create `docker:27-dind` sandbox, start `dockerd --iptables=false --ip-masq=false`, run `docker run --rm hello-world`, and delete the sandbox.
- `breathing_exercise_app`: load the real VCB task spec, run `VibeCodeBenchSetup`, materialize a generated app, start it through VCBAppManager with `docker compose up -d --build`, verify `http://localhost:18080` returns 200 with the task marker, stop compose, and delete the Modal sandbox. This validates VCB setup + Docker runtime on Modal; it intentionally does not claim full Supabase/browser-use LLM grading coverage.

GitHub CI on this branch: