feat(cluster): TP-default dispatch + TensorParallelEngine scaffold by anupsv · Pull Request #194 · Layr-Labs/d-inference

anupsv · 2026-05-21T05:29:20Z

Summary

Adds the parallelism dispatcher and tensor-parallel inference scaffold to provider-swift. TP becomes the default strategy on 2-Mac Thunderbolt 5 clusters; PP remains as the fallback for models without a *TP variant, hardware that doesn't support DistributedGroup (non-M5 / rdma_ctl disabled), or when heads don't shard evenly.

Why TP > PP for 2-Mac single-stream decode

	PP (existing)	TP (new)
Per-token rank-0 work	layers 0..N/2	layers 0..N at half-width
Per-token rank-1 work	layers N/2..N	layers 0..N at half-width
Per-Mac GPU utilization	~50% (ranks take turns)	~100% (both ranks always working)
Per-token cross-Mac transfers	1 activation tensor	2N small allreduces
Single-stream decode latency	~T_compute	~T_compute / 2 + 2N · T_allreduce ≈ ½ × PP on TB5

Stack

This PR is the third in a 3-PR stack. The submodule pointers in this PR pick up the heads of the matching PRs:

Layer	PR	Submodule SHA
`Cmlx` library product + jaccl backend	`Layr-Labs/mlx-swift#2`	(prerequisite, already on this branch's base)
Sharded linear primitives	`Layr-Labs/mlx-swift#3`	`libs/mlx-swift` → `a83e602`
`LlamaModelTP`	`Layr-Labs/mlx-swift-lm#25`	`libs/mlx-swift-lm` → `b06fa03`
This PR: dispatch + CLI flag	(here)	—

All three forks' PRs need to merge before this lands cleanly on master.

What's in this PR

New files

File	Purpose
`provider-swift/Sources/ProviderCore/P2P/Parallelism.swift`	`Parallelism` enum (`tp`/`pp`/`single`/`auto`) + `Parallelism.decide(DecisionInputs)` returning `(chosen, reason)`. Encapsulates the fallback table below.
`provider-swift/Sources/ProviderCore/P2P/TensorParallelInference.swift`	`TensorParallelEngine` (rank 0) + `TensorParallelServer` (rank 1) scaffolds, parallel to `EncryptedPipelineEngine` / `EncryptedPipelineServer`. The decode loop is intentionally TODO — see "Status" section below.
`provider-swift/Tests/ProviderCoreTests/ParallelismTests.swift`	14 tests covering every branch of the decision table. All pass.

Modified

File	Change
`provider-swift/Sources/darkbloom/StartCommand.swift`	New `--parallelism {auto, tp, pp, single}` flag (default `auto`). Logged at RDMA-enabled startup.
`libs/mlx-swift` submodule	Bumped to `a83e602` (PR #3 head).
`libs/mlx-swift-lm` submodule	Bumped to `b06fa03` (PR #25 head).

Auto-decide fallback table

`operatorChoice`	Condition	→ Chosen	Reason
any	`worldSize < 2`	`single`	"no cluster peer connected"
`single`	—	`single`	operator explicit
`pp`	—	`pp`	operator explicit
`tp`	`distributedGroupAvailable == false`	`single` (refuses to downgrade)	operator asked for TP; cap missing
`tp`	`modelHasTPVariant == false`	`single` (refuses to downgrade)	operator asked for TP; model has no TP variant
`tp`	heads don't divide	`single`	operator asked for TP; heads don't divide
`tp`	all OK	`tp`	operator explicit + achievable
`auto`	`distributedGroupAvailable == false`	`pp`	DistributedGroup unavailable
`auto`	`modelHasTPVariant == false`	`pp`	loaded model has no TP variant
`auto`	heads don't divide	`pp`	heads don't divide evenly
`auto`	all OK	`tp`	auto → tp: all capabilities OK

Explicit --parallelism tp failing closed (rather than silently downgrading to PP) is deliberate — if an operator asked for TP, a missing capability is a misconfiguration worth surfacing instead of masking with a worse-performance fallback.

Status: scaffold, not full runtime wiring

The cluster runtime in ClusterCommand.swift is currently a stub (see line 431: "Integrate EncryptedPipelineEngine with the coordinator request queue to route inference."). Both PP and TP engines exist as code paths but neither is currently driving live inference — the request queue → cluster session integration is the missing piece.

This PR delivers:

✅ The dispatcher (Parallelism.decide) — fully implemented, 14 tests passing
✅ The --parallelism CLI flag — wired to ProviderLoopConfig
✅ TensorParallelEngine / TensorParallelServer API shape — matches EncryptedPipelineEngine / EncryptedPipelineServer so the wiring follow-up is a mechanical swap
✅ Submodule bumps to the heads of PR Bump next from 15.3.9 to 15.5.14 in /web in the npm_and_yarn group across 1 directory #3 + PR bug: stt_server crashes with CohereLabs/cohere-transcribe-03-2026 — mlx_audio 0.4.2 tokenizer.model not found #25 — verifies the stack builds end-to-end

What's deferred to a follow-up:

❌ Wiring the engines into ClusterSession / ClusterPeer driven by the coordinator request queue
❌ Real jaccl DistributedGroup initialization (env vars, two-Mac handshake coordination, MR-count tuning)
❌ Two-Mac Thunderbolt 5 smoke test that runs the full TP loop end-to-end
❌ Quantized TP support in LlamaModelTP.sanitize (currently 4-bit weights are passed through unsliced — works on a single rank but doesn't shard memory)

Test plan

swift build succeeds via the d-inference workspace
swift test --filter "parallelism" passes all 14 dispatcher tests
swift test --filter "LlamaTPTests" passes all 8 model-level tests (via the mlx-swift-lm submodule)
No regression for existing single-rank or PP code paths — engines are pure additions
Two-Mac TB5 smoke test once the runtime wiring lands

Adds the Parallelism dispatcher that selects between tensor-parallel, pipeline-parallel, and single-rank inference based on operator choice and runtime capability. TP wins by default on 2-Mac Thunderbolt 5 clusters because both Macs run all transformer layers in parallel with per-layer allreduce, roughly halving single-stream decode latency vs PP (where one Mac waits while the other runs). New: - ProviderCore/P2P/Parallelism.swift: enum {tp, pp, single, auto} + decide() that takes DecisionInputs (operatorChoice, worldSize, modelHasTPVariant, attentionHeads, kvHeads, distributedGroupAvailable) and returns the chosen strategy with an operator-readable reason. - ProviderCore/P2P/TensorParallelInference.swift: TensorParallelEngine (rank 0) and TensorParallelServer (rank 1) scaffolds, parallel to EncryptedPipelineEngine/EncryptedPipelineServer. The decode loop is intentionally TODO — the cluster runtime is currently a stub at ClusterCommand.swift:431 ("Integrate ... with the coordinator request queue"). Wiring will land in a follow-up that touches both PP and TP engines together. - darkbloom CLI: --parallelism flag on `serve` (auto, tp, pp, single). Defaults to auto. Logged at startup as the operator's preference; the final strategy is resolved at session-handshake time when the peer's capabilities are known. Auto fallback (Parallelism.decide auto path): - worldSize < 2 → single (no peer) - DistributedGroup unavailable (non-M5, rdma_ctl disabled, macOS < 26.2) → pp (TB5 transfer is enough; jaccl isn't required for PP) - modelHasTPVariant == false (non-Llama until they get *TP variants) → pp via callPartial - heads % worldSize != 0 → pp - everything OK → tp Explicit --parallelism tp fails closed when capabilities are missing: falls back to single rather than silently downgrading to pp, so a misconfiguration is visible to the operator instead of masked. Submodule bumps: - libs/mlx-swift → a83e602 (Layr-Labs/mlx-swift#3, sharded linear primitives) - libs/mlx-swift-lm → b06fa03 (Layr-Labs/mlx-swift-lm#25, LlamaModelTP) Tests (Tests/ProviderCoreTests/ParallelismTests.swift): 14 tests covering the auto-decide table, single-rank short-circuit, explicit operator overrides, fail-closed semantics for --parallelism tp, and the canShard divisibility helper. All pass. Related: #193 (upstream mlx-swift distributed deviation tracker).

vercel · 2026-05-21T05:29:26Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
d-inference	Ready	Preview	May 21, 2026 7:19am
d-inference-console-ui-dev	Ready	Preview	May 21, 2026 7:19am
d-inference-landing	Ready	Preview	May 21, 2026 7:19am

github-actions · 2026-05-21T05:33:30Z

Benchmark Results

Runner: macos-15 (M1 Virtual) | Date: 2026-05-21 07:20 UTC

1-provider-streaming

1 providers, 1 users, 30 requests, concurrency=5, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	30
Success	4
Errors	26
Total Duration	3.792s
Throughput	1.1 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	4	1.714s	1.715s	1.715s	1.715s
parse	4	19µs	16µs	34µs	34µs
reserve	4	2ms	3ms	4ms	4ms
route	4	380µs	394µs	448µs	448µs
coordinator_to_provider	4	1.709s	1.709s	1.71s	1.71s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=19.25µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=34µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=2.4945ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=3.725ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

1-provider-non-streaming

1 providers, 1 users, 20 requests, concurrency=5, streaming=false

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	20
Success	4
Errors	16
Total Duration	2.262s
Throughput	1.8 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	4	2.259s	2.26s	2.262s	2.262s
parse	4	0s	1ms	1ms	1ms
reserve	4	7ms	7ms	9ms	9ms
route	4	765µs	812µs	893µs	893µs
coordinator_to_provider	4	1.682s	1.682s	1.684s	1.684s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=455.75µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=1.174ms (threshold=5ms)
reserve:mean<=50ms	PASS	mean=6.77475ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=9.297ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

7-provider-multi-model

7 providers, 5 users, 50 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	4	0.5 GB
mlx-community/gemma-3-270m-4bit	3	0.2 GB

Metric	Value
Total Requests	50
Success	50
Errors	0
Total Duration	10.987s
Throughput	4.6 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	50	610ms	4ms	3.678s	3.685s
parse	49	27µs	17µs	74µs	318µs
reserve	49	1ms	1ms	3ms	3ms
route	49	0s	0s	1ms	1ms
coordinator_to_provider	50	556ms	1ms	3.671s	3.679s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=27.122µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=74µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.490755ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.922ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

3-provider-high-concurrency

3 providers, 10 users, 60 requests, concurrency=20, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	60
Success	12
Errors	48
Total Duration	3.685s
Throughput	3.3 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	12	2.159s	2.155s	2.168s	2.168s
parse	12	14µs	12µs	27µs	27µs
reserve	12	2ms	2ms	3ms	3ms
route	12	516µs	492µs	772µs	772µs
coordinator_to_provider	12	2.151s	2.148s	2.161s	2.161s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=13.583µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=27µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=2.166583ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.969ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

1-provider-queue-saturation

1 providers, 10 users, 40 requests, concurrency=15, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	40
Success	5
Errors	35
Total Duration	3.141s
Throughput	1.6 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	5	2.322s	2.164s	2.952s	2.952s
parse	5	18µs	16µs	25µs	25µs
reserve	5	3ms	3ms	3ms	3ms
route	5	588ms	1ms	2.937s	2.937s
queue_wait	1	2.937s	2.937s	2.937s	2.937s
dispatch	1	30µs	30µs	30µs	30µs
coordinator_to_provider	5	1.728s	2.158s	2.158s	2.158s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=18µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=25µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=3.1876ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=3.324ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:mean<=5ms	PASS	mean=30µs (threshold=5ms)
dispatch:p95<=50ms	PASS	p95=30µs (threshold=50ms)

3-provider-20-users

3 providers, 20 users, 60 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	60
Success	60
Errors	0
Total Duration	5.777s
Throughput	10.4 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	60	399ms	5ms	2.388s	2.388s
parse	60	17µs	16µs	31µs	78µs
reserve	60	1ms	1ms	2ms	2ms
route	60	453µs	411µs	783µs	901µs
coordinator_to_provider	60	396ms	2ms	2.38s	2.382s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=17.05µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=31µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.27505ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.378ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

1-provider-scaling

1 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	1	0.5 GB

Metric	Value
Total Requests	30
Success	4
Errors	26
Total Duration	3.134s
Throughput	1.3 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	4	2.163s	2.163s	2.163s	2.163s
parse	4	18µs	18µs	29µs	29µs
reserve	4	2ms	2ms	3ms	3ms
route	4	515µs	486µs	803µs	803µs
coordinator_to_provider	4	2.158s	2.158s	2.158s	2.158s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=18µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=29µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=2.39ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=2.509ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

3-provider-scaling

3 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	30
Success	30
Errors	0
Total Duration	4.106s
Throughput	7.3 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	30	675ms	7ms	2.028s	2.028s
parse	30	20µs	16µs	45µs	69µs
reserve	30	2ms	1ms	3ms	4ms
route	30	482µs	491µs	768µs	823µs
coordinator_to_provider	30	670ms	5ms	2.021s	2.021s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=19.8µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=45µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.711433ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=3.189ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

5-provider-scaling

5 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	5	0.5 GB

Metric	Value
Total Requests	30
Success	30
Errors	0
Total Duration	3.96s
Throughput	7.6 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	30	747ms	4ms	2.251s	2.252s
parse	30	19µs	15µs	49µs	66µs
reserve	30	2ms	1ms	3ms	5ms
route	30	431µs	395µs	771µs	789µs
coordinator_to_provider	30	743ms	1ms	2.243s	2.245s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=19.033µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=49µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=1.7278ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=3.346ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:present	FAIL	no data for segment dispatch

3-provider-heavy-100conc-10kb

3 providers, 20 users, 100 requests, concurrency=100, streaming=true

Model	Providers	RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit	3	0.5 GB

Metric	Value
Total Requests	100
Success	13
Errors	87
Total Duration	3.389s
Throughput	3.8 req/s

Latency Decomposition

Segment	Count	Mean	P50	P95	Max
total_e2e	13	2.229s	2.147s	3.22s	3.22s
parse	13	294µs	141µs	980µs	980µs
reserve	13	7ms	7ms	9ms	9ms
route	13	260ms	18ms	3.167s	3.167s
queue_wait	1	3.167s	3.167s	3.167s	3.167s
dispatch	1	59µs	59µs	59µs	59µs
coordinator_to_provider	13	1.939s	2.099s	2.101s	2.101s

Assertion Report: FAIL

Assertion	Result	Detail
parse:mean<=1ms	PASS	mean=293.769µs (threshold=1ms)
parse:p95<=5ms	PASS	p95=980µs (threshold=5ms)
reserve:mean<=50ms	PASS	mean=7.441076ms (threshold=50ms)
reserve:p95<=200ms	PASS	p95=9.168ms (threshold=200ms)
encrypt:present	FAIL	no data for segment encrypt
dispatch:mean<=5ms	PASS	mean=59µs (threshold=5ms)
dispatch:p95<=50ms	PASS	p95=59µs (threshold=50ms)

Picks up the quantized TP variant + matrix-level sharding tests on Layr-Labs/mlx-swift-lm#25 (commit 56a5ca6). This makes makeLlamaTP() available for routing the dispatcher to the right TP variant when a 4-bit / 8-bit quantized checkpoint is loaded.

Picks up Layr-Labs/mlx-swift-lm#25 commit 735c28a — drops the misleading LoRA conformance on LlamaModelTP/Q (returns [] with a docstring pointing at LlamaModel for LoRA workflows) and renames the testLlamaTPRejectsNonDivisibleHeads test to match what it actually exercises.

TensorParallelEngine / TensorParallelServer held `model: LlamaModelTP` as a concrete type, which meant the dispatcher couldn't pass the quantized `LlamaModelTPQ` returned by `makeLlamaTP(args:quantization:group:)`. Without this fix, the entire quantized-TP path was dead code — type checking would reject the quantized model at construction time. Both fields now hold `any LLMModel`. The engine needs only the callAsFunction(_:cache:) and newCache(parameters:) operations from the LanguageModel protocol (which LLMModel refines), and both LlamaModelTP and LlamaModelTPQ conform. Callers (the dispatcher in ClusterDiscovery, once wired) are responsible for passing a TP-capable model — a non-TP `LlamaModel` would type-check but produce single-rank semantics. Document this explicitly in the property comment.

vercel Bot deployed to Preview – d-inference-console-ui-dev May 21, 2026 05:29 View deployment

vercel Bot deployed to Preview – d-inference May 21, 2026 05:29 View deployment

vercel Bot deployed to Preview – d-inference-landing May 21, 2026 06:13 View deployment

vercel Bot deployed to Preview – d-inference-console-ui-dev May 21, 2026 06:13 View deployment

vercel Bot deployed to Preview – d-inference May 21, 2026 06:13 View deployment

vercel Bot deployed to Preview – d-inference-landing May 21, 2026 07:14 View deployment

vercel Bot deployed to Preview – d-inference May 21, 2026 07:14 View deployment

vercel Bot deployed to Preview – d-inference-console-ui-dev May 21, 2026 07:14 View deployment

vercel Bot deployed to Preview – d-inference-landing May 21, 2026 07:18 View deployment

vercel Bot deployed to Preview – d-inference May 21, 2026 07:19 View deployment

vercel Bot deployed to Preview – d-inference-console-ui-dev May 21, 2026 07:19 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(cluster): TP-default dispatch + TensorParallelEngine scaffold#194

feat(cluster): TP-default dispatch + TensorParallelEngine scaffold#194
anupsv wants to merge 4 commits into
rdma-connection-testfrom
feat/tp-default-dispatch

anupsv commented May 21, 2026 •

edited by blacksmith-sh Bot

Loading

Uh oh!

vercel Bot commented May 21, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 21, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

anupsv commented May 21, 2026 • edited by blacksmith-sh Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why TP > PP for 2-Mac single-stream decode

Stack

What's in this PR

New files

Modified

Auto-decide fallback table

Status: scaffold, not full runtime wiring

Test plan

Related

Uh oh!

vercel Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark Results

1-provider-streaming

Latency Decomposition

Assertion Report: FAIL

1-provider-non-streaming

Latency Decomposition

Assertion Report: FAIL

7-provider-multi-model

Latency Decomposition

Assertion Report: FAIL

3-provider-high-concurrency

Latency Decomposition

Assertion Report: FAIL

1-provider-queue-saturation

Latency Decomposition

Assertion Report: FAIL

3-provider-20-users

Latency Decomposition

Assertion Report: FAIL

1-provider-scaling

Latency Decomposition

Assertion Report: FAIL

3-provider-scaling

Latency Decomposition

Assertion Report: FAIL

5-provider-scaling

Latency Decomposition

Assertion Report: FAIL

3-provider-heavy-100conc-10kb

Latency Decomposition

Assertion Report: FAIL

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

anupsv commented May 21, 2026 •

edited by blacksmith-sh Bot

Loading

vercel Bot commented May 21, 2026 •

edited

Loading

github-actions Bot commented May 21, 2026 •

edited

Loading