10 changes: 7 additions & 3 deletions .gitignore
@@ -20,8 +20,12 @@ Thumbs.db

tmp/

# Local snapshots written by `npm run push` for `npm run rollback` recovery.
# Operator-local; not shared.
.vapi-state.*.snapshots/

# Local agent state
.claude/

# Local-only audit notes (not part of the upstream repo)
requested improvements.md
.agent/
.agent/handoffs/
.claude/handoffs/
3 changes: 3 additions & 0 deletions AGENTS.md
@@ -31,6 +31,7 @@ This project manages **Vapi voice agent configurations** as code. All resources
| Building outbound calling agents | `docs/learnings/outbound-agents.md` |
| Voicemail detection / VM vs human classification | `docs/learnings/voicemail-detection.md` |
| Enforcing call time limits / graceful call ending | `docs/learnings/call-duration.md` |
| Voice provider field cheat-sheet (Cartesia vs 11labs vs OpenAI etc.) | `docs/learnings/voice-providers.md` |

---

@@ -50,6 +51,7 @@ This project manages **Vapi voice agent configurations** as code. All resources
| Pull latest from Vapi | `npm run pull -- <org>`, `--force`, or `--bootstrap` |
| Pull one known remote resource | `npm run pull -- <org> --type assistants --id <uuid>` |
| Push only one file | `npm run push -- <org> resources/<org>/assistants/my-agent.md` |
| Push multiple specific files | `npm run push -- <org> <path1> <path2>` (one state-file rewrite at the end) |
| Test a call | `npm run call -- <org> -a <assistant-name>` |

---
@@ -744,6 +746,7 @@ npm run pull -- <org> --type squads --id <uuid> # Pull one known remote resou
npm run push -- <org> # Push all local changes to Vapi
npm run push -- <org> assistants # Push only assistants
npm run push -- <org> resources/<org>/assistants/my-agent.md # Push single file
npm run push -- <org> <path1> <path2> # Push multiple specific files (one state write)
npm run apply -- <org> # Pull then push (full sync)

# Testing
20 changes: 20 additions & 0 deletions CLAUDE.md
@@ -26,6 +26,26 @@ When both files exist, follow both. If guidance overlaps, treat `AGENTS.md` as t
- WebSocket transport → `docs/learnings/websocket.md`
- Call time limits / graceful ending → `docs/learnings/call-duration.md`

## Improvements log

This repo maintains an upstream-only running log at `improvements.md` (repo
root). It tracks engine friction, footguns, and improvement ideas surfaced
during real customer work — both before and after fixes land.

**When you (Claude or human) hit something that makes you go "this should be
better," append or update an entry in `improvements.md` in the same change.**
The format is **Problem → Current behavior → Risk → Current mitigation →
Possible fix → Status**, ordered by severity / blast radius. Cite source
file paths with line numbers so future readers can verify your claims.

When a fix lands, mark the entry `[RESOLVED YYYY-MM-DD] (#<PR-number>)` at
the top — don't delete it. The history is the point.

Customer-fork logs (`gitops-mudflap/improvements.md`,
`gitops-amazon3p/improvements.md`) feed upstream: when an entry there is
generic enough to apply across customers, surface it here in the same
revision.

## Test-Call CLI Notes

When debugging a customer issue with `npm run call -- <org> -s <squad>`:
4 changes: 2 additions & 2 deletions docs/learnings/README.md
@@ -26,7 +26,7 @@ Each file targets a specific topic so you can load only the context you need.
| Bulk-dialing from a CSV (Outbound Call Campaigns) | [outbound-campaigns.md](outbound-campaigns.md) |
| Voicemail detection / VM vs human classification | [voicemail-detection.md](voicemail-detection.md) |
| Enforcing call time limits / graceful call ending | [call-duration.md](call-duration.md) |
| Authoring YAML resource files (scalar coercion, frontmatter conventions) | [yaml-conventions.md](yaml-conventions.md) |
| Voice provider field cheat-sheet (Cartesia vs 11labs vs others) | [voice-providers.md](voice-providers.md) |

---

@@ -44,7 +44,7 @@ Gotchas and silent defaults for each resource type:
| [structured-outputs.md](structured-outputs.md) | Schema type gotchas, assistant_ids, default models, target modes, KPI patterns |
| [simulations.md](simulations.md) | Personalities, evaluation comparators, chat-mode gotcha, missing references, full `/eval/simulation/*` API reference |
| [webhooks.md](webhooks.md) | Default server messages, timeouts, unreachable servers, credential resolution, payload shape |
| [yaml-conventions.md](yaml-conventions.md) | YAML 1.1 boolean coercion (`off`/`yes`/`no`), whitespace-truthy gotchas, discriminated-union sentinels, deprecated-field footguns, multi-line block scalars, anchors/aliases, frontmatter fence rules |
| [voice-providers.md](voice-providers.md) | Per-provider voice block layout (Cartesia vs 11labs vs OpenAI/Azure/Rime/LMNT/Minimax/Neuphonic/SmallestAI) — saves 400s at push time |

### Troubleshooting Runbooks

97 changes: 97 additions & 0 deletions docs/learnings/voice-providers.md
@@ -0,0 +1,97 @@
# Voice Providers — Field Cheat-Sheet

The `voice` block on an assistant, or `membersOverrides.voice` on a squad, is **provider-specific**. The same conceptual field (e.g. "speed") lives at a different path depending on the provider. The Vapi platform rejects misplaced fields with a generic `property X should not exist` 400 and does not point to the correct path. This page is the lookup table.

> **When a 400 says "property X should not exist":** check this page for the provider's field layout before re-pushing. The engine has no schema awareness and will accept whatever you write, then surface the error only after the push reaches the API.

---

## Quick lookup

| Field | 11labs | Cartesia (sonic-3) | OpenAI / Azure / Rime / LMNT / Minimax / Neuphonic / SmallestAI |
|-------|--------|---------------------|------------------------------------------------------------------|
| Speech rate | `voice.speed` (0.7–1.2) | `voice.generationConfig.speed` (0.6–1.5) | `voice.speed` |
| Stability / consistency | `voice.stability` (0.0–1.0) | — (not exposed) | — |
| Voice similarity | `voice.similarityBoost` (0.0–1.0) | — | — |
| SSML parsing | `voice.enableSsmlParsing: true` | (parsed natively, no flag) | varies — see provider docs |
| Pronunciation dictionary | `voice.pronunciationDictionaryLocators[]` (array of `{pronunciationDictionaryId, versionId}`) | `voice.pronunciationDictId` (single string id; not in Vapi docs but accepted as a Cartesia passthrough) | — |
| Volume control | — | `voice.generationConfig.volume` (0.5–2.0) | — |
| Emotion / accent (experimental) | — | `voice.experimentalControls.emotion`, `voice.experimentalControls.speed` (-1 to 1, older API) | — |
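
The "Speech rate" row can be encoded as a small pre-push check. The sketch below is illustrative only: `SpeedRule`, `speedRuleFor`, and `checkSpeed` are hypothetical names that do not exist in the engine, and the provider identifiers are copied from the templates on this page rather than verified against the API.

```typescript
// Sketch only: encodes the "Speech rate" row of the lookup table.
// All names here are hypothetical; nothing below is part of the engine.
interface SpeedRule {
  path: string; // where the speed field lives in the resource
  min?: number; // inclusive bounds, when the table lists them
  max?: number;
}

function speedRuleFor(provider: string): SpeedRule | undefined {
  switch (provider) {
    case "11labs":
      return { path: "voice.speed", min: 0.7, max: 1.2 };
    case "cartesia":
      return { path: "voice.generationConfig.speed", min: 0.6, max: 1.5 };
    // Identifiers as written in this page's template; verify against the API.
    case "openai":
    case "azure":
    case "rime":
    case "lmnt":
    case "minimax":
    case "neuphonic":
    case "smallestai":
      return { path: "voice.speed" }; // the table lists no range for these
    default:
      return undefined; // provider not covered by this page
  }
}

// Returns a human-readable problem, or null when the value is within range.
function checkSpeed(provider: string, speed: number): string | null {
  const rule = speedRuleFor(provider);
  if (rule === undefined) return `no speed rule recorded for provider "${provider}"`;
  if (rule.min !== undefined && speed < rule.min) return `${rule.path} must be >= ${rule.min}`;
  if (rule.max !== undefined && speed > rule.max) return `${rule.path} must be <= ${rule.max}`;
  return null;
}

console.log(checkSpeed("cartesia", 1.1)); // null
console.log(checkSpeed("11labs", 1.4)); // voice.speed must be <= 1.2
```

A `switch` keeps the table and the code visually parallel, so adding a provider row and adding a `case` stay a one-line change each.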

---

## 11labs

```yaml
voice:
  provider: 11labs
  voiceId: <uuid-or-name>
  model: eleven_turbo_v2             # or eleven_flash_v2_5
  speed: 1.05                        # 0.7–1.2
  stability: 0.6                     # 0.0–1.0; higher = less expressive variation
  similarityBoost: 0.75              # 0.0–1.0; higher = closer to source voice
  enableSsmlParsing: true            # required for `<break>`, `<flush/>`, etc.
  pronunciationDictionaryLocators:   # ElevenLabs PLS dictionaries; multiple allowed
    - pronunciationDictionaryId: rjshI10OgN6KxqtJBqO4
      versionId: xJl0ImZzi3cYp61T0UQG
```

Common pitfalls:
- `voice.generationConfig.*` — **does not exist** for 11labs. That's a Cartesia path. Push will 400.
- Forgetting `enableSsmlParsing: true` — SSML tags will be spoken literally.
- `voice.pronunciationDictId` (single string) — that's the Cartesia shape. 11labs uses `voice.pronunciationDictionaryLocators[]` (array of `{pronunciationDictionaryId, versionId}`). Reference: <https://docs.vapi.ai/assistants/pronunciation-dictionaries>.

**Pronunciation dictionary warning (11labs):** dashboard edits that change the voice can drop `pronunciationDictionaryLocators` entries silently — the same drift class as Cartesia, just with the array shape. Treat the locators array as part of the voice's identity during edits.

---

## Cartesia (sonic-3)

```yaml
voice:
  provider: cartesia
  model: sonic-3
  voiceId: <uuid>
  pronunciationDictId: pdict_<id>    # optional but sticky — see warning below
  generationConfig:
    speed: 1.1                       # 0.6–1.5
    volume: 1.0                      # 0.5–2.0
  experimentalControls:
    speed: 0.0                       # -1 to 1 (older API path)
    emotion: ["positivity:high"]
```

**Forbidden at top level for Cartesia (will 400):**
- `voice.speed` — use `voice.generationConfig.speed` instead.
- `voice.enableSsmlParsing` — Cartesia parses SSML (`<break time='0.4s'/>`, `<speed ratio='0.9'/>`) natively from the text stream; no opt-in flag exists.
- `voice.stability`, `voice.similarityBoost` — those are 11labs fields.
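
The forbidden-key lists on this page can be checked locally before a push reaches the API. This is a minimal sketch, assuming the voice block has already been parsed into a plain object; `VoiceBlock`, `forbiddenTopLevelKeys`, and `findMisplacedFields` are hypothetical names, and only the keys this page explicitly says will 400 are included.

```typescript
// Sketch only: hypothetical helpers, not part of the engine.
type VoiceBlock = { provider: string; [key: string]: unknown };

// Top-level `voice` keys that this page says the API rejects, per provider.
function forbiddenTopLevelKeys(provider: string): string[] {
  switch (provider) {
    case "cartesia":
      return ["speed", "enableSsmlParsing", "stability", "similarityBoost"];
    case "11labs":
      return ["generationConfig"]; // Cartesia-only path
    default:
      return []; // nothing recorded on this page
  }
}

// Lists every forbidden key that is present, as a dotted path for error output.
function findMisplacedFields(voice: VoiceBlock): string[] {
  return forbiddenTopLevelKeys(voice.provider)
    .filter((key) => Object.prototype.hasOwnProperty.call(voice, key))
    .map((key) => `voice.${key}`);
}

const problems = findMisplacedFields({
  provider: "cartesia",
  speed: 1.1, // misplaced: belongs under generationConfig
  stability: 0.6, // misplaced: 11labs field
  generationConfig: { volume: 1.0 },
});
console.log(problems); // [ 'voice.speed', 'voice.stability' ]
```

An empty result only means none of the keys recorded here are present; the API may still reject fields this page does not cover.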

**Pronunciation dictionary warning (Cartesia):** changing the `voiceId` in the Vapi dashboard's voice picker silently drops `pronunciationDictId` from the resource. If you swap the Cartesia voice via the dashboard, re-attach the dictionary after the next pull or it will be gone. Treat `(voiceId, pronunciationDictId)` as one atomic unit during edits.

Note: `voice.pronunciationDictId` for Cartesia is observed in real customer payloads but is not in the Vapi docs, which only document the 11labs `pronunciationDictionaryLocators[]` shape (see the 11labs section above). Vapi appears to pass the field through to Cartesia's native API; behavior may change without notice.
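
One way to catch the silent drop is to compare the voice block before and after a pull. The sketch below assumes both versions are available as parsed objects; `CartesiaVoice` and `dictionaryDrift` are hypothetical names, and the ids in the example are made up.

```typescript
// Sketch only: hypothetical drift check, not part of the engine.
interface CartesiaVoice {
  provider: "cartesia";
  voiceId: string;
  pronunciationDictId?: string;
}

// Returns a warning when the pulled voice no longer carries the dictionary
// that the local copy had, or null when there is nothing to report.
function dictionaryDrift(local: CartesiaVoice, pulled: CartesiaVoice): string | null {
  if (local.pronunciationDictId === undefined) return null; // nothing to lose
  if (pulled.pronunciationDictId === local.pronunciationDictId) return null;
  const cause = local.voiceId !== pulled.voiceId ? "voiceId changed" : "voiceId unchanged";
  const now = pulled.pronunciationDictId ?? "(none)";
  return `pronunciationDictId was ${local.pronunciationDictId}, now ${now} (${cause})`;
}

// Example ids are placeholders.
const before: CartesiaVoice = { provider: "cartesia", voiceId: "voice-a", pronunciationDictId: "pdict_example" };
const after: CartesiaVoice = { provider: "cartesia", voiceId: "voice-b" };
console.log(dictionaryDrift(before, after));
// pronunciationDictId was pdict_example, now (none) (voiceId changed)
```

Reporting whether `voiceId` changed alongside the lost id makes it easier to tell a dashboard voice swap from a deliberate dictionary removal.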

---

## OpenAI / Azure / Rime / LMNT / Minimax / Neuphonic / SmallestAI

```yaml
voice:
  provider: openai                   # or azure, rime, lmnt, minimax, neuphonic, smallestai
  voiceId: <provider-voice-id>
  model: <provider-model>            # e.g. tts-1-hd for openai
  speed: 1.0                         # top-level for these providers
```

These providers expose `speed` at the top of the `voice` block. Refer to the [Vapi voice provider docs](https://docs.vapi.ai/providers/voice) for additional provider-specific fields (instructions, language hints, etc.).

---

## Switching providers

When migrating an assistant or squad member from Cartesia to 11labs (or vice versa), the field layout flips. If you carry over `generationConfig` from a Cartesia config to an 11labs voice, the next push will 400. Always rewrite the voice block from the target provider's template; do not patch in place.
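
"Rewrite from the target template" can be sketched as a function that builds a fresh 11labs block and copies across only values it explicitly maps. Everything here is illustrative: `CartesiaSource`, `ElevenLabsVoice`, and `toElevenLabs` are hypothetical names, and carrying the speed over (clamped to the 11labs range) assumes the two providers' speed scales are comparable, which this page does not state.

```typescript
// Sketch only: hypothetical migration helper, not part of the engine.
interface CartesiaSource {
  provider: "cartesia";
  voiceId: string;
  model?: string;
  pronunciationDictId?: string;
  generationConfig?: { speed?: number; volume?: number };
}

interface ElevenLabsVoice {
  provider: "11labs";
  voiceId: string;
  model: string;
  speed?: number;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Builds the target block from scratch, so Cartesia-only keys such as
// generationConfig or pronunciationDictId can never leak into the result.
function toElevenLabs(
  source: CartesiaSource,
  target: { voiceId: string; model: string }
): ElevenLabsVoice {
  const result: ElevenLabsVoice = {
    provider: "11labs",
    voiceId: target.voiceId,
    model: target.model,
  };
  const speed = source.generationConfig?.speed;
  // ASSUMPTION: speed scales are comparable; clamp to the 11labs range 0.7–1.2.
  if (speed !== undefined) result.speed = clamp(speed, 0.7, 1.2);
  return result;
}

const migrated = toElevenLabs(
  { provider: "cartesia", voiceId: "cartesia-voice", generationConfig: { speed: 1.4, volume: 1.0 } },
  { voiceId: "eleven-voice", model: "eleven_turbo_v2" }
);
console.log(JSON.stringify(migrated));
// {"provider":"11labs","voiceId":"eleven-voice","model":"eleven_turbo_v2","speed":1.2}
```

Constructing a new object, rather than deleting keys from the old one, mirrors the advice above: anything not named in the target template is dropped by default.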

If a customer changes the provider on the dashboard and your local YAML still has the old nesting, `pull` will overwrite it cleanly — but a subsequent `push` from a stale branch will 400. Pull first, then edit.

---

## Adding a new provider

If you find yourself reaching for a provider not in the table above, append a row here in the same PR. The cheat-sheet only stays useful if it grows with the platform.