fix(transcription): suppress whisper non-speech tokens #1

Open
maubrowncow wants to merge 1 commit into main from maubrowncow/strip-bracketed-sounds

maubrowncow commented May 28, 2026

Summary

Tell the Whisper decoder not to emit bracketed sound labels like [cough], [Music], [applause], [♪♪♪], [inaudible] in the first place — by flipping suppress_blank and suppress_nst on whisper.cpp params. Add a backstop cleanup prompt rule for the WhisperKit path on Apple Silicon, which doesn't expose the same single flag.

Motivation

Whisper sometimes emits environmental sound labels into transcripts. The whisper.cpp library has built-in decoder flags for this: suppress_nst suppresses the specific token IDs used to denote non-speech events, without affecting regular spoken words like "music" or "cough", and suppress_blank keeps the decoder from emitting blank output at the start of sampling.
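As a rough illustration of token-level suppression (not whisper.cpp's actual code), the sketch below masks a set of token IDs before a greedy pick. The helper names, the toy vocabulary, and the logit values are all invented for this example. It assumes the mask is negative infinity, as in OpenAI's reference decoder; a finite penalty would make it a softer bias instead.

```rust
/// Hypothetical helper: mask the given token IDs so they cannot win the next pick.
fn suppress_tokens(logits: &mut [f32], suppressed: &[usize]) {
    for &id in suppressed {
        // Out-of-range IDs are ignored rather than treated as an error.
        if let Some(logit) = logits.get_mut(id) {
            *logit = f32::NEG_INFINITY;
        }
    }
}

/// Greedy pick: index of the largest logit, or None for an empty slice.
fn argmax(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(index, _)| index)
}

fn main() {
    // Toy vocabulary, invented for illustration:
    // 0 = " music", 1 = " [", 2 = " cough"
    let mut logits = vec![1.5_f32, 2.0, 0.3];

    // Only the bracket token is masked; the word tokens are untouched.
    suppress_tokens(&mut logits, &[1]);
    println!("next token id: {:?}", argmax(&logits));
}
```

Because only the bracket token's ID is masked, the spoken word " music" remains selectable, which is the property the regression item in the test plan checks.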

Changes

| File | Change |
| --- | --- |
| native-core/windows/src/model_whisper_cpp.rs | Call set_suppress_blank(true) and set_suppress_nst(true) on whisper-rs FullParams |
| native-core/macos/Sources/WhisperCppBridge/WhisperCppBridge.swift | Set params.suppress_blank = true and params.suppress_nst = true on whisper.cpp whisper_full_params |
| native-core/shared/prompts/cleanup-default-instructions.md | Add rule 4: drop bracketed non-speech annotations during cleanup (backstop for the WhisperKit path) |

Coverage

| Backend | Used on | Fix |
| --- | --- | --- |
| whisper-rs | Windows | Decoder flag ✅ |
| whisper.cpp bridge | macOS Intel / x86_64 | Decoder flag ✅ |
| WhisperKit | macOS Apple Silicon | Prompt-based cleanup (decoder flag not exposed as a single boolean) |

WhisperKit's DecodingOptions.supressTokens takes a [Int] of token IDs, and the library has a "// TODO: implement these as default" comment where the non-speech list would go. Reimplementing OpenAI's non_speech_tokens() against WhisperKit's tokenizer would take ~80 lines that reach into internal APIs and would be fragile across WhisperKit upgrades, so the prompt backstop is the better trade until WhisperKit ships the default.
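To show what building such a token-ID list involves, here is a minimal sketch. It is not WhisperKit code, and it is not part of this PR. The vocabulary map, the token IDs, the symbol list, and the "bare or space-prefixed" lookup rule are all assumptions made for illustration; a real implementation would have to go through the backend's own tokenizer.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical helper: collect the IDs of tokens that spell a non-speech
/// symbol, either bare ("[") or space-prefixed (" ["), if the vocabulary has them.
fn non_speech_token_ids(vocab: &HashMap<String, u32>, symbols: &[&str]) -> Vec<u32> {
    // BTreeSet removes duplicates and yields the IDs in ascending order.
    let mut ids = BTreeSet::new();
    for symbol in symbols {
        for candidate in [symbol.to_string(), format!(" {symbol}")] {
            if let Some(&id) = vocab.get(&candidate) {
                ids.insert(id);
            }
        }
    }
    ids.into_iter().collect()
}

fn main() {
    // Toy vocabulary with made-up IDs; a real one comes from the model's tokenizer.
    let mut vocab = HashMap::new();
    for (token, id) in [("[", 10_u32), (" [", 11), ("♪", 12), (" music", 13)] {
        vocab.insert(token.to_string(), id);
    }

    // "(" is absent from the toy vocabulary, so it contributes nothing.
    let ids = non_speech_token_ids(&vocab, &["[", "♪", "("]);
    println!("{ids:?}");
}
```

The result depends entirely on the tokenizer's vocabulary, which is why a hand-maintained list can drift when the backend or model is upgraded.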

Caveats

  • Apple Silicon users with cleanup disabled may still see [cough] / [Music]. Cleanup is enabled by default.
  • suppress_nst covers a fixed list of non-speech token IDs, not every way a label can be tokenized, so extremely rare leakage is still possible.

Test plan

  • Manual on Apple Silicon: dictate with cleanup enabled near a noise source, confirm [cough]/[Music] no longer appear.
  • Manual on Windows or Intel macOS: same, with cleanup disabled, confirm decoder-level suppression works.
  • Regression: dictate text legitimately containing the words "music" / "applause" / "cough" and confirm they still transcribe normally.
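The manual checks above could be backed by a small deterministic check on transcript strings. The sketch below is a hypothetical test helper, not the cleanup prompt rule and not code from this PR: it lists any square-bracketed spans in a transcript so a test can assert that none remain while plain words such as "music" pass through.

```rust
/// Hypothetical test helper: return every "[...]" span found in a transcript.
fn bracketed_annotations(transcript: &str) -> Vec<&str> {
    let mut found = Vec::new();
    let mut rest = transcript;
    while let Some(open) = rest.find('[') {
        let after_open = &rest[open..];
        match after_open.find(']') {
            Some(close) => {
                // Include both brackets in the reported span.
                found.push(&after_open[..=close]);
                rest = &after_open[close + 1..];
            }
            // An unmatched '[' is not reported as an annotation.
            None => break,
        }
    }
    found
}

fn main() {
    let noisy = "play some music [Music] please [cough]";
    let clean = "the music and applause were loud";
    println!("{:?}", bracketed_annotations(noisy));
    println!("{:?}", bracketed_annotations(clean));
}
```

A check like this would also flag brackets a user dictated on purpose, so it fits test fixtures better than production filtering.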

Privacy / Security

No new network calls, no telemetry. Changes are local decoder configuration plus a prompt string.

🤖 Generated with Claude Code

maubrowncow force-pushed the maubrowncow/strip-bracketed-sounds branch from 7ff9d70 to 743c243 on May 28, 2026 23:55
maubrowncow changed the title from "fix(transcription): strip whisper non-speech annotations" to "fix(prompts): drop bracketed non-speech annotations during cleanup" on May 28, 2026
Set suppress_blank and suppress_nst on whisper.cpp decoder params on
both the Windows whisper-rs path and the macOS whisper.cpp bridge so
the decoder never emits bracketed sound labels like [cough], [Music],
or [applause] in the first place. WhisperKit on Apple Silicon doesn't
expose this as a single flag (it takes a token-ID list with a TODO
default), so add a backstop rule to the default cleanup prompt to
drop any annotations that still leak through there.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
maubrowncow force-pushed the maubrowncow/strip-bracketed-sounds branch from 743c243 to af34f6a on May 29, 2026 00:01
maubrowncow changed the title from "fix(prompts): drop bracketed non-speech annotations during cleanup" to "fix(transcription): suppress whisper non-speech tokens" on May 29, 2026