Fix meeting audio capture and source diarization #341

Open
sjug wants to merge 4 commits into peteonrails:main from sjug:fix/meeting-audio-diarization

@sjug sjug commented May 1, 2026

Description

This PR fixes several meeting-mode issues around mic capture, final chunk handling, voice activity detection tuning, and source-based diarization.

The changes are split into four commits:

  1. Honor [meeting.audio].mic_device
  2. Flush buffered meeting audio on stop
  3. Make meeting VAD threshold configurable
  4. Preserve simple diarizer segment boundaries

Honor [meeting.audio].mic_device

mic_device already existed in config and docs, but meeting mode always passed the main [audio] config into DualCapture. That meant a meeting-specific mic device override was ignored.

This now builds the meeting capture config at the call site:

  • "default" and "" both mean “use the normal dictation audio device”
  • explicit values override the mic used for meeting capture only
  • an INFO log is emitted only when the override is active, showing both the meeting mic and dictation mic

This keeps the normal/default path unchanged.
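
The override rule above can be sketched as a small pure function. This is only an illustration under assumptions: `resolve_meeting_mic` and its return shape are hypothetical names, not the code in this PR, and the real call site builds a full capture config rather than a tuple.

```rust
/// Hypothetical helper (not the PR's actual code): choose the mic device
/// used for meeting capture.
///
/// `meeting_mic` stands for the value of [meeting.audio].mic_device and
/// `dictation_mic` for the device configured in the main [audio] section.
/// Returns the device to use and whether the override is active, so the
/// caller can decide whether to emit the INFO log.
fn resolve_meeting_mic(meeting_mic: &str, dictation_mic: &str) -> (String, bool) {
    if meeting_mic.is_empty() || meeting_mic == "default" {
        // "default" and "" both fall back to the normal dictation device.
        (dictation_mic.to_string(), false)
    } else {
        // An explicit value overrides the mic for meeting capture only.
        (meeting_mic.to_string(), true)
    }
}
```

Returning the "override active" flag alongside the device keeps the logging decision at the call site, which matches the described behavior of logging only when the override applies.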

Flush buffered meeting audio on stop

DualCapture::stop() returns any samples that arrived since the last poll, but stop_meeting() previously discarded them. That could drop speech near the end of a meeting, especially for short recordings or when the user stops shortly after speaking.

This change:

  • keeps samples returned by capture.stop().await
  • appends them to the meeting buffers
  • flushes the final partial chunk before saving/stopping the meeting
  • extracts shared chunk processing into helpers so the periodic poll path and final flush path use the same logic
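
A minimal sketch of the flush-on-stop idea, assuming a single in-memory sample buffer with a fixed chunk length. `MeetingBuffer`, `push_and_drain_full`, and `finish` are hypothetical names for illustration; the actual helpers in this PR are async and work on separate mic and loopback buffers.

```rust
/// Hypothetical stand-in for one meeting buffer (not the project's type).
struct MeetingBuffer {
    samples: Vec<f32>,
    chunk_len: usize,
}

impl MeetingBuffer {
    fn new(chunk_len: usize) -> Self {
        assert!(chunk_len > 0, "chunk_len must be non-zero");
        Self { samples: Vec::new(), chunk_len }
    }

    /// Shared path: append incoming samples and return every full chunk.
    /// Used by both the periodic poll and the final flush.
    fn push_and_drain_full(&mut self, incoming: &[f32]) -> Vec<Vec<f32>> {
        self.samples.extend_from_slice(incoming);
        let mut chunks = Vec::new();
        while self.samples.len() >= self.chunk_len {
            // Keep the remainder in the buffer and hand out the head.
            let rest = self.samples.split_off(self.chunk_len);
            chunks.push(std::mem::replace(&mut self.samples, rest));
        }
        chunks
    }

    /// Final path: keep the tail returned by stop instead of discarding it,
    /// then emit whatever partial chunk is left.
    fn finish(&mut self, tail: &[f32]) -> Vec<Vec<f32>> {
        let mut chunks = self.push_and_drain_full(tail);
        if !self.samples.is_empty() {
            chunks.push(std::mem::take(&mut self.samples));
        }
        chunks
    }
}
```

Because `finish` reuses `push_and_drain_full`, the last partial chunk goes through the same processing as every polled chunk, which is the point of extracting the shared helpers.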

Make meeting VAD threshold configurable

Meeting chunks pass through a simple RMS gate before transcription. The existing default remains 0.01 for backwards compatibility, but users with quiet microphones can now lower it via:

[meeting.audio]
vad_threshold = 0.001

The threshold is wired through:

  • user config: MeetingAudioConfig
  • daemon meeting config construction
  • CLI meeting command config construction
  • internal MeetingConfig
  • ChunkConfig

This also improves diagnostic logging for skipped meeting chunks by including source, duration, RMS, and threshold at debug level. Transcription logs now include the audio source (Microphone or Loopback) so mic/loopback behavior is easier to debug.
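
For readers unfamiliar with an RMS gate, here is a rough sketch of the technique. The function names are hypothetical, and whether the real gate compares with `>=` or `>` is an assumption made here for illustration.

```rust
/// Root-mean-square level of a chunk of samples in the range [-1.0, 1.0].
/// An empty chunk is treated as silence.
fn rms(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = samples.iter().map(|s| s * s).sum();
    (sum_sq / samples.len() as f32).sqrt()
}

/// Hypothetical gate: a chunk is sent to transcription only when its RMS
/// reaches the configured threshold ([meeting.audio].vad_threshold).
/// The inclusive comparison is an assumption, not confirmed by the PR.
fn passes_vad(samples: &[f32], threshold: f32) -> bool {
    rms(samples) >= threshold
}
```

With this gate, a constant signal at amplitude 0.005 is skipped at the default threshold of 0.01 but passes at 0.001, which is the quiet-microphone case the new setting targets.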

Preserve simple diarizer segment boundaries

The simple diarizer assigns speakers based on audio source:

  • microphone -> You
  • loopback -> Remote

It previously merged adjacent same-speaker transcript segments. However, the caller applies diarization results back to transcript segments positionally with zip(). If the diarizer returned fewer segments than ASR produced, later transcript segments silently missed speaker_id.

This change removes merging from SimpleDiarizer, preserving a 1:1 mapping between transcript segments and diarized segments. As a result, every mic/loopback segment receives the expected speaker label.
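
The failure mode can be reproduced in a few lines. This sketch uses hypothetical types (`Source`, `Segment`) and helper names rather than the project's own; it only demonstrates why `zip()` needs equal-length inputs.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Source {
    Microphone,
    Loopback,
}

/// Hypothetical transcript segment; the real type carries text and timing.
#[derive(Debug)]
struct Segment {
    source: Source,
    speaker_id: Option<&'static str>,
}

fn speaker_for(source: Source) -> &'static str {
    match source {
        Source::Microphone => "You",
        Source::Loopback => "Remote",
    }
}

/// Old behavior (sketch): adjacent same-speaker labels are merged,
/// so the result can be shorter than the input.
fn merged_labels(segments: &[Segment]) -> Vec<&'static str> {
    let mut out: Vec<&'static str> = Vec::new();
    for seg in segments {
        let label = speaker_for(seg.source);
        if out.last() != Some(&label) {
            out.push(label);
        }
    }
    out
}

/// New behavior (sketch): exactly one label per transcript segment.
fn one_to_one_labels(segments: &[Segment]) -> Vec<&'static str> {
    segments.iter().map(|seg| speaker_for(seg.source)).collect()
}

/// Caller side: labels are applied positionally. `zip` stops at the
/// shorter input, so a short label list leaves trailing segments unset.
fn apply_positionally(segments: &mut [Segment], labels: &[&'static str]) {
    for (seg, label) in segments.iter_mut().zip(labels.iter()) {
        seg.speaker_id = Some(*label);
    }
}
```

For three segments with sources mic, mic, loopback, the merged list has only two labels, so the third segment keeps `speaker_id == None`; with one label per segment, every segment is labeled.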

Related Issue

N/A

Type of Change

  • Bug fix
  • New feature
  • Documentation update

Testing

  • I have tested these changes locally

Commands run:

cargo check
cargo test meeting::chunk::
cargo test meeting::diarization::simple::
cargo test config::tests::test_meeting
cargo test config::tests::test_parse_meeting_config_with_nested_sections
cargo test

Results:

  • cargo check passed.
  • Targeted meeting/config tests passed.
  • Full cargo test passed:
    • 539 unit tests passed
    • 25 integration tests passed
    • doc tests passed

Additional non-mutating checks run:

cargo fmt --check
cargo clippy --all-targets --all-features

Results:

  • cargo fmt --check failed due to existing, unrelated formatting differences across the repo. This PR does not include those formatting changes.
  • cargo clippy --all-targets --all-features failed because --all-features enables Whisper ROCm/HIP and this machine does not have hipcc in PATH. It also reported two existing build.rs uninlined_format_args warnings before failing.

Manual validation:

  • Verified mic-only meeting recording produced transcript segments instead of zero segments.
  • Verified mixed mic + loopback meeting recording produced both You and Remote speakers.
  • Verified every exported transcript segment had a speaker label after the simple diarizer fix.

Documentation

  • I have updated documentation as needed

Updated docs:

  • docs/CONFIGURATION.md
  • docs/USER_MANUAL.md
  • docs/MEETING_MODE.md

Additional Notes

The default meeting VAD threshold remains 0.01 to preserve existing behavior. Users with quiet meeting microphones can lower [meeting.audio].vad_threshold, with 0.001 documented as a suggested starting point.

@sjug sjug requested a review from peteonrails as a code owner May 1, 2026 16:31
@peteonrails (Owner)

This is really good work that I may not get a chance to test before 0.7.0 drops, but I will try to get it in there.

@peteonrails peteonrails self-assigned this May 2, 2026