Fix meeting audio capture and source diarization by sjug · Pull Request #341 · peteonrails/voxtype

sjug · 2026-05-01T16:31:04Z

Description

This PR fixes several meeting-mode issues around mic capture, final chunk handling, voice activity detection tuning, and source-based diarization.

The changes are split into four commits:

Honor [meeting.audio].mic_device
Flush buffered meeting audio on stop
Make meeting VAD threshold configurable
Preserve simple diarizer segment boundaries

Honor `[meeting.audio].mic_device`

mic_device already existed in config and docs, but meeting mode always passed the main [audio] config into DualCapture. That meant a meeting-specific mic device override was ignored.

This now builds the meeting capture config at the call site:

"default" and "" both mean “use the normal dictation audio device”
explicit values override the mic used for meeting capture only
an INFO log is emitted only when the override is active, showing both the meeting mic and dictation mic

This keeps the normal/default path unchanged.
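A minimal sketch of this fallback rule follows. It assumes `mic_device` deserializes to an `Option<String>` and that the dictation config carries a `device` field. `AudioConfig`, its `sample_rate` field, and `meeting_capture_config` are hypothetical, and `MeetingAudioConfig` here is a simplified stand-in rather than voxtype's actual type. The PR logs the override at INFO; this sketch uses `eprintln!` to stay dependency-free.

```rust
/// Hypothetical, simplified stand-in for the main `[audio]` config.
#[derive(Clone, Debug, PartialEq)]
pub struct AudioConfig {
    pub device: String,
    pub sample_rate: u32,
}

/// Simplified stand-in for `[meeting.audio]`; only `mic_device` is relevant here.
#[derive(Clone, Debug, Default)]
pub struct MeetingAudioConfig {
    pub mic_device: Option<String>,
}

/// Build the config handed to meeting capture (e.g. `DualCapture`).
/// Unset, `""`, and `"default"` reuse the dictation config unchanged;
/// any other value overrides the mic for meeting capture only.
pub fn meeting_capture_config(audio: &AudioConfig, meeting: &MeetingAudioConfig) -> AudioConfig {
    match meeting.mic_device.as_deref() {
        None | Some("") | Some("default") => audio.clone(),
        Some(device) => {
            // Only the override path logs, showing both devices.
            eprintln!("meeting mic override: meeting={device} dictation={}", audio.device);
            // Keep every other dictation setting; swap only the device.
            AudioConfig { device: device.to_string(), ..audio.clone() }
        }
    }
}
```

Because the fallback returns an unchanged clone of the dictation config, the default path behaves exactly as it did before the override existed.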

Flush buffered meeting audio on stop

DualCapture::stop() returns any samples that arrived since the last poll, but stop_meeting() previously discarded them. That could drop speech near the end of a meeting, especially for short recordings or when the user stops shortly after speaking.

This change:

keeps samples returned by capture.stop().await
appends them to the meeting buffers
flushes the final partial chunk before saving/stopping the meeting
extracts shared chunk processing into helpers so the periodic poll path and final flush path use the same logic
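The stop-path flow can be sketched as follows, assuming a single-source buffer of `f32` samples and a fixed chunk length. `MeetingBuffer`, `push`, `stop`, and `process_chunk` are hypothetical names. The real code awaits `capture.stop()` and manages both mic and loopback buffers, which this synchronous sketch leaves out.

```rust
/// Hypothetical single-source meeting buffer (names are illustrative).
pub struct MeetingBuffer {
    samples: Vec<f32>,
    chunk_len: usize,
    /// Chunks handed on for transcription, in order.
    pub emitted: Vec<Vec<f32>>,
}

impl MeetingBuffer {
    pub fn new(chunk_len: usize) -> Self {
        assert!(chunk_len > 0, "chunk_len must be non-zero");
        Self { samples: Vec::new(), chunk_len, emitted: Vec::new() }
    }

    /// Shared helper used by both the poll path and the final flush
    /// (stand-in for the VAD gate + transcription step).
    fn process_chunk(&mut self, chunk: Vec<f32>) {
        self.emitted.push(chunk);
    }

    /// Periodic poll path: append new samples, emit only complete chunks.
    pub fn push(&mut self, new: &[f32]) {
        self.samples.extend_from_slice(new);
        while self.samples.len() >= self.chunk_len {
            // Split off the remainder, keep the first full chunk.
            let rest = self.samples.split_off(self.chunk_len);
            let chunk = std::mem::replace(&mut self.samples, rest);
            self.process_chunk(chunk);
        }
    }

    /// Stop path: keep the samples returned on stop, then flush the partial tail.
    pub fn stop(&mut self, tail_from_capture: &[f32]) {
        self.push(tail_from_capture);
        if !self.samples.is_empty() {
            let chunk = std::mem::take(&mut self.samples);
            self.process_chunk(chunk);
        }
    }
}
```

Routing both paths through one helper means the final partial chunk is gated and transcribed the same way as every periodic chunk.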

Make meeting VAD threshold configurable

Meeting chunks use a simple RMS gate before transcription. The existing default remains 0.01 for backwards compatibility, but quiet microphones can now tune it via:

[meeting.audio]
vad_threshold = 0.001
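An RMS gate of the kind described can be sketched as below. Only the 0.01 default comes from the PR; the `rms`/`passes_vad` names, the `>=` comparison, and the assumption that samples are `f32` values in [-1.0, 1.0] are illustrative choices.

```rust
/// Default kept at 0.01 for backwards compatibility.
pub const DEFAULT_VAD_THRESHOLD: f32 = 0.01;

/// Root-mean-square level of a chunk; an empty chunk counts as silence.
pub fn rms(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = samples.iter().map(|s| s * s).sum();
    (sum_sq / samples.len() as f32).sqrt()
}

/// Energy gate: transcribe only chunks whose RMS reaches the threshold.
pub fn passes_vad(samples: &[f32], threshold: f32) -> bool {
    rms(samples) >= threshold
}
```

A steady signal at amplitude 0.005 has an RMS of about 0.005, so it is skipped at the default threshold but passes at 0.001.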

The threshold is wired through:

user config: MeetingAudioConfig
daemon meeting config construction
CLI meeting command config construction
internal MeetingConfig
ChunkConfig
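One way to carry an optional user value down to `ChunkConfig` while keeping 0.01 as the fallback is sketched below. The struct fields, the single `From` conversion, and `DEFAULT_VAD_THRESHOLD` are assumptions for illustration; the PR wires the value through the daemon and CLI construction paths separately.

```rust
const DEFAULT_VAD_THRESHOLD: f32 = 0.01;

/// Simplified user config: `None` when `vad_threshold` is absent from the file.
#[derive(Debug, Default, Clone)]
pub struct MeetingAudioConfig {
    pub vad_threshold: Option<f32>,
}

/// Simplified chunker config: always holds a concrete threshold.
#[derive(Debug, Clone, PartialEq)]
pub struct ChunkConfig {
    pub vad_threshold: f32,
}

/// Simplified internal meeting config.
#[derive(Debug, Clone, PartialEq)]
pub struct MeetingConfig {
    pub chunk: ChunkConfig,
}

impl From<&MeetingAudioConfig> for MeetingConfig {
    fn from(user: &MeetingAudioConfig) -> Self {
        MeetingConfig {
            chunk: ChunkConfig {
                // Fall back to the historical default when unset.
                vad_threshold: user.vad_threshold.unwrap_or(DEFAULT_VAD_THRESHOLD),
            },
        }
    }
}
```

Funneling both construction paths through one conversion is a design option that keeps the default defined in a single place.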

This also improves diagnostic logging for skipped meeting chunks by including source, duration, RMS, and threshold at debug level. Transcription logs now include the audio source (Microphone or Loopback) so mic/loopback behavior is easier to debug.

Preserve simple diarizer segment boundaries

The simple diarizer assigns speakers based on audio source:

microphone -> You
loopback -> Remote

It previously merged adjacent same-speaker transcript segments. However, the caller applies diarization results back to transcript segments positionally with zip(). If the diarizer returned fewer segments than ASR produced, the later transcript segments were silently left without a speaker_id.

This change removes merging from SimpleDiarizer, preserving a 1:1 mapping between transcript segments and diarized segments. As a result, every mic/loopback segment receives the expected speaker label.
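The effect of merging on positional `zip()` application can be reproduced in a small sketch. Apart from `speaker_id`, the shape of `TranscriptSegment` is assumed, and `diarize`, `diarize_merged`, and `apply_labels` are hypothetical; only the You/Remote mapping and the zip-based application come from the description above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AudioSource {
    Microphone,
    Loopback,
}

/// Hypothetical transcript segment; only `speaker_id` is named in the PR.
#[derive(Debug, Clone)]
pub struct TranscriptSegment {
    pub source: AudioSource,
    pub text: String,
    pub speaker_id: Option<String>,
}

fn speaker_for(source: AudioSource) -> &'static str {
    match source {
        AudioSource::Microphone => "You",
        AudioSource::Loopback => "Remote",
    }
}

/// Fixed behavior: exactly one label per transcript segment.
pub fn diarize(segments: &[TranscriptSegment]) -> Vec<String> {
    segments.iter().map(|s| speaker_for(s.source).to_string()).collect()
}

/// Old behavior, for comparison: adjacent same-speaker segments collapse.
pub fn diarize_merged(segments: &[TranscriptSegment]) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for s in segments {
        let label = speaker_for(s.source);
        if out.last().map(String::as_str) != Some(label) {
            out.push(label.to_string());
        }
    }
    out
}

/// Positional application via `zip`, which stops at the shorter input,
/// so any surplus transcript segments keep `speaker_id: None`.
pub fn apply_labels(segments: &mut [TranscriptSegment], labels: Vec<String>) {
    for (seg, label) in segments.iter_mut().zip(labels) {
        seg.speaker_id = Some(label);
    }
}
```

With segments (mic, mic, loopback), the merged version yields only two labels, so in this sketch the second mic segment is labeled Remote and the loopback segment gets no label; the 1:1 version labels all three correctly.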

Related Issue

N/A

Type of Change

Bug fix
New feature
Documentation update

Testing

I have tested these changes locally

Commands run:

cargo check
cargo test meeting::chunk::
cargo test meeting::diarization::simple::
cargo test config::tests::test_meeting
cargo test config::tests::test_parse_meeting_config_with_nested_sections
cargo test

Results:

cargo check passed.
Targeted meeting/config tests passed.
Full cargo test passed:
- 539 unit tests passed
- 25 integration tests passed
- doc tests passed

Additional non-mutating checks run:

cargo fmt --check
cargo clippy --all-targets --all-features

Results:

cargo fmt --check failed due to existing, unrelated formatting differences across the repo. This PR does not include those unrelated formatting changes.
cargo clippy --all-targets --all-features failed because --all-features enables Whisper ROCm/HIP and this machine does not have hipcc in PATH. It also reported two existing build.rs uninlined_format_args warnings before failing.

Manual validation:

Verified mic-only meeting recording produced transcript segments instead of zero segments.
Verified mixed mic + loopback meeting recording produced both You and Remote speakers.
Verified every exported transcript segment had a speaker label after the simple diarizer fix.

Documentation

I have updated documentation as needed

Updated docs:

docs/CONFIGURATION.md
docs/USER_MANUAL.md
docs/MEETING_MODE.md

Additional Notes

The default meeting VAD threshold remains 0.01 to preserve existing behavior. Users with quiet meeting microphones can lower [meeting.audio].vad_threshold, with 0.001 documented as a suggested starting point.

peteonrails · 2026-05-02T02:52:57Z

This is really good work that I may not get a chance to test before 0.7.0 drops, but I will try to get it in there.

sjug added 4 commits May 1, 2026 12:08

Honor meeting audio mic device

35ee0dc

Flush buffered meeting audio on stop

038a59a

Make meeting VAD threshold configurable

06d4e99

Preserve simple diarizer segment boundaries

f279d2b

sjug requested a review from peteonrails as a code owner May 1, 2026 16:31

peteonrails self-assigned this May 2, 2026

sjug wants to merge 4 commits into peteonrails:main from sjug:fix/meeting-audio-diarization
