Fix meeting audio capture and source diarization #341
Open
sjug wants to merge 4 commits into peteonrails:main
Owner
This is really good work that I may not get a chance to test before 0.7.0 drops, but I will try to get it in there.
Description
This PR fixes several meeting-mode issues around mic capture, final chunk handling, voice activity detection tuning, and source-based diarization.
The changes are split into four commits:
Honor [meeting.audio].mic_device
mic_device already existed in config and docs, but meeting mode always passed the main [audio] config into DualCapture. That meant a meeting-specific mic device override was ignored.
This now builds the meeting capture config at the call site:
- "default" and "" both mean “use the normal dictation audio device”
This keeps the normal/default path unchanged.
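
The override rule above can be sketched roughly as follows. This is an illustration only, not the code in this PR: AudioConfig, its fields, and meeting_capture_config are hypothetical stand-ins, and the real config types in the repository may look different.

```rust
/// Hypothetical stand-in for the main `[audio]` config (not the real type).
#[derive(Clone, Debug, PartialEq)]
struct AudioConfig {
    device: String,
    sample_rate: u32,
}

/// Build the capture config for meeting mode at the call site.
/// Assumption: "" and "default" both fall back to the normal dictation
/// device, and any other value overrides only the device name.
fn meeting_capture_config(main: &AudioConfig, mic_device: &str) -> AudioConfig {
    match mic_device {
        // No meeting-specific override: reuse the main config unchanged.
        "" | "default" => main.clone(),
        // Meeting-specific override: swap the device, keep everything else.
        other => AudioConfig {
            device: other.to_string(),
            ..main.clone()
        },
    }
}
```

Resolving the override in one small function keeps the default path identical to the previous behavior, which is the property the description calls out.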
Flush buffered meeting audio on stop
DualCapture::stop() returns any samples that arrived since the last poll, but stop_meeting() previously discarded them. That could drop speech near the end of a meeting, especially for short recordings or when the user stops shortly after speaking.
This change no longer discards the samples returned by capture.stop().await, so buffered audio is flushed when the meeting stops.
Make meeting VAD threshold configurable
Meeting chunks use a simple RMS gate before transcription. The existing default remains 0.01 for backwards compatibility, but quiet microphones can now tune it via [meeting.audio].vad_threshold.
The threshold is wired through:
- MeetingAudioConfig
- MeetingConfig
- ChunkConfig
This also improves diagnostic logging for skipped meeting chunks by including source, duration, RMS, and threshold at debug level. Transcription logs now include the audio source (Microphone or Loopback) so mic/loopback behavior is easier to debug.
Preserve simple diarizer segment boundaries
The simple diarizer assigns speakers based on audio source:
- You
- Remote
It previously merged adjacent same-speaker transcript segments. However, the caller applies diarization results back to transcript segments positionally with zip(). If the diarizer returned fewer segments than ASR produced, later transcript segments silently missed speaker_id.
This change removes merging from SimpleDiarizer, preserving a 1:1 mapping between transcript segments and diarized segments. As a result, every mic/loopback segment receives the expected speaker label.
Related Issue
N/A
Type of Change
Testing
Commands run:
Results:
- cargo check passed.
- cargo test passed.
Additional non-mutating checks run:
Results:
- cargo fmt --check failed due to existing unrelated formatting differences across the repo. This PR does not include those unrelated formatting changes.
- cargo clippy --all-targets --all-features failed because --all-features enables Whisper ROCm/HIP and this machine does not have hipcc in PATH. It also reported two existing build.rs uninlined_format_args warnings before failing.
Manual validation:
- You and Remote speakers.
Documentation
Updated docs:
- docs/CONFIGURATION.md
- docs/USER_MANUAL.md
- docs/MEETING_MODE.md
Additional Notes
The default meeting VAD threshold remains 0.01 to preserve existing behavior. Users with quiet meeting microphones can lower [meeting.audio].vad_threshold, with 0.001 documented as a suggested starting point.
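
To show why the threshold matters for quiet microphones, here is a minimal sketch of an RMS gate. It is not the implementation in this PR: the function names rms and passes_vad are hypothetical, and both the f32 sample format and the >= comparison are assumptions.

```rust
/// Root-mean-square level of a chunk of samples.
/// Assumption: samples are f32 values normalized to [-1.0, 1.0].
fn rms(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = samples.iter().map(|s| s * s).sum();
    (sum_sq / samples.len() as f32).sqrt()
}

/// Returns true when the chunk is loud enough to be sent to transcription.
/// Assumption: a chunk exactly at the threshold is kept.
fn passes_vad(samples: &[f32], vad_threshold: f32) -> bool {
    rms(samples) >= vad_threshold
}
```

Under these assumptions, a chunk whose RMS level is about 0.005 is skipped at the default threshold of 0.01 but kept once the threshold is lowered to 0.001.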