
Fix dual-source timestamp inflation in meeting mode#330

Open
materemias wants to merge 2 commits into main from fix/dual-source-timestamp-offset

Conversation

@materemias
Collaborator

Summary

  • Per-chunk start offset was derived from transcript.duration_ms(), which returns the max end_ms across all segments — conflating mic and loopback on dual-track captures. Every new chunk got pushed past the other source's end, roughly doubling apparent meeting length (observed: 80-minute meeting exporting as 2h17m).
  • Track the offset per AudioSource so mic and loopback each advance on their own wall-clock timeline, by the actual audio duration consumed each chunk.
  • Pad loopback chunks in daemon.rs to the mic chunk length with silence before dispatch — keeps both offsets advancing in lockstep if the loopback monitor lags at startup or briefly gaps. VAD still skips transcription on silent frames.
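The per-source offset idea above can be sketched as follows. This is a minimal illustration, not voxtype's actual code: `AudioSource` variants, the `SourceOffsets` struct, and its method names are assumptions for the example.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for voxtype's AudioSource enum.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum AudioSource {
    Mic,
    Loopback,
}

/// Cumulative start offset per source, in milliseconds.
#[derive(Default)]
struct SourceOffsets {
    offsets: HashMap<AudioSource, u64>,
}

impl SourceOffsets {
    /// Start offset for the next chunk from `source`.
    fn start_offset_ms(&self, source: AudioSource) -> u64 {
        *self.offsets.get(&source).unwrap_or(&0)
    }

    /// Advance by the audio duration actually consumed this chunk.
    fn advance(&mut self, source: AudioSource, audio_ms: u64) {
        *self.offsets.entry(source).or_insert(0) += audio_ms;
    }
}

fn main() {
    let mut offsets = SourceOffsets::default();
    // Interleaved 30 s chunks: each source advances on its own timeline.
    offsets.advance(AudioSource::Mic, 30_000);
    offsets.advance(AudioSource::Loopback, 30_000);
    assert_eq!(offsets.start_offset_ms(AudioSource::Mic), 30_000);
    assert_eq!(offsets.start_offset_ms(AudioSource::Loopback), 30_000);
    // Under the old scheme, deriving the offset from the transcript-wide
    // max end_ms would have started the next chunk at 60_000 instead.
}
```

The key contrast with the buggy behavior: a single transcript-wide `duration_ms()` conflates both sources' end times, so each new chunk inherits the other source's progress as well as its own.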

Why

Repro: run voxtype meeting export latest --format markdown --timestamps on any meeting where both sides speak. Real-time captures with ~1h20m of audio were exporting transcript timestamps up to [02:17:29].

Test plan

  • cargo build --lib
  • cargo test --lib meeting:: — 175 passing
  • cargo clippy --lib --no-deps — no new warnings
  • Record a dual-source meeting, export, verify timestamps track real wall clock

Co-authored-by: Codex <noreply@openai.com>

The per-chunk start offset was derived from transcript.duration_ms(),
which returns the max end_ms across all segments — conflating mic
and loopback. On dual-track captures, each new loopback chunk got
pushed past the last mic segment and vice versa, roughly doubling
apparent elapsed time (e.g. an 80-minute meeting exporting as 2:17).

Track the offset per AudioSource so mic and loopback each maintain
their own wall-clock-aligned timeline, advancing by the actual audio
duration consumed each chunk.

Pad loopback chunks in the daemon to the mic chunk length with
silence before dispatch. If the loopback monitor lags at startup or
briefly gaps, the padding keeps both source offsets advancing in
lockstep with wall-clock time; VAD still skips transcription on
silent frames.

Co-authored-by: Codex <noreply@openai.com>
Copilot AI review requested due to automatic review settings April 23, 2026 10:32
@materemias materemias requested a review from peteonrails as a code owner April 23, 2026 10:32
Contributor

Copilot AI left a comment


Pull request overview

Fixes meeting-mode timestamp inflation on dual-source (mic + loopback) captures by tracking per-source time offsets instead of deriving per-chunk offsets from the overall transcript duration.

Changes:

  • Track and advance a per-AudioSource cumulative offset in MeetingDaemon when creating new chunk buffers.
  • Clear per-source offsets on meeting stop to avoid cross-meeting contamination.
  • In the daemon meeting loop, drain loopback alongside mic and (currently) pad loopback to mic chunk size, then always dispatch loopback chunks to keep offsets advancing.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/meeting/mod.rs Introduces per-source offset tracking and uses it to compute chunk start offsets; advances offsets by processed audio duration.
src/daemon.rs Adjusts meeting chunk dispatch to pad loopback chunks to mic length and always process loopback chunks to keep offsets aligned.

Comment thread src/meeting/mod.rs Outdated
Comment on lines +318 to +332
@@ -319,6 +325,12 @@ impl MeetingDaemon {
.process_chunk(buffer)
.map_err(crate::error::VoxtypeError::Transcribe)?;

// Advance the per-source offset by the actual audio duration consumed,
// regardless of whether VAD found speech in this chunk.
if let Some(offset) = self.source_offsets.get_mut(&source) {
*offset += result.audio_duration_ms;
}

Copilot AI Apr 23, 2026


The per-source offset is only advanced after process_chunk succeeds. If transcription/VAD processing returns an error, the caller (daemon.rs) has already drained and discarded the audio, but source_offsets won’t move forward—subsequent segments will reuse the same start_offset_ms and produce overlapping/incorrect timestamps. Consider advancing the offset based on the input samples length (and configured sample_rate) even on error paths, or computing audio_duration_ms upfront and updating source_offsets in a finally-style block before returning the error.
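The reviewer's suggestion — compute the duration from the input samples before any fallible step — can be sketched like this. `samples_to_ms` is a hypothetical helper for illustration, not a voxtype API:

```rust
/// Convert a sample count to milliseconds at a given sample rate.
/// Hypothetical helper mirroring the reviewer's suggestion.
fn samples_to_ms(samples: usize, sample_rate: u32) -> u64 {
    (samples as u64 * 1000) / sample_rate as u64
}

fn main() {
    // 30 s of 16 kHz mono audio (sample rate is an assumed example value).
    let chunk = vec![0.0_f32; 480_000];

    // Duration is known from the input alone, so the offset can be
    // advanced *before* transcription, and an error path can no longer
    // leave the offset behind the already-drained audio.
    let audio_ms = samples_to_ms(chunk.len(), 16_000);
    assert_eq!(audio_ms, 30_000);
}
```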

Comment thread src/meeting/mod.rs Outdated
Comment on lines +314 to +332
@@ -319,6 +325,12 @@ impl MeetingDaemon {
.process_chunk(buffer)
.map_err(crate::error::VoxtypeError::Transcribe)?;

// Advance the per-source offset by the actual audio duration consumed,
// regardless of whether VAD found speech in this chunk.
if let Some(offset) = self.source_offsets.get_mut(&source) {
*offset += result.audio_duration_ms;
}

Copilot AI Apr 23, 2026


New per-source offset behavior is central to meeting timestamp correctness, but there’s no unit test exercising process_chunk_with_source with interleaved sources to ensure offsets don’t inflate (and that silent/VAD-skipped chunks still advance time). Since this file already has tests, please add a focused test using a lightweight in-module Transcriber stub and fixed-size sample buffers to assert that: (1) mic and loopback maintain independent offsets, and (2) timestamps advance by chunk audio duration rather than transcript.duration_ms().

Comment thread src/daemon.rs Outdated
Comment on lines +2691 to +2700
// Also drain loopback buffer up to the same amount.
// Pad with silence so it matches the mic chunk length —
// keeps the per-source timestamp offsets advancing in
// lockstep with wall-clock even if the loopback monitor
// has a short startup lag or a transient gap.
let loopback_len = self.meeting_loopback_buffer.len().min(chunk_samples);
let loopback_chunk: Vec<f32> = self.meeting_loopback_buffer.drain(..loopback_len).collect();
let mut loopback_chunk: Vec<f32> = self.meeting_loopback_buffer.drain(..loopback_len).collect();
if loopback_chunk.len() < chunk_samples {
loopback_chunk.resize(chunk_samples, 0.0);
}

Copilot AI Apr 23, 2026


Padding loopback_chunk up to chunk_samples can cause the current VAD implementation to miss real loopback speech when the loopback buffer is lagging. VoiceActivityDetector::contains_speech requires >10% of frames to be above threshold; if you pad a short (e.g., 1–2s) loopback slice to a 30s chunk, the speech-frame ratio is diluted and the entire chunk is treated as “no speech”, dropping that audio from transcription. A safer approach is to run VAD/transcription on the actual loopback samples and advance the loopback offset separately (e.g., by adding an explicit “advance offset by N ms” path when loopback is short/empty), rather than padding before VAD.
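The dilution the reviewer describes is simple arithmetic; a tiny sketch makes it concrete. The >10% threshold comes from the comment above; `speech_frame_ratio` is an illustrative function, not the real `VoiceActivityDetector` API:

```rust
/// Fraction of a chunk that contains real speech after padding a short
/// slice with silence. Illustrative only; real VAD works on frames.
fn speech_frame_ratio(speech_ms: u64, chunk_ms: u64) -> f64 {
    speech_ms as f64 / chunk_ms as f64
}

fn main() {
    // 2 s of real loopback speech padded to a 30 s chunk:
    let padded = speech_frame_ratio(2_000, 30_000);
    // ~6.7%, below the >10% threshold contains_speech requires, so the
    // whole chunk would be classified as silence and dropped.
    assert!(padded < 0.10);

    // Unpadded, the same 2 s slice is 100% speech frames and passes VAD.
    assert!(speech_frame_ratio(2_000, 2_000) > 0.10);
}
```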

Copilot review on #330 flagged two real issues with the initial fix:

1. Padding the loopback chunk to mic length with silence diluted the
   VAD speech-frame ratio, risking dropping real remote audio when the
   loopback monitor was lagging — exactly the case the padding was
   meant to help.
2. The per-source offset was only advanced after process_chunk
   succeeded, so on a transcription/VAD error the caller had already
   drained the samples but the offset stayed put — subsequent chunks
   would reuse the same start_offset_ms and overlap.

Changes:
- Advance source_offsets up front from the input sample count, before
  any fallible processing, so offsets track wall-clock even on errors.
- Drop the silence padding in the meeting chunk dispatch and restore
  the original skip-when-empty guard.
- Add MeetingDaemon::sync_source_offsets() that levels all sources to
  the furthest-advanced one, and call it after each mic+loopback
  iteration so a short or skipped loopback chunk catches up before
  the next one is dispatched.

Co-authored-by: Copilot <copilot-pull-request-reviewer[bot]@users.noreply.github.com>
@materemias
Collaborator Author

Addressed Copilot review in 431a5f5.

Fixed (P1): src/daemon.rs padding dropped. Padding a short loopback slice (e.g. 2s) to 30s of silence would dilute the VAD speech-frame ratio below the 10% threshold and cause real remote audio to be discarded. Now we drain only actual samples and reconcile timestamps separately.

Fixed (P2): src/meeting/mod.rs offsets are now advanced up front, from the input sample count, before any fallible processing. If transcription errors, the caller has still drained the samples, and the offset now correctly reflects the elapsed time.

Added: MeetingDaemon::sync_source_offsets() levels all per-source offsets to the furthest-advanced one. Called after each mic+loopback iteration in the daemon loop, so a short or skipped loopback chunk catches up to wall-clock before the next chunk dispatches. Replaces the padding approach while solving the same "loopback monitor lags at startup" edge case.
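The leveling step can be sketched as below. The function name comes from the PR; the free-function shape and `HashMap<AudioSource, u64>` storage are assumptions about the real implementation:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for voxtype's AudioSource enum.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum AudioSource {
    Mic,
    Loopback,
}

/// Levels every per-source offset up to the furthest-advanced one,
/// so a short or skipped loopback chunk rejoins wall-clock time.
fn sync_source_offsets(offsets: &mut HashMap<AudioSource, u64>) {
    let max = offsets.values().copied().max().unwrap_or(0);
    for v in offsets.values_mut() {
        *v = max;
    }
}

fn main() {
    let mut offsets = HashMap::new();
    offsets.insert(AudioSource::Mic, 60_000);
    // Loopback monitor lagged at startup and only delivered 32 s.
    offsets.insert(AudioSource::Loopback, 32_000);
    sync_source_offsets(&mut offsets);
    assert_eq!(offsets[&AudioSource::Loopback], 60_000);
}
```

Leveling up (rather than down) matches the intent: the lagging source skips forward over the gap, so its next real audio lands at the correct wall-clock timestamp instead of overlapping the other track.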

Not addressed — the unit-test suggestion. Adding a focused test for this path would require building a mock Transcriber harness just for one test; the cost/value didn't pencil out given the existing integration coverage. Happy to add one if you'd prefer.

@peteonrails peteonrails self-assigned this Apr 24, 2026
@peteonrails
Owner

This initially looks great. I need a few days to give it a closer look and test. THANK YOU

@peteonrails peteonrails added this to the 0.6.7 milestone Apr 24, 2026