
Add per-segment transcript support for audio objects #1093

Draft
encordrob wants to merge 2 commits into master from encordrob/review-critique-first

encordrob commented May 25, 2026

Summary

Fixes two production bugs in audio transcript (TLv3) handling that combine into silent data loss for any customer using the SDK on labels produced by the new dynamic transcript flow (ED-1556 / ED-2631, in prod since May 2026):

  1. Destructive save. On master, reading a TLv3 transcript label and writing it back wipes every per-segment action. _to_object_actions only serialises dynamic-manager entries; transcript attributes are dynamic: false in the ontology, so the manager has nothing for them. Result: actions: [] on the way out, and the surviving classifications value is whatever single segment last clobbered _static_answer_map. The whole transcript except (a corrupted version of) the last word is silently deleted.
  2. Read returns the wrong text. obj.get_answer(transcript_attr) returns only the last segment's text — each action processed through set_answer_from_list overwrites the previous one in the static map.
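
A minimal simulation of how the two bugs combine may help reviewers. It assumes a simplified, hypothetical action shape (a dict with featureHash, name, answers and range keys; the real label JSON carries more fields), and old_parse / old_serialise are hypothetical stand-ins for the pre-fix set_answer_from_list and _to_object_actions paths, not the SDK's code.

```python
from typing import Any, Dict, List

# Hypothetical, simplified action records: one per transcript segment.
actions: List[Dict[str, Any]] = [
    {"featureHash": "tx01", "name": "Speech #transcript", "answers": "hello", "range": [0, 10]},
    {"featureHash": "tx01", "name": "Speech #transcript", "answers": "world", "range": [11, 20]},
]


def old_parse(actions: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Mimics the pre-fix parse path: every action writes one static slot."""
    static_answer_map: Dict[str, Any] = {}
    for action in actions:
        # Keyed by feature hash only, so each segment overwrites the one before it.
        static_answer_map[action["featureHash"]] = action["answers"]
    return static_answer_map


def old_serialise(dynamic_manager_entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Mimics the pre-fix save path: only dynamic-manager entries become actions."""
    return list(dynamic_manager_entries)


static_answer_map = old_parse(actions)
print(static_answer_map)  # {'tx01': 'world'}: only the last segment survives
# The attribute is dynamic: false, so the dynamic manager holds nothing for it.
print(old_serialise([]))  # []: every per-segment action is dropped on save
```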

Also adds a per-segment read API, get_transcripts(), which didn't previously exist.

What changed

  • Parse: _add_action_answers partitions actions by transcript marker. Non-transcript actions go through set_answer_from_list as before. Transcript actions are stashed verbatim on ObjectInstance._transcript_actions so they never touch the static answer map — fixes the read bug.
  • Serialise: _to_object_actions re-emits the stashed transcript actions verbatim alongside dynamic-manager actions, so read → modify → save round-trips without losing per-segment data — fixes the destructive save.
  • Read joined text: get_answer(transcript_attr) reads from _static_answer_map, which the parse path populates from object_answers.classifications. The backend (cord-backend #7961) writes the joined transcript into that mirror at export time, so the SDK doesn't need to re-join — legacy single-instance-per-word labels and new TLv3 labels both work via the same code path.
  • Read per-segment: new get_transcripts(attribute=None) returns a sorted list of TranscriptSegment(range, text, feature_hash, attribute_name) records derived from the stashed actions.
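
The parse/serialise split above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: ObjectInstanceSketch is hypothetical, the action dicts use a simplified hypothetical shape, the attribute display name is assumed to travel on each action as "name", and a plain list stands in for the dynamic answer manager.

```python
from typing import Any, Dict, List

# Display-name convention used to recognise transcript attributes.
TRANSCRIPT_MARKER = "#transcript"


def is_transcript_action(action: Dict[str, Any]) -> bool:
    # Assumption: the attribute's display name travels on the action as "name".
    return TRANSCRIPT_MARKER in action.get("name", "")


class ObjectInstanceSketch:
    """Hypothetical stand-in for ObjectInstance, reduced to the parse and save paths."""

    def __init__(self) -> None:
        self._transcript_actions: List[Dict[str, Any]] = []
        # Stands in for the dynamic answer manager's contents.
        self._dynamic_actions: List[Dict[str, Any]] = []

    def add_action_answers(self, actions: List[Dict[str, Any]]) -> None:
        for action in actions:
            if is_transcript_action(action):
                # Stash a verbatim copy; it never reaches the static answer map.
                self._transcript_actions.append(dict(action))
            else:
                # Stand-in for the unchanged set_answer_from_list path.
                self._dynamic_actions.append(dict(action))

    def to_object_actions(self) -> List[Dict[str, Any]]:
        # Dynamic-manager actions plus the stashed transcript actions, unmodified.
        return [dict(a) for a in self._dynamic_actions] + [dict(a) for a in self._transcript_actions]


incoming = [
    {"featureHash": "mood", "name": "Mood", "answers": "calm", "range": [0, 20]},
    {"featureHash": "tx01", "name": "Speech #transcript", "answers": "hello", "range": [0, 10]},
    {"featureHash": "tx01", "name": "Speech #transcript", "answers": "world", "range": [11, 20]},
]
obj = ObjectInstanceSketch()
obj.add_action_answers(incoming)
# Order matches here only because the non-transcript action comes first in `incoming`.
assert obj.to_object_actions() == incoming
```

The point of the sketch is that the transcript branch never writes to a single per-attribute slot, so nothing can be overwritten, and the save path re-emits exactly what was read.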

A transcript attribute is identified by the #transcript marker substring in its display name — same convention the backend uses (in prod since late 2024). The proper schema flag was previously considered as ED-745 and cancelled; reviving it is a separate, cross-stack lift.

Dependency

get_answer(transcript_attr) returning the joined string for new-format labels depends on cord-backend #7961 populating object_answers.classifications from the per-segment actions at export time. get_transcripts() and the destructive-save fix do not depend on that PR.

Out of scope

  • set_transcript(range, text) write API — customers cannot author transcripts in Python yet.
  • Replacing the #transcript marker with an explicit attribute.is_transcript: bool schema field — would let customers and SDK code key off a clean flag instead of a name substring, but needs schema + backend + ontology backfill.

github-actions Bot commented May 25, 2026

Unit test report (Python 3.9.24, Pydantic 2.12.3)

590 tests: 590 passed ✅, 0 skipped 💤, 0 failed ❌
1 suite, 1 file, 14s ⏱️

Results for commit 13b1f70.

♻️ This comment has been updated with latest results.

github-actions Bot commented May 25, 2026

Unit test report (Python 3.9.24, Pydantic 1.10.22)

590 tests: 590 passed ✅, 0 skipped 💤, 0 failed ❌
1 suite, 1 file, 15s ⏱️

Results for commit 13b1f70.

♻️ This comment has been updated with latest results.

Audio objects use a text attribute (display name containing "#transcript") whose
content is split into per-segment actions with frame ranges, even though the
ontology marks the attribute dynamic=false. The SDK previously routed every
transcript action through set_answer_from_list, which overwrote the single static
slot for the attribute — so get_answer(transcript_attr) returned whichever
segment was processed last and there was no public way to read per-segment text
with timestamps.

This change:

- Stashes raw transcript actions on the ObjectInstance instead of folding them
  into the static answer map, identified by the "#transcript" marker in the
  attribute name.
- Adds ObjectInstance.get_transcripts(attribute=None) returning a sorted list of
  TranscriptSegment(range, text, feature_hash, attribute_name) records.
- Makes get_answer(transcript_attr) reconstruct the joined transcript text from
  the stashed actions on demand, so the SDK does not depend on the backend
  pre-joining transcripts into object_answers.classifications.
- Re-emits stashed transcript actions verbatim in _to_object_actions so a read →
  modify → save round-trip preserves per-segment data.
- Makes set_answer_from_list silently skip transcript entries, as a defensive
  guard against any other parse path inadvertently clobbering the static map.
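
A rough sketch of the per-segment read path and the defensive guard described in this commit message, under stated assumptions: ObjectInstanceSketch and AttributeSketch are hypothetical stand-ins, stashed actions use a simplified hypothetical dict shape, and the set_answer_from_list signature here is illustrative only. TranscriptSegment mirrors the field names listed above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List, Optional, Tuple

TRANSCRIPT_MARKER = "#transcript"


@dataclass(frozen=True)
class TranscriptSegment:
    range: Tuple[int, int]
    text: str
    feature_hash: str
    attribute_name: str


@dataclass(frozen=True)
class AttributeSketch:
    """Hypothetical stand-in for an ontology TextAttribute."""

    name: str
    feature_node_hash: str


class ObjectInstanceSketch:
    """Hypothetical stand-in for ObjectInstance holding stashed transcript actions."""

    def __init__(self, transcript_actions: List[Dict[str, Any]]) -> None:
        self._transcript_actions = list(transcript_actions)
        self._static_answer_map: Dict[str, Any] = {}

    def _iter_transcript_segments(self) -> Iterator[TranscriptSegment]:
        for action in self._transcript_actions:
            start, end = action["range"]
            yield TranscriptSegment(
                range=(start, end),
                text=action["answers"],
                feature_hash=action["featureHash"],
                attribute_name=action["name"],
            )

    def get_transcripts(self, attribute: Optional[AttributeSketch] = None) -> List[TranscriptSegment]:
        # attribute=None returns segments for every transcript attribute on the object.
        segments = [
            s for s in self._iter_transcript_segments()
            if attribute is None or s.feature_hash == attribute.feature_node_hash
        ]
        return sorted(segments, key=lambda s: s.range[0])

    def set_answer_from_list(self, answers: List[Dict[str, Any]]) -> None:
        for answer in answers:
            if TRANSCRIPT_MARKER in answer["name"]:
                # Defensive guard: transcript entries never reach the static map.
                continue
            self._static_answer_map[answer["featureHash"]] = answer["answers"]


obj = ObjectInstanceSketch([
    {"featureHash": "tx01", "name": "Speech #transcript", "answers": "world", "range": [11, 20]},
    {"featureHash": "tx01", "name": "Speech #transcript", "answers": "hello", "range": [0, 10]},
])
speech = AttributeSketch(name="Speech #transcript", feature_node_hash="tx01")
print([s.text for s in obj.get_transcripts(speech)])  # ['hello', 'world']
obj.set_answer_from_list([{"featureHash": "tx01", "name": "Speech #transcript", "answers": "oops"}])
print(obj._static_answer_map)  # {}
```

Segments are sorted by range start, so out-of-order actions in the stored label still come back in playback order.
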

encordrob force-pushed the encordrob/review-critique-first branch from 68a4136 to 09a535f on May 25, 2026 10:23

gemini-code-assist Bot left a comment

Code Review

This pull request introduces support for transcript-bearing text attributes, identified by a #transcript marker in the ontology. It adds the TranscriptSegment data class and updates ObjectInstance and OntologyLabels to handle transcript actions, allowing for the retrieval of both individual segments and joined transcript text. A review comment suggests refactoring get_answer to call get_transcripts to eliminate logic duplication and improve maintainability.

Comment on lines +261 to +267
            segments = [
                s for s in self._iter_transcript_segments() if s.feature_hash == attribute.feature_node_hash
            ]
            if not segments:
                return None
            segments.sort(key=lambda s: s.range[0])
            return "\n".join(s.text for s in segments)

medium

The logic for filtering, sorting, and joining transcript segments is duplicated between get_answer and get_transcripts. Refactoring get_answer to call get_transcripts would improve maintainability and ensure consistent behavior across the SDK.

            segments = self.get_transcripts(cast(TextAttribute, attribute))
            return "\n".join(s.text for s in segments) if segments else None
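
For illustration, the reviewer's suggestion can be exercised in a self-contained sketch in which get_answer delegates to get_transcripts, so filtering and ordering live in one place. ObjectInstanceSketch and TextAttributeSketch are hypothetical stand-ins; the real method casts to TextAttribute, which the sketch does not need, and the non-transcript fallback to a static map is a simplification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TranscriptSegment:
    range: Tuple[int, int]
    text: str
    feature_hash: str
    attribute_name: str


@dataclass(frozen=True)
class TextAttributeSketch:
    """Hypothetical stand-in for an ontology TextAttribute."""

    name: str
    feature_node_hash: str


class ObjectInstanceSketch:
    """Hypothetical ObjectInstance with get_answer delegating to get_transcripts."""

    def __init__(self, segments: List[TranscriptSegment], static_answers: Optional[Dict[str, str]] = None) -> None:
        self._segments = list(segments)
        self._static_answer_map = dict(static_answers or {})

    def get_transcripts(self, attribute: Optional[TextAttributeSketch] = None) -> List[TranscriptSegment]:
        matching = [
            s for s in self._segments
            if attribute is None or s.feature_hash == attribute.feature_node_hash
        ]
        return sorted(matching, key=lambda s: s.range[0])

    def get_answer(self, attribute: TextAttributeSketch) -> Optional[str]:
        if "#transcript" in attribute.name:
            # Single source of truth: filtering and ordering live in get_transcripts.
            segments = self.get_transcripts(attribute)
            return "\n".join(s.text for s in segments) if segments else None
        return self._static_answer_map.get(attribute.feature_node_hash)


obj = ObjectInstanceSketch(
    [
        TranscriptSegment((11, 20), "world", "tx01", "Speech #transcript"),
        TranscriptSegment((0, 10), "hello", "tx01", "Speech #transcript"),
    ],
    static_answers={"note": "checked"},
)
joined = obj.get_answer(TextAttributeSketch("Speech #transcript", "tx01"))
missing = obj.get_answer(TextAttributeSketch("Other #transcript", "tx02"))
plain = obj.get_answer(TextAttributeSketch("Note", "note"))
```

With this shape, any future change to segment ordering (for example a tie-break on end frame) only has to be made in get_transcripts, and get_answer picks it up automatically.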
