Add per-segment transcript support for audio objects by encordrob · Pull Request #1093 · encord-team/encord-client-python

encordrob · 2026-05-25T10:20:10Z

Summary

Fixes two production bugs in audio transcript (TLv3) handling that combine into silent data loss for any customer using the SDK on labels produced by the new dynamic transcript flow (ED-1556 / ED-2631, in prod since May 2026):

Destructive save. On master, reading a TLv3 transcript label and writing it back wipes every per-segment action. _to_object_actions only serialises dynamic-manager entries; transcript attributes are dynamic: false in the ontology, so the manager has nothing for them. Result: actions: [] on the way out, and the surviving classifications value is whatever single segment last clobbered _static_answer_map. The whole transcript except (a corrupted version of) the last word is silently deleted.
Read returns the wrong text. obj.get_answer(transcript_attr) returns only the last segment's text — each action processed through set_answer_from_list overwrites the previous one in the static map.
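
The two failure modes above can be reproduced with a toy model. This is a minimal sketch, not the SDK's code: the `(feature_hash, range, text)` action shape and the helper names `parse_into_static_map` and `serialise_dynamic_only` are assumptions made for illustration.

```python
# Toy model of the pre-fix behaviour; not the SDK's actual implementation.
# The action shape (feature_hash, frame range, text) is assumed for illustration.
from typing import Dict, List, Tuple

Action = Tuple[str, Tuple[int, int], str]


def parse_into_static_map(actions: List[Action]) -> Dict[str, str]:
    """Fold every action into a single slot per attribute, like a static answer map."""
    static_answer_map: Dict[str, str] = {}
    for feature_hash, _frame_range, text in actions:
        # One slot per attribute: each segment overwrites the previous one.
        static_answer_map[feature_hash] = text
    return static_answer_map


def serialise_dynamic_only(dynamic_entries: List[Action]) -> List[Action]:
    """Emit only what the dynamic manager holds; non-dynamic attributes never reach it."""
    return list(dynamic_entries)


actions = [
    ("abc123", (0, 10), "hello"),
    ("abc123", (11, 20), "world"),
]

# Read bug: only the last segment's text survives in the static map.
static_map = parse_into_static_map(actions)

# Save bug: the manager has no entries for a dynamic: false attribute,
# so the serialised action list is empty.
saved_actions = serialise_dynamic_only([])
```

In this model the read path keeps only "world", and the write path emits no actions at all, which matches the symptoms described in the two items above.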

Also adds a per-segment read API that didn't previously exist.

What changed

Parse: _add_action_answers partitions actions by transcript marker. Non-transcript actions go through set_answer_from_list as before. Transcript actions are stashed verbatim on ObjectInstance._transcript_actions so they never touch the static answer map — fixes the read bug.
Serialise: _to_object_actions re-emits the stashed transcript actions verbatim alongside dynamic-manager actions, so read → modify → save round-trips without losing per-segment data — fixes the destructive save.
Read joined text: get_answer(transcript_attr) reads from _static_answer_map, which the parse path populates from object_answers.classifications. The backend (cord-backend #7961) writes the joined transcript into that mirror at export time, so the SDK doesn't need to re-join — legacy single-instance-per-word labels and new TLv3 labels both work via the same code path.
Read per-segment: new get_transcripts(attribute=None) returns a sorted list of TranscriptSegment(range, text, feature_hash, attribute_name) records derived from the stashed actions.
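
The parse, serialise and per-segment read steps above can be sketched together as follows. This is an illustration under stated assumptions, not the code in this PR: `ToyObjectInstance`, the dictionary keys (`name`, `featureHash`, `range`, `text`) and the `feature_hash` filter argument are hypothetical, and only the `TranscriptSegment` field names are taken from the description.

```python
# Sketch of the partition / stash / re-emit flow; data shapes are assumed.
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

TRANSCRIPT_MARKER = "#transcript"


@dataclass(frozen=True)
class TranscriptSegment:
    range: Tuple[int, int]
    text: str
    feature_hash: str
    attribute_name: str


class ToyObjectInstance:
    def __init__(self) -> None:
        self._static_answer_map: Dict[str, str] = {}
        self._transcript_actions: List[Dict[str, Any]] = []

    def add_action_answers(self, actions: List[Dict[str, Any]]) -> None:
        """Partition actions: transcript ones are stashed, the rest go to the static map."""
        for action in actions:
            if TRANSCRIPT_MARKER in action["name"]:
                self._transcript_actions.append(action)  # kept verbatim
            else:
                self._static_answer_map[action["featureHash"]] = action["text"]

    def to_object_actions(self, dynamic_actions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Re-emit stashed transcript actions alongside dynamic-manager actions."""
        return list(dynamic_actions) + list(self._transcript_actions)

    def get_transcripts(self, feature_hash: Optional[str] = None) -> List[TranscriptSegment]:
        """Return stashed segments sorted by start of range, optionally for one attribute."""
        segments = [
            TranscriptSegment(
                range=(a["range"][0], a["range"][1]),
                text=a["text"],
                feature_hash=a["featureHash"],
                attribute_name=a["name"],
            )
            for a in self._transcript_actions
            if feature_hash is None or a["featureHash"] == feature_hash
        ]
        return sorted(segments, key=lambda s: s.range[0])


actions = [
    {"name": "Speech #transcript", "featureHash": "t1", "range": [11, 20], "text": "world"},
    {"name": "Speech #transcript", "featureHash": "t1", "range": [0, 10], "text": "hello"},
    {"name": "Speaker", "featureHash": "s1", "range": [0, 20], "text": "Alice"},
]
obj = ToyObjectInstance()
obj.add_action_answers(actions)
texts = [segment.text for segment in obj.get_transcripts()]
round_tripped = obj.to_object_actions([])
```

Because the transcript actions never enter the static map, a later segment cannot overwrite an earlier one, and the serialised output contains every stashed segment unchanged.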

A transcript attribute is identified by the #transcript marker substring in its display name — same convention the backend uses (in prod since late 2024). The proper schema flag was previously considered as ED-745 and cancelled; reviving it is a separate, cross-stack lift.
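
A substring check of this kind might look like the sketch below. The helper name `is_transcript_attribute` is hypothetical, and case-sensitive matching is an assumption, since the description only says "marker substring".

```python
# Hypothetical helper; the SDK's real check may differ in name and details.
TRANSCRIPT_MARKER = "#transcript"


def is_transcript_attribute(display_name: str) -> bool:
    """Treat any attribute whose display name contains the marker as a transcript."""
    return TRANSCRIPT_MARKER in display_name


matches = [
    is_transcript_attribute("Speech #transcript"),
    is_transcript_attribute("Speaker"),
    # A name that merely contains the marker also matches, which is the
    # weakness of keying off a substring rather than an explicit schema flag.
    is_transcript_attribute("#transcripts archive"),
]
```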

Dependency

get_answer(transcript_attr) returning the joined string for new-format labels depends on cord-backend #7961 populating object_answers.classifications from the per-segment actions at export time. get_transcripts() and the destructive-save fix do not depend on that PR.

Out of scope

set_transcript(range, text) write API — customers cannot author transcripts in Python yet.
Replacing the #transcript marker with an explicit attribute.is_transcript: bool schema field — would let customers and SDK code key off a clean flag instead of a name substring, but needs schema + backend + ontology backfill.

github-actions · 2026-05-25T10:21:00Z

Unit test report (Python 3.9.24, Pydantic 2.12.3)

590 tests, 1 suite, 1 file in 14s ⏱️: 590 passed ✅, 0 skipped 💤, 0 failed ❌.

Results for commit 13b1f70.

♻️ This comment has been updated with latest results.

github-actions · 2026-05-25T10:21:01Z

Unit test report (Python 3.9.24, Pydantic 1.10.22)

590 tests, 1 suite, 1 file in 15s ⏱️: 590 passed ✅, 0 skipped 💤, 0 failed ❌.

Results for commit 13b1f70.

♻️ This comment has been updated with latest results.

Audio objects use a text attribute (display name containing "#transcript") whose content is split into per-segment actions with frame ranges, even though the ontology marks the attribute dynamic=false. The SDK previously routed every transcript action through set_answer_from_list, which overwrote the single static slot for the attribute, so get_answer(transcript_attr) returned whichever segment was processed last and there was no public way to read per-segment text with timestamps.

This change:

- Stashes raw transcript actions on the ObjectInstance instead of folding them into the static answer map, identified by the "#transcript" marker in the attribute name.
- Adds ObjectInstance.get_transcripts(attribute=None) returning a sorted list of TranscriptSegment(range, text, feature_hash, attribute_name) records.
- Makes get_answer(transcript_attr) reconstruct the joined transcript text from the stashed actions on demand, so the SDK does not depend on the backend pre-joining transcripts into object_answers.classifications.
- Re-emits stashed transcript actions verbatim in _to_object_actions so a read → modify → save round-trip preserves per-segment data.
- Makes set_answer_from_list silently skip transcript entries as a defensive guard against any other parse path inadvertently clobbering the static map.

gemini-code-assist

Code Review

This pull request introduces support for transcript-bearing text attributes, identified by a #transcript marker in the ontology. It adds the TranscriptSegment data class and updates ObjectInstance and OntologyLabels to handle transcript actions, allowing for the retrieval of both individual segments and joined transcript text. A review comment suggests refactoring get_answer to call get_transcripts to eliminate logic duplication and improve maintainability.

gemini-code-assist · 2026-05-25T10:24:27Z

+            segments = [
+                s for s in self._iter_transcript_segments() if s.feature_hash == attribute.feature_node_hash
+            ]
+            if not segments:
+                return None
+            segments.sort(key=lambda s: s.range[0])
+            return "\n".join(s.text for s in segments)


The logic for filtering, sorting, and joining transcript segments is duplicated between get_answer and get_transcripts. Refactoring get_answer to call get_transcripts would improve maintainability and ensure consistent behavior across the SDK.

segments = self.get_transcripts(cast(TextAttribute, attribute))
return "\n".join(s.text for s in segments) if segments else None

encordrob force-pushed the encordrob/review-critique-first branch from 68a4136 to 09a535f Compare May 25, 2026 10:23

gemini-code-assist Bot reviewed May 25, 2026

Fix lint and format on transcript changes

13b1f70

encordrob wants to merge 2 commits into master from encordrob/review-critique-first
