Add per-segment transcript support for audio objects #1093
Unit test report (Python 3.9.24, Pydantic 2.12.3): 590 tests, 590 ✅, 14s ⏱️. Results for commit 13b1f70. ♻️ This comment has been updated with latest results.
Unit test report (Python 3.9.24, Pydantic 1.10.22): 590 tests, 590 ✅, 15s ⏱️. Results for commit 13b1f70. ♻️ This comment has been updated with latest results.
Audio objects use a text attribute (display name containing "#transcript") whose content is split into per-segment actions with frame ranges, even though the ontology marks the attribute dynamic=false. The SDK previously routed every transcript action through set_answer_from_list, which overwrote the single static slot for the attribute. As a result, get_answer(transcript_attr) returned whichever segment was processed last, and there was no public way to read per-segment text with timestamps.
This change:
- Stashes raw transcript actions on the ObjectInstance instead of folding them into the static answer map, identified by the "#transcript" marker in the attribute name.
- Adds ObjectInstance.get_transcripts(attribute=None), which returns a sorted list of TranscriptSegment(range, text, feature_hash, attribute_name) records.
- Makes get_answer(transcript_attr) reconstruct the joined transcript text from the stashed actions on demand, so the SDK does not depend on the backend pre-joining transcripts into object_answers.classifications.
- Re-emits stashed transcript actions verbatim in _to_object_actions so a read → modify → save round-trip preserves per-segment data.
- Makes set_answer_from_list silently skip transcript entries, as a defensive guard against any other parse path inadvertently clobbering the static map.
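The per-segment read described above can be sketched in isolation. This is a minimal illustration, not the SDK's implementation: the TranscriptSegment field names come from the description, while the raw action dictionary keys ("name", "featureHash", "value", "range"), the single (start, end) range shape, and the standalone get_transcripts function are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

TRANSCRIPT_MARKER = "#transcript"


@dataclass(frozen=True)
class TranscriptSegment:
    range: Tuple[int, int]
    text: str
    feature_hash: str
    attribute_name: str


def get_transcripts(actions: List[dict], feature_hash: Optional[str] = None) -> List[TranscriptSegment]:
    """Standalone stand-in for the read API: filter transcript actions, sort by start."""
    segments = [
        TranscriptSegment(
            range=tuple(action["range"]),
            text=action["value"],
            feature_hash=action["featureHash"],
            attribute_name=action["name"],
        )
        for action in actions
        # Assumed convention: the marker is a substring of the attribute name.
        if TRANSCRIPT_MARKER in action["name"]
        and (feature_hash is None or action["featureHash"] == feature_hash)
    ]
    return sorted(segments, key=lambda s: s.range[0])


# Hypothetical actions, deliberately out of order.
actions = [
    {"name": "Speech #transcript", "featureHash": "abc123", "value": "world", "range": [50, 90]},
    {"name": "Speech #transcript", "featureHash": "abc123", "value": "hello", "range": [0, 40]},
]
segments = get_transcripts(actions)
joined = "\n".join(s.text for s in segments)
```

Sorting by the start of the range keeps the joined text in temporal order regardless of the order in which actions were parsed.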
Force-pushed from 68a4136 to 09a535f.
Code Review
This pull request introduces support for transcript-bearing text attributes, identified by a #transcript marker in the ontology. It adds the TranscriptSegment data class and updates ObjectInstance and OntologyLabels to handle transcript actions, allowing for the retrieval of both individual segments and joined transcript text. A review comment suggests refactoring get_answer to call get_transcripts to eliminate logic duplication and improve maintainability.
    segments = [
        s for s in self._iter_transcript_segments() if s.feature_hash == attribute.feature_node_hash
    ]
    if not segments:
        return None
    segments.sort(key=lambda s: s.range[0])
    return "\n".join(s.text for s in segments)
The logic for filtering, sorting, and joining transcript segments is duplicated between get_answer and get_transcripts. Refactoring get_answer to call get_transcripts would improve maintainability and ensure consistent behavior across the SDK.
segments = self.get_transcripts(cast(TextAttribute, attribute))
return "\n".join(s.text for s in segments) if segments else None
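To make the suggested refactor concrete, here is a small self-contained sketch in which get_answer delegates to get_transcripts, so filtering and sorting live in one place. ToyObjectInstance and Segment are hypothetical stand-ins, and attributes are keyed by a plain feature-hash string rather than the SDK's attribute objects.

```python
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    range: Tuple[int, int]
    text: str
    feature_hash: str


class ToyObjectInstance:
    """Hypothetical stand-in holding already-parsed transcript segments."""

    def __init__(self, segments: List[Segment]) -> None:
        self._segments = list(segments)

    def get_transcripts(self, feature_hash: Optional[str] = None) -> List[Segment]:
        # Single place where filtering and ordering are defined.
        matching = [
            s for s in self._segments
            if feature_hash is None or s.feature_hash == feature_hash
        ]
        return sorted(matching, key=lambda s: s.range[0])

    def get_answer(self, feature_hash: str) -> Optional[str]:
        # Delegates instead of repeating the filter/sort logic.
        segments = self.get_transcripts(feature_hash)
        return "\n".join(s.text for s in segments) if segments else None


obj = ToyObjectInstance([
    Segment((50, 90), "world", "abc123"),
    Segment((0, 40), "hello", "abc123"),
])
answer = obj.get_answer("abc123")
missing = obj.get_answer("zzz999")
```

With this shape, any later change to segment ordering or filtering only has to be made in get_transcripts.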
Summary
Fixes two production bugs in audio transcript (TLv3) handling that combine into silent data loss for any customer using the SDK on labels produced by the new dynamic transcript flow (ED-1556 / ED-2631, in prod since May 2026):
- _to_object_actions only serialises dynamic-manager entries; transcript attributes are dynamic: false in the ontology, so the manager has nothing for them. Result: actions: [] on the way out, and the surviving classifications value is whatever single segment last clobbered _static_answer_map. The whole transcript except (a corrupted version of) the last word is silently deleted.
- obj.get_answer(transcript_attr) returns only the last segment's text, because each action processed through set_answer_from_list overwrites the previous one in the static map.
Also adds a per-segment read API that didn't previously exist.
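The two failure modes can be reproduced with a toy model. The functions below are hypothetical simplifications of the old parse and save paths, not the SDK's code, and the action dictionary keys are assumed for illustration.

```python
from typing import Dict, List


def parse_actions_old(actions: List[dict]) -> Dict[str, str]:
    """Toy version of the old parse path: one static slot per attribute."""
    static_answer_map: Dict[str, str] = {}
    for action in actions:
        # Every segment for the same attribute overwrites the previous one.
        static_answer_map[action["featureHash"]] = action["value"]
    return static_answer_map


def to_object_actions_old(dynamic_answers: List[dict]) -> List[dict]:
    """Toy version of the old save path: only dynamic-manager entries are emitted."""
    return list(dynamic_answers)


# Hypothetical three-segment transcript for one attribute.
actions = [
    {"name": "Speech #transcript", "featureHash": "abc123", "value": "the", "range": [0, 10]},
    {"name": "Speech #transcript", "featureHash": "abc123", "value": "quick", "range": [11, 30]},
    {"name": "Speech #transcript", "featureHash": "abc123", "value": "fox", "range": [31, 45]},
]

static_map = parse_actions_old(actions)
# The transcript attribute is not dynamic, so the dynamic manager holds nothing for it.
saved_actions = to_object_actions_old(dynamic_answers=[])
```

After parsing, the static map holds only the last segment, and the save path emits no actions at all, which is the combination the summary describes as silent data loss.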
What changed
- _add_action_answers partitions actions by transcript marker. Non-transcript actions go through set_answer_from_list as before. Transcript actions are stashed verbatim on ObjectInstance._transcript_actions so they never touch the static answer map. This fixes the read bug.
- _to_object_actions re-emits the stashed transcript actions verbatim alongside dynamic-manager actions, so read → modify → save round-trips without losing per-segment data. This fixes the destructive save.
- get_answer(transcript_attr) reads from _static_answer_map, which the parse path populates from object_answers.classifications. The backend (cord-backend #7961) writes the joined transcript into that mirror at export time, so the SDK doesn't need to re-join; legacy single-instance-per-word labels and new TLv3 labels both work via the same code path.
- get_transcripts(attribute=None) returns a sorted list of TranscriptSegment(range, text, feature_hash, attribute_name) records derived from the stashed actions.
A transcript attribute is identified by the #transcript marker substring in its display name, the same convention the backend uses (in prod since late 2024). The proper schema flag was previously considered as ED-745 and cancelled; reviving it is a separate, cross-stack lift.
Dependency
get_answer(transcript_attr) returning the joined string for new-format labels depends on cord-backend #7961 populating object_answers.classifications from the per-segment actions at export time. get_transcripts() and the destructive-save fix do not depend on that PR.
Out of scope
- set_transcript(range, text) write API: customers cannot author transcripts in Python yet.
- Replacing the #transcript marker with an explicit attribute.is_transcript: bool schema field: this would let customers and SDK code key off a clean flag instead of a name substring, but needs schema + backend + ontology backfill.
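For reference, the partition-and-re-emit behaviour listed under What changed can be sketched as follows. ToyObjectInstance is a hypothetical model: a plain list stands in for the dynamic answer manager, the method names drop the leading underscore, and the action dictionary keys are assumed.

```python
from typing import List

TRANSCRIPT_MARKER = "#transcript"


class ToyObjectInstance:
    """Hypothetical model of the parse and save paths after the fix."""

    def __init__(self) -> None:
        self._dynamic_actions: List[dict] = []     # stand-in for the dynamic answer manager
        self._transcript_actions: List[dict] = []  # raw transcript actions, kept verbatim

    def add_action_answers(self, actions: List[dict]) -> None:
        for action in actions:
            if TRANSCRIPT_MARKER in action["name"]:
                # Stash untouched so no single static slot gets overwritten.
                self._transcript_actions.append(action)
            else:
                self._dynamic_actions.append(action)

    def to_object_actions(self) -> List[dict]:
        # Re-emit stashed transcript actions alongside the dynamic ones.
        return self._dynamic_actions + self._transcript_actions


# Hypothetical payload: two transcript segments and one unrelated dynamic answer.
incoming = [
    {"name": "Speech #transcript", "featureHash": "abc123", "value": "hello", "range": [0, 40]},
    {"name": "Volume", "featureHash": "def456", "value": "loud", "range": [0, 90]},
    {"name": "Speech #transcript", "featureHash": "abc123", "value": "world", "range": [50, 90]},
]

obj = ToyObjectInstance()
obj.add_action_answers(incoming)
outgoing = obj.to_object_actions()
```

In this model a read followed by a save returns every incoming action, including both transcript segments, though not necessarily in the original order.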