Fixes #15274: Protobuf schema parsing when message name differs from topic #27866
jatinmasaram wants to merge 12 commits into open-metadata:main
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
Hi, could a maintainer please add the 'safe to test' label so CI can run?
The Python checkstyle failed. Please run You can install the pre-commit hooks with |
🟡 Playwright Results: all passed (16 flaky)
✅ 3983 passed · ❌ 0 failed · 🟡 16 flaky · ⏭️ 86 skipped
🟡 16 flaky test(s) (passed on retry)
How to debug locally:
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace
…n name differs from topic
- Add _resolve_message_class() with two-step fallback for issue open-metadata#15274
- Fix type errors: use DataType/DataTypeTopic enum members instead of strings
- Fix list invariance in get_protobuf_fields via list() cast
- Remove invalid FieldName import, use plain string for FieldModel.name
- Split get_protobuf_fields into typed helpers _get_column_fields/_get_field_models
- Fix worker_id fixture to work with and without pytest-xdist
- Rename ProtobufParserTests to TestProtobufParser for pytest discovery
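The "list invariance" item in the commit message can be illustrated with a small sketch. This is not the PR's code: `BaseField`, `ColumnField`, `load_columns`, and `as_base_fields` are hypothetical names chosen for the example. It shows why static type checkers such as mypy typically reject returning a `List[ColumnField]` where a `List[BaseField]` is declared, and how copying through `list()` gives the checker a fresh list it can type as `List[BaseField]`.

```python
from typing import List


class BaseField:
    """Hypothetical base model, standing in for a parser field type."""

    def __init__(self, name: str) -> None:
        self.name = name


class ColumnField(BaseField):
    """Hypothetical subclass, used only to show the variance issue."""


def load_columns() -> List[ColumnField]:
    # Pretend these were produced by walking a Protobuf descriptor.
    return [ColumnField("id"), ColumnField("amount")]


def as_base_fields() -> List[BaseField]:
    columns = load_columns()
    # `return columns` would be flagged by a type checker: List is invariant,
    # so List[ColumnField] is not accepted where List[BaseField] is expected.
    # list() accepts any Iterable[BaseField], and a List[ColumnField] is one,
    # so the copy can be inferred as List[BaseField].
    return list(columns)


fields = as_base_fields()
```

At runtime the two forms behave the same apart from the shallow copy; the difference only matters to the type checker.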
The failing Playwright tests (e.g., GlossaryImportExport, LineageFilters, LineageRightPanel) appear unrelated to the changes in this PR. This PR only modifies the Protobuf parser in the ingestion layer, while the failures are occurring in UI E2E flows involving glossary import and lineage entity operations (HTTP 409 conflicts). The protobuf parsing logic and associated unit tests are passing locally and are isolated from these UI workflows. Requesting maintainer review to confirm whether these failures are pre-existing or can be overridden. |
Code Review ✅ Approved
9 resolved / 9 findings
Refactors Protobuf schema parsing to dynamically resolve message names via descriptor introspection, ensuring compatibility when names differ from topics. Addresses multiple issues regarding test path manipulation, type handling, and
✅ 9 resolved
✅ Edge Case: Returning [] instead of None changes children semantics
✅ Quality: Trailing blank lines left from removed code in tests
✅ Bug: UnboundLocalError when exception path is taken in parse_protobuf_schema
✅ Bug:
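For readers unfamiliar with the UnboundLocalError finding, here is a generic sketch of how that class of bug arises and one common fix. It is not taken from parse_protobuf_schema; `parse_unsafe`, `parse_safe`, and `failing_loader` are hypothetical names. It assumes the usual shape of the problem: a variable is assigned only inside `try`, the exception is swallowed, and the variable is then read after the block.

```python
def failing_loader():
    # Stand-in for any step that can raise while parsing a schema.
    raise ValueError("bad schema")


def parse_unsafe(loader):
    try:
        fields = loader()
    except Exception:
        pass  # error swallowed (a real parser would log it)
    # If loader() raised, `fields` was never assigned, so this line
    # raises UnboundLocalError instead of returning a fallback value.
    return fields


def parse_safe(loader):
    fields = None  # bind the name before the try block
    try:
        fields = loader()
    except Exception:
        pass  # error swallowed (a real parser would log it)
    return fields
```

Initializing to `None` rather than `[]` also keeps "nothing parsed" distinguishable from "parsed, but empty", which is the distinction the Edge Case finding above is about.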
Looks like the shard-1 failure happened during environment setup and datamodel generation, before the tests even started. My changes are strictly limited to the Protobuf parser, so I don't think they could have touched the build pipeline. Mind giving it a re-run? Pretty sure it's just a transient CI hiccup.

The failure in test_lineage.py seems to be a pyodbc.DataError related to SQL Server datetime handling. Since this PR is strictly focused on the Protobuf parser and its unit tests, I don't see how it could impact the SQL Server lineage logic. Is this a known flaky test or a pre-existing issue? Would appreciate a re-run or some guidance on how to proceed.
@open-metadata/ingestion Hi maintainers, I've reviewed the remaining CI failures and they appear outside the scope of this PR, which is limited to Protobuf schema parsing (issue #15274).
SonarCloud (S5443)
Requesting review of this hotspot to confirm it can be marked as safe.
MSSQL lineage test
All protobuf-related tests are passing, with coverage ~69% (≥20% required). Happy to investigate further or open a separate issue/PR for the MSSQL test if needed.


Describe your changes:
Fixes #15274
The Kafka Protobuf ingestion currently assumes that the top-level message name is derived from the topic name using PascalCase (e.g. `loans` → `Loans`). This fails when the actual Protobuf message name differs, leading to `getattr` returning None and downstream `NoneType.DESCRIPTOR` errors.
This PR updates the message resolution logic to:
- Use `pb2_module.DESCRIPTOR.message_types_by_name` to dynamically discover declared top-level messages
This ensures schema parsing works even when naming conventions differ, without breaking existing behavior.
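A minimal sketch of the two-step resolution described above, under stated assumptions. It is not the PR's `_resolve_message_class`: `topic_to_pascal`, `resolve_message_class`, and the fake module are hypothetical, and picking the first declared message as the fallback is an assumption made for illustration. The sketch relies only on two properties of generated `*_pb2` modules: message classes are module attributes named after the message, and `DESCRIPTOR.message_types_by_name` maps top-level message names to their descriptors. A `SimpleNamespace` stands in for a real generated module so the example runs without `protoc`.

```python
from types import SimpleNamespace
from typing import Any, Optional


def topic_to_pascal(topic: str) -> str:
    """Convert a topic such as 'loan_events' to 'LoanEvents' (hypothetical helper)."""
    parts = topic.replace("-", "_").replace(".", "_").split("_")
    return "".join(part[:1].upper() + part[1:] for part in parts if part)


def resolve_message_class(pb2_module: Any, topic: str) -> Optional[Any]:
    # Step 1: keep the existing convention (topic name -> PascalCase attribute).
    candidate = getattr(pb2_module, topic_to_pascal(topic), None)
    if candidate is not None:
        return candidate
    # Step 2: fall back to the top-level messages declared in the file descriptor.
    file_descriptor = getattr(pb2_module, "DESCRIPTOR", None)
    declared = getattr(file_descriptor, "message_types_by_name", None) or {}
    for message_name in declared:
        candidate = getattr(pb2_module, message_name, None)
        if candidate is not None:
            return candidate  # assumption: first declared message wins
    return None


class LoanApplication:
    """Stand-in for a generated message class."""


# Fake module: the topic is 'loans', but the declared message is LoanApplication.
fake_pb2 = SimpleNamespace(
    DESCRIPTOR=SimpleNamespace(message_types_by_name={"LoanApplication": object()}),
    LoanApplication=LoanApplication,
)
resolved = resolve_message_class(fake_pb2, "loans")
```

Returning `None` when nothing matches lets the caller report a clear "no message found" error instead of failing later on a missing `DESCRIPTOR` attribute.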
Testing:
Added unit tests covering:
All tests pass locally
Type of change:
Checklist:
Fixes #15274: Protobuf schema parsing fails when message name differs from topic
Summary by Gitar
- protobuf_parser.py logic.
- _resolve_message_class to streamline conditional logic while maintaining existing name-matching precedence.
- _get_column_fields and _get_field_models.
This will update automatically on new commits.