Add validator checks for data_gene_panel_matrix vs _sequenced case list#114

Open
rujuthaa wants to merge 1 commit into cBioPortal:main from rujuthaa:codex/10842-gene-matrix-validator

@rujuthaa

Summary

Implements additional data_gene_panel_matrix.txt validations from the docs:

  • If a mutation gene panel is specified (not NA) for a sample, that sample must be in the _sequenced case list.
  • If a sample is in the _sequenced case list, it must appear in the mutation column of the gene panel matrix.
  • If a sample is in the _sequenced case list, its mutation gene panel cannot be NA.
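
The three rules above can be sketched as a standalone check. This is an illustration only, not the code added to scripts/importer/validateData.py: the function name `check_mutation_panels`, the error labels, and the assumption that the mutation profile column is named `mutations` are all hypothetical.

```python
import csv
import io

NA = "NA"


def check_mutation_panels(matrix_text, sequenced_ids, mutation_column="mutations"):
    """Return (error_label, sample_id) tuples for the three rules.

    Hypothetical helper for illustration; `mutation_column` is an assumed
    column name, not taken from the validator itself.
    """
    reader = csv.DictReader(io.StringIO(matrix_text), delimiter="\t")
    # Map each sample to its mutation gene panel value; a missing cell counts as NA.
    panels = {
        row["SAMPLE_ID"]: (row.get(mutation_column) or NA).strip()
        for row in reader
    }

    errors = []
    # Rule 1: a specified panel requires membership in the _sequenced case list.
    for sample_id, panel in panels.items():
        if panel != NA and sample_id not in sequenced_ids:
            errors.append(("not_in_sequenced", sample_id))

    for sample_id in sorted(sequenced_ids):
        if sample_id not in panels:
            # Rule 2: every sequenced sample must appear in the matrix.
            errors.append(("missing_from_matrix", sample_id))
        elif panels[sample_id] == NA:
            # Rule 3: a sequenced sample cannot have NA as its mutation panel.
            errors.append(("panel_is_na", sample_id))
    return errors


if __name__ == "__main__":
    example = "SAMPLE_ID\tmutations\nS1\tPANEL_A\nS2\tNA\nS3\tPANEL_A\n"
    for label, sample_id in check_mutation_panels(example, {"S1", "S2", "S4"}):
        print(label, sample_id)
```

In this made-up example each rule is violated by exactly one sample: S3 has a panel but is not sequenced, S2 is sequenced but has NA, and S4 is sequenced but has no row in the matrix.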

Fixes #10842

Changes

  • Updated validator logic in scripts/importer/validateData.py.
  • Added/updated unit tests in tests/unit_tests_validate_data.py.
  • Added fixtures:
    • tests/test_data/data_gene_matrix_sequenced_na.txt
    • tests/test_data/data_gene_matrix_missing_sequenced_sample.txt
  • Updated existing test fixtures to keep system/unit tests consistent with the new rules:
    • tests/test_data/data_gene_matrix_duplicate_sample.txt
    • tests/test_data/study_es_0/data_gene_panel_matrix.txt
    • tests/test_data/study_es_0/result_report.html

Validation

  • Ran full test suite:
    • ./test_scripts.sh
  • Result:
    • Ran 184 tests ... OK (skipped=1)
