fix: set columns and dtype explicitly in dataset readers by ShresthSamyak · Pull Request #1706 · deeppavlov/DeepPavlov

ShresthSamyak · 2026-05-07T16:05:23Z

Changes

- `docred_reader.py`: replaced the generic `d1`, `d2`, `d3`, `d4` column names with `rel_id`, `train`, `valid`, `test`
- `basic_classification_reader.py`: added `dtype` to the passthrough keys for both `csv` and `json` reads so callers can explicitly control column types

Code smell addressed

Columns and DataType Not Explicitly Set — Zhang et al., CAIN 2022
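
To illustrate the column half of this smell, here is a minimal sketch with made-up counts. The relation IDs and numbers are hypothetical and are not taken from DocRED or from the reader's actual code; the sketch only contrasts pandas' default integer column labels with explicit names.

```python
import pandas as pd

# Hypothetical per-relation counts: (relation id, train, valid, test).
rows = [("P17", 120, 15, 14), ("P131", 80, 9, 11)]

# Without explicit names, pandas labels the columns 0, 1, 2, 3.
implicit = pd.DataFrame(rows)

# With explicit names, the printed table is self-describing.
explicit = pd.DataFrame(rows, columns=["rel_id", "train", "valid", "test"])

print(implicit)
print(explicit)
```

With named columns, downstream code can select `explicit["train"]` instead of relying on a positional label.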

- docred_reader: replace generic d1-d4 column names with rel_id, train, valid, test
- basic_classification_reader: add dtype to passthrough keys for csv and json reads

Fixes deeppavlov#1654

Copilot

Pull request overview

This PR addresses issue #1654 by making dataset reader outputs more explicit and controllable, improving downstream schema readability and reducing silent type inference surprises.

Changes:

- Updated `DocREDDatasetReader.print_statistics()` to use meaningful column names (`rel_id`, `train`, `valid`, `test`) instead of generic `d1`..`d4`.
- Extended `BasicClassificationDatasetReader` to pass through `dtype` to `pandas.read_csv()` and `pandas.read_json()` so callers can explicitly control column types.
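
As a rough illustration of the type inference surprise that an explicit `dtype` avoids, the sketch below reads a small in-memory CSV with plain pandas. The data is hypothetical (zero-padded label codes), and it calls `pandas.read_csv` directly rather than going through the DeepPavlov reader.

```python
import io

import pandas as pd

# Hypothetical classification data whose labels are zero-padded codes.
csv_text = "text,label\nhello,007\nworld,042\n"

# Default inference parses the labels as integers and drops the padding.
inferred = pd.read_csv(io.StringIO(csv_text))

# An explicit dtype keeps the labels as the strings found in the file.
explicit = pd.read_csv(io.StringIO(csv_text), dtype={"label": str})

print(inferred["label"].tolist())
print(explicit["label"].tolist())
```

Without `dtype`, the labels `007` and `042` would no longer match the original class names after loading.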

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `deeppavlov/dataset_readers/docred_reader.py` | Renames statistics DataFrame columns to explicit, semantically meaningful names for clearer logging/output. |
| `deeppavlov/dataset_readers/basic_classification_reader.py` | Adds `dtype` passthrough to pandas readers to allow explicit column type control. |


                if format == 'csv':
-                    keys = ('sep', 'header', 'names')
+                    keys = ('sep', 'header', 'names', 'dtype')
                    options = {k: kwargs[k] for k in keys if k in kwargs}
                    df = pd.read_csv(file, **options)
                elif format == 'json':
-                    keys = ('orient', 'lines')
+                    keys = ('orient', 'lines', 'dtype')
                    options = {k: kwargs[k] for k in keys if k in kwargs}
                    df = pd.read_json(file, **options)
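
The passthrough pattern in this hunk can be sketched as a standalone function. `read_frame` and `unrelated_option` are hypothetical names used only for illustration; the real logic lives inside `BasicClassificationDatasetReader`, whose surrounding code is not shown here. The key tuples mirror the ones in the diff.

```python
import io

import pandas as pd


def read_frame(file, format="csv", **kwargs):
    """Hypothetical helper: forward only whitelisted options to pandas."""
    if format == "csv":
        keys = ("sep", "header", "names", "dtype")
        reader = pd.read_csv
    elif format == "json":
        keys = ("orient", "lines", "dtype")
        reader = pd.read_json
    else:
        raise ValueError(f"unsupported format: {format}")
    # Keys outside the whitelist are silently dropped, as in the diff.
    options = {k: kwargs[k] for k in keys if k in kwargs}
    return reader(file, **options)


# 'unrelated_option' is not whitelisted, so it never reaches pandas.
csv_df = read_frame(
    io.StringIO("text;label\nhi;007\n"),
    format="csv", sep=";", dtype={"label": str}, unrelated_option=True,
)

json_df = read_frame(
    io.StringIO('{"text": "hi", "label": "007"}\n'),
    format="json", orient="records", lines=True, dtype={"label": str},
)

print(csv_df["label"].tolist())
print(json_df["label"].tolist())
```

Because the whitelist filters `kwargs`, a caller could not influence column types before this change: any `dtype` argument was dropped before pandas saw it.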


fix: set columns and dtype explicitly in dataset readers

62e61cf


Copilot AI review requested due to automatic review settings May 7, 2026 16:05

Copilot started reviewing on behalf of ShresthSamyak May 7, 2026 16:06

Copilot AI reviewed May 7, 2026

ShresthSamyak wants to merge 1 commit into deeppavlov:master from ShresthSamyak:fix/explicit-columns-dtype
