
Dataops user guide intro#2162

Open
jeromedockes wants to merge 9 commits into skrub-data:main from jeromedockes:dataops-user-guide


Conversation

@jeromedockes

Member

Rewording the intro section a bit. @rcap107

@jeromedockes added the documentation and data_ops labels on Jun 12, 2026
@rcap107 added this to the Release 0.10 milestone on Jun 12, 2026
Comment thread doc/data_ops.rst
to help predict the product's category? What learning rate to set on a
:class:`~sklearn.ensemble.HistGradientBoostingRegressor`?

**Validation**  Finally, the quality of predictions must be evaluated on

Member


I wonder if the section on leakage should be moved further up.

I also think there should be a mention of leakage at the very start, because it's really important and, where it currently sits, it may come a bit late (even though it's not that far down the page).

Member Author


OK, I added a (bold) mention of data leakage at the very start. For the paragraphs that follow, I think the current order roughly matches the chronological order in which you run into these problems (building a pipeline at all, making modelling choices, validation), but that is indeed debatable.


Labels

data_ops (Something related to the skrub DataOps), documentation (Add or improve the documentation)

Projects

None yet


2 participants