
ENH - adding doc link to html repr of estimators#2036

Merged
rcap107 merged 25 commits into
skrub-data:mainfrom
rcap107:enh-add-doc-link-to-estimator
Jun 16, 2026


@rcap107 rcap107 commented Apr 21, 2026

Member

This PR improves the HTML repr of the skrub estimators so that they now show the (?) symbol that links to the documentation page.

At the moment, this is what a user sees in a Jupyter notebook:
[image]
Note that only the OneHotEncoder has a (?).

With this PR, it looks like this:
[image]

I updated:

  • ApplyToCols
  • All the transformers that inherit from SingleColumnTransformer, by adding the required methods to the base class
  • SquashingScaler
  • TableVectorizer and Cleaner
  • SelectCols and DropCols
  • SkrubLearner

I did not touch any of the joiners.

For the change itself, I had to override the _doc_link_template. I also had to modify the VisualBlock in ApplyToCols, because otherwise cases like ApplyToCols(TableVectorizer()) would not show the link for the TableVectorizer. I'm not entirely sure this is the best way of doing it, and maybe we could further abstract the way I added _doc_link_template.
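
For readers unfamiliar with the mechanism, here is a small stand-alone sketch of how a doc-link template could be expanded into a URL. It is a toy model, not scikit-learn's or skrub's actual code: `ToyDocLinkMixin`, the `example.org` URL and the stand-in classes are hypothetical, and `_doc_link_module` and `_get_doc_link` are assumed names. Only `_doc_link_template` is taken from the description above.

```python
import itertools


class ToyDocLinkMixin:
    """Toy stand-in for the doc-link machinery (hypothetical, for illustration)."""

    # Top-level package whose classes should receive a link (assumed name).
    _doc_link_module = "skrub"
    # Placeholder URL pattern; the real documentation URL is not shown here.
    _doc_link_template = (
        "https://example.org/reference/{estimator_module}.{estimator_name}.html"
    )

    def _get_doc_link(self):
        module = type(self).__module__
        # Classes that live in another package get no link at all.
        if module.split(".")[0] != self._doc_link_module:
            return ""
        # Keep only the public part of the path:
        # "skrub._table_vectorizer" becomes "skrub".
        public_module = ".".join(
            itertools.takewhile(
                lambda part: not part.startswith("_"), module.split(".")
            )
        )
        return self._doc_link_template.format(
            estimator_module=public_module,
            estimator_name=type(self).__name__,
        )


class TableVectorizer(ToyDocLinkMixin):
    """Stand-in class, not the real skrub estimator."""


class ThirdPartyEstimator(ToyDocLinkMixin):
    """Stand-in for an estimator defined outside the package."""


# Pretend the classes were defined in these modules.
TableVectorizer.__module__ = "skrub._table_vectorizer"
ThirdPartyEstimator.__module__ = "otherpkg.estimators"
```

In this sketch, overriding `_doc_link_template` on a class is enough to change where its (?) link points, which is the kind of override the description refers to.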

@rcap107

rcap107 commented Apr 21, 2026

Member Author

Coverage is failing because I'm not testing every line. I'll wait for someone to review the current version of the code because simplifying it may make it easier to test as well.

@glemaitre

Member

Compared to my PR in #1051, you are not backporting the feature here, but I think that is fine. Basically, you benefit from the feature if you have a recent enough scikit-learn version; otherwise you will not see the help link, which I think is fine.

One thing that I find strange is having to redefine the property everywhere. I think it would be best to define it in a mixin or base class that each skrub component inherits from.
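
The version dependence mentioned above could be checked with a feature test rather than a version comparison. The sketch below is only an illustration: `supports_doc_links` is a hypothetical helper, and it assumes that the private method `_get_doc_link` on `BaseEstimator` is what signals doc-link support, which may not hold for every scikit-learn release.

```python
import importlib.util


def supports_doc_links():
    """Return True if the installed scikit-learn seems to expose doc links."""
    if importlib.util.find_spec("sklearn") is None:
        # scikit-learn is not installed, so there is nothing to render.
        return False
    from sklearn.base import BaseEstimator

    # Assumption: `_get_doc_link` is a private hook, so its presence is
    # used here only as a heuristic for "recent enough".
    return hasattr(BaseEstimator, "_get_doc_link")
```

Because nothing is backported, an older scikit-learn would simply render the diagram without the (?) link.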

@rcap107

rcap107 commented Apr 27, 2026

Member Author

> Compared to my PR in #1051, you are not backporting the feature here, but I think that is fine. Basically, you benefit from the feature if you have a recent enough scikit-learn version; otherwise you will not see the help link, which I think is fine.

I forgot about #1051 🙈

> One thing that I find strange is having to redefine the property everywhere. I think it would be best to define it in a mixin or base class that each skrub component inherits from.

Yes, that's the part I was not convinced by. I'll remove the duplication using a base class.
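
As a rough picture of that refactoring, the property can live on one shared base class so that subclasses inherit it instead of redefining it. `SkrubBase`, the placeholder URL and the empty stand-in classes below are hypothetical; the names and contents in the merged code may differ.

```python
class SkrubBase:
    """Hypothetical shared base class holding the doc-link template."""

    @property
    def _doc_link_template(self):
        # Placeholder URL pattern, defined once for every subclass.
        return "https://example.org/reference/{estimator_name}.html"


class Cleaner(SkrubBase):
    """Stand-in class; it does not redefine the property."""


class SquashingScaler(SkrubBase):
    """Stand-in class; it does not redefine the property."""


# Both classes resolve the property through the base class.
cleaner_link = Cleaner()._doc_link_template.format(estimator_name="Cleaner")
```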

@rcap107 rcap107 linked an issue Apr 27, 2026 that may be closed by this pull request
@rcap107 rcap107 added this to the Release 0.10 milestone May 4, 2026
@rcap107 rcap107 requested a review from jeromedockes June 8, 2026 15:01

@jeromedockes jeromedockes left a comment

Member

I like the approach of replacing BaseEstimator with a skrub base class in all MROs. We should make that explicit in the class docstring and use it everywhere we inherit from BaseEstimator where it makes sense (I think there are a few more, like DropSimilar, and the joiners if we care to update those).

Comment thread skrub/_base.py Outdated
Comment thread skrub/_data_ops/_estimator.py Outdated
Comment thread skrub/_apply_to_cols.py Outdated
        return self._wrapped_transformer.get_feature_names_out(input_features)

    def _sk_visual_block_(self):
        # This is needed because when ApplyToCols is used with a transformer like

Member

I'm not sure I understood this comment, but the scikit-learn diagram machinery is also quite complicated, so maybe it's not easy to explain in a short comment.

Member Author

When I added this block, the (?) signs did not appear properly for the TableVectorizer for some reason, but now I can't reproduce the problem anymore, so I think it can be removed.

Member Author

I also tested with sklearn 1.5, and it seems to work on that older version too, so I'm not sure what happened.

Member Author

Never mind, I found the problem. Without that block, the doc link for the TableVectorizer itself is not added.

Member Author

Without this function, the transformer inside ApplyToCols doesn't get the doc link:

[image]

With the change, it shows up properly:

[image]

@rcap107 rcap107 requested a review from jeromedockes June 16, 2026 09:47
Comment thread skrub/_apply_to_cols.py
        return self._wrapped_transformer.get_feature_names_out(input_features)

    def _sk_visual_block_(self):
        # This is needed because cases like ApplyToCols(TableVectorizer())

Member

@glemaitre could you give a bit of context on why this is needed here?

Comment thread skrub/_apply_to_each_col.py Outdated


-class ApplyToEachCol(BaseEstimator, TransformerMixin):
+class ApplyToEachCol(SkrubBaseTransformer, TransformerMixin):

Member

While we're at it, let's put the mixin before the base class.
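
One common reason for listing mixins before the base class is cooperative method resolution: a mixin that calls `super()` only takes effect if the base class comes after it in the MRO. The toy classes below are hypothetical stand-ins (they are not the scikit-learn or skrub classes, and `tags` is an invented method) that illustrate the difference between the two orderings.

```python
class ToyBaseEstimator:
    """Stand-in base class; it ends the chain and does not call super()."""

    def tags(self):
        return {"kind": None}


class ToySkrubBaseTransformer(ToyBaseEstimator):
    """Stand-in for a project-level base class."""


class ToyTransformerMixin:
    """Stand-in mixin that refines what the next class in the MRO returns."""

    def tags(self):
        tags = super().tags()
        tags["kind"] = "transformer"
        return tags


class MixinFirst(ToyTransformerMixin, ToySkrubBaseTransformer):
    """Mixin listed first: its tags() runs and delegates to the base."""


class BaseFirst(ToySkrubBaseTransformer, ToyTransformerMixin):
    """Base listed first: the base's tags() wins and the mixin is bypassed."""


mixin_first_mro = [cls.__name__ for cls in MixinFirst.__mro__]
```

With the mixin first, `MixinFirst().tags()` reports the transformer kind. With the base first, the mixin's method is never reached.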

Comment thread skrub/_base.py Outdated

@jeromedockes jeromedockes left a comment

Member

thank you @rcap107 !

Comment on lines +1466 to +1468
data_op, data = get_data_op_and_data("simple")
split = data_op.skb.train_test_split(data)
learner = data_op.skb.make_learner().fit(split["train"])

Member

Do we need it to be fitted? Otherwise, maybe we can slightly simplify and speed up the test.

Suggested change
-data_op, data = get_data_op_and_data("simple")
-split = data_op.skb.train_test_split(data)
-learner = data_op.skb.make_learner().fit(split["train"])
+learner = skrub.var('a').skb.make_learner()

Member Author

Whoops, I didn't see the comment. I'll fix this when I'm cleaning up for the release.

@rcap107 rcap107 merged commit 97c39bf into skrub-data:main Jun 16, 2026
48 of 49 checks passed
@rcap107 rcap107 deleted the enh-add-doc-link-to-estimator branch June 16, 2026 12:01

Development

Successfully merging this pull request may close these issues.

Introduce a SkrubEstimator to have docs in HTML display

3 participants