ENH - adding doc link to html repr of estimators #2036

Open: rcap107 wants to merge 17 commits into skrub-data:main from rcap107:enh-add-doc-link-to-estimator

rcap107 commented Apr 21, 2026

Member

This PR improves the HTML repr of the skrub estimators so that they show the (?) symbol that links to the documentation page.

At the moment, this is what a user sees in a Jupyter notebook:
[image]
Note that only the OneHotEncoder has a (?).

With this PR, it looks like this:
[image]

I updated:

  • ApplyToCols
  • All the transformers that inherit from SingleColumnTransformer, by adding the required methods to the base class
  • SquashingScaler
  • TableVectorizer and Cleaner
  • SelectCols and DropCols
  • SkrubLearner

I did not touch any of the joiners.

For the change itself I had to override _doc_link_template. I also had to modify the VisualBlock in ApplyToCols, because otherwise cases like ApplyToCols(TableVectorizer()) would not show the link for the TableVectorizer. I'm not entirely sure this is the best way of doing it; maybe we could further abstract the way I added _doc_link_template.
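
To illustrate what a link template does, here is a minimal, dependency-free sketch of how a documentation URL could be built from one. It is neither the code in this PR nor scikit-learn's implementation: the URL pattern and the helper names `public_module` and `build_doc_link` are assumptions made for the example.

```python
from itertools import takewhile

# Assumed URL pattern for the skrub reference pages (illustration only).
SKRUB_DOC_TEMPLATE = (
    "https://skrub-data.org/stable/reference/generated/"
    "{estimator_module}.{estimator_name}.html"
)


def public_module(module_path):
    """Keep the dotted module path up to the first private submodule."""
    parts = module_path.split(".")
    return ".".join(takewhile(lambda part: not part.startswith("_"), parts))


def build_doc_link(module_path, class_name, template=SKRUB_DOC_TEMPLATE):
    """Fill the template for a class defined in ``module_path``."""
    return template.format(
        estimator_module=public_module(module_path),
        estimator_name=class_name,
    )


# A class living in a private submodule is linked under its public package.
link = build_doc_link("skrub._table_vectorizer", "TableVectorizer")
```

Trimming the private part of the module path keeps the link pointing at the public name a user would import, rather than at the file where the class happens to be defined.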

rcap107 commented Apr 21, 2026

Member Author

Coverage is failing because I'm not testing every line. I'll wait for someone to review the current version of the code because simplifying it may make it easier to test as well.

@glemaitre

Member

Compared to my PR in #1051, here you are not backporting the feature, but I think that is fine. Basically, you benefit from the feature if you have a recent enough scikit-learn version; otherwise you will not see the help link, which I think is fine.

One thing that I find strange is having to redefine the property everywhere. I think it would be best to define it in a mixin or base class that each skrub component inherits from.

rcap107 commented Apr 27, 2026

Member Author

> Compared to my PR in #1051, here you are not backporting the feature, but I think that is fine. Basically, you benefit from the feature if you have a recent enough scikit-learn version; otherwise you will not see the help link, which I think is fine.

I forgot about #1051 🙈

> One thing that I find strange is having to redefine the property everywhere. I think it would be best to define it in a mixin or base class that each skrub component inherits from.

Yes, that's what I was not convinced by. I'll remove the duplication using a base class.
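
A minimal sketch of that idea, assuming a hypothetical base class `DocLinkBase` (the class name, the `doc_link` helper, and the URL pattern are made up for illustration, and the sketch does not depend on scikit-learn): the link settings are declared once and every subclass inherits them.

```python
class DocLinkBase:
    """Hypothetical shared base class holding the documentation-link settings."""

    # Declared once here instead of being repeated in every estimator.
    _doc_link_module = "skrub"
    # Assumed URL pattern, for illustration only.
    _doc_link_template = (
        "https://skrub-data.org/stable/reference/generated/"
        "{estimator_module}.{estimator_name}.html"
    )

    def doc_link(self):
        """Return the documentation URL for the concrete class."""
        return self._doc_link_template.format(
            estimator_module=self._doc_link_module,
            estimator_name=type(self).__name__,
        )


# Stand-ins for real estimators: they define nothing link-related themselves.
class TableVectorizer(DocLinkBase):
    pass


class SquashingScaler(DocLinkBase):
    pass


links = [est.doc_link() for est in (TableVectorizer(), SquashingScaler())]
```

Because the attributes live on the base class, a new estimator would get a link by inheritance alone, which is the duplication this change is meant to remove.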

rcap107 linked an issue Apr 27, 2026 that may be closed by this pull request
rcap107 added this to the Release 0.10 milestone May 4, 2026
rcap107 requested a review from jeromedockes June 8, 2026 15:01

jeromedockes left a comment

Member

I like the approach of replacing BaseEstimator with a skrub base class in all MROs. We should make that explicit in the class docstring and use it everywhere we inherit from BaseEstimator and it makes sense (I think there are a few more, like DropSimilar, and the joiners if we care to update those).

Comment thread skrub/_base.py
from sklearn.base import BaseEstimator


class BaseTransformer(BaseEstimator):

Member

Maybe we can call it SkrubBaseEstimator instead, because the point is not transformer vs. estimator but that it should point to the skrub documentation.
Also, it applies to SkrubLearners, which are not (always) transformers.

It could also be SkrubEstimator, but that is too similar to SkrubLearner.

Member

Also, could you add a docstring to this class saying that it is a base class for things shared by all estimators defined in skrub, which at the moment is only the documentation URL?

"""
return describe_params(eval_choices(self.data_op), choice_graph(self.data_op))

_doc_link_module = "skrub"

Member

Here also we could remove this and replace BaseEstimator with SkrubBaseEstimator as the base class, right? The same goes for the base class of _BaseParamSearch.

Comment thread skrub/_apply_to_cols.py
return self._wrapped_transformer.get_feature_names_out(input_features)

def _sk_visual_block_(self):
# This is needed because when ApplyToCols is used with a transformer like

Member

I'm not sure I understood this comment, but the scikit-learn diagram machinery is quite complicated, so maybe it's not easy to explain in a short comment.

Development

Successfully merging this pull request may close these issues.

Introduce a SkrubEstimator to have docs in HTML display

3 participants