ENH - adding doc link to html repr of estimators#2036
Conversation
|
Coverage is failing because I'm not testing every line. I'll wait for someone to review the current version of the code because simplifying it may make it easier to test as well. |
|
Compare to my PR in #1051, here you are not backporting the feature. But I think it is fine. Basically, you would benefit from the feature if you have a recent enough scikit-learn version, otherwise you will not see the help which I think it is fine. One thing that I find strange is to have to redefine the property everywhere. I think that it would be best to define it in a mixin or base class that would be inherited from each skrub components. |
I forgot about #1051 🙈
Yes, that's what I was not convinced by. I'll remove the duplication using a base class. |
…rub into enh-add-doc-link-to-estimator
jeromedockes
left a comment
There was a problem hiding this comment.
i like the approach of replacing BaseEstimator with a skrub base class in all MROs. we should make that explicit in the class docstring and use it everywhere we inherit from baseestimator and it makes sense ( i think there are a few more like DropSimilar, and the joiners if we care to update those)
| return self._wrapped_transformer.get_feature_names_out(input_features) | ||
|
|
||
| def _sk_visual_block_(self): | ||
| # This is needed because when ApplyToCols is used with a transformer like |
There was a problem hiding this comment.
i'm not sure i understood this comment, but also the scikit-learn diagram machinery is quite complicated so maybe it's not easy to explain in a short comment
There was a problem hiding this comment.
When I added this block I had the problem that for some reason the (?) signs did not appear properly for the TableVectorizer, but now I can't replicate the problem anymore so I think it can be removed
There was a problem hiding this comment.
I also tested sklearn 1.5, but it seems to work on the old version so not sure what happened
There was a problem hiding this comment.
Nevermind, I found the problem. Without that block, the docstring for the TableVectorizer itself is not added.
| return self._wrapped_transformer.get_feature_names_out(input_features) | ||
|
|
||
| def _sk_visual_block_(self): | ||
| # This is needed because cases like ApplyToCols(TableVectorizer()) |
There was a problem hiding this comment.
@glemaitre could you give a bit of context on why this is needed here?
|
|
||
|
|
||
| class ApplyToEachCol(BaseEstimator, TransformerMixin): | ||
| class ApplyToEachCol(SkrubBaseTransformer, TransformerMixin): |
There was a problem hiding this comment.
while we're at it let's put the mixin before the base class
Co-authored-by: Jérôme Dockès <jerome@dockes.org>
…rub into enh-add-doc-link-to-estimator
| data_op, data = get_data_op_and_data("simple") | ||
| split = data_op.skb.train_test_split(data) | ||
| learner = data_op.skb.make_learner().fit(split["train"]) |
There was a problem hiding this comment.
do we need it to be fitted? otherwise maybe we can slightly simplify & speedup
| data_op, data = get_data_op_and_data("simple") | |
| split = data_op.skb.train_test_split(data) | |
| learner = data_op.skb.make_learner().fit(split["train"]) | |
| learner = skrub.var('a').skb.make_learner() |
There was a problem hiding this comment.
whoops didn't see the comment, I'll add fix this when I'm cleaning up for the release


This PR improves the html repr of the skrub estimators so that now they show the (?) symbol that leads to the documentation page.
At the moment, this is what a user sees in a jupyter notebook:

Note that only the OneHotEncoder has a (?)
With this PR, it looks like this:

I updated:
I did not touch any of the joiners.
For the change itself I had to override the _doc_link_template. I also had to modify the VisualBlock in ApplyToCols because otherwise cases like
ApplyToCols(TableVectorizer())would not show the link for theTableVectorizer; I'm not entirely sure this is the best way of doing it, and maybe we could abstract more the way I added_doc_link_template.