ENH - adding doc link to html repr of estimators#2036
Conversation
|
Coverage is failing because I'm not testing every line. I'll wait for someone to review the current version of the code because simplifying it may make it easier to test as well. |
|
Compare to my PR in #1051, here you are not backporting the feature. But I think it is fine. Basically, you would benefit from the feature if you have a recent enough scikit-learn version, otherwise you will not see the help which I think it is fine. One thing that I find strange is to have to redefine the property everywhere. I think that it would be best to define it in a mixin or base class that would be inherited from each skrub components. |
I forgot about #1051 🙈
Yes, that's what I was not convinced by. I'll remove the duplication using a base class. |
…rub into enh-add-doc-link-to-estimator
jeromedockes
left a comment
There was a problem hiding this comment.
i like the approach of replacing BaseEstimator with a skrub base class in all MROs. we should make that explicit in the class docstring and use it everywhere we inherit from baseestimator and it makes sense ( i think there are a few more like DropSimilar, and the joiners if we care to update those)
| from sklearn.base import BaseEstimator | ||
|
|
||
|
|
||
| class BaseTransformer(BaseEstimator): |
There was a problem hiding this comment.
Maybe we can call it SkrubBaseEstimator instead because the point is not transformer vs estimator, but that it should point to the skrub documentation.
also it applies to skrublearners which are not (always) transformers
it could also be SkrubEstimator but that is too similar to SkrubLearner
There was a problem hiding this comment.
also could you add a docstring to this class to say it is a base class for stuff shared by all estimators defined in skrub, which at the moment is only the documentation url
| """ | ||
| return describe_params(eval_choices(self.data_op), choice_graph(self.data_op)) | ||
|
|
||
| _doc_link_module = "skrub" |
There was a problem hiding this comment.
here also we could remove this and replace BaseEstimator by SkrubBaseEstimator as the base class right? also as the base class of _BaseParamSearch
| return self._wrapped_transformer.get_feature_names_out(input_features) | ||
|
|
||
| def _sk_visual_block_(self): | ||
| # This is needed because when ApplyToCols is used with a transformer like |
There was a problem hiding this comment.
i'm not sure i understood this comment, but also the scikit-learn diagram machinery is quite complicated so maybe it's not easy to explain in a short comment
This PR improves the html repr of the skrub estimators so that now they show the (?) symbol that leads to the documentation page.
At the moment, this is what a user sees in a jupyter notebook:

Note that only the OneHotEncoder has a (?)
With this PR, it looks like this:

I updated:
I did not touch any of the joiners.
For the change itself I had to override the _doc_link_template. I also had to modify the VisualBlock in ApplyToCols because otherwise cases like
ApplyToCols(TableVectorizer())would not show the link for theTableVectorizer; I'm not entirely sure this is the best way of doing it, and maybe we could abstract more the way I added_doc_link_template.