include bin counts and edges in tablereport json output #2164
jeromedockes wants to merge 5 commits into
Tentatively adding to the 0.10 milestone because it is fairly simple and would be useful for skore, but we can remove it if it ends up being a blocker.
rcap107 left a comment
Looks good to me, though I'd like a couple more comments to make some code clearer.
Could you also extend the docs of the json() function to mention that it includes both the svg and the bins for the histograms? I think it's useful information because users may not expect that, or may need the data and not know where to get it from.
col = sbd.to_float32(col)
values = sbd.to_numpy(col)
if sbd.is_any_date(col):
    # numpy histogram does not handle datetimes but matplotlib does
Could you add a comment here explaining that this is converting dates to seconds since epoch? It's not clear from the code.
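To illustrate the conversion being discussed, here is a minimal sketch in plain NumPy. It is not the code from this PR: the helper name `datetimes_to_epoch_seconds` and the sample dates are assumptions made up for the example. It turns a `datetime64` array into float seconds since the Unix epoch so the values can be binned like any other numbers.

```python
import numpy as np


def datetimes_to_epoch_seconds(values):
    """Convert a datetime64 array to float seconds since 1970-01-01.

    Hypothetical helper for illustration only, not part of skrub.
    """
    values = np.asarray(values, dtype="datetime64[ns]")
    epoch = np.datetime64(0, "ns")  # 1970-01-01T00:00:00
    # Dividing a timedelta64 array by a one-second timedelta yields floats.
    return (values - epoch) / np.timedelta64(1, "s")


# Made-up sample data.
dates = np.array(
    ["1970-01-01T00:00:10", "1970-01-02", "1970-01-03"],
    dtype="datetime64[s]",
)
seconds = datetimes_to_epoch_seconds(dates)

# The float values are binned like any other numeric column.
counts, edges = np.histogram(seconds, bins=2)
```

The bin edges come back in the same unit (seconds since epoch), so a consumer that wants dates would need to convert them back.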
summary["value_is_constant"] = False
summary["quantiles"] = quantiles
if not with_plots:
    summary["histogram_data"] = _plotting.histogram_data(column)
Could you add a small comment here to explain what this is used for?
Now the json output of TableReport also contains the result of np.histogram on the part of the column used for the histogram plots (non-null values, with outliers removed).
This is the only additional piece of information needed to reconstruct everything displayed by the table report from the json output alone.
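As a rough sketch of what this kind of data looks like, the example below drops nulls and outliers, runs `np.histogram`, and serializes the counts and edges to JSON. It is not the PR's implementation. The function name `sketch_histogram_data`, the keys `bin_counts` and `bin_edges`, and the IQR-based outlier rule are all assumptions chosen for illustration; the actual rule and key names used by skrub may differ.

```python
import json

import numpy as np


def sketch_histogram_data(values, bins=10, iqr_factor=3.0):
    """Return JSON-serializable bin counts and edges.

    Hypothetical sketch: the outlier rule and the key names are assumptions,
    not skrub's actual behavior.
    """
    values = np.asarray(values, dtype="float64")
    values = values[~np.isnan(values)]  # keep non-null values only
    # Assumed outlier rule: drop points far outside the interquartile range.
    q1, q3 = np.quantile(values, [0.25, 0.75])
    margin = iqr_factor * (q3 - q1)
    kept = values[(values >= q1 - margin) & (values <= q3 + margin)]
    counts, edges = np.histogram(kept, bins=bins)
    # Convert to plain lists so json.dumps can handle them.
    return {"bin_counts": counts.tolist(), "bin_edges": edges.tolist()}


# Made-up column with one null and one extreme value.
column = [1, 2, 2, 3, 3, 3, 4, float("nan"), 1000]
data = sketch_histogram_data(column, bins=3)

# Round-trip through JSON, as a consumer of the report would.
restored = json.loads(json.dumps(data))
```

With counts and edges available, a consumer can redraw the bars (for example with a bar chart whose widths are the differences between consecutive edges) without access to the original column.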