@@ -7,7 +7,7 @@
How to tweak the Appearance of the |TableReport|
------------------------------------------------

The skrub global configuration includes various parameters that let you tweak
the HTML representation of the |TableReport|.

For performance reasons, the |TableReport| disables the computation of
@@ -29,7 +29,7 @@ It is also possible to export the raw HTML, or a HTML fragment to embed in a pag
with :func:`~skrub.TableReport.html` and :func:`~skrub.TableReport.html_snippet`
respectively.

The report can also be exported in JSON format with
:func:`~skrub.TableReport.json`, which gives structured access to the data and
statistics used to build the report.

@@ -46,3 +46,16 @@ disabled directly when generating the table report.

tr = TableReport(df, plot_distributions=False)
json_data = tr.json()

Finally, :func:`~skrub.TableReport.markdown` produces a shortened summary of the
report in Markdown format. This summary contains the measured statistics and, if
they were computed, the associations; plots and the table preview are omitted.
This format is easy to share as text, or to feed to an AI agent to obtain
insights about a given table.

.. warning::

   No sanitization of the input data is performed, and the report includes raw data
   (column names and cell values). Therefore, it should not be used on untrusted data,
   which could lead to security risks, or when the resulting summary may be very
   large, which could lead to performance problems.
@@ -1,9 +1,10 @@
.. |TableReport| replace:: :class:`~skrub.TableReport`
.. |DropSimilar| replace:: :class:`~skrub.DropSimilar`
.. |column_associations| replace:: :func:`~skrub.column_associations`

.. _user_guide_table_report_associations:

How to find correlated columns in a dataframe
=============================================

In addition to |TableReport|'s **Associations** tab, you can compute associations
@@ -13,11 +14,12 @@ associations.
Reported metrics include `Cramer’s V statistic <https://en.wikipedia.org/wiki/Cram%C3%A9r%27s_V>`_
and `Pearson’s Correlation Coefficient <https://en.wikipedia.org/wiki/Pearson_correlation_coefficient>`_.
The result is returned as a dataframe that contains the column name and idx for the
left and the right table, and both associations; results are sorted in descending order
by Cramer’s V association.
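
As a rough illustration of what Cramer’s V measures, the sketch below computes
the plain, uncorrected textbook statistic for two categorical sequences using
only the standard library. The helper name ``cramers_v`` is hypothetical and
written for this guide; it is not skrub's implementation, which may bin numeric
columns or apply corrections.

.. code-block:: python

    import math
    from collections import Counter


    def cramers_v(xs, ys):
        """Uncorrected Cramer's V between two equal-length categorical sequences."""
        n = len(xs)
        joint = Counter(zip(xs, ys))  # observed counts for each (x, y) pair
        row = Counter(xs)             # marginal counts of the first column
        col = Counter(ys)             # marginal counts of the second column

        # Chi-squared statistic of the contingency table.
        chi2 = 0.0
        for x, row_count in row.items():
            for y, col_count in col.items():
                expected = row_count * col_count / n
                observed = joint.get((x, y), 0)
                chi2 += (observed - expected) ** 2 / expected

        # Normalize by the sample size and the smaller table dimension minus one.
        k = min(len(row), len(col)) - 1
        if k == 0:
            return 0.0  # a constant column carries no association
        return math.sqrt(chi2 / (n * k))


    # Two columns that determine each other, and two that are unrelated.
    perfect = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])
    unrelated = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
    print(perfect, unrelated)

Values close to 1 indicate that one column is nearly determined by the other,
while values close to 0 indicate little association.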

This gives access to the information used in the |TableReport|
for later use (e.g., to select which columns to drop). These associations are
also used by the |DropSimilar| transformer to select which columns should be dropped.

.. code-block::

32 changes: 32 additions & 0 deletions doc/guides/table_report/04_custom_filters.rst
@@ -0,0 +1,32 @@
.. |TableReport| replace:: :class:`~skrub.TableReport`


How to define custom filters for the TableReport
================================================

It is possible to define custom filters for the |TableReport| using column
names, column indices, or :ref:`skrub selectors <user_guide_selectors>`.

Defining a custom filter makes it easier to display, and work directly with, a
given subset of columns.

For example, we might want to select only the columns whose name follows a certain
pattern (here, starting with "metric"):

>>> import pandas as pd
>>> from skrub import TableReport
>>> from skrub import selectors as s
>>> df = pd.DataFrame(
... {"id": [1, 2, 3], "metric1": [1, 2, 3], "metric2": [4, 5, 6], "metric3": [7, 8, 9]}
... )

Custom filters should be defined as a dictionary where each key is the name of the
filter as displayed in the generated report, and each value is either
a list of column names, a list of column indices (the first column has index 0), or
a skrub selector, as shown in this example:

>>> filters = {"only_metrics": s.glob("metric*")}
>>> report = TableReport(df, column_filters=filters)

Custom filters are placed at the top of the list of filters, in the "Filter columns"
drop-down menu.
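
To get a feel for which columns a glob pattern such as ``"metric*"`` would
match, the standard-library ``fnmatch`` module can serve as a stand-in. This is
only an illustration under that assumption: it does not show how skrub
selectors are implemented, and the variable names are hypothetical.

.. code-block:: python

    from fnmatch import fnmatchcase

    # Column names of the example dataframe used in this guide.
    columns = ["id", "metric1", "metric2", "metric3"]

    # Keep the columns whose name matches the shell-style pattern
    # (case-sensitive matching).
    selected = [name for name in columns if fnmatchcase(name, "metric*")]
    print(selected)

Checking a pattern this way before building the report can help confirm that a
filter covers the intended subset of columns.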
7 changes: 4 additions & 3 deletions doc/howto.rst
@@ -17,9 +17,10 @@ the :ref:`API Reference <api_ref>`.
.. toctree::
   :maxdepth: 2

   guides/table_report/01_alter_appearance.rst
   guides/table_report/02_exporting.rst
   guides/table_report/03_finding_correlated_columns.rst
   guides/table_report/04_custom_filters.rst
   guides/utilities/customizing_configuration.rst
   guides/utilities/deduplicate_categorical_data.rst
   guides/utilities/fetching_datasets.rst