Skrub Add Estimated Memory Usage to TableReport #2153
Open
salam-alkaissi wants to merge 14 commits into
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
Adds estimated dataframe memory usage to TableReport outputs (HTML + Markdown) and verifies the new content via tests.
Changes:
- Compute and expose `memory_usage_kb` (and an “estimate unreliable” flag) in dataframe summaries.
- Render memory usage in Markdown and HTML report templates.
- Extend tests and update changelog entries (plus a new `notes` file).
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| skrub/_reporting/_summarize.py | Adds memory-usage estimation and an “unreliable estimate” flag into the summary payload. |
| skrub/_reporting/_data/templates/report.md | Renders memory usage (and an optional warning) into the Markdown report. |
| skrub/_reporting/_data/templates/dataframe-sample.html | Renders memory usage (and an optional warning) into the HTML header area. |
| skrub/_reporting/tests/test_table_report.py | Adds an assertion that HTML output includes memory usage text. |
| skrub/_reporting/tests/test_markdown_template.py | Adds an assertion that Markdown output includes memory usage text. |
| CHANGES.rst | Adds a changelog bullet about memory usage display/export (formatting currently broken). |
| notes | Adds local environment/setup notes (likely not intended as a tracked repo file). |
Comment on lines +21 to +29
```python
from skrub import _dataframe as sbd


def _memory_usage_kb(df):
    if sbd.dataframe_module_name(df) == "pandas":
        # Shallow estimate: object columns are counted as pointers only.
        memory_usage_bytes = df.memory_usage(deep=False).sum()
    else:
        # Polars DataFrame.estimated_size() returns bytes; objects that
        # lack this method get no estimate.
        estimated_size = getattr(df, "estimated_size", None)
        if estimated_size is None:
            return None
        memory_usage_bytes = estimated_size()
    return memory_usage_bytes / 1024
```
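For pandas, the function relies on `DataFrame.memory_usage(deep=False)`, which counts the column buffers only; for an object-dtype column that means one pointer per row, not the Python objects behind it. The sketch below uses made-up data (and assumes pandas is installed) to compare the shallow and deep estimates. The string column is forced to `object` dtype because newer pandas versions may store strings with a different default dtype.

```python
import pandas as pd

# Illustrative data: one int64 column and one object-dtype string column.
df = pd.DataFrame(
    {
        "id": range(1_000),
        "name": pd.Series([f"customer-{i}" for i in range(1_000)], dtype=object),
    }
)

# deep=False: buffers only; the object column contributes one pointer per row
# (8 bytes each on 64-bit builds), regardless of the string lengths.
shallow_bytes = df.memory_usage(deep=False).sum()

# deep=True: pandas also measures the Python string objects themselves.
deep_bytes = df.memory_usage(deep=True).sum()

print(f"shallow: {shallow_bytes / 1024:.1f} KB")
print(f"deep:    {deep_bytes / 1024:.1f} KB")
```

The gap between the two numbers grows with the size of the objects stored in object columns, which is why a shallow estimate can understate memory for such data.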
Comment on lines +116 to +120
```python
# detect complex objects that make memory estimates unreliable
try:
    summary["memory_estimate_unreliable"] = _has_complex_objects(df)
except Exception:
    summary["memory_estimate_unreliable"] = False
```
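The body of `_has_complex_objects` is not shown in this review. As a purely hypothetical illustration of what such a check could look like for pandas, the sketch below flags any object-dtype column that holds something other than strings (lists, dicts, custom objects). The helper name `has_complex_objects` and the heuristic itself are assumptions, not the PR's code.

```python
import pandas as pd


def has_complex_objects(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True if an object-dtype column holds non-str values."""
    for col in df.columns:
        column = df[col]
        if column.dtype != object:
            # Numeric, bool, datetime, ... columns have fixed-size storage.
            continue
        # Ignore missing values; anything that is not a plain str makes a
        # shallow memory estimate especially misleading.
        if any(not isinstance(value, str) for value in column.dropna()):
            return True
    return False


df_plain = pd.DataFrame({"a": [1, 2], "b": pd.Series(["x", None], dtype=object)})
df_nested = pd.DataFrame(
    {"a": [1, 2], "tags": pd.Series([["x"], ["y", "z"]], dtype=object)}
)
print(has_complex_objects(df_plain))   # False
print(has_complex_objects(df_nested))  # True
```

Note that with the `try`/`except` above, if the check itself raises, the flag falls back to `False`, so the warning is suppressed rather than shown.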
Comment on lines +5 to +10
```jinja
{% if summary.get("memory_usage_kb") is not none %}
**memory usage** {{ "%.1f" | format(summary.get("memory_usage_kb")) }} KB.
{% if summary.get("memory_estimate_unreliable") %}
_Note: memory estimate may be inaccurate for complex object columns._
{% endif %}
{% endif %}
```
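To see what this Markdown snippet produces, it can be rendered on its own with Jinja2. The sketch below assumes the `jinja2` package is installed and feeds the template hypothetical summary dicts standing in for the summarizer's output; it covers the three cases the template distinguishes (estimate present, estimate flagged as unreliable, estimate missing).

```python
from jinja2 import Template

# The Markdown snippet under review, copied verbatim.
SNIPPET = """\
{% if summary.get("memory_usage_kb") is not none %}
**memory usage** {{ "%.1f" | format(summary.get("memory_usage_kb")) }} KB.
{% if summary.get("memory_estimate_unreliable") %}
_Note: memory estimate may be inaccurate for complex object columns._
{% endif %}
{% endif %}
"""

template = Template(SNIPPET)

# Hypothetical summaries: only the keys the template reads are provided.
reliable = template.render(summary={"memory_usage_kb": 1536.0})
unreliable = template.render(
    summary={"memory_usage_kb": 2.5, "memory_estimate_unreliable": True}
)
missing = template.render(summary={"memory_usage_kb": None})

print(reliable)
print(unreliable)
print(repr(missing.strip()))  # ''
```

Each `{% if %}` / `{% endif %}` tag leaves its own newline in the output; Jinja's `trim_blocks` option or `{%- ... -%}` markers can remove these if the extra blank lines matter in the exported Markdown.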
Updated JSON output to include memory usage information.
emassoulie reviewed Jun 10, 2026
Comment on lines +21 to +29
Contributor
Suggested change:

```python
from skrub._dispatch import dispatch, raise_dispatch_unregistered_type


@dispatch
def _memory_usage_kb(obj):
    raise_dispatch_unregistered_type(obj)


@_memory_usage_kb.specialize("pandas")
def _memory_usage_kb_pandas(df):
    memory_usage_bytes = df.memory_usage(deep=False).sum()
    return memory_usage_bytes / 1024


@_memory_usage_kb.specialize("polars")
def _memory_usage_kb_polars(df):
    estimated_size = getattr(df, "estimated_size", None)
    if estimated_size is None:
        return None
    memory_usage_bytes = estimated_size()
    return memory_usage_bytes / 1024
```
There is a dedicated mechanism, `@dispatch`, for specializing a function for pandas and Polars! I've reformatted your exact code to use it.
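The idea behind the suggestion is to register one implementation per dataframe library and select it from the type of the argument. To illustrate that pattern without relying on skrub internals, here is a hypothetical, simplified stand-in that chooses an implementation from the top-level module of the argument's class; skrub's actual `dispatch` differs in its details, and the names `module_name`, `dispatch`, and `memory_usage_kb` below are illustrative only.

```python
import functools


def module_name(obj) -> str:
    """Return the top-level package of obj's class, e.g. 'pandas' or 'polars'."""
    return type(obj).__module__.split(".")[0]


def dispatch(func):
    """Hypothetical, simplified module-name dispatcher (not skrub's implementation)."""
    registry = {}

    @functools.wraps(func)
    def wrapper(obj, *args, **kwargs):
        # Fall back to the generic function when no specialization matches.
        impl = registry.get(module_name(obj), func)
        return impl(obj, *args, **kwargs)

    def specialize(name):
        def register(impl):
            registry[name] = impl
            return impl

        return register

    wrapper.specialize = specialize
    return wrapper


@dispatch
def memory_usage_kb(df):
    raise TypeError(f"Unsupported dataframe type: {type(df)!r}")


@memory_usage_kb.specialize("pandas")
def _memory_usage_kb_pandas(df):
    return df.memory_usage(deep=False).sum() / 1024


@memory_usage_kb.specialize("polars")
def _memory_usage_kb_polars(df):
    return df.estimated_size() / 1024


# Stand-in classes whose __module__ mimics the real libraries, so the sketch
# runs without pandas or Polars installed.
class _FakeUsage:
    def __init__(self, values):
        self.values = values

    def sum(self):
        return sum(self.values)


class FakePandasFrame:
    __module__ = "pandas.core.frame"

    def memory_usage(self, deep=False):
        return _FakeUsage([1024, 1024])


class FakePolarsFrame:
    __module__ = "polars.dataframe.frame"

    def estimated_size(self):
        return 4096


print(memory_usage_kb(FakePandasFrame()))  # 2.0
print(memory_usage_kb(FakePolarsFrame()))  # 4.0
```

Compared with the original `if`/`else`, registering implementations per library keeps each backend's logic in its own function, and an unsupported input reaches the generic function and fails loudly instead of silently falling into the non-pandas branch.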
Author
…kaissi/skrub-women-in-tech into skrub-Salam-womenInTech
emassoulie reviewed Jun 11, 2026
```rst
:pr:`2096` by :user:`Ayesha Siddiqua <siddiqua-tamk>`.
- The :class:`TableReport` can now be exported in markdown format with ``.markdown``.
:pr:`2048` by :user:`Riccardo Cappuzzo <rcap107>`.
- The :class:`TableReport` can now be exported the estimated memory usage in TableReport when display data.
```
Contributor
Suggested change:

```rst
- The :class:`TableReport` can now display the estimated memory usage of
  the data it is applied to.
```

Bug Fix Pull Request
Description
Addresses #
Checklist
How Has This Been Tested?
AI Disclosure
Results:
