
Skrub Add Estimated Memory Usage to TableReport #2153

Open
salam-alkaissi wants to merge 14 commits into skrub-data:main from salam-alkaissi:skrub-Salam-womenInTech

@salam-alkaissi


Bug Fix Pull Request

Description

Addresses #

Checklist

  • I have read the contributing guidelines
  • I have added tests that verify the bug fix
  • I have added an entry to CHANGES.rst describing the fix
  • My code follows the code style of this project
  • I have checked my code and corrected any misspellings

How Has This Been Tested?

AI Disclosure

  • This PR contains AI-generated code
    • I have tested the code generated in my PR
    • I have read and understood every line that has been generated by the AI agent
    • I can explain what the AI-generated code does

Results: [two screenshots]

Copilot AI review requested due to automatic review settings June 10, 2026 12:02

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds estimated dataframe memory usage to TableReport outputs (HTML + Markdown) and verifies the new content via tests.

Changes:

  • Compute and expose memory_usage_kb (and an “estimate unreliable” flag) in dataframe summaries.
  • Render memory usage in Markdown and HTML report templates.
  • Extend tests and update changelog entries (plus a new notes file).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| skrub/_reporting/_summarize.py | Adds memory-usage estimation and an “unreliable estimate” flag into the summary payload. |
| skrub/_reporting/_data/templates/report.md | Renders memory usage (and an optional warning) into the Markdown report. |
| skrub/_reporting/_data/templates/dataframe-sample.html | Renders memory usage (and an optional warning) into the HTML header area. |
| skrub/_reporting/tests/test_table_report.py | Adds an assertion that HTML output includes memory usage text. |
| skrub/_reporting/tests/test_markdown_template.py | Adds an assertion that Markdown output includes memory usage text. |
| CHANGES.rst | Adds a changelog bullet about memory usage display/export (formatting currently broken). |
| notes | Adds local environment/setup notes (likely not intended as a tracked repo file). |


Comment thread skrub/_reporting/_summarize.py Outdated
Comment on lines +21 to +29
def _memory_usage_kb(df):
    if sbd.dataframe_module_name(df) == "pandas":
        memory_usage_bytes = df.memory_usage(deep=False).sum()
    else:
        estimated_size = getattr(df, "estimated_size", None)
        if estimated_size is None:
            return None
        memory_usage_bytes = estimated_size()
    return memory_usage_bytes / 1024
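The pandas branch uses `memory_usage(deep=False)`. For an `object` column, a shallow estimate counts only the array of pointers to the Python objects, not the objects themselves, which is why the summary also carries an "unreliable" flag. A minimal stdlib-only sketch of that gap, with hypothetical helper names (`shallow_bytes`, `deep_bytes`) that are not part of pandas or skrub:

```python
import struct
import sys

# Size of one pointer on this platform (8 bytes on 64-bit builds).
POINTER_SIZE = struct.calcsize("P")


def shallow_bytes(values):
    """Hypothetical analogue of a shallow estimate for an object column:
    only the pointer array is counted, not the objects it points to."""
    return POINTER_SIZE * len(values)


def deep_bytes(values):
    """Hypothetical analogue of a deep estimate: the pointers plus the size
    of each referenced Python object, as reported by sys.getsizeof."""
    return shallow_bytes(values) + sum(sys.getsizeof(v) for v in values)


def to_kb(n_bytes):
    # Same conversion as the PR: bytes / 1024
    return n_bytes / 1024


# 1000 distinct string objects, like the cells of an object column.
column = [f"value number {i} " * 10 for i in range(1000)]
print(f"shallow: {to_kb(shallow_bytes(column)):.1f} KB")
print(f"deep:    {to_kb(deep_bytes(column)):.1f} KB")
```

The shallow figure depends only on the row count, while the deep one grows with the length of each string; real pandas estimates also include the index by default.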
Comment thread skrub/_reporting/_summarize.py
Comment thread skrub/_reporting/_summarize.py Outdated
Comment on lines +116 to +120
# detect complex objects that make memory estimates unreliable
try:
    summary["memory_estimate_unreliable"] = _has_complex_objects(df)
except Exception:
    summary["memory_estimate_unreliable"] = False
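`_has_complex_objects` itself is not shown in this thread. A hypothetical sketch of one way such a check could work, assuming a dataframe is modelled as a dict mapping column names to lists of cell values, and that containers (lists, tuples, sets, dicts) count as "complex":

```python
# Container types treated as "complex" in this sketch (an assumption).
COMPLEX_TYPES = (list, tuple, set, frozenset, dict)


def has_complex_objects(columns):
    """Return True if any cell holds a container.

    `columns` maps column names to lists of cell values: a hypothetical
    stand-in for a dataframe, not skrub's `_has_complex_objects`.
    """
    return any(
        isinstance(value, COMPLEX_TYPES)
        for values in columns.values()
        for value in values
    )


def memory_estimate_unreliable(columns):
    # Same fallback as the PR: if the check itself fails, report False.
    try:
        return has_complex_objects(columns)
    except Exception:
        return False


print(memory_estimate_unreliable({"id": [1, 2], "tags": [["a"], ["b", "c"]]}))
```

Falling back to `False` means a failed check shows no warning; returning `True` instead would err on the side of warning the reader.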
Comment on lines +5 to +10
{% if summary.get("memory_usage_kb") is not none %}
**memory usage** {{ "%.1f" | format(summary.get("memory_usage_kb")) }} KB.
{% if summary.get("memory_estimate_unreliable") %}
_Note: memory estimate may be inaccurate for complex object columns._
{% endif %}
{% endif %}
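The Jinja fragment emits the memory line only when `memory_usage_kb` is present, and shows the note only when the flag is also set. Jinja's `format` filter applies Python %-formatting, so the same logic can be sketched in plain Python (the function name `render_memory_usage` is hypothetical):

```python
def render_memory_usage(summary):
    """Plain-Python equivalent of the Markdown template fragment.

    Returns the lines the template would emit for `summary` (a dict).
    """
    lines = []
    memory_kb = summary.get("memory_usage_kb")
    # `is not None` so that a genuine 0.0 KB is still rendered
    if memory_kb is not None:
        # Jinja's `format` filter applies Python %-formatting
        lines.append("**memory usage** %.1f KB." % memory_kb)
        if summary.get("memory_estimate_unreliable"):
            lines.append(
                "_Note: memory estimate may be inaccurate for complex object columns._"
            )
    return lines


print(render_memory_usage({"memory_usage_kb": 1536.0, "memory_estimate_unreliable": True}))
```

Because the note is nested inside the first condition, a summary with the flag set but no size renders nothing.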
Comment thread skrub/_reporting/_data/templates/dataframe-sample.html
Comment thread notes Outdated
Comment thread CHANGES.rst Outdated
MarieSacksick added the "CFM sprint June 2026" label (For PRs opened during the CFM sprint in June 2026) Jun 10, 2026
salam-alkaissi changed the title from "Skrub salam women in tech" to "Skrub Add Estimated Memory Usage to TableReport" Jun 10, 2026
salam-alkaissi and others added 2 commits June 10, 2026 14:17
Updated JSON output to include memory usage information.
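The JSON structure itself is not shown in this thread. A minimal sketch of exporting the memory fields with the standard `json` module, assuming the two summary keys used elsewhere in this PR (`memory_usage_kb`, `memory_estimate_unreliable`); `memory_info_json` is a hypothetical name:

```python
import json


def memory_info_json(summary):
    """Serialize only the memory-related fields of a summary dict.

    A missing size is kept as None, which json.dumps writes as null.
    """
    payload = {
        "memory_usage_kb": summary.get("memory_usage_kb"),
        "memory_estimate_unreliable": bool(
            summary.get("memory_estimate_unreliable", False)
        ),
    }
    return json.dumps(payload)


print(memory_info_json({"memory_usage_kb": 2.0}))
```

Keeping an unknown size as `null` rather than `0` lets a consumer tell "could not estimate" apart from an empty table.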
Comment thread skrub/_reporting/_summarize.py Outdated
Comment on lines +21 to +29

Contributor


Suggested change
def _memory_usage_kb(df):
    if sbd.dataframe_module_name(df) == "pandas":
        memory_usage_bytes = df.memory_usage(deep=False).sum()
    else:
        estimated_size = getattr(df, "estimated_size", None)
        if estimated_size is None:
            return None
        memory_usage_bytes = estimated_size()
    return memory_usage_bytes / 1024

@dispatch
def _memory_usage_kb(obj):
    raise_dispatch_unregistered_type(obj)


@_memory_usage_kb.specialize("pandas")
def _memory_usage_pandas(obj):
    memory_usage_bytes = obj.memory_usage(deep=False).sum()
    return memory_usage_bytes / 1024


@_memory_usage_kb.specialize("polars")
def _memory_usage_polars(obj):
    estimated_size = getattr(obj, "estimated_size", None)
    if estimated_size is None:
        return None
    memory_usage_bytes = estimated_size()
    return memory_usage_bytes / 1024

There is a dedicated mechanism, the `dispatch` decorator, for writing separate implementations for pandas and Polars! I've reformatted your exact code to use it.
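The suggestion relies on skrub's internal `dispatch` decorator, which selects an implementation according to the dataframe library of the argument. The sketch below is a hypothetical, stdlib-only approximation of that idea, not skrub's code: it keys a registry on the top-level module in which the argument's type is defined, and otherwise falls back to the undecorated function, which raises `TypeError`. For brevity, the Polars branch assumes `estimated_size()` exists.

```python
import functools


def dispatch(func):
    """Hypothetical minimal dispatcher keyed on module name (not skrub's).

    The decorated function is the fallback; `.specialize(name)` registers an
    implementation used when type(obj) is defined in module `name`.
    """
    registry = {}

    @functools.wraps(func)
    def wrapper(obj, *args, **kwargs):
        # e.g. "pandas.core.frame" -> "pandas"
        module = type(obj).__module__.split(".")[0]
        impl = registry.get(module, func)
        return impl(obj, *args, **kwargs)

    def specialize(module_name):
        def register(impl):
            registry[module_name] = impl
            return impl

        return register

    wrapper.specialize = specialize
    return wrapper


@dispatch
def memory_usage_kb(obj):
    # Fallback for unregistered types
    raise TypeError(f"Unsupported type: {type(obj)!r}")


@memory_usage_kb.specialize("pandas")
def _memory_usage_kb_pandas(obj):
    return obj.memory_usage(deep=False).sum() / 1024


@memory_usage_kb.specialize("polars")
def _memory_usage_kb_polars(obj):
    return obj.estimated_size() / 1024
```

Each library's code path lives in its own function, so neither branch needs to test the module name itself.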

Author


I see that the test passed successfully! Thank you for your support and the event.

[screenshot]

jeromedockes self-requested a review June 11, 2026 08:00
Comment thread CHANGES.rst Outdated
  :pr:`2096` by :user:`Ayesha Siddiqua <siddiqua-tamk>`.
- The :class:`TableReport` can now be exported in markdown format with ``.markdown``.
  :pr:`2048` by :user:`Riccardo Cappuzzo <rcap107>`.
- The :class:`TableReport` can now be exported the estimated memory usage in TableReport when display data.

Contributor


Suggested change
- The :class:`TableReport` can now be exported the estimated memory usage in TableReport when display data.
- The :class:`TableReport` can now display the estimated memory usage of
  the data it is applied to.


Labels

CFM sprint June 2026 (For PRs opened during the CFM sprint in June 2026)


4 participants