feat: add configurable safety limits to ZipConverter#1666

Open
VANDRANKI wants to merge 1 commit into microsoft:main from VANDRANKI:feat/zip-configurable-limits

Problem

The current ZipConverter places no limit on how many files it processes or how much data it extracts. A maliciously crafted archive (a zip bomb) can compress gigabytes of data into a small file, so converting it can exhaust memory (OOM) or cause denial of service. There is also no protection against zip slip path traversal attacks.
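To make the size mismatch concrete, here is a minimal sketch using only the standard library `zipfile` module. It is not code from this PR; the helper name `build_compressible_zip` and the 10 MB payload are assumptions chosen for illustration.

```python
import io
import zipfile


def build_compressible_zip(payload_size: int) -> bytes:
    """Return an in-memory zip holding one highly compressible file (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("zeros.bin", b"\0" * payload_size)
    return buffer.getvalue()


archive_bytes = build_compressible_zip(10 * 1024 * 1024)  # 10 MB of zero bytes

with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    info = archive.infolist()[0]
    # file_size is the uncompressed size recorded in the archive metadata;
    # compress_size is the number of bytes actually stored.
    print(f"archive on disk: {len(archive_bytes)} bytes")
    print(f"declared uncompressed size: {info.file_size} bytes")
    print(f"compression ratio: {info.file_size / info.compress_size:.0f}x")
```

Because `file_size` can be read from the metadata without decompressing anything, a converter can compare it against a budget before it extracts a single byte.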

Changes

converters/_zip_converter.py:

  • Add three constructor parameters with safe defaults:
    • max_file_count (default 100): stops processing when limit is reached, appends a notice
    • max_file_size (default 50 MB): skips individual files that exceed the limit with a per-file notice
    • max_total_size (default 200 MB): stops processing when cumulative uncompressed bytes would exceed the budget
  • Guard against zip slip: skip entries whose name starts with / or contains .. path components (cross-platform - handles both POSIX and Windows absolute paths)
  • Iterate over ZipInfo objects instead of namelist() to read file_size before extracting
  • Skip directory entries (names ending with /)
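The checks listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `is_unsafe_name` and `plan_extraction` are hypothetical helpers, the notice strings are made up, and the order of the checks is one plausible choice.

```python
import io
import zipfile
from pathlib import PureWindowsPath


def is_unsafe_name(name: str) -> bool:
    """Flag absolute paths and any '..' component (hypothetical helper)."""
    normalized = name.replace("\\", "/")
    if normalized.startswith("/") or PureWindowsPath(name).is_absolute():
        return True
    return ".." in normalized.split("/")


def plan_extraction(
    archive: zipfile.ZipFile,
    max_file_count: int = 100,
    max_file_size: int = 50 * 1024 * 1024,
    max_total_size: int = 200 * 1024 * 1024,
) -> tuple[list[str], list[str]]:
    """Decide which entries to process, using ZipInfo metadata only."""
    accepted: list[str] = []
    notices: list[str] = []
    total = 0
    for info in archive.infolist():
        if info.is_dir():  # directory entries have names ending with '/'
            continue
        if is_unsafe_name(info.filename):
            notices.append(f"skipped unsafe path: {info.filename}")
            continue
        if len(accepted) >= max_file_count:
            notices.append(f"stopped: file count limit ({max_file_count}) reached")
            break
        if info.file_size > max_file_size:
            notices.append(f"skipped oversized file: {info.filename}")
            continue
        if total + info.file_size > max_total_size:
            notices.append("stopped: total size limit reached")
            break
        total += info.file_size
        accepted.append(info.filename)
    return accepted, notices


# Build a small in-memory archive that exercises each branch.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("a.txt", b"x" * 10)
    demo.writestr("../evil.txt", b"x" * 10)  # zip slip attempt
    demo.writestr("big.bin", b"x" * 500)     # exceeds the per-file limit below
    demo.writestr("b.txt", b"x" * 10)
    demo.writestr("c.txt", b"x" * 10)        # exceeds the file count limit below

with zipfile.ZipFile(buffer) as demo:
    accepted, notices = plan_extraction(
        demo, max_file_count=2, max_file_size=100, max_total_size=1000
    )
```

Here `accepted` ends up as `["a.txt", "b.txt"]`, with one notice each for the traversal entry, the oversized file, and the file count limit. Note that `file_size` comes from the archive's own metadata, so a stricter implementation might also cap the bytes actually read during extraction.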

tests/test_zip_converter.py (new file): 10 tests covering all limit types, zip slip protection, and accepts().

Usage

from markitdown import MarkItDown
from markitdown.converters import ZipConverter

# Use defaults (100 files, 50 MB/file, 200 MB total)
md = MarkItDown()

# Tighter limits for untrusted input
converter = ZipConverter(
    markitdown=md,
    max_file_count=20,
    max_file_size=5 * 1024 * 1024,    # 5 MB
    max_total_size=50 * 1024 * 1024,  # 50 MB
)

Tests

python -m pytest tests/test_zip_converter.py -v
# 10 passed
ruff check src/markitdown/converters/_zip_converter.py tests/test_zip_converter.py  # All checks passed
ruff format --check ...  # Already formatted

Closes #1661

The previous ZipConverter had no limits on file count, per-file size,
or total uncompressed size, making it vulnerable to zip bomb extraction.
It also had no protection against zip slip path traversal attacks.

Changes:
- Add max_file_count constructor parameter (default 100): stops
  processing when the limit is reached and appends a notice
- Add max_file_size constructor parameter (default 50 MB): skips
  individual files that exceed the limit with a per-file notice
- Add max_total_size constructor parameter (default 200 MB): stops
  processing when cumulative uncompressed bytes would exceed the limit
- Guard against zip slip: skip entries whose name starts with '/' or
  contains '..' path components (handles both POSIX and Windows paths)
- Iterate over ZipInfo objects instead of namelist() to read
  file_size before extracting
- Skip directory entries (names ending with '/')
- Add 10 tests covering all limit types, zip slip, and accepts()

Closes: microsoft#1661
@VANDRANKI

@microsoft-github-policy-service agree

Development

Successfully merging this pull request may close these issues.

Make ZipConverter safety limits configurable via constructor parameters