
fix: handle BadZipFile for corrupted media in PPTX converter #1670

Open
octo-patch wants to merge 1 commit into microsoft:main from octo-patch:fix/issue-1159-pptx-bad-zip-file



@octo-patch

Fixes #1159

Problem

When a PPTX file contains a media file whose data no longer matches its stored CRC-32 checksum (e.g., an embedded .m4a audio file), reading the media blob through python-pptx raises zipfile.BadZipFile during conversion. The exception was not handled, so the entire PPTX conversion failed:

PptxConverter threw BadZipFile with message: Bad CRC-32 for file 'ppt/media/media31.m4a'
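
The failure mode can be reproduced without a real presentation. The following is a minimal sketch using only the standard library: it builds a one-member archive, damages the payload so the recorded CRC-32 no longer matches, and reads the member back. The helper names and the payload bytes are illustrative assumptions; only the member name comes from the error message above.

```python
import io
import zipfile

MEDIA_NAME = "ppt/media/media31.m4a"  # member name taken from the error message
PAYLOAD = b"fake-audio-payload-0123456789"  # stand-in bytes, not real audio


def build_corrupted_archive() -> bytes:
    """Build a one-member archive, then flip a payload byte (hypothetical helper)."""
    buffer = io.BytesIO()
    # ZIP_STORED keeps the payload uncompressed, so it is easy to locate and damage.
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        archive.writestr(MEDIA_NAME, PAYLOAD)
    raw = bytearray(buffer.getvalue())
    offset = raw.find(PAYLOAD)
    raw[offset] ^= 0xFF  # damage the data; the recorded CRC-32 stays unchanged
    return bytes(raw)


def read_member(data: bytes, name: str) -> str:
    """Return 'ok', or the BadZipFile message raised while reading the member."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        try:
            archive.read(name)
        except zipfile.BadZipFile as exc:
            return str(exc)
    return "ok"


if __name__ == "__main__":
    print(read_member(build_corrupted_archive(), MEDIA_NAME))
```

Opening the archive succeeds because the central directory is intact; the checksum mismatch is only detected once the member's data is actually read, which is why the error surfaces at blob access time.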

Solution

Wrap the shape.image.blob access in a try/except block to catch BadZipFile and other IO errors when reading media files. When a corrupted media file is encountered, log a warning and gracefully continue with the rest of the conversion (using a placeholder image reference instead).

Also fix an adjacent TypeError that occurs when llm_caption() returns None: the original "\n".join([None, alt_text]) fails because str.join accepts only strings. The join now uses filter(None, ...) to skip None and empty-string values first.
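
A small illustration of the join fix, as a sketch: `join_alt_text` is a hypothetical function name, and its two parameters stand in for the llm_caption() result and the shape's alt text.

```python
from typing import Optional


def join_alt_text(llm_description: Optional[str], alt_text: Optional[str]) -> str:
    """Join the available descriptions with newlines, skipping None and ''."""
    # filter(None, ...) drops every falsy item, so neither None nor an empty
    # string reaches str.join.
    return "\n".join(filter(None, [llm_description, alt_text]))


if __name__ == "__main__":
    print(repr(join_alt_text(None, "Company logo")))
    print(repr(join_alt_text("A bar chart", "Company logo")))
```

When both values are missing the result is an empty string rather than an exception, so the caller can still emit an image reference with empty alt text.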

Testing

  • Conversion of PPTX files without corrupted media continues to work as before
  • PPTX files with corrupted media (bad CRC) no longer crash; a warning is logged and conversion continues for the remaining shapes and slides

…icrosoft#1159)

When a PPTX file contains a media file with corrupted CRC-32 (e.g., a .m4a
audio file), python-pptx raises BadZipFile when reading the media blob. This
caused the entire PPTX conversion to fail.

Wrap the image blob access inside try/except to catch BadZipFile and other
IO errors, logging a warning and continuing with the rest of the presentation.

Also fix the adjacent TypeError: if llm_caption returns None, the original
join would fail. Use filter(None, ...) to safely skip None and empty values.

Development

Successfully merging this pull request may close these issues.

Pptx conversion fails when .m4a file

1 participant