Fix/epub in the wild robustness #10

Merged: Ringyuki merged 2 commits into master from fix/epub-in-the-wild-robustness on Jun 8, 2026
Conversation

@Ringyuki Ringyuki commented Jun 8, 2026

Owner

No description provided.

Ringyuki added 2 commits June 8, 2026 14:08
Old EPUBs in the wild routinely violate the spec in ways that broke loading, parsing, or rendering. Make the parser and resource layers tolerant:

- XHTML: escape stray `&`, remap HTML named entities to numeric refs, strip illegal XML control chars, and preserve comments/CDATA. Fixes strict-parser failures like "EntityRef: expecting ';'" and "PCDATA invalid Char value 31".
- Images: index every image file in the archive (not just manifest items), tolerate unreadable manifest entries, and percent-decode href/zip lookups. Fixes in-content illustrations rendering as broken/no-data.
- OPF: default missing dc:title/language/identifier (with a warning) instead of throwing; structural <manifest>/<spine> checks stay strict.
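
For illustration only, here is a minimal Python sketch of the kind of XHTML pre-pass the first bullet describes. It is not the code from this PR: the function name `sanitize_xhtml`, the regexes, and the choice to escape unknown named entities are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities XML itself predefines; these are left alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# C0 control characters that XML 1.0 forbids (tab, LF, and CR are legal).
ILLEGAL_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

# Comments and CDATA sections; captured so re.split keeps them in the result.
PROTECTED = re.compile(r"(<!--.*?-->|<!\[CDATA\[.*?\]\]>)", re.DOTALL)

# An '&', optionally followed by something shaped like a complete reference.
AMP = re.compile(r"&(#[0-9]+;|#[xX][0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)?")


def _fix_amp(match):
    ref = match.group(1)
    if ref is None:
        return "&amp;"  # stray ampersand with no reference after it
    if ref.startswith("#"):
        return match.group(0)  # numeric reference, keep as written
    name = ref[:-1]
    if name in XML_PREDEFINED:
        return match.group(0)
    if name in name2codepoint:
        return "&#%d;" % name2codepoint[name]  # e.g. &nbsp; becomes &#160;
    return "&amp;" + ref  # unknown name: escape it (an assumption of this sketch)


def sanitize_xhtml(text):
    """Hypothetical helper: make sloppy XHTML acceptable to a strict XML parser."""
    text = ILLEGAL_CONTROL.sub("", text)
    parts = PROTECTED.split(text)
    # Even indices are ordinary markup; odd indices are comments/CDATA, left untouched.
    for i in range(0, len(parts), 2):
        parts[i] = AMP.sub(_fix_amp, parts[i])
    return "".join(parts)


# The raw string below would make ET.fromstring raise ParseError; the fixed one parses.
fixed = sanitize_xhtml("<p>AT&T&nbsp;ok\x1f</p>")
ET.fromstring(fixed)
```

The sketch strips control characters everywhere, including inside comments, because a strict parser rejects them there too. Only the ampersand rewriting skips comments and CDATA.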

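The OPF policy in the last bullet (lenient about metadata, strict about structure) could look roughly like the following Python sketch. The function name `read_opf_metadata` and the fallback values ("Untitled", "und", a generated `urn:uuid:`) are hypothetical; the PR does not state which defaults it uses.

```python
import uuid
import warnings
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def _default_for(field):
    # Hypothetical fallbacks, chosen only for this example.
    if field == "title":
        return "Untitled"
    if field == "language":
        return "und"  # the BCP 47 tag for "undetermined"
    return "urn:uuid:" + str(uuid.uuid4())


def read_opf_metadata(opf_xml):
    """Return title/language/identifier, defaulting missing ones with a warning."""
    root = ET.fromstring(opf_xml)

    # Structural checks stay strict: without these the book cannot be read.
    for required in ("manifest", "spine"):
        if root.find("opf:" + required, NS) is None:
            raise ValueError("OPF has no <%s>" % required)

    meta = {}
    for field in ("title", "language", "identifier"):
        el = root.find("opf:metadata/dc:" + field, NS)
        value = (el.text or "").strip() if el is not None else ""
        if not value:
            value = _default_for(field)
            warnings.warn("OPF is missing dc:%s; using %r" % (field, value))
        meta[field] = value
    return meta
```
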
Verified across a 27-book corpus: every book loads, parses all chapters, and paginates, and every image that exists in the archive resolves.
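
As a rough illustration of the image-resolution idea (index every image in the archive, then percent-decode both sides of the lookup), here is a Python sketch. The names `build_image_index` and `resolve_image`, and the extension list, are assumptions for the example and are not taken from this PR. Handling of unreadable manifest entries is not shown.

```python
import io
import posixpath
import zipfile
from urllib.parse import unquote, urlsplit

# Assumed set of extensions that count as images for this sketch.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp", ".bmp"}


def _key(path):
    # Percent-decode and collapse "." / ".." so both sides compare equal.
    return posixpath.normpath(unquote(path))


def build_image_index(zf):
    """Map a normalized path to the real archive name for every image entry."""
    index = {}
    for name in zf.namelist():
        if posixpath.splitext(name)[1].lower() in IMAGE_EXTS:
            index.setdefault(_key(name), name)
    return index


def resolve_image(index, chapter_path, href):
    """Resolve an href found in a chapter to an archive entry name, or None."""
    path = urlsplit(href).path  # drop any ?query or #fragment
    target = posixpath.join(posixpath.dirname(chapter_path), path)
    return index.get(_key(target))


# Usage: an in-memory archive whose image is not assumed to be in any manifest.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("OEBPS/text/ch1.xhtml", "<html/>")
    zf.writestr("OEBPS/images/my pic.jpg", b"x")
with zipfile.ZipFile(buf) as zf:
    demo_index = build_image_index(zf)
entry = resolve_image(demo_index, "OEBPS/text/ch1.xhtml", "../images/my%20pic.jpg")
```

Because the index is built from the archive listing rather than the manifest, an illustration that the OPF forgot to declare still resolves as long as the file exists.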
@Ringyuki Ringyuki merged commit 1698b8f into master Jun 8, 2026
0 of 2 checks passed