Fix inconsistent recognition_done_ state on empty pages by eyupcanakman · Pull Request #4528 · tesseract-ocr/tesseract

eyupcanakman · 2026-03-18T15:39:58Z

When Recognize() encounters an empty page (no text blocks detected), it sets page_res_ but returns without setting recognition_done_ to true.

Some renderers (hOCR, ALTO, TSV) check page_res_ == nullptr to decide whether to re-run recognition, while others (GetUTF8Text, GetBoxText, GetUNLVText) check !recognition_done_. The second group triggers a redundant Recognize() call on empty pages. If the second pass non-deterministically finds text, later renderers get text while earlier ones (hOCR) return empty output.
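The mismatch between the two guard styles can be reproduced with a small self-contained model. This is only a sketch: `ToyApi`, `RenderByPageRes`, `RenderByDoneFlag`, and `recognize_calls` are hypothetical names invented for illustration, not Tesseract's actual code. Only the two state fields mirror `page_res_` and `recognition_done_` as described above.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy stand-in for the API object. NOT the real tesseract::TessBaseAPI.
struct ToyApi {
  std::unique_ptr<std::string> page_res;  // models page_res_
  bool recognition_done = false;          // models recognition_done_
  bool page_is_empty = true;              // assumed input: a blank page
  int recognize_calls = 0;                // counts recognition passes

  // Models the pre-fix behaviour: the empty-page path sets page_res
  // but returns before recognition_done is set.
  int Recognize() {
    ++recognize_calls;
    page_res = std::make_unique<std::string>();
    if (page_is_empty) {
      return 0;  // early return, recognition_done stays false
    }
    *page_res = "text";
    recognition_done = true;
    return 0;
  }

  // Guard style of the hOCR/ALTO/TSV group: re-run only if page_res is null.
  std::string RenderByPageRes() {
    if (page_res == nullptr) Recognize();
    return *page_res;
  }

  // Guard style of the GetUTF8Text/GetBoxText/GetUNLVText group:
  // re-run whenever the done flag is false.
  std::string RenderByDoneFlag() {
    if (!recognition_done) Recognize();
    return *page_res;
  }
};
```

In this model, calling `Recognize()` once and then `RenderByPageRes()` leaves `recognize_calls` at 1, while a following `RenderByDoneFlag()` raises it to 2. That second pass is the redundant one the description refers to.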

Set recognition_done_ = true in the empty-page early-return path, same as the non-empty path. Add a regression test that verifies that the hOCR, UTF8, and TSV outputs are all non-null after recognizing a blank image.
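The shape of the fix and of the regression check can be sketched in the same toy style. Everything here is an assumption made for illustration: `FixedToyApi`, its three getters, and `BlankPageOutputsAgree` are hypothetical and stand in for the real renderers. The actual test in the PR uses the Tesseract API on a blank image and is not reproduced here.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Toy model with the fix applied. NOT the real tesseract::TessBaseAPI.
struct FixedToyApi {
  bool has_page_res = false;      // models page_res_ != nullptr
  bool recognition_done = false;  // models recognition_done_
  bool page_is_empty = true;      // assumed input: a blank page
  int recognize_calls = 0;

  int Recognize() {
    ++recognize_calls;
    has_page_res = true;
    if (page_is_empty) {
      recognition_done = true;  // the change: mark done on the early return
      return 0;
    }
    recognition_done = true;
    return 0;
  }

  // Guarded on has_page_res, like the hOCR and TSV renderers.
  std::optional<std::string> HocrLike() {
    if (!has_page_res && Recognize() != 0) return std::nullopt;
    return std::string();
  }
  std::optional<std::string> TsvLike() {
    if (!has_page_res && Recognize() != 0) return std::nullopt;
    return std::string();
  }

  // Guarded on recognition_done, like GetUTF8Text.
  std::optional<std::string> Utf8Like() {
    if (!recognition_done && Recognize() != 0) return std::nullopt;
    return std::string();
  }
};

// Regression-style check: every output exists and only one pass ran.
bool BlankPageOutputsAgree() {
  FixedToyApi api;
  api.Recognize();
  bool all_present = api.HocrLike().has_value() &&
                     api.Utf8Like().has_value() &&
                     api.TsvLike().has_value();
  return all_present && api.recognize_calls == 1;
}
```

With the flag set on the early return, both guard styles see a completed recognition, so no renderer in the model starts a second pass.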

Fixes #4112

Fix inconsistent recognition_done_ state on empty pages

90928a9

eyupcanakman force-pushed the fix/recognition-done-empty-page branch from 003e38e to 90928a9 on March 18, 2026 15:42

eyupcanakman wants to merge 1 commit into tesseract-ocr:main from eyupcanakman:fix/recognition-done-empty-page
