3 changes: 3 additions & 0 deletions include/tesseract/renderer.h
@@ -173,13 +173,16 @@ class TESS_API TessHOcrRenderer : public TessResultRenderer {
   explicit TessHOcrRenderer(const char *outputbase, bool font_info);
   explicit TessHOcrRenderer(const char *outputbase);

+  void SetInputLanguages(const char *languages);
+
 protected:
   bool BeginDocumentHandler() override;
   bool AddImageHandler(TessBaseAPI *api) override;
   bool EndDocumentHandler() override;

 private:
   bool font_info_; // whether to print font information
+  std::string input_languages_;

Copilot AI Mar 24, 2026

Adding std::string input_languages_ to TessHOcrRenderer changes the size/layout of this exported (TESS_API) C++ class, which can break binary compatibility for downstream code that links against libtesseract and instantiates TessHOcrRenderer. If ABI stability is a concern, consider storing the new state behind an indirection (pimpl/opaque pointer) or in a separate internal structure to minimize ABI impact.

Suggested change
-  std::string input_languages_;
+  // NOTE: Additional per-instance state (e.g. input languages) is stored
+  // out-of-line to avoid changing the ABI-visible layout of this class.
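
One possible reading of "stored out-of-line" is a side table kept in the .cpp file and keyed by object address, so the exported class gains no new members. The following is a minimal sketch under that assumption, not the PR's implementation: `Renderer` is a hypothetical stand-in for the exported class, and `g_mutex`, `g_languages`, and `InputLanguages()` are illustrative names. A pimpl pointer would give similar isolation, but adding the pointer member is itself a one-time layout change.

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an exported class whose layout must stay fixed.
class Renderer {
public:
  ~Renderer();
  void SetInputLanguages(const char *languages);
  std::string InputLanguages() const;
  bool font_info() const { return font_info_; }

private:
  bool font_info_ = false; // pre-existing member; nothing new is added here
};

namespace {
// Side table that would live in the .cpp file: per-instance state keyed by
// object address. The mutex guards access from multiple threads.
std::mutex g_mutex;
std::unordered_map<const Renderer *, std::string> g_languages;
} // namespace

Renderer::~Renderer() {
  // Drop the entry so a later object at the same address starts clean.
  std::lock_guard<std::mutex> lock(g_mutex);
  g_languages.erase(this);
}

void Renderer::SetInputLanguages(const char *languages) {
  std::lock_guard<std::mutex> lock(g_mutex);
  if (languages != nullptr && languages[0] != '\0') {
    g_languages[this] = languages;
  } else {
    g_languages.erase(this);
  }
}

std::string Renderer::InputLanguages() const {
  std::lock_guard<std::mutex> lock(g_mutex);
  auto it = g_languages.find(this);
  return it == g_languages.end() ? std::string() : it->second;
}
```

The trade-off is a lock and a hash lookup per access, plus the need to erase the entry in the destructor, in exchange for leaving `sizeof` and member offsets of the exported class untouched.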

};

/**
13 changes: 12 additions & 1 deletion src/api/hocrrenderer.cpp
@@ -477,6 +477,12 @@ TessHOcrRenderer::TessHOcrRenderer(const char *outputbase, bool font_info)
   font_info_ = font_info;
 }

+void TessHOcrRenderer::SetInputLanguages(const char *languages) {
+  if (languages) {
+    input_languages_ = languages;

Copilot AI Mar 24, 2026

SetInputLanguages(nullptr) currently leaves any previously set value intact, which can produce stale ocr-langs metadata if a renderer instance is reused and the caller tries to clear the languages. Consider explicitly clearing input_languages_ when languages == nullptr (or provide a ClearInputLanguages()), so the behavior is unambiguous.

Suggested change
-  if (languages) {
-    input_languages_ = languages;
+  if (languages && languages[0] != '\0') {
+    input_languages_ = languages;
+  } else {
+    input_languages_.clear();
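
To make the reuse scenario concrete, here is a small self-contained sketch of the suggested setter together with the conditional meta output from the diff. `LangMeta` and `MetaLine()` are hypothetical names used only for illustration; they are not part of Tesseract.

```cpp
#include <string>

// Hypothetical stand-in for the renderer's language state; this is not the
// Tesseract class, only the setter and the header fragment it feeds.
class LangMeta {
public:
  // nullptr and "" both reset the stored value, so a reused instance never
  // keeps languages from an earlier run.
  void SetInputLanguages(const char *languages) {
    if (languages != nullptr && languages[0] != '\0') {
      input_languages_ = languages;
    } else {
      input_languages_.clear();
    }
  }

  // Mirrors the diff: the meta element is produced only when languages are set.
  std::string MetaLine() const {
    if (input_languages_.empty()) {
      return std::string();
    }
    return " <meta name='ocr-langs' content='" + input_languages_ + "' />\n";
  }

private:
  std::string input_languages_;
};
```

With this variant, calling `SetInputLanguages("eng+deu")` and then `SetInputLanguages(nullptr)` leaves `MetaLine()` empty. With the setter as written in the PR, the second call would be a no-op and the `eng+deu` meta line would still be produced.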

+  }
+}
+
 bool TessHOcrRenderer::BeginDocumentHandler() {
   AppendString(
       "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
@@ -496,8 +502,13 @@ bool TessHOcrRenderer::BeginDocumentHandler() {
   if (font_info_) {
     AppendString(" ocrp_font ocrp_fsize");
   }
+  AppendString("'/>\n");
+  if (!input_languages_.empty()) {
+    AppendString(" <meta name='ocr-langs' content='");
+    AppendString(input_languages_.c_str());
+    AppendString("' />\n");
+  }
   AppendString(
-      "'/>\n"
       " </head>\n"
       " <body>\n");

1 change: 1 addition & 0 deletions src/tesseract.cpp
@@ -516,6 +516,7 @@ static void PreloadRenderers(tesseract::TessBaseAPI &api,
   bool font_info;
   api.GetBoolVariable("hocr_font_info", &font_info);
   auto renderer = std::make_unique<tesseract::TessHOcrRenderer>(outputbase, font_info);
+  renderer->SetInputLanguages(api.GetInitLanguagesAsString());
   if (renderer->happy()) {
     renderers.push_back(std::move(renderer));
   } else {