Add Arabic text shaping for PDF export by alghanim · Pull Request #22656 · opf/openproject

alghanim · 2026-04-02T17:11:16Z

Summary

Add Exports::PDF::Common::ArabicShaping module that converts Arabic characters to Unicode Presentation Forms based on joining context
Intercept all Prawn text methods (text, draw_text, formatted_text, text_box, make_cell) on the PDF document instance
Supports all standard Arabic letters, extended Arabic (Farsi, Urdu), Lam-Alef ligatures, and diacritical marks
Zero performance impact on non-Arabic text (early return)
Upstream PR on Prawn: Add Arabic text shaping support prawnpdf/prawn#1392

Context

Arabic is a cursive script where characters must connect based on position. Prawn uses ttfunk for font parsing but does not run the OpenType shaping engine, rendering all Arabic characters in isolated form — completely unreadable.

Test plan

Export work package with Arabic title/description to PDF — characters connected
Export work package table with Arabic content — cells render correctly
Mixed Arabic/English text displays correctly
Arabic with diacritics renders correctly
Lam-Alef combinations form ligatures
Non-Arabic exports unaffected

Prawn does not perform OpenType text shaping, so Arabic characters render as disconnected isolated glyphs. This adds a shaping module that converts Arabic characters to their Unicode Presentation Forms (initial/medial/final/isolated) before Prawn renders them. Features: - All standard Arabic letters (U+0621-U+064A) - Extended Arabic (Farsi, Urdu: Peh, Tcheh, Gaf, etc.) - Lam-Alef mandatory ligatures - Diacritical marks preservation - Intercepts all Prawn text methods via document singleton See also: prawnpdf/prawn#1392 for upstream PR

+
+        if arabic_letter?(cp)
+          # Collect a run of Arabic characters (letters + marks)
+          run_start = i


Copilot

Pull request overview

This PR adds an internal Arabic shaping layer to the PDF export pipeline so that Arabic (and extended Arabic) text renders with correct contextual forms in Prawn-generated PDFs.

Changes:

Add Exports::PDF::Common::ArabicShaping to shape Arabic characters into Unicode Presentation Forms and perform a simplified RTL reordering.
Prepend interceptors onto the Prawn document instance in Exports::PDF::Common::View to run shaping for all major text entry points (text, draw_text, formatted_text, text_box, formatted_text_box, make_cell).
Add specs to cover the shaping module and its integration with the PDF view.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File	Description
`app/models/exports/pdf/common/arabic_shaping.rb`	Implements shaping + RTL reordering logic for Arabic text destined for Prawn.
`app/models/exports/pdf/common/view.rb`	Hooks shaping into PDF rendering by prepending a module on the document instance.
`spec/models/exports/pdf/common/arabic_shaping_spec.rb`	Adds unit specs for Arabic shaping behavior.
`spec/models/exports/pdf/common/view_arabic_spec.rb`	Adds integration-level coverage for shaping being applied to common Prawn text methods.

Copilot · 2026-04-03T23:30:44Z

+require 'spec_helper'
+
+RSpec.describe Exports::PDF::Common::View do
+  describe 'Arabic text shaping integration' do
+    let(:view) { described_class.new(:en) }
+
+    it 'applies Arabic shaping to text method' do
+      pdf = view.document
+      # text method should accept Arabic without error
+      expect { pdf.text('مرحبا بالعالم') }.not_to raise_error
+    end
+
+    it 'applies Arabic shaping to formatted_text method' do
+      pdf = view.document
+      expect { pdf.formatted_text([{ text: 'مرحبا' }]) }.not_to raise_error
+    end
+
+    it 'applies Arabic shaping to make_cell method' do
+      pdf = view.document
+      expect { pdf.make_cell('مرحبا', {}) }.not_to raise_error


These new specs don’t follow the repository’s Ruby/RuboCop conventions (missing # frozen_string_literal: true header and using single-quoted strings). This will likely fail linting; align the header and string literal quoting with existing specs in this directory (e.g., spec/models/exports/pdf/common/view_spec.rb).

Suggested change

require 'spec_helper'

RSpec.describe Exports::PDF::Common::View do

describe 'Arabic text shaping integration' do

let(:view) { described_class.new(:en) }

it 'applies Arabic shaping to text method' do

pdf = view.document

# text method should accept Arabic without error

expect { pdf.text('مرحبا بالعالم') }.not_to raise_error

end

it 'applies Arabic shaping to formatted_text method' do

pdf = view.document

expect { pdf.formatted_text([{ text: 'مرحبا' }]) }.not_to raise_error

end

it 'applies Arabic shaping to make_cell method' do

pdf = view.document

expect { pdf.make_cell('مرحبا', {}) }.not_to raise_error

# frozen_string_literal: true

require "spec_helper"

RSpec.describe Exports::PDF::Common::View do

describe "Arabic text shaping integration" do

let(:view) { described_class.new(:en) }

it "applies Arabic shaping to text method" do

pdf = view.document

# text method should accept Arabic without error

expect { pdf.text("مرحبا بالعالم") }.not_to raise_error

end

it "applies Arabic shaping to formatted_text method" do

pdf = view.document

expect { pdf.formatted_text([{ text: "مرحبا" }]) }.not_to raise_error

end

it "applies Arabic shaping to make_cell method" do

pdf = view.document

expect { pdf.make_cell("مرحبا", {}) }.not_to raise_error

Copilot · 2026-04-03T23:30:44Z

+      # text method should accept Arabic without error
+      expect { pdf.text('مرحبا بالعالم') }.not_to raise_error
+    end
+
+    it 'applies Arabic shaping to formatted_text method' do
+      pdf = view.document
+      expect { pdf.formatted_text([{ text: 'مرحبا' }]) }.not_to raise_error
+    end
+
+    it 'applies Arabic shaping to make_cell method' do
+      pdf = view.document
+      expect { pdf.make_cell('مرحبا', {}) }.not_to raise_error


These examples only assert that Prawn methods don’t raise when given Arabic text; they don’t verify that shaping is actually applied. This test would still pass if apply_arabic_shaping! were removed. Consider asserting that Exports::PDF::Common::ArabicShaping.process is invoked (e.g., via a spy) or that the returned text passed into Prawn is transformed.

Suggested change

# text method should accept Arabic without error

expect { pdf.text('مرحبا بالعالم') }.not_to raise_error

end

it 'applies Arabic shaping to formatted_text method' do

pdf = view.document

expect { pdf.formatted_text([{ text: 'مرحبا' }]) }.not_to raise_error

end

it 'applies Arabic shaping to make_cell method' do

pdf = view.document

expect { pdf.make_cell('مرحبا', {}) }.not_to raise_error

expect(Exports::PDF::Common::ArabicShaping).to receive(:process).with('مرحبا بالعالم').and_call_original

pdf.text('مرحبا بالعالم')

end

it 'applies Arabic shaping to formatted_text method' do

pdf = view.document

expect(Exports::PDF::Common::ArabicShaping).to receive(:process).with('مرحبا').and_call_original

pdf.formatted_text([{ text: 'مرحبا' }])

end

it 'applies Arabic shaping to make_cell method' do

pdf = view.document

expect(Exports::PDF::Common::ArabicShaping).to receive(:process).with('مرحبا').and_call_original

pdf.make_cell('مرحبا', {})

Copilot · 2026-04-03T23:30:44Z

+require 'spec_helper'
+
+RSpec.describe Exports::PDF::Common::ArabicShaping do
+  describe '.contains_arabic?' do
+    it 'returns true for Arabic text' do
+      expect(described_class.contains_arabic?('مرحبا')).to be true
+    end
+
+    it 'returns false for Latin text' do
+      expect(described_class.contains_arabic?('Hello')).to be false
+    end
+
+    it 'returns true for mixed text' do
+      expect(described_class.contains_arabic?('Hello مرحبا')).to be true
+    end
+  end
+
+  describe '.shape' do
+    it 'converts Arabic to presentation forms' do
+      shaped = described_class.shape('مرحبا')
+      shaped.codepoints.each do |cp|
+        expect(cp >= 0xFE70 && cp <= 0xFEFF || cp >= 0xFB50 && cp <= 0xFDFF).to be true
+      end
+    end
+
+    it 'creates Lam-Alef ligatures' do
+      shaped = described_class.shape('لا')
+      expect(shaped.codepoints).to eq([0xFEFB])
+    end
+
+    it 'preserves diacritical marks' do
+      shaped = described_class.shape("بِسْمِ")
+      marks = shaped.codepoints.select { |cp| (0x064B..0x065F).cover?(cp) }
+      expect(marks).not_to be_empty
+    end
+
+    it 'passes through non-Arabic text' do
+      expect(described_class.shape('Hello')).to eq('Hello')
+    end
+
+    it 'handles nil and empty' do
+      expect(described_class.process(nil)).to be_nil
+      expect(described_class.process('')).to eq('')
+    end
+
+    it 'preserves spaces in mixed text' do
+      shaped = described_class.shape('مرحبا بالعالم')
+      expect(shaped).to include(' ')
+    end
+  end
+
+  describe '.process' do
+    it 'shapes and reorders Arabic text' do
+      result = described_class.process('مرحبا')
+      expect(result).not_to eq('مرحبا')


These new specs don’t follow the repository’s Ruby/RuboCop conventions (missing # frozen_string_literal: true header and using single-quoted strings). This will likely fail linting; align with existing specs’ headers and quoting.

Suggested change

require 'spec_helper'

RSpec.describe Exports::PDF::Common::ArabicShaping do

describe '.contains_arabic?' do

it 'returns true for Arabic text' do

expect(described_class.contains_arabic?('مرحبا')).to be true

end

it 'returns false for Latin text' do

expect(described_class.contains_arabic?('Hello')).to be false

end

it 'returns true for mixed text' do

expect(described_class.contains_arabic?('Hello مرحبا')).to be true

end

end

describe '.shape' do

it 'converts Arabic to presentation forms' do

shaped = described_class.shape('مرحبا')

shaped.codepoints.each do |cp|

expect(cp >= 0xFE70 && cp <= 0xFEFF || cp >= 0xFB50 && cp <= 0xFDFF).to be true

end

end

it 'creates Lam-Alef ligatures' do

shaped = described_class.shape('لا')

expect(shaped.codepoints).to eq([0xFEFB])

end

it 'preserves diacritical marks' do

shaped = described_class.shape("بِسْمِ")

marks = shaped.codepoints.select { |cp| (0x064B..0x065F).cover?(cp) }

expect(marks).not_to be_empty

end

it 'passes through non-Arabic text' do

expect(described_class.shape('Hello')).to eq('Hello')

end

it 'handles nil and empty' do

expect(described_class.process(nil)).to be_nil

expect(described_class.process('')).to eq('')

end

it 'preserves spaces in mixed text' do

shaped = described_class.shape('مرحبا بالعالم')

expect(shaped).to include(' ')

end

end

describe '.process' do

it 'shapes and reorders Arabic text' do

result = described_class.process('مرحبا')

expect(result).not_to eq('مرحبا')

# frozen_string_literal: true

require "spec_helper"

RSpec.describe Exports::PDF::Common::ArabicShaping do

describe ".contains_arabic?" do

it "returns true for Arabic text" do

expect(described_class.contains_arabic?("مرحبا")).to be true

end

it "returns false for Latin text" do

expect(described_class.contains_arabic?("Hello")).to be false

end

it "returns true for mixed text" do

expect(described_class.contains_arabic?("Hello مرحبا")).to be true

end

end

describe ".shape" do

it "converts Arabic to presentation forms" do

shaped = described_class.shape("مرحبا")

shaped.codepoints.each do |cp|

expect(cp >= 0xFE70 && cp <= 0xFEFF || cp >= 0xFB50 && cp <= 0xFDFF).to be true

end

end

it "creates Lam-Alef ligatures" do

shaped = described_class.shape("لا")

expect(shaped.codepoints).to eq([0xFEFB])

end

it "preserves diacritical marks" do

shaped = described_class.shape("بِسْمِ")

marks = shaped.codepoints.select { |cp| (0x064B..0x065F).cover?(cp) }

expect(marks).not_to be_empty

end

it "passes through non-Arabic text" do

expect(described_class.shape("Hello")).to eq("Hello")

end

it "handles nil and empty" do

expect(described_class.process(nil)).to be_nil

expect(described_class.process("")).to eq("")

end

it "preserves spaces in mixed text" do

shaped = described_class.shape("مرحبا بالعالم")

expect(shaped).to include(" ")

end

end

describe ".process" do

it "shapes and reorders Arabic text" do

result = described_class.process("مرحبا")

expect(result).not_to eq("مرحبا")

Copilot · 2026-04-03T23:30:45Z

+
+        if arabic_letter?(cp)
+          # Collect a run of Arabic characters (letters + marks)
+          run_start = i


run_start is assigned but never used. This will trigger RuboCop’s unused variable check and should be removed (or used if it was intended for something).

Suggested change

run_start = i

Copilot · 2026-04-03T23:30:45Z

+      segments.reverse_each do |segment|
+        if segment[:dir] == :rtl
+          # Reverse RTL characters (they were in logical order)
+          result.concat(segment[:cps].reverse)
+        else
+          # LTR text keeps its order
+          result.concat(segment[:cps])
+        end


reorder_rtl_line reverses all codepoints in RTL segments. For combining marks (tashkeel), this changes the required order from base+mark to mark+base (e.g., "بِ" becomes kasra before beh), which can break rendering. Consider reversing RTL text by grapheme clusters (base + following combining marks) or otherwise ensuring marks remain after their base character when reversing.

Copilot · 2026-04-03T23:30:45Z

+    it 'preserves diacritical marks' do
+      shaped = described_class.shape("بِسْمِ")
+      marks = shaped.codepoints.select { |cp| (0x064B..0x065F).cover?(cp) }
+      expect(marks).not_to be_empty
+    end


The current spec for diacritics only checks that marks are present, but it wouldn’t catch the mark/base order regression introduced by reorder_rtl (marks can move before their base). Add an assertion that combining marks remain after the intended base character in the output of .process (e.g., for a short string like "بِ").

- Use push() instead of concat([]) for ARABIC_MARKS - Rename short param cp to codepoint (min 3 chars) - Use text.blank? instead of text.nil? || text.empty? - Extract collect_arabic_run to reduce AbcSize of shape method - Remove unused run_start variable - Remove unused RTL reordering code (direction handled by Prawn align) - Fix extra spacing in hash comment alignment

as-op · 2026-04-07T12:02:30Z

Hi.
This is again the monkey-patching of text shaping that should not be done on the application level and also would also be a duplication by your PR to prawn. Instead of hacking the shaping support in, there is a better way mentioned as a comment in your PR.

myabc added the contribution label Apr 3, 2026

github-advanced-security AI found potential problems Apr 3, 2026

View reviewed changes

Comment thread app/models/exports/pdf/common/arabic_shaping.rb Outdated

if arabic_letter?(cp)

# Collect a run of Arabic characters (letters + marks)

run_start = i

myabc requested review from as-op and Copilot April 3, 2026 23:26

Copilot started reviewing on behalf of myabc April 3, 2026 23:27 View session

Copilot AI reviewed Apr 3, 2026

View reviewed changes

as-op requested review from myabc and removed request for as-op April 4, 2026 09:52

as-op closed this Apr 7, 2026

github-actions bot locked and limited conversation to collaborators Apr 7, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Arabic text shaping for PDF export#22656

Add Arabic text shaping for PDF export#22656
alghanim wants to merge 2 commits intoopf:devfrom
alghanim:feature/pdf-arabic-shaping

alghanim commented Apr 2, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 3, 2026

Uh oh!

Copilot AI Apr 3, 2026

Uh oh!

Copilot AI Apr 3, 2026

Uh oh!

Copilot AI Apr 3, 2026

Uh oh!

Copilot AI Apr 3, 2026

Uh oh!

Copilot AI Apr 3, 2026

Uh oh!

as-op commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

5 participants

Conversation

alghanim commented Apr 2, 2026

Summary

Context

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

as-op commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

5 participants