Add high-level accessibility API for tagged PDF generation #1391

Open
craigmcnamara wants to merge 5 commits into prawnpdf:master from mes-amis:section-508-accessability


@craigmcnamara

@craigmcnamara craigmcnamara commented Mar 26, 2026

Summary

Adds Prawn::Accessibility module providing a user-friendly API for generating Section 508 compliant tagged PDFs.

New document options:

  • marked: true — enables tagged PDF mode (structure tree, marked content)
  • language: 'en-US' — sets /Lang on the Catalog
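Later in this thread the pdf-core side is described as writing `/MarkInfo << /Marked true >>` and `/Lang` into the document Catalog. Here is a rough, stdlib-only sketch of the entries these two options add. The `catalog_extras` helper is hypothetical and only builds PDF syntax strings, not a real Catalog object. `/StructTreeRoot` is left out because it needs an indirect object reference.

```ruby
# Hypothetical helper, not part of Prawn or pdf-core: builds the Catalog
# entries described for the two options. /StructTreeRoot is omitted
# because it needs an indirect object reference.
def catalog_extras(marked: false, language: nil)
  entries = []
  entries << "/MarkInfo << /Marked true >>" if marked
  entries << "/Lang (#{language})" if language # /Lang is a PDF text string
  entries
end

p catalog_extras(marked: true, language: "en-US")
# => ["/MarkInfo << /Marked true >>", "/Lang (en-US)"]
p catalog_extras
# => []
```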

New methods on Prawn::Document:

  • tagged?: checks whether the document is in tagged mode
  • structure(tag, attributes, &block): wraps content in a structure element with BDC/EMC
  • structure_container(tag, attributes, &block): creates a container element for nesting (children tag themselves)
  • artifact(type:, &block): marks decorative content (footers, borders) so that screen readers exclude it
  • heading(level, text, options): H1-H6 convenience
  • paragraph(text, options, &block): <P> convenience
  • figure(alt_text:, &block): <Figure> with /Alt

Supported attributes: :Alt, :ActualText, :Lang, :Scope
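For readers unfamiliar with marked content, here is a minimal sketch of the content-stream operators that `structure` and `artifact` wrap around a block. It does not use Prawn. The `MarkedContentStream` class and its methods are hypothetical stand-ins. It assumes each structure block takes the next MCID on the page and each artifact carries a `/Type` entry. Real text output would also need `BT`/`ET`, font selection, and string escaping, all of which are omitted here.

```ruby
# Hypothetical toy content stream illustrating BDC/EMC wrapping.
# This is not Prawn's or pdf-core's implementation.
class MarkedContentStream
  attr_reader :ops

  def initialize
    @ops = []
    @next_mcid = 0 # MCIDs are numbered per page, starting at 0
  end

  # Wrap the block's output in a marked-content sequence tagged with a
  # structure type and an MCID that a StructElem can reference later.
  def structure(tag)
    mcid = @next_mcid
    @next_mcid += 1
    @ops << "/#{tag} <</MCID #{mcid}>> BDC"
    yield
    @ops << "EMC"
    mcid
  end

  # Artifacts get no MCID, so they never enter the structure tree.
  def artifact(type)
    @ops << "/Artifact <</Type /#{type}>> BDC"
    yield
    @ops << "EMC"
  end

  # Simplified stand-in for a text-showing operation.
  def text(str)
    @ops << "(#{str}) Tj"
  end
end

stream = MarkedContentStream.new
stream.structure(:H1) { stream.text("Document Title") }
stream.structure(:P) { stream.text("Body text.") }
stream.artifact(:Pagination) { stream.text("Page 1") }
puts stream.ops
```

The screen-reader-visible content (H1, P) is reachable through MCIDs, while the page number sits inside an `/Artifact` sequence that has no MCID.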

```ruby
require 'prawn' # tagged-PDF support needs this branch plus pdf-core#67

pdf = Prawn::Document.new(marked: true, language: 'en-US')
pdf.heading(1, 'Document Title')
pdf.paragraph('Body text.')
pdf.structure(:Span, ActualText: 'required') { pdf.text('*') }
pdf.artifact(type: :Pagination) { pdf.text('Page 1') }
pdf.render_file('tagged.pdf')
```

Dependency: This PR depends on prawnpdf/pdf-core#67 being merged first. The new accessibility specs require the marked: option and StructureTree class added in that PR. Test failures in CI for release and edge matrix variants are expected until pdf-core#67 is merged and available.

See also the companion PR prawn-table#164.

Addresses prawnpdf/prawn-table#78

Test plan

  • 18 new specs covering tagged mode, structure elements, artifacts, headings, paragraphs, figures, ActualText, and full document round-trip
  • All 912 specs pass locally (896 existing + 18 new, using local pdf-core fork)
  • RuboCop passes
  • CI: release and edge tests will pass once pdf-core#67 is merged
  • Generate a tagged PDF and verify structure in Adobe Acrobat Tags panel
  • Run PAC 2024 accessibility checker

🤖 Generated with Claude Code

craigmcnamara and others added 3 commits March 25, 2026 15:28
Adds Prawn::Accessibility module providing structure(), structure_container(),
artifact(), heading(), paragraph(), and figure() methods for creating
Section 508 compliant tagged PDFs.

Usage:
  pdf = Prawn::Document.new(marked: true, language: 'en-US')
  pdf.heading(1, 'Title')
  pdf.paragraph('Body text.')
  pdf.artifact { pdf.text 'Page 1' }

New options on Prawn::Document.new:
  - marked: true — enables tagged PDF mode
  - language: 'en-US' — sets document language in Catalog

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests that ActualText is properly passed through to structure elements,
useful for screen reader replacement text on symbolic characters like
* (required) and X (selected checkbox).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
craigmcnamara and others added 2 commits March 26, 2026 09:59
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@pointlessone
Member

Could you please explain what this feature is? Could you please point me to the reference (either 1.7 or 2.0)? How much of the spec does this implement? How does it interact with other Prawn features?

@craigmcnamara
Author

Could you please explain what this feature is? Could you please point me to the reference (either 1.7 or 2.0)? How much of the spec does this implement? How does it interact with other Prawn features?

This adds an API to create tagged PDFs that are capable of meeting WCAG and GSA Section 508 guidelines. It doesn't make you compliant automatically, but it does give you the tools to get there.

The changes to prawn add an API to fluently make use of tags if you need them. If you don't use these features, Prawn works the same as before.

Without these tools, users of assistive technologies see the PDFs as one big blob of text. With these tools, you can mark up your PDF and use accessibility audits to confirm that it's accessible.

@pointlessone
Member

Well, that covers the first question. Would you please answer the rest?

@craigmcnamara
Author

Sorry, I didn't mean to be overly casual. Here are the details and links to the 1.7 spec.

Could you please point me to the reference (either 1.7 or 2.0)? How much of the spec does this implement?

This implements portions of PDF 1.7:

  • Section 14.6: Marked Content: BMC, BDC, EMC operators for associating content stream sequences with structure elements
  • Section 14.7: Logical Structure: StructTreeRoot, StructElem, ParentTree (number tree), MCR (marked content references), and StructParents on page dictionaries
    • Section 14.7.2: Structure hierarchy (StructTreeRoot and StructElem dictionaries)
  • Section 14.8.2.2: Real content and artifacts (artifacts are content excluded from the logical structure, such as pagination and decorative elements)
  • Section 14.8.4: Standard structure types and their semantics (Document, Part, Sect, H1-H6, P, L/LI/Lbl/LBody, Table/TR/TH/TD, Figure, Span, etc.)

What this does NOT implement:

  • Role mapping (/RoleMap on StructTreeRoot)
  • Classification of structure elements beyond the standard types
  • ID tree for element identification
  • Automatic reading order detection: the caller controls reading order via the order of structure calls
  • Full PDF/UA-1 (ISO 14289-1) conformance: this provides the building blocks but doesn't enforce all PDF/UA rules
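Reading order comes from the order of structure calls rather than from detection. What assistive technology generally follows is the depth-first, left-to-right order of the structure tree, not the order of MCIDs or the position on the page. Here is a minimal sketch of that traversal. It assumes a hypothetical `Elem` struct whose `kids` hold either child elements or integer MCIDs, which stand in for marked-content references.

```ruby
# Hypothetical structure-tree node: kids are Elems or Integer MCIDs.
Elem = Struct.new(:tag, :kids)

# Depth-first, left-to-right walk; the visit order is the logical
# reading order. Returns [structure_type, mcid] pairs.
def reading_order(elem, out = [])
  elem.kids.each do |kid|
    if kid.is_a?(Elem)
      reading_order(kid, out)
    else
      out << [elem.tag, kid]
    end
  end
  out
end

doc = Elem.new(:Document, [
  Elem.new(:H1, [0]),
  Elem.new(:Sect, [Elem.new(:P, [2]), Elem.new(:P, [1])])
])

p reading_order(doc)
# => [[:H1, 0], [:P, 2], [:P, 1]]
```

MCID 2 is read before MCID 1 here because the caller nested it first. This is why the order of structure calls matters more than the order in which content was drawn.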

The minimum PDF version is set to 1.7 when marked: true is used. Everything here is also valid for PDF 2.0.

How does it interact with other Prawn features?

At the pdf-core level:

  • /MarkInfo << /Marked true >> on the Catalog
  • StructTreeRoot creation with ParentTree (flat Nums array number tree)
  • StructElem creation with /S (structure type), /P (parent), /K (children: both child StructElems and MCR references)
  • Per-page MCID allocation and /StructParents assignment
  • Attributes: /Alt, /ActualText, /Lang, /Scope (for TH cells), and attribute objects with /O /Table
  • BMC/BDC/EMC operator emission in content streams
  • Automatic finalization via before_render callback
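The per-page MCID allocation, `/StructParents`, and the flat `Nums` ParentTree listed above fit together as follows. Each page's `/StructParents` value is a key into the ParentTree. The value under that key is an array indexed by MCID that points back to the owning StructElem. Here is a minimal sketch of that bookkeeping. The `ParentTreeBuilder` class is hypothetical, object references are faked as `"N 0 R"` strings, and it assumes each page's key is simply its index.

```ruby
# Hypothetical bookkeeping sketch; object references are fake "N 0 R" strings.
class ParentTreeBuilder
  def initialize
    # One array per page: index = MCID, value = owning StructElem reference.
    @pages = []
  end

  # Begin a page and return the value for its /StructParents entry.
  def start_page
    @pages << []
    @pages.length - 1
  end

  # Allocate the next MCID on the current page for the given StructElem.
  def allocate_mcid(struct_elem_ref)
    page = @pages.last or raise "start_page must be called first"
    page << struct_elem_ref
    page.length - 1
  end

  # Flat /Nums array for a single-node number tree: [key value key value ...].
  def nums
    @pages.each_with_index.flat_map { |elems, key| [key, elems] }
  end
end

tree = ParentTreeBuilder.new
first_page = tree.start_page              # /StructParents 0
h1_mcid = tree.allocate_mcid("10 0 R")    # H1 on the first page
para_mcid = tree.allocate_mcid("11 0 R")  # P on the first page
second_page = tree.start_page             # /StructParents 1
cont_mcid = tree.allocate_mcid("11 0 R")  # the same P continues on page two

p([first_page, h1_mcid, para_mcid, second_page, cont_mcid])
# => [0, 0, 1, 1, 0]
p tree.nums
# => [0, ["10 0 R", "11 0 R"], 1, ["11 0 R"]]
```

MCIDs restart at 0 on every page, which is why the reverse lookup has to be keyed per page rather than by MCID alone.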

At the prawn level:

  • structure(tag, attributes, &block): creates a StructElem + marked content sequence
  • structure_container(tag, &block): creates a StructElem without marking (for containers like Table, L, whose children mark themselves)
  • artifact(type:, &block): marks decorative content
  • Convenience methods: heading, paragraph, figure
  • Document-level /Lang on the Catalog
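The split between `structure` and `structure_container` is easier to see on a table. The Table and TR elements only group their children. Each TH and TD owns one marked-content sequence. Here is a toy tree builder that mirrors that split. `StructureBuilder` and its `Node` struct are hypothetical, and the `Scope: :Column` hash stands in for the attribute object with `/O /Table` that a real PDF would carry.

```ruby
# Hypothetical toy builder; not the PR's actual classes.
class StructureBuilder
  # kids: child nodes; mcid: nil for containers, Integer for marked leaves
  Node = Struct.new(:tag, :attrs, :kids, :mcid)

  attr_reader :root

  def initialize
    @root = Node.new(:Document, {}, [], nil)
    @stack = [@root]
    @next_mcid = 0
  end

  # Container: a StructElem with no marked-content sequence of its own.
  def structure_container(tag, attrs = {})
    node = add_child(tag, attrs, nil)
    @stack.push(node)
    begin
      yield
    ensure
      @stack.pop
    end
    node
  end

  # Leaf: a StructElem that owns exactly one marked-content sequence.
  def structure(tag, attrs = {})
    node = add_child(tag, attrs, @next_mcid)
    @next_mcid += 1
    yield if block_given?
    node
  end

  private

  def add_child(tag, attrs, mcid)
    node = Node.new(tag, attrs, [], mcid)
    @stack.last.kids << node
    node
  end
end

b = StructureBuilder.new
b.structure_container(:Table) do
  b.structure_container(:TR) do
    b.structure(:TH, Scope: :Column)
    b.structure(:TH, Scope: :Column)
  end
  b.structure_container(:TR) do
    b.structure(:TD)
    b.structure(:TD)
  end
end

table = b.root.kids.first
p table.mcid
# => nil
p table.kids.map { |row| row.kids.map(&:mcid) }
# => [[0, 1], [2, 3]]
```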

The accessibility API is entirely opt-in. When marked: true is not set, no accessibility objects are created and there is no overhead.

  • Text (text, text_box, formatted_text): works unchanged inside structure blocks. The BDC/EMC operators wrap whatever content the block renders.
  • Graphics (lines, shapes, fills): also work inside structure or artifact blocks. Decorative drawing should go in artifact.
  • Images: work inside structure(:Figure, Alt: '...'), which supplies the alt text.
  • Tables (prawn-table PR): automatically emit Table/TR/TH/TD structure when tagged? is true. No API change is needed; existing pdf.table(data, header: true) calls gain tagging automatically.

structure and artifact are no-ops when tagged? is false. They just yield the block. This means code can be written once and produce either tagged or untagged PDFs depending on the document option.
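The "write once" pattern depends on `structure` simply yielding when the document is untagged. Here is a minimal sketch of that behavior on a toy document. `ToyDocument` and `render_body` are hypothetical and only record content-stream strings. The same rendering code produces marked output in one case and plain output in the other.

```ruby
# Hypothetical toy document; not Prawn::Document.
class ToyDocument
  attr_reader :ops

  def initialize(marked: false)
    @marked = marked
    @ops = []
    @next_mcid = 0
  end

  def tagged?
    @marked
  end

  # Untagged: just run the block. Tagged: wrap it in BDC/EMC.
  def structure(tag)
    return yield unless tagged?

    @ops << "/#{tag} <</MCID #{@next_mcid}>> BDC"
    @next_mcid += 1
    result = yield
    @ops << "EMC"
    result
  end

  def text(str)
    @ops << "(#{str}) Tj"
  end
end

# Written once, rendered either way.
def render_body(doc)
  doc.structure(:P) { doc.text("Body text.") }
  doc.ops
end

p render_body(ToyDocument.new(marked: true))
# => ["/P <</MCID 0>> BDC", "(Body text.) Tj", "EMC"]
p render_body(ToyDocument.new)
# => ["(Body text.) Tj"]
```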

@pointlessone
Member

I think my language fails me.

Craig, I want to make sure you understand what's going on in these PRs, not Claude. I want you to read the spec, read and understand all the code, and then type your own answers to the questions.

I want all these things because Prawn is a community project: it's made for people. It's meant to be understood by people. The code you're submitting is going to be supported by people: by me in the immediate future, but also by others who might want to expand the spec coverage or fix bugs in the future. You're offloading this support burden onto the community by submitting these PRs. I want to make sure that this code (and it's a non-trivial amount of code) can be understood by people.

I appreciate your interest in the project, your impulse to improve it, and your tokens but I care about people more. I'll leave the PRs open in hope you'll do the work but won't spend any more time on reviewing them before you do. Feel free to close the PRs if you have no intent following through.

I'm open to discussing Prawn, PDF, and code if you have any questions. The position I expressed above is not really up for discussion, but a wider discussion can be had in Discussions.

@craigmcnamara
Author

craigmcnamara commented Mar 30, 2026

Craig, I want to make sure you understand what's going on in these PRs, not Claude. I want you to read the spec, read and understand all the code, and then type your own answers to the questions.

I did type my answers and pull the links from the PDF specs to answer the questions you asked. I think you're being a bit hostile because I used Claude to prepare this pull request and I didn't attempt to hide that fact. If there are issues with using generative AI to produce contributions, I'd expect at least a note in the contribution guidelines.

At Mon Ami we've adopted Prawn as the rendering engine for our document product because it's battle-tested and gives us the fine level of control we need to produce beautiful PDFs, but we also support users who rely on accessibility tools to do their work. WCAG support and GSA Section 508 compliance are required for those users, and they're a hard requirement for us as we work with government agencies in the USA.

As mentioned in the 9-year-old issue on prawn-table, generating tagged PDFs involves so much work that most open source libraries don't implement it. I thought this would be an excellent opportunity to leverage Claude for a technically tedious effort and make it possible to produce accessible PDFs with Prawn. The resulting code seems quite reasonable, decently tested, and non-intrusive when the new flags aren't passed.

I did happen to read your blog post My Next Project Won't be FLOSS and can't help but wonder if some of that sentiment is bleeding over into the decision to reject the contribution out of hand. In the past I was the maintainer of compass-rails, so I'm familiar with, and sympathetic towards, your position, but I'd like to set that aside and focus on accessibility.

For people that use accessibility tools, a feature that is often considered an afterthought or a nice to have is the difference between them being able to do their jobs or being sidelined by inaccessible technology. In spite of the accessibility laws and guidelines, much new and existing software is not accessible in any real way. I thought a contribution like this would be more than welcomed, but that doesn't seem to be the case.

So, while you're welcome to reject it, and we're welcome to keep running my forked versions in perpetuity, I'd like to ask if there is a path to getting this merged and released for the benefit of the community and to close the accessibility ticket on prawn table before it celebrates a 10th birthday.

I'm happy to contribute time and effort toward maintenance of the accessibility features of Prawn. I didn't notice any financial contribution links to support the maintenance of Prawn, but if there are any, a financial contribution is not out of the question. Mon Ami is going to be depending on Prawn for the long term, and we'd love to support all users of Prawn and all users in need of accessible software.

@pointlessone
Member

I did not reject it. I only wanted to make sure that it is not complete slop. I appreciate you being open about using AI (though, it's kinda obvious AI was used here). I don't have an inherent issue with the use of AI, but that doesn't mean the code is going to be reviewed differently: I expect submissions to be coherent. Since you have similar experience, you understand that reviewing a thousand lines of code is no small task, and PDF is not the simplest format out there either.

I will consider adding an AI policy. This is the first submission of such kind, we didn't need a policy before. In due time.

@lindenthal

Hi @pointlessone and @craigmcnamara,
Thanks for investing time into this. I can contribute financially to this if that's helpful. Please don't feel offended.

@craigmcnamara
Author

We just shipped accessible PDFs using this code from our fork. They passed the audits in Adobe's accessibility tools as well as a manual audit by our in-house accessibility specialist, who checked for the things that automated checkers don't usually find. So the code is now in production and being used by actual people with accessibility needs.

At the very least, if we start discussing changes there is a production use case to test the changes against. There isn't any urgency, I don't mind running a fork while we decide if and how any accessibility patches can land in Prawn and pdf-core.
