Support PDF 2.0 dates and tighten date validation #631

Open
thomasbuilds wants to merge 1 commit into GrapheneOS:main from thomasbuilds:fix-pdf-2.0-date-format

@thomasbuilds
Contributor

closes #629

@thomasbuilds thomasbuilds force-pushed the fix-pdf-2.0-date-format branch from 24ce6ed to 6724ffe on April 27, 2026 08:39
…lidation

Support the PDF 2.0 (ISO 32000-2) date format, which omits the trailing
apostrophe that PDF 1.7 required. Per section 7.9.4, continue accepting
the older format. The regex also tolerates a missing apostrophe between the
offset hours and minutes for additional leniency (matching pdf.js behavior).
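
To illustrate the three accepted spellings of the offset, here is a minimal sketch of a pattern of this kind. It is not the regex from this commit: the class name `PdfDatePatternSketch`, the group layout, and the choice to require the `D:` prefix are all assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the pattern used in this PR.
final class PdfDatePatternSketch {
    // Groups: 1 year, 2 month, 3 day, 4 hour, 5 minute, 6 second,
    //         7 offset sign (+, - or Z), 8 offset hours, 9 offset minutes.
    // Every field after the year is optional. Both apostrophes are optional,
    // so the PDF 1.7 form (+08'00'), the PDF 2.0 form (+08'00) and the
    // lenient form (+0800) all match.
    static final Pattern PDF_DATE = Pattern.compile(
            "D:(\\d{4})(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?"
            + "(?:([Z+-])(?:(\\d{2})'?(?:(\\d{2})'?)?)?)?");

    // Returns a matcher positioned on a full match, or null if the input is rejected.
    static Matcher match(String input) {
        Matcher matcher = PDF_DATE.matcher(input);
        return matcher.matches() ? matcher : null;
    }
}
```

Because the seconds group only matches two digits, an input such as `D:202304151200-08'00` leaves group 6 empty and puts `-` in the sign group instead of treating it as part of the seconds.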

The regex-based implementation replaces the string position tracking and fixes
several existing bugs:
- Date strings with optional fields omitted (e.g., no seconds) now parse
  correctly. The old parser incorrectly interpreted the timezone sign as part
  of the seconds field when seconds were absent.
- Month and day are now strictly validated: month must be in the 0-11 range
  (values < 0 were previously not rejected) and day in the 1-31 range (values
  < 1 are now rejected).
- A 'Z' (UTC) indicator followed by a non-zero offset is now rejected, as per
  spec.
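
The range and 'Z' checks listed above could look roughly like the following sketch. The class and method names are hypothetical, the month is assumed to be already converted to a zero-based value, and only the checks named in this message are shown.

```java
// Hypothetical sketch of the validation rules described above.
final class PdfDateValidationSketch {
    // month is zero-based (0 = January), day is one-based.
    // sign is '+', '-' or 'Z'; the offset values are 0 when absent.
    static boolean isValid(int month, int day, char sign,
                           int offsetHours, int offsetMinutes) {
        if (month < 0 || month > 11) {
            return false; // rejects negative months as well as months past December
        }
        if (day < 1 || day > 31) {
            return false; // rejects day 0 as well as day 32 and above
        }
        if (sign == 'Z' && (offsetHours != 0 || offsetMinutes != 0)) {
            return false; // 'Z' means UTC, so a non-zero offset is contradictory
        }
        return true;
    }
}
```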

Remove dependency on android.text.TextUtils by replacing isDigitsOnly() checks
with regex validation.

Fixes GrapheneOS#629
@thomasbuilds thomasbuilds force-pushed the fix-pdf-2.0-date-format branch from 6724ffe to d28e515 on April 27, 2026 10:42
@thomasbuilds thomasbuilds changed the title from "Accept PDF 2.0 date strings without trailing apostrophe" to "Support PDF 2.0 dates and improve date validation" on Apr 27, 2026
@RankoR RankoR self-requested a review April 27, 2026 19:11
@thomasbuilds
Contributor Author

I noticed a couple of preexisting issues in parseDate. They may be worth handling in a separate PR, since this PR is mainly about accepting the PDF 2.0 date syntax, but they’re relevant to the rewritten parser.

1. UTC offset handling is semantically inverted.

The current arithmetic, preserved from main, does:

case "-":
    hours -= offsetHours;
    minutes -= offsetMinutes;
    break;
case "+":
    hours += offsetHours;
    minutes += offsetMinutes;
    break;

Per ISO 32000-2 §7.9.4, an offset of -08'00 means the local time is 8 hours behind UTC, so normalizing to UTC should add 8 hours, not subtract them. pdf.js applies the offset in the opposite direction from our code:

https://github.com/mozilla/pdf.js/blob/v5.6.205/src/display/display_utils.js#L551-L561

However, just flipping the signs would not fully fix it, because Calendar.getInstance() uses the device timezone. If we normalize the fields to UTC and then set them on a default-timezone Calendar, the resulting instant is still wrong. A complete fix would probably need to build the calendar in UTC, then format the resulting Date in the default timezone, for example:

final Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
// set fields and apply offset to normalize to UTC
return DateFormat.getDateTimeInstance(...).format(calendar.getTime());
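
A slightly fuller sketch of that idea, under some assumptions: the class and method names are hypothetical, the fields are taken as already parsed and validated, and the month is zero-based as `Calendar` expects.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical sketch; not the code in this PR.
final class PdfDateUtcSketch {
    // sign is '+', '-' or 'Z'. month is zero-based (0 = January).
    static Date toInstant(int year, int month, int day,
                          int hour, int minute, int second,
                          char sign, int offsetHours, int offsetMinutes) {
        // Interpret the fields on a UTC calendar so the device timezone
        // does not shift the result.
        Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        calendar.clear(); // also zeroes the milliseconds field
        calendar.set(year, month, day, hour, minute, second);

        int offsetTotalMinutes = offsetHours * 60 + offsetMinutes;
        if (sign == '-') {
            // Local time is behind UTC, so UTC = local + offset.
            calendar.add(Calendar.MINUTE, offsetTotalMinutes);
        } else if (sign == '+') {
            // Local time is ahead of UTC, so UTC = local - offset.
            calendar.add(Calendar.MINUTE, -offsetTotalMinutes);
        }
        return calendar.getTime();
    }
}
```

The returned `Date` is an absolute instant, so passing it to a `DateFormat` obtained from `DateFormat.getDateTimeInstance(...)` would render it in the device's default timezone.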

2. Calendar leniency accepts impossible dates.

D:20230231 (February 31, 2023) is currently accepted and silently rolls over to March 3. If we want strict validation, we’d need something like:

calendar.setLenient(false);
calendar.getTime();
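
As a self-contained illustration of the non-lenient check (a sketch with hypothetical names, not code from this PR): with leniency disabled, `set` itself does not throw, and the out-of-range field is only reported as an `IllegalArgumentException` when the time is computed.

```java
import java.util.Calendar;
import java.util.TimeZone;

// Hypothetical sketch of strict calendar validation.
final class StrictCalendarSketch {
    // month is zero-based (0 = January), as Calendar expects.
    static boolean isRealDate(int year, int month, int day) {
        Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        calendar.clear();
        calendar.setLenient(false);
        calendar.set(year, month, day);
        try {
            calendar.getTime(); // forces field validation in non-lenient mode
            return true;
        } catch (IllegalArgumentException e) {
            return false; // for example February 31
        }
    }
}
```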

@thomasbuilds thomasbuilds changed the title from "Support PDF 2.0 dates and improve date validation" to "Support PDF 2.0 dates and tighten date validation" on May 5, 2026
Development

Successfully merging this pull request may close these issues.

accept PDF 2.0 (and 1.7) dates