Skip to content

Date32 getDateDay silently loses precision for dates outside ~1685-2255 #422

@rustyconover

Description

@rustyconover

Summary

getDateDay converts int32 day values to epoch milliseconds via 86400000 * data[index]. This overflows Number.MAX_SAFE_INTEGER for dates far from epoch, silently returning incorrect values.

Root cause

In visitor/get.mjs:

const epochDaysToMs = (data, index) => 86400000 * data[index];
const getDateDay = ({ values }, index) => epochDaysToMs(values, index);

Date32 stores days as int32, supporting a range of ±2^31 days (~5.8 million years in each direction). But 86400000 * 2147483647 = 1.855e17, which exceeds Number.MAX_SAFE_INTEGER (9.007e15).

The precision boundary is Math.floor(Number.MAX_SAFE_INTEGER / 86400000) = 104,249,991 days ≈ ±285,420 years from epoch. Dates outside roughly 283,400 BC – 287,400 AD silently return incorrect millisecond values due to floating-point precision loss.

Impact

Unlike the timestamp overflow (which throws — see #421), this silently returns wrong data. DuckDB's DATE type supports dates from 5,877,642 BC to 5,881,580 AD. DuckDB's test_all_types() produces min/max dates well outside the safe conversion range. Consumers displaying these dates get subtly wrong output with no error.

Additionally, DuckDB uses date infinity sentinels (INT32_MAX and -INT32_MAX) which also overflow when multiplied by 86400000.

Proposal

Return the raw int32 day count instead of converting to milliseconds:

const getDateDay = ({ values }, index) => values[index];

This is lossless, never overflows, and consistent with how visitInt32 already returns the raw value. Callers that want a JS Date can convert explicitly:

new Date(days * 86400000) // fine for dates within ±285k years

As with #421, this is a breaking change for code that expects get() to return epoch milliseconds. The same resolution options apply (major version bump, opt-in flag, etc.).

Context

We hit this building a DuckDB WASM frontend that renders test_all_types(). Our workaround reads raw int32 values from the underlying Int32Array:

if (typeStr.includes("Date32")) {
  const chunk = column.data?.[0] ?? column.data;
  if (chunk?.values instanceof Int32Array) {
    return chunk.values[row - (chunk.offset ?? 0)]; // raw days
  }
}

This bypasses Arrow's getter entirely. We then format using Howard Hinnant's civil calendar algorithm, which handles the full int32 day range correctly with pure arithmetic.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions