[CH][Feat] EC-safe reads for Parquet/ORC and libhdfs3 striped reader #11977

Draft
zhanglistar wants to merge 1 commit into apache:main from zhanglistar:feat/support-ec-read

zhanglistar (Contributor) commented Apr 22, 2026

  • HDFS ReadBufferBuilder: do not trust the Substrait properties.filesize value for Parquet/ORC files (including Iceberg); always stat the file to get its real length before locating the Parquet footer or ORC postscript.
  • Parquet: prefer RandomAccessFileFromRandomAccessReadBuffer when readBigAt is available, so that the footer ReadAt uses pread rather than seek+read.
  • ORC (Gluten): the OrcUtil path mirrors the same Arrow RandomAccessFile choice where applicable.
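
The first two points can be illustrated with a minimal sketch. It is written in Python against a local file, not in the C++ of the ClickHouse backend, and it is not the code in this PR. It assumes a POSIX system (`os.pread`), and the names `read_footer_length`, `write_fake_parquet`, and `hinted_size` are hypothetical. It relies only on the Parquet file layout, in which the last 8 bytes are a 4-byte little-endian footer length followed by the `PAR1` magic. The tail offset is derived from `fstat`, never from a caller-supplied size, and the read is positional, so the descriptor's offset is not moved.

```python
import os
import struct
import tempfile

PARQUET_MAGIC = b"PAR1"
TAIL_LEN = 8  # 4-byte little-endian footer length + 4-byte magic


def read_footer_length(path, hinted_size=None):
    """Return (real_size, footer_len) for a Parquet-style file.

    hinted_size is a hypothetical stand-in for a planner-supplied value such
    as properties.filesize. It is ignored on purpose: the tail offset is
    always derived from fstat().
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        real_size = os.fstat(fd).st_size
        if real_size < TAIL_LEN:
            raise ValueError("file too small to hold a Parquet tail")
        # Positional read: no lseek, the descriptor's offset is untouched.
        tail = os.pread(fd, TAIL_LEN, real_size - TAIL_LEN)
        if len(tail) != TAIL_LEN or tail[4:] != PARQUET_MAGIC:
            raise ValueError("missing PAR1 magic at end of file")
        (footer_len,) = struct.unpack("<I", tail[:4])
        return real_size, footer_len
    finally:
        os.close(fd)


def write_fake_parquet(path, footer_len):
    """Write a toy file with a valid Parquet tail (not real Parquet metadata)."""
    with open(path, "wb") as f:
        f.write(PARQUET_MAGIC)                  # leading magic
        f.write(b"\x00" * 64)                   # placeholder column data
        f.write(b"\x01" * footer_len)           # placeholder footer bytes
        f.write(struct.pack("<I", footer_len))  # footer length, little-endian
        f.write(PARQUET_MAGIC)                  # trailing magic
```

Calling `read_footer_length(path, hinted_size=10)` on a file produced by `write_fake_parquet` still finds the tail, because the incorrect hint never enters the offset calculation.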

What changes are proposed in this pull request?

How was this patch tested?

Manual test, CI.

Was this patch authored or co-authored using generative AI tooling?

Cursor

@zhanglistar zhanglistar marked this pull request as draft April 22, 2026 09:18
@github-actions

Run Gluten Clickhouse CI on x86
