Skip to content

[#10029][benchmarks] arrow-flight roundtrip as well as encode/decode #10031

Merged
alamb merged 3 commits into
apache:mainfrom
Rich-T-kid:rich-T-kid/introduce-flight-benchmarks
Jun 3, 2026
Merged

[#10029][benchmarks] arrow-flight roundtrip as well as encode/decode #10031
alamb merged 3 commits into
apache:mainfrom
Rich-T-kid:rich-T-kid/introduce-flight-benchmarks

Conversation

@Rich-T-kid

@Rich-T-kid Rich-T-kid commented May 27, 2026

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

Provides benchmarks for arrow-flight crate. benchmarks for round trip as well as encode/decode individually.

What changes are included in this PR?

Adds three criterion benches under arrow-flight/benchmarks/ (roundtrip.rs, flight_encode.rs, flight_decode.rs), each sweeping a tunable matrix of rows, cols, and column types (fixed Int64, variable StringArray, nested List, dict DictionaryArray) built via a shared
common::build_batch helper.

Are these changes tested?

n/a

Are there any user-facing changes?

no

@github-actions github-actions Bot added arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate labels May 27, 2026
@Rich-T-kid Rich-T-kid changed the title [#10029] arrow-flight roundtrip as well as encode/decode [#10029][benchmarks] arrow-flight roundtrip as well as encode/decode May 27, 2026
@Rich-T-kid Rich-T-kid force-pushed the rich-T-kid/introduce-flight-benchmarks branch from 20b0d75 to 4f6d153 Compare May 27, 2026 21:25
@Rich-T-kid

Copy link
Copy Markdown
Contributor Author

@alamb could you take a look at this when you get a chance? smaller PR that only includes benchmarks, #10029 has a bit more context. 🫡

@gabotechs gabotechs left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 nice work @Rich-T-kid! pretty clean benchmarks. Left some suggestions, but otherwise this LGTM

Comment thread arrow-flight/benchmarks/common/mod.rs
Comment thread arrow-flight/benchmarks/common/mod.rs Outdated
Comment thread arrow-flight/benchmarks/common/mod.rs

@alamb alamb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Rich-T-kid and @gabotechs

I think this looks good to me after we implement @gabotechs 's suggestion

Comment thread arrow-flight/benchmarks/common/mod.rs
Comment thread arrow-flight/Cargo.toml Outdated
@Rich-T-kid Rich-T-kid force-pushed the rich-T-kid/introduce-flight-benchmarks branch from 97bc1a9 to a0ce9a3 Compare June 2, 2026 16:41
@Rich-T-kid

Copy link
Copy Markdown
Contributor Author

pushed a revised PR @alamb. im not sure why the CI is failing. I'm running ./regen.sh but that isnt resolving it.

@alamb

alamb commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

pushed a revised PR @alamb. im not sure why the CI is failing. I'm running ./regen.sh but that isnt resolving it.

Looks like the fail is https://github.com/apache/arrow-rs/actions/runs/26834183268/job/79123485734?pr=10031

Run git diff --exit-code
diff --git a/Cargo.lock b/Cargo.lock
index 28fe6a4..9e5963a 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -348,6 +348,7 @@ dependencies = [
  "base64",
  "bytes",
  "clap",
+ "criterion",
  "futures",
  "http",
  "http-body",
@@ -1063,6 +1064,7 @@ dependencies = [
  "serde",
  "serde_json",
  "tinytemplate",
+ "tokio",
  "walkdir",
 ]

do we need to check in the Cargo.lock file 🤔

@Rich-T-kid

Rich-T-kid commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

yea the error mentions the cargo.lock file but it doesn't exist within the arrow-flight repo.

@Rich-T-kid

Rich-T-kid commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

The issue is that the cargo.lock file on main doesnt have criterion listed as a dependency for arrow-flight. But Cargo.lock is ignored due to the .gitignore so I cant push an update directly. ./arrow-flight/regen.sh is suppose to do this but it only shows up in my local environment.

@github-actions github-actions Bot added parquet Changes to the parquet crate parquet-derive parquet-variant parquet-variant* crates arrow-avro arrow-avro crate labels Jun 2, 2026
@Rich-T-kid Rich-T-kid force-pushed the rich-T-kid/introduce-flight-benchmarks branch from 92c2c64 to dbf546c Compare June 2, 2026 18:10
@github-actions github-actions Bot removed parquet Changes to the parquet crate parquet-derive parquet-variant parquet-variant* crates arrow-avro arrow-avro crate labels Jun 2, 2026
@Rich-T-kid

Copy link
Copy Markdown
Contributor Author

😰 fixed it

Comment thread arrow-flight/benches/common/mod.rs Outdated
Comment thread arrow-flight/benches/flight_decode.rs Outdated

@alamb alamb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Rich-T-kid -- I started going through this one.

I suggest:

  1. Put the code in single benchmark for now arrow-flight/benches/flight.rs
  2. Name the benchmarks after whatever flight command they are doing (put and exchange for example)
  3. Run the benchmarks locally (via cargo bench --bench <name> ) and profile them to make sure they show that the encoder is taking substantial time

Comment thread arrow-flight/benches/flight_encode.rs Outdated
let mut client = FlightClient::new(channel);
let frames = FlightDataEncoderBuilder::new().build(futures::stream::iter([Ok(batch)]));
let _: Vec<_> = client
.do_put(frames)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to be benchmarking do_put not really flight_encode 🤔

@Rich-T-kid Rich-T-kid Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally I wanted to do an end-to-end benchmark, but I think it makes more sense to have smaller, more focused benchmarks. I updated the benchmarks to include one round-trip benchmark and another that benchmarks FlightDataEncoder::encode_batch() directly. Other benchmarks can be added later organically.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image 6-2-26 at 3 57 PM majority of the time is being spent int arrow-ipc as expected, this is what #10044 aims to address. its not directly wired up to to arrow-flight yet, but the results seems to be good so far.

@Rich-T-kid

Rich-T-kid commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

CI is finally passing, sorry about that! @alamb should be good now, lmk if theres anything else

@alamb alamb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Rich-T-kid -- while it did take some back and forth I think this benchmark will now let us improve things substnatially.

I agree with your assessment that there is a lot of copying going on -- it will be great to see PRs to avoid it

@alamb

alamb commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Thank you aslo @gabotechs for your review

@alamb alamb merged commit f001b22 into apache:main Jun 3, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants