Fix HNSW InfoStream duplicate times and add per-chunk completion logging #15978

Open
iprithv wants to merge 6 commits into apache:main from iprithv:fix-hnsw-infostream-duplicate-times

Conversation

Contributor

@iprithv iprithv commented Apr 23, 2026

Description:

Fixes #15967

The printGraphBuildStatus method in HnswGraphBuilder.addVectors() reports progress as "built %d in %d/%d ms", where the two times are meant to show the incremental elapsed time (since the last print) and the total elapsed time (since the start). However, during concurrent HNSW merging each worker processes chunks of 2048 vectors, while the progress print fires only every 10,000 nodes. The print therefore fires at most once per chunk, always on the first matching node, where t == start, so the two values are always identical:

HNSW: built 950000 in 11447/11447 ms   ← always duplicate
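The collapse of the two times can be reproduced with a minimal, self-contained sketch of the print logic (the class and method names below are hypothetical illustrations, not the actual Lucene code):

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

// Sketch of why the in-loop progress print duplicates its two times on small
// ranges. The print subtracts both a "last print" timestamp (t in the real
// code) and the range start from now; when the print fires at most once per
// chunk, the last-print timestamp still equals start and the deltas match.
public class DuplicateTimesSketch {
  static String progressLine(int node, long startNs, long lastPrintNs, long nowNs) {
    long sinceLastMs = TimeUnit.NANOSECONDS.toMillis(nowNs - lastPrintNs);
    long totalMs = TimeUnit.NANOSECONDS.toMillis(nowNs - startNs);
    return String.format(Locale.ROOT, "built %d in %d/%d ms", node, sinceLastMs, totalMs);
  }

  public static void main(String[] args) {
    long startNs = 0L;
    // A 2048-vector chunk crosses at most one 10K boundary, so the print
    // fires at most once and the last-print time is still equal to start:
    System.out.println(
        progressLine(950_000, startNs, startNs, TimeUnit.MILLISECONDS.toNanos(11_447)));
    // -> built 950000 in 11447/11447 ms  (incremental == total)
  }
}
```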

Changes

1. Suppress the broken in-loop progress for small ranges

Added a numVectors > 10000 guard that skips the in-loop progress print for ranges of ≤ 10,000 vectors. Such a range can trigger the print at most once, and since t == start on that first firing, the incremental and total times are always duplicates. For large ranges (e.g., the non-concurrent build() path with millions of vectors), the in-loop progress continues to fire every 10K nodes with correctly diverging times.

2. Add per-chunk completion logging

Every addVectors() call now emits a completion message with the range, vector count, and wall-clock elapsed time:

HNSW: addVectors [950000 952048): 2048 vectors in 47 ms

This enables deriving effective concurrency during HNSW merges by summing all chunk times that contributed to one merge and dividing by wall-clock elapsed time, as described in #15967.
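As a sketch of that derivation (the helper below is hypothetical post-processing of the log lines, not part of the PR), summing the per-chunk times and dividing by the merge's wall-clock time gives the implied concurrency:

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper: derive effective concurrency for one HNSW merge from
// the per-chunk elapsed times reported by the new addVectors completion lines.
public class EffectiveConcurrency {
  static double effectiveConcurrency(List<Long> chunkMillis, long wallClockMillis) {
    long cumulative = chunkMillis.stream().mapToLong(Long::longValue).sum();
    return (double) cumulative / wallClockMillis;
  }

  public static void main(String[] args) {
    // Four ~47 ms chunks finishing within 50 ms of wall clock imply close
    // to 4x concurrency:
    double c = effectiveConcurrency(List.of(47L, 47L, 47L, 47L), 50L);
    System.out.printf(Locale.ROOT, "%.2fx effective concurrency%n", c);
    // -> 3.76x effective concurrency
  }
}
```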

Output comparison

Before (concurrent merge chunk crossing a 10K boundary):

HNSW: addVectors [950000 952048)
HNSW: built 950000 in 11447/11447 ms    ← duplicate times

After:

HNSW: addVectors [950000 952048): 2048 vectors in 11450 ms

Before (non-concurrent build of 1M vectors):

HNSW: build graph from 1000000 vectors
HNSW: addVectors [0 1000000)
HNSW: built 0 in 0/0 ms
HNSW: built 10000 in 150/150 ms
HNSW: built 20000 in 140/290 ms
...

After:

HNSW: build graph from 1000000 vectors
HNSW: built 0 in 0/0 ms
HNSW: built 10000 in 150/150 ms
HNSW: built 20000 in 140/290 ms
...
HNSW: addVectors [0 1000000): 1000000 vectors in 45000 ms

github-actions bot added this to the 11.0.0 milestone Apr 23, 2026
@mikemccand
Member

Whoa, thank you for the quick PR @iprithv! I'll have a look soon. This would make segment traces more consumable, and then we can also more accurately measure effective HNSW concurrency for each merge.

The reason I'm chasing this is that I was worried about a possible concurrency bug where a merge would incorrectly remain single-threaded for its duration if, when the merge started, other merges were temporarily consuming all HNSW worker threads. Not sure there really is a bug ... speculating by attempting to simulate code in my head ;)

Member

@mikemccand mikemccand left a comment


Thanks @iprithv! I left a few comments. There are many words but I think pretty simple changes maybe?

if ((node % 10000 == 0) && infoStream.isEnabled(HNSW_COMPONENT)) {
// Skip in-loop progress for ranges <= 10000 where it would fire at most once
// with identical incremental/total times (#15967).
if (numVectors > 10000 && (node % 10000 == 0) && infoStream.isEnabled(HNSW_COMPONENT)) {
Member


I think maybe the original print (when this was single threaded, maybe?) is/was trying to print elapsed time since the merge kicked off, and then also relative time since the previous print?
Also, it'd be nice to keep the consistent printing every 10K vectors (whichever worker thread happens to get the lucky golden tickets). Imagine an adversarial-to-InfoStream world where a large merge kicks off, there are many threads so all chunks get pulled and worked on by a thread, but there are not enough cores, so the threads progress slowly, and we never see any logging until all chunks are done?

Maybe we could stop trying to log the delta-time (time since last log)? Just time since merge began, and, keep printing on the lucky golden ticket (node % 10000 == 0)? Then we will see regular prints, it's always since HNSW merge start, it kinda behaves similar w/ many chunks vs one chunk?

Contributor Author


Sure, updated.

  1. Kept the node % 10000 for all ranges (removed the numVectors > 10000 guard)
  2. Dropped the delta time, now just shows elapsed time since start: "built %d in %.2f ms"
  3. For concurrent merges, it'll now show time since merge start (not chunk start). Added a mergeStartTimeNS field that HnswConcurrentMergeBuilder.build() sets on all workers before launching them. For the non-concurrent build() path, it falls back to chunk-local start time (which is the same as build start).
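A minimal sketch of that propagation, under hypothetical names (WorkerBuilder, setMergeStartTimeNs); the real fields live in HnswGraphBuilder and HnswConcurrentMergeBuilder:

```java
// Sketch: a shared merge start time is set on all worker builders before they
// are launched, so each worker's progress line reports elapsed time since the
// merge began rather than since its own chunk began. Names are illustrative.
class WorkerBuilder {
  private long mergeStartTimeNs = -1; // -1: fall back to chunk-local start

  void setMergeStartTimeNs(long ns) {
    this.mergeStartTimeNs = ns;
  }

  // elapsed ms since merge start if set, else since the chunk's own start
  double elapsedMs(long chunkStartNs) {
    long base = mergeStartTimeNs >= 0 ? mergeStartTimeNs : chunkStartNs;
    return (System.nanoTime() - base) / 1_000_000.0;
  }
}

public class ConcurrentMergeSketch {
  static void build(WorkerBuilder[] workers) {
    long mergeStartNs = System.nanoTime();
    for (WorkerBuilder w : workers) {
      w.setMergeStartTimeNs(mergeStartNs); // set before launching workers
    }
    // ... launch workers on the executor ...
  }

  public static void main(String[] args) {
    WorkerBuilder[] workers = {new WorkerBuilder(), new WorkerBuilder()};
    build(workers);
    // Both workers now measure from the same shared merge start time.
    System.out.printf(
        java.util.Locale.ROOT, "%.2f ms since merge start%n",
        workers[0].elapsedMs(System.nanoTime()));
  }
}
```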

}
if (infoStream.isEnabled(HNSW_COMPONENT)) {
long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
infoStream.message(
Member


This is awesome! I wonder if, separately from this new InfoStream log line, we could also aggregate these times and then (when HNSW merge is completely done, upstairs) add another summary InfoStream log line stating the effective concurrency? Could maybe just be an AtomicLong that each chunk increments its elapsed time into, then upstairs at HNSW merge end just divide that value by elapsed wall clock time to get & InfoStream.message the implied concurrency?

Contributor Author


Sure! Added a shared AtomicLong cumulativeWorkTimeNS that each chunk increments with its elapsed time. At the end of HnswConcurrentMergeBuilder.build(), after all workers finish, it logs:

merge completed: 100000 vectors, 12500.00 ms wall clock, 42300.45 ms cumulative worker time, 3.38x effective concurrency

This should make it straightforward to detect the single-threaded merge scenario: it would show ~1.0x effective concurrency.
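A sketch of that aggregation under illustrative names (the comment above names the actual field cumulativeWorkTimeNS in HnswConcurrentMergeBuilder; everything else here is hypothetical):

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: each chunk adds its elapsed nanos into a shared AtomicLong; after
// all workers finish, the merge logs cumulative worker time vs wall clock
// as the implied (effective) concurrency.
public class MergeTimingSketch {
  final AtomicLong cumulativeWorkTimeNs = new AtomicLong();

  void onChunkDone(long chunkElapsedNs) {
    cumulativeWorkTimeNs.addAndGet(chunkElapsedNs); // thread-safe accumulation
  }

  String summaryLine(int numVectors, long wallClockNs) {
    double wallMs = wallClockNs / 1_000_000.0;
    double workMs = cumulativeWorkTimeNs.get() / 1_000_000.0;
    return String.format(
        Locale.ROOT,
        "merge completed: %d vectors, %.2f ms wall clock, %.2f ms cumulative worker time, %.2fx effective concurrency",
        numVectors, wallMs, workMs, workMs / wallMs);
  }

  public static void main(String[] args) {
    MergeTimingSketch m = new MergeTimingSketch();
    m.onChunkDone(2_000_000L); // two 2 ms chunks overlapping in 2 ms of wall
    m.onChunkDone(2_000_000L); // clock imply 2.00x effective concurrency
    System.out.println(m.summaryLine(4096, 2_000_000L));
  }
}
```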

Member


Yay, this is awesome! Can't wait for nightly build to run on this then I scrutinize the InfoStream -- first light for this new metric!

HNSW_COMPONENT,
String.format(
Locale.ROOT,
"addVectors [%d %d): %d vectors in %d ms",
Member


Maybe I am overly optimistic, but maybe we could %.2f these ms instead of %d? (And also the one in printGraphBuildStatus)? How fast are 2048 ANN searches + HNSW insert on these modern CPUs with massive SIMD silicon and native optimized dot products and such ...

Contributor Author

@iprithv iprithv Apr 24, 2026


Yes, individual chunks could well complete in sub-millisecond times. Updated both printGraphBuildStatus and the per-chunk completion message to use %.2f with nanoTime / 1_000_000.0 for proper sub-millisecond precision. Thanks!
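The resulting formatting can be sketched as a standalone class (hypothetical here; the real change is inside HnswGraphBuilder): divide the nanosecond delta by 1_000_000.0 rather than truncating through TimeUnit, then print with %.2f.

```java
import java.util.Locale;

// Sketch of the sub-millisecond per-chunk completion formatting. A floating
// division keeps fractional milliseconds that TimeUnit.toMillis would drop.
public class SubMillisFormat {
  static String chunkLine(int from, int to, int count, long elapsedNs) {
    double elapsedMs = elapsedNs / 1_000_000.0;
    return String.format(
        Locale.ROOT, "addVectors [%d %d): %d vectors in %.2f ms", from, to, count, elapsedMs);
  }

  public static void main(String[] args) {
    // A fast chunk completing in 850 microseconds is no longer rounded to 0 ms:
    System.out.println(chunkLine(950_000, 952_048, 2048, 850_000L));
    // -> addVectors [950000 952048): 2048 vectors in 0.85 ms
  }
}
```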

…ing (apache#15967)

Signed-off-by: prithvi <prithvisivasankar@gmail.com>
@iprithv iprithv force-pushed the fix-hnsw-infostream-duplicate-times branch from 4494239 to 512d5b4 Compare April 24, 2026 19:16
@iprithv iprithv requested a review from mikemccand April 24, 2026 19:30
Member

@mikemccand mikemccand left a comment


Looks great -- I left a couple trivial comments, then I think this is ready! Can't wait to see what nightly benchy sees as effective HNSW concurrency building the Cohere vector indices...

private long printGraphBuildStatus(int node, long start, long t) {
long now = System.nanoTime();
private void printGraphBuildStatus(int node, long start) {
double elapsedMs = (System.nanoTime() - start) / 1_000_000.0;
Member


Can we rename start -> startNs? I think it's crucial to include units in variable names ... I've seen too many bugs over the years where person 1 thought it was ns and person 2 thought it was us and ...

And of course the Mars Climate Orbiter disaster.

Contributor Author


Sure, done. Thanks!

Comment thread lucene/CHANGES.txt Outdated

Bug Fixes
---------------------
* GITHUB#15967: Fix HNSW InfoStream progress to show elapsed time since merge start instead of
Member


Maybe state that this is an IndexWriter thing? Fix IndexWriter's HNSW InfoStream...?

Contributor Author


Yes, makes sense, added. Thanks!

iprithv added 2 commits April 28, 2026 02:19
Signed-off-by: prithvi <prithvisivasankar@gmail.com>
@iprithv iprithv requested a review from mikemccand April 27, 2026 20:52
Member

@mikemccand mikemccand left a comment


This looks great -- I just had one nitpick about camelCaseContainingAcronyms consistency. Thanks @iprithv.

private long printGraphBuildStatus(int node, long start, long t) {
long now = System.nanoTime();
private void printGraphBuildStatus(int node, long startNs) {
double elapsedMs = (System.nanoTime() - startNs) / 1_000_000.0;
Member


Hmm, can we be consistent about startNs vs chunkElapsedNS ("treat it like a word" --> chunkElapsedNs)? Darned camel-casing rules intersecting with acronyms/abbreviations... Wikipedia's CamelCase page has a whole paragraph dedicated to the conundrum.

Contributor Author


Sure, updated. Thanks!

@iprithv iprithv requested a review from mikemccand April 28, 2026 15:11

Successfully merging this pull request may close these issues.

The InfoStream progress lines from concurrent HNSW merging report duplicate times