Fix HNSW InfoStream duplicate times and add per-chunk completion logging #15978
iprithv wants to merge 6 commits into apache:main
Conversation
Whoa, thank you for the quick PR @iprithv! I'll have a look soon. This would make segment traces more consumable, and then we can also more accurately measure effective HNSW concurrency for each merge. The reason I'm chasing this is I was worried about a possible concurrency bug where a merge would incorrectly remain single-threaded for its duration if, when the merge started, there were other merges temporarily consuming all HNSW worker threads. Not sure there really is a bug ... speculating by attempting to simulate code in my head ;)
mikemccand left a comment:
Thanks @iprithv! I left a few comments. There are many words but I think pretty simple changes maybe?
-  if ((node % 10000 == 0) && infoStream.isEnabled(HNSW_COMPONENT)) {
+  // Skip in-loop progress for ranges <= 10000 where it would fire at most once
+  // with identical incremental/total times (#15967).
+  if (numVectors > 10000 && (node % 10000 == 0) && infoStream.isEnabled(HNSW_COMPONENT)) {
I think maybe the original print (when this was single threaded, maybe?) is/was trying to print elapsed time since the merge kicked off, and then also relative time since the previous print?
Also, it'd be nice to keep the consistent printing every 10K vectors (whichever worker thread happens to get the lucky golden tickets). Imagine an adversarial-to-InfoStream world where a large merge kicks off, there are many threads so all chunks get pulled and worked on by a thread, but there are not enough cores, so the threads progress slowly, and we never see any logging until all chunks are done?
Maybe we could stop trying to log the delta-time (time since last log)? Just time since merge began, and, keep printing on the lucky golden ticket (node % 10000 == 0)? Then we will see regular prints, it's always since HNSW merge start, it kinda behaves similar w/ many chunks vs one chunk?
Sure, updated.
- Kept the node % 10000 print for all ranges (removed the numVectors > 10000 guard)
- Dropped the delta time; the message now just shows elapsed time since start: "built %d in %.2f ms"
- For concurrent merges, it'll now show time since merge start (not chunk start). Added a mergeStartTimeNS field that HnswConcurrentMergeBuilder.build() sets on all workers before launching them. For the non-concurrent build() path, it falls back to chunk-local start time (which is the same as build start).
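For reference, a minimal sketch of that timing pattern -- the Worker and launch shapes here are illustrative assumptions, not the actual Lucene classes:

import java.util.List;
import java.util.Locale;

class MergeTimingSketch {

  static class Worker {
    long mergeStartTimeNS; // 0 means "not launched by the concurrent builder"

    void buildChunk(int from, int to) {
      long chunkStartNS = System.nanoTime();
      // ... insert vectors [from, to) into the HNSW graph ...
      // Non-concurrent build() path: fall back to the chunk-local start time.
      long baseNS = mergeStartTimeNS != 0 ? mergeStartTimeNS : chunkStartNS;
      double elapsedMs = (System.nanoTime() - baseNS) / 1_000_000.0;
      // Progress line: elapsed is now always measured from baseNS.
      System.out.printf(Locale.ROOT, "built %d in %.2f ms%n", to, elapsedMs);
    }
  }

  static void launch(List<Worker> workers) {
    long mergeStartTimeNS = System.nanoTime();
    for (Worker w : workers) {
      w.mergeStartTimeNS = mergeStartTimeNS; // stamp the shared start before launching
    }
    // ... submit each worker to the HNSW merge executor ...
  }
}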
  }
  if (infoStream.isEnabled(HNSW_COMPONENT)) {
    long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    infoStream.message(
This is awesome! I wonder if, separately from this new InfoStream log line, we could also aggregate these times and then (when HNSW merge is completely done, upstairs) add another summary InfoStream log line stating the effective concurrency? Could maybe just be an AtomicLong that each chunk increments its elapsed time into, then upstairs at HNSW merge end just divide that value by elapsed wall clock time to get & InfoStream.message the implied concurrency?
Sure! Added a shared AtomicLong cumulativeWorkTimeNS that each chunk increments with its elapsed time. At the end of HnswConcurrentMergeBuilder.build(), after all workers finish, it logs:

  merge completed: 100000 vectors, 12500.00 ms wall clock, 42300.45 ms cumulative worker time, 3.38x effective concurrency

This should make it straightforward to detect the single-threaded merge scenario: it would show ~1.0x effective concurrency.
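A condensed sketch of that accounting -- only the arithmetic and the summary format mirror the PR; the surrounding structure is assumed:

import java.util.Locale;
import java.util.concurrent.atomic.AtomicLong;

class ConcurrencySketch {
  final AtomicLong cumulativeWorkTimeNS = new AtomicLong();

  void onChunkDone(long chunkStartNS) {
    // Each finished chunk adds its own elapsed time into the shared counter.
    cumulativeWorkTimeNS.addAndGet(System.nanoTime() - chunkStartNS);
  }

  String summary(long mergeStartNS, int numVectors) {
    double wallMs = (System.nanoTime() - mergeStartNS) / 1_000_000.0;
    double workMs = cumulativeWorkTimeNS.get() / 1_000_000.0;
    return String.format(
        Locale.ROOT,
        "merge completed: %d vectors, %.2f ms wall clock, %.2f ms cumulative worker time, %.2fx effective concurrency",
        numVectors, wallMs, workMs, workMs / wallMs);
  }
}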
Yay, this is awesome! Can't wait for the nightly build to run on this, then I'll scrutinize the InfoStream -- first light for this new metric!
  HNSW_COMPONENT,
  String.format(
      Locale.ROOT,
      "addVectors [%d %d): %d vectors in %d ms",
Maybe I am overly optimistic, but maybe we could %.2f these ms instead of %d? (And also the one in printGraphBuildStatus)? How fast are 2048 ANN searches + HNSW insert on these modern CPUs with massive SIMD silicon and native optimized dot products and such ...
Yes, individual chunks could well complete in sub-millisecond times. Updated both printGraphBuildStatus and the per-chunk completion message to use %.2f with nanoTime / 1_000_000.0 for proper sub-millisecond precision. Thanks!
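As an aside, a sketch of why the fractional conversion matters (hypothetical class and parameter names, not the Lucene code):

import java.util.Locale;

class PrecisionSketch {
  // Integer millis (%d) can round a fast 2048-vector chunk down to 0 ms;
  // fractional millis (%.2f) keep the sub-millisecond signal.
  static String chunkMessage(int from, int to, long startNs) {
    double elapsedMs = (System.nanoTime() - startNs) / 1_000_000.0;
    return String.format(
        Locale.ROOT, "addVectors [%d %d): %d vectors in %.2f ms", from, to, to - from, elapsedMs);
  }
}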
…ing (apache#15967) Signed-off-by: prithvi <prithvisivasankar@gmail.com>
4494239 to 512d5b4
mikemccand left a comment:
Looks great -- I left a couple trivial comments, then I think this is ready! Can't wait to see what nightly benchy sees as effective HNSW concurrency building the Cohere vector indices...
-  private long printGraphBuildStatus(int node, long start, long t) {
-    long now = System.nanoTime();
+  private void printGraphBuildStatus(int node, long start) {
+    double elapsedMs = (System.nanoTime() - start) / 1_000_000.0;
Can we rename start -> startNs? I think it's crucial to include units in variable names ... I've seen too many bugs over the years where person 1 thought it was ns and person 2 thought it was us and ...
And of course the Mars Climate Orbiter disaster.
Sure, done. Thanks!
  Bug Fixes
  ---------------------
* GITHUB#15967: Fix HNSW InfoStream progress to show elapsed time since merge start instead of
Maybe state that this is an IndexWriter thing? Fix IndexWriter's HNSW InfoStream...?
Yes, makes sense, added. Thanks!
Signed-off-by: prithvi <prithvisivasankar@gmail.com>
mikemccand left a comment:
This looks great -- I just had one nitpick about camelCaseContainingAcronyms consistency. Thanks @iprithv.
-  private long printGraphBuildStatus(int node, long start, long t) {
-    long now = System.nanoTime();
+  private void printGraphBuildStatus(int node, long startNs) {
+    double elapsedMs = (System.nanoTime() - startNs) / 1_000_000.0;
Hmm, can we be consistent about startNs vs chunkElapsedNS ("treat it like a word" --> chunkElapsedNs)? Darned camel-casing rules intersecting with acronyms/abbreviations... Wikipedia's CamelCase page has a whole paragraph dedicated to the conundrum.
Sure, updated. Thanks!
Description:
Fixes #15967
The printGraphBuildStatus method in HnswGraphBuilder.addVectors() reports progress as "built %d in %d/%d ms", where the two times are meant to show incremental (since the last print) and total (since start) elapsed time. However, during concurrent HNSW merging, each worker processes chunks of 2048 vectors, and the progress print fires every 10,000 nodes. This means the print fires at most once per chunk, always on the first matching node where t == start, producing identical values.

Changes
1. Suppress the broken in-loop progress for small ranges
Added a numVectors > 10000 guard to skip the in-loop progress print for ranges ≤ 10,000 vectors. These ranges can trigger the print at most once, where t == start always produces duplicate incremental/total times. For large ranges (e.g., the non-concurrent build() path with millions of vectors), the in-loop progress continues to fire every 10K nodes with correct, diverging times.

2. Add per-chunk completion logging
Every addVectors() call now emits a completion message with the range, vector count, and wall-clock elapsed time. This enables deriving effective concurrency during HNSW merges by summing all chunk times that contributed to one merge and dividing by wall-clock elapsed time, as described in #15967.
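A minimal sketch of that derivation, with illustrative names; the example values are the ones from the summary line quoted earlier in the conversation:

class EffectiveConcurrency {
  // Sum of per-chunk elapsed times divided by the merge's wall-clock time.
  // E.g. 42300.45 ms of summed chunk work over 12500.00 ms of wall clock
  // implies ~3.38x effective concurrency; ~1.0x means single-threaded.
  static double of(double[] chunkElapsedMs, double wallClockMs) {
    double totalWorkMs = 0;
    for (double ms : chunkElapsedMs) {
      totalWorkMs += ms;
    }
    return totalWorkMs / wallClockMs;
  }
}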
Output comparison
Before (concurrent merge chunk crossing a 10K boundary):
After:
Before (non-concurrent build of 1M vectors):
After: