Fix undercounting of RAM used by vectors buffered in in-memory segments by iprithv · Pull Request #15982 · apache/lucene

iprithv · 2026-04-24T21:01:19Z

Description

Vector RAM accounting in ramBytesUsed() had three bugs. Their net effect was that IndexWriter undercounted the memory used by buffered vectors, which delayed flush decisions and led to higher-than-expected memory consumption.

Bugs Fixed

Fixes #15901

1. BufferingKnnVectorsWriter hardcoded Float.BYTES for all encodings

Byte vectors (VectorEncoding.BYTE) were reported at 4x their actual size because ramBytesUsed() always multiplied by Float.BYTES (4) instead of Byte.BYTES (1). For byte vectors this is an overcount rather than an undercount, but it is still wrong: it masks the undercounting elsewhere and produces incorrect flush thresholds.
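To make the 4x discrepancy concrete, here is a minimal sketch of the two estimates. It is not the code from this PR: `VectorRamSketch`, its `Encoding` enum, and both method names are hypothetical stand-ins, and object and array headers are ignored.

```java
// Illustrative sketch only: these names are hypothetical, not Lucene's actual classes.
final class VectorRamSketch {

  /** Stand-in for a vector encoding that knows its per-element width in bytes. */
  enum Encoding {
    BYTE(Byte.BYTES),
    FLOAT32(Float.BYTES);

    final int byteSize;

    Encoding(int byteSize) {
      this.byteSize = byteSize;
    }
  }

  /** Old-style estimate: treats every element as a 4-byte float, whatever the encoding. */
  static long hardcodedBytes(int vectorCount, int dimension) {
    return (long) vectorCount * dimension * Float.BYTES;
  }

  /** Encoding-aware estimate: multiplies by the width of the field's actual encoding. */
  static long encodingAwareBytes(Encoding encoding, int vectorCount, int dimension) {
    return (long) vectorCount * dimension * encoding.byteSize;
  }
}
```

For 1,000 byte vectors of dimension 128, the hardcoded estimate gives 512,000 bytes while the encoding-aware one gives 128,000. For FLOAT32 fields the two agree.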

2. Quantized writers never counted rawVectorDelegate RAM

Lucene104ScalarQuantizedVectorsWriter, Lucene99ScalarQuantizedVectorsWriter, and Lucene102BinaryQuantizedVectorsWriter all wrap a rawVectorDelegate (Lucene99FlatVectorsWriter). For FLOAT32 fields, the delegate's field-level data was counted indirectly through FieldWriter.flatFieldVectorsWriter.ramBytesUsed(). But for BYTE fields, which bypass the quantized FieldWriter entirely, the delegate was never queried, making byte vector RAM completely invisible (48 bytes reported for hundreds of KB of actual data).

Refactored all three writers to call rawVectorDelegate.ramBytesUsed() at the writer level for all flat vector data, and quantizationOverheadBytesUsed() for quantization-specific state (magnitudes, dimensionSums) to avoid double-counting.
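The accounting shape described above can be sketched roughly as follows. This is a simplified model, not the actual writer code: `QuantizedRamSketch`, `RawDelegate`, `QuantizedField`, and the `add*` helpers are hypothetical, the per-vector overhead of one float is an assumption made for illustration, and object headers are ignored.

```java
// Illustrative sketch only: every type here is a hypothetical stand-in, not Lucene code.
final class QuantizedRamSketch {

  /** Stand-in for the raw flat-vector writer that buffers the actual vector data. */
  static final class RawDelegate {
    private long bufferedBytes;

    void addVector(int dimension, int bytesPerElement) {
      bufferedBytes += (long) dimension * bytesPerElement;
    }

    long ramBytesUsed() {
      return bufferedBytes;
    }
  }

  /** Stand-in for a per-field quantized writer; only FLOAT32 fields get one. */
  static final class QuantizedField {
    private int vectorCount;

    /** Quantization-only state: assumed here to be one float per buffered vector. */
    long quantizationOverheadBytesUsed() {
      return (long) vectorCount * Float.BYTES;
    }
  }

  private final RawDelegate rawVectorDelegate = new RawDelegate();
  private final java.util.List<QuantizedField> fields = new java.util.ArrayList<>();

  QuantizedField newFloatField() {
    QuantizedField field = new QuantizedField();
    fields.add(field);
    return field;
  }

  /** BYTE vectors bypass the quantized field writer and reach only the delegate. */
  void addByteVector(int dimension) {
    rawVectorDelegate.addVector(dimension, Byte.BYTES);
  }

  /** FLOAT32 vectors are buffered by the delegate and tracked by their field. */
  void addFloatVector(QuantizedField field, int dimension) {
    rawVectorDelegate.addVector(dimension, Float.BYTES);
    field.vectorCount++;
  }

  /** Flat data is counted once via the delegate; fields add only quantization state. */
  long ramBytesUsed() {
    long total = rawVectorDelegate.ramBytesUsed();
    for (QuantizedField field : fields) {
      total += field.quantizationOverheadBytesUsed();
    }
    return total;
  }
}
```

Because the delegate is queried at the writer level, byte vectors that never reach a `QuantizedField` are still counted. Because fields report only their overhead, float data is not counted twice.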

3. dimensionSums array not counted

The float[dimension] array used for centroid calculation during flush was not included in ramBytesUsed() for Lucene104ScalarQuantizedVectorsWriter and Lucene102BinaryQuantizedVectorsWriter.
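As a rough illustration of what was missing, the sketch below keeps a per-dimension running sum for a centroid and includes that array in its reported overhead. The class `CentroidStateSketch` and its methods are hypothetical, and only the array payload is counted (the array header, whose size depends on the JVM, is left out).

```java
// Illustrative sketch only: hypothetical names, payload bytes only (no array header).
final class CentroidStateSketch {
  private final float[] dimensionSums; // running per-dimension sums used for the centroid
  private int vectorCount;

  CentroidStateSketch(int dimension) {
    this.dimensionSums = new float[dimension];
  }

  void addVector(float[] vector) {
    if (vector.length != dimensionSums.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    for (int i = 0; i < vector.length; i++) {
      dimensionSums[i] += vector[i];
    }
    vectorCount++;
  }

  /** Mean of all added vectors; all zeros when nothing has been added. */
  float[] centroid() {
    float[] result = new float[dimensionSums.length];
    if (vectorCount == 0) {
      return result;
    }
    for (int i = 0; i < result.length; i++) {
      result[i] = dimensionSums[i] / vectorCount;
    }
    return result;
  }

  /** The sums array is allocated up front, so its size is reported even with no vectors. */
  long quantizationOverheadBytesUsed() {
    return (long) dimensionSums.length * Float.BYTES;
  }
}
```

For a 768-dimensional field this adds 3,072 bytes per field. That is small next to the vector data, but it is memory the writer holds for the life of the in-memory segment.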

iprithv · 2026-04-28T15:15:26Z

@mikemccand could you please take a look at this when you get a chance? Thanks!

github-actions Bot added module:core/codecs module:test-framework labels Apr 24, 2026

github-actions Bot added this to the 11.0.0 milestone Apr 24, 2026

Fix undercounting of RAM used by vectors buffered in in-memory segmen…

918fa44

…ts (apache#15901)

iprithv force-pushed the fix/vector-ram-accounting-undercount branch from f97d7cf to 918fa44 on April 24, 2026 21:11

Merge branch 'main' into fix/vector-ram-accounting-undercount

d87ecfa

iprithv wants to merge 2 commits into apache:main from iprithv:fix/vector-ram-accounting-undercount
