Make BitSet no longer implement Bits.#14996
Conversation
Lucene has call sites of `Bits#get` that are more polymorphic than they should because they sometimes receive a `FixedBitSet` as `Bits`, and sometimes a `FixedBits` (the return type of `FixedBitSet#asReadOnlyBits`). This makes some call sites bimorphic when they could be monomorphic, of polymorphic when they could be bimorphic (e.g. if `SparseFixedBitSet` also get used). Given how polymorphism affects performance, I would like to fix this performance trap. One idea would be to make `BitSet` no longer implement `Bits`, ie. you could no longer pass a `FixedBitSet` when `Bits` are expected, you would need to pass the result of `FixedBitSet#asReadOnlyBits`. This is what this PR does. I had to move/copy some APIs around between `Bits`, `BitSet` and `FixedBitSet` to make it work, but it's not as invasive as I initially worried. The API change is a 11.0 change, but we could backport the bits that reduce polymorphism to 10.x, ie. making sure to pass the result of `FixedBitSet#asReadOnlyBits` rather than the `FixedBitSet` directly when `Bits` are expected.
|
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
|
@msokolov @benwtrent You may want to take a look since vector search is probably the most performance-critical code that is touched by this change due to
With this PR, dense filters and unfiltered searches both use a |
msokolov
left a comment
There was a problem hiding this comment.
Sounds good! I left a few small comments/questions
| private final CacheHelper readerCacheHelper; | ||
|
|
||
| private SoftDeletesFilterLeafReader(LeafReader reader, FixedBitSet bits, int numDocs) { | ||
| private SoftDeletesFilterLeafReader(LeafReader reader, Bits bits, int numDocs) { |
There was a problem hiding this comment.
curious why we are changing these signatures - it seems the call sites still always use FixedBitSet?
There was a problem hiding this comment.
The Bits are indeed always created bia FixedBitSet#asReadOnlyBits but I liked doing it on the call site better than in this constructor. Since it's a private constructor, it doesn't break users.
| } | ||
| if (length < bitSet.length() | ||
| && bitSet.nextSetBit(Math.max(0, length)) != DocIdSetIterator.NO_MORE_DOCS) { | ||
| throw new IllegalArgumentException("Some bits are set beyond the end of live docs"); |
There was a problem hiding this comment.
this comment refers to "live docs" but the method is generic and could apply to other kinds of bit set applications - maybe change to "beyond the end of the bit set"?
| approximateSearch( | ||
| ctx, acceptDocs.asReadOnlyBits(), cost + 1, timeLimitingKnnCollectorManager); |
There was a problem hiding this comment.
Within the HNSW format, we do this:
if (acceptDocs instanceof BitSet bitSet) {
// Use approximate cardinality as this is good enough, but ensure we don't exceed the graph
// size as that is illogical
filteredDocCount = Math.min(bitSet.approximateCardinality(), graph.size());
if (unfilteredVisit >= filteredDocCount) {
doHnsw = false;
}
}
We determine the filter size via casting the Bits to BitSet. So, I am not sure we can make this change without progressing: #15011
Or bits should give cardinality information...
|
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
|
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
|
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
Lucene has call sites of
Bits#getthat are more polymorphic than they should because they sometimes receive aFixedBitSetasBits, and sometimes aFixedBits(the return type ofFixedBitSet#asReadOnlyBits). This makes some call sites bimorphic when they could be monomorphic, of polymorphic when they could be bimorphic (e.g. ifSparseFixedBitSetalso gets used).Given how polymorphism affects performance, I would like to fix this performance trap. One idea would be to make
BitSetno longer implementBits, ie. you could no longer pass aFixedBitSetwhenBitsare expected, you would need to pass the result ofFixedBitSet#asReadOnlyBits. This is what this PR does.I had to move/copy some APIs around between
Bits,BitSetandFixedBitSetto make it work, but it's not as invasive as I initially worried.The API change is a 11.0 change, but we could backport the bits that reduce polymorphism to 10.x, ie. making sure to pass the result of
FixedBitSet#asReadOnlyBitsrather than theFixedBitSetdirectly whenBitsare expected.