Use FixedBitSet#cardinality for counting liveDocs in CheckIndex #15045
easyice merged 4 commits into apache:main
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
jpountz
left a comment
FYI I have an in-progress PR that would break this optimization: #14996. Furthermore, in the typical case, live docs are not an instance of FixedBitSet but of FixedBits (the result of FixedBitSet#asReadOnlyBits), so I don't think it would help much?
|
Thanks for the review Adrien, sorry for not making it clear, this change also use I had considered another approach: placing the new |
jpountz
left a comment
Thanks for explaining. I'm not too fond of the approach: it looks like you'd really want to add int Bits#cardinality(), but also don't want to add it in order to keep Bits lean (which I appreciate). It still looks a bit odd to me.
If we'd like to speed these things up, maybe we should allocate a FixedBitSet(1024), copy the content of the Bits into this FixedBitSet using applyMask and then call cardinality() on the FixedBitSet?
|
It's a nice idea! Although it requires allocating an Here are some JMH numbers:
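import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.FixedBitSet;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 5, time = 5)
@Fork(1)
public class FixedBitSetBenchmark {

  @Param({"1024"})
  private int size;

  @Param({"0.5"})
  private float density; // target fraction of bits set to 1

  private Bits bits; // FixedBitSet#asReadOnlyBits
  private Bits fallbackBits; // plain Bits implementation, not backed by a FixedBitSet

  @Setup(Level.Trial)
  public void setup() {
    FixedBitSet bitSet = new FixedBitSet(size);
    int numSet = (int) (size * density);
    if (numSet == size) {
      bitSet.set(0, size);
    } else if (numSet > 0) {
      Random random = new Random(0);
      for (int i = 0; i < numSet; i++) {
        // Random indices may repeat, so the actual density can be below the target.
        bitSet.set(random.nextInt(size));
      }
    }
    bits = bitSet.asReadOnlyBits();
    fallbackBits =
        new Bits() {
          @Override
          public boolean get(int index) {
            return index % 2 == 0;
          }

          @Override
          public int length() {
            return size;
          }
        };
  }

  // Allocate a scratch FixedBitSet, mask it with the Bits, then count via cardinality().
  @Benchmark
  public void countWithCardinality(Blackhole bh) {
    FixedBitSet bitSet = new FixedBitSet(size);
    bitSet.set(0, size);
    bits.applyMask(bitSet, 0);
    bh.consume(bitSet.cardinality());
  }

  // Bit-by-bit counting over the read-only view of a FixedBitSet.
  @Benchmark
  public void countWithFixedBitSetGet(Blackhole bh) {
    int count = 0;
    for (int i = 0; i < bits.length(); i++) {
      if (bits.get(i)) {
        count++;
      }
    }
    bh.consume(count);
  }

  // Bit-by-bit counting over a Bits implementation that is not a FixedBitSet.
  @Benchmark
  public void countWithFallbackGet(Blackhole bh) {
    int count = 0;
    for (int i = 0; i < fallbackBits.length(); i++) {
      if (fallbackBits.get(i)) {
        count++;
      }
    }
    bh.consume(count);
  }
}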
|
We don't actually need to allocate a FixedBitSet of size maxDoc, we could copy slices of 1024 bits into a FixedBitSet(1024) to do the counting? |
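To make the windowed idea concrete, here is a minimal sketch that counts live bits through a reusable 1024-bit scratch buffer instead of a maxDoc-sized one. It deliberately avoids Lucene so that it runs on its own: `SimpleBits`, its `applyMask(long[], int, int)` method, `of`, and `countLive` are hypothetical stand-ins invented for illustration, not Lucene's `Bits`/`FixedBitSet` API, and the default `applyMask` here is bit-by-bit where a real implementation could copy whole words.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

public class ChunkedLiveDocsCount {
  static final int WINDOW_BITS = 1024;

  /** Hypothetical stand-in for a read-only bit view; NOT Lucene's Bits interface. */
  interface SimpleBits {
    boolean get(int index);

    int length();

    /** Clears bit i of window (for i < len) whenever get(offset + i) is false. */
    default void applyMask(long[] window, int offset, int len) {
      for (int i = 0; i < len; i++) {
        if (!get(offset + i)) {
          window[i >>> 6] &= ~(1L << i); // long shifts use only the low 6 bits of i
        }
      }
    }
  }

  /** Hypothetical helper that builds a SimpleBits from a predicate. */
  static SimpleBits of(int length, IntPredicate live) {
    return new SimpleBits() {
      @Override
      public boolean get(int index) {
        return live.test(index);
      }

      @Override
      public int length() {
        return length;
      }
    };
  }

  /** Counts set bits window by window, reusing one 1024-bit scratch buffer. */
  static int countLive(SimpleBits liveDocs) {
    long[] window = new long[WINDOW_BITS / Long.SIZE];
    int maxDoc = liveDocs.length();
    int count = 0;
    for (int offset = 0; offset < maxDoc; offset += WINDOW_BITS) {
      int len = Math.min(WINDOW_BITS, maxDoc - offset);
      // Start with the first len bits set and everything else cleared.
      Arrays.fill(window, 0L);
      int fullWords = len >>> 6;
      Arrays.fill(window, 0, fullWords, -1L);
      if ((len & 63) != 0) {
        window[fullWords] = (1L << (len & 63)) - 1;
      }
      // Knock out the bits that are not live, then popcount word by word.
      liveDocs.applyMask(window, offset, len);
      for (long word : window) {
        count += Long.bitCount(word);
      }
    }
    return count;
  }
}
```

The scratch buffer stays at 16 longs regardless of maxDoc, so memory use is constant while the final count still comes from word-level popcounts.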
|
No problem. I will update it. |
|
The new approach is similar to #14998, so I reused part of the code. The changes touch The optimization is now applied only to |
This uses FixedBitSet#cardinality to speed up counting liveDocs in CheckIndex and some assert implementations, instead of checking bits one by one.
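As a rough illustration of why a cardinality-style count can beat checking bits one by one, the sketch below compares the two strategies on a plain `long[]`. The class and method names (`PopCountSketch`, `countBitByBit`, `countByWord`) are hypothetical and this is not Lucene's implementation; it only assumes that bits past `numBits` are zero.

```java
public class PopCountSketch {
  /** Tests every bit individually: one branch per bit. */
  static int countBitByBit(long[] words, int numBits) {
    int count = 0;
    for (int i = 0; i < numBits; i++) {
      if ((words[i >>> 6] & (1L << i)) != 0) {
        count++;
      }
    }
    return count;
  }

  /** Counts 64 bits at a time; assumes bits beyond the logical length are zero. */
  static int countByWord(long[] words) {
    int count = 0;
    for (long word : words) {
      count += Long.bitCount(word);
    }
    return count;
  }
}
```

Both methods return the same count; the word-based version simply does one `Long.bitCount` per 64 bits instead of 64 separate tests.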