perf: implement bitpack encoding for LID and MID blocks#328
Conversation
@seqbenchbot up main bulk
🔴 Performance Degradation: Some benchmarks have degraded compared to the previous run.
@seqbenchbot down dc2d4d40
Codecov Report

❌ Patch coverage is

@@ Coverage Diff @@
##             main     #328      +/-   ##
==========================================
- Coverage   71.40%   71.28%   -0.12%
==========================================
  Files         219      220       +1
  Lines       16454    16585     +131
==========================================
+ Hits        11749    11823      +74
- Misses       3834     3884      +50
- Partials      871      878       +7
// CastAsUint64 allows working on a []byte slice as []uint64. It uses unsafe casts on little-endian hosts;
// on big-endian hosts it allocates a new buf and copies. The caller must treat the result as read-only.
func CastAsUint64(buf []byte) []uint64 {
I guess it is worth adding a check that (n % sizeOfUint64) == 0, just in case.
}
	if littleEndian {
		return unsafe.Slice((*uint64)(unsafe.Pointer(unsafe.SliceData(buf))), n)
Are there any potential issues with alignment? AFAIK, there are no alignment guarantees provided by the Go memory allocator for this kind of reinterpretation.
For x86-64 it is not a problem (well, performance-wise it is), but for other architectures it is.
Maybe we should add another check, something like:
assert(uintptr(unsafe.Pointer(&buf[0])) % unsafe.Alignof(uint64(0)) == 0)
I deleted the entire file. I think it's too early for those optimizations; maybe I will do them in a separate PR.
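For reference, a defensive variant that incorporates both review suggestions (the length check and the alignment check, with an allocating fallback) might look like the sketch below. This is purely illustrative: the helper name `castAsUint64` and the error-returning signature are assumptions, not the code under review, which was deleted anyway.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// castAsUint64 is a hypothetical defensive variant of CastAsUint64: it rejects
// lengths that are not a multiple of 8, and falls back to an allocating copy
// when the backing array is not 8-byte aligned or the host is big-endian.
// The zero-copy result must be treated as read-only by the caller.
func castAsUint64(buf []byte) ([]uint64, error) {
	const size = int(unsafe.Sizeof(uint64(0)))
	if len(buf)%size != 0 {
		return nil, fmt.Errorf("length %d is not a multiple of %d", len(buf), size)
	}
	n := len(buf) / size
	if n == 0 {
		return nil, nil
	}
	aligned := uintptr(unsafe.Pointer(unsafe.SliceData(buf)))%unsafe.Alignof(uint64(0)) == 0
	if aligned && hostLittleEndian() {
		// Zero-copy reinterpretation of the byte slice as uint64 words.
		return unsafe.Slice((*uint64)(unsafe.Pointer(unsafe.SliceData(buf))), n), nil
	}
	// Safe fallback: allocate and decode each word explicitly.
	out := make([]uint64, n)
	for i := range out {
		out[i] = binary.LittleEndian.Uint64(buf[i*size:])
	}
	return out, nil
}

// hostLittleEndian reports whether the host stores the low byte first.
func hostLittleEndian() bool {
	var x uint16 = 1
	return *(*byte)(unsafe.Pointer(&x)) == 1
}

func main() {
	buf := make([]byte, 16)
	binary.LittleEndian.PutUint64(buf[0:], 42)
	binary.LittleEndian.PutUint64(buf[8:], 7)
	vals, err := castAsUint64(buf)
	fmt.Println(vals, err) // [42 7] <nil>
}
```

Either path yields the same values, so the fallback can be exercised on any architecture by forcing `aligned` to false in a test.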
func (b *Block) unpackBitpack(data []byte, buf *UnpackBuffer) error {
	if data[0] == 1 {
		b.IsLastLID = true
IIRC, IsLastLID only affects packing and unpacking, specifically when we build the slice of offsets for chunks of LIDs. This field is not accessed within search queries at all.
So maybe we can omit it? You store offsets as part of the LID block now, so this field is basically useless.
It's used by the index analyzer cmd file. Deleting it in a separate PR would probably be fine; for now it's just easier for me to avoid thinking about those fields.
Deleted it from the block format. We can rely on MinTID/MaxTID in index_analyzer, so we can delete this field altogether after the index split is merged.
}
// CopyUints32 copies src to the dst byte slice. If the host is little-endian, it uses a direct memory copy instead of a loop.
func CopyUints32(src []uint32, dst []byte) []byte {
Have you measured the impact of the reinterpret cast for such functions?
Description
Replaces varint encoding with faster delta bitpacking. Both LID and MID blocks now use bitpack. Currently the intcomp library is used; it doesn't utilize SIMD, so we might switch to something else in the future.
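The delta-bitpacking idea can be sketched in pure Go. Note this is only an illustration of the technique, not intcomp's actual API or the PR's block format: store the first value verbatim, then pack the successive deltas at the minimum bit width that fits the largest delta.

```go
package main

import "fmt"

// deltaBitpack is an illustrative sketch of delta bitpacking (the real code
// uses the intcomp library, whose API differs). Assumes a non-empty,
// monotonically increasing input, as LID/MID sequences are.
func deltaBitpack(vals []uint64) (first uint64, width uint, packed []uint64) {
	first = vals[0]
	deltas := make([]uint64, len(vals)-1)
	var max uint64
	for i := 1; i < len(vals); i++ {
		deltas[i-1] = vals[i] - vals[i-1]
		if deltas[i-1] > max {
			max = deltas[i-1]
		}
	}
	// Minimum bit width that can represent the largest delta.
	for ; width < 64 && max>>width != 0; width++ {
	}
	if width == 0 {
		return first, 0, nil // all deltas are zero, nothing to pack
	}
	packed = make([]uint64, (uint(len(deltas))*width+63)/64)
	var bitpos uint
	for _, d := range deltas {
		packed[bitpos/64] |= d << (bitpos % 64)
		if rem := 64 - bitpos%64; rem < width {
			packed[bitpos/64+1] |= d >> rem // spill high bits into the next word
		}
		bitpos += width
	}
	return first, width, packed
}

func main() {
	first, width, packed := deltaBitpack([]uint64{100, 103, 104, 110})
	// deltas are 3, 1, 6 -> the max delta 6 needs 3 bits per value
	fmt.Println(first, width, len(packed)) // 100 3 1
}
```

This is also why zstd gains little on top of bitpacked data: the per-value bit width already strips most of the redundancy that zstd would otherwise exploit.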
Measurements
compression
bitpack already compresses a lot better on its own: zstd compresses varint-encoded blocks with a ratio of ~1.7-2.0, but delta-bitpacked data with only ~1.3. Therefore we could potentially disable zstd, at the cost of a slight dataset size overhead.
dataset size
Overall, we reach approximately the same dataset size. For some environments there is a small benefit of around -3% of the total dataset size.
Search latency (stg fractions repacked with bitpack)
Fixes #312