Fix LTF8 9-byte write bug: wrong bit shift (>> 28 instead of >> 24) by tfenne · Pull Request #1765 · samtools/htsjdk

tfenne · 2026-03-31T13:56:32Z

The writeUnsignedLTF8(long, OutputStream) method incorrectly shifts by 28 bits instead of 24 bits when encoding byte 5 of a 9-byte LTF8 value. This causes data corruption for values requiring the full 9-byte encoding (values >= 2^56 with certain bit patterns).

The 9-byte encoding should write bytes at shifts:
56, 48, 40, 32, 24, 16, 8, 0
but the code wrote:
56, 48, 40, 32, 28, 16, 8, 0

The corresponding read path (readUnsignedLTF8) correctly uses << 24, so the reader and writer disagreed for affected values.

The correct behavior is confirmed by htslib's reference implementation:
https://github.com/samtools/htslib/blob/master/cram/cram_io.c#L364-L373

Existing tests did not catch this because their 9-byte test values (-4757, Long.MAX_VALUE, -1) have identical nibbles at bit positions 28 and 24, masking the bug. A targeted test with value 0x0123456789ABCDEF is added to exercise the difference.

Description

Please explain the changes you made here.
Explain the motivation for making this change. What existing problem does the pull request solve?

Things to think about before submitting:

Make sure your changes compile and new tests pass locally.
Add new tests or update existing ones:
- A bug fix should include a test that previously would have failed and passes now.
- New features should come with new tests that exercise and validate the new functionality.
Extended the README / documentation, if necessary
Check your code style.
Write a clear commit title and message
- The commit message should describe what changed and is targeted at htsjdk developers
- Breaking changes should be mentioned in the commit message.

The writeUnsignedLTF8(long, OutputStream) method incorrectly shifts by 28 bits instead of 24 bits when encoding byte 5 of a 9-byte LTF8 value. This causes data corruption for values requiring the full 9-byte encoding (values >= 2^56 with certain bit patterns). The 9-byte encoding should write bytes at shifts: 56, 48, 40, 32, 24, 16, 8, 0 but the code wrote: 56, 48, 40, 32, 28, 16, 8, 0 The corresponding read path (readUnsignedLTF8) correctly uses << 24, so the reader and writer disagreed for affected values. The correct behavior is confirmed by htslib's reference implementation: https://github.com/samtools/htslib/blob/master/cram/cram_io.c#L364-L373 Existing tests did not catch this because their 9-byte test values (-4757, Long.MAX_VALUE, -1) have identical nibbles at bit positions 28 and 24, masking the bug. A targeted test with value 0x0123456789ABCDEF is added to exercise the difference.

yfarjoun

asking for a test without a leading zero

tfenne requested a review from yfarjoun March 31, 2026 13:56

yfarjoun requested changes Mar 31, 2026

View reviewed changes

Comment thread src/test/java/htsjdk/samtools/cram/io/LTF8Test.java Outdated

Added test per reviewer request.

ae38144

tfenne merged commit 79926f9 into master Apr 1, 2026
3 checks passed

tfenne deleted the tf_fix_ltf8_write_bug branch April 1, 2026 03:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix LTF8 9-byte write bug: wrong bit shift (>> 28 instead of >> 24)#1765

Fix LTF8 9-byte write bug: wrong bit shift (>> 28 instead of >> 24)#1765
tfenne merged 2 commits intomasterfrom
tf_fix_ltf8_write_bug

tfenne commented Mar 31, 2026

Uh oh!

yfarjoun left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tfenne commented Mar 31, 2026

Description

Things to think about before submitting:

Uh oh!

yfarjoun left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants