Skip to content

Fix LTF8 9-byte write bug: wrong bit shift (>> 28 instead of >> 24)#1765

Merged
tfenne merged 2 commits intomasterfrom
tf_fix_ltf8_write_bug
Apr 1, 2026
Merged

Fix LTF8 9-byte write bug: wrong bit shift (>> 28 instead of >> 24)#1765
tfenne merged 2 commits intomasterfrom
tf_fix_ltf8_write_bug

Conversation

@tfenne
Copy link
Copy Markdown
Member

@tfenne tfenne commented Mar 31, 2026

The writeUnsignedLTF8(long, OutputStream) method incorrectly shifts by 28 bits instead of 24 bits when encoding byte 5 of a 9-byte LTF8 value. This causes data corruption for values requiring the full 9-byte encoding (values >= 2^56 with certain bit patterns).

The 9-byte encoding should write bytes at shifts:
56, 48, 40, 32, 24, 16, 8, 0
but the code wrote:
56, 48, 40, 32, 28, 16, 8, 0

The corresponding read path (readUnsignedLTF8) correctly uses << 24, so the reader and writer disagreed for affected values.

The correct behavior is confirmed by htslib's reference implementation:
https://github.com/samtools/htslib/blob/master/cram/cram_io.c#L364-L373

Existing tests did not catch this because their 9-byte test values (-4757, Long.MAX_VALUE, -1) have identical nibbles at bit positions 28 and 24, masking the bug. A targeted test with value 0x0123456789ABCDEF is added to exercise the difference.

Description

Please explain the changes you made here.
Explain the motivation for making this change. What existing problem does the pull request solve?

Things to think about before submitting:

  • Make sure your changes compile and new tests pass locally.
  • Add new tests or update existing ones:
    • A bug fix should include a test that previously would have failed and passes now.
    • New features should come with new tests that exercise and validate the new functionality.
  • Extended the README / documentation, if necessary
  • Check your code style.
  • Write a clear commit title and message
    • The commit message should describe what changed and is targeted at htsjdk developers
    • Breaking changes should be mentioned in the commit message.

The writeUnsignedLTF8(long, OutputStream) method incorrectly shifts by
28 bits instead of 24 bits when encoding byte 5 of a 9-byte LTF8 value.
This causes data corruption for values requiring the full 9-byte encoding
(values >= 2^56 with certain bit patterns).

The 9-byte encoding should write bytes at shifts:
  56, 48, 40, 32, 24, 16, 8, 0
but the code wrote:
  56, 48, 40, 32, 28, 16, 8, 0

The corresponding read path (readUnsignedLTF8) correctly uses << 24,
so the reader and writer disagreed for affected values.

The correct behavior is confirmed by htslib's reference implementation:
  https://github.com/samtools/htslib/blob/master/cram/cram_io.c#L364-L373

Existing tests did not catch this because their 9-byte test values
(-4757, Long.MAX_VALUE, -1) have identical nibbles at bit positions
28 and 24, masking the bug. A targeted test with value 0x0123456789ABCDEF
is added to exercise the difference.
@tfenne tfenne requested a review from yfarjoun March 31, 2026 13:56
Copy link
Copy Markdown
Contributor

@yfarjoun yfarjoun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asking for a test without a leading zero

Comment thread src/test/java/htsjdk/samtools/cram/io/LTF8Test.java Outdated
@tfenne tfenne merged commit 79926f9 into master Apr 1, 2026
3 checks passed
@tfenne tfenne deleted the tf_fix_ltf8_write_bug branch April 1, 2026 03:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants