Skip to content

Ensure indirect function calls have the correct type.#1994

Merged
jkbonfield merged 1 commit intosamtools:developfrom
daviesrob:function-pointers
Apr 13, 2026
Merged

Ensure indirect function calls have the correct type.#1994
jkbonfield merged 1 commit intosamtools:developfrom
daviesrob:function-pointers

Conversation

@daviesrob
Copy link
Copy Markdown
Member

Some HTSlib interfaces (notably kgetline(), kgetline2(), hts_itr_querys() and hts_parse_region()) have function callbacks that are intended to be generic, so the function signatures include a void * for data to be passed in. E.g. for kgetline() this would be the FILE * or hFILE * to read. Historically they have often been called with a function that has an explicit type (e.g. FILE * for fgets() used with kgetline) on the assumption that casting it to void * is harmless. Unfortunately doing this is strictly undefined behaviour and will fail in some conditions - for example if compiling with control flow integrity turned on. Recent versions of clang will also call this out when using undefined behaviour sanitizer. The solution is to create wrapper functions that do take a void *, and call the intended function with suitable casts.

Some new interfaces are also added so that the messy implementation details can be hidden away, and the inputs can be better type checked. Hopefully this should also allow the wrapper functions to be inlined away.

  • hget_kstr() to read lines from an hFILE * to a kstring.

  • kgetline() to read lines from a FILE * to a kstring.

  • tbx_itr_querys1() to create a tabix iterator on a region. Used to reimplement the existing tbx_itr_querys() macro.

  • bcf_itr_querys1() to create a region iterator an a bcf file Used to reimplement the existing bcf_itr_querys() macro.

Copy link
Copy Markdown
Contributor

@jkbonfield jkbonfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK and I can confirm the errors when built using clang21 -fsanitize=kcfi go away with this PR. (I'm assuming there are samtools and bcftools updates coming too.)

My only query is on the function naming.

Note: you also have a typo in your commit message.

"kgetline() to read lines from a FILE * to a kstring" should read
"kfgetline() to read lines from a FILE * to a kstring"

Comment thread htslib/hfile.h Outdated
EOF is returned at end of file or on error.
*/
HTSLIB_EXPORT
int hget_kstr(struct kstring_t *kstr, hFILE *fp) HTS_RESULT_USED;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why hget_kstr and not khgetline?

We have a new kfgetline that works via fgets, so khgetline using hgets seems like the more memorable function name. Either that or rename 'kfgetline' to fget_kstr, but the getline ones seem more appropriate.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name was a modification of hgets() above, which itself derives from fgets(). kgetline() of course derived from the POSIX getline() function. I guess I could make it khgetline(), although having two functions that differ by one letter could be its own annoyance...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the return value better matches that for getline(), so I'll change it.

Some HTSlib interfaces (notably kgetline(), kgetline2(),
hts_itr_querys() and hts_parse_region()) have function
callbacks that are intended to be generic, so the
function signatures include a `void *` for data to
be passed in.  E.g. for kgetline() this would be the
FILE * or hFILE * to read.  Historically they have
often been called with a function that has an explicit
type (e.g. FILE * for fgets() used with kgetline)
on the assumption that casting it to void * is harmless.
Unfortunately doing this is strictly undefined behaviour
and will fail in some conditions - for example if
compiling with control flow integrity turned on.  Recent
versions of clang will also call this out when using
undefined behaviour sanitizer.  The solution is to create
wrapper functions that do take a void *, and call the
intended function with suitable casts.

Some new interfaces are also added so that the
messy implementation details can be hidden away,
and the inputs can be better type checked. Hopefully
this should also allow the wrapper functions to be
inlined away.

 * khgetline() to read lines from an hFILE * to a kstring.

 * kfgetline() to read lines from a FILE * to a kstring.

 * tbx_itr_querys1() to create a tabix iterator on a region.
   Used to reimplement the existing tbx_itr_querys() macro.

 * bcf_itr_querys1() to create a region iterator an a bcf file
   Used	to reimplement the existing bcf_itr_querys() macro.
@daviesrob
Copy link
Copy Markdown
Member Author

hget_kstr() has been renamed as khgetline() and the commit message fixed.

@jkbonfield jkbonfield merged commit cb88c90 into samtools:develop Apr 13, 2026
9 checks passed
@daviesrob daviesrob deleted the function-pointers branch April 14, 2026 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants