xls: fix issue with incorrect SST record #549
Pull Request Overview
This PR fixes an issue in Excel xls file parsing where the SST (Shared String Table) record contains an incorrect unique string count, causing parsing to fail or miss strings.
- Modified the SST parsing logic to read all available strings instead of relying on the declared count
- Updated string parsing conditions to handle continuation records properly
- Added test coverage for the specific issue with incorrect SST unique count
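The approach summarised above can be illustrated with a minimal sketch. This is not calamine's actual code: the function name `parse_sst_lenient` is hypothetical, and the layout is a simplified assumption (two 4-byte count fields, then strings made of a 2-byte character count, a 1-byte flags field whose low bit selects UTF-16LE, and the characters). Rich-text runs, extended data, and continuation records are deliberately left out.

```rust
/// Illustrative only: reads strings until the buffer is exhausted
/// instead of trusting the declared unique-string count.
fn parse_sst_lenient(data: &[u8]) -> Vec<String> {
    // Skip the total and unique counts (4 bytes each); the unique
    // count is deliberately ignored because it may be wrong.
    let mut rest = match data.get(8..) {
        Some(r) => r,
        None => return Vec::new(),
    };
    let mut strings = Vec::new();
    // Continue while a full string header (length + flags) remains.
    while rest.len() >= 3 {
        let cch = u16::from_le_bytes([rest[0], rest[1]]) as usize;
        let wide = rest[2] & 0x01 != 0;
        let byte_len = if wide { cch * 2 } else { cch };
        let body = match rest.get(3..3 + byte_len) {
            Some(b) => b,
            None => break, // truncated string: stop rather than panic
        };
        let s = if wide {
            let units: Vec<u16> = body
                .chunks_exact(2)
                .map(|c| u16::from_le_bytes([c[0], c[1]]))
                .collect();
            String::from_utf16_lossy(&units)
        } else {
            // Assumption for this sketch: 8-bit characters are Latin-1.
            body.iter().map(|&b| b as char).collect()
        };
        strings.push(s);
        rest = &rest[3 + byte_len..];
    }
    strings
}
```

With a buffer that declares one unique string but actually holds two, this sketch returns both strings, which is the behaviour the description above aims for.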
Reviewed Changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/xls.rs | Updates SST parsing to read all strings until data is exhausted rather than trusting the count field |
| tests/test.rs | Adds test case to verify parsing of files with incorrect SST unique string counts |
| Changelog.md | Documents the bug fix for issue #548 |
@sftse If you get a chance, could you review this xls fix?
Documentation point: the breakage is in an .xls file, not .xlsx. Should the user be made aware of the error in the xls file? I think they should, but I don't know how calamine would do that (I'm almost totally ignorant of the package).
If the Will try to give a more thorough review within a day or two.
Thanks. Fixed.
In this particular case I don't think the information is useful to most end-users, and unless they wrote the xls exporting software there isn't anything they can do about it. If you know what software produced these files we could raise the issue upstream.
Agreed. I removed that and rebased. There are a lot of other similar instances in the code base. Probably very few of them have a measurable performance benefit, and some could lead to crashes with deliberately or accidentally incorrect values in the parsed file.
Thanks.
I've not yet bottomed out how this file was created. It comes from the UK NHS. My concern is that such orgs have poor processes for keeping s/w current (or the export formats, for that matter: still on .xls), and anything that makes such mistakes more visible would reduce the risks from not updating often enough. What's my fastest route for validating the fix against other files: build the dev environment, or wait for the PR to flow through?
It depends on how urgently you need it. The fix should be released over the weekend at which point If you want to test it in the interim you can modify the You will then need to rebuild the python module and install it. This will be slightly tricky if you haven't done it before.
This commit fixes an issue in the Excel xls SST record (Shared String Table) when the unique string count is incorrect. Fixes tafia#548
Indeed. It failed on
Merged so that the fix can be released to unblock downstream.