xet-data: drop sha2/asm to fix v1-resolver Windows breakage#833
Closed
assafvayner wants to merge 1 commit into
Closed
xet-data: drop sha2/asm to fix v1-resolver Windows breakage#833assafvayner wants to merge 1 commit into
assafvayner wants to merge 1 commit into
Conversation
`sha2-asm` is explicitly unsupported on Windows (its `lib.rs` has a `compile_error!` on `cfg(target_os = "windows")` and its build.rs feeds GAS-syntax `.S` files to MSVC, which does not understand them). `xet_data/Cargo.toml` previously declared `sha2/asm` only on `cfg(not(target_os = "windows"))`. That looks correct, but Cargo's v1 feature resolver (used by crates on edition < 2021 with no explicit `resolver = "2"`) does not respect target-conditional `cfg(...)` gates around feature activations: it unifies features globally across the resolve graph, so the `asm` feature still gets enabled on Windows for downstream consumers using v1. The result: any v1-resolver crate depending transitively on `xet-data` (e.g. `tokenizers` via `hf-hub` -> `hf-xet`) fails to build on `windows-msvc` with: LINK : fatal error LNK1181: cannot open input file '...sha512_x64.o' We can't fix this with a target-conditional gate; the only options are to drop the `sha2/asm` activation or expose it as an opt-in feature. Drop it. `sha2 0.10` already runtime-detects SHA-NI on x86/x86_64 and ARM Crypto Extensions on aarch64 via the `cpufeatures` crate (unconditional dep, independent of the `asm` feature), which is the dominant fast path on modern hardware. The `sha2-asm` fallback only helps on older x86 CPUs without SHA-NI, where the gain is small. The loss is acceptable; the gain is fixing Windows for all downstream consumers regardless of resolver version.
Contributor
Author
|
Closing — wrong layer for the fix. The target-conditional gate here is correct under Cargo's v2 feature resolver; the leak only happens for v1-resolver consumers (edition < 2021 with no |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
sha2-asmis explicitly unsupported on Windows: itslib.rshas acompile_error!oncfg(target_os = "windows"), and itsbuild.rsfeeds GAS-syntax.Sfiles to the C compiler — onwindows-msvccl.exedoesn't understand them, so the link step fails with:xet_data/Cargo.tomlalready declaredsha2/asmonly undercfg(not(target_os = "windows"))(introduced in #693), which looks correct on its face. But Cargo's v1 feature resolver does not respect target-conditionalcfg(...)gates around feature activations — it unifies features globally across the resolve graph regardless of which target's deps are active. Crates on edition < 2021 with no explicitresolver = "2"(in their package or workspaceCargo.toml) use v1 by default.The result: any v1-resolver crate transitively depending on
xet-data(throughhf-xet→hf-hub, etc.) still getssha2/asmactivated on Windows, and the build fails.This was caught while bumping
tokenizers(edition 2018, noresolver) tohf-hub 1.0.0-rc.1— its windows-latest CI now fails onsha2-asmlinking. Seetokenizers#2047build (windows-latest).There is no way to fix the leak with a target-conditional gate — that's exactly what v1 ignores. Options are: drop the activation, or expose
sha2/asmas an opt-in feature. This PR drops it.The perf cost is small in practice:
sha2 0.10already runtime-detects SHA-NI on x86/x86_64 (and ARM Crypto Extensions on aarch64) viacpufeatures, which is an unconditional dep independent of theasmfeature. SHA-NI is by far the dominant fast path on modern hardware. Thesha2-asmfallback only helps on older x86 CPUs without SHA-NI, where the gap is modest.The leftover wasm-only
features = ["asm"]inwasm/hf_xet_wasm/Cargo.tomlis left alone; it isn't reachable from the published crates' dep graphs (target_arch = wasm32, and that crate isn't a transitive dep ofhf-xet).Test plan
cargo check -p xet-datapasses locally.cargo tree -p xet-data --invert sha2-asmreturns "did not match any packages" (i.e.sha2-asmis no longer in the tree).tokenizers#2047build (windows-latest)against anhf-xetbuild that pulls this xet-data — expected to clear the linker error.Note
Low Risk
Low risk: dependency/feature-flag change only; main impact is potentially reduced SHA-2 performance on older non-SHA-NI CPUs while improving Windows compatibility for downstream crates.
Overview
Removes target-conditional activation of
sha2'sasmfeature inxet_data/Cargo.tomland replaces it with a single unconditionalsha2dependency.This prevents
sha2-asmfrom being pulled into downstream dependency graphs (notably under Cargo’s v1 feature resolver), avoiding Windows build/link failures while keepingsha2’s runtime CPU fast paths.Reviewed by Cursor Bugbot for commit fabb24a. Bugbot is set up for automated code reviews on this repo. Configure here.