perf: custom zero-overhead attribute extraction #621
alexander-beedie wants to merge 1 commit into tafia:master
This does give an impressive performance improvement and also nicely simplifies the code for getting attributes. My concern would be that we are turning off the safety guards of […].

These are some initial, non-exhaustive issues, which could probably be fixed, and/or may exist in the current implementation anyway. But I'm not sure if I can make a judgement call on whether the (impressive 16%) performance increase is worth the additional risk. If you are in contact with tafia you could run it by him. I could also ask the current maintainer of […].

It is probably worth deciding on whether this is a worthwhile change first before trying to harden the current implementation.

        let decoded = decoder.decode(val)?;
        let unescaped = unescape(&decoded).map_err(quick_xml::Error::from)?;
        Ok(unescaped.into_owned())
    }

I'd suggest adding some lib tests here like:

    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn test_basic_attrs() {
            let bytes = b"key1=\"val1\" key2='val2'";
            let mut iter = RawAttrIter::new(bytes);
            assert_eq!(iter.next(), Some((&b"key1"[..], &b"val1"[..])));
            assert_eq!(iter.next(), Some((&b"key2"[..], &b"val2"[..])));
            assert_eq!(iter.next(), None);
        }

        #[test]
        fn test_whitespace_around_equals() {
            let bytes = b"key = \"value\"";
            let mut iter = RawAttrIter::new(bytes);
            assert_eq!(iter.next(), Some((&b"key"[..], &b"value"[..])));
            assert_eq!(iter.next(), None);
        }

        #[test]
        fn test_empty_value() {
            let bytes = b"key=\"\"";
            let mut iter = RawAttrIter::new(bytes);
            assert_eq!(iter.next(), Some((&b"key"[..], &b""[..])));
        }

        #[test]
        fn test_no_trailing_space() {
            let bytes = b"key=\"value\"";
            let mut iter = RawAttrIter::new(bytes);
            assert_eq!(iter.next(), Some((&b"key"[..], &b"value"[..])));
            assert_eq!(iter.next(), None);
        }
    }

Definitely a good idea - will get back to this shortly 👌
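
For readers following the thread, here is a rough sketch of the kind of iterator the suggested tests exercise, together with the single-pass, early-exit lookup mentioned in the PR description. It is not the code from this PR's `attrs.rs`: the struct layout, the `skip_ws` and `fail` helpers, and `find_attrs` (a plain function standing in for the `get_attrs!` macro) are all assumptions made for illustration. The sketch does no entity unescaping or namespace handling, and simply stops iterating on malformed input.

```rust
/// Hypothetical stand-in for the PR's `RawAttrIter`: walks the raw attribute
/// bytes of an element and yields `(name, value)` byte-slice pairs.
pub struct RawAttrIter<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> RawAttrIter<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        Self { bytes, pos: 0 }
    }

    /// Advance past any ASCII whitespace.
    fn skip_ws(&mut self) {
        while self.pos < self.bytes.len() && self.bytes[self.pos].is_ascii_whitespace() {
            self.pos += 1;
        }
    }

    /// Mark the iterator as exhausted (end of input or malformed input).
    fn fail(&mut self) -> Option<(&'a [u8], &'a [u8])> {
        self.pos = self.bytes.len();
        None
    }
}

impl<'a> Iterator for RawAttrIter<'a> {
    type Item = (&'a [u8], &'a [u8]);

    fn next(&mut self) -> Option<Self::Item> {
        let bytes = self.bytes; // copy of the `&'a [u8]`, so slices outlive `&mut self`
        self.skip_ws();

        // Name: everything up to `=` or whitespace.
        let name_start = self.pos;
        while self.pos < bytes.len()
            && bytes[self.pos] != b'='
            && !bytes[self.pos].is_ascii_whitespace()
        {
            self.pos += 1;
        }
        let name = &bytes[name_start..self.pos];
        if name.is_empty() {
            return self.fail();
        }

        // Optional whitespace, `=`, optional whitespace.
        self.skip_ws();
        if bytes.get(self.pos) != Some(&b'=') {
            return self.fail();
        }
        self.pos += 1;
        self.skip_ws();

        // Value: delimited by matching single or double quotes.
        let quote = match bytes.get(self.pos) {
            Some(&q) if q == b'"' || q == b'\'' => q,
            _ => return self.fail(),
        };
        let value_start = self.pos + 1;
        match bytes[value_start..].iter().position(|&b| b == quote) {
            Some(len) => {
                self.pos = value_start + len + 1;
                Some((name, &bytes[value_start..value_start + len]))
            }
            None => self.fail(),
        }
    }
}

/// Hypothetical helper: look up `N` attribute names in one pass over `raw`,
/// stopping as soon as every requested name has been found.
pub fn find_attrs<'a, const N: usize>(
    raw: &'a [u8],
    names: [&[u8]; N],
) -> [Option<&'a [u8]>; N] {
    let mut found: [Option<&'a [u8]>; N] = [None; N];
    let mut remaining = N;
    if remaining == 0 {
        return found;
    }
    for (name, value) in RawAttrIter::new(raw) {
        if let Some(i) = names.iter().position(|n| *n == name) {
            if found[i].is_none() {
                found[i] = Some(value);
                remaining -= 1;
                if remaining == 0 {
                    break; // early exit: nothing left to look for
                }
            }
        }
    }
    found
}
```

With this sketch, `find_attrs(raw, [&b"r"[..], &b"t"[..]])` returns one `Option` per requested name, in the order requested, and never scans past the last attribute it needs.
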
Optimisation
Continuing to iterate through flamegraphs/profiling to tackle hotspots 👍
This PR replaces the `quick_xml` `Attributes` iterator with a custom zero-overhead `RawAttrIter` that operates on the raw element attribute bytes and returns name/value byte-slice pairs, integrating it via trait extension and a `get_attrs!` macro. It's surprisingly straightforward, and noticeably faster.

Avoids all unnecessary per-item overhead from `Result` wrapping, `Cow` and `QName` newtypes, etc, finds all attributes in a single pass, and quick-exits the iterator as soon as all requested attributes are identified.

Code cleanup
A pleasant side-effect of the integration is how clean the calling code becomes; despite `attrs.rs` (where the new code lives) being ~125 lines, this PR actually reduces the total amount of project code by nearly 100 lines.

For example,
becomes...
...and
is now just
Also, spotted a few places that were missing entity unescaping on user-visible values like `displayName`, `tableColumn` names, and pivot field names; ensured they now go through `decode_attr`.

Performance
`xlsx` workbooks, across a variety of sizes.

Footnotes

Benchmarked on: Apple Silicon M3 Max