Redundant schema compilation causes significant resource overhead #175

@uCantHim

Schema compilations are not de-duplicated, even though scenarios can easily reference the same file multiple times. For example, the Validator Configuration for XRechnung compiles 6 unique .xsl files a total of 34 times. Both the computational overhead and the memory overhead are significant, because every redundantly compiled document is kept in memory.

I have implemented a small fix (~8 lines of code) that caches compiled schemas in ContentRepository. Below are my measurements for cold starts on the same machine, using the default usage example from https://github.com/itplr-kosit/validator-configuration-xrechnung and measured with /usr/bin/time -v:

| Version | Time | Peak Memory |
| --- | --- | --- |
| v1.6.2 | ~11 s | ~800 MB |
| v1.6.2 patched | ~7.3 s | ~400 MB |

Note that the Saxon documentation explicitly states that XsltExecutable is thread-safe by design. The cache utilizes this property well.
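
To illustrate the idea, here is a minimal sketch of such a cache. It is not the actual patch: the class name `CompiledSchemaCache`, the `compiler` function, and the choice of a normalized `URI` as cache key are all assumptions made for this example. In the real code the cached value would presumably be a Saxon `XsltExecutable`; the sketch keeps the type generic so it runs without Saxon on the classpath.

```java
import java.net.URI;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical helper, NOT the validator's actual ContentRepository code.
// T stands in for the compiled artifact (e.g. an XsltExecutable).
final class CompiledSchemaCache<T> {

    // One compiled artifact per unique source location.
    private final ConcurrentMap<URI, T> cache = new ConcurrentHashMap<>();

    // Assumed to wrap the expensive compilation step.
    private final Function<URI, T> compiler;

    CompiledSchemaCache(Function<URI, T> compiler) {
        this.compiler = compiler;
    }

    // Returns the cached artifact, compiling it only on first request.
    T get(URI source) {
        // Normalizing the key lets equivalent references such as
        // "rules/x/../a.xsl" and "rules/a.xsl" share a single entry.
        return cache.computeIfAbsent(source.normalize(), compiler);
    }

    // Number of distinct artifacts currently held.
    int size() {
        return cache.size();
    }
}
```

Sharing one instance across scenarios is only safe when the cached artifact is itself thread-safe, which is the property of `XsltExecutable` noted above.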
