Artifact for the VLDB 2026 Submission - Sharp: Shared State Reduction for Efficient Matching of Sequential Patterns
This repository provides the artifact for Sharp, a system for efficient best-effort pattern matching using shared state reduction. It supports three workloads: CEP/ for Complex Event Processing, MATCH_RECOGNIZE/ for SQL-based row pattern matching, and GraphRAG/ for path-based pattern matching over knowledge graphs. Sharp leverages pattern-sharing and a lightweight cost model to significantly reduce computational overhead while preserving high recall under latency constraints.
The codebase has been tested on Ubuntu 22.04, SUSE Linux Enterprise Server 15 SP5, and Red Hat Enterprise Linux 9.5. For both CEP/ and MATCH_RECOGNIZE/, enter each directory and install the necessary dependencies listed in build_support/packages.sh:
$ sudo build_support/packages.sh
$ sudo apt install libboost-all-dev
Download the required datasets: the synthetic dataset and the real-world datasets.
Create the following directories and unzip the datasets inside accordingly:
$ mkdir synthetic_data real_data
# unzip downloaded files into the above folders
Inside both CEP/ and MATCH_RECOGNIZE/:
$ mkdir build && cd build
$ cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=YES -DCMAKE_BUILD_TYPE=Debug ..
$ make -j$(nproc)
Navigate to the scripts/ folder, set up a Python virtual environment, and install dependencies:
$ python -m venv venv && source venv/bin/activate
$ pip install pandas matplotlib
$ python recall_latency_throughput_parallel.py
Navigate to GraphRAG/GraphRAG-SHARP/, create a virtual environment, and install dependencies:
$ python -m venv venv && source venv/bin/activate
$ pip install -r requirements.txt
The dataset can be downloaded here.
Ensure access to a GPU (≥12GB). Set your Hugging Face token:
$ export HF_TOKEN="<TOKEN>"
$ ./scripts/planning.sh
After step 1 (planning) completes, run the reasoning step:
$ ./scripts/reasoning.sh
Repeat this process for each baseline folder.
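The GraphRAG steps above (token, GPU, planning, then reasoning, repeated per folder) could be wrapped in a few helper functions. This is only a minimal sketch, not part of the artifact: the function names are hypothetical, the 12000 MiB threshold is an assumed tolerant stand-in for "≥12GB", and it assumes every baseline folder has the same scripts/planning.sh and scripts/reasoning.sh layout as GraphRAG-SHARP/.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the GraphRAG workflow; names and thresholds
# are assumptions and do not come from the repository.

# Succeeds only if HF_TOKEN is set, non-empty, and not the "<TOKEN>" placeholder.
hf_token_is_set() {
  [ -n "${HF_TOKEN:-}" ] && [ "$HF_TOKEN" != "<TOKEN>" ]
}

# Prints the total memory (in MiB) of the first GPU reported by nvidia-smi.
first_gpu_memory_mib() {
  nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits \
    | head -n 1 | tr -d ' '
}

# Succeeds if a memory size in MiB meets the minimum.
# The default of 12000 MiB is an assumed threshold for a "12GB" GPU.
gpu_memory_ok() {
  local mem_mib="$1" min_mib="${2:-12000}"
  [ "$mem_mib" -ge "$min_mib" ]
}

# Runs planning and then reasoning inside one folder.
# Reasoning is skipped when planning fails; the subshell keeps the
# caller's working directory unchanged.
run_pipeline_in() {
  local dir="$1"
  (
    cd "$dir" || exit 1
    ./scripts/planning.sh && ./scripts/reasoning.sh
  )
}

# Example usage (replace <baseline-folder> with the actual folder names):
#   hf_token_is_set || { echo "set HF_TOKEN first" >&2; exit 1; }
#   gpu_memory_ok "$(first_gpu_memory_mib)" || echo "GPU may be too small" >&2
#   for d in GraphRAG-SHARP <baseline-folder>; do run_pipeline_in "$d"; done
```

Chaining the two scripts with `&&` mirrors the instruction to start reasoning only after planning completes, so a failed planning run does not produce reasoning output from stale or missing plans.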