
RDF Framework Benchmark

A reproducible benchmark comparing eleven RDF frameworks and triplestores on I/O performance (read/write Turtle and N-Triples) and SPARQL query performance across three dataset scales (100K, 1M, and 10M triples).

Results: Open benchmark.html in a browser to view the interactive report with charts, framework filters, preset groups, and expandable query details.

Frameworks tested

| Framework | Language | Store type | Version | License |
|---|---|---|---|---|
| maplib | Python (Rust core) | In-memory (Polars + Arrow) | 0.20.15 | Apache 2.0 |
| maplib (disk) | Python (Rust core) | Disk-backed (Polars + Arrow) | 0.20.15 | Proprietary |
| oxigraph | Python (Rust core) | Disk-backed (RocksDB) | 0.5.7 | MIT / Apache 2.0 |
| rdflib | Python (pure) | In-memory (dict-of-dicts) | latest | BSD 3-Clause |
| Apache Jena | Java | In-memory Model | 5.2.0 | Apache 2.0 |
| Eclipse RDF4J | Java | MemoryStore SAIL | 5.0.3 | EDL 1.0 |
| QLever | C++ (Docker) | On-disk index + SPARQL endpoint | latest | Apache 2.0 |
| Virtuoso | C (Docker) | Hybrid relational/RDF, column store | 7.x | GPL v2 |
| GraphDB | Java (Docker) | RDF4J-based, on-disk persistence | 10.8.0 | Proprietary (free tier) |
| dotNetRDF | C# (Docker) | In-memory TripleStore | 3.5.1 | MIT |
| Neo4j + n10s | Java (Docker) | Native property graph + RDF import | 5.26 + n10s 5.26.0 | GPL v3 (Community) |

Prerequisites

Python frameworks (maplib, maplib-disk, oxigraph, rdflib):

  • Python 3.10+
  • pip install maplib pyoxigraph rdflib

Java frameworks (Jena, RDF4J):

  • Java 11+
  • Maven 3.8+

Docker-based frameworks (QLever, Virtuoso, GraphDB, dotNetRDF, Neo4j):

  • Docker Desktop installed and running
  • Images are pulled automatically by each benchmark script (except GraphDB's, which must be pulled manually; see the Docker notes below), or pull them all up front:
    docker pull adfreiburg/qlever
    docker pull openlink/virtuoso-opensource-7:latest
    docker pull ontotext/graphdb:10.8.0
    docker pull mcr.microsoft.com/dotnet/sdk:8.0
    docker pull neo4j:5.26-community

Directory structure

rdf-benchmark/
├── README.md              ← you are here
├── benchmark.html         ← interactive results report
├── generate_data.py       ← synthetic data generator
├── data/                  ← generated .ttl and .nt files
├── queries/               ← shared SPARQL query files (q1–q4)
├── results/               ← JSON output from each framework
├── python-maplib/         ← maplib benchmark
├── python-maplib-disk/    ← maplib disk-backed benchmark
├── python-oxigraph/       ← oxigraph benchmark
├── python-rdflib/         ← rdflib benchmark
├── java-jena/             ← Jena benchmark (Maven project)
├── java-rdf4j/            ← RDF4J benchmark (Maven project)
├── qlever/                ← QLever benchmark (Docker)
├── virtuoso/              ← Virtuoso benchmark (Docker)
├── graphdb/               ← GraphDB benchmark (Docker)
├── dotnetrdf/             ← dotNetRDF benchmark (Docker)
└── neo4j/                 ← Neo4j + n10s benchmark (Docker)

Step 1: Generate test data

From the rdf-benchmark/ root directory:

python generate_data.py

This creates Turtle (.ttl) and N-Triples (.nt) files in data/ at three scales, plus four SPARQL query files in queries/:

| Scale | Triples | Turtle size | N-Triples size |
|---|---|---|---|
| Medium | ~100K | ~3.6 MB | ~10.9 MB |
| Large | ~1M | ~36.9 MB | ~111 MB |
| Xlarge | ~10M | ~369 MB | ~1.1 GB |

The data models a synthetic e-commerce graph with customers, orders, and products. A fixed random seed (42) ensures reproducibility.
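
Seeded generation along these lines keeps the output byte-for-byte reproducible across runs; a minimal sketch (the IRIs, predicates, and file name here are illustrative, not what generate_data.py actually emits):

```python
import random

random.seed(42)  # fixed seed: every run produces identical data

def customer_triples(n):
    """Yield N-Triples lines for n synthetic customers."""
    countries = ["DE", "FR", "NO", "US"]
    for i in range(n):
        iri = f"<http://example.com/customer/{i}>"
        yield f'{iri} <http://example.com/country> "{random.choice(countries)}" .'

with open("customers.nt", "w") as f:
    for line in customer_triples(1000):
        f.write(line + "\n")
```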

Step 2: Run benchmarks

Each benchmark script runs from its own directory and writes results to ../results/.

Python frameworks

cd python-maplib && python bench_maplib.py && cd ..
cd python-maplib-disk && python bench_maplib_disk.py && cd ..
cd python-oxigraph && python bench_oxigraph.py && cd ..
cd python-rdflib && python bench_rdflib.py && cd ..    # slow — expect ~5 min on medium

Java frameworks

Build and run from each directory:

cd java-jena
mvn package -q
java -jar target/jena-benchmark-1.0-SNAPSHOT.jar ../data ../queries ../results
cd ..

cd java-rdf4j
mvn package -q
java -jar target/rdf4j-benchmark-1.0-SNAPSHOT.jar ../data ../queries ../results
cd ..

Docker-based frameworks

Each script handles pulling images, starting containers, and cleanup:

cd qlever && python bench_qlever.py && cd ..
cd virtuoso && python bench_virtuoso.py && cd ..
cd graphdb && python bench_graphdb.py && cd ..
cd dotnetrdf && python bench_dotnetrdf.py && cd ..
cd neo4j && python bench_neo4j.py && cd ..
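
All five scripts share roughly the same container lifecycle; a minimal sketch driving the Docker CLI via subprocess (the image is one from the list under Prerequisites, while the container name and run flags are illustrative):

```python
import subprocess

IMAGE = "adfreiburg/qlever"   # one of the images listed under Prerequisites
NAME = "bench-qlever"         # illustrative container name

subprocess.run(["docker", "pull", IMAGE], check=True)
subprocess.run(["docker", "run", "-d", "--name", NAME, IMAGE], check=True)  # real flags vary per engine
try:
    pass  # load data, run the timed queries over HTTP, write JSON to ../results/
finally:
    subprocess.run(["docker", "rm", "-f", NAME], check=True)  # cleanup even if the run fails
```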

Notes on Docker benchmarks:

  • GraphDB requires the image to be pulled manually first: docker pull ontotext/graphdb:10.8.0
  • Neo4j automatically downloads the neosemantics (n10s) plugin JAR on first run
  • dotNetRDF builds a Docker image from source (Dockerfile + C# project)
  • dotNetRDF's xlarge (10M) fails with OOM — results are recorded as TIMEOUT
  • All Docker containers are cleaned up automatically after benchmarking

Step 3: View results

Open benchmark.html in any browser. The report includes:

  • I/O performance charts — grouped bar charts for read/write at each scale
  • Query performance charts — SPARQL timing comparisons
  • Scale tabs — switch between 100K, 1M, and 10M triple datasets
  • Log scale toggle — useful when comparing frameworks with very different speeds
  • Framework filters — click chips to show/hide individual frameworks
  • Preset groups — All, In-memory, Disk/server, Python, Docker-based
  • Expandable query rows — click any query name to see the full SPARQL

SPARQL queries

| ID | Description | Complexity |
|---|---|---|
| Q1 | COUNT all triples | Full scan |
| Q2 | Top 20 customers by spend (GROUP BY + SUM + ORDER BY) | Aggregation over joins |
| Q3 | 3-entity join (customer + order + product) with country filter | Multi-pattern + filter |
| Q4 | Revenue by country/segment with OPTIONAL orders | OPTIONAL + aggregation |

Neo4j uses equivalent Cypher translations of these queries rather than SPARQL.
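
As a concrete illustration of the shape of these queries, Q2 might look roughly like the following (the ex: prefix and predicate names are placeholders, not the actual vocabulary; the real queries live in queries/):

```python
# Rough shape of Q2 as a benchmark script might hold it; predicates are illustrative.
Q2 = """
PREFIX ex: <http://example.com/>
SELECT ?customer (SUM(?amount) AS ?spend)
WHERE {
  ?order ex:customer ?customer ;
         ex:amount   ?amount .
}
GROUP BY ?customer
ORDER BY DESC(?spend)
LIMIT 20
"""
```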

Methodology

All frameworks use the same data files and the same queries. Each operation has a 5-minute timeout.

Timing: Python uses time.perf_counter() with garbage collection between runs. Java uses System.nanoTime() with JVM warmup. Docker-based frameworks time the full operation including any HTTP round-trip.

Queries: Best of 3 runs after a warmup run.
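
The Python side of this pattern can be sketched as follows (an illustrative helper, not the actual benchmark code):

```python
import gc
import time

def best_of_three(run_query):
    """Warm up once, then report the best of three timed runs."""
    run_query()                          # warmup run, not timed
    timings = []
    for _ in range(3):
        gc.collect()                     # garbage-collect between runs
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return min(timings)
```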

I/O: Single timed run (no averaging), since allocation overhead is part of the real-world cost.

Write operations: Native library frameworks (maplib, oxigraph, rdflib, Jena, RDF4J, dotNetRDF) benchmark writing Turtle and N-Triples. Server-based frameworks (QLever, Virtuoso, GraphDB, Neo4j) record write operations as N/A since they are database servers that don't serialize RDF files.

oxigraph: Uses pyoxigraph.Store() without a path argument, creating an anonymous temporary store backed by RocksDB.
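
For reference, that pattern looks like this (a minimal sketch; bulk data loading is omitted because its signature varies across pyoxigraph releases):

```python
from pyoxigraph import NamedNode, Quad, Store

store = Store()  # no path argument: anonymous temporary store, removed on close
ex = "http://example.com/"
store.add(Quad(NamedNode(ex + "s"), NamedNode(ex + "p"), NamedNode(ex + "o")))
for solution in store.query("SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"):
    print(solution["n"].value)  # "1"
```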

Neo4j + n10s: Imports RDF via the neosemantics plugin which maps RDF triples to Neo4j's native labeled property graph model. Import times include this RDF-to-property-graph conversion. Queries are Cypher translations of the SPARQL benchmarks.
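
Through the Python driver, the import step looks roughly like this (connection details and the file URL are illustrative; the constraint and procedure calls are standard n10s setup):

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    # n10s requires a uniqueness constraint on Resource.uri before it is configured
    session.run(
        "CREATE CONSTRAINT n10s_unique_uri IF NOT EXISTS "
        "FOR (r:Resource) REQUIRE r.uri IS UNIQUE"
    )
    session.run("CALL n10s.graphconfig.init()")
    # map RDF triples onto the labeled property graph; the file URL is illustrative
    session.run("CALL n10s.rdf.import.fetch($url, 'N-Triples')",
                url="file:///data/medium.nt")
driver.close()
```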

Notes

  • The xlarge dataset (~10M triples) requires significant memory. Expect 4+ GB for in-memory frameworks.
  • rdflib is pure Python and will be substantially slower than the Rust-backed and Java frameworks — this is expected.
  • maplib reads use parallel=True for multi-threaded parsing (see the sketch after this list).
  • maplib (disk) uses the proprietary storage_folder parameter; the in-memory maplib is fully open source under Apache 2.0.
  • dotNetRDF cannot handle xlarge (10M triples) within its 16 GB Docker memory limit.
  • Server-based Docker frameworks (QLever, Virtuoso, GraphDB, Neo4j) have query times that include ~0.5–1 ms of HTTP overhead. dotNetRDF runs in-process with no network overhead.
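
A sketch of the two maplib configurations mentioned above (the call shape is assumed from the parameters named in these notes; check the maplib documentation for exact signatures):

```python
from maplib import Mapping

m = Mapping()                                        # in-memory store (Apache 2.0)
m.read_triples("../data/large.ttl", parallel=True)   # multi-threaded parsing, per the note above

# Disk-backed variant (proprietary): storage_folder is the parameter named above.
# m_disk = Mapping(storage_folder="./maplib_store")
```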

About

Benchmarks written on the train; trainmarks.
