A small but real relational database engine, written from scratch in Swift. It implements the layers that sit underneath a SQL prompt — how bytes are laid out on disk, how pages are cached in memory, how rows and indexes are stored, how a query is turned into a plan and executed — and exposes them through an interactive sql shell.
It is built for learning. If you have used SQL but never seen what happens after you press enter, this codebase walks you down through every layer, with each layer small enough to read in one sitting.
Inspiration. The design follows the concepts taught in Database Systems on Modern CPU Architectures and Query Optimization at TU München (TUM): a buffer manager with a 2Q replacement policy, slotted pages with a free-space inventory, a B+-tree, external (on-disk) sorting, and the iterator/Volcano execution model. This is an independent Swift implementation of those ideas.
Requirements: Swift 6.2+, macOS 13+ (Apple Silicon).
swift build # build the library + the `sql` CLI
swift test # run the test suite (132 tests)
swift run sql mydb # open (or create) a database in ./mydb and start a REPL
A "database" is just a directory of files. Point the CLI at any path; it is created on first use.
$ swift run sql mydb
sql> CREATE TABLE users (id INTEGER, name CHAR(16), PRIMARY KEY (id));
CREATE TABLE
sql> INSERT INTO users VALUES (1, 'alice');
INSERT 1
sql> INSERT INTO users VALUES (2, 'bob');
INSERT 1
sql> SELECT id, name FROM users WHERE id = 1;
1,alice
sql> exit
Other ways to run it:
swift run sql mydb -c "SELECT * FROM users" # run one statement and exit
swift run sql mydb -f script.sql # run a file of ;-separated statements
swift run sql mydb --open-only # fail if the database doesn't already exist
API reference for every public type is published with DocC to GitHub Pages:
https://pauljohanneskraft.github.io/Database/documentation/database/
The landing page lists every public symbol grouped by kind, and each type links down to its members — so the whole API is reachable by clicking, no URL-guessing. The docs are rebuilt and redeployed on every push to main.
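A query falls through the layers below. Each is a directory under Sources/Database/, except the CLI, which lives in Sources/SQL/. The table lists them from the lowest layer up, so a query enters at the last row (the CLI) and works its way down to Storage.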
| Layer | Where | What it does |
|---|---|---|
| Storage | Storage/ | Raw file I/O. File is the protocol; PosixFile is the pread/pwrite implementation. Mutex / RWLock are the locking primitives the upper layers latch with. |
| Buffer manager | BufferManager/ | Caches fixed-size pages in memory so the engine isn't doing a disk read per row. BufferManager hands out BufferFrames via fix/unfix, evicts cold pages with a 2Q policy, and is safe to call from many threads. A 64-bit page id is 16-bit segment id ∥ 48-bit page id. |
| Slotted pages | SlottedPages/ | Turns opaque pages into variable-length records. SlottedPage is the on-page layout; SPSegment stores/fetches/updates/deletes records addressed by TID (tuple id); FSISegment is the free-space inventory that finds a page with room. |
| B+-tree | BTree/ | The index structure. BTree supports insert/lookup/erase with latch coupling for concurrency; Char16 is the 16-byte key type used for CHAR columns. |
| Operators | Operators/ | Query execution, "iterator model": every operator answers open() / next() / close() and pulls rows from its children one at a time. Values flow through shared Registers. |
| SQL front-end | SQL/ | Lexer → Parser → SemanticAnalysis → Planner produce the operator tree; SQLExecutor is the one entry point the CLI and tests both call. |
| CLI | Sources/SQL/ | main.swift, the sql REPL / script runner that wraps SQLExecutor. |
The whole engine is the Database library target; the sql executable is a thin shell on top of it.
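As an illustration of the page-id layout mentioned in the table, here is a minimal sketch of packing a 16-bit segment id and a 48-bit page number into one 64-bit value. The type name `PageIDSketch` and its members are hypothetical, and placing the segment id in the high bits is an assumption read off the order in which the table lists the two parts; the engine's own type may differ.

```swift
// Hypothetical sketch: only the 16-bit / 48-bit split comes from the README.
// Assumption: the segment id occupies the high 16 bits.
struct PageIDSketch: Equatable {
    let raw: UInt64

    init(segment: UInt16, page: UInt64) {
        // The page number must fit in the low 48 bits.
        precondition(page < (UInt64(1) << 48), "page number must fit in 48 bits")
        raw = (UInt64(segment) << 48) | page
    }

    /// High 16 bits: which segment (file) the page belongs to.
    var segment: UInt16 { UInt16(raw >> 48) }

    /// Low 48 bits: the page number inside that segment.
    var page: UInt64 { raw & ((UInt64(1) << 48) - 1) }
}
```

A single integer key like this can be hashed and compared cheaply by a page cache while still identifying both the segment and the page within it.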
Everything below is reachable through the CLI. Each feature links to the types that implement it, so you can read the SQL command and then jump straight to the code behind it.
Just open a directory — there is no separate "init" step.
- Behind it: Database.create / Database.open (SlottedPages/Database.swift) lay out and load the segment files; SchemaSegment persists the catalog (your tables and indexes) as JSON so it survives a reopen.
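To make the "catalog as JSON" idea concrete, the following sketch round-trips a tiny catalog through Foundation's `JSONEncoder` and `JSONDecoder`. All type and field names here (`CatalogSketch`, `TableSketch`, `ColumnSketch`) are hypothetical; the actual JSON shape written by SchemaSegment is not shown in this README.

```swift
import Foundation

// Hypothetical catalog types, for illustration only.
struct ColumnSketch: Codable, Equatable {
    var name: String
    var type: String            // e.g. "INTEGER" or "CHAR(16)"
}

struct TableSketch: Codable, Equatable {
    var name: String
    var columns: [ColumnSketch]
    var primaryKey: [String]
}

struct CatalogSketch: Codable, Equatable {
    var tables: [TableSketch] = []
}

/// Serialises the catalog to JSON bytes (a real engine would write these to a file).
func saveCatalog(_ catalog: CatalogSketch) throws -> Data {
    try JSONEncoder().encode(catalog)
}

/// Rebuilds the catalog from JSON bytes, as a reopen would.
func loadCatalog(from data: Data) throws -> CatalogSketch {
    try JSONDecoder().decode(CatalogSketch.self, from: data)
}
```

Because the types are `Codable`, adding a field to the catalog only requires adding a property, which is one reason a JSON catalog is convenient in a teaching codebase.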
CREATE TABLE users (id INTEGER, name CHAR(16), PRIMARY KEY (id));
Columns are typed INTEGER or CHAR(n). A single-column PRIMARY KEY is automatically given a unique index.
- Behind it: parsed into CreateTableAST (SQL/Statement.swift); the column types are SchemaType (SlottedPages/Schema.swift); the table is registered in Schema and gets its own SPSegment for row storage. A single-column PK auto-creates a BTree-backed index.
CREATE INDEX users_by_name ON users (name);
Adds a unique secondary index on one column.
- Behind it: CreateIndexAST → Database.createIndex; the index is a SchemaIndex (SlottedPages/DatabaseIndex.swift) wrapping a BTree whose values are TIDs pointing back at the row. The index's tree root is recorded in the schema JSON so a reopened database finds it again.
INSERT INTO users VALUES (1, 'alice');
- Behind it: InsertAST → Database.insert. The row is placed on a page chosen by the FSISegment free-space lookup, stored by SPSegment as a record addressed by a new TID, and any indexes on the table are updated.
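The insert path described above can be sketched in memory as follows. This is not the engine's on-disk layout: the names `TIDSketch`, `SlottedPageSketch`, and `SegmentSketch` are hypothetical, the 8-byte slot overhead is an assumed figure, and a linear search over pages stands in for the free-space inventory.

```swift
// Hypothetical in-memory sketch of slotted pages; not the engine's real layout.
struct TIDSketch: Equatable {
    let page: Int
    let slot: Int
}

struct SlottedPageSketch {
    static let slotSize = 8                       // assumed bookkeeping bytes per slot
    private(set) var bytes: [UInt8]
    private(set) var slots: [(offset: Int, length: Int)] = []
    private var dataStart: Int                    // record data grows down from the page end

    init(size: Int) {
        bytes = [UInt8](repeating: 0, count: size)
        dataStart = size
    }

    /// Bytes left between the slot directory and the record data.
    var freeSpace: Int { dataStart - slots.count * Self.slotSize }

    /// Stores a record and returns its slot number, or nil if it does not fit.
    mutating func insert(_ record: [UInt8]) -> Int? {
        guard record.count + Self.slotSize <= freeSpace else { return nil }
        dataStart -= record.count
        bytes.replaceSubrange(dataStart..<dataStart + record.count, with: record)
        slots.append((offset: dataStart, length: record.count))
        return slots.count - 1
    }

    func read(slot: Int) -> [UInt8] {
        let entry = slots[slot]
        return Array(bytes[entry.offset..<entry.offset + entry.length])
    }
}

struct SegmentSketch {
    var pages: [SlottedPageSketch] = []
    let pageSize: Int

    /// Picks a page with room (stand-in for the free-space inventory) and stores the record.
    mutating func insert(_ record: [UInt8]) -> TIDSketch? {
        let needed = record.count + SlottedPageSketch.slotSize
        let index: Int
        if let found = pages.firstIndex(where: { $0.freeSpace >= needed }) {
            index = found
        } else {
            pages.append(SlottedPageSketch(size: pageSize))
            index = pages.count - 1
        }
        guard let slot = pages[index].insert(record) else { return nil }
        return TIDSketch(page: index, slot: slot)
    }

    func read(_ tid: TIDSketch) -> [UInt8] { pages[tid.page].read(slot: tid.slot) }
}
```

The returned page/slot pair is what an index would store as its value, so a lookup can go straight to the record without scanning.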
COPY users FROM 'people.csv' CSV HEADER;
- Behind it: CopyAST → CSVLoader (SQL/CSVLoader.swift), which streams the file and inserts each line through the same SPSegment path as INSERT. Returns COPY <n>.
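A rough sketch of what "CSV HEADER" loading involves is shown below: skip the first line, then split each remaining line into fields. The function name `parseCSVSketch` is hypothetical, the input is a whole string rather than a streamed file, and quoting and escaping are deliberately ignored, so this is far simpler than a real loader.

```swift
// Hypothetical sketch: no quoting, no escaping, whole input held in memory.
func parseCSVSketch(_ text: String, hasHeader: Bool) -> [[String]] {
    // Split on any newline character; empty lines are dropped.
    var lines = text.split(whereSeparator: { $0.isNewline })
    if hasHeader && !lines.isEmpty {
        lines.removeFirst()                       // the HEADER row carries column names, not data
    }
    return lines.map { line in
        // Keep empty fields so column positions stay aligned.
        line.split(separator: ",", omittingEmptySubsequences: false).map { String($0) }
    }
}
```

Each resulting row would then be handed to the same insert path a single INSERT statement uses.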
SELECT name FROM users WHERE id = 1;
SELECT u.name, o.total FROM users u, orders o WHERE u.id = o.user_id AND o.total = 100;
List one or more tables in FROM; WHERE predicates are equalities (=) joined by AND. An attr = constant is a filter; an attr = attr across two tables is an equi-join. SELECT * projects everything.
- Behind it: SemanticAnalysis (SQL/SemanticAnalysis.swift) resolves names/types against the Schema; Planner (SQL/Planner.swift) lowers it to an operator tree:
  - leaf scans are TableScan, or IndexScan + TIDResolve when a filter hits an indexed column (see below),
  - attr = attr predicates become HashJoin (or CrossProduct when no join condition connects two tables),
  - attr = constant predicates become Select,
  - the column list becomes Projection,
  - Print renders the rows the CLI prints.
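The operator tree described above follows the iterator model, which can be sketched in a few lines. The protocol and class names below (`OperatorSketch`, `ScanSketch`, `SelectSketch`, `ProjectionSketch`) are hypothetical, and rows are passed as plain `[Int]` arrays for brevity, whereas the engine is described as exchanging values through shared Registers.

```swift
// Hypothetical sketch of the iterator (Volcano) model; not the engine's own types.
protocol OperatorSketch: AnyObject {
    func open()
    func next() -> [Int]?        // nil signals end of input
    func close()
}

final class ScanSketch: OperatorSketch {
    private let rows: [[Int]]
    private var cursor = 0
    init(rows: [[Int]]) { self.rows = rows }
    func open() { cursor = 0 }
    func next() -> [Int]? {
        guard cursor < rows.count else { return nil }
        defer { cursor += 1 }
        return rows[cursor]
    }
    func close() {}
}

/// Filter for an `attr = constant` predicate.
final class SelectSketch: OperatorSketch {
    private let child: any OperatorSketch
    private let column: Int
    private let constant: Int
    init(child: any OperatorSketch, column: Int, equals constant: Int) {
        self.child = child; self.column = column; self.constant = constant
    }
    func open() { child.open() }
    func next() -> [Int]? {
        // Pull from the child until a row matches or the input is exhausted.
        while let row = child.next() {
            if row[column] == constant { return row }
        }
        return nil
    }
    func close() { child.close() }
}

/// Keeps only the requested columns, in the requested order.
final class ProjectionSketch: OperatorSketch {
    private let child: any OperatorSketch
    private let columns: [Int]
    init(child: any OperatorSketch, columns: [Int]) {
        self.child = child; self.columns = columns
    }
    func open() { child.open() }
    func next() -> [Int]? {
        guard let row = child.next() else { return nil }
        return columns.map { row[$0] }
    }
    func close() { child.close() }
}

/// Drains an operator tree, the way a printing root operator would.
func runSketch(_ root: any OperatorSketch) -> [[Int]] {
    root.open()
    defer { root.close() }
    var output: [[Int]] = []
    while let row = root.next() { output.append(row) }
    return output
}
```

Because every operator exposes the same three calls, a planner can stack them in any order without the operators knowing what sits above or below them.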
If you filter on an indexed column with equality, the planner uses the index instead of scanning the whole table — no special syntax needed.
- Behind it: in Planner.makeScan, an equality on an indexed column is turned into an IndexScan (which emits matching TIDs) feeding a TIDResolve (which fetches the full rows). The output has the same row shape as a TableScan, so the rest of the plan doesn't care which was used. Without a usable index, the planner falls back to a full TableScan.
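The choice between the two access paths can be illustrated with an in-memory stand-in. Here a dictionary plays the role of the unique B+-tree index and an array position plays the role of the TID; the type `IndexedTableSketch` and its methods are hypothetical and say nothing about how Planner.makeScan is actually written.

```swift
// Hypothetical sketch: a dictionary stands in for the B+-tree, an array index for the TID.
struct IndexedTableSketch {
    enum Access: Equatable { case indexScan, tableScan }

    private(set) var rows: [[Int]] = []
    private(set) var indexes: [Int: [Int: Int]] = [:]   // column -> (key -> row position)

    /// Builds a unique index over an existing column.
    mutating func createIndex(on column: Int) {
        var index: [Int: Int] = [:]
        for (position, row) in rows.enumerated() { index[row[column]] = position }
        indexes[column] = index
    }

    /// Appends a row and keeps every index on the table up to date.
    mutating func insert(_ row: [Int]) {
        let position = rows.count
        rows.append(row)
        for column in Array(indexes.keys) {
            indexes[column]![row[column]] = position
        }
    }

    /// Evaluates `column = value`, reporting which access path was used.
    func lookup(column: Int, equals value: Int) -> (rows: [[Int]], access: Access) {
        if let index = indexes[column] {
            // "IndexScan": find matching row positions; "TIDResolve": fetch those rows.
            let positions: [Int] = index[value].map { [$0] } ?? []
            return (positions.map { rows[$0] }, .indexScan)
        }
        // No index on this column: examine every row.
        return (rows.filter { $0[column] == value }, .tableScan)
    }
}
```

Both branches return rows in the same shape, which is the property that lets the rest of a plan ignore which path was taken.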
SELECT id FROM a UNION SELECT id FROM b;
SELECT id FROM a INTERSECT ALL SELECT id FROM b;
SELECT id FROM a EXCEPT SELECT id FROM b;
UNION, INTERSECT, and EXCEPT are supported, each with an optional ALL (bag rather than set semantics). They chain and can be parenthesised; per the SQL standard, INTERSECT binds tightest.
- Behind it: the parser builds a SelectExpr tree (SQL/QueryAST.swift); the Planner maps each node to the matching operator: Union / UnionAll / Intersect / IntersectAll / Except / ExceptAll (Operators/Operators.swift).
DROP TABLE users;
- Behind it: removes the table and its indexes from the Schema and re-persists the catalog via Database.persistSchema.
Engine vs. SQL surface. The execution engine also includes Sort (backed by the on-disk ExternalSort + SortSpillover) and HashAggregation operators. These are fully implemented and tested at the operator level, but the SQL grammar above does not yet expose ORDER BY / GROUP BY. Wiring them into the grammar is a good first feature to add if you want to extend the front-end.
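For readers unfamiliar with external sorting, the two phases (sort memory-sized runs, then k-way merge them) can be sketched as below. This version keeps the runs in arrays instead of spilling them to disk, and picks the smallest head with a linear scan where a real implementation would more likely use a heap; the function names are hypothetical and unrelated to the ExternalSort API.

```swift
// Hypothetical in-memory sketch: each inner array plays the role of one sorted run on disk.

/// Merges already-sorted runs into one sorted output.
func mergeRunsSketch(_ runs: [[Int]]) -> [Int] {
    var cursors = [Int](repeating: 0, count: runs.count)
    var output: [Int] = []
    output.reserveCapacity(runs.reduce(0) { $0 + $1.count })
    while true {
        // Find the run whose next unread element is smallest.
        var best: Int? = nil
        for (i, run) in runs.enumerated() where cursors[i] < run.count {
            if let current = best {
                if run[cursors[i]] < runs[current][cursors[current]] { best = i }
            } else {
                best = i
            }
        }
        guard let winner = best else { return output }   // every run is exhausted
        output.append(runs[winner][cursors[winner]])
        cursors[winner] += 1
    }
}

/// Phase 1: sort chunks of at most `runLength` elements. Phase 2: merge the runs.
func externalSortSketch(_ input: [Int], runLength: Int) -> [Int] {
    precondition(runLength > 0, "runLength must be positive")
    let runs: [[Int]] = stride(from: 0, to: input.count, by: runLength).map { start in
        Array(input[start..<min(start + runLength, input.count)]).sorted()
    }
    return mergeRunsSketch(runs)
}
```

Only `runLength` elements are sorted at once, which is the idea behind capping memory use; the merge phase needs just one element per run in memory at a time.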
Sources/
    Database/              the engine (library target)
        Storage/           file I/O + locking primitives
        BufferManager/     page cache, 2Q eviction, segment/page-id scheme
        SlottedPages/      records, free-space inventory, schema/catalog, on-disk Database
        BTree/             B+-tree index + Char16 key
        Operators/         iterator-model query operators + Register exchange
        ExternalSort/      generic k-way on-disk merge sort
        SQL/               lexer, parser, semantic analysis, planner, executor, CSV loader
    SQL/                   the `sql` command-line shell (executable target)
Tests/
    DatabaseTests/         132 tests (uses the swift-testing framework: @Test / #expect)
swift test # everything
swift test --filter SQLSuite # one suite, matched by name
swift test --no-parallel # serial run, if you suspect a concurrency issue
The tests are written with Swift's Testing framework (@Test, #expect), not XCTest, and double as runnable documentation: each layer's expected contracts (page-id layout, one-hop TID redirects, the external-sort memory cap, iterator semantics) are pinned down by a test you can read.