juniper-data

A FastAPI service that generates, versions, and serves ML datasets as NPZ artifacts.

juniper-data turns a catalogue of dataset generators into a REST service: you ask it for a dataset by name and parameters, and it returns a versioned, NPZ-formatted train/test/full split. The catalogue spans synthetic classification problems (two-spiral, concentric circles, XOR, Gaussian mixtures, moons, checkerboard), image sets (MNIST), the ARC-AGI visual-reasoning families, a CSV/JSON import path, and a family of time-series and irregularly-sampled sequence generators (autoregressive, Mackey-Glass, multi-sine, delay-product, equities, and the irregular-Δt equities_seq contract). A named-version registry, tag filtering, batch creation, and per-dataset preview round out the surface. Call GET /v1/generators for the live catalogue.

It is the foundational data layer of the platform: the dataset identifiers it returns are the substrate juniper-cascor trains on and juniper-canopy visualises.

Part of the Juniper platform. juniper-data is the dataset-generation service of Juniper — a multi-package ML research platform built around constructive (Cascade-Correlation) and recurrent neural networks. It runs standalone; the rest of the platform consumes it over HTTP (see juniper-data-client).

Install

pip install juniper-data            # from PyPI

For development from a clone (the optional extras are api, arc-agi, equities, observability, test, dev, all):

git clone https://github.com/pcalnon/juniper-data.git && cd juniper-data
pip install -e ".[all]"

Run

uvicorn --factory juniper_data.api.app:get_app --reload --port 8100    # binds 127.0.0.1:8100 (uvicorn's own default port is 8000)
curl http://localhost:8100/v1/health/ready
curl http://localhost:8100/v1/generators                   # the live generator catalogue
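
When the service is started from a script or a test harness, it can help to wait for the readiness endpoint before sending requests. The following is a minimal sketch using only the standard library. It assumes that `/v1/health/ready` answers with a 2xx status once the service is ready; the helper names `probe` and `wait_until_ready` are hypothetical and not part of juniper-data.

```python
import time
import urllib.error
import urllib.request

READY_URL = "http://localhost:8100/v1/health/ready"  # address taken from the Run section


def probe(url, timeout=2.0):
    """Return True if the URL answers with a 2xx status, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx HTTP error: not ready yet.
        return False


def wait_until_ready(url=READY_URL, attempts=30, delay=1.0, check=probe):
    """Poll `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_ready()` returns `True` as soon as a probe succeeds and `False` once the attempts are exhausted. The `check` argument exists so the polling loop can be exercised without a running service.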

Create a dataset over the REST API:

curl -sX POST localhost:8100/v1/datasets \
  -H 'Content-Type: application/json' \
  -d '{"generator": "spiral", "name": "demo", "params": {"n_spirals": 2, "noise": 0.1}}'
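
The same request can be issued from Python with the standard library alone. This is a sketch under stated assumptions: the service is reachable at the address used above, the helper names `build_create_request` and `create_dataset` are hypothetical, and the shape of the response body is not described here, so the code simply decodes whatever JSON comes back. The optional `api_key` argument sends the `X-API-Key` header, which is only needed when `JUNIPER_DATA_API_KEYS` is set.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8100"  # assumption: default bind address from the Run section


def build_create_request(generator, name, params, base_url=BASE_URL, api_key=None):
    """Build (but do not send) a POST /v1/datasets request."""
    body = json.dumps({"generator": generator, "name": name, "params": params})
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key  # only required when auth is enabled
    return urllib.request.Request(
        url=f"{base_url}/v1/datasets",
        data=body.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def create_dataset(generator, name, params, base_url=BASE_URL, api_key=None):
    """Send the request and return the decoded JSON response (requires a running service)."""
    request = build_create_request(generator, name, params, base_url, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, mirroring the curl call above (needs the service to be up):
# create_dataset("spiral", "demo", {"n_spirals": 2, "noise": 0.1})
```

Separating request construction from sending keeps the payload easy to inspect or unit-test without network access.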

Or generate one in-process, without the service:

from juniper_data.generators import SpiralGenerator, SpiralParams

dataset = SpiralGenerator.generate(SpiralParams(n_spirals=2, n_points_per_spiral=100, noise=0.1))
# dataset: dict of float32 arrays — X_train, y_train, X_test, y_test, X_full, y_full

Data contract

Datasets are NPZ archives with the keys X_train, y_train, X_test, y_test, X_full, y_full, all float32. This is the contract every Juniper consumer reads.
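
A consumer can check an archive against this contract before using it. The sketch below is illustrative only: `validate_contract` is a hypothetical helper, and the stand-in arrays (their shapes and the 8/2 split) are placeholders rather than output of a real generator. It writes an NPZ archive to an in-memory buffer, reads it back with `numpy.load`, and verifies the six keys and the float32 dtype.

```python
import io

import numpy as np

CONTRACT_KEYS = ("X_train", "y_train", "X_test", "y_test", "X_full", "y_full")


def validate_contract(arrays):
    """Raise ValueError unless all six keys are present and every array is float32."""
    missing = [key for key in CONTRACT_KEYS if key not in arrays]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    for key in CONTRACT_KEYS:
        if arrays[key].dtype != np.float32:
            raise ValueError(f"{key} has dtype {arrays[key].dtype}, expected float32")


# Stand-in data: placeholder shapes, not a real juniper-data dataset.
rng = np.random.default_rng(0)
X_full = rng.normal(size=(10, 2)).astype(np.float32)
y_full = rng.integers(0, 2, size=(10, 1)).astype(np.float32)
stand_in = {
    "X_train": X_full[:8], "y_train": y_full[:8],
    "X_test": X_full[8:], "y_test": y_full[8:],
    "X_full": X_full, "y_full": y_full,
}

# Round-trip through the NPZ format, as a consumer would after downloading an artifact.
buffer = io.BytesIO()
np.savez(buffer, **stand_in)
buffer.seek(0)
with np.load(buffer) as archive:
    loaded = {key: archive[key] for key in archive.files}

validate_contract(loaded)  # passes silently when the contract holds
```

For an artifact on disk, the same check applies after replacing the buffer with the file path passed to `np.load`.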

Configuration

Settings load from the JUNIPER_DATA_ environment namespace (juniper_data/api/settings.py) and honor the Docker _FILE secret convention. The most common knobs (full surface in docs/REFERENCE.md):

| Variable | Default | Purpose |
| --- | --- | --- |
| `JUNIPER_DATA_HOST` / `JUNIPER_DATA_PORT` | `127.0.0.1` / `8100` | Bind address / port (`0.0.0.0` under Docker). |
| `JUNIPER_DATA_STORAGE_PATH` | `./data/datasets` | Where persisted dataset artifacts live. |
| `JUNIPER_DATA_API_KEYS` | (unset) | CSV / JSON-array of `X-API-Key` values; auth is disabled when unset. |
| `JUNIPER_DATA_LOG_LEVEL` / `_LOG_FORMAT` | `INFO` / `text` | Verbosity / `text` or `json`. |
| `JUNIPER_DATA_METRICS_ENABLED` | `false` | Expose `/metrics` for Prometheus (IP-gated). |
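
To make the `_FILE` secret convention and the CSV / JSON-array key format concrete, here is a rough sketch of how such values could be resolved. It is not the code in juniper_data/api/settings.py: the helpers `read_setting` and `parse_api_keys` are hypothetical, and the precedence shown (a `<NAME>_FILE` path wins over the plain variable) is an assumption.

```python
import json
import os


def read_setting(name, environ=os.environ):
    """Return the value for `name`, preferring the file named by `<name>_FILE`.

    Assumption: the file variant takes precedence when both are set.
    """
    file_path = environ.get(f"{name}_FILE")
    if file_path:
        with open(file_path, encoding="utf-8") as handle:
            return handle.read().strip()
    return environ.get(name)


def parse_api_keys(raw):
    """Parse a CSV string or a JSON array into a list of keys.

    An empty result corresponds to the "auth disabled when unset" case.
    """
    if not raw:
        return []
    text = raw.strip()
    if text.startswith("["):
        return [str(item) for item in json.loads(text)]
    return [part.strip() for part in text.split(",") if part.strip()]


api_keys = parse_api_keys(read_setting("JUNIPER_DATA_API_KEYS"))
auth_enabled = bool(api_keys)
```

With Docker secrets, `JUNIPER_DATA_API_KEYS_FILE` would point at a mounted file such as one under `/run/secrets/`, so the key material never appears in the container's environment listing.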

Docker

docker build -t juniper-data:latest .
docker run --rm -p 8100:8100 -e JUNIPER_DATA_HOST=0.0.0.0 juniper-data:latest

Multi-stage build (Python 3.14-slim); health is probed at /v1/health/ready. For the full stack, see juniper-deploy.

Status

Live on PyPI; see CHANGELOG.md for the current version and release history. Consumed by juniper-cascor and juniper-canopy via JUNIPER_DATA_URL, and by juniper-data-client programmatically.

Documentation

docs/QUICK_START.md — get running in five minutes
docs/USER_MANUAL.md — comprehensive usage guide
docs/api/JUNIPER_DATA_API.md — full REST reference (filtering, batch, tagging, versioning)
docs/REFERENCE.md — configuration and environment-variable reference
docs/DOCUMENTATION_OVERVIEW.md — index of all juniper-data docs

License

MIT — see LICENSE.
