juniper-data

PyPI Python 3.12+ License: MIT

A FastAPI service that generates, versions, and serves ML datasets as NPZ artifacts.

juniper-data turns a catalogue of dataset generators into a REST service: you ask it for a dataset by name and parameters, and it returns a versioned, NPZ-formatted train/test/full split. The catalogue spans synthetic classification problems (two-spiral, concentric circles, XOR, Gaussian mixtures, moons, checkerboard), image sets (MNIST), the ARC-AGI visual-reasoning families, a CSV/JSON import path, and a family of time-series and irregularly-sampled sequence generators (autoregressive, Mackey-Glass, multi-sine, delay-product, equities, and the irregular-Δt equities_seq contract). A named-version registry, tag filtering, batch creation, and per-dataset preview round out the surface. Call GET /v1/generators for the live catalogue.

It is the foundational data layer of the platform: the dataset identifiers it returns are the substrate juniper-cascor trains on and juniper-canopy visualises.

Part of the Juniper platform. juniper-data is the dataset-generation service of Juniper — a multi-package ML research platform built around constructive (Cascade-Correlation) and recurrent neural networks. It runs standalone; the rest of the platform consumes it over HTTP (see juniper-data-client).

Install

pip install juniper-data            # from PyPI

For development from a clone (the optional extras are api, arc-agi, equities, observability, test, dev, all):

git clone https://github.com/pcalnon/juniper-data.git && cd juniper-data
pip install -e ".[all]"

Run

uvicorn --factory juniper_data.api.app:get_app --port 8100 --reload    # binds 127.0.0.1:8100 (uvicorn's own default port is 8000)
curl http://localhost:8100/v1/health/ready
curl http://localhost:8100/v1/generators                   # the live generator catalogue
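
When a script starts the service and uses it straight away, it can help to poll the readiness endpoint first. The following is a minimal sketch using only the standard library. It assumes that /v1/health/ready answers with HTTP 200 once the service is ready, which is not stated above; the function names `is_ready` and `wait_until_ready` are illustrative and not part of juniper-data.

```python
import time
import urllib.request


def is_ready(url, timeout=2.0):
    """Return True if GET url answers with HTTP 200 (assumed to mean 'ready')."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses.
        return False


def wait_until_ready(url, attempts=20, delay=0.5, probe=is_ready):
    """Poll the readiness URL; return True as soon as one probe succeeds."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # pause between probes, but not after the last one
    return False
```

A caller would run `wait_until_ready("http://localhost:8100/v1/health/ready")` and continue only if it returns True. The `probe` parameter exists so the loop can be exercised without a running server.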

Create a dataset over the REST API:

curl -sX POST localhost:8100/v1/datasets \
  -H 'Content-Type: application/json' \
  -d '{"generator": "spiral", "name": "demo", "params": {"n_spirals": 2, "noise": 0.1}}'
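
The same request can be built from Python without extra dependencies. This sketch mirrors the curl call above using `urllib.request`. The helper names `build_create_request` and `send` are hypothetical. The X-API-Key header is only attached when a key is passed, matching the note under Configuration that auth is disabled when no keys are set. The response schema is not described in this README, so the sketch returns the parsed JSON as-is.

```python
import json
import urllib.request


def build_create_request(base_url, generator, name, params, api_key=None):
    """Build (but do not send) the POST /v1/datasets request."""
    body = json.dumps({"generator": generator, "name": name, "params": params})
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # Only needed when the server has JUNIPER_DATA_API_KEYS set.
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/datasets",
        data=body.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request, timeout=10):
    """Send the request to a running service and return the parsed JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_create_request(
    "http://localhost:8100", "spiral", "demo", {"n_spirals": 2, "noise": 0.1}
)
```

Calling `send(request)` requires the service from the Run step to be listening on port 8100.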

Or generate one in-process, without the service:

from juniper_data.generators import SpiralGenerator, SpiralParams

dataset = SpiralGenerator.generate(SpiralParams(n_spirals=2, n_points_per_spiral=100, noise=0.1))
# dataset: dict of float32 arrays — X_train, y_train, X_test, y_test, X_full, y_full

Data contract

Datasets are NPZ archives with the keys X_train, y_train, X_test, y_test, X_full, y_full, all float32. This is the contract every Juniper consumer reads.
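
A consumer can check an archive against this contract before using it. The sketch below uses NumPy and verifies only what is stated above: the six keys are present and each array is float32. It makes no assumption about array shapes or about how the splits relate to each other. `check_contract` is an illustrative helper, not a juniper-data function.

```python
import io

import numpy as np

REQUIRED_KEYS = ("X_train", "y_train", "X_test", "y_test", "X_full", "y_full")


def check_contract(npz_source):
    """Return a list of contract violations (an empty list means it conforms)."""
    problems = []
    with np.load(npz_source) as archive:
        for key in REQUIRED_KEYS:
            if key not in archive.files:
                problems.append(f"missing key: {key}")
            elif archive[key].dtype != np.float32:
                problems.append(f"{key} has dtype {archive[key].dtype}, expected float32")
    return problems


# Build a tiny in-memory archive to exercise the check (placeholder data only).
buffer = io.BytesIO()
np.savez(buffer, **{key: np.zeros((4, 2), dtype=np.float32) for key in REQUIRED_KEYS})
buffer.seek(0)
problems = check_contract(buffer)
```

`check_contract` accepts anything `np.load` accepts, so a path to a downloaded .npz file works in place of the in-memory buffer.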

Configuration

Settings load from the JUNIPER_DATA_ environment namespace (juniper_data/api/settings.py) and honor the Docker _FILE secret convention. The most common knobs (full surface in docs/REFERENCE.md):

| Variable | Default | Purpose |
| --- | --- | --- |
| JUNIPER_DATA_HOST / JUNIPER_DATA_PORT | 127.0.0.1 / 8100 | Bind address / port (0.0.0.0 under Docker). |
| JUNIPER_DATA_STORAGE_PATH | ./data/datasets | Where persisted dataset artifacts live. |
| JUNIPER_DATA_API_KEYS | (unset) | CSV / JSON-array of X-API-Key values; auth is disabled when unset. |
| JUNIPER_DATA_LOG_LEVEL / _LOG_FORMAT | INFO / text | Verbosity / text or json. |
| JUNIPER_DATA_METRICS_ENABLED | false | Expose /metrics for Prometheus (IP-gated). |
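
To illustrate the two conventions mentioned above, here is a rough sketch of how a `_FILE` fallback and a CSV-or-JSON key list might be resolved. It is not the code in juniper_data/api/settings.py. The helper names are hypothetical, and the precedence (the direct variable wins over its `_FILE` counterpart) is an assumption.

```python
import json
import os


def read_setting(name, default=None, environ=os.environ):
    """Read NAME, falling back to the file named by NAME_FILE, then to default."""
    if name in environ:
        return environ[name]
    file_path = environ.get(name + "_FILE")
    if file_path:
        # Docker secrets are mounted as files; the content is the value.
        with open(file_path, encoding="utf-8") as handle:
            return handle.read().strip()
    return default


def parse_api_keys(raw):
    """Parse a CSV or JSON-array string into a list of keys (empty: auth off)."""
    if raw is None or not raw.strip():
        return []
    text = raw.strip()
    if text.startswith("["):
        return [str(key) for key in json.loads(text)]
    return [part.strip() for part in text.split(",") if part.strip()]


# Example with an explicit mapping instead of the real process environment.
env = {"JUNIPER_DATA_PORT": "8100", "JUNIPER_DATA_API_KEYS": '["alpha", "beta"]'}
port = int(read_setting("JUNIPER_DATA_PORT", "8100", env))
keys = parse_api_keys(read_setting("JUNIPER_DATA_API_KEYS", None, env))
```

Passing the environment as a parameter keeps the helpers easy to test; in normal use the default `os.environ` applies.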

Docker

docker build -t juniper-data:latest .
docker run --rm -p 8100:8100 -e JUNIPER_DATA_HOST=0.0.0.0 juniper-data:latest

Multi-stage build (Python 3.14-slim); health is probed at /v1/health/ready. For the full stack, see juniper-deploy.

Status

Live on PyPI. The current version is shown by the badge above; see CHANGELOG.md. Consumed by juniper-cascor and juniper-canopy via JUNIPER_DATA_URL, and by juniper-data-client programmatically.

Documentation

License

MIT — see LICENSE.
