Releases: pcalnon/juniper-data

juniper-data v0.9.0 — delay_product capacity generator (DP-3)

@pcalnon pcalnon released this 22 Jun 23:04
Immutable release. Only release title and notes can be modified.
f38714e

juniper-data v0.9.0 Release Notes

Release Date: 2026-06-22
Version: 0.9.0
Codename: DP-3 Capacity Dataset
Release Type: MINOR

Authored from the canonical juniper-ml/notes/templates/TEMPLATE_RELEASE_NOTES.md.


Overview

Adds the delay_product synthetic time-series generator — the capacity-demonstrating dataset
for the juniper-recurrence DP-3 readout spectrum — plus routine dependency / CI maintenance. The new
generator's regression target is a bilinear product of two delayed in-window values, a quadratic
form in the LMU memory state that a linear readout provably cannot fit (so it exposes a clear
nonlinear ≫ linear r² gap, unlike the near-linear forecasting synthetics). Backward-compatible:
purely additive (a new generator + dependency bumps).

Status: STABLE — additive / backward-compatible; all existing generators and the 3-D NPZ
contract are unchanged.


Release Summary

  • Release type: MINOR
  • Primary focus: New delay_product capacity generator (DP-3) + dependency / CI maintenance
  • Breaking changes: NO
  • Priority summary: Unblocks the juniper-recurrence DP-3 P2 bench (the RFF-readout capacity gap)

Features Summary

| ID | Feature | Status | Version | Phase |
|---|---|---|---|---|
| DP-3 §8a | delay_product capacity generator | Done | 0.9.0 | P2 |

What's New

delay_product synthetic generator (DP-3 capacity instrument)

An irregularly-sampled sinusoid superposition (the same non-uniform Δt sampling as irregular_sine)
whose regression target is the bilinear product of two delayed in-window values,
y = x(t−τ₁)·x(t−τ₂), with lag1 / lag2 step-delays kept strictly inside the lookback.

Changes:

  • The product is a quadratic form in the (linear) LMU memory state, so a linear readout
    provably cannot fit it (r² bounded below 1), while a non-linear (random-Fourier-feature)
    readout can. This makes it the capacity-demonstrating dataset that complements the
    near-linear synthetics (where the linear readout is already at its ceiling).
  • Emits the standard additive 3-D NPZ contract
    ({X, y, dt, target_dt, observed_mask}_{train,test,full}, task_type="regression",
    time_unit="steps") and reuses the leakage-safe window_timed_series windowing (the target reads
    only the emitted window contents; y_full == concat(train, test)).
  • Registered as delay_product in the generator registry; numpy-only, no extra. See juniper-ml
    notes/JUNIPER_RECURRENCE_DP3_READOUT_SPECTRUM_DESIGN_2026-06-20.md §8a.
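
The bilinear target described above can be sketched in plain Python (standard library only, whereas the real generator is numpy-based). The function name delay_product_target, the window layout, and the convention that lags are whole steps counted back from the last window element are assumptions made for illustration; they are not the package's API.

```python
import math


def delay_product_target(window, lag1, lag2):
    """Return x(t - lag1) * x(t - lag2), where t is the last step of the window.

    Assumption: lags are whole steps counted back from the final element, and
    both must fall inside the window so the target never reads data the model
    does not see.
    """
    lookback = len(window)
    for lag in (lag1, lag2):
        if not 0 <= lag < lookback:
            raise ValueError(f"lag {lag} is outside the lookback of {lookback} steps")
    return window[-1 - lag1] * window[-1 - lag2]


# Toy series: a superposition of two sinusoids on a regular grid.
series = [math.sin(0.3 * k) + 0.5 * math.sin(0.7 * k) for k in range(20)]
window = series[:8]                      # one lookback window of 8 steps
y = delay_product_target(window, lag1=2, lag2=5)
```

One way to see the linear-readout limit in this toy setting: negating every value in the window leaves y unchanged, whereas any readout that is linear in the window would flip sign.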

Bug Fixes

None.


Improvements

Routine maintenance bundled into this release:

  • Dependency bumps: actions/checkout 6 → 7, anthropics/claude-code-action 1.0.148 → 1.0.154,
    and the python-minor dependency group (16 updates).
  • CI / tooling: local coverage reproduction (make coverage + util script), asyncio_mode=auto
    for the pytest-asyncio config, and pre-push pre-commit gates wired via default_install_hook_types.

Test Results

The delay_product generator ships with a dedicated unit-test module (contract, genuinely
non-uniform dt, the known-answer bilinear target, determinism, parameter validation, and schema)
and is wired into the parametrized end-to-end synthetic-regression and scaling test suites. The full
juniper-data suite is green in CI.


Upgrade Notes

This is a backward-compatible MINOR release. No migration steps required.

pip install --upgrade juniper-data==0.9.0

Known Issues

None known at time of release.


What's Next

  • juniper-recurrence DP-3 P2 bench — the bench will delegate to delay_product (via
    juniper_data.generators) to demonstrate the RFF-readout capacity gap (nonlinear ≫ linear r²),
    alongside the tie on the existing near-linear datasets.

Contributors

  • Paul Calnon

Version History

| Version | Date | Description |
|---|---|---|
| 0.9.0 | 2026-06-22 | delay_product DP-3 capacity generator + maintenance |
| 0.8.0 | 2026-06-19 | Configurable equities regression_target |
| 0.7.1 | 2026-06-19 | equities wheel packaging fix |
| 0.7.0 | 2026-06-19 | Δt sequence data foundation |

juniper-data v0.8.0 — configurable equities regression_target

@pcalnon pcalnon released this 21 Jun 01:56
Immutable release. Only release title and notes can be modified.
2143883

MINOR release. Adds a configurable equities regression target.

EquitiesParams (inherited by EquitiesSeqParams) gains regression_target: "next_close" | "return" | "log_return", controlling the y_reg_* representation:

  • next_close (default): raw next-day close, byte-identical to prior output;
  • return: next_close / close - 1;
  • log_return: ln(next_close / close).

The raw close is non-stationary; the return variants are stationary (standard conditioning for trending price data). Both equities and equities_seq honor it via a shared helper. No change to the direction target, the feature matrix, or any other array; the default keeps every existing artifact byte-identical.
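
The three target representations can be written out as a small sketch. The function name regression_target and its signature are hypothetical (the release notes mention a shared helper but do not name it); only the three mode strings and their formulas come from the notes above.

```python
import math


def regression_target(close, next_close, mode="next_close"):
    """Map a (close, next_close) pair to the chosen target representation."""
    if mode == "next_close":
        return next_close                      # raw next-day close (the default)
    if mode == "return":
        return next_close / close - 1.0        # simple return
    if mode == "log_return":
        return math.log(next_close / close)    # log return
    raise ValueError(f"unknown regression_target: {mode!r}")
```

For example, a close of 100.0 followed by a next-day close of 110.0 gives a simple return of about 0.1 and a log return of ln(1.1), while the default mode passes 110.0 through unchanged.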

Motivated by the juniper-recurrence Δt-LMU equities_seq finding (raw-close target → r²≈−50). Feature PR: #195. Full notes: notes/releases/RELEASE_NOTES_v0.8.0.md (#198).

🤖 Generated with Claude Code

v0.7.1 — equities wheel packaging fix

@pcalnon pcalnon released this 19 Jun 22:49
Immutable release. Only release title and notes can be modified.
711a618

Juniper Data v0.7.1 Release Notes

Release Date: 2026-06-19
Version: 0.7.1
Release Type: PATCH


Overview

Patch release fixing a packaging defect in 0.7.0: the equities generators' bundled S&P 500
constituents CSV was not shipped inside the wheel, leaving the equities / equities_seq extras
non-functional from a pip install. No API change.

Status: STABLE — backward-compatible patch. No migration.


Fixed

  • Ship sp500_constituents.csv inside the wheel. 0.7.0 packaged only *.py, so the equities
    generators raised FileNotFoundError on the bundled constituents file from a pip install of
    juniper-data[equities]==0.7.0 (the file is loaded via Path(__file__).parent / "sp500_constituents.csv"
    — fine in a source checkout, absent from the built wheel). Adds a [tool.setuptools.package-data]
    entry (juniper_data.generators.equities = ["*.csv"]) so the constituents list ships in the
    wheel + sdist, plus a CI build-step assertion that the CSV is present in the built wheel (guards
    the actual failure mode against a future regression). (juniper-data#193)
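
A build-step assertion of the kind described above could look roughly like the following. This is a sketch, not the project's actual CI step: the helper name wheel_has_member is hypothetical, and it relies only on the fact that a wheel is a zip archive.

```python
import zipfile


def wheel_has_member(wheel_path, suffix):
    """Return True if any file inside the wheel (a zip archive) ends with `suffix`."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return any(name.endswith(suffix) for name in wheel.namelist())
```

A CI job would call this on the freshly built wheel with a suffix such as "generators/equities/sp500_constituents.csv" and fail the build when it returns False.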

The defect was surfaced by the juniper-recurrence benchmark's equities_seq row, which could not
load the generator from the published juniper-data[equities]==0.7.0 wheel.


Upgrade

pip install --upgrade "juniper-data[equities]"   # 0.7.1 — equities/equities_seq now work from PyPI

Backward-compatible; no migration steps. The synthetic generators (multi_sine, mackey_glass,
ar_p, irregular_sine) were unaffected by the 0.7.0 defect and continue to work without any extra.


Known Issues

None. All required CI checks pass; the new build-step assertion confirms the CSV ships in the wheel.


Version History

| Version | Date | Description |
|---|---|---|
| 0.7.1 | 2026-06-19 | Fix: equities constituents CSV now ships in the wheel |
| 0.7.0 | 2026-06-19 | Synthetic dt-sequence generators + scaling meta channel |
| 0.6.0 | 2026-04-08 | Versioning, batch ops, systemd, PostgreSQL fixes |

v0.7.0 — Δt Sequence Data Foundation

@pcalnon pcalnon released this 19 Jun 04:03
Immutable release. Only release title and notes can be modified.
aca62cc

Juniper Data v0.7.0 Release Notes

Release Date: 2026-06-19
Version: 0.7.0
Codename: Δt Sequence Data Foundation
Release Type: MINOR


Overview

This release completes the Δt-native sequence data foundation for the Juniper
recurrence workstream. JuniperData can now generate irregular- and regular-Δt
time-series datasets — both synthetic (closed-form, zero-dependency) and real
(S&P 500 equities) — that emit the additive 3-D NPZ sequence contract (WS-1),
plus an advisory scaling-meta channel and build provenance on the health surface.

Status: STABLE — backward-compatible, additive contract. No breaking changes.


Release Summary

  • Release type: MINOR
  • Primary focus: New features — irregular/regular-Δt sequence generators, the 3-D sequence contract, scaling meta, build provenance
  • Breaking changes: NO (every existing classification generator and NPZ invariant is unchanged; all new fields are optional/additive)
  • Headline: ships the generators that were merged to main after v0.6.0 but were absent from the published 0.6.0 wheel — closing the publish-first gap that blocked the juniper-recurrence benchmark and recurrence-model evaluation

What's New

Δt sequence generators (the recurrence "hello-world" datasets)

Synthetic regression generators — multi_sine, mackey_glass, ar_p (#187)

Three numpy-only, deterministic, offline generators emitting the additive 3-D
sequence NPZ contract (WS-1) as task_type="regression". Each samples a process
at a regular Δt and windows it into (W, L, 1) sequences with a per-step dt, a
fixed target_dt forecast horizon, an all-ones observed_mask, and the target
carried directly in y_*. multi_sine is a superposition of K sinusoids
(closed-form known answer when noise-free); mackey_glass integrates the chaotic
delay-differential equation (β=0.2, γ=0.1, n=10, τ=17); ar_p is a stable
autoregressive process. No optional extra required — pure numpy.
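
For orientation, the Mackey-Glass equation with the parameters quoted above is dx/dt = β·x(t−τ)/(1 + x(t−τ)^n) − γ·x(t). The sketch below integrates it with forward Euler in plain Python. The integration scheme, the unit step size, and the constant initial history x0 are assumptions chosen for brevity; the release notes state only the four parameters, not how the generator integrates.

```python
def mackey_glass(n_steps, beta=0.2, gamma=0.1, n=10, tau=17, dt=1.0, x0=1.2):
    """Integrate the Mackey-Glass delay-differential equation with forward Euler."""
    delay_steps = int(round(tau / dt))
    history = [x0] * (delay_steps + 1)   # constant initial history on [-tau, 0]
    for _ in range(n_steps):
        x_now = history[-1]
        x_delayed = history[-1 - delay_steps]
        dx = beta * x_delayed / (1.0 + x_delayed ** n) - gamma * x_now
        history.append(x_now + dt * dx)
    return history[delay_steps + 1:]     # drop the seeded history


series = mackey_glass(500)
```

The run is fully deterministic: the same arguments always produce the same series, which matches the deterministic, offline character the notes describe.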

Irregular-Δt synthetic generator — irregular_sine (#188)

A fourth numpy-only regression generator that samples a continuous-time sinusoid
superposition at non-uniform times (sample_dt · U[1−jitter, 1+jitter]), so
the windowed artifact carries a genuinely non-uniform per-step dt and a variable
target_dt. The synthetic, known-answer counterpart to equities_seq's
calendar-gap irregularity. Backed by a new window_timed_series(values, times, …)
helper.
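
The non-uniform sampling rule, gaps drawn as sample_dt · U[1−jitter, 1+jitter], can be illustrated as follows. This is a standard-library sketch under stated assumptions: the function names, default amplitudes and frequencies, and the seed handling are all hypothetical and do not reflect the generator's real parameters.

```python
import math
import random


def irregular_times(n_samples, sample_dt=1.0, jitter=0.3, seed=0):
    """Cumulative sample times whose gaps are sample_dt * U[1 - jitter, 1 + jitter]."""
    rng = random.Random(seed)
    times = [0.0]
    for _ in range(n_samples - 1):
        gap = sample_dt * rng.uniform(1.0 - jitter, 1.0 + jitter)
        times.append(times[-1] + gap)
    return times


def sinusoid_superposition(t, components=((1.0, 0.05), (0.5, 0.13))):
    """Evaluate the sum of amplitude * sin(2*pi*freq*t) at continuous time t."""
    return sum(a * math.sin(2.0 * math.pi * f * t) for a, f in components)


times = irregular_times(64)
values = [sinusoid_superposition(t) for t in times]
dts = [0.0] + [b - a for a, b in zip(times, times[1:])]   # per-step dt, first entry 0
```

Because the signal is evaluated in closed form at each sampled time, the series keeps a known answer even though the spacing between samples varies.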

Real irregular-Δt sequences — equities_seq (#171) and equities (#164)

equities produces daily per-(ticker, day) records for S&P 500 constituents
(Yahoo Finance OHLCV + SEC EDGAR shares/market-cap, 52-week high/low, cost basis)
with dual targets (one-hot next-day direction + auxiliary next-day-close
regression). equities_seq is its windowed 3-D sequence variant carrying genuine
calendar-gap irregular Δt. Both require the [equities] extra (yfinance,
pandas).

Advisory dt / target scaling-meta channel (#189)

A generator may now report how its per-step dt and regression target should be
standardized, via a reserved "scaling" key that the dataset route pops into two
new optional DatasetMeta fields — dt_scaling and target_scaling. The scaling
is advisory: the NPZ keeps RAW arrays (every contract invariant intact); a
consumer standardizes at ingestion and denormalizes for metrics using the
persisted stats. New core/scaling.py (exact-inverse standardize /
inverse_standardize, std≈0 guard) and core/meta.py::pop_scaling_meta. The four
synthetic generators gain a scaling: "identity" | "standardize" parameter
(standardize descriptors fit on the train split only — no test leakage).
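
A minimal sketch of the advisory scaling idea, assuming a simple mean/std descriptor. The names standardize and inverse_standardize appear in the notes above, but the signatures here, the fit_standardizer helper, the dict layout, and the eps threshold are assumptions for illustration rather than the contents of core/scaling.py.

```python
import math


def fit_standardizer(train_values, eps=1e-12):
    """Fit mean/std on the training split only; guard against std ~ 0."""
    mean = sum(train_values) / len(train_values)
    var = sum((v - mean) ** 2 for v in train_values) / len(train_values)
    std = math.sqrt(var)
    if std < eps:
        std = 1.0                     # constant channel: avoid dividing by ~0
    return {"mean": mean, "std": std}


def standardize(values, stats):
    """Apply (v - mean) / std using previously fitted stats."""
    return [(v - stats["mean"]) / stats["std"] for v in values]


def inverse_standardize(values, stats):
    """Undo standardize() so metrics can be reported in raw units."""
    return [v * stats["std"] + stats["mean"] for v in values]
```

A consumer would fit the stats on the train split, standardize both splits with those same stats at ingestion, and call inverse_standardize on predictions before computing metrics, leaving the raw arrays in the NPZ untouched.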

Sequence contract foundation (WS-1) (#169, #170)

A per-entity sequence-windowing primitive with a Hypothesis leakage-property test
(#169), and a regression/sequence-tolerant dataset contract that makes class
metadata optional and dispatches on task_type (#170).

Build provenance on the health surface (#180)

/v1/health and /v1/health/ready now report the source git_sha and ISO-8601
build_date baked into the image (GIT_SHA / BUILD_DATE / APP_VERSION
build-args → OCI labels + env vars; new juniper_data.provenance accessor; values
flow into set_build_info(...) and the shared ReadinessResponse). Foundation for
ecosystem stale-image detection. Requires juniper-observability>=0.4.0.
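
The env-var side of this flow might be read along these lines. The function name read_build_provenance, the returned dict shape, and the "unknown" fallback are hypothetical; the actual juniper_data.provenance accessor is not shown in these notes, and only the three variable names come from them.

```python
import os


def read_build_provenance(environ=None):
    """Collect build provenance from environment variables, with 'unknown' fallbacks."""
    env = os.environ if environ is None else environ
    return {
        "git_sha": env.get("GIT_SHA", "unknown"),
        "build_date": env.get("BUILD_DATE", "unknown"),
        "version": env.get("APP_VERSION", "unknown"),
    }
```

Passing a plain dict as environ keeps the function easy to exercise in tests without mutating the process environment.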

Compatibility

  • fastapi 0.137 route-introspection compatibility (_IncludedRouter) (#181),
    starlette>=1.0.1 floor (CVE-2026-48710), and routine dependency bumps.

API Changes

New / changed response fields

| Surface | Change | Breaking? |
|---|---|---|
| DatasetMeta | New optional dt_scaling, target_scaling descriptors | No |
| /v1/health, /v1/health/ready | New git_sha, build_date provenance fields | No |
| Dataset metadata | n_classes / class_distribution now optional (task_type="regression") | No |

New generators registered on the dataset route

multi_sine, mackey_glass, ar_p, irregular_sine (no extra) and equities,
equities_seq ([equities] extra). All emit the 3-D sequence NPZ contract:
X (n,T,F), y / y_reg, dt (n,T, dt[:,0]=0), target_dt (n,),
seq_lengths, observed_mask — split-suffixed (_train / _test / _full).
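
To make the array layout concrete, here is a plain-Python sketch of how a timed series might be sliced into X, dt (with the first dt of each window set to 0), target_dt, and y. The function window_series and its parameters are hypothetical and deliberately simpler than the package's windowing helpers; seq_lengths, observed_mask, and the train/test split are omitted.

```python
def window_series(values, times, lookback, horizon=1):
    """Slice a timed series into windows plus per-step dt and target_dt."""
    X, dt, target_dt, y = [], [], [], []
    last_start = len(values) - lookback - horizon
    for start in range(last_start + 1):
        end = start + lookback                    # exclusive end of the window
        w_times = times[start:end]
        X.append([[v] for v in values[start:end]])            # (T, F) with F = 1
        dt.append([0.0] + [b - a for a, b in zip(w_times, w_times[1:])])
        target_idx = end - 1 + horizon
        target_dt.append(times[target_idx] - w_times[-1])     # gap to the target
        y.append(values[target_idx])
    return X, dt, target_dt, y
```

With irregular times the per-step dt rows differ from window to window and target_dt varies per sample, which is the property the irregular-Δt generators are meant to exercise.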


Upgrade Notes

This is a backward-compatible MINOR release. No migration steps are required for
existing classification datasets or consumers.

pip install --upgrade juniper-data            # synthetic generators, core, API
pip install --upgrade "juniper-data[equities]" # + equities / equities_seq

  • The API server extra pulls juniper-observability>=0.4.0 (provenance helpers).
  • Synthetic Δt generators (multi_sine, mackey_glass, ar_p, irregular_sine)
    need no optional extra.

Known Issues

  • equities / equities_seq require network access to Yahoo Finance and SEC
    EDGAR at generation time; they are excluded from the offline test path. Not a
    functional defect.
  • None blocking. All required CI checks (unit/integration across Python
    3.12–3.14, pre-commit, CodeQL, security, lockfile freshness, quality gate) pass.

What's Next

  • Consumed downstream: the juniper-recurrence benchmark and recurrence-model
    evaluation depend on these published generators (the Δt thesis was validated
    against irregular_sine).
  • Eval extensions: noisy synthetic variants (noise_std > 0) and real
    equities_seq benchmarking.
  • Scaling/synthetic generator enhancements tracked under WS-4.

Version History

| Version | Date | Description |
|---|---|---|
| 0.7.0 | 2026-06-19 | Synthetic dt-sequence generators + scaling meta channel |
| 0.6.0 | 2026-04-08 | Versioning, batch ops, systemd, PostgreSQL fixes |

v0.6.0

@pcalnon pcalnon released this 09 Apr 09:39
0d65630

Highlights

Major release with dataset versioning, batch operations, security hardening, and infrastructure improvements.

Added

  • Dataset Versioning (CAN-DEF-005 Phase 1): Logical dataset names with auto-incrementing version numbers. Atomic version allocation prevents duplicates under concurrency. New endpoints: GET /v1/datasets/versions, GET /v1/datasets/latest.
  • Batch Operations (CAN-DEF-006): POST /v1/datasets/batch-create, PATCH /v1/datasets/batch-tags, POST /v1/datasets/batch-export for operating on multiple datasets in single requests.
  • Docker Secrets: File-based secrets support via get_secret() utility.
  • Systemd Integration: Service unit and management CLI for native Linux deployments.
  • CSV Import Path Traversal Protection: New JUNIPER_DATA_IMPORT_DIR setting restricts CSV file imports to a configurable base directory.
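
The path-traversal guard in the last item can be sketched with pathlib. This is an illustration of the general technique (resolve, then check containment), not the service's implementation; the function name resolve_import_path is hypothetical, and base_dir stands in for whatever JUNIPER_DATA_IMPORT_DIR is configured to.

```python
from pathlib import Path


def resolve_import_path(base_dir, requested):
    """Resolve `requested` under `base_dir`, rejecting anything that escapes it."""
    base = Path(base_dir).resolve()
    candidate = (base / requested).resolve()
    if not candidate.is_relative_to(base):     # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"{requested!r} is outside the import directory")
    return candidate
```

Resolving before the containment check matters: it collapses ".." segments and symlinks first, so a request such as "../secrets.csv" or an absolute path elsewhere on disk is rejected rather than silently followed.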

Fixed

  • Synchronized version across __init__.py, pyproject.toml, and Dockerfile
  • PostgreSQL metadata/artifact split-brain on save failure
  • PostgreSQL temp artifact race conditions on concurrent saves
  • Advisory lock namespace collision between dataset ID and version allocation
  • Generic n_classes fallback replacing spiral-specific params.n_spirals
  • Removed inconsistent fallback that crashed for non-spiral generators with empty training sets

Changed

  • Updated GitHub Actions (checkout v6, setup-python v6.2, upload-artifact v7, codecov v6)
  • Sentry PII and traces sample rate now configurable (defaults: PII=False, sample_rate=0.1)
  • AGENTS.md comprehensive audit and update
  • Documentation link checker with cross-repo skip mode

Security

  • CSV import generator now validates file paths against configurable import directory
  • Sentry PII transmission disabled by default (was enabled)

Stats

  • 849 tests passing
  • 98%+ code coverage
  • All pre-commit hooks passing

🤖 Generated with Claude Code

v0.4.2

@pcalnon pcalnon released this 26 Feb 12:01
1968dd3

Initial PyPI release of juniper-data.

Dataset generation and management service for the Juniper ecosystem.