libRandomizer

libRandomizer is a portable training-data generator for simple prediction networks. You define a datatype or schema for the input and another for the output, choose a fixed record count, and set a seed. The SDK then produces a reproducible list of input/output training pairs.

The Python package is the reference implementation. Its schema contract is JSON-native so the same dataset definition can be carried across language targets without depending on opaque Python objects.

Install

python -m pip install .

Quickstart

from librandomizer import TrainingDataGenerator, choice, integer

generator = TrainingDataGenerator(
    input_schema=integer(0, 99),
    output_schema=choice(["low", "medium", "high"]),
    count=100,
    seed=42,
)

pairs = generator.generate()

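Each generated record has the same language-neutral shape. The values below are illustrative; because this quickstart has no transform, the output is drawn from its own schema independently of the input, so any label can accompany any integer: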

{
  "input": 81,
  "output": "low"
}

Calling the same generator again with the same schema, count, and seed produces the same records in the same order.
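
The link between seed and output can be illustrated without the SDK. The following is a minimal sketch, assuming generation is driven by a private seeded random source; `generate_pairs` is a hypothetical stand-in written for this example and is not libRandomizer's implementation.

```python
import random


def generate_pairs(count, seed):
    """Hypothetical stand-in for a seeded generator (not the SDK's code)."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    labels = ["low", "medium", "high"]
    return [
        {"input": rng.randint(0, 99), "output": rng.choice(labels)}
        for _ in range(count)
    ]


first = generate_pairs(100, seed=42)
second = generate_pairs(100, seed=42)
assert first == second  # same seed and count: same records, same order
```

Keeping the random source local to the generator is what makes the result independent of any other code that touches Python's global `random` state.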

Derived Outputs

Most datasets can be generated from separate input and output schemas. When a target should be calculated from the input, pass a transform callback and keep an output_schema so the result can be validated and serialized consistently.

from librandomizer import TrainingDataGenerator, integer, number

generator = TrainingDataGenerator(
    input_schema=integer(0, 10),
    output_schema=number(0, 20),
    count=100,
    seed=42,
    transform=lambda value: value * 2,
)

pairs = generator.generate()

This is useful for labels, thresholds, regression targets, boolean decisions, and other predictable outputs for supervised learning examples.
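
To make the role of output_schema concrete when a transform is present, here is a minimal sketch of the kind of check a bounded number schema could apply to a derived value. `derive_output` is a hypothetical helper, and the inclusive bounds and the rejection of booleans are assumptions; the SDK's actual validation rules may differ.

```python
def derive_output(value, transform, minimum, maximum):
    """Apply a transform and check the result against [minimum, maximum].

    Hypothetical helper for illustration, not part of libRandomizer.
    """
    result = transform(value)
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(result, bool) or not isinstance(result, (int, float)):
        raise TypeError(f"transform returned non-numeric {result!r}")
    if not minimum <= result <= maximum:
        raise ValueError(f"{result!r} is outside [{minimum}, {maximum}]")
    return result


def double(value):
    return value * 2


# Inputs 0..10 doubled stay inside the 0..20 output range used above
pairs = [{"input": v, "output": derive_output(v, double, 0, 20)} for v in range(11)]
```

A transform that drifts outside the declared range fails loudly at generation time instead of producing a dataset that silently violates its own schema.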

Schema Helpers

The v1 schema layer focuses on portable JSON-native datatypes:

| Helper | Purpose |
| --- | --- |
| `integer(min, max)` | Bounded integer values |
| `number(min, max, precision=None)` | Bounded floating point values |
| `boolean()` | `true` or `false` values |
| `string(length=8, alphabet=None)` | Fixed-length strings |
| `choice(values)` | One value from a finite set |
| `array_schema(items, length)` | Fixed-length arrays |
| `object_schema(properties)` | Nested JSON objects |
| `null()` | Explicit `null` values |
| `literal(value)` | A fixed serializable value |
| `one_of(schemas)` | A deterministic choice among schema variants |

Schemas can be nested:

from librandomizer import boolean, integer, object_schema, string

input_schema = object_schema({
    "profile": object_schema({
        "age": integer(18, 65),
        "active": boolean(),
    }),
    "plan": string(length=6),
})
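
Nested schemas imply recursive generation. The sketch below walks a toy schema dictionary and draws one value per leaf. The dictionary layout (`type`, `properties`, `min`, `max`, `length`) and the lowercase default alphabet are invented for illustration; they are not the portable contract described in spec/training/README.md.

```python
import random
from string import ascii_lowercase


def sample(schema, rng):
    """Recursively draw one value from a toy schema dict (hypothetical layout)."""
    kind = schema["type"]
    if kind == "object":
        # Visit properties in insertion order so the draw sequence is stable
        return {key: sample(sub, rng) for key, sub in schema["properties"].items()}
    if kind == "integer":
        return rng.randint(schema["min"], schema["max"])
    if kind == "boolean":
        return rng.random() < 0.5
    if kind == "string":
        return "".join(rng.choice(ascii_lowercase) for _ in range(schema["length"]))
    raise ValueError(f"unsupported schema type: {kind}")


toy_schema = {
    "type": "object",
    "properties": {
        "profile": {
            "type": "object",
            "properties": {
                "age": {"type": "integer", "min": 18, "max": 65},
                "active": {"type": "boolean"},
            },
        },
        "plan": {"type": "string", "length": 6},
    },
}

record = sample(toy_schema, random.Random(42))
```

Because every leaf draws from the same seeded source in a fixed traversal order, the whole nested record is reproducible.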

Generator API

TrainingDataGenerator(
    input_schema,
    output_schema,
    *,
    count=None,
    seed=42,
    transform=None,
    transform_spec=None,
)

input_schema describes the generated input side of each pair.
output_schema describes the generated output side, or validates transform results when a transform is supplied.
count is the default number of pairs produced by generate() and export methods.
seed controls deterministic generation.
transform is optional and receives one generated input value.
transform_spec is optional serializable metadata for cross-language specs.

Exports

generator.write_json("train.json")
generator.write_jsonl("train.jsonl")
generator.write_csv("train.csv")

JSON preserves the full nested record structure. JSONL is convenient for streaming and line-oriented tooling. CSV flattens nested records into stable columns such as input.profile.age, input.features[0], and output.score.
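
The column naming described above can be sketched with the standard library. This is a minimal illustration of dotted keys and bracketed indices, assuming columns follow record traversal order; `flatten` is a hypothetical helper, not the code behind write_csv.

```python
import csv
import io


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into {column_name: scalar} pairs."""
    if isinstance(value, dict):
        flat = {}
        for key, item in value.items():
            name = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(item, name))
        return flat
    if isinstance(value, list):
        flat = {}
        for index, item in enumerate(value):
            flat.update(flatten(item, f"{prefix}[{index}]"))
        return flat
    return {prefix: value}


record = {
    "input": {"profile": {"age": 30}, "features": [0.5, 1.5]},
    "output": {"score": 0.9},
}
row = flatten(record)

# Write a single-row CSV to memory using the flattened column names
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row), lineterminator="\n")
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
```

Here `row` has the keys `input.profile.age`, `input.features[0]`, `input.features[1]`, and `output.score`. Fixed-length arrays matter for this layout: every record yields the same set of columns.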

Portable Specs

Generators can be serialized as a spec:

spec = generator.to_spec()
restored = TrainingDataGenerator.from_spec(spec)

Specs include the seed, count, input schema, output schema, and optional transform metadata. Host-language callback code is intentionally not serialized; portable transforms should be represented by a named transform_spec and bound to native code in each SDK.
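
One way to picture a named transform is a registry that maps a serializable name to native code. The sketch below is an assumption-laden illustration: the `TRANSFORMS` registry, the `bind_transform` helper, and the `{"name": ...}` layout of the metadata are all hypothetical and do not describe the real spec format.

```python
import json

# Hypothetical registry: each SDK would bind the same names to its own native code.
TRANSFORMS = {
    "double": lambda value: value * 2,
    "is_even": lambda value: value % 2 == 0,
}


def bind_transform(transform_spec):
    """Resolve serializable transform metadata to a local callable."""
    if transform_spec is None:
        return None
    name = transform_spec["name"]
    if name not in TRANSFORMS:
        raise KeyError(f"no native binding for transform {name!r}")
    return TRANSFORMS[name]


# Only JSON-safe metadata travels in the spec; the callable itself never does.
spec_text = json.dumps({"seed": 42, "count": 100, "transform_spec": {"name": "double"}})
restored_spec = json.loads(spec_text)
transform = bind_transform(restored_spec["transform_spec"])
```

Failing on an unknown name, rather than silently skipping the transform, keeps a spec from producing different outputs in an SDK that lacks the binding.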

Reproducibility Guarantees

The same implementation must produce identical datasets for the same:

input schema
output schema
seed
count
transform behavior, when a transform is used

Different seeds should change generated inputs while preserving schema validity. Exports are deterministic so generated JSON, JSONL, and CSV files can be used in tests, demos, examples, and repeatable training experiments.
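
Deterministic exports lend themselves to digest-based regression tests. The following is a minimal sketch, assuming a stable key order and fixed separators in the serialized output; `export_jsonl` and `digest` are hypothetical stand-ins rather than the SDK's export code.

```python
import hashlib
import json
import random


def export_jsonl(count, seed):
    """Render seeded records as JSONL text (stand-in for a real export)."""
    rng = random.Random(seed)
    lines = []
    for _ in range(count):
        value = rng.randint(0, 10)
        record = {"input": value, "output": value * 2}
        # sort_keys and fixed separators keep the byte layout stable
        lines.append(json.dumps(record, sort_keys=True, separators=(",", ":")))
    return "\n".join(lines) + "\n"


def digest(text):
    """Return the SHA-256 hex digest of the exported text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


baseline = digest(export_jsonl(100, seed=42))
assert digest(export_jsonl(100, seed=42)) == baseline  # byte-identical export
```

Storing a digest like `baseline` alongside a test lets a change in generation or serialization behavior surface as a single failed comparison.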

Legacy Compatibility

The original OS-backed random primitive APIs remain available as compatibility shims while the training-data generator becomes the primary product. New code should use TrainingDataGenerator and the portable schema helpers.

V1 Beta parity note

The Python package and CLI together form the behavior-complete reference implementation. SDKs for other languages are currently generated API surfaces and are advancing through parity hardening in this order: JavaScript, TypeScript, Go, C#, Java, Rust, C/C++, PHP, Ruby, Kotlin, Swift, Dart, R.

Current language status is tracked in docs/SDK_PARITY_STATUS.md.

Documentation

The GitHub Pages site lives in docs/. Start with docs/index.html for the developer-facing overview, then see spec/training/README.md for the portable schema contract.

Tests

python -m unittest discover tests
