renorm-native is a PyTorch-compatible neural network module designed to improve numerical stability in deep learning models.
It provides transformer-ready layers that are robust to:
- Training instability (NaNs / exploding gradients)
- Irregular tensor shapes and sequence lengths
- Mixed CPU/GPU execution environments
- Memory pressure in large-scale workloads
It is designed to be a drop-in architectural component for modern deep learning pipelines.
Install from PyPI:
pip install renorm-native
Upgrade to the latest version:
pip install --upgrade renorm-native

import torch
from renorm import RenormTransformerLayer
# Initialize layer
layer = RenormTransformerLayer(dim=512, heads=8)
# Dummy input: (batch, sequence, features)
x = torch.randn(2, 16, 512)
# Forward pass
y = layer(x)
print(y.shape)
Expected output: torch.Size([2, 16, 512])
A lightweight transformer block with built-in normalization stability.
RenormTransformerLayer(
    dim: int,
    heads: int,
    eps: float = 1e-5
)
- dim: Hidden dimension size
- heads: Number of attention heads
- eps: Numerical stability constant
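To illustrate what a numerical stability constant such as eps typically does, here is a minimal plain-Python sketch of RMS-style normalization. It shows a generic technique and is not renorm-native's actual implementation; the function name rms_normalize is hypothetical.

```python
import math

def rms_normalize(values, eps=1e-5):
    """Scale a vector to roughly unit root-mean-square.

    eps is added under the square root so that an all-zero input
    divides by sqrt(eps) instead of by zero.
    """
    mean_square = sum(v * v for v in values) / len(values)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [v * scale for v in values]

print(rms_normalize([3.0, 4.0]))  # output has an RMS close to 1
print(rms_normalize([0.0, 0.0]))  # stays finite instead of dividing by zero
```

A larger eps makes the result more tolerant of near-zero inputs at the cost of a slightly smaller output scale.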
A stable replacement for torch.nn.Linear.
from renorm.layers import RenormLinear
Example:
layer = RenormLinear(256, 128)
y = layer(torch.randn(4, 256))

Automatically works across:
- CPU (Windows / Linux / Mac)
- CUDA (NVIDIA GPUs)
- Mixed environments (fallback-safe execution)
Example:
device = "cuda" if torch.cuda.is_available() else "cpu"
layer = RenormTransformerLayer(dim=512, heads=8).to(device)
x = torch.randn(2, 16, 512).to(device)
y = layer(x)

Run this to verify installation:
python -c "from renorm import RenormTransformerLayer; print(RenormTransformerLayer(dim=256, heads=4))"
Expected behavior: the command exits without errors and prints the layer's module representation.
renorm-native uses a dual-path execution design:
- CUDA Path (GPU):
  - Optimized tensor execution path
  - High-performance kernel routing (where available)
- CPU Path (Fallback):
  - Stable numerical execution engine
  - Strict variance preservation for stability
This ensures consistent behavior across heterogeneous compute environments.
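The dual-path idea can be sketched in plain Python as a dispatcher that prefers an accelerated path and falls back to a reference path when it is unavailable. This is only an illustration of the pattern; the names fast_path, reference_path, and run_with_fallback are hypothetical and do not reflect the library's internals.

```python
def reference_path(values):
    """Fallback path: straightforward and always available."""
    return [v * 2.0 for v in values]

def fast_path(values, accelerator_available):
    """Stand-in for an accelerated path that may be unavailable."""
    if not accelerator_available:
        raise RuntimeError("accelerator not available")
    return [v * 2.0 for v in values]

def run_with_fallback(values, accelerator_available):
    """Try the fast path first; use the reference path if it fails."""
    try:
        return fast_path(values, accelerator_available), "fast"
    except RuntimeError:
        return reference_path(values), "reference"

# Both paths compute the same result, so callers see consistent behavior.
print(run_with_fallback([1.0, 2.0], accelerator_available=True))
print(run_with_fallback([1.0, 2.0], accelerator_available=False))
```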
Prevents numerical collapse in deep stacks by maintaining bounded activation scaling.
Ensures gradient computation remains isolated from unsafe tensor views in dynamic graphs.
Same model behavior across CPU and GPU environments.
- Transformer models (LLMs)
- Time-series forecasting systems
- Anomaly detection pipelines
- Edge-device inference systems
- Low-memory GPU environments
- Requires PyTorch ≥ 2.0
- Python ≥ 3.10 recommended
- CUDA optional but supported
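The requirements above can be checked before importing the package. The following is a minimal sketch using only the standard library; the helper names parse_version and check_requirements are hypothetical, and the check reads the installed version of the torch distribution without importing it.

```python
import sys
from importlib import metadata

def parse_version(text):
    """Turn a string like '2.2.0+cu118' into (2, 2, 0)."""
    core = text.split("+")[0]  # drop a local suffix such as '+cu118'
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at non-numeric segments such as 'dev2023'
        parts.append(int(digits))
    return tuple(parts)

def check_requirements():
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10 or newer is recommended")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if parse_version(torch_version) < (2, 0):
            problems.append(f"PyTorch {torch_version} is older than 2.0")
    return problems

print(check_requirements())
```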
MIT License. See LICENSE for details.
Contributions, issues, and improvements are welcome.
Maintained by the renorm-native team.
renorm-native can be used in production systems requiring deterministic numerical stability under high load.
Typical deployment environments:
- GPU inference clusters (CUDA-enabled)
- On-prem ML pipelines
- Edge inference systems
- Distributed training environments (PyTorch DDP)
Some builds may enable enterprise validation for regulated or production deployments.
export RENORM_ENTERPRISE_KEY="your_token_here"
Token format: base64_payload.hex_hmac_signature
from renorm.auth import check_enterprise_license
check_enterprise_license()

| Condition | Behavior |
|---|---|
| Missing key | Raises PermissionError |
| Invalid signature | Raises PermissionError |
| Expired token | Raises TimeoutError |
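To make the token layout and the error table concrete, here is a hedged sketch of how a token of the form base64_payload.hex_hmac_signature could be built and checked with the standard library. The hash (SHA-256), the JSON payload with an exp timestamp, and the functions make_token and verify_token are all assumptions for illustration; the checks actually performed by renorm.auth are not documented here.

```python
import base64
import hashlib
import hmac
import json
import time

def make_token(payload, secret):
    """Build '<base64 payload>.<hex HMAC>' (assumed layout and hash)."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    signature = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{signature}"

def verify_token(token, secret, now=None):
    """Raise the exceptions listed in the table above, else return the payload."""
    if not token:
        raise PermissionError("missing key")
    body, _, signature = token.rpartition(".")
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences
    if not hmac.compare_digest(signature, expected):
        raise PermissionError("invalid signature")
    payload = json.loads(base64.urlsafe_b64decode(body))
    if payload["exp"] < (time.time() if now is None else now):
        raise TimeoutError("expired token")
    return payload

secret = b"demo-secret"  # hypothetical signing secret
token = make_token({"exp": 2_000_000_000}, secret)
print(verify_token(token, secret, now=1_700_000_000))
```

The signature is verified before the payload is decoded, so a tampered payload is rejected without being parsed.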
Recommended structure in production pipelines:
import torch
from renorm import RenormTransformerLayer
def build_model():
    model = RenormTransformerLayer(dim=1024, heads=16)
    return model

def forward_pass(model, x):
    return model(x)

Run a deterministic sanity check:
python -c "
import torch
from renorm import RenormTransformerLayer
layer = RenormTransformerLayer(dim=256, heads=4)
x = torch.randn(2, 8, 256)
y = layer(x)
assert y.shape[-1] == 256
print('OK')
"

renorm-native is optimized for:
- Stable forward/backward propagation under long sequence lengths
- Reduced numerical drift in deep stacks
- Consistent execution across heterogeneous compute backends
It is not intended as a raw speed-optimized kernel replacement for PyTorch primitives.
| Environment | Status |
|---|---|
| CPU (Windows) | ✅ Supported |
| CPU (Linux) | ✅ Supported |
| CUDA 11+ | ✅ Supported |
| MPS (Apple Silicon) | |
| Distributed training (DDP) | ✅ Compatible |
renorm-native prioritizes:
- Numerical correctness over raw speed
- Stability over aggressive optimization
- Cross-device consistency over hardware specialization
It is designed to behave predictably under:
- gradient explosion conditions
- low precision arithmetic
- fragmented tensor memory layouts
FROM pytorch/pytorch:2.2.0-cuda11.8-cudnn8-runtime
WORKDIR /app
RUN pip install renorm-native
COPY . .
CMD ["python", "main.py"]

| Layer | Stability Score | NaN Rate |
|---|---|---|
| torch.nn.LayerNorm | baseline | medium under stress |
| renorm-native | improved | near-zero |
(The values above are placeholders, not measured results. Replace them with real measurements before publishing this table publicly.)
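One way to obtain a real NaN-rate figure for the table is a small harness that runs a forward function many times and counts the runs with non-finite outputs. The sketch below is plain Python with toy functions standing in for real layers; nan_rate, naive_share, and rescaled_share are hypothetical names, and nothing here measures renorm-native itself.

```python
import math
import random

def nan_rate(forward, make_input, trials=100):
    """Fraction of trials whose output contains a NaN or infinity."""
    bad = 0
    for _ in range(trials):
        output = forward(make_input())
        if any(not math.isfinite(v) for v in output):
            bad += 1
    return bad / trials

def naive_share(xs):
    """Toy unstable op: squaring huge values overflows to inf."""
    squares = [x * x for x in xs]
    total = sum(squares)
    return [s / total for s in squares]  # inf / inf yields nan

def rescaled_share(xs):
    """Same quantity, but rescaled by the largest magnitude first."""
    peak = max(abs(x) for x in xs)
    ratios = [x / peak for x in xs]
    squares = [r * r for r in ratios]
    total = sum(squares)
    return [s / total for s in squares]

rng = random.Random(0)  # fixed seed so runs are repeatable

def make_input():
    """Randomly mix ordinary and extremely large inputs."""
    scale = rng.choice([1.0, 1e200])
    return [scale * rng.uniform(1.0, 2.0), 1.0]

print("naive:", nan_rate(naive_share, make_input))
print("rescaled:", nan_rate(rescaled_share, make_input))
```

For a real benchmark, forward would wrap the layer under test and make_input would generate the stress inputs you care about, such as large-magnitude or low-precision tensors.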
Planned improvements:
- Distributed kernel optimization (multi-GPU aware routing)
- Expanded attention primitives
- Quantization-aware renormalization mode
- Torch compile integration (torch.compile support)
For production integration or enterprise deployment:
- GitHub Issues: https://github.com/Tobi-Adesoye/renorm-native
- Contact: Adesoyetobe@gmail.com