
MNIST Trainer bug fix #12

Open
longforu wants to merge 1 commit into tinygrad:main from longforu:fix-mnist-trainer

longforu commented Aug 1, 2025

Potential bug fix for MNIST trainer.

After cloning the repo and running mnist.py with the latest NumPy (2.x), the following error occurs:

Traceback (most recent call last):
  File "/Users/longtran/Programs/teenygrad/mnist.py", line 90, in <module>
    train(model, X_train, Y_train, optimizer, steps=100)
    ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/longtran/Programs/teenygrad/mnist.py", line 22, in train
    loss = lossfn(out, y)
  File "/Users/longtran/Programs/teenygrad/mnist.py", line 10, in <lambda>
    def train(model, X_train, Y_train, optim, steps, BS=128, lossfn=lambda out,y: out.sparse_categorical_crossentropy(y),
                                                                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/Users/longtran/Programs/teenygrad/teenygrad/tensor.py", line 794, in sparse_categorical_crossentropy
    loss_mask = Y != ignore_index
                ^^^^^^^^^^^^^^^^^
  File "/Users/longtran/Programs/teenygrad/teenygrad/tensor.py", line 753, in __ne__
    def __ne__(self, x) -> Tensor: return (self<x) + (self>x)   # type: ignore
                                           ^^^^^^
  File "/Users/longtran/Programs/teenygrad/teenygrad/tensor.py", line 749, in __lt__
    def __lt__(self, x) -> Tensor: return mlops.Less.apply(*self._broadcasted(x, False))
                                                            ~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/Users/longtran/Programs/teenygrad/teenygrad/tensor.py", line 661, in _broadcasted
    y = Tensor(y, device=self.device, requires_grad=False, dtype=self.dtype if self.dtype != dtypes.bool and self.dtype.__class__ is not ImageDType else dtypes.float32)
  File "/Users/longtran/Programs/teenygrad/teenygrad/tensor.py", line 62, in __init__
    data = LazyBuffer.loadop(LoadOps.CONST, tuple(), dtype or Tensor.default_type, device, data)
  File "/Users/longtran/Programs/teenygrad/teenygrad/lazy.py", line 35, in loadop
    elif op == LoadOps.CONST: return LazyBuffer(np.full(shape, arg, dtype=dtype.np))
                                                ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/longtran/Programs/teenygrad/env/lib/python3.13/site-packages/numpy/_core/numeric.py", line 387, in full
    multiarray.copyto(a, fill_value, casting='unsafe')
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OverflowError: Python integer -1 out of bounds for uint8

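This happens because the labels tensor Y has an unsigned integer dtype (uint8), and sparse_categorical_crossentropy compares it with ignore_index = -1. Inside _broadcasted, the constant -1 is created with Y's dtype, but -1 cannot be represented as a uint8, so NumPy 2.x raises an OverflowError. This can be fixed by casting the labels to a signed type before the comparison.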
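
A minimal NumPy sketch of the failure and the cast, assuming uint8 labels and ignore_index = -1 as in the traceback. It calls np.full the same way lazy.py does for LoadOps.CONST; the label values and the choice of int32 for the cast are illustrative assumptions, not the PR's actual diff.

```python
import numpy as np

ignore_index = -1
Y = np.array([3, 0, 7, 1], dtype=np.uint8)  # illustrative uint8 labels (assumption)

# Same call shape as the failing line in lazy.py: np.full(shape, arg, dtype=...)
try:
    np.full((), ignore_index, dtype=Y.dtype)
    overflowed = False   # older NumPy versions do not raise here
except OverflowError:
    overflowed = True    # NumPy 2.x: "Python integer -1 out of bounds for uint8"

# Fix: cast the labels to a signed dtype so -1 is representable, then compare.
Y_signed = Y.astype(np.int32)
const = np.full((), ignore_index, dtype=Y_signed.dtype)
loss_mask = Y_signed != const
print(overflowed, loss_mask.tolist())
```

Because none of the labels equal -1, every entry of the mask comes out True once the comparison is done in a signed dtype.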

The other change is to the loss function: cross-entropy loss should be the negative log-probability (-log p) of the correct class. Without this change, the trainer did not converge for me.
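
For reference, here is a plain NumPy sketch of sparse categorical cross-entropy with the negation in place. It is an illustration only, not teenygrad's implementation; the function names log_softmax and sparse_cross_entropy are hypothetical, and ignore_index = -1 mirrors the traceback.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def sparse_cross_entropy(logits, y, ignore_index=-1):
    y = y.astype(np.int64)            # signed, so ignore_index = -1 is representable
    mask = y != ignore_index          # rows to include in the loss
    logp = log_softmax(logits)
    # Pick log p(correct class) per row; ignored rows use a dummy index 0.
    picked = logp[np.arange(len(y)), np.where(mask, y, 0)]
    # Negate: cross-entropy is -log p(correct class), averaged over non-ignored rows.
    return -(picked * mask).sum() / mask.sum()

uniform = np.zeros((2, 10))
labels = np.array([3, 5], dtype=np.uint8)
print(sparse_cross_entropy(uniform, labels))  # approximately log(10)
```

With uniform logits over 10 classes the loss is log(10), about 2.303, and it shrinks as the correct class's logit grows. Without the leading minus, gradient descent would push the correct-class probability down instead of up, which is consistent with the trainer failing to converge.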

syzygy137 commented

I got this error too

0danylo left a comment

This fixed it for me as well. Could help for people looking at this repo.
