With 144 GB of VRAM, what is the biggest model we can train from scratch on a tinybox? Would it be possible to train a model similar to, say, Llama 3 8B from scratch on it? Here are some model params:
llama3(
vocab_size=128_256,
num_layers=32,
num_heads=32,
num_kv_heads=8,
embed_dim=4096,
max_seq_len=8192,
intermediate_dim=14336,
attn_dropout=0.0,
norm_eps=1e-5,
rope_base=500000.0,
)
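For a rough sense of scale, here is a minimal sketch that counts the parameters implied by the config above and estimates the memory for weights, gradients and optimizer state. The function name `llama_param_count` is made up for illustration. It assumes untied input/output embeddings, no biases, and the common accounting of 16 bytes per parameter for mixed-precision Adam (2 for half-precision weights, 2 for gradients, 12 for the fp32 master copy and the two Adam moments). A different optimizer or precision setup changes that constant.

```python
def llama_param_count(vocab_size, num_layers, num_heads, num_kv_heads,
                      embed_dim, intermediate_dim, tied_embeddings=False):
    """Count parameters of a Llama-style decoder (no biases assumed)."""
    head_dim = embed_dim // num_heads
    kv_dim = num_kv_heads * head_dim  # grouped-query attention shrinks K and V
    # Q and output projections are embed_dim x embed_dim, K and V are embed_dim x kv_dim.
    attn = 2 * embed_dim * embed_dim + 2 * embed_dim * kv_dim
    # Gated MLP: gate, up and down projections.
    mlp = 3 * embed_dim * intermediate_dim
    # Two RMSNorm weight vectors per layer.
    norms = 2 * embed_dim
    per_layer = attn + mlp + norms
    # Token embedding plus output head (assumed untied unless stated otherwise).
    embeddings = vocab_size * embed_dim * (1 if tied_embeddings else 2)
    # The trailing embed_dim is the final RMSNorm.
    return num_layers * per_layer + embeddings + embed_dim


# Assumption: mixed-precision Adam at 16 bytes per parameter.
BYTES_PER_PARAM = 16

n_params = llama_param_count(
    vocab_size=128_256,
    num_layers=32,
    num_heads=32,
    num_kv_heads=8,
    embed_dim=4096,
    intermediate_dim=14336,
)
state_bytes = n_params * BYTES_PER_PARAM

print(f"parameters: {n_params:,}")                                   # 8,030,261,248
print(f"weights + grads + Adam state: {state_bytes / 1e9:.1f} GB")   # 128.5 GB
```

Under these assumptions the training state alone takes about 128.5 GB of the 144 GB, before any activations, and it would have to be sharded across the GPUs because no single card holds it.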
We also need to consider block size, batch size, etc.
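To see how block size and batch size feed into memory, here is a rough sketch of two activation terms that scale with them. The helper names are hypothetical. It assumes full activation checkpointing (only each layer's input is kept, in 2-byte precision) and fp32 logits for the loss. It ignores the working memory needed while a layer is recomputed, so treat the numbers as a lower bound, not a measurement.

```python
def checkpoint_bytes(batch_size, block_size, num_layers, embed_dim,
                     bytes_per_value=2):
    """Memory for one saved input tensor per layer (full checkpointing assumed)."""
    return num_layers * batch_size * block_size * embed_dim * bytes_per_value


def logits_bytes(batch_size, block_size, vocab_size, bytes_per_value=4):
    """Memory for the output logits, assumed to be held in fp32 for the loss."""
    return batch_size * block_size * vocab_size * bytes_per_value


# Try a few (batch size, block size) combinations with the config above.
for batch_size, block_size in [(1, 2048), (1, 8192), (4, 8192)]:
    ckpt = checkpoint_bytes(batch_size, block_size, num_layers=32, embed_dim=4096)
    logits = logits_bytes(batch_size, block_size, vocab_size=128_256)
    print(f"batch={batch_size} block={block_size}: "
          f"checkpoints {ckpt / 1e9:.2f} GB, logits {logits / 1e9:.2f} GB")
```

Both terms grow linearly with batch size times block size, so halving the block size or using gradient accumulation with a smaller micro-batch are the obvious knobs when memory is tight.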