We currently have minimal docs covering training rationale, interesting optimizations, or longer-term objectives, and we do not link to the resources that already exist (e.g. the padding-free Transformers blog post).
We should collect all the resources we have, plus any brain-dump-grade information we care to share, into a public set of docs.