AMD today announced “AMD-135M” as the company's first publicly-released small language model (SLM). The training code, dataset, and model weights are all open-source to help in the development of other SLMs and LLMs.
AMD-135M is based on the LLaMA2 model architecture and features speculative decoding. The model was trained from scratch on 670 billion tokens using AMD Instinct MI250 accelerators; training on four MI250 nodes took six days. There is also an AMD-Llama-135M-code variant trained on an additional 20 billion tokens of code data.
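For those curious how the speculative decoding angle works in practice, below is a minimal sketch using the Hugging Face transformers assisted-generation API, with the small model serving as a fast "draft" for a larger target model. The CodeLlama-7b target and the exact repo IDs are assumptions for illustration, not confirmed details from AMD's announcement.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

# Target model: the large model whose output quality we want.
# Using CodeLlama-7b here is an assumption for illustration.
target = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

# Draft model: the small, fast model that proposes candidate tokens.
# Speculative decoding requires the draft and target to use compatible
# tokenizers/vocabularies (both are LLaMA2-family here); the repo ID
# "amd/AMD-Llama-135M-code" is an assumption based on the model naming.
draft = AutoModelForCausalLM.from_pretrained("amd/AMD-Llama-135M-code").to(device)

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
inputs = tokenizer("def quicksort(arr):", return_tensors="pt").to(device)

# Passing assistant_model enables assisted generation: the draft proposes a
# run of tokens and the target verifies them in a single forward pass,
# cutting the number of expensive target-model decode steps.
output = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The speedup comes from the target model accepting or rejecting the draft's proposals in bulk rather than generating every token itself, with no change to the output distribution.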
AMD is making all of the AMD-135M model assets open-source in hopes of furthering other AI development, and, for AMD's part, in hopes that the training and inferencing happens on AMD hardware.
More details on the AMD-135M SLM via the AMD blog. AMD-135M is available via HuggingFace and GitHub.
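For a quick taste of the base model, here is a minimal sketch of running inference via transformers; the repo ID "amd/AMD-Llama-135M" is an assumption based on the model's naming.

```python
# Minimal sketch: load the base SLM from HuggingFace and generate text.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("amd/AMD-Llama-135M")
model = AutoModelForCausalLM.from_pretrained("amd/AMD-Llama-135M")

inputs = tokenizer("Small language models are", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```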