Apple shows how much faster the M5 runs local LLMs on MLX – 9to5Mac

News Room | Published 20 November 2025, last updated at 4:21 PM

A new post on Apple’s Machine Learning Research blog shows how much the M5 improves on the M4 when it comes to running a local LLM on Apple silicon. Here are the details.

A bit of context

A couple of years ago, Apple released MLX, which the company describes as “an array framework for efficient and flexible machine learning on Apple silicon”.

In practice, MLX is an open-source framework that helps developers build and run machine learning models natively on their Apple silicon Macs, supported by APIs and interfaces that are familiar to the AI world.

Here’s Apple again on MLX:

MLX is an open source array framework that is efficient, flexible, and highly tuned for Apple silicon. You can use MLX for a wide variety of applications ranging from numerical simulations and scientific computing to machine learning. MLX comes with built-in support for neural network training and inference, including text and image generation. MLX makes it easy to generate text with, or fine-tune, large language models on Apple silicon devices.

MLX takes advantage of Apple silicon’s unified memory architecture. Operations in MLX can run on either the CPU or the GPU without needing to move memory around. The API closely follows NumPy and is both familiar and flexible. MLX also has higher level neural net and optimizer packages along with function transformations for automatic differentiation and graph optimization.
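To get a feel for what that looks like in practice, here is a minimal sketch of MLX’s NumPy-style API. It assumes MLX is installed (pip install mlx) on an Apple silicon Mac, and it only illustrates the unified-memory and function-transformation ideas quoted above, nothing specific to the M5:

```python
# Minimal sketch of MLX's NumPy-like API (assumes `pip install mlx` on an
# Apple silicon Mac); illustrative only.
import mlx.core as mx

# Arrays live in unified memory, so the same buffers are usable from CPU and GPU.
a = mx.random.normal((1024, 1024))
b = mx.random.normal((1024, 1024))

# Computation is lazy: the matmul actually runs when the result is needed.
c = a @ b
mx.eval(c)

# Function transformations: mx.grad builds the gradient of a scalar-valued function.
def loss(x):
    return mx.sum(mx.square(x))

grad_at_a = mx.grad(loss)(a)
print(c.shape, grad_at_a.shape)
```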

One of the MLX packages available today is MLX LM, which is meant for generating text and for fine-tuning language models on Apple silicon Macs.

With MLX LM, developers and users can download most models available on Hugging Face, and run them locally.
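As a rough illustration (not taken from Apple’s post), downloading one of those models and generating text with MLX LM can look like the sketch below. It assumes pip install mlx-lm, and the repo name is just an example of the converted checkpoints hosted under the mlx-community organization:

```python
# Hedged sketch: loading a converted Hugging Face model and generating text
# locally with MLX LM. The repo name is an example and may differ.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3-8B-4bit")

prompt = "Explain Apple silicon's unified memory in one sentence."
text = generate(model, tokenizer, prompt=prompt, max_tokens=128, verbose=True)
print(text)
```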

The framework also supports quantization, a compression technique that lets large models run while using less memory. This in turn leads to faster inference, the step during which the model produces an answer to an input or a prompt.
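For instance, MLX LM ships a convert utility that can write a 4-bit quantized copy of a model. The sketch below shows one way to call it; the source repo and output path are placeholders, and the exact import path and options may vary between versions:

```python
# Hedged sketch: producing a 4-bit quantized copy of a model with MLX LM's
# convert utility. Repo name and output path are placeholders.
from mlx_lm import convert

convert(
    "Qwen/Qwen3-8B",            # source Hugging Face repository
    mlx_path="qwen3-8b-4bit",   # where the converted, quantized weights go
    quantize=True,              # enable weight quantization
    q_bits=4,                   # 4 bits per weight
)
```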

M5 vs. M4

In its blog post, Apple showcases the inference performance gains of the new M5 chip, thanks to the chip’s new GPU Neural Accelerators, which Apple says “provide dedicated matrix-multiplication operations, which are critical for many machine learning workloads.”

To illustrate the performance gains, Apple compared the time it took for multiple open models to generate the first token after receiving a prompt on an M4 and an M5 MacBook Pro, using MLX LM.

Or, as Apple put it:

We evaluate Qwen 1.7B and 8B, in native BF16 precision, and 4-bit quantized Qwen 8B and Qwen 14B models. In addition, we benchmark two Mixture of Experts (MoE): Qwen 30B (3B active parameters, 4-bit quantized) and GPT OSS 20B (in native MXFP4 precision). Evaluation is performed with mlx_lm.generate, and reported in terms of time to first token generation (in seconds), and generation speed (in terms of token/s). In all these benchmarks, the prompt size is 4096. Generation speed was evaluated when generating 128 additional tokens.
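This is not Apple’s benchmark script, but for a rough sense of how time to first token and tokens-per-second can be measured with MLX LM’s streaming API, a sketch might look like this (the model name and prompt construction are placeholders, and the prompt is only approximately 4096 tokens):

```python
# Hedged sketch: timing first-token latency and generation speed with MLX LM.
# Not Apple's benchmark code; model name and prompt are placeholders.
import time
from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/Qwen3-8B-4bit")
prompt = "lorem ipsum " * 2048   # rough stand-in for a ~4096-token prompt

start = time.perf_counter()
first_token_at = None
generated = 0
for _ in stream_generate(model, tokenizer, prompt=prompt, max_tokens=128):
    if first_token_at is None:
        first_token_at = time.perf_counter() - start
    generated += 1

total = time.perf_counter() - start
print(f"time to first token: {first_token_at:.2f} s")
# tokens per second after the first token, i.e. the decode phase
print(f"generation speed: {(generated - 1) / (total - first_token_at):.1f} tokens/s")
```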

Apple presents the results in its post as charts comparing the M4 and M5 across these models.

One important detail here is that LLM inference handles the very first token differently under the hood from the tokens that follow. In a nutshell, generating the first token is compute-bound, while generating each subsequent token is memory-bound.

This is why Apple also evaluated generation speed over 128 additional tokens, as described above. In general, the M5 showed a 19-27% performance boost compared to the M4.

Here’s Apple on these results:

On the architectures we tested in this post, the M5 provides a 19-27% performance boost compared to the M4, thanks to its greater memory bandwidth (120GB/s for the M4, 153GB/s for the M5, which is 28% higher). Regarding memory footprint, the MacBook Pro 24GB can easily hold an 8B model in BF16 precision or a 30B MoE 4-bit quantized, keeping the inference workload under 18GB for both of these architectures.
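A back-of-the-envelope calculation shows why bandwidth matters so much for the memory-bound decode phase. This is our illustration rather than Apple’s analysis, and it ignores the KV cache and other overheads:

```python
# Rough upper bound on decode speed: each new token reads roughly the whole
# set of weights once, so tokens/s is capped near bandwidth / model size.
params = 8e9                     # 8B-parameter model
bytes_per_param = 2              # BF16 = 2 bytes per weight
weight_bytes = params * bytes_per_param   # ~16 GB of weights

for chip, bandwidth_gbps in (("M4", 120), ("M5", 153)):
    ceiling = bandwidth_gbps * 1e9 / weight_bytes
    print(f"{chip}: ~{ceiling:.1f} tokens/s ceiling")

# 153 / 120 is roughly 1.28, which lines up with the 19-27% generation-speed
# gains Apple reports.
```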

Apple also compared the performance difference for image generation, and said that the M5 did the job more than 3.8x faster than the M4.

You can read Apple’s full blog post here, and you can learn more about MLX here.
