Copyright © All Rights Reserved. World of Software.
Docker Model Runner Aims to Make it Easier to Run LLM Models Locally

News Room
Published 22 April 2025, last updated 9:17 AM

Currently in preview with Docker Desktop 4.40 for macOS on Apple Silicon, Docker Model Runner allows developers to run models locally and iterate on application code using those local models, without disrupting their container-based workflows.

Using local LLMs for development offers several benefits, including lower costs, improved data privacy, reduced network latency, and greater control over the model.

Docker Model Runner addresses several pain points for developers integrating LLMs into containerized apps, such as dealing with different tools, configuring environments, and managing models outside of their containers. Additionally, there is no standard way to store, share, or serve models. To reduce that friction, Docker Model Runner includes an inference engine as part of Docker Desktop, built on top of llama.cpp and accessible through the familiar OpenAI API. No extra tools, no extra setup, and no disconnected workflows. Everything stays in one place, so you can test and iterate quickly, right on your machine.

To avoid the typical performance overhead of virtual machines, Docker Model Runner uses host-based execution. This means models run directly on Apple Silicon and take advantage of GPU acceleration, which is crucial for inference speed and a smooth development cycle.

For model distribution, Docker is, unsurprisingly, betting on the OCI standard, the same specification that powers container distribution, aiming to unify both under a single workflow.

Today, you can easily pull ready-to-use models from Docker Hub. Soon, you’ll also be able to push your own models, integrate with any container registry, connect them to your CI/CD pipelines, and use familiar tools for access control and automation.

If you are using Docker Desktop 4.40 for macOS on Apple Silicon, you can use the docker model command, which supports a workflow quite similar to the one you are used to with images and containers. For example, you can pull a model and run it. To specify the exact model version, such as its size or quantization, docker model uses tags, e.g.:


docker model pull ai/smollm2:360M-Q4_K_M
docker model run ai/smollm2:360M-Q4_K_M "Give me a fact about whales."

However, the mechanics behind these commands are particular to models: the run command does not actually create a container. Instead, it delegates the inference task to an inference server running as a native process on top of llama.cpp. The inference server loads the model into memory and keeps it cached until a set period of inactivity elapses.

You can use Model Runner with any OpenAI-compatible client or framework via its OpenAI endpoint, available from within containers at http://model-runner.docker.internal/engines/v1. You can also reach the endpoint from the host, provided you enable TCP host access by running docker desktop enable model-runner --tcp 12434.
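As a minimal sketch of what a host-side client might look like, the snippet below calls the chat completions endpoint using only the Python standard library. It assumes TCP host access has been enabled on port 12434 as described above, and it reuses the smollm2 model tag from the earlier example; build_request is a hypothetical helper, and the payload shape follows the standard OpenAI chat completions API.

```python
import json
import urllib.request

# Host-side base URL, assuming `docker desktop enable model-runner --tcp 12434`
BASE_URL = "http://localhost:12434/engines/v1"


def build_request(prompt, model="ai/smollm2:360M-Q4_K_M"):
    """Construct the HTTP request for an OpenAI-style chat completion."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def chat(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# With Model Runner listening on the port above, you could then call:
# reply = chat("Give me a fact about whales.")
```

Because the endpoint speaks the OpenAI API, the same base URL also works with the official OpenAI SDKs by pointing their base_url option at it (the local server does not check the API key, so any placeholder value will do).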

Docker Hub hosts a variety of models ready to use for Model Runner, including smollm2 for on-device applications, as well as llama3.3 and gemma3. Docker has also published a tutorial on integrating Gemma 3 into a comment processing app using Model Runner. It walks through common tasks like configuring the OpenAI SDK to use local models, using the model itself to generate test data, and more.

Docker Model Runner isn’t the only option for running LLMs locally. If you’re not drawn to Docker’s container-centric approach, you might also be interested in checking out Ollama. It works as a standalone tool, has a larger model repository and community, and is generally more flexible for model customization. While Docker Model Runner is currently macOS-only, Ollama is cross-platform. However, although Ollama supports GPU acceleration on Apple Silicon when run natively, this isn’t available when running it inside a container.
