
Large Language Model Tutorial: 5 Ways to Run LLMs Locally

News Room
Last updated: 2026/06/27 at 3:47 PM
News Room Published 27 June 2026

LLM lists all available language models if necessary.

Photo: Sharon Machlis / IDG

To send a request to a local LLM, use the following syntax:

llm -m the-model-name "Your query"

What makes LLM’s user experience elegant is that the tool automatically installs the GPT4All model on your system if it is not already present. The LLM plugin for Meta’s Llama models requires a little more setup work than the GPT4All one. The details can be found in the tool’s GitHub repository.

The LLM tool also has other features, such as an argument flag that lets you continue from a previous chat session, and it can be used within a Python script.
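As a minimal sketch of scripting the tool from Python, the wrapper below simply shells out to the command-line syntax shown above. The function names `build_llm_command` and `ask_local_llm` are hypothetical helpers chosen for illustration, not part of the LLM tool itself, and the sketch assumes the `llm` command is installed and on your PATH.

```python
import shutil
import subprocess


def build_llm_command(model_name, query):
    """Return the argument list for: llm -m the-model-name "Your query"."""
    return ["llm", "-m", model_name, query]


def ask_local_llm(model_name, query):
    """Run the llm CLI once and return whatever it prints to stdout."""
    # Assumption: the llm command-line tool is installed and on PATH.
    if shutil.which("llm") is None:
        raise RuntimeError("the llm command was not found on PATH")
    result = subprocess.run(
        build_llm_command(model_name, query),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Only prints the command; call ask_local_llm() to actually run it.
    print(build_llm_command("the-model-name", "Your query"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the query itself contains quotation marks.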

3. Llama on Mac with Ollama

If you want something even easier than LLM (and can accept some limitations), the open source tool Ollama is worth a look. It is currently available for macOS and Linux; according to its developers, a Windows version is in development.

The installation takes just a few clicks – and although Ollama is also a command line tool, there is only one command:

ollama run model-name

If the model in question is not yet available on your system, it will be downloaded automatically. You can view the list of currently available LLMs online at any time.
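Beyond the interactive command, a running Ollama instance can also be queried from a script. The following is a rough sketch that assumes Ollama's local HTTP endpoint at `http://localhost:11434/api/generate` (the default port at the time of writing, so check your installation); `build_generate_request` and `generate` are hypothetical helper names, and `model-name` is the same placeholder used above.

```python
import json
import urllib.request

# Assumption: Ollama is running locally and listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model_name, prompt):
    """Build a POST request asking Ollama for one non-streamed completion."""
    payload = {"model": model_name, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model_name, prompt, timeout=120):
    """Send the request and return the generated text."""
    request = build_generate_request(model_name, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False the reply is a single JSON object with a "response" field.
    return body["response"]
```

Calling `generate("model-name", "Write R code for a ggplot2 chart with blue bars.")` would only work while Ollama is running and the model has been downloaded.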

This is what it looks like when Code Llama runs in an Ollama Terminal window.

Photo: Sharon Machlis / IDG

The README of the Ollama GitHub repo contains a helpful list of some model specifications and useful information about which models require how much memory. In our test on an M1 Mac, Code Llama 7B performed surprisingly quickly and well. Although it is the smallest model in the Llama family, a question about R code (“Write R code for a ggplot2 chart with blue bars.”) didn’t faze it, even if the answer, or rather the code, was not perfect. Ollama also offers some additional functions, such as an integration option with LangChain.

4. Chat with documents via h2oGPT

h2o.ai has been working on automated machine learning for some time. It’s no surprise that the open source provider has now also ventured into the area of chatbot LLMs with h2oGPT. A free test version is available. It does not let you download the LLM onto your system, but you can use it to check whether the interface suits you.

For a local version of the tool, clone the GitHub repository, create and activate a Python virtual environment, and then run the following commands (which you can also find in the README):

pip install -r requirements.txt

pip install -r reqs_optional/requirements_optional_langchain.txt

pip install -r reqs_optional/requirements_optional_gpt4all.txt

python generate.py --base_model=llama --prompt_type=llama2 --model_path_llama=https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q6_K.gguf --max_seq_len=4096

This gives you “limited document query capability” and one of Meta’s Llama models. One more line of code is enough to start a local version, with the application available at http://localhost:7860:

python generate.py --base_model="llama" --prompt_type=llama2

Without adding any additional data input, you can use the application as a general chatbot. If you upload your own data – such as documents – you can then ask specific questions about the content. Compatible file formats include, but are not limited to:

  • .pdf,

  • .csv,

  • .doc,

  • .txt and

  • .markdown.

h2oGPT’s interface also features an “Expert” tab that provides a range of configuration options for users who know what they are doing.

A look at the “Expert” tab in h2oGPT.

Photo: Sharon Machlis / IDG

5. Query documents with PrivateGPT

With PrivateGPT you can query your documents in natural language. The application supports documents in several dozen different formats. According to the README for the project, the data should remain private and should never leave the execution environment. The tool also works without an internet connection.

PrivateGPT has scripts to:

  • read files,

  • split them into chunks,

  • create embeddings (numerical representations of text semantics) and

  • save them in a local Chroma vector store.

When you ask a question, the app searches for relevant documents and sends only those to the LLM to generate an accurate answer. If you are comfortable with Python, you can clone the full PrivateGPT repository and run it locally. If this is not the case, a simplified version is also available on GitHub. The README file of the latter version contains detailed instructions that do not require any Python sysadmin knowledge.
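To make the read, split, embed, store and search steps more concrete, here is a toy sketch of the general idea. It is not PrivateGPT's code: word counts stand in for real embeddings, a plain Python list stands in for the Chroma vector store, and every name (`split_into_chunks`, `embed`, `ToyVectorStore`) is hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, max_words=40):
    """Split a document into pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: a bag of lowercase word counts (real tools use a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """In-memory stand-in for a vector store: keeps (chunk, embedding) pairs."""

    def __init__(self):
        self.entries = []

    def add_document(self, text):
        for chunk in split_into_chunks(text):
            self.entries.append((chunk, embed(chunk)))

    def search(self, question, top_k=2):
        query = embed(question)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine_similarity(query, entry[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:top_k]]


store = ToyVectorStore()
store.add_document("Ollama runs Llama models from the command line.")
store.add_document("Invoices are due within thirty days of receipt.")

question = "When are invoices due?"
context = store.search(question, top_k=1)

# Only the retrieved chunk, not the whole document set, goes into the prompt.
prompt = "Answer using only this context:\n" + "\n".join(context) + "\nQuestion: " + question
print(prompt)
```

The design point is the last step: the LLM sees only the best-matching chunks, which keeps the prompt short and the answer tied to the documents.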

PrivateGPT includes the features you would most likely imagine from a “chat with your documents” application in the terminal. However, the documentation warns against using the tool in production. If you do it anyway, you’ll quickly see why. Even the small model option ran very sluggishly on our home PC.

Further paths to the local LLM

There are other ways to run Large Language Models locally, from finished desktop apps to DIY scripts. A small selection:

Jan

This relatively young open source project aims to democratize access to artificial intelligence with “open, locally focused products.” The app is easy to download and install, and the interface offers a good balance between customizability and usability. Choosing models is also intuitive with Jan. More than 30 AI models are available for download via the project’s hub shown in the screenshot below, and others can be imported (in GGUF format). If your computer is too weak for certain LLMs, you will see this when selecting the model in the hub. You will also receive a message if there is not enough RAM available (or it is running out).
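A back-of-the-envelope way to judge whether a model is likely to fit is to multiply the parameter count by the bits stored per weight. The sketch below is only a rule of thumb and not how Jan performs its check; the function names and the 1.2 overhead factor are assumptions made for illustration.

```python
def estimate_weights_gb(parameters_in_billions, bits_per_weight):
    """Approximate size of the model weights in gigabytes (10^9 bytes)."""
    return parameters_in_billions * bits_per_weight / 8


def fits_in_ram(parameters_in_billions, bits_per_weight, available_ram_gb,
                overhead_factor=1.2):
    """Rough check: weights plus an assumed 20 percent overhead must fit in RAM."""
    needed_gb = estimate_weights_gb(parameters_in_billions, bits_per_weight) * overhead_factor
    return needed_gb <= available_ram_gb


# A 7B model quantized to 4 bits per weight needs about 3.5 GB for its weights.
print(estimate_weights_gb(7, 4))
print(fits_in_ram(7, 4, available_ram_gb=8))
print(fits_in_ram(70, 4, available_ram_gb=16))
```

Actual memory use also depends on context length and the runtime, so treat the result as a lower bound.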

A look at the Jan Project model hub.

Photo: Sharon Machlis / IDG

Jan’s chat interface includes an area on the right where you can set system instructions for the LLM and adjust parameters. Provided there is enough RAM, the outputs are streamed relatively quickly. By the way, with Jan you can not only work locally, but also use OpenAI models from the cloud. In addition, the tool can be configured to work with remote or local API servers.

Jan’s chat interface is detailed and easy to use.

Photo: Sharon Machlis / IDG

Jan’s project documentation is still a bit sparse (as of April 2024). Fortunately, most of the application is intuitive to use. A key advantage of Jan over LM Studio is that Jan is available as open source software under the AGPLv3 license. Therefore, unrestricted commercial use is permitted as long as all derivative works are also open source. Jan is available for Windows, macOS and Linux.

Nvidia ChatRTX

The Nvidia demo application ChatRTX is designed to answer questions about document directories. Since its launch in February 2024, the tool has run either the Mistral or the Llama 2 LLM locally. The hardware requirements: a Windows PC with a GPU (Nvidia GeForce RTX 30 series or higher) and at least 8 GB of video RAM. With a download size of 35 GB, a robust internet connection is also recommended. Once the requirements are met and the application is unpacked, ChatRTX offers a simple interface that is easy and intuitive to use.

The interface of Nvidia’s ChatRTX.

Photo: Sharon Machlis / IDG

Select an LLM and the path to your files, wait for the application to create embeddings for your files (you can watch this process in the terminal window), and then ask your question. The answer contains links to the documents the model used to generate its output. The Nvidia app currently supports .txt, .pdf and .doc files as well as YouTube videos (via a URL).

A ChatRTX session with links to source documents.

Photo: Sharon Machlis / IDG

You should note that the application does not search subdirectories, so you have to put all relevant files in one folder. If you want to add more documents to the directory, click the refresh button at the top right to regenerate the embeddings.
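If your documents are spread across nested folders, a small script can gather them into one flat folder first. The sketch below is one possible approach, not part of ChatRTX: `flatten_documents` is a hypothetical helper, the suffix list mirrors the file types named above, and copied files are renamed with their original subfolder path to avoid name clashes.

```python
import shutil
from pathlib import Path

# File types the article lists as supported by the app.
SUPPORTED_SUFFIXES = {".txt", ".pdf", ".doc"}


def flatten_documents(source_dir, target_dir):
    """Copy supported files from source_dir and its subfolders into one flat folder."""
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    target_resolved = target.resolve()

    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue
        # Skip files already inside the target, in case it lives under source_dir.
        if target_resolved in path.resolve().parents:
            continue
        # "reports/2024/q1.pdf" becomes "reports__2024__q1.pdf" to keep names unique.
        flat_name = "__".join(path.relative_to(source).parts)
        destination = target / flat_name
        shutil.copy2(path, destination)
        copied.append(destination)
    return copied
```

Point the application at the target folder afterwards. The originals stay untouched because the files are copied rather than moved.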

llamafile

Mozilla’s llamafile allows developers to turn critical parts of large language models into executable files. This also includes software that can download LLM files in GGUF format, import them and run them in a local chat interface in the browser.

To run llamafile, download the current server version with (see README):

curl -L https://github.com/Mozilla-Ocho/llamafile/releases/download/0.1/llamafile-server-0.1 > llamafile

chmod +x llamafile

Then download a model of your choice. For this article we chose Zephyr and downloaded a GGUF version from Hugging Face. After that’s done, run the model with:

./llamafile --model ./zephyr-7b-alpha.Q4_0.gguf

Now open it in your browser at http://127.0.0.1:8080. You will see an opening screen with various chat options:

As soon as you enter a query…

Photo: Sharon Machlis / IDG

…the start screen transforms into a simple chatbot interface.

Photo: Sharon Machlis / IDG

While llamafile was extremely easy to get running on our Mac, we ran into some issues on Windows. Like Ollama, llamafile is not the first choice when it comes to plug-and-play software for Windows.

LocalGPT

This offshoot of PrivateGPT offers more model options and also provides detailed instructions. A 17-minute video walkthrough is also available on YouTube.

LM Studio

Another desktop application we tested is LM Studio. It is characterized by a user-friendly, simple chat interface. However, when it comes to choosing a model, you are on your own. The fact that the Hugging Face Hub serves as the main source for model downloads within LM Studio doesn’t make things any better, as the selection is overwhelming.

LangChain

Another option: download Large Language Models for local use via the open source LangChain framework. However, this requires programming knowledge related to the LangChain ecosystem. Once you’re comfortable with this, consider taking a closer look at the following resources for local LLM operations:

OpenLLM is a standalone platform designed to deploy LLM-based applications in production. (fm)

This article originally appeared at our sister publication Infoworld.com.
