Large Language Model Tutorial: 5 Ways To Run LLMs Locally

LLM lists all available language models on request.

Photo: Sharon Machlis / IDG

To send a request to a local LLM, use the following syntax:

llm -m the-model-name "Your query"

What makes LLM’s user experience elegant is that the tool automatically downloads the requested GPT4All model if it is not yet present on your system. The LLM plugin for Meta’s Llama models requires a little more setup work than GPT4All does. The details can be found in the plugin’s GitHub repository.

The LLM tool also has other features, such as an argument flag that lets you continue a previous chat session. It can also be used within a Python script.
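Because the tool can be driven from a Python script, the command-line call shown above can be wrapped in a small helper. The following is a minimal sketch, not the tool's own Python interface: it simply shells out to the `llm` command. The function names `build_llm_command` and `ask` are our own, and the `-c` flag for continuing the most recent conversation is an assumption you should check against `llm --help`.

```python
import shutil
import subprocess


def build_llm_command(model, query, continue_last=False):
    """Return the argument list for one call to the llm command-line tool."""
    command = ["llm", "-m", model]
    if continue_last:
        # Assumption: -c continues the most recent conversation.
        command.append("-c")
    command.append(query)
    return command


def ask(model, query, continue_last=False):
    """Run the command and return the model's reply as plain text."""
    if shutil.which("llm") is None:
        raise RuntimeError("the llm command was not found on PATH")
    result = subprocess.run(
        build_llm_command(model, query, continue_last),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # "the-model-name" is the placeholder from the syntax above, not a real model ID.
    print(build_llm_command("the-model-name", "Your query"))
```

Passing the arguments as a list rather than one shell string avoids quoting problems when the query itself contains quotation marks.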

3. Llama on Mac with Ollama

If you want something even easier than LLM (and can accept some limitations), the open source tool Ollama is worth a look. It is currently available for macOS and Linux; according to its developers, a Windows version is in development.

Installation takes just a few clicks, and although Ollama is also a command-line tool, you need only one command to get started:

ollama run model-name

If the model in question is not yet available on your system, it will be downloaded automatically. You can view the list of currently available LLMs online at any time.

This is what it looks like when Code Llama runs in an Ollama Terminal window.

Photo: Sharon Machlis / IDG

The README of the Ollama GitHub repo contains a helpful list of model specifications and notes on how much memory each model requires. In our test on an M1 Mac, the Code Llama 7B model ran surprisingly quickly and well. Although it is the smallest model in the Llama family, a question about R code (“Write R code for a ggplot2 chart with blue bars.”) didn’t faze it, even if the answer, or rather the code, was not perfect. Ollama also offers some additional features, such as integration with LangChain.
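To illustrate how model size relates to memory, the sketch below encodes the rule of thumb we recall from the Ollama README: roughly 8 GB of RAM for 7B models, 16 GB for 13B models and 32 GB for 33B models. Treat these figures as assumptions and verify them against the current README. The names `MIN_RAM_GB`, `min_ram_gb` and `fits` are our own.

```python
# Assumed rule-of-thumb minimums (model size in billions of parameters -> GB of RAM).
# Verify against the current Ollama README before relying on them.
MIN_RAM_GB = {7: 8, 13: 16, 33: 32}


def min_ram_gb(params_billion):
    """Return the assumed minimum RAM for the smallest listed size that covers the model."""
    for size in sorted(MIN_RAM_GB):
        if params_billion <= size:
            return MIN_RAM_GB[size]
    raise ValueError("this table has no guidance for models larger than 33B")


def fits(params_billion, available_ram_gb):
    """Return True if the machine meets the assumed minimum for this model size."""
    return available_ram_gb >= min_ram_gb(params_billion)


if __name__ == "__main__":
    for size in (7, 13, 33):
        print(f"{size}B model on a 16 GB machine: {fits(size, 16)}")
```

A lookup like this only tells you whether a model is likely to load at all; how fast it runs depends on the rest of the hardware.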

4. Chat with documents via h2oGPT

h2o.ai has been working on automated machine learning for some time. It’s no surprise that the open source provider has now ventured into chatbot LLMs with h2oGPT. A free demo version is available to try. It does not download the LLM onto your system, but you can use it to test whether the interface suits you.

For a local version of the tool, clone the GitHub repository, create and activate a Python virtual environment, and then run the following commands (which you can also find in the README):

pip install -r requirements.txt

pip install -r reqs_optional/requirements_optional_langchain.txt

pip install -r reqs_optional/requirements_optional_gpt4all.txt

python generate.py --base_model=llama --prompt_type=llama2 --model_path_llama=https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q6_K.gguf --max_seq_len=4096

This gives you “limited document query capability” and one of Meta’s Llama models. One more command starts a local version of the application at http://localhost:7860:

python generate.py --base_model="llama" --prompt_type=llama2

Without adding any additional data input, you can use the application as a general chatbot. If you upload your own data – such as documents – you can then ask specific questions about the content. Compatible file formats include, but are not limited to:

.pdf,
.csv,
.doc,
.txt and
.markdown.

h2oGPT’s interface also features an “Expert” tab that provides a range of configuration options for users who know what they are doing.

A look at the “Expert” tab in h2oGPT.

Photo: Sharon Machlis / IDG

5. Query documents with PrivateGPT

With PrivateGPT you can query your documents in natural language. The application supports documents in several dozen different formats. According to the project’s README, the data stays private and never leaves your execution environment. The tool also works without an internet connection.

PrivateGPT has scripts to:

read files,
split them into chunks,
create embeddings (numerical representations of text semantics) and
save them in a local Chroma vector store.

When you ask a question, the app searches for relevant documents and sends only those to the LLM to generate an accurate answer. If you are comfortable with Python, you can clone the full PrivateGPT repository and run it locally. If this is not the case, a simplified version is also available on GitHub. The README file of the latter version contains detailed instructions that do not require any Python sysadmin knowledge.
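The ingest-and-retrieve flow described above can be sketched in a few lines. This is a toy illustration, not PrivateGPT's code: the "embedding" here is a simple word count compared by cosine similarity, whereas real tools use a neural embedding model and a vector database such as Chroma. All names (`split_into_chunks`, `embed`, `VectorStore`) and the sample documents are our own.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, max_words=40):
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: lowercase word counts instead of a neural model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory stand-in for a local vector store."""

    def __init__(self):
        self.entries = []  # list of (source, chunk, vector)

    def add_document(self, source, text):
        for chunk in split_into_chunks(text):
            self.entries.append((source, chunk, embed(chunk)))

    def search(self, question, top_k=2):
        """Return the top_k (source, chunk) pairs most similar to the question."""
        query = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[2]), reverse=True)
        return [(source, chunk) for source, chunk, _ in ranked[:top_k]]


store = VectorStore()
store.add_document("cats.txt", "Cats sleep for most of the day and hunt at night.")
store.add_document("bread.txt", "Bread dough needs yeast, flour, water and time to rise.")

# Only the best-matching chunk would be passed to the LLM as context.
best_source, best_chunk = store.search("How long does dough need to rise?", top_k=1)[0]
print(best_source, "->", best_chunk)
```

Sending only the top-ranked chunks keeps the prompt short and focused, which is why this approach works even with small local models.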

PrivateGPT includes the features you would most likely imagine from a “chat with your documents” application in the terminal. However, the documentation warns against using the tool in production. If you do it anyway, you’ll quickly see why. Even the small model option ran very sluggishly on our home PC.

Further paths to the local LLM

There are other ways to run Large Language Models locally, from finished desktop apps to DIY scripts. A small selection:

Jan

This relatively young open source project aims to democratize access to artificial intelligence with “open, locally focused products.” The app is easy to download and install, and the interface offers a good balance between customizability and usability. Choosing models is also intuitive: more than 30 AI models are available for download via the project’s hub shown in the screenshot below, and others can be imported (in GGUF format). If your computer is too weak for a certain LLM, the hub tells you so when you select the model. You also receive a warning if there is not enough RAM available or if it is running out.

Photo: Sharon Machlis / IDG

Jan’s chat interface includes an area on the right where you can set system instructions for the LLM and adjust parameters. Provided there is enough RAM, the outputs are streamed relatively quickly. By the way, with Jan you can not only work locally, but also use OpenAI models from the cloud. In addition, the tool can be configured to work with remote or local API servers.

Jan’s chat interface is detailed and easy to use.

Photo: Sharon Machlis / IDG

Jan’s project documentation is still a bit sparse (as of April 2024). Fortunately, most of the application is intuitive to use. A key advantage of Jan over LM Studio is that Jan is available as open source software under the AGPLv3 license. Unrestricted commercial use is therefore permitted as long as all derivative works are also open source. Jan is available for Windows, macOS and Linux.

Nvidia ChatRTX

The Nvidia demo application ChatRTX is designed to answer questions about document directories. Since its launch in February 2024, the tool has run either the Mistral or the Llama 2 LLM locally. The hardware requirements: a Windows PC with an Nvidia GeForce RTX 30 series or higher GPU and at least 8 GB of video RAM. With a download size of 35 GB, a robust internet connection is also recommended. Once the requirements are met and the application is unpacked, ChatRTX offers a simple, intuitive interface.

The interface of Nvidia’s ChatRTX.

Photo: Sharon Machlis / IDG

Select an LLM and the path to your files, wait for the application to create embeddings for your files (you can watch this process in the terminal window), and then ask your question. The answer contains links to the documents the model used to generate its output. The Nvidia app currently supports .txt, .pdf and .doc files as well as YouTube videos (via a URL).

A ChatRTX session with links to source documents.

Photo: Sharon Machlis / IDG

Note that the application does not search subdirectories, so you have to put all relevant files in one folder. If you want to add more documents to the directory, click the refresh button at the top right of the data set to regenerate the embeddings.
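If your documents are spread across nested folders, a short script can gather them into the single folder the application expects. The following is a minimal sketch under our own assumptions: the function name `flatten`, the `__` separator for file names and the sample folder layout are all hypothetical, and the extension list simply mirrors the formats named above. The target folder should sit outside the source folder so that copied files are not picked up again.

```python
import shutil
import tempfile
from pathlib import Path

# File types the article lists as supported; adjust as needed.
SUPPORTED = {".txt", ".pdf", ".doc"}


def flatten(source_dir, target_dir):
    """Copy supported files from source_dir and its subfolders into one flat target_dir."""
    source, target = Path(source_dir), Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED:
            continue
        # Prefix the file name with its folder names so same-named files do not collide.
        flat_name = "__".join(path.relative_to(source).parts)
        shutil.copy2(path, target / flat_name)
        copied.append(flat_name)
    return copied


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "docs"
    (docs / "reports" / "2023").mkdir(parents=True)
    (docs / "notes.txt").write_text("top level")
    (docs / "reports" / "2023" / "q4.txt").write_text("nested")
    (docs / "reports" / "logo.png").write_text("ignored: unsupported type")
    result = flatten(docs, Path(tmp) / "flat")
    print(result)
```

Copying rather than moving leaves the original folder structure untouched, at the cost of some extra disk space.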

llamafile

Mozilla’s llamafile allows developers to turn critical parts of large language models into executable files. It also comes with software that can download LLM files in GGUF format, import them and run them in a local chat interface in the browser.

To run llamafile, download the current server version with the following commands (see the README):

curl -L https://github.com/Mozilla-Ocho/llamafile/releases/download/0.1/llamafile-server-0.1 > llamafile

chmod +x llamafile

Then download a model of your choice. For this article we chose Zephyr and downloaded a GGUF version from Hugging Face. Once that’s done, run the model with:

./llamafile --model ./zephyr-7b-alpha.Q4_0.gguf

Now open it in your browser at http://127.0.0.1:8080. You will see an opening screen with various chat options:

As soon as you enter a query…

Photo: Sharon Machlis / IDG

…the start screen transforms into a simple chatbot interface.

Photo: Sharon Machlis / IDG

While llamafile was extremely easy to get running on our Mac, we ran into some issues on Windows. Like Ollama, llamafile is not the first choice when it comes to plug-and-play software for Windows.

LocalGPT

This offshoot of PrivateGPT offers more model options and also provides detailed instructions. A 17-minute video walkthrough is also available on YouTube.

LM Studio

Another desktop application we tested is LM Studio. It is characterized by a user-friendly, simple chat interface. However, when it comes to choosing a model, you are on your own. The fact that the Hugging Face Hub serves as the main source for model downloads within LM Studio doesn’t make things any better, as the selection is overwhelming.

LangChain

Another option: download Large Language Models for local use via the open source LangChain framework. However, this requires programming knowledge related to the LangChain ecosystem. Once you’re comfortable with this, consider taking a closer look at the following resources for local LLM operations:

OpenLLM is a standalone platform designed to deploy LLM-based applications in production.

This article originally appeared at our sister publication Infoworld.com.