I’ve been using cloud-based chatbots for a long time now. Since large language models require serious computing power to run, they were basically the only option. But with LM Studio and quantized LLMs, I can now run decent models offline using the hardware I already own. What started as curiosity about local AI has turned into a powerful alternative that costs nothing, works without the internet, and gives me complete control over my AI interactions.
The shift became urgent after I experienced firsthand what happens when you accidentally share sensitive information with cloud AI. I ended up sharing my PIN with ChatGPT during what felt like a casual conversation. That moment of carelessness made me realize I had been treating cloud AI like a digital notepad without considering the security implications. LM Studio fixes these fundamental problems by bringing LLM capabilities directly to your desktop, eliminating both privacy risks and recurring costs.
LM Studio fixed local AI complexity
Running local LLMs is now easier than ever!
Before discovering LM Studio, I spent countless hours wrestling with rapidly evolving open-source tools. I'd find myself deep in GitHub repositories, reading dense technical documentation, configuring Python environments that seemed to break with every update, and hunting for the right models on oobabooga's page on Hugging Face. Just when I'd manage to get a working setup, the underlying tools would be deprecated or overhauled, forcing me to start the entire process over again.
LM Studio completely changed this experience by packaging everything into a polished desktop application that makes downloading and running large language models as simple as installing any other software. To run offline AI, you need two things: a quantized AI model and an interface tool like LM Studio. Quantized models are compressed versions of full AI models that retain most of their capabilities while using dramatically less memory and compute. Instead of needing expensive server-grade hardware, you can run sophisticated AI models on a regular laptop with a decent CPU and 16GB of RAM. With LM Studio, it is even possible to run an AI chatbot on old hardware!
One of my favorite quantized models to use with LM Studio is Dolphin3. Unlike mainstream AI models that come with extensive content filtering, Dolphin3 is designed to be genuinely helpful without arbitrary restrictions. It responds to requests that other models might refuse and provides straightforward answers without lecturing about potential misuse. For legitimate research, legal work, or deep conversations and advice, this uncensored AI model has quickly become one of my favorites.
Getting Dolphin3 running in minutes
Quick-and-easy startup guide
Setting up your offline AI assistant requires surprisingly little technical expertise. The entire process takes maybe 20 minutes, most of which is just waiting for downloads to complete.
First, download LM Studio from its official website and install it like any other application. The software runs on Windows, Mac, and Linux, and Apple Silicon Macs perform particularly well at this kind of AI inference. Once installed, LM Studio opens to a clean interface with a search bar for finding models.
Search for “Dolphin3” and you’ll see several versions available. I recommend starting with the 8B parameter version if you have 16GB of RAM, or the smaller 3B version for computers with 8GB. The download size ranges from 2GB to 6GB, depending on which version you choose. LM Studio shows you exactly how much memory each model needs, taking the guesswork out of hardware compatibility.
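If you're curious where those sizes come from, the back-of-the-envelope math is straightforward. The sketch below is a rough rule of thumb rather than an exact formula: the 4.5 bits-per-weight figure approximates a typical 4-bit quantization once metadata is included, and the 20% overhead factor for the runtime and context cache is my own ballpark assumption.

```python
# Back-of-the-envelope RAM estimate for a quantized model.
# Rule of thumb: bytes ≈ parameters × bits_per_weight / 8, plus some
# overhead for the runtime and context cache. Numbers are illustrative.

def estimate_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough memory footprint in GB for a quantized model."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes * overhead / 1e9

for params, bits in [(8, 4.5), (8, 16), (3, 4.5)]:
    print(f"{params}B @ {bits}-bit ≈ {estimate_gb(params, bits):.1f} GB")

# 8B @ 4.5-bit ≈ 5.4 GB  -> fits comfortably alongside 16GB of RAM
# 8B @ 16-bit  ≈ 19.2 GB -> why unquantized models need server hardware
# 3B @ 4.5-bit ≈ 2.0 GB  -> workable on an 8GB machine
```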
After the download completes, go to the Chat interface at the top of the sidebar, then click the Select a model to load button at the top center of the window. Your downloaded models appear in a dropdown; select Dolphin3 to begin loading it. Loading takes about thirty seconds, and then you're ready to start chatting. The interface will feel familiar to anyone who's used ChatGPT, with a message box at the bottom and conversation history above.
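LM Studio can also expose the loaded model through a local, OpenAI-compatible server (started from the app's Developer section, listening on localhost:1234 by default). As a quick sketch, assuming that server is running and using a placeholder model ID, you can talk to Dolphin3 from Python:

```python
# Chatting with Dolphin3 via LM Studio's local server.
# Assumes the server is running (Developer section, default port 1234)
# and the model is loaded. The model ID below is a placeholder --
# copy the exact identifier LM Studio shows for your download.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",  # any non-empty string; the local server doesn't check it
)

response = client.chat.completions.create(
    model="dolphin3.0-llama3.1-8b",  # placeholder model ID
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantization in two sentences."},
    ],
)
print(response.choices[0].message.content)
```

Because the endpoint mimics the OpenAI API, any tool that accepts a custom base URL can point at your own machine instead of the cloud.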
Here, I asked Dolphin3 a question, and the performance was good: not blazingly fast like ChatGPT or Claude, but perfectly acceptable. It returned a roughly 320-word (453-token) reply in around 11 seconds, which works out to about 41 tokens per second, quick enough to keep the conversation flowing without distracting lag. Everything happens locally, so response times stay consistent regardless of your internet connection.
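If you'd rather benchmark your own machine than take my numbers, a rough timing sketch against that same local server looks like this (a word count is only a proxy for tokens, so treat the result as an estimate):

```python
# Rough throughput measurement against LM Studio's local server.
# Streams the reply, times it, and approximates speed with a word count.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

start = time.perf_counter()
stream = client.chat.completions.create(
    model="dolphin3.0-llama3.1-8b",  # placeholder model ID
    messages=[{"role": "user", "content": "Summarize the causes of WWI."}],
    stream=True,
)
pieces = [c.choices[0].delta.content or "" for c in stream if c.choices]
elapsed = time.perf_counter() - start

words = len("".join(pieces).split())
print(f"{words} words in {elapsed:.1f}s ≈ {words / elapsed:.0f} words/s")
```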
When you're finished with your conversation, you can click the Eject button to unload Dolphin3 from memory entirely. Ejecting frees up your system resources on the spot, and since your chats are stored only on your own machine, nothing lingers on a remote server. Unlike cloud services that might retain your chat history indefinitely, you keep complete control over when your conversations are permanently deleted.
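You can do the same eject step from code. My understanding is that LM Studio's official Python SDK (the lmstudio package) exposes model loading and unloading, though the exact calls below are from memory, so double-check them against the SDK docs:

```python
# Unloading ("ejecting") a model programmatically.
# Assumption: this uses LM Studio's official Python SDK
# (pip install lmstudio); verify the call names against the docs.
import lmstudio as lms

model = lms.llm("dolphin3.0-llama3.1-8b")  # placeholder ID; loads the model if needed
print(model.respond("Say hi in five words."))
model.unload()  # frees the model's RAM, same as clicking Eject
```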
Why I love using Dolphin3
Fast, private, and surprisingly capable
Granted, it's not going to replace ChatGPT for heavy-duty reasoning or the latest web-connected insights, but it makes up for that in other ways. Privacy-sensitive conversations top the list: I can share my deepest thoughts and concerns without worrying about data retention policies or corporate surveillance. That includes personal reflections, relationship issues, and sensitive workplace situations I would never trust to cloud services.
There are several other offline LLMs you can try right now, but I still stick to Dolphin3 because of its approach to content moderation. Being an uncensored model does not mean it ignores ethics or context. Since it builds on LLaMA, which was trained on large, diverse datasets, it still reflects a solid understanding of right and wrong. “Uncensored” simply means it can handle topics that other models might avoid, such as controversial politics or sensitive historical events. Unlike many AI assistants that feel like they’re constantly policing your words, Dolphin3 offers honest, straightforward answers without unnecessary restrictions. The result is a conversation that feels like talking with a knowledgeable friend rather than interacting with a corporate-sanitized chatbot. You can dive into complex subjects and ask uncomfortable questions without triggering safety lectures.
I also love the RAG functionality that LM Studio provides with Dolphin3. It lets me analyze contracts, legal documents, and privacy policies that contain sensitive information. These documents often include confidential terms, proprietary clauses, or personal data that shouldn't be shared with cloud services. An AI assistant that can parse complex legal language while keeping everything local is enormously valuable for freelancers and small business owners like me who handle sensitive documentation.
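LM Studio's built-in RAG works by attaching files in the chat UI, but the same idea is easy to sketch yourself over the local server. The snippet below assumes you've also downloaded a local embedding model in LM Studio; both model IDs and the example clauses are placeholders. The point is that neither the document nor the question ever leaves localhost.

```python
# A minimal local RAG sketch: embed document chunks, retrieve the most
# relevant one for a question, and answer using only that excerpt.
import math
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
EMBED_MODEL = "nomic-embed-text-v1.5"   # placeholder embedding model ID
CHAT_MODEL = "dolphin3.0-llama3.1-8b"   # placeholder chat model ID

def embed(texts):
    resp = client.embeddings.create(model=EMBED_MODEL, input=texts)
    return [d.embedding for d in resp.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# In practice you'd split a real contract into paragraphs; these stand in.
chunks = [
    "Clause 4: Either party may terminate with 30 days written notice.",
    "Clause 7: Contractor retains ownership of pre-existing IP.",
    "Clause 9: Payment is due within 45 days of invoice.",
]
chunk_vecs = embed(chunks)

question = "How much notice do I need to give to terminate?"
q_vec = embed([question])[0]
best = max(range(len(chunks)), key=lambda i: cosine(q_vec, chunk_vecs[i]))

answer = client.chat.completions.create(
    model=CHAT_MODEL,
    messages=[{
        "role": "user",
        "content": f"Using only this excerpt:\n{chunks[best]}\n\nAnswer: {question}",
    }],
)
print(answer.choices[0].message.content)
```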
Brief coding assistance also works well, particularly for quick debugging help or explaining unfamiliar code patterns. While I wouldn’t rely on local AI for complex development projects, it excels at focused technical questions without exposing proprietary code to external servers.
Lastly, travel showcases another major advantage of offline AI. On long trips, during remote work sessions, or in areas with poor connectivity, having a fully functional AI assistant proves invaluable. I've used Dolphin3 to draft emails, analyze data, and solve problems while completely offline, something that's impossible with cloud-based alternatives.
I still use cloud-based AI
I can't completely abandon cloud-based AI, and honestly, that was never the goal. The fact is, the only practical way to run truly powerful models is still the cloud. I personally love using Perplexity for research and web-connected tasks where I need current information and a broader knowledge base. These services excel at tasks that demand massive computational resources, real-time data, or the most recent training.
What matters is finding the right balance between cloud-based and offline AI to maximize privacy and security while reducing reliance on someone else's infrastructure. I reach for cloud services when I need cutting-edge capabilities or web search integration and no sensitive information is involved. For everything else, especially conversations involving personal data, proprietary information, or situations where I need guaranteed availability, my local setup takes over.
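In practice, that balance boils down to a routing decision. Here's a minimal sketch of the idea, with a deliberately crude keyword heuristic and placeholder model names; a real setup would use whatever sensitivity test fits your work:

```python
# Sketch of a hybrid router: sensitive prompts stay on the local model,
# everything else goes to a cloud provider. Heuristic and IDs are
# illustrative assumptions, not a production design.
import os
from openai import OpenAI

local = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
cloud = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # any OpenAI-compatible cloud service

SENSITIVE_HINTS = ("contract", "salary", "password", "medical", "client")

def ask(prompt: str) -> str:
    sensitive = any(word in prompt.lower() for word in SENSITIVE_HINTS)
    client = local if sensitive else cloud
    model = "dolphin3.0-llama3.1-8b" if sensitive else "gpt-4o-mini"  # placeholders
    reply = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return reply.choices[0].message.content

print(ask("Summarize this employment contract clause: ..."))   # stays local
print(ask("Explain the transformer architecture at a high level."))  # goes to the cloud
```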