Running AI Models Locally: What They Are, Where to Find Them, and How to Get Started
By Corporal Punishment on 04/03/2025

As with anything new, the concepts and terminology can get confusing, so let's look at what AI models are, where to find good ones, and how to install and use them. Bonus: no coding degree is required. We promise you can do this.
AI Models 101 — What They Are and Why You Should Care
Think of an AI model as more than a database of like items scraped from sources around the web. An AI model is more like a mega-powered brain trained on tons of data. It doesn't follow rigid rules; it learns patterns and uses them to reach conclusions. Feed it enough examples of cats and it figures out what makes a cat a cat. Same with text, audio, and images. The trained model then takes your prompt, processes it, and generates an answer.
Some models write stories, answer questions, crunch spreadsheets, or debug code. Others create "original" artwork or transcribe audio. There's a model for pretty much anything you can imagine at this point.
Why Bother Running AI Models Locally?
Running AI models on your own machine gives you three huge wins: privacy, speed, and control. Cloud services like ChatGPT or Midjourney work, but you're handing over your data, prompts, content, and wallet every time you use them. You also submit to their control over what counts as acceptable output, which is a fancy way of saying censorship.
Running locally, you can train your model on your own data, documents, and even the decades of photography stored on your device. That information stays with you and you alone, so you maintain privacy and ownership.
When using these services, you typically "own" the generated material and your inputs, but the services reserve the right to use both. You can see how that would be bad if you uploaded medical records or your company's proprietary information. Further, let's say you come up with the concept for a time machine using online AI tools. Those tools can now "train" their model on your idea, effectively releasing it into the wild with no competition. This is exactly why Doc Brown never put the flux capacitor into a public AI model.
Running locally means your files and ideas stay yours. Once loaded, LLM responses are lightning-fast (with good hardware). And, best of all, no subscription fees, unless you feel like renting a $20k GPU rig (https://lambdalabs.com/service/gpu-cloud/private-cloud), but that's another story.
The trade-off? Like anything, there is some setup work involved, and you'll want a beefy GPU to run the more prominent models, which can cost a few bucks. BUT I'm guessing more than a few of you reading this have some pretty beefy rigs running your latest game builds, so that won't be an issue.
Where to Find AI Models — Beyond Just Hugging Face
Hugging Face (https://huggingface.co/) is the easiest starting point. It's the GitHub of AI models, hosting thousands of free models covering text, images, audio, and science. Models come with descriptions, licenses, and sample code, making it newbie-friendly.
But it's not the only game in town. If you're hunting for more models or want cutting-edge stuff, check these out:
Civitai (https://civitai.com/) - the go-to community hub for Stable Diffusion image models, LoRAs, and embeddings.
Ollama's model library (https://ollama.com/library) - curated, ready-to-run LLMs for the Ollama tool covered below.
Kaggle Models (https://www.kaggle.com/models) - models hosted alongside its famous datasets and competitions.
GitHub - many research labs publish weights and code straight from their repos.
Downloading and Prepping the Model
Once you've picked a model, downloading it is usually simple. Just know that model files are huge; they easily exceed 30GB uncompressed, and many are quite a bit larger than that.
Always check the model card or README first. You'll find hardware requirements, what the model is good at, and licensing rules. Different licenses exist to give model creators control over how their work is used, just like any of the software you download here at MajorGeeks. Permissive licenses like MIT or Apache 2.0 allow free and commercial use, while restrictive ones like CC-BY-NC limit use to non-commercial projects. Some licenses are designed to protect against legal concerns like copyrighted training data, while others support a specific business model by reserving commercial rights.
Installing and Running an AI Model
Ok, enough background. Let's get into it. If you like the old-school, manual setup, here's the route:
Step 1: Are you into command prompts?
Most LLMs run on Python, a powerful language widely used for web development and data work, so you need it installed on your system. You could install Python yourself and dive into the weeds of dependencies and environments, if you enjoy research and frustration... but there's really no need.
If you prefer the command line, grab Ollama from MajorGeeks. It handles the environment setup and model downloads for you, so you skip the dependency wrangling entirely, with far fewer headaches.
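Once it's installed and you've pulled a model (ollama pull mistral, for example), Ollama also runs a local REST API you can script against. Here's a minimal Python sketch, assuming Ollama's default port of 11434 and a pulled Mistral model; swap in whatever model you actually downloaded:

    # Send a prompt to a locally running Ollama server.
    # Assumes "ollama pull mistral" has already been run and the
    # server is listening on its default port (11434).
    import requests

    def ask_local_model(prompt: str, model: str = "mistral") -> str:
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["response"]

    print(ask_local_model("Explain VRAM in one sentence."))

Nothing fancy: one POST, one JSON response, and you're chatting with a model that never leaves your machine.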

Step 2: Set Up a Local WebUI
If typing isn't your thing, you can install something like Oobabooga's Text Generation WebUI, LM Studio, or KoboldCPP. These tools do all the hard back-end work for you, installing the software needed to run the model, then give you a browser interface where you type prompts and get answers, no coding or command lines required. A WAY better experience.
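A nice bonus: most of these front ends can also expose a local, OpenAI-compatible API, so anything written for ChatGPT's API can point at your own machine instead. As a rough sketch against LM Studio's local server (assuming its default port of 1234 and a model already loaded in the app; the model field is a placeholder that many local servers don't strictly enforce):

    # Query a local WebUI's OpenAI-compatible chat endpoint.
    # Port 1234 is LM Studio's default; check your app's server settings.
    import requests

    payload = {
        "model": "local-model",  # placeholder; often ignored by local servers
        "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
        "temperature": 0.7,
    }

    resp = requests.post(
        "http://localhost:1234/v1/chat/completions", json=payload, timeout=120
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])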
For images, Automatic1111's Stable Diffusion WebUI is the gold standard, if a bit finicky. It runs locally, handles LoRA models, and has tons of extensions.
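It even ships with an API: start the WebUI with the --api flag and you can generate images from a script. A quick sketch, with the prompt and step count as placeholders and 7860 as the default port:

    # Generate an image through Automatic1111's txt2img endpoint.
    # Requires the WebUI to be launched with the --api flag.
    import base64
    import requests

    payload = {"prompt": "a cat wearing sunglasses, studio lighting", "steps": 20}

    resp = requests.post(
        "http://localhost:7860/sdapi/v1/txt2img", json=payload, timeout=300
    )
    resp.raise_for_status()

    # Images come back as base64 strings; decode and save the first one.
    with open("output.png", "wb") as f:
        f.write(base64.b64decode(resp.json()["images"][0]))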
Think of WebUI apps like Steam for games. You load the WebUI, browse your 'library' of language models, tweak your settings, and launch intelligence instead of games.
Can You Actually Run It?
Here's where VRAM matters. A 7B model generally needs 12-16GB of VRAM at full precision. A 13B model? 24GB+. If you're running an RTX 4070 or better, you're fine for most smaller models, especially if you grab quantized versions. These are shrunk down to fit in 4-8GB of VRAM but still perform surprisingly well. For image models like Stable Diffusion XL, you'll want a card with at least 12GB of VRAM to get smooth performance.
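Not sure what your card has? If you happen to have PyTorch with CUDA installed (an assumption; GPU-Z or Task Manager will tell you the same thing), a few lines will report it:

    # Print the name and total VRAM of your primary GPU.
    import torch

    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
    else:
        print("No CUDA-capable GPU detected; expect slow CPU-only inference.")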
How to Pick an AI Model
This can get really personal. Picking the right local model is all about knowing what you need and choosing a model trained to do that job well. Some models are built for handling natural language, others are fine-tuned to crank out clean code, and a few are designed to follow instructions with laser focus. Taking some time to read the descriptions will save you hassle. If you try to use a general-purpose model for technical writing or complex logic, you'll end up with fluff instead of facts. Conversely, use a code-heavy model for creative writing and your cool new sci-fi novel will sound like it was written by HAL. Before you fire up your WebUI and load a model, think about the task at hand and make sure the model is trained and tuned for that kind of work. It'll save you time, give you better results, and make the whole local AI experience a lot smoother.
You should also always check the specifications. Some models are small and efficient, designed to run smoothly on lower-end hardware, while others are absolute beasts that demand serious RAM and a capable GPU. For reference, 4-bit models are quantized to shrink their size and run more efficiently at a small cost in precision, whereas 16-bit models retain full precision and detail. Always check the model size, context length, and quantization level before loading it up. A 4-bit quantized model might work great on a laptop, but a full 16-bit version could choke anything short of a high-end rig.
Another thing to look for in the description is the "B" in names like 7B, 13B, or 30B. It refers to billions of parameters, the adjustable values a model uses to learn and make predictions. More parameters generally let a model learn more complex relationships, which means better performance, but the more B's, the more processing power you'll need. Our advice is to start small and work up from there.
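If you want a back-of-the-envelope check before downloading, the math is just parameters times bits per weight, plus some headroom for activations and overhead. This is a rough rule of thumb, not a guarantee; actual usage varies with the runtime and context length:

    # Rough VRAM estimate: params x bits per weight, plus ~20% overhead.
    def estimate_vram_gb(billions_of_params: float, bits: int, overhead: float = 1.2) -> float:
        weight_bytes = billions_of_params * 1e9 * (bits / 8)
        return weight_bytes * overhead / 1024**3

    # A 7B model: roughly 3.9 GB at 4-bit, 7.8 GB at 8-bit, 15.6 GB at 16-bit.
    for bits in (4, 8, 16):
        print(f"7B at {bits}-bit: ~{estimate_vram_gb(7, bits):.1f} GB")

Which lines up with the advice above: a 4-bit 7B model fits comfortably on a mid-range card, while the full-precision version needs serious hardware.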
Five Models Worth Trying
Here's a short list of popular models that won't waste your time:
Mistral-7B-Instruct — New, open-source, and fast. Handles coding and writing tasks well. A favorite for lighter systems.
Final Thoughts
LLMs are fun, interesting, and way more than just toys. You can use them to push your computer to actually BE a computer - instead of a social media consumption device. Once you dive into LLMs, you open up a world of tools, from automating tasks to generating art and content. Local AI models are crazy powerful once you get comfortable.
Running AI models locally feels a bit overwhelming at first. Frankly, you can also dive down some serious rabbit holes with models. But that's half the fun! Once you school up, running your own models is far more doable than people think, especially now that you've got multiple paths to get started.
Either way, once you're rolling, you unlock private, fast AI with zero ongoing costs.
Want help fine-tuning or setting up your first batch of prompts? Just say the word and we'll geek out together.