
LM Studio Tutorial: How to Run AI Models Locally (2026 Guide)

Complete LM Studio tutorial for beginners — download, install, find models, chat locally, and use the built-in API. Works on Windows, Mac, and Linux. No cloud required.

LM Studio is the easiest way to run AI models on your own computer — no internet required, no API keys, no subscriptions.

You download models once, run them offline, and your conversations stay completely private. It works on Windows, Mac (Apple Silicon and Intel), and Linux.

This tutorial covers everything you need to get started: installing LM Studio, finding and downloading models, chatting locally, and using the built-in local server to connect other tools.


What Is LM Studio?

LM Studio is a desktop app that lets you download and run large language models (LLMs) directly on your hardware. Think of it as a ChatGPT-style interface — but it runs entirely on your machine.

Why run AI locally?

  • 100% private — your chats never leave your computer
  • No API costs — no per-token billing
  • Works offline — useful for travel, airgapped systems, or slow internet
  • Full control — choose any model, tweak any setting

The tradeoff: local models require decent hardware, and they're generally slower than cloud APIs. But for many use cases — writing assistance, coding help, document Q&A — a good local model is more than enough.

Before you start: Check how much VRAM your GPU has — it determines which model sizes you can run. See our guide to checking VRAM for AI for a quick walkthrough.
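
On an NVIDIA GPU you can also check from Python: a minimal sketch, assuming the nvidia-smi tool that ships with the NVIDIA driver is on your PATH.

import subprocess

# Ask nvidia-smi (bundled with the NVIDIA driver) for the GPU name and total VRAM.
result = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
    capture_output=True, text=True,
)
print(result.stdout.strip())  # e.g. "NVIDIA GeForce RTX 3060, 12288 MiB"

On Apple Silicon there is no separate VRAM pool; the GPU uses unified memory, so your total system RAM is the number that matters.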


System Requirements

| Hardware | Minimum | Recommended |
| --- | --- | --- |
| RAM | 8 GB | 16 GB+ |
| GPU VRAM | 0 GB (CPU-only) | 6 GB+ |
| Storage | 10 GB free | 50 GB+ for multiple models |
| OS | Windows 10+, macOS 12+, or Ubuntu 20.04+ | |

No GPU? No problem. LM Studio can run models on CPU only — it's slower, but works fine for smaller models (1–4B parameters) on any modern laptop.


Step 1: Download LM Studio

  1. Go to lmstudio.ai
  2. Click the download button for your OS (Windows .exe, macOS .dmg, or Linux .AppImage)
  3. Install it like any normal app

On Mac with Apple Silicon (M1/M2/M3/M4), LM Studio is especially fast — Metal GPU acceleration is built in. If you're on a Mac, this is one of the smoothest local AI experiences available.

On Windows, you may need to allow the installer through Windows Defender. That's expected for a newly downloaded installer — LM Studio is widely used and safe to install.

Need help with terminal or command-line tools? Check our terminal beginners guide for a quick overview.


Step 2: Find and Download a Model

When you open LM Studio for the first time, click the Search tab (magnifying glass icon on the left sidebar).

You can browse models directly from Hugging Face inside the app. Here's what to search for as a beginner:

Best Models to Start With

| Model | Size (VRAM needed) | Best For |
| --- | --- | --- |
| Llama 3.2 3B | ~2 GB | Low-RAM laptops, CPU-only |
| Qwen2.5 7B | ~5 GB | Great all-rounder for chatting, writing |
| Phi-4 Mini | ~3 GB | Strong reasoning, small footprint |
| Mistral 7B Instruct | ~5 GB | Instructions, coding, general use |
| Llama 3.1 8B | ~6 GB | Best quality in small form factor |

How to pick:

  • If you have 6 GB VRAM or less: start with a 3B or 4B model
  • If you have 8–12 GB VRAM: 7B or 8B models run well
  • If you have 16+ GB VRAM: you can run 13B models at full quality
  • CPU only: start with Phi-4 Mini or Llama 3.2 3B — they're fast even without a GPU

Downloading a Model

  1. In the Search tab, type the model name (e.g., llama 3.1 8b)
  2. You'll see several variants — look for Q4_K_M or Q5_K_M in the file name
  3. Click Download

What's Q4_K_M? It's a quantization format — a compressed version of the model that uses less VRAM with minimal quality loss. Q4_K_M is the standard starting point. Q5_K_M is slightly larger but marginally better quality.
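
The size figures in the table above follow from simple arithmetic: parameters times bits per weight, divided by 8 bits per byte, plus some overhead for the context cache. A minimal sketch of the estimate (the bits-per-weight figure is a rough average for Q4_K_M, not an exact value):

# Back-of-the-envelope size estimate for a quantized model.
params_billion = 8        # e.g. Llama 3.1 8B
bits_per_weight = 4.8     # rough average for Q4_K_M quantization
overhead_gb = 1.0         # KV cache and buffers; grows with context length

file_gb = params_billion * bits_per_weight / 8
print(f"~{file_gb:.1f} GB file, ~{file_gb + overhead_gb:.1f} GB to run")
# -> ~4.8 GB file, ~5.8 GB to run

That lines up with the ~6 GB figure for Llama 3.1 8B in the table.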

Download time depends on your connection — small models are 2–8 GB, and large ones can exceed 40 GB.


Step 3: Load and Chat

Once the download finishes:

  1. Click the Chat tab (speech bubble icon)
  2. At the top, click "Select a model to load" → choose your downloaded model
  3. LM Studio loads it into RAM/VRAM (takes 5–30 seconds)
  4. Type your message and press Enter

You're now chatting with a local AI — offline, private, and free.

Chat Settings Worth Knowing

In the right-side panel, you'll see configuration sliders:

  • Context Length — how many tokens the model can "remember" in a conversation. 4,096 tokens (roughly 3,000 words) is fine to start; use 8,192 if you're pasting long documents.
  • Temperature — controls randomness. 0.7 is standard. Lower = more predictable; higher = more creative.
  • System Prompt — instructions given to the model before your chat starts. You can tell it to behave like a coding assistant, a writing editor, or anything else.

Step 4: Use the System Prompt

The system prompt is one of the most useful features in LM Studio.

Click System Prompt in the chat panel and paste something like:

For writing help:

You are a professional writing editor. Help the user improve clarity, tone, and structure. Ask clarifying questions when needed.

For coding:

You are an expert software engineer. Answer coding questions concisely with working code examples. Prefer Python unless otherwise specified.

For document Q&A (paste the document into your first message):

You are a research assistant. Answer questions based only on the document the user provides. If the answer isn't in the document, say so.
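
If you script against the local server described in Step 5, the same system prompt goes in as a "system" message at the start of the conversation. A minimal sketch, assuming the server from Step 5 is already running with a model loaded:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
    messages=[
        # The system message plays the same role as the System Prompt box in the UI.
        {"role": "system", "content": "You are a professional writing editor."},
        {"role": "user", "content": "Tighten this sentence: ..."},
    ],
    temperature=0.7,  # same knob as the Temperature slider
)
print(response.choices[0].message.content)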

Step 5: Use the Local Server (Optional)

One of LM Studio's best features: it can run as a local API server, compatible with the OpenAI API format.

This means you can connect LM Studio to:

  • Open WebUI — a browser-based chat interface (see our Open WebUI setup guide)
  • AnythingLLM — for document Q&A with file uploads
  • Your own scripts — using Python's openai library pointed at localhost

Starting the Server

  1. Click the Developer tab (code icon in the sidebar)
  2. Toggle "Start Server" ON
  3. The server starts on http://localhost:1234
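
Once it's on, you can sanity-check the server from Python with just the standard library. The /v1/models endpoint is part of the OpenAI-compatible API the server exposes; this sketch lists whatever models the server reports:

import json
import urllib.request

# List the models the local server currently reports.
with urllib.request.urlopen("http://localhost:1234/v1/models") as resp:
    for model in json.load(resp)["data"]:
        print(model["id"])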

Connecting from Python

from openai import OpenAI

# Point the client at LM Studio's local server instead of OpenAI's cloud.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",  # the model you loaded
    messages=[{"role": "user", "content": "What is machine learning?"}]
)

print(response.choices[0].message.content)

The api_key value doesn't matter — LM Studio doesn't check it — but the library requires you to set something.
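
The server also supports streaming in the OpenAI style, which makes long answers appear token by token instead of all at once. A sketch reusing the client from above:

# Stream the reply as it is generated instead of waiting for the full text.
stream = client.chat.completions.create(
    model="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
    messages=[{"role": "user", "content": "Explain quantization in two sentences."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()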


Step 6: Managing Your Models

Models take up storage. LM Studio stores them in:

  • Windows: C:\Users\<you>\.lmstudio\models
  • macOS: ~/.lmstudio/models
  • Linux: ~/.lmstudio/models

To delete a model you no longer need:

  1. Go to the My Models tab
  2. Right-click the model → Delete

Or just delete the files directly from your file system.
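
If you're not sure what's taking up space, here's a minimal sketch that walks the default models folder and prints each GGUF file's size (adjust models_dir if yours lives elsewhere):

from pathlib import Path

# Print every downloaded GGUF file with its size, largest first.
models_dir = Path.home() / ".lmstudio" / "models"
ggufs = sorted(models_dir.rglob("*.gguf"),
               key=lambda f: f.stat().st_size, reverse=True)
for f in ggufs:
    print(f"{f.stat().st_size / 1e9:6.1f} GB  {f.relative_to(models_dir)}")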

Pro tip: Keep 2–3 models on disk — one small, fast one for quick queries and one larger one for quality work.


Common Issues and Fixes

"The model is loading but nothing happens" → You may not have enough RAM. Close other apps and try a smaller model.

Very slow responses → You're likely running on CPU. This is normal for large models without a GPU. Try a smaller model (3B–4B) or enable GPU layers in settings if you have a GPU.

Model output looks garbled or repetitive → Try a different quantization variant (Q4_K_S instead of Q4_K_M), or reduce the context length.

GPU not being used (Windows) → In Settings → GPU, make sure your GPU is enabled and "GPU layers" is set to a positive number (try 35 for a 7B model).

Can't find a specific model → LM Studio searches Hugging Face. Try searching just the base name (e.g., qwen 2.5) instead of the full model ID.


What to Do Next

Once you're comfortable chatting locally, the natural next steps are:

  1. Set up Open WebUI — nicer interface with conversation history, model switching, and file uploads
  2. Try AnythingLLM — chat with your own documents (PDFs, Word files, websites) using your local model as the brain
  3. Explore ComfyUI — if you want to extend into local image generation alongside local text AI

LM Studio + Open WebUI is the most popular local AI stack for beginners in 2026. They complement each other well.


FAQ

Q: Is LM Studio free?
A: Yes, completely free for personal use. The desktop app has no cost, no subscription, and no API fees.

Q: Do I need an internet connection after downloading the model?
A: No. Once the model is downloaded, everything runs 100% offline. LM Studio only needs internet to search for and download new models.

Q: What's the difference between LM Studio and Ollama?
A: Both run local models. LM Studio is a GUI app — easy to use without any command line. Ollama is command-line-first and more popular for developers who want to script and automate. Many people use both. LM Studio is the better starting point for beginners.

Q: Can I run multiple models at once?
A: You can have multiple models downloaded, but each loaded model takes its own slice of RAM/VRAM, so most people load one at a time. Recent versions of LM Studio can keep more than one model loaded if your system has enough memory.

Q: What's a GGUF file?
A: The file format LM Studio uses for models. GGUF is the standard quantized model format for local AI — it's how large models are packaged to run efficiently on consumer hardware.

Q: Will local AI replace ChatGPT?
A: For most people, no — GPT-5 and Claude are still more capable for complex tasks. But local models are closing the gap fast, and for everyday tasks (writing, coding help, Q&A on your own documents) they're already competitive. The real advantage is privacy and zero cost.

Q: Is my data safe running locally?
A: Yes. When you chat through LM Studio, your messages are processed entirely on your machine. Nothing is sent to any server — no company sees your conversations.

Q: What's the best model for coding?
A: DeepSeek Coder V2 Lite or Qwen2.5 Coder 7B are the top performers in the 7B size range for code. Both are available on Hugging Face and downloadable directly through LM Studio's search.


LM Studio is the fastest path to running AI privately and for free. The hardest part is the first model download — after that, it just works.

Alex the Engineer

Founder & AI Architect

Senior software engineer turned AI Agency owner. I build massive, scalable AI workflows and share the exact blueprints, financial models, and code I use to generate automated revenue in 2026.

Related Articles