A no-hype guide for experienced developers — March 2026 — compiled by Opus 4.5
Editor → localhost:11434 → back to your editor. That's it. This isn't "trust our privacy policy." There is no network connection. The model is a file on your disk, inference happens on your CPU/GPU, and the results go to your editor. Cloud AI tools (ChatGPT, Copilot) are fine for brainstorming generic questions, but you'd never point them at proprietary engine code. Local open-source models solve this completely. Studio IT can verify with a packet capture in 30 seconds.
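What that 30-second packet capture could look like (a sketch: `tcpdump` availability and the right interface vary by machine, and other apps' traffic will still show up; the point is that nothing correlates with inference):

```shell
# Terminal 1: watch for any traffic that is NOT the local Ollama loopback
sudo tcpdump -n -i any 'not (host 127.0.0.1 and port 11434)'

# Terminal 2: trigger inference; no model-related traffic appears in terminal 1
docker exec ollama ollama run qwen3-coder:30b "explain RAII in one line"
```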
Everything runs locally in Docker. No accounts, no API keys, no cost. Nuke it all with one command when done.
Download from docker.com/products/docker-desktop
```shell
# Mac (Apple Silicon): note that Docker containers can't reach the Mac GPU,
# so this runs on CPU. For Metal acceleration, use the native Ollama app instead.
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# PC with NVIDIA GPU: add --gpus all
docker run -d --gpus all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```
```shell
# Download Qwen3-Coder 30B (~18GB, go get coffee)
docker exec ollama ollama pull qwen3-coder:30b

# Talk to it directly
docker exec -it ollama ollama run qwen3-coder:30b

# Try: "Write a C++ ACharacter subclass with replicated health,
# a TakeDamage override, and a BlueprintCallable heal function"
# Type /bye to exit
```
Once Ollama is running, any tool that speaks its API (localhost:11434) can use the model. Pick whichever fits your workflow:
- Model: qwen3-coder:30b. Gives you chat, inline edits, and autocomplete.
- Endpoint: localhost:11434. Any tool that supports custom OpenAI endpoints (many do) can connect.

Good first tests: ask it to scaffold a new actor component, write a spatial query, explain an engine function you're unfamiliar with, or generate test boilerplate for an existing class.
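Under the hood, "speaks its API" just means HTTP on localhost. A quick smoke test with curl against Ollama's OpenAI-compatible endpoint (no API key needed; the prompt here is just an example):

```shell
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3-coder:30b",
        "messages": [{"role": "user", "content": "Write a C++ FVector distance helper"}]
      }'
```

If this returns a JSON completion, any editor plugin pointed at the same endpoint will work too.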
localhost:11434 is the only open port. One caveat: -p 11434:11434 publishes it on all interfaces, so other machines on your LAN can reach it too; start the container with -p 127.0.0.1:11434:11434 if you want it restricted to your machine. To prove to IT that the model has zero internet access:
```shell
# Create an internal-only Docker network after pulling the model
docker network create --internal ollama-sandbox
docker network disconnect bridge ollama
docker network connect ollama-sandbox ollama

# Verify: this should fail
docker exec ollama curl -s https://google.com || echo "No internet. Good."
```
```shell
docker rm -f ollama && docker volume rm ollama && docker network rm ollama-sandbox
# Zero trace on your system except Docker Desktop itself.
```
Benchmarks put it at Claude Sonnet level. Solid C++ generation. On SecCodeBench it beats Claude Opus on secure code generation (61.2% vs 52.5%). You'll use the 30b variant locally — 30B total / 3.3B active, runs on 32GB RAM.
Best at "read a bug report, navigate a codebase, generate a working patch."
Strong reasoning, good at navigating large codebases. Solid all-around coder.
Swap models anytime: docker exec ollama ollama pull deepseek-v3.2 — any connected tool picks them up automatically.
The real value: AI handles the tedious 60% so you spend more time on the hard 40%.
Open-weight models (Apache 2.0, MIT) are yours once downloaded. No subscription, no API metering. The tooling (Ollama, Continue.dev, Aider) is free and community-maintained. Hardware is the only cost — and you already have it.
Proprietary services (OpenAI, Anthropic, Google) have to keep prices competitive because open-source alternatives are now this good. If cloud AI costs 10x what a local model does for 90% of the quality, developers will just run Qwen locally. The labs know this.
Two other forces: compute costs keep falling (MoE architectures mean you don't need a datacenter anymore), and the total addressable market is still massive and largely untapped globally. These companies are competing for hundreds of millions of developers who haven't adopted AI tools yet. That's not a market where you raise prices — that's a market where you race to make it accessible.
The top open model (Kimi K2.5, 76.8% SWE-bench) is within striking distance of the top proprietary ones (Claude Opus, ~80.8%). The gap narrows every quarter. And even worst case — if every AI company folded tomorrow — the models on your disk still work.
If the 30B model is too heavy for your machine, qwen3-coder:8b gives faster responses at lower quality.