Quick Tips
Model guidance at a glance. What to use, what to avoid, and what to try only when hardware is very limited.
Recommended for real work
- Qwen2.5‑Coder 32B (or Qwen3 Coder 30B A3B) — excellent coding quality and a strong default choice.
- GPT‑OSS‑120B — top‑tier quality; typically requires a proper GPU server, not a laptop.
If you have a capable workstation or server (e.g., an RTX 3090 / 4090, an NVIDIA A‑series card, or an Apple M‑series Ultra), consider hosting
the model remotely and using the Remote GPU Server setup; a quick reachability check is sketched below.
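If you go this route, a quick sanity check is to list the models the remote server exposes. A minimal sketch, assuming LM Studio's OpenAI‑compatible server with network access enabled; the default port is 1234, and the host address below is a placeholder for your own server:

```python
# List the models a remote LM Studio server exposes (stdlib only).
# 192.168.1.50 is a placeholder; substitute your server's address.
import json
import urllib.request

BASE_URL = "http://192.168.1.50:1234/v1"  # 1234 is LM Studio's default server port

with urllib.request.urlopen(f"{BASE_URL}/models", timeout=5) as resp:
    models = json.load(resp)

# Use one of these IDs verbatim when configuring Cline.
for entry in models.get("data", []):
    print(entry["id"])
```

If this times out, check that the LM Studio server is running and set to serve on the local network before troubleshooting Cline.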
Not recommended for professional use
- GPT‑OSS‑20B — generally insufficient for real‑world coding workloads.
You can run GPT‑OSS‑20B as a last‑resort fallback on restricted hardware, but expect quality limitations.
Low‑resource fallbacks (local‑only)
These options are for single‑machine setups where VS Code + Cline Local + the model all run on the same laptop/PC. They are not intended for server mode.
- Qwen2.5‑Coder 32B (4‑bit quantized) — reduced VRAM requirements at the cost of some quality and latency.
- GPT‑OSS‑20B — use only if you cannot run the recommended models.
Expect trade‑offs: slower token generation, lower quality on complex tasks, and tighter context‑length limits. If
possible, prefer a remote GPU server running the recommended models.
Sizing notes
- 32B class: prefer ≥24 GB VRAM for a smooth experience. 4‑bit quantization can roughly quarter weight memory; see the estimate sketch after this list.
- 120B class: server‑grade hardware; plan for multi‑GPU or very high VRAM, and use the Remote GPU Server guide.
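These figures follow from simple arithmetic: weight memory is roughly parameter count × bytes per parameter, and real usage adds KV cache and runtime overhead on top. A back‑of‑envelope sketch (approximations, not measured figures):

```python
# Back-of-envelope VRAM estimate for model weights only.
# Real usage is higher: KV cache, activations, and runtime overhead
# commonly add 10-30% or more depending on context length.
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    return params_billion * 1e9 * (bits_per_param / 8) / 1024**3

print(f"32B  @ 16-bit: ~{weight_gb(32, 16):.0f} GB")   # ~60 GB: multi-GPU territory
print(f"32B  @  4-bit: ~{weight_gb(32, 4):.0f} GB")    # ~15 GB: fits a 24 GB card
print(f"120B @  4-bit: ~{weight_gb(120, 4):.0f} GB")   # ~56 GB: server-grade hardware
```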
Provider & naming
- LM Studio is a simple way to run models locally or on a server. Match the model name exactly as LM Studio displays it.
- Cline Local supports LM Studio and other OpenAI‑compatible endpoints. Configure the provider and endpoint in Cline settings; a minimal request sketch follows.
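For reference, here is a minimal sketch of the endpoint shape using the official `openai` Python client against a local LM Studio server. The model ID is a placeholder; copy the exact name from LM Studio's model list:

```python
# Minimal chat request against LM Studio's OpenAI-compatible server.
# Assumes `pip install openai`. The model ID is a placeholder; copy the
# exact name LM Studio displays.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default local endpoint
    api_key="lm-studio",  # LM Studio ignores the key, but the client requires one
)

response = client.chat.completions.create(
    model="qwen2.5-coder-32b-instruct",  # placeholder; match LM Studio exactly
    messages=[{"role": "user", "content": "Reverse a string in Python."}],
)
print(response.choices[0].message.content)
```

In Cline, the equivalent settings are the endpoint base URL, an API key (any value works for LM Studio), and the same model ID; for a remote setup, swap `localhost` for the server's address.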