Quick Tips
Model guidance at a glance. What to use, what to avoid, and what to try only when hardware is very limited.
Recommended for real work
- Qwen2.5‑Coder 32B (or Qwen3 Coder 30B A3B) — excellent coding quality and a strong default choice.
- GPT‑OSS‑120B — top‑tier quality; typically requires a proper GPU server, not a laptop.
If you have a capable workstation or server (e.g., an RTX 3090 / 4090, an NVIDIA A‑series card, or an Apple M‑series Ultra), consider hosting
the model remotely and using the Remote GPU Server setup; a quick reachability check is sketched below.
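If you go this route, a quick sanity check is to list the models the remote server exposes. A minimal sketch, assuming LM Studio's OpenAI‑compatible server with network access enabled; the default port is 1234, and the host address below is a placeholder for your own server:

```python
# List the models a remote LM Studio server exposes (stdlib only).
# 192.168.1.50 is a placeholder; substitute your server's address.
import json
import urllib.request

BASE_URL = "http://192.168.1.50:1234/v1"  # 1234 is LM Studio's default server port

with urllib.request.urlopen(f"{BASE_URL}/models", timeout=5) as resp:
    models = json.load(resp)

# Use one of these IDs verbatim when configuring Cline.
for entry in models.get("data", []):
    print(entry["id"])
```

If this times out, check that the LM Studio server is running and set to serve on the local network before troubleshooting Cline.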
Not recommended for professional use
- GPT‑OSS‑20B — generally insufficient for real‑world coding workloads.
You can run GPT‑OSS‑20B as a last‑resort fallback on restricted hardware, but expect quality limitations.
Low‑resource fallbacks (local‑only)
These options are for single‑machine setups where VS Code + Cline Local + the model all run on the same laptop/PC. They are not intended for server mode.
- Qwen2.5‑Coder 32B (4‑bit quantized) — reduced VRAM requirements at the cost of some quality and latency.
- GPT‑OSS‑20B — use only if you cannot run the recommended models.
Expect trade‑offs: slower token generation, lower quality on complex tasks, and tighter context‑length limits. If
possible, prefer a remote GPU server running the recommended models.
Sizing notes
- 32B class: prefer ≥24 GB VRAM for a smooth experience. 4‑bit quantization can roughly quarter weight memory; see the estimate sketch after this list.
- 120B class: server‑grade hardware; plan for multi‑GPU or very high VRAM, and use the Remote GPU Server guide.
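These figures follow from simple arithmetic: weight memory is roughly parameter count × bytes per parameter, and real usage adds KV cache and runtime overhead on top. A back‑of‑envelope sketch (approximations, not measured figures):

```python
# Back-of-envelope VRAM estimate for model weights only.
# Real usage is higher: KV cache, activations, and runtime overhead
# commonly add 10-30% or more depending on context length.
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    return params_billion * 1e9 * (bits_per_param / 8) / 1024**3

print(f"32B  @ 16-bit: ~{weight_gb(32, 16):.0f} GB")   # ~60 GB: multi-GPU territory
print(f"32B  @  4-bit: ~{weight_gb(32, 4):.0f} GB")    # ~15 GB: fits a 24 GB card
print(f"120B @  4-bit: ~{weight_gb(120, 4):.0f} GB")   # ~56 GB: server-grade hardware
```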
Provider & naming
- LM Studio is a simple way to run models locally or on a server. Match the model name exactly as LM Studio displays it.
- Cline Local supports LM Studio and other OpenAI‑compatible endpoints. Configure the provider and endpoint in Cline settings; a minimal request sketch follows.
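For reference, here is a minimal sketch of the endpoint shape using the official `openai` Python client against a local LM Studio server. The model ID is a placeholder; copy the exact name from LM Studio's model list:

```python
# Minimal chat request against LM Studio's OpenAI-compatible server.
# Assumes `pip install openai`. The model ID is a placeholder; copy the
# exact name LM Studio displays.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default local endpoint
    api_key="lm-studio",  # LM Studio ignores the key, but the client requires one
)

response = client.chat.completions.create(
    model="qwen2.5-coder-32b-instruct",  # placeholder; match LM Studio exactly
    messages=[{"role": "user", "content": "Reverse a string in Python."}],
)
print(response.choices[0].message.content)
```

In Cline, the equivalent settings are the endpoint base URL, an API key (any value works for LM Studio), and the same model ID; for a remote setup, swap `localhost` for the server's address.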