Remote GPU Server

Host a larger coding model on a workstation / server (e.g., 4060/3090/4090/M‑series Ultra) and connect to it from a separate machine running VS Code + Cline Local.

Who this is for: You have a capable GPU box for inference and want better quality/latency than a laptop can provide. Your client machine connects over LAN/VPN.

Topology

Client (VS Code + Cline Local) connects over network to the model server.

Diagram: the Client Machine (VS Code + Cline Local) connects over LAN/VPN to the GPU Server running LM Studio (API: http://0.0.0.0:1234; model: Qwen2.5‑Coder 32B or GPT‑OSS‑120B).

Recommended Models

The setup below uses Qwen2.5‑Coder‑32B‑Instruct as the default, with GPT‑OSS‑120B as an option for larger hardware. See Quick Tips for what to avoid (e.g., GPT‑OSS‑20B) and for low‑resource, local‑only fallbacks.

Server Setup (GPU box)

  1. Install LM Studio
    Download from https://lmstudio.ai and install on the GPU server.
  2. Download your model
    In LM Studio, search and download Qwen2.5‑Coder‑32B‑Instruct (or GPT‑OSS‑120B if your hardware supports it).
  3. Start the API server (listen on network)
    - Open Server tab in LM Studio
    - Host: 0.0.0.0 (listen on all interfaces)
    - Port: 1234
    - Enable CORS and keep‑alive
    - Start server; ensure the model is loaded
    Test from the server itself:
    curl http://127.0.0.1:1234/v1/models
  4. Find the server IP
    - Windows: ipconfig
    - Linux/macOS: ip addr or ifconfig
    Use the LAN/VPN IP reachable by the client.
  5. Open firewall for the port
    - Windows (PowerShell as admin):
    New-NetFirewallRule -DisplayName "LM Studio 1234" -Direction Inbound -Action Allow -Protocol TCP -LocalPort 1234
    - Linux (ufw): sudo ufw allow 1234/tcp
  6. Optional: Reverse proxy + TLS
    If exposing beyond the LAN, put NGINX/Caddy in front, terminate TLS, and restrict access (IP allowlists/VPN/auth); a minimal Caddy sketch follows this list. Prefer a VPN (WireGuard/Tailscale) over direct WAN exposure.
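Example reverse proxy (Caddy). This is a sketch only; the domain llm.example.com and the allowlisted client IP 203.0.113.10 are placeholders, not values from this guide. Caddy terminates TLS (obtaining a certificate automatically for a public domain) and forwards allowed traffic to the local LM Studio port:

  # Hypothetical Caddyfile; replace the domain and IP with your own values.
  llm.example.com {
      # Allow only a specific client IP (or your VPN subnet); reject everything else.
      @allowed remote_ip 203.0.113.10
      handle @allowed {
          reverse_proxy 127.0.0.1:1234
      }
      handle {
          respond "Forbidden" 403
      }
  }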

Client Setup (VS Code machine)

  1. Install Cline Local (VSIX)
    Download the latest release VSIX from Releases.
    In VS Code: Extensions → ••• → Install from VSIX… → pick the file → Reload.
  2. Open Cline Local settings within VS Code.
  3. Provider: LM Studio (or OpenAI‑compatible if using an alternative server).
  4. Endpoint: http://SERVER_IP:1234 (replace SERVER_IP with the GPU server's IP).
  5. Model: the exact model name shown by the server (e.g., qwen2.5-coder-32b-instruct).
  6. Run a small coding task and verify token streaming (or use the curl smoke test below).
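To check connectivity and streaming outside VS Code, you can hit the chat completions endpoint directly from the client machine. This is a sketch using the standard OpenAI‑compatible request shape; replace SERVER_IP and the model id with the values your server actually reports via /v1/models:

  curl http://SERVER_IP:1234/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "qwen2.5-coder-32b-instruct",
      "messages": [{"role": "user", "content": "Reverse a string in Python."}],
      "stream": true
    }'

With "stream": true the reply arrives as a series of data: chunks; seeing them confirms the client can reach the server and the model is loaded.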
Alternative servers: You can run an OpenAI‑compatible server (e.g., vLLM) on the GPU box and point Cline Local to it. Steps are similar: bind to 0.0.0.0, enable CORS, open firewall, and use the server IP in Cline settings.
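For example, a minimal vLLM launch might look like the following (a sketch; the Hugging Face model id and port are assumptions, so pick a model and quantization your hardware can actually hold):

  # Sketch: serve an OpenAI-compatible API with vLLM on the GPU box.
  vllm serve Qwen/Qwen2.5-Coder-32B-Instruct --host 0.0.0.0 --port 1234

In Cline Local, set the provider to OpenAI‑compatible, the endpoint to http://SERVER_IP:1234, and the model to the id the server reports.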

Troubleshooting

Need quick model guidance? See Quick Tips.