# 📚 Recommended Stacks

Pre-configured combinations of launcher + engine + format + quantization for your hardware.

## 🟢 NVIDIA Stacks

For NVIDIA GPU users: the largest user base, with clear VRAM-based recommendations.

### 8GB VRAM (RTX 3060/3070, RTX 4060)

**Beginner**
- Stack: ollama + llama.cpp
- Formats: gguf
- Quantization: Q4_K_M
- Install:
  ```bash
  curl -fsSL https://ollama.com/install.sh | sh
  ollama pull llama3.2:3b
  ```
- 💡 7B models at Q4 run smoothly; 13B is challenging.
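
Once the model is pulled, you can talk to it either interactively in the terminal or over Ollama's local HTTP API, which listens on port 11434 by default. A minimal smoke test, assuming the default install above:

```bash
# Interactive chat in the terminal
ollama run llama3.2:3b

# Or query the local REST API (default port 11434)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Explain GGUF quantization in one sentence.",
  "stream": false
}'
```
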
**GUI**
- Stack: lm-studio + llama.cpp
- Formats: gguf
- Quantization: Q4_K_M
- Install: download from [lmstudio.ai](https://lmstudio.ai)
- 💡 Easy GUI with simple model management.
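
LM Studio also ships a local server mode that exposes an OpenAI-compatible API (port 1234 by default in current builds), so the same GGUF model can serve scripts as well as the GUI. A hedged sketch; the port and the "model" field depend on your LM Studio configuration:

```bash
# Assumes LM Studio's local server is enabled and a model is already loaded in the GUI
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-model",
    "messages": [{"role": "user", "content": "Hello from the command line"}]
  }'
```
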
**Power**
- Stack: text-generation-webui + llama.cpp
- Formats: gguf, gptq
- Quantization: Q4_K_M or GPTQ 4-bit
- 💡 For users who need fine-grained control (install sketch below).
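
No install command is given for this stack above; the usual route is the project's one-click installer scripts. A rough sketch, assuming the oobabooga/text-generation-webui repository layout at the time of writing (script names may change between releases):

```bash
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
# The one-click installer sets up its own environment; pick the NVIDIA option when prompted
./start_linux.sh        # use start_windows.bat on Windows
# Then open http://localhost:7860 and load a GGUF or GPTQ model from the Model tab
```
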

### 12GB VRAM (RTX 3060 12GB, RTX 4070)

**Beginner**
- Stack: ollama + llama.cpp
- Formats: gguf
- Quantization: Q4_K_M to Q5_K_M
- Install:
  ```bash
  curl -fsSL https://ollama.com/install.sh | sh
  ollama pull llama3.1:8b
  ```
- 💡 7-8B models at Q5/Q6 run smoothly; 13B is possible.
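
To confirm that the quantization you picked actually stays inside 12 GB instead of spilling into system RAM, check where Ollama placed the loaded model. A quick check, assuming the install above and a recent Ollama CLI:

```bash
# Load the model once, then inspect the placement
ollama run llama3.1:8b "hi" > /dev/null
ollama ps        # the PROCESSOR column should read "100% GPU"
nvidia-smi       # VRAM usage should sit comfortably under 12 GB
```
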
**GUI**
- Stack: lm-studio + llama.cpp
- Formats: gguf
- Quantization: Q5_K_M
- 💡 13B models work well.

**Power**
- Stack: open-webui + ollama
- Formats: gguf
- Quantization: Q5_K_M
- 💡 ChatGPT-style UI on top of an Ollama backend (setup sketch below).
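
The table gives no install command for this pairing; the route documented by the Open WebUI project is to run it in Docker next to an existing Ollama install. A sketch under those assumptions (Docker available, Ollama already listening on its default port 11434):

```bash
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main
# Then browse to http://localhost:3000 and pick any pulled Ollama model from the dropdown
```
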

### 16GB VRAM (RTX 4080, RTX 4070 Ti Super)

**Beginner**
- Stack: ollama + llama.cpp
- Formats: gguf
- Quantization: Q5_K_M to Q6_K
- Install:
  ```bash
  curl -fsSL https://ollama.com/install.sh | sh
  ollama pull llama3.1:8b
  ```
- 💡 8B-13B models run smoothly; the extra VRAM leaves room for higher-quality quants.
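
The default llama3.1:8b tag is a mid-range Q4 quant; with 16 GB you can afford a higher-quality one. The tag below is taken from the Ollama model library and may change over time, so check ollama.com/library/llama3.1 for the current list:

```bash
# Pull an explicitly higher-quality quant instead of the default tag
ollama pull llama3.1:8b-instruct-q6_K
ollama run llama3.1:8b-instruct-q6_K
```
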
**GUI**
- Stack: lm-studio + llama.cpp
- Formats: gguf
- Quantization: Q6_K
- 💡 Runs 13B models with high-quality quantization.

**Power**
- Stack: vllm + vllm
- Formats: safetensors, awq
- Quantization: AWQ 4-bit
- Install:
  ```bash
  pip install vllm
  vllm serve meta-llama/Llama-3.1-8B-Instruct
  ```
- 💡 For production server use; supports batching.
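
vllm serve exposes an OpenAI-compatible API (port 8000 by default), so any OpenAI client or plain curl can talk to it; note that Meta's Llama repositories on Hugging Face are gated, so log in with huggingface-cli before the first download. A minimal request against the server started above:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Give one good use case for a local LLM."}],
    "max_tokens": 128
  }'
```
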

### 24GB+ VRAM (RTX 3090, RTX 4090, A5000)

**Beginner**
- Stack: ollama + llama.cpp
- Formats: gguf
- Quantization: Q6_K to Q8_0
- Install:
  ```bash
  curl -fsSL https://ollama.com/install.sh | sh
  ollama pull llama3.1:70b-instruct-q4_K_M
  ```
- 💡 70B at Q4 runs, though partially offloaded to CPU and correspondingly slower; everything smaller fits fully in VRAM. The ultimate local setup.
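
A rough size check explains the offload: 70B weights at roughly 4.7 bits each (Q4_K_M) come to about 40 GB before the KV cache, well past a single 24 GB card, so llama.cpp keeps part of the model in system RAM. A hedged way to inspect what you pulled, assuming current Ollama CLI behavior:

```bash
# Show parameter count, quantization level, and context length of the pulled model
ollama show llama3.1:70b-instruct-q4_K_M

# Back-of-envelope weight size: 70e9 params * ~4.7 bits / 8 bits-per-byte ≈ 41 GB
```
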
**GUI**
- Stack: lm-studio + llama.cpp
- Formats: gguf
- Quantization: Q8_0
- 💡 High-quality quants (Q6/Q8) for 13B-20B-class models; 70B still needs 4-bit quantization and partial CPU offload.

**Power**
- Stack: vllm + vllm
- Formats: safetensors, awq, gptq
- Quantization: FP16 for models up to ~13B; AWQ/GPTQ 4-bit for larger ones
- Install:
  ```bash
  pip install vllm
  # FP16 70B weights need ~140 GB of VRAM; on 24 GB cards, point vllm serve at an
  # AWQ/GPTQ-quantized 70B checkpoint and spread it across GPUs with --tensor-parallel-size
  vllm serve meta-llama/Llama-3.1-70B-Instruct --tensor-parallel-size 2
  ```
- 💡 Production-ready server with batching. FP16 is realistic up to ~13B on one card; 70B requires 4-bit weights and multiple GPUs.
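
Whichever model you end up serving, the same OpenAI-compatible endpoints apply as in the 16GB tier. A quick way to confirm the server is up and to find the exact model name to pass in requests (default port 8000):

```bash
# List the model(s) the running vLLM server exposes
curl http://localhost:8000/v1/models

# Watch VRAM headroom while the weights are loading
watch -n 1 nvidia-smi
```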