📚 Recommended Stacks

Pre-configured combinations of launcher + engine + format + quantization, matched to your hardware.

🍎 Mac (Apple Silicon) Stacks

For Apple Silicon users: unified memory lets the GPU address most of system RAM, and the MLX ecosystem adds Apple-native tooling.

16GB RAM (M1/M2/M3 base)

Beginner
Stack: ollama + llama.cpp
Format: GGUF
Quantization: Q4_K_M
Install:
  brew install ollama
  ollama pull llama3.2:3b
💡 7B models at Q4 run smoothly; Metal acceleration is supported. A usage sketch follows below.
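Once the model is pulled, any local program can talk to the Ollama server over HTTP. A minimal sketch, assuming Ollama is running on its default port 11434 and llama3.2:3b has been pulled; the prompt text is just an example:

```python
# Query the local Ollama server via its REST API (non-streaming).
import json
import urllib.request

payload = {
    "model": "llama3.2:3b",  # model pulled with `ollama pull llama3.2:3b`
    "prompt": "Explain unified memory in one sentence.",
    "stream": False,         # return one JSON object instead of a token stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```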
GUI
Stack: lm-studio + llama.cpp
Format: GGUF
Quantization: Q4_K_M
💡 Native Mac app; the easiest way to get started. A scripting sketch follows below.
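LM Studio can also expose the loaded model through its built-in local server, which speaks an OpenAI-compatible API (default address http://localhost:1234/v1). A minimal sketch, assuming the local server is enabled and a GGUF model is loaded in the GUI; the model name below is a placeholder:

```python
# Call LM Studio's local OpenAI-compatible server from Python.
from openai import OpenAI

# The API key is not checked by LM Studio; any string works.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

reply = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio serves whichever model is loaded
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(reply.choices[0].message.content)
```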
Apple Native
Stack: mlx-community + mlx
Format: MLX
Quantization: 4-bit
Install:
  pip install mlx mlx-lm
  mlx_lm.generate --model mlx-community/Llama-3.2-3B-Instruct-4bit --prompt "Hello"
💡 Optimized for Apple Silicon; model selection is more limited than GGUF. A Python sketch follows below.
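The same model can be driven from Python instead of the CLI via the mlx-lm API. A minimal sketch, assuming mlx-lm is installed and there is enough free memory for the 4-bit weights; the prompt is just an example:

```python
# Load an MLX-converted model and generate text with mlx-lm.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Llama-3.2-3B-Instruct-4bit")
text = generate(
    model,
    tokenizer,
    prompt="Write a haiku about unified memory.",
    max_tokens=100,
)
print(text)
```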

32GB RAM (M1/M2/M3 Pro, M2/M3 Max base)

Beginner
Stack: ollama + llama.cpp
Format: GGUF
Quantization: Q4_K_M to Q5_K_M
Install:
  brew install ollama
  ollama pull llama3.1:8b
💡 8B models run smoothly; 13B at Q4 is possible.

GUI
Stack: lm-studio + llama.cpp
Format: GGUF
Quantization: Q5_K_M
💡 Comfortable even with 13B models.

Apple Native
Stack: mlx-community + mlx
Format: MLX
Quantization: 4-bit to 8-bit
Install:
  pip install mlx mlx-lm
💡 8B models run fastest here.

64GB RAM (M2/M3 Max, M2/M3 Ultra base)

Beginner
Stack: ollama + llama.cpp
Format: GGUF
Quantization: Q5_K_M to Q6_K
Install:
  brew install ollama
  ollama pull llama3.1:70b-instruct-q4_K_M
💡 70B at Q4 works (see the estimate below).
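A rough way to see why 70B at Q4 fits in 64GB: GGUF weight size is approximately parameters × bits-per-weight / 8. The ~4.8 bits/weight figure for Q4_K_M below is an approximation; actual file sizes vary slightly:

```python
# Back-of-envelope weight-size check (not exact).
params = 70e9          # Llama 3.1 70B parameter count
bits_per_weight = 4.8  # approximate effective rate for Q4_K_M
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weights_gb:.0f} GB of weights")  # ~42 GB, leaving headroom for KV cache on a 64GB Mac
```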
GUI
Stack: lm-studio + llama.cpp
Format: GGUF
Quantization: Q6_K
💡 Can handle 70B models.

Apple Native
Stack: mlx-community + mlx
Format: MLX
Quantization: 4-bit
Install:
  pip install mlx mlx-lm
  mlx_lm.generate --model mlx-community/Llama-3.1-70B-Instruct-4bit --prompt "Hello"
💡 70B at 4-bit runs fastest on a Mac via MLX.

128GB+ RAM (M2/M3 Ultra, high-memory M3 Max)

Beginner
Stack: ollama + llama.cpp
Format: GGUF
Quantization: Q8_0
💡 70B at Q8 runs easily.

GUI
Stack: lm-studio + llama.cpp
Format: GGUF
Quantization: Q8_0
💡 70B at the highest-quality quantization.

Apple Native
Stack: mlx-community + mlx
Format: MLX
Quantization: 8-bit or FP16
💡 70B at FP16 is within reach.