Run LLMs on your own hardware. Find the right launcher, engine, and configuration for your setup.
NVIDIA-first. Mac-strong. Pick your GPU, get your stack.
| Name | Description | Role | Backends | Formats | Score | Platforms |
|---|---|---|---|---|---|---|
| llama.cpp | LLM inference in C/C++ with minimal dependencies (sketch below) | Engine | cuda, metal, rocm... | gguf, ggml | 93 (A+) | 🍎🐧🪟 |
| Text Generation Inference | Hugging Face's production-ready LLM serving solution | Engine | cuda, rocm | safetensors, gptq... | 78 (B+) | 🐧 |
| vLLM | High-throughput LLM serving with PagedAttention (sketch below) | Engine | cuda, rocm | safetensors, pytorch... | 78 (B+) | 🐧 |
| llamafile | Distribute and run LLMs with a single file | Engine | cuda, metal, cpu | gguf | 75 (B+) | 🍎🐧🪟 |
| Candle | Minimalist ML framework for Rust with GPU support | Engine | cuda, metal, cpu | safetensors, gguf | 70 (B) | 🍎🐧 |
| CTransformers | Python bindings for GGML models with GPU acceleration | Engine | cuda, metal, cpu | gguf, ggml | 70 (B) | 🍎🐧 |
| MLC LLM | Machine Learning Compilation for LLMs | Engine | cuda, metal, rocm... | safetensors | 70 (B) | 🍎🐧 |
| ONNX Runtime | Cross-platform, high-performance ML inferencing and training accelerator | Engine | cuda, cpu, metal | onnx | 68 (B-) | 🍎🐧🪟 |
| ExLlamaV2 | Fast inference library for running LLMs locally on NVIDIA GPUs | Engine | cuda | exl2, safetensors | 65 (B-) | 🐧 |
| MLX | Apple's array framework for machine learning on Apple Silicon (sketch below) | Engine | metal | mlx, safetensors | 60 (C+) | 🍎 |

Platforms: 🍎 macOS · 🐧 Linux · 🪟 Windows
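llama.cpp itself is a C/C++ CLI and server, but its widely used Python binding, llama-cpp-python, follows the same GGUF workflow and makes for a self-contained example. A minimal sketch, assuming llama-cpp-python is installed with GPU support and a GGUF checkpoint already sits at a placeholder path:

```python
from llama_cpp import Llama

# Load a quantized GGUF model; n_gpu_layers=-1 offloads all layers
# to the GPU (CUDA or Metal, whichever the build supports).
llm = Llama(
    model_path="./models/model.gguf",  # placeholder: any GGUF checkpoint
    n_ctx=4096,                        # context window in tokens
    n_gpu_layers=-1,
)

out = llm(
    "Q: What is quantization, in one sentence? A:",
    max_tokens=64,
    stop=["Q:"],
)
print(out["choices"][0]["text"])
```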
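vLLM is usually deployed as an OpenAI-compatible server, but its offline Python API shows the core idea just as directly. A minimal sketch, assuming vLLM is installed on a CUDA machine and using a placeholder Hugging Face model id:

```python
from vllm import LLM, SamplingParams

# PagedAttention manages the KV cache in fixed-size blocks, which is
# what lets vLLM batch many requests at high GPU utilization.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # placeholder model id

params = SamplingParams(temperature=0.7, max_tokens=128)
prompts = [
    "Summarize PagedAttention in one sentence.",
    "Name one trade-off of 4-bit quantization.",
]

# generate() schedules both prompts in a single batched run.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text.strip())
```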
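On Apple Silicon, MLX is most easily driven through the mlx-lm package rather than the raw array framework. A minimal sketch, assuming `pip install mlx-lm` and a placeholder model id from the mlx-community hub:

```python
from mlx_lm import load, generate

# load() fetches the model and tokenizer; MLX keeps weights in unified
# memory, so there is no explicit host-to-device copy on Apple Silicon.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")  # placeholder id

text = generate(
    model,
    tokenizer,
    prompt="Explain why unified memory helps local LLM inference.",
    max_tokens=128,
)
print(text)
```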