
Local LLM Hub

Run LLMs on your own hardware. Find the right launcher, engine, and configuration for your setup.

NVIDIA-first. Mac-strong. Pick your GPU, get your stack.

Quick Start: NVIDIA with 8 GB VRAM (RTX 3060/3070, RTX 4060)

| Profile | Formats | Stack | Quant | Notes |
|---------|---------|-------|-------|-------|
| Beginner | gguf | ollama + llama.cpp | Q4_K_M | 7B models (Q4) run smoothly; 13B is challenging (sketch below) |
| GUI | gguf | lm-studio + llama.cpp | Q4_K_M | Easy GUI, simple model management |
| Power | gguf, gptq | text-generation-webui + llama.cpp | Q4_K_M or GPTQ-4bit | For users who need fine-grained control |
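As a taste of the beginner stack, here is a minimal sketch that talks to a locally running Ollama server through the official `ollama` Python client. It assumes Ollama is installed and a Q4_K_M 7B model has already been pulled; the model name `mistral` is just an example.

```python
import ollama

# Chat with a locally served 7B model. Assumes `ollama pull mistral`
# (or any other pulled model) has been run beforehand.
response = ollama.chat(
    model="mistral",
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
)
print(response["message"]["content"])
```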
10 local LLM tools
| Name | Description | Role | Backends | Formats | Score | Platforms |
|------|-------------|------|----------|---------|-------|-----------|
| llama.cpp | LLM inference in C/C++ with minimal dependencies | Engine | cuda, metal, rocm... | gguf, ggml | 93 (A+) | 🍎🐧🪟 |
| Text Generation Inference | Hugging Face's production-ready LLM serving solution | Engine | cuda, rocm | safetensors, gptq... | 78 (B+) | 🐧 |
| vLLM | High-throughput LLM serving with PagedAttention | Engine | cuda, rocm | safetensors, pytorch... | 78 (B+) | 🐧 |
| llamafile | Distribute and run LLMs with a single file | Engine | cuda, metal, cpu | gguf | 75 (B+) | 🍎🐧🪟 |
| Candle | Minimalist ML framework for Rust with GPU support | Engine | cuda, metal, cpu | safetensors, gguf | 70 (B) | 🍎🐧 |
| CTransformers | Python bindings for GGML models with GPU acceleration | Engine | cuda, metal, cpu | gguf, ggml | 70 (B) | 🍎🐧 |
| MLC LLM | Machine Learning Compilation for LLMs | Engine | cuda, metal, rocm... | safetensors | 70 (B) | 🍎🐧 |
| ONNX Runtime | Cross-platform, high-performance ML inference and training accelerator | Engine | cuda, cpu, metal | onnx | 68 (B-) | 🍎🐧🪟 |
| ExLlamaV2 | Fast inference library for running LLMs locally on NVIDIA GPUs | Engine | cuda | exl2, safetensors | 65 (B-) | 🐧 |
| MLX | Apple's array framework for machine learning on Apple Silicon | Engine | metal | mlx, safetensors | 60 (C+) | 🍎 |
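To make the table concrete, here are a few minimal, hedged usage sketches. First, llama.cpp through its `llama-cpp-python` bindings; the model path is a placeholder, and `n_gpu_layers=-1` assumes a CUDA or Metal build.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers; requires a CUDA/Metal build
    n_ctx=4096,       # context window
)
out = llm("Q: What is quantization? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```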
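Text Generation Inference runs as a server rather than an in-process library; a common way to query it from Python is `huggingface_hub.InferenceClient`. This sketch assumes a TGI container is already listening on localhost:8080.

```python
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # assumed local TGI endpoint
print(client.text_generation("What does TGI do?", max_new_tokens=64))
```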
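vLLM's offline batch API is only a few lines. The model ID below is an example; the first run downloads weights from the Hugging Face Hub.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # example model id
params = SamplingParams(temperature=0.7, max_tokens=64)
for out in llm.generate(["What is PagedAttention?"], params):
    print(out.outputs[0].text)
```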
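A llamafile is self-contained and exposes an OpenAI-compatible endpoint on localhost:8080 by default, so the stock `openai` client works against it; the placeholder model name and API key below follow the llamafile README.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # llamafile's default local endpoint
    api_key="sk-no-key-required",         # any string; llamafile ignores it
)
resp = client.chat.completions.create(
    model="LLaMA_CPP",  # placeholder; the llamafile serves one baked-in model
    messages=[{"role": "user", "content": "What is a llamafile?"}],
)
print(resp.choices[0].message.content)
```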
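ONNX Runtime selects an execution provider at session creation. This sketch assumes you already have an exported `model.onnx` and shows CUDA-first provider selection with CPU fallback.

```python
import onnxruntime as ort

sess = ort.InferenceSession(
    "model.onnx",  # assumed pre-exported model
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],  # CUDA first, CPU fallback
)
print(sess.get_providers())                 # which providers were actually selected
print([i.name for i in sess.get_inputs()])  # expected input tensor names
```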
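MLX itself is an array framework; text generation on Apple Silicon typically goes through the companion `mlx-lm` package built on top of it. The repo name below is one example of a pre-quantized model from the mlx-community hub.

```python
from mlx_lm import load, generate

# Example 4-bit model from the mlx-community Hub; any mlx-format repo works.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")
print(generate(model, tokenizer, prompt="What is MLX?", max_tokens=64))
```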