Run LLMs on your own hardware. Find the right launcher, engine, and configuration for your setup.
NVIDIA-first. Mac-strong. Pick your GPU, get your stack.
| Name | Description | Role | Backends | Formats | Score | Platforms |
|---|---|---|---|---|---|---|
| llama.cpp | LLM inference in C/C++ with minimal dependencies (sketch below) | Engine | cuda, metal, rocm... | gguf, ggml | 93 (A+) | 🍎🐧🪟 |
| Text Generation Inference | Hugging Face's production-ready LLM serving solution | Engine | cuda, rocm | safetensors, gptq... | 78 (B+) | 🐧 |
| vLLM | High-throughput LLM serving with PagedAttention (sketch below) | Engine | cuda, rocm | safetensors, pytorch... | 78 (B+) | 🐧 |
| llamafile | Distribute and run LLMs with a single file | Engine | cuda, metal, cpu | gguf | 75 (B+) | 🍎🐧🪟 |
| Candle | Minimalist ML framework for Rust with GPU support | Engine | cuda, metal, cpu | safetensors, gguf | 70 (B) | 🍎🐧 |
| CTransformers | Python bindings for GGML models with GPU acceleration | Engine | cuda, metal, cpu | gguf, ggml | 70 (B) | 🍎🐧 |
| MLC LLM | Machine Learning Compilation for LLMs | Engine | cuda, metal, rocm... | safetensors | 70 (B) | 🍎🐧 |
| ONNX Runtime | Cross-platform, high-performance ML inferencing and training accelerator | Engine | cuda, cpu, metal | onnx | 68 (B-) | 🍎🐧🪟 |
| ExLlamaV2 | Fast inference library for running LLMs locally on NVIDIA GPUs | Engine | cuda | exl2, safetensors | 65 (B-) | 🐧 |
| MLX | Apple's array framework for machine learning on Apple Silicon (sketch below) | Engine | metal | mlx, safetensors | 60 (C+) | 🍎 |

Platforms: 🍎 macOS · 🐧 Linux · 🪟 Windows
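llama.cpp itself is a C/C++ CLI and server, but its widely used Python binding, llama-cpp-python, follows the same GGUF workflow and makes for a self-contained example. A minimal sketch, assuming llama-cpp-python is installed with GPU support and a GGUF checkpoint already sits at a placeholder path:

```python
from llama_cpp import Llama

# Load a quantized GGUF model; n_gpu_layers=-1 offloads all layers
# to the GPU (CUDA or Metal, whichever the build supports).
llm = Llama(
    model_path="./models/model.gguf",  # placeholder: any GGUF checkpoint
    n_ctx=4096,                        # context window in tokens
    n_gpu_layers=-1,
)

out = llm(
    "Q: What is quantization, in one sentence? A:",
    max_tokens=64,
    stop=["Q:"],
)
print(out["choices"][0]["text"])
```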
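vLLM is usually deployed as an OpenAI-compatible server, but its offline Python API shows the core idea just as directly. A minimal sketch, assuming vLLM is installed on a CUDA machine and using a placeholder Hugging Face model id:

```python
from vllm import LLM, SamplingParams

# PagedAttention manages the KV cache in fixed-size blocks, which is
# what lets vLLM batch many requests at high GPU utilization.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # placeholder model id

params = SamplingParams(temperature=0.7, max_tokens=128)
prompts = [
    "Summarize PagedAttention in one sentence.",
    "Name one trade-off of 4-bit quantization.",
]

# generate() schedules both prompts in a single batched run.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text.strip())
```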
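On Apple Silicon, MLX is most easily driven through the mlx-lm package rather than the raw array framework. A minimal sketch, assuming `pip install mlx-lm` and a placeholder model id from the mlx-community hub:

```python
from mlx_lm import load, generate

# load() fetches the model and tokenizer; MLX keeps weights in unified
# memory, so there is no explicit host-to-device copy on Apple Silicon.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")  # placeholder id

text = generate(
    model,
    tokenizer,
    prompt="Explain why unified memory helps local LLM inference.",
    max_tokens=128,
)
print(text)
```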