Local LLM Hub

Run LLMs on your own hardware. Find the right launcher, engine, and configuration for your setup.

NVIDIA-first. Mac-strong. Pick your GPU, get your stack.

Quick Start: NVIDIA, 8 GB VRAM (RTX 3060/3070, RTX 4060)

Beginner (gguf)
ollama + llama.cpp
Quant: Q4_K_M
7B models (Q4) run comfortably; 13B is a stretch.

GUI (gguf)
lm-studio + llama.cpp
Quant: Q4_K_M
Easy to use through the GUI; model management is painless too.

Power (gguf, gptq)
text-generation-webui + llama.cpp
Quant: Q4_K_M or GPTQ-4bit
For users who need fine-grained configuration.
25 local LLM tools
| Name | Description | Role | Backends | Formats | Score | Install |
|---|---|---|---|---|---|---|
| Text Generation WebUI | Gradio web UI for running Large Language Models | Launcher | cuda, metal, rocm... | gguf, gptq... | 97 (A+) | 🍎🐧🪟 |
| llama.cpp | LLM inference in C/C++ with minimal dependencies | Engine | cuda, metal, rocm... | gguf, ggml | 93 (A+) | 🍎🐧🪟 |
| KoboldCpp | Easy-to-use AI text generation with llama.cpp backend | Launcher | cuda, metal, rocm... | gguf, ggml | 92 (A+) | 🍎🐧🪟 |
| Ollama | Get up and running with large language models locally | Launcher | cuda, metal, rocm... | gguf | 90 (A+) | 🍎🐧🪟 |
| Jan | Open-source ChatGPT alternative that runs offline | Launcher | cuda, metal, cpu... | gguf | 87 (A) | 🍎🐧🪟 |
| LM Studio | Discover, download, and run local LLMs with a beautiful GUI | Launcher | cuda, metal, cpu... | gguf | 87 (A) | 🍎🐧🪟 |
| Text Generation Inference | Hugging Face's production-ready LLM serving solution | Engine | cuda, rocm | safetensors, gptq... | 78 (B+) | 🐧 |
| vLLM | High-throughput LLM serving with PagedAttention | Engine | cuda, rocm | safetensors, pytorch... | 78 (B+) | 🐧 |
| LocalAI | Free, open-source OpenAI alternative with local inference | Launcher | cuda, metal, rocm... | gguf, safetensors | 77 (B+) | 🍎🐧 |
| llamafile | Distribute and run LLMs with a single file | Engine | cuda, metal, cpu | gguf | 75 (B+) | 🍎🐧🪟 |
| GPT4All | Free-to-use, locally running, privacy-aware chatbot | Launcher | cuda, metal, cpu | gguf | 72 (B) | 🍎🐧🪟 |
| Candle | Minimalist ML framework for Rust with GPU support | Engine | cuda, metal, cpu | safetensors, gguf | 70 (B) | 🍎🐧 |
| CTransformers | Python bindings for GGML models with GPU acceleration | Engine | cuda, metal, cpu | gguf, ggml | 70 (B) | 🍎🐧 |
| MLC LLM | Machine Learning Compilation for LLMs | Engine | cuda, metal, rocm... | safetensors | 70 (B) | 🍎🐧 |
| ONNX Runtime | Cross-platform, high performance ML inferencing and training accelerator | Engine | cuda, cpu, metal | onnx | 68 (B-) | 🍎🐧🪟 |
| ExLlamaV2 | Fast inference library for running LLMs locally on NVIDIA GPUs | Engine | cuda | exl2, safetensors | 65 (B-) | 🐧 |
| Open WebUI | User-friendly WebUI for LLMs with Ollama/OpenAI support | Launcher | cuda, metal, rocm... | gguf | 62 (C+) | 🍎🐧 |
| MLX | Apple's array framework for machine learning on Apple Silicon | Engine | metal | mlx, safetensors | 60 (C+) | 🍎 |
| LLM (Python CLI) | Access large language models from the command-line | Tool | cuda, metal, cpu | gguf | 52 (C-) | 🍎🐧 |
| GGUF | GPT-Generated Unified Format for efficient LLM storage | Format | cuda, metal, rocm... | - | - | - |
| safetensors | Safe and fast tensor serialization format by Hugging Face | Format | cuda, metal, rocm... | - | - | - |
| CUDA Runtime | NVIDIA's parallel computing platform for GPU acceleration | Backend | cuda | - | - | 🐧🪟 |
| ROCm | AMD's open-source GPU computing platform | Backend | rocm | - | - | 🐧 |
| Metal | Apple's GPU framework for Apple Silicon acceleration | Backend | metal | - | - | - |
| Vulkan | Cross-platform GPU API for compute and graphics | Backend | vulkan | - | - | - |
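Several launchers in the list above (Ollama, LM Studio, LocalAI, vLLM, among others) expose an OpenAI-compatible HTTP API, so one client can work across stacks. A minimal sketch of building such a request follows; the model name is a placeholder, and the default port shown is Ollama's (LM Studio defaults to 1234 instead).

```python
import json

def chat_request(model: str, prompt: str,
                 base_url: str = "http://localhost:11434/v1"):  # Ollama's default port
    """Build (url, body) for an OpenAI-compatible chat-completions call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return f"{base_url}/chat/completions", json.dumps(payload)

# Model name below is illustrative; use whatever tag your server has pulled.
url, body = chat_request("llama3:8b", "Hello!")
# POST `body` to `url` with urllib or requests once the server is running.
```

Because the endpoint shape is shared, switching from Ollama to LM Studio or vLLM is usually just a change of `base_url`.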