Local AI tools directory: launchers, inference engines, model formats, and GPU backends for running LLMs on your own hardware. A few minimal usage sketches for common entry points follow the table.
| Name | Description | Role | Type | Exec | Languages | Score | Cold Start (ms) | Memory (MB) |
|---|---|---|---|---|---|---|---|---|
| GGUF | GPT-Generated Unified Format for efficient LLM storage | Format | format | aot | Any | — | — | — |
| safetensors | Safe and fast tensor serialization format by Hugging Face | Format | format | aot | Any | — | — | — |
| Metal | Apple's GPU framework for Apple Silicon acceleration | Backend | backend | aot | Swift, Objective-C, C++ | — | — | — |
| CUDA Runtime | NVIDIA's parallel computing platform for GPU acceleration | Backend | backend | aot | C, C++, Python | — | — | — |
| Vulkan | Cross-platform GPU API for compute and graphics | Backend | backend | aot | C, C++ | — | — | — |
| llama.cpp | LLM inference in C/C++ with minimal dependencies | Engine | engine | aot | C, C++ | C+ | 100 | 50 |
| ROCm | AMD's open-source GPU computing platform | Backend | backend | aot | C, C++, Python | — | — | — |
| llamafile | Distribute and run LLMs with a single file | Engine | engine | aot | C, C++ | C- | 500 | 100 |
| ONNX Runtime | Cross-platform, high-performance ML inference and training accelerator | Interop | engine | hybrid | Python, C++, C#, ... | C- | 500 | 300 |
| LLM (Python CLI) | Access large language models from the command line | Tool | tool | hybrid | Python | D | 500 | 100 |
| Ollama | Get up and running with large language models locally | Launcher | launcher | hybrid | Python, JavaScript, Go | D | 1000 | 500 |
| Candle | Minimalist ML framework for Rust with GPU support | Engine | engine | jit | Rust | D | 300 | 200 |
| ExLlamaV2 | Fast inference library for running LLMs locally on NVIDIA GPUs | Engine | engine | aot | Python, C++, CUDA | D | 1000 | 300 |
| MLX | Apple's array framework for machine learning on Apple Silicon | Engine | engine | jit | Python, C++, Swift | D | 500 | 200 |
| CTransformers | Python bindings for GGML models with GPU acceleration | Engine | engine | hybrid | Python, C++ | D | 800 | 200 |
| Open WebUI | User-friendly web UI for LLMs with Ollama/OpenAI support | UI | launcher | hybrid | Python, TypeScript | F | 3000 | 500 |
| Text Generation Inference | Hugging Face's production-ready LLM serving solution | Serving | engine | hybrid | Rust, Python | F | 10000 | 2000 |
| KoboldCpp | Easy-to-use AI text generation with a llama.cpp backend | UI | launcher | hybrid | C++, Python | F | 1500 | 400 |
| MLC LLM | Machine Learning Compilation for LLMs | Interop | engine | aot | Python, C++ | F | 2000 | 500 |
| LocalAI | Free, open-source OpenAI alternative with local inference | Launcher | launcher | hybrid | Go, Python | F | 3000 | 800 |
| vLLM | High-throughput LLM serving with PagedAttention | Serving | engine | jit | Python | F | 5000 | 2000 |
| GPT4All | Free-to-use, locally running, privacy-aware chatbot | UI | launcher | hybrid | C++, Python | F | 2000 | 600 |
| Jan | Open-source ChatGPT alternative that runs offline | UI | launcher | hybrid | TypeScript, Python | F | 2000 | 600 |
| LM Studio | Discover, download, and run local LLMs with a beautiful GUI | UI | launcher | hybrid | Python | F | 2000 | 800 |
| Text Generation WebUI | Gradio web UI for running Large Language Models | UI | launcher | hybrid | Python | F | 5000 | 1000 |
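Starting with the formats: the quickest way to see what is inside a checkpoint is to read its header. A minimal sketch using the `safetensors` Python package; the filename is a placeholder, and `framework="numpy"` avoids pulling in torch:

```python
from safetensors import safe_open

# Inspect a .safetensors checkpoint lazily -- tensors are only
# loaded when requested, so this stays cheap even for large files.
# "model.safetensors" is a placeholder; point it at any checkpoint.
with safe_open("model.safetensors", framework="numpy") as f:
    print(f.metadata())              # free-form header metadata (may be None)
    for name in f.keys():
        t = f.get_tensor(name)       # loads just this one tensor
        print(name, t.shape, t.dtype)
```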
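llama.cpp is also the engine behind several launchers in the table (Ollama, KoboldCpp, GPT4All, LM Studio). One way to drive it from Python is the `llama-cpp-python` bindings; a minimal sketch, assuming a GGUF model on disk (`model.gguf` is a placeholder):

```python
from llama_cpp import Llama

# Load a GGUF model through llama.cpp's Python bindings.
# model_path is a placeholder; n_ctx sets the context window.
llm = Llama(model_path="model.gguf", n_ctx=2048, verbose=False)

out = llm(
    "Q: What does the GGUF format store? A:",
    max_tokens=64,
    stop=["\n"],        # stop at the end of the answer line
)
print(out["choices"][0]["text"])
```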
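Ollama wraps the same kind of engine behind a local REST API, served on port 11434 by default. A sketch against its `/api/generate` endpoint, assuming `ollama serve` is running and the named model has been pulled (`llama3.2` here is just an example):

```python
import json
import urllib.request

# Ollama listens on localhost:11434 by default. This assumes the
# daemon is running and the model was pulled beforehand.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama3.2",   # example model; use any model you have pulled
        "prompt": "Explain the GGUF format in one sentence.",
        "stream": False,       # one JSON object instead of a token stream
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```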
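Finally, many of the launchers and serving engines above (LocalAI, vLLM, LM Studio, llamafile, Text Generation WebUI, and Ollama among them) expose an OpenAI-compatible `/v1` endpoint, so one client covers them all. A sketch using the `openai` package; the port and model name are assumptions to adjust for whichever server you run:

```python
from openai import OpenAI

# Point the standard OpenAI client at a local server. The port varies
# by tool (e.g. 11434 for Ollama, 1234 for LM Studio, 8000 for vLLM);
# 8000 here is an assumption. Local servers ignore the API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

reply = client.chat.completions.create(
    model="local-model",   # many servers match this loosely or ignore it
    messages=[{"role": "user", "content": "Say hello from a local model."}],
)
print(reply.choices[0].message.content)
```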