A comparison of inference engines for local LLM deployment, covering execution model, cold-start latency, and memory footprint.
| Name | Description | Type | Exec | Languages | Score | Cold Start (ms) | Memory (MB) |
|---|---|---|---|---|---|---|---|
| llama.cpp | LLM inference in C/C++ with minimal dependencies | Engine | AOT | C, C++ | C+ | 100 | 50 |
| llamafile | Distributes and runs LLMs as a single file | Engine | AOT | C, C++ | C- | 500 | 100 |
| Candle | Minimalist ML framework for Rust with GPU support | Engine | JIT | Rust | D | 300 | 200 |
| ExLlamaV2 | Fast inference library for running LLMs locally on NVIDIA GPUs | Engine | AOT | Python, C++, CUDA | D | 1000 | 300 |
| MLX | Apple's array framework for machine learning on Apple Silicon | Engine | JIT | Python, C++, Swift | D | 500 | 200 |
| CTransformers | Python bindings for GGML models with GPU acceleration | Engine | Hybrid | Python, C++ | D | 800 | 200 |
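Cold-start figures like those in the table can be reproduced by timing a single fresh launch of each engine's CLI. The sketch below is illustrative, not the methodology behind the table's numbers: it times a stand-in command via `subprocess`, and the `llama-cli` invocation shown in the comment is a hypothetical example of what you would substitute.

```python
import subprocess
import sys
import time

def measure_cold_start_ms(cmd: list[str]) -> float:
    """Time one cold launch of `cmd` and return the elapsed wall time in ms."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return (time.perf_counter() - start) * 1000.0

# Stand-in command; for a real engine you might pass something like
# ["./llama-cli", "-m", "model.gguf", "-p", "hi", "-n", "1"] (paths hypothetical).
elapsed = measure_cold_start_ms([sys.executable, "-c", "pass"])
print(f"cold start: {elapsed:.0f} ms")
```

For stable numbers, repeat the measurement across several runs and report the median, since the first launch after boot also pays OS page-cache costs that later launches do not.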