llama.cpp
LLM inference in C/C++ with minimal dependencies
Grade: C+
Score: 63/100
Type
Execution: AOT
Interface: CLI
About
llama.cpp is the de facto standard inference engine for running LLMs locally. Written in C/C++, it is designed for minimal dependencies and maximum portability. It supports the GGUF model format, extensive quantization options, and multiple backends including CUDA, Metal, ROCm, Vulkan, and CPU with AVX/AVX2/AVX-512.
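As a rough sketch of the CLI interface, a single-prompt run might look like the lines below. The model path and prompt are placeholders, and the flags (-m model file, -p prompt, -n tokens to generate, -ngl GPU layers to offload) are the commonly documented llama-cli options rather than anything specific to the build verified here.

# one-off completion against a local GGUF model (path and prompt are placeholders)
llama-cli -m ./models/model.gguf -p "Explain quantization in one sentence." -n 64

# offload layers to a GPU backend (CUDA, Metal, ROCm, Vulkan) if the build has one
llama-cli -m ./models/model.gguf -p "Explain quantization in one sentence." -n 64 -ngl 99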
Performance
Cold Start: 100ms
Base Memory: 50MB
Startup Overhead: 10ms
✓ Last Verified
Date: Jan 18, 2026
Version: b4604
Method: version check (brew install + llama-cli --version verified)
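The verification method above corresponds roughly to the commands below; the Homebrew formula name llama.cpp is assumed from the published formula, and the exact output of --version is not reproduced here.

# install the prebuilt CLI tools via Homebrew
brew install llama.cpp
# confirm the installed build (b4604 was the version recorded at last verification)
llama-cli --version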
Languages
C, C++
Details
- Isolation: process
- Maturity: production
- License: MIT