Tiny but Mighty.
Impossibly Fast AI
for Apple Silicon.
A Zig-native GGUF inference engine built specifically for Apple Silicon: a single, highly optimized binary with a CLI focused exclusively on Apple Metal and GGUF files.
Core Architecture
Apple Silicon Native
Built exclusively for Apple Silicon, making it incredibly fast for local AI tasks.
Metal First
Optimized specifically for Apple Metal to deliver the highest possible performance.
GGUF Only
A deliberately narrow focus on GGUF files keeps local inference simple and understandable.
Single Binary
A highly optimized CLI experience. Just drop the binary and run with zero external dependencies.
Quick Start
Getting up and running takes just a few seconds. Ensure you have Zig 0.15.2 or newer installed.
```shell
# 1. Clone
git clone https://github.com/Alex188dot/ziggy-llm.git
cd ziggy-llm

# 2. Build
zig build -Doptimize=ReleaseFast

# 3. Chat
./zig-out/bin/ziggy-llm chat \
  --model path/to/model.gguf \
  --backend metal \
  --temperature 0 \
  --seed 42
```
Blazing Fast Performance
🏎️ Tokens per second (t/s) measured on a MacBook Pro (M3, 18 GB). Higher is better.
| Model | GGUF | llama.cpp | ZINC | ziggy-llm |
|---|---|---|---|---|
| TinyLlama 1.1B | Q4_K_M | 151.4 | — | ~123 |
| Llama 3.2 3B | Q4_K_M | 53.5 | — | ~48 |
| Llama 3.1 8B | Q4_K_M | 23.1 | ~10 | ~22.4 |
| Mistral 7B | Q4_K_M | 28.0 | — | ~20 |
| Ministral 3B | Q4_K_M | 43.7 | — | ~45.5 |
| Gemma 2 2B | Q4_K_M | — | — | ~48 |
| Qwen3 1.7B | Q4_K_M | 92.0 | — | ~65 |
| Qwen3 8B | Q4_K_M | 25.0 | ~8 | ~17.5 |
| Qwen3.5 2B | Q4_K_M | 62.4 | — | ~48.9 |
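Throughput here is simply generated tokens divided by wall-clock generation time. A quick Python sketch of how such a figure is computed (the timing values are hypothetical, not from our benchmark harness):

```python
def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput = number of generated tokens / wall-clock seconds."""
    return n_tokens / elapsed_s

# Example: 256 tokens generated in 11.4 s works out to roughly 22.5 t/s,
# in the same ballpark as the 8B Q4_K_M rows above.
print(round(tokens_per_second(256, 11.4), 1))
```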
Join the Community
🤝 ziggy-llm is open source and under active development. Check out our issue tracker for items that need immediate attention.
Support the Project
If you find this project interesting, please consider starring the repo. It genuinely helps us grow and reach more developers in the local AI ecosystem!
Star on GitHub