Tiny but Mighty.
Impossibly Fast AI
for Apple Silicon.

A Zig-native GGUF inference engine built for Apple Silicon: a single, highly optimized binary delivering a focused CLI experience, targeting Apple Metal and GGUF files exclusively.

Core Architecture

Apple Silicon Native

Built exclusively for Apple Silicon, making it incredibly fast for local AI tasks.

Metal First

Optimized specifically for Apple Metal to deliver the highest possible performance.

GGUF Only

A deliberately narrow focus on GGUF files keeps local inference simple and understandable.

Single Binary

A highly optimized CLI experience. Just drop the binary and run with zero external dependencies.
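Supporting only GGUF means every model file starts with the same small, fixed preamble. A minimal sketch of reading it, in Python for illustration (field layout per the GGUF spec; read_gguf_header is a hypothetical helper, not part of ziggy-llm):

```python
import struct

def read_gguf_header(path):
    """Parse the fixed GGUF preamble: magic, version, tensor and KV counts."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"{path}: not a GGUF file (magic {magic!r})")
        # All header integers are little-endian.
        (version,) = struct.unpack("<I", f.read(4))
        tensor_count, metadata_kv_count = struct.unpack("<QQ", f.read(16))
    return version, tensor_count, metadata_kv_count
```

The metadata key-value pairs and tensor descriptors that follow this header are what an engine actually walks; the fixed preamble is just the cheap sanity check before committing to a load.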

Quick Start

Getting up and running takes just a few seconds. Ensure you have Zig 0.15.2 or newer installed.

# 1. Clone
git clone https://github.com/Alex188dot/ziggy-llm.git
cd ziggy-llm

# 2. Build
zig build -Doptimize=ReleaseFast

# 3. Chat
./zig-out/bin/ziggy-llm chat \
  --model path/to/model.gguf \
  --backend metal \
  --temperature 0 \
  --seed 42
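In the chat invocation above, --temperature 0 makes decoding greedy (always pick the highest-logit token) and --seed 42 pins the RNG for any stochastic sampling, so runs are reproducible. A rough Python sketch of that sampling rule (illustrative only, not ziggy-llm's actual implementation):

```python
import math
import random

def sample_token(logits, temperature, seed=None):
    """Pick a token index: greedy at temperature 0, seeded softmax otherwise."""
    if temperature == 0:
        # Greedy decoding: deterministic regardless of seed.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = random.Random(seed)
    scaled = [l / temperature for l in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # stable softmax numerators
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1
```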

Blazing Fast Performance

🏎️

Tokens per second (t/s) measured on MacBook Pro M3 18GB. Higher is better.

Model           GGUF     llama.cpp   ZINC   ziggy-llm
TinyLlama 1.1B  Q4_K_M   151.4       —      ~123
Llama 3.2 3B    Q4_K_M   53.5        —      ~48
Llama 3.1 8B    Q4_K_M   23.1        ~10    ~22.4
Mistral 7B      Q4_K_M   28.0        —      ~20
Ministral 3B    Q4_K_M   43.7        —      ~45.5
Gemma 2 2B      Q4_K_M   —           —      ~48
Qwen3 1.7B      Q4_K_M   92.0        —      ~65
Qwen3 8B        Q4_K_M   25.0        ~8     ~17.5
Qwen3.5 2B      Q4_K_M   62.4        —      ~48.9
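Tokens per second here is plain wall-clock throughput: generated tokens divided by elapsed decode time. A minimal way to measure it (decode_step is a hypothetical stand-in for one token of generation):

```python
import time

def tokens_per_second(decode_step, n_tokens):
    """Run n_tokens decode steps and return wall-clock throughput."""
    start = time.perf_counter()
    for _ in range(n_tokens):
        decode_step()  # stand-in for generating one token
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed
```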

Join the Community

🤝

ziggy-llm is open source and in active development. Check out our issue tracker for items that need immediate attention:

- Implement an OpenAI-compatible server
- Add support for Qwen 3.5 (MoE and DeltaNet variants) and Gemma 4
- Make chat more robust
- Test all quants (currently only Q4_K_M and Q6_K are tested)
- Test bigger models (Qwen 3 and Llama families)

Support the Project

If you find this project interesting, please consider starring the repo. It genuinely helps us grow and reach more developers in the local AI ecosystem!

Star on GitHub