MLX-native inference. 47–125% faster than GGUF translation layers. OpenAI-compatible API. 168+ models. Zero config.
Every workflow fits in one command. See the real CLI in action.
Benchmarked on a MacBook Pro M4 Pro (48 GB): same models, same prompts, three runs averaged.
OpenAI API, Anthropic Messages API, Responses API. If it speaks HTTP, it works.
Streaming, tools, vision. Drop-in for any SDK — Python, Node, Go, Rust.
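As a sketch of what "drop-in" means here (the port, endpoint path, and model alias below are illustrative assumptions, not values from this project's docs), a standard OpenAI-style chat request can be built with nothing but the Python standard library:

```python
import json
import urllib.request

# Hypothetical local endpoint; host and port are assumptions.
BASE_URL = "http://localhost:8080/v1"

# OpenAI-compatible chat completion body; "llama-3.2-3b" is a
# placeholder model alias, not a name from the registry.
payload = {
    "model": "llama-3.2-3b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would then stream the response as
# server-sent events, exactly as any OpenAI SDK expects.
```

Any SDK that lets you override the base URL can point at the same endpoint instead of hand-rolling the request.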
Claude Code runs on your local GPU. One command to launch.
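A minimal sketch of the idea, assuming the server exposes an Anthropic-compatible endpoint on localhost (the port is an assumption; `ANTHROPIC_BASE_URL` is the environment variable Claude Code reads for a custom endpoint):

```shell
# Point Claude Code's API client at the local server instead of the cloud.
# The port below is an assumption, not a documented default.
export ANTHROPIC_BASE_URL="http://localhost:8080"
# then run: claude
echo "$ANTHROPIC_BASE_URL"
```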
Function calling in XML and JSON. Powers coding agents like Codex.
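On the JSON side, tool definitions follow the standard OpenAI schema and ride along in the same chat payload. A hedged sketch (`get_weather` and the model alias are hypothetical examples, not part of this project):

```python
import json

# OpenAI-style tool definition; "get_weather" is a made-up example tool.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Tools travel in the same chat completion request body.
payload = {
    "model": "qwen-2.5-7b",  # placeholder alias
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": tools,
}
print(json.dumps(payload, indent=2))
```

A tool-capable model answers with a `tool_calls` entry naming the function and its JSON arguments, which the agent executes and feeds back as a `tool` message.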
Images via mlx-vlm. Vectors for RAG. Same server, same API.
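For the RAG side, embeddings use the standard OpenAI `/v1/embeddings` shape. A sketch under the same assumptions as above (URL and embedding-model alias are placeholders):

```python
import json
import urllib.request

# Hypothetical local embeddings endpoint; URL and model alias are assumptions.
req = urllib.request.Request(
    "http://localhost:8080/v1/embeddings",
    data=json.dumps({
        "model": "nomic-embed-text",  # placeholder embedding model alias
        "input": ["MLX-native inference on Apple silicon."],
    }).encode(),
    headers={"Content-Type": "application/json"},
)
# An OpenAI-compatible server answers with
# {"data": [{"embedding": [...]}], ...}; each embedding is a float vector
# ready to index in any vector store.
```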
Llama, Qwen, Mistral, Phi, Gemma, DeepSeek. Curated registry with aliases.
LRU model cache, lazy loading. Every request logged to SQLite.
Stop paying per token for local tasks.