Claude Code Local — MLX On-Device AI

Skill by ara.so — Claude Code Skills collection.

Run Claude Code 100% on-device with local AI on Apple Silicon. This project provides an MLX-native Anthropic-API-compatible server that lets you use powerful local models (Qwen 3.5 122B at 65 tok/s, Llama 3.3 70B, Gemma 4 31B, DeepSeek V4 Flash with 1M context) as drop-in replacements for Claude. Built for privacy-critical workflows (NDA, legal, healthcare) where data cannot leave the device.

What It Does

MLX-native inference optimized for Apple Silicon (M1/M2/M3/M4)
Anthropic API compatibility — point Claude Code clients at localhost:8000
Four model options: Gemma 4 31B (fast), Qwen 3.5 122B (balanced), Llama 3.3 70B (dense), DeepSeek V4 Flash (1M context)
100% offline — works in airplane mode, airgap-ready
Voice mode — hands-free coding with on-device STT/TTS
Browser agent — remote access from any device on your LAN

claude-code-local-mlx

Claude Code Local — MLX On-Device AI

What It Does

Installation

Prerequisites