Two Models, One Keystroke

Ghost Pepper v2.0.1 is a macOS hold-to-talk tool that quietly chains WhisperKit and a local Qwen 3.5 model to transcribe and clean up speech without any cloud call. It's a small app, but a clear signal of where on-device AI composition is heading.

Read more →

Ollama Switches to MLX and Doubles Decode Speed

Ollama's preview MLX backend replaces direct Metal calls on Apple Silicon with Apple's dedicated ML framework, yielding a 93% decode speedup for Qwen3.5-35B-A3B on M5 chips. The update also adds NVFP4 quantization and a smarter KV cache — including prefix-aware eviction that keeps shared system prompts hot across conversations.

Read more →