Microsoft has introduced Mu, a new small-scale language model designed to run locally on Neural Processing Units (NPUs), starting with its deployment in the Windows Settings application for Copilot+ PCs. The model allows users to control system settings using natural language, aiming to reduce reliance on cloud-based processing.
Mu is a 330 million parameter encoder–decoder transformer optimized for edge devices. According to Microsoft, this architecture reduces latency by reusing encoded input representations, unlike decoder-only models that must reprocess the full input-output sequence during generation. The result, the company says, is faster inference with lower memory overhead, meeting the performance needs for real-time interaction on personal devices.
Source: Microsoft Blog
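The latency claim can be illustrated with a deliberately simplified cost model (our sketch, not Microsoft's implementation): count how many tokens each architecture must process to generate an answer, assuming the decoder-only model re-reads the full input–output sequence at every step, as described above.

```python
def encoder_decoder_touches(n_in: int, n_out: int) -> int:
    """Toy cost model: the encoder reads the input once, and each
    decode step then processes only the output prefix generated so far
    (the encoded input is reused, not recomputed)."""
    encode_once = n_in
    decode = sum(t for t in range(1, n_out + 1))
    return encode_once + decode

def decoder_only_touches(n_in: int, n_out: int) -> int:
    """Toy cost model: without reuse, every decode step reprocesses
    the full input plus the output prefix."""
    return sum(n_in + t for t in range(1, n_out + 1))

# A 100-token query producing a 20-token answer:
print(encoder_decoder_touches(100, 20))  # 310 token passes
print(decoder_only_touches(100, 20))     # 2210 token passes
```

Real inference engines cache intermediate state, so absolute numbers differ in practice, but the structural advantage of encoding the input once is what Microsoft credits for the lower latency and memory overhead.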
Microsoft reports that on Qualcomm’s Hexagon NPU, Mu achieves a 47% reduction in first-token latency and nearly five times faster decoding compared to decoder-only models of similar size. Key features contributing to this include rotary positional embeddings (RoPE), grouped-query attention (GQA), dual LayerNorm, and model quantization techniques such as post-training quantization (PTQ) to 8- and 16-bit formats. These optimizations were developed in collaboration with chipmakers including AMD, Intel, and Qualcomm.
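Of the techniques listed, post-training quantization is the most self-contained to sketch. A minimal symmetric per-tensor PTQ to 8-bit integers (an illustrative sketch, not the hardware-specific pipeline Microsoft built with its chipmaker partners) looks like:

```python
def quantize_int8(weights):
    """Symmetric per-tensor post-training quantization: map floats to
    int8 using a single scale derived from the largest magnitude."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
print(q)                      # [42, -127, 5, 90]
print(dequantize(q, scale))   # values close to the originals
```

Storing int8 rather than float32 cuts the memory footprint of the weights by roughly 4x, which is part of what makes NPU-resident inference practical.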
To adapt Mu for the Windows Settings agent, Microsoft fine-tuned the model on over 3.6 million examples spanning hundreds of adjustable settings. Training included synthetic data generation, noise injection, prompt tuning, and low-rank adaptation (LoRA). The result is a system that can map user input, such as “turn off Bluetooth” or “increase brightness,” to actionable system-level changes, with Microsoft stating that typical response times remain under 500 milliseconds.
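The appeal of LoRA for this kind of task-specific fine-tuning is the small number of trainable parameters: the original weight matrix is frozen and only two low-rank factors are trained. A back-of-the-envelope sketch (dimensions are illustrative, not Mu's actual layer shapes):

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA freezes the original d_out x d_in weight matrix W and
    trains only A (r x d_in) and B (d_out x r); the effective weight
    becomes W + (alpha / r) * B @ A."""
    return r * d_in + d_out * r

d = 1024          # hypothetical layer width
full = d * d      # parameters updated by full fine-tuning
lora = lora_trainable_params(d, d, r=8)
print(full, lora, full // lora)  # 1048576 16384 64
```

At rank 8, this layer trains about 64x fewer parameters than full fine-tuning, which is why LoRA pairs well with a large synthetic training set covering hundreds of settings.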
The agent is currently available to Windows Insiders in the Dev Channel on Copilot+ devices. To handle unclear input, such as short or vague queries, Microsoft added a fallback that shows regular search results when there is not enough context.
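That fallback behavior can be sketched as a simple router (the names and threshold below are entirely hypothetical; Microsoft has not published the actual logic): the agent acts only when the query carries enough context, and otherwise ordinary search results are shown.

```python
def route(query: str, predicted_action: str, confidence: float,
          threshold: float = 0.7):
    """Hypothetical fallback router: very short queries or
    low-confidence predictions fall back to regular Settings search."""
    too_short = len(query.split()) < 2
    if confidence >= threshold and not too_short:
        return ("agent", predicted_action)
    return ("search", query)

print(route("turn off bluetooth", "bluetooth.disable", 0.92))
# ('agent', 'bluetooth.disable')
print(route("bluetooth", "bluetooth.disable", 0.92))
# ('search', 'bluetooth')
```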
Industry observers have taken note of Mu’s potential. Michał Choiński, an AI researcher and developer, commented:
If Mu delivers consistently at that speed and scale, it could quietly redefine the desktop AI experience.
Muhammad Akif, founder of Techling LLC, added:
If Mu maintains that level of performance, it could shift the AI narrative from ‘cloud-first’ to ‘device-smart.’
George Draco, an AI solutions specialist, highlighted its broader implications:
Big leap for on-device AI. Offline speed with contextual memory changes how we think about productivity tools. Curious to see how Mu reshapes daily workflows.
Microsoft says it plans to expand support to more settings categories and improve performance on short queries, as Mu becomes a foundation for broader on-device AI capabilities.