Foundry Local (Microsoft Azure AI Foundry Local)
Foundry Local is a lightweight SDK and runtime from Microsoft designed for building AI-powered applications with hardware-optimized, on-device inference. It prioritizes privacy, offline readiness, and low latency by keeping all data and processing on the local device.
Core Features
- Privacy First: All inference happens locally; data never leaves the device.
- Hardware Optimization: Automatically leverages NPUs, GPUs (via DirectML/CUDA/Metal), or CPUs.
- OpenAI Compatibility: Provides a drop-in API replacement for existing OpenAI client libraries, making migration from cloud to local seamless.
- Offline Ready: Operates without a network connection.
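Because the runtime exposes an OpenAI-compatible HTTP surface, an existing chat-completions call can be pointed at the local endpoint unchanged. A minimal sketch using only the standard library; the base URL and the `phi-3.5-mini` model id are assumptions (Foundry Local assigns its port at startup, and catalog names vary by machine):

```python
import json
import os
import urllib.request

# Assumed endpoint -- Foundry Local picks its port at startup, so read it
# from the environment rather than hard-coding one.
BASE_URL = os.environ.get("FOUNDRY_LOCAL_ENDPOINT", "http://localhost:5273/v1")

# Standard OpenAI chat-completions payload; "phi-3.5-mini" is a placeholder
# model id -- use whatever your local catalog actually reports.
payload = {
    "model": "phi-3.5-mini",
    "messages": [
        {"role": "user", "content": "Summarize on-device inference in one sentence."}
    ],
}

request = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Only send the request when a local service is known to be running.
if os.environ.get("FOUNDRY_LOCAL_ENDPOINT"):
    with urllib.request.urlopen(request) as response:
        print(json.loads(response.read())["choices"][0]["message"]["content"])
```

The same shape works with the official `openai` client by setting its `base_url` to the local endpoint, which is what makes cloud-to-local migration a configuration change rather than a rewrite.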
Implementation Models
- Standalone: The application embeds the SDK and manages the model lifecycle directly.
- Shared Service: Foundry Local runs as a background service, exposing an OpenAI-compatible REST API to multiple local applications.
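In the shared-service model, every local application talks to one background process over the same OpenAI-compatible REST API. A client can probe that API to decide between the two implementation models; this sketch assumes the standard `/models` endpoint and a placeholder base URL:

```python
import json
import urllib.error
import urllib.request

# Assumed base URL for the shared background service; adjust to the
# port your local service actually reports.
BASE_URL = "http://localhost:5273/v1"

def list_local_models(base_url: str = BASE_URL) -> list[str]:
    """Return model ids from the OpenAI-compatible /models endpoint,
    or an empty list if no shared service is reachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=2) as response:
            return [m["id"] for m in json.loads(response.read())["data"]]
    except (urllib.error.URLError, OSError):
        # No service responding: an app could fall back to the
        # standalone model and embed the SDK itself.
        return []

print(list_local_models())
```

An empty result is the signal to fall back to the standalone model, where the application embeds the SDK and manages the model lifecycle directly.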
Supported Models
Foundry Local focuses on optimized ONNX models, including:
- Phi-3.5 (Microsoft)
- Qwen 2.5 (Alibaba)
- Whisper (OpenAI, for local transcription)
CLI & SDK
- CLI: Managed via `foundry model run <model-id>`.
- SDK: Available for C#, Python, and other languages, allowing programmatic model management and inference.
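For scripting, the CLI invocation above can be wrapped from any language. A hedged sketch in Python; `phi-3.5-mini` is a placeholder model id, and the guard keeps the script harmless on machines without the `foundry` CLI installed:

```python
import shutil
import subprocess

def run_model_command(model_id: str) -> list[str]:
    """Build the CLI invocation for serving a model locally.
    `foundry model run` downloads the model if needed, then starts it."""
    return ["foundry", "model", "run", model_id]

# Placeholder model id -- substitute one from your local catalog.
command = run_model_command("phi-3.5-mini")
print(" ".join(command))

# Only invoke the CLI when it is actually installed; note the command
# blocks, since it opens an interactive session with the model.
if shutil.which("foundry"):
    subprocess.run(command, check=True)
```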
References
- Source: 00_Raw/foundry-local.md
- local-agent-environments
- agentic-frameworks-moc
- mcp-local-connections (Local-first security patterns)