Azure AI Foundry
Recommended path. Deploy a model in your Azure AI Foundry project and connect it to Mosaic.
Mosaic connects to a model you've deployed in your own Azure AI Foundry project. Inference happens in your Azure subscription, your billing, your governance.
Prerequisites
An Azure subscription with permission to create AI Foundry resources
Familiarity with the Azure AI Foundry portal
A Tenant Admin account in Mosaic
Steps
Create or open an Azure AI Foundry project
In the Azure AI Foundry portal, create a project (or open an existing one). The project is the unit Mosaic will connect to.
If you don't already have a project: + Create project → choose a hub → name it (e.g., mosaic-inference) → create.
Deploy a model
Inside the project: Models + endpoints → Deploy model → pick a model. Mosaic works well with:
Claude (Sonnet or Opus) — best for the agent flows used by Variance Commentary, Sales Pulse, etc.
GPT-4 / GPT-4o — strong general-purpose default
Mistral / Llama — open-weight options if your governance prefers them
Give the deployment a name you'll recognise (e.g., claude-sonnet-prod).
Copy the endpoint URL and key
Once the deployment is running:
Endpoint URL — visible on the deployment detail page (e.g.,
https://<project>.openai.azure.com/)API Key — under Keys and endpoint for the project
Copy both. The API key is sensitive — do not share or commit to source control.
Connect Mosaic to your Foundry endpoint
In Mosaic:
Admin → AI Configuration
Add provider → Azure AI Foundry
Paste:
Endpoint URL
API Key
Deployment name (the one from step 2)
Click Test connection — Mosaic sends a tiny test prompt and verifies the response
Click Save
Mosaic now uses your Foundry endpoint for all AI inference across the tenant.
Verify
Open a Mosaic chat and ask any question. The agent reasoning should stream as expected. Check Admin → AI Sessions — the model name shown for the new session should match your Foundry deployment name.
Recommended Foundry settings
Content filter: Microsoft's default (Strict / Default / Off). Strict reduces false positives from analyst questions about sensitive topics; Default is the safer baseline for most tenants.
TPM (tokens per minute) quota: start at 100 K TPM and scale based on usage. Foundry shows usage trends in its monitoring dashboards.
Region: pick the region closest to your users. Mosaic's app servers are in India; latency is acceptable from any global Foundry region but lower from nearby ones.
Switching models
You can change the deployed model in your Foundry project at any time. Mosaic re-uses whatever the deployment-name resolves to. There's no Mosaic-side switch; just update Foundry.
What's next
Last updated