Chapter Navigation:
- 📚 Course Home: AZD For Beginners
- 📖 Current Chapter: Chapter 2 - AI-First Development
- ⬅️ Previous: Microsoft Foundry Integration
- ➡️ Next: AI Workshop Lab
- 🚀 Next Chapter: Chapter 3: Configuration
This guide provides comprehensive instructions for deploying AI models using AZD templates, covering everything from model selection to production deployment patterns.
Validation note (2026-03-25): The AZD workflow in this guide was checked against
azd 1.23.12. For AI deployments that take longer than the default service deployment window, current AZD releases support `azd deploy --timeout <seconds>`.
- Model Selection Strategy
- AZD Configuration for AI Models
- Deployment Patterns
- Model Management
- Production Considerations
- Monitoring and Observability
Choose the right model for your use case:
```yaml
# azure.yaml - Model configuration
services:
  ai-service:
    project: ./infra
    host: containerapp
    config:
      AZURE_OPENAI_MODELS: |
        [
          {
            "name": "gpt-4.1-mini",
            "version": "2025-04-14",
            "deployment": "gpt-4.1-mini",
            "capacity": 10,
            "format": "OpenAI"
          },
          {
            "name": "text-embedding-3-large",
            "version": "1",
            "deployment": "text-embedding-3-large",
            "capacity": 30,
            "format": "OpenAI"
          }
        ]
```

| Model Type | Use Case | Recommended Capacity (1K TPM units) | Cost Considerations |
|---|---|---|---|
| gpt-4.1-mini | Chat, Q&A | 10-50 | Cost-effective for most workloads |
| gpt-4.1 | Complex reasoning | 20-100 | Higher cost, use for premium features |
| text-embedding-3-large | Search, RAG | 30-120 | Strong default choice for semantic search and retrieval |
| Whisper | Speech-to-text | 10-50 | Audio processing workloads |
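To keep application code aligned with the selection table, the mapping can be expressed as a small lookup helper. This is an illustrative sketch, not part of any AZD template: the use-case keys and the `pick_deployment` helper are assumptions, and the capacity values mirror the lower bounds of the recommended ranges above.

```python
# Hypothetical helper mapping a use case to a deployment choice.
# Deployment names match the examples in this guide; capacities are
# the lower bounds of the table's recommended ranges.
from typing import TypedDict


class DeploymentChoice(TypedDict):
    deployment: str
    capacity: int


_CHOICES: dict[str, DeploymentChoice] = {
    "chat": {"deployment": "gpt-4.1-mini", "capacity": 10},
    "reasoning": {"deployment": "gpt-4.1", "capacity": 20},
    "embedding": {"deployment": "text-embedding-3-large", "capacity": 30},
    "speech-to-text": {"deployment": "whisper", "capacity": 10},
}


def pick_deployment(use_case: str) -> DeploymentChoice:
    """Return the deployment name and starting capacity for a use case."""
    try:
        return _CHOICES[use_case]
    except KeyError:
        raise ValueError(f"No deployment mapped for use case: {use_case!r}") from None
```

Centralizing the mapping this way means a model swap is a one-line change rather than a hunt through request-handling code.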
Create model deployments through Bicep templates:
```bicep
// infra/main.bicep
@description('OpenAI model deployments')
param openAiModelDeployments array = [
  {
    name: 'gpt-4.1-mini'
    model: {
      format: 'OpenAI'
      name: 'gpt-4.1-mini'
      version: '2025-04-14'
    }
    sku: {
      name: 'Standard'
      capacity: 10
    }
  }
  {
    name: 'text-embedding-3-large'
    model: {
      format: 'OpenAI'
      name: 'text-embedding-3-large'
      version: '1'
    }
    sku: {
      name: 'Standard'
      capacity: 30
    }
  }
]

resource openAi 'Microsoft.CognitiveServices/accounts@2023-05-01' = {
  name: openAiAccountName
  location: location
  kind: 'OpenAI'
  properties: {
    customSubDomainName: openAiAccountName
    networkAcls: {
      defaultAction: 'Allow'
    }
    publicNetworkAccess: 'Enabled'
  }
  sku: {
    name: 'S0'
  }
}

@batchSize(1)
resource openAiDeployments 'Microsoft.CognitiveServices/accounts/deployments@2023-05-01' = [for model in openAiModelDeployments: {
  parent: openAi
  name: model.name
  properties: {
    model: model.model
  }
  sku: model.sku
}]
```

Configure your application environment:
```bash
# .env configuration
AZURE_OPENAI_ENDPOINT=https://your-openai-resource.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-4.1-mini
AZURE_OPENAI_EMBED_DEPLOYMENT=text-embedding-3-large
```

```yaml
# azure.yaml - Single region
services:
  ai-app:
    project: ./src
    host: containerapp
    config:
      AZURE_OPENAI_ENDPOINT: ${AZURE_OPENAI_ENDPOINT}
      AZURE_OPENAI_CHAT_DEPLOYMENT: gpt-4.1-mini
```

Best for:
- Development and testing
- Single-market applications
- Cost optimization
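The single-region configuration relies on the `.env` values above being present at runtime, so it pays to fail fast at startup if any are missing. A minimal sketch (the `load_openai_settings` helper is hypothetical, but the variable names match the `.env` example):

```python
# Hypothetical startup check: verify the Azure OpenAI settings from the
# .env example are set before the app begins serving requests.
import os

REQUIRED_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_CHAT_DEPLOYMENT",
    "AZURE_OPENAI_EMBED_DEPLOYMENT",
)


def load_openai_settings() -> dict[str, str]:
    """Read required settings from the environment, raising if any are unset."""
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```

A check like this turns a confusing mid-request failure into a clear error at deploy time.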
```bicep
// Multi-region deployment
param regions array = ['eastus2', 'westus2', 'francecentral']

resource openAiMultiRegion 'Microsoft.CognitiveServices/accounts@2023-05-01' = [for region in regions: {
  name: '${openAiAccountName}-${region}'
  location: region
  // ... configuration
}]
```

Best for:
- Global applications
- High availability requirements
- Load distribution
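On the client side, requests are often spread across the regional accounts that a loop like the one above creates. A minimal round-robin sketch, assuming illustrative endpoint URLs; in production, distribution is frequently delegated to a gateway layer rather than handled in application code:

```python
# Hypothetical round-robin router over regional Azure OpenAI endpoints.
# The endpoint URLs below are illustrative placeholders.
from itertools import cycle


class RegionalRouter:
    """Rotate requests across regional endpoints in round-robin order."""

    def __init__(self, endpoints: list[str]):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._cycle = cycle(endpoints)

    def next_endpoint(self) -> str:
        """Return the next endpoint in rotation."""
        return next(self._cycle)


router = RegionalRouter([
    "https://myaccount-eastus2.openai.azure.com",
    "https://myaccount-westus2.openai.azure.com",
    "https://myaccount-francecentral.openai.azure.com",
])
```

A fuller implementation would add per-endpoint health tracking so a failing region is temporarily skipped instead of retried in turn.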
Combine Microsoft Foundry Models with other AI services:
```bicep
// Hybrid AI services
resource cognitiveServices 'Microsoft.CognitiveServices/accounts@2023-05-01' = {
  name: cognitiveServicesName
  location: location
  kind: 'CognitiveServices'
  properties: {
    customSubDomainName: cognitiveServicesName
  }
  sku: {
    name: 'S0'
  }
}

resource documentIntelligence 'Microsoft.CognitiveServices/accounts@2023-05-01' = {
  name: documentIntelligenceName
  location: location
  kind: 'FormRecognizer'
  properties: {
    customSubDomainName: documentIntelligenceName
  }
  sku: {
    name: 'S0'
  }
}
```

Track model versions in your AZD configuration:
```json
{
  "models": {
    "chat": {
      "name": "gpt-4.1-mini",
      "version": "2025-04-14",
      "fallback": "gpt-4.1"
    },
    "embedding": {
      "name": "text-embedding-3-large",
      "version": "1"
    }
  }
}
```

Use AZD hooks for model updates:
```bash
#!/bin/bash
# hooks/predeploy.sh
echo "Checking model availability..."
az cognitiveservices account list-models \
  --name "$AZURE_OPENAI_ACCOUNT_NAME" \
  --resource-group "$AZURE_RESOURCE_GROUP" \
  --query "[?name=='gpt-4.1-mini']"

# If the deployment takes longer than the default timeout
azd deploy --timeout 1800
```

Deploy multiple model versions:
```bicep
param enableABTesting bool = false

resource chatDeployment 'Microsoft.CognitiveServices/accounts/deployments@2023-05-01' = {
  parent: openAi
  name: 'gpt-4.1-mini-${enableABTesting ? 'v1' : 'prod'}'
  properties: {
    model: {
      format: 'OpenAI'
      name: 'gpt-4.1-mini'
      version: '2025-04-14'
    }
  }
  sku: {
    name: 'Standard'
    capacity: enableABTesting ? 5 : 10
  }
}
```

Calculate required capacity based on usage patterns:
```python
# Capacity calculation example
def calculate_required_capacity(
    requests_per_minute: int,
    avg_prompt_tokens: int,
    avg_completion_tokens: int,
    safety_margin: float = 0.2,
) -> int:
    """Calculate required TPM capacity."""
    total_tokens_per_request = avg_prompt_tokens + avg_completion_tokens
    total_tpm = requests_per_minute * total_tokens_per_request
    return int(total_tpm * (1 + safety_margin))

# Example usage: 10 req/min at 700 tokens each, with a 30% safety margin
required_capacity = calculate_required_capacity(
    requests_per_minute=10,
    avg_prompt_tokens=500,
    avg_completion_tokens=200,
    safety_margin=0.3,
)
print(f"Required capacity: {required_capacity} TPM")  # Required capacity: 9100 TPM
```

Configure auto-scaling for Container Apps:
```bicep
resource containerApp 'Microsoft.App/containerApps@2024-03-01' = {
  name: containerAppName
  properties: {
    template: {
      scale: {
        minReplicas: 1
        maxReplicas: 10
        rules: [
          {
            name: 'http-rule'
            http: {
              metadata: {
                concurrentRequests: '10'
              }
            }
          }
          {
            name: 'cpu-rule'
            custom: {
              type: 'cpu'
              metadata: {
                type: 'Utilization'
                value: '70'
              }
            }
          }
        ]
      }
    }
  }
}
```

Implement cost controls:
```bicep
@description('Enable cost management alerts')
param enableCostAlerts bool = true

resource budgetAlert 'Microsoft.Consumption/budgets@2023-05-01' = if (enableCostAlerts) {
  name: 'ai-budget-alert'
  properties: {
    timePeriod: {
      startDate: '2024-01-01'
      endDate: '2024-12-31'
    }
    timeGrain: 'Monthly'
    amount: 1000
    category: 'Cost'
    notifications: {
      Actual_GreaterThan_80_Percent: {
        enabled: true
        operator: 'GreaterThan'
        threshold: 80
        contactEmails: [
          'admin@yourcompany.com'
        ]
      }
    }
  }
}
```

Configure monitoring for AI workloads:
```bicep
resource applicationInsights 'Microsoft.Insights/components@2020-02-02' = {
  name: applicationInsightsName
  location: location
  kind: 'web'
  properties: {
    Application_Type: 'web'
    WorkspaceResourceId: logAnalyticsWorkspace.id
  }
}

// Custom metrics for AI models (saved query shared in the workspace)
resource aiMetrics 'Microsoft.Insights/components/analyticsItems@2015-05-01' = {
  parent: applicationInsights
  name: 'ai-model-metrics'
  properties: {
    content: '''
      customEvents
      | where name == "AI_Model_Request"
      | extend model = tostring(customDimensions.model)
      | extend tokens = toint(customDimensions.tokens)
      | extend latency = toint(customDimensions.latency_ms)
      | summarize
          requests = count(),
          avg_tokens = avg(tokens),
          avg_latency = avg(latency)
        by model, bin(timestamp, 5m)
    '''
    type: 'query'
    scope: 'shared'
  }
}
```

Track AI-specific metrics:
```python
# Custom telemetry for AI models
from applicationinsights import TelemetryClient


class AITelemetry:
    def __init__(self, instrumentation_key: str):
        self.client = TelemetryClient(instrumentation_key)

    def track_model_request(self, model: str, tokens: int, latency_ms: int, success: bool):
        """Track AI model request metrics."""
        self.client.track_event(
            'AI_Model_Request',
            {
                'model': model,
                'tokens': str(tokens),
                'latency_ms': str(latency_ms),
                'success': str(success),
            },
        )

    def track_model_error(self, model: str, error_type: str, error_message: str):
        """Track AI model errors."""
        self.client.track_exception(
            type=error_type,
            value=error_message,
            properties={
                'model': model,
                'component': 'ai_model',
            },
        )
```

Implement AI service health monitoring:
```python
# Health check endpoints
import os

import httpx
from fastapi import FastAPI, HTTPException

AZURE_OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]
AZURE_OPENAI_API_KEY = os.environ["AZURE_OPENAI_API_KEY"]

app = FastAPI()


@app.get("/health/ai-models")
async def check_ai_models():
    """Check AI model availability."""
    try:
        # Test OpenAI connection
        async with httpx.AsyncClient() as client:
            response = await client.get(
                f"{AZURE_OPENAI_ENDPOINT}/openai/deployments",
                headers={"api-key": AZURE_OPENAI_API_KEY},
            )
        if response.status_code == 200:
            return {"status": "healthy", "models": response.json()}
        raise HTTPException(status_code=503, detail="AI models unavailable")
    except HTTPException:
        # Re-raise our own 503 untouched instead of rewrapping it below
        raise
    except Exception as e:
        raise HTTPException(status_code=503, detail=f"Health check failed: {str(e)}")
```

- Review the Microsoft Foundry Integration Guide for service integration patterns
- Complete the AI Workshop Lab for hands-on experience
- Implement Production AI Practices for enterprise deployments
- Explore the AI Troubleshooting Guide for common issues
- Microsoft Foundry Models Model Availability
- Azure Developer CLI Documentation
- Container Apps Scaling
- AI Model Cost Optimization