Azure Speech Services - Required Permissions
Overview
The Azure Speech Services connector requires specific permissions to access Azure Cognitive Services APIs. This document outlines the required permissions and setup.
Required Azure Services
1. Azure Speech Services
- Service: Speech Services (Cognitive Services)
- API Endpoints:
  - Speech-to-Text: https://{region}.stt.speech.microsoft.com
  - Text-to-Speech: https://{region}.tts.speech.microsoft.com
2. Azure Translator
- Service: Translator (Cognitive Services)
- API Endpoint: https://api.cognitive.microsofttranslator.com
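The {region} placeholder in the Speech endpoints has to be filled in before a request is made. The following is a minimal sketch of that substitution; the function name speech_endpoints and the dictionary keys are illustrative assumptions, not part of the connector.

```python
# Illustrative helper (not part of the connector): fill in the {region}
# placeholder of the Speech endpoints listed above.
TRANSLATOR_ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def speech_endpoints(region: str) -> dict[str, str]:
    """Return the regional Speech-to-Text and Text-to-Speech base URLs."""
    region = region.strip().lower()
    # Azure region names such as "westeurope" or "eastus2" are alphanumeric.
    if not region or not region.isalnum():
        raise ValueError(f"invalid Azure region: {region!r}")
    return {
        "stt": f"https://{region}.stt.speech.microsoft.com",
        "tts": f"https://{region}.tts.speech.microsoft.com",
    }
```

The Translator endpoint is global, so it is kept as a constant rather than derived from the region.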
Required Permissions
Microsoft Graph API Permissions
The user's Microsoft connection must have the following permissions:
- Cognitive Services User (Application Permission)
  - Scope: https://cognitiveservices.azure.com/.default
  - Description: Access to Azure Cognitive Services
- Speech Services (Application Permission)
  - Scope: https://speech.cognitiveservices.azure.com/.default
  - Description: Access to Azure Speech Services
- Translator Services (Application Permission)
  - Scope: https://cognitiveservices.azure.com/.default
  - Description: Access to Azure Translator Services
Alternative: Resource-Specific Permissions
If using resource-specific permissions:
- Speech Services Resource
  - Permission: SpeechServices.ReadWrite
  - Description: Read and write access to Speech Services
- Translator Resource
  - Permission: Translator.ReadWrite
  - Description: Read and write access to Translator Services
Azure Resource Setup
1. Create Azure Speech Services Resource
# Using Azure CLI
az cognitiveservices account create \
--name "poweron-speech" \
--resource-group "poweron-rg" \
--location "westeurope" \
--kind "SpeechServices" \
--sku "S0"
2. Create Azure Translator Resource
# Using Azure CLI
az cognitiveservices account create \
--name "poweron-translator" \
--resource-group "poweron-rg" \
--location "westeurope" \
--kind "TextTranslation" \
--sku "S1"
3. Get Access Keys
# Get Speech Services key
az cognitiveservices account keys list \
--name "poweron-speech" \
--resource-group "poweron-rg"
# Get Translator key
az cognitiveservices account keys list \
--name "poweron-translator" \
--resource-group "poweron-rg"
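The keys list command prints a JSON object with key1 and key2 fields. If the keys need to be read from a script rather than copied by hand, something along these lines could work; the function names parse_keys and fetch_keys are hypothetical, and the sketch assumes the az CLI is installed and logged in.

```python
import json
import subprocess


def parse_keys(cli_output: str) -> tuple[str, str]:
    """Extract (key1, key2) from the JSON printed by `keys list`."""
    data = json.loads(cli_output)
    return data["key1"], data["key2"]


def fetch_keys(name: str, resource_group: str) -> tuple[str, str]:
    """Run the Azure CLI and return both access keys for a resource."""
    result = subprocess.run(
        [
            "az", "cognitiveservices", "account", "keys", "list",
            "--name", name,
            "--resource-group", resource_group,
            "--output", "json",
        ],
        check=True,           # raise CalledProcessError if the CLI fails
        capture_output=True,
        text=True,
    )
    return parse_keys(result.stdout)
```

For example, fetch_keys("poweron-speech", "poweron-rg") would return both keys of the Speech resource created above. Avoid logging the returned values.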
Authentication Methods
1. Access Key Authentication (Current Implementation)
- Uses the Ocp-Apim-Subscription-Key header
- Requires access keys from the Azure portal
- Suitable for server-to-server authentication
2. Azure AD Authentication (Recommended for Production)
- Uses Bearer token authentication
- More secure and scalable
- Requires Azure AD app registration
Implementation Details
Current Implementation
The connector currently uses access key authentication:
headers = {
    "Ocp-Apim-Subscription-Key": self.access_token,
    "Ocp-Apim-Subscription-Region": self.region,
}
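To show how these headers attach to an actual call, here is a hedged sketch that builds (but does not send) a Translator v3 request using only the standard library. The function name build_translate_request is an assumption for illustration; the connector's own request code may look different.

```python
import json
import urllib.parse
import urllib.request

TRANSLATOR_ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def build_translate_request(
    text: str, to_lang: str, key: str, region: str
) -> urllib.request.Request:
    """Build a POST request for the Translator v3 /translate operation."""
    query = urllib.parse.urlencode({"api-version": "3.0", "to": to_lang})
    # The v3 API expects a JSON array of objects with a "Text" field.
    body = json.dumps([{"Text": text}]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json; charset=UTF-8",
    }
    return urllib.request.Request(
        f"{TRANSLATOR_ENDPOINT}/translate?{query}",
        data=body,
        headers=headers,
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; keeping construction separate from sending makes the header logic easy to unit test.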
Future Enhancement: Azure AD Authentication
For production use, consider implementing Azure AD authentication:
# Get token from Azure AD.
# get_azure_ad_token is a placeholder for a helper that still has to be
# written; this snippet must run inside an async function.
token = await get_azure_ad_token(
    client_id="your-app-id",
    client_secret="your-app-secret",
    tenant_id="your-tenant-id",
    scope="https://cognitiveservices.azure.com/.default",
)
headers = {
    "Authorization": f"Bearer {token}",
    "Ocp-Apim-Subscription-Region": self.region,
}
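The get_azure_ad_token helper above is not defined in this document. One way to approach it is the OAuth 2.0 client credentials flow against the Microsoft identity platform token endpoint. The sketch below only builds the token request; build_token_request is a hypothetical name, and a production implementation would more likely use a maintained library such as MSAL or azure-identity.

```python
import urllib.parse
import urllib.request


def build_token_request(
    tenant_id: str,
    client_id: str,
    client_secret: str,
    scope: str = "https://cognitiveservices.azure.com/.default",
) -> urllib.request.Request:
    """Build a client-credentials token request (not sent here)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    # The token endpoint expects a form-encoded body, not JSON.
    form = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": scope,
        }
    ).encode("ascii")
    return urllib.request.Request(
        url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

A successful response is a JSON object whose access_token field carries the bearer token and whose expires_in field gives its lifetime in seconds, so the token can be cached until shortly before expiry.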
Supported Features
Speech-to-Text (STT)
- Languages: 100+ languages supported
- Formats: WAV, MP3, FLAC, OGG
- Quality: Standard and Neural voices
- Real-time: Streaming support available
Text-to-Speech (TTS)
- Voices: 400+ neural voices
- Languages: 140+ languages
- Formats: MP3, WAV, OGG, FLAC
- SSML: Full SSML support
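Text-to-Speech requests carry their input as an SSML document. As a rough illustration, the sketch below wraps plain text in a minimal SSML envelope and escapes XML special characters. The function name build_ssml and the default voice en-US-JennyNeural are example choices, not values taken from the connector.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(
    text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US"
) -> str:
    """Wrap plain text in a minimal SSML document for one voice."""
    # quoteattr adds the surrounding quotes and escapes attribute values;
    # escape handles &, < and > in the spoken text.
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )
```

Escaping matters because unescaped characters such as & in user-supplied text make the SSML invalid and the request is rejected.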
Translation
- Languages: 100+ languages
- Real-time: Streaming translation
- Custom: Custom translation models
- Batch: Batch translation support
Error Handling
Common Error Codes
- 401 Unauthorized: Invalid access key or expired token
- 403 Forbidden: Insufficient permissions
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Unexpected failure on the service side
Rate Limits
- Speech-to-Text: 20 requests per second
- Text-to-Speech: 20 requests per second
- Translation: 2,000,000 characters per month (free tier quota)
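When a 429 or 500 response comes back, retrying with exponential backoff is a common way to stay within these limits. The following is a minimal sketch under the assumption that the caller supplies a send function returning a (status, body) tuple; call_with_backoff and its parameters are illustrative names, not connector APIs.

```python
import random
import time
from typing import Callable

# Status codes from the list above that are worth retrying.
RETRYABLE_STATUS = {429, 500}


def call_with_backoff(
    send: Callable[[], tuple[int, bytes]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, bytes]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUS or last_attempt:
            return status, body
        # Exponential backoff with jitter: 0.5s, 1s, 2s, ... plus a random part.
        sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise AssertionError("unreachable")
```

The sleep parameter is injectable so the retry logic can be tested without real waiting. Errors such as 401 and 403 are returned immediately because retrying does not fix bad credentials or missing permissions.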
Security Considerations
- Access Key Rotation: Regularly rotate access keys
- Network Security: Use HTTPS only
- Data Privacy: Audio data is processed in Azure
- Compliance: Ensure GDPR/CCPA compliance
- Monitoring: Enable Azure Monitor for usage tracking
Cost Optimization
Pricing Tiers
- Free Tier: 5 hours STT, 5 hours TTS per month
- Standard Tier: Pay-per-use pricing
- Enterprise: Volume discounts available
Best Practices
- Caching: Cache TTS results for repeated text
- Compression: Use compressed audio formats
- Batching: Batch multiple requests when possible
- Monitoring: Track usage and costs
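The caching practice can be sketched as a small on-disk cache keyed by a hash of the voice and text, so repeated text is synthesized only once. This is an illustration under assumptions: cached_tts is a hypothetical name, and synthesize stands in for whatever function actually calls the Text-to-Speech API.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_tts(
    text: str,
    voice: str,
    synthesize: Callable[[str, str], bytes],
    cache_dir: str,
) -> bytes:
    """Return cached audio for (voice, text), synthesizing on a cache miss."""
    directory = Path(cache_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # Hash voice and text together so different voices do not share entries.
    key = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    path = directory / f"{key}.audio"
    if path.exists():
        return path.read_bytes()
    audio = synthesize(text, voice)
    path.write_bytes(audio)
    return audio
```

If the output format or SSML settings vary between calls, they would need to be part of the hashed key as well.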
Troubleshooting
Common Issues
- Invalid Region: Ensure region matches resource location
- Expired Keys: Check key expiration and rotation
- Quota Exceeded: Monitor usage and upgrade if needed
- Network Issues: Check firewall and proxy settings
Debug Mode
Enable debug logging to troubleshoot issues:
import logging
logging.getLogger('connectorAzureSpeech').setLevel(logging.DEBUG)
Next Steps
- Set up Azure resources with required permissions
- Configure access keys in the application
- Test connection using the health check endpoint
- Monitor usage and costs
- Implement caching for better performance
- Add error handling for production use