# Azure Speech Services - Required Permissions

## Overview

The Azure Speech Services connector requires specific permissions to access Azure Cognitive Services APIs. This document outlines the required permissions and setup.

## Required Azure Services

### 1. Azure Speech Services

- **Service**: Speech Services (Cognitive Services)
- **API Endpoints**:
  - Speech-to-Text: `https://{region}.stt.speech.microsoft.com`
  - Text-to-Speech: `https://{region}.tts.speech.microsoft.com`

### 2. Azure Translator

- **Service**: Translator (Cognitive Services)
- **API Endpoint**: `https://api.cognitive.microsofttranslator.com`

## Required Permissions

### Microsoft Graph API Permissions

The user's Microsoft connection must have the following permissions. Note that **Cognitive Services User** is a built-in Azure role (RBAC) assigned on the resource, not a Microsoft Graph permission, so it is granted in Azure rather than in the Graph permission list.

1. **Cognitive Services User** (Application Permission)
   - Scope: `https://cognitiveservices.azure.com/.default`
   - Description: Access to Azure Cognitive Services
2. **Speech Services** (Application Permission)
   - Scope: `https://speech.cognitiveservices.azure.com/.default`
   - Description: Access to Azure Speech Services
3. **Translator Services** (Application Permission)
   - Scope: `https://cognitiveservices.azure.com/.default`
   - Description: Access to Azure Translator Services

### Alternative: Resource-Specific Permissions

If using resource-specific permissions:

1. **Speech Services Resource**
   - Permission: `SpeechServices.ReadWrite`
   - Description: Read and write access to Speech Services
2. **Translator Resource**
   - Permission: `Translator.ReadWrite`
   - Description: Read and write access to Translator Services

## Azure Resource Setup

### 1. Create Azure Speech Services Resource

```bash
# Using Azure CLI
az cognitiveservices account create \
  --name "poweron-speech" \
  --resource-group "poweron-rg" \
  --location "westeurope" \
  --kind "SpeechServices" \
  --sku "S0"
```

### 2. Create Azure Translator Resource

```bash
# Using Azure CLI
az cognitiveservices account create \
  --name "poweron-translator" \
  --resource-group "poweron-rg" \
  --location "westeurope" \
  --kind "TextTranslation" \
  --sku "S1"
```

### 3. Get Access Keys

```bash
# Get Speech Services key
az cognitiveservices account keys list \
  --name "poweron-speech" \
  --resource-group "poweron-rg"

# Get Translator key
az cognitiveservices account keys list \
  --name "poweron-translator" \
  --resource-group "poweron-rg"
```

## Authentication Methods

### 1. Access Key Authentication (Current Implementation)

- Uses the `Ocp-Apim-Subscription-Key` header
- Requires access keys from the Azure portal (or the Azure CLI, as shown above)
- Suitable for server-to-server authentication

### 2. Azure AD Authentication (Recommended for Production)

- Uses Bearer token authentication
- More secure and scalable
- Requires an Azure AD app registration

## Implementation Details

### Current Implementation

The connector currently uses access key authentication:

```python
# self.access_token holds the resource access key in this mode
headers = {
    "Ocp-Apim-Subscription-Key": self.access_token,
    "Ocp-Apim-Subscription-Region": self.region
}
```

### Future Enhancement: Azure AD Authentication

For production use, consider implementing Azure AD authentication:

```python
# Get token from Azure AD
# (get_azure_ad_token is a placeholder helper that is not implemented yet;
# this fragment must run inside an async method)
token = await get_azure_ad_token(
    client_id="your-app-id",
    client_secret="your-app-secret",
    tenant_id="your-tenant-id",
    scope="https://cognitiveservices.azure.com/.default"
)

headers = {
    "Authorization": f"Bearer {token}",
    "Ocp-Apim-Subscription-Region": self.region
}
```

## Supported Features

### Speech-to-Text (STT)

- **Languages**: 100+ languages supported
- **Formats**: WAV, MP3, FLAC, OGG
- **Quality**: Standard and Neural voices
- **Real-time**: Streaming support available

### Text-to-Speech (TTS)

- **Voices**: 400+ neural voices
- **Languages**: 140+ languages
- **Formats**: MP3, WAV, OGG, FLAC
- **SSML**: Full SSML support

### Translation

- **Languages**: 100+ languages
- **Real-time**: Streaming translation
- **Custom**: Custom translation models
- **Batch**: Batch translation support

## Error Handling

### Common Error Codes

- **401 Unauthorized**: Invalid access key or expired token
- **403 Forbidden**: Insufficient permissions
- **429 Too Many Requests**: Rate limit exceeded
- **500 Internal Server Error**: Server-side failure in the service

### Rate Limits

- **Speech-to-Text**: 20 requests per second
- **Text-to-Speech**: 20 requests per second
- **Translation**: 2,000,000 characters per month (free tier)

## Security Considerations

1. **Access Key Rotation**: Regularly rotate access keys
2. **Network Security**: Use HTTPS only
3. **Data Privacy**: Audio data is processed in Azure
4. **Compliance**: Ensure GDPR/CCPA compliance
5. **Monitoring**: Enable Azure Monitor for usage tracking

## Cost Optimization

### Pricing Tiers

- **Free Tier**: 5 hours STT, 5 hours TTS per month
- **Standard Tier**: Pay-per-use pricing
- **Enterprise**: Volume discounts available

### Best Practices

1. **Caching**: Cache TTS results for repeated text
2. **Compression**: Use compressed audio formats
3. **Batching**: Batch multiple requests when possible
4. **Monitoring**: Track usage and costs

## Troubleshooting

### Common Issues

1. **Invalid Region**: Ensure the configured region matches the resource location
2. **Expired Keys**: Check whether the key has been rotated or regenerated
3. **Quota Exceeded**: Monitor usage and upgrade the tier if needed
4. **Network Issues**: Check firewall and proxy settings

### Debug Mode

Enable debug logging to troubleshoot issues:

```python
import logging

logging.getLogger('connectorAzureSpeech').setLevel(logging.DEBUG)
```

## Next Steps

1. **Set up Azure resources** with required permissions
2. **Configure access keys** in the application
3. **Test connection** using the health check endpoint
4. **Monitor usage** and costs
5. **Implement caching** for better performance
6. **Add error handling** for production use
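
As a starting point for the error handling step, the status codes listed under Common Error Codes could be mapped to a small retry policy. The sketch below is an assumption about how that might look, not the connector's actual behavior: `call_with_retry`, `SpeechRequestError`, `RETRYABLE_STATUS`, `FATAL_STATUS`, and the `send` callable are hypothetical names, and the backoff values are illustrative. It retries 429 and 500 responses with exponential backoff and fails immediately on 401 and 403, since retrying does not fix credentials or permissions.

```python
import time
from typing import Callable, Tuple

# Hypothetical policy: only rate limiting and server errors are worth retrying.
RETRYABLE_STATUS = {429, 500}
# 401/403 point to a credential or permission problem; retrying will not help.
FATAL_STATUS = {401, 403}


class SpeechRequestError(Exception):
    """Raised when a request fails and should not (or can no longer) be retried."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `send` until it succeeds, retrying retryable statuses with backoff.

    `send` is any zero-argument callable returning (status_code, body).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if 200 <= status < 300:
            return body
        if status in FATAL_STATUS:
            raise SpeechRequestError(status, "check access key and permissions")
        if status not in RETRYABLE_STATUS or attempt == max_attempts - 1:
            raise SpeechRequestError(status, body)
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * (2 ** attempt))
    raise SpeechRequestError(0, "no attempts were made")
```

In the connector, `send` would wrap whatever HTTP call is actually made and return its status code and body. Passing `sleep` as a parameter keeps the policy testable without real delays.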