# Azure Speech Services - Required Permissions

## Overview

The Azure Speech Services connector requires specific permissions to access Azure Cognitive Services APIs. This document outlines the required permissions and the setup steps.

## Required Azure Services

### 1. Azure Speech Services
- **Service**: Speech Services (Cognitive Services)
- **API Endpoints**:
  - Speech-to-Text: `https://{region}.stt.speech.microsoft.com`
  - Text-to-Speech: `https://{region}.tts.speech.microsoft.com`

### 2. Azure Translator
- **Service**: Translator (Cognitive Services)
- **API Endpoint**: `https://api.cognitive.microsofttranslator.com`

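The endpoint patterns listed in this section can be turned into concrete URLs with a small helper. The following is a minimal sketch, not part of the connector: the function name `speech_endpoints` and the assumption that region identifiers are plain ASCII alphanumerics (as in `westeurope`) are illustrative.

```python
# Endpoint templates copied from the lists in this section.
STT_TEMPLATE = "https://{region}.stt.speech.microsoft.com"
TTS_TEMPLATE = "https://{region}.tts.speech.microsoft.com"
TRANSLATOR_ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def speech_endpoints(region: str) -> dict[str, str]:
    """Return base URLs for one region (hypothetical helper)."""
    region = region.strip().lower()
    # Assumption: region names look like "westeurope" or "eastus2".
    if not (region.isascii() and region.isalnum()):
        raise ValueError(f"unexpected region name: {region!r}")
    return {
        "stt": STT_TEMPLATE.format(region=region),
        "tts": TTS_TEMPLATE.format(region=region),
        # The Translator endpoint is global and does not embed the region.
        "translator": TRANSLATOR_ENDPOINT,
    }


endpoints = speech_endpoints("westeurope")
print(endpoints["stt"])
```

Validating the region before formatting keeps a typo such as a stray space or slash from producing a URL that points at an unintended host.
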
## Required Permissions

### Microsoft Graph API Permissions
The user's Microsoft connection must have the following permissions. The scopes listed are Azure AD token scopes for Azure Cognitive Services (not `graph.microsoft.com` scopes); items 1 and 3 use the same scope, so one token covers both:

1. **Cognitive Services User** (Application Permission)
   - Scope: `https://cognitiveservices.azure.com/.default`
   - Description: Access to Azure Cognitive Services

2. **Speech Services** (Application Permission)
   - Scope: `https://speech.cognitiveservices.azure.com/.default`
   - Description: Access to Azure Speech Services

3. **Translator Services** (Application Permission)
   - Scope: `https://cognitiveservices.azure.com/.default`
   - Description: Access to Azure Translator Services

### Alternative: Resource-Specific Permissions
If using resource-specific permissions:

1. **Speech Services Resource**
   - Permission: `SpeechServices.ReadWrite`
   - Description: Read and write access to Speech Services

2. **Translator Resource**
   - Permission: `Translator.ReadWrite`
   - Description: Read and write access to Translator Services

## Azure Resource Setup

### 1. Create Azure Speech Services Resource
```bash
# Using Azure CLI
az cognitiveservices account create \
  --name "poweron-speech" \
  --resource-group "poweron-rg" \
  --location "westeurope" \
  --kind "SpeechServices" \
  --sku "S0"
```

### 2. Create Azure Translator Resource
```bash
# Using Azure CLI
az cognitiveservices account create \
  --name "poweron-translator" \
  --resource-group "poweron-rg" \
  --location "westeurope" \
  --kind "TextTranslation" \
  --sku "S1"
```

### 3. Get Access Keys
```bash
# Get Speech Services key
az cognitiveservices account keys list \
  --name "poweron-speech" \
  --resource-group "poweron-rg"

# Get Translator key
az cognitiveservices account keys list \
  --name "poweron-translator" \
  --resource-group "poweron-rg"
```

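If the key listing is consumed by a script rather than read by hand, its output can be parsed programmatically. The sketch below assumes the CLI prints JSON with `key1` and `key2` fields; the helper name `parse_account_keys` and the sample payload are illustrative, so check the actual output of your CLI version before relying on it.

```python
import json


def parse_account_keys(cli_output: str) -> tuple[str, str]:
    """Extract (primary, secondary) keys from the CLI's JSON output.

    Assumption: the output is a JSON object with "key1" and "key2".
    """
    keys = json.loads(cli_output)
    return keys["key1"], keys["key2"]


# Placeholder payload standing in for real CLI output.
sample_output = '{"key1": "<primary-key>", "key2": "<secondary-key>"}'
primary, secondary = parse_account_keys(sample_output)
```

Keeping both keys available makes rotation easier: the application can switch to the secondary key while the primary one is regenerated.
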
## Authentication Methods

### 1. Access Key Authentication (Current Implementation)
- Uses the `Ocp-Apim-Subscription-Key` header
- Requires an access key from the Azure portal or the Azure CLI
- Suitable for server-to-server authentication

### 2. Azure AD Authentication (Recommended for Production)
- Uses Bearer token authentication
- Avoids long-lived shared keys: tokens are short-lived and issued per identity
- Requires an Azure AD app registration

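The two authentication methods differ only in which header carries the credential. A minimal sketch of a header builder that supports both follows; the function name `build_auth_headers` and its parameters are hypothetical and not taken from the connector.

```python
from typing import Optional


def build_auth_headers(
    region: str,
    access_key: Optional[str] = None,
    bearer_token: Optional[str] = None,
) -> dict[str, str]:
    """Build request headers for exactly one authentication method."""
    # Reject both-or-neither so a misconfiguration fails early.
    if (access_key is None) == (bearer_token is None):
        raise ValueError("provide exactly one of access_key or bearer_token")
    headers = {"Ocp-Apim-Subscription-Region": region}
    if access_key is not None:
        headers["Ocp-Apim-Subscription-Key"] = access_key
    else:
        headers["Authorization"] = f"Bearer {bearer_token}"
    return headers


key_headers = build_auth_headers("westeurope", access_key="<access-key>")
token_headers = build_auth_headers("westeurope", bearer_token="<token>")
```

Routing both methods through one function would let the connector switch from keys to tokens through configuration rather than code changes.
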
## Implementation Details

### Current Implementation
The connector currently uses access key authentication:

```python
# Despite its name, self.access_token carries the resource access key here
headers = {
    "Ocp-Apim-Subscription-Key": self.access_token,
    "Ocp-Apim-Subscription-Region": self.region
}
```

### Future Enhancement: Azure AD Authentication
For production use, consider implementing Azure AD authentication:

```python
# Get token from Azure AD
# (get_azure_ad_token is a placeholder helper, not an existing library call)
token = await get_azure_ad_token(
    client_id="your-app-id",
    client_secret="your-app-secret",
    tenant_id="your-tenant-id",
    scope="https://cognitiveservices.azure.com/.default"
)

headers = {
    "Authorization": f"Bearer {token}",
    "Ocp-Apim-Subscription-Region": self.region
}
```

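One way the `get_azure_ad_token` helper could be written is with the OAuth 2.0 client credentials flow against the Microsoft identity platform token endpoint. The sketch below uses only the standard library; `build_token_request` and `_fetch_token` are hypothetical names, and whether the service accepts the resulting token depends on how the Azure resource is configured, which this document does not cover.

```python
import asyncio
import json
import urllib.parse
import urllib.request

# Microsoft identity platform v2.0 token endpoint.
TOKEN_URL = "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"


def build_token_request(
    client_id: str, client_secret: str, tenant_id: str, scope: str
) -> urllib.request.Request:
    """Prepare (but do not send) a client credentials token request."""
    form = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": scope,
        }
    ).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL.format(tenant_id=tenant_id),
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def _fetch_token(request: urllib.request.Request) -> str:
    # Blocking call; the response body is JSON with an "access_token" field.
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["access_token"]


async def get_azure_ad_token(
    client_id: str, client_secret: str, tenant_id: str, scope: str
) -> str:
    """Async wrapper so the blocking HTTP call does not stall the event loop."""
    request = build_token_request(client_id, client_secret, tenant_id, scope)
    return await asyncio.to_thread(_fetch_token, request)
```

Separating request construction from sending keeps the form encoding testable without network access. In production, a maintained library such as MSAL would also handle token caching and renewal.
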
## Supported Features

### Speech-to-Text (STT)
- **Languages**: 100+ languages supported
- **Formats**: WAV, MP3, FLAC, OGG
- **Models**: Base and custom speech recognition models
- **Real-time**: Streaming support available

### Text-to-Speech (TTS)
- **Voices**: 400+ neural voices
- **Languages**: 140+ languages
- **Formats**: MP3, WAV, OGG, FLAC
- **SSML**: Full SSML support

### Translation
- **Languages**: 100+ languages
- **Real-time**: Streaming translation
- **Custom**: Custom translation models
- **Batch**: Batch translation support

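For the SSML support listed under Text-to-Speech, the request body is an XML document, so user text has to be escaped before it is embedded. The sketch below shows one way to build such a document; the helper name `build_ssml` is hypothetical, and the voice `en-US-JennyNeural` is only an example value.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document (hypothetical helper)."""
    # quoteattr() adds the surrounding quotes and escapes the attribute value;
    # escape() protects &, < and > in the spoken text.
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


ssml = build_ssml("Tom & Jerry say 2 < 3")
```

Escaping matters because unescaped `&` or `<` in user text would make the document malformed XML.
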
## Error Handling

### Common Error Codes
- **401 Unauthorized**: Invalid access key or expired token
- **403 Forbidden**: Insufficient permissions
- **429 Too Many Requests**: Rate limit exceeded
- **500 Internal Server Error**: Server-side failure in the service

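The error codes above fall into two groups: 401 and 403 will not succeed on retry, while 429 and 500 may. A minimal sketch of retrying with exponential backoff and jitter follows. The function `call_with_retry`, the `send` callable returning a `(status, body)` pair, and the default delays are all assumptions for illustration.

```python
import random
import time
from typing import Callable, Tuple

# Statuses from the list above that are worth retrying.
RETRYABLE_STATUS = {429, 500}


def call_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call send() until it succeeds, retrying only retryable statuses."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status < 400:
            return body
        if status not in RETRYABLE_STATUS or attempt == max_attempts:
            raise RuntimeError(f"request failed with HTTP {status}")
        # Exponential backoff (0.5s, 1s, 2s, ...) plus up to 50% jitter.
        delay = base_delay * 2 ** (attempt - 1)
        sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. The jitter spreads retries out when many clients hit the rate limit at the same moment.
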
### Rate Limits
- **Speech-to-Text**: 20 requests per second
- **Text-to-Speech**: 20 requests per second
- **Translation**: 2,000,000 characters per month (free tier)

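A client can stay under the per-second limits by pacing its own requests instead of waiting for 429 responses. The sketch below spaces calls evenly, which is stricter than the stated limit because it allows no bursts. The class name `RateLimiter` and the injectable `clock` and `sleep` parameters are illustrative choices.

```python
import time
from typing import Callable


class RateLimiter:
    """Allow at most `rate` calls per second by spacing calls evenly."""

    def __init__(
        self,
        rate: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self._interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def acquire(self) -> None:
        """Block until the next call is allowed."""
        now = self._clock()
        wait = self._next_allowed - now
        if wait > 0:
            self._sleep(wait)
            now = self._next_allowed
        self._next_allowed = now + self._interval


# 20 requests per second, matching the limit listed above.
limiter = RateLimiter(rate=20)
limiter.acquire()  # call before each Speech request
```

This version is not thread-safe; a connector that issues requests from several threads or tasks would need a lock around `acquire`.
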
## Security Considerations

1. **Access Key Rotation**: Regularly rotate access keys
2. **Network Security**: Use HTTPS only
3. **Data Privacy**: Audio data is processed in Azure
4. **Compliance**: Ensure GDPR/CCPA compliance
5. **Monitoring**: Enable Azure Monitor for usage tracking

## Cost Optimization

### Pricing Tiers
- **Free Tier**: 5 hours STT, 5 hours TTS per month
- **Standard Tier**: Pay-per-use pricing
- **Enterprise**: Volume discounts available

### Best Practices
1. **Caching**: Cache TTS results for repeated text
2. **Compression**: Use compressed audio formats
3. **Batching**: Batch multiple requests when possible
4. **Monitoring**: Track usage and costs

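The caching practice can be sketched as an in-memory cache keyed by a hash of the voice and text, so that repeated phrases are synthesized only once. The class `TtsCache` and the `synthesize(text, voice)` callable are hypothetical; the callable stands in for whatever function performs the actual TTS request.

```python
import hashlib
from typing import Callable, Dict


class TtsCache:
    """Cache synthesized audio in memory, keyed by voice and text."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        self._synthesize = synthesize
        self._store: Dict[str, bytes] = {}

    @staticmethod
    def _key(text: str, voice: str) -> str:
        # The NUL separator keeps ("ab", "c") distinct from ("a", "bc").
        return hashlib.sha256(f"{voice}\0{text}".encode("utf-8")).hexdigest()

    def get(self, text: str, voice: str) -> bytes:
        """Return cached audio, calling synthesize() only on a miss."""
        key = self._key(text, voice)
        if key not in self._store:
            self._store[key] = self._synthesize(text, voice)
        return self._store[key]


def fake_synthesize(text: str, voice: str) -> bytes:
    # Stand-in for a real TTS request.
    return f"{voice}:{text}".encode("utf-8")


cache = TtsCache(fake_synthesize)
audio = cache.get("Welcome back", "example-voice")
```

The cache grows without bound as written. A production version would likely add an eviction policy or persist audio to disk or blob storage.
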
## Troubleshooting

### Common Issues
1. **Invalid Region**: Ensure the configured region matches the resource location
2. **Expired Keys**: Check whether the key has been rotated or regenerated since it was configured
3. **Quota Exceeded**: Monitor usage and upgrade the pricing tier if needed
4. **Network Issues**: Check firewall and proxy settings

### Debug Mode
Enable debug logging to troubleshoot issues:

```python
import logging
# Install a root handler so that DEBUG records are actually written out
logging.basicConfig(level=logging.INFO)
logging.getLogger('connectorAzureSpeech').setLevel(logging.DEBUG)
```

## Next Steps

1. **Set up Azure resources** with required permissions
2. **Configure access keys** in the application
3. **Test connection** using the health check endpoint
4. **Monitor usage** and costs
5. **Implement caching** for better performance
6. **Add error handling** for production use