wiki/poweron/archiv/README_AzureSpeechPermissions.md
2025-09-22 00:39:23 +02:00

# Azure Speech Services - Required Permissions
## Overview
The Azure Speech Services connector requires specific permissions to access Azure Cognitive Services APIs. This document outlines the required permissions and setup.
## Required Azure Services
### 1. Azure Speech Services
- **Service**: Speech Services (Cognitive Services)
- **API Endpoints**:
  - Speech-to-Text: `https://{region}.stt.speech.microsoft.com`
  - Text-to-Speech: `https://{region}.tts.speech.microsoft.com`
### 2. Azure Translator
- **Service**: Translator (Cognitive Services)
- **API Endpoint**: `https://api.cognitive.microsofttranslator.com`
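
The endpoint patterns above can be resolved for a concrete region with a small helper. This is a minimal sketch, not the connector's actual code: the names `SPEECH_HOSTS`, `TRANSLATOR_ENDPOINT`, and `speech_endpoint` are hypothetical, and the region check is a simplifying assumption (Azure region names such as `westeurope` are lowercase alphanumerics).

```python
# Hypothetical helper; the URL templates are the ones listed in this document.
SPEECH_HOSTS = {
    "stt": "https://{region}.stt.speech.microsoft.com",
    "tts": "https://{region}.tts.speech.microsoft.com",
}
TRANSLATOR_ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def speech_endpoint(service: str, region: str) -> str:
    """Return the regional base URL for the 'stt' or 'tts' service."""
    if service not in SPEECH_HOSTS:
        raise ValueError(f"unknown speech service: {service!r}")
    region = region.strip().lower()
    # Assumption: region names contain only letters and digits.
    if not region.isalnum():
        raise ValueError(f"invalid region: {region!r}")
    return SPEECH_HOSTS[service].format(region=region)
```

For example, `speech_endpoint("tts", "westeurope")` yields the Text-to-Speech base URL for the region used in the setup commands of this document.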
## Required Permissions
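### Azure AD (Microsoft Entra ID) Permissions
The user's Microsoft connection must have the following permissions. Note that these are granted on the Azure Cognitive Services resources and requested through the scopes listed below; they are not Microsoft Graph scopes: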
1. **Cognitive Services User** (Application Permission)
   - Scope: `https://cognitiveservices.azure.com/.default`
   - Description: Access to Azure Cognitive Services
2. **Speech Services** (Application Permission)
   - Scope: `https://speech.cognitiveservices.azure.com/.default`
   - Description: Access to Azure Speech Services
3. **Translator Services** (Application Permission)
   - Scope: `https://cognitiveservices.azure.com/.default`
   - Description: Access to Azure Translator Services
### Alternative: Resource-Specific Permissions
If using resource-specific permissions:
1. **Speech Services Resource**
   - Permission: `SpeechServices.ReadWrite`
   - Description: Read and write access to Speech Services
2. **Translator Resource**
   - Permission: `Translator.ReadWrite`
   - Description: Read and write access to Translator Services
## Azure Resource Setup
### 1. Create Azure Speech Services Resource
```bash
# Using Azure CLI
az cognitiveservices account create \
  --name "poweron-speech" \
  --resource-group "poweron-rg" \
  --location "westeurope" \
  --kind "SpeechServices" \
  --sku "S0"
```
### 2. Create Azure Translator Resource
```bash
# Using Azure CLI
az cognitiveservices account create \
  --name "poweron-translator" \
  --resource-group "poweron-rg" \
  --location "westeurope" \
  --kind "TextTranslation" \
  --sku "S1"
```
### 3. Get Access Keys
```bash
# Get Speech Services key
az cognitiveservices account keys list \
  --name "poweron-speech" \
  --resource-group "poweron-rg"

# Get Translator key
az cognitiveservices account keys list \
  --name "poweron-translator" \
  --resource-group "poweron-rg"
```
## Authentication Methods
### 1. Access Key Authentication (Current Implementation)
- Uses `Ocp-Apim-Subscription-Key` header
- Requires access keys from Azure portal
- Suitable for server-to-server authentication
### 2. Azure AD Authentication (Recommended for Production)
- Uses Bearer token authentication
- More secure and scalable
- Requires Azure AD app registration
## Implementation Details
### Current Implementation
The connector currently uses access key authentication:
```python
headers = {
    "Ocp-Apim-Subscription-Key": self.access_token,
    "Ocp-Apim-Subscription-Region": self.region
}
```
### Future Enhancement: Azure AD Authentication
For production use, consider implementing Azure AD authentication:
```python
# Get token from Azure AD.
# get_azure_ad_token is a placeholder for a helper that does not exist yet;
# the await must run inside an async function.
token = await get_azure_ad_token(
    client_id="your-app-id",
    client_secret="your-app-secret",
    tenant_id="your-tenant-id",
    scope="https://cognitiveservices.azure.com/.default"
)

headers = {
    "Authorization": f"Bearer {token}",
    "Ocp-Apim-Subscription-Region": self.region
}
```
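
The `get_azure_ad_token` helper in the snippet above is not defined in this document. One way it could be built is on the OAuth 2.0 client credentials flow of the Microsoft identity platform. The following is a sketch under that assumption: `build_token_request` is a hypothetical name, it only constructs the HTTP request (no network call), and a production implementation would more likely rely on a library such as `azure-identity` or `msal`.

```python
from urllib.parse import urlencode
from urllib.request import Request

DEFAULT_SCOPE = "https://cognitiveservices.azure.com/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str,
                        scope: str = DEFAULT_SCOPE) -> Request:
    """Build the client-credentials token request (hypothetical helper)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    # The token endpoint expects a form-encoded POST body.
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Sending the request with `urllib.request.urlopen` and reading `access_token` from the JSON response would give the value used in the `Authorization` header. The client secret should come from configuration or a secret store, not from source code.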
## Supported Features
### Speech-to-Text (STT)
- **Languages**: 100+ languages supported
- **Formats**: WAV, MP3, FLAC, OGG
- **Models**: Base and custom speech recognition models
- **Real-time**: Streaming support available
### Text-to-Speech (TTS)
- **Voices**: 400+ neural voices
- **Languages**: 140+ languages
- **Formats**: MP3, WAV, OGG, FLAC
- **SSML**: Full SSML support
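
As an illustration of the SSML support listed above, a request body for a single voice could be assembled as follows. This is a minimal sketch: `build_ssml` is a hypothetical helper rather than part of the connector, and the default voice and language are example values only.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural",
               lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document for one voice."""
    # escape() protects &, < and > in the text; quoteattr() quotes and
    # escapes attribute values.
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )
```

Escaping the input matters because user-supplied text containing `&` or `<` would otherwise produce invalid XML.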
### Translation
- **Languages**: 100+ languages
- **Real-time**: Streaming translation
- **Custom**: Custom translation models
- **Batch**: Batch translation support
## Error Handling
### Common Error Codes
- **401 Unauthorized**: Invalid access key or expired token
- **403 Forbidden**: Insufficient permissions
- **429 Too Many Requests**: Rate limit exceeded
- **500 Internal Server Error**: Server-side failure; retry the request after a delay
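
The status codes above could be mapped to a retry decision in one place. The sketch below is illustrative: `ErrorInfo`, `ERROR_INFO`, and `describe_error` are hypothetical names, and treating 429 and 500 as retryable is an assumption rather than documented connector behavior.

```python
from typing import NamedTuple


class ErrorInfo(NamedTuple):
    retryable: bool
    hint: str


# Hints follow the list above; the retryable flags are an assumption.
ERROR_INFO = {
    401: ErrorInfo(False, "invalid access key or expired token"),
    403: ErrorInfo(False, "insufficient permissions"),
    429: ErrorInfo(True, "rate limit exceeded"),
    500: ErrorInfo(True, "server-side error"),
}


def describe_error(status: int) -> ErrorInfo:
    """Look up a status code; unknown codes are treated as non-retryable."""
    return ERROR_INFO.get(status, ErrorInfo(False, f"unexpected status {status}"))
```

A caller can then log `describe_error(status).hint` and retry only when `retryable` is true.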
### Rate Limits
- **Speech-to-Text**: 20 requests per second
- **Text-to-Speech**: 20 requests per second
- **Translation**: 2,000,000 characters per month (free tier)
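
When a request is rejected because a limit was hit, a common pattern is exponential backoff with jitter. The sketch below assumes the caller passes along a `Retry-After` header value if the response contained one; `backoff_delay` and its default parameters are hypothetical and not taken from the connector.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 0.5, cap: float = 30.0) -> float:
    """Return the number of seconds to wait before retry number `attempt`."""
    if retry_after is not None:
        try:
            # Honor a numeric Retry-After value, bounded by the cap.
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # Not a number (e.g. an HTTP date): fall back to backoff.
    # Full jitter: a random delay up to the capped exponential bound.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

The jitter spreads retries from parallel workers so they do not hit the service at the same moment again.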
## Security Considerations
1. **Access Key Rotation**: Regularly rotate access keys
2. **Network Security**: Use HTTPS only
3. **Data Privacy**: Audio data is processed in Azure
4. **Compliance**: Ensure GDPR/CCPA compliance
5. **Monitoring**: Enable Azure Monitor for usage tracking
## Cost Optimization
### Pricing Tiers
- **Free Tier**: 5 audio hours of STT and 0.5 million characters of neural TTS per month
- **Standard Tier**: Pay-per-use pricing
- **Enterprise**: Volume discounts available
### Best Practices
1. **Caching**: Cache TTS results for repeated text
2. **Compression**: Use compressed audio formats
3. **Batching**: Batch multiple requests when possible
4. **Monitoring**: Track usage and costs
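
The caching practice above can be sketched as an in-memory cache keyed by a hash of voice and text. This is an illustration only: `TtsCache` is a hypothetical class, and the `synthesize` callable stands in for whatever function actually calls the Text-to-Speech API.

```python
import hashlib
from typing import Callable, Dict


class TtsCache:
    """Cache synthesized audio so repeated text is only billed once."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        self._synthesize = synthesize  # stand-in for the real TTS call
        self._store: Dict[str, bytes] = {}

    @staticmethod
    def key(text: str, voice: str) -> str:
        # A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
        return hashlib.sha256(f"{voice}\x00{text}".encode("utf-8")).hexdigest()

    def get(self, text: str, voice: str) -> bytes:
        k = self.key(text, voice)
        if k not in self._store:
            self._store[k] = self._synthesize(text, voice)
        return self._store[k]
```

A persistent store (files or a key-value database) could replace the dictionary if cached audio should survive restarts.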
## Troubleshooting
### Common Issues
1. **Invalid Region**: Ensure region matches resource location
2. **Invalid or Regenerated Keys**: Check that the configured key still matches the resource after a key rotation
3. **Quota Exceeded**: Monitor usage and upgrade if needed
4. **Network Issues**: Check firewall and proxy settings
### Debug Mode
Enable debug logging to troubleshoot issues:
```python
import logging
logging.getLogger('connectorAzureSpeech').setLevel(logging.DEBUG)
```
## Next Steps
1. **Set up Azure resources** with required permissions
2. **Configure access keys** in the application
3. **Test connection** using the health check endpoint
4. **Monitor usage** and costs
5. **Implement caching** for better performance
6. **Add error handling** for production use