
# Azure Speech Services - Required Permissions

## Overview

The Azure Speech Services connector requires specific permissions to access Azure Cognitive Services APIs. This document outlines the required permissions and setup.

## Required Azure Services

### 1. Azure Speech Services
- **Service**: Speech Services (Cognitive Services)
- **API Endpoints**:
  - Speech-to-Text: `https://{region}.stt.speech.microsoft.com`
  - Text-to-Speech: `https://{region}.tts.speech.microsoft.com`

### 2. Azure Translator
- **Service**: Translator (Cognitive Services)
- **API Endpoint**: `https://api.cognitive.microsofttranslator.com`
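The endpoint patterns listed above can be captured in a small helper. This is a minimal sketch: the function name `service_endpoints` is hypothetical and not part of the connector.

```python
def service_endpoints(region: str) -> dict[str, str]:
    """Return base URLs for the three services, following the patterns listed above."""
    region = region.strip().lower()
    if not region:
        raise ValueError("region must not be empty")
    return {
        # Speech endpoints are regional: the region is part of the host name.
        "stt": f"https://{region}.stt.speech.microsoft.com",
        "tts": f"https://{region}.tts.speech.microsoft.com",
        # The Translator endpoint listed above has no region in the host name.
        "translator": "https://api.cognitive.microsofttranslator.com",
    }


endpoints = service_endpoints("westeurope")
```

Normalizing the region once, in one place, avoids mismatches such as `WestEurope` versus `westeurope` in host names.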

## Required Permissions

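### Azure AD Permissions
The scopes below target Azure Cognitive Services, not Microsoft Graph. The identity behind the user's Microsoft connection must have the following permissions:

1. **Cognitive Services User** (Azure built-in role, assigned on the resource)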
   - Scope: `https://cognitiveservices.azure.com/.default`
   - Description: Access to Azure Cognitive Services

2. **Speech Services** (Application Permission)
   - Scope: `https://speech.cognitiveservices.azure.com/.default`
   - Description: Access to Azure Speech Services

3. **Translator Services** (Application Permission)
   - Scope: `https://cognitiveservices.azure.com/.default`
   - Description: Access to Azure Translator Services

### Alternative: Resource-Specific Permissions
If using resource-specific permissions:

1. **Speech Services Resource**
   - Permission: `SpeechServices.ReadWrite`
   - Description: Read and write access to Speech Services

2. **Translator Resource**
   - Permission: `Translator.ReadWrite`
   - Description: Read and write access to Translator Services

## Azure Resource Setup

### 1. Create Azure Speech Services Resource
```bash
# Using Azure CLI
az cognitiveservices account create \
  --name "poweron-speech" \
  --resource-group "poweron-rg" \
  --location "westeurope" \
  --kind "SpeechServices" \
  --sku "S0"
```

### 2. Create Azure Translator Resource
```bash
# Using Azure CLI
az cognitiveservices account create \
  --name "poweron-translator" \
  --resource-group "poweron-rg" \
  --location "westeurope" \
  --kind "TextTranslation" \
  --sku "S1"
```

### 3. Get Access Keys
```bash
# Get Speech Services key
az cognitiveservices account keys list \
  --name "poweron-speech" \
  --resource-group "poweron-rg"

# Get Translator key
az cognitiveservices account keys list \
  --name "poweron-translator" \
  --resource-group "poweron-rg"
```
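If the keys are fetched from a script rather than copied by hand, the CLI output can be parsed directly. The sketch below assumes the command's default JSON output with `key1` and `key2` fields; `parse_account_keys` is a hypothetical helper name, and the sample values are placeholders, not real keys.

```python
import json


def parse_account_keys(cli_output: str) -> tuple[str, str]:
    """Extract (key1, key2) from the JSON printed by `az cognitiveservices account keys list`."""
    data = json.loads(cli_output)
    try:
        return data["key1"], data["key2"]
    except KeyError as exc:
        raise ValueError(f"unexpected CLI output, missing field: {exc}") from exc


# Placeholder output for illustration only.
sample = '{"key1": "aaaa1111", "key2": "bbbb2222"}'
primary, secondary = parse_account_keys(sample)
```

Having two keys allows rotation without downtime: switch the application to the second key, then regenerate the first.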

## Authentication Methods

### 1. Access Key Authentication (Current Implementation)
- Uses `Ocp-Apim-Subscription-Key` header
- Requires access keys from Azure portal
- Suitable for server-to-server authentication

### 2. Azure AD Authentication (Recommended for Production)
- Uses Bearer token authentication
- More secure and scalable
- Requires Azure AD app registration

## Implementation Details

### Current Implementation
The connector currently uses access key authentication:

```python
headers = {
    "Ocp-Apim-Subscription-Key": self.access_token,
    "Ocp-Apim-Subscription-Region": self.region
}
```
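To show where these headers fit, here is a minimal sketch of a speech-to-text request built with the standard library. It assumes the short-audio REST endpoint and 16 kHz PCM WAV input; whether the connector uses this exact route is an assumption, and `build_stt_request` is a hypothetical name. The request is only constructed, not sent.

```python
import urllib.parse
import urllib.request


def build_stt_request(region: str, key: str, audio: bytes,
                      language: str = "en-US") -> urllib.request.Request:
    """Build (but do not send) a key-authenticated short-audio recognition request."""
    query = urllib.parse.urlencode({"language": language})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        # Same two headers as in the connector snippet above.
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        # Assumed input format: 16 kHz PCM WAV.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")


request = build_stt_request("westeurope", "placeholder-key", b"RIFF")
```

Passing the result to `urllib.request.urlopen(request)` would send it; that step is left out here because it needs a real key and audio payload.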

### Future Enhancement: Azure AD Authentication
For production use, consider implementing Azure AD authentication:

```python
# Get token from Azure AD (get_azure_ad_token is a placeholder helper;
# the await must run inside an async function)
token = await get_azure_ad_token(
    client_id="your-app-id",
    client_secret="your-app-secret",
    tenant_id="your-tenant-id",
    scope="https://cognitiveservices.azure.com/.default"
)

headers = {
    "Authorization": f"Bearer {token}",
    "Ocp-Apim-Subscription-Region": self.region
}
```
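The `get_azure_ad_token` helper above is not defined in this document. One possible shape, sketched here under the assumption that the OAuth 2.0 client credentials flow is used, is to POST a form to the tenant's token endpoint. The function names below are hypothetical, and the example only builds the request and parses a response body; it does not contact Azure AD.

```python
import json
import urllib.parse


def build_token_request(tenant_id: str, client_id: str, client_secret: str,
                        scope: str = "https://cognitiveservices.azure.com/.default"):
    """Return (url, form_body) for a client credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    return url, body


def extract_access_token(response_body: str) -> str:
    """Pull the bearer token out of the JSON token response."""
    data = json.loads(response_body)
    if "access_token" not in data:
        raise ValueError(f"token request failed: {data.get('error', 'unknown error')}")
    return data["access_token"]


url, body = build_token_request("your-tenant-id", "your-app-id", "your-app-secret")
token = extract_access_token('{"token_type": "Bearer", "access_token": "abc"}')
```

In practice a maintained library such as `azure-identity` is likely a better choice than hand-rolled requests, since it also handles token caching and renewal.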

## Supported Features

### Speech-to-Text (STT)
- **Languages**: 100+ languages supported
- **Formats**: WAV, MP3, FLAC, OGG
- **Models**: Base models and Custom Speech models
- **Real-time**: Streaming support available

### Text-to-Speech (TTS)
- **Voices**: 400+ neural voices
- **Languages**: 140+ languages
- **Formats**: MP3, WAV, OGG, FLAC
- **SSML**: Full SSML support
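Since user-supplied text ends up inside an XML document, it must be escaped before it is placed in SSML. A minimal sketch follows; `build_ssml` is a hypothetical helper, and the voice name `en-US-JennyNeural` is only an example value.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document, escaping XML special characters."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


ssml = build_ssml("Tom & Jerry <3")
```

Without the `escape` call, an ampersand or angle bracket in the input would produce malformed XML and the synthesis request would be rejected.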

### Translation
- **Languages**: 100+ languages
- **Real-time**: Streaming translation
- **Custom**: Custom translation models
- **Batch**: Batch translation support

## Error Handling

### Common Error Codes
- **401 Unauthorized**: Invalid access key or expired token
- **403 Forbidden**: Insufficient permissions
- **429 Too Many Requests**: Rate limit exceeded
- **500 Internal Server Error**: Server-side failure; usually transient, so retry after a delay
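A caller can map these status codes to a next action instead of treating every failure the same way. The sketch below is one possible mapping, not the connector's actual behavior; the function name and the returned labels are hypothetical.

```python
def classify_status(status: int) -> str:
    """Map an HTTP status code to a coarse next action."""
    if 200 <= status < 300:
        return "ok"
    if status == 401:
        return "reauthenticate"      # invalid key or expired token
    if status == 403:
        return "check_permissions"   # credentials valid, access not granted
    if status == 429 or 500 <= status < 600:
        return "retry"               # rate limit or server-side failure
    return "fail"                    # other client errors will not succeed on retry


actions = {code: classify_status(code) for code in (200, 401, 403, 429, 500, 400)}
```

Retrying only on 429 and 5xx avoids hammering the service with requests that cannot succeed, such as those carrying a wrong key.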

### Rate Limits
- **Speech-to-Text**: 20 requests per second
- **Text-to-Speech**: 20 requests per second
- **Translation**: 2,000,000 characters per month (free tier quota, not a per-second limit)
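When a limit is hit, spacing out retries helps avoid hitting it again immediately. The sketch below computes exponential backoff delays with full jitter; the defaults (`base`, `cap`) are illustrative assumptions, not values recommended by Azure, and `backoff_delays` is a hypothetical name.

```python
import random


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0, rng=None):
    """Yield one randomized delay (in seconds) per retry attempt."""
    if rng is None:
        rng = random.Random()
    for attempt in range(attempts):
        # The upper bound doubles on each attempt until it reaches the cap.
        ceiling = min(cap, base * 2 ** attempt)
        # Full jitter: pick uniformly between 0 and the current ceiling.
        yield rng.uniform(0, ceiling)


delays = list(backoff_delays(5, rng=random.Random(0)))
```

A caller would sleep for each delay between attempts, for example with `time.sleep(delay)`, and give up once the generator is exhausted.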

## Security Considerations

1. **Access Key Rotation**: Regularly rotate access keys
2. **Network Security**: Use HTTPS only
3. **Data Privacy**: Audio data is processed in Azure
4. **Compliance**: Ensure GDPR/CCPA compliance
5. **Monitoring**: Enable Azure Monitor for usage tracking

## Cost Optimization

### Pricing Tiers
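- **Free Tier**: 5 audio hours of STT per month; the TTS free quota is counted in characters rather than hours (see the Azure pricing page for the current figure)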
- **Standard Tier**: Pay-per-use pricing
- **Enterprise**: Volume discounts available

### Best Practices
1. **Caching**: Cache TTS results for repeated text
2. **Compression**: Use compressed audio formats
3. **Batching**: Batch multiple requests when possible
4. **Monitoring**: Track usage and costs
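The caching practice above can be sketched as a small in-memory cache keyed by a hash of the voice, audio format, and text. The class name `TtsCache` and the `synthesize` callback signature are assumptions for illustration; a real deployment would probably need size limits and persistent storage.

```python
import hashlib


class TtsCache:
    """Cache synthesized audio so repeated text is only sent to the service once."""

    def __init__(self, synthesize):
        self._synthesize = synthesize  # callable(text, voice, audio_format) -> bytes
        self._store = {}

    @staticmethod
    def _key(text: str, voice: str, audio_format: str) -> str:
        # A separator that cannot appear in normal input keeps the fields unambiguous.
        raw = "\x1f".join((voice, audio_format, text)).encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get(self, text: str, voice: str, audio_format: str) -> bytes:
        key = self._key(text, voice, audio_format)
        if key not in self._store:
            self._store[key] = self._synthesize(text, voice, audio_format)
        return self._store[key]


calls = []

def fake_synthesize(text, voice, audio_format):
    """Stand-in for the real TTS call; records each invocation."""
    calls.append(text)
    return f"{voice}:{text}".encode("utf-8")

cache = TtsCache(fake_synthesize)
first = cache.get("Hello", "en-US-JennyNeural", "mp3")
second = cache.get("Hello", "en-US-JennyNeural", "mp3")
```

Including the voice and format in the key matters: the same text rendered with a different voice or format is different audio.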

## Troubleshooting

### Common Issues
1. **Invalid Region**: Ensure region matches resource location
2. **Invalid Keys**: Keys stop working once regenerated; confirm the configured key matches a current one
3. **Quota Exceeded**: Monitor usage and upgrade if needed
4. **Network Issues**: Check firewall and proxy settings

### Debug Mode
Enable debug logging to troubleshoot issues:

```python
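import logging

# Attach a handler first; without one, DEBUG records are not displayed.
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")
logging.getLogger('connectorAzureSpeech').setLevel(logging.DEBUG)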
```

## Next Steps

1. **Set up Azure resources** with required permissions
2. **Configure access keys** in the application
3. **Test connection** using the health check endpoint
4. **Monitor usage** and costs
5. **Implement caching** for better performance
6. **Add error handling** for production use