wiki/poweron/appdoc/README_AzureSpeechPermissions.md
2025-09-13 01:58:04 +02:00

Azure Speech Services - Required Permissions

Overview

The Azure Speech Services connector requires specific permissions to access Azure Cognitive Services APIs. This document outlines the required permissions and setup.

Required Azure Services

1. Azure Speech Services

  • Service: Speech Services (Cognitive Services)
  • API Endpoints:
    • Speech-to-Text: https://{region}.stt.speech.microsoft.com
    • Text-to-Speech: https://{region}.tts.speech.microsoft.com

2. Azure Translator

  • Service: Translator (Cognitive Services)
  • API Endpoint: https://api.cognitive.microsofttranslator.com
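The endpoint patterns above can be derived from the resource region. The following is a minimal sketch; the helper name `speech_endpoints` and the region check are illustrative assumptions, not part of the connector.

```python
# Global Translator endpoint as listed above.
TRANSLATOR_ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def speech_endpoints(region: str) -> dict:
    """Return the regional STT and TTS base URLs for a Speech resource.

    Hypothetical helper: Azure region names such as "westeurope" are
    lowercase and alphanumeric, so anything else is rejected early.
    """
    region = region.strip().lower()
    if not region or not region.isalnum():
        raise ValueError(f"unexpected region name: {region!r}")
    return {
        "stt": f"https://{region}.stt.speech.microsoft.com",
        "tts": f"https://{region}.tts.speech.microsoft.com",
    }
```

Calling `speech_endpoints("westeurope")` yields the two base URLs for the resource created in the setup section.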

Required Permissions

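Microsoft Entra ID (Azure AD) Permissions

The user's Microsoft connection must be able to obtain tokens for the scopes below. These are Azure Cognitive Services scopes, not Microsoft Graph scopes; access to the resource itself is granted through Azure role assignments such as the built-in Cognitive Services User role: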

  1. Cognitive Services User (Application Permission)

    • Scope: https://cognitiveservices.azure.com/.default
    • Description: Access to Azure Cognitive Services
  2. Speech Services (Application Permission)

    • Scope: https://speech.cognitiveservices.azure.com/.default
    • Description: Access to Azure Speech Services
  3. Translator Services (Application Permission)

    • Scope: https://cognitiveservices.azure.com/.default
    • Description: Access to Azure Translator Services

Alternative: Resource-Specific Permissions

If using resource-specific permissions:

  1. Speech Services Resource

    • Permission: SpeechServices.ReadWrite
    • Description: Read and write access to Speech Services
  2. Translator Resource

    • Permission: Translator.ReadWrite
    • Description: Read and write access to Translator Services

Azure Resource Setup

1. Create Azure Speech Services Resource

# Using Azure CLI
az cognitiveservices account create \
  --name "poweron-speech" \
  --resource-group "poweron-rg" \
  --location "westeurope" \
  --kind "SpeechServices" \
  --sku "S0"

2. Create Azure Translator Resource

# Using Azure CLI
az cognitiveservices account create \
  --name "poweron-translator" \
  --resource-group "poweron-rg" \
  --location "westeurope" \
  --kind "TextTranslation" \
  --sku "S1"

3. Get Access Keys

# Get Speech Services key
az cognitiveservices account keys list \
  --name "poweron-speech" \
  --resource-group "poweron-rg"

# Get Translator key
az cognitiveservices account keys list \
  --name "poweron-translator" \
  --resource-group "poweron-rg"
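The `keys list` command prints a JSON object with `key1` and `key2` fields. As a rough sketch, the keys could be read from Python as follows; the function names are hypothetical, and `fetch_keys` assumes the Azure CLI is installed and logged in.

```python
import json
import subprocess


def parse_keys(cli_output: str) -> tuple:
    """Extract (key1, key2) from the JSON printed by `keys list`."""
    data = json.loads(cli_output)
    return data["key1"], data["key2"]


def fetch_keys(name: str, resource_group: str) -> tuple:
    """Run the Azure CLI and return both access keys (not called here)."""
    result = subprocess.run(
        ["az", "cognitiveservices", "account", "keys", "list",
         "--name", name, "--resource-group", resource_group,
         "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return parse_keys(result.stdout)
```

For example, `fetch_keys("poweron-speech", "poweron-rg")` would return the two keys of the Speech resource created above.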

Authentication Methods

1. Access Key Authentication (Current Implementation)

  • Uses Ocp-Apim-Subscription-Key header
  • Requires access keys from the Azure portal or Azure CLI
  • Suitable for server-to-server authentication

2. Azure AD Authentication (Future Enhancement)

  • Uses Bearer token authentication
  • More secure and scalable
  • Requires Azure AD app registration

Implementation Details

Current Implementation

The connector currently uses access key authentication:

headers = {
    "Ocp-Apim-Subscription-Key": self.access_token,
    "Ocp-Apim-Subscription-Region": self.region
}
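To illustrate how such headers are used, here is a hedged sketch that builds (but does not send) a request to the Speech-to-Text REST API for short audio. The function name and the assumption of 16 kHz PCM WAV input are illustrative; the connector's actual request code is not shown in this document.

```python
import urllib.request


def build_stt_request(key: str, region: str, wav_bytes: bytes,
                      language: str = "en-US") -> urllib.request.Request:
    """Build a key-authenticated short-audio recognition request."""
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        "/speech/recognition/conversation/cognitiveservices/v1"
        f"?language={language}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        # Assumption: the audio is 16 kHz PCM in a WAV container.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=wav_bytes, headers=headers,
                                  method="POST")
```

The request can then be sent with `urllib.request.urlopen(request, timeout=30)`; the JSON response contains the recognized text.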

Future Enhancement: Azure AD Authentication

For production use, consider implementing Azure AD authentication:

# Get token from Azure AD
token = await get_azure_ad_token(
    client_id="your-app-id",
    client_secret="your-app-secret",
    tenant_id="your-tenant-id",
    scope="https://cognitiveservices.azure.com/.default"
)

headers = {
    "Authorization": f"Bearer {token}",
    "Ocp-Apim-Subscription-Region": self.region
}
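`get_azure_ad_token` is not defined in this document. One possible shape, sketched here synchronously with the standard library, uses the OAuth 2.0 client credentials flow of the Microsoft identity platform. The function names are hypothetical, and a production implementation would more likely rely on a maintained library such as MSAL.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(tenant_id: str, client_id: str, client_secret: str,
                        scope: str = "https://cognitiveservices.azure.com/.default"
                        ) -> urllib.request.Request:
    """Build the client-credentials token request (does not send it)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return urllib.request.Request(url, data=body, headers=headers,
                                  method="POST")


def fetch_token(request: urllib.request.Request) -> str:
    """Send the request and return the access token from the response."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)["access_token"]
```

Tokens obtained this way expire, so a caller would need to cache the token and request a new one before the `expires_in` interval returned by the token endpoint elapses.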

Supported Features

Speech-to-Text (STT)

  • Languages: 100+ languages supported
  • Formats: WAV, MP3, FLAC, OGG
  • Models: Base models and custom speech models
  • Real-time: Streaming support available

Text-to-Speech (TTS)

  • Voices: 400+ neural voices
  • Languages: 140+ languages
  • Formats: MP3, WAV, OGG, FLAC
  • SSML: Full SSML support
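As an illustration of the SSML input that Text-to-Speech accepts, the sketch below builds a minimal SSML document. The helper name and the default voice `en-US-JennyNeural` are assumptions chosen for the example; user text is XML-escaped so that characters such as `&` do not break the markup.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural",
               lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document for one voice."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )
```

The resulting string is sent as the body of a POST to `https://{region}.tts.speech.microsoft.com/cognitiveservices/v1` with `Content-Type: application/ssml+xml` and an `X-Microsoft-OutputFormat` header that selects the audio format.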

Translation

  • Languages: 100+ languages
  • Real-time: Streaming translation
  • Custom: Custom translation models
  • Batch: Batch translation support
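For text translation, a request to the Translator v3 `translate` operation could be built roughly as follows. This is a sketch with a hypothetical function name; it only constructs the request and leaves sending it to the caller.

```python
import json
import urllib.parse
import urllib.request


def build_translate_request(key: str, region: str, texts: list,
                            to_lang: str, from_lang: str = None
                            ) -> urllib.request.Request:
    """Build a Translator v3 request for one or more input strings."""
    params = {"api-version": "3.0", "to": to_lang}
    if from_lang:
        # Omitting "from" lets the service detect the source language.
        params["from"] = from_lang
    url = ("https://api.cognitive.microsofttranslator.com/translate?"
           + urllib.parse.urlencode(params))
    body = json.dumps([{"Text": t} for t in texts]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json; charset=UTF-8",
    }
    return urllib.request.Request(url, data=body, headers=headers,
                                  method="POST")
```

Several strings can be passed in `texts` to translate them in a single call, which also reduces the number of requests counted against rate limits.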

Error Handling

Common Error Codes

  • 401 Unauthorized: Invalid access key or expired token
  • 403 Forbidden: Insufficient permissions
  • 429 Too Many Requests: Rate limit exceeded
  • 500 Internal Server Error: Server-side failure; usually transient, retry after a delay
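One way to act on these codes is to retry only the transient ones (429 and 5xx) with exponential backoff and fail immediately on 401 and 403, which a retry cannot fix. The helper below is a minimal sketch; `send` is a hypothetical callable that performs the request and returns a `(status, body)` pair.

```python
import time

# Status codes treated as transient in this sketch.
RETRYABLE = {429, 500, 502, 503, 504}


def call_with_retry(send, max_attempts: int = 4, base_delay: float = 0.5,
                    sleep=time.sleep):
    """Call send() until it succeeds or a non-retryable error occurs."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status < 400:
            return body
        if status not in RETRYABLE or attempt == max_attempts:
            raise RuntimeError(f"request failed with HTTP {status}")
        # Delay doubles on every retry: 0.5 s, 1 s, 2 s, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so that the backoff can be exercised in tests without real waiting.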

Rate Limits

  • Speech-to-Text: 20 requests per second
  • Text-to-Speech: 20 requests per second
  • Translation: 2,000,000 characters per month (free tier)
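To stay under a per-second request limit, calls can be spaced out on the client side. The class below is a simple sketch of that idea (a minimum interval between calls, not a full token bucket); its name and interface are assumptions.

```python
import time


class Throttle:
    """Space calls at least 1 / rate_per_second seconds apart."""

    def __init__(self, rate_per_second: float,
                 clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0  # earliest time the next call may start

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve a slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

With `throttle = Throttle(20)`, calling `throttle.wait()` before each request keeps a single process at or below 20 requests per second. It does not coordinate across multiple processes or hosts.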

Security Considerations

  1. Access Key Rotation: Regularly rotate access keys; each resource has two keys, so one can be regenerated while the other stays in use
  2. Network Security: Use HTTPS only
  3. Data Privacy: Audio data is processed in Azure
  4. Compliance: Ensure GDPR/CCPA compliance
  5. Monitoring: Enable Azure Monitor for usage tracking
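Key rotation is easier when keys are not hard-coded. As one possible approach, the key and region could be read from environment variables at startup. The variable names `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION` are hypothetical and not taken from the application's configuration.

```python
import os


def load_speech_config(env=os.environ) -> dict:
    """Read the Speech key and region from the environment.

    Hypothetical variable names; adjust to the application's settings.
    """
    key = env.get("AZURE_SPEECH_KEY", "").strip()
    region = env.get("AZURE_SPEECH_REGION", "").strip().lower()
    if not key or not region:
        raise RuntimeError(
            "AZURE_SPEECH_KEY and AZURE_SPEECH_REGION must both be set")
    return {"key": key, "region": region}


def mask_key(key: str) -> str:
    """Return a log-safe form of the key showing only the first 4 chars."""
    return key[:4] + "***"
```

Logging `mask_key(config["key"])` instead of the key itself keeps secrets out of log files while still showing which key is in use.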

Cost Optimization

Pricing Tiers

  • Free Tier (F0): 5 audio hours of STT and 0.5 million characters of neural TTS per month (TTS is billed per character, not per hour)
  • Standard Tier: Pay-per-use pricing
  • Enterprise: Volume discounts available

Best Practices

  1. Caching: Cache TTS results for repeated text
  2. Compression: Use compressed audio formats
  3. Batching: Batch multiple requests when possible
  4. Monitoring: Track usage and costs
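The caching practice can be sketched as a small in-memory cache keyed by voice, audio format, and text. `TtsCache` and the `synthesize` callable are hypothetical; a real deployment might persist audio to disk or object storage instead.

```python
import hashlib


class TtsCache:
    """Cache synthesized audio so repeated text is not billed twice."""

    def __init__(self, synthesize):
        # synthesize(text, voice, audio_format) -> bytes performs the call.
        self._synthesize = synthesize
        self._store = {}

    def get(self, text: str, voice: str, audio_format: str) -> bytes:
        """Return cached audio, synthesizing only on a cache miss."""
        raw = "\x00".join((voice, audio_format, text)).encode("utf-8")
        key = hashlib.sha256(raw).hexdigest()
        if key not in self._store:
            self._store[key] = self._synthesize(text, voice, audio_format)
        return self._store[key]
```

Voice and format are part of the key because the same text produces different audio for each combination.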

Troubleshooting

Common Issues

  1. Invalid Region: Ensure region matches resource location
  2. Invalid Keys: A key stops working once it is regenerated; confirm the configured key matches a current key of the resource
  3. Quota Exceeded: Monitor usage and upgrade if needed
  4. Network Issues: Check firewall and proxy settings

Debug Mode

Enable debug logging to troubleshoot issues:

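import logging
# Attach a handler if the application has not configured logging yet;
# without one, DEBUG records are never printed.
logging.basicConfig()
logging.getLogger('connectorAzureSpeech').setLevel(logging.DEBUG)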

Next Steps

  1. Set up Azure resources with required permissions
  2. Configure access keys in the application
  3. Test connection using the health check endpoint
  4. Monitor usage and costs
  5. Implement caching for better performance
  6. Add error handling for production use