STT: Google Speech-to-Text v2 / Chirp Evaluation (follow-up)
Status: planned (not implemented). Related: gateway connectorVoiceGoogle.py uses Speech v1 SpeechClient only.
Goal
Benchmark STT v2 (e.g. Chirp / Chirp 2) for de-DE vs current v1 latest_short / latest_long on:
- Latency (time-to-first-token, final latency)
- Word error rate (WER) and subjective quality in meeting and coaching scenarios
- Cost and quota
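
For the WER criterion, a minimal sketch of how the score could be computed on the fixtures is shown here. It uses a plain word-level Levenshtein distance; the function name `word_error_rate` and the normalization (lowercasing, whitespace split) are assumptions for illustration, not something the gateway already provides. German-specific normalization (numbers, compounds, punctuation) would still have to be agreed on before comparing v1 and v2.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for the benchmark; normalization here is only
    lowercasing plus whitespace tokenization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Running the same reference transcripts through both engines and comparing the resulting scores per scenario (meeting, coaching) keeps the comparison independent of either API's own confidence values.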
Steps
- Add optional v2 client path (google.cloud.speech_v2 or REST) behind a feature flag.
- Run A/B on CommCoach streaming and Teamsbot batch paths with identical audio fixtures.
- Document decision in wiki/b-reference/ and remove flag or make v2 default.
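
The feature-flag step could be sketched roughly as follows. The environment variable `STT_USE_V2` and the helpers `select_stt_backend` and `recognize_v2_batch` are hypothetical names chosen for this example; the real flag mechanism and the wiring into the existing facade are still open. The v2 call follows the public `google.cloud.speech_v2` client; model and region are passed in explicitly because availability of Chirp models per region should be checked rather than assumed.

```python
import os
from typing import Mapping, Optional


def select_stt_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "v2" if the (hypothetical) STT_USE_V2 flag is set, else "v1"."""
    env = os.environ if env is None else env
    flag = env.get("STT_USE_V2", "").strip().lower()
    return "v2" if flag in {"1", "true", "yes", "on"} else "v1"


def recognize_v2_batch(audio_bytes: bytes, project_id: str,
                       location: str, model: str) -> list:
    """Short-audio recognition via Speech-to-Text v2 (sketch only).

    Imports are deferred so that the v1-only deployment does not need
    the v2 modules to be importable while the flag is off.
    """
    from google.api_core.client_options import ClientOptions
    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    # Regional models are served from a regional endpoint.
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-speech.googleapis.com"
        )
    )
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["de-DE"],
        model=model,  # e.g. "chirp_2"; verify availability for the region
    )
    request = cloud_speech.RecognizeRequest(
        # "_" refers to the implicit default recognizer.
        recognizer=f"projects/{project_id}/locations/{location}/recognizers/_",
        config=config,
        content=audio_bytes,
    )
    response = client.recognize(request=request)
    return [r.alternatives[0].transcript for r in response.results if r.alternatives]
```

With the flag unset the existing v1 path stays untouched, so the A/B run can switch engines per process without code changes.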
Notes
- Streaming and batch config differ between v1 and v2; keep VoiceObjects as the single facade.
- Billing hooks (calculateSttCostCHF) must use measured duration (see streaming result_end_time), not compressed byte heuristics.
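
As an illustration of the billing note, the sketch below derives the audio duration from the end offsets of final streaming results instead of from payload size. The helper names and the per-minute rate parameter are hypothetical and do not describe the actual signature of calculateSttCostCHF. It assumes the offsets arrive as `datetime.timedelta` values (as the current Python client exposes duration fields) and that each offset is relative to the start of a single stream; if a session is split across several streams, the per-stream durations would need to be summed.

```python
from datetime import timedelta
from typing import Iterable


def measured_audio_seconds(final_end_offsets: Iterable[timedelta]) -> float:
    """Duration of one stream, taken as the largest result end offset.

    Offsets are cumulative from the start of the stream, so the maximum
    (not the sum) is the measured audio length.
    """
    seconds = [offset.total_seconds() for offset in final_end_offsets]
    return max(seconds) if seconds else 0.0


def estimate_cost_chf(audio_seconds: float, price_chf_per_minute: float) -> float:
    """Cost from measured duration; the rate is a caller-supplied assumption."""
    if audio_seconds < 0 or price_chf_per_minute < 0:
        raise ValueError("duration and price must be non-negative")
    return audio_seconds / 60.0 * price_chf_per_minute
```

Compressed byte counts vary with codec and bitrate, so a duration-based figure like this is the only one that stays comparable between the v1 and v2 runs.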