STT: Google Speech-to-Text v2 / Chirp Evaluation (follow-up)
Status: in progress — interactive benchmark page available for SysAdmin (2026-05-15).
Related: gateway connectorVoiceGoogle.py uses Speech v1 SpeechClient only.
Goal
Benchmark STT v2 (e.g. Chirp / Chirp 2) for de-DE vs current v1 latest_short / latest_long on:
- Latency (time-to-first-token, final latency)
- WER / subjective quality in meeting + coaching scenarios
- Cost and quota
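For the WER criterion above, a minimal sketch of how word error rate could be computed between a hand-checked reference transcript and an STT result. The function name and the normalisation (lowercasing and whitespace splitting only) are assumptions for illustration; real de-DE scoring would probably also need punctuation and number normalisation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because the denominator is the reference length, a hypothesis with many inserted words can score above 1.0.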
Current State (2026-05-15)
- Interactive benchmark page added: Administration > System > STT Benchmark (SysAdmin only).
- Upload or record audio; runs v1 and v2 (Chirp 2) simultaneously; shows transcription, confidence, latency side-by-side.
- Configurable: language, v1 model, v2 model, v2 region.
- Backend: routeAdminSttBenchmark.py using google.cloud.speech (v1) + google.cloud.speech_v2 (v2).
- Frontend: SttBenchmarkPage.tsx under /admin/stt-benchmark.
- Production switch not yet done: connectorVoiceGoogle.py still uses v1 only.
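The side-by-side run described above could be structured roughly as follows. This is a sketch and not the code in routeAdminSttBenchmark.py: `BenchmarkResult`, `time_call` and `run_side_by_side` are hypothetical names, and each backend is modelled as a plain callable that takes audio bytes and returns a transcript, so the Google clients themselves are left out.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class BenchmarkResult:
    transcript: str
    latency_s: float


def time_call(recognize: Callable[[bytes], str], audio: bytes) -> BenchmarkResult:
    """Run one backend and measure its wall-clock latency."""
    start = time.perf_counter()
    transcript = recognize(audio)
    return BenchmarkResult(transcript, time.perf_counter() - start)


def run_side_by_side(
    backends: Dict[str, Callable[[bytes], str]], audio: bytes
) -> Dict[str, BenchmarkResult]:
    """Send the same audio to every backend concurrently."""
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        futures = {
            name: pool.submit(time_call, recognize, audio)
            for name, recognize in backends.items()
        }
        # result() blocks until that backend finishes and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}
```

Each backend is timed inside its own worker thread, so a slow v2 call does not inflate the latency reported for v1.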
Next Steps
- Run benchmark with real meeting/coaching audio samples across de-DE, de-CH, en-US.
- Compare latency + quality. Document in this file.
- If Chirp 2 wins: add v2 client path to connectorVoiceGoogle.py behind feature flag.
- Run A/B on CommCoach streaming and Teamsbot batch paths with identical audio fixtures.
- Document decision in wiki/b-reference/ and remove flag or make v2 default.
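One way the feature flag for the v2 client path might look is a single environment switch read at call time. The flag name `STT_GOOGLE_USE_V2` and both helper functions are hypothetical; the gateway may already have its own flag mechanism, in which case that should be used instead.

```python
import os
from typing import Mapping, Optional

# Hypothetical flag name, not an existing gateway setting.
FLAG_NAME = "STT_GOOGLE_USE_V2"
_TRUTHY = {"1", "true", "yes", "on"}


def use_speech_v2(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the flag is explicitly enabled."""
    source = os.environ if env is None else env
    return source.get(FLAG_NAME, "").strip().lower() in _TRUTHY


def pick_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Default to v1 so an unset or malformed flag never changes production."""
    return "v2" if use_speech_v2(env) else "v1"
```

Passing `env` explicitly keeps the A/B runs reproducible: the same audio fixture can be sent through both paths without touching the process environment.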
Notes
- Streaming and batch config differ between v1 and v2; keep VoiceObjects as the single facade.
- Billing hooks (calculateSttCostCHF) must use measured duration (see streaming result_end_time), not compressed byte heuristics.
- google-cloud-speech==2.21.0 includes speech_v2 module; no dependency upgrade needed.
- Chirp 2 is v2-only and requires regional endpoint ({location}-speech.googleapis.com).
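The endpoint and billing notes above can be sketched as two small helpers. Both function names are hypothetical, the per-minute rate is a caller-supplied parameter and not a real price, and the signature of calculateSttCostCHF is not known here, so this only illustrates the duration-based idea. The sketch also assumes the end times arrive as `datetime.timedelta` values.

```python
from datetime import timedelta
from typing import Iterable


def regional_endpoint(location: str) -> str:
    """Build the Speech-to-Text host name for a location."""
    location = location.strip().lower()
    if not location:
        raise ValueError("location must not be empty")
    if location == "global":
        # The global location uses the default host without a region prefix.
        return "speech.googleapis.com"
    return f"{location}-speech.googleapis.com"


def cost_from_end_times(
    end_times: Iterable[timedelta], rate_per_minute: float
) -> float:
    """Bill on measured audio duration: the latest end time seen in the stream."""
    measured = max(end_times, default=timedelta(0))
    return measured.total_seconds() / 60.0 * rate_per_minute
```

With the google-cloud-speech client, the returned host would typically be passed as `api_endpoint` through `google.api_core.client_options.ClientOptions` when constructing the v2 `SpeechClient`.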