wiki/c-work/1-plan/stt-chirp-v2-evaluation.md
ValueOn AG ef83e58a85 upd
2026-05-11 18:41:56 +02:00

STT: Google Speech-to-Text v2 / Chirp Evaluation (follow-up)

Status: planned, not yet implemented. Related: the gateway's connectorVoiceGoogle.py currently uses only the Speech v1 SpeechClient.

Goal

Benchmark STT v2 models (e.g. Chirp / Chirp 2) for de-DE against the current v1 models latest_short / latest_long on:

  • Latency (time-to-first-token, final latency)
  • Word error rate (WER) and subjective quality in meeting and coaching scenarios
  • Cost and quota
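
For the WER comparison, a minimal sketch of a word-level scorer could look like the following. It assumes lowercase, whitespace-based tokenization with no punctuation handling; the function name wordErrorRate is hypothetical and not part of the gateway.

```python
def wordErrorRate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, refWord in enumerate(ref, start=1):
        current = [i]
        for j, hypWord in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (refWord != hypWord)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

For de-DE, the normalization step probably needs more care than this (numbers, compound words, punctuation), so v1 and v2 transcripts should pass through the same normalizer before scoring.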

Steps

  1. Add an optional v2 client path (google.cloud.speech_v2 or REST) behind a feature flag.
  2. Run an A/B comparison on the CommCoach streaming and Teamsbot batch paths, using identical audio fixtures.
  3. Document the decision in wiki/b-reference/, then either remove the flag or make v2 the default.
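
The feature flag from step 1 might be sketched as below. The environment variable STT_API_VERSION and both function names are assumptions for illustration, not existing gateway code. The Google client libraries are imported lazily so that the v1 path keeps working when the v2 package is absent.

```python
import os
from typing import Mapping, Optional

STT_API_FLAG = "STT_API_VERSION"  # hypothetical flag name


def selectSttApiVersion(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "v1" or "v2"; unknown or missing values fall back to "v1"."""
    env = os.environ if env is None else env
    value = env.get(STT_API_FLAG, "v1").strip().lower()
    return value if value in ("v1", "v2") else "v1"


def createSpeechClient(apiVersion: str):
    """Build the matching SpeechClient; imports happen only when needed."""
    if apiVersion == "v2":
        from google.cloud import speech_v2
        return speech_v2.SpeechClient()
    from google.cloud import speech
    return speech.SpeechClient()
```

Defaulting to v1 on any unrecognized value keeps the current behaviour as the safe fallback while the evaluation is running.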

Notes

  • Streaming and batch configuration differ between v1 and v2; keep VoiceObjects as the single facade so callers do not depend on the API version.
  • Billing hooks (calculateSttCostCHF) must use the measured audio duration (see result_end_time in streaming results), not heuristics based on compressed byte size.
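
A minimal sketch of duration-based billing, assuming the end offsets arrive as datetime.timedelta values (as the Python client exposes Duration fields) and that one uninterrupted stream is measured. Both helper names and the chfPerMinute parameter are hypothetical; the actual signature of calculateSttCostCHF is not shown here.

```python
from datetime import timedelta
from typing import Iterable


def measuredAudioSeconds(resultEndTimes: Iterable[timedelta]) -> float:
    """Largest end offset reported by the stream, in seconds (0.0 if none)."""
    return max((t.total_seconds() for t in resultEndTimes), default=0.0)


def estimateCostChf(audioSeconds: float, chfPerMinute: float) -> float:
    """Cost from measured duration; the rate is supplied by the caller."""
    return audioSeconds / 60.0 * chfPerMinute
```

If a long session is split across several streams, the offsets restart per stream, so the per-stream maxima would need to be summed. The v2 API appears to name the corresponding field result_end_offset; verify this against the client version in use.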