wiki/c-work/1-plan/stt-chirp-v2-evaluation.md
2026-05-16 22:54:27 +02:00

STT: Google Speech-to-Text v2 / Chirp Evaluation (follow-up)

Status: in progress — interactive benchmark page available for SysAdmin (2026-05-15).

Related: gateway connectorVoiceGoogle.py uses Speech v1 SpeechClient only.

Goal

Benchmark STT v2 (e.g. Chirp / Chirp 2) for de-DE vs current v1 latest_short / latest_long on:

  • Latency (time-to-first-token, final latency)
  • WER / subjective quality in meeting + coaching scenarios
  • Cost and quota
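For the WER comparison, a minimal sketch of a word-level error rate is shown here. It assumes whitespace tokenisation and lowercasing only; the function name `word_error_rate` is illustrative and is not taken from the benchmark code. Punctuation and number normalisation for German transcripts would need to be handled before calling it.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.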

Current State (2026-05-15)

  • Interactive benchmark page added: Administration > System > STT Benchmark (SysAdmin only).
    • Upload or record audio; runs v1 and v2 (Chirp 2) simultaneously; shows transcription, confidence, latency side-by-side.
    • Configurable: language, v1 model, v2 model, v2 region.
    • Backend: routeAdminSttBenchmark.py using google.cloud.speech (v1) + google.cloud.speech_v2 (v2).
    • Frontend: SttBenchmarkPage.tsx under /admin/stt-benchmark.
  • Production switch not yet done: connectorVoiceGoogle.py still uses v1 only.
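The side-by-side run described above can be sketched as follows. This is not the code in routeAdminSttBenchmark.py; `EngineResult`, `timed_call`, `run_side_by_side` and the two fake engines are hypothetical names used only for illustration. The real backend would pass callables that wrap the v1 and v2 clients. The sketch measures final latency only, not time-to-first-token.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class EngineResult:
    transcript: str
    latency_s: float


def timed_call(transcribe: Callable[[bytes], str], audio: bytes) -> EngineResult:
    """Run one engine and record wall-clock time until the final transcript."""
    start = time.perf_counter()
    transcript = transcribe(audio)
    return EngineResult(transcript, time.perf_counter() - start)


def run_side_by_side(
    engines: Dict[str, Callable[[bytes], str]], audio: bytes
) -> Dict[str, EngineResult]:
    """Submit all engines at once so neither waits for the other to finish."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {
            name: pool.submit(timed_call, fn, audio) for name, fn in engines.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Stand-ins for the real v1 / v2 calls (hypothetical, for illustration only).
def fake_v1(audio: bytes) -> str:
    time.sleep(0.05)
    return "hallo welt"


def fake_v2(audio: bytes) -> str:
    time.sleep(0.02)
    return "Hallo Welt."


results = run_side_by_side({"v1": fake_v1, "v2": fake_v2}, b"\x00" * 16)
```

Threads are sufficient here because each engine call is network-bound; each latency is measured inside its own worker, so one slow engine does not inflate the other's figure.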

Next Steps

  1. Run benchmark with real meeting/coaching audio samples across de-DE, de-CH, en-US.
  2. Compare latency + quality. Document in this file.
  3. If Chirp 2 wins: add v2 client path to connectorVoiceGoogle.py behind feature flag.
  4. Run A/B on CommCoach streaming and Teamsbot batch paths with identical audio fixtures.
  5. Document decision in wiki/b-reference/ and remove flag or make v2 default.
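Step 3 could look roughly like the sketch below, which selects the client path from a flag. The environment variable `STT_USE_CHIRP_V2` and the function `select_stt_engine` are assumptions for illustration; the actual flag mechanism used by the gateway may differ.

```python
import os
from typing import Mapping, Optional


def select_stt_engine(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "v2" when the (hypothetical) flag is enabled, otherwise "v1"."""
    env = os.environ if env is None else env
    flag = env.get("STT_USE_CHIRP_V2", "").strip().lower()
    return "v2" if flag in {"1", "true", "yes", "on"} else "v1"
```

Defaulting to v1 when the flag is absent or unrecognised keeps the current production behaviour unchanged until the decision in step 5 is made.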

Notes

  • Streaming and batch config differ between v1 and v2; keep VoiceObjects as the single facade.
  • Billing hooks (calculateSttCostCHF) must use the measured audio duration (see result_end_time in streaming results), not heuristics based on compressed byte size.
  • google-cloud-speech==2.21.0 includes speech_v2 module — no dependency upgrade needed.
  • Chirp 2 is v2-only and requires a regional endpoint ({location}-speech.googleapis.com).
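Two of the notes above lend themselves to a short sketch: building the regional endpoint, and deriving billable duration from measured result times. All function names are hypothetical, and `calculateSttCostCHF` itself is not reproduced here because its signature is not documented in this file. The sketch assumes that result_end_time values arrive as `datetime.timedelta` objects measured from the start of the audio, and that the per-minute rate is supplied by the caller.

```python
from datetime import timedelta
from typing import Iterable


def regional_endpoint(location: str) -> str:
    """Host name following the {location}-speech.googleapis.com pattern."""
    return f"{location}-speech.googleapis.com"


def default_recognizer(project_id: str, location: str) -> str:
    """Resource path of the v2 default recognizer ("_") in the given location."""
    return f"projects/{project_id}/locations/{location}/recognizers/_"


def billable_seconds(result_end_times: Iterable[timedelta]) -> float:
    """Audio covered by one stream: the largest end offset seen, in seconds."""
    return max((t.total_seconds() for t in result_end_times), default=0.0)


def estimate_cost_chf(seconds: float, chf_per_minute: float) -> float:
    """Cost from measured duration; the rate is an input, not a known price."""
    return seconds / 60.0 * chf_per_minute
```

For example, `regional_endpoint("europe-west4")` yields `europe-west4-speech.googleapis.com` (the region is only an example value). The location in the recognizer path has to match the location of the endpoint the v2 client connects to.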