service-teams-browser-bot/DOCUMENTATION.md
ValueOn AG 6b4172c46a docs: add documentation, update README, add marketing page
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-18 17:51:28 +01:00

Teams Browser Bot — Technical Documentation

Last updated: 2026-02-18

Table of Contents

  1. Business Story & Vision
  2. Use Cases
  3. System Architecture
  4. Components
  5. Data Model
  6. Call Flow
  7. Voice Flow (TTS Playback)
  8. Data Flow (Transcript Pipeline)
  9. WebSocket Protocol
  10. AI Analysis Pipeline
  11. Authentication & Credentials
  12. Teams DOM Interaction
  13. Deployment
  14. Configuration Reference
  15. Known Constraints & Lessons Learned

1. Business Story & Vision

Problem

Organizations use Microsoft Teams for meetings where important decisions, discussions, and action items occur. Without an automated assistant, teams rely on manual note-taking, miss context from earlier discussions, and lose the ability to query meeting content in real time.

Solution

The Teams Browser Bot is an AI-powered meeting participant that:

  • Joins any Teams meeting as an authenticated user (or anonymous guest)
  • Listens by capturing live captions from the Teams web interface
  • Understands by analyzing transcript segments through an AI model (GPT-4o-mini / Claude)
  • Responds via voice (TTS played through the microphone channel) and/or chat messages
  • Documents by persisting full transcripts and generating meeting summaries

The bot operates as a real participant — it appears in the meeting roster, can speak, and can write in the meeting chat.

Key Differentiator

Unlike Microsoft Graph Communications SDK bots (which require tenant admin registration and complex media handling), this bot uses browser automation (Playwright + Chromium) to join meetings as a regular web user. This enables:

  • Multi-tenant support: Join any meeting from any organization
  • No tenant admin approval required
  • Standard web technologies: DOM scraping, getUserMedia, WebRTC
  • Full meeting interaction: chat, captions, audio playback

2. Use Cases

UC-1: AI Meeting Assistant

A user starts a Teams meeting and invites the bot. The bot joins, listens to the conversation, and responds when addressed by name ("Hey Nyla, what do you think about...?"). Responses are delivered via voice and/or chat based on configuration.

UC-2: Live Transcription

The bot captures all live captions with speaker attribution and streams them to the frontend UI in real time via SSE. Users not in the meeting can follow along.

UC-3: Meeting Summary

When the session ends, the bot generates an AI-powered summary of the entire meeting, stored on the session record.

UC-4: Voice Test

Before joining a real meeting, integrators can test the TTS pipeline via a dedicated endpoint that generates and returns an audio sample.

UC-5: Multi-Bot Operations

Multiple bot sessions can run concurrently — each in its own browser instance, each connected to a different meeting with separate WebSocket channels.


3. System Architecture

┌──────────────────────────────────────────────────────────────────────────────────┐
│                              System Overview                                      │
│                                                                                  │
│  ┌──────────┐         SSE            ┌────────────────┐        WebSocket         │
│  │ Frontend │◄───────────────────────│    Gateway      │◄──────────────────────┐ │
│  │ (React)  │   transcripts,         │   (Python /     │   transcripts,        │ │
│  │          │   botResponses,        │    FastAPI)     │   chatMessages,       │ │
│  │          │   analysis,            │                 │   status,             │ │
│  │          │   status               │  - Session Mgmt │   audioChunks,        │ │
│  │          │────────────────────────►  - AI Analysis  │   voiceGreeting       │ │
│  │          │   REST (start/stop/    │  - TTS (Google) │                       │ │
│  │          │   config)              │  - Billing      │   playAudio,          │ │
│  └──────────┘                        │  - DB (Cosmos)  │   sendChatMessage,    │ │
│                                      │                 │   stopAudio           │ │
│                                      └────────┬───────┘──────────────────────►│ │
│                                               │ HTTP                          │ │
│                                               │ (join/leave/status)           │ │
│                                               ▼                               │ │
│                                      ┌────────────────┐                       │ │
│                                      │  Browser Bot   │◄──────────────────────┘ │
│                                      │  (Node.js +    │                         │
│                                      │   Playwright)  │                         │
│                                      │                │                         │
│                                      │  ┌──────────┐ │                         │
│                                      │  │ Chromium  │ │                         │
│                                      │  │ (Teams    │ │                         │
│                                      │  │  Web App) │ │                         │
│                                      │  └──────────┘ │                         │
│                                      └────────────────┘                         │
└──────────────────────────────────────────────────────────────────────────────────┘

Communication Paths

| Path | Protocol | Direction | Purpose |
|---|---|---|---|
| Frontend ↔ Gateway | REST (HTTPS) | Bidirectional | Session management, config, system bots |
| Frontend ← Gateway | SSE | Gateway → Frontend | Real-time transcript & response stream |
| Gateway ↔ Browser Bot | WebSocket | Bidirectional | Transcripts, audio, status, chat, commands |
| Gateway → Browser Bot | HTTP POST | Gateway → Bot | Session creation (/api/bot), leave, status |
| Browser Bot ↔ Teams | Chromium/WebRTC | Bidirectional | Meeting participation, captions, chat, audio |

4. Components

4.1 Browser Bot Service (this repository)

| File | Responsibility |
|---|---|
| src/index.ts | Entry point: bootstrap, shutdown handlers |
| src/config.ts | Environment config with defaults and timeouts |
| src/sessionManager.ts | Session lifecycle: create, end, play audio, shutdown |
| src/server/httpServer.ts | Express HTTP API (health, join, leave, status, auth tests) |
| src/server/gatewayClient.ts | Alternative WebSocket client for Gateway (legacy path) |
| src/types/index.ts | TypeScript interfaces for all message types |
| src/utils/logger.ts | Winston logger with session-scoped child loggers |

Bot Modules (src/bot/)

| Module | Class | Responsibility |
|---|---|---|
| orchestrator.ts | BotOrchestrator | Main coordinator: browser launch, join flow, keepalive, greeting, Gateway WebSocket, state machine |
| joinProcedure.ts | JoinProcedure | Anonymous join: launcher page, name entry, "Join now", lobby handling |
| authProcedure.ts | AuthProcedure | Microsoft login: email → password → MFA check → "Stay signed in" |
| captionsProcedure.ts | CaptionsProcedure | Enable live captions via "More" menu, MutationObserver on caption DOM |
| chatProcedure.ts | ChatProcedure | Open chat panel, MutationObserver on [role="log"], send messages via CKEditor |
| audioProcedure.ts | AudioProcedure | getUserMedia override, AudioContext, queue-based MP3/WAV/PCM playback into mic stream |
| audioCaptureProcedure.ts | AudioCaptureProcedure | RTCPeerConnection wrapper, ScriptProcessor, PCM16 16kHz capture, 500ms polling |
| backgroundProcedure.ts | BackgroundProcedure | Virtual background image upload (pre-join, currently unused) |
| meetingUrlParser.ts | (functions) | URL validation, classic vs short format, redirect resolution |
| authTestProcedure.ts | (functions) | Anti-detection test variants for debugging auth flow |

4.2 Gateway (external, Python/FastAPI)

| File | Responsibility |
|---|---|
| routeFeatureTeamsbot.py | REST routes, SSE stream, WebSocket endpoint |
| service.py | Business logic: transcript processing, AI triggers, TTS, meeting summary |
| datamodelTeamsbot.py | Pydantic models, enums |
| interfaceFeatureTeamsbot.py | Database interface (Cosmos DB) |
| config.py | Feature instance config load/save |
| browserBotConnector.py | HTTP client for Browser Bot API |

4.3 Frontend (external, React/TypeScript)

The frontend provides a session management UI with:

  • Meeting link input and session start/stop controls
  • Real-time transcript display (via SSE)
  • Bot response log with reasoning, model, cost
  • Configuration panel (bot name, response channel, AI prompt, etc.)
  • System bot management (email, password)

5. Data Model

5.1 Enums

| Enum | Values | Description |
|---|---|---|
| TeamsbotSessionStatus | pending, joining, active, leaving, ended, error | Session lifecycle state |
| TeamsbotResponseType | audio, chat, both | How the bot responded |
| TeamsbotResponseChannel | voice, chat, both | Configured response channel (user setting) |
| TeamsbotResponseMode | auto, manual, transcribeOnly | Whether bot responds automatically |
| TeamsbotDetectedIntent | addressed, question, proactive, stop, none | AI-detected intent |
| TeamsbotJoinMode | systemBot, anonymous, userAccount | How the bot joins the meeting |
| TeamsbotTransferMode | caption, audio, auto | How transcript data is captured |

5.2 Core Entities

TeamsbotSession

| Field | Type | Description |
|---|---|---|
| id | UUID | Session identifier |
| instanceId | string | Feature instance |
| mandateId | string | Tenant/mandate |
| meetingLink | string | Teams meeting URL |
| botName | string | Display name in meeting |
| status | TeamsbotSessionStatus | Current state |
| startedAt / endedAt | datetime | Timestamps |
| startedByUserId | string | Who started the session |
| sessionContext | string | Optional context for AI |
| summary | string | AI-generated meeting summary |
| errorMessage | string | Error details if failed |
| transcriptSegmentCount | int | Running count |
| botResponseCount | int | Running count |

TeamsbotTranscript

| Field | Type | Description |
|---|---|---|
| id | UUID | Segment identifier |
| sessionId | UUID | Parent session |
| speaker | string | Speaker name from captions |
| text | string | Transcript text |
| timestamp | datetime | When spoken |
| confidence | float (0-1) | Confidence score |
| language | string | Detected language |
| isFinal | bool | Finalized segment |

TeamsbotBotResponse

| Field | Type | Description |
|---|---|---|
| id | UUID | Response identifier |
| sessionId | UUID | Parent session |
| responseText | string | What the bot said |
| responseType | TeamsbotResponseType | audio, chat, or both |
| detectedIntent | TeamsbotDetectedIntent | Why it responded |
| reasoning | string | AI reasoning chain |
| modelName | string | AI model used |
| processingTime | float | Seconds |
| priceCHF | float | Cost in CHF |

TeamsbotSystemBot

| Field | Type | Description |
|---|---|---|
| id | UUID | Bot account identifier |
| mandateId | string | Tenant scope |
| name | string | Display name |
| email | string | Microsoft account email |
| encryptedPassword | string | Fernet-encrypted password |
| isActive | bool | Whether this bot is the active one |

TeamsbotConfig (Feature Instance Level)

| Field | Type | Default | Description |
|---|---|---|---|
| botName | string | "PowerOn AI" | Default bot name (overridden by system bot) |
| aiSystemPrompt | string | "" | Custom AI instructions |
| responseMode | enum | auto | auto / manual / transcribeOnly |
| responseChannel | enum | voice | voice / chat / both |
| transferMode | enum | auto | caption / audio / auto |
| language | string | "de-DE" | Bot language |
| voiceId | string | null | TTS voice identifier |
| browserBotUrl | string | null | Browser Bot service URL |
| triggerIntervalSeconds | int | 10 | Periodic AI trigger interval |
| triggerCooldownSeconds | int | 5 | Min time between triggers |
| contextWindowSegments | int | 20 | Transcript segments sent to AI |

TeamsbotUserSettings (Per-User Overrides)

Mirrors the TeamsbotConfig fields, all of them optional. _getEffectiveConfig() merges any values set here over the instance config, so a per-user value wins whenever it is present.
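The merge can be pictured with a small Python sketch. It is not the Gateway's `_getEffectiveConfig()` implementation: the dataclasses, the reduced field set, and the function name `get_effective_config` are hypothetical stand-ins, and the real models are Pydantic. The enum normalization step reflects the note in section 15 that string values are not coerced to enums on copy.

```python
from dataclasses import dataclass, fields, replace
from enum import Enum
from typing import Optional, Union


class ResponseChannel(Enum):
    VOICE = "voice"
    CHAT = "chat"
    BOTH = "both"


@dataclass(frozen=True)
class InstanceConfig:
    # Hypothetical subset of TeamsbotConfig, with the documented defaults.
    botName: str = "PowerOn AI"
    responseChannel: ResponseChannel = ResponseChannel.VOICE
    triggerIntervalSeconds: int = 10


@dataclass(frozen=True)
class UserSettings:
    # Every field is optional; None means "no override".
    botName: Optional[str] = None
    responseChannel: Union[ResponseChannel, str, None] = None
    triggerIntervalSeconds: Optional[int] = None


def get_effective_config(instance: InstanceConfig, user: UserSettings) -> InstanceConfig:
    """Overlay the non-None user settings on top of the instance config."""
    overrides = {
        f.name: getattr(user, f.name)
        for f in fields(user)
        if getattr(user, f.name) is not None
    }
    # Normalize after the merge: a raw string such as "chat" becomes the enum member.
    if "responseChannel" in overrides:
        overrides["responseChannel"] = ResponseChannel(overrides["responseChannel"])
    return replace(instance, **overrides)


effective = get_effective_config(InstanceConfig(), UserSettings(responseChannel="chat"))
```

Fields the user left unset keep the instance value, so `effective.botName` is still the instance default here.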


6. Call Flow

6.1 Session Start (Authenticated Join)

Frontend                    Gateway                     Browser Bot              Teams Web
   │                           │                            │                       │
   │  POST /sessions           │                            │                       │
   │  {meetingLink, botName}   │                            │                       │
   │──────────────────────────►│                            │                       │
   │                           │  Resolve system bot        │                       │
   │                           │  Decrypt password          │                       │
   │                           │  Derive bot name from email│                       │
   │                           │                            │                       │
   │                           │  POST /api/bot             │                       │
   │                           │  {sessionId, meetingUrl,   │                       │
   │                           │   botAccountEmail,         │                       │
   │                           │   botAccountPassword}      │                       │
   │                           │───────────────────────────►│                       │
   │                           │                            │                       │
   │  ◄── session created ─────│                            │  Launch Chromium      │
   │                           │                            │  (Xvfb, headful)     │
   │  GET /sessions/:id/stream │                            │                       │
   │  (SSE)                    │                            │                       │
   │──────────────────────────►│                            │                       │
   │                           │◄─── WebSocket connected ───│                       │
   │                           │                            │                       │
   │                           │                            │  Navigate to          │
   │                           │                            │  teams.microsoft.com  │
   │                           │                            │─────────────────────►│
   │                           │                            │                       │
   │                           │                            │  MS Login:            │
   │                           │                            │  email → password     │
   │                           │                            │  → "Stay signed in"   │
   │                           │                            │─────────────────────►│
   │                           │                            │                       │
   │                           │                            │  Teams loads          │
   │                           │                            │  Click "Join" header  │
   │                           │                            │◄─────────────────────│
   │                           │                            │                       │
   │                           │                            │  Pre-join screen:     │
   │                           │                            │  Ensure mic ON        │
   │                           │                            │  Camera stays OFF     │
   │                           │                            │  Click "Join now"     │
   │                           │                            │─────────────────────►│
   │                           │                            │                       │
   │  ◄── SSE: statusChange ───│◄─── status: "joined" ─────│  In meeting!          │
   │      {status: "active"}   │                            │                       │
   │                           │                            │  Start keepalive      │
   │                           │                            │  Init AudioContext    │
   │                           │                            │  Enable captions      │
   │                           │                            │  Enable chat          │
   │                           │                            │  Send greeting        │
   │                           │                            │  (chat + voice TTS)   │

6.2 Session End

Frontend                    Gateway                     Browser Bot              Teams
   │                           │                            │                       │
   │  POST /sessions/:id/stop  │                            │                       │
   │──────────────────────────►│                            │                       │
   │                           │  POST /api/bot/:id/leave   │                       │
   │                           │───────────────────────────►│                       │
   │                           │                            │  Stop keepalive       │
   │                           │                            │  Stop audio capture   │
   │                           │                            │  Unsubscribe captions │
   │                           │                            │  Unsubscribe chat     │
   │                           │                            │  Click hangup button  │
   │                           │                            │─────────────────────►│
   │                           │                            │  Close browser        │
   │                           │                            │  Close WS             │
   │                           │◄─── status: "left" ────────│                       │
   │                           │                            │                       │
   │                           │  Generate meeting summary  │                       │
   │                           │  (AI on full transcript)   │                       │
   │                           │  Update session → "ended"  │                       │
   │  ◄── SSE: statusChange ───│                            │                       │
   │      {status: "ended"}    │                            │                       │

7. Voice Flow (TTS Playback)

How Audio Reaches Meeting Participants

Gateway                      Browser Bot                    Chromium / Teams
  │                             │                              │
  │  AI generates response      │                              │
  │  TTS (Google Cloud) →       │                              │
  │  base64 MP3                 │                              │
  │                             │                              │
  │  WS: playAudio              │                              │
  │  {audio: {data, format}}    │                              │
  │────────────────────────────►│                              │
  │                             │  Queue audio                 │
  │                             │  Decode base64 → ArrayBuffer │
  │                             │  decodeAudioData(buffer)     │
  │                             │                              │
  │                             │  Create AudioBufferSource    │
  │                             │  Connect to                  │
  │                             │  MediaStreamDestination      │
  │                             │  (overridden getUserMedia)   │
  │                             │─────────────────────────────►│
  │                             │                              │  WebRTC sends audio
  │                             │                              │  to all participants
  │                             │                              │  via microphone channel

Audio Override Mechanism

The AudioProcedure injects a script before page load (page.addInitScript) that:

  1. Overrides navigator.mediaDevices.getUserMedia to return a custom MediaStream
  2. Creates a shared AudioContext with a MediaStreamDestination
  3. When Teams calls getUserMedia({ audio: true }), it receives the destination's stream
  4. TTS audio is decoded and played through an AudioBufferSourceNode connected to this destination
  5. The result: TTS audio flows through the "microphone" channel into the meeting

Voice Greeting Flow

When the bot joins a meeting:

  1. Bot sends voiceGreeting message to Gateway with greeting text and language
  2. Gateway calls TTS (Google Cloud) with the configured voice
  3. Gateway sends playAudio back to bot via WebSocket
  4. Bot plays the audio through the mic stream
  5. Meeting participants hear the bot speaking

8. Data Flow (Transcript Pipeline)

Caption Mode (Primary)

Teams Web UI                Browser Bot                    Gateway                  Frontend
  │                            │                             │                        │
  │  Live captions appear      │                             │                        │
  │  in overlay div            │                             │                        │
  │  [data-tid="closed-        │                             │                        │
  │   caption-renderer-        │                             │                        │
  │   wrapper"]                │                             │                        │
  │                            │                             │                        │
  │  MutationObserver fires    │                             │                        │
  │───────────────────────────►│                             │                        │
  │                            │  Extract speaker + text     │                        │
  │                            │  Dedup, noise filter        │                        │
  │                            │                             │                        │
  │                            │  WS: transcript             │                        │
  │                            │  {speaker, text, isFinal}   │                        │
  │                            │────────────────────────────►│                        │
  │                            │                             │  Store in DB           │
  │                            │                             │  Add to context buffer │
  │                            │                             │  Emit SSE: transcript  │
  │                            │                             │───────────────────────►│
  │                            │                             │                        │  Display
  │                            │                             │                        │
  │                            │                             │  _shouldTriggerAnalysis│
  │                            │                             │  → if yes:             │
  │                            │                             │  SPEECH_TEAMS AI call  │
  │                            │                             │                        │
  │                            │                             │  Emit SSE: analysis    │
  │                            │                             │───────────────────────►│
  │                            │                             │                        │
  │                            │                             │  If shouldRespond:     │
  │                            │                             │  TTS → playAudio       │
  │                            │  ◄── WS: playAudio ────────│  and/or sendChatMessage│
  │                            │                             │───────────────────────►│
  │                            │                             │  Emit SSE: botResponse │

Audio Mode (Alternative)

When transferMode is audio:

  1. AudioCaptureProcedure wraps RTCPeerConnection to intercept incoming audio tracks
  2. Audio is downsampled to PCM16 mono 16kHz via ScriptProcessorNode
  3. 500ms chunks are base64-encoded and sent as audioChunk messages
  4. Gateway runs STT (Google Cloud Speech) on the chunks
  5. STT results enter the same transcript pipeline
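
To make the chunk sizes concrete, here is a minimal Python sketch of steps 2 and 3. The real capture runs in the browser (TypeScript), so this only illustrates the arithmetic and the encoding. It assumes the samples are already mono floats at 16 kHz, and the `"type"` key, the `"pcm16"` format label, and the omission of `timestamp` are assumptions rather than the actual wire format.

```python
import base64
import struct
from typing import Dict, Iterator, List

SAMPLE_RATE = 16_000                                  # Hz, mono
CHUNK_MS = 500
SAMPLES_PER_CHUNK = SAMPLE_RATE * CHUNK_MS // 1000    # 8000 samples per chunk


def float_to_pcm16(samples: List[float]) -> bytes:
    """Clip floats to [-1, 1] and pack them as little-endian signed 16-bit integers."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    return struct.pack(f"<{len(clipped)}h", *(int(s * 32767) for s in clipped))


def audio_chunk_messages(session_id: str, samples: List[float]) -> Iterator[Dict]:
    """Yield one audioChunk message per 500 ms of audio."""
    for start in range(0, len(samples), SAMPLES_PER_CHUNK):
        pcm = float_to_pcm16(samples[start:start + SAMPLES_PER_CHUNK])
        yield {
            "type": "audioChunk",          # assumed discriminator key
            "sessionId": session_id,
            "audio": {
                "data": base64.b64encode(pcm).decode("ascii"),
                "sampleRate": SAMPLE_RATE,
                "format": "pcm16",         # assumed label
            },
        }
```

At 16 kHz and 2 bytes per sample, each 500 ms chunk carries 16,000 bytes of PCM before base64 encoding.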

Chat Messages

Teams Chat Panel            Browser Bot                    Gateway                  Frontend
  │                            │                             │                        │
  │  New message in            │                             │                        │
  │  [role="log"] container    │                             │                        │
  │                            │                             │                        │
  │  MutationObserver fires    │                             │                        │
  │───────────────────────────►│                             │                        │
  │                            │  Extract sender + text      │                        │
  │                            │  Dedup, noise filter        │                        │
  │                            │                             │                        │
  │                            │  WS: chatMessage            │                        │
  │                            │  {chat: {speaker, text}}    │                        │
  │                            │────────────────────────────►│                        │
  │                            │                             │  Process as transcript │
  │                            │                             │  (source: "chat")      │
  │                            │                             │  Same AI pipeline      │

9. WebSocket Protocol

Connection

The Browser Bot connects to the Gateway at:

wss://{gatewayHost}/api/teamsbot/{instanceId}/bot/ws/{sessionId}

Messages: Bot → Gateway

| Type | Fields | When |
|---|---|---|
| transcript | sessionId, transcript: {speaker, text, timestamp, isFinal} | Caption captured |
| chatMessage | sessionId, chat: {speaker, text, timestamp} | Meeting chat message received |
| status | sessionId, status, message? | Bot state changes (connecting, in_lobby, joined, left, error) |
| audioChunk | sessionId, audio: {data, sampleRate, format, timestamp} | PCM16 audio captured (audio mode) |
| voiceGreeting | sessionId, text, language | Request TTS for join greeting |
| ping | | Keepalive (every 30s) |

Messages: Gateway → Bot

| Type | Fields | When |
|---|---|---|
| playAudio | sessionId, audio: {data, format} | TTS response or greeting to play |
| sendChatMessage | sessionId, text | Chat response to send |
| stopAudio | sessionId | AI detected "stop" intent |
| pong | | Reply to ping |
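
As an illustration of the two message tables, the following Python sketch shows how a Gateway-side handler might validate incoming messages and build a playAudio command. The JSON key `"type"` as the discriminator and both function names are assumptions; the source only lists the message types and their fields.

```python
import json
from typing import Any, Dict, Optional

# Message types the bot may send, taken from the Bot → Gateway table.
BOT_TO_GATEWAY = {"transcript", "chatMessage", "status", "audioChunk", "voiceGreeting", "ping"}


def build_play_audio(session_id: str, data_b64: str, audio_format: str = "mp3") -> str:
    """Serialize a playAudio command carrying base64-encoded audio."""
    return json.dumps({
        "type": "playAudio",
        "sessionId": session_id,
        "audio": {"data": data_b64, "format": audio_format},
    })


def handle_bot_message(raw: str) -> Optional[Dict[str, Any]]:
    """Return an immediate reply for the bot, or None if no reply is needed."""
    msg = json.loads(raw)
    msg_type = msg.get("type")
    if msg_type not in BOT_TO_GATEWAY:
        raise ValueError(f"unknown message type: {msg_type!r}")
    if msg_type == "ping":
        return {"type": "pong"}
    # transcript, chatMessage, status, audioChunk and voiceGreeting would be
    # routed to their own handlers here; the sketch leaves that part out.
    return None
```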

10. AI Analysis Pipeline

Trigger Logic (_shouldTriggerAnalysis)

The Gateway decides when to call the AI model. Three trigger paths:

  1. Name Trigger (highest priority, overrides cooldown): If the bot's name (or first name, or a phonetically similar word) appears in the latest transcript segment → immediate trigger. Phonetic matching uses: same first letter, length difference ≤ 2, character overlap ≥ 60%.

  2. Cooldown Gate: If time since last AI call < triggerCooldownSeconds (default 5s) → no trigger.

  3. Periodic Trigger: If time since last AI call ≥ triggerIntervalSeconds (default 10s) → trigger.
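
The three rules can be sketched in Python as follows. This is an approximation, not the Gateway's `_shouldTriggerAnalysis`: the function names are hypothetical, and "character overlap" is interpreted here as the share of the name's distinct letters that also occur in the candidate word, which is one possible reading of the 60% rule.

```python
import re


def sounds_like(word: str, name: str) -> bool:
    """Phonetic heuristic: same first letter, length difference <= 2, overlap >= 60%."""
    w, n = word.lower(), name.lower()
    if not w or not n or w[0] != n[0]:
        return False
    if abs(len(w) - len(n)) > 2:
        return False
    overlap = len(set(w) & set(n)) / len(set(n))  # assumed definition of overlap
    return overlap >= 0.6


def mentions_bot(text: str, bot_name: str) -> bool:
    """True if the full name, the first name, or a similar-sounding word appears."""
    lowered = text.lower()
    if bot_name and bot_name.lower() in lowered:
        return True
    parts = bot_name.split()
    if not parts:
        return False
    first_name = parts[0]
    return any(sounds_like(word, first_name) for word in re.findall(r"\w+", lowered))


def should_trigger_analysis(text: str, bot_name: str, seconds_since_last_call: float,
                            cooldown: float = 5.0, interval: float = 10.0) -> bool:
    if mentions_bot(text, bot_name):
        return True                      # 1. name trigger overrides the cooldown
    if seconds_since_last_call < cooldown:
        return False                     # 2. cooldown gate
    return seconds_since_last_call >= interval   # 3. periodic trigger
```

Under this reading, "Naila" matches "Nyla" even one second after the previous call, while a caption variant with a different first letter would not pass the heuristic and would have to be caught by the AI prompt instead.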

AI Call (_handleSpeechTeams)

Model selection priority: gpt-4o-mini → claude-3-5-haiku → gpt-4o → claude-sonnet-4-5 → fastest available DATA_ANALYSE model.

System prompt (built dynamically with bot name):

  • Role: "You are '{botName}', an AI participant in a Teams meeting"
  • Respond ONLY when directly addressed by name (including phonetic variants)
  • Match the language of the speaker who addressed you
  • 1-2 sentence responses max
  • Detect "stop" commands in any language
  • Output strict JSON: {shouldRespond, responseText, reasoning, detectedIntent}

Context window: Up to contextWindowSegments (default 20) recent transcript lines, prefixed with BOT_NAME: and optional SESSION_CONTEXT:.
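
Because the model is asked for strict JSON, the Gateway has to cope with output that does not follow the contract. The sketch below shows one defensive way to parse it in Python. The `AnalysisResult` dataclass, the `parse_analysis` name, and the choice to stay silent on malformed output are assumptions, not the behavior of `_handleSpeechTeams`.

```python
import json
from dataclasses import dataclass

# Intent values from the TeamsbotDetectedIntent enum.
VALID_INTENTS = {"addressed", "question", "proactive", "stop", "none"}


@dataclass
class AnalysisResult:
    shouldRespond: bool
    responseText: str
    reasoning: str
    detectedIntent: str


def parse_analysis(raw: str) -> AnalysisResult:
    """Parse the model output; fall back to 'do not respond' if it is malformed."""
    silent = AnalysisResult(False, "", "unparseable model output", "none")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return silent
    if not isinstance(data, dict):
        return silent
    intent = data.get("detectedIntent")
    if not isinstance(intent, str) or intent not in VALID_INTENTS:
        intent = "none"
    return AnalysisResult(
        shouldRespond=data.get("shouldRespond") is True,   # only a JSON true counts
        responseText=str(data.get("responseText") or ""),
        reasoning=str(data.get("reasoning") or ""),
        detectedIntent=intent,
    )
```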

Response Handling

| Intent | Action |
|---|---|
| stop | Send stopAudio to bot, no response |
| addressed / question / proactive | Auto mode: TTS + chat (per config). Manual mode: SSE suggestedResponse only |
| none | No action |
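
The table above amounts to a small decision function. The Python sketch below returns the names of the actions to perform. The function name is hypothetical, and two behaviors are assumptions that the source does not spell out: that `transcribeOnly` produces no response at all, and that a stop intent is honored in every response mode.

```python
from typing import List


def plan_actions(intent: str, response_mode: str, response_channel: str) -> List[str]:
    """Map a detected intent plus the effective config to outgoing actions."""
    if intent == "stop":
        return ["stopAudio"]              # assumed to apply in every response mode
    if intent == "none":
        return []
    if response_mode == "manual":
        return ["suggestedResponse"]      # SSE event only, nothing is sent to the bot
    if response_mode == "transcribeOnly":
        return []                         # assumption: the bot never answers in this mode
    actions: List[str] = []
    if response_channel in ("voice", "both"):
        actions.append("playAudio")
    if response_channel in ("chat", "both"):
        actions.append("sendChatMessage")
    return actions
```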

11. Authentication & Credentials

Credential Storage

System bot credentials are stored per mandate:

  • Email: Stored in plaintext in TeamsbotSystemBot.email
  • Password: Encrypted with Fernet (AES-128-CBC), key derived via PBKDF2 from a master key
  • Decryption: Gateway decrypts at session start and passes credentials to Browser Bot via HTTP POST body
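
For orientation, a Fernet key is 32 random or derived bytes, base64url-encoded. The Python sketch below shows how such a key could be derived from a master key with PBKDF2 using only the standard library. The salt handling, the SHA-256 hash, the iteration count, and the function name are assumptions; the source does not document the Gateway's actual parameters.

```python
import base64
import hashlib


def derive_fernet_key(master_key: bytes, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key via PBKDF2-HMAC-SHA256 and encode it the way Fernet expects."""
    raw = hashlib.pbkdf2_hmac("sha256", master_key, salt, iterations, dklen=32)
    return base64.urlsafe_b64encode(raw)


# Illustrative values only; a real deployment would load these from secret storage.
example_key = derive_fernet_key(b"example-master-key", b"example-salt", iterations=1_000)
```

The resulting key would then be handed to a Fernet implementation (for example the `Fernet` class of the `cryptography` package) to encrypt and decrypt the password; that step is left out here to keep the sketch dependency-free.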

Microsoft Login Flow

1. Navigate to teams.microsoft.com
2. Redirect to login.microsoftonline.com
3. Enter email in #i0116 input → Click "Next" (input[type="submit"])
4. Wait for password page (may redirect to org-specific login)
5. Enter password → Click "Sign in" (#idSIButton9)
6. Handle "Stay signed in?" → Click "Yes"
7. Teams loads with authenticated session

Anti-Detection

The bot uses several measures to appear as a regular browser:

  • rebrowser-playwright with puppeteer-extra-plugin-stealth
  • Headful mode via Xvfb (Teams blocks headless Chromium)
  • Standard Chrome launch arguments (disable automation, sandbox flags)
  • Real viewport (1280x720), locale/timezone matching

12. Teams DOM Interaction

Live Captions

Enable flow: "More" button (#callingButtons-showMoreBtn) → "Language and speech" (#LanguageSpeechMenuControl-id) → "Show live captions" (button with aria-checked)

Scraping target: div[data-tid="closed-caption-renderer-wrapper"]

Extraction strategies:

  • Strategy A: [data-tid] containers with speaker in <span> title/text, content in adjacent spans
  • Strategy B: Structural fallback scanning <div> trees for speaker + text patterns

Noise filter: Ignores entries matching known non-transcript patterns (buttons, timestamps without content, single-word UI elements).

Chat Panel

Open: Click button[id="chat-button"] — but ONLY if aria-pressed !== "true" (prevents toggle-off)

Scraping target: Container with [role="log"]

Send messages: Find CKEditor input ([data-tid="ckeditor-replyConversation"] or div[role="textbox"]), type text, press Enter.

Meeting Controls

| Control | Selector | Notes |
|---|---|---|
| Mic toggle | input[data-tid="toggle-audio"] or input[role="switch"][title*="mic" i] | Check checked state before toggling |
| Hangup | button[id="hangup-button"] or #hangup-button | |
| More menu | button[id="callingButtons-showMoreBtn"] | |
| Join now | button[data-tid="prejoin-join-button"] | |

13. Deployment

Infrastructure

| Component | Platform | Details |
|---|---|---|
| Browser Bot | Azure Container Apps | 2 CPU, 4GB RAM, Xvfb for headful mode |
| Gateway | Azure Container Apps | Shared instance (cae-poweron-shared) |
| Database | Azure Cosmos DB | Sessions, transcripts, responses, system bots |
| Container Registry | Azure Container Registry | Images tagged by Git SHA |

CI/CD Pipeline

Trigger: Push to main branch

Steps:

  1. Checkout code
  2. Docker login to ACR
  3. Build image with Playwright base + Xvfb
  4. Push with tags latest and {git-sha}
  5. Azure login
  6. az containerapp update with --revision-suffix deploy-{sha-prefix} (forces new revision)
  7. az containerapp revision restart on latest revision (ensures container starts)

Environment Variables (Production)

| Variable | Value |
|---|---|
| PORT | 4100 |
| NODE_ENV | production |
| BOT_HEADLESS | false (headful via Xvfb) |
| GATEWAY_WS_URL | wss://gateway-int.poweron-center.net/api/teamsbot/ws |
| DISPLAY | :99 (Xvfb) |

14. Configuration Reference

Browser Bot Environment

| Variable | Description | Default |
|---|---|---|
| PORT | HTTP server port | 4100 |
| GATEWAY_WS_URL | Gateway WebSocket URL | wss://gateway-int.poweron-center.net/api/teamsbot/ws |
| BOT_NAME | Default display name | PowerOn AI |
| BOT_HEADLESS | Run headless | true (false in Docker) |
| LOG_LEVEL | Winston log level | info |
| SCREENSHOT_ON_ERROR | Screenshots on errors | true |

Gateway Feature Config

| Field | Default | Range | Description |
|---|---|---|---|
| responseMode | auto | auto, manual, transcribeOnly | |
| responseChannel | voice | voice, chat, both | |
| transferMode | auto | caption, audio, auto | |
| language | de-DE | BCP-47 language tag | |
| triggerIntervalSeconds | 10 | 3-60 | Periodic AI trigger |
| triggerCooldownSeconds | 5 | 1-30 | Min gap between triggers |
| contextWindowSegments | 20 | 5-100 | Transcript lines for AI context |

Timeouts (Hardcoded in config.ts)

| Timeout | Value | Purpose |
|---|---|---|
| lobbyWait | 120s | Max time waiting in lobby |
| joinTimeout | 30s | Max time for join flow |
| captionsEnable | 10s | Max time to enable captions |
| pageLoad | 30s | General page load timeout |

15. Known Constraints & Lessons Learned

Things That DO NOT Work

| Approach | Why It Fails | Alternative |
|---|---|---|
| Fake video injection (--use-file-for-fake-video-capture) | Crashes Chromium renderer when WebRTC audio starts | Keep camera OFF |
| Headless Chromium | Teams detects and blocks headless browsers | Use headful + Xvfb |
| _setSpokenLanguage() for live captions | Language dropdown doesn't exist; captions use organizer's language setting | Skip entirely |
| str(PythonEnum) for comparison | Returns "ClassName.member", not the plain value | Use enum.value |
| Blindly clicking chat button | Toggles panel off if already open | Check aria-pressed first |
| config.botAccountEmail | TeamsbotConfig doesn't have this field | Use getattr(self, '_botAccountEmail', None) |
| model_copy(update={...}) for enum fields | Pydantic v2 doesn't coerce strings to enums on copy | Normalize after merge |
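
The str(PythonEnum) pitfall from the table is easy to reproduce. The snippet below uses a hypothetical two-member `SessionStatus` enum in place of the real TeamsbotSessionStatus and shows why string comparisons should go through `.value`.

```python
from enum import Enum


class SessionStatus(Enum):
    # Hypothetical stand-in for TeamsbotSessionStatus.
    active = "active"
    ended = "ended"


status = SessionStatus.active

as_text = str(status)        # "SessionStatus.active": class name plus member name
as_value = status.value      # "active": the stored value

wrong_match = as_text == "active"     # False, the comparison silently never matches
right_match = as_value == "active"    # True

# Going the other way, a raw string is turned back into a member by calling the enum.
restored = SessionStatus("active")
```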

Speech Recognition Artifacts

The bot's name gets mangled by Teams live captions speech recognition. Known variants for "Nyla": Naila, Maila, Neela, Leila, Nila. The system handles this via:

  • AI prompt: Explicit warning about phonetic distortion
  • Trigger logic: Phonetic similarity check (first letter match, length ±2, character overlap ≥ 60%)

SSE Event Queue: Create On-Demand

_emitSessionEvent() in the Gateway must create the session's event queue (_sessionEvents[sessionId] = asyncio.Queue()) if it doesn't already exist. If an event is emitted before the frontend SSE consumer connects, the queue won't exist and events are silently dropped. Always guard with on-demand creation.

Performance Characteristics

| Operation | Typical Duration |
|---|---|
| Microsoft login | ~14s |
| Teams page load after auth | ~10s |
| Pre-join to in-meeting | ~5s |
| Caption enable flow | ~8s |
| AI analysis (GPT-4o-mini) | ~2.5s |
| TTS generation | ~1s |
| Full join to first greeting | ~40s |