ValueOn AG 6b4172c46a docs: add documentation, update README, add marketing page

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-02-18 17:51:28 +01:00

Teams Browser Bot — Technical Documentation

Last updated: 2026-02-18

Business Story & Vision
Use Cases
System Architecture
Components
Data Model
Call Flow
Voice Flow (TTS Playback)
Data Flow (Transcript Pipeline)
WebSocket Protocol
AI Analysis Pipeline
Authentication & Credentials
Teams DOM Interaction
Deployment
Configuration Reference
Known Constraints & Lessons Learned

1. Business Story & Vision

Problem

Organizations use Microsoft Teams for meetings where important decisions, discussions, and action items occur. Without an automated assistant, teams rely on manual note-taking, miss context from earlier discussions, and lose the ability to query meeting content in real time.

Solution

The Teams Browser Bot is an AI-powered meeting participant that:

Joins any Teams meeting as an authenticated user (or anonymous guest)
Listens by capturing live captions from the Teams web interface
Understands by analyzing transcript segments through an AI model (GPT-4o-mini / Claude)
Responds via voice (TTS played through the microphone channel) and/or chat messages
Documents by persisting full transcripts and generating meeting summaries

The bot operates as a real participant — it appears in the meeting roster, can speak, and can write in the meeting chat.

Key Differentiator

Unlike Microsoft Graph Communications SDK bots (which require tenant admin registration and complex media handling), this bot uses browser automation (Playwright + Chromium) to join meetings as a regular web user. This enables:

Multi-tenant support: Join any meeting from any organization
No tenant admin approval required
Standard web technologies: DOM scraping, getUserMedia, WebRTC
Full meeting interaction: chat, captions, audio playback

2. Use Cases

UC-1: AI Meeting Assistant

A user starts a Teams meeting and invites the bot. The bot joins, listens to the conversation, and responds when addressed by name ("Hey Nyla, what do you think about...?"). Responses are delivered via voice and/or chat based on configuration.

UC-2: Live Transcription

The bot captures all live captions with speaker attribution and streams them to the frontend UI in real time via SSE. Users not in the meeting can follow along.

UC-3: Meeting Summary

When the session ends, the bot generates an AI-powered summary of the entire meeting, stored on the session record.

UC-4: Voice Test

Before joining a real meeting, integrators can test the TTS pipeline via a dedicated endpoint that generates and returns an audio sample.

UC-5: Multi-Bot Operations

Multiple bot sessions can run concurrently — each in its own browser instance, each connected to a different meeting with separate WebSocket channels.

3. System Architecture

┌──────────────────────────────────────────────────────────────────────────────────┐
│                              System Overview                                      │
│                                                                                  │
│  ┌──────────┐         SSE            ┌────────────────┐        WebSocket         │
│  │ Frontend │◄───────────────────────│    Gateway      │◄──────────────────────┐ │
│  │ (React)  │   transcripts,         │   (Python /     │   transcripts,        │ │
│  │          │   botResponses,        │    FastAPI)     │   chatMessages,       │ │
│  │          │   analysis,            │                 │   status,             │ │
│  │          │   status               │  - Session Mgmt │   audioChunks,        │ │
│  │          │────────────────────────►  - AI Analysis  │   voiceGreeting       │ │
│  │          │   REST (start/stop/    │  - TTS (Google) │                       │ │
│  │          │   config)              │  - Billing      │   playAudio,          │ │
│  └──────────┘                        │  - DB (Cosmos)  │   sendChatMessage,    │ │
│                                      │                 │   stopAudio           │ │
│                                      └────────┬───────┘──────────────────────►│ │
│                                               │ HTTP                          │ │
│                                               │ (join/leave/status)           │ │
│                                               ▼                               │ │
│                                      ┌────────────────┐                       │ │
│                                      │  Browser Bot   │◄──────────────────────┘ │
│                                      │  (Node.js +    │                         │
│                                      │   Playwright)  │                         │
│                                      │                │                         │
│                                      │  ┌──────────┐ │                         │
│                                      │  │ Chromium  │ │                         │
│                                      │  │ (Teams    │ │                         │
│                                      │  │  Web App) │ │                         │
│                                      │  └──────────┘ │                         │
│                                      └────────────────┘                         │
└──────────────────────────────────────────────────────────────────────────────────┘

Communication Paths

Path	Protocol	Direction	Purpose
Frontend ↔ Gateway	REST (HTTPS)	Bidirectional	Session management, config, system bots
Frontend ← Gateway	SSE	Gateway → Frontend	Real-time transcript & response stream
Gateway ↔ Browser Bot	WebSocket	Bidirectional	Transcripts, audio, status, chat, commands
Gateway → Browser Bot	HTTP	Gateway → Bot	Session creation (POST `/api/bot`), leave (POST `/api/bot/:id/leave`), status
Browser Bot ↔ Teams	Chromium/WebRTC	Bidirectional	Meeting participation, captions, chat, audio

4. Components

4.1 Browser Bot Service (this repository)

File	Responsibility
`src/index.ts`	Entry point: bootstrap, shutdown handlers
`src/config.ts`	Environment config with defaults and timeouts
`src/sessionManager.ts`	Session lifecycle: create, end, play audio, shutdown
`src/server/httpServer.ts`	Express HTTP API (health, join, leave, status, auth tests)
`src/server/gatewayClient.ts`	Alternative WebSocket client for Gateway (legacy path)
`src/types/index.ts`	TypeScript interfaces for all message types
`src/utils/logger.ts`	Winston logger with session-scoped child loggers

Bot Modules (`src/bot/`)

Module	Class	Responsibility
`orchestrator.ts`	`BotOrchestrator`	Main coordinator: browser launch, join flow, keepalive, greeting, Gateway WebSocket, state machine
`joinProcedure.ts`	`JoinProcedure`	Anonymous join: launcher page, name entry, "Join now", lobby handling
`authProcedure.ts`	`AuthProcedure`	Microsoft login: email → password → MFA check → "Stay signed in"
`captionsProcedure.ts`	`CaptionsProcedure`	Enable live captions via "More" menu, MutationObserver on caption DOM
`chatProcedure.ts`	`ChatProcedure`	Open chat panel, MutationObserver on `[role="log"]`, send messages via CKEditor
`audioProcedure.ts`	`AudioProcedure`	getUserMedia override, AudioContext, queue-based MP3/WAV/PCM playback into mic stream
`audioCaptureProcedure.ts`	`AudioCaptureProcedure`	RTCPeerConnection wrapper, ScriptProcessor, PCM16 16kHz capture, 500ms polling
`backgroundProcedure.ts`	`BackgroundProcedure`	Virtual background image upload (pre-join, currently unused)
`meetingUrlParser.ts`	(functions)	URL validation, classic vs short format, redirect resolution
`authTestProcedure.ts`	(functions)	Anti-detection test variants for debugging auth flow
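A rough sketch of the "classic vs short format" distinction that `meetingUrlParser.ts` makes. The two URL shapes below (`/l/meetup-join/...` for classic links, `/meet/<numeric id>` with a `p` passcode parameter for short links) are assumptions based on commonly seen Teams links, not rules taken from the module. Redirect resolution is out of scope here.

```typescript
type MeetingUrl =
  | { kind: "classic"; url: string }
  | { kind: "short"; url: string; meetingId: string; passcode?: string }
  | { kind: "invalid"; reason: string };

// Hypothetical classifier; the path patterns are assumptions.
function parseMeetingUrl(raw: string): MeetingUrl {
  const url = raw.trim();
  // Host must be exactly teams.microsoft.com, followed by a path.
  const m = /^https:\/\/teams\.microsoft\.com(\/[^?#]*)(\?[^#]*)?/i.exec(url);
  if (!m) return { kind: "invalid", reason: "not a teams.microsoft.com https URL" };
  const path = m[1];
  const query = m[2] ?? "";

  if (path.startsWith("/l/meetup-join/")) return { kind: "classic", url };

  const short = /^\/meet\/(\d+)\/?$/.exec(path);
  if (short) {
    const p = /[?&]p=([^&]+)/.exec(query); // optional passcode parameter
    return {
      kind: "short",
      url,
      meetingId: short[1],
      ...(p ? { passcode: decodeURIComponent(p[1]) } : {}),
    };
  }
  return { kind: "invalid", reason: "unrecognized path" };
}
```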

4.2 Gateway (external, Python/FastAPI)

File	Responsibility
`routeFeatureTeamsbot.py`	REST routes, SSE stream, WebSocket endpoint
`service.py`	Business logic: transcript processing, AI triggers, TTS, meeting summary
`datamodelTeamsbot.py`	Pydantic models, enums
`interfaceFeatureTeamsbot.py`	Database interface (Cosmos DB)
`config.py`	Feature instance config load/save
`browserBotConnector.py`	HTTP client for Browser Bot API

4.3 Frontend (external, React/TypeScript)

The frontend provides a session management UI with:

Meeting link input and session start/stop controls
Real-time transcript display (via SSE)
Bot response log with reasoning, model, cost
Configuration panel (bot name, response channel, AI prompt, etc.)
System bot management (email, password)

5. Data Model

5.1 Enums

Enum	Values	Description
`TeamsbotSessionStatus`	`pending`, `joining`, `active`, `leaving`, `ended`, `error`	Session lifecycle state
`TeamsbotResponseType`	`audio`, `chat`, `both`	How the bot responded
`TeamsbotResponseChannel`	`voice`, `chat`, `both`	Configured response channel (user setting)
`TeamsbotResponseMode`	`auto`, `manual`, `transcribeOnly`	Whether bot responds automatically
`TeamsbotDetectedIntent`	`addressed`, `question`, `proactive`, `stop`, `none`	AI-detected intent
`TeamsbotJoinMode`	`systemBot`, `anonymous`, `userAccount`	How the bot joins the meeting
`TeamsbotTransferMode`	`caption`, `audio`, `auto`	How transcript data is captured

5.2 Core Entities

TeamsbotSession

Field	Type	Description
`id`	UUID	Session identifier
`instanceId`	string	Feature instance
`mandateId`	string	Tenant/mandate
`meetingLink`	string	Teams meeting URL
`botName`	string	Display name in meeting
`status`	TeamsbotSessionStatus	Current state
`startedAt` / `endedAt`	datetime	Timestamps
`startedByUserId`	string	Who started the session
`sessionContext`	string	Optional context for AI
`summary`	string	AI-generated meeting summary
`errorMessage`	string	Error details if failed
`transcriptSegmentCount`	int	Running count
`botResponseCount`	int	Running count

TeamsbotTranscript

Field	Type	Description
`id`	UUID	Segment identifier
`sessionId`	UUID	Parent session
`speaker`	string	Speaker name from captions
`text`	string	Transcript text
`timestamp`	datetime	When spoken
`confidence`	float (0–1)	Confidence score
`language`	string	Detected language
`isFinal`	bool	Finalized segment

TeamsbotBotResponse

Field	Type	Description
`id`	UUID	Response identifier
`sessionId`	UUID	Parent session
`responseText`	string	What the bot said
`responseType`	TeamsbotResponseType	Voice, chat, or both
`detectedIntent`	TeamsbotDetectedIntent	Why it responded
`reasoning`	string	AI reasoning chain
`modelName`	string	AI model used
`processingTime`	float	Seconds
`priceCHF`	float	Cost in CHF

TeamsbotSystemBot

Field	Type	Description
`id`	UUID	Bot account identifier
`mandateId`	string	Tenant scope
`name`	string	Display name
`email`	string	Microsoft account email
`encryptedPassword`	string	Fernet-encrypted password
`isActive`	bool	Whether this bot is the active one

TeamsbotConfig (Feature Instance Level)

Field	Type	Default	Description
`botName`	string	`"PowerOn AI"`	Default bot name (overridden by system bot)
`aiSystemPrompt`	string	`""`	Custom AI instructions
`responseMode`	enum	`auto`	auto / manual / transcribeOnly
`responseChannel`	enum	`voice`	voice / chat / both
`transferMode`	enum	`auto`	caption / audio / auto
`language`	string	`"de-DE"`	Bot language
`voiceId`	string	`null`	TTS voice identifier
`browserBotUrl`	string	`null`	Browser Bot service URL
`triggerIntervalSeconds`	int	`10`	Periodic AI trigger interval
`triggerCooldownSeconds`	int	`5`	Min time between triggers
`contextWindowSegments`	int	`20`	Transcript segments sent to AI

TeamsbotUserSettings (Per-User Overrides)

Mirrors TeamsbotConfig fields (all optional). Merged over instance config with _getEffectiveConfig().
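A minimal sketch of the merge, written as a TypeScript analogue of the Gateway's Python `_getEffectiveConfig()` (the function name and the enum handling below are assumptions). User fields that are `undefined` or `null` fall back to the instance value, and enum fields are checked after the merge, in the spirit of the "normalize after merge" lesson in section 15.

```typescript
type ResponseMode = "auto" | "manual" | "transcribeOnly";
type ResponseChannel = "voice" | "chat" | "both";
type TransferMode = "caption" | "audio" | "auto";

// Instance-level config (fields as in the TeamsbotConfig table).
type TeamsbotConfig = {
  botName: string;
  aiSystemPrompt: string;
  responseMode: ResponseMode;
  responseChannel: ResponseChannel;
  transferMode: TransferMode;
  language: string;
  voiceId: string | null;
  browserBotUrl: string | null;
  triggerIntervalSeconds: number;
  triggerCooldownSeconds: number;
  contextWindowSegments: number;
};

// Per-user overrides: every field optional; null or undefined means "not set".
type TeamsbotUserSettings = { [K in keyof TeamsbotConfig]?: TeamsbotConfig[K] | null };

const ENUM_FIELDS = {
  responseMode: ["auto", "manual", "transcribeOnly"],
  responseChannel: ["voice", "chat", "both"],
  transferMode: ["caption", "audio", "auto"],
} as const;

// Hypothetical analogue of _getEffectiveConfig().
function getEffectiveConfig(instance: TeamsbotConfig, user: TeamsbotUserSettings): TeamsbotConfig {
  const merged: Record<string, unknown> = { ...instance };
  for (const [key, value] of Object.entries(user)) {
    if (value !== undefined && value !== null) merged[key] = value;
  }
  // Validate enum fields after the merge; an unknown string
  // (e.g. "TeamsbotResponseMode.auto") falls back to the instance value.
  for (const key of Object.keys(ENUM_FIELDS) as (keyof typeof ENUM_FIELDS)[]) {
    const allowed: readonly unknown[] = ENUM_FIELDS[key];
    if (!allowed.includes(merged[key])) merged[key] = instance[key];
  }
  return merged as TeamsbotConfig;
}
```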

6. Call Flow

6.1 Session Start (Authenticated Join)

Frontend                    Gateway                     Browser Bot              Teams Web
   │                           │                            │                       │
   │  POST /sessions           │                            │                       │
   │  {meetingLink, botName}   │                            │                       │
   │──────────────────────────►│                            │                       │
   │                           │  Resolve system bot        │                       │
   │                           │  Decrypt password          │                       │
   │                           │  Derive bot name from email│                       │
   │                           │                            │                       │
   │                           │  POST /api/bot             │                       │
   │                           │  {sessionId, meetingUrl,   │                       │
   │                           │   botAccountEmail,         │                       │
   │                           │   botAccountPassword}      │                       │
   │                           │───────────────────────────►│                       │
   │                           │                            │                       │
   │  ◄── session created ─────│                            │  Launch Chromium      │
   │                           │                            │  (Xvfb, headful)     │
   │  GET /sessions/:id/stream │                            │                       │
   │  (SSE)                    │                            │                       │
   │──────────────────────────►│                            │                       │
   │                           │◄─── WebSocket connected ───│                       │
   │                           │                            │                       │
   │                           │                            │  Navigate to          │
   │                           │                            │  teams.microsoft.com  │
   │                           │                            │─────────────────────►│
   │                           │                            │                       │
   │                           │                            │  MS Login:            │
   │                           │                            │  email → password     │
   │                           │                            │  → "Stay signed in"   │
   │                           │                            │─────────────────────►│
   │                           │                            │                       │
   │                           │                            │  Teams loads          │
   │                           │                            │  Click "Join" header  │
   │                           │                            │◄─────────────────────│
   │                           │                            │                       │
   │                           │                            │  Pre-join screen:     │
   │                           │                            │  Ensure mic ON        │
   │                           │                            │  Camera stays OFF     │
   │                           │                            │  Click "Join now"     │
   │                           │                            │─────────────────────►│
   │                           │                            │                       │
   │  ◄── SSE: statusChange ───│◄─── status: "joined" ─────│  In meeting!          │
   │      {status: "active"}   │                            │                       │
   │                           │                            │  Start keepalive      │
   │                           │                            │  Init AudioContext    │
   │                           │                            │  Enable captions      │
   │                           │                            │  Enable chat          │
   │                           │                            │  Send greeting        │
   │                           │                            │  (chat + voice TTS)   │

6.2 Session End

Frontend                    Gateway                     Browser Bot              Teams
   │                           │                            │                       │
   │  POST /sessions/:id/stop  │                            │                       │
   │──────────────────────────►│                            │                       │
   │                           │  POST /api/bot/:id/leave   │                       │
   │                           │───────────────────────────►│                       │
   │                           │                            │  Stop keepalive       │
   │                           │                            │  Stop audio capture   │
   │                           │                            │  Unsubscribe captions │
   │                           │                            │  Unsubscribe chat     │
   │                           │                            │  Click hangup button  │
   │                           │                            │─────────────────────►│
   │                           │                            │  Close browser        │
   │                           │                            │  Close WS             │
   │                           │◄─── status: "left" ────────│                       │
   │                           │                            │                       │
   │                           │  Generate meeting summary  │                       │
   │                           │  (AI on full transcript)   │                       │
   │                           │  Update session → "ended"  │                       │
   │  ◄── SSE: statusChange ───│                            │                       │
   │      {status: "ended"}    │                            │                       │

7. Voice Flow (TTS Playback)

How Audio Reaches Meeting Participants

Gateway                      Browser Bot                    Chromium / Teams
  │                             │                              │
  │  AI generates response      │                              │
  │  TTS (Google Cloud) →       │                              │
  │  base64 MP3                 │                              │
  │                             │                              │
  │  WS: playAudio              │                              │
  │  {audio: {data, format}}    │                              │
  │────────────────────────────►│                              │
  │                             │  Queue audio                 │
  │                             │  Decode base64 → ArrayBuffer │
  │                             │  decodeAudioData(buffer)     │
  │                             │                              │
  │                             │  Create AudioBufferSource    │
  │                             │  Connect to                  │
  │                             │  MediaStreamDestination      │
  │                             │  (overridden getUserMedia)   │
  │                             │─────────────────────────────►│
  │                             │                              │  WebRTC sends audio
  │                             │                              │  to all participants
  │                             │                              │  via microphone channel
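The "Queue audio" step can be sketched as a serial queue: each `playAudio` payload waits for the previous clip to finish, and `stopAudio` drops every clip that has not started yet. This is an illustrative sketch, not the actual `AudioProcedure` code; the `play` callback stands in for the decode-and-play step (`decodeAudioData` plus an `AudioBufferSourceNode`) and is an assumption.

```typescript
type AudioPayload = { data: string; format: string };
type PlayFn = (clip: AudioPayload) => Promise<void>;

// Hypothetical serial playback queue: clips are played one after another,
// never overlapping in the injected microphone stream.
class PlaybackQueue {
  private tail: Promise<void> = Promise.resolve();
  private generation = 0;
  private readonly play: PlayFn;

  constructor(play: PlayFn) {
    this.play = play;
  }

  // Append a clip (playAudio). The returned promise settles once this clip
  // has finished playing or has been skipped.
  enqueue(clip: AudioPayload): Promise<void> {
    const gen = this.generation;
    this.tail = this.tail.then(async () => {
      if (gen !== this.generation) return; // dropped by stop()
      try {
        await this.play(clip);
      } catch {
        // A clip that fails to decode must not block the rest of the queue.
      }
    });
    return this.tail;
  }

  // stopAudio: skip every clip that has not started yet. Halting the clip
  // that is currently playing is left to the play callback (e.g. source.stop()).
  stop(): void {
    this.generation++;
  }
}
```

Chaining on a single `tail` promise keeps ordering without an explicit loop, and the generation counter makes `stop()` cheap: pending callbacks simply notice they belong to an older generation.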

Audio Override Mechanism

The AudioProcedure injects a script before page load (page.addInitScript) that:

Overrides navigator.mediaDevices.getUserMedia to return a custom MediaStream
Creates a shared AudioContext with a MediaStreamDestination
When Teams calls getUserMedia({ audio: true }), it receives the destination's stream
TTS audio is decoded and played through an AudioBufferSourceNode connected to this destination
The result: TTS audio flows through the "microphone" channel into the meeting
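The override can be sketched as follows. To keep the example runnable outside a browser, it is written against a minimal hypothetical `MediaDevicesLike` interface instead of the DOM types; in the page, `devices` would be `navigator.mediaDevices` and `injected` would be the `stream` of a `MediaStreamDestination` created on the shared AudioContext. Passing video requests through to the original implementation is an assumption of this sketch.

```typescript
// Minimal stand-ins for the DOM types, so the sketch also runs under Node.
type Constraints = { audio?: boolean | object; video?: boolean | object };
interface MediaDevicesLike<S> {
  getUserMedia(constraints: Constraints): Promise<S>;
}

// Hypothetical core of the init script: audio-only requests receive the
// injected stream (fed by the shared AudioContext) instead of a real mic.
function installMicOverride<S>(devices: MediaDevicesLike<S>, injected: S): void {
  const original = devices.getUserMedia.bind(devices);
  devices.getUserMedia = async (constraints: Constraints): Promise<S> => {
    if (constraints.audio && !constraints.video) return injected;
    return original(constraints); // anything else goes to the real implementation
  };
}

// In the page, before Teams loads (e.g. inside page.addInitScript):
//   const ctx = new AudioContext();
//   const dest = ctx.createMediaStreamDestination();
//   installMicOverride(navigator.mediaDevices, dest.stream);
// TTS clips are then played through AudioBufferSourceNodes connected to `dest`.
```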

Voice Greeting Flow

When the bot joins a meeting:

Bot sends voiceGreeting message to Gateway with greeting text and language
Gateway calls TTS (Google Cloud) with the configured voice
Gateway sends playAudio back to bot via WebSocket
Bot plays the audio through the mic stream
Meeting participants hear the bot speaking

8. Data Flow (Transcript Pipeline)

Caption Mode (Primary)

Teams Web UI                Browser Bot                    Gateway                  Frontend
  │                            │                             │                        │
  │  Live captions appear      │                             │                        │
  │  in overlay div            │                             │                        │
  │  [data-tid="closed-        │                             │                        │
  │   caption-renderer-        │                             │                        │
  │   wrapper"]                │                             │                        │
  │                            │                             │                        │
  │  MutationObserver fires    │                             │                        │
  │───────────────────────────►│                             │                        │
  │                            │  Extract speaker + text     │                        │
  │                            │  Dedup, noise filter        │                        │
  │                            │                             │                        │
  │                            │  WS: transcript             │                        │
  │                            │  {speaker, text, isFinal}   │                        │
  │                            │────────────────────────────►│                        │
  │                            │                             │  Store in DB           │
  │                            │                             │  Add to context buffer │
  │                            │                             │  Emit SSE: transcript  │
  │                            │                             │───────────────────────►│
  │                            │                             │                        │  Display
  │                            │                             │                        │
  │                            │                             │  _shouldTriggerAnalysis│
  │                            │                             │  → if yes:             │
  │                            │                             │  SPEECH_TEAMS AI call  │
  │                            │                             │                        │
  │                            │                             │  Emit SSE: analysis    │
  │                            │                             │───────────────────────►│
  │                            │                             │                        │
  │                            │                             │  If shouldRespond:     │
  │                            │                             │  TTS → playAudio       │
  │                            │  ◄── WS: playAudio ────────│  and/or sendChatMessage│
  │                            │                             │───────────────────────►│
  │                            │                             │  Emit SSE: botResponse │

Audio Mode (Alternative)

When transferMode is audio:

AudioCaptureProcedure wraps RTCPeerConnection to intercept incoming audio tracks
Incoming samples are read through a ScriptProcessorNode and downsampled to PCM16 mono at 16kHz
500ms chunks are base64-encoded and sent as audioChunk messages
Gateway runs STT (Google Cloud Speech) on the chunks
STT results enter the same transcript pipeline
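The conversion step can be sketched as a pure function: average each group of input samples down to 16 kHz, clamp to [-1, 1], and scale to signed 16-bit integers. The block-averaging resampler and the names below are assumptions; `AudioCaptureProcedure` may resample differently. At 16 kHz mono PCM16, one 500 ms chunk holds 8,000 samples, i.e. 16,000 bytes, or 21,336 characters once base64-encoded (4 × ⌈16000 / 3⌉).

```typescript
// Hypothetical sketch: Float32 samples at the AudioContext rate -> PCM16 mono.
function downsampleToPcm16(input: Float32Array, inputRate: number, outputRate = 16000): Int16Array {
  if (outputRate > inputRate) throw new Error("upsampling not supported");
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(input.length / ratio);
  const out = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    // Average the input samples that map onto this output slot (crude low-pass).
    const start = Math.floor(i * ratio);
    const end = Math.min(input.length, Math.floor((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j++) sum += input[j];
    const avg = end > start ? sum / (end - start) : 0;
    // Clamp, then scale to the asymmetric Int16 range [-32768, 32767].
    const s = Math.max(-1, Math.min(1, avg));
    out[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return out;
}

// Samples per audioChunk message: 16000 Hz * 0.5 s.
const CHUNK_SAMPLES = 16000 * 0.5;
```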

Chat Messages

Teams Chat Panel            Browser Bot                    Gateway                  Frontend
  │                            │                             │                        │
  │  New message in            │                             │                        │
  │  [role="log"] container    │                             │                        │
  │                            │                             │                        │
  │  MutationObserver fires    │                             │                        │
  │───────────────────────────►│                             │                        │
  │                            │  Extract sender + text      │                        │
  │                            │  Dedup, noise filter        │                        │
  │                            │                             │                        │
  │                            │  WS: chatMessage            │                        │
  │                            │  {chat: {speaker, text}}    │                        │
  │                            │────────────────────────────►│                        │
  │                            │                             │  Process as transcript │
  │                            │                             │  (source: "chat")      │
  │                            │                             │  Same AI pipeline      │

9. WebSocket Protocol

Connection

The Browser Bot connects to the Gateway at:

wss://{gatewayHost}/api/teamsbot/{instanceId}/bot/ws/{sessionId}

Messages: Bot → Gateway

Type	Fields	When
`transcript`	`sessionId`, `transcript: {speaker, text, timestamp, isFinal}`	Caption captured
`chatMessage`	`sessionId`, `chat: {speaker, text, timestamp}`	Meeting chat message received
`status`	`sessionId`, `status`, `message?`	Bot state changes (connecting, in_lobby, joined, left, error)
`audioChunk`	`sessionId`, `audio: {data, sampleRate, format, timestamp}`	PCM16 audio captured (audio mode)
`voiceGreeting`	`sessionId`, `text`, `language`	Request TTS for join greeting
`ping`	—	Keepalive (every 30s)

Messages: Gateway → Bot

Type	Fields	When
`playAudio`	`sessionId`, `audio: {data, format}`	TTS response or greeting to play
`sendChatMessage`	`sessionId`, `text`	Chat response to send
`stopAudio`	`sessionId`	AI detected "stop" intent
`pong`	—	Reply to ping
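The two message tables can be expressed as discriminated unions on `type`. This is an illustrative sketch: the field types (e.g. `timestamp` as an ISO string) and the helper names are assumptions, not the definitions in `src/types/index.ts`.

```typescript
type TranscriptPayload = { speaker: string; text: string; timestamp: string; isFinal: boolean };
type BotStatus = "connecting" | "in_lobby" | "joined" | "left" | "error";

// Bot -> Gateway
type BotToGateway =
  | { type: "transcript"; sessionId: string; transcript: TranscriptPayload }
  | { type: "chatMessage"; sessionId: string; chat: { speaker: string; text: string; timestamp: string } }
  | { type: "status"; sessionId: string; status: BotStatus; message?: string }
  | { type: "audioChunk"; sessionId: string; audio: { data: string; sampleRate: number; format: string; timestamp: string } }
  | { type: "voiceGreeting"; sessionId: string; text: string; language: string }
  | { type: "ping" };

// Gateway -> Bot
type GatewayToBot =
  | { type: "playAudio"; sessionId: string; audio: { data: string; format: string } }
  | { type: "sendChatMessage"; sessionId: string; text: string }
  | { type: "stopAudio"; sessionId: string }
  | { type: "pong" };

// Keepalive cadence, e.g.:
//   setInterval(() => ws.send(JSON.stringify({ type: "ping" })), PING_INTERVAL_MS);
const PING_INTERVAL_MS = 30_000;

function gatewayWsUrl(gatewayHost: string, instanceId: string, sessionId: string): string {
  return `wss://${gatewayHost}/api/teamsbot/${encodeURIComponent(instanceId)}/bot/ws/${encodeURIComponent(sessionId)}`;
}

interface BotHandlers {
  playAudio(audio: { data: string; format: string }): void;
  sendChat(text: string): void;
  stopAudio(): void;
}

// Hypothetical dispatcher for inbound frames; returns false for frames it ignores.
function handleGatewayFrame(raw: string, h: BotHandlers): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return false; // not JSON
  }
  if (typeof parsed !== "object" || parsed === null) return false;
  const msg = parsed as GatewayToBot;
  switch (msg.type) {
    case "playAudio":
      h.playAudio(msg.audio);
      return true;
    case "sendChatMessage":
      h.sendChat(msg.text);
      return true;
    case "stopAudio":
      h.stopAudio();
      return true;
    case "pong":
      return true;
    default:
      return false; // unknown message type
  }
}
```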

10. AI Analysis Pipeline

Trigger Logic (`_shouldTriggerAnalysis`)

The Gateway decides when to call the AI model. Three checks are evaluated in order:

Name Trigger (highest priority, overrides cooldown): If the bot's name (or first name, or a phonetically similar word) appears in the latest transcript segment → immediate trigger. Phonetic matching uses: same first letter, length difference ≤ 2, character overlap ≥ 60%.
Cooldown Gate: If time since last AI call < triggerCooldownSeconds (default 5s) → no trigger.
Periodic Trigger: If time since last AI call ≥ triggerIntervalSeconds (default 10s) → trigger.
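The three checks can be sketched as follows (a TypeScript analogue of the Gateway's Python `_shouldTriggerAnalysis`). The document does not define "character overlap", so this sketch assumes it means the letters shared by both words, counting repeats, divided by the length of the longer word; the tokenization and function names are likewise assumptions.

```typescript
type TriggerConfig = { triggerIntervalSeconds: number; triggerCooldownSeconds: number };

// Assumed definition of "character overlap".
function charOverlap(a: string, b: string): number {
  const counts = new Map<string, number>();
  for (const ch of a) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let shared = 0;
  for (const ch of b) {
    const n = counts.get(ch) ?? 0;
    if (n > 0) {
      shared++;
      counts.set(ch, n - 1);
    }
  }
  return shared / Math.max(a.length, b.length, 1);
}

// Phonetic rule from the text: same first letter, length difference <= 2, overlap >= 60%.
function isPhoneticMatch(word: string, name: string): boolean {
  const w = word.toLowerCase();
  const n = name.toLowerCase();
  return (
    w.length > 0 &&
    n.length > 0 &&
    w[0] === n[0] &&
    Math.abs(w.length - n.length) <= 2 &&
    charOverlap(w, n) >= 0.6
  );
}

function mentionsBot(text: string, botName: string): boolean {
  const lower = text.toLowerCase();
  if (lower.includes(botName.toLowerCase())) return true; // full name
  const firstName = botName.split(/\s+/)[0] ?? botName;
  return lower
    .split(/[\s,.!?;:"'()]+/)
    .filter((w) => w.length > 0)
    .some((w) => isPhoneticMatch(w, firstName)); // first name or a phonetic variant
}

function shouldTriggerAnalysis(
  latestText: string,
  botName: string,
  secondsSinceLastCall: number,
  cfg: TriggerConfig,
): boolean {
  if (mentionsBot(latestText, botName)) return true; // 1. name trigger ignores the cooldown
  if (secondsSinceLastCall < cfg.triggerCooldownSeconds) return false; // 2. cooldown gate
  return secondsSinceLastCall >= cfg.triggerIntervalSeconds; // 3. periodic trigger
}
```

Under this definition "Naila" and "Neela" both score exactly 3/5 = 0.6 against "Nyla" and pass, while "Maila" is rejected by the first-letter rule before overlap is computed.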

AI Call (`_handleSpeechTeams`)

Model selection priority: gpt-4o-mini → claude-3-5-haiku → gpt-4o → claude-sonnet-4-5 → fastest available DATA_ANALYSE model.
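The fallback chain can be sketched as "first preferred model that is available, otherwise the fastest model tagged for data analysis". The `ModelInfo` shape and its `capabilities` and `avgLatencyMs` fields are hypothetical; only the model order and the `DATA_ANALYSE` tag come from the text above.

```typescript
type ModelInfo = { name: string; capabilities: string[]; avgLatencyMs: number };

const PREFERRED_MODELS = ["gpt-4o-mini", "claude-3-5-haiku", "gpt-4o", "claude-sonnet-4-5"];

// Hypothetical selector: walk the priority list, then fall back to the fastest
// DATA_ANALYSE-capable model. Returns undefined if nothing qualifies.
function selectModel(available: ModelInfo[]): ModelInfo | undefined {
  for (const name of PREFERRED_MODELS) {
    const hit = available.find((m) => m.name === name);
    if (hit) return hit;
  }
  return available
    .filter((m) => m.capabilities.includes("DATA_ANALYSE"))
    .sort((a, b) => a.avgLatencyMs - b.avgLatencyMs)[0];
}
```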

System prompt (built dynamically with bot name):

Role: "You are '{botName}', an AI participant in a Teams meeting"
Respond ONLY when directly addressed by name (including phonetic variants)
Match the language of the speaker who addressed you
1–2 sentence responses max
Detect "stop" commands in any language
Output strict JSON: {shouldRespond, responseText, reasoning, detectedIntent}

Context window: Up to contextWindowSegments (default 20) recent transcript lines, prefixed with BOT_NAME: and optional SESSION_CONTEXT:.
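A sketch of how the context passed to the AI call might be assembled. The `BOT_NAME:` and optional `SESSION_CONTEXT:` prefixes come from the text; the `Segment` shape, the `speaker: text` line format and the function name are assumptions.

```typescript
type Segment = { speaker: string; text: string };

// Hypothetical builder: header lines, then the most recent `windowSize`
// transcript segments in chronological order.
function buildContext(
  segments: Segment[],
  botName: string,
  windowSize = 20,
  sessionContext?: string,
): string {
  const lines = [`BOT_NAME: ${botName}`];
  if (sessionContext && sessionContext.trim().length > 0) {
    lines.push(`SESSION_CONTEXT: ${sessionContext.trim()}`);
  }
  // slice(-0) would return everything, so guard the zero case explicitly.
  const recent = windowSize > 0 ? segments.slice(-windowSize) : [];
  for (const s of recent) lines.push(`${s.speaker}: ${s.text}`);
  return lines.join("\n");
}
```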

Response Handling

Intent	Action
`stop`	Send `stopAudio` to bot, no response
`addressed` / `question` / `proactive`	Auto mode: TTS + chat (per config). Manual mode: SSE `suggestedResponse` only
`none`	No action
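The table maps onto a small pure decision function. This sketch parses the model's strict-JSON output defensively (malformed output becomes intent `none`) and treats `transcribeOnly` as "no action", which the table does not state explicitly; the `Action` shape and function names are hypothetical.

```typescript
type Intent = "addressed" | "question" | "proactive" | "stop" | "none";
type Mode = "auto" | "manual" | "transcribeOnly";
type Channel = "voice" | "chat" | "both";
type AiResult = { shouldRespond: boolean; responseText: string; reasoning: string; detectedIntent: Intent };

type Action =
  | { kind: "stopAudio" }
  | { kind: "respond"; speak: boolean; chat: boolean; text: string }
  | { kind: "suggest"; text: string } // SSE suggestedResponse (manual mode)
  | { kind: "none" };

const INTENTS: readonly string[] = ["addressed", "question", "proactive", "stop", "none"];

// Parse the model's JSON; anything malformed becomes a harmless "none".
function parseAiResult(raw: string): AiResult {
  try {
    const o = JSON.parse(raw);
    if (typeof o?.shouldRespond === "boolean" && typeof o?.responseText === "string") {
      return {
        shouldRespond: o.shouldRespond,
        responseText: o.responseText,
        reasoning: String(o.reasoning ?? ""),
        detectedIntent: INTENTS.includes(o.detectedIntent) ? (o.detectedIntent as Intent) : "none",
      };
    }
  } catch {
    // fall through to the default below
  }
  return { shouldRespond: false, responseText: "", reasoning: "unparseable model output", detectedIntent: "none" };
}

function decideAction(r: AiResult, mode: Mode, channel: Channel): Action {
  if (r.detectedIntent === "stop") return { kind: "stopAudio" };
  if (r.detectedIntent === "none" || !r.shouldRespond || r.responseText.trim() === "") return { kind: "none" };
  if (mode === "manual") return { kind: "suggest", text: r.responseText };
  if (mode === "auto") {
    return { kind: "respond", speak: channel !== "chat", chat: channel !== "voice", text: r.responseText };
  }
  return { kind: "none" }; // transcribeOnly (assumed)
}
```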

11. Authentication & Credentials

Credential Storage

System bot credentials are stored per mandate:

Email: Stored in plaintext in TeamsbotSystemBot.email
Password: Encrypted with Fernet (AES-128-CBC plus an HMAC-SHA256 authentication tag), key derived via PBKDF2 from a master key
Decryption: Gateway decrypts at session start and passes credentials to Browser Bot via HTTP POST body

Login Flow (`AuthProcedure`)

1. Navigate to teams.microsoft.com
2. Redirect to login.microsoftonline.com
3. Enter email in #i0116 input → Click "Next" (input[type="submit"])
4. Wait for password page (may redirect to org-specific login)
5. Enter password → Click "Sign in" (#idSIButton9)
6. Handle "Stay signed in?" → Click "Yes"
7. Teams loads with authenticated session
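The steps map onto a handful of Playwright page calls. The sketch is typed against a minimal hypothetical `LoginPage` interface whose methods mirror Playwright's `Page.goto`, `fill`, `click` and `waitForSelector`, so a fake page can be used in tests. The generic `input[type="password"]` selector, the English prompt text used to detect the "Stay signed in?" page, reusing `#idSIButton9` for its "Yes" button, and the timeouts are all assumptions; MFA handling is omitted.

```typescript
interface LoginPage {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
  waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>;
}

// Hypothetical sketch of steps 1-6; waiting for Teams to load (step 7) is left to the caller.
async function microsoftLogin(page: LoginPage, email: string, password: string): Promise<void> {
  // Steps 1-2: Teams redirects unauthenticated users to login.microsoftonline.com.
  await page.goto("https://teams.microsoft.com");
  // Step 3: email, then "Next".
  await page.waitForSelector("#i0116", { timeout: 30_000 });
  await page.fill("#i0116", email);
  await page.click('input[type="submit"]');
  // Step 4: password page (possibly an org-specific login page).
  await page.waitForSelector('input[type="password"]', { timeout: 30_000 });
  await page.fill('input[type="password"]', password);
  // Step 5: "Sign in".
  await page.click("#idSIButton9");
  // Step 6: "Stay signed in?" may not appear; skip it on timeout.
  try {
    await page.waitForSelector("text=Stay signed in?", { timeout: 10_000 }); // assumed English UI
    await page.click("#idSIButton9"); // assumed id of the "Yes" button
  } catch {
    // Prompt not shown; continue.
  }
}
```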

Anti-Detection

The bot uses several measures to appear as a regular browser:

rebrowser-playwright with puppeteer-extra-plugin-stealth
Headful mode via Xvfb (Teams blocks headless Chromium)
Standard Chrome launch arguments (disable automation, sandbox flags)
Real viewport (1280x720), locale/timezone matching

12. Teams DOM Interaction

Live Captions

Enable flow: "More" button (#callingButtons-showMoreBtn) → "Language and speech" (#LanguageSpeechMenuControl-id) → "Show live captions" (button with aria-checked)

Scraping target: div[data-tid="closed-caption-renderer-wrapper"]

Extraction strategies:

Strategy A: [data-tid] containers with speaker in <span> title/text, content in adjacent spans
Strategy B: Structural fallback scanning <div> trees for speaker + text patterns

Noise filter: Ignores entries matching known non-transcript patterns (buttons, timestamps without content, single-word UI elements).
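Caption lines can change in place while a speaker is still talking, so the observer may see the same utterance several times as it grows. A minimal sketch of the dedup and noise-filter step, assuming a caption becomes final once a newer caption from the same speaker no longer extends it; the noise patterns are illustrative stand-ins for the real list, and the class name is hypothetical.

```typescript
type Caption = { speaker: string; text: string };
type CaptionEvent = Caption & { isFinal: boolean };

// Illustrative noise patterns (assumed): bare timestamps and one-word UI labels.
const NOISE_PATTERNS: RegExp[] = [/^\d{1,2}:\d{2}(:\d{2})?$/, /^(more|captions|settings)$/i];

function isNoise(text: string): boolean {
  const t = text.trim();
  return t.length === 0 || NOISE_PATTERNS.some((re) => re.test(t));
}

// Hypothetical dedup: a speaker's caption that only grows is an update;
// anything else finalizes the previous text and starts a new segment.
// (Simplification: a revision of earlier words is treated as a new utterance.)
class CaptionDeduper {
  private readonly lastBySpeaker = new Map<string, string>();

  push(c: Caption): CaptionEvent[] {
    const text = c.text.trim();
    if (isNoise(text)) return [];
    const prev = this.lastBySpeaker.get(c.speaker);
    if (prev === text) return []; // DOM mutation without a text change
    this.lastBySpeaker.set(c.speaker, text);
    if (prev !== undefined && text.startsWith(prev)) {
      return [{ speaker: c.speaker, text, isFinal: false }]; // same utterance, longer
    }
    const events: CaptionEvent[] = [];
    if (prev !== undefined) events.push({ speaker: c.speaker, text: prev, isFinal: true });
    events.push({ speaker: c.speaker, text, isFinal: false });
    return events;
  }
}
```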

Chat Panel

Open: Click button[id="chat-button"] — but ONLY if aria-pressed !== "true" (prevents toggle-off)

Scraping target: Container with [role="log"]

Send messages: Find CKEditor input ([data-tid="ckeditor-replyConversation"] or div[role="textbox"]), type text, press Enter.
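Opening the panel and sending a message can be sketched with Playwright page calls. A minimal hypothetical `ChatPage` interface stands in for Playwright's `Page` (whose `getAttribute`, `click`, `fill` and `press` methods take these arguments) so the toggle guard can be tested with a fake. Combining the two editor selectors into one CSS selector list, and using `fill()` (which Playwright supports on contenteditable elements) for the "type text" step, are assumptions.

```typescript
interface ChatPage {
  getAttribute(selector: string, name: string): Promise<string | null>;
  click(selector: string): Promise<void>;
  fill(selector: string, value: string): Promise<void>;
  press(selector: string, key: string): Promise<void>;
}

const CHAT_BUTTON = 'button[id="chat-button"]';
const CHAT_INPUT = '[data-tid="ckeditor-replyConversation"], div[role="textbox"]';

// The chat button is a toggle: clicking it while the panel is open closes it.
async function ensureChatOpen(page: ChatPage): Promise<boolean> {
  if ((await page.getAttribute(CHAT_BUTTON, "aria-pressed")) === "true") return false;
  await page.click(CHAT_BUTTON);
  return true; // panel was opened by this call
}

async function sendChatMessage(page: ChatPage, text: string): Promise<void> {
  await ensureChatOpen(page);
  await page.fill(CHAT_INPUT, text);
  await page.press(CHAT_INPUT, "Enter");
}
```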

Meeting Controls

Control	Selector	Notes
Mic toggle	`input[data-tid="toggle-audio"]` or `input[role="switch"][title*="mic" i]`	Check `checked` state before toggling
Hangup	`button[id="hangup-button"]` or `#hangup-button`
More menu	`button[id="callingButtons-showMoreBtn"]`
Join now	`button[data-tid="prejoin-join-button"]`

13. Deployment

Infrastructure

Component	Platform	Details
Browser Bot	Azure Container Apps	2 CPU, 4GB RAM, Xvfb for headful mode
Gateway	Azure Container Apps	Shared instance (`cae-poweron-shared`)
Database	Azure Cosmos DB	Sessions, transcripts, responses, system bots
Container Registry	Azure Container Registry	Images tagged by Git SHA

CI/CD Pipeline

Trigger: Push to main branch

Steps:

Checkout code
Docker login to ACR
Build image with Playwright base + Xvfb
Push with tags latest and {git-sha}
Azure login
az containerapp update with --revision-suffix deploy-{sha-prefix} (forces new revision)
az containerapp revision restart on latest revision (ensures container starts)

Environment Variables (Production)

Variable	Value
`PORT`	`4100`
`NODE_ENV`	`production`
`BOT_HEADLESS`	`false` (headful via Xvfb)
`GATEWAY_WS_URL`	`wss://gateway-int.poweron-center.net/api/teamsbot/ws`
`DISPLAY`	`:99` (Xvfb)

14. Configuration Reference

Browser Bot Environment

Variable	Description	Default
`PORT`	HTTP server port	`4100`
`GATEWAY_WS_URL`	Gateway WebSocket URL	`wss://gateway-int.poweron-center.net/api/teamsbot/ws`
`BOT_NAME`	Default display name	`PowerOn AI`
`BOT_HEADLESS`	Run headless	`true` (`false` in Docker)
`LOG_LEVEL`	Winston log level	`info`
`SCREENSHOT_ON_ERROR`	Screenshots on errors	`true`

Gateway Feature Config

Field	Default	Range	Description
`responseMode`	`auto`	—	auto, manual, transcribeOnly
`responseChannel`	`voice`	—	voice, chat, both
`transferMode`	`auto`	—	caption, audio, auto
`language`	`de-DE`	—	BCP-47 language tag
`triggerIntervalSeconds`	`10`	3–60	Periodic AI trigger
`triggerCooldownSeconds`	`5`	1–30	Min gap between triggers
`contextWindowSegments`	`20`	5–100	Transcript lines for AI context

Timeouts (Hardcoded in config.ts)

Timeout	Value	Purpose
`lobbyWait`	120s	Max time waiting in lobby
`joinTimeout`	30s	Max time for join flow
`captionsEnable`	10s	Max time to enable captions
`pageLoad`	30s	General page load timeout

15. Known Constraints & Lessons Learned

Things That DO NOT Work

Approach	Why It Fails	Alternative
Fake video injection (`--use-file-for-fake-video-capture`)	Crashes Chromium renderer when WebRTC audio starts	Keep camera OFF
Headless Chromium	Teams detects and blocks headless browsers	Use headful + Xvfb
`_setSpokenLanguage()` for live captions	Language dropdown doesn't exist; captions use organizer's language setting	Skip entirely
`str(PythonEnum)` for comparison	Returns `"ClassName.value"`, not `"value"`	Use `enum.value`
Blindly clicking chat button	Toggles panel off if already open	Check `aria-pressed` first
`config.botAccountEmail`	`TeamsbotConfig` doesn't have this field	Use `getattr(self, '_botAccountEmail', None)`
`model_copy(update={...})` for enum fields	Pydantic v2 doesn't coerce strings to enums on copy	Normalize after merge

Speech Recognition Artifacts

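The bot's name gets mangled by the speech recognition behind Teams live captions. Known variants for "Nyla": Naila, Maila, Neela, Leila, Nila. The system handles this via:

AI prompt: Explicit warning about phonetic distortion
Trigger logic: Phonetic similarity check (first letter match, length ±2, character overlap ≥ 60%). Variants with a different first letter (Maila, Leila) fail this check, so they reach the AI only when a periodic trigger includes them in the context window.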

SSE Event Queue: Create On-Demand

_emitSessionEvent() in the Gateway must create the session's event queue (_sessionEvents[sessionId] = asyncio.Queue()) if it doesn't already exist. If an event is emitted before the frontend SSE consumer connects, the queue won't exist and events are silently dropped. Always guard with on-demand creation.
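The get-or-create guard is shown here as a TypeScript analogue of the Gateway's Python code, with a plain array standing in for `asyncio.Queue`; the class and method names are hypothetical.

```typescript
type SessionEvent = { type: string; data: unknown };

class SessionEventHub {
  private readonly queues = new Map<string, SessionEvent[]>();

  // Get-or-create: an event must never be dropped because no SSE consumer exists yet.
  private queueFor(sessionId: string): SessionEvent[] {
    let q = this.queues.get(sessionId);
    if (!q) {
      q = [];
      this.queues.set(sessionId, q);
    }
    return q;
  }

  emit(sessionId: string, event: SessionEvent): void {
    this.queueFor(sessionId).push(event);
  }

  // Called by the SSE stream: returns everything buffered so far and empties the queue.
  drain(sessionId: string): SessionEvent[] {
    return this.queueFor(sessionId).splice(0);
  }

  // Called when the session ends, so abandoned sessions do not keep buffering.
  close(sessionId: string): void {
    this.queues.delete(sessionId);
  }
}
```

Because events now buffer until a consumer connects, removing the queue at session end matters: otherwise sessions whose SSE stream never connects would accumulate events indefinitely.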

Performance Characteristics

Operation	Typical Duration
Microsoft login	~14s
Teams page load after auth	~10s
Pre-join to in-meeting	~5s
Caption enable flow	~8s
AI analysis (GPT-4o-mini)	~2.5s
TTS generation	~1s
Full join to first greeting	~40s
