Teams Browser Bot — Technical Documentation
Last updated: 2026-02-18
Table of Contents
- Business Story & Vision
- Use Cases
- System Architecture
- Components
- Data Model
- Call Flow
- Voice Flow (TTS Playback)
- Data Flow (Transcript Pipeline)
- WebSocket Protocol
- AI Analysis Pipeline
- Authentication & Credentials
- Teams DOM Interaction
- Deployment
- Configuration Reference
- Known Constraints & Lessons Learned
1. Business Story & Vision
Problem
Organizations use Microsoft Teams for meetings where important decisions, discussions, and action items occur. Without an automated assistant, teams rely on manual note-taking, miss context from earlier discussions, and lose the ability to query meeting content in real time.
Solution
The Teams Browser Bot is an AI-powered meeting participant that:
- Joins any Teams meeting as an authenticated user (or anonymous guest)
- Listens by capturing live captions from the Teams web interface
- Understands by analyzing transcript segments through an AI model (GPT-4o-mini / Claude)
- Responds via voice (TTS played through the microphone channel) and/or chat messages
- Documents by persisting full transcripts and generating meeting summaries
The bot operates as a real participant — it appears in the meeting roster, can speak, and can write in the meeting chat.
Key Differentiator
Unlike Microsoft Graph Communications SDK bots (which require tenant admin registration and complex media handling), this bot uses browser automation (Playwright + Chromium) to join meetings as a regular web user. This enables:
- Multi-tenant support: Join any meeting from any organization
- No tenant admin approval required
- Standard web technologies: DOM scraping, getUserMedia, WebRTC
- Full meeting interaction: chat, captions, audio playback
2. Use Cases
UC-1: AI Meeting Assistant
A user starts a Teams meeting and invites the bot. The bot joins, listens to the conversation, and responds when addressed by name ("Hey Nyla, what do you think about...?"). Responses are delivered via voice and/or chat based on configuration.
UC-2: Live Transcription
The bot captures all live captions with speaker attribution and streams them to the frontend UI in real time via SSE. Users not in the meeting can follow along.
UC-3: Meeting Summary
When the session ends, the bot generates an AI-powered summary of the entire meeting, stored on the session record.
UC-4: Voice Test
Before joining a real meeting, integrators can test the TTS pipeline via a dedicated endpoint that generates and returns an audio sample.
UC-5: Multi-Bot Operations
Multiple bot sessions can run concurrently — each in its own browser instance, each connected to a different meeting with separate WebSocket channels.
3. System Architecture
┌──────────────────────────────────────────────────────────────────────────────────┐
│ System Overview │
│ │
│ ┌──────────┐ SSE ┌────────────────┐ WebSocket │
│ │ Frontend │◄───────────────────────│ Gateway │◄──────────────────────┐ │
│ │ (React) │ transcripts, │ (Python / │ transcripts, │ │
│ │ │ botResponses, │ FastAPI) │ chatMessages, │ │
│ │ │ analysis, │ │ status, │ │
│ │ │ status │ - Session Mgmt │ audioChunks, │ │
│ │ │────────────────────────► - AI Analysis │ voiceGreeting │ │
│ │ │ REST (start/stop/ │ - TTS (Google) │ │ │
│ │ │ config) │ - Billing │ playAudio, │ │
│ └──────────┘ │ - DB (Cosmos) │ sendChatMessage, │ │
│ │ │ stopAudio │ │
│ └────────┬───────┘──────────────────────►│ │
│ │ HTTP │ │
│ │ (join/leave/status) │ │
│ ▼ │ │
│ ┌────────────────┐ │ │
│ │ Browser Bot │◄──────────────────────┘ │
│ │ (Node.js + │ │
│ │ Playwright) │ │
│ │ │ │
│ │ ┌──────────┐ │ │
│ │ │ Chromium │ │ │
│ │ │ (Teams │ │ │
│ │ │ Web App) │ │ │
│ │ └──────────┘ │ │
│ └────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────────┘
Communication Paths
| Path | Protocol | Direction | Purpose |
|---|---|---|---|
| Frontend ↔ Gateway | REST (HTTPS) | Bidirectional | Session management, config, system bots |
| Frontend ← Gateway | SSE | Gateway → Frontend | Real-time transcript & response stream |
| Gateway ↔ Browser Bot | WebSocket | Bidirectional | Transcripts, audio, status, chat, commands |
| Gateway → Browser Bot | HTTP POST | Gateway → Bot | Session creation (/api/bot), leave, status |
| Browser Bot ↔ Teams | Chromium/WebRTC | Bidirectional | Meeting participation, captions, chat, audio |
4. Components
4.1 Browser Bot Service (this repository)
| File | Responsibility |
|---|---|
| src/index.ts | Entry point: bootstrap, shutdown handlers |
| src/config.ts | Environment config with defaults and timeouts |
| src/sessionManager.ts | Session lifecycle: create, end, play audio, shutdown |
| src/server/httpServer.ts | Express HTTP API (health, join, leave, status, auth tests) |
| src/server/gatewayClient.ts | Alternative WebSocket client for Gateway (legacy path) |
| src/types/index.ts | TypeScript interfaces for all message types |
| src/utils/logger.ts | Winston logger with session-scoped child loggers |
Bot Modules (src/bot/)
| Module | Class | Responsibility |
|---|---|---|
| orchestrator.ts | BotOrchestrator | Main coordinator: browser launch, join flow, keepalive, greeting, Gateway WebSocket, state machine |
| joinProcedure.ts | JoinProcedure | Anonymous join: launcher page, name entry, "Join now", lobby handling |
| authProcedure.ts | AuthProcedure | Microsoft login: email → password → MFA check → "Stay signed in" |
| captionsProcedure.ts | CaptionsProcedure | Enable live captions via "More" menu, MutationObserver on caption DOM |
| chatProcedure.ts | ChatProcedure | Open chat panel, MutationObserver on [role="log"], send messages via CKEditor |
| audioProcedure.ts | AudioProcedure | getUserMedia override, AudioContext, queue-based MP3/WAV/PCM playback into mic stream |
| audioCaptureProcedure.ts | AudioCaptureProcedure | RTCPeerConnection wrapper, ScriptProcessorNode, PCM16 16kHz capture, 500ms polling |
| backgroundProcedure.ts | BackgroundProcedure | Virtual background image upload (pre-join, currently unused) |
| meetingUrlParser.ts | (functions) | URL validation, classic vs short format, redirect resolution |
| authTestProcedure.ts | (functions) | Anti-detection test variants for debugging auth flow |
4.2 Gateway (external, Python/FastAPI)
| File | Responsibility |
|---|---|
| routeFeatureTeamsbot.py | REST routes, SSE stream, WebSocket endpoint |
| service.py | Business logic: transcript processing, AI triggers, TTS, meeting summary |
| datamodelTeamsbot.py | Pydantic models, enums |
| interfaceFeatureTeamsbot.py | Database interface (Cosmos DB) |
| config.py | Feature instance config load/save |
| browserBotConnector.py | HTTP client for Browser Bot API |
4.3 Frontend (external, React/TypeScript)
The frontend provides a session management UI with:
- Meeting link input and session start/stop controls
- Real-time transcript display (via SSE)
- Bot response log with reasoning, model, cost
- Configuration panel (bot name, response channel, AI prompt, etc.)
- System bot management (email, password)
5. Data Model
5.1 Enums
| Enum | Values | Description |
|---|---|---|
| TeamsbotSessionStatus | pending, joining, active, leaving, ended, error | Session lifecycle state |
| TeamsbotResponseType | audio, chat, both | How the bot responded |
| TeamsbotResponseChannel | voice, chat, both | Configured response channel (user setting) |
| TeamsbotResponseMode | auto, manual, transcribeOnly | Whether bot responds automatically |
| TeamsbotDetectedIntent | addressed, question, proactive, stop, none | AI-detected intent |
| TeamsbotJoinMode | systemBot, anonymous, userAccount | How the bot joins the meeting |
| TeamsbotTransferMode | caption, audio, auto | How transcript data is captured |
5.2 Core Entities
TeamsbotSession
| Field | Type | Description |
|---|---|---|
| id | UUID | Session identifier |
| instanceId | string | Feature instance |
| mandateId | string | Tenant/mandate |
| meetingLink | string | Teams meeting URL |
| botName | string | Display name in meeting |
| status | TeamsbotSessionStatus | Current state |
| startedAt / endedAt | datetime | Timestamps |
| startedByUserId | string | Who started the session |
| sessionContext | string | Optional context for AI |
| summary | string | AI-generated meeting summary |
| errorMessage | string | Error details if failed |
| transcriptSegmentCount | int | Running count |
| botResponseCount | int | Running count |
TeamsbotTranscript
| Field | Type | Description |
|---|---|---|
| id | UUID | Segment identifier |
| sessionId | UUID | Parent session |
| speaker | string | Speaker name from captions |
| text | string | Transcript text |
| timestamp | datetime | When spoken |
| confidence | float (0–1) | Confidence score |
| language | string | Detected language |
| isFinal | bool | Finalized segment |
TeamsbotBotResponse
| Field | Type | Description |
|---|---|---|
| id | UUID | Response identifier |
| sessionId | UUID | Parent session |
| responseText | string | What the bot said |
| responseType | TeamsbotResponseType | Audio, chat, or both |
| detectedIntent | TeamsbotDetectedIntent | Why it responded |
| reasoning | string | AI reasoning chain |
| modelName | string | AI model used |
| processingTime | float | Seconds |
| priceCHF | float | Cost in CHF |
TeamsbotSystemBot
| Field | Type | Description |
|---|---|---|
| id | UUID | Bot account identifier |
| mandateId | string | Tenant scope |
| name | string | Display name |
| email | string | Microsoft account email |
| encryptedPassword | string | Fernet-encrypted password |
| isActive | bool | Whether this bot is the active one |
TeamsbotConfig (Feature Instance Level)
| Field | Type | Default | Description |
|---|---|---|---|
| botName | string | "PowerOn AI" | Default bot name (overridden by system bot) |
| aiSystemPrompt | string | "" | Custom AI instructions |
| responseMode | enum | auto | auto / manual / transcribeOnly |
| responseChannel | enum | voice | voice / chat / both |
| transferMode | enum | auto | caption / audio / auto |
| language | string | "de-DE" | Bot language |
| voiceId | string | null | TTS voice identifier |
| browserBotUrl | string | null | Browser Bot service URL |
| triggerIntervalSeconds | int | 10 | Periodic AI trigger interval |
| triggerCooldownSeconds | int | 5 | Min time between triggers |
| contextWindowSegments | int | 20 | Transcript segments sent to AI |
TeamsbotUserSettings (Per-User Overrides)
Mirrors TeamsbotConfig fields (all optional). Merged over instance config with _getEffectiveConfig().
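The merge can be pictured as follows. This is a minimal sketch, assuming plain dataclasses rather than the Gateway's Pydantic models; the `getEffectiveConfig` function, the `Config` and `UserSettings` classes, and the reduced field set are illustrative and not the actual `_getEffectiveConfig()` implementation. Unset (None) user fields fall through to the instance value, and raw strings are normalized back to enum members after the merge.

```python
from dataclasses import dataclass, fields, replace
from enum import Enum
from typing import Optional


class ResponseChannel(Enum):
    voice = "voice"
    chat = "chat"
    both = "both"


@dataclass
class Config:
    # Reduced field set for illustration; defaults taken from the table above.
    botName: str = "PowerOn AI"
    responseChannel: ResponseChannel = ResponseChannel.voice
    triggerIntervalSeconds: int = 10


@dataclass
class UserSettings:
    # Every field is optional; None means "no override".
    botName: Optional[str] = None
    responseChannel: Optional[object] = None  # enum member or raw string
    triggerIntervalSeconds: Optional[int] = None


def getEffectiveConfig(config: Config, user: UserSettings) -> Config:
    """Overlay non-None user settings on the instance config (hypothetical helper)."""
    overrides = {
        f.name: getattr(user, f.name)
        for f in fields(user)
        if getattr(user, f.name) is not None
    }
    merged = replace(config, **overrides)  # returns a new object, input untouched
    # Normalize after merge: a raw string such as "chat" becomes the enum member.
    merged.responseChannel = ResponseChannel(merged.responseChannel)
    return merged
```

Normalizing once at the end keeps every later comparison on enum members, whatever form the stored user setting had.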
6. Call Flow
6.1 Session Start (Authenticated Join)
Frontend Gateway Browser Bot Teams Web
│ │ │ │
│ POST /sessions │ │ │
│ {meetingLink, botName} │ │ │
│──────────────────────────►│ │ │
│ │ Resolve system bot │ │
│ │ Decrypt password │ │
│ │ Derive bot name from email│ │
│ │ │ │
│ │ POST /api/bot │ │
│ │ {sessionId, meetingUrl, │ │
│ │ botAccountEmail, │ │
│ │ botAccountPassword} │ │
│ │───────────────────────────►│ │
│ │ │ │
│ ◄── session created ─────│ │ Launch Chromium │
│ │ │ (Xvfb, headful) │
│ GET /sessions/:id/stream │ │ │
│ (SSE) │ │ │
│──────────────────────────►│ │ │
│ │◄─── WebSocket connected ───│ │
│ │ │ │
│ │ │ Navigate to │
│ │ │ teams.microsoft.com │
│ │ │─────────────────────►│
│ │ │ │
│ │ │ MS Login: │
│ │ │ email → password │
│ │ │ → "Stay signed in" │
│ │ │─────────────────────►│
│ │ │ │
│ │ │ Teams loads │
│ │ │ Click "Join" header │
│ │ │◄─────────────────────│
│ │ │ │
│ │ │ Pre-join screen: │
│ │ │ Ensure mic ON │
│ │ │ Camera stays OFF │
│ │ │ Click "Join now" │
│ │ │─────────────────────►│
│ │ │ │
│ ◄── SSE: statusChange ───│◄─── status: "joined" ─────│ In meeting! │
│ {status: "active"} │ │ │
│ │ │ Start keepalive │
│ │ │ Init AudioContext │
│ │ │ Enable captions │
│ │ │ Enable chat │
│ │ │ Send greeting │
│ │ │ (chat + voice TTS) │
6.2 Session End
Frontend Gateway Browser Bot Teams
│ │ │ │
│ POST /sessions/:id/stop │ │ │
│──────────────────────────►│ │ │
│ │ POST /api/bot/:id/leave │ │
│ │───────────────────────────►│ │
│ │ │ Stop keepalive │
│ │ │ Stop audio capture │
│ │ │ Unsubscribe captions │
│ │ │ Unsubscribe chat │
│ │ │ Click hangup button │
│ │ │─────────────────────►│
│ │ │ Close browser │
│ │ │ Close WS │
│ │◄─── status: "left" ────────│ │
│ │ │ │
│ │ Generate meeting summary │ │
│ │ (AI on full transcript) │ │
│ │ Update session → "ended" │ │
│ ◄── SSE: statusChange ───│ │ │
│ {status: "ended"} │ │ │
7. Voice Flow (TTS Playback)
How Audio Reaches Meeting Participants
Gateway Browser Bot Chromium / Teams
│ │ │
│ AI generates response │ │
│ TTS (Google Cloud) → │ │
│ base64 MP3 │ │
│ │ │
│ WS: playAudio │ │
│ {audio: {data, format}} │ │
│────────────────────────────►│ │
│ │ Queue audio │
│ │ Decode base64 → ArrayBuffer │
│ │ decodeAudioData(buffer) │
│ │ │
│ │ Create AudioBufferSource │
│ │ Connect to │
│ │ MediaStreamDestination │
│ │ (overridden getUserMedia) │
│ │─────────────────────────────►│
│ │ │ WebRTC sends audio
│ │ │ to all participants
│ │ │ via microphone channel
Audio Override Mechanism
The AudioProcedure injects a script before page load (page.addInitScript) that:
- Overrides navigator.mediaDevices.getUserMedia to return a custom MediaStream
- Creates a shared AudioContext with a MediaStreamDestination
- When Teams calls getUserMedia({ audio: true }), it receives the destination's stream
- TTS audio is decoded and played through an AudioBufferSourceNode connected to this destination
- The result: TTS audio flows through the "microphone" channel into the meeting
Voice Greeting Flow
When the bot joins a meeting:
- Bot sends voiceGreeting message to Gateway with greeting text and language
- Gateway calls TTS (Google Cloud) with the configured voice
- Gateway sends playAudio back to bot via WebSocket
- Bot plays the audio through the mic stream
- Meeting participants hear the bot speaking
8. Data Flow (Transcript Pipeline)
Caption Mode (Primary)
Teams Web UI Browser Bot Gateway Frontend
│ │ │ │
│ Live captions appear │ │ │
│ in overlay div │ │ │
│ [data-tid="closed- │ │ │
│ caption-renderer- │ │ │
│ wrapper"] │ │ │
│ │ │ │
│ MutationObserver fires │ │ │
│───────────────────────────►│ │ │
│ │ Extract speaker + text │ │
│ │ Dedup, noise filter │ │
│ │ │ │
│ │ WS: transcript │ │
│ │ {speaker, text, isFinal} │ │
│ │────────────────────────────►│ │
│ │ │ Store in DB │
│ │ │ Add to context buffer │
│ │ │ Emit SSE: transcript │
│ │ │───────────────────────►│
│ │ │ │ Display
│ │ │ │
│ │ │ _shouldTriggerAnalysis│
│ │ │ → if yes: │
│ │ │ SPEECH_TEAMS AI call │
│ │ │ │
│ │ │ Emit SSE: analysis │
│ │ │───────────────────────►│
│ │ │ │
│ │ │ If shouldRespond: │
│ │ │ TTS → playAudio │
│ │ ◄── WS: playAudio ────────│ and/or sendChatMessage│
│ │ │───────────────────────►│
│ │ │ Emit SSE: botResponse │
Audio Mode (Alternative)
When transferMode is audio:
- AudioCaptureProcedure wraps RTCPeerConnection to intercept incoming audio tracks
- Audio is downsampled to PCM16 mono 16kHz via ScriptProcessorNode
- 500ms chunks are base64-encoded and sent as audioChunk messages
- Gateway runs STT (Google Cloud Speech) on the chunks
- STT results enter the same transcript pipeline
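The chunk size follows from the format: 16 000 samples/s × 2 bytes per sample × 0.5 s = 16 000 bytes of raw audio per chunk. The sketch below shows that arithmetic and one possible message layout. It is written in Python for illustration only (the real capture runs in the browser); the function names, the `type` discriminator key, and the `"pcm16"` format label are assumptions.

```python
import base64

SAMPLE_RATE = 16_000      # Hz, mono
BYTES_PER_SAMPLE = 2      # PCM16
CHUNK_MS = 500

# 16 000 samples/s * 2 bytes * 0.5 s = 16 000 bytes of raw audio per chunk
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000


def splitIntoChunks(pcm: bytes) -> list:
    """Split a PCM16 byte string into 500 ms chunks (the last one may be shorter)."""
    return [pcm[i:i + CHUNK_BYTES] for i in range(0, len(pcm), CHUNK_BYTES)]


def buildAudioChunkMessage(sessionId: str, chunk: bytes, timestamp: str) -> dict:
    """Build an audioChunk message with the fields listed in section 9."""
    return {
        "type": "audioChunk",  # assumed name of the discriminator key
        "sessionId": sessionId,
        "audio": {
            "data": base64.b64encode(chunk).decode("ascii"),
            "sampleRate": SAMPLE_RATE,
            "format": "pcm16",  # assumed format label
            "timestamp": timestamp,
        },
    }
```

Base64 expands each full chunk to 21 336 characters, so under these assumptions audio mode sends roughly 43 KB of text per second over the WebSocket.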
Chat Messages
Teams Chat Panel Browser Bot Gateway Frontend
│ │ │ │
│ New message in │ │ │
│ [role="log"] container │ │ │
│ │ │ │
│ MutationObserver fires │ │ │
│───────────────────────────►│ │ │
│ │ Extract sender + text │ │
│ │ Dedup, noise filter │ │
│ │ │ │
│ │ WS: chatMessage │ │
│ │ {chat: {speaker, text}} │ │
│ │────────────────────────────►│ │
│ │ │ Process as transcript │
│ │ │ (source: "chat") │
│ │ │ Same AI pipeline │
9. WebSocket Protocol
Connection
The Browser Bot connects to the Gateway at:
wss://{gatewayHost}/api/teamsbot/{instanceId}/bot/ws/{sessionId}
Messages: Bot → Gateway
| Type | Fields | When |
|---|---|---|
| transcript | sessionId, transcript: {speaker, text, timestamp, isFinal} | Caption captured |
| chatMessage | sessionId, chat: {speaker, text, timestamp} | Meeting chat message received |
| status | sessionId, status, message? | Bot state changes (connecting, in_lobby, joined, left, error) |
| audioChunk | sessionId, audio: {data, sampleRate, format, timestamp} | PCM16 audio captured (audio mode) |
| voiceGreeting | sessionId, text, language | Request TTS for join greeting |
| ping | - | Keepalive (every 30s) |
Messages: Gateway → Bot
| Type | Fields | When |
|---|---|---|
| playAudio | sessionId, audio: {data, format} | TTS response or greeting to play |
| sendChatMessage | sessionId, text | Chat response to send |
| stopAudio | sessionId | AI detected "stop" intent |
| pong | - | Reply to ping |
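A Gateway-side handler for these frames might look like the sketch below. It assumes JSON text frames with the message type carried in a `type` key; the key name and the helper functions `parseBotMessage`, `buildPlayAudio`, and `handleFrame` are illustrative, not taken from the Gateway source.

```python
import base64
import json
from typing import Optional

BOT_TO_GATEWAY = {
    "transcript", "chatMessage", "status", "audioChunk", "voiceGreeting", "ping",
}


def parseBotMessage(raw: str) -> dict:
    """Decode a bot-to-gateway frame and reject unknown message types."""
    message = json.loads(raw)
    if message.get("type") not in BOT_TO_GATEWAY:
        raise ValueError(f"unknown message type: {message.get('type')!r}")
    return message


def buildPlayAudio(sessionId: str, audioBytes: bytes, audioFormat: str = "mp3") -> str:
    """Encode TTS output as a playAudio frame with base64 audio data."""
    return json.dumps({
        "type": "playAudio",
        "sessionId": sessionId,
        "audio": {
            "data": base64.b64encode(audioBytes).decode("ascii"),
            "format": audioFormat,
        },
    })


def handleFrame(raw: str) -> Optional[str]:
    """Return the immediate reply for a frame, if any (ping gets a pong)."""
    message = parseBotMessage(raw)
    if message["type"] == "ping":
        return json.dumps({"type": "pong"})
    return None  # other types are handed to the transcript/TTS pipeline
```

Validating the type at the boundary keeps malformed frames out of the transcript pipeline.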
10. AI Analysis Pipeline
Trigger Logic (_shouldTriggerAnalysis)
The Gateway decides when to call the AI model. Three trigger paths:
- Name Trigger (highest priority, overrides cooldown): If the bot's name (or first name, or a phonetically similar word) appears in the latest transcript segment → immediate trigger. Phonetic matching uses: same first letter, length difference ≤ 2, character overlap ≥ 60%.
- Cooldown Gate: If time since last AI call < triggerCooldownSeconds (default 5s) → no trigger.
- Periodic Trigger: If time since last AI call ≥ triggerIntervalSeconds (default 10s) → trigger.
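The three paths can be sketched as below. This is an illustration under stated assumptions, not the Gateway's `_shouldTriggerAnalysis`: the function names are hypothetical, "character overlap" is read as shared distinct letters divided by the larger distinct-letter set, and the gap between cooldown and interval is assumed to produce no trigger.

```python
import re


def isPhoneticMatch(word: str, name: str) -> bool:
    """Same first letter, length difference <= 2, character overlap >= 60%."""
    w, n = word.lower(), name.lower()
    if not w or not n or w[0] != n[0]:
        return False
    if abs(len(w) - len(n)) > 2:
        return False
    # Assumed overlap measure: shared distinct letters / larger distinct set.
    overlap = len(set(w) & set(n)) / max(len(set(w)), len(set(n)))
    return overlap >= 0.6


def mentionsBot(text: str, botName: str) -> bool:
    """True if the full name, the first name, or a phonetic variant appears."""
    name = botName.strip().lower()
    if not name:
        return False
    lowered = text.lower()
    if name in lowered:
        return True
    firstName = name.split()[0]
    return any(isPhoneticMatch(w, firstName) for w in re.findall(r"\w+", lowered))


def shouldTriggerAnalysis(
    latestText: str,
    botName: str,
    secondsSinceLastCall: float,
    cooldownSeconds: float = 5.0,
    intervalSeconds: float = 10.0,
) -> bool:
    if mentionsBot(latestText, botName):
        return True   # name trigger overrides the cooldown
    if secondsSinceLastCall < cooldownSeconds:
        return False  # cooldown gate
    return secondsSinceLastCall >= intervalSeconds  # periodic trigger
```

Under this reading, "Naila", "Neela", and "Nila" match "Nyla", while a variant with a different first letter such as "Maila" does not pass the phonetic check.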
AI Call (_handleSpeechTeams)
Model selection priority: gpt-4o-mini → claude-3-5-haiku → gpt-4o → claude-sonnet-4-5 → fastest available DATA_ANALYSE model.
System prompt (built dynamically with bot name):
- Role: "You are '{botName}', an AI participant in a Teams meeting"
- Respond ONLY when directly addressed by name (including phonetic variants)
- Match the language of the speaker who addressed you
- 1–2 sentence responses max
- Detect "stop" commands in any language
- Output strict JSON: {shouldRespond, responseText, reasoning, detectedIntent}
Context window: Up to contextWindowSegments (default 20) recent transcript lines, prefixed with BOT_NAME: and optional SESSION_CONTEXT:.
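Because the model is asked for strict JSON, the Gateway has to cope with replies that are malformed anyway. A minimal defensive parser could look like this; the `AnalysisResult` class, the `parseAnalysis` function, and the fall-back-to-silence behavior are assumptions for illustration, not the documented implementation.

```python
import json
from dataclasses import dataclass

VALID_INTENTS = {"addressed", "question", "proactive", "stop", "none"}


@dataclass
class AnalysisResult:
    shouldRespond: bool
    responseText: str
    reasoning: str
    detectedIntent: str


def parseAnalysis(raw: str) -> AnalysisResult:
    """Parse the model reply; anything malformed degrades to 'do not respond'."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        return AnalysisResult(False, "", "invalid JSON from model", "none")
    intent = data.get("detectedIntent")
    if intent not in VALID_INTENTS:
        intent = "none"
    return AnalysisResult(
        # Only a real JSON true counts; strings like "yes" are rejected.
        shouldRespond=data.get("shouldRespond") is True,
        responseText=str(data.get("responseText") or ""),
        reasoning=str(data.get("reasoning") or ""),
        detectedIntent=intent,
    )
```

Failing closed means a garbled model reply costs one missed response rather than an unintended utterance in the meeting.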
Response Handling
| Intent | Action |
|---|---|
| stop | Send stopAudio to bot, no response |
| addressed / question / proactive | Auto mode: TTS + chat (per config). Manual mode: SSE suggestedResponse only |
| none | No action |
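The table can be read as a small decision function. The sketch below returns the names of the outbound messages or events for a given intent, response mode, and response channel. The function `planActions` is hypothetical, and the handling of transcribeOnly (no response at all) is an assumption, since the table only covers auto and manual.

```python
def planActions(intent: str, responseMode: str, responseChannel: str) -> list:
    """Map an AI result to outbound actions, following the table above."""
    if intent == "stop":
        return ["stopAudio"]
    if intent == "none":
        return []
    # addressed / question / proactive
    if responseMode == "manual":
        return ["suggestedResponse"]  # SSE event only, nothing sent to the bot
    if responseMode != "auto":
        return []  # transcribeOnly: assumed to stay silent
    if responseChannel == "voice":
        return ["playAudio"]
    if responseChannel == "chat":
        return ["sendChatMessage"]
    return ["playAudio", "sendChatMessage"]  # "both"
```

For example, `planActions("addressed", "auto", "both")` yields both a `playAudio` and a `sendChatMessage` action.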
11. Authentication & Credentials
Credential Storage
System bot credentials are stored per mandate:
- Email: Stored in plaintext in TeamsbotSystemBot.email
- Password: Encrypted with Fernet (AES-128-CBC), key derived via PBKDF2 from a master key
- Decryption: Gateway decrypts at session start and passes credentials to Browser Bot via HTTP POST body
Microsoft Login Flow
1. Navigate to teams.microsoft.com
2. Redirect to login.microsoftonline.com
3. Enter email in #i0116 input → Click "Next" (input[type="submit"])
4. Wait for password page (may redirect to org-specific login)
5. Enter password → Click "Sign in" (#idSIButton9)
6. Handle "Stay signed in?" → Click "Yes"
7. Teams loads with authenticated session
Anti-Detection
The bot uses several measures to appear as a regular browser:
- rebrowser-playwright with puppeteer-extra-plugin-stealth
- Headful mode via Xvfb (Teams blocks headless Chromium)
- Standard Chrome launch arguments (disable automation, sandbox flags)
- Real viewport (1280x720), locale/timezone matching
12. Teams DOM Interaction
Live Captions
Enable flow: "More" button (#callingButtons-showMoreBtn) → "Language and speech" (#LanguageSpeechMenuControl-id) → "Show live captions" (button with aria-checked)
Scraping target: div[data-tid="closed-caption-renderer-wrapper"]
Extraction strategies:
- Strategy A: [data-tid] containers with speaker in <span> title/text, content in adjacent spans
- Strategy B: Structural fallback scanning <div> trees for speaker + text patterns
Noise filter: Ignores entries matching known non-transcript patterns (buttons, timestamps without content, single-word UI elements).
Chat Panel
Open: Click button[id="chat-button"] — but ONLY if aria-pressed !== "true" (prevents toggle-off)
Scraping target: Container with [role="log"]
Send messages: Find CKEditor input ([data-tid="ckeditor-replyConversation"] or div[role="textbox"]), type text, press Enter.
Meeting Controls
| Control | Selector | Notes |
|---|---|---|
| Mic toggle | input[data-tid="toggle-audio"] or input[role="switch"][title*="mic" i] | Check checked state before toggling |
| Hangup | button[id="hangup-button"] or #hangup-button | |
| More menu | button[id="callingButtons-showMoreBtn"] | |
| Join now | button[data-tid="prejoin-join-button"] | |
13. Deployment
Infrastructure
| Component | Platform | Details |
|---|---|---|
| Browser Bot | Azure Container Apps | 2 CPU, 4GB RAM, Xvfb for headful mode |
| Gateway | Azure Container Apps | Shared instance (cae-poweron-shared) |
| Database | Azure Cosmos DB | Sessions, transcripts, responses, system bots |
| Container Registry | Azure Container Registry | Images tagged by Git SHA |
CI/CD Pipeline
Trigger: Push to main branch
Steps:
- Checkout code
- Docker login to ACR
- Build image with Playwright base + Xvfb
- Push with tags latest and {git-sha}
- Azure login
- az containerapp update with --revision-suffix deploy-{sha-prefix} (forces new revision)
- az containerapp revision restart on latest revision (ensures container starts)
Environment Variables (Production)
| Variable | Value |
|---|---|
| PORT | 4100 |
| NODE_ENV | production |
| BOT_HEADLESS | false (headful via Xvfb) |
| GATEWAY_WS_URL | wss://gateway-int.poweron-center.net/api/teamsbot/ws |
| DISPLAY | :99 (Xvfb) |
14. Configuration Reference
Browser Bot Environment
| Variable | Description | Default |
|---|---|---|
| PORT | HTTP server port | 4100 |
| GATEWAY_WS_URL | Gateway WebSocket URL | wss://gateway-int.poweron-center.net/api/teamsbot/ws |
| BOT_NAME | Default display name | PowerOn AI |
| BOT_HEADLESS | Run headless | true (false in Docker) |
| LOG_LEVEL | Winston log level | info |
| SCREENSHOT_ON_ERROR | Screenshots on errors | true |
Gateway Feature Config
| Field | Default | Range | Description |
|---|---|---|---|
| responseMode | auto | - | auto, manual, transcribeOnly |
| responseChannel | voice | - | voice, chat, both |
| transferMode | auto | - | caption, audio, auto |
| language | de-DE | - | BCP-47 language tag |
| triggerIntervalSeconds | 10 | 3–60 | Periodic AI trigger |
| triggerCooldownSeconds | 5 | 1–30 | Min gap between triggers |
| contextWindowSegments | 20 | 5–100 | Transcript lines for AI context |
Timeouts (Hardcoded in config.ts)
| Timeout | Value | Purpose |
|---|---|---|
| lobbyWait | 120s | Max time waiting in lobby |
| joinTimeout | 30s | Max time for join flow |
| captionsEnable | 10s | Max time to enable captions |
| pageLoad | 30s | General page load timeout |
15. Known Constraints & Lessons Learned
Things That DO NOT Work
| Approach | Why It Fails | Alternative |
|---|---|---|
| Fake video injection (--use-file-for-fake-video-capture) | Crashes Chromium renderer when WebRTC audio starts | Keep camera OFF |
| Headless Chromium | Teams detects and blocks headless browsers | Use headful + Xvfb |
| _setSpokenLanguage() for live captions | Language dropdown doesn't exist; captions use organizer's language setting | Skip entirely |
| str(PythonEnum) for comparison | Returns "ClassName.value", not "value" | Use enum.value |
| Blindly clicking chat button | Toggles panel off if already open | Check aria-pressed first |
| config.botAccountEmail | TeamsbotConfig doesn't have this field | Use getattr(self, '_botAccountEmail', None) |
| model_copy(update={...}) for enum fields | Pydantic v2 doesn't coerce strings to enums on copy | Normalize after merge |
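The enum pitfall from the table is easy to reproduce. The snippet assumes a plain `Enum` whose member names equal their values; the class here is a reduced stand-in for the Gateway's enum, not its actual declaration.

```python
from enum import Enum


class TeamsbotSessionStatus(Enum):
    pending = "pending"
    active = "active"


status = TeamsbotSessionStatus.active

asText = str(status)                       # "TeamsbotSessionStatus.active"
isActiveWrong = str(status) == "active"    # False: the class name is included
isActiveRight = status.value == "active"   # True: compare the value instead
```

Comparing against the member itself (`status is TeamsbotSessionStatus.active`) avoids string comparison altogether.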
Speech Recognition Artifacts
The bot's name gets mangled by Teams live captions speech recognition. Known variants for "Nyla": Naila, Maila, Neela, Leila, Nila. The system handles this via:
- AI prompt: Explicit warning about phonetic distortion
- Trigger logic: Phonetic similarity check (first letter match, length ±2, character overlap ≥ 60%)
SSE Event Queue: Create On-Demand
_emitSessionEvent() in the Gateway must create the session's event queue (_sessionEvents[sessionId] = asyncio.Queue()) if it doesn't already exist. Without that guard, an event emitted before the frontend SSE consumer connects finds no queue and is silently dropped. Always guard with on-demand creation.
Performance Characteristics
| Operation | Typical Duration |
|---|---|
| Microsoft login | ~14s |
| Teams page load after auth | ~10s |
| Pre-join to in-meeting | ~5s |
| Caption enable flow | ~8s |
| AI analysis (GPT-4o-mini) | ~2.5s |
| TTS generation | ~1s |
| Full join to first greeting | ~40s |