From 6b4172c46a56285f32d064975cf2ebece8596edf Mon Sep 17 00:00:00 2001 From: ValueOn AG Date: Wed, 18 Feb 2026 17:51:28 +0100 Subject: [PATCH] docs: add documentation, update README, add marketing page Co-authored-by: Cursor --- DOCUMENTATION.md | 704 ++++++++++++++++++++++++++++++++++++++++++++ README.md | 394 ++++++++++++++++--------- nyla-marketing.html | 321 ++++++++++++++++++++ 3 files changed, 1273 insertions(+), 146 deletions(-) create mode 100644 DOCUMENTATION.md create mode 100644 nyla-marketing.html diff --git a/DOCUMENTATION.md b/DOCUMENTATION.md new file mode 100644 index 0000000..254c87b --- /dev/null +++ b/DOCUMENTATION.md @@ -0,0 +1,704 @@ +# Teams Browser Bot — Technical Documentation + +Last updated: 2026-02-18 + +## Table of Contents + +1. [Business Story & Vision](#1-business-story--vision) +2. [Use Cases](#2-use-cases) +3. [System Architecture](#3-system-architecture) +4. [Components](#4-components) +5. [Data Model](#5-data-model) +6. [Call Flow](#6-call-flow) +7. [Voice Flow (TTS Playback)](#7-voice-flow-tts-playback) +8. [Data Flow (Transcript Pipeline)](#8-data-flow-transcript-pipeline) +9. [WebSocket Protocol](#9-websocket-protocol) +10. [AI Analysis Pipeline](#10-ai-analysis-pipeline) +11. [Authentication & Credentials](#11-authentication--credentials) +12. [Teams DOM Interaction](#12-teams-dom-interaction) +13. [Deployment](#13-deployment) +14. [Configuration Reference](#14-configuration-reference) +15. [Known Constraints & Lessons Learned](#15-known-constraints--lessons-learned) + +--- + +## 1. Business Story & Vision + +### Problem + +Organizations use Microsoft Teams for meetings where important decisions, discussions, and action items occur. Without an automated assistant, teams rely on manual note-taking, miss context from earlier discussions, and lose the ability to query meeting content in real time. 
+ +### Solution + +The Teams Browser Bot is an AI-powered meeting participant that: + +- **Joins** any Teams meeting as an authenticated user (or anonymous guest) +- **Listens** by capturing live captions from the Teams web interface +- **Understands** by analyzing transcript segments through an AI model (GPT-4o-mini / Claude) +- **Responds** via voice (TTS played through the microphone channel) and/or chat messages +- **Documents** by persisting full transcripts and generating meeting summaries + +The bot operates as a real participant — it appears in the meeting roster, can speak, and can write in the meeting chat. + +### Key Differentiator + +Unlike Microsoft Graph Communications SDK bots (which require tenant admin registration and complex media handling), this bot uses **browser automation** (Playwright + Chromium) to join meetings as a regular web user. This enables: + +- **Multi-tenant support**: Join any meeting from any organization +- **No tenant admin approval** required +- **Standard web technologies**: DOM scraping, getUserMedia, WebRTC +- **Full meeting interaction**: chat, captions, audio playback + +--- + +## 2. Use Cases + +### UC-1: AI Meeting Assistant +A user starts a Teams meeting and invites the bot. The bot joins, listens to the conversation, and responds when addressed by name ("Hey Nyla, what do you think about...?"). Responses are delivered via voice and/or chat based on configuration. + +### UC-2: Live Transcription +The bot captures all live captions with speaker attribution and streams them to the frontend UI in real time via SSE. Users not in the meeting can follow along. + +### UC-3: Meeting Summary +When the session ends, the bot generates an AI-powered summary of the entire meeting, stored on the session record. + +### UC-4: Voice Test +Before joining a real meeting, integrators can test the TTS pipeline via a dedicated endpoint that generates and returns an audio sample. 
+ +### UC-5: Multi-Bot Operations +Multiple bot sessions can run concurrently — each in its own browser instance, each connected to a different meeting with separate WebSocket channels. + +--- + +## 3. System Architecture + +``` +┌──────────────────────────────────────────────────────────────────────────────────┐ +│ System Overview │ +│ │ +│ ┌──────────┐ SSE ┌────────────────┐ WebSocket │ +│ │ Frontend │◄───────────────────────│ Gateway │◄──────────────────────┐ │ +│ │ (React) │ transcripts, │ (Python / │ transcripts, │ │ +│ │ │ botResponses, │ FastAPI) │ chatMessages, │ │ +│ │ │ analysis, │ │ status, │ │ +│ │ │ status │ - Session Mgmt │ audioChunks, │ │ +│ │ │────────────────────────► - AI Analysis │ voiceGreeting │ │ +│ │ │ REST (start/stop/ │ - TTS (Google) │ │ │ +│ │ │ config) │ - Billing │ playAudio, │ │ +│ └──────────┘ │ - DB (Cosmos) │ sendChatMessage, │ │ +│ │ │ stopAudio │ │ +│ └────────┬───────┘──────────────────────►│ │ +│ │ HTTP │ │ +│ │ (join/leave/status) │ │ +│ ▼ │ │ +│ ┌────────────────┐ │ │ +│ │ Browser Bot │◄──────────────────────┘ │ +│ │ (Node.js + │ │ +│ │ Playwright) │ │ +│ │ │ │ +│ │ ┌──────────┐ │ │ +│ │ │ Chromium │ │ │ +│ │ │ (Teams │ │ │ +│ │ │ Web App) │ │ │ +│ │ └──────────┘ │ │ +│ └────────────────┘ │ +└──────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Communication Paths + +| Path | Protocol | Direction | Purpose | +|------|----------|-----------|---------| +| Frontend ↔ Gateway | REST (HTTPS) | Bidirectional | Session management, config, system bots | +| Frontend ← Gateway | SSE | Gateway → Frontend | Real-time transcript & response stream | +| Gateway ↔ Browser Bot | WebSocket | Bidirectional | Transcripts, audio, status, chat, commands | +| Gateway → Browser Bot | HTTP POST | Gateway → Bot | Session creation (`/api/bot`), leave, status | +| Browser Bot ↔ Teams | Chromium/WebRTC | Bidirectional | Meeting participation, captions, chat, audio | + +--- + +## 4. 
Components + +### 4.1 Browser Bot Service (this repository) + +| File | Responsibility | +|------|---------------| +| `src/index.ts` | Entry point: bootstrap, shutdown handlers | +| `src/config.ts` | Environment config with defaults and timeouts | +| `src/sessionManager.ts` | Session lifecycle: create, end, play audio, shutdown | +| `src/server/httpServer.ts` | Express HTTP API (health, join, leave, status, auth tests) | +| `src/server/gatewayClient.ts` | Alternative WebSocket client for Gateway (legacy path) | +| `src/types/index.ts` | TypeScript interfaces for all message types | +| `src/utils/logger.ts` | Winston logger with session-scoped child loggers | + +#### Bot Modules (`src/bot/`) + +| Module | Class | Responsibility | +|--------|-------|---------------| +| `orchestrator.ts` | `BotOrchestrator` | Main coordinator: browser launch, join flow, keepalive, greeting, Gateway WebSocket, state machine | +| `joinProcedure.ts` | `JoinProcedure` | Anonymous join: launcher page, name entry, "Join now", lobby handling | +| `authProcedure.ts` | `AuthProcedure` | Microsoft login: email → password → MFA check → "Stay signed in" | +| `captionsProcedure.ts` | `CaptionsProcedure` | Enable live captions via "More" menu, MutationObserver on caption DOM | +| `chatProcedure.ts` | `ChatProcedure` | Open chat panel, MutationObserver on `[role="log"]`, send messages via CKEditor | +| `audioProcedure.ts` | `AudioProcedure` | getUserMedia override, AudioContext, queue-based MP3/WAV/PCM playback into mic stream | +| `audioCaptureProcedure.ts` | `AudioCaptureProcedure` | RTCPeerConnection wrapper, ScriptProcessor, PCM16 16kHz capture, 500ms polling | +| `backgroundProcedure.ts` | `BackgroundProcedure` | Virtual background image upload (pre-join, currently unused) | +| `meetingUrlParser.ts` | (functions) | URL validation, classic vs short format, redirect resolution | +| `authTestProcedure.ts` | (functions) | Anti-detection test variants for debugging auth flow | + +### 4.2 Gateway 
(external, Python/FastAPI) + +| File | Responsibility | +|------|---------------| +| `routeFeatureTeamsbot.py` | REST routes, SSE stream, WebSocket endpoint | +| `service.py` | Business logic: transcript processing, AI triggers, TTS, meeting summary | +| `datamodelTeamsbot.py` | Pydantic models, enums | +| `interfaceFeatureTeamsbot.py` | Database interface (Cosmos DB) | +| `config.py` | Feature instance config load/save | +| `browserBotConnector.py` | HTTP client for Browser Bot API | + +### 4.3 Frontend (external, React/TypeScript) + +The frontend provides a session management UI with: +- Meeting link input and session start/stop controls +- Real-time transcript display (via SSE) +- Bot response log with reasoning, model, cost +- Configuration panel (bot name, response channel, AI prompt, etc.) +- System bot management (email, password) + +--- + +## 5. Data Model + +### 5.1 Enums + +| Enum | Values | Description | +|------|--------|-------------| +| `TeamsbotSessionStatus` | `pending`, `joining`, `active`, `leaving`, `ended`, `error` | Session lifecycle state | +| `TeamsbotResponseType` | `audio`, `chat`, `both` | How the bot responded | +| `TeamsbotResponseChannel` | `voice`, `chat`, `both` | Configured response channel (user setting) | +| `TeamsbotResponseMode` | `auto`, `manual`, `transcribeOnly` | Whether bot responds automatically | +| `TeamsbotDetectedIntent` | `addressed`, `question`, `proactive`, `stop`, `none` | AI-detected intent | +| `TeamsbotJoinMode` | `systemBot`, `anonymous`, `userAccount` | How the bot joins the meeting | +| `TeamsbotTransferMode` | `caption`, `audio`, `auto` | How transcript data is captured | + +### 5.2 Core Entities + +#### TeamsbotSession +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Session identifier | +| `instanceId` | string | Feature instance | +| `mandateId` | string | Tenant/mandate | +| `meetingLink` | string | Teams meeting URL | +| `botName` | string | Display name in meeting | +| 
`status` | TeamsbotSessionStatus | Current state | +| `startedAt` / `endedAt` | datetime | Timestamps | +| `startedByUserId` | string | Who started the session | +| `sessionContext` | string | Optional context for AI | +| `summary` | string | AI-generated meeting summary | +| `errorMessage` | string | Error details if failed | +| `transcriptSegmentCount` | int | Running count | +| `botResponseCount` | int | Running count | + +#### TeamsbotTranscript +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Segment identifier | +| `sessionId` | UUID | Parent session | +| `speaker` | string | Speaker name from captions | +| `text` | string | Transcript text | +| `timestamp` | datetime | When spoken | +| `confidence` | float (0–1) | Confidence score | +| `language` | string | Detected language | +| `isFinal` | bool | Finalized segment | + +#### TeamsbotBotResponse +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Response identifier | +| `sessionId` | UUID | Parent session | +| `responseText` | string | What the bot said | +| `responseType` | TeamsbotResponseType | Voice, chat, or both | +| `detectedIntent` | TeamsbotDetectedIntent | Why it responded | +| `reasoning` | string | AI reasoning chain | +| `modelName` | string | AI model used | +| `processingTime` | float | Seconds | +| `priceCHF` | float | Cost in CHF | + +#### TeamsbotSystemBot +| Field | Type | Description | +|-------|------|-------------| +| `id` | UUID | Bot account identifier | +| `mandateId` | string | Tenant scope | +| `name` | string | Display name | +| `email` | string | Microsoft account email | +| `encryptedPassword` | string | Fernet-encrypted password | +| `isActive` | bool | Whether this bot is the active one | + +#### TeamsbotConfig (Feature Instance Level) +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `botName` | string | `"PowerOn AI"` | Default bot name (overridden by system bot) | +| 
`aiSystemPrompt` | string | `""` | Custom AI instructions | +| `responseMode` | enum | `auto` | auto / manual / transcribeOnly | +| `responseChannel` | enum | `voice` | voice / chat / both | +| `transferMode` | enum | `auto` | caption / audio / auto | +| `language` | string | `"de-DE"` | Bot language | +| `voiceId` | string | `null` | TTS voice identifier | +| `browserBotUrl` | string | `null` | Browser Bot service URL | +| `triggerIntervalSeconds` | int | `10` | Periodic AI trigger interval | +| `triggerCooldownSeconds` | int | `5` | Min time between triggers | +| `contextWindowSegments` | int | `20` | Transcript segments sent to AI | + +#### TeamsbotUserSettings (Per-User Overrides) +Mirrors `TeamsbotConfig` fields (all optional). Merged over instance config with `_getEffectiveConfig()`. + +--- + +## 6. Call Flow + +### 6.1 Session Start (Authenticated Join) + +``` +Frontend Gateway Browser Bot Teams Web + │ │ │ │ + │ POST /sessions │ │ │ + │ {meetingLink, botName} │ │ │ + │──────────────────────────►│ │ │ + │ │ Resolve system bot │ │ + │ │ Decrypt password │ │ + │ │ Derive bot name from email│ │ + │ │ │ │ + │ │ POST /api/bot │ │ + │ │ {sessionId, meetingUrl, │ │ + │ │ botAccountEmail, │ │ + │ │ botAccountPassword} │ │ + │ │───────────────────────────►│ │ + │ │ │ │ + │ ◄── session created ─────│ │ Launch Chromium │ + │ │ │ (Xvfb, headful) │ + │ GET /sessions/:id/stream │ │ │ + │ (SSE) │ │ │ + │──────────────────────────►│ │ │ + │ │◄─── WebSocket connected ───│ │ + │ │ │ │ + │ │ │ Navigate to │ + │ │ │ teams.microsoft.com │ + │ │ │─────────────────────►│ + │ │ │ │ + │ │ │ MS Login: │ + │ │ │ email → password │ + │ │ │ → "Stay signed in" │ + │ │ │─────────────────────►│ + │ │ │ │ + │ │ │ Teams loads │ + │ │ │ Click "Join" header │ + │ │ │◄─────────────────────│ + │ │ │ │ + │ │ │ Pre-join screen: │ + │ │ │ Ensure mic ON │ + │ │ │ Camera stays OFF │ + │ │ │ Click "Join now" │ + │ │ │─────────────────────►│ + │ │ │ │ + │ ◄── SSE: statusChange ───│◄─── status: "joined" 
─────│ In meeting! │ + │ {status: "active"} │ │ │ + │ │ │ Start keepalive │ + │ │ │ Init AudioContext │ + │ │ │ Enable captions │ + │ │ │ Enable chat │ + │ │ │ Send greeting │ + │ │ │ (chat + voice TTS) │ +``` + +### 6.2 Session End + +``` +Frontend Gateway Browser Bot Teams + │ │ │ │ + │ POST /sessions/:id/stop │ │ │ + │──────────────────────────►│ │ │ + │ │ POST /api/bot/:id/leave │ │ + │ │───────────────────────────►│ │ + │ │ │ Stop keepalive │ + │ │ │ Stop audio capture │ + │ │ │ Unsubscribe captions │ + │ │ │ Unsubscribe chat │ + │ │ │ Click hangup button │ + │ │ │─────────────────────►│ + │ │ │ Close browser │ + │ │ │ Close WS │ + │ │◄─── status: "left" ────────│ │ + │ │ │ │ + │ │ Generate meeting summary │ │ + │ │ (AI on full transcript) │ │ + │ │ Update session → "ended" │ │ + │ ◄── SSE: statusChange ───│ │ │ + │ {status: "ended"} │ │ │ +``` + +--- + +## 7. Voice Flow (TTS Playback) + +### How Audio Reaches Meeting Participants + +``` +Gateway Browser Bot Chromium / Teams + │ │ │ + │ AI generates response │ │ + │ TTS (Google Cloud) → │ │ + │ base64 MP3 │ │ + │ │ │ + │ WS: playAudio │ │ + │ {audio: {data, format}} │ │ + │────────────────────────────►│ │ + │ │ Queue audio │ + │ │ Decode base64 → ArrayBuffer │ + │ │ decodeAudioData(buffer) │ + │ │ │ + │ │ Create AudioBufferSource │ + │ │ Connect to │ + │ │ MediaStreamDestination │ + │ │ (overridden getUserMedia) │ + │ │─────────────────────────────►│ + │ │ │ WebRTC sends audio + │ │ │ to all participants + │ │ │ via microphone channel +``` + +### Audio Override Mechanism + +The `AudioProcedure` injects a script before page load (`page.addInitScript`) that: + +1. Overrides `navigator.mediaDevices.getUserMedia` to return a custom `MediaStream` +2. Creates a shared `AudioContext` with a `MediaStreamDestination` +3. When Teams calls `getUserMedia({ audio: true })`, it receives the destination's stream +4. TTS audio is decoded and played through an `AudioBufferSourceNode` connected to this destination +5. 
The result: TTS audio flows through the "microphone" channel into the meeting + +### Voice Greeting Flow + +When the bot joins a meeting: + +1. Bot sends `voiceGreeting` message to Gateway with greeting text and language +2. Gateway calls TTS (Google Cloud) with the configured voice +3. Gateway sends `playAudio` back to bot via WebSocket +4. Bot plays the audio through the mic stream +5. Meeting participants hear the bot speaking + +--- + +## 8. Data Flow (Transcript Pipeline) + +### Caption Mode (Primary) + +``` +Teams Web UI Browser Bot Gateway Frontend + │ │ │ │ + │ Live captions appear │ │ │ + │ in overlay div │ │ │ + │ [data-tid="closed- │ │ │ + │ caption-renderer- │ │ │ + │ wrapper"] │ │ │ + │ │ │ │ + │ MutationObserver fires │ │ │ + │───────────────────────────►│ │ │ + │ │ Extract speaker + text │ │ + │ │ Dedup, noise filter │ │ + │ │ │ │ + │ │ WS: transcript │ │ + │ │ {speaker, text, isFinal} │ │ + │ │────────────────────────────►│ │ + │ │ │ Store in DB │ + │ │ │ Add to context buffer │ + │ │ │ Emit SSE: transcript │ + │ │ │───────────────────────►│ + │ │ │ │ Display + │ │ │ │ + │ │ │ _shouldTriggerAnalysis│ + │ │ │ → if yes: │ + │ │ │ SPEECH_TEAMS AI call │ + │ │ │ │ + │ │ │ Emit SSE: analysis │ + │ │ │───────────────────────►│ + │ │ │ │ + │ │ │ If shouldRespond: │ + │ │ │ TTS → playAudio │ + │ │ ◄── WS: playAudio ────────│ and/or sendChatMessage│ + │ │ │───────────────────────►│ + │ │ │ Emit SSE: botResponse │ +``` + +### Audio Mode (Alternative) + +When `transferMode` is `audio`: + +1. `AudioCaptureProcedure` wraps `RTCPeerConnection` to intercept incoming audio tracks +2. Audio is downsampled to PCM16 mono 16kHz via `ScriptProcessorNode` +3. 500ms chunks are base64-encoded and sent as `audioChunk` messages +4. Gateway runs STT (Google Cloud Speech) on the chunks +5. 
STT results enter the same transcript pipeline + +### Chat Messages + +``` +Teams Chat Panel Browser Bot Gateway Frontend + │ │ │ │ + │ New message in │ │ │ + │ [role="log"] container │ │ │ + │ │ │ │ + │ MutationObserver fires │ │ │ + │───────────────────────────►│ │ │ + │ │ Extract sender + text │ │ + │ │ Dedup, noise filter │ │ + │ │ │ │ + │ │ WS: chatMessage │ │ + │ │ {chat: {speaker, text}} │ │ + │ │────────────────────────────►│ │ + │ │ │ Process as transcript │ + │ │ │ (source: "chat") │ + │ │ │ Same AI pipeline │ +``` + +--- + +## 9. WebSocket Protocol + +### Connection + +The Browser Bot connects to the Gateway at: +``` +wss://{gatewayHost}/api/teamsbot/{instanceId}/bot/ws/{sessionId} +``` + +### Messages: Bot → Gateway + +| Type | Fields | When | +|------|--------|------| +| `transcript` | `sessionId`, `transcript: {speaker, text, timestamp, isFinal}` | Caption captured | +| `chatMessage` | `sessionId`, `chat: {speaker, text, timestamp}` | Meeting chat message received | +| `status` | `sessionId`, `status`, `message?` | Bot state changes (connecting, in_lobby, joined, left, error) | +| `audioChunk` | `sessionId`, `audio: {data, sampleRate, format, timestamp}` | PCM16 audio captured (audio mode) | +| `voiceGreeting` | `sessionId`, `text`, `language` | Request TTS for join greeting | +| `ping` | — | Keepalive (every 30s) | + +### Messages: Gateway → Bot + +| Type | Fields | When | +|------|--------|------| +| `playAudio` | `sessionId`, `audio: {data, format}` | TTS response or greeting to play | +| `sendChatMessage` | `sessionId`, `text` | Chat response to send | +| `stopAudio` | `sessionId` | AI detected "stop" intent | +| `pong` | — | Reply to ping | + +--- + +## 10. AI Analysis Pipeline + +### Trigger Logic (`_shouldTriggerAnalysis`) + +The Gateway decides when to call the AI model. Three trigger paths: + +1. 
**Name Trigger (highest priority, overrides cooldown)**: If the bot's name (or first name, or a phonetically similar word) appears in the latest transcript segment → immediate trigger. Phonetic matching uses: same first letter, length difference ≤ 2, character overlap ≥ 60%. + +2. **Cooldown Gate**: If time since last AI call < `triggerCooldownSeconds` (default 5s) → no trigger. + +3. **Periodic Trigger**: If time since last AI call ≥ `triggerIntervalSeconds` (default 10s) → trigger. + +### AI Call (`_handleSpeechTeams`) + +**Model selection priority**: gpt-4o-mini → claude-3-5-haiku → gpt-4o → claude-sonnet-4-5 → fastest available DATA_ANALYSE model. + +**System prompt** (built dynamically with bot name): +- Role: "You are '{botName}', an AI participant in a Teams meeting" +- Respond ONLY when directly addressed by name (including phonetic variants) +- Match the language of the speaker who addressed you +- 1–2 sentence responses max +- Detect "stop" commands in any language +- Output strict JSON: `{shouldRespond, responseText, reasoning, detectedIntent}` + +**Context window**: Up to `contextWindowSegments` (default 20) recent transcript lines, prefixed with `BOT_NAME:` and optional `SESSION_CONTEXT:`. + +### Response Handling + +| Intent | Action | +|--------|--------| +| `stop` | Send `stopAudio` to bot, no response | +| `addressed` / `question` / `proactive` | Auto mode: TTS + chat (per config). Manual mode: SSE `suggestedResponse` only | +| `none` | No action | + +--- + +## 11. Authentication & Credentials + +### Credential Storage + +System bot credentials are stored per mandate: +- **Email**: Stored in plaintext in `TeamsbotSystemBot.email` +- **Password**: Encrypted with Fernet (AES-128-CBC), key derived via PBKDF2 from a master key +- **Decryption**: Gateway decrypts at session start and passes credentials to Browser Bot via HTTP POST body + +### Microsoft Login Flow + +``` +1. Navigate to teams.microsoft.com +2. Redirect to login.microsoftonline.com +3. 
Enter email in #i0116 input → Click "Next" (input[type="submit"]) +4. Wait for password page (may redirect to org-specific login) +5. Enter password → Click "Sign in" (#idSIButton9) +6. Handle "Stay signed in?" → Click "Yes" +7. Teams loads with authenticated session +``` + +### Anti-Detection + +The bot uses several measures to appear as a regular browser: +- `rebrowser-playwright` with `puppeteer-extra-plugin-stealth` +- Headful mode via Xvfb (Teams blocks headless Chromium) +- Standard Chrome launch arguments (disable automation, sandbox flags) +- Real viewport (1280x720), locale/timezone matching + +--- + +## 12. Teams DOM Interaction + +### Live Captions + +**Enable flow**: "More" button (`#callingButtons-showMoreBtn`) → "Language and speech" (`#LanguageSpeechMenuControl-id`) → "Show live captions" (button with `aria-checked`) + +**Scraping target**: `div[data-tid="closed-caption-renderer-wrapper"]` + +**Extraction strategies**: +- Strategy A: `[data-tid]` containers with speaker in `` title/text, content in adjacent spans +- Strategy B: Structural fallback scanning `
` trees for speaker + text patterns + +**Noise filter**: Ignores entries matching known non-transcript patterns (buttons, timestamps without content, single-word UI elements). + +### Chat Panel + +**Open**: Click `button[id="chat-button"]` — but ONLY if `aria-pressed !== "true"` (prevents toggle-off) + +**Scraping target**: Container with `[role="log"]` + +**Send messages**: Find CKEditor input (`[data-tid="ckeditor-replyConversation"]` or `div[role="textbox"]`), type text, press Enter. + +### Meeting Controls + +| Control | Selector | Notes | +|---------|----------|-------| +| Mic toggle | `input[data-tid="toggle-audio"]` or `input[role="switch"][title*="mic" i]` | Check `checked` state before toggling | +| Hangup | `button[id="hangup-button"]` or `#hangup-button` | | +| More menu | `button[id="callingButtons-showMoreBtn"]` | | +| Join now | `button[data-tid="prejoin-join-button"]` | | + +--- + +## 13. Deployment + +### Infrastructure + +| Component | Platform | Details | +|-----------|----------|---------| +| Browser Bot | Azure Container Apps | 2 CPU, 4GB RAM, Xvfb for headful mode | +| Gateway | Azure Container Apps | Shared instance (`cae-poweron-shared`) | +| Database | Azure Cosmos DB | Sessions, transcripts, responses, system bots | +| Container Registry | Azure Container Registry | Images tagged by Git SHA | + +### CI/CD Pipeline + +**Trigger**: Push to `main` branch + +**Steps**: +1. Checkout code +2. Docker login to ACR +3. Build image with Playwright base + Xvfb +4. Push with tags `latest` and `{git-sha}` +5. Azure login +6. `az containerapp update` with `--revision-suffix deploy-{sha-prefix}` (forces new revision) +7. 
`az containerapp revision restart` on latest revision (ensures container starts) + +### Environment Variables (Production) + +| Variable | Value | +|----------|-------| +| `PORT` | `4100` | +| `NODE_ENV` | `production` | +| `BOT_HEADLESS` | `false` (headful via Xvfb) | +| `GATEWAY_WS_URL` | `wss://gateway-int.poweron-center.net/api/teamsbot/ws` | +| `DISPLAY` | `:99` (Xvfb) | + +--- + +## 14. Configuration Reference + +### Browser Bot Environment + +| Variable | Description | Default | +|----------|-------------|---------| +| `PORT` | HTTP server port | `4100` | +| `GATEWAY_WS_URL` | Gateway WebSocket URL | `wss://gateway-int.poweron-center.net/api/teamsbot/ws` | +| `BOT_NAME` | Default display name | `PowerOn AI` | +| `BOT_HEADLESS` | Run headless | `true` (`false` in Docker) | +| `LOG_LEVEL` | Winston log level | `info` | +| `SCREENSHOT_ON_ERROR` | Screenshots on errors | `true` | + +### Gateway Feature Config + +| Field | Default | Range | Description | +|-------|---------|-------|-------------| +| `responseMode` | `auto` | — | auto, manual, transcribeOnly | +| `responseChannel` | `voice` | — | voice, chat, both | +| `transferMode` | `auto` | — | caption, audio, auto | +| `language` | `de-DE` | — | BCP-47 language tag | +| `triggerIntervalSeconds` | `10` | 3–60 | Periodic AI trigger | +| `triggerCooldownSeconds` | `5` | 1–30 | Min gap between triggers | +| `contextWindowSegments` | `20` | 5–100 | Transcript lines for AI context | + +### Timeouts (Hardcoded in config.ts) + +| Timeout | Value | Purpose | +|---------|-------|---------| +| `lobbyWait` | 120s | Max time waiting in lobby | +| `joinTimeout` | 30s | Max time for join flow | +| `captionsEnable` | 10s | Max time to enable captions | +| `pageLoad` | 30s | General page load timeout | + +--- + +## 15. 
Known Constraints & Lessons Learned + +### Things That DO NOT Work + +| Approach | Why It Fails | Alternative | +|----------|-------------|-------------| +| Fake video injection (`--use-file-for-fake-video-capture`) | Crashes Chromium renderer when WebRTC audio starts | Keep camera OFF | +| Headless Chromium | Teams detects and blocks headless browsers | Use headful + Xvfb | +| `_setSpokenLanguage()` for live captions | Language dropdown doesn't exist; captions use organizer's language setting | Skip entirely | +| `str(PythonEnum)` for comparison | Returns `"ClassName.value"`, not `"value"` | Use `enum.value` | +| Blindly clicking chat button | Toggles panel off if already open | Check `aria-pressed` first | +| `config.botAccountEmail` | `TeamsbotConfig` doesn't have this field | Use `getattr(self, '_botAccountEmail', None)` | +| `model_copy(update={...})` for enum fields | Pydantic v2 doesn't coerce strings to enums on copy | Normalize after merge | + +### Speech Recognition Artifacts + +The bot's name gets mangled by Teams live captions speech recognition. Known variants for "Nyla": Naila, Maila, Neela, Leila, Nila. The system handles this via: +- **AI prompt**: Explicit warning about phonetic distortion +- **Trigger logic**: Phonetic similarity check (first letter match, length ±2, character overlap ≥ 60%) + +### SSE Event Queue: Create On-Demand + +`_emitSessionEvent()` in the Gateway must create the session's event queue (`_sessionEvents[sessionId] = asyncio.Queue()`) if it doesn't already exist. If an event is emitted before the frontend SSE consumer connects, the queue won't exist and events are silently dropped. Always guard with on-demand creation. 
+ +### Performance Characteristics + +| Operation | Typical Duration | +|-----------|-----------------| +| Microsoft login | ~14s | +| Teams page load after auth | ~10s | +| Pre-join to in-meeting | ~5s | +| Caption enable flow | ~8s | +| AI analysis (GPT-4o-mini) | ~2.5s | +| TTS generation | ~1s | +| Full join to first greeting | ~40s | diff --git a/README.md b/README.md index b0e7df6..da8acd7 100644 --- a/README.md +++ b/README.md @@ -1,67 +1,269 @@ -# Teams Browser Bot Service +# Teams Bot Service -Browser-based Microsoft Teams Meeting Bot using Playwright. This service joins Teams meetings via the web interface, captures live captions, and plays TTS audio responses. -Last rev. 2026-01-15 +AI-powered Microsoft Teams meeting bot. Joins meetings, captures live transcripts, monitors chat, and responds via voice and/or chat messages. ## Architecture ``` -┌─────────────────────────────────────────────────────────────────────────────┐ -│ Browser-based Architecture │ -│ │ -│ ┌───────────────────┐ ┌───────────────────────────────┐ │ -│ │ Gateway │ │ Browser Bot Service │ │ -│ │ (Python) │ │ (Node.js + Playwright) │ │ -│ │ │ │ │ │ -│ │ - STT (Google) │ WebSocket │ - Headless Chrome │ │ -│ │ - AI (OpenAI) │◄──────────────────►│ - Teams Web App │ │ -│ │ - TTS (Google) │ Transcripts │ - Meeting join flow │ │ -│ │ - Session Mgmt │ + TTS Audio │ - Captions scraping │ │ -│ │ │ │ - Audio playback │ │ -│ └───────────────────┘ └───────────────────────────────┘ │ -└─────────────────────────────────────────────────────────────────────────────┘ + ┌───────────────┐ ┌───────────────────────┐ + │ │ WebSocket │ │ + │ Gateway │◄──────────────────────────────────►│ Bot Service │ + │ │ transcripts, status, chat, │ │ + │ ● AI engine │ audioChunks, voiceGreeting │ Joins Teams meetings │ + │ ● TTS │ │ as a participant │ + │ ● Sessions │ playAudio, sendChatMessage, │ │ + │ ● Billing │ stopAudio │ Capabilities: │ + │ │ │ ├ Live transcripts │ + │ │ HTTP │ ├ Chat messages │ + │ 
│────────────────────────────────────► ├ Voice playback │ + └───────────────┘ join, leave, status │ └ Audio capture │ + └───────────────────────┘ ``` -## Features +| Path | Protocol | Purpose | +|------|----------|---------| +| Gateway ↔ Bot | WebSocket | Real-time transcript, chat, audio, status exchange | +| Gateway → Bot | HTTP | Session control (join, leave, status) | -- **Multi-tenant support**: Can join any Teams meeting (not limited to own tenant) -- **Browser-based**: Uses Teams web app, no Graph Communications SDK needed -- **Captions scraping**: Captures live captions for transcription -- **Audio playback**: Plays TTS audio through the browser into the meeting -- **WebSocket integration**: Real-time communication with Gateway +## Integration Guide -## Prerequisites +### How It Works -- Node.js 18+ -- Docker (for production deployment) +1. Gateway sends a **POST** to the Bot with a meeting URL and optional credentials +2. Bot joins the meeting and appears as a regular participant +3. A **WebSocket** connection is established for real-time data exchange +4. Bot streams transcript segments and chat messages to the Gateway +5. 
Gateway can send TTS audio or chat responses back into the meeting -## Quick Start +### Step 1: Start a Session -### Local Development +```http +POST /api/bot +Content-Type: application/json + +{ + "sessionId": "550e8400-e29b-41d4-a716-446655440000", + "meetingUrl": "https://teams.microsoft.com/meet/123456789?p=abc123", + "botName": "AI Assistant", + "instanceId": "feature-instance-uuid", + "gatewayWsUrl": "wss://gateway.example.com/api/teamsbot/ws", + "language": "de-DE", + "botAccountEmail": "bot@example.com", + "botAccountPassword": "decrypted-password", + "transferMode": "caption" +} +``` + +| Field | Required | Description | +|-------|----------|-------------| +| `sessionId` | Yes | Unique session UUID (generated by Gateway) | +| `meetingUrl` | Yes | Teams meeting URL (classic or short format) | +| `botName` | No | Display name in meeting (default: env `BOT_NAME`) | +| `instanceId` | No | Feature instance ID for Gateway WebSocket path | +| `gatewayWsUrl` | No | Gateway WebSocket base URL (default: env `GATEWAY_WS_URL`) | +| `language` | No | BCP-47 language code (default: `de-DE`) | +| `botAccountEmail` | No | Microsoft account for authenticated join | +| `botAccountPassword` | No | Decrypted password for authenticated join | +| `transferMode` | No | `caption` or `audio` | + +If `botAccountEmail` + `botAccountPassword` are provided, the bot joins as an authenticated user. Otherwise, it joins as an anonymous guest. 
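
From the caller's side, the request above could be assembled roughly like this. It is a sketch under stated assumptions, using only the Python standard library: the function names `buildJoinRequest` and `isAuthenticatedJoin` and the base URL are hypothetical, and only a subset of the optional fields from the table is shown. `isAuthenticatedJoin` merely mirrors the rule stated above (both credentials present); it is not the bot's own implementation.

```python
import json
import urllib.request
from typing import Optional


def buildJoinRequest(
    botBaseUrl: str,
    sessionId: str,
    meetingUrl: str,
    botName: Optional[str] = None,
    botAccountEmail: Optional[str] = None,
    botAccountPassword: Optional[str] = None,
    transferMode: Optional[str] = None,
) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) the POST /api/bot request."""
    if not sessionId or not meetingUrl:
        raise ValueError("sessionId and meetingUrl are required")
    optional = {
        "botName": botName,
        "botAccountEmail": botAccountEmail,
        "botAccountPassword": botAccountPassword,
        "transferMode": transferMode,
    }
    payload = {"sessionId": sessionId, "meetingUrl": meetingUrl}
    # Omit unset optional fields so the bot falls back to its own defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        url=botBaseUrl.rstrip("/") + "/api/bot",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def isAuthenticatedJoin(payload: dict) -> bool:
    """Mirror of the documented rule: both credentials must be present."""
    return bool(payload.get("botAccountEmail")) and bool(payload.get("botAccountPassword"))
```

Passing the returned object to `urllib.request.urlopen` would send it. Building and sending are kept separate here so the payload can be inspected or logged (without the password) first.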
+
+**Response:**
+
+```json
+{
+  "success": true,
+  "sessionId": "550e8400-e29b-41d4-a716-446655440000",
+  "message": "Bot deployment initiated"
+}
+```
+
+### Step 2: Receive Data via WebSocket
+
+After deployment, the Bot connects to the Gateway at:
+
+```
+wss://{gatewayHost}/api/teamsbot/{instanceId}/bot/ws/{sessionId}
+```
+
+#### Messages: Bot → Gateway
+
+**Transcript segment:**
+
+```json
+{
+  "type": "transcript",
+  "sessionId": "...",
+  "transcript": {
+    "speaker": "Jane Doe",
+    "text": "Hey Bot, can you summarize this?",
+    "timestamp": "2026-02-18T10:30:00.000Z",
+    "isFinal": true
+  }
+}
+```
+
+**Chat message:**
+
+```json
+{
+  "type": "chatMessage",
+  "sessionId": "...",
+  "chat": {
+    "speaker": "Jane Doe",
+    "text": "Please summarize the discussion",
+    "timestamp": "2026-02-18T10:31:00.000Z"
+  }
+}
+```
+
+**Status update:**
+
+```json
+{
+  "type": "status",
+  "sessionId": "...",
+  "status": "joined",
+  "message": "Bot joined the meeting"
+}
+```
+
+Status values: `connecting` | `in_lobby` | `joined` | `left` | `error`
+
+**Voice greeting request** (bot asks Gateway for TTS):
+
+```json
+{
+  "type": "voiceGreeting",
+  "sessionId": "...",
+  "text": "Hello, I am ready.",
+  "language": "de-DE"
+}
+```
+
+**Raw audio chunk** (`transferMode: "audio"` only):
+
+```json
+{
+  "type": "audioChunk",
+  "sessionId": "...",
+  "audio": {
+    "data": "",
+    "sampleRate": 16000,
+    "format": "pcm16",
+    "timestamp": "2026-02-18T10:30:00.500Z"
+  }
+}
+```
+
+**Keepalive:**
+
+```json
+{ "type": "ping" }
+```
+
+#### Messages: Gateway → Bot
+
+**Play TTS audio:**
+
+```json
+{
+  "type": "playAudio",
+  "sessionId": "...",
+  "audio": {
+    "data": "",
+    "format": "mp3"
+  }
+}
+```
+
+**Send chat message:**
+
+```json
+{
+  "type": "sendChatMessage",
+  "sessionId": "...",
+  "text": "Here is my summary of the discussion..."
+}
+```
+
+**Stop audio playback:**
+
+```json
+{ "type": "stopAudio", "sessionId": "..."
}
+```
+
+**Keepalive response:**
+
+```json
+{ "type": "pong" }
+```
+
+### Step 3: Leave the Meeting
+
+```http
+POST /api/bot/:sessionId/leave
+```
+
+```json
+{
+  "success": true,
+  "message": "Leave initiated"
+}
+```
+
+### Step 4: Check Status
+
+```http
+GET /api/bot/:sessionId/status
+```
+
+```json
+{
+  "sessionId": "550e8400-e29b-41d4-a716-446655440000",
+  "state": "in_meeting",
+  "error": null
+}
+```
+
+### Health Check
+
+```http
+GET /health
+```
+
+Returns `200 OK` with `{ "status": "ok", "timestamp": "..." }`.
+
+## Meeting URL Formats
+
+Both formats are supported:
+
+```
+# Classic
+https://teams.microsoft.com/l/meetup-join/19%3ameeting_xxx/0?context=...
+
+# Short
+https://teams.microsoft.com/meet/123456789?p=abc123
+```
+
+## Running the Service
+
+### Local
 ```bash
-# Install dependencies
 npm install
-
-# Install Playwright browsers
-npx playwright install chromium
-
-# Copy and configure environment
-cp .env.sample .env
-# Edit .env with your settings
-
-# Run in development mode
-npm run dev
+cp .env.sample .env # Configure Gateway URL, bot name, etc.
+npm run dev # Dev mode
 ```
 ### Docker
 ```bash
-# Build and run
-docker-compose up --build
+docker build -t teams-bot .
-# Or build image only
-docker build -t teams-browser-bot .
+docker run -p 4100:4100 \
+  -e GATEWAY_WS_URL=wss://gateway.example.com/api/teamsbot/ws \
+  teams-bot
 ```
 ## Configuration
@@ -69,106 +271,6 @@ docker build -t teams-browser-bot .
 | Variable | Description | Default |
 |----------|-------------|---------|
 | `PORT` | HTTP server port | `4100` |
-| `GATEWAY_WS_URL` | Gateway WebSocket URL | `wss://gateway-int.poweron-center.net/api/teamsbot/ws` |
-| `BOT_NAME` | Display name in meetings | `PowerOn AI` |
-| `BOT_HEADLESS` | Run browser headless | `true` |
-| `LOG_LEVEL` | Logging level | `info` |
-| `SCREENSHOT_ON_ERROR` | Take screenshots on errors | `true` |
-
-## API Endpoints
-
-### Health Check
-```
-GET /health
-```
-
-### Deploy Bot
-```
-POST /api/bot
-Content-Type: application/json
-
-{
-  "sessionId": "uuid",
-  "meetingUrl": "https://teams.microsoft.com/meet/...",
-  "botName": "PowerOn AI"
-}
-```
-
-### Leave Meeting
-```
-POST /api/bot/:sessionId/leave
-```
-
-### Get Status
-```
-GET /api/bot/:sessionId/status
-```
-
-## WebSocket Protocol
-
-### Gateway → Bot
-
-```typescript
-// Join a meeting
-{ type: "joinMeeting", sessionId: "uuid", meetingUrl: "...", botName?: "..." }
-
-// Leave meeting
-{ type: "leaveMeeting", sessionId: "uuid" }
-
-// Play audio
-{ type: "playAudio", sessionId: "uuid", audio: { format: "mp3", data: "base64..." } }
-```
-
-### Bot → Gateway
-
-```typescript
-// Transcript
-{ type: "transcript", sessionId: "uuid", transcript: { speaker: "...", text: "...", timestamp: "...", isFinal: true } }
-
-// Status
-{ type: "status", sessionId: "uuid", status: "joined" | "in_lobby" | "left" | "error", message?: "..." }
-```
-
-## Meeting URL Formats
-
-Supports both classic and new (short) URL formats:
-
-```
-# Classic format
-https://teams.microsoft.com/l/meetup-join/19%3ameeting_xxx/0?context=...
-
-# New format (since 2025)
-https://teams.microsoft.com/meet/36438888781520?p=5fGqrujxzewPFjJacW
-```
-
-## Deployment
-
-### Azure Container Instance
-
-```bash
-# Create resource group
-az group create --name rg-teams-bot --location westeurope
-
-# Create container instance
-az container create \
-  --resource-group rg-teams-bot \
-  --name teams-browser-bot \
-  --image /teams-browser-bot:latest \
-  --cpu 2 \
-  --memory 4 \
-  --ports 4100 \
-  --environment-variables \
-    GATEWAY_WS_URL=wss://gateway-int.poweron-center.net/api/teamsbot/ws \
-    BOT_NAME="PowerOn AI"
-```
-
-## Debugging
-
-- Logs are written to `output/logs/`
-- Screenshots (on error) are saved to `output/screenshots/`
-- Set `BOT_HEADLESS=false` for local debugging with visible browser
-
-## Based On
-
-- [Recall.ai Microsoft Teams Meeting Bot](https://github.com/recallai/microsoft-teams-meeting-bot)
-- [Recall.ai Blog: How to build a Microsoft Teams Bot](https://www.recall.ai/blog/how-to-build-a-microsoft-teams-bot)
+| `GATEWAY_WS_URL` | Gateway WebSocket base URL | — |
+| `BOT_NAME` | Default bot display name | — |
+| `LOG_LEVEL` | Log level (`debug`, `info`, `warn`, `error`) | `info` |
diff --git a/nyla-marketing.html b/nyla-marketing.html
new file mode 100644
index 0000000..704394f
--- /dev/null
+++ b/nyla-marketing.html
@@ -0,0 +1,321 @@
+ + + + +
+Nyla — Your AI Colleague in Every Meeting
+ + + + + + + + + + + + + + + +
+
+
New from PowerOn
+

Meet Nyla

+

She joins your Microsoft Teams meeting as a real colleague. She listens. She speaks. She answers your questions — live.

+ Discover Nyla +
+
+ + +
+
+ +

No one in the meeting knows it's AI.

+

+ Nyla appears as a real person in the participant list. No "Bot" badge. No "Guest" tag. No IT setup at your end. Just share the meeting link. +

+ +
+
+

Other AI meeting tools

+
×
Show up as "Bot" or "Guest" in the roster
+
×
Need your IT admin to allow guest access
+
×
Require app installation or licenses
+
×
Only transcribe — can't speak or act
+
+
+

Nyla

+
Appears as a real person — no labels
+
Zero setup for you — no admin, no config
+
Nothing to install. No licenses needed.
+
Speaks, chats, answers questions — live
+
+
+ +
+
Zero
IT Setup
+
No
Bot Label
+
Any
Meeting & Org
+
Full
Meeting Access
+
+
+
+ + +
+
+ +

Not a recorder. A participant.

+

Nyla listens, speaks, answers, analyzes, and summarizes — while the meeting is still running.

+ +
+
+
+

Speaks in the meeting

+

Participants hear Nyla speak — just like any other colleague.

+
+
+
+

Chats in real time

+

Reads and writes in the meeting chat. Voice, text, or both.

+
+
+
+

Answers questions live

+

"Nyla, summarize what we discussed." She responds immediately.

+
+
+
+

Analyzes continuously

+

Real-time AI running throughout the meeting. Not just after.

+
+
+
+

Full transcript

+

Every word, every speaker, properly attributed. Streamed live.

+
+
+
+

Meeting summary

+

Key decisions, action items, topics — delivered when the meeting ends.

+
+
+
+
+ + +
+
+ +

See how Nyla stacks up.

+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
What you get | Nyla | Fireflies.ai | Read.ai | Otter.ai | MS Copilot
Appears as a real colleagueGuest labelApp labelBot labelNative
Zero IT setup at your end××Partial×
Speaks in the meeting×××
Chats in the meeting××Text only
Answers questions on demand××Partial
Runs commands in Teams×××Partial
Works with any organizationAdmin neededApp installPartial×
+
+
+
+ + + + +
+
+ PowerOn +

© 2026 PowerOn AG. All rights reserved.

+
+
+ + + +