## User Prompt Analysis: Intent Extraction and Context Documentization

### Objective
- Extract a clean, concise user intent from the first user message of each workflow round.
- Move large or detailed inline supporting content into `ChatDocument` entries attached to the same first user message.
- Persist the cleaned intent in `services.currentUserPrompt` and keep the original message in `services.rawUserPrompt`.
- Normalize the intent to the detected language.

### Integration Point
- Layer: workflow level, in the same module where task planning is initiated.
- Timing: immediately when a new round starts and its first user message is created (before task planning and any action planning).
- Side effects:
  - Create `ChatDocument` items and attach them to the first user message with `documentsLabel = "user_context"`.
  - Ensure these documents are discoverable via the existing `AVAILABLE_DOCUMENTS*` placeholders.

### Data Flow
1) Receive the raw user message for the round → store it in `services.rawUserPrompt`.
2) Run the AI-based analyzer to produce `{ detectedLanguage, intent, contextItems[] }`.
3) Set `services.user.language = detectedLanguage` (if present).
4) Set `services.currentUserPrompt = intent`.
5) For each `contextItems[i]`, create a `ChatDocument` (fileName: `user_context_{i}.txt` or a name derived from the item title) and attach it to the first user message. Group via `docList:messageId:user_context`.

### Minimal User Input Object (in-memory)
- detectedLanguage: string (ISO language code, e.g., "en")
- intent: string (concise, normalized)
- contextItems: array of items to be persisted as ChatDocuments only (not retained as a list beyond creation)

### AI Analyzer Prompt (JSON braces escaped for docs)
Use this prompt for the analyzer call. The output must be JSON only and follow the structure below. Note: braces in the JSON example are shown doubled (`{{` `}}`) as escapes.

```
You are an input analyzer. Split the user's message into:
1) intent: the user's core request in one concise paragraph, normalized to the user's language.
2) contextItems: supporting data to attach as separate documents if it is significantly larger than the intent. Include large literal data blocks, long lists/tables, code/JSON blocks, quoted transcripts, CSV fragments, or detailed specs. Keep URLs in the intent unless they include large pasted content.

Rules:
- If the total content length (intent + data) is less than 10% of the model's max tokens, do not extract; return an empty contextItems array and keep a compact, self-contained intent.
- If the content exceeds that, move bulky parts into contextItems, keeping the intent short and clear.
- Preserve critical references (URLs, filenames) in the intent.
- Normalize the intent to the detected language. If the message is mixed-language, use the primary detected language and normalize.

Output JSON only (no markdown):
{{
  "detectedLanguage": "en",
  "intent": "Concise normalized request...",
  "contextItems": [
    {{ "title": "User context 1", "mimeType": "text/plain", "content": "Full extracted content block here" }}
  ]
}}
```

### Algorithm (concise)
1) On new-round user message creation:
   - Set `services.rawUserPrompt = rawMessage`.
   - Determine the model's `maxTokens` (from the current model selection).
   - Call the AI analyzer with the prompt above, the raw message, and `maxTokens`.
2) Parse the analyzer result:
   - Fallback: if the result is invalid, set `services.currentUserPrompt = rawMessage` and `contextItems = []`.
   - Otherwise, set `services.currentUserPrompt = intent` and update `services.user.language` when a language is provided.
3) Create context documents:
   - For each `contextItem`, create a `ChatDocument` using the component/file interfaces.
   - Attach it to the first user message and label the group `user_context` so it appears in `docList:messageId:user_context`.
4) Downstream prompt extractors:
   - `extractUserPrompt` returns `services.currentUserPrompt` if available; otherwise it falls back to its existing behavior.
   - `AVAILABLE_DOCUMENTS*` functions continue to index attached documents.
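Step 2 (parse with fallback) is where most failure handling lives, so a minimal Python sketch of it follows. It assumes the analyzer's reply arrives as a raw string; `parse_analysis`, `UserInput`, and `ContextItem` are hypothetical names rather than existing interfaces, and the fields are snake_case versions of the JSON keys.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContextItem:
    title: str
    mime_type: str
    content: str


@dataclass
class UserInput:
    """In-memory analyzer result mirroring { detectedLanguage, intent, contextItems[] }."""
    detected_language: Optional[str]
    intent: str
    context_items: List[ContextItem] = field(default_factory=list)
    valid: bool = True


def parse_analysis(raw_message: str, response_text: str) -> UserInput:
    """Parse the analyzer's JSON reply; fall back to the raw message if unusable."""
    fallback = UserInput(detected_language=None, intent=raw_message, valid=False)
    try:
        data = json.loads(response_text)
    except (TypeError, ValueError):  # json.JSONDecodeError subclasses ValueError
        return fallback
    if not isinstance(data, dict):
        return fallback

    intent = data.get("intent")
    if not isinstance(intent, str) or not intent.strip():
        return fallback

    language = data.get("detectedLanguage")
    if not isinstance(language, str) or not language.strip():
        language = None  # caller keeps the existing services.user.language

    raw_items = data.get("contextItems") or []
    if not isinstance(raw_items, list):
        return fallback

    items: List[ContextItem] = []
    for i, entry in enumerate(raw_items):
        # One malformed item invalidates the whole result: dropping it would
        # silently lose content that the analyzer removed from the intent.
        if not isinstance(entry, dict) or not isinstance(entry.get("content"), str):
            return fallback
        items.append(ContextItem(
            title=str(entry.get("title") or f"User context {i + 1}"),
            mime_type=str(entry.get("mimeType") or "text/plain"),
            content=entry["content"],
        ))

    return UserInput(
        detected_language=language.strip() if language else None,
        intent=intent.strip(),
        context_items=items,
    )
```

The strict treatment of malformed items is a design choice. Once the analyzer has moved a block out of the intent, skipping a broken item would drop that block entirely, whereas falling back to `rawMessage` keeps all of the user's content.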
### Pseudocode (high-level)
```
raw = userMessage.text              // first user message of the round
services.rawUserPrompt = raw
modelMax = ai.getModelMaxTokens()
analysis = ai.callAnalyzer(raw, modelMax)

if not analysis.valid:
    // Fallback: keep the raw message as the current prompt
    services.currentUserPrompt = raw
    items = []
else:
    services.user.language = analysis.detectedLanguage or services.user.language
    services.currentUserPrompt = analysis.intent
    items = analysis.contextItems or []

for i, item in enumerate(items):
    fileName = inferFileName(item.title, i)   // default: user_context_{i}.txt
    doc = createChatDocument(fileName, item.mimeType, item.content, messageId=userMessage.id)
    attachDocumentToMessage(doc, label="user_context")
```

### Edge Cases
- Analyzer returns empty or invalid output → keep the raw prompt as current.
- Extremely large context blocks → rely on file storage and existing compression paths.
- Mixed-language messages → normalize the intent to the detected primary language.
- Small messages (below ~10% of the model's max tokens) → skip extraction.

### Telemetry & Logging
- Log analyzer input size, output size, number of context items, and elapsed time.
- Trace the final intent and the number of documents created (not the document content).

### Rollout
1) Implement the analyzer call and storage.
2) Attach documents and verify they appear in the `AVAILABLE_DOCUMENTS*` index.
3) Update `extractUserPrompt` to prefer `services.currentUserPrompt`.
4) Add metrics and guardrails; enable behind a feature flag if needed.

### Testing
- Unit: analyzer response parsing; document creation; `extractUserPrompt` fallback.
- Integration: start a workflow round → verify that `services.currentUserPrompt` is set and `user_context` docs are indexed.
- Regression: prompts render correctly; parameter generation can reference the new docs.

### Acceptance Criteria
- A clean intent is consistently set on `services.currentUserPrompt`.
- Context is extracted into documents when above the threshold; otherwise it is kept inline.
- `AVAILABLE_DOCUMENTS*` includes the new context docs; `extractUserPrompt` returns the cleaned intent.
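To check the pseudocode against the acceptance criteria before wiring it into the real workflow, it can be run with in-memory stand-ins. This is only a sketch: `Services`, `ChatMessage`, `ChatDocument`, `start_round`, `doc_list_key`, and `extract_user_prompt` are hypothetical Python stand-ins for `services`, the message model, `createChatDocument`/`attachDocumentToMessage`, and `extractUserPrompt`. `analysis` is assumed to be already parsed (or `None` when invalid), and the message-text fallback in `extract_user_prompt` is an assumption.

```python
import logging
from dataclasses import dataclass, field
from typing import List, Optional

logger = logging.getLogger(__name__)

USER_CONTEXT_LABEL = "user_context"


@dataclass
class ChatDocument:
    """Hypothetical stand-in for the real ChatDocument interface."""
    file_name: str
    mime_type: str
    content: str
    message_id: str
    label: str


@dataclass
class ChatMessage:
    """Hypothetical stand-in for the first user message of a round."""
    id: str
    text: str
    documents: List[ChatDocument] = field(default_factory=list)


@dataclass
class Services:
    """Hypothetical subset of the workflow `services` object."""
    raw_user_prompt: Optional[str] = None
    current_user_prompt: Optional[str] = None
    user_language: Optional[str] = None


def doc_list_key(message_id: str, label: str = USER_CONTEXT_LABEL) -> str:
    # Builds the docList:messageId:user_context reference string.
    return f"docList:{message_id}:{label}"


def start_round(services: Services, first_message: ChatMessage,
                analysis: Optional[dict]) -> List[ChatDocument]:
    """Apply a parsed analyzer result (None = invalid) and attach context docs."""
    services.raw_user_prompt = first_message.text
    if analysis is None or not analysis.get("intent"):
        # Fallback: the raw message stays the current prompt; no documents.
        services.current_user_prompt = first_message.text
        items = []
    else:
        services.user_language = analysis.get("detectedLanguage") or services.user_language
        services.current_user_prompt = analysis["intent"]
        items = analysis.get("contextItems") or []

    created = []
    for i, item in enumerate(items):
        doc = ChatDocument(
            file_name=f"user_context_{i}.txt",  # default name from the pseudocode
            mime_type=item.get("mimeType", "text/plain"),
            content=item["content"],
            message_id=first_message.id,
            label=USER_CONTEXT_LABEL,
        )
        first_message.documents.append(doc)
        created.append(doc)

    # Telemetry: sizes and counts only, never document content.
    logger.info("user prompt analysis: intent_chars=%d context_docs=%d",
                len(services.current_user_prompt), len(created))
    return created


def extract_user_prompt(services: Services, message: ChatMessage) -> str:
    """Prefer the cleaned intent; fall back to the message text (assumed fallback)."""
    return services.current_user_prompt or message.text
```

In a unit test, call `start_round` with a hand-written analysis dict and assert on `services`, the returned documents, and `doc_list_key(message.id)`. This covers the "clean intent set" and "context extracted into documents" criteria without calling a model.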