User Prompt Analysis: Intent Extraction and Context Documentization
Objective
- Extract a clean, concise user intent from the first user message of each workflow round.
- Move large or detailed inline supportive content into ChatDocument entries attached to the same first user message.
- Persist the cleaned intent in services.currentUserPrompt and keep the original message in services.rawUserPrompt.
- Normalize the intent to the detected language.
Integration Point
- Layer: Workflow level, same module where task planning is initiated.
- Timing: Immediately when a new round starts and the first user message is being created (before task planning and any action planning).
- Side effects:
  - Create/attach ChatDocument items to the first user message with documentsLabel = "user_context".
  - Ensure these documents are discoverable via existing AVAILABLE_DOCUMENTS* placeholders.
Data Flow
- Receive raw user message for the round → store services.rawUserPrompt.
- Run AI-based analyzer to produce { detectedLanguage, intent, contextItems[] }.
- Set services.user.language = detectedLanguage (if present).
- Set services.currentUserPrompt = intent.
- For each contextItems[i], create a ChatDocument (fileName: user_context_{i}.txt or derived) and attach to the first user message. Group via docList:messageId:user_context.
Minimal User Input Object (in-memory)
- detectedLanguage: string (ISO, e.g., "en")
- intent: string (concise, normalized)
- contextItems: array of items to be persisted as ChatDocuments only (not retained as a list beyond creation)
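The three fields above could be held in a small typed container. This is a minimal sketch, not the project's actual model: the class names ContextItem and UserInputAnalysis are hypothetical, and the per-item fields (title, mimeType, content) are taken from the analyzer output structure described in this document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContextItem:
    # One block of supportive content that will become a ChatDocument.
    title: str
    content: str
    mimeType: str = "text/plain"


@dataclass
class UserInputAnalysis:
    # In-memory result of the analyzer call (hypothetical container name).
    detectedLanguage: str  # ISO code, e.g. "en"
    intent: str            # concise, normalized request
    contextItems: List[ContextItem] = field(default_factory=list)


analysis = UserInputAnalysis(detectedLanguage="en", intent="Summarize the attached log.")
```

Using default_factory gives every instance its own empty list, so an analysis without context items never shares state with another one.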
AI Analyzer Prompt (JSON braces escaped for docs)
Use this prompt for the analyzer call. Output must be JSON-only and use the following structure. Note: to display JSON in docs, we show braces as doubled {{ }}.
You are an input analyzer. Split the user's message into:
1) intent: the user's core request in one concise paragraph, normalized to the user's language.
2) contextItems: supportive data to attach as separate documents if significantly larger than the intent. Include large literal data blocks, long lists/tables, code/JSON blocks, quoted transcripts, CSV fragments, or detailed specs. Keep URLs in the intent unless they include large pasted content.
Rules:
- If total content length (intent + data) is less than 10% of the model's max tokens, do not extract; return an empty contextItems and keep a compact, self-contained intent.
- If content exceeds that, move bulky parts into contextItems, keeping the intent short and clear.
- Preserve critical references (URLs, filenames) in the intent.
- Normalize the intent to the detected language. If mixed-language, use the primary detected language and normalize.
Output JSON only (no markdown):
{{
  "detectedLanguage": "en",
  "intent": "Concise normalized request...",
  "contextItems": [
    {{
      "title": "User context 1",
      "mimeType": "text/plain",
      "content": "Full extracted content block here"
    }}
  ]
}}
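Because the analyzer must return JSON only, the caller can validate the reply strictly and treat anything else as invalid. The sketch below is one possible validator, assuming the reply arrives as a plain string; the function name parse_analyzer_output and the choice to skip malformed items (rather than reject the whole reply) are assumptions, not part of the specification.

```python
import json
from typing import Optional


def parse_analyzer_output(text: str) -> Optional[dict]:
    """Return a validated analysis dict, or None if the reply is unusable."""
    try:
        data = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(data, dict):
        return None

    intent = data.get("intent")
    if not isinstance(intent, str) or not intent.strip():
        return None  # an empty intent counts as invalid

    raw_items = data.get("contextItems")
    if raw_items is None:
        raw_items = []
    if not isinstance(raw_items, list):
        return None

    items = []
    for raw in raw_items:
        # Skip malformed entries instead of rejecting the whole result.
        if not isinstance(raw, dict) or not isinstance(raw.get("content"), str):
            continue
        items.append({
            "title": str(raw.get("title") or ""),
            "mimeType": str(raw.get("mimeType") or "text/plain"),
            "content": raw["content"],
        })

    language = data.get("detectedLanguage")
    return {
        "detectedLanguage": language if isinstance(language, str) else None,
        "intent": intent.strip(),
        "contextItems": items,
    }
```

A None result maps onto the fallback path: keep the raw message as the current prompt and create no documents.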
Algorithm (concise)
- On new round user message creation:
  - Set services.rawUserPrompt = rawMessage.
  - Determine model maxTokens (from current model selection).
  - Call AI analyzer with prompt above and the raw message.
- Parse analyzer result:
  - Fallback: if invalid, set services.currentUserPrompt = rawMessage, contextItems = [].
  - Else set services.currentUserPrompt = intent, update services.user.language when provided.
- Create context documents:
  - For each contextItem, create a ChatDocument using component/file interfaces.
  - Attach to the first user message; label group as user_context so it appears in docList:messageId:user_context.
- Downstream prompt extractors:
  - extractUserPrompt returns services.currentUserPrompt if available, otherwise fallback.
  - AVAILABLE_DOCUMENTS* functions continue to index attached documents.
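The downstream extractor behaviour can be illustrated with a small stand-in. This sketch assumes that "fallback" means the raw user message, which the document does not state explicitly; extract_user_prompt is a hypothetical Python stand-in for extractUserPrompt, and services is modelled as a simple attribute bag.

```python
from types import SimpleNamespace


def extract_user_prompt(services) -> str:
    """Prefer the cleaned intent; fall back to the raw message (assumed fallback)."""
    current = getattr(services, "currentUserPrompt", None)
    if current:
        return current
    return getattr(services, "rawUserPrompt", None) or ""


services = SimpleNamespace(rawUserPrompt="long raw message ...", currentUserPrompt=None)
before = extract_user_prompt(services)  # no cleaned intent yet: raw message
services.currentUserPrompt = "Summarize the report."
after = extract_user_prompt(services)   # cleaned intent takes precedence
```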
Pseudocode (high-level)
raw = userMessage.text
services.rawUserPrompt = raw

modelMax = ai.getModelMaxTokens()
analysis = ai.callAnalyzer(raw, modelMax)

if not analysis.valid:
    # Fallback: keep the raw message and create no documents
    services.currentUserPrompt = raw
    items = []
else:
    services.user.language = analysis.detectedLanguage or services.user.language
    services.currentUserPrompt = analysis.intent
    items = analysis.contextItems or []

for i, item in enumerate(items):
    fileName = inferFileName(item.title, i)  # default: user_context_{i}.txt
    doc = createChatDocument(fileName, item.mimeType, item.content, messageId=firstMessage.id)
    attachDocumentToMessage(doc, label="user_context")
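The pseudocode leaves inferFileName unspecified beyond its default of user_context_{i}.txt. One way it might work is sketched below; the slug rules, the 40-character cap, the extension table, and the extra mime_type parameter are all assumptions made for illustration.

```python
import re

# Hypothetical mapping from MIME type to file extension; extend as needed.
_EXTENSIONS = {"text/plain": ".txt", "application/json": ".json", "text/csv": ".csv"}


def infer_file_name(title: str, index: int, mime_type: str = "text/plain") -> str:
    """Derive a safe file name from the item title, else use the indexed default."""
    ext = _EXTENSIONS.get(mime_type, ".txt")
    # Keep ASCII letters, digits, dash and underscore; collapse other runs to "_".
    slug = re.sub(r"[^A-Za-z0-9_-]+", "_", title or "").strip("_").lower()
    if not slug:
        return f"user_context_{index}{ext}"
    # Append the index so two items with the same title do not collide.
    return f"{slug[:40]}_{index}{ext}"
```

For example, an untitled plain-text item at index 2 yields user_context_2.txt, while a CSV item titled "Sales table (Q3)" at index 1 yields sales_table_q3_1.csv.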
Edge Cases
- Analyzer returns empty/invalid → keep raw prompt as current.
- Extremely large context blocks → rely on file storage and existing compression paths.
- Mixed-language messages → normalize intent to detected primary language.
- Token threshold (~10% of model max) → skip extraction when the message falls below it.
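The threshold rule can also be checked in code before the analyzer is called. The sketch below is an assumption-heavy illustration: it estimates tokens with a rough 4-characters-per-token heuristic instead of the model's tokenizer, and both function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token.
    # Swap in the model's own tokenizer when one is available.
    return max(1, len(text) // 4)


def should_extract_context(raw_message: str, model_max_tokens: int, ratio: float = 0.10) -> bool:
    """True when the message reaches the threshold and bulky parts should move to documents."""
    return estimate_tokens(raw_message) >= ratio * model_max_tokens
```

With an 8000-token model the threshold is 800 tokens, so a one-line question stays inline while a 4000-character paste (about 1000 estimated tokens) qualifies for extraction.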
Telemetry & Logging
- Log analyzer input size, output size, number of context items, and time.
- Trace the final intent and number of documents created (not content).
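A metrics record along these lines could carry sizes, counts, and timing without ever including document content. This is a sketch only: the logger name, the field names, and the assumption that context items are dicts with a content key are all hypothetical.

```python
import logging
import time

logger = logging.getLogger("user_prompt_analysis")  # hypothetical logger name


def summarize_analysis(raw: str, intent: str, context_items: list, started: float) -> dict:
    """Build a metrics record with sizes and counts only, never document content."""
    return {
        "inputChars": len(raw),
        "intentChars": len(intent),
        "contextItemCount": len(context_items),
        "contextChars": sum(len(item.get("content", "")) for item in context_items),
        "elapsedMs": round((time.monotonic() - started) * 1000, 1),
    }


started = time.monotonic()
intent = "Summarize the pasted log."
metrics = summarize_analysis(
    raw="Please summarize: <long pasted log>",
    intent=intent,
    context_items=[{"title": "User context 1", "content": "<long pasted log>"}],
    started=started,
)
logger.info("user prompt analysis: %s", metrics)
logger.debug("final intent: %s", intent)  # the intent is traced, document content is not
```

time.monotonic is used for the elapsed time because it is unaffected by system clock adjustments.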
Rollout
- Implement analyzer call and storage.
- Attach documents and verify they appear in AVAILABLE_DOCUMENTS index.
- Update extractUserPrompt to prefer services.currentUserPrompt.
- Add metrics and guardrails; enable behind a feature flag if needed.
Testing
- Unit: parsing analyzer response; document creation; extractUserPrompt fallback.
- Integration: start workflow round → verify services.currentUserPrompt set and user_context docs indexed.
- Regression: prompts render correctly; parameters generation can reference new docs.
Acceptance Criteria
- Clean intent set on services.currentUserPrompt consistently.
- Context extracted into documents when above threshold; otherwise kept inline.
- AVAILABLE_DOCUMENTS* includes new context docs; extractUserPrompt returns cleaned intent.