AI Call Flow Architecture Analysis
Executive Summary
This document analyzes the current AI call flow in the workflow system and compares it with Claude's approach to identify weaknesses causing:
- Over-complication for simple requests
- Slow rendering for small documents
Current Architecture Flow
Complete Flow for ai.process Action
User Request
↓
workflowProcessor.generateTaskPlan()
├─→ modeDynamic.generateTaskPlan() [AI Call #1: Task Planning]
└─→ Creates TaskPlan with TaskSteps
↓
workflowProcessor.executeTask()
├─→ modeDynamic.executeTask()
│ ├─→ _planSelect() [AI Call #2: Action Selection]
│ │ └─→ generateDynamicPlanSelectionPrompt()
│ │ └─→ callAiPlanning() [~30s, DETAILED mode]
│ │
│ ├─→ _actExecute()
│ │ ├─→ generateDynamicParametersPrompt() [AI Call #3: Parameter Generation]
│ │ │ └─→ callAiPlanning() [~30s, DETAILED mode]
│ │ │
│ │ ├─→ actionExecutor.executeSingleAction()
│ │ │ └─→ methodAi.process()
│ │ │ ├─→ progressLogStart()
│ │ │ ├─→ getChatDocumentsFromDocumentList() [Document Loading]
│ │ │ ├─→ _analyzePromptAndCreateOptions() [AI Call #4: Prompt Analysis]
│ │ │ │ └─→ callAiPlanning() [~10s, BASIC mode]
│ │ │ │
│ │ │ ├─→ callAiDocuments()
│ │ │ │ ├─→ progressLogStart() [Nested progress tracking]
│ │ │ │ ├─→ callAiText() [if documents exist]
│ │ │ │ │ └─→ extractionService.processDocumentsPerChunk()
│ │ │ │ │ └─→ Multiple AI calls per chunk [AI Call #5-N]
│ │ │ │ │
│ │ │ │ ├─→ buildGenerationPrompt() [Complex JSON template]
│ │ │ │ ├─→ _callAiWithLooping()
│ │ │ │ │ ├─→ AI Call [AI Call #6: First iteration]
│ │ │ │ │ ├─→ Check complete_response flag
│ │ │ │ │ ├─→ Extract sections
│ │ │ │ │ ├─→ Repair broken JSON if needed
│ │ │ │ │ └─→ Loop up to 50 iterations [AI Call #6-55]
│ │ │ │ │
│ │ │ │ ├─→ Parse generated JSON
│ │ │ │ ├─→ generationService.renderReport() [RENDERING PHASE]
│ │ │ │ │ ├─→ _getFormatRenderer()
│ │ │ │ │ ├─→ renderer.render() [Format-specific rendering]
│ │ │ │ │ │ └─→ For DOCX: python-docx library calls
│ │ │ │ │ │ └─→ For PDF: ReportLab/other library
│ │ │ │ │ │ └─→ For HTML: Template rendering
│ │ │ │ │ └─→ Returns rendered bytes/base64
│ │ │ │ │
│ │ │ │ └─→ Build result dict
│ │ │ │
│ │ │ └─→ progressLogFinish()
│ │ │
│ │ └─→ progressLogFinish()
│ │
│ ├─→ _observeBuild()
│ │ └─→ Build Observation object
│ │
│ ├─→ contentValidator.validateContent() [AI Call #7: Content Validation]
│ │ └─→ Multiple validation checks
│ │
│ ├─→ _refineDecide() [AI Call #8: Refinement Decision]
│ │ ├─→ extractReviewContent()
│ │ ├─→ generateDynamicRefinementPrompt()
│ │ └─→ callAiPlanning() [~30s, ADVANCED mode]
│ │
│ └─→ Loop continues if decision = "continue"
│
└─→ createTaskCompletionMessage()
Key Bottlenecks Identified
1. Multiple AI Calls for Simple Requests
Problem: Even for a simple "generate a text file" request, the system makes:
- AI Call #1: Task Planning (unnecessary for simple requests)
- AI Call #2: Action Selection (could be deterministic)
- AI Call #3: Parameter Generation (overkill for simple prompts)
- AI Call #4: Prompt Analysis (redundant - prompt is already clear)
- AI Call #5-N: Document extraction per chunk (if documents exist)
- AI Call #6-55: Document generation with looping (up to 50 iterations!)
- AI Call #7: Content Validation (could be optional for simple outputs)
- AI Call #8: Refinement Decision (unnecessary if output is simple)
Total: 8-60+ AI calls for a simple request that Claude handles in 1-2 calls.
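The total can be reproduced with a small counting model. This is a rough sketch, not a measurement: it assumes one extraction call per document chunk and a single pass through the plan/act/refine loop, and `count_ai_calls` is a hypothetical helper, not part of the codebase. Under these assumptions a request with no input documents comes to 7 calls, one document chunk brings it to the 8 quoted above, and the 50-iteration limit pushes it to 60 or more.

```python
# Fixed per-request calls taken from the flow above: task planning, action
# selection, parameter generation, prompt analysis, content validation,
# refinement decision.
FIXED_CALLS = 6
MAX_GENERATION_ITERATIONS = 50  # loop limit of _callAiWithLooping()


def count_ai_calls(document_chunks: int, generation_iterations: int) -> int:
    """Estimate AI calls for one pass through the flow (no refinement loop).

    Assumption: one extraction call per document chunk.
    """
    if document_chunks < 0:
        raise ValueError("document_chunks must be >= 0")
    if not 1 <= generation_iterations <= MAX_GENERATION_ITERATIONS:
        raise ValueError("generation_iterations must be between 1 and 50")
    return FIXED_CALLS + document_chunks + generation_iterations


print(count_ai_calls(document_chunks=0, generation_iterations=1))   # 7
print(count_ai_calls(document_chunks=1, generation_iterations=1))   # 8
print(count_ai_calls(document_chunks=4, generation_iterations=50))  # 60
```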
2. Complex Prompt Generation
Current Approach:
- Stage 1: generateDynamicPlanSelectionPrompt() - Large template with many placeholders
- Stage 2: generateDynamicParametersPrompt() - Another large template
- Stage 3: buildGenerationPrompt() - Complex JSON template with sections structure
Claude's Approach: Direct prompt, minimal overhead.
3. Inefficient Rendering
Current Flow:
AI generates JSON with sections
↓
Parse JSON
↓
Extract sections array
↓
Get format renderer
↓
Renderer processes sections
├─→ For DOCX: Create Document object
├─→ For each section: Add paragraph/heading
├─→ Apply formatting
├─→ Generate Table of Contents
└─→ Convert to bytes/base64
Issues:
- Rendering happens AFTER AI generation completes
- No streaming or progressive rendering
- Full document structure built even for simple text
- Complex renderers for simple formats (e.g., TXT rendered through DOCX pipeline)
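The missing streaming step could look roughly like the following for plain-text output. This is a minimal sketch under assumptions: `fake_ai_stream` is a hypothetical stand-in for a streaming AI response, and `stream_plain_text` is an illustrative helper, not an existing function. Each chunk is written to the output as soon as it arrives instead of waiting for the full document.

```python
import io
from typing import Iterable, Iterator, TextIO


def stream_plain_text(chunks: Iterable[str], sink: TextIO) -> Iterator[str]:
    """Write each chunk to `sink` as soon as it arrives and pass it on to the caller."""
    for chunk in chunks:
        sink.write(chunk)
        yield chunk


def fake_ai_stream() -> Iterator[str]:
    # Hypothetical stand-in for a streaming AI response.
    yield from ["Hello", ", ", "world", "\n"]


buffer = io.StringIO()
received = list(stream_plain_text(fake_ai_stream(), buffer))
print(received)
print(repr(buffer.getvalue()))
```

Structured formats such as DOCX cannot be streamed this simply, since the file is only valid once it is complete; the sketch applies to formats where partial output is already usable.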
4. Unnecessary Iteration Looping
Current: _callAiWithLooping() loops up to 50 times:
- Checks for complete_response flag
- Repairs broken JSON
- Extracts sections incrementally
- Continues until complete
For Simple Requests: This is overkill. A simple text generation should be single-shot.
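One way to express this is to keep a single generation function and cap the iteration count at 1 for simple requests. The sketch below is illustrative only: the `AiCall` shape, the `"Continue."` follow-up prompt, and `make_fake_ai` are assumptions made for the example, and only the `complete_response` flag and the limit of 50 come from the flow described above.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical AI callable: takes a prompt, returns {"text": str, "complete_response": bool}.
AiCall = Callable[[str], Dict[str, object]]


def generate(prompt: str, call_ai: AiCall, simple: bool, max_iterations: int = 50) -> str:
    """Single-shot for simple requests, bounded looping otherwise."""
    limit = 1 if simple else max_iterations
    parts: List[str] = []
    for _ in range(limit):
        # "Continue." is a placeholder continuation prompt, not the real one.
        response = call_ai(prompt if not parts else "Continue.")
        parts.append(str(response["text"]))
        if response.get("complete_response"):
            break
    return "".join(parts)


def make_fake_ai(pieces: List[str]) -> Tuple[AiCall, List[str]]:
    """Scripted stand-in for the model; records every prompt it receives."""
    prompts: List[str] = []

    def call_ai(prompt: str) -> Dict[str, object]:
        index = len(prompts)
        prompts.append(prompt)
        return {"text": pieces[index], "complete_response": index == len(pieces) - 1}

    return call_ai, prompts


simple_ai, simple_prompts = make_fake_ai(["Hello World"])
print(generate("Create a text file with Hello World", simple_ai, simple=True))
print(len(simple_prompts))

long_ai, long_prompts = make_fake_ai(["Part 1. ", "Part 2. ", "Part 3."])
print(generate("Write a long report", long_ai, simple=False))
print(len(long_prompts))
```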
5. Redundant Progress Tracking
- Nested progress tracking (method level + service level)
- Multiple progress updates for same operation
- Progress logging adds overhead
Claude's Architecture (From Concept Documents)
Claude's Flow
User Input
↓
Input Reception & Analysis [AI Call #1: Semantic Understanding]
├─→ AI understands intent semantically (not regex/keyword matching)
├─→ Detects patterns like "write a document" → create docx
├─→ Detects "continue our conversation" → use past chats tool
├─→ Multi-language support (semantic, not pattern-based)
└─→ Categorizes request complexity
↓
Understanding + Execution [Combined AI Call]
├─→ Simple requests: 1 AI call that understands AND executes
│ └─→ AI generates content directly, no separate parameter generation
├─→ Moderate requests: 1-2 AI calls total
└─→ Complex requests: 5-20 AI calls (iterative research + generation)
↓
Tool Selection [Part of AI understanding, not separate call]
├─→ AI understands which tool to use as part of intent analysis
└─→ Direct tool execution (no separate parameter generation call)
↓
Execution [Direct tool calls]
├─→ web_search → Direct API call
├─→ create_file → Direct file creation (no rendering pipeline)
└─→ bash_tool → Direct command execution
↓
Output [Minimal formatting]
├─→ Text: Direct return
├─→ Files: Copy to output directory (no JSON → render pipeline)
└─→ Code: Direct render
Key Differences
- Semantic AI Understanding: Claude uses AI to recognize intent through semantic understanding rather than regex or keyword matching. The AI understands that "write a document" means creating a docx, regardless of language.
- Combined AI Calls: Instead of separate calls for plan → select → parameters → generate, Claude makes 1 AI call that understands intent AND generates output
- No Separate Parameter Generation: When AI understands "create a text file with Hello World", it directly generates the content - no separate parameter extraction step
- Progressive Complexity: Simple = 1 AI call (understand + execute), Complex = 5-20 AI calls (iterative)
- No Rendering Pipeline: Files are created directly from AI output, not rendered from JSON structure
- Streaming Output: Results shown as they're generated
Comparison Table
| Aspect | Current System | Claude's Approach | Impact |
|---|---|---|---|
| Simple Request AI Calls | 8-60+ calls (sequential) | 1-2 calls (combined) | Up to ~40x overhead |
| Action Selection | Separate AI call (30s) | Part of understanding call | 30s saved |
| Parameter Generation | Separate AI call (30s) | Combined with generation | 30s saved |
| Prompt Analysis | Separate AI call (10s) | Part of understanding call | 10s saved |
| Document Generation | Looping (up to 50 iterations) | Single-shot for simple | Variable |
| Rendering | Post-generation pipeline | Direct file creation | Slow for small docs |
| Content Validation | Always separate AI call | Optional/combined | 30s saved |
| Refinement Decision | Always separate AI call | Combined with understanding | 30s saved |
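
Because the calls run sequentially, the per-row savings in the table add up. The arithmetic below simply sums the table's own figures; it is an estimate under the assumption that every listed call is removed or merged, not a measured result.

```python
# Per-call savings in seconds, copied from the Impact column of the table.
savings_seconds = {
    "Action Selection": 30,
    "Parameter Generation": 30,
    "Prompt Analysis": 10,
    "Content Validation": 30,
    "Refinement Decision": 30,
}

total = sum(savings_seconds.values())
print(f"{total}s (~{total / 60:.1f} min)")  # 130s (~2.2 min)
```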
Root Causes
1. Over-Complication
Root Cause: The system makes separate AI calls for each step (plan → select → parameters → generate → validate → refine), even when a single AI call could understand intent AND execute.
Solution: Combine AI calls for efficiency:
- Single AI Call for Simple Requests: One call that understands intent AND generates output (like Claude)
- Combined Understanding: Merge action selection + parameter generation into the generation call
- Skip Mechanical Steps: Don't make separate AI calls for steps that can be inferred from the main understanding
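A combined call could be sketched as follows: one prompt asks the model to choose the action, fill in the parameters, and produce the content in a single response. Everything named here is an assumption for illustration (`COMBINED_PROMPT`, `handle_simple_request`, the JSON keys, and the `create_file` action in the fake response); the real prompt and response schema would need to match the existing action executor.

```python
import json
from typing import Callable, Dict

# Hypothetical prompt template; doubled braces are literal braces for str.format.
COMBINED_PROMPT = (
    "Decide which action fulfils the request and produce the output in one step.\n"
    'Reply with JSON only: {{"action": str, "parameters": object, "content": str}}\n'
    "Request: {request}"
)


def handle_simple_request(request: str, call_ai: Callable[[str], str]) -> Dict[str, object]:
    """One AI call that returns action, parameters, and content together."""
    raw = call_ai(COMBINED_PROMPT.format(request=request))
    result = json.loads(raw)
    missing = {"action", "parameters", "content"} - result.keys()
    if missing:
        raise ValueError(f"AI response is missing keys: {sorted(missing)}")
    return result


def fake_ai(prompt: str) -> str:
    # Scripted stand-in for the model, used only to make the example runnable.
    return json.dumps({
        "action": "create_file",
        "parameters": {"filename": "hello.txt"},
        "content": "Hello World",
    })


result = handle_simple_request("Create a text file with Hello World", fake_ai)
print(result["action"], result["parameters"], result["content"])
```

If the response fails to parse or is missing keys, the caller can fall back to the existing multi-step path, so the combined call acts as a fast path rather than a replacement.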
2. Slow Rendering
Root Cause: Rendering happens as a separate phase AFTER AI generation, using complex renderers even for simple formats.
Solution:
- For simple formats (TXT, MD): Return directly from AI, no rendering
- For complex formats (DOCX, PDF): Use lightweight renderers for small documents
- Implement streaming rendering for large documents
- Cache renderer instances
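The first and last points can be sketched together: plain-text formats bypass rendering entirely, and heavier renderers are created once and reused. `EchoRenderer` is a hypothetical placeholder for the real DOCX/PDF renderers, and `produce_output` and `get_renderer` are illustrative names; only the caching via `functools.lru_cache` is standard library behavior.

```python
from functools import lru_cache

PASSTHROUGH_FORMATS = {"txt", "md"}  # returned directly from the AI output


class EchoRenderer:
    """Hypothetical placeholder for a heavyweight renderer (DOCX, PDF, ...)."""

    instances_created = 0

    def __init__(self, fmt: str) -> None:
        EchoRenderer.instances_created += 1
        self.fmt = fmt

    def render(self, text: str) -> bytes:
        return f"[{self.fmt}] {text}".encode("utf-8")


@lru_cache(maxsize=None)
def get_renderer(fmt: str) -> EchoRenderer:
    """Create each renderer once; later calls with the same format reuse it."""
    return EchoRenderer(fmt)


def produce_output(fmt: str, ai_text: str) -> bytes:
    fmt = fmt.lower()
    if fmt in PASSTHROUGH_FORMATS:
        return ai_text.encode("utf-8")  # no rendering step at all
    return get_renderer(fmt).render(ai_text)


print(produce_output("txt", "Hello World"))
print(produce_output("docx", "First document"))
print(produce_output("docx", "Second document"))
print(EchoRenderer.instances_created)
```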
Recommendations
Immediate Fixes (High Impact, Low Effort)
- Combine AI Calls for Simple Requests
  - Key Insight: Claude uses AI for semantic understanding, but combines understanding + execution
  - Merge action selection + parameter generation into the main generation call
  - Use one AI call that understands intent AND generates output (not separate calls)
  - Skip separate refinement decision if output is simple (check in same call)
- Optimize Rendering
  - For TXT/MD: Return AI output directly, no rendering
  - For small documents (<10KB): Use lightweight renderers
  - Cache renderer instances
- Reduce Iteration Looping
  - For simple requests: Single-shot AI call (no looping)
  - Only use looping for complex/long documents
Medium-Term Improvements
- Request Complexity Detection
  - Add complexity analyzer (pattern-based, not AI-based)
  - Route to appropriate workflow path
- Streaming Output
  - Stream AI responses as they're generated
  - Progressive rendering for large documents
- Direct Tool Execution
  - For simple actions: Skip parameter generation AI call
  - Use default parameters or pattern-based parameter extraction
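A pattern-based analyzer and parameter extractor might look like the sketch below. The keyword list, the word-count threshold, the default filename, and the function names are all assumptions chosen for the example. Note that keyword patterns like these are English-only, which is the limitation that semantic understanding avoids, so this path is best treated as a cheap first filter.

```python
import re
from typing import Dict

# Illustrative heuristics only; the real lists would need tuning.
COMPLEX_HINTS = re.compile(r"\b(research|analy[sz]e|compare|summari[sz]e|report)\b", re.IGNORECASE)
FILE_PATTERN = re.compile(r"\b([\w-]+\.(txt|md|docx|pdf|html))\b", re.IGNORECASE)
LONG_PROMPT_WORDS = 60


def classify_request(prompt: str, document_count: int) -> str:
    """Return "simple" or "complex" without making an AI call."""
    if document_count > 0 or COMPLEX_HINTS.search(prompt):
        return "complex"
    if len(prompt.split()) > LONG_PROMPT_WORDS:
        return "complex"
    return "simple"


def extract_parameters(prompt: str) -> Dict[str, str]:
    """Start from defaults and override them when the prompt names a file."""
    params = {"filename": "output.txt", "format": "txt"}
    match = FILE_PATTERN.search(prompt)
    if match:
        params["filename"] = match.group(1)
        params["format"] = match.group(2).lower()
    return params


print(classify_request("Create a text file with Hello World", document_count=0))
print(classify_request("Compare these two contracts and write a report", document_count=2))
print(extract_parameters("Save the notes as notes.md"))
```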
Long-Term Architecture Changes
- Unified AI Call Interface
  - Single entry point with complexity-aware routing
  - Automatic optimization based on request type
- Progressive Enhancement
  - Start with simple execution
  - Add complexity only if needed (validation fails, user requests refinement)
- Renderer Optimization
  - Lazy rendering (only when needed)
  - Format-specific optimizations
  - Parallel rendering for multiple documents
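Parallel rendering of several documents can be sketched with the standard library's thread pool. `render_all` and the uppercase `render` function are hypothetical; a real renderer would be passed in instead. Threads mainly help when rendering is I/O-bound or releases the GIL, so a CPU-bound renderer may need a process pool instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def render_all(
    documents: Dict[str, str],
    render: Callable[[str], bytes],
    max_workers: int = 4,
) -> Dict[str, bytes]:
    """Render every document concurrently and return results keyed by name."""
    names = list(documents)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip lines up with names.
        rendered = list(pool.map(render, (documents[name] for name in names)))
    return dict(zip(names, rendered))


def render(text: str) -> bytes:
    # Placeholder renderer used only for the example.
    return text.upper().encode("utf-8")


outputs = render_all({"a.txt": "first", "b.txt": "second"}, render)
print(outputs)
```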
Implementation Priority
- P0 (Critical): Skip unnecessary AI calls for simple requests
- P0 (Critical): Optimize rendering for simple formats
- P1 (High): Reduce iteration looping for simple requests
- P1 (High): Add request complexity detection
- P2 (Medium): Implement streaming output
- P3 (Low): Long-term architecture refactoring
Metrics to Track
- AI Calls per Request: Target <2 for simple requests
- Rendering Time: Target <1s for simple documents
- Total Request Time: Target <5s for simple requests
- User Satisfaction: Measure via feedback
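
The three quantitative targets can be checked per request with a small record type. This is a sketch with hypothetical names (`RequestMetrics`, `violations`); the thresholds are copied from the list above and applied to a single simple request, although the AI-call target may be intended as an average.

```python
from dataclasses import dataclass
from typing import List

# Targets for simple requests, taken from the list above.
MAX_AI_CALLS = 2          # target: fewer than 2
MAX_RENDER_SECONDS = 1.0  # target: under 1s
MAX_TOTAL_SECONDS = 5.0   # target: under 5s


@dataclass
class RequestMetrics:
    ai_calls: int
    render_seconds: float
    total_seconds: float

    def violations(self) -> List[str]:
        """Return the names of all targets this request missed."""
        missed = []
        if self.ai_calls >= MAX_AI_CALLS:
            missed.append("ai_calls")
        if self.render_seconds >= MAX_RENDER_SECONDS:
            missed.append("render_seconds")
        if self.total_seconds >= MAX_TOTAL_SECONDS:
            missed.append("total_seconds")
        return missed


print(RequestMetrics(ai_calls=1, render_seconds=0.2, total_seconds=3.1).violations())
print(RequestMetrics(ai_calls=8, render_seconds=2.5, total_seconds=110.0).violations())
```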