# AI Call Flow Architecture Analysis
## Executive Summary
This document analyzes the current AI call flow in the workflow system and compares it with Claude's approach to identify weaknesses causing:
1. **Over-complication for simple requests**
2. **Slow rendering for small documents**
## Current Architecture Flow
### Complete Flow for `ai.process` Action
```
User Request
workflowProcessor.generateTaskPlan()
├─→ modeDynamic.generateTaskPlan() [AI Call #1: Task Planning]
└─→ Creates TaskPlan with TaskSteps
workflowProcessor.executeTask()
├─→ modeDynamic.executeTask()
│ ├─→ _planSelect() [AI Call #2: Action Selection]
│ │ └─→ generateDynamicPlanSelectionPrompt()
│ │ └─→ callAiPlanning() [~30s, DETAILED mode]
│ │
│ ├─→ _actExecute()
│ │ ├─→ generateDynamicParametersPrompt() [AI Call #3: Parameter Generation]
│ │ │ └─→ callAiPlanning() [~30s, DETAILED mode]
│ │ │
│ │ ├─→ actionExecutor.executeSingleAction()
│ │ │ └─→ methodAi.process()
│ │ │ ├─→ progressLogStart()
│ │ │ ├─→ getChatDocumentsFromDocumentList() [Document Loading]
│ │ │ ├─→ _analyzePromptAndCreateOptions() [AI Call #4: Prompt Analysis]
│ │ │ │ └─→ callAiPlanning() [~10s, BASIC mode]
│ │ │ │
│ │ │ ├─→ callAiDocuments()
│ │ │ │ ├─→ progressLogStart() [Nested progress tracking]
│ │ │ │ ├─→ callAiText() [if documents exist]
│ │ │ │ │ └─→ extractionService.processDocumentsPerChunk()
│ │ │ │ │ └─→ Multiple AI calls per chunk [AI Call #5-N]
│ │ │ │ │
│ │ │ │ ├─→ buildGenerationPrompt() [Complex JSON template]
│ │ │ │ ├─→ _callAiWithLooping()
│ │ │ │ │ ├─→ AI Call [AI Call #6: First iteration]
│ │ │ │ │ ├─→ Check complete_response flag
│ │ │ │ │ ├─→ Extract sections
│ │ │ │ │ ├─→ Repair broken JSON if needed
│ │ │ │ │ └─→ Loop up to 50 iterations [AI Call #6-55]
│ │ │ │ │
│ │ │ │ ├─→ Parse generated JSON
│ │ │ │ ├─→ generationService.renderReport() [RENDERING PHASE]
│ │ │ │ │ ├─→ _getFormatRenderer()
│ │ │ │ │ ├─→ renderer.render() [Format-specific rendering]
│ │ │ │ │ │ └─→ For DOCX: python-docx library calls
│ │ │ │ │ │ └─→ For PDF: ReportLab/other library
│ │ │ │ │ │ └─→ For HTML: Template rendering
│ │ │ │ │ └─→ Returns rendered bytes/base64
│ │ │ │ │
│ │ │ │ └─→ Build result dict
│ │ │ │
│ │ │ └─→ progressLogFinish()
│ │ │
│ │ └─→ progressLogFinish()
│ │
│ ├─→ _observeBuild()
│ │ └─→ Build Observation object
│ │
│ ├─→ contentValidator.validateContent() [AI Call #7: Content Validation]
│ │ └─→ Multiple validation checks
│ │
│ ├─→ _refineDecide() [AI Call #8: Refinement Decision]
│ │ ├─→ extractReviewContent()
│ │ ├─→ generateDynamicRefinementPrompt()
│ │ └─→ callAiPlanning() [~30s, ADVANCED mode]
│ │
│ └─→ Loop continues if decision = "continue"
└─→ createTaskCompletionMessage()
```
### Key Bottlenecks Identified
#### 1. **Multiple AI Calls for Simple Requests**
**Problem**: Even for a simple "generate a text file" request, the system makes:
- **AI Call #1**: Task Planning (unnecessary for simple requests)
- **AI Call #2**: Action Selection (could be deterministic)
- **AI Call #3**: Parameter Generation (overkill for simple prompts)
- **AI Call #4**: Prompt Analysis (redundant - prompt is already clear)
- **AI Call #5-N**: Document extraction per chunk (if documents exist)
- **AI Call #6-55**: Document generation with looping (up to 50 iterations!)
- **AI Call #7**: Content Validation (could be optional for simple outputs)
- **AI Call #8**: Refinement Decision (unnecessary if output is simple)
**Total**: 8-60+ AI calls for a simple request that Claude handles in 1-2 calls.
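
The tally above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming one pass through the flow with no refinement loop; the function name and its parameters are illustrative and are not part of the codebase.

```python
def count_ai_calls(document_chunks: int, generation_iterations: int) -> int:
    """Count AI calls for one pass through the flow (hypothetical helper).

    document_chunks: chunks processed by the extraction step (0 if no documents).
    generation_iterations: iterations used by the generation loop (1 to 50).
    """
    if document_chunks < 0:
        raise ValueError("document_chunks must be non-negative")
    if not 1 <= generation_iterations <= 50:
        raise ValueError("generation_iterations must be between 1 and 50")
    # Calls that always happen: task planning, action selection, parameter
    # generation, prompt analysis, content validation, refinement decision.
    fixed_calls = 6
    return fixed_calls + document_chunks + generation_iterations


print(count_ai_calls(document_chunks=1, generation_iterations=1))
print(count_ai_calls(document_chunks=4, generation_iterations=50))
```

Under this counting, a request without input documents comes to 7 calls, because the per-chunk extraction step only runs when documents exist. The upper end grows with the number of chunks.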
#### 2. **Complex Prompt Generation**
**Current Approach**:
- Stage 1: `generateDynamicPlanSelectionPrompt()` - Large template with many placeholders
- Stage 2: `generateDynamicParametersPrompt()` - Another large template
- Stage 3: `buildGenerationPrompt()` - Complex JSON template with sections structure
**Claude's Approach**: Direct prompt, minimal overhead.
#### 3. **Inefficient Rendering**
**Current Flow**:
```
AI generates JSON with sections
Parse JSON
Extract sections array
Get format renderer
Renderer processes sections
├─→ For DOCX: Create Document object
├─→ For each section: Add paragraph/heading
├─→ Apply formatting
├─→ Generate Table of Contents
└─→ Convert to bytes/base64
```
**Issues**:
- Rendering happens AFTER AI generation completes
- No streaming or progressive rendering
- Full document structure built even for simple text
- Complex renderers for simple formats (e.g., TXT rendered through DOCX pipeline)
#### 4. **Unnecessary Iteration Looping**
**Current**: `_callAiWithLooping()` loops up to 50 times:
- Checks for `complete_response` flag
- Repairs broken JSON
- Extracts sections incrementally
- Continues until complete
**For Simple Requests**: This is overkill. A simple text generation should be single-shot.
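
A single-shot path next to the looping path might look like the sketch below. It assumes a hypothetical `call_ai` callable that takes a prompt and returns a parsed response with `sections` and `complete_response` keys; the real `_callAiWithLooping()` also repairs JSON and builds continuation prompts, which is omitted here.

```python
from typing import Callable

# Hypothetical signature: prompt in, parsed response dict out.
AiCall = Callable[[str], dict]


def generate_sections(prompt: str, call_ai: AiCall, simple: bool,
                      max_iterations: int = 50) -> list:
    """Return generated sections, using one call for simple requests."""
    if simple:
        # Single-shot: trust the first response and skip the loop entirely.
        return list(call_ai(prompt).get("sections", []))

    sections = []
    for _ in range(max_iterations):
        # A real implementation would send a continuation prompt here.
        response = call_ai(prompt)
        sections.extend(response.get("sections", []))
        if response.get("complete_response"):
            break
    return sections
```

The `simple` flag is passed in by the caller, so the decision about which path to take stays outside the generation code.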
#### 5. **Redundant Progress Tracking**
- Nested progress tracking (method level + service level)
- Multiple progress updates for same operation
- Progress logging adds overhead
## Claude's Architecture (From Concept Documents)
### Claude's Flow
```
User Input
Input Reception & Analysis [AI Call #1: Semantic Understanding]
├─→ AI understands intent semantically (not regex/keyword matching)
├─→ Detects patterns like "write a document" → create docx
├─→ Detects "continue our conversation" → use past chats tool
├─→ Multi-language support (semantic, not pattern-based)
└─→ Categorizes request complexity
Understanding + Execution [Combined AI Call]
├─→ Simple requests: 1 AI call that understands AND executes
│ └─→ AI generates content directly, no separate parameter generation
├─→ Moderate requests: 1-2 AI calls total
└─→ Complex requests: 5-20 AI calls (iterative research + generation)
Tool Selection [Part of AI understanding, not separate call]
├─→ AI understands which tool to use as part of intent analysis
└─→ Direct tool execution (no separate parameter generation call)
Execution [Direct tool calls]
├─→ web_search → Direct API call
├─→ create_file → Direct file creation (no rendering pipeline)
└─→ bash_tool → Direct command execution
Output [Minimal formatting]
├─→ Text: Direct return
├─→ Files: Copy to output directory (no JSON → render pipeline)
└─→ Code: Direct render
```
### Key Differences
1. **Semantic AI Understanding**: Claude maps requests to actions through semantic understanding rather than regex or keyword matching. It recognizes that "write a document" means creating a DOCX file, regardless of the language the request is written in.
2. **Combined AI Calls**: Instead of separate calls for plan → select → parameters → generate, Claude makes 1 AI call that understands intent AND generates output
3. **No Separate Parameter Generation**: When AI understands "create a text file with Hello World", it directly generates the content - no separate parameter extraction step
4. **Progressive Complexity**: Simple = 1 AI call (understand + execute), Complex = 5-20 AI calls (iterative)
5. **No Rendering Pipeline**: Files are created directly from AI output, not rendered from JSON structure
6. **Streaming Output**: Results shown as they're generated
## Comparison Table
| Aspect | Current System | Claude's Approach | Impact |
|--------|---------------|-------------------|---------|
| **Simple Request AI Calls** | 8-60+ calls (sequential) | 1-2 calls (combined) | **~4x to 60x more calls** |
| **Action Selection** | Separate AI call (30s) | Part of understanding call | **30s saved** |
| **Parameter Generation** | Separate AI call (30s) | Combined with generation | **30s saved** |
| **Prompt Analysis** | Separate AI call (10s) | Part of understanding call | **10s saved** |
| **Document Generation** | Looping (up to 50 iterations) | Single-shot for simple | **Variable** |
| **Rendering** | Post-generation pipeline | Direct file creation | **Slow for small docs** |
| **Content Validation** | Always separate AI call | Optional/combined | **30s saved** |
| **Refinement Decision** | Always separate AI call | Combined with understanding | **30s saved** |
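
Adding up the per-row estimates gives a rough idea of the total saving for a simple request. This sketch assumes the calls run sequentially and that the estimates in the table are additive; the dictionary keys are illustrative labels.

```python
# Estimated seconds saved per row, taken from the Impact column above.
SAVINGS_SECONDS = {
    "action_selection": 30,
    "parameter_generation": 30,
    "prompt_analysis": 10,
    "content_validation": 30,
    "refinement_decision": 30,
}

total_saved = sum(SAVINGS_SECONDS.values())
print(f"Estimated saving per simple request: {total_saved}s")  # prints 130s
```

The rows marked "Variable" and "Slow for small docs" are left out because the table gives no number for them.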
## Root Causes
### 1. Over-Complication
**Root Cause**: The system makes separate AI calls for each step (plan → select → parameters → generate → validate → refine), even when a single AI call could understand intent AND execute.
**Solution**: Combine AI calls for efficiency:
- **Single AI Call for Simple Requests**: One call that understands intent AND generates output (like Claude)
- **Combined Understanding**: Merge action selection + parameter generation into the generation call
- **Skip Mechanical Steps**: Don't make separate AI calls for steps that can be inferred from the main understanding
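
One way to combine selection, parameters, and generation is to ask for all three in a single structured response. The sketch below is an assumption about how that could look: the prompt wording, the JSON field names, and the `call_ai` callable (prompt in, raw text out) are all hypothetical.

```python
import json
from typing import Callable

# Hypothetical prompt: asks the model to pick the action, fill in the
# parameters, and produce the content in one reply.
COMBINED_PROMPT = (
    "Decide which action fits the user request and carry it out in one step.\n"
    'Reply with JSON only: {{"action": str, "parameters": object, "content": str}}\n'
    "User request: {request}"
)

REQUIRED_FIELDS = {"action", "parameters", "content"}


def handle_simple_request(request: str, call_ai: Callable[[str], str]) -> dict:
    """Run one AI call that both understands the request and generates output."""
    raw = call_ai(COMBINED_PROMPT.format(request=request))
    result = json.loads(raw)
    missing = REQUIRED_FIELDS - set(result)
    if missing:
        raise ValueError(f"AI response is missing fields: {sorted(missing)}")
    return result
```

A response that fails to parse or lacks a field is a natural trigger for falling back to the existing multi-step flow.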
### 2. Slow Rendering
**Root Cause**: Rendering happens as a separate phase AFTER AI generation, using complex renderers even for simple formats.
**Solution**:
- For simple formats (TXT, MD): Return directly from AI, no rendering
- For complex formats (DOCX, PDF): Use lightweight renderers for small documents
- Implement streaming rendering for large documents
- Cache renderer instances
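
The passthrough for simple formats and the renderer cache can be sketched together. The renderer classes below are hypothetical stand-ins, not the project's actual renderers; only `functools.lru_cache` is a real library call.

```python
from functools import lru_cache

# Formats that can be returned straight from the AI output.
PASSTHROUGH_FORMATS = {"txt", "md"}


class DocxRenderer:
    """Hypothetical stand-in for the real DOCX renderer."""

    def render(self, content: str) -> bytes:
        return b"DOCX:" + content.encode("utf-8")


class PdfRenderer:
    """Hypothetical stand-in for the real PDF renderer."""

    def render(self, content: str) -> bytes:
        return b"PDF:" + content.encode("utf-8")


RENDERERS = {"docx": DocxRenderer, "pdf": PdfRenderer}


@lru_cache(maxsize=None)
def get_renderer(fmt: str):
    """Create each renderer once and reuse it on later calls."""
    if fmt not in RENDERERS:
        raise ValueError(f"No renderer registered for format: {fmt}")
    return RENDERERS[fmt]()


def render_output(content: str, fmt: str) -> bytes:
    """Skip the rendering pipeline for plain-text formats."""
    fmt = fmt.lower()
    if fmt in PASSTHROUGH_FORMATS:
        return content.encode("utf-8")
    return get_renderer(fmt).render(content)
```

Caching instances only pays off if renderer construction is expensive and the renderers hold no per-document state, which would need to be checked against the real classes.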
## Recommendations
### Immediate Fixes (High Impact, Low Effort)
1. **Combine AI Calls for Simple Requests**
- **Key Insight**: Claude uses AI for semantic understanding, but combines understanding + execution
- Merge action selection + parameter generation into the main generation call
- Use one AI call that understands intent AND generates output (not separate calls)
- Skip separate refinement decision if output is simple (check in same call)
2. **Optimize Rendering**
- For TXT/MD: Return AI output directly, no rendering
- For small documents (<10KB): Use lightweight renderers
- Cache renderer instances
3. **Reduce Iteration Looping**
- For simple requests: Single-shot AI call (no looping)
- Only use looping for complex/long documents
### Medium-Term Improvements
1. **Request Complexity Detection**
- Add complexity analyzer (pattern-based, not AI-based)
- Route to appropriate workflow path
2. **Streaming Output**
- Stream AI responses as they're generated
- Progressive rendering for large documents
3. **Direct Tool Execution**
- For simple actions: Skip parameter generation AI call
- Use default parameters or pattern-based parameter extraction
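
A pattern-based complexity analyzer, as proposed in item 1, could start as small as the sketch below. The keyword lists and thresholds are illustrative assumptions and would need tuning against real requests; the three labels follow the simple/moderate/complex split used earlier in this document.

```python
import re

# Hypothetical keyword groups hinting at multi-step work.
COMPLEX_PATTERNS = [
    re.compile(r"\b(research|compare|analy[sz]e|summari[sz]e)\b", re.IGNORECASE),
    re.compile(r"\b(report|presentation|spreadsheet)\b", re.IGNORECASE),
]


def classify_request(prompt: str, document_count: int = 0) -> str:
    """Return 'simple', 'moderate', or 'complex' without calling the AI."""
    hits = sum(1 for pattern in COMPLEX_PATTERNS if pattern.search(prompt))
    if document_count > 3 or hits >= 2:
        return "complex"
    if document_count > 0 or hits == 1:
        return "moderate"
    return "simple"


print(classify_request("Create a text file with Hello World"))      # simple
print(classify_request("Research competitors and write a report"))  # complex
```

English keywords will not match requests in other languages, so a pattern-based router trades the multi-language coverage of semantic understanding for speed. Defaulting unmatched requests to the simple path makes that trade-off visible rather than hidden.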
### Long-Term Architecture Changes
1. **Unified AI Call Interface**
- Single entry point with complexity-aware routing
- Automatic optimization based on request type
2. **Progressive Enhancement**
- Start with simple execution
- Add complexity only if needed (validation fails, user requests refinement)
3. **Renderer Optimization**
- Lazy rendering (only when needed)
- Format-specific optimizations
- Parallel rendering for multiple documents
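
Progressive enhancement (item 2) amounts to trying the cheap path first and escalating only on failure. A minimal sketch, assuming three hypothetical callables supplied by the caller: `simple_path`, `full_path`, and a cheap `is_valid` check that does not itself call the AI.

```python
from typing import Callable


def run_progressively(
    request: str,
    simple_path: Callable[[str], str],
    full_path: Callable[[str], str],
    is_valid: Callable[[str], bool],
) -> tuple[str, str]:
    """Return (output, path_used), escalating only if validation fails."""
    output = simple_path(request)
    if is_valid(output):
        return output, "simple"
    # The simple result was rejected, so fall back to the full workflow.
    return full_path(request), "full"
```

Returning the path that was used makes it easy to measure how often escalation actually happens.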
## Implementation Priority
1. **P0 (Critical)**: Skip unnecessary AI calls for simple requests
2. **P0 (Critical)**: Optimize rendering for simple formats
3. **P1 (High)**: Reduce iteration looping for simple requests
4. **P1 (High)**: Add request complexity detection
5. **P2 (Medium)**: Implement streaming output
6. **P3 (Low)**: Long-term architecture refactoring
## Metrics to Track
- **AI Calls per Request**: Target <2 for simple requests
- **Rendering Time**: Target <1s for simple documents
- **Total Request Time**: Target <5s for simple requests
- **User Satisfaction**: Measure via feedback
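
The numeric targets above can be checked per request with a small record type. This is a sketch with hypothetical names; the thresholds mirror the targets listed for simple requests, and user satisfaction is left out because it is not measured per request.

```python
from dataclasses import dataclass


@dataclass
class RequestMetrics:
    """Measurements for one simple request (hypothetical structure)."""

    ai_calls: int
    rendering_seconds: float
    total_seconds: float

    def missed_targets(self) -> list[str]:
        """List the metrics that fall outside the targets for simple requests."""
        missed = []
        if self.ai_calls >= 2:             # target: <2 AI calls
            missed.append("ai_calls")
        if self.rendering_seconds >= 1.0:  # target: <1s rendering
            missed.append("rendering_seconds")
        if self.total_seconds >= 5.0:      # target: <5s total
            missed.append("total_seconds")
        return missed
```

For example, `RequestMetrics(ai_calls=8, rendering_seconds=0.4, total_seconds=95.0).missed_targets()` reports `ai_calls` and `total_seconds` as missed.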