# AI Call Flow Architecture Analysis

## Executive Summary

This document analyzes the current AI call flow in the workflow system and compares it with Claude's approach to identify weaknesses causing:

1. **Over-complication for simple requests**
2. **Slow rendering for small documents**

## Current Architecture Flow

### Complete Flow for `ai.process` Action

```
User Request
    ↓
workflowProcessor.generateTaskPlan()
    ├─→ modeDynamic.generateTaskPlan()  [AI Call #1: Task Planning]
    └─→ Creates TaskPlan with TaskSteps
    ↓
workflowProcessor.executeTask()
    ├─→ modeDynamic.executeTask()
    │   ├─→ _planSelect()  [AI Call #2: Action Selection]
    │   │   └─→ generateDynamicPlanSelectionPrompt()
    │   │       └─→ callAiPlanning()  [~30s, DETAILED mode]
    │   │
    │   ├─→ _actExecute()
    │   │   ├─→ generateDynamicParametersPrompt()  [AI Call #3: Parameter Generation]
    │   │   │   └─→ callAiPlanning()  [~30s, DETAILED mode]
    │   │   │
    │   │   ├─→ actionExecutor.executeSingleAction()
    │   │   │   └─→ methodAi.process()
    │   │   │       ├─→ progressLogStart()
    │   │   │       ├─→ getChatDocumentsFromDocumentList()  [Document Loading]
    │   │   │       ├─→ _analyzePromptAndCreateOptions()  [AI Call #4: Prompt Analysis]
    │   │   │       │   └─→ callAiPlanning()  [~10s, BASIC mode]
    │   │   │       │
    │   │   │       ├─→ callAiDocuments()
    │   │   │       │   ├─→ progressLogStart()  [Nested progress tracking]
    │   │   │       │   ├─→ callAiText()  [if documents exist]
    │   │   │       │   │   └─→ extractionService.processDocumentsPerChunk()
    │   │   │       │   │       └─→ Multiple AI calls per chunk  [AI Call #5-N]
    │   │   │       │   │
    │   │   │       │   ├─→ buildGenerationPrompt()  [Complex JSON template]
    │   │   │       │   ├─→ _callAiWithLooping()
    │   │   │       │   │   ├─→ AI Call  [AI Call #6: First iteration]
    │   │   │       │   │   ├─→ Check complete_response flag
    │   │   │       │   │   ├─→ Extract sections
    │   │   │       │   │   ├─→ Repair broken JSON if needed
    │   │   │       │   │   └─→ Loop up to 50 iterations  [AI Call #6-55]
    │   │   │       │   │
    │   │   │       │   ├─→ Parse generated JSON
    │   │   │       │   ├─→ generationService.renderReport()  [RENDERING PHASE]
    │   │   │       │   │   ├─→ _getFormatRenderer()
    │   │   │       │   │   ├─→ renderer.render()  [Format-specific rendering]
    │   │   │       │   │   │   ├─→ For DOCX: python-docx library calls
    │   │   │       │   │   │   ├─→ For PDF: ReportLab/other library
    │   │   │       │   │   │   └─→ For HTML: Template rendering
    │   │   │       │   │   └─→ Returns rendered bytes/base64
    │   │   │       │   │
    │   │   │       │   └─→ Build result dict
    │   │   │       │
    │   │   │       └─→ progressLogFinish()
    │   │   │
    │   │   └─→ progressLogFinish()
    │   │
    │   ├─→ _observeBuild()
    │   │   └─→ Build Observation object
    │   │
    │   ├─→ contentValidator.validateContent()  [AI Call #7: Content Validation]
    │   │   └─→ Multiple validation checks
    │   │
    │   ├─→ _refineDecide()  [AI Call #8: Refinement Decision]
    │   │   ├─→ extractReviewContent()
    │   │   ├─→ generateDynamicRefinementPrompt()
    │   │   └─→ callAiPlanning()  [~30s, ADVANCED mode]
    │   │
    │   └─→ Loop continues if decision = "continue"
    │
    └─→ createTaskCompletionMessage()
```

### Key Bottlenecks Identified

#### 1. **Multiple AI Calls for Simple Requests**

**Problem**: Even for a simple "generate a text file" request, the system makes:

- **AI Call #1**: Task Planning (unnecessary for simple requests)
- **AI Call #2**: Action Selection (could be deterministic)
- **AI Call #3**: Parameter Generation (overkill for simple prompts)
- **AI Call #4**: Prompt Analysis (redundant, since the prompt is already clear)
- **AI Call #5-N**: Document extraction, multiple calls per chunk (only if documents exist)
- **AI Call #6-55**: Document generation with looping (up to 50 iterations)
- **AI Call #7**: Content Validation (could be optional for simple outputs)
- **AI Call #8**: Refinement Decision (unnecessary if the output is simple)

The numbers are the labels from the diagram, not a running count: #5-N and #6-55 are variable-length ranges, and Content Validation and Refinement Decision run after the generation loop even though they are labelled #7 and #8.

**Total**: 8-60+ AI calls (more if the refinement loop repeats) for a simple request that Claude handles in 1-2 calls.
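
The 8-60+ range can be reproduced by tallying the labelled calls in the flow diagram. This is a minimal sketch: `count_ai_calls` and its parameters are hypothetical, and it assumes one extraction call per document chunk and that each extra refinement round repeats everything from Action Selection through the Refinement Decision.

```python
def count_ai_calls(doc_chunks: int, generation_iterations: int,
                   refinement_rounds: int = 1) -> int:
    """Count AI calls for one ai.process request, following the diagram labels.

    Hypothetical helper. Assumes one extraction call per document chunk and
    that every refinement round repeats calls #2 through #8.
    """
    if not 1 <= generation_iterations <= 50:
        raise ValueError("_callAiWithLooping runs between 1 and 50 iterations")
    task_planning = 1              # AI Call #1: Task Planning (once per request)
    per_round = (
        1                          # AI Call #2: Action Selection
        + 1                        # AI Call #3: Parameter Generation
        + 1                        # AI Call #4: Prompt Analysis
        + doc_chunks               # AI Call #5-N: extraction, one per chunk (assumed)
        + generation_iterations    # AI Call #6-55: generation loop
        + 1                        # AI Call #7: Content Validation
        + 1                        # AI Call #8: Refinement Decision
    )
    return task_planning + refinement_rounds * per_round
```

With one input chunk and a single generation pass, `count_ai_calls(1, 1)` gives 8, the lower bound quoted above; with no input documents the same path makes 7 calls, and `count_ai_calls(4, 50)` already reaches 60 before any extra refinement round.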

#### 2. **Complex Prompt Generation**

**Current Approach**:

- Stage 1: `generateDynamicPlanSelectionPrompt()`: large template with many placeholders
- Stage 2: `generateDynamicParametersPrompt()`: another large template
- Stage 3: `buildGenerationPrompt()`: complex JSON template with a sections structure

**Claude's Approach**: A single direct prompt with minimal template overhead.

#### 3. **Inefficient Rendering**

**Current Flow**:

```
AI generates JSON with sections
    ↓
Parse JSON
    ↓
Extract sections array
    ↓
Get format renderer
    ↓
Renderer processes sections
    ├─→ For DOCX: Create Document object
    ├─→ For each section: Add paragraph/heading
    ├─→ Apply formatting
    ├─→ Generate Table of Contents
    └─→ Convert to bytes/base64
```

**Issues**:

- Rendering happens AFTER AI generation completes
- No streaming or progressive rendering
- Full document structure built even for simple text
- Complex renderers for simple formats (e.g., TXT rendered through DOCX pipeline)

#### 4. **Unnecessary Iteration Looping**

**Current**: `_callAiWithLooping()` loops up to 50 times:

- Checks for `complete_response` flag
- Repairs broken JSON
- Extracts sections incrementally
- Continues until complete

**For Simple Requests**: This is overkill. A simple text generation should be single-shot.
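
The looping behaviour described above, together with a single-shot shortcut for simple requests, can be sketched as follows. This is a hypothetical simplification of `_callAiWithLooping()`: `call_model` stands in for the real AI call and is assumed to return already-parsed JSON with a `sections` list and a `complete_response` flag, and the broken-JSON repair step is left out.

```python
from typing import Callable

MAX_ITERATIONS = 50  # the "up to 50 iterations" limit from _callAiWithLooping()


def generate_sections(prompt: str,
                      call_model: Callable[[str], dict],
                      is_simple: bool) -> tuple[list, int]:
    """Return (sections, number_of_ai_calls).

    call_model is a hypothetical stand-in for the AI call; it is assumed to
    return a dict with "sections" (list) and "complete_response" (bool).
    """
    if is_simple:
        # Single-shot path: one call, no continuation loop.
        result = call_model(prompt)
        return result.get("sections", []), 1

    sections: list = []
    calls = 0
    while calls < MAX_ITERATIONS:
        # After the first call, ask the model to continue where it stopped.
        request = prompt if not sections else (
            f"{prompt}\n\nContinue after section {len(sections)}.")
        result = call_model(request)
        calls += 1
        sections.extend(result.get("sections", []))
        if result.get("complete_response", False):
            break
    return sections, calls
```

A model that never sets `complete_response` costs the full 50 calls on the looping path, while the simple path always costs exactly one.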

#### 5. **Redundant Progress Tracking**

- Nested progress tracking (method level in `methodAi.process()` plus service level in `callAiDocuments()`)
- Multiple progress updates for the same operation
- Each progress log call adds overhead
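
One possible fix for the nesting is to let only the outermost start/finish pair write log entries, so the service-level calls inside `methodAi.process()` become cheap counter updates. The `ProgressLog` class below is a hypothetical sketch, not the existing `progressLogStart()`/`progressLogFinish()` API.

```python
class ProgressLog:
    """Hypothetical progress tracker that only logs the outermost operation.

    Nested start()/finish() pairs (e.g. callAiDocuments inside
    methodAi.process) only adjust a depth counter instead of writing
    duplicate log entries.
    """

    def __init__(self) -> None:
        self.depth = 0
        self.entries: list[str] = []

    def start(self, name: str) -> None:
        if self.depth == 0:
            self.entries.append(f"start:{name}")  # only the outermost start is logged
        self.depth += 1

    def finish(self, name: str) -> None:
        if self.depth == 0:
            raise RuntimeError("finish() called without a matching start()")
        self.depth -= 1
        if self.depth == 0:
            self.entries.append(f"finish:{name}")  # only the outermost finish is logged
```

With this shape, the method-level and service-level calls from the diagram produce one start entry and one finish entry instead of two of each.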

## Claude's Architecture (From Concept Documents)

### Claude's Flow

```
User Input
    ↓
Input Reception & Analysis  [AI Call #1: Semantic Understanding]
    ├─→ AI understands intent semantically (not regex/keyword matching)
    ├─→ Detects patterns like "write a document" → create docx
    ├─→ Detects "continue our conversation" → use past chats tool
    ├─→ Multi-language support (semantic, not pattern-based)
    └─→ Categorizes request complexity
    ↓
Understanding + Execution  [Combined AI Call]
    ├─→ Simple requests: 1 AI call that understands AND executes
    │   └─→ AI generates content directly, no separate parameter generation
    ├─→ Moderate requests: 1-2 AI calls total
    └─→ Complex requests: 5-20 AI calls (iterative research + generation)
    ↓
Tool Selection  [Part of AI understanding, not a separate call]
    ├─→ AI decides which tool to use as part of intent analysis
    └─→ Direct tool execution (no separate parameter generation call)
    ↓
Execution  [Direct tool calls]
    ├─→ web_search → Direct API call
    ├─→ create_file → Direct file creation (no rendering pipeline)
    └─→ bash_tool → Direct command execution
    ↓
Output  [Minimal formatting]
    ├─→ Text: Direct return
    ├─→ Files: Copy to output directory (no JSON → render pipeline)
    └─→ Code: Direct render
```

### Key Differences

1. **Semantic AI Understanding**: Claude does use AI to recognize request patterns, but through semantic understanding rather than regex. The AI understands that "write a document" means creating a DOCX file, regardless of language.
2. **Combined AI Calls**: Instead of separate calls for plan → select → parameters → generate, Claude makes one AI call that understands the intent AND generates the output.
3. **No Separate Parameter Generation**: When the AI understands "create a text file with Hello World", it generates the content directly, with no separate parameter extraction step.
4. **Progressive Complexity**: Simple = 1 AI call (understand + execute); complex = 5-20 AI calls (iterative).
5. **No Rendering Pipeline**: Files are created directly from AI output, not rendered from a JSON structure.
6. **Streaming Output**: Results are shown as they are generated.
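
Streaming (point 6) means forwarding each chunk to the user as soon as the model produces it, instead of waiting for the full response to be parsed and rendered. A minimal sketch with Python generators, assuming a chunked text stream: `fake_model_stream` is a hypothetical stand-in for whatever streaming interface the model client provides, and `emit` represents pushing a partial result to the UI.

```python
from typing import Callable, Iterator


def fake_model_stream(text: str, chunk_size: int = 8) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model API: yields text in chunks."""
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]


def stream_to_user(chunks: Iterator[str], emit: Callable[[str], None]) -> str:
    """Forward each chunk immediately via emit() and return the full text at the end."""
    parts: list[str] = []
    for chunk in chunks:
        emit(chunk)          # the user sees partial output right away
        parts.append(chunk)
    return "".join(parts)
```

The full text is still available once the stream ends, so file creation or validation can run afterwards without delaying the first visible output.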

## Comparison Table

| Aspect | Current System | Claude's Approach | Impact |
|--------|---------------|-------------------|---------|
| **Simple Request AI Calls** | 8-60+ calls (sequential) | 1-2 calls (combined) | **40x overhead** |
| **Action Selection** | Separate AI call (30s) | Part of understanding call | **30s saved** |
| **Parameter Generation** | Separate AI call (30s) | Combined with generation | **30s saved** |
| **Prompt Analysis** | Separate AI call (10s) | Part of understanding call | **10s saved** |
| **Document Generation** | Looping (up to 50 iterations) | Single-shot for simple | **Variable** |
| **Rendering** | Post-generation pipeline | Direct file creation | **Slow for small docs** |
| **Content Validation** | Always separate AI call | Optional/combined | **30s saved** |
| **Refinement Decision** | Always separate AI call | Combined with understanding | **30s saved** |
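
Summing the latencies annotated in the flow diagram gives a rough lower bound on the planning overhead of a single pass, before any extraction, generation, or rendering time. The values are the approximate figures from the diagram, not new measurements; Task Planning (#1) and Content Validation (#7) carry no latency annotation there, so they are left out.

```python
# Approximate per-call latencies annotated in the flow diagram (seconds).
PLANNING_CALL_LATENCY_S = {
    "Action Selection (#2, DETAILED)": 30,
    "Parameter Generation (#3, DETAILED)": 30,
    "Prompt Analysis (#4, BASIC)": 10,
    "Refinement Decision (#8, ADVANCED)": 30,
}

fixed_overhead_s = sum(PLANNING_CALL_LATENCY_S.values())
print(f"Annotated planning-call overhead per pass: ~{fixed_overhead_s}s")
```

Roughly 100 seconds of planning calls per pass already exceeds the 5-second target for simple requests by a wide margin, which is why removing these calls is listed as P0.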

## Root Causes

### 1. Over-Complication

**Root Cause**: The system makes separate AI calls for each step (plan → select → parameters → generate → validate → refine), even when a single AI call could understand intent AND execute.

**Solution**: Combine AI calls for efficiency:

- **Single AI Call for Simple Requests**: One call that understands intent AND generates output (like Claude)
- **Combined Understanding**: Merge action selection + parameter generation into the generation call
- **Skip Mechanical Steps**: Don't make separate AI calls for steps that can be inferred from the main understanding
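
A single combined call can be sketched as one prompt that asks the model to choose the action, fill the parameters, and produce the content in the same response, followed by deterministic parsing. The prompt wording, the JSON shape, `REQUIRED_FIELDS`, and `call_model` are assumptions for illustration, not the system's existing prompt templates.

```python
import json
from typing import Callable

# Hypothetical combined prompt; doubled braces are literal braces for str.format.
COMBINED_PROMPT = """You are handling a user request in one step.
Reply with JSON only: {{"action": "<action name, e.g. ai.process>",
"format": "<txt|md|docx|pdf|html>", "filename": "<name>", "content": "<full output>"}}

User request: {request}"""

REQUIRED_FIELDS = {"action", "format", "filename", "content"}


def handle_simple_request(request: str, call_model: Callable[[str], str]) -> dict:
    """One AI call that selects the action, fills parameters, and generates content.

    call_model is a hypothetical stand-in returning the model's raw text reply.
    """
    reply = call_model(COMBINED_PROMPT.format(request=request))
    result = json.loads(reply)
    if not isinstance(result, dict):
        raise ValueError("model reply must be a JSON object")
    missing = REQUIRED_FIELDS - set(result)
    if missing:
        raise ValueError(f"model reply is missing fields: {sorted(missing)}")
    return result
```

This replaces AI Calls #2, #3, and #4 plus the first generation pass with one request; the field check stays deterministic, so no extra AI call is needed to validate the response shape.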

### 2. Slow Rendering

**Root Cause**: Rendering happens as a separate phase AFTER AI generation, using complex renderers even for simple formats.

**Solution**:

- For simple formats (TXT, MD): Return directly from AI, no rendering
- For complex formats (DOCX, PDF): Use lightweight renderers for small documents
- Implement streaming rendering for large documents
- Cache renderer instances
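
The first and last points can be combined in a small dispatcher: plain-text formats bypass the renderer entirely, and heavier renderers are constructed once and then reused. The `Renderer` protocol, the `RENDERER_FACTORIES` registry, and `SimpleHtmlRenderer` are hypothetical; a real DOCX or PDF renderer (for example one built on python-docx or ReportLab) would be registered the same way.

```python
import html
from functools import lru_cache
from typing import Callable, Protocol


class Renderer(Protocol):
    def render(self, text: str) -> bytes: ...


class SimpleHtmlRenderer:
    """Hypothetical stand-in for a heavier format renderer."""

    def render(self, text: str) -> bytes:
        body = "".join(f"<p>{html.escape(p)}</p>" for p in text.split("\n\n"))
        return f"<html><body>{body}</body></html>".encode("utf-8")


# Hypothetical registry: format name -> factory that builds a renderer.
RENDERER_FACTORIES: dict[str, Callable[[], Renderer]] = {
    "html": SimpleHtmlRenderer,
}

PASSTHROUGH_FORMATS = {"txt", "md"}  # returned as-is, no rendering pipeline


@lru_cache(maxsize=None)
def get_renderer(fmt: str) -> Renderer:
    """Build each renderer once and reuse it; unknown formats raise KeyError."""
    return RENDERER_FACTORIES[fmt]()


def render_output(text: str, fmt: str) -> bytes:
    fmt = fmt.lower()
    if fmt in PASSTHROUGH_FORMATS:
        return text.encode("utf-8")  # fast path: the AI output already is the file
    return get_renderer(fmt).render(text)
```

`functools.lru_cache` keeps one instance per format name, which covers "cache renderer instances" without a separate cache class.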

## Recommendations

### Immediate Fixes (High Impact, Low Effort)

1. **Combine AI Calls for Simple Requests**
   - **Key Insight**: Claude uses AI for semantic understanding, but combines understanding + execution
   - Merge action selection + parameter generation into the main generation call
   - Use one AI call that understands intent AND generates output (not separate calls)
   - Skip separate refinement decision if output is simple (check in same call)

2. **Optimize Rendering**
   - For TXT/MD: Return AI output directly, no rendering
   - For small documents (<10KB): Use lightweight renderers
   - Cache renderer instances

3. **Reduce Iteration Looping**
   - For simple requests: Single-shot AI call (no looping)
   - Only use looping for complex/long documents

### Medium-Term Improvements

1. **Request Complexity Detection**
   - Add complexity analyzer (pattern-based, not AI-based)
   - Route to appropriate workflow path

2. **Streaming Output**
   - Stream AI responses as they're generated
   - Progressive rendering for large documents

3. **Direct Tool Execution**
   - For simple actions: Skip parameter generation AI call
   - Use default parameters or pattern-based parameter extraction
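
The pattern-based complexity analyzer (item 1) and the pattern-based parameter extraction (item 3) could start as a few cheap heuristics. Everything here is an assumption for illustration: the keyword list, the 200-character threshold, the `simple`/`complex` labels, and the filename regex are not taken from the current system and would need tuning against real traffic. Keyword matching is also exactly what the Claude comparison contrasts with semantic understanding, so it will miss non-English or unusually phrased requests; those should fall back to the full workflow.

```python
import re

# Hypothetical heuristics; tune against real requests before relying on them.
COMPLEX_HINTS = ("research", "compare", "analyze", "report", "summarize", "multiple")
FORMAT_WORDS = {"text": "txt", "txt": "txt", "markdown": "md", "md": "md",
                "word": "docx", "docx": "docx", "pdf": "pdf", "html": "html"}
FILENAME_RE = re.compile(r"\b([\w-]+\.(txt|md|docx|pdf|html))\b", re.IGNORECASE)


def classify_request(prompt: str, has_documents: bool) -> str:
    """Return "simple" or "complex" using cheap heuristics (no AI call)."""
    text = prompt.lower()
    if has_documents or len(prompt) > 200:
        return "complex"
    if any(hint in text for hint in COMPLEX_HINTS):
        return "complex"
    return "simple"


def extract_parameters(prompt: str) -> dict:
    """Pattern-based parameters with defaults, in place of the parameter-generation call."""
    params = {"format": "txt", "filename": "output.txt"}  # defaults
    match = FILENAME_RE.search(prompt)
    if match:  # an explicit filename wins, e.g. "save it as notes.md"
        params["filename"] = match.group(1)
        params["format"] = match.group(2).lower()
        return params
    for word in re.findall(r"[a-z]+", prompt.lower()):
        if word in FORMAT_WORDS:  # first format keyword, e.g. "word" -> docx
            params["format"] = FORMAT_WORDS[word]
            params["filename"] = f"output.{params['format']}"
            break
    return params
```

A request classified as `simple` can go straight to a single combined call with these parameters; anything `complex` keeps the existing plan → act → refine path.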

### Long-Term Architecture Changes

1. **Unified AI Call Interface**
   - Single entry point with complexity-aware routing
   - Automatic optimization based on request type

2. **Progressive Enhancement**
   - Start with simple execution
   - Add complexity only if needed (validation fails, user requests refinement)

3. **Renderer Optimization**
   - Lazy rendering (only when needed)
   - Format-specific optimizations
   - Parallel rendering for multiple documents
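
Progressive enhancement behind a unified entry point could look like the sketch below: run the cheapest path first, and escalate to the full plan → act → validate → refine workflow only when a check fails or the user asks for refinement. `simple_path`, `full_workflow`, and `looks_valid` are hypothetical callables; `looks_valid` is meant as a cheap local check, not the existing `contentValidator`.

```python
from typing import Callable


def handle_request(prompt: str,
                   simple_path: Callable[[str], str],
                   full_workflow: Callable[[str], str],
                   looks_valid: Callable[[str, str], bool],
                   user_requested_refinement: bool = False) -> tuple[str, str]:
    """Return (output, path_used); escalate only when needed."""
    if user_requested_refinement:
        return full_workflow(prompt), "full"      # explicit request for the heavy path
    output = simple_path(prompt)                  # e.g. one combined AI call
    if looks_valid(prompt, output):
        return output, "simple"
    return full_workflow(prompt), "full"          # existing multi-call workflow
```

Because the escalation decision comes after the cheap path, requests that pass the check pay for one path only, and only failing requests pay for both.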

## Implementation Priority

1. **P0 (Critical)**: Skip unnecessary AI calls for simple requests
2. **P0 (Critical)**: Optimize rendering for simple formats
3. **P1 (High)**: Reduce iteration looping for simple requests
4. **P1 (High)**: Add request complexity detection
5. **P2 (Medium)**: Implement streaming output
6. **P3 (Low)**: Long-term architecture refactoring

## Metrics to Track

- **AI Calls per Request**: Target <2 for simple requests
- **Rendering Time**: Target <1s for simple documents
- **Total Request Time**: Target <5s for simple requests
- **User Satisfaction**: Measure via feedback
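
The numeric targets above can be checked per request with a small record type. The field names and `missed_targets` are hypothetical; the thresholds are the listed targets for simple requests, read as strict upper bounds. User satisfaction comes from feedback rather than per-request timing, so it is not included.

```python
from dataclasses import dataclass


@dataclass
class RequestMetrics:
    ai_calls: int
    rendering_time_s: float
    total_time_s: float


# Targets for simple requests, as listed above (strictly less than).
TARGETS = {"ai_calls": 2, "rendering_time_s": 1.0, "total_time_s": 5.0}


def missed_targets(m: RequestMetrics) -> list[str]:
    """Return the names of metrics that did not stay below their target."""
    return [name for name, limit in TARGETS.items() if not getattr(m, name) < limit]
```

Logging `missed_targets()` for every request classified as simple gives a direct measure of how often the fast path actually meets the targets.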