wiki/appdoc/ai_call_flow_analysis.md
2025-12-03 23:02:58 +01:00

AI Call Flow Architecture Analysis

Executive Summary

This document analyzes the current AI call flow in the workflow system and compares it with Claude's approach, in order to identify the weaknesses behind two problems:

  1. Over-complication for simple requests
  2. Slow rendering for small documents

Current Architecture Flow

Complete Flow for ai.process Action

User Request
  ↓
workflowProcessor.generateTaskPlan()
  ├─→ modeDynamic.generateTaskPlan() [AI Call #1: Task Planning]
  └─→ Creates TaskPlan with TaskSteps
  ↓
workflowProcessor.executeTask()
  ├─→ modeDynamic.executeTask()
  │   ├─→ _planSelect() [AI Call #2: Action Selection]
  │   │   └─→ generateDynamicPlanSelectionPrompt()
  │   │       └─→ callAiPlanning() [~30s, DETAILED mode]
  │   │
  │   ├─→ _actExecute()
  │   │   ├─→ generateDynamicParametersPrompt() [AI Call #3: Parameter Generation]
  │   │   │   └─→ callAiPlanning() [~30s, DETAILED mode]
  │   │   │
  │   │   ├─→ actionExecutor.executeSingleAction()
  │   │   │   └─→ methodAi.process()
  │   │   │       ├─→ progressLogStart()
  │   │   │       ├─→ getChatDocumentsFromDocumentList() [Document Loading]
  │   │   │       ├─→ _analyzePromptAndCreateOptions() [AI Call #4: Prompt Analysis]
  │   │   │       │   └─→ callAiPlanning() [~10s, BASIC mode]
  │   │   │       │
  │   │   │       ├─→ callAiDocuments()
  │   │   │       │   ├─→ progressLogStart() [Nested progress tracking]
  │   │   │       │   ├─→ callAiText() [if documents exist]
  │   │   │       │   │   └─→ extractionService.processDocumentsPerChunk()
  │   │   │       │   │       └─→ Multiple AI calls per chunk [AI Call #5-N]
  │   │   │       │   │
  │   │   │       │   ├─→ buildGenerationPrompt() [Complex JSON template]
  │   │   │       │   ├─→ _callAiWithLooping()
  │   │   │       │   │   ├─→ AI Call [AI Call #6: First iteration]
  │   │   │       │   │   ├─→ Check complete_response flag
  │   │   │       │   │   ├─→ Extract sections
  │   │   │       │   │   ├─→ Repair broken JSON if needed
  │   │   │       │   │   └─→ Loop up to 50 iterations [AI Call #6-55]
  │   │   │       │   │
  │   │   │       │   ├─→ Parse generated JSON
  │   │   │       │   ├─→ generationService.renderReport() [RENDERING PHASE]
  │   │   │       │   │   ├─→ _getFormatRenderer()
  │   │   │       │   │   ├─→ renderer.render() [Format-specific rendering]
  │   │   │       │   │   │   ├─→ For DOCX: python-docx library calls
  │   │   │       │   │   │   ├─→ For PDF: ReportLab/other library
  │   │   │       │   │   │   └─→ For HTML: Template rendering
  │   │   │       │   │   └─→ Returns rendered bytes/base64
  │   │   │       │   │
  │   │   │       │   └─→ Build result dict
  │   │   │       │
  │   │   │       └─→ progressLogFinish()
  │   │   │
  │   │   └─→ progressLogFinish()
  │   │
  │   ├─→ _observeBuild()
  │   │   └─→ Build Observation object
  │   │
  │   ├─→ contentValidator.validateContent() [AI Call #7: Content Validation]
  │   │   └─→ Multiple validation checks
  │   │
  │   ├─→ _refineDecide() [AI Call #8: Refinement Decision]
  │   │   ├─→ extractReviewContent()
  │   │   ├─→ generateDynamicRefinementPrompt()
  │   │   └─→ callAiPlanning() [~30s, ADVANCED mode]
  │   │
  │   └─→ Loop continues if decision = "continue"
  │
  └─→ createTaskCompletionMessage()

Key Bottlenecks Identified

1. Multiple AI Calls for Simple Requests

Problem: Even for a simple "generate a text file" request, the system makes:

  • AI Call #1: Task Planning (unnecessary for simple requests)
  • AI Call #2: Action Selection (could be deterministic)
  • AI Call #3: Parameter Generation (overkill for simple prompts)
  • AI Call #4: Prompt Analysis (redundant - prompt is already clear)
  • AI Call #5-N: Document extraction per chunk (if documents exist)
  • AI Call #6-55: Document generation with looping (up to 50 iterations!)
  • AI Call #7: Content Validation (could be optional for simple outputs)
  • AI Call #8: Refinement Decision (unnecessary if output is simple)

Total: 8 to 60+ sequential AI calls for a simple request that Claude handles in 1-2 calls. The upper end assumes the generation loop runs all 50 iterations.
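
To make the cost concrete, the sketch below adds up only the latencies that are annotated in the flow diagram above. It is a rough lower bound rather than a measurement: task planning, content validation, per-chunk extraction and the generation loop carry no timing in the diagram and are left out. The dictionary keys and the function name are illustrative, not identifiers from the codebase.

```python
# Latencies (seconds) copied from the annotations in the flow diagram.
# Calls without an annotated timing are deliberately omitted, so the
# result is a lower bound on the overhead, not a measured total.
ANNOTATED_LATENCY_S = {
    "action_selection": 30,      # AI Call #2, DETAILED mode
    "parameter_generation": 30,  # AI Call #3, DETAILED mode
    "prompt_analysis": 10,       # AI Call #4, BASIC mode
    "refinement_decision": 30,   # AI Call #8, ADVANCED mode
}


def fixed_overhead_seconds(latencies: dict[str, int]) -> int:
    """Sum the latencies of calls that run strictly one after another."""
    return sum(latencies.values())


overhead = fixed_overhead_seconds(ANNOTATED_LATENCY_S)
print(f"Lower bound on planning overhead: ~{overhead}s")
```

Under these figures a simple request spends roughly 100 seconds in planning-style calls before and after the content is actually generated.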

2. Complex Prompt Generation

Current Approach:

  • Stage 1: generateDynamicPlanSelectionPrompt() - Large template with many placeholders
  • Stage 2: generateDynamicParametersPrompt() - Another large template
  • Stage 3: buildGenerationPrompt() - Complex JSON template with sections structure

Claude's Approach: Direct prompt, minimal overhead.

3. Inefficient Rendering

Current Flow:

AI generates JSON with sections
  ↓
Parse JSON
  ↓
Extract sections array
  ↓
Get format renderer
  ↓
Renderer processes sections
  ├─→ For DOCX: Create Document object
  ├─→ For each section: Add paragraph/heading
  ├─→ Apply formatting
  ├─→ Generate Table of Contents
  └─→ Convert to bytes/base64

Issues:

  • Rendering happens AFTER AI generation completes
  • No streaming or progressive rendering
  • Full document structure built even for simple text
  • Complex renderers for simple formats (e.g., TXT rendered through DOCX pipeline)

4. Unnecessary Iteration Looping

Current: _callAiWithLooping() loops up to 50 times:

  • Checks for complete_response flag
  • Repairs broken JSON
  • Extracts sections incrementally
  • Continues until complete

For Simple Requests: This is overkill. A simple text generation should be single-shot.
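
One way to express this is to make the iteration cap depend on request complexity. The following is a minimal sketch, not the existing `_callAiWithLooping()` implementation: `call_ai` is a hypothetical stand-in that returns a dict with `text` and `complete_response`, and the continuation prompt is invented for illustration.

```python
from typing import Callable

MAX_LOOP_ITERATIONS = 50  # the cap described for the current loop


def generate(
    prompt: str,
    call_ai: Callable[[str], dict],
    is_simple: bool,
) -> tuple[str, int]:
    """Return (generated_text, number_of_ai_calls).

    Simple requests get exactly one call; everything else may loop
    until the model reports completion or the cap is reached.
    """
    limit = 1 if is_simple else MAX_LOOP_ITERATIONS
    parts: list[str] = []
    calls = 0
    for _ in range(limit):
        reply = call_ai(prompt)
        calls += 1
        parts.append(reply["text"])
        if reply.get("complete_response", True):
            break
        # Hypothetical continuation prompt: hand back the tail of the output.
        prompt = "Continue from: " + reply["text"][-200:]
    return "".join(parts), calls
```

Note that in this sketch a simple request returns after one call even if the model did not flag the response as complete; whether that trade-off is acceptable depends on how reliable single-shot output is in practice.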

5. Redundant Progress Tracking

  • Nested progress tracking (method level + service level)
  • Multiple progress updates for same operation
  • Progress logging adds overhead
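
A possible mitigation is to put one throttling wrapper in front of the progress sink, so that nested callers can report freely while duplicate or tiny updates are dropped. The sketch below is an assumption about how that could look; `ThrottledProgress`, `sink` and `min_step` are hypothetical names and do not correspond to `progressLogStart()` or `progressLogFinish()`.

```python
from typing import Callable, Optional


class ThrottledProgress:
    """Forward progress updates only when they changed meaningfully."""

    def __init__(self, sink: Callable[[float, str], None], min_step: float = 0.05) -> None:
        self._sink = sink
        self._min_step = min_step
        self._last: Optional[float] = None

    def update(self, fraction: float, message: str = "") -> bool:
        """Report progress in [0.0, 1.0]; return True if it was forwarded."""
        if self._last is not None:
            if fraction == self._last:
                return False  # exact duplicate, e.g. from a nested tracker
            if fraction < 1.0 and fraction - self._last < self._min_step:
                return False  # too small a step to be worth logging
        self._last = fraction
        self._sink(fraction, message)
        return True
```

The first update and the final `1.0` are always forwarded, so start and finish remain visible even with aggressive throttling.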

Claude's Architecture (From Concept Documents)

Claude's Flow

User Input
  ↓
Input Reception & Analysis [AI Call #1: Semantic Understanding]
  ├─→ AI understands intent semantically (not regex/keyword matching)
  ├─→ Detects patterns like "write a document" → create docx
  ├─→ Detects "continue our conversation" → use past chats tool
  ├─→ Multi-language support (semantic, not pattern-based)
  └─→ Categorizes request complexity
  ↓
Understanding + Execution [Combined AI Call]
  ├─→ Simple requests: 1 AI call that understands AND executes
  │   └─→ AI generates content directly, no separate parameter generation
  ├─→ Moderate requests: 1-2 AI calls total
  └─→ Complex requests: 5-20 AI calls (iterative research + generation)
  ↓
Tool Selection [Part of AI understanding, not separate call]
  ├─→ AI understands which tool to use as part of intent analysis
  └─→ Direct tool execution (no separate parameter generation call)
  ↓
Execution [Direct tool calls]
  ├─→ web_search → Direct API call
  ├─→ create_file → Direct file creation (no rendering pipeline)
  └─→ bash_tool → Direct command execution
  ↓
Output [Minimal formatting]
  ├─→ Text: Direct return
  ├─→ Files: Copy to output directory (no JSON → render pipeline)
  └─→ Code: Direct render

Key Differences

  1. Semantic AI Understanding: Claude identifies intent through AI-based semantic understanding rather than regex or keyword matching. The AI understands that "write a document" means creating a docx, regardless of language.
  2. Combined AI Calls: Instead of separate calls for plan → select → parameters → generate, Claude makes 1 AI call that understands intent AND generates output
  3. No Separate Parameter Generation: When AI understands "create a text file with Hello World", it directly generates the content - no separate parameter extraction step
  4. Progressive Complexity: Simple = 1 AI call (understand + execute), Complex = 5-20 AI calls (iterative)
  5. No Rendering Pipeline: Files are created directly from AI output, not rendered from JSON structure
  6. Streaming Output: Results shown as they're generated

Comparison Table

| Aspect | Current System | Claude's Approach | Impact |
|---|---|---|---|
| Simple Request AI Calls | 8-60+ calls (sequential) | 1-2 calls (combined) | 40x overhead |
| Action Selection | Separate AI call (30s) | Part of understanding call | 30s saved |
| Parameter Generation | Separate AI call (30s) | Combined with generation | 30s saved |
| Prompt Analysis | Separate AI call (10s) | Part of understanding call | 10s saved |
| Document Generation | Looping (up to 50 iterations) | Single-shot for simple | Variable |
| Rendering | Post-generation pipeline | Direct file creation | Slow for small docs |
| Content Validation | Always separate AI call | Optional/combined | 30s saved |
| Refinement Decision | Always separate AI call | Combined with understanding | 30s saved |

Root Causes

1. Over-Complication

Root Cause: The system makes separate AI calls for each step (plan → select → parameters → generate → validate → refine), even when a single AI call could understand intent AND execute.

Solution: Combine AI calls for efficiency:

  • Single AI Call for Simple Requests: One call that understands intent AND generates output (like Claude)
  • Combined Understanding: Merge action selection + parameter generation into the generation call
  • Skip Mechanical Steps: Don't make separate AI calls for steps that can be inferred from the main understanding
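
The combined call can be sketched as a single prompt that asks the model to select the action, choose parameters and produce the content in one reply. This is an illustration under assumptions: `call_ai` is a hypothetical function that takes a prompt and returns the raw model reply as a string, and the prompt wording and JSON keys are invented for the example.

```python
import json
from typing import Callable

# Hypothetical prompt: one request covers selection, parameters and output.
COMBINED_PROMPT = (
    "Decide which action fits the request, choose its parameters, and "
    "produce the output in one step. Reply with JSON only, using the keys "
    '"action", "parameters" and "content".\n\nRequest: {request}'
)

REQUIRED_KEYS = {"action", "parameters", "content"}


def handle_simple_request(request: str, call_ai: Callable[[str], str]) -> dict:
    """Run one AI call that both interprets and fulfils the request."""
    raw = call_ai(COMBINED_PROMPT.format(request=request))
    result = json.loads(raw)
    missing = REQUIRED_KEYS - set(result)
    if missing:
        raise ValueError(f"AI reply is missing keys: {sorted(missing)}")
    return result
```

A malformed reply raises instead of triggering a repair loop; a caller could catch the error and fall back to the multi-step path for that request only.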

2. Slow Rendering

Root Cause: Rendering happens as a separate phase AFTER AI generation, using complex renderers even for simple formats.

Solution:

  • For simple formats (TXT, MD): Return directly from AI, no rendering
  • For complex formats (DOCX, PDF): Use lightweight renderers for small documents
  • Implement streaming rendering for large documents
  • Cache renderer instances
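
The first and last points can be sketched together: plain-text formats bypass rendering entirely, and heavier renderers are created once and reused. The code is a minimal sketch, not the existing `generationService.renderReport()`; `HtmlRenderer` is a hypothetical stand-in for a DOCX or PDF renderer, chosen so the example runs with the standard library only.

```python
import html
from functools import lru_cache

# Formats whose AI output can be returned as-is.
PASSTHROUGH_FORMATS = frozenset({"txt", "md"})


class HtmlRenderer:
    """Hypothetical stand-in for a heavier renderer (DOCX, PDF, ...)."""

    instances_created = 0

    def __init__(self) -> None:
        HtmlRenderer.instances_created += 1

    def render(self, text: str) -> bytes:
        body = html.escape(text).replace("\n", "<br>")
        return f"<html><body><p>{body}</p></body></html>".encode("utf-8")


@lru_cache(maxsize=None)
def get_renderer(fmt: str) -> HtmlRenderer:
    """Create each renderer once; later calls reuse the cached instance."""
    if fmt != "html":
        raise ValueError(f"no renderer registered for {fmt!r}")
    return HtmlRenderer()


def render_output(text: str, fmt: str) -> bytes:
    """Return output bytes, skipping the render pipeline for plain formats."""
    fmt = fmt.lower()
    if fmt in PASSTHROUGH_FORMATS:
        return text.encode("utf-8")
    return get_renderer(fmt).render(text)
```

Caching only helps if renderer instances are safe to reuse across requests; a renderer that accumulates per-document state would need to be reset or excluded from the cache.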

Recommendations

Immediate Fixes (High Impact, Low Effort)

  1. Combine AI Calls for Simple Requests

    • Key Insight: Claude uses AI for semantic understanding, but combines understanding + execution
    • Merge action selection + parameter generation into the main generation call
    • Use one AI call that understands intent AND generates output (not separate calls)
    • Skip separate refinement decision if output is simple (check in same call)
  2. Optimize Rendering

    • For TXT/MD: Return AI output directly, no rendering
    • For small documents (<10KB): Use lightweight renderers
    • Cache renderer instances
  3. Reduce Iteration Looping

    • For simple requests: Single-shot AI call (no looping)
    • Only use looping for complex/long documents

Medium-Term Improvements

  1. Request Complexity Detection

    • Add complexity analyzer (pattern-based, not AI-based)
    • Route to appropriate workflow path
  2. Streaming Output

    • Stream AI responses as they're generated
    • Progressive rendering for large documents
  3. Direct Tool Execution

    • For simple actions: Skip parameter generation AI call
    • Use default parameters or pattern-based parameter extraction
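
A pattern-based complexity analyzer with routing could look like the sketch below. Everything in it is an assumption made for illustration: the keyword list, the word-count threshold and the route names are hypothetical and would need tuning against real requests.

```python
import re
from enum import Enum


class Complexity(Enum):
    SIMPLE = "simple"
    MODERATE = "moderate"
    COMPLEX = "complex"


# Hypothetical hints that a request needs more than single-shot generation.
_COMPLEX_HINTS = re.compile(
    r"\b(research|compare|analy[sz]e|report|summari[sz]e)\b", re.IGNORECASE
)
_MAX_SIMPLE_WORDS = 30

# Hypothetical route names, one per workflow path.
ROUTES = {
    Complexity.SIMPLE: "single_shot",
    Complexity.MODERATE: "generate_then_validate",
    Complexity.COMPLEX: "full_workflow",
}


def classify(prompt: str, document_count: int = 0) -> Complexity:
    """Classify a request without making an AI call."""
    has_hint = bool(_COMPLEX_HINTS.search(prompt))
    if has_hint and document_count > 0:
        return Complexity.COMPLEX
    if has_hint or document_count > 0 or len(prompt.split()) > _MAX_SIMPLE_WORDS:
        return Complexity.MODERATE
    return Complexity.SIMPLE


def route(prompt: str, document_count: int = 0) -> str:
    """Pick the workflow path for a request."""
    return ROUTES[classify(prompt, document_count)]
```

Keyword patterns like these are English-only and miss paraphrases, which is the limitation the semantic approach avoids; the trade-off is that this check costs no AI call.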

Long-Term Architecture Changes

  1. Unified AI Call Interface

    • Single entry point with complexity-aware routing
    • Automatic optimization based on request type
  2. Progressive Enhancement

    • Start with simple execution
    • Add complexity only if needed (validation fails, user requests refinement)
  3. Renderer Optimization

    • Lazy rendering (only when needed)
    • Format-specific optimizations
    • Parallel rendering for multiple documents
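
Progressive enhancement can be sketched as a fast path that escalates only when its result fails a cheap check. The function and parameter names below are hypothetical; `is_valid` stands for any inexpensive, non-AI validation, and `full_path` for the existing multi-step workflow.

```python
from typing import Callable


def run_progressively(
    request: str,
    fast_path: Callable[[str], str],
    full_path: Callable[[str], str],
    is_valid: Callable[[str], bool],
) -> tuple[str, str]:
    """Try the cheap path first; fall back to the full workflow if needed.

    Returns (output, path_used) where path_used is "fast" or "full".
    """
    draft = fast_path(request)
    if is_valid(draft):
        return draft, "fast"
    # Complexity is added only after the simple attempt failed validation.
    return full_path(request), "full"
```

Recording which path was used makes it possible to measure how often the fast path suffices before committing to larger refactoring.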

Implementation Priority

  1. P0 (Critical): Skip unnecessary AI calls for simple requests
  2. P0 (Critical): Optimize rendering for simple formats
  3. P1 (High): Reduce iteration looping for simple requests
  4. P1 (High): Add request complexity detection
  5. P2 (Medium): Implement streaming output
  6. P3 (Low): Long-term architecture refactoring

Metrics to Track

  • AI Calls per Request: Target <2 for simple requests
  • Rendering Time: Target <1s for simple documents
  • Total Request Time: Target <5s for simple requests
  • User Satisfaction: Measure via feedback
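
The three quantitative targets can be checked mechanically once per-request samples are collected. The sketch below assumes the targets apply to the mean over a set of simple requests, which the list above does not state explicitly; `RequestSample` and `check_targets` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RequestSample:
    """Measurements for one simple request."""

    ai_calls: int
    render_seconds: float
    total_seconds: float


# Targets from the list above; each metric should stay strictly below its limit.
TARGETS = {"ai_calls": 2, "render_seconds": 1.0, "total_seconds": 5.0}


def check_targets(samples: list[RequestSample]) -> dict[str, bool]:
    """Return, per metric, whether the mean over all samples meets its target."""
    return {
        name: mean(getattr(s, name) for s in samples) < limit
        for name, limit in TARGETS.items()
    }
```

For example, two requests with 1 and 2 AI calls average 1.5 calls and meet the call target, while a single request with 8 calls does not.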