ValueOn AG e4225a88ea upd doc

2025-12-03 23:02:58 +01:00

12 KiB

AI Call Flow Architecture Analysis

Executive Summary

This document analyzes the current AI call flow in the workflow system and compares it with Claude's approach to identify weaknesses causing:

Over-complication for simple requests
Slow rendering for small documents

Current Architecture Flow

Complete Flow for `ai.process` Action

User Request
  ↓
workflowProcessor.generateTaskPlan()
  ├─→ modeDynamic.generateTaskPlan() [AI Call #1: Task Planning]
  └─→ Creates TaskPlan with TaskSteps
  ↓
workflowProcessor.executeTask()
  ├─→ modeDynamic.executeTask()
  │   ├─→ _planSelect() [AI Call #2: Action Selection]
  │   │   └─→ generateDynamicPlanSelectionPrompt()
  │   │       └─→ callAiPlanning() [~30s, DETAILED mode]
  │   │
  │   ├─→ _actExecute()
  │   │   ├─→ generateDynamicParametersPrompt() [AI Call #3: Parameter Generation]
  │   │   │   └─→ callAiPlanning() [~30s, DETAILED mode]
  │   │   │
  │   │   ├─→ actionExecutor.executeSingleAction()
  │   │   │   └─→ methodAi.process()
  │   │   │       ├─→ progressLogStart()
  │   │   │       ├─→ getChatDocumentsFromDocumentList() [Document Loading]
  │   │   │       ├─→ _analyzePromptAndCreateOptions() [AI Call #4: Prompt Analysis]
  │   │   │       │   └─→ callAiPlanning() [~10s, BASIC mode]
  │   │   │       │
  │   │   │       ├─→ callAiDocuments()
  │   │   │       │   ├─→ progressLogStart() [Nested progress tracking]
  │   │   │       │   ├─→ callAiText() [if documents exist]
  │   │   │       │   │   └─→ extractionService.processDocumentsPerChunk()
  │   │   │       │   │       └─→ Multiple AI calls per chunk [AI Call #5-N]
  │   │   │       │   │
  │   │   │       │   ├─→ buildGenerationPrompt() [Complex JSON template]
  │   │   │       │   ├─→ _callAiWithLooping()
  │   │   │       │   │   ├─→ AI Call [AI Call #6: First iteration]
  │   │   │       │   │   ├─→ Check complete_response flag
  │   │   │       │   │   ├─→ Extract sections
  │   │   │       │   │   ├─→ Repair broken JSON if needed
  │   │   │       │   │   └─→ Loop up to 50 iterations [AI Call #6-55]
  │   │   │       │   │
  │   │   │       │   ├─→ Parse generated JSON
  │   │   │       │   ├─→ generationService.renderReport() [RENDERING PHASE]
  │   │   │       │   │   ├─→ _getFormatRenderer()
  │   │   │       │   │   ├─→ renderer.render() [Format-specific rendering]
  │   │   │       │   │   │   ├─→ For DOCX: python-docx library calls
  │   │   │       │   │   │   ├─→ For PDF: ReportLab/other library
  │   │   │       │   │   │   └─→ For HTML: Template rendering
  │   │   │       │   │   └─→ Returns rendered bytes/base64
  │   │   │       │   │
  │   │   │       │   └─→ Build result dict
  │   │   │       │
  │   │   │       └─→ progressLogFinish()
  │   │   │
  │   │   └─→ progressLogFinish()
  │   │
  │   ├─→ _observeBuild()
  │   │   └─→ Build Observation object
  │   │
  │   ├─→ contentValidator.validateContent() [AI Call #7: Content Validation]
  │   │   └─→ Multiple validation checks
  │   │
  │   ├─→ _refineDecide() [AI Call #8: Refinement Decision]
  │   │   ├─→ extractReviewContent()
  │   │   ├─→ generateDynamicRefinementPrompt()
  │   │   └─→ callAiPlanning() [~30s, ADVANCED mode]
  │   │
  │   └─→ Loop continues if decision = "continue"
  │
  └─→ createTaskCompletionMessage()

Key Bottlenecks Identified

1. Multiple AI Calls for Simple Requests

Problem: Even for a simple "generate a text file" request, the system makes:

AI Call #1: Task Planning (unnecessary for simple requests)
AI Call #2: Action Selection (could be deterministic)
AI Call #3: Parameter Generation (overkill for simple prompts)
AI Call #4: Prompt Analysis (redundant - prompt is already clear)
AI Call #5-N: Document extraction per chunk (if documents exist)
AI Call #6-55: Document generation with looping (up to 50 iterations!)
AI Call #7: Content Validation (could be optional for simple outputs)
AI Call #8: Refinement Decision (unnecessary if output is simple)

Total: 8-60+ AI calls for a simple request that Claude handles in 1-2 calls.
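
The total can be reproduced with a small tally. This is a sketch that assumes each labelled stage runs once per pass and that only chunk extraction and generation looping vary. The function name and its parameters are illustrative and are not part of the codebase.

```python
def count_ai_calls(chunk_calls: int, generation_iterations: int) -> int:
    """Tally AI calls for one pass through the ai.process flow (illustrative)."""
    fixed_before = 4  # task planning, action selection, parameter generation, prompt analysis
    fixed_after = 2   # content validation, refinement decision
    return fixed_before + chunk_calls + generation_iterations + fixed_after


no_documents = count_ai_calls(chunk_calls=0, generation_iterations=1)   # 7
one_chunk = count_ai_calls(chunk_calls=1, generation_iterations=1)      # 8
heavy_case = count_ai_calls(chunk_calls=4, generation_iterations=50)    # 60
```

With no input documents the extraction stage is skipped, so this tally gives 7. The figure of 8 corresponds to a single extraction call and a single generation call, and every additional refinement pass repeats the selection, parameter, and generation stages on top of that.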

2. Complex Prompt Generation

Current Approach:

Stage 1: generateDynamicPlanSelectionPrompt() - Large template with many placeholders
Stage 2: generateDynamicParametersPrompt() - Another large template
Stage 3: buildGenerationPrompt() - Complex JSON template with sections structure

Claude's Approach: Direct prompt, minimal overhead.

3. Inefficient Rendering

Current Flow:

AI generates JSON with sections
  ↓
Parse JSON
  ↓
Extract sections array
  ↓
Get format renderer
  ↓
Renderer processes sections
  ├─→ For DOCX: Create Document object
  ├─→ For each section: Add paragraph/heading
  ├─→ Apply formatting
  ├─→ Generate Table of Contents
  └─→ Convert to bytes/base64

Issues:

Rendering happens AFTER AI generation completes
No streaming or progressive rendering
Full document structure built even for simple text
Complex renderers for simple formats (e.g., TXT rendered through DOCX pipeline)

4. Unnecessary Iteration Looping

Current: _callAiWithLooping() loops up to 50 times:

Checks for complete_response flag
Repairs broken JSON
Extracts sections incrementally
Continues until complete

For Simple Requests: This is overkill. A simple text generation should be single-shot.
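
One way to express this is to cap the loop at a single iteration for simple requests. The sketch below is hypothetical: `generate`, `call_ai`, and the response shape (a dict with a `sections` list of strings and a `complete_response` flag) are assumptions made for illustration, not the actual signature of `_callAiWithLooping()`.

```python
from typing import Callable, Dict, List


def generate(
    prompt: str,
    call_ai: Callable[[str], Dict],
    is_simple: bool,
    max_iterations: int = 50,
) -> str:
    """Single-shot for simple requests, bounded looping otherwise (illustrative)."""
    limit = 1 if is_simple else max_iterations
    sections: List[str] = []
    for _ in range(limit):
        # A real loop would also pass continuation context; omitted here.
        response = call_ai(prompt)
        sections.extend(response.get("sections", []))
        if response.get("complete_response", False):
            break
    return "\n".join(sections)
```

With `is_simple=True` the function issues exactly one call and returns whatever came back, so the loop's repair and continuation logic is only paid for when a long document needs it.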

5. Redundant Progress Tracking

Nested progress tracking (method level + service level)
Multiple progress updates for same operation
Progress logging adds overhead
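
A possible way to avoid duplicate updates is to let only the outermost tracker emit events. The `ProgressLog` class below is a hypothetical stand-in, not the existing `progressLogStart()`/`progressLogFinish()` implementation, and it assumes single-threaded use.

```python
from contextlib import contextmanager
from typing import List, Tuple


class ProgressLog:
    """Emits start/finish events only for the outermost tracked operation."""

    def __init__(self) -> None:
        self.depth = 0
        self.events: List[Tuple[str, str]] = []

    @contextmanager
    def track(self, name: str):
        self.depth += 1
        outermost = self.depth == 1
        if outermost:
            self.events.append(("start", name))
        try:
            yield
        finally:
            if outermost:
                self.events.append(("finish", name))
            self.depth -= 1


log = ProgressLog()
with log.track("methodAi.process"):
    with log.track("callAiDocuments"):  # nested: no extra events recorded
        pass
```

After this runs, `log.events` holds one start and one finish entry for `methodAi.process` and nothing for the nested service-level call.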

Claude's Architecture (From Concept Documents)

Claude's Flow

User Input
  ↓
Input Reception & Analysis [AI Call #1: Semantic Understanding]
  ├─→ AI understands intent semantically (not regex/keyword matching)
  ├─→ Detects patterns like "write a document" → create docx
  ├─→ Detects "continue our conversation" → use past chats tool
  ├─→ Multi-language support (semantic, not pattern-based)
  └─→ Categorizes request complexity
  ↓
Understanding + Execution [Combined AI Call]
  ├─→ Simple requests: 1 AI call that understands AND executes
  │   └─→ AI generates content directly, no separate parameter generation
  ├─→ Moderate requests: 1-2 AI calls total
  └─→ Complex requests: 5-20 AI calls (iterative research + generation)
  ↓
Tool Selection [Part of AI understanding, not separate call]
  ├─→ AI understands which tool to use as part of intent analysis
  └─→ Direct tool execution (no separate parameter generation call)
  ↓
Execution [Direct tool calls]
  ├─→ web_search → Direct API call
  ├─→ create_file → Direct file creation (no rendering pipeline)
  └─→ bash_tool → Direct command execution
  ↓
Output [Minimal formatting]
  ├─→ Text: Direct return
  ├─→ Files: Copy to output directory (no JSON → render pipeline)
  └─→ Code: Direct render

Key Differences

Semantic AI Understanding: Claude interprets intent semantically rather than through regex or keyword matching, so "write a document" maps to creating a DOCX regardless of the language of the request.
Combined AI Calls: Instead of separate calls for plan → select → parameters → generate, Claude makes 1 AI call that understands intent AND generates output.
No Separate Parameter Generation: When AI understands "create a text file with Hello World", it directly generates the content - no separate parameter extraction step
Progressive Complexity: Simple = 1 AI call (understand + execute), Complex = 5-20 AI calls (iterative)
No Rendering Pipeline: Files are created directly from AI output, not rendered from JSON structure
Streaming Output: Results shown as they're generated

Comparison Table

| Aspect | Current System | Claude's Approach | Impact |
| --- | --- | --- | --- |
| Simple Request AI Calls | 8-60+ calls (sequential) | 1-2 calls (combined) | Roughly 4x-60x more calls |
| Action Selection | Separate AI call (30s) | Part of understanding call | 30s saved |
| Parameter Generation | Separate AI call (30s) | Combined with generation | 30s saved |
| Prompt Analysis | Separate AI call (10s) | Part of understanding call | 10s saved |
| Document Generation | Looping (up to 50 iterations) | Single-shot for simple | Variable |
| Rendering | Post-generation pipeline | Direct file creation | Slow for small docs |
| Content Validation | Always separate AI call | Optional/combined | 30s saved |
| Refinement Decision | Always separate AI call | Combined with understanding | 30s saved |
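
As a rough worked example, the planning-side overhead can be summed from the per-call timings annotated in the flow diagram. Only stages with an explicit timing there are included; task planning and content validation are left out because the diagram gives no duration for them. The dictionary keys are illustrative labels.

```python
# Approximate durations in seconds, taken from the annotations in the flow diagram.
planning_overhead_s = {
    "action_selection": 30,      # AI Call #2, DETAILED mode
    "parameter_generation": 30,  # AI Call #3, DETAILED mode
    "prompt_analysis": 10,       # AI Call #4, BASIC mode
    "refinement_decision": 30,   # AI Call #8, ADVANCED mode
}

total_overhead_s = sum(planning_overhead_s.values())  # 100
```

On these figures, a simple request spends about 100 seconds on calls that the combined approach would fold into the generation call, before any content is produced.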

Root Causes

1. Over-Complication

Root Cause: The system makes separate AI calls for each step (plan → select → parameters → generate → validate → refine), even when a single AI call could understand intent AND execute.

Solution: Combine AI calls for efficiency:

Single AI Call for Simple Requests: One call that understands intent AND generates output (like Claude)
Combined Understanding: Merge action selection + parameter generation into the generation call
Skip Mechanical Steps: Don't make separate AI calls for steps that can be inferred from the main understanding
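
A minimal sketch of such a combined call follows. It assumes the model can be asked to return action, parameters, and content in one JSON reply. `COMBINED_PROMPT`, `understand_and_execute`, and the injected `call_ai` callable are hypothetical names, not existing functions in the workflow system.

```python
import json
from typing import Callable, Dict

COMBINED_PROMPT = (
    "Decide which action fits the request, choose its parameters, "
    "and produce the content.\n"
    'Reply with JSON only: {{"action": str, "parameters": object, "content": str}}\n'
    "Request: {request}"
)

REQUIRED_KEYS = {"action", "parameters", "content"}


def understand_and_execute(request: str, call_ai: Callable[[str], str]) -> Dict:
    """One AI call that selects the action, fills parameters, and generates output."""
    raw = call_ai(COMBINED_PROMPT.format(request=request))
    result = json.loads(raw)
    missing = REQUIRED_KEYS - set(result)
    if missing:
        raise ValueError(f"combined response is missing keys: {sorted(missing)}")
    return result
```

Validating the reply shape locally keeps the check deterministic, so a malformed response can trigger a fallback to the multi-step path without spending another AI call on the decision.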

2. Slow Rendering

Root Cause: Rendering happens as a separate phase AFTER AI generation, using complex renderers even for simple formats.

Solution:

For simple formats (TXT, MD): Return directly from AI, no rendering
For complex formats (DOCX, PDF): Use lightweight renderers for small documents
Implement streaming rendering for large documents
Cache renderer instances
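
The first and last points can be sketched together: plain-text formats bypass rendering, and renderer instances for other formats are created once and reused. `PlaceholderRenderer` is a stand-in that only encodes text; it does not produce real DOCX or PDF output, and all names here are assumptions rather than the actual `generationService` API.

```python
from functools import lru_cache

# Formats whose AI output can be returned as-is (assumption based on the list above).
PASSTHROUGH_FORMATS = {"txt", "md"}


class PlaceholderRenderer:
    """Stand-in for a format-specific renderer; real ones would build DOCX/PDF."""

    def __init__(self, fmt: str) -> None:
        self.fmt = fmt

    def render(self, text: str) -> bytes:
        return text.encode("utf-8")


@lru_cache(maxsize=None)
def get_renderer(fmt: str) -> PlaceholderRenderer:
    # Cached per format, so repeated requests reuse the same instance.
    return PlaceholderRenderer(fmt)


def render_output(text: str, fmt: str) -> bytes:
    fmt = fmt.lower()
    if fmt in PASSTHROUGH_FORMATS:
        return text.encode("utf-8")  # no renderer involved
    return get_renderer(fmt).render(text)
```

Caching by format string only makes sense if renderers hold no per-document state; if they do, a factory per request would be the safer choice.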

Recommendations

Immediate Fixes (High Impact, Low Effort)

Combine AI Calls for Simple Requests
- Key Insight: Claude uses AI for semantic understanding, but combines understanding + execution
- Merge action selection + parameter generation into the main generation call
- Use one AI call that understands intent AND generates output (not separate calls)
- Skip separate refinement decision if output is simple (check in same call)
Optimize Rendering
- For TXT/MD: Return AI output directly, no rendering
- For small documents (<10KB): Use lightweight renderers
- Cache renderer instances
Reduce Iteration Looping
- For simple requests: Single-shot AI call (no looping)
- Only use looping for complex/long documents

Medium-Term Improvements

Request Complexity Detection
- Add complexity analyzer (pattern-based, not AI-based)
- Route to appropriate workflow path
Streaming Output
- Stream AI responses as they're generated
- Progressive rendering for large documents
Direct Tool Execution
- For simple actions: Skip parameter generation AI call
- Use default parameters or pattern-based parameter extraction
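
The pattern-based complexity analyzer could look roughly like the sketch below. The keyword list and thresholds are illustrative guesses, not measured values, and `classify` is a hypothetical function. The keywords cover English only, which is a known limitation of pattern matching compared with semantic analysis.

```python
import re
from enum import Enum


class Complexity(Enum):
    SIMPLE = "simple"
    MODERATE = "moderate"
    COMPLEX = "complex"


# Illustrative hints that a request needs iterative work.
COMPLEX_HINTS = re.compile(r"\b(research|compare|analy[sz]e|multi-step)\b", re.IGNORECASE)


def classify(prompt: str, document_count: int = 0) -> Complexity:
    """Cheap, deterministic routing decision made before any AI call."""
    if COMPLEX_HINTS.search(prompt) or document_count > 3:
        return Complexity.COMPLEX
    if document_count > 0 or len(prompt) > 500:
        return Complexity.MODERATE
    return Complexity.SIMPLE
```

A request classified as `SIMPLE` would then go straight to a single generation call, while `COMPLEX` keeps the existing plan, select, and refine loop.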

Long-Term Architecture Changes

Unified AI Call Interface
- Single entry point with complexity-aware routing
- Automatic optimization based on request type
Progressive Enhancement
- Start with simple execution
- Add complexity only if needed (validation fails, user requests refinement)
Renderer Optimization
- Lazy rendering (only when needed)
- Format-specific optimizations
- Parallel rendering for multiple documents
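
Progressive enhancement can be sketched as a small wrapper that tries the cheap path first and escalates only when a check fails. The callables are injected so the sketch stays independent of the real workflow classes; `run_progressively` and its parameters are hypothetical.

```python
from typing import Callable, Tuple


def run_progressively(
    request: str,
    simple_path: Callable[[str], str],
    full_path: Callable[[str], str],
    is_valid: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try the simple path; fall back to the full workflow if validation fails."""
    draft = simple_path(request)
    if is_valid(draft):
        return draft, "simple"
    return full_path(request), "full"
```

The second element of the returned tuple records which path produced the result, which makes it easy to measure how often escalation is actually needed.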

Implementation Priority

P0 (Critical): Skip unnecessary AI calls for simple requests
P0 (Critical): Optimize rendering for simple formats
P1 (High): Reduce iteration looping for simple requests
P1 (High): Add request complexity detection
P2 (Medium): Implement streaming output
P3 (Low): Long-term architecture refactoring

Metrics to Track

AI Calls per Request: Target <2 for simple requests
Rendering Time: Target <1s for simple documents
Total Request Time: Target <5s for simple requests
User Satisfaction: Measure via feedback
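
The three numeric targets can be checked mechanically per request. The sketch below is an assumption about how such a check might look; `RequestMetrics` and `missed_targets` are illustrative names, and user satisfaction is left out because it is gathered through feedback rather than measured per request.

```python
from dataclasses import dataclass
from typing import List

# Targets for simple requests, taken from the list above (all strict upper bounds).
MAX_AI_CALLS = 2
MAX_RENDER_SECONDS = 1.0
MAX_TOTAL_SECONDS = 5.0


@dataclass
class RequestMetrics:
    ai_calls: int
    render_seconds: float
    total_seconds: float


def missed_targets(metrics: RequestMetrics) -> List[str]:
    """Return the names of the targets this request failed to meet."""
    missed = []
    if metrics.ai_calls >= MAX_AI_CALLS:
        missed.append("ai_calls")
    if metrics.render_seconds >= MAX_RENDER_SECONDS:
        missed.append("render_seconds")
    if metrics.total_seconds >= MAX_TOTAL_SECONDS:
        missed.append("total_seconds")
    return missed
```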
