Dynamic Generic AI Calls Implementation Strategy

Overview

This document outlines the implementation strategy for a robust, model-agnostic AI service that handles both planning and text processing calls with intelligent fallbacks, size management, and integration with the existing ExtractionService architecture.

Core Principles

  1. Model Fallback Strategy: Try each capable model in turn until one succeeds
  2. Size Management: Keep prompts within 90% of the model's token limit by default (a 10% safety margin, configurable per call)
  3. Separation of Concerns: Planning vs. text calls have different parameter sets and processing logic
  4. TypeGroup-Aware Processing: Leverage existing chunking and merging logic based on content types
  5. Capability-Based Model Selection: Only use models capable of handling specific operations

Call Type Distinction

Planning Calls

Criteria: no documents AND (operationType in ["generate_plan", "analyse_content"])

Characteristics:

  • Use placeholder system for selective content summarization
  • Prompt integrity is critical (can be protected with compressPrompt=False)
  • Placeholders can be summarized while preserving prompt structure
  • Examples: Task planning, action definition, validation, decision making
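
The placeholder idea above can be sketched as follows. This is a minimal illustration, not the service's implementation: the `{{KEY}}` marker syntax, the function names, and the use of truncation as a stand-in for AI summarization are all assumptions.

```python
from typing import Dict


def build_prompt(template: str, placeholders: Dict[str, str]) -> str:
    """Substitute {{KEY}} markers in the template (marker syntax is assumed)."""
    result = template
    for key, value in placeholders.items():
        result = result.replace("{{" + key + "}}", value)
    return result


def shrink_placeholders(placeholders: Dict[str, str], max_chars_each: int) -> Dict[str, str]:
    """Stand-in for summarization: shorten oversized values only.

    The template itself is never touched, so the prompt structure survives.
    """
    shrunk = {}
    for key, value in placeholders.items():
        if len(value) > max_chars_each:
            shrunk[key] = value[:max_chars_each].rstrip() + " [truncated]"
        else:
            shrunk[key] = value
    return shrunk


template = "You are a planner.\nHistory:\n{{HISTORY}}\nTask: {{TASK}}"
values = {"HISTORY": "step " * 50, "TASK": "Summarize the report"}
prompt = build_prompt(template, shrink_placeholders(values, 40))
```

Only the long `HISTORY` value is shortened; the instructions and the short `TASK` value reach the model unchanged.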

Text Calls

Criteria: has documents OR operationType not in ["generate_plan", "analyse_content"]

Characteristics:

  • Process documents through ExtractionService
  • Use typeGroup-aware chunking and merging
  • Can process documents individually or as a group
  • Examples: Document analysis, content generation, format conversion
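
The two criteria are complements of each other, so routing reduces to a single check. A minimal sketch, assuming operation types are compared as plain strings (the real code passes an `OperationType` value) and using `"summarize"` as a hypothetical non-planning operation:

```python
from typing import Optional, Sequence

# Operation types that qualify as planning when no documents are attached
PLANNING_OPERATIONS = {"generate_plan", "analyse_content"}


def determine_call_type(documents: Optional[Sequence[object]], operation_type: str) -> str:
    """Return "planning" or "text" following the criteria above."""
    has_documents = bool(documents)  # None and an empty list both count as "no documents"
    if not has_documents and operation_type in PLANNING_OPERATIONS:
        return "planning"
    return "text"


examples = [
    determine_call_type(None, "generate_plan"),
    determine_call_type(["report.pdf"], "generate_plan"),
    determine_call_type(None, "summarize"),
]
```

Attaching a document always forces the text path, even for a planning operation type.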

Enhanced Data Models

AiCallOptions

from typing import List, Literal, Optional

from pydantic import BaseModel, Field

# OperationType and Priority are the existing project enums.


class AiCallOptions(BaseModel):
    # Existing fields
    operationType: OperationType
    priority: Priority = Priority.BALANCED
    compressPrompt: bool = True
    compressContext: bool = True
    processDocumentsIndividually: bool = False
    maxContextBytes: Optional[int] = None

    # New fields for dynamic strategy
    callType: Literal["planning", "text"] = "text"
    safetyMargin: float = Field(default=0.1, ge=0.0, le=0.5)  # fraction of maxTokens held in reserve
    modelCapabilities: Optional[List[str]] = None  # e.g., ["text", "image", "vision"]

Model Capabilities

class ModelCapabilities(BaseModel):
    name: str
    maxTokens: int
    capabilities: List[str]  # ["text", "image", "vision", "reasoning", "analysis"]
    costPerToken: float
    processingTime: float
    isAvailable: bool = True

Implementation Architecture

1. Unified AI Call Interface

async def callAi(
    self,
    prompt: str,
    documents: Optional[List[ChatDocument]] = None,
    placeholders: Optional[Dict[str, str]] = None,
    *,
    options: AiCallOptions,  # keyword-only: a required parameter cannot follow defaulted ones
) -> str:
    """
    Unified AI call interface that automatically routes to appropriate handler.
    
    Args:
        prompt: The main prompt for the AI call
        documents: Optional list of documents to process
        placeholders: Optional dictionary of placeholder replacements for planning calls
        options: AI call configuration options
    
    Returns:
        AI response as string
        
    Raises:
        Exception: If all available models fail
    """
    # Auto-determine call type based on documents and operation type
    call_type = self._determineCallType(documents, options.operationType)
    
    if call_type == "planning":
        return await self._callAiPlanning(prompt, placeholders, options)
    else:
        return await self._callAiText(prompt, documents, options)

2. Planning Call Implementation

async def _callAiPlanning(
    self,
    prompt: str,
    placeholders: Optional[Dict[str, str]],
    options: AiCallOptions
) -> str:
    """
    Handle planning calls with placeholder system and selective summarization.
    
    Process:
    1. Get models capable of planning operations
    2. Build full prompt with placeholders
    3. Check token limits and reduce if needed
    4. Try each model until one succeeds
    """
    # Get available models for planning (text + reasoning capabilities)
    models = self._getModelsForOperation("planning", options)
    failures = []  # "<model name>: <reason>" for each failed attempt

    for model in models:
        try:
            # Build full prompt with placeholders
            full_prompt = self._buildPromptWithPlaceholders(prompt, placeholders)

            # Check size and reduce if needed
            if self._exceedsTokenLimit(full_prompt, model, options.safetyMargin):
                full_prompt = self._reducePlanningPrompt(full_prompt, placeholders, model, options)

            # Make AI call
            return await self._callModel(model, full_prompt, options)

        except Exception as e:
            # Record the reason and fall through to the next candidate model
            logger.warning(f"Planning model {model.name} failed: {e}")
            failures.append(f"{model.name}: {e}")

    raise RuntimeError(
        "All planning models failed - check model availability and capabilities. "
        f"Attempts: {failures or 'no capable models found'}"
    )

3. Text Call Implementation

async def _callAiText(
    self,
    prompt: str,
    documents: Optional[List[ChatDocument]],
    options: AiCallOptions
) -> str:
    """
    Handle text calls with document processing through ExtractionService.
    
    Process:
    1. Get models capable of text operations
    2. Extract and process documents using ExtractionService
    3. Check token limits and reduce if needed
    4. Try each model until one succeeds
    """
    # Get available models for text processing
    models = self._getModelsForOperation("text", options)
    failures = []  # "<model name>: <reason>" for each failed attempt

    for model in models:
        try:
            # Size budget for this model, honoring the configured safety margin
            token_budget = int(model.maxTokens * (1 - options.safetyMargin))

            # Extract and process documents using ExtractionService
            context = ""
            if documents:
                extracted_content = await self.extractionService.extractDocuments(
                    documentList=[{
                        "id": d.id,
                        "bytes": d.fileData,
                        "fileName": d.fileName,
                        "mimeType": d.mimeType
                    } for d in documents],
                    options={
                        "prompt": prompt,
                        "operationType": options.operationType.value,
                        "processDocumentsIndividually": options.processDocumentsIndividually,
                        # NOTE: maxContextBytes is in bytes, while the fallback is a token count
                        "maxSize": options.maxContextBytes or token_budget,
                        "chunkAllowed": not options.compressContext,
                        "mergeStrategy": {"groupBy": "typeGroup"}
                    }
                )

                # Get text content from extracted parts using typeGroup-aware processing
                context = self._extractTextFromContentParts(extracted_content)

            # Check size and reduce if needed
            full_prompt = prompt + "\n\n" + context if context else prompt
            if self._exceedsTokenLimit(full_prompt, model, options.safetyMargin):
                # _reduceTextPrompt awaits the summarization/chunking helpers, so it must be awaited
                full_prompt = await self._reduceTextPrompt(prompt, context, model, options)

            # Make AI call
            return await self._callModel(model, full_prompt, options)

        except Exception as e:
            # Record the reason and fall through to the next candidate model
            logger.warning(f"Text model {model.name} failed: {e}")
            failures.append(f"{model.name}: {e}")

    raise RuntimeError(
        "All text models failed - check model availability and capabilities. "
        f"Attempts: {failures or 'no capable models found'}"
    )

4. Model Selection Strategy

def _getModelsForOperation(self, operation_type: str, options: AiCallOptions) -> List[Model]:
    """
    Get models capable of handling the specific operation with capability filtering.
    
    Args:
        operation_type: "planning" or "text"
        options: AI call options including required capabilities
    
    Returns:
        List of models sorted by priority and capability match
    """
    all_models = self._getAvailableModels()
    
    # Filter by operation type capabilities
    if operation_type == "planning":
        capable_models = [m for m in all_models 
                         if "text" in m.capabilities and "reasoning" in m.capabilities]
    elif operation_type == "text":
        capable_models = [m for m in all_models if "text" in m.capabilities]
    else:
        capable_models = all_models
    
    # Filter by specific capabilities if requested
    if options.modelCapabilities:
        capable_models = [m for m in capable_models 
                         if all(cap in m.capabilities for cap in options.modelCapabilities)]
    
    # Order candidates according to the caller's priority setting (quality, balanced, speed, or cost)
    return self._sortModelsByPriority(capable_models, options.priority)

5. Size Management with TypeGroup-Aware Chunking

async def _reduceTextPrompt(
    self,
    prompt: str,
    context: str,
    model: Model,
    options: AiCallOptions
) -> str:
    """
    Reduce text prompt size using typeGroup-aware chunking and merging.

    Coroutine: callers must await it, because context reduction awaits
    the summarization and chunking helpers.

    Args:
        prompt: Original prompt
        context: Extracted document context
        model: Target model with token limits
        options: AI call options

    Returns:
        Reduced prompt that fits within token limits
    """
    # NOTE: max_size is a token count, while len() counts characters. Treating the
    # two as comparable is generally conservative (a token usually spans several characters).
    max_size = int(model.maxTokens * (1 - options.safetyMargin))

    if options.compressPrompt:
        # Reduce both prompt and context, targeting 70% of the budget
        current_size = len(prompt) + len(context)
        reduction_factor = (max_size * 0.7) / current_size if current_size else 1.0

        if reduction_factor < 1.0:
            prompt = self._reduceText(prompt, reduction_factor)
            context = await self._reduceTextWithTypeGroups(context, reduction_factor, options)
    else:
        # Only reduce context, preserve prompt integrity.
        # Clamp at zero so an oversized prompt cannot yield a negative factor.
        max_context_size = max(0, max_size - len(prompt))
        if len(context) > max_context_size:
            reduction_factor = max_context_size / len(context)
            context = await self._reduceTextWithTypeGroups(context, reduction_factor, options)

    return prompt + "\n\n" + context if context else prompt

async def _reduceTextWithTypeGroups(
    self,
    context: str,
    reduction_factor: float,
    options: AiCallOptions
) -> str:
    """
    Reduce text using typeGroup-aware chunking and merging strategies.

    Declared async because both branches await a helper.

    Leverages existing chunking/merging modules:
    - text_chunker.py / text_merger.py
    - table_chunker.py / table_merger.py
    - structure_chunker.py / default_merger.py
    """
    if options.compressContext:
        # Summarize content using AI
        return await self._summarizeContent(context, reduction_factor)
    else:
        # Chunk content using typeGroup-aware chunkers
        return await self._chunkContent(context, reduction_factor, options)
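
To make the reduction arithmetic concrete, here is a small standalone sketch of how the two branches derive their factors. The function name is hypothetical, and all sizes are assumed to be in one consistent unit (for example, estimated tokens).

```python
from typing import Tuple


def reduction_factors(prompt_len: int, context_len: int, budget: int,
                      compress_prompt: bool) -> Tuple[float, float]:
    """Return (prompt_factor, context_factor), each between 0.0 and 1.0."""
    if compress_prompt:
        total = prompt_len + context_len
        if total == 0:
            return 1.0, 1.0
        # Both parts shrink by the same factor, targeting 70% of the budget
        factor = min(1.0, (budget * 0.7) / total)
        return factor, factor
    # Prompt is kept intact; only the context absorbs the reduction
    remaining = max(0, budget - prompt_len)
    if context_len <= remaining:
        return 1.0, 1.0
    return 1.0, remaining / context_len


# Budget 900, prompt 400, context 1600
shared = reduction_factors(400, 1600, 900, compress_prompt=True)
context_only = reduction_factors(400, 1600, 900, compress_prompt=False)
```

With these numbers the shared factor is 630 / 2000 = 0.315, whereas preserving the prompt leaves 500 units for a 1600-unit context, a factor of 0.3125 applied to the context alone.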

Integration Points

1. ExtractionService Integration

  • Use extractionService.extractDocuments() for all document processing
  • Leverage existing 3-pass pipeline (Extract → Chunk → Merge)
  • Utilize typeGroup-based processing for different content types
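
As a rough illustration of grouping by typeGroup, the sketch below merges extracted parts per group before they are joined into a context string. The shape of a content part (a dict with `typeGroup` and `data` keys) and the group names are assumptions made for the example; the actual ExtractionService output may differ.

```python
from typing import Dict, List


def merge_by_type_group(parts: List[Dict[str, str]]) -> Dict[str, str]:
    """Concatenate part data per typeGroup, keeping first-seen group order."""
    grouped: Dict[str, List[str]] = {}
    for part in parts:
        grouped.setdefault(part["typeGroup"], []).append(part["data"])
    return {group: "\n".join(chunks) for group, chunks in grouped.items()}


# Hypothetical extracted parts
parts = [
    {"typeGroup": "text", "data": "Intro paragraph."},
    {"typeGroup": "table", "data": "a,b\n1,2"},
    {"typeGroup": "text", "data": "Closing paragraph."},
]
merged = merge_by_type_group(parts)
context = "\n\n".join(merged.values())
```

Keeping each group separate until the final join lets a type-specific merger handle tables and structured content differently from running text.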

2. Existing Chunking/Merging Logic

  • Text Content: text_chunker.py / text_merger.py
  • Table Content: table_chunker.py / table_merger.py
  • Structured Content: structure_chunker.py / default_merger.py

3. Model Capability Management

  • Maintain model capability registry
  • Filter models based on operation requirements
  • Support dynamic model availability
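
One possible shape for such a registry is sketched below. The class and method names are hypothetical, and `ModelInfo` is a trimmed-down stand-in for the ModelCapabilities model described earlier.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ModelInfo:
    name: str
    maxTokens: int
    capabilities: List[str]
    isAvailable: bool = True


class ModelRegistry:
    """In-memory capability registry with an availability toggle."""

    def __init__(self) -> None:
        self._models: Dict[str, ModelInfo] = {}

    def register(self, model: ModelInfo) -> None:
        self._models[model.name] = model

    def set_availability(self, name: str, available: bool) -> None:
        self._models[name].isAvailable = available

    def find_capable(self, required: Iterable[str]) -> List[ModelInfo]:
        """Return available models that offer every required capability."""
        needed = set(required)
        return [m for m in self._models.values()
                if m.isAvailable and needed.issubset(m.capabilities)]


registry = ModelRegistry()
registry.register(ModelInfo("model-a", 8000, ["text", "reasoning"]))
registry.register(ModelInfo("model-b", 4000, ["text"]))
planning_models = [m.name for m in registry.find_capable(["text", "reasoning"])]
```

Marking a model unavailable with `set_availability` removes it from later lookups without deleting its registration.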

Error Handling Strategy

Model Failure Handling

  1. Individual Model Failure: Log warning, try next model
  2. All Models Failed: Raise an error with diagnostic information including:
    • List of attempted models
    • Failure reasons for each model
    • Suggested alternatives or parameter adjustments
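
A minimal sketch of a fallback loop that collects this diagnostic information follows. `AllModelsFailedError`, `call_with_fallback`, and the fake model call are hypothetical names used for illustration only.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple


class AllModelsFailedError(RuntimeError):
    """Raised when every candidate model failed; keeps per-model reasons."""

    def __init__(self, failures: List[Tuple[str, str]]):
        self.failures = failures
        details = "; ".join(f"{name}: {reason}" for name, reason in failures)
        super().__init__(f"All models failed ({details or 'no capable models'})")


async def call_with_fallback(
    model_names: Sequence[str],
    call_model: Callable[[str, str], Awaitable[str]],
    prompt: str,
) -> str:
    failures: List[Tuple[str, str]] = []
    for name in model_names:
        try:
            return await call_model(name, prompt)
        except Exception as exc:
            # Remember why this model failed, then try the next one
            failures.append((name, f"{type(exc).__name__}: {exc}"))
    raise AllModelsFailedError(failures)


async def fake_call(name: str, prompt: str) -> str:
    # Stand-in for a real model call: the first model always times out
    if name == "model-a":
        raise TimeoutError("timed out")
    return f"{name} answered: {prompt}"


result = asyncio.run(call_with_fallback(["model-a", "model-b"], fake_call, "hello"))
```

Callers that catch `AllModelsFailedError` can inspect `failures` to decide whether to retry with different options.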

Size Management Failures

  1. Token Limit Exceeded: Apply reduction strategies
  2. Reduction Failed: Fall back to emergency chunking
  3. Critical Content Lost: Return error with size analysis

Configuration and Tuning

Safety Margins

  • Default: 10% safety margin (0.1)
  • Configurable: Per-call basis via AiCallOptions.safetyMargin
  • Range: 0.0 to 0.5 (0% to 50% safety margin)
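
The margin translates into a token budget as sketched below. The token estimate uses a rough four-characters-per-token heuristic, which is an assumption for illustration; a production check would use the target model's own tokenizer.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate (heuristic, not a real tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def exceeds_token_limit(text: str, max_tokens: int, safety_margin: float = 0.1) -> bool:
    """True if the estimated token count is above the margin-adjusted budget."""
    if not 0.0 <= safety_margin <= 0.5:
        raise ValueError("safety_margin must be between 0.0 and 0.5")
    budget = int(max_tokens * (1 - safety_margin))
    return estimate_tokens(text) > budget


short_text = "x" * 3000  # about 750 estimated tokens
long_text = "x" * 4000   # about 1000 estimated tokens
fits_default = not exceeds_token_limit(short_text, max_tokens=1000)
```

Raising the margin to 0.5 halves the budget, so the same 3000-character text no longer fits a 1000-token model.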

Model Selection Priority

  1. Quality: Best model for accuracy
  2. Balanced: Good balance of speed and quality
  3. Speed: Fastest available model
  4. Cost: Most cost-effective model
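
One way to turn these priorities into an ordering is sketched below. The `qualityScore` field does not exist in the ModelCapabilities model above and is purely hypothetical, as is the rank-sum rule used for the balanced case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Priority(Enum):
    QUALITY = "quality"
    BALANCED = "balanced"
    SPEED = "speed"
    COST = "cost"


@dataclass
class ModelInfo:
    name: str
    costPerToken: float
    processingTime: float
    qualityScore: float  # hypothetical field, higher is better


def sort_models_by_priority(models: List[ModelInfo], priority: Priority) -> List[ModelInfo]:
    if priority is Priority.QUALITY:
        return sorted(models, key=lambda m: -m.qualityScore)
    if priority is Priority.SPEED:
        return sorted(models, key=lambda m: m.processingTime)
    if priority is Priority.COST:
        return sorted(models, key=lambda m: m.costPerToken)
    # BALANCED: add each model's rank by quality and by speed, lowest sum first
    by_quality = sorted(models, key=lambda m: -m.qualityScore)
    by_speed = sorted(models, key=lambda m: m.processingTime)
    quality_rank = {m.name: i for i, m in enumerate(by_quality)}
    speed_rank = {m.name: i for i, m in enumerate(by_speed)}
    return sorted(models, key=lambda m: quality_rank[m.name] + speed_rank[m.name])


models = [
    ModelInfo("model-a", costPerToken=0.03, processingTime=4.0, qualityScore=0.9),
    ModelInfo("model-b", costPerToken=0.01, processingTime=0.8, qualityScore=0.85),
    ModelInfo("model-c", costPerToken=0.002, processingTime=0.9, qualityScore=0.5),
]
balanced_order = [m.name for m in sort_models_by_priority(models, Priority.BALANCED)]
```

The sorted list doubles as the fallback order: the first entry is tried first and the rest serve as fallbacks.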

Size Reduction Strategies

  • Prompt Compression: When compressPrompt=True
  • Context Summarization: When compressContext=True
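  • Document Chunking: When compressContext=False, content is chunked with the typeGroup-aware chunkers instead of being summarized (processDocumentsIndividually=True separately controls whether documents are processed one at a time or as a group)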

Migration Strategy

Phase 1: Enhanced AiCallOptions

  • Add new fields to AiCallOptions model
  • Update existing AI calls to use new options

Phase 2: Unified Interface

  • Implement callAi() as unified entry point
  • Maintain backward compatibility with existing callAiText()

Phase 3: Model Management

  • Implement model capability registry
  • Add model selection and fallback logic

Phase 4: Size Management

  • Integrate with ExtractionService
  • Implement typeGroup-aware reduction strategies

Phase 5: Full Migration

  • Migrate all AI calls to use unified interface
  • Remove legacy AI call methods

Benefits

  1. Reliability: A call fails only when every capable model has failed
  2. Efficiency: Prompts are reduced to fit each model's token budget before the call is made
  3. Flexibility: TypeGroup-aware processing handles diverse content types
  4. Maintainability: Centralized logic reduces code duplication
  5. Scalability: Easy to add new models and capabilities
  6. Integration: Seamless integration with existing ExtractionService

Future Enhancements

  1. Dynamic Model Loading: Load models on-demand based on requirements
  2. Performance Monitoring: Track model performance and optimize selection
  3. Cost Optimization: Balance quality vs. cost based on use case
  4. Caching: Cache processed content for repeated operations
  5. Streaming: Support for streaming responses from models