# Dynamic Generic AI Calls Implementation Strategy

## Overview

This document outlines the implementation strategy for a model-agnostic AI service that handles both planning calls and text-processing calls, with ordered model fallbacks, token-limit management, and integration with the existing ExtractionService architecture.

## Core Principles

1. **Model Fallback Strategy**: Try candidate models in order until one succeeds, for maximum reliability
2. **Size Management**: Stay within 90% of the model's token limit by default, via a configurable safety margin
3. **Separation of Concerns**: Planning and text calls use different parameter sets and processing logic
4. **TypeGroup-Aware Processing**: Reuse the existing chunking and merging logic, selected by content type (typeGroup)
5. **Capability-Based Model Selection**: Only use models whose capabilities match the requested operation

## Call Type Distinction

### Planning Calls

**Criteria**: `no documents AND (operationType in ["generate_plan", "analyse_content"])`

**Characteristics**:
- Use the placeholder system for selective content summarization
- Prompt integrity is critical; it can be protected with `compressPrompt=False`
- Placeholder contents can be summarized while the prompt structure is preserved
- Examples: task planning, action definition, validation, decision making

### Text Calls

**Criteria**: `has documents OR operationType not in ["generate_plan", "analyse_content"]`

**Characteristics**:
- Process documents through ExtractionService
- Use typeGroup-aware chunking and merging
- Process documents either individually or as a group
- Examples: document analysis, content generation, format conversion

## Enhanced Data Models

### AiCallOptions

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, Field


class AiCallOptions(BaseModel):
    # Existing fields (OperationType and Priority are the existing enums)
    operationType: OperationType
    priority: Priority = Priority.BALANCED
    compressPrompt: bool = True
    compressContext: bool = True
    processDocumentsIndividually: bool = False
    maxContextBytes: Optional[int] = None

    # New fields for dynamic strategy
    # callAi() derives the call type itself (see _determineCallType), so this is informational
    callType: Literal["planning", "text"] = "text"
    safetyMargin: float = Field(default=0.1, ge=0.0, le=0.5)
    modelCapabilities: Optional[List[str]] = None  # e.g., ["text", "image", "vision"]
```

### Model Capabilities

```python
class ModelCapabilities(BaseModel):
    name: str
    maxTokens: int
    capabilities: List[str]  # ["text", "image", "vision", "reasoning", "analysis"]
    costPerToken: float
    processingTime: float
    isAvailable: bool = True
```

## Implementation Architecture

### 1. Unified AI Call Interface

```python
async def callAi(
    self,
    prompt: str,
    *,
    documents: Optional[List[ChatDocument]] = None,
    placeholders: Optional[Dict[str, str]] = None,
    options: AiCallOptions,  # keyword-only, so a required argument may follow the optional ones
) -> str:
    """
    Unified AI call interface that automatically routes to the appropriate handler.

    Args:
        prompt: The main prompt for the AI call
        documents: Optional list of documents to process
        placeholders: Optional dictionary of placeholder replacements for planning calls
        options: AI call configuration options

    Returns:
        AI response as a string

    Raises:
        Exception: If all available models fail
    """
    # Auto-determine call type based on documents and operation type
    call_type = self._determineCallType(documents, options.operationType)

    if call_type == "planning":
        return await self._callAiPlanning(prompt, placeholders, options)
    return await self._callAiText(prompt, documents, options)
```

### 2. Planning Call Implementation

```python
async def _callAiPlanning(
    self,
    prompt: str,
    placeholders: Optional[Dict[str, str]],
    options: AiCallOptions,
) -> str:
    """
    Handle planning calls with the placeholder system and selective summarization.

    Process:
        1. Get models capable of planning operations
        2. Build the full prompt with placeholders
        3. Check token limits and reduce if needed
        4. Try each model until one succeeds
    """
    # Get available models for planning (text + reasoning capabilities)
    models = self._getModelsForOperation("planning", options)
    # Placeholder substitution does not depend on the model, so build the prompt once
    base_prompt = self._buildPromptWithPlaceholders(prompt, placeholders)
    failures: List[str] = []

    for model in models:
        try:
            full_prompt = base_prompt
            # Check size against this model's limit and reduce if needed
            if self._exceedsTokenLimit(full_prompt, model, options.safetyMargin):
                full_prompt = self._reducePlanningPrompt(full_prompt, placeholders, model, options)

            # Make AI call
            return await self._callModel(model, full_prompt, options)
        except Exception as e:
            logger.warning(f"Planning model {model.name} failed: {e}")
            failures.append(f"{model.name}: {e}")

    raise Exception(
        "All planning models failed - check model availability and capabilities. "
        f"Failures: {failures or 'no capable models found'}"
    )
```

### 3. Text Call Implementation

```python
async def _callAiText(
    self,
    prompt: str,
    documents: Optional[List[ChatDocument]],
    options: AiCallOptions,
) -> str:
    """
    Handle text calls with document processing through ExtractionService.

    Process:
        1. Get models capable of text operations
        2. Extract and process documents using ExtractionService
        3. Check token limits and reduce if needed
        4. Try each model until one succeeds
    """
    # Get available models for text processing
    models = self._getModelsForOperation("text", options)
    failures: List[str] = []

    for model in models:
        try:
            # Extract per model, because maxSize depends on the model's limit
            context = ""
            if documents:
                extracted_content = await self.extractionService.extractDocuments(
                    documentList=[
                        {
                            "id": d.id,
                            "bytes": d.fileData,
                            "fileName": d.fileName,
                            "mimeType": d.mimeType,
                        }
                        for d in documents
                    ],
                    options={
                        "prompt": prompt,
                        "operationType": options.operationType.value,
                        "processDocumentsIndividually": options.processDocumentsIndividually,
                        # Honor the configured safety margin instead of a hard-coded 0.9.
                        # NOTE: maxContextBytes is in bytes, maxTokens in tokens; convert consistently.
                        "maxSize": options.maxContextBytes
                        or int(model.maxTokens * (1 - options.safetyMargin)),
                        "chunkAllowed": not options.compressContext,
                        "mergeStrategy": {"groupBy": "typeGroup"},
                    },
                )
                # Get text content from extracted parts using typeGroup-aware processing
                context = self._extractTextFromContentParts(extracted_content)

            # Check size and reduce if needed
            full_prompt = f"{prompt}\n\n{context}" if context else prompt
            if self._exceedsTokenLimit(full_prompt, model, options.safetyMargin):
                full_prompt = await self._reduceTextPrompt(prompt, context, model, options)

            # Make AI call
            return await self._callModel(model, full_prompt, options)
        except Exception as e:
            logger.warning(f"Text model {model.name} failed: {e}")
            failures.append(f"{model.name}: {e}")

    raise Exception(
        "All text models failed - check model availability and capabilities. "
        f"Failures: {failures or 'no capable models found'}"
    )
```

### 4. Model Selection Strategy

```python
def _getModelsForOperation(
    self, operation_type: str, options: AiCallOptions
) -> List[ModelCapabilities]:
    """
    Get models capable of handling the specific operation, filtered by capability.

    Args:
        operation_type: "planning" or "text"
        options: AI call options including required capabilities

    Returns:
        List of capable models ordered by the requested priority
    """
    all_models = self._getAvailableModels()

    # Filter by operation type capabilities
    if operation_type == "planning":
        capable_models = [
            m for m in all_models if "text" in m.capabilities and "reasoning" in m.capabilities
        ]
    elif operation_type == "text":
        capable_models = [m for m in all_models if "text" in m.capabilities]
    else:
        capable_models = all_models

    # Filter by specific capabilities if requested
    if options.modelCapabilities:
        capable_models = [
            m for m in capable_models
            if all(cap in m.capabilities for cap in options.modelCapabilities)
        ]

    # Order by the requested priority (quality, balanced, speed, or cost)
    return self._sortModelsByPriority(capable_models, options.priority)
```

### 5. Size Management with TypeGroup-Aware Chunking

```python
async def _reduceTextPrompt(
    self,
    prompt: str,
    context: str,
    model: ModelCapabilities,
    options: AiCallOptions,
) -> str:
    """
    Reduce text prompt size using typeGroup-aware chunking and merging.

    Args:
        prompt: Original prompt
        context: Extracted document context
        model: Target model with token limits
        options: AI call options

    Returns:
        Reduced prompt that fits within token limits
    """
    # NOTE: max_size is a token budget, but len() counts characters;
    # compare like units in a real implementation (e.g. via the tokenizer).
    max_size = int(model.maxTokens * (1 - options.safetyMargin))

    if options.compressPrompt:
        # Reduce both prompt and context; 0.7 leaves headroom beyond the safety margin
        current_size = len(prompt) + len(context)
        reduction_factor = (max_size * 0.7) / current_size
        if reduction_factor < 1.0:
            prompt = self._reduceText(prompt, reduction_factor)
            context = await self._reduceTextWithTypeGroups(context, reduction_factor, options)
    else:
        # Only reduce context, preserving prompt integrity (never a negative budget)
        max_context_size = max(0, max_size - len(prompt))
        if context and len(context) > max_context_size:
            reduction_factor = max_context_size / len(context)
            context = await self._reduceTextWithTypeGroups(context, reduction_factor, options)

    return f"{prompt}\n\n{context}" if context else prompt


async def _reduceTextWithTypeGroups(
    self,
    context: str,
    reduction_factor: float,
    options: AiCallOptions,
) -> str:
    """
    Reduce text using typeGroup-aware chunking and merging strategies.

    Leverages existing chunking/merging modules:
        - text_chunker.py / text_merger.py
        - table_chunker.py / table_merger.py
        - structure_chunker.py / default_merger.py
    """
    if options.compressContext:
        # Summarize content using AI
        return await self._summarizeContent(context, reduction_factor)
    # Chunk content using typeGroup-aware chunkers
    return await self._chunkContent(context, reduction_factor, options)
```

## Integration Points

### 1. ExtractionService Integration

- Use `extractionService.extractDocuments()` for all document processing
- Leverage the existing 3-pass pipeline (Extract → Chunk → Merge)
- Use typeGroup-based processing for different content types

### 2. Existing Chunking/Merging Logic

- **Text Content**: `text_chunker.py` / `text_merger.py`
- **Table Content**: `table_chunker.py` / `table_merger.py`
- **Structured Content**: `structure_chunker.py` / `default_merger.py`
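The typeGroup-to-module pairing above can be pictured as a dispatch table. The sketch below is a minimal, self-contained illustration under stated assumptions: extracted content arrives as parts tagged with a `typeGroup` string, sizes are measured in characters rather than tokens, and the `ContentPart` class and the `chunk_text`, `chunk_table`, `keep_whole`, and `chunk_parts` functions are hypothetical stand-ins, not the actual APIs of `text_chunker.py`, `table_chunker.py`, or `structure_chunker.py`. The budget uses the same `maxTokens * (1 - safetyMargin)` formula as `_reduceTextPrompt`.

```python
# Hypothetical sketch: names are illustrative, not the real chunker module APIs.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ContentPart:
    """Hypothetical stand-in for one extracted content part."""
    typeGroup: str  # e.g. "text", "table", "structure"
    data: str


def context_budget(max_tokens: int, safety_margin: float) -> int:
    """Usable budget, same formula as _reduceTextPrompt: maxTokens * (1 - safetyMargin)."""
    return int(max_tokens * (1 - safety_margin))


def chunk_text(data: str, max_size: int) -> List[str]:
    """Pack paragraphs into chunks of at most max_size characters.

    A single paragraph longer than max_size is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for para in data.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if not current or len(candidate) <= max_size:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def chunk_table(data: str, max_size: int) -> List[str]:
    """Split a CSV-like table by rows, repeating the header row in every chunk."""
    lines = data.splitlines()
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    chunks: List[str] = []
    current = header
    for row in rows:
        # Always put at least one row under the header, even if it overflows
        if current == header or len(current) + 1 + len(row) <= max_size:
            current = f"{current}\n{row}"
        else:
            chunks.append(current)
            current = f"{header}\n{row}"
    chunks.append(current)
    return chunks


def keep_whole(data: str, max_size: int) -> List[str]:
    """Simplification: structured content is not split in this sketch."""
    return [data] if data else []


# typeGroup -> chunker; unknown groups fall back to keep_whole
CHUNKERS: Dict[str, Callable[[str, int], List[str]]] = {
    "text": chunk_text,
    "table": chunk_table,
    "structure": keep_whole,
}


def chunk_parts(parts: List[ContentPart], max_size: int) -> Dict[str, List[str]]:
    """Chunk each part with its typeGroup's chunker and group results by typeGroup."""
    grouped: Dict[str, List[str]] = {}
    for part in parts:
        chunker = CHUNKERS.get(part.typeGroup, keep_whole)
        grouped.setdefault(part.typeGroup, []).extend(chunker(part.data, max_size))
    return grouped


if __name__ == "__main__":
    budget = context_budget(max_tokens=24, safety_margin=0.5)
    parts = [
        ContentPart("text", "alpha beta\n\ngamma delta\n\nepsilon"),
        ContentPart("table", "id,name\n1,a\n2,b\n3,c"),
    ]
    for group, chunks in chunk_parts(parts, budget).items():
        print(group, chunks)
```

Repeating the header row in every table chunk keeps each chunk interpretable on its own, which is one reason to chunk tables differently from prose. The merging side (`text_merger.py`, `table_merger.py`, `default_merger.py`) is not sketched here.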
### 3. Model Capability Management

- Maintain a model capability registry
- Filter models based on operation requirements
- Support dynamic model availability

## Error Handling Strategy

### Model Failure Handling

1. **Individual Model Failure**: Log a warning and try the next model
2. **All Models Failed**: Raise an error with diagnostic information, including:
   - List of attempted models
   - Failure reason for each model
   - Suggested alternatives or parameter adjustments

### Size Management Failures

1. **Token Limit Exceeded**: Apply reduction strategies
2. **Reduction Failed**: Fall back to emergency chunking
3. **Critical Content Lost**: Raise an error that includes a size analysis

## Configuration and Tuning

### Safety Margins

- **Default**: 10% safety margin (0.1)
- **Configurable**: Per call, via `AiCallOptions.safetyMargin`
- **Range**: 0.0 to 0.5 (0% to 50% safety margin)
- **Effective budget**: `maxTokens * (1 - safetyMargin)`; e.g. `maxTokens = 100000` with the default margin leaves 90,000 tokens

### Model Selection Priority

1. **Quality**: Most accurate model
2. **Balanced**: Good balance of speed and quality
3. **Speed**: Fastest available model
4. **Cost**: Most cost-effective model

### Size Reduction Strategies

- **Prompt Compression**: When `compressPrompt=True`; otherwise only the context is reduced
- **Context Summarization**: When `compressContext=True`
- **Document Chunking**: When `compressContext=False`, context is chunked with the typeGroup-aware chunkers; `processDocumentsIndividually=True` additionally handles each document separately

## Migration Strategy

### Phase 1: Enhanced AiCallOptions

- Add the new fields to the `AiCallOptions` model
- Update existing AI calls to use the new options

### Phase 2: Unified Interface

- Implement `callAi()` as the unified entry point
- Maintain backward compatibility with the existing `callAiText()`

### Phase 3: Model Management

- Implement the model capability registry
- Add model selection and fallback logic

### Phase 4: Size Management

- Integrate with ExtractionService
- Implement typeGroup-aware reduction strategies

### Phase 5: Full Migration

- Migrate all AI calls to the unified interface
- Remove legacy AI call methods

## Benefits

1. **Reliability**: Fallback across multiple models raises the success rate
2. **Efficiency**: Size management keeps requests within model token limits
3. **Flexibility**: TypeGroup-aware processing handles diverse content types
4. **Maintainability**: Centralized logic reduces code duplication
5. **Scalability**: New models and capabilities are easy to add
6. **Integration**: Reuses the existing ExtractionService pipeline

## Future Enhancements

1. **Dynamic Model Loading**: Load models on demand based on requirements
2. **Performance Monitoring**: Track model performance and optimize selection
3. **Cost Optimization**: Balance quality against cost based on the use case
4. **Caching**: Cache processed content for repeated operations
5. **Streaming**: Support streaming responses from models