Dynamic Generic AI Calls Implementation Strategy
Overview
This document outlines the implementation strategy for a robust, model-agnostic AI service that handles both planning and text processing calls with intelligent fallbacks, size management, and integration with the existing ExtractionService architecture.
Core Principles
- Model Fallback Strategy: Try capable models in order until one succeeds, for maximum reliability
- Size Management: Keep each request within 90% of the model's token limit by default, with a configurable safety margin
- Separation of Concerns: Planning vs. text calls have different parameter sets and processing logic
- TypeGroup-Aware Processing: Leverage existing chunking and merging logic based on content types
- Capability-Based Model Selection: Only use models capable of handling specific operations
Call Type Distinction
Planning Calls
Criteria: no documents AND (operationType in ["generate_plan", "analyse_content"])
Characteristics:
- Use placeholder system for selective content summarization
- Prompt integrity is critical (it can be protected with compressPrompt=False)
- Placeholders can be summarized while preserving prompt structure
- Examples: Task planning, action definition, validation, decision making
Text Calls
Criteria: has documents OR operationType not in ["generate_plan", "analyse_content"]
Characteristics:
- Process documents through ExtractionService
- Use typeGroup-aware chunking and merging
- Can process documents individually or as a group
- Examples: Document analysis, content generation, format conversion
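The two sets of criteria above are complementary, so the routing decision comes down to one predicate. The sketch below is a hypothetical stand-in for _determineCallType. It assumes the operation type arrives as a plain string. The design passes an OperationType enum, and its .value could be compared instead. "other_operation" is only an illustrative value.

```python
from typing import Optional, Sequence

# Operation types that count as "planning" when no documents are attached
PLANNING_OPERATIONS = {"generate_plan", "analyse_content"}


def determine_call_type(documents: Optional[Sequence[object]], operation_type: str) -> str:
    """Return "planning" or "text" according to the criteria above (hypothetical helper)."""
    has_documents = bool(documents)  # None and [] both mean "no documents"
    if not has_documents and operation_type in PLANNING_OPERATIONS:
        return "planning"
    # Any attached document, or any other operation type, makes it a text call
    return "text"


print(determine_call_type(None, "generate_plan"))            # planning
print(determine_call_type(["report.pdf"], "generate_plan"))  # text
print(determine_call_type([], "other_operation"))            # text
```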
Enhanced Data Models
AiCallOptions
from typing import List, Literal, Optional
from pydantic import BaseModel, Field

# OperationType and Priority are the existing types from the current codebase.

class AiCallOptions(BaseModel):
    # Existing fields
    operationType: OperationType
    priority: Priority = Priority.BALANCED
    compressPrompt: bool = True
    compressContext: bool = True
    processDocumentsIndividually: bool = False
    maxContextBytes: Optional[int] = None
    # New fields for dynamic strategy
    callType: Literal["planning", "text"] = "text"  # callAi() re-derives this from documents/operationType
    safetyMargin: float = Field(default=0.1, ge=0.0, le=0.5)  # fraction of maxTokens held in reserve
    modelCapabilities: Optional[List[str]] = None  # e.g., ["text", "image", "vision"]
Model Capabilities
class ModelCapabilities(BaseModel):
    name: str
    maxTokens: int  # token limit used for all size checks
    capabilities: List[str]  # e.g., ["text", "image", "vision", "reasoning", "analysis"]
    costPerToken: float
    processingTime: float
    isAvailable: bool = True
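The following standard-library sketch makes capability-based selection concrete. It shows a small model registry and the filter that _getModelsForOperation applies. ModelInfo is a hypothetical dataclass that mirrors ModelCapabilities, so the example runs without pydantic. The model names are invented. The isAvailable check is an assumption about where availability filtering happens; in the design it may already be done by _getAvailableModels().

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelInfo:
    # Hypothetical stand-in for ModelCapabilities, using only the standard library
    name: str
    maxTokens: int
    capabilities: List[str] = field(default_factory=list)
    costPerToken: float = 0.0
    processingTime: float = 0.0
    isAvailable: bool = True


# Operation -> capabilities every candidate model must have
REQUIRED = {"planning": {"text", "reasoning"}, "text": {"text"}}


def capable_models(registry: List[ModelInfo], operation: str,
                   extra: Optional[List[str]] = None) -> List[ModelInfo]:
    """Keep available models that have every capability the operation needs."""
    needed = set(REQUIRED.get(operation, set())) | set(extra or [])
    return [m for m in registry
            if m.isAvailable and needed.issubset(m.capabilities)]


registry = [
    ModelInfo("model-a", 128_000, ["text", "reasoning", "analysis"]),
    ModelInfo("model-b", 32_000, ["text", "vision"]),
    ModelInfo("model-c", 8_000, ["text", "reasoning"], isAvailable=False),
]
print([m.name for m in capable_models(registry, "planning")])          # ['model-a']
print([m.name for m in capable_models(registry, "text", ["vision"])])  # ['model-b']
```

As in the design, an unrecognized operation type adds no requirements, so every available model stays a candidate.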
Implementation Architecture
1. Unified AI Call Interface
async def callAi(
    self,
    prompt: str,
    documents: Optional[List[ChatDocument]] = None,
    placeholders: Optional[Dict[str, str]] = None,
    *,
    options: AiCallOptions,
) -> str:
    """
    Unified AI call interface that automatically routes to the appropriate handler.

    Args:
        prompt: The main prompt for the AI call
        documents: Optional list of documents to process
        placeholders: Optional dictionary of placeholder replacements for planning calls
        options: AI call configuration options (keyword-only, because a parameter
            without a default cannot follow parameters that have defaults)

    Returns:
        AI response as string

    Raises:
        Exception: If all available models fail
    """
    # Auto-determine call type based on documents and operation type
    call_type = self._determineCallType(documents, options.operationType)
    if call_type == "planning":
        return await self._callAiPlanning(prompt, placeholders, options)
    return await self._callAiText(prompt, documents, options)
2. Planning Call Implementation
async def _callAiPlanning(
    self,
    prompt: str,
    placeholders: Optional[Dict[str, str]],
    options: AiCallOptions
) -> str:
    """
    Handle planning calls with placeholder system and selective summarization.

    Process:
    1. Get models capable of planning operations
    2. Build full prompt with placeholders
    3. Check token limits and reduce if needed
    4. Try each model until one succeeds
    """
    # Get available models for planning (text + reasoning capabilities)
    models = self._getModelsForOperation("planning", options)
    failures = []  # (model name, error) pairs for diagnostics
    for model in models:
        try:
            # Build full prompt with placeholders
            full_prompt = self._buildPromptWithPlaceholders(prompt, placeholders)
            # Check size against this model's limit and reduce if needed;
            # reduction starts from the template so the prompt structure stays intact
            if self._exceedsTokenLimit(full_prompt, model, options.safetyMargin):
                full_prompt = self._reducePlanningPrompt(prompt, placeholders, model, options)
            # Make AI call
            return await self._callModel(model, full_prompt, options)
        except Exception as e:
            logger.warning(f"Planning model {model.name} failed: {e}")
            failures.append((model.name, str(e)))
    raise Exception(f"All planning models failed - check model availability and capabilities. Attempts: {failures}")
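The sketch below shows one way the placeholder system could keep the prompt template intact while shrinking placeholder content. Several parts are assumptions: the {{KEY}} placeholder syntax, the four-characters-per-token estimate, and the halving step, which stands in for AI summarization. build_prompt and reduce_planning_prompt are hypothetical counterparts of _buildPromptWithPlaceholders and _reducePlanningPrompt.

```python
import re
from typing import Dict, Optional

# Hypothetical placeholder syntax: {{KEY}} markers inside the prompt template
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def build_prompt(template: str, placeholders: Optional[Dict[str, str]]) -> str:
    """Substitute {{KEY}} markers; keys without a value are left in place."""
    values = placeholders or {}
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)


def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); a real tokenizer should replace this
    return (len(text) + 3) // 4


def reduce_planning_prompt(template: str, placeholders: Dict[str, str],
                           budget_tokens: int) -> str:
    """Shrink placeholder values, never the template, until the prompt fits."""
    values = dict(placeholders)
    while estimate_tokens(build_prompt(template, values)) > budget_tokens:
        key = max(values, key=lambda k: len(values[k]), default=None)
        if key is None or not values[key]:
            raise ValueError("Template alone exceeds the token budget")
        # Halve the longest value; an AI summary of it would go here instead
        values[key] = values[key][: len(values[key]) // 2]
    return build_prompt(template, values)


template = "Plan the next step.\nHistory: {{HISTORY}}\nGoal: {{GOAL}}"
prompt = reduce_planning_prompt(template, {"HISTORY": "x" * 400, "GOAL": "ship v2"}, 40)
print(prompt.startswith("Plan the next step."), estimate_tokens(prompt) <= 40)  # True True
```

Only placeholder values are shortened, so the instructions in the template come through reduction unchanged. That is the property the planning-call section requires.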
3. Text Call Implementation
async def _callAiText(
    self,
    prompt: str,
    documents: Optional[List[ChatDocument]],
    options: AiCallOptions
) -> str:
    """
    Handle text calls with document processing through ExtractionService.

    Process:
    1. Get models capable of text operations
    2. Extract and process documents using ExtractionService
    3. Check token limits and reduce if needed
    4. Try each model until one succeeds
    """
    # Get available models for text processing
    models = self._getModelsForOperation("text", options)
    failures = []  # (model name, error) pairs for diagnostics
    for model in models:
        try:
            # Extraction runs per model because the size budget depends on model.maxTokens
            context = ""
            if documents:
                extracted_content = await self.extractionService.extractDocuments(
                    documentList=[{
                        "id": d.id,
                        "bytes": d.fileData,
                        "fileName": d.fileName,
                        "mimeType": d.mimeType
                    } for d in documents],
                    options={
                        "prompt": prompt,
                        "operationType": options.operationType.value,
                        "processDocumentsIndividually": options.processDocumentsIndividually,
                        # NOTE: maxContextBytes is in bytes while maxTokens is in tokens;
                        # the two units must be reconciled before they are compared
                        "maxSize": options.maxContextBytes
                        or int(model.maxTokens * (1 - options.safetyMargin)),
                        "chunkAllowed": not options.compressContext,
                        "mergeStrategy": {"groupBy": "typeGroup"}
                    }
                )
                # Get text content from extracted parts using typeGroup-aware processing
                context = self._extractTextFromContentParts(extracted_content)
            # Check size and reduce if needed
            full_prompt = prompt + "\n\n" + context if context else prompt
            if self._exceedsTokenLimit(full_prompt, model, options.safetyMargin):
                full_prompt = await self._reduceTextPrompt(prompt, context, model, options)
            # Make AI call
            return await self._callModel(model, full_prompt, options)
        except Exception as e:
            logger.warning(f"Text model {model.name} failed: {e}")
            failures.append((model.name, str(e)))
    raise Exception(f"All text models failed - check model availability and capabilities. Attempts: {failures}")
4. Model Selection Strategy
def _getModelsForOperation(self, operation_type: str, options: AiCallOptions) -> List[ModelCapabilities]:
    """
    Get models capable of handling the specific operation with capability filtering.

    Args:
        operation_type: "planning" or "text"
        options: AI call options including required capabilities

    Returns:
        List of models sorted by priority and capability match
    """
    all_models = self._getAvailableModels()
    # Filter by operation type capabilities
    if operation_type == "planning":
        capable_models = [m for m in all_models
                          if "text" in m.capabilities and "reasoning" in m.capabilities]
    elif operation_type == "text":
        capable_models = [m for m in all_models if "text" in m.capabilities]
    else:
        capable_models = all_models
    # Filter by specific capabilities if requested
    if options.modelCapabilities:
        capable_models = [m for m in capable_models
                          if all(cap in m.capabilities for cap in options.modelCapabilities)]
    # Order models by the requested priority (quality, balanced, speed, or cost)
    return self._sortModelsByPriority(capable_models, options.priority)
5. Size Management with TypeGroup-Aware Chunking
async def _reduceTextPrompt(
    self,
    prompt: str,
    context: str,
    model: ModelCapabilities,
    options: AiCallOptions
) -> str:
    """
    Reduce text prompt size using typeGroup-aware chunking and merging.

    Args:
        prompt: Original prompt
        context: Extracted document context
        model: Target model with token limits
        options: AI call options

    Returns:
        Reduced prompt that fits within token limits
    """
    # Token budget after the safety margin. Sizes must be measured in tokens, not
    # characters: comparing len() against maxTokens mixes units.
    # self._estimateTokens is a hypothetical token-counting helper.
    max_size = int(model.maxTokens * (1 - options.safetyMargin))
    prompt_size = self._estimateTokens(prompt)
    context_size = self._estimateTokens(context)
    if options.compressPrompt:
        # Reduce both prompt and context; target 70% of the budget, leaving extra
        # headroom because text reduction is approximate
        current_size = prompt_size + context_size
        reduction_factor = (max_size * 0.7) / current_size
        if reduction_factor < 1.0:
            prompt = self._reduceText(prompt, reduction_factor)
            context = await self._reduceTextWithTypeGroups(context, reduction_factor, options)
    else:
        # Only reduce context, preserve prompt integrity
        max_context_size = max_size - prompt_size
        if max_context_size <= 0:
            raise ValueError("Prompt alone exceeds the token budget and compressPrompt=False")
        if context_size > max_context_size:
            reduction_factor = max_context_size / context_size
            context = await self._reduceTextWithTypeGroups(context, reduction_factor, options)
    return prompt + "\n\n" + context if context else prompt
async def _reduceTextWithTypeGroups(
    self,
    context: str,
    reduction_factor: float,
    options: AiCallOptions
) -> str:
    """
    Reduce text using typeGroup-aware chunking and merging strategies.

    Leverages existing chunking/merging modules:
    - text_chunker.py / text_merger.py
    - table_chunker.py / table_merger.py
    - structure_chunker.py / default_merger.py
    """
    if options.compressContext:
        # Summarize content using AI (an async model call, hence async def)
        return await self._summarizeContent(context, reduction_factor)
    # Chunk content using typeGroup-aware chunkers
    return await self._chunkContent(context, reduction_factor, options)
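The arithmetic in _reduceTextPrompt can be pulled out into a pure function. This sketch assumes token counts for the prompt and the context are already known, and reduction_factors is a hypothetical name. It mirrors the two branches above. With compressPrompt, both parts shrink by one factor aimed at 70% of the budget. Otherwise, only the context is reduced.

```python
from typing import Tuple


def reduction_factors(prompt_tokens: int, context_tokens: int, max_tokens: int,
                      safety_margin: float, compress_prompt: bool) -> Tuple[float, float]:
    """Return (prompt_factor, context_factor); 1.0 means "keep as is"."""
    budget = int(max_tokens * (1 - safety_margin))
    if compress_prompt:
        total = prompt_tokens + context_tokens
        if total == 0:
            return 1.0, 1.0
        # Shrink prompt and context by one shared factor, aiming at 70% of the budget
        factor = min(1.0, budget * 0.7 / total)
        return factor, factor
    # compressPrompt=False: the prompt stays intact and only the context is scaled
    room = budget - prompt_tokens
    if room <= 0:
        raise ValueError("prompt alone exceeds the budget and compressPrompt=False")
    context_factor = min(1.0, room / context_tokens) if context_tokens else 1.0
    return 1.0, context_factor
```

Take a 10,000-token model with the default 10% margin, which gives a 9,000-token budget. With a 1,000-token prompt and 14,000 tokens of context, compressPrompt=True gives a shared factor of 9000 × 0.7 / 15000 = 0.42. With compressPrompt=False, the prompt stays whole and the context is scaled by 8000 / 14000 ≈ 0.57.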
Integration Points
1. ExtractionService Integration
- Use extractionService.extractDocuments() for all document processing
- Leverage existing 3-pass pipeline (Extract → Chunk → Merge)
- Utilize typeGroup-based processing for different content types
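TypeGroup-based processing also matters when extracted parts are turned back into prompt context, which is the _extractTextFromContentParts step in _callAiText. The part shape used here is an assumption: a dict with typeGroup and data keys. The group order is assumed too. This document does not specify the actual ExtractionService result type.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed output order for known groups; other groups follow in first-seen order
GROUP_ORDER = ["text", "table", "structure"]


def extract_text_from_parts(parts: List[Dict[str, str]]) -> str:
    """Join extracted parts into one context string, grouped by typeGroup."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for part in parts:
        if part.get("data"):  # skip parts that carry no text
            grouped[part.get("typeGroup", "other")].append(part["data"])
    order = [g for g in GROUP_ORDER if g in grouped]
    order += [g for g in grouped if g not in GROUP_ORDER]
    return "\n\n".join(f"[{g}]\n" + "\n".join(grouped[g]) for g in order)


parts = [
    {"typeGroup": "table", "data": "a|b"},
    {"typeGroup": "text", "data": "Intro"},
    {"typeGroup": "image", "data": ""},
    {"typeGroup": "text", "data": "Body"},
]
print(extract_text_from_parts(parts))
```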
2. Existing Chunking/Merging Logic
- Text Content: text_chunker.py / text_merger.py
- Table Content: table_chunker.py / table_merger.py
- Structured Content: structure_chunker.py / default_merger.py
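The existing chunkers are selected per typeGroup. The sketch below shows that dispatch with two simplified stand-ins. chunk_text packs paragraphs greedily, and chunk_table splits rows while repeating the header. Both are hypothetical illustrations of the technique, not the contents of text_chunker.py or table_chunker.py. Sizes are measured in characters for simplicity.

```python
from typing import Callable, Dict, List


def chunk_text(content: str, max_chars: int) -> List[str]:
    """Greedily pack paragraphs into chunks; an oversized paragraph stays whole."""
    chunks: List[str] = []
    current = ""
    for para in content.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the full chunk, start a new one
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def chunk_table(content: str, max_chars: int) -> List[str]:
    """Split table rows into chunks, repeating the header row in every chunk."""
    if not content:
        return []
    header, *rows = content.splitlines()
    chunks: List[str] = []
    current = [header]
    for row in rows:
        if len(current) > 1 and len("\n".join(current + [row])) > max_chars:
            chunks.append("\n".join(current))
            current = [header]
        current.append(row)
    chunks.append("\n".join(current))
    return chunks


# typeGroup -> chunker; groups without a dedicated chunker fall back to text
CHUNKERS: Dict[str, Callable[[str, int], List[str]]] = {
    "text": chunk_text,
    "table": chunk_table,
}


def chunk_by_type_group(type_group: str, content: str, max_chars: int) -> List[str]:
    return CHUNKERS.get(type_group, chunk_text)(content, max_chars)


print(chunk_by_type_group("table", "h1|h2\n1|2\n3|4\n5|6", 12))
# ['h1|h2\n1|2', 'h1|h2\n3|4', 'h1|h2\n5|6']
```

Repeating the header keeps every table chunk self-describing, so a model that sees only one chunk still knows what the columns mean.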
3. Model Capability Management
- Maintain model capability registry
- Filter models based on operation requirements
- Support dynamic model availability
Error Handling Strategy
Model Failure Handling
- Individual Model Failure: Log warning, try next model
- All Models Failed: Raise an error with diagnostic information including:
- List of attempted models
- Failure reasons for each model
- Suggested alternatives or parameter adjustments
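Here is a minimal sketch of the fallback loop with the diagnostics listed above. It assumes models are identified by name and called through an async function. AllModelsFailedError, call_with_fallback, and fake_call are hypothetical names. Suggested alternatives are left out because they depend on project policy.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple


class AllModelsFailedError(Exception):
    """Raised when every candidate model failed; keeps per-model reasons."""

    def __init__(self, attempts: List[Tuple[str, str]]):
        self.attempts = attempts  # (model name, failure reason) in try order
        details = "; ".join(f"{name}: {reason}" for name, reason in attempts)
        super().__init__(f"All {len(attempts)} models failed ({details})")


async def call_with_fallback(model_names: Sequence[str],
                             call: Callable[[str], Awaitable[str]]) -> str:
    """Try models in order; return the first success or raise with diagnostics."""
    attempts: List[Tuple[str, str]] = []
    for name in model_names:
        try:
            return await call(name)
        except Exception as exc:  # one model failing must not stop the loop
            attempts.append((name, f"{type(exc).__name__}: {exc}"))
    raise AllModelsFailedError(attempts)


async def fake_call(name: str) -> str:
    # Stand-in for _callModel: only "model-b" answers
    if name != "model-b":
        raise TimeoutError("no response")
    return f"answer from {name}"


print(asyncio.run(call_with_fallback(["model-a", "model-b"], fake_call)))
# answer from model-b
```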
Size Management Failures
- Token Limit Exceeded: Apply reduction strategies
- Reduction Failed: Fall back to emergency chunking
- Critical Content Lost: Return error with size analysis
Configuration and Tuning
Safety Margins
- Default: 10% safety margin (0.1)
- Configurable: Per-call basis via AiCallOptions.safetyMargin
- Range: 0.0 to 0.5 (0% to 50% safety margin)
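The safety margin turns a model's token limit into a usable budget. This sketch uses token_budget, a hypothetical helper, to apply the default and the allowed range described above. It rounds down so the budget never exceeds the limit.

```python
def token_budget(max_tokens: int, safety_margin: float = 0.1) -> int:
    """Usable tokens after reserving safety_margin of the model's limit."""
    if not 0.0 <= safety_margin <= 0.5:
        raise ValueError("safetyMargin must be between 0.0 and 0.5")
    return int(max_tokens * (1 - safety_margin))


print(token_budget(8192))        # 7372 (default 10% margin)
print(token_budget(8192, 0.25))  # 6144
print(token_budget(8192, 0.0))   # 8192
```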
Model Selection Priority
- Quality: Best model for accuracy
- Balanced: Good balance of speed and quality
- Speed: Fastest available model
- Cost: Most cost-effective model
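This sketch shows how the four priorities could order candidate models. The qualityScore field is hypothetical, since ModelCapabilities has no quality attribute. Priorities are plain strings rather than the Priority enum. The equal-weight "balanced" score is one arbitrary choice among many.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    costPerToken: float
    processingTime: float
    qualityScore: float  # hypothetical field; ModelCapabilities has no quality attribute


def _normalized(values: List[float]) -> List[float]:
    """Min-max scale values into [0, 1]; all-equal values map to 0."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def sort_models_by_priority(models: List[Candidate], priority: str) -> List[Candidate]:
    if not models:
        return []
    if priority == "quality":
        return sorted(models, key=lambda m: m.qualityScore, reverse=True)
    if priority == "speed":
        return sorted(models, key=lambda m: m.processingTime)
    if priority == "cost":
        return sorted(models, key=lambda m: m.costPerToken)
    # "balanced" (and anything unrecognized): reward quality, penalize time and cost
    q = _normalized([m.qualityScore for m in models])
    t = _normalized([m.processingTime for m in models])
    c = _normalized([m.costPerToken for m in models])
    scores = [q[i] - t[i] - c[i] for i in range(len(models))]
    order = sorted(range(len(models)), key=lambda i: scores[i], reverse=True)
    return [models[i] for i in order]


models = [
    Candidate("large", 0.03, 4.0, 0.95),
    Candidate("medium", 0.01, 1.5, 0.85),
    Candidate("small", 0.002, 0.5, 0.6),
]
for p in ("quality", "speed", "cost", "balanced"):
    print(p, [m.name for m in sort_models_by_priority(models, p)])
```

With these made-up numbers, "balanced" ranks the medium model first. Its normalized quality outweighs its normalized time and cost, while the large model is penalized as the slowest and most expensive.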
Size Reduction Strategies
- Prompt Compression: When compressPrompt=True
- Context Summarization: When compressContext=True
- Document Chunking: When processDocumentsIndividually=True
Migration Strategy
Phase 1: Enhanced AiCallOptions
- Add new fields to the AiCallOptions model
- Update existing AI calls to use new options
Phase 2: Unified Interface
- Implement callAi() as unified entry point
- Maintain backward compatibility with existing callAiText()
Phase 3: Model Management
- Implement model capability registry
- Add model selection and fallback logic
Phase 4: Size Management
- Integrate with ExtractionService
- Implement typeGroup-aware reduction strategies
Phase 5: Full Migration
- Migrate all AI calls to use unified interface
- Remove legacy AI call methods
Benefits
- Reliability: Multiple model fallbacks ensure high success rate
- Efficiency: Intelligent size management prevents token limit issues
- Flexibility: TypeGroup-aware processing handles diverse content types
- Maintainability: Centralized logic reduces code duplication
- Scalability: Easy to add new models and capabilities
- Integration: Seamless integration with existing ExtractionService
Future Enhancements
- Dynamic Model Loading: Load models on-demand based on requirements
- Performance Monitoring: Track model performance and optimize selection
- Cost Optimization: Balance quality vs. cost based on use case
- Caching: Cache processed content for repeated operations
- Streaming: Support for streaming responses from models