# Dynamic Generic AI Calls Implementation Strategy

## Overview

This document outlines the implementation strategy for a robust, model-agnostic AI service. The service handles both planning and text processing calls, with model fallbacks, prompt size management, and integration with the existing ExtractionService architecture.

## Core Principles

1. **Model Fallback Strategy**: Try capable models one after another until a call succeeds
2. **Size Management**: Cap prompts at 90% of the model's token limit by default (a 10% safety margin), configurable per call
3. **Separation of Concerns**: Planning and text calls have different parameter sets and processing logic
4. **TypeGroup-Aware Processing**: Leverage existing chunking and merging logic based on content types
5. **Capability-Based Model Selection**: Only use models capable of handling the specific operation

## Call Type Distinction

### Planning Calls

**Criteria**: `no documents AND (operationType in ["generate_plan", "analyse_content"])`

**Characteristics**:

- Use placeholder system for selective content summarization
- Prompt integrity is critical (can be protected with `compressPrompt=False`)
- Placeholders can be summarized while preserving prompt structure
- Examples: Task planning, action definition, validation, decision making

### Text Calls

**Criteria**: `has documents OR operationType not in ["generate_plan", "analyse_content"]`

**Characteristics**:

- Process documents through ExtractionService
- Use typeGroup-aware chunking and merging
- Can process documents individually or as a group
- Examples: Document analysis, content generation, format conversion

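The two criteria above are complements of each other, so the routing decision reduces to a single check. The following is a minimal sketch of that check, not the service's actual `_determineCallType`. It assumes the operation type is compared by its string value, and the `"summarize"` value in the usage note is hypothetical.

```python
from typing import Optional, Sequence

# Operation types that qualify as planning when no documents are attached.
PLANNING_OPERATIONS = frozenset({"generate_plan", "analyse_content"})


def determine_call_type(documents: Optional[Sequence[object]], operation_type: str) -> str:
    """Return "planning" only when there are no documents AND the operation is a planning one."""
    has_documents = bool(documents)  # None and an empty list both count as "no documents"
    if not has_documents and operation_type in PLANNING_OPERATIONS:
        return "planning"
    return "text"
```

For example, `determine_call_type(None, "generate_plan")` routes to planning, while attaching any document or passing another operation such as `"summarize"` routes to text.
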
## Enhanced Data Models

### AiCallOptions

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, Field


# OperationType and Priority are the existing enums used by current AI calls.
class AiCallOptions(BaseModel):
    # Existing fields
    operationType: OperationType
    priority: Priority = Priority.BALANCED
    compressPrompt: bool = True
    compressContext: bool = True
    processDocumentsIndividually: bool = False
    maxContextBytes: Optional[int] = None

    # New fields for dynamic strategy
    callType: Literal["planning", "text"] = "text"
    safetyMargin: float = Field(default=0.1, ge=0.0, le=0.5)  # fraction of the token limit held in reserve
    modelCapabilities: Optional[List[str]] = None  # e.g., ["text", "image", "vision"]
```

### Model Capabilities

```python
class ModelCapabilities(BaseModel):
    name: str
    maxTokens: int
    capabilities: List[str]  # ["text", "image", "vision", "reasoning", "analysis"]
    costPerToken: float
    processingTime: float
    isAvailable: bool = True
```

## Implementation Architecture

### 1. Unified AI Call Interface

```python
async def callAi(
    self,
    prompt: str,
    documents: Optional[List[ChatDocument]] = None,
    placeholders: Optional[Dict[str, str]] = None,
    *,  # options has no default, so it must be keyword-only to follow defaulted parameters
    options: AiCallOptions
) -> str:
    """
    Unified AI call interface that automatically routes to appropriate handler.

    Args:
        prompt: The main prompt for the AI call
        documents: Optional list of documents to process
        placeholders: Optional dictionary of placeholder replacements for planning calls
        options: AI call configuration options (keyword-only)

    Returns:
        AI response as string

    Raises:
        Exception: If all available models fail
    """
    # Auto-determine call type based on documents and operation type
    call_type = self._determineCallType(documents, options.operationType)

    if call_type == "planning":
        return await self._callAiPlanning(prompt, placeholders, options)
    else:
        return await self._callAiText(prompt, documents, options)
```

### 2. Planning Call Implementation

```python
async def _callAiPlanning(
    self,
    prompt: str,
    placeholders: Optional[Dict[str, str]],
    options: AiCallOptions
) -> str:
    """
    Handle planning calls with placeholder system and selective summarization.

    Process:
    1. Get models capable of planning operations
    2. Build full prompt with placeholders
    3. Check token limits and reduce if needed
    4. Try each model until one succeeds
    """
    # Get available models for planning (text + reasoning capabilities)
    models = self._getModelsForOperation("planning", options)
    failures = []  # "model name: reason" entries, reported if every model fails

    for model in models:
        try:
            # Build full prompt with placeholders
            full_prompt = self._buildPromptWithPlaceholders(prompt, placeholders)

            # Check size and reduce if needed (limits differ per model)
            if self._exceedsTokenLimit(full_prompt, model, options.safetyMargin):
                full_prompt = self._reducePlanningPrompt(full_prompt, placeholders, model, options)

            # Make AI call
            return await self._callModel(model, full_prompt, options)

        except Exception as e:
            logger.warning(f"Planning model {model.name} failed: {e}")
            failures.append(f"{model.name}: {e}")

    raise Exception(
        "All planning models failed - check model availability and capabilities. "
        f"Attempts: {failures or 'no capable models found'}"
    )
```

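The planning flow depends on two helpers the document names but does not define: one that fills placeholders into the prompt, and one that shrinks placeholder content while leaving the prompt text itself intact. The sketch below illustrates the idea under stated assumptions: the `{{name}}` placeholder syntax is hypothetical, each placeholder is assumed to occur once, sizes are measured in characters, and plain truncation stands in for the AI summarization the strategy describes.

```python
from typing import Dict, Optional


def build_prompt_with_placeholders(prompt: str, placeholders: Optional[Dict[str, str]]) -> str:
    """Substitute each {{name}} marker (assumed syntax) with its value."""
    full_prompt = prompt
    for name, value in (placeholders or {}).items():
        full_prompt = full_prompt.replace("{{" + name + "}}", value)
    return full_prompt


def shrink_placeholders(prompt: str, placeholders: Dict[str, str], max_chars: int) -> Dict[str, str]:
    """Shorten placeholder values proportionally so the filled prompt fits max_chars.

    The prompt skeleton is never modified, which preserves prompt integrity.
    Truncation is a stand-in for summarization in this sketch.
    """
    # Length of the prompt with all markers removed: the part that must survive untouched.
    skeleton = build_prompt_with_placeholders(prompt, {name: "" for name in placeholders})
    budget = max(max_chars - len(skeleton), 0)
    total = sum(len(value) for value in placeholders.values())
    if total <= budget:
        return dict(placeholders)  # already fits, nothing to reduce
    ratio = budget / total  # total > budget >= 0 here, so total is non-zero
    return {name: value[: int(len(value) * ratio)] for name, value in placeholders.items()}
```

Because every value is scaled by the same ratio, large placeholders give up the most content and the instructions around them stay word for word.
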
### 3. Text Call Implementation

```python
async def _callAiText(
    self,
    prompt: str,
    documents: Optional[List[ChatDocument]],
    options: AiCallOptions
) -> str:
    """
    Handle text calls with document processing through ExtractionService.

    Process:
    1. Get models capable of text operations
    2. Extract and process documents using ExtractionService
    3. Check token limits and reduce if needed
    4. Try each model until one succeeds
    """
    # Get available models for text processing
    models = self._getModelsForOperation("text", options)
    failures = []  # "model name: reason" entries, reported if every model fails

    for model in models:
        try:
            # Extract and process documents using ExtractionService.
            # Extraction runs per model because maxSize depends on the model's limit.
            context = ""
            if documents:
                extracted_content = await self.extractionService.extractDocuments(
                    documentList=[{
                        "id": d.id,
                        "bytes": d.fileData,
                        "fileName": d.fileName,
                        "mimeType": d.mimeType
                    } for d in documents],
                    options={
                        "prompt": prompt,
                        "operationType": options.operationType.value,
                        "processDocumentsIndividually": options.processDocumentsIndividually,
                        # Honors safetyMargin instead of a hard-coded 0.9. Note that the
                        # fallback is a token count while maxContextBytes is in bytes.
                        "maxSize": options.maxContextBytes
                        or int(model.maxTokens * (1 - options.safetyMargin)),
                        "chunkAllowed": not options.compressContext,
                        "mergeStrategy": {"groupBy": "typeGroup"}
                    }
                )

                # Get text content from extracted parts using typeGroup-aware processing
                context = self._extractTextFromContentParts(extracted_content)

            # Check size and reduce if needed
            full_prompt = prompt + "\n\n" + context if context else prompt
            if self._exceedsTokenLimit(full_prompt, model, options.safetyMargin):
                full_prompt = self._reduceTextPrompt(prompt, context, model, options)

            # Make AI call
            return await self._callModel(model, full_prompt, options)

        except Exception as e:
            logger.warning(f"Text model {model.name} failed: {e}")
            failures.append(f"{model.name}: {e}")

    raise Exception(
        "All text models failed - check model availability and capabilities. "
        f"Attempts: {failures or 'no capable models found'}"
    )
```

### 4. Model Selection Strategy

```python
def _getModelsForOperation(self, operation_type: str, options: AiCallOptions) -> List[Model]:
    """
    Get models capable of handling the specific operation with capability filtering.

    Args:
        operation_type: "planning" or "text"
        options: AI call options including required capabilities

    Returns:
        List of models sorted by priority and capability match
    """
    all_models = self._getAvailableModels()

    # Filter by operation type capabilities
    if operation_type == "planning":
        capable_models = [m for m in all_models
                          if "text" in m.capabilities and "reasoning" in m.capabilities]
    elif operation_type == "text":
        capable_models = [m for m in all_models if "text" in m.capabilities]
    else:
        capable_models = all_models

    # Filter by specific capabilities if requested
    if options.modelCapabilities:
        capable_models = [m for m in capable_models
                          if all(cap in m.capabilities for cap in options.modelCapabilities)]

    # Sort by priority preference (quality > balanced > speed > cost)
    return self._sortModelsByPriority(capable_models, options.priority)
```

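The selection code above delegates ordering to `_sortModelsByPriority`, which the document does not spell out. One possible shape is sketched below. It is an illustration only: the `qualityScore` field is hypothetical (the `ModelCapabilities` model has no quality metric), priorities are passed as plain strings rather than the `Priority` enum, and the rank-sum rule for "balanced" is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ModelInfo:
    name: str
    costPerToken: float
    processingTime: float
    qualityScore: float  # hypothetical field, higher is better


def _ranks(models: List[ModelInfo], key: Callable[[ModelInfo], float]) -> Dict[str, int]:
    """Map each model name to its position when sorted by key (0 = best)."""
    return {m.name: i for i, m in enumerate(sorted(models, key=key))}


def sort_models_by_priority(models: List[ModelInfo], priority: str) -> List[ModelInfo]:
    """Order models so the best candidate for the given priority is tried first."""
    if priority == "quality":
        return sorted(models, key=lambda m: -m.qualityScore)
    if priority == "speed":
        return sorted(models, key=lambda m: m.processingTime)
    if priority == "cost":
        return sorted(models, key=lambda m: m.costPerToken)
    # "balanced" (assumed rule): lowest combined rank across quality, speed, and cost.
    by_quality = _ranks(models, lambda m: -m.qualityScore)
    by_speed = _ranks(models, lambda m: m.processingTime)
    by_cost = _ranks(models, lambda m: m.costPerToken)
    return sorted(models, key=lambda m: by_quality[m.name] + by_speed[m.name] + by_cost[m.name])
```

Since the fallback loop walks the returned list in order, the sort decides which model is tried first and which ones only serve as backups.
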
### 5. Size Management with TypeGroup-Aware Chunking

```python
def _reduceTextPrompt(
    self,
    prompt: str,
    context: str,
    model: Model,
    options: AiCallOptions
) -> str:
    """
    Reduce text prompt size using typeGroup-aware chunking and merging.

    Args:
        prompt: Original prompt
        context: Extracted document context
        model: Target model with token limits
        options: AI call options

    Returns:
        Reduced prompt that fits within token limits
    """
    # Character counts stand in for token counts below; this is only a rough proxy.
    max_size = int(model.maxTokens * (1 - options.safetyMargin))

    if options.compressPrompt:
        # Reduce both prompt and context, aiming for 70% of the budget
        current_size = len(prompt) + len(context)
        reduction_factor = (max_size * 0.7) / current_size if current_size else 1.0

        if reduction_factor < 1.0:
            prompt = self._reduceText(prompt, reduction_factor)
            context = self._reduceTextWithTypeGroups(context, reduction_factor, options)
    else:
        # Only reduce context, preserve prompt integrity
        max_context_size = max(max_size - len(prompt), 0)  # never negative
        if len(context) > max_context_size:
            reduction_factor = max_context_size / len(context)
            context = self._reduceTextWithTypeGroups(context, reduction_factor, options)

    return prompt + "\n\n" + context if context else prompt


def _reduceTextWithTypeGroups(
    self,
    context: str,
    reduction_factor: float,
    options: AiCallOptions
) -> str:
    """
    Reduce text using typeGroup-aware chunking and merging strategies.

    Leverages existing chunking/merging modules:
    - text_chunker.py / text_merger.py
    - table_chunker.py / table_merger.py
    - structure_chunker.py / default_merger.py
    """
    # NOTE: `await` is not valid inside a plain `def`. This sketch assumes
    # _summarizeContent and _chunkContent are synchronous. If they are coroutines,
    # declare this method and _reduceTextPrompt as `async def` and await them
    # at every call site.
    if options.compressContext:
        # Summarize content using AI
        return self._summarizeContent(context, reduction_factor)
    else:
        # Chunk content using typeGroup-aware chunkers
        return self._chunkContent(context, reduction_factor, options)
```

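To make the arithmetic in `_reduceTextPrompt` concrete, the sketch below isolates the reduction-factor calculation as a pure function. The function name and the `headroom` parameter are illustrative and not part of the service. Sizes are plain integers in whatever unit the caller uses consistently.

```python
from typing import Tuple


def reduction_factors(
    prompt_len: int,
    context_len: int,
    max_size: int,
    compress_prompt: bool,
    headroom: float = 0.7,
) -> Tuple[float, float]:
    """Return (prompt_factor, context_factor); 1.0 means "keep everything"."""
    if compress_prompt:
        # Prompt and context shrink by the same factor, targeting 70% of the budget.
        current = prompt_len + context_len
        if current == 0:
            return 1.0, 1.0
        factor = min(1.0, (max_size * headroom) / current)
        return factor, factor
    # Prompt is kept whole; the context gets whatever budget remains.
    remaining = max(max_size - prompt_len, 0)
    if context_len <= remaining:
        return 1.0, 1.0
    return 1.0, remaining / context_len
```

With a prompt of 2,000, a context of 10,000, and a budget of 8,000, compressing both gives 5,600 / 12,000, roughly 0.47 for each. Preserving the prompt leaves 6,000 for the context, a factor of 0.6. If the prompt alone exceeds the budget, the context factor drops to 0 and the call still cannot fit, which is the case the error handling section has to cover.
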
## Integration Points

### 1. ExtractionService Integration

- Use `extractionService.extractDocuments()` for all document processing
- Leverage existing 3-pass pipeline (Extract → Chunk → Merge)
- Utilize typeGroup-based processing for different content types

### 2. Existing Chunking/Merging Logic

- **Text Content**: `text_chunker.py` / `text_merger.py`
- **Table Content**: `table_chunker.py` / `table_merger.py`
- **Structured Content**: `structure_chunker.py` / `default_merger.py`

### 3. Model Capability Management

- Maintain model capability registry
- Filter models based on operation requirements
- Support dynamic model availability

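A capability registry along the lines listed above could look like the following sketch. The class and method names (`ModelRegistry`, `register`, `setAvailability`, `findCapable`) and the model names in the usage note are hypothetical. A dataclass stands in for the Pydantic `ModelCapabilities` model so the example has no third-party dependencies.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelEntry:
    name: str
    maxTokens: int
    capabilities: List[str]
    isAvailable: bool = True


class ModelRegistry:
    """In-memory registry of models, keyed by name."""

    def __init__(self) -> None:
        self._models: Dict[str, ModelEntry] = {}

    def register(self, model: ModelEntry) -> None:
        self._models[model.name] = model

    def setAvailability(self, name: str, available: bool) -> None:
        # Lets a health check or rate-limit handler toggle a model at runtime.
        self._models[name].isAvailable = available

    def findCapable(self, required: List[str]) -> List[ModelEntry]:
        """Return available models that offer every required capability."""
        return [
            m for m in self._models.values()
            if m.isAvailable and all(cap in m.capabilities for cap in required)
        ]
```

Registering `model-a` with `["text", "reasoning"]` and `model-b` with `["text"]` means a planning lookup for both capabilities returns only `model-a`. Marking `model-a` unavailable removes it from later lookups without deleting its entry.
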
## Error Handling Strategy

### Model Failure Handling

1. **Individual Model Failure**: Log warning, try next model
2. **All Models Failed**: Raise an error with diagnostic information, including:
   - List of attempted models
   - Failure reasons for each model
   - Suggested alternatives or parameter adjustments

### Size Management Failures

1. **Token Limit Exceeded**: Apply reduction strategies
2. **Reduction Failed**: Fall back to emergency chunking
3. **Critical Content Lost**: Return error with size analysis

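The model failure rules above can be sketched as a small fallback loop that records every failure and raises one aggregated error at the end. `AllModelsFailedError` and `call_with_fallback` are illustrative names, not existing parts of the service, and `call_model` stands in for whatever coroutine performs the real request.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple


class AllModelsFailedError(RuntimeError):
    """Raised when every candidate model failed; keeps per-model reasons."""

    def __init__(self, call_type: str, failures: List[Tuple[str, str]]) -> None:
        self.failures = failures
        details = "; ".join(f"{name}: {reason}" for name, reason in failures)
        super().__init__(f"All {call_type} models failed ({details or 'no models attempted'})")


async def call_with_fallback(
    model_names: List[str],
    call_model: Callable[[str], Awaitable[str]],
    call_type: str = "text",
) -> str:
    """Try each model in order; return the first success or raise with diagnostics."""
    failures: List[Tuple[str, str]] = []
    for name in model_names:
        try:
            return await call_model(name)
        except Exception as exc:  # individual failure: record it and try the next model
            failures.append((name, str(exc)))
    raise AllModelsFailedError(call_type, failures)
```

Callers can catch `AllModelsFailedError` and inspect `.failures` to decide whether retrying with different options, such as a larger `safetyMargin`, is worthwhile.
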
## Configuration and Tuning

### Safety Margins

- **Default**: 10% safety margin (0.1)
- **Configurable**: Per-call basis via `AiCallOptions.safetyMargin`
- **Range**: 0.0 to 0.5 (0% to 50% safety margin)

### Model Selection Priority

1. **Quality**: Best model for accuracy
2. **Balanced**: Good balance of speed and quality
3. **Speed**: Fastest available model
4. **Cost**: Most cost-effective model

### Size Reduction Strategies

- **Prompt Compression**: When `compressPrompt=True`, the prompt may be shortened along with the context
- **Context Summarization**: When `compressContext=True`, oversized context is summarized
- **Context Chunking**: When `compressContext=False`, oversized context is split by the typeGroup-aware chunkers
- **Per-Document Processing**: When `processDocumentsIndividually=True`, each document is processed separately instead of as a group

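As a rough illustration of how the safety margin translates into a usable budget, the sketch below mirrors the `int(maxTokens * (1 - safetyMargin))` expression used elsewhere in this document. The four-characters-per-token estimate is an assumption for demonstration only. A real implementation would count tokens with the target model's tokenizer.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate (assumed ratio); replace with a real tokenizer in practice."""
    return math.ceil(len(text) / chars_per_token)


def token_budget(max_tokens: int, safety_margin: float = 0.1) -> int:
    """Tokens usable for the prompt after reserving the safety margin."""
    if not 0.0 <= safety_margin <= 0.5:
        raise ValueError("safety_margin must be between 0.0 and 0.5")
    return int(max_tokens * (1 - safety_margin))


def exceeds_token_limit(text: str, max_tokens: int, safety_margin: float = 0.1) -> bool:
    """True when the estimated size of text is larger than the budget."""
    return estimate_tokens(text) > token_budget(max_tokens, safety_margin)
```

With the default margin, a model limited to 8,000 tokens leaves a budget of about 7,200 tokens for the prompt. Raising the margin to 0.5 halves the budget.
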
## Migration Strategy

### Phase 1: Enhanced AiCallOptions

- Add new fields to `AiCallOptions` model
- Update existing AI calls to use new options

### Phase 2: Unified Interface

- Implement `callAi()` as unified entry point
- Maintain backward compatibility with existing `callAiText()`

### Phase 3: Model Management

- Implement model capability registry
- Add model selection and fallback logic

### Phase 4: Size Management

- Integrate with ExtractionService
- Implement typeGroup-aware reduction strategies

### Phase 5: Full Migration

- Migrate all AI calls to use unified interface
- Remove legacy AI call methods

## Benefits

1. **Reliability**: Falling back across multiple models reduces the chance that a single model outage fails a call
2. **Efficiency**: Size management catches oversized prompts before they are sent to a model
3. **Flexibility**: TypeGroup-aware processing handles diverse content types
4. **Maintainability**: Centralized logic reduces code duplication
5. **Scalability**: Easy to add new models and capabilities
6. **Integration**: Reuses the existing ExtractionService pipeline rather than duplicating it

## Future Enhancements

1. **Dynamic Model Loading**: Load models on-demand based on requirements
2. **Performance Monitoring**: Track model performance and optimize selection
3. **Cost Optimization**: Balance quality vs. cost based on use case
4. **Caching**: Cache processed content for repeated operations
5. **Streaming**: Support for streaming responses from models