gateway/test_ai_calls.md
2025-09-02 18:58:30 +02:00

# AI Call Functions Test and Content Size Analysis
## Overview
This file documents the ServiceCenter AI functions that risk sending oversized content to the AI model,
together with their call sites and potential size issues.
## High-Risk AI Functions
### 1. summarizeChat() -> callAiTextBasic()
**Location**: gateway/modules/chat/handling/promptFactory.py:122
**Risk Level**: MEDIUM
**Content**: Entire workflow message history
**Usage**:
```python
messageSummary = await service.summarizeChat(context.workflow.messages) if context.workflow else ""
```
**Potential Issues**:
- Long conversations send a very large input to the summarization call
- Includes all previous messages in workflow
- No size limits or truncation
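A minimal sketch of the missing limit, assuming the message history can be rendered as a list of strings (the real `context.workflow.messages` objects are not shown here): keep the newest messages that fit a character budget and pass only those to `summarizeChat()`. The helper name and the 20,000-character default are hypothetical.

```python
from typing import List


def trim_messages_to_budget(messages: List[str], max_chars: int = 20_000) -> List[str]:
    """Return the most recent messages whose combined length fits max_chars.

    Hypothetical helper: messages are plain strings here; real workflow
    message objects would need to be rendered to text first.
    """
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest so the latest context is preserved.
    for message in reversed(messages):
        if used + len(message) > max_chars:
            break
        kept.append(message)
        used += len(message)
    # Restore chronological order for the summarization prompt.
    kept.reverse()
    return kept
```

Dropping the oldest messages first is one design choice; an alternative is to summarize older messages separately and prepend that shorter summary.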
### 2. callAiTextAdvanced() -> interfaceAiCalls.callAiTextAdvanced()
**Risk Level**: HIGH
**Multiple Usage Points**:
#### A. Task Planning (handlingTasks.py:116)
```python
prompt = await self.service.callAiTextAdvanced(task_planning_prompt)
```
**Content**: User input + document context + connection context + previous results
**Risk**: VERY HIGH - includes all available documents and context
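Task planning concatenates user input, document context, connection context, and previous results. One way to bound it, assuming those parts are available as separate strings before `task_planning_prompt` is built: assemble them in priority order under a fixed budget so that lower-priority context is cut first. `assemble_prompt`, the section titles, and the ordering are illustrative assumptions.

```python
from typing import List, Tuple


def assemble_prompt(sections: List[Tuple[str, str]], max_chars: int) -> str:
    """Build a prompt from (title, body) sections listed in priority order.

    Higher-priority sections are included in full first; the first section
    that does not fit is truncated, and lower-priority ones are dropped.
    """
    parts: List[str] = []
    remaining = max_chars
    for title, body in sections:
        header = f"## {title}\n"
        overhead = len(header) + 1  # +1 for the newline that joins sections
        if remaining <= overhead:
            break
        text = body[: remaining - overhead]
        parts.append(header + text)
        remaining -= overhead + len(text)
    return "\n".join(parts)
```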
#### B. Action Definition (handlingTasks.py:388)
```python
prompt = await self.service.callAiTextAdvanced(action_prompt)
```
**Content**: Task context + available documents + connections + previous results
**Risk**: HIGH - comprehensive context for action planning
#### C. Result Review (handlingTasks.py:894)
```python
response = await self.service.callAiTextAdvanced(prompt)
```
**Content**: Action results + success criteria + context
**Risk**: MEDIUM-HIGH - depends on result size
#### D. Email Composition (methodOutlook.py:1609)
```python
composed_email = await self.service.interfaceAiCalls.callAiTextAdvanced(ai_prompt)
```
**Content**: Document content + email requirements
**Risk**: MEDIUM - depends on document size
#### E. AI Processing (methodAi.py:175)
```python
result = await self.service.callAiTextAdvanced(enhanced_prompt, context)
```
**Content**: User prompt + extracted document content
**Risk**: HIGH - includes full document content
### 3. callAiTextBasic() -> interfaceAiCalls.callAiTextBasic()
**Risk Level**: MEDIUM
**Multiple Usage Points**:
#### A. Document Format Conversion (methodDocument.py:429)
```python
formatted_content = await self.service.callAiTextBasic(ai_prompt, content)
```
**Content**: Document content + format requirements
**Risk**: MEDIUM - depends on document size
#### B. HTML Report Generation (methodDocument.py:642)
```python
aiReport = await self.service.callAiTextBasic(aiPrompt, combinedContent)
```
**Content**: Combined content from multiple documents
**Risk**: HIGH - combines multiple documents
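Because this call combines several documents, one large file can crowd out the others. A sketch, assuming each document's extracted text is available as a string before `combinedContent` is built: split a fixed character budget max-min fairly, so short documents are kept whole and the longer ones share what remains. `allocate_budget` and `combine_documents` are hypothetical names, and the separator is arbitrary.

```python
from typing import List


def allocate_budget(lengths: List[int], total: int) -> List[int]:
    """Split a total character budget across documents (max-min fair share).

    Documents shorter than their fair share keep their full length; the
    unused budget is redistributed among the longer documents.
    """
    allocation = [0] * len(lengths)
    remaining = total
    # Process documents from shortest to longest.
    pending = sorted(range(len(lengths)), key=lambda i: lengths[i])
    while pending:
        share = remaining // len(pending)
        index = pending.pop(0)
        granted = min(lengths[index], share)
        allocation[index] = granted
        remaining -= granted
    return allocation


def combine_documents(docs: List[str], total: int) -> str:
    """Truncate each document to its allocation and join the results."""
    limits = allocate_budget([len(d) for d in docs], total)
    parts = [doc[:limit] for doc, limit in zip(docs, limits)]
    return "\n\n---\n\n".join(parts)
```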
#### C. AI Processing Fallback (methodAi.py:177)
```python
result = await self.service.callAiTextBasic(enhanced_prompt, context)
```
**Content**: User prompt + document context
**Risk**: MEDIUM - includes document content
#### D. Document Content Processing (documentExtraction.py:1459)
```python
processedContent = await self._serviceCenter.callAiTextBasic(aiPrompt, contentToProcess)
```
**Content**: Document chunks + AI prompt
**Risk**: MEDIUM - processes document chunks
### 4. extractContentFromDocument() -> documentProcessor.processFileData()
**Risk Level**: HIGH
**Multiple Usage Points**:
#### A. Document Content Extraction (methodDocument.py:74)
```python
extracted_content = await self.service.extractContentFromDocument(
    prompt=aiPrompt,
    document=chatDocument
)
```
**Content**: Full document + extraction prompt
**Risk**: HIGH - processes entire documents
#### B. HTML Report Generation (methodDocument.py:581)
```python
extracted_content = await self.service.extractContentFromDocument(
    prompt="Extract readable text content for HTML report generation",
    document=doc
)
```
**Content**: Full document content
**Risk**: HIGH - processes documents for reports
#### C. Email Composition (methodOutlook.py:1510)
```python
extracted_content = await self.service.extractContentFromDocument(
    prompt="Extract readable text content for email composition",
    document=doc
)
```
**Content**: Full document content
**Risk**: HIGH - processes documents for emails
#### D. AI Processing (methodAi.py:94)
```python
extracted_content = await self.service.extractContentFromDocument(
    prompt=extraction_prompt.strip(),
    document=doc
)
```
**Content**: Full document content
**Risk**: HIGH - processes documents for AI analysis
## Risk Assessment Summary
### CRITICAL RISK (Immediate Attention Required)
1. **Task Planning** (handlingTasks.py:116) - Entire workflow context
2. **Action Definition** (handlingTasks.py:388) - Comprehensive context
3. **Document Processing** (all extractContentFromDocument calls) - Full documents
4. **AI Method Processing** (methodAi.py:175) - Document content + context
5. **Report Generation** (methodDocument.py:642) - Multiple documents combined
### HIGH RISK (Monitor Closely)
1. **Chat Summarization** (promptFactory.py:122) - Message history
2. **Document Format Conversion** (methodDocument.py:429) - Single documents
3. **Email Composition** (methodOutlook.py:1609) - Document content
## Potential Issues
### Content Size Problems
- Large documents (PDFs, Word documents, Excel files) can exceed the model's context-length limit
- Combined document content in reports can be massive
- Long conversation histories in chat summarization
- Full workflow context in task planning
### Performance Issues
- Timeout errors for large content
- Memory issues with large document processing
- API rate limiting with large requests
- Cost implications for large AI calls
### Error Scenarios
- OpenAI API 400 errors (content too large)
- Timeout errors (processing too slow)
- Memory exhaustion (large document processing)
- Incomplete processing (truncated content)
## Recommended Solutions
### 1. Content Size Limits
- Implement maximum content size checks before AI calls
- Truncate large content with appropriate warnings
- Split large documents into chunks
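A minimal pre-call guard, assuming a character-based estimate is acceptable (roughly four characters per token is a common rule of thumb for English text; a real tokenizer for the target model would be more accurate). `estimate_tokens`, `enforce_token_limit`, and the notice text are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)

# Rough heuristic (assumption): about 4 characters per token for English text.
CHARS_PER_TOKEN = 4
TRUNCATION_NOTICE = "\n\n[... content truncated due to size limit ...]"


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character length (rounded up)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def enforce_token_limit(text: str, max_tokens: int) -> str:
    """Return text unchanged if it fits, else truncate it and append a notice."""
    if estimate_tokens(text) <= max_tokens:
        return text
    # Leave room for the notice so the result still fits the limit.
    max_chars = max_tokens * CHARS_PER_TOKEN - len(TRUNCATION_NOTICE)
    logger.warning(
        "AI input truncated: ~%d tokens exceeds limit of %d",
        estimate_tokens(text), max_tokens,
    )
    return text[:max(max_chars, 0)] + TRUNCATION_NOTICE
```

The guard would be applied to the prompt and context just before `callAiTextBasic()` or `callAiTextAdvanced()`, with `max_tokens` set below the model's context window to leave room for the response.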
### 2. Content Filtering
- Remove unnecessary context from prompts
- Filter out large binary content
- Use document summaries instead of full content
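For the "filter out large binary content" point, a regex-based sketch, assuming extracted text may contain embedded base64 payloads such as data URIs from HTML or inline images. The 200-character threshold and the placeholder text are arbitrary choices, and long unbroken alphanumeric strings that are not binary would also be removed.

```python
import re

# Heuristic (assumption): runs of 200+ base64 characters are treated as
# embedded binary data rather than readable text.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/=]{200,}")
DATA_URI = re.compile(r"data:[\w/+.-]+;base64,[A-Za-z0-9+/=]+")
PLACEHOLDER = "[binary data removed]"


def strip_binary_content(text: str) -> str:
    """Replace data URIs and long base64-like runs with a short placeholder."""
    # Data URIs first, so their media-type prefix is removed along with the payload.
    text = DATA_URI.sub(PLACEHOLDER, text)
    return BASE64_RUN.sub(PLACEHOLDER, text)
```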
### 3. Chunking Strategy
- Process large documents in smaller chunks
- Implement progressive processing
- Use streaming for large responses
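A sketch of chunked, progressive (map-reduce) processing, assuming the AI call can be wrapped as an async function taking a prompt and a content string, as `callAiTextBasic(aiPrompt, contentToProcess)` does above. `split_into_chunks`, `process_in_chunks`, the combine prompt, and the concurrency cap of 3 are hypothetical choices.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical signature for an AI call: (prompt, content) -> response text.
AiCall = Callable[[str, str], Awaitable[str]]


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Hard-split any paragraph that is longer than a whole chunk.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


async def process_in_chunks(
    ai_call: AiCall, prompt: str, text: str, max_chars: int, max_concurrency: int = 3
) -> str:
    """Map: run the prompt on every chunk. Reduce: merge the partial results."""
    chunks = split_into_chunks(text, max_chars)
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(chunk: str) -> str:
        # Cap parallel requests to reduce the risk of hitting API rate limits.
        async with semaphore:
            return await ai_call(prompt, chunk)

    # gather() returns results in input order, so partials keep document order.
    partials = await asyncio.gather(*(run(chunk) for chunk in chunks))
    if len(partials) <= 1:
        return "".join(partials)
    merged = "\n\n".join(partials)
    return await ai_call(f"Combine these partial results. {prompt}", merged)
```

The reduce step can itself exceed the limit when there are many chunks; applying `process_in_chunks` again to the merged partials would make the reduction recursive.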
### 4. Caching and Optimization
- Cache processed document content
- Reuse extracted content across operations
- Select only the content relevant to the current operation instead of sending everything
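A sketch of caching extracted content, assuming extraction can be expressed as an async function of the raw document bytes and the prompt (the real `extractContentFromDocument()` takes a document object; keying on its file ID or content hash instead is left open). `ExtractionCache` is hypothetical, and the cache is in-memory only, so it is unbounded and lost on restart.

```python
import asyncio
import hashlib
from typing import Awaitable, Callable, Dict, Tuple

# Hypothetical extractor signature: (document bytes, prompt) -> extracted text.
Extractor = Callable[[bytes, str], Awaitable[str]]


class ExtractionCache:
    """In-memory cache of extracted text, keyed by document hash and prompt."""

    def __init__(self, extractor: Extractor) -> None:
        self._extractor = extractor
        self._cache: Dict[Tuple[str, str], str] = {}
        self._locks: Dict[Tuple[str, str], asyncio.Lock] = {}

    async def extract(self, data: bytes, prompt: str) -> str:
        key = (hashlib.sha256(data).hexdigest(), prompt)
        # One lock per key: concurrent requests for the same document and
        # prompt wait for a single extraction instead of repeating it.
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:
            if key not in self._cache:
                self._cache[key] = await self._extractor(data, prompt)
        return self._cache[key]
```

Because the key includes the prompt, the HTML report and email call sites (which pass different extraction prompts) would not share entries; reusing content across operations would require a common extraction prompt.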
### 5. Error Handling
- Graceful degradation for oversized content
- Fallback strategies for failed AI calls
- User notifications for content size issues
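A sketch of graceful degradation, assuming the backend's "content too large" failure (for example an HTTP 400 from the OpenAI API) is surfaced as a dedicated exception; `ContentTooLargeError` and `call_with_fallback` are hypothetical. Each call is retried with a progressively shorter context before the next call in the list is tried, similar to the `callAiTextAdvanced()` to `callAiTextBasic()` fallback already present in methodAi.py.

```python
import logging
from typing import Awaitable, Callable, Sequence

logger = logging.getLogger(__name__)

# Hypothetical AI call signature: (prompt, context) -> response text.
AiCall = Callable[[str, str], Awaitable[str]]


class ContentTooLargeError(Exception):
    """Hypothetical error for a backend rejecting oversized input."""


async def call_with_fallback(
    calls: Sequence[AiCall],
    prompt: str,
    context: str,
    shrink_factor: float = 0.5,
    min_chars: int = 1_000,
) -> str:
    """Try each call in order, shrinking the context whenever it is too large."""
    last_error: Exception = RuntimeError("no AI calls configured")
    for call in calls:
        current = context
        while True:
            try:
                return await call(prompt, current)
            except ContentTooLargeError as exc:
                last_error = exc
                new_len = max(int(len(current) * shrink_factor), min_chars)
                if new_len >= len(current):
                    break  # cannot shrink further; try the next call
                logger.warning(
                    "Context too large (%d chars), retrying with %d",
                    len(current), new_len,
                )
                current = current[:new_len]
            except Exception as exc:
                # Any other failure: log it and fall through to the next call.
                last_error = exc
                logger.warning("AI call failed: %s", exc)
                break
    # Every option failed; the caller can turn this into a user notification.
    raise last_error
```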
## Test Scenarios
### Test Case 1: Large Document Processing
- Upload a 10 MB PDF document
- Try to extract content for AI processing
- Monitor for size limit errors
### Test Case 2: Multiple Document Reports
- Upload 5+ large documents
- Generate HTML report
- Check for combined content size issues
### Test Case 3: Long Conversation History
- Create workflow with 50+ messages
- Test chat summarization
- Monitor for context size limits
### Test Case 4: Task Planning with Large Context
- Create workflow with many documents
- Test task planning functionality
- Check for prompt size limits
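For these scenarios, a test double can stand in for the real AI service and record how large each request is. A minimal sketch, assuming the service object can be swapped out in tests and that both methods accept a prompt plus an optional context string (as the call sites above suggest); `RecordingAiStub` and `PromptTooLargeError` are hypothetical, and the limit is an arbitrary test value, not a real model limit.

```python
from typing import List


class PromptTooLargeError(Exception):
    """Raised by the stub to mimic an API 400 for oversized input."""


class RecordingAiStub:
    """Hypothetical test double for the service's AI calls.

    Records the size of every request and rejects requests above max_chars,
    so tests can assert on content sizes without calling a real model.
    """

    def __init__(self, max_chars: int) -> None:
        self.max_chars = max_chars
        self.request_sizes: List[int] = []

    async def _record(self, prompt: str, context: str) -> str:
        size = len(prompt) + len(context)
        self.request_sizes.append(size)
        if size > self.max_chars:
            raise PromptTooLargeError(f"{size} chars exceeds {self.max_chars}")
        return "ok"

    async def callAiTextBasic(self, prompt: str, context: str = "") -> str:
        return await self._record(prompt, context)

    async def callAiTextAdvanced(self, prompt: str, context: str = "") -> str:
        return await self._record(prompt, context)
```

A Test Case 3 check could then fill a workflow with 50+ messages, run the summarization path against the stub, and assert that `max(stub.request_sizes)` stays under the chosen limit once truncation is in place.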
## Monitoring Recommendations
1. **Log Content Sizes**: Track the size of content sent to AI functions
2. **Monitor API Errors**: Watch for 400 errors indicating content too large
3. **Performance Metrics**: Track processing times for large content
4. **User Feedback**: Monitor for incomplete or failed operations
5. **Cost Tracking**: Monitor AI API costs for large requests
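Items 1 to 3 could be covered by a thin wrapper around each AI entry point. A sketch using only the standard library: the decorator logs the total character count of string arguments, the call duration, and any exception. `log_ai_call` and the logger name are hypothetical; character counts are only a proxy, and cost tracking (item 5) would additionally need token usage figures, if the provider's responses report them.

```python
import functools
import logging
import time
from typing import Any, Awaitable, Callable, TypeVar

logger = logging.getLogger("ai_calls")

F = TypeVar("F", bound=Callable[..., Awaitable[Any]])


def log_ai_call(func: F) -> F:
    """Log input size, duration, and failures of an async AI call."""

    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Count only string arguments (prompt, context); skips self and others.
        size = sum(len(a) for a in (*args, *kwargs.values()) if isinstance(a, str))
        start = time.perf_counter()
        try:
            result = await func(*args, **kwargs)
        except Exception:
            logger.exception("%s failed: input_chars=%d", func.__name__, size)
            raise
        elapsed = time.perf_counter() - start
        logger.info("%s: input_chars=%d duration=%.2fs", func.__name__, size, elapsed)
        return result

    return wrapper  # type: ignore[return-value]
```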
## Implementation Priority
1. **Immediate**: Add content size checks to extractContentFromDocument
2. **High**: Implement chunking for large document processing
3. **Medium**: Add content filtering to task planning prompts
4. **Low**: Implement caching for processed content
This analysis is intended to help identify and reduce the risk of sending oversized content to the AI functions.