# AI Call Functions Test and Content Size Analysis

## Overview

This file documents the ServiceCenter AI functions that risk sending oversized content to the AI model, along with their usage patterns and potential size issues.

## High-Risk AI Functions

### 1. summarizeChat() -> callAiTextBasic()
**Location**: gateway/modules/chat/handling/promptFactory.py:122
**Risk Level**: MEDIUM
**Content**: Entire workflow message history
**Usage**:
```python
messageSummary = await service.summarizeChat(context.workflow.messages) if context.workflow else ""
```
**Potential Issues**:
- Long conversations can generate very large summaries
- Includes all previous messages in workflow
- No size limits or truncation

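One way to bound the input to `summarizeChat()` is to keep only the newest messages that fit a character budget. The sketch below is illustrative, not the existing implementation: `selectRecentMessages`, the `maxChars` default, and the assumption that each message is a plain string are all hypothetical.

```python
from typing import List


def selectRecentMessages(messages: List[str], maxChars: int = 20_000) -> List[str]:
    """Return the newest messages whose combined length fits within maxChars.

    Hypothetical helper: assumes messages are plain strings ordered oldest to newest.
    """
    selected: List[str] = []
    used = 0
    # Walk from newest to oldest so the most recent context is kept.
    for message in reversed(messages):
        if used + len(message) > maxChars:
            break
        selected.append(message)
        used += len(message)
    # Restore chronological order for the summarizer.
    selected.reverse()
    return selected
```

The result could then be passed to `summarizeChat()` in place of the full history. A character budget is only a rough proxy for model token limits.
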
### 2. callAiTextAdvanced() -> interfaceAiCalls.callAiTextAdvanced()
**Risk Level**: HIGH
**Multiple Usage Points**:

#### A. Task Planning (handlingTasks.py:116)
```python
prompt = await self.service.callAiTextAdvanced(task_planning_prompt)
```
**Content**: User input + document context + connection context + previous results
**Risk**: VERY HIGH - includes all available documents and context

#### B. Action Definition (handlingTasks.py:388)
```python
prompt = await self.service.callAiTextAdvanced(action_prompt)
```
**Content**: Task context + available documents + connections + previous results
**Risk**: HIGH - comprehensive context for action planning

#### C. Result Review (handlingTasks.py:894)
```python
response = await self.service.callAiTextAdvanced(prompt)
```
**Content**: Action results + success criteria + context
**Risk**: MEDIUM-HIGH - depends on result size

#### D. Email Composition (methodOutlook.py:1609)
```python
composed_email = await self.service.interfaceAiCalls.callAiTextAdvanced(ai_prompt)
```
**Content**: Document content + email requirements
**Risk**: MEDIUM - depends on document size

#### E. AI Processing (methodAi.py:175)
```python
result = await self.service.callAiTextAdvanced(enhanced_prompt, context)
```
**Content**: User prompt + extracted document content
**Risk**: HIGH - includes full document content

### 3. callAiTextBasic() -> interfaceAiCalls.callAiTextBasic()
**Risk Level**: MEDIUM
**Multiple Usage Points**:

#### A. Document Format Conversion (methodDocument.py:429)
```python
formatted_content = await self.service.callAiTextBasic(ai_prompt, content)
```
**Content**: Document content + format requirements
**Risk**: MEDIUM - depends on document size

#### B. HTML Report Generation (methodDocument.py:642)
```python
aiReport = await self.service.callAiTextBasic(aiPrompt, combinedContent)
```
**Content**: Combined content from multiple documents
**Risk**: HIGH - combines multiple documents

#### C. AI Processing Fallback (methodAi.py:177)
```python
result = await self.service.callAiTextBasic(enhanced_prompt, context)
```
**Content**: User prompt + document context
**Risk**: MEDIUM - includes document content

#### D. Document Content Processing (documentExtraction.py:1459)
```python
processedContent = await self._serviceCenter.callAiTextBasic(aiPrompt, contentToProcess)
```
**Content**: Document chunks + AI prompt
**Risk**: MEDIUM - processes document chunks

### 4. extractContentFromDocument() -> documentProcessor.processFileData()
**Risk Level**: HIGH
**Multiple Usage Points**:

#### A. Document Content Extraction (methodDocument.py:74)
```python
extracted_content = await self.service.extractContentFromDocument(
    prompt=aiPrompt,
    document=chatDocument
)
```
**Content**: Full document + extraction prompt
**Risk**: HIGH - processes entire documents

#### B. HTML Report Generation (methodDocument.py:581)
```python
extracted_content = await self.service.extractContentFromDocument(
    prompt="Extract readable text content for HTML report generation",
    document=doc
)
```
**Content**: Full document content
**Risk**: HIGH - processes documents for reports

#### C. Email Composition (methodOutlook.py:1510)
```python
extracted_content = await self.service.extractContentFromDocument(
    prompt="Extract readable text content for email composition",
    document=doc
)
```
**Content**: Full document content
**Risk**: HIGH - processes documents for emails

#### D. AI Processing (methodAi.py:94)
```python
extracted_content = await self.service.extractContentFromDocument(
    prompt=extraction_prompt.strip(),
    document=doc
)
```
**Content**: Full document content
**Risk**: HIGH - processes documents for AI analysis

## Risk Assessment Summary

### CRITICAL RISK (Immediate Attention Required)
1. **Task Planning** (handlingTasks.py:116) - Entire workflow context
2. **Action Definition** (handlingTasks.py:388) - Comprehensive context
3. **Document Processing** (all extractContentFromDocument calls) - Full documents
4. **AI Method Processing** (methodAi.py:175) - Document content + context
5. **Report Generation** (methodDocument.py:642) - Multiple documents combined

### HIGH RISK (Monitor Closely)
1. **Chat Summarization** (promptFactory.py:122) - Message history
2. **Document Format Conversion** (methodDocument.py:429) - Single documents
3. **Email Composition** (methodOutlook.py:1609) - Document content

## Potential Issues

### Content Size Problems
- Large documents (PDFs, Word docs, Excel files) can exceed AI model limits
- Combined document content in reports can be massive
- Long conversation histories in chat summarization
- Full workflow context in task planning

### Performance Issues
- Timeout errors for large content
- Memory issues with large document processing
- API rate limiting with large requests
- Cost implications for large AI calls

### Error Scenarios
- OpenAI API 400 errors (content too large)
- Timeout errors (processing too slow)
- Memory exhaustion (large document processing)
- Incomplete processing (truncated content)

## Recommended Solutions

### 1. Content Size Limits
- Implement maximum content size checks before AI calls
- Truncate large content with appropriate warnings
- Split large documents into chunks

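A size check with truncation might look like the sketch below. It is a minimal illustration only: `enforceSizeLimit`, `MAX_CONTENT_CHARS`, and the notice text are hypothetical names, and the 100,000-character budget is an assumed placeholder that would need to match the actual model limit.

```python
import logging
from typing import Tuple

logger = logging.getLogger(__name__)

# Assumed budget; the real limit depends on the model in use.
MAX_CONTENT_CHARS = 100_000
TRUNCATION_NOTICE = "\n[Content truncated]"


def enforceSizeLimit(content: str, maxChars: int = MAX_CONTENT_CHARS) -> Tuple[str, bool]:
    """Return (content, wasTruncated), cutting content down to maxChars if needed."""
    if len(content) <= maxChars:
        return content, False
    logger.warning("Content of %d chars exceeds limit of %d; truncating", len(content), maxChars)
    # Reserve room for the notice, then clamp so the result never exceeds maxChars.
    keep = max(0, maxChars - len(TRUNCATION_NOTICE))
    return (content[:keep] + TRUNCATION_NOTICE)[:maxChars], True
```

The returned flag lets the caller surface a warning to the user instead of truncating silently.
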
### 2. Content Filtering
- Remove unnecessary context from prompts
- Filter out large binary content
- Use document summaries instead of full content

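The filtering ideas above could be sketched as follows. `ContextItem`, `TEXT_MIME_PREFIXES`, `filterContext`, and the `maxItemChars` default are hypothetical and do not reflect the real document model; the sketch assumes each context entry carries a MIME type and text content.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContextItem:
    """Hypothetical stand-in for a document or context entry attached to a prompt."""
    name: str
    mimeType: str
    content: str


# Assumed allow-list of text-like MIME type prefixes.
TEXT_MIME_PREFIXES = ("text/", "application/json")


def filterContext(items: List[ContextItem], maxItemChars: int = 10_000) -> List[ContextItem]:
    """Keep text-like items only and replace oversized ones with a short placeholder."""
    filtered: List[ContextItem] = []
    for item in items:
        if not item.mimeType.startswith(TEXT_MIME_PREFIXES):
            # Binary payloads (images, archives, ...) add size without helping the prompt.
            continue
        if len(item.content) > maxItemChars:
            placeholder = f"[{item.name}: {len(item.content)} chars omitted; use a summary instead]"
            filtered.append(ContextItem(item.name, item.mimeType, placeholder))
        else:
            filtered.append(item)
    return filtered
```

In practice the placeholder would be replaced by a real document summary; the sketch only marks where one belongs.
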
### 3. Chunking Strategy
- Process large documents in smaller chunks
- Implement progressive processing
- Use streaming for large responses

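A simple character-based splitter illustrates the chunking idea. This is a sketch under assumptions: `splitIntoChunks` and its defaults are hypothetical, and a production version would probably split on token counts or paragraph boundaries instead of raw characters.

```python
from typing import List


def splitIntoChunks(text: str, maxChars: int = 8_000, overlap: int = 200) -> List[str]:
    """Split text into chunks of at most maxChars; consecutive chunks share `overlap` chars."""
    if maxChars <= 0:
        raise ValueError("maxChars must be positive")
    if not 0 <= overlap < maxChars:
        raise ValueError("overlap must be >= 0 and smaller than maxChars")
    chunks: List[str] = []
    step = maxChars - overlap
    start = 0
    while start < len(text):
        chunks.append(text[start:start + maxChars])
        if start + maxChars >= len(text):
            # The current chunk already reaches the end of the text.
            break
        start += step
    return chunks
```

Each chunk could then be sent to `callAiTextBasic()` in turn and the partial results combined, which is one form of progressive processing.
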
### 4. Caching and Optimization
- Cache processed document content
- Reuse extracted content across operations
- Implement smart content selection

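Caching extracted content could be approached with an in-memory cache keyed by a hash of the prompt and document bytes, as sketched below. `ExtractionCache` and the `extract` callback signature are hypothetical; the real `extractContentFromDocument()` takes a document object, so an adapter would be needed.

```python
import hashlib
from typing import Awaitable, Callable, Dict


class ExtractionCache:
    """In-memory cache keyed by a hash of the prompt and document bytes (hypothetical helper)."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    @staticmethod
    def makeKey(prompt: str, documentBytes: bytes) -> str:
        digest = hashlib.sha256()
        digest.update(prompt.encode("utf-8"))
        digest.update(b"\x00")  # separator so prompt/document boundaries stay distinct
        digest.update(documentBytes)
        return digest.hexdigest()

    async def getOrExtract(
        self,
        prompt: str,
        documentBytes: bytes,
        extract: Callable[[str, bytes], Awaitable[str]],
    ) -> str:
        """Return the cached result, calling `extract` only on a cache miss."""
        key = self.makeKey(prompt, documentBytes)
        if key not in self._entries:
            self._entries[key] = await extract(prompt, documentBytes)
        return self._entries[key]
```

This would let report generation and email composition reuse one extraction of the same document when the prompt is identical. The cache is unbounded, so a real version would need an eviction policy.
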
### 5. Error Handling
- Graceful degradation for oversized content
- Fallback strategies for failed AI calls
- User notifications for content size issues

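Graceful degradation might be sketched as a wrapper that retries with less content when the backend rejects a request as too large. `ContentTooLargeError` and `callWithFallback` are hypothetical; the sketch assumes the AI interface can translate an oversized-content response into such an exception.

```python
from typing import Awaitable, Callable


class ContentTooLargeError(Exception):
    """Hypothetical error raised when the AI backend rejects oversized content."""


async def callWithFallback(
    call: Callable[[str], Awaitable[str]],
    content: str,
    minChars: int = 1_000,
) -> str:
    """Retry the call with half the content each time the backend reports it as too large."""
    current = content
    while True:
        try:
            return await call(current)
        except ContentTooLargeError:
            if len(current) <= minChars:
                # Give up once shrinking further would leave too little to work with.
                raise
            current = current[: max(minChars, len(current) // 2)]
```

If the error still propagates, the caller can catch it and notify the user that the content was too large to process.
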
## Test Scenarios

### Test Case 1: Large Document Processing
- Upload a 10MB PDF document
- Try to extract content for AI processing
- Monitor for size limit errors

### Test Case 2: Multiple Document Reports
- Upload 5+ large documents
- Generate HTML report
- Check for combined content size issues

### Test Case 3: Long Conversation History
- Create workflow with 50+ messages
- Test chat summarization
- Monitor for context size limits

### Test Case 4: Task Planning with Large Context
- Create workflow with many documents
- Test task planning functionality
- Check for prompt size limits

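For Test Case 3, synthetic input can be generated instead of typed by hand. The helpers below are hypothetical test fixtures and assume messages can be represented as plain strings.

```python
from typing import List


def buildSyntheticMessages(count: int = 50, charsPerMessage: int = 2_000) -> List[str]:
    """Create `count` filler messages, each with a short label plus charsPerMessage of padding."""
    return [f"Message {index}: " + "x" * charsPerMessage for index in range(count)]


def totalChars(messages: List[str]) -> int:
    """Combined character count of all messages, as a rough payload size."""
    return sum(len(message) for message in messages)
```

Comparing `totalChars()` against the assumed model limit before running the scenario shows whether the test actually exercises the oversized path.
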
## Monitoring Recommendations

1. **Log Content Sizes**: Track the size of content sent to AI functions
2. **Monitor API Errors**: Watch for 400 errors indicating content too large
3. **Performance Metrics**: Track processing times for large content
4. **User Feedback**: Monitor for incomplete or failed operations
5. **Cost Tracking**: Monitor AI API costs for large requests

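Size and timing logging could be added with a decorator around the async AI call functions. This is a sketch: `logContentSize` and the `aiCalls` logger name are hypothetical, and only string arguments are counted.

```python
import functools
import logging
import time
from typing import Any, Awaitable, Callable

logger = logging.getLogger("aiCalls")


def logContentSize(func: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
    """Log the combined size of string arguments and the call duration (hypothetical decorator)."""

    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        values = list(args) + list(kwargs.values())
        sizeChars = sum(len(value) for value in values if isinstance(value, str))
        started = time.perf_counter()
        try:
            return await func(*args, **kwargs)
        finally:
            # Runs on success and on failure, so failed oversized calls are logged too.
            elapsed = time.perf_counter() - started
            logger.info("%s: %d chars sent, %.2fs elapsed", func.__name__, sizeChars, elapsed)

    return wrapper
```

Applied to wrappers such as `callAiTextBasic()` and `callAiTextAdvanced()`, this would cover the first and third monitoring items with a single change.
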
## Implementation Priority

1. **Immediate**: Add content size checks to extractContentFromDocument
2. **High**: Implement chunking for large document processing
3. **Medium**: Add content filtering to task planning prompts
4. **Low**: Implement caching for processed content

This analysis should help identify and mitigate the risk of sending oversized content to AI functions.