system test

This commit is contained in:
ValueOn AG 2025-07-10 16:13:05 +02:00
parent aa854f27b7
commit 86fe43e987
21 changed files with 2247 additions and 590 deletions

README_document_test.md Normal file
View file

@@ -0,0 +1,114 @@
# Document Extraction Test
This test procedure validates the DocumentManager's ability to extract content from files using AI-powered analysis.
## Files Created
- `test_document_extraction.py` - Main test script
- `test_sample_document.txt` - Sample document for testing
- `run_document_test.ps1` - PowerShell wrapper script
- `test_document_extraction.log` - Generated log file (cleared on each run)
## Usage
### Method 1: Using PowerShell Script (Recommended)
```powershell
# Test with default sample file
.\run_document_test.ps1
# Test with custom file
.\run_document_test.ps1 "path\to\your\document.pdf"
```
### Method 2: Direct Python Execution
```bash
# Test with default sample file
python test_document_extraction.py test_sample_document.txt
# Test with custom file
python test_document_extraction.py "path/to/your/document.docx"
```
## Test Features
1. **File Validation**: Checks if the specified file exists
2. **MIME Type Detection**: Automatically detects file type based on extension
3. **Content Extraction**: Uses the DocumentManager to extract content
4. **AI Processing**: Applies the prompt "summarize the content and give list of the major topics"
5. **Comprehensive Logging**: Logs all steps and results to `test_document_extraction.log`
6. **Log Cleanup**: Clears the log file on each test run
## Supported File Types
- Text files (.txt, .md)
- CSV files (.csv)
- JSON files (.json)
- XML files (.xml)
- HTML files (.html, .htm)
- Images (.jpg, .jpeg, .png, .gif, .svg)
- PDF files (.pdf)
- Office documents (.docx, .xlsx, .pptx)
- And more (fallback to binary processing)
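The extension-based detection described above can be sketched with the standard library alone. This is a hedged illustration, not the test's actual helper (the function name is mine); unknown extensions fall back to generic binary processing:

```python
import mimetypes

# Hypothetical helper mirroring the MIME detection step described above;
# extensions that mimetypes does not recognize fall back to binary handling.
def detect_mime_type(filename: str) -> str:
    mime, _ = mimetypes.guess_type(filename)
    return mime or "application/octet-stream"
```

For example, `detect_mime_type("report.pdf")` yields `"application/pdf"`, while an unrecognized extension yields the binary fallback.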
## Test Output
The test generates detailed logs including:
- File information (path, size, MIME type)
- Extraction process details
- Extracted content summary
- AI-processed results
- Error details if any issues occur
## Example Output
```
=== STARTING DOCUMENT EXTRACTION TEST ===
File information: {
"file_path": "test_sample_document.txt",
"filename": "test_sample_document.txt",
"mime_type": "text/plain",
"file_size_bytes": 2048,
"file_size_mb": 0.0
}
Document extraction completed successfully: {
"extracted_content_id": "test-doc-1234567890",
"content_items_count": 1,
"object_type": "ExtractedContent"
}
COMPLETE EXTRACTED CONTENT: {
"total_length": 1500,
"content": "PowerOn System Architecture Overview... [AI processed summary]"
}
```
## Error Handling
The test includes comprehensive error handling for:
- File not found errors
- File reading errors
- Document processing errors
- AI processing errors
- Import errors
All errors are logged with detailed information for debugging.
## Configuration
The test uses the same configuration as other tests:
- Environment variable: `POWERON_CONFIG_FILE = 'test_config.ini'`
- Log file: `test_document_extraction.log`
- Log level: DEBUG
## Dependencies
The test requires the same dependencies as the main PowerOn system:
- Python 3.8+
- Required Python packages (see requirements.txt)
- Access to AI services (if AI processing is enabled)
- Proper configuration in test_config.ini

View file

@@ -98,7 +98,27 @@ class AiOpenai:
             The response from the OpenAI Vision API as text
         """
         try:
-            logger.debug(f"Starting image analysis for {mimeType} with query '{prompt}' for {mimeType} size {len(imageData)}B...")
+            logger.debug(f"Starting image analysis with query '{prompt}' for size {len(imageData)}B...")
+            # Ensure imageData is a string (base64 encoded)
+            if not isinstance(imageData, str):
+                raise ValueError("imageData must be a string (base64 encoded)")
+            # Fix base64 padding if needed
+            padding_needed = len(imageData) % 4
+            if padding_needed:
+                imageData += '=' * (4 - padding_needed)
+            # Use default MIME type if not provided
+            if not mimeType:
+                mimeType = "image/jpeg"
+            logger.debug(f"Using MIME type: {mimeType}")
+            logger.debug(f"Base64 data length: {len(imageData)} characters")
+            # Create the data URL format as required by OpenAI Vision API
+            data_url = f"data:{mimeType};base64,{imageData}"
             messages = [
                 {
                     "role": "user",
@@ -107,15 +127,40 @@ class AiOpenai:
                         {
                             "type": "image_url",
                             "image_url": {
-                                "url": f"data:{mimeType};base64,{imageData}"
+                                "url": data_url
                             }
                         }
                     ]
                 }
             ]
-            # Use the existing callApi function with the Vision model
-            response = await self.callApi(messages)
+            # Use a vision-capable model for image analysis
+            # Override the model for vision tasks
+            visionModel = "gpt-4o"  # or "gpt-4-vision-preview" depending on availability
+            # Use parameters from configuration
+            temperature = self.config.get("temperature", 0.2)
+            maxTokens = self.config.get("maxTokens", 2000)
+            payload = {
+                "model": visionModel,
+                "messages": messages,
+                "temperature": temperature,
+                "max_tokens": maxTokens
+            }
+            response = await self.httpClient.post(
+                self.apiUrl,
+                json=payload
+            )
+            if response.status_code != 200:
+                logger.error(f"OpenAI API error: {response.status_code} - {response.text}")
+                raise HTTPException(status_code=500, detail="Error communicating with OpenAI API")
+            responseJson = response.json()
+            content = responseJson["choices"][0]["message"]["content"]
+            return content
             # Return content
             return response
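The padding repair applied in this hunk can be isolated as a small helper. A sketch (the function names are mine, not the project's): valid base64 has a length divisible by 4, so missing `=` padding is restored before the data URL is built.

```python
import base64

def fix_base64_padding(data: str) -> str:
    # Valid base64 has a length divisible by 4; restore missing '=' padding.
    missing = len(data) % 4
    return data + "=" * (4 - missing) if missing else data

def to_data_url(b64_data: str, mime_type: str = "image/jpeg") -> str:
    # Same data-URL shape the Vision API call above constructs.
    return f"data:{mime_type};base64,{fix_base64_padding(b64_data)}"
```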

View file

@@ -173,13 +173,31 @@ class DatabaseConnector:
             record["_modifiedAt"] = currentTime.isoformat()
             record["_modifiedBy"] = self.userId
-            # Save the record file
+            # Save the record file using atomic write
             recordPath = self._getRecordPath(table, recordId)
+            tempPath = recordPath + '.tmp'
+            # Ensure directory exists
             os.makedirs(os.path.dirname(recordPath), exist_ok=True)
-            with open(recordPath, 'w', encoding='utf-8') as f:
+            # Write to temporary file first
+            with open(tempPath, 'w', encoding='utf-8') as f:
                 json.dump(record, f, indent=2, ensure_ascii=False)
+            # Verify the temporary file can be read back (validation)
+            try:
+                with open(tempPath, 'r', encoding='utf-8') as f:
+                    json.load(f)  # This will fail if file is corrupted
+            except Exception as e:
+                logger.error(f"Validation failed for record {recordId}: {e}")
+                # Clean up temp file
+                if os.path.exists(tempPath):
+                    os.remove(tempPath)
+                raise ValueError(f"Record validation failed: {e}")
+            # Atomic move from temp to final location
+            os.replace(tempPath, recordPath)
             # Update metadata
             metadata = self._loadTableMetadata(table)
             if recordId not in metadata["recordIds"]:
@@ -203,6 +221,13 @@ class DatabaseConnector:
         except Exception as e:
             logger.error(f"Error saving record {recordId} to table {table}: {e}")
+            # Clean up temp file if it exists
+            tempPath = self._getRecordPath(table, recordId) + '.tmp'
+            if os.path.exists(tempPath):
+                try:
+                    os.remove(tempPath)
+                except:
+                    pass
             return False
     def _loadTable(self, table: str) -> List[Dict[str, Any]]:
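The write-validate-replace pattern from this hunk, reduced to a standalone sketch (the `DatabaseConnector` internals are simplified away): the record only becomes visible under its final name once it is known to be valid JSON.

```python
import json
import os

def atomic_json_write(path: str, record: dict) -> None:
    # Write to a sibling temp file, validate that it parses, then atomically
    # swap it into place so readers never see a half-written record.
    tmp = path + ".tmp"
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(record, f, indent=2, ensure_ascii=False)
    try:
        with open(tmp, "r", encoding="utf-8") as f:
            json.load(f)  # fails if the write was truncated or corrupted
    except Exception:
        os.remove(tmp)
        raise
    os.replace(tmp, path)  # atomic rename on POSIX and Windows
```

`os.replace` (rather than `os.rename`) is what makes the final step safe on Windows, where renaming onto an existing file would otherwise fail.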

View file

@@ -116,7 +116,7 @@ class AiCalls:
             The AI response as text
         """
         try:
-            return await self.openaiService.callAiImage(imageData, mimeType, prompt)
+            return await self.openaiService.callAiImage(prompt, imageData, mimeType)
         except Exception as e:
             logger.error(f"Error in OpenAI image call: {str(e)}")
             return f"Error: {str(e)}"

View file

@@ -237,7 +237,6 @@ class AppObjects:
         # Find user by username
         for user_dict in users:
             if user_dict.get("username") == username:
-                logger.info(f"Found user with username {username}")
                 return User.from_dict(user_dict)
         logger.info(f"No user found with username {username}")

View file

@@ -760,7 +760,7 @@ class ChatObjects:
         else:
             # Create new workflow
             workflowData = {
-                "name": userInput.name or "New Workflow",
+                "name": "New Workflow",  # Default name since UserInputRequest doesn't have a name field
                 "status": "running",
                 "startedAt": currentTime,
                 "lastActivity": currentTime,

View file

@@ -690,34 +690,39 @@ class ComponentObjects:
                 return None
             # Process content based on file type
-            contentType = "binary"
+            isText = False
             content = ""
+            encoding = None
-            if file.get("mimeType", "").startswith("text/"):
+            # Use proper attribute access for FileItem object
+            if file.mimeType.startswith("text/"):
                 # For text files, return full content
                 try:
                     content = fileContent.decode('utf-8')
-                    contentType = "text"
+                    isText = True
+                    encoding = 'utf-8'
                 except UnicodeDecodeError:
                     content = fileContent.decode('latin-1')
-                    contentType = "text"
+                    isText = True
+                    encoding = 'latin-1'
-            elif file.get("mimeType", "").startswith("image/"):
+            elif file.mimeType.startswith("image/"):
                 # For images, return base64
-                contentType = "base64"
-                content = f"data:{file['mimeType']};base64,{fileContent.hex()}"
+                import base64
+                content = base64.b64encode(fileContent).decode('utf-8')
+                isText = False
             else:
                 # For other files, return as base64
-                contentType = "base64"
-                content = f"data:{file['mimeType']};base64,{fileContent.hex()}"
+                import base64
+                content = base64.b64encode(fileContent).decode('utf-8')
+                isText = False
             return FilePreview(
-                id=fileId,
-                name=file.get("name", "Unknown"),
-                mimeType=file.get("mimeType", "application/octet-stream"),
-                size=file.get("size", 0),
                 content=content,
-                contentType=contentType,
-                metadata=file.get("metadata", {})
+                mimeType=file.mimeType,
+                filename=file.filename,
+                isText=isText,
+                encoding=encoding,
+                size=file.fileSize
             )
         except Exception as e:
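The bug this hunk fixes is worth spelling out: `bytes.hex()` produces hexadecimal, not base64, so the old data URL claimed an encoding it did not use. A quick comparison:

```python
import base64

payload = b"\x89PNG"  # hypothetical first bytes of an image file

hex_text = payload.hex()                              # hex digits, 2 per byte
b64_text = base64.b64encode(payload).decode("utf-8")  # real base64

# A consumer decoding the hex string as base64 would get garbage bytes
# (or an error), which is why the preview was switched to b64encode().
```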

View file

@ -1,4 +1,4 @@
from typing import Dict, Any, Optional from typing import Dict, Any, Optional, List
import logging import logging
import uuid import uuid
from datetime import datetime, UTC from datetime import datetime, UTC
@ -11,10 +11,11 @@ class MethodCoder(MethodBase):
"""Coder method implementation for code operations""" """Coder method implementation for code operations"""
def __init__(self, serviceContainer: Any): def __init__(self, serviceContainer: Any):
"""Initialize the coder method"""
super().__init__(serviceContainer) super().__init__(serviceContainer)
self.name = "coder" self.name = "coder"
self.description = "Handle code operations like analysis and generation" self.description = "Handle code operations like analysis, generation, and refactoring"
@action @action
async def analyze(self, parameters: Dict[str, Any]) -> ActionResult: async def analyze(self, parameters: Dict[str, Any]) -> ActionResult:
""" """
@ -55,7 +56,7 @@ class MethodCoder(MethodBase):
error="No documents found for the provided reference" error="No documents found for the provided reference"
) )
# Extract content from all documents # Process each document individually
all_code_content = [] all_code_content = []
for chatDocument in chatDocuments: for chatDocument in chatDocuments:
@ -85,15 +86,18 @@ class MethodCoder(MethodBase):
error="No code content could be extracted from any documents" error="No code content could be extracted from any documents"
) )
# Combine all code content for analysis # Extract text content from ExtractedContent objects
combined_code = "\n\n--- CODE SEPARATOR ---\n\n".join(all_code_content) text_contents = self.service.extractTextFromContentObjects(all_code_content)
# Combine all extracted text content for analysis
combined_content = "\n\n--- CODE SEPARATOR ---\n\n".join(text_contents)
# Create analysis prompt # Create analysis prompt
analysis_prompt = f""" analysis_prompt = f"""
Analyze this {language} code for quality, structure, and potential issues. Analyze this {language} code for quality, structure, and potential issues.
Code to analyze: Code to analyze:
{combined_code} {combined_content}
Please check for: Please check for:
{', '.join(checks)} {', '.join(checks)}

View file

@@ -26,18 +26,16 @@ class MethodDocument(MethodBase):
     @action
     async def extract(self, parameters: Dict[str, Any]) -> ActionResult:
         """
-        Extract content from document
+        Extract specific content from document with ai prompt and return it as a json file
         Parameters:
             documentList (str): Reference to the document list to extract content from
            aiPrompt (str): AI prompt for content extraction
-            format (str, optional): Output format (default: "text")
             includeMetadata (bool, optional): Whether to include metadata (default: True)
         """
         try:
             documentList = parameters.get("documentList")
             aiPrompt = parameters.get("aiPrompt")
-            format = parameters.get("format", "text")
             includeMetadata = parameters.get("includeMetadata", True)
             if not documentList:
@@ -95,12 +93,14 @@ class MethodDocument(MethodBase):
                     error="No content could be extracted from any documents"
                 )
-            # Combine all extracted content
-            combined_content = "\n\n--- DOCUMENT SEPARATOR ---\n\n".join(all_extracted_content)
+            # Extract text content from ExtractedContent objects
+            text_contents = self.service.extractTextFromContentObjects(all_extracted_content)
+            # Combine all extracted text content
+            combined_content = "\n\n--- DOCUMENT SEPARATOR ---\n\n".join(text_contents)
             result_data = {
                 "documentCount": len(chatDocuments),
-                "format": format,
                 "content": combined_content,
                 "fileInfos": file_infos if includeMetadata else None,
                 "timestamp": datetime.now(UTC).isoformat()
@@ -124,236 +124,3 @@ class MethodDocument(MethodBase):
                 data={},
                 error=str(e)
             )
-    @action
-    async def analyze(self, parameters: Dict[str, Any]) -> ActionResult:
-        """
-        Analyze document content
-        Parameters:
-            documentList (str): Reference to the document list to analyze
-            aiPrompt (str): AI prompt for content analysis
-            analysis (List[str], optional): Types of analysis to perform (default: ["entities", "topics", "sentiment"])
-        """
-        try:
-            documentList = parameters.get("documentList")
-            aiPrompt = parameters.get("aiPrompt")
-            analysis = parameters.get("analysis", ["entities", "topics", "sentiment"])
-            if not documentList:
-                return self._createResult(
-                    success=False,
-                    data={},
-                    error="Document list reference is required"
-                )
-            if not aiPrompt:
-                return self._createResult(
-                    success=False,
-                    data={},
-                    error="AI prompt is required"
-                )
-            chatDocuments = self.service.getChatDocumentsFromDocumentList(documentList)
-            if not chatDocuments:
-                return self._createResult(
-                    success=False,
-                    data={},
-                    error="No documents found for the provided reference"
-                )
-            # Extract content from all documents
-            all_extracted_content = []
-            for chatDocument in chatDocuments:
-                fileId = chatDocument.fileId
-                file_data = self.service.getFileData(fileId)
-                file_info = self.service.getFileInfo(fileId)
-                if not file_data:
-                    logger.warning(f"File not found or empty for fileId: {fileId}")
-                    continue
-                extracted_content = await self.service.extractContentFromFileData(
-                    prompt=aiPrompt,
-                    fileData=file_data,
-                    filename=file_info.get('name', 'document'),
-                    mimeType=file_info.get('mimeType', 'application/octet-stream'),
-                    base64Encoded=False,
-                    documentId=chatDocument.id
-                )
-                all_extracted_content.append(extracted_content)
-            if not all_extracted_content:
-                return self._createResult(
-                    success=False,
-                    data={},
-                    error="No content could be extracted from any documents"
-                )
-            # Combine all extracted content for analysis
-            combined_content = "\n\n--- DOCUMENT SEPARATOR ---\n\n".join(all_extracted_content)
-            analysis_prompt = f"""
-            Analyze this document content for the following aspects:
-            {', '.join(analysis)}
-            Document content:
-            {combined_content[:8000]} # Limit content length
-            Please provide a detailed analysis including:
-            1. Key entities (people, organizations, locations, dates)
-            2. Main topics and themes
-            3. Sentiment analysis (positive, negative, neutral)
-            4. Key insights and patterns
-            5. Important relationships between entities
-            6. Document structure and organization
-            """
-            analysis_result = await self.service.interfaceAiCalls.callAiTextAdvanced(analysis_prompt)
-            result_data = {
-                "documentCount": len(chatDocuments),
-                "analysis": analysis,
-                "results": analysis_result,
-                "content": combined_content,
-                "timestamp": datetime.now(UTC).isoformat()
-            }
-            return self._createResult(
-                success=True,
-                data={
-                    "documents": [
-                        {
-                            "documentName": f"document_analysis_{datetime.now(UTC).strftime('%Y%m%d_%H%M%S')}.json",
-                            "documentData": result_data
-                        }
-                    ]
-                }
-            )
-        except Exception as e:
-            logger.error(f"Error analyzing content: {str(e)}")
-            return self._createResult(
-                success=False,
-                data={},
-                error=str(e)
-            )
-    @action
-    async def summarize(self, parameters: Dict[str, Any]) -> ActionResult:
-        """
-        Summarize document content
-        Parameters:
-            documentList (str): Reference to the document list to summarize
-            aiPrompt (str): AI prompt for content extraction
-            maxLength (int, optional): Maximum length of summary in words (default: 200)
-            format (str, optional): Output format (default: "text")
-        """
-        try:
-            documentList = parameters.get("documentList")
-            aiPrompt = parameters.get("aiPrompt")
-            maxLength = parameters.get("maxLength", 200)
-            format = parameters.get("format", "text")
-            if not documentList:
-                return self._createResult(
-                    success=False,
-                    data={},
-                    error="Document list reference is required"
-                )
-            if not aiPrompt:
-                return self._createResult(
-                    success=False,
-                    data={},
-                    error="AI prompt is required"
-                )
-            chatDocuments = self.service.getChatDocumentsFromDocumentList(documentList)
-            if not chatDocuments:
-                return self._createResult(
-                    success=False,
-                    data={},
-                    error="No documents found for the provided reference"
-                )
-            # Extract content from all documents
-            all_extracted_content = []
-            for chatDocument in chatDocuments:
-                fileId = chatDocument.fileId
-                file_data = self.service.getFileData(fileId)
-                file_info = self.service.getFileInfo(fileId)
-                if not file_data:
-                    logger.warning(f"File not found or empty for fileId: {fileId}")
-                    continue
-                extracted_content = await self.service.extractContentFromFileData(
-                    prompt=aiPrompt,
-                    fileData=file_data,
-                    filename=file_info.get('name', 'document'),
-                    mimeType=file_info.get('mimeType', 'application/octet-stream'),
-                    base64Encoded=False,
-                    documentId=chatDocument.id
-                )
-                all_extracted_content.append(extracted_content)
-            if not all_extracted_content:
-                return self._createResult(
-                    success=False,
-                    data={},
-                    error="No content could be extracted from any documents"
-                )
-            # Combine all extracted content for summarization
-            combined_content = "\n\n--- DOCUMENT SEPARATOR ---\n\n".join(all_extracted_content)
-            summary_prompt = f"""
-            Create a comprehensive summary of this document content.
-            Document content:
-            {combined_content[:8000]} # Limit content length
-            Requirements:
-            - Maximum length: {maxLength} words
-            - Format: {format}
-            - Include key points and main ideas
-            - Maintain accuracy and completeness
-            - Use clear, professional language
-            - Highlight important insights and conclusions
-            """
-            summary = await self.service.interfaceAiCalls.callAiTextAdvanced(summary_prompt)
-            result_data = {
-                "documentCount": len(chatDocuments),
-                "maxLength": maxLength,
-                "format": format,
-                "summary": summary,
-                "wordCount": len(summary.split()),
-                "originalContent": combined_content,
-                "timestamp": datetime.now(UTC).isoformat()
-            }
-            return self._createResult(
-                success=True,
-                data={
-                    "documents": [
-                        {
-                            "documentName": f"document_summary_{datetime.now(UTC).strftime('%Y%m%d_%H%M%S')}.txt",
-                            "documentData": result_data
-                        }
-                    ]
-                }
-            )
-        except Exception as e:
-            logger.error(f"Error summarizing content: {str(e)}")
-            return self._createResult(
-                success=False,
-                data={},
-                error=str(e)
-            )

View file

@@ -133,7 +133,7 @@ async def get_file(
             detail=f"File with ID {fileId} not found"
         )
-        return FileItem(**fileData)
+        return fileData
     except interfaceComponentObjects.FileNotFoundError as e:
         logger.warning(f"File not found: {str(e)}")
@@ -180,8 +180,8 @@ async def update_file(
                 detail=f"File with ID {fileId} not found"
             )
-        # Check if user has access to the file
-        if file.get("userId", 0) != currentUser.get("id", 0):
+        # Check if user has access to the file using the interface's permission system
+        if not managementInterface._canModify("files", fileId):
             raise HTTPException(
                 status_code=status.HTTP_403_FORBIDDEN,
                 detail="Not authorized to update this file"
@@ -195,9 +195,9 @@ async def update_file(
                 detail="Failed to update file"
             )
-        # Get updated file and convert to FileItem
+        # Get updated file
         updatedFile = managementInterface.getFile(fileId)
-        return FileItem(**updatedFile)
+        return updatedFile
     except HTTPException as he:
         raise he
@@ -328,15 +328,15 @@ async def preview_file(
     try:
         managementInterface = interfaceComponentObjects.getInterface(currentUser)
-        # Get file preview
-        preview = managementInterface.getFilePreview(fileId)
+        # Get file preview using the correct method
+        preview = managementInterface.getFileContent(fileId)
         if not preview:
             raise HTTPException(
                 status_code=status.HTTP_404_NOT_FOUND,
                 detail=f"File with ID {fileId} not found or no content available"
            )
-        return FilePreview(**preview)
+        return preview
     except HTTPException:
         raise
     except Exception as e:

View file

@@ -54,7 +54,7 @@ async def create_prompt(
         # Create prompt
         newPrompt = managementInterface.createPrompt(prompt_data)
-        return Prompt.from_dict(newPrompt)
+        return Prompt(**newPrompt)
 @router.get("/{promptId}", response_model=Prompt)
 @limiter.limit("30/minute")
@@ -74,7 +74,7 @@ async def get_prompt(
                 detail=f"Prompt with ID {promptId} not found"
             )
-        return Prompt.from_dict(prompt)
+        return prompt
 @router.put("/{promptId}", response_model=Prompt)
 @limiter.limit("10/minute")
@@ -107,7 +107,7 @@ async def update_prompt(
                 detail="Error updating the prompt"
             )
-        return Prompt.from_dict(updatedPrompt)
+        return Prompt(**updatedPrompt)
 @router.delete("/{promptId}", response_model=Dict[str, Any])
 @limiter.limit("10/minute")

View file

@ -48,7 +48,7 @@ def getServiceChat(currentUser: User):
# Consolidated endpoint for getting all workflows # Consolidated endpoint for getting all workflows
@router.get("/", response_model=List[ChatWorkflow]) @router.get("/", response_model=List[ChatWorkflow])
@limiter.limit("30/minute") @limiter.limit("120/minute")
async def get_workflows( async def get_workflows(
request: Request, request: Request,
currentUser: User = Depends(getCurrentUser) currentUser: User = Depends(getCurrentUser)
@ -56,7 +56,31 @@ async def get_workflows(
"""Get all workflows for the current user.""" """Get all workflows for the current user."""
try: try:
appInterface = getInterface(currentUser) appInterface = getInterface(currentUser)
return appInterface.getAllWorkflows() workflows_data = appInterface.getAllWorkflows()
# Convert raw dictionaries to ChatWorkflow objects
workflows = []
for workflow_data in workflows_data:
try:
workflow = ChatWorkflow(
id=workflow_data["id"],
status=workflow_data.get("status", "running"),
name=workflow_data.get("name"),
currentRound=workflow_data.get("currentRound", 1),
lastActivity=workflow_data.get("lastActivity", appInterface._getCurrentTimestamp()),
startedAt=workflow_data.get("startedAt", appInterface._getCurrentTimestamp()),
logs=[ChatLog(**log) for log in workflow_data.get("logs", [])],
messages=[ChatMessage(**msg) for msg in workflow_data.get("messages", [])],
stats=ChatStat(**workflow_data.get("dataStats", {})) if workflow_data.get("dataStats") else None,
mandateId=workflow_data.get("mandateId", currentUser.mandateId or "")
)
workflows.append(workflow)
except Exception as e:
logger.warning(f"Error converting workflow data to ChatWorkflow object: {str(e)}")
# Skip invalid workflows instead of failing the entire request
continue
return workflows
except Exception as e: except Exception as e:
logger.error(f"Error getting workflows: {str(e)}") logger.error(f"Error getting workflows: {str(e)}")
raise HTTPException( raise HTTPException(
@ -65,7 +89,7 @@ async def get_workflows(
) )
@router.get("/{workflowId}", response_model=ChatWorkflow) @router.get("/{workflowId}", response_model=ChatWorkflow)
@limiter.limit("30/minute") @limiter.limit("120/minute")
async def get_workflow( async def get_workflow(
request: Request, request: Request,
workflowId: str = Path(..., description="ID of the workflow"), workflowId: str = Path(..., description="ID of the workflow"),
@ -93,9 +117,58 @@ async def get_workflow(
detail=f"Failed to get workflow: {str(e)}" detail=f"Failed to get workflow: {str(e)}"
) )
@router.put("/{workflowId}", response_model=ChatWorkflow)
@limiter.limit("120/minute")
async def update_workflow(
request: Request,
workflowId: str = Path(..., description="ID of the workflow to update"),
workflowData: Dict[str, Any] = Body(...),
currentUser: User = Depends(getCurrentUser)
) -> ChatWorkflow:
"""Update workflow by ID"""
try:
# Get workflow interface with current user context
workflowInterface = getInterface(currentUser)
# Get raw workflow data from database to check permissions
workflows = workflowInterface.db.getRecordset("workflows", recordFilter={"id": workflowId})
if not workflows:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Workflow not found"
)
workflow_data = workflows[0]
# Check if user has permission to update using the interface's permission system
if not workflowInterface._canModify("workflows", workflowId):
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="You don't have permission to update this workflow"
)
# Update workflow
updatedWorkflow = workflowInterface.updateWorkflow(workflowId, workflowData)
if not updatedWorkflow:
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="Failed to update workflow"
)
return updatedWorkflow
except HTTPException:
raise
except Exception as e:
logger.error(f"Error updating workflow: {str(e)}")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail=f"Failed to update workflow: {str(e)}"
)
# API Endpoint for workflow status # API Endpoint for workflow status
@router.get("/{workflowId}/status", response_model=ChatWorkflow) @router.get("/{workflowId}/status", response_model=ChatWorkflow)
@limiter.limit("30/minute") @limiter.limit("120/minute")
async def get_workflow_status( async def get_workflow_status(
request: Request, request: Request,
workflowId: str = Path(..., description="ID of the workflow"), workflowId: str = Path(..., description="ID of the workflow"),
@ -114,7 +187,7 @@ async def get_workflow_status(
detail=f"Workflow with ID {workflowId} not found" detail=f"Workflow with ID {workflowId} not found"
) )
return ChatWorkflow(**workflow) return workflow
except HTTPException: except HTTPException:
raise raise
except Exception as e: except Exception as e:
@ -126,7 +199,7 @@ async def get_workflow_status(
# API Endpoint for workflow logs with selective data transfer # API Endpoint for workflow logs with selective data transfer
@router.get("/{workflowId}/logs", response_model=List[ChatLog]) @router.get("/{workflowId}/logs", response_model=List[ChatLog])
@limiter.limit("30/minute") @limiter.limit("120/minute")
async def get_workflow_logs( async def get_workflow_logs(
request: Request, request: Request,
workflowId: str = Path(..., description="ID of the workflow"), workflowId: str = Path(..., description="ID of the workflow"),
@ -152,12 +225,12 @@ async def get_workflow_logs(
# Apply selective data transfer if logId is provided # Apply selective data transfer if logId is provided
if logId: if logId:
# Find the index of the log with the given ID # Find the index of the log with the given ID
logIndex = next((i for i, log in enumerate(allLogs) if log.get("id") == logId), -1) logIndex = next((i for i, log in enumerate(allLogs) if log.id == logId), -1)
if logIndex >= 0: if logIndex >= 0:
# Return only logs after the specified log # Return only logs after the specified log
return [ChatLog(**log) for log in allLogs[logIndex + 1:]] return allLogs[logIndex + 1:]
return [ChatLog(**log) for log in allLogs] return allLogs
except HTTPException: except HTTPException:
raise raise
except Exception as e: except Exception as e:
@ -169,7 +242,7 @@ async def get_workflow_logs(
# API Endpoint for workflow messages with selective data transfer # API Endpoint for workflow messages with selective data transfer
@router.get("/{workflowId}/messages", response_model=List[ChatMessage]) @router.get("/{workflowId}/messages", response_model=List[ChatMessage])
@limiter.limit("30/minute") @limiter.limit("120/minute")
async def get_workflow_messages( async def get_workflow_messages(
request: Request, request: Request,
workflowId: str = Path(..., description="ID of the workflow"), workflowId: str = Path(..., description="ID of the workflow"),
@ -195,12 +268,12 @@ async def get_workflow_messages(
# Apply selective data transfer if messageId is provided # Apply selective data transfer if messageId is provided
if messageId: if messageId:
# Find the index of the message with the given ID # Find the index of the message with the given ID
messageIndex = next((i for i, msg in enumerate(allMessages) if msg.get("id") == messageId), -1) messageIndex = next((i for i, msg in enumerate(allMessages) if msg.id == messageId), -1)
if messageIndex >= 0: if messageIndex >= 0:
# Return only messages after the specified message # Return only messages after the specified message
return [ChatMessage(**msg) for msg in allMessages[messageIndex + 1:]] return allMessages[messageIndex + 1:]
return [ChatMessage(**msg) for msg in allMessages] return allMessages
except HTTPException: except HTTPException:
raise raise
except Exception as e: except Exception as e:
@@ -212,7 +285,7 @@ async def get_workflow_messages(
 # State 1: Workflow Initialization endpoint
 @router.post("/start", response_model=ChatWorkflow)
-@limiter.limit("10/minute")
+@limiter.limit("120/minute")
 async def start_workflow(
     request: Request,
     workflowId: Optional[str] = Query(None, description="Optional ID of the workflow to continue"),
@@ -230,7 +303,7 @@ async def start_workflow(
         # Start or continue workflow using ChatObjects
         workflow = await interfaceChat.workflowStart(currentUser, userInput, workflowId)
-        return ChatWorkflow(**workflow)
+        return workflow
     except Exception as e:
         logger.error(f"Error in start_workflow: {str(e)}")
@@ -241,7 +314,7 @@ async def start_workflow(
 # State 8: Workflow Stopped endpoint
 @router.post("/{workflowId}/stop", response_model=ChatWorkflow)
-@limiter.limit("10/minute")
+@limiter.limit("120/minute")
 async def stop_workflow(
     request: Request,
     workflowId: str = Path(..., description="ID of the workflow to stop"),
@@ -255,7 +328,7 @@ async def stop_workflow(
         # Stop workflow using ChatObjects
         workflow = await interfaceChat.workflowStop(workflowId)
-        return ChatWorkflow(**workflow)
+        return workflow
     except Exception as e:
         logger.error(f"Error in stop_workflow: {str(e)}")
@@ -266,7 +339,7 @@ async def stop_workflow(
 # State 11: Workflow Reset/Deletion endpoint
 @router.delete("/{workflowId}", response_model=Dict[str, Any])
-@limiter.limit("10/minute")
+@limiter.limit("120/minute")
 async def delete_workflow(
     request: Request,
     workflowId: str = Path(..., description="ID of the workflow to delete"),
@@ -277,16 +350,18 @@ async def delete_workflow(
         # Get service container
         interfaceChat = getServiceChat(currentUser)
-        # Verify workflow exists
-        workflow = interfaceChat.getWorkflow(workflowId)
-        if not workflow:
+        # Get raw workflow data from database to check permissions
+        workflows = interfaceChat.db.getRecordset("workflows", recordFilter={"id": workflowId})
+        if not workflows:
             raise HTTPException(
                 status_code=status.HTTP_404_NOT_FOUND,
                 detail=f"Workflow with ID {workflowId} not found"
             )
-        # Check if user has permission to delete
-        if workflow.get("_userId") != currentUser["id"]:
+        workflow_data = workflows[0]
+        # Check if user has permission to delete using the interface's permission system
+        if not interfaceChat._canModify("workflows", workflowId):
             raise HTTPException(
                 status_code=status.HTTP_403_FORBIDDEN,
                 detail="You don't have permission to delete this workflow"
@@ -318,7 +393,7 @@ async def delete_workflow(
 # Document Management Endpoints
 @router.delete("/{workflowId}/messages/{messageId}", response_model=Dict[str, Any])
-@limiter.limit("10/minute")
+@limiter.limit("120/minute")
 async def delete_workflow_message(
     request: Request,
     workflowId: str = Path(..., description="ID of the workflow"),
@@ -368,7 +443,7 @@ async def delete_workflow_message(
     )
 @router.delete("/{workflowId}/messages/{messageId}/files/{fileId}", response_model=Dict[str, Any])
-@limiter.limit("10/minute")
+@limiter.limit("120/minute")
 async def delete_file_from_message(
     request: Request,
     workflowId: str = Path(..., description="ID of the workflow"),
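The repeated `"10/minute"` → `"120/minute"` changes relax slowapi-style per-route rate limits. A minimal fixed-window counter sketch of what such a limit enforces; this is a hypothetical illustration, not slowapi's actual implementation:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Minimal fixed-window rate limiter.

    limit=120, window_seconds=60 corresponds to a "120/minute" rule.
    """
    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        # (client, window index) -> number of requests seen in that window
        self._counts = defaultdict(int)

    def allow(self, client: str, now: float = None) -> bool:
        """Return True if the client may make a request at time `now`."""
        now = time.monotonic() if now is None else now
        key = (client, int(now // self.window))
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True
```

A fixed window resets counts at window boundaries, so short bursts at a boundary can briefly exceed the nominal rate; sliding-window or token-bucket schemes smooth that out at the cost of more bookkeeping.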

File diff suppressed because it is too large

@@ -17,8 +17,8 @@ class DocumentManager:
     def __init__(self, serviceContainer):
         self.service = serviceContainer
-        # Create processor without any dependencies
-        self._processor = DocumentProcessor()
+        # Create processor with service container for AI calls
+        self._processor = DocumentProcessor(serviceContainer)
     async def extractContentFromDocument(self, prompt: str, document: ChatDocument) -> ExtractedContent:
         """Extract content from ChatDocument using prompt"""


@@ -52,8 +52,56 @@ class WorkflowManager:
         except WorkflowStoppedException:
             logger.info("Workflow stopped by user")
+            # Update workflow status to stopped
+            workflow.status = "stopped"
+            workflow.lastActivity = datetime.now(UTC).isoformat()
+            self.chatInterface.updateWorkflow(workflow.id, {
+                "status": "stopped",
+                "lastActivity": workflow.lastActivity
+            })
+            # Add log entry
+            self.chatInterface.createWorkflowLog({
+                "workflowId": workflow.id,
+                "message": "Workflow stopped by user",
+                "type": "warning",
+                "status": "stopped",
+                "progress": 100
+            })
         except Exception as e:
             logger.error(f"Workflow processing error: {str(e)}")
+            # Update workflow status to failed
+            workflow.status = "failed"
+            workflow.lastActivity = datetime.now(UTC).isoformat()
+            self.chatInterface.updateWorkflow(workflow.id, {
+                "status": "failed",
+                "lastActivity": workflow.lastActivity
+            })
+            # Create error message
+            error_message = {
+                "workflowId": workflow.id,
+                "role": "assistant",
+                "message": f"Workflow processing failed: {str(e)}",
+                "status": "last",
+                "sequenceNr": len(workflow.messages) + 1,
+                "publishedAt": datetime.now(UTC).isoformat()
+            }
+            message = self.chatInterface.createWorkflowMessage(error_message)
+            if message:
+                workflow.messages.append(message)
+            # Add error log entry
+            self.chatInterface.createWorkflowLog({
+                "workflowId": workflow.id,
+                "message": f"Workflow failed: {str(e)}",
+                "type": "error",
+                "status": "failed",
+                "progress": 100
+            })
             raise
     async def _sendFirstMessage(self, userInput: UserInputRequest, workflow: ChatWorkflow) -> ChatMessage:
@@ -108,6 +156,25 @@ class WorkflowManager:
             if message:
                 workflow.messages.append(message)
+            # Update workflow status to completed
+            workflow.status = "completed"
+            workflow.lastActivity = datetime.now(UTC).isoformat()
+            # Update workflow in database
+            self.chatInterface.updateWorkflow(workflow.id, {
+                "status": "completed",
+                "lastActivity": workflow.lastActivity
+            })
+            # Add completion log entry
+            self.chatInterface.createWorkflowLog({
+                "workflowId": workflow.id,
+                "message": "Workflow completed successfully",
+                "type": "success",
+                "status": "completed",
+                "progress": 100
+            })
         except Exception as e:
             logger.error(f"Error sending last message: {str(e)}")
             raise
@@ -128,6 +195,14 @@ class WorkflowManager:
             message = self.chatInterface.createWorkflowMessage(error_message)
             if message:
                 workflow.messages.append(message)
+            # Update workflow status to failed
+            workflow.status = "failed"
+            workflow.lastActivity = datetime.now(UTC).isoformat()
+            self.chatInterface.updateWorkflow(workflow.id, {
+                "status": "failed",
+                "lastActivity": workflow.lastActivity
+            })
             return
         # Process successful workflow results
@@ -174,6 +249,14 @@ class WorkflowManager:
             if message:
                 workflow.messages.append(message)
+            # Update workflow status to completed for successful workflows
+            workflow.status = "completed"
+            workflow.lastActivity = datetime.now(UTC).isoformat()
+            self.chatInterface.updateWorkflow(workflow.id, {
+                "status": "completed",
+                "lastActivity": workflow.lastActivity
+            })
         except Exception as e:
             logger.error(f"Error processing workflow results: {str(e)}")
             # Create error message
@@ -188,4 +271,12 @@ class WorkflowManager:
             message = self.chatInterface.createWorkflowMessage(error_message)
             if message:
                 workflow.messages.append(message)
+            # Update workflow status to failed
+            workflow.status = "failed"
+            workflow.lastActivity = datetime.now(UTC).isoformat()
+            self.chatInterface.updateWorkflow(workflow.id, {
+                "status": "failed",
+                "lastActivity": workflow.lastActivity
+            })
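The same status / `lastActivity` / persist / log sequence is repeated above for "stopped", "failed", and "completed". As a sketch only, it could be factored into one helper; `finalize_workflow` is a hypothetical name, and `chat_interface` is assumed to expose `updateWorkflow(id, fields)` and `createWorkflowLog(entry)` as in the diff:

```python
from datetime import datetime, timezone

def finalize_workflow(chat_interface, workflow, status: str,
                      log_message: str, log_type: str) -> None:
    """Set a terminal status, stamp lastActivity, persist, and log.

    Mirrors the repeated pattern in WorkflowManager above; the
    chat_interface methods are assumptions based on that diff.
    """
    workflow.status = status
    workflow.lastActivity = datetime.now(timezone.utc).isoformat()
    # Persist only the fields that changed
    chat_interface.updateWorkflow(workflow.id, {
        "status": status,
        "lastActivity": workflow.lastActivity,
    })
    # Terminal states always log at 100% progress
    chat_interface.createWorkflowLog({
        "workflowId": workflow.id,
        "message": log_message,
        "type": log_type,
        "status": status,
        "progress": 100,
    })
```

Centralizing the pattern keeps the status field, the persisted record, and the log entry from drifting apart as new terminal states are added.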


@@ -32,9 +32,10 @@ class FileProcessingError(Exception):
 class DocumentProcessor:
     """Processor for handling document operations and content extraction."""
-    def __init__(self):
+    def __init__(self, serviceContainer=None):
         """Initialize the document processor."""
         self._neutralizer = DataAnonymizer() if APP_CONFIG.get("ENABLE_CONTENT_NEUTRALIZATION", False) else None
+        self._serviceContainer = serviceContainer
         self.supportedTypes: Dict[str, Callable[[bytes, str, str], Awaitable[List[ContentItem]]]] = {
             'text/plain': self._processText,
@@ -108,7 +109,9 @@ class DocumentProcessor:
             logger.info("Image processing libraries successfully loaded")
         except ImportError as e:
             logger.warning(f"Image processing libraries could not be loaded: {e}")
     async def processFileData(self, fileData: bytes, filename: str, mimeType: str, base64Encoded: bool = False, prompt: str = None, documentId: str = None) -> ExtractedContent:
         """
         Process file data directly and extract its contents with AI processing.
@@ -133,7 +136,7 @@ class DocumentProcessor:
         # Detect content type if needed
         if mimeType == "application/octet-stream":
-            mimeType = self._detectContentTypeFromData(fileData, filename)
+            mimeType = self._serviceContainer.detectContentTypeFromData(fileData, filename)
@@ -161,61 +164,8 @@ class DocumentProcessor:
         except Exception as e:
             logger.error(f"Error processing file data: {str(e)}")
             raise FileProcessingError(f"Failed to process file data: {str(e)}")
-    def _detectContentTypeFromData(self, fileData: bytes, filename: str) -> str:
-        """Detect content type from file data and filename"""
-        try:
-            # Check file extension first
-            ext = os.path.splitext(filename)[1].lower()
-            if ext:
-                # Map common extensions to MIME types
-                extToMime = {
-                    '.txt': 'text/plain',
-                    '.md': 'text/markdown',
-                    '.csv': 'text/csv',
-                    '.json': 'application/json',
-                    '.xml': 'application/xml',
-                    '.js': 'application/javascript',
-                    '.py': 'application/x-python',
-                    '.svg': 'image/svg+xml',
-                    '.jpg': 'image/jpeg',
-                    '.png': 'image/png',
-                    '.gif': 'image/gif',
-                    '.pdf': 'application/pdf',
-                    '.docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
-                    '.doc': 'application/msword',
-                    '.xlsx': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
-                    '.xls': 'application/vnd.ms-excel',
-                    '.pptx': 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
-                    '.ppt': 'application/vnd.ms-powerpoint'
-                }
-                if ext in extToMime:
-                    return extToMime[ext]
-            # Try to detect from content
-            if fileData.startswith(b'%PDF'):
-                return 'application/pdf'
-            elif fileData.startswith(b'PK\x03\x04'):
-                # ZIP-based formats (docx, xlsx, pptx)
-                return 'application/zip'
-            elif fileData.startswith(b'<'):
-                # XML-based formats
-                try:
-                    text = fileData.decode('utf-8', errors='ignore')
-                    if '<svg' in text.lower():
-                        return 'image/svg+xml'
-                    elif '<html' in text.lower():
-                        return 'text/html'
-                    else:
-                        return 'application/xml'
-                except:
-                    pass
-            return 'application/octet-stream'
-        except Exception as e:
-            logger.error(f"Error detecting content type from data: {str(e)}")
-            return 'application/octet-stream'
     async def _processText(self, fileData: bytes, filename: str, mimeType: str) -> List[ContentItem]:
         """Process text document"""
@@ -546,14 +496,22 @@ class DocumentProcessor:
         try:
             # Get content type from metadata
             mimeType = item.metadata.mimeType if hasattr(item.metadata, 'mimeType') else "text/plain"
+            logger.debug(f"Processing content item with MIME type: {mimeType}, label: {item.label}")
             # Chunk content based on type
             if mimeType.startswith('text/'):
                 chunks = self._chunkText(item.data, mimeType)
             elif mimeType.startswith('image/'):
-                chunks = self._chunkImage(item.data)
-            elif mimeType.startswith('video/'):
-                chunks = self._chunkVideo(item.data)
+                # Images should not be chunked - process as single unit
+                chunks = [item.data]
+            elif mimeType == "application/pdf":
+                chunks = self._chunkPdf(item.data)
+            elif mimeType == "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
+                chunks = self._chunkDocx(item.data)
+            elif mimeType == "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet":
+                chunks = self._chunkXlsx(item.data)
+            elif mimeType.startswith('application/vnd.openxmlformats-officedocument.presentationml.presentation'):
+                chunks = self._chunkPptx(item.data)
             else:
                 # Binary data - no chunking
                 chunks = [item.data]
@@ -561,26 +519,42 @@ class DocumentProcessor:
             # Process each chunk
             chunkResults = []
             for chunk in chunks:
-                # Neutralize content if neutralizer is enabled
-                contentToProcess = chunk
-                if self._neutralizer and contentToProcess:
-                    contentToProcess = self._neutralizer.neutralize(contentToProcess)
-
-                # Create AI prompt for this chunk
-                aiPrompt = f"""
-Extract relevant information from this content based on the following prompt:
-
-PROMPT: {prompt}
-
-CONTENT:
-{contentToProcess}
-
-Return ONLY the extracted information in a clear, concise format.
-"""
-                # Note: This would need to be implemented with actual AI service
-                # For now, just return the original content
-                chunkResults.append(contentToProcess)
+                # Process with AI based on content type
+                try:
+                    logger.debug(f"AI processing chunk with MIME type: {mimeType}")
+                    if mimeType.startswith('image/'):
+                        # For images, use image AI service with base64 data
+                        # chunk is already base64 encoded string from _processImage
+                        # Use the original prompt directly for images (no content embedding)
+                        logger.debug(f"Calling image AI service for MIME type: {mimeType}")
+                        processedContent = await self._serviceContainer.callAiImageBasic(prompt, chunk, mimeType)
+                    else:
+                        # For text content, use text AI service
+                        # Neutralize content if neutralizer is enabled (only for text)
+                        contentToProcess = chunk
+                        if self._neutralizer and contentToProcess:
+                            contentToProcess = self._neutralizer.neutralize(contentToProcess)
+
+                        # Create AI prompt for text content
+                        aiPrompt = f"""
+Extract relevant information from this content based on the following prompt:
+
+PROMPT: {prompt}
+
+CONTENT:
+{contentToProcess}
+
+Return ONLY the extracted information in a clear, concise format.
+"""
+                        logger.debug(f"Calling text AI service for MIME type: {mimeType}")
+                        processedContent = await self._serviceContainer.callAiTextBasic(aiPrompt, contentToProcess)
+                    chunkResults.append(processedContent)
+                except Exception as aiError:
+                    logger.error(f"AI processing failed for chunk: {str(aiError)}")
+                    # Fallback to original content
+                    chunkResults.append(chunk)

             # Combine chunk results
             combinedResult = "\n".join(chunkResults)
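The per-chunk try/except above degrades gracefully: a failed AI call keeps the raw chunk instead of aborting the whole item. A standalone sketch of that fallback shape, with `process` standing in for the AI call (hypothetical helper, not the project's API):

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)

def process_chunks(chunks: List[str], process: Callable[[str], str]) -> str:
    """Apply process() to each chunk; on failure keep the raw chunk.

    Mirrors the per-chunk fallback in the diff above: one bad chunk
    degrades output quality locally instead of failing the document.
    """
    results = []
    for chunk in chunks:
        try:
            results.append(process(chunk))
        except Exception as exc:
            logger.error("Processing failed for chunk: %s", exc)
            results.append(chunk)  # fall back to original content
    return "\n".join(results)
```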
@@ -604,6 +578,8 @@ class DocumentProcessor:
         return processedItems
     def _chunkText(self, content: str, mimeType: str) -> List[str]:
         """Chunk text content based on mime type"""
         if mimeType == "text/plain":
@@ -765,36 +741,6 @@ class DocumentProcessor:
         except Exception:
             return [content]
-    def _chunkImage(self, content: str) -> List[str]:
-        """Chunk image content"""
-        try:
-            imageData = base64.b64decode(content)
-            chunks = []
-            chunkSize = self.chunkSizes["image"]
-            for i in range(0, len(imageData), chunkSize):
-                chunk = imageData[i:i + chunkSize]
-                chunks.append(base64.b64encode(chunk).decode('utf-8'))
-            return chunks
-        except Exception:
-            return [content]
-    def _chunkVideo(self, content: str) -> List[str]:
-        """Chunk video content"""
-        try:
-            videoData = base64.b64decode(content)
-            chunks = []
-            chunkSize = self.chunkSizes["video"]
-            for i in range(0, len(videoData), chunkSize):
-                chunk = videoData[i:i + chunkSize]
-                chunks.append(base64.b64encode(chunk).decode('utf-8'))
-            return chunks
-        except Exception:
-            return [content]
     def _chunkBinary(self, content: str) -> List[str]:
         """Chunk binary content"""
         try:
@@ -810,4 +756,87 @@ class DocumentProcessor:
         except Exception:
             return [content]
+    async def _chunkPdf(self, content: str) -> List[str]:
+        """Chunk PDF content"""
+        try:
+            pdfData = base64.b64decode(content)
+            chunks = []
+            chunkSize = self.chunkSizes["pdf"]
+            with io.BytesIO(pdfData) as pdfStream:
+                pdfReader = PyPDF2.PdfReader(pdfStream)
+                for pageNum in range(len(pdfReader.pages)):
+                    page = pdfReader.pages[pageNum]
+                    pageText = page.extract_text()
+                    if pageText:
+                        chunks.append(pageText)
+            return chunks
+        except Exception:
+            return [content]
+    async def _chunkDocx(self, content: str) -> List[str]:
+        """Chunk Word document content"""
+        try:
+            docxData = base64.b64decode(content)
+            chunks = []
+            chunkSize = self.chunkSizes["docx"]
+            with io.BytesIO(docxData) as docxStream:
+                doc = docx.Document(docxStream)
+                for para in doc.paragraphs:
+                    chunks.append(para.text)
+                for table in doc.tables:
+                    for row in table.rows:
+                        rowText = []
+                        for cell in row.cells:
+                            rowText.append(cell.text)
+                        chunks.append(" | ".join(rowText))
+            return chunks
+        except Exception:
+            return [content]
+    async def _chunkXlsx(self, content: str) -> List[str]:
+        """Chunk Excel document content"""
+        try:
+            xlsxData = base64.b64decode(content)
+            chunks = []
+            chunkSize = self.chunkSizes["xlsx"]
+            with io.BytesIO(xlsxData) as xlsxStream:
+                workbook = openpyxl.load_workbook(xlsxStream, data_only=True)
+                for sheetName in workbook.sheetnames:
+                    sheet = workbook[sheetName]
+                    for row in sheet.iter_rows():
+                        rowText = []
+                        for cell in row:
+                            value = cell.value
+                            if value is None:
+                                rowText.append("")
+                            else:
+                                rowText.append(str(value).replace('"', '""'))
+                        chunks.append(','.join(f'"{cell}"' for cell in rowText))
+            return chunks
+        except Exception:
+            return [content]
+    async def _chunkPptx(self, content: str) -> List[str]:
+        """Chunk PowerPoint document content"""
+        try:
+            pptxData = base64.b64decode(content)
+            chunks = []
+            chunkSize = self.chunkSizes["pptx"]
+            with io.BytesIO(pptxData) as pptxStream:
+                # openpyxl is not suitable for PowerPoint, so we'll just read text
+                # This is a placeholder and would require a different library for full pptx processing
+                # For now, we'll just return the base64 encoded content as a single chunk
+                chunks.append(content)
+            return chunks
+        except Exception:
+            return [content]
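The `_chunkXlsx` addition above serializes each sheet row as one CSV-style line (empty string for `None`, embedded quotes doubled). That row-encoding step can be tested without openpyxl; `rows_to_csv_chunks` is a hypothetical standalone version of just that loop:

```python
from typing import Iterable, List

def rows_to_csv_chunks(rows: Iterable[Iterable]) -> List[str]:
    """Serialize sheet rows the way _chunkXlsx does above.

    One quoted CSV line per row: None becomes an empty cell, and any
    embedded double quote is doubled so the cell stays parseable.
    """
    chunks = []
    for row in rows:
        cells = ["" if value is None else str(value).replace('"', '""')
                 for value in row]
        chunks.append(','.join(f'"{cell}"' for cell in cells))
    return chunks
```

Emitting one chunk per row keeps spreadsheet chunks small and self-describing, at the cost of losing sheet boundaries; the diff's version iterates all sheets into a single flat list.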


@@ -2,6 +2,7 @@ import logging
 import importlib
 import pkgutil
 import inspect
+import os
 from typing import Dict, Any, List, Optional
 from modules.interfaces.interfaceAppModel import User, UserConnection
 from modules.interfaces.interfaceChatModel import (
@@ -111,6 +112,155 @@ class ServiceContainer:
         except Exception as e:
             logger.error(f"Error discovering methods: {str(e)}")
+    def detectContentTypeFromData(self, fileData: bytes, filename: str) -> str:
+        """
+        Detect content type from file data and filename.
+        This method makes the MIME type detection function accessible through the service container.
+        Args:
+            fileData: Raw file data as bytes
+            filename: Name of the file
+        Returns:
+            str: Detected MIME type
+        """
+        try:
+            # Check file extension first
+            ext = os.path.splitext(filename)[1].lower()
+            if ext:
+                # Map common extensions to MIME types
+                extToMime = {
+                    '.txt': 'text/plain',
+                    '.md': 'text/markdown',
+                    '.csv': 'text/csv',
+                    '.json': 'application/json',
+                    '.xml': 'application/xml',
+                    '.js': 'application/javascript',
+                    '.py': 'application/x-python',
+                    '.svg': 'image/svg+xml',
+                    '.jpg': 'image/jpeg',
+                    '.jpeg': 'image/jpeg',
+                    '.png': 'image/png',
+                    '.gif': 'image/gif',
+                    '.bmp': 'image/bmp',
+                    '.webp': 'image/webp',
+                    '.pdf': 'application/pdf',
+                    '.docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
+                    '.doc': 'application/msword',
+                    '.xlsx': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
+                    '.xls': 'application/vnd.ms-excel',
+                    '.pptx': 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
+                    '.ppt': 'application/vnd.ms-powerpoint',
+                    '.html': 'text/html',
+                    '.htm': 'text/html',
+                    '.css': 'text/css',
+                    '.zip': 'application/zip',
+                    '.rar': 'application/x-rar-compressed',
+                    '.7z': 'application/x-7z-compressed',
+                    '.tar': 'application/x-tar',
+                    '.gz': 'application/gzip'
+                }
+                if ext in extToMime:
+                    return extToMime[ext]
+            # Try to detect from content
+            if fileData.startswith(b'%PDF'):
+                return 'application/pdf'
+            elif fileData.startswith(b'PK\x03\x04'):
+                # ZIP-based formats (docx, xlsx, pptx)
+                return 'application/zip'
+            elif fileData.startswith(b'<'):
+                # XML-based formats
+                try:
+                    text = fileData.decode('utf-8', errors='ignore')
+                    if '<svg' in text.lower():
+                        return 'image/svg+xml'
+                    elif '<html' in text.lower():
+                        return 'text/html'
+                    else:
+                        return 'application/xml'
+                except:
+                    pass
+            elif fileData.startswith(b'\x89PNG\r\n\x1a\n'):
+                return 'image/png'
+            elif fileData.startswith(b'\xff\xd8\xff'):
+                return 'image/jpeg'
+            elif fileData.startswith(b'GIF87a') or fileData.startswith(b'GIF89a'):
+                return 'image/gif'
+            elif fileData.startswith(b'BM'):
+                return 'image/bmp'
+            elif fileData.startswith(b'RIFF') and fileData[8:12] == b'WEBP':
+                return 'image/webp'
+            return 'application/octet-stream'
+        except Exception as e:
+            logger.error(f"Error detecting content type from data: {str(e)}")
+            return 'application/octet-stream'
+    def getMimeTypeFromExtension(self, extension: str) -> str:
+        """
+        Get MIME type based on file extension.
+        This method consolidates MIME type detection from extension.
+        Args:
+            extension: File extension (with or without dot)
+        Returns:
+            str: MIME type for the extension
+        """
+        # Normalize extension (remove dot if present)
+        if extension.startswith('.'):
+            extension = extension[1:]
+        # Map extensions to MIME types
+        mime_types = {
+            'txt': 'text/plain',
+            'json': 'application/json',
+            'xml': 'application/xml',
+            'csv': 'text/csv',
+            'html': 'text/html',
+            'htm': 'text/html',
+            'md': 'text/markdown',
+            'py': 'text/x-python',
+            'js': 'application/javascript',
+            'css': 'text/css',
+            'pdf': 'application/pdf',
+            'doc': 'application/msword',
+            'docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
+            'xls': 'application/vnd.ms-excel',
+            'xlsx': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
+            'ppt': 'application/vnd.ms-powerpoint',
+            'pptx': 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
+            'svg': 'image/svg+xml',
+            'jpg': 'image/jpeg',
+            'jpeg': 'image/jpeg',
+            'png': 'image/png',
+            'gif': 'image/gif',
+            'bmp': 'image/bmp',
+            'webp': 'image/webp',
+            'zip': 'application/zip',
+            'rar': 'application/x-rar-compressed',
+            '7z': 'application/x-7z-compressed',
+            'tar': 'application/x-tar',
+            'gz': 'application/gzip'
+        }
+        return mime_types.get(extension.lower(), 'application/octet-stream')
+    def getFileExtension(self, filename: str) -> str:
+        """
+        Extract file extension from filename.
+        Args:
+            filename: Name of the file
+        Returns:
+            str: File extension (without dot)
+        """
+        if '.' in filename:
+            return filename.split('.')[-1].lower()
+        return "txt"  # Default to text
     # ===== Functions =====
     def extractContent(self, prompt: str, document: ChatDocument) -> str:
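The magic-byte branch of `detectContentTypeFromData` above can be exercised standalone. This is a reduced, hypothetical sketch keeping only the signature checks shown in the diff (not the full extension-first method):

```python
def sniff_mime(data: bytes) -> str:
    """Reduced magic-byte sniffing in the spirit of detectContentTypeFromData."""
    if data.startswith(b'%PDF'):
        return 'application/pdf'
    if data.startswith(b'PK\x03\x04'):
        # ZIP container; also the envelope for docx/xlsx/pptx
        return 'application/zip'
    if data.startswith(b'\x89PNG\r\n\x1a\n'):
        return 'image/png'
    if data.startswith(b'\xff\xd8\xff'):
        return 'image/jpeg'
    if data.startswith(b'GIF87a') or data.startswith(b'GIF89a'):
        return 'image/gif'
    if data.startswith(b'RIFF') and data[8:12] == b'WEBP':
        return 'image/webp'
    return 'application/octet-stream'
```

Checking the extension map first and falling back to signatures, as the method above does, avoids the common trap where ZIP-based Office files all sniff as `application/zip`.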
@@ -399,11 +549,11 @@ Please provide a clear summary of this message."""
         """Advanced text processing using Anthropic"""
         return self.interfaceAiCalls.callAiTextAdvanced(prompt, context)
-    def callAiImageBasic(self, prompt: str, imageData: bytes, mimeType: str) -> str:
+    def callAiImageBasic(self, prompt: str, imageData: str, mimeType: str) -> str:
         """Basic image processing using OpenAI"""
         return self.interfaceAiCalls.callAiImageBasic(prompt, imageData, mimeType)
-    def callAiImageAdvanced(self, prompt: str, imageData: bytes, mimeType: str) -> str:
+    def callAiImageAdvanced(self, prompt: str, imageData: str, mimeType: str) -> str:
         """Advanced image processing using Anthropic"""
         return self.interfaceAiCalls.callAiImageAdvanced(prompt, imageData, mimeType)
@@ -463,6 +613,30 @@ Please provide a clear summary of this message."""
             mimeType=mimeType
         )
+    def extractTextFromContentObjects(self, content_objects: List[Any]) -> List[str]:
+        """
+        Extract text content from ExtractedContent objects or other content objects.
+        Args:
+            content_objects: List of ExtractedContent objects or other content objects
+        Returns:
+            List of extracted text strings
+        """
+        text_contents = []
+        for content_obj in content_objects:
+            if hasattr(content_obj, 'contents') and content_obj.contents:
+                # Extract text from ContentItem objects
+                for content_item in content_obj.contents:
+                    if hasattr(content_item, 'data') and content_item.data:
+                        text_contents.append(content_item.data)
+            elif isinstance(content_obj, str):
+                text_contents.append(content_obj)
+            else:
+                # Fallback: convert to string representation
+                text_contents.append(str(content_obj))
+        return text_contents
     async def executeAction(self, methodName: str, actionName: str, parameters: Dict[str, Any]) -> ActionResult:
         """Execute a method action"""
         try:
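The duck-typed flow of `extractTextFromContentObjects` above can be sketched with minimal stand-in classes (these are illustrative stand-ins, not the project's `ExtractedContent`/`ContentItem` models):

```python
class ContentItem:
    """Stand-in for the project's ContentItem: carries a .data payload."""
    def __init__(self, data):
        self.data = data

class ExtractedContent:
    """Stand-in for the project's ExtractedContent: carries .contents."""
    def __init__(self, contents):
        self.contents = contents

def extract_text(content_objects):
    """Mirror of extractTextFromContentObjects above.

    Pulls .data strings out of ExtractedContent-like objects, passes
    plain strings through, and falls back to str() for anything else.
    Empty/falsy .data values are skipped, as in the diff.
    """
    texts = []
    for obj in content_objects:
        if hasattr(obj, 'contents') and obj.contents:
            for item in obj.contents:
                if hasattr(item, 'data') and item.data:
                    texts.append(item.data)
        elif isinstance(obj, str):
            texts.append(obj)
        else:
            texts.append(str(obj))
    return texts
```

The `hasattr` checks make the helper tolerant of mixed inputs, though silently `str()`-ing unknown objects can mask type errors upstream.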

run_document_test.ps1 Normal file

@@ -0,0 +1,31 @@
# PowerShell script to run document extraction test
# Usage: .\run_document_test.ps1 [file_path]
param(
[string]$FilePath = "test_sample_document.txt"
)
Write-Host "=== PowerOn Document Extraction Test ===" -ForegroundColor Green
Write-Host ""
# Check if file exists
if (-not (Test-Path $FilePath)) {
Write-Host "Error: File not found: $FilePath" -ForegroundColor Red
Write-Host "Please provide a valid file path as parameter or ensure test_sample_document.txt exists." -ForegroundColor Yellow
exit 1
}
Write-Host "Testing document extraction for file: $FilePath" -ForegroundColor Cyan
Write-Host "Log file will be: test_document_extraction.log" -ForegroundColor Cyan
Write-Host ""
# Run the Python test
try {
python test_document_extraction.py $FilePath
Write-Host ""
Write-Host "Test completed successfully!" -ForegroundColor Green
Write-Host "Check test_document_extraction.log for detailed results." -ForegroundColor Cyan
} catch {
Write-Host "Test failed with error: $($_.Exception.Message)" -ForegroundColor Red
exit 1
}

test_document_extraction.py Normal file

@@ -0,0 +1,288 @@
#!/usr/bin/env python3
"""
Test procedure for DocumentManager document extraction functionality.
"""
import asyncio
import sys
import os
import json
import argparse
from datetime import datetime, UTC
from pathlib import Path
import logging
print("Starting test_document_extraction.py...")
# Configure logging FIRST, before any other imports
import logging
# Clear any existing handlers to avoid duplicate logs
for handler in logging.root.handlers[:]:
logging.root.removeHandler(handler)
logging.basicConfig(
level=logging.DEBUG,
format='%(asctime)s - %(levelname)s - %(name)s - %(message)s',
handlers=[
logging.StreamHandler(sys.stdout),
logging.FileHandler('test_document_extraction.log', mode='w', encoding='utf-8') # 'w' mode clears the file
],
force=True # Force reconfiguration even if already configured
)
# Filter out httpcore messages
logging.getLogger('httpcore').setLevel(logging.WARNING)
logging.getLogger('httpx').setLevel(logging.WARNING)
logger = logging.getLogger(__name__)
# Set up test configuration
os.environ['POWERON_CONFIG_FILE'] = 'test_config.ini'
print("Set POWERON_CONFIG_FILE environment variable")
try:
# Import required modules
from modules.interfaces.interfaceAppObjects import User, UserConnection
from modules.interfaces.interfaceChatModel import ChatWorkflow
from modules.workflow.managerDocument import DocumentManager
from modules.workflow.serviceContainer import ServiceContainer
print("All imports successful")
except Exception as e:
print(f"Import error: {e}")
import traceback
traceback.print_exc()
sys.exit(1)
def log_extraction_debug(message: str, data: dict = None):
"""Log extraction debug data with JSON dumps"""
timestamp = datetime.now(UTC).isoformat()
if data:
logger.debug(f"[{timestamp}] {message}\n{json.dumps(data, indent=2, ensure_ascii=False)}")
else:
logger.debug(f"[{timestamp}] {message}")
def create_test_user() -> User:
"""Create a test user for the document extraction"""
return User(
id="test-user-doc-001",
mandateId="test-mandate-doc-001",
username="testuser_doc",
email="test_doc@example.com",
fullName="Test Document User",
enabled=True,
language="en",
privilege="user",
authenticationAuthority="local"
)
def create_test_workflow() -> ChatWorkflow:
"""Create a test workflow for document extraction"""
return ChatWorkflow(
id="test-workflow-doc-001",
mandateId="test-mandate-doc-001",
status="running",
name="Document Extraction Test Workflow",
currentRound=1,
lastActivity=datetime.now(UTC).isoformat(),
startedAt=datetime.now(UTC).isoformat(),
logs=[],
messages=[],
stats=None,
tasks=[]
)
def detect_mime_type(file_path: str) -> str:
"""Detect MIME type based on file extension"""
ext = Path(file_path).suffix.lower()
mime_types = {
'.txt': 'text/plain',
'.md': 'text/markdown',
'.csv': 'text/csv',
'.json': 'application/json',
'.xml': 'application/xml',
'.js': 'application/javascript',
'.py': 'application/x-python',
'.svg': 'image/svg+xml',
'.jpg': 'image/jpeg',
'.jpeg': 'image/jpeg',
'.png': 'image/png',
'.gif': 'image/gif',
'.pdf': 'application/pdf',
'.docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
'.doc': 'application/msword',
'.xlsx': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
'.xls': 'application/vnd.ms-excel',
'.pptx': 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
'.ppt': 'application/vnd.ms-powerpoint',
'.html': 'text/html',
'.htm': 'text/html'
}
return mime_types.get(ext, 'application/octet-stream')
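For comparison, the standard library's `mimetypes` module covers most of this table; a stdlib-backed variant (an optional sketch, not wired into the test) could look like:

```python
import mimetypes

def detect_mime_type_stdlib(file_path: str) -> str:
    """Guess the MIME type via the stdlib registry, falling back to binary."""
    guessed, _ = mimetypes.guess_type(file_path)
    return guessed or 'application/octet-stream'
```

The hand-rolled table above stays useful where `guess_type`'s platform-dependent registry is too unpredictable, e.g. for the Office formats.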
async def test_document_extraction(file_path: str):
"""Test document extraction from a file path"""
try:
# The FileHandler configured above uses mode='w', so the log file is
# already truncated on each run; no manual clearing is needed (and
# reopening the file here would discard anything the handler buffered).
logger.info("=== STARTING DOCUMENT EXTRACTION TEST ===")
# Validate file path
if not os.path.exists(file_path):
raise FileNotFoundError(f"File not found: {file_path}")
# Get file info
file_path_obj = Path(file_path)
filename = file_path_obj.name
mime_type = detect_mime_type(file_path)
file_size = file_path_obj.stat().st_size
log_extraction_debug("File information", {
"file_path": file_path,
"filename": filename,
"mime_type": mime_type,
"file_size_bytes": file_size,
"file_size_mb": round(file_size / (1024 * 1024), 2)
})
# Read file data
try:
with open(file_path, 'rb') as f:
file_data = f.read()
log_extraction_debug("File read successfully", {
"bytes_read": len(file_data),
"file_encoding": "binary"
})
except Exception as e:
logger.error(f"Error reading file: {str(e)}")
raise
# Create test user and workflow
test_user = create_test_user()
test_workflow = create_test_workflow()
# Create service container
service_container = ServiceContainer(test_user, test_workflow)
log_extraction_debug("Service container created", {
"user_id": test_user.id,
"workflow_id": test_workflow.id
})
# Create document manager
document_manager = DocumentManager(service_container)
log_extraction_debug("Document manager created")
# Define extraction prompt
extraction_prompt = "extract the table and convert it to a csv table"
log_extraction_debug("Starting document extraction", {
"prompt": extraction_prompt,
"filename": filename,
"mime_type": mime_type
})
# Extract content from file data
try:
extracted_content = await document_manager.extractContentFromFileData(
prompt=extraction_prompt,
fileData=file_data,
filename=filename,
mimeType=mime_type,
base64Encoded=False,
documentId=f"test-doc-{datetime.now(UTC).timestamp()}"
)
# Log extraction results
extraction_result = {
"extracted_content_id": extracted_content.id,
"content_items_count": len(extracted_content.contents)
}
# Add objectId and objectType if they exist (set by DocumentManager)
if hasattr(extracted_content, 'objectId'):
extraction_result["object_id"] = extracted_content.objectId
if hasattr(extracted_content, 'objectType'):
extraction_result["object_type"] = extracted_content.objectType
log_extraction_debug("Document extraction completed successfully", extraction_result)
# Log detailed content information
for i, content_item in enumerate(extracted_content.contents):
content_info = {
"label": content_item.label,
"data_length": len(content_item.data) if content_item.data else 0,
"data_preview": content_item.data[:500] + "..." if content_item.data and len(content_item.data) > 500 else content_item.data
}
# Add metadata if available
if content_item.metadata:
content_info["metadata"] = {
"size": content_item.metadata.size,
"mime_type": content_item.metadata.mimeType,
"base64_encoded": content_item.metadata.base64Encoded,
"pages": content_item.metadata.pages
}
log_extraction_debug(f"CONTENT ITEM {i+1}:", content_info)
# Log summary of all extracted content
all_content = "\n\n".join([item.data for item in extracted_content.contents if item.data])
log_extraction_debug("COMPLETE EXTRACTED CONTENT:", {
"total_length": len(all_content),
"content": all_content
})
logger.info("=== DOCUMENT EXTRACTION TEST COMPLETED ===")
return extracted_content
except Exception as e:
log_extraction_debug("DOCUMENT EXTRACTION EXCEPTION:", {
"error_type": type(e).__name__,
"error_message": str(e),
"error_args": e.args if hasattr(e, 'args') else None
})
raise
logger.info("=== DOCUMENT EXTRACTION TEST COMPLETED ===")
return extracted_content
except Exception as e:
logger.error(f"❌ Document extraction test failed with error: {str(e)}")
log_extraction_debug("Full error details", {
"error_type": type(e).__name__,
"error_message": str(e)
})
raise
async def main():
"""Main function to run the document extraction test"""
print("Inside main()")
logger.info("=" * 50)
logger.info("DOCUMENT EXTRACTION TEST")
logger.info("=" * 50)
# Parse command line arguments
parser = argparse.ArgumentParser(description='Test document extraction functionality')
parser.add_argument('file_path', help='Path to the file to extract content from')
args = parser.parse_args()
try:
extracted_content = await test_document_extraction(args.file_path)
logger.info("=" * 50)
logger.info("TEST COMPLETED SUCCESSFULLY")
logger.info("=" * 50)
return extracted_content
except Exception as e:
logger.error("=" * 50)
logger.error("TEST FAILED")
logger.error("=" * 50)
raise
if __name__ == "__main__":
print("About to run main()")
asyncio.run(main())
print("main() finished")

test_retry_enhancement.py (new file, 289 lines)
@@ -0,0 +1,289 @@
#!/usr/bin/env python3
"""
Test script for retry enhancement in managerChat.py
Tests that previous action results and review feedback are properly passed to retry prompts.
"""
import asyncio
import logging
import sys
import os
# Add the gateway directory to the Python path
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'gateway'))
from modules.workflow.managerChat import ChatManager
from modules.interfaces.interfaceAppModel import User
from modules.interfaces.interfaceChatModel import ChatWorkflow, ChatMessage
from modules.interfaces.interfaceChatObjects import ChatObjects
# Configure logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)
class MockChatObjects(ChatObjects):
"""Mock implementation of ChatObjects for testing"""
def createTaskAction(self, action_data):
"""Mock task action creation"""
class MockTaskAction:
def __init__(self, data):
self.id = "test_action_id"
self.execMethod = data.get("execMethod", "unknown")
self.execAction = data.get("execAction", "unknown")
self.execParameters = data.get("execParameters", {})
self.execResultLabel = data.get("execResultLabel", "")
self.status = data.get("status", "PENDING")
self.result = ""
self.error = ""
def setSuccess(self):
self.status = "COMPLETED"
def setError(self, error):
self.status = "FAILED"
self.error = error
def isSuccessful(self):
return self.status == "COMPLETED"
return MockTaskAction(action_data)
def createChatDocument(self, document_data):
"""Mock document creation"""
class MockChatDocument:
def __init__(self, data):
self.fileId = data.get("fileId", "")
self.filename = data.get("filename", "unknown")
self.fileSize = data.get("fileSize", 0)
self.mimeType = data.get("mimeType", "application/octet-stream")
self.content = ""
return MockChatDocument(document_data)
def createWorkflowMessage(self, message_data):
"""Mock message creation"""
class MockWorkflowMessage:
def __init__(self, data):
self.workflowId = data.get("workflowId", "")
self.role = data.get("role", "assistant")
self.message = data.get("message", "")
self.status = data.get("status", "step")
self.sequenceNr = data.get("sequenceNr", 1)
self.publishedAt = data.get("publishedAt", "")
self.actionId = data.get("actionId", "")
self.actionMethod = data.get("actionMethod", "")
self.actionName = data.get("actionName", "")
self.documentsLabel = data.get("documentsLabel", "")
self.documents = data.get("documents", [])
return MockWorkflowMessage(message_data)
class MockServiceContainer:
"""Mock service container for testing"""
def __init__(self, user, workflow):
self.user = user
self.workflow = workflow
def getMethodsList(self):
"""Mock methods list"""
return ["document.extract(documentList, aiPrompt)", "document.analyze(documentList, aiPrompt)"]
async def summarizeChat(self, messages):
"""Mock chat summarization"""
return "Mock chat history summary"
def getDocumentReferenceList(self):
"""Mock document references"""
return {
'chat': [],
'history': []
}
def getConnectionReferenceList(self):
"""Mock connection references"""
return ["connection1", "connection2"]
def getFileInfo(self, fileId):
"""Mock file info"""
return {
"filename": f"test_file_{fileId}.txt",
"size": 1024,
"mimeType": "text/plain"
}
def createFile(self, fileName, mimeType, content, base64encoded=False):
"""Mock file creation"""
return f"file_id_{fileName}"
def createDocument(self, fileName, mimeType, content, base64encoded=False):
"""Mock document creation"""
class MockDocument:
def __init__(self, name, mime, cont):
self.filename = name
self.mimeType = mime
self.content = cont
self.fileSize = len(cont)
return MockDocument(fileName, mimeType, content)
def getFileExtension(self, filename):
"""Mock file extension extraction"""
return filename.split('.')[-1] if '.' in filename else 'txt'
def getMimeTypeFromExtension(self, extension):
"""Mock MIME type detection"""
mime_types = {
'txt': 'text/plain',
'pdf': 'application/pdf',
'doc': 'application/msword',
'json': 'application/json'
}
return mime_types.get(extension, 'application/octet-stream')
def detectContentTypeFromData(self, file_bytes, filename):
"""Mock content type detection"""
if filename.endswith('.txt'):
return 'text/plain'
elif filename.endswith('.pdf'):
return 'application/pdf'
elif filename.endswith('.json'):
return 'application/json'
return 'application/octet-stream'
async def callAiTextBasic(self, prompt):
"""Mock AI call"""
return '{"actions": [{"method": "document", "action": "extract", "parameters": {"documentList": ["test"], "aiPrompt": "Test prompt"}, "resultLabel": "task1_action1_test", "description": "Test action"}]}'
async def callAiTextAdvanced(self, prompt):
"""Mock advanced AI call"""
return '{"overview": "Test plan", "tasks": [{"id": "task_1", "description": "Test task", "dependencies": [], "expected_outputs": ["output1"], "success_criteria": ["criteria1"], "required_documents": [], "estimated_complexity": "low", "ai_prompt": "Test prompt"}]}'
async def executeAction(self, methodName, actionName, parameters):
"""Mock action execution"""
class MockResult:
def __init__(self):
self.success = True
self.data = {
"result": "Mock execution result",
"documents": []
}
self.error = None
return MockResult()
async def test_retry_enhancement():
"""Test the retry enhancement functionality"""
logger.info("Testing retry enhancement in managerChat.py")
# Create mock objects
mock_user = User(id="test_user", username="testuser", email="test@example.com", mandateId="test_mandate")
mock_chat_objects = MockChatObjects()
mock_workflow = ChatWorkflow(
id="test_workflow",
userId="test_user",
status="active",
messages=[],
createdAt="2024-01-01T00:00:00Z",
updatedAt="2024-01-01T00:00:00Z",
mandateId="test_mandate",
currentRound=1,
lastActivity="2024-01-01T00:00:00Z",
startedAt="2024-01-01T00:00:00Z"
)
# Create chat manager
chat_manager = ChatManager(mock_user, mock_chat_objects)
# Mock the service container directly instead of initializing
chat_manager.service = MockServiceContainer(mock_user, mock_workflow)
chat_manager.workflow = mock_workflow
# Test 1: Basic action definition without retry
logger.info("Test 1: Basic action definition")
task_step = {
"id": "task_1",
"description": "Test task",
"expected_outputs": ["output1"],
"success_criteria": ["criteria1"],
"ai_prompt": "Test AI prompt"
}
actions = await chat_manager.defineTaskActions(task_step, mock_workflow, [])
logger.info(f"Generated {len(actions)} actions without retry context")
# Test 2: Action definition with retry context
logger.info("Test 2: Action definition with retry context")
enhanced_context = {
'task_step': task_step,
'workflow': mock_workflow,
'workflow_id': mock_workflow.id,
'available_documents': ["test_doc.txt"],
'previous_results': ["task0_action1_results"],
'improvements': "Previous attempt failed - ensure comprehensive extraction",
'retry_count': 1,
'previous_action_results': [
{
'actionMethod': 'document',
'actionName': 'extract',
'status': 'failed',
'error': 'Empty result returned',
'result': 'No content extracted',
'resultLabel': 'task1_action1_failed'
}
],
'previous_review_result': {
'status': 'retry',
'reason': 'Incomplete extraction',
'quality_score': 3,
'missing_outputs': ['detailed_analysis'],
'unmet_criteria': ['comprehensive_coverage']
}
}
retry_actions = await chat_manager.defineTaskActions(task_step, mock_workflow, [], enhanced_context)
logger.info(f"Generated {len(retry_actions)} actions with retry context")
# Test 3: Verify retry context is properly handled
logger.info("Test 3: Verifying retry context handling")
# Create a test prompt to see if retry context is included
test_prompt = await chat_manager._createActionDefinitionPrompt(enhanced_context)
# Check if retry context is in the prompt
if "RETRY CONTEXT" in test_prompt:
logger.info("✓ Retry context properly included in prompt")
else:
logger.error("✗ Retry context not found in prompt")
if "Previous action results that failed" in test_prompt:
logger.info("✓ Previous action results included in prompt")
else:
logger.error("✗ Previous action results not found in prompt")
if "Previous review feedback" in test_prompt:
logger.info("✓ Previous review feedback included in prompt")
else:
logger.error("✗ Previous review feedback not found in prompt")
if "Previous attempt failed" in test_prompt:
logger.info("✓ Improvements needed included in prompt")
else:
logger.error("✗ Improvements needed not found in prompt")
# Test 4: Verify fallback actions with retry context
logger.info("Test 4: Testing fallback actions with retry context")
fallback_actions = chat_manager._createFallbackActions(task_step, enhanced_context)
logger.info(f"Generated {len(fallback_actions)} fallback actions with retry context")
# Check if fallback actions include retry information
if any("retry" in action.get("resultLabel", "") for action in fallback_actions):
logger.info("✓ Fallback actions include retry information")
else:
logger.error("✗ Fallback actions missing retry information")
logger.info("Retry enhancement test completed successfully!")
if __name__ == "__main__":
asyncio.run(test_retry_enhancement())
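The assertions above only check for marker strings in the generated prompt; as a rough illustration of the prompt assembly they imply, a builder might fold the retry context in like this (the function name and layout are assumptions for illustration, not the real managerChat implementation):

```python
def build_retry_prompt(base_prompt: str, context: dict) -> str:
    """Append retry context sections to an action-definition prompt."""
    parts = [base_prompt]
    if context.get('retry_count', 0) > 0:
        parts.append("RETRY CONTEXT")
        if context.get('previous_action_results'):
            parts.append("Previous action results that failed:")
            for res in context['previous_action_results']:
                parts.append(
                    f"- {res.get('actionMethod')}.{res.get('actionName')}: "
                    f"{res.get('error', '')}"
                )
        if context.get('previous_review_result'):
            review = context['previous_review_result']
            parts.append(f"Previous review feedback: {review.get('reason', '')}")
        if context.get('improvements'):
            parts.append(f"Improvements needed: {context['improvements']}")
    return "\n".join(parts)
```

With a context like the `enhanced_context` dict used in the tests, every marker string the assertions look for would appear in the result; with an empty context the base prompt passes through unchanged.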

test_sample_document.txt (new file, 47 lines)
@@ -0,0 +1,47 @@
PowerOn System Architecture Overview
This document provides a comprehensive overview of the PowerOn system architecture, including its key components, data flow, and technical specifications.
MAJOR TOPICS:
1. System Architecture
- Frontend Agents: Web-based user interface components
- Gateway: Central API and workflow management system
- Database: JSON-based data storage with component interfaces
- AI Integration: Anthropic and OpenAI connectors for intelligent processing
2. Core Components
- Document Manager: Handles file processing and content extraction
- Workflow Manager: Orchestrates complex business processes
- Service Container: Provides unified access to all system services
- Neutralizer: Data anonymization and privacy protection
3. Data Flow Architecture
- User authentication and authorization
- Document upload and processing pipeline
- AI-powered content analysis and extraction
- Workflow execution and task management
- Result generation and storage
4. Technical Specifications
- Python-based backend with async/await support
- RESTful API design with JSON data exchange
- Modular component architecture
- Extensible method system for business logic
- Comprehensive logging and monitoring
5. Security Features
- Multi-authentication authority support (Local, Microsoft, Google)
- Token-based session management
- Data encryption and anonymization
- Role-based access control
- Audit trail and compliance features
6. Integration Capabilities
- SharePoint document management
- Email system integration (Outlook)
- Web crawling and data collection
- AI service integration (Anthropic, OpenAI)
- Custom method development framework
The PowerOn system is designed to provide a comprehensive platform for intelligent document processing, workflow automation, and AI-powered business process management. It combines modern web technologies with advanced AI capabilities to deliver a robust and scalable solution for enterprise document management and workflow automation.