Pydantic Class Enhancement Proposal
Format Tracking & Validation Alignment
Date: 2025-11-02
Purpose: Align validation logic with prompt requirements, enable workflow-level validation, and track expected file formats
Simplified Approach: Use existing document metadata (name, size, format, mimeType) - no summary fields needed
Executive Summary
This proposal addresses:
- Validation alignment: What prompts ask for matches what validators check
- Workflow-level validation: Check ALL deliverables from ALL tasks against original user request
- Format tracking: Track expected formats (list) at workflow and task levels
- Adaptive task planning: Next task uses ALL workflow data (messages, document metadata) to refine objective
Key Simplification: Actions deliver documents with metadata (as today). No summary fields needed - use existing document metadata.
1. ActionResult Class Changes
File: gateway/modules/datamodels/datamodelChat.py (lines 483-521)
NO CHANGES NEEDED
Current Structure (KEEP ALL - ALL USED):
- ✅ success: bool - Used by validation
- ✅ error: Optional[str] - Used for error handling
- ✅ documents: List[ActionDocument] - Contains document metadata (name, data, mimeType)
- ✅ resultLabel: Optional[str] - Used for document routing
Documents already provide all needed metadata:
- documentName - File name
- documentData - Content
- mimeType - MIME type (can derive format from this)
No summary field needed - document metadata is sufficient.
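As a rough illustration of how a format could be derived from the existing metadata, the sketch below prefers the file extension in documentName and falls back to mimeType. The helper name derive_format is hypothetical and not part of datamodelChat.py; the fallback relies on Python's standard mimetypes table, which may not know every MIME type the gateway produces.

```python
import mimetypes
from pathlib import PurePath
from typing import Optional


def derive_format(document_name: str, mime_type: Optional[str] = None) -> Optional[str]:
    """Return a lowercase extension without the dot, or None if it cannot be derived.

    Hypothetical helper for illustration only.
    """
    # Prefer the extension carried by the document name.
    suffix = PurePath(document_name).suffix
    if suffix:
        return suffix.lstrip(".").lower()
    # Fall back to the MIME type via the standard library's lookup table.
    if mime_type:
        guessed = mimetypes.guess_extension(mime_type)
        if guessed:
            return guessed.lstrip(".").lower()
    return None
```

Preferring the name over the MIME type is a design assumption: names usually carry the extension the user asked for, while MIME types can map to several extensions.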
2. TaskResult Class Changes
File: gateway/modules/datamodels/datamodelChat.py (lines 718-736)
NO CHANGES NEEDED
Current Structure (KEEP ALL - ALL USED):
- ✅ taskId: str - Task identification
- ✅ status: TaskStatus - Task status tracking
- ✅ success: bool - Success flag
- ✅ feedback: Optional[str] - Task feedback
- ✅ error: Optional[str] - Error message
Document metadata available from workflow:
- Can extract delivered formats from documents in workflow messages
- No need to store separately - use existing document metadata
3. TaskStep Class Changes
File: gateway/modules/datamodels/datamodelChat.py (lines 790-825)
Modify
- Change expectedFormat: Optional[str] → expectedFormats: Optional[List[str]]
- Keep dataType and qualityRequirements as-is
Modified Class:
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field


class TaskStep(BaseModel):
    id: str
    objective: str
    dependencies: Optional[list[str]] = Field(default_factory=list)
    successCriteria: Optional[list[str]] = Field(default_factory=list)
    estimatedComplexity: Optional[str] = None
    userMessage: Optional[str] = Field(
        None, description="User-friendly message in user's language"
    )
    # Format details extracted from intent analysis
    dataType: Optional[str] = Field(
        None, description="Expected data type (text, numbers, documents, etc.)"
    )
    # Changed from expectedFormat: Optional[str] to a list of extensions
    expectedFormats: Optional[List[str]] = Field(
        None, description="Expected output file format extensions (e.g., ['docx', 'pdf', 'xlsx']). Use actual file extensions, not conceptual terms."
    )
    qualityRequirements: Optional[Dict[str, Any]] = Field(
        None, description="Quality requirements and constraints"
    )
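During the migration, callers or stored plans may still carry a single expectedFormat string. The sketch below shows one way such values could be normalized into the new list shape before they reach TaskStep. The function normalize_expected_formats is hypothetical, and whether legacy values actually need to be accepted is an assumption, not something this proposal specifies.

```python
from typing import List, Optional, Union


def normalize_expected_formats(
    value: Union[None, str, List[str]],
) -> Optional[List[str]]:
    """Coerce a legacy string or a list into a clean list of extensions.

    Hypothetical helper: lowercases, strips leading dots and whitespace,
    and removes duplicates while keeping the original order.
    """
    if value is None:
        return None
    items = [value] if isinstance(value, str) else list(value)
    cleaned: List[str] = []
    for item in items:
        ext = item.strip().lstrip(".").lower()
        if ext and ext not in cleaned:
            cleaned.append(ext)
    # Treat an empty result the same as "no expectation".
    return cleaned or None
```

For example, a legacy value of "PDF" would become a one-element list, and a list containing both ".docx" and "docx" would collapse to a single entry.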
Register Labels
Update:
"expectedFormats": {"en": "Expected Formats", "fr": "Formats attendus"}
4. ChatWorkflow Class Changes (for Workflow-Level Tracking)
File: gateway/modules/datamodels/datamodelChat.py (find ChatWorkflow class)
Add (if not exists)
expectedFormats: Optional[List[str]] = Field(
None,
description="List of expected file format extensions from user request (e.g., ['xlsx', 'pdf']). Extracted during intent analysis."
)
Note: _workflowIntent is already stored as a dict (not a model field), so expectedFormats can be extracted from there, but having it as an explicit field makes it easier to query.
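A minimal sketch of how a reader of the workflow might resolve the expected formats, preferring the explicit field and falling back to the intent dict. The function name and the assumption that _workflowIntent stores the list under an "expectedFormats" key are both hypothetical; the actual key used by the intent analyzer may differ.

```python
from typing import Any, Dict, List, Optional


def resolve_workflow_formats(
    explicit_formats: Optional[List[str]],
    workflow_intent: Optional[Dict[str, Any]],
) -> List[str]:
    """Return the workflow's expected formats, or an empty list if unknown."""
    # The explicit model field wins when it is populated.
    if explicit_formats:
        return list(explicit_formats)
    # Assumed key name inside the intent dict (not confirmed by the proposal).
    if workflow_intent:
        value = workflow_intent.get("expectedFormats")
        if isinstance(value, list):
            return [str(v) for v in value]
    return []
```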
5. ActionItem Class Review
File: gateway/modules/datamodels/datamodelChat.py (lines 652-715)
Current Structure (ALL USED - KEEP):
- ✅ id: str - Used for action identification
- ✅ execMethod: str - Used for action execution
- ✅ execAction: str - Used for action execution
- ✅ execParameters: Dict[str, Any] - Used for action execution
- ✅ execResultLabel: Optional[str] - Used for document routing
- ✅ expectedDocumentFormats: Optional[List[Dict[str, str]]] - Used by action planning
- ✅ userMessage: Optional[str] - Used for user communication
- ✅ status: TaskStatus - Used for tracking
- ✅ error: Optional[str] - Used for error handling
- ✅ retryCount: int - Used for retry logic
- ✅ retryMax: int - Used for retry logic
- ✅ processingTime: Optional[float] - Used for performance tracking
- ✅ timestamp: float - Used for ordering/auditing
- ✅ result: Optional[str] - Used to store action result text
NO CHANGES NEEDED - All attributes are used
6. Summary of Changes
Classes to Modify:
- ✅ TaskStep - Change expectedFormat (str) → expectedFormats (List[str])
- ✅ ChatWorkflow - Add expectedFormats (optional, for explicit tracking)
Classes to Review (NO CHANGES):
- ✅ ActionResult - Keep as-is, documents already have metadata
- ✅ TaskResult - Keep as-is, no summary needed
- ✅ ActionDocument - Already correct (documentName, documentData, mimeType)
- ✅ ActionItem - All attributes used
- ✅ Observation - Already has contentValidation field
- ✅ TaskItem - Used for database storage, separate from TaskStep
7. Implementation Impact
Files That Will Need Updates:
- datamodelChat.py - Class definitions (this proposal)
  - Change expectedFormat → expectedFormats in TaskStep
  - Add expectedFormats to ChatWorkflow (optional)
- taskPlanner.py - Populate expectedFormats list instead of single expectedFormat
  - Adaptive planning: Use ALL workflow data (messages, document metadata) to refine next task objective
  - Extract delivered formats from workflow documents
  - Compare what was delivered vs. what was planned
- contentValidator.py - Use expectedFormats list for validation
  - Action-level validation: Check action results against task objective (already exists)
  - Task-level validation: Validate THIS task's deliverables against THIS task's expectations
  - Uses document metadata (name, size, format, mimeType) - no summaries needed
- intentAnalyzer.py - Fix prompt to ask for actual file format extensions
  - Change from conceptual terms ("raw_data", "formatted") to actual extensions ("pdf", "docx", "xlsx")
- promptGenerationTaskplan.py - Ask for expectedFormats in task planning
  - Adaptive planning: Include ALL workflow data (messages, document names/sizes/formats/metadata) when planning next task
  - Show what was actually delivered to help refine objective
- workflowManager.py - Pass ALL workflow data to next task planning
  - Messages (text content)
  - Document metadata (names, sizes, formats, mimeTypes)
  - Validation results
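The intentAnalyzer.py item above asks for real extensions rather than conceptual terms. Besides fixing the prompt, the analyzer's output could be guarded with a check like the sketch below, which separates recognized extensions from anything else. The allowlist KNOWN_EXTENSIONS and the function split_formats are illustrative assumptions; the real set of supported formats would come from the gateway.

```python
from typing import Iterable, List, Tuple

# Illustrative allowlist only; the gateway's supported formats may differ.
KNOWN_EXTENSIONS = frozenset(
    {"pdf", "docx", "xlsx", "pptx", "csv", "txt", "json", "html", "md", "png", "jpg"}
)


def split_formats(values: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split raw analyzer output into (accepted extensions, rejected terms)."""
    accepted: List[str] = []
    rejected: List[str] = []
    for raw in values:
        candidate = raw.strip().lstrip(".").lower()
        if candidate in KNOWN_EXTENSIONS:
            if candidate not in accepted:
                accepted.append(candidate)
        else:
            # Keep the original text so it can be logged or sent back for a retry.
            rejected.append(raw)
    return accepted, rejected
```

Rejected terms such as "raw_data" could then trigger a re-prompt instead of silently ending up in expectedFormats.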
Key Implementation Points:
- No summary fields: Use existing document metadata (name, size, format, mimeType)
- Adaptive task planning: Next task receives ALL workflow data (messages + document metadata) to refine objective
- Validation scope: Task validation checks ONLY that task's actions, not all workflow actions
- After each action: Validate against task objective → decide if complete or next action needed
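To make "ALL workflow data" concrete, the following sketch assembles a planning context from messages, document metadata, and validation results. DocumentMeta, build_planning_context, and the dictionary keys are hypothetical names chosen for illustration; the actual structures in workflowManager.py are not shown in this proposal.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class DocumentMeta:
    """Hypothetical view of the metadata already attached to a document."""
    name: str
    size: int
    format: str
    mime_type: str


def build_planning_context(
    messages: List[str],
    documents: List[DocumentMeta],
    validation_results: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Bundle everything the next task's planning prompt would receive."""
    return {
        "messages": list(messages),
        "documents": [
            {"name": d.name, "size": d.size, "format": d.format, "mimeType": d.mime_type}
            for d in documents
        ],
        "validationResults": list(validation_results),
        # Derived from metadata only; no summary text is stored or generated.
        "deliveredFormats": sorted({d.format.lower() for d in documents}),
    }
```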
8. Validation Logic Alignment
Action-Level Validation (Within Task):
- When: After each action execution within a task
- Checks: Action results against task objective
- Against: Action documents (name, size, format, mimeType metadata)
- Purpose: Decide if task is complete or next action needed
- Triggers: Continue to next action if incomplete, complete task if done
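The action-level loop described above can be sketched as follows. Both the actions and the is_objective_met callback are stand-ins: in the gateway the check would be performed by the existing content validator, whose interface is not shown in this proposal.

```python
from typing import Callable, List, Sequence, Tuple


def run_task_actions(
    actions: Sequence[Callable[[], List[str]]],
    is_objective_met: Callable[[List[str]], bool],
) -> Tuple[bool, List[str]]:
    """Run actions in order and stop as soon as validation reports completion.

    Each action returns the names of the documents it delivered.
    Returns (task_complete, all_delivered_document_names).
    """
    delivered: List[str] = []
    for action in actions:
        delivered.extend(action())
        # Validate after every action; later actions are skipped once done.
        if is_objective_met(delivered):
            return True, delivered
    return False, delivered
```

Returning False when the action list is exhausted leaves the decision about retries or replanning to the caller.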
Task Planning (Adaptive - Uses ALL Workflow Data):
- Input: ALL workflow data available:
- All messages (text content)
- All document metadata (names, sizes, formats/extensions, mimeTypes)
- Previous task validation results
- Process:
- Extract delivered formats from all workflow documents
- Compare what was ACTUALLY delivered vs. what was PLANNED
- Refine next task objective:
- Deliver MORE if previous tasks delivered less than expected
- Deliver LESS if previous tasks already delivered more
- Adapt to actual workflow progress
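The delivered-versus-planned comparison in the process above reduces to a set difference over format extensions. The sketch below is one possible shape for it; FormatGap and compare_formats are hypothetical names, and how the planner turns the gap into a refined objective is left open.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FormatGap:
    missing: List[str]    # planned but not yet delivered -> deliver MORE
    unplanned: List[str]  # delivered without being planned -> may deliver LESS


def compare_formats(planned: Iterable[str], delivered: Iterable[str]) -> FormatGap:
    """Compare planned and delivered extensions, ignoring case."""
    planned_set = {p.lower() for p in planned}
    delivered_set = {d.lower() for d in delivered}
    return FormatGap(
        missing=sorted(planned_set - delivered_set),
        unplanned=sorted(delivered_set - planned_set),
    )
```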
Task-Level Validation (Task Completion):
- When: After ALL actions in a task complete
- Checks: Task objective, task expectedFormats, task successCriteria
- Against: Documents from THIS task only (extract formats from document metadata)
- Purpose: Verify THIS task delivered what was expected for THIS task scope
- Output: Validation result (used in workflow data for next task planning)
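A minimal sketch of the format part of task-level validation, restricted to the task's own documents. It assumes each delivered document can be attributed to a task via a task_id; that attribute, DeliveredDocument, TaskValidation, and validate_task are all hypothetical. Checks against the objective and successCriteria are outside this sketch.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import List


@dataclass
class DeliveredDocument:
    task_id: str        # assumed link between a document and its task
    document_name: str


@dataclass
class TaskValidation:
    task_id: str
    success: bool
    missing_formats: List[str]


def validate_task(
    task_id: str,
    expected_formats: List[str],
    documents: List[DeliveredDocument],
) -> TaskValidation:
    """Check THIS task's documents against THIS task's expectedFormats."""
    own = [d for d in documents if d.task_id == task_id]
    delivered = {PurePath(d.document_name).suffix.lstrip(".").lower() for d in own}
    missing = [f for f in expected_formats if f.lower() not in delivered]
    return TaskValidation(task_id=task_id, success=not missing, missing_formats=missing)
```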
Workflow-Level Validation (Final):
- When: After ALL tasks complete
- Checks: Original user request, workflow expectedFormats, workflow success criteria
- Against: ALL documents from ALL tasks (extract formats from document metadata)
- Purpose: Final verification that complete workflow delivered what user requested
- Triggers: New compensatory task if validation fails (missing deliverables)
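The final check and its trigger could look roughly like the sketch below: it collects formats from every document in the workflow and, if any expected format is missing, returns a description of a compensatory task. The function name, the dictionary shape, and the objective wording are hypothetical; only the format comparison is covered, not the check against the original user request.

```python
from pathlib import PurePath
from typing import Any, Dict, List, Optional


def plan_compensatory_task(
    workflow_expected_formats: List[str],
    all_document_names: List[str],
) -> Optional[Dict[str, Any]]:
    """Return a compensatory task description, or None if nothing is missing."""
    delivered = {PurePath(n).suffix.lstrip(".").lower() for n in all_document_names}
    missing = [f for f in workflow_expected_formats if f.lower() not in delivered]
    if not missing:
        return None
    return {
        # Placeholder wording; a real objective would come from the planner.
        "objective": "Deliver missing formats: " + ", ".join(missing),
        "expectedFormats": missing,
    }
```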
9. Next Steps
- Review and approve this proposal
- Implement class changes in datamodelChat.py
- Update intent analyzer prompt to request actual file format extensions
- Update task planning prompt to request expectedFormats list
- Update planning prompts to include workflow data (messages, document metadata) instead of summaries
- Implement extraction of delivered formats from document metadata at task and workflow levels
- Implement workflow-level validation method
- Update all references from expectedFormat to expectedFormats
Questions Answered
✅ Document metadata: Use existing document fields (name, size, format from mimeType/extensions) - no summaries needed
✅ Format extraction: Extract formats from document metadata (mimeType or file extensions)
✅ Task validation scope: Task validation checks ONLY actions in that task, not all workflow actions
✅ Adaptive planning: Next task uses ALL workflow data (messages + document metadata) to refine objective
✅ After each action: Validate against task objective → decide complete or next action needed
10. Validation Flow Clarification
Simplified Flow:
- Within Task (Action-by-Action):
  - Action executes → delivers documents with metadata
  - Validate action results against task objective
  - If incomplete → next action needed
  - If complete → task done
- Task Planning (Adaptive):
  - Receives: ALL workflow data (messages, document metadata from all previous tasks)
  - Extracts: Delivered formats from document metadata (file extensions/mimeTypes)
  - Compares: What was actually delivered vs. what was planned
  - Refines: Next task objective (may need more/less based on actual progress)
- Task Completion:
  - Validate: THIS task's documents (extract formats from metadata) against THIS task's expectations
  - Result: Used in workflow data for next task planning
- Workflow Completion:
  - Final validation: All documents (extract formats from metadata) meet original user request
  - If missing: Create compensatory task
Status: Ready for implementation after approval