# Pydantic Class Enhancement Proposal

## Format Tracking & Validation Alignment

**Date:** 2025-11-02

**Purpose:** Align validation logic with prompt requirements, enable workflow-level validation, and track expected file formats

**Simplified Approach:** Use existing document metadata (name, size, format, mimeType) - no summary fields needed

---

## Executive Summary

This proposal addresses:

1. **Validation alignment**: What prompts ask for matches what validators check
2. **Workflow-level validation**: Check ALL deliverables from ALL tasks against the original user request
3. **Format tracking**: Track expected formats (list) at workflow and task levels
4. **Adaptive task planning**: The next task uses ALL workflow data (messages, document metadata) to refine its objective

**Key Simplification:** Actions deliver documents with metadata (as today). No summary fields needed - use existing document metadata.

---

## 1. ActionResult Class Changes

**File:** `gateway/modules/datamodels/datamodelChat.py` (lines 483-521)

### NO CHANGES NEEDED

**Current Structure (KEEP ALL - ALL USED):**

- ✅ `success: bool` - Used by validation
- ✅ `error: Optional[str]` - Used for error handling
- ✅ `documents: List[ActionDocument]` - Contains document metadata (name, data, mimeType)
- ✅ `resultLabel: Optional[str]` - Used for document routing

**Documents already provide all needed metadata:**

- `documentName` - File name
- `documentData` - Content
- `mimeType` - MIME type (can derive format from this)

**No summary field needed** - document metadata is sufficient.

---

## 2. TaskResult Class Changes

**File:** `gateway/modules/datamodels/datamodelChat.py` (lines 718-736)

### NO CHANGES NEEDED

**Current Structure (KEEP ALL - ALL USED):**

- ✅ `taskId: str` - Task identification
- ✅ `status: TaskStatus` - Task status tracking
- ✅ `success: bool` - Success flag
- ✅ `feedback: Optional[str]` - Task feedback
- ✅ `error: Optional[str]` - Error message

**Document metadata available from workflow:**

- Delivered formats can be extracted from the documents in workflow messages
- No need to store them separately - use existing document metadata

---

## 3. TaskStep Class Changes

**File:** `gateway/modules/datamodels/datamodelChat.py` (lines 790-825)

### Modify

- Change `expectedFormat: Optional[str]` → `expectedFormats: Optional[List[str]]`
- Keep `dataType` and `qualityRequirements` as-is

### Modified Class:

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field


class TaskStep(BaseModel):
    id: str
    objective: str
    dependencies: Optional[list[str]] = Field(default_factory=list)
    successCriteria: Optional[list[str]] = Field(default_factory=list)
    estimatedComplexity: Optional[str] = None
    userMessage: Optional[str] = Field(
        None, description="User-friendly message in user's language"
    )
    # Format details extracted from intent analysis
    dataType: Optional[str] = Field(
        None, description="Expected data type (text, numbers, documents, etc.)"
    )
    # Replaces the former single-value `expectedFormat: Optional[str]`
    expectedFormats: Optional[List[str]] = Field(
        None,
        description="Expected output file format extensions (e.g., ['docx', 'pdf', 'xlsx']). Use actual file extensions, not conceptual terms."
    )
    qualityRequirements: Optional[Dict[str, Any]] = Field(
        None, description="Quality requirements and constraints"
    )
```

### Register Labels Update:

```python
"expectedFormats": {"en": "Expected Formats", "fr": "Formats attendus"}
```

---

## 4. ChatWorkflow Class Changes (for Workflow-Level Tracking)

**File:** `gateway/modules/datamodels/datamodelChat.py` (find ChatWorkflow class)

### Add (if it does not already exist)

```python
expectedFormats: Optional[List[str]] = Field(
    None,
    description="List of expected file format extensions from user request (e.g., ['xlsx', 'pdf']). Extracted during intent analysis."
)
```

Note: `_workflowIntent` is already stored as a dict (not a model field), so `expectedFormats` could be extracted from there, but an explicit field is easier to query.

---

## 5. ActionItem Class Review

**File:** `gateway/modules/datamodels/datamodelChat.py` (lines 652-715)

### Current Structure (ALL USED - KEEP):

- ✅ `id: str` - Used for action identification
- ✅ `execMethod: str` - Used for action execution
- ✅ `execAction: str` - Used for action execution
- ✅ `execParameters: Dict[str, Any]` - Used for action execution
- ✅ `execResultLabel: Optional[str]` - Used for document routing
- ✅ `expectedDocumentFormats: Optional[List[Dict[str, str]]]` - Used by action planning
- ✅ `userMessage: Optional[str]` - Used for user communication
- ✅ `status: TaskStatus` - Used for tracking
- ✅ `error: Optional[str]` - Used for error handling
- ✅ `retryCount: int` - Used for retry logic
- ✅ `retryMax: int` - Used for retry logic
- ✅ `processingTime: Optional[float]` - Used for performance tracking
- ✅ `timestamp: float` - Used for ordering/auditing
- ✅ `result: Optional[str]` - Used to store action result text

**NO CHANGES NEEDED** - All attributes are used

---

## 6. Summary of Changes

### Classes to Modify:

1. ✅ **TaskStep** - Change `expectedFormat` (str) → `expectedFormats` (List[str])
2. ✅ **ChatWorkflow** - Add `expectedFormats` (optional, for explicit tracking)

### Classes to Review (NO CHANGES):

- ✅ **ActionResult** - Keep as-is, documents already have metadata
- ✅ **TaskResult** - Keep as-is, no summary needed
- ✅ **ActionDocument** - Already correct (documentName, documentData, mimeType)
- ✅ **ActionItem** - All attributes used
- ✅ **Observation** - Already has contentValidation field
- ✅ **TaskItem** - Used for database storage, separate from TaskStep

---

## 7. Implementation Impact

### Files That Will Need Updates:

1. **datamodelChat.py** - Class definitions (this proposal)
   - Change `expectedFormat` → `expectedFormats` in TaskStep
   - Add `expectedFormats` to ChatWorkflow (optional)

2. **taskPlanner.py**
   - Populate `expectedFormats` list instead of single `expectedFormat`
   - **Adaptive planning:** Use ALL workflow data (messages, document metadata) to refine next task objective
     - Extract delivered formats from workflow documents
     - Compare what was delivered vs. what was planned

3. **contentValidator.py**
   - Use `expectedFormats` list for validation
   - **Action-level validation:** Check action results against task objective (already exists)
   - **Task-level validation:** Validate THIS task's deliverables against THIS task's expectations
     - Uses document metadata (name, size, format, mimeType) - no summaries needed

4. **intentAnalyzer.py**
   - Fix prompt to ask for actual file format extensions
   - Change from conceptual terms ("raw_data", "formatted") to actual extensions ("pdf", "docx", "xlsx")

5. **promptGenerationTaskplan.py**
   - Ask for `expectedFormats` in task planning
   - **Adaptive planning:** Include ALL workflow data (messages, document names/sizes/formats/metadata) when planning next task
     - Show what was actually delivered to help refine objective

6. **workflowManager.py**
   - Pass ALL workflow data to next task planning
     - Messages (text content)
     - Document metadata (names, sizes, formats, mimeTypes)
     - Validation results

### Key Implementation Points:

- **No summary fields:** Use existing document metadata (name, size, format, mimeType)
- **Adaptive task planning:** Next task receives ALL workflow data (messages + document metadata) to refine objective
- **Validation scope:** Task validation checks ONLY that task's actions, not all workflow actions
- **After each action:** Validate against task objective → decide if complete or next action needed

---

## 8. Validation Logic Alignment

### Action-Level Validation (Within Task):

- **When:** After each action execution within a task
- **Checks:** Action results against task objective
- **Against:** Action documents (name, size, format, mimeType metadata)
- **Purpose:** Decide if task is complete or next action needed
- **Triggers:** Continue to next action if incomplete, complete task if done

### Task Planning (Adaptive - Uses ALL Workflow Data):

- **Input:** ALL workflow data available:
  - All messages (text content)
  - All document metadata (names, sizes, formats/extensions, mimeTypes)
  - Previous task validation results
- **Process:**
  - Extract delivered formats from all workflow documents
  - Compare what was ACTUALLY delivered vs. what was PLANNED
  - Refine next task objective:
    - Deliver MORE if previous tasks delivered less than expected
    - Deliver LESS if previous tasks already delivered more
    - Adapt to actual workflow progress

### Task-Level Validation (Task Completion):

- **When:** After ALL actions in a task complete
- **Checks:** Task objective, task `expectedFormats`, task `successCriteria`
- **Against:** Documents from THIS task only (extract formats from document metadata)
- **Purpose:** Verify THIS task delivered what was expected for THIS task scope
- **Output:** Validation result (used in workflow data for next task planning)

### Workflow-Level Validation (Final):

- **When:** After ALL tasks complete
- **Checks:** Original user request, workflow `expectedFormats`, workflow success criteria
- **Against:** ALL documents from ALL tasks (extract formats from document metadata)
- **Purpose:** Final verification that the complete workflow delivered what the user requested
- **Triggers:** New compensatory task if validation fails (missing deliverables)

---

## 9. Next Steps

1. **Review and approve this proposal**
2. **Implement class changes** in datamodelChat.py
3. **Update intent analyzer prompt** to request actual file format extensions
4. **Update task planning prompt** to request `expectedFormats` list
5. **Update adaptive planning** to pass ALL workflow data (messages, document metadata, validation results) to next task planning
6. **Implement format extraction** from document metadata (mimeType or file extensions) at task/workflow levels
7. **Implement workflow-level validation** method
8. **Update all references** from `expectedFormat` to `expectedFormats`

---

## Questions Answered

- ✅ **Document metadata:** Use existing document fields (name, size, format from mimeType/extensions) - no summaries needed
- ✅ **Format extraction:** Extract formats from document metadata (mimeType or file extensions)
- ✅ **Task validation scope:** Task validation checks ONLY actions in that task, not all workflow actions
- ✅ **Adaptive planning:** Next task uses ALL workflow data (messages + document metadata) to refine objective
- ✅ **After each action:** Validate against task objective → decide complete or next action needed

---

## 10. Validation Flow Clarification

### Simplified Flow:

1. **Within Task (Action-by-Action):**
   - Action executes → delivers documents with metadata
   - Validate action results against task objective
   - If incomplete → next action needed
   - If complete → task done

2. **Task Planning (Adaptive):**
   - Receives: ALL workflow data (messages, document metadata from all previous tasks)
   - Extracts: Delivered formats from document metadata (file extensions/mimeTypes)
   - Compares: What was actually delivered vs. what was planned
   - Refines: Next task objective (may need more/less based on actual progress)

3. **Task Completion:**
   - Validate: THIS task's documents (extract formats from metadata) against THIS task's expectations
   - Result: Used in workflow data for next task planning

4. **Workflow Completion:**
   - Final validation: All documents (extract formats from metadata) meet original user request
   - If missing: Create compensatory task

---

**Status:** Ready for implementation after approval
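The format extraction and format comparison described in sections 8 and 10 could be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: the function names `extract_format` and `find_missing_formats`, the `MIME_TO_FORMAT` fallback table, and the use of plain dicts keyed by `documentName` and `mimeType` in place of `ActionDocument` instances are all assumptions made for the example.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Optional

# Hypothetical fallback table for documents whose name has no extension.
# Not part of the proposal; extend to whatever formats the gateway produces.
MIME_TO_FORMAT: Dict[str, str] = {
    "application/pdf": "pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "xlsx",
    "text/csv": "csv",
    "text/plain": "txt",
}


def extract_format(document_name: str, mime_type: Optional[str] = None) -> Optional[str]:
    """Return a lowercase extension without the dot, or None if unknown."""
    suffix = PurePosixPath(document_name).suffix.lower().lstrip(".")
    if suffix:
        return suffix
    if mime_type:
        # Drop parameters such as "; charset=utf-8" before the lookup.
        base_type = mime_type.split(";")[0].strip().lower()
        return MIME_TO_FORMAT.get(base_type)
    return None


def find_missing_formats(
    expected_formats: Optional[Iterable[str]],
    documents: Iterable[Dict[str, str]],
) -> List[str]:
    """Return expected formats that no document delivers, in request order."""
    if not expected_formats:
        return []
    delivered = {
        extract_format(doc.get("documentName", ""), doc.get("mimeType"))
        for doc in documents
    }
    missing: List[str] = []
    for fmt in expected_formats:
        normalized = fmt.lower().lstrip(".")
        if normalized not in delivered and normalized not in missing:
            missing.append(normalized)
    return missing


# Task-level check: only THIS task's documents are passed in.
task_docs = [{"documentName": "report.PDF", "mimeType": "application/pdf"}]
task_missing = find_missing_formats(["pdf"], task_docs)

# Workflow-level check: ALL documents from ALL tasks are passed in.
all_docs = task_docs + [{"documentName": "figures", "mimeType": "text/csv"}]
workflow_missing = find_missing_formats(["pdf", "xlsx"], all_docs)
```

The same comparison serves both validation scopes, and only the set of documents passed in differs. Under this sketch, a non-empty result at workflow level would be the signal to create a compensatory task, and at task level it would feed into the validation result used for the next task's planning.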