gateway/modules/datamodels/PROPOSAL_CLASS_ENHANCEMENTS.md

Pydantic Class Enhancement Proposal

Format Tracking & Validation Alignment

Date: 2025-11-02
Purpose: Align validation logic with prompt requirements, enable workflow-level validation, and track expected file formats

Simplified Approach: Use existing document metadata (name, size, format, mimeType) - no summary fields needed


Executive Summary

This proposal addresses:

  1. Validation alignment: What prompts ask for matches what validators check
  2. Workflow-level validation: Check ALL deliverables from ALL tasks against original user request
  3. Format tracking: Track expected formats (list) at workflow and task levels
  4. Adaptive task planning: Next task uses ALL workflow data (messages, document metadata) to refine objective

Key Simplification: Actions deliver documents with metadata (as today). No summary fields needed - use existing document metadata.


1. ActionResult Class Changes

File: gateway/modules/datamodels/datamodelChat.py (lines 483-521)

NO CHANGES NEEDED

Current Structure (KEEP ALL - ALL USED):

  • success: bool - Used by validation
  • error: Optional[str] - Used for error handling
  • documents: List[ActionDocument] - Contains document metadata (name, data, mimeType)
  • resultLabel: Optional[str] - Used for document routing

Documents already provide all needed metadata:

  • documentName - File name
  • documentData - Content
  • mimeType - MIME type (can derive format from this)

No summary field needed - document metadata is sufficient.
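As a rough illustration of deriving a format from existing document metadata, the sketch below prefers the file extension in documentName and falls back to the MIME type. The helper name derive_format and the extension-first order are assumptions made for this example, not part of the current codebase.

```python
import mimetypes
from pathlib import PurePath
from typing import Optional


def derive_format(document_name: str, mime_type: Optional[str] = None) -> Optional[str]:
    """Return a lowercase format extension such as 'pdf', or None if unknown.

    Hypothetical helper: checks the file name first, then the MIME type.
    """
    suffix = PurePath(document_name).suffix  # e.g. ".PDF" or ""
    if suffix:
        return suffix.lstrip(".").lower()
    if mime_type:
        # Standard-library lookup; returns e.g. ".pdf" or None.
        guessed = mimetypes.guess_extension(mime_type)
        if guessed:
            return guessed.lstrip(".").lower()
    return None
```

Checking the name first keeps the result aligned with what the user sees in the file name; the MIME type only fills the gap when the name has no extension.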


2. TaskResult Class Changes

File: gateway/modules/datamodels/datamodelChat.py (lines 718-736)

NO CHANGES NEEDED

Current Structure (KEEP ALL - ALL USED):

  • taskId: str - Task identification
  • status: TaskStatus - Task status tracking
  • success: bool - Success flag
  • feedback: Optional[str] - Task feedback
  • error: Optional[str] - Error message

Document metadata available from workflow:

  • Can extract delivered formats from documents in workflow messages
  • No need to store separately - use existing document metadata

3. TaskStep Class Changes

File: gateway/modules/datamodels/datamodelChat.py (lines 790-825)

Modify

  • Change expectedFormat: Optional[str] → expectedFormats: Optional[List[str]]
  • Keep dataType and qualityRequirements as-is

Modified Class:

from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field


class TaskStep(BaseModel):
    id: str
    objective: str
    dependencies: Optional[List[str]] = Field(default_factory=list)
    successCriteria: Optional[List[str]] = Field(default_factory=list)
    estimatedComplexity: Optional[str] = None
    userMessage: Optional[str] = Field(
        None, description="User-friendly message in user's language"
    )
    # Format details extracted from intent analysis
    dataType: Optional[str] = Field(
        None, description="Expected data type (text, numbers, documents, etc.)"
    )
    # Replaces the former single-value expectedFormat: Optional[str]
    expectedFormats: Optional[List[str]] = Field(
        None, description="Expected output file format extensions (e.g., ['docx', 'pdf', 'xlsx']). Use actual file extensions, not conceptual terms."
    )
    qualityRequirements: Optional[Dict[str, Any]] = Field(
        None, description="Quality requirements and constraints"
    )
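Callers that still produce the old single-string expectedFormat need some way to feed the new list field. One possible approach, sketched here under the assumption that a small normalization helper is acceptable (normalize_expected_formats is a hypothetical name), is to accept either shape and return a cleaned list:

```python
from typing import Any, List, Optional


def normalize_expected_formats(value: Any) -> Optional[List[str]]:
    """Turn a legacy string or a list into a list of lowercase extensions.

    Hypothetical helper: strips leading dots, removes duplicates while
    keeping order, and returns None when nothing usable remains.
    """
    if value is None:
        return None
    items = [value] if isinstance(value, str) else list(value)
    cleaned: List[str] = []
    for item in items:
        ext = str(item).strip().lstrip(".").lower()
        if ext and ext not in cleaned:
            cleaned.append(ext)
    return cleaned or None
```

Returning None for an empty result matches the field's default of None, so "no formats requested" has a single representation.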

Register Labels

Update:

"expectedFormats": {"en": "Expected Formats", "fr": "Formats attendus"}

4. ChatWorkflow Class Changes (for Workflow-Level Tracking)

File: gateway/modules/datamodels/datamodelChat.py (find ChatWorkflow class)

Add (if not already present)

expectedFormats: Optional[List[str]] = Field(
    None,
    description="List of expected file format extensions from user request (e.g., ['xlsx', 'pdf']). Extracted during intent analysis."
)

Note: _workflowIntent is already stored as a dict (not a model field), so expectedFormats can be extracted from there, but having it as an explicit field makes it easier to query.
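The fallback described in the note could look roughly like the sketch below. It assumes the intent dict stores the list under an "expectedFormats" key, which this proposal does not confirm; resolve_workflow_formats is a hypothetical helper name.

```python
from typing import Any, Dict, List, Optional


def resolve_workflow_formats(
    explicit: Optional[List[str]],
    workflow_intent: Optional[Dict[str, Any]],
) -> List[str]:
    """Prefer the explicit workflow field; fall back to the intent dict."""
    if explicit is not None:
        return list(explicit)
    # Assumption: the intent dict uses the key "expectedFormats".
    value = (workflow_intent or {}).get("expectedFormats")
    if isinstance(value, list):
        return [str(item) for item in value]
    return []
```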


5. ActionItem Class Review

File: gateway/modules/datamodels/datamodelChat.py (lines 652-715)

Current Structure (ALL USED - KEEP):

  • id: str - Used for action identification
  • execMethod: str - Used for action execution
  • execAction: str - Used for action execution
  • execParameters: Dict[str, Any] - Used for action execution
  • execResultLabel: Optional[str] - Used for document routing
  • expectedDocumentFormats: Optional[List[Dict[str, str]]] - Used by action planning
  • userMessage: Optional[str] - Used for user communication
  • status: TaskStatus - Used for tracking
  • error: Optional[str] - Used for error handling
  • retryCount: int - Used for retry logic
  • retryMax: int - Used for retry logic
  • processingTime: Optional[float] - Used for performance tracking
  • timestamp: float - Used for ordering/auditing
  • result: Optional[str] - Used to store action result text

NO CHANGES NEEDED - All attributes are used


6. Summary of Changes

Classes to Modify:

  1. TaskStep - Change expectedFormat (str) → expectedFormats (List[str])
  2. ChatWorkflow - Add expectedFormats (optional, for explicit tracking)

Classes to Review (NO CHANGES):

  • ActionResult - Keep as-is, documents already have metadata
  • TaskResult - Keep as-is, no summary needed
  • ActionDocument - Already correct (documentName, documentData, mimeType)
  • ActionItem - All attributes used
  • Observation - Already has contentValidation field
  • TaskItem - Used for database storage, separate from TaskStep

7. Implementation Impact

Files That Will Need Updates:

  1. datamodelChat.py - Class definitions (this proposal)

    • Change expectedFormat → expectedFormats in TaskStep
    • Add expectedFormats to ChatWorkflow (optional)
  2. taskPlanner.py - Populate expectedFormats list instead of single expectedFormat

    • Adaptive planning: Use ALL workflow data (messages, document metadata) to refine next task objective
    • Extract delivered formats from workflow documents
    • Compare what was delivered vs. what was planned
  3. contentValidator.py - Use expectedFormats list for validation

    • Action-level validation: Check action results against task objective (already exists)
    • Task-level validation: Validate THIS task's deliverables against THIS task's expectations
    • Uses document metadata (name, size, format, mimeType) - no summaries needed
  4. intentAnalyzer.py - Fix prompt to ask for actual file format extensions

    • Change from conceptual terms ("raw_data", "formatted") to actual extensions ("pdf", "docx", "xlsx")
  5. promptGenerationTaskplan.py - Ask for expectedFormats in task planning

    • Adaptive planning: Include ALL workflow data (messages, document names/sizes/formats/metadata) when planning next task
    • Show what was actually delivered to help refine objective
  6. workflowManager.py - Pass ALL workflow data to next task planning

    • Messages (text content)
    • Document metadata (names, sizes, formats, mimeTypes)
    • Validation results

Key Implementation Points:

  • No summary fields: Use existing document metadata (name, size, format, mimeType)
  • Adaptive task planning: Next task receives ALL workflow data (messages + document metadata) to refine objective
  • Validation scope: Task validation checks ONLY that task's actions, not all workflow actions
  • After each action: Validate against task objective → decide if complete or next action needed
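To make "ALL workflow data" concrete, the following sketch assembles messages, document metadata, and validation results into one planning input. The function name, the dict layout, and computing size from the length of documentData are assumptions for illustration; only documentName, documentData, and mimeType come from ActionDocument.

```python
from typing import Any, Dict, List


def build_planning_context(
    messages: List[str],
    documents: List[Dict[str, Any]],
    validation_results: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Collect workflow data for next-task planning (hypothetical shape)."""
    document_metadata = [
        {
            "name": doc["documentName"],
            # Assumption: size is the length of the str/bytes content.
            "size": len(doc["documentData"]),
            "mimeType": doc["mimeType"],
        }
        for doc in documents
    ]
    return {
        "messages": list(messages),
        "documents": document_metadata,
        "validationResults": list(validation_results),
    }
```

Only metadata is passed on, not the document content itself, which keeps the planning input small.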

8. Validation Logic Alignment

Action-Level Validation (Within Task):

  • When: After each action execution within a task
  • Checks: Task objective
  • Against: Action documents (name, size, format, mimeType metadata)
  • Purpose: Decide if task is complete or next action needed
  • Triggers: Continue to next action if incomplete, complete task if done
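The action-by-action loop can be sketched as follows. Actions are modeled as plain callables that return document dicts, and the objective check is passed in as a function; both are simplifications assumed for this example rather than the real ActionItem execution path.

```python
from typing import Callable, Dict, List, Tuple

Document = Dict[str, str]


def run_actions_until_complete(
    actions: List[Callable[[], List[Document]]],
    objective_met: Callable[[List[Document]], bool],
) -> Tuple[bool, List[Document]]:
    """Run actions in order and stop as soon as the objective is met."""
    delivered: List[Document] = []
    for action in actions:
        delivered.extend(action())
        # Validate after every action, not only at the end of the task.
        if objective_met(delivered):
            return True, delivered
    # All actions ran and the objective is still not met.
    return False, delivered
```

A False result signals that another action would be needed before the task can complete.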

Task Planning (Adaptive - Uses ALL Workflow Data):

  • Input: ALL workflow data available:
    • All messages (text content)
    • All document metadata (names, sizes, formats/extensions, mimeTypes)
    • Previous task validation results
  • Process:
    • Extract delivered formats from all workflow documents
    • Compare what was ACTUALLY delivered vs. what was PLANNED
    • Refine next task objective:
      • Deliver MORE if previous tasks delivered less than expected
      • Deliver LESS if previous tasks already delivered more
      • Adapt to actual workflow progress
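The delivered-versus-planned comparison reduces to set arithmetic on format extensions. A minimal sketch, with compare_formats and its result keys chosen for illustration only:

```python
from typing import Dict, Iterable, List


def compare_formats(planned: Iterable[str], delivered: Iterable[str]) -> Dict[str, List[str]]:
    """Compare planned and delivered format extensions, case-insensitively."""
    planned_set = {fmt.lower() for fmt in planned}
    delivered_set = {fmt.lower() for fmt in delivered}
    return {
        "missing": sorted(planned_set - delivered_set),    # deliver MORE
        "unplanned": sorted(delivered_set - planned_set),  # already delivered extra
        "matched": sorted(planned_set & delivered_set),
    }
```

A planner could use "missing" to widen the next objective and "unplanned" to avoid asking for something that already exists.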

Task-Level Validation (Task Completion):

  • When: After ALL actions in a task complete
  • Checks: Task objective, task expectedFormats, task successCriteria
  • Against: Documents from THIS task only (extract formats from document metadata)
  • Purpose: Verify THIS task delivered what was expected for THIS task scope
  • Output: Validation result (used in workflow data for next task planning)
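A task-scoped format check might look like the sketch below. It assumes each document dict carries a taskId so that documents from other tasks can be filtered out; that field, the function name, and the result shape are hypothetical. Objective and successCriteria checks are left out.

```python
from pathlib import PurePath
from typing import Any, Dict, List


def validate_task_formats(
    task_id: str,
    expected_formats: List[str],
    documents: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Check THIS task's documents against THIS task's expected formats."""
    # Assumption: documents are tagged with the task that produced them.
    task_docs = [doc for doc in documents if doc.get("taskId") == task_id]
    delivered = {
        PurePath(doc["documentName"]).suffix.lstrip(".").lower() for doc in task_docs
    }
    delivered.discard("")  # names without an extension carry no format
    missing = [fmt for fmt in expected_formats if fmt.lower() not in delivered]
    return {
        "taskId": task_id,
        "valid": not missing,
        "missingFormats": missing,
        "deliveredFormats": sorted(delivered),
    }
```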

Workflow-Level Validation (Final):

  • When: After ALL tasks complete
  • Checks: Original user request, workflow expectedFormats, workflow success criteria
  • Against: ALL documents from ALL tasks (extract formats from document metadata)
  • Purpose: Final verification that complete workflow delivered what user requested
  • Triggers: New compensatory task if validation fails (missing deliverables)
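The final check and its compensatory-task trigger can be sketched in the same style. This covers only the format part of workflow validation; checks against the original request and success criteria are not shown. The function name and the shape of the returned task dict are assumptions.

```python
from pathlib import PurePath
from typing import Any, Dict, List, Optional


def plan_compensatory_task(
    expected_formats: List[str],
    documents: List[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Return a follow-up task for missing formats, or None if all are present."""
    delivered = {
        PurePath(doc["documentName"]).suffix.lstrip(".").lower() for doc in documents
    }
    missing = [fmt for fmt in expected_formats if fmt.lower() not in delivered]
    if not missing:
        return None
    # Hypothetical task shape, loosely mirroring TaskStep fields.
    return {
        "objective": "Deliver missing formats: " + ", ".join(missing),
        "expectedFormats": missing,
    }
```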

9. Next Steps

  1. Review and approve this proposal
  2. Implement class changes in datamodelChat.py
  3. Update intent analyzer prompt to request actual file format extensions
  4. Update task planning prompt to request expectedFormats list
  5. Update task planning to receive all workflow data (messages, document metadata, validation results)
  6. Implement format extraction from document metadata at task and workflow levels
  7. Implement workflow-level validation method
  8. Update all references from expectedFormat to expectedFormats

Questions Answered

Document metadata: Use existing document fields (name, size, format from mimeType/extensions) - no summaries needed
Format extraction: Extract formats from document metadata (mimeType or file extensions)
Task validation scope: Task validation checks ONLY actions in that task, not all workflow actions
Adaptive planning: Next task uses ALL workflow data (messages + document metadata) to refine objective
After each action: Validate against task objective → decide complete or next action needed


10. Validation Flow Clarification

Simplified Flow:

  1. Within Task (Action-by-Action):

    • Action executes → delivers documents with metadata
    • Validate action results against task objective
    • If incomplete → next action needed
    • If complete → task done
  2. Task Planning (Adaptive):

    • Receives: ALL workflow data (messages, document metadata from all previous tasks)
    • Extracts: Delivered formats from document metadata (file extensions/mimeTypes)
    • Compares: What was actually delivered vs. what was planned
    • Refines: Next task objective (may need more/less based on actual progress)
  3. Task Completion:

    • Validate: THIS task's documents (extract formats from metadata) against THIS task's expectations
    • Result: Used in workflow data for next task planning
  4. Workflow Completion:

    • Final validation: All documents (extract formats from metadata) meet original user request
    • If missing: Create compensatory task

Status: Ready for implementation after approval