gateway/modules/datamodels/PROPOSAL_CLASS_ENHANCEMENTS.md

# Pydantic Class Enhancement Proposal
## Format Tracking & Validation Alignment

**Date:** 2025-11-02
**Purpose:** Align validation logic with prompt requirements, enable workflow-level validation, and track expected file formats

**Simplified Approach:** Use existing document metadata (name, size, format, mimeType) - no summary fields needed

---

## Executive Summary

This proposal addresses:
1. **Validation alignment**: What prompts ask for matches what validators check
2. **Workflow-level validation**: Check ALL deliverables from ALL tasks against original user request
3. **Format tracking**: Track expected formats (list) at workflow and task levels
4. **Adaptive task planning**: Next task uses ALL workflow data (messages, document metadata) to refine objective

**Key Simplification:** Actions deliver documents with metadata (as today). No summary fields needed - use existing document metadata.

---

## 1. ActionResult Class Changes

**File:** `gateway/modules/datamodels/datamodelChat.py` (lines 483-521)

### NO CHANGES NEEDED

**Current Structure (KEEP ALL - ALL USED):**
- ✅ `success: bool` - Used by validation
- ✅ `error: Optional[str]` - Used for error handling
- ✅ `documents: List[ActionDocument]` - Contains document metadata (name, data, mimeType)
- ✅ `resultLabel: Optional[str]` - Used for document routing

**Documents already provide all needed metadata:**
- `documentName` - File name
- `documentData` - Content
- `mimeType` - MIME type (can derive format from this)

**No summary field needed** - document metadata is sufficient.
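
As a rough illustration of deriving a format from the existing metadata, the sketch below prefers the file extension in `documentName` and falls back to `mimeType`. The helper name `derive_format` and the `MIME_TO_FORMAT` table are assumptions made for this example and are not part of `datamodelChat.py`.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical fallback table; extend it with the MIME types the gateway actually emits.
MIME_TO_FORMAT = {
    "application/pdf": "pdf",
    "text/csv": "csv",
    "text/plain": "txt",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "xlsx",
}


def derive_format(document_name: str, mime_type: Optional[str] = None) -> Optional[str]:
    """Return a lowercase extension such as 'pdf', or None if it cannot be determined."""
    suffix = PurePath(document_name).suffix  # '.PDF' for 'report.PDF', '' if absent
    if suffix:
        return suffix.lstrip(".").lower()
    if mime_type:
        # Drop parameters such as '; charset=utf-8' before the lookup.
        return MIME_TO_FORMAT.get(mime_type.split(";")[0].strip().lower())
    return None
```

Whether the extension or the MIME type should win when both are present is a design choice this proposal leaves open; the sketch simply picks the extension first.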

---

## 2. TaskResult Class Changes

**File:** `gateway/modules/datamodels/datamodelChat.py` (lines 718-736)

### NO CHANGES NEEDED

**Current Structure (KEEP ALL - ALL USED):**
- ✅ `taskId: str` - Task identification
- ✅ `status: TaskStatus` - Task status tracking
- ✅ `success: bool` - Success flag
- ✅ `feedback: Optional[str]` - Task feedback
- ✅ `error: Optional[str]` - Error message

**Document metadata available from workflow:**
- Can extract delivered formats from documents in workflow messages
- No need to store separately - use existing document metadata
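
A minimal sketch of that extraction, assuming each workflow message can be read as a dict with an optional `documents` list whose entries carry `documentName` (the real message model may differ). The function name `delivered_formats` is hypothetical.

```python
from pathlib import PurePath
from typing import Any, Dict, Iterable, Set


def delivered_formats(messages: Iterable[Dict[str, Any]]) -> Set[str]:
    """Collect the lowercase file extensions of all documents attached to the messages."""
    formats: Set[str] = set()
    for message in messages:
        # 'documents' / 'documentName' mirror ActionDocument; the message layout is assumed.
        for document in message.get("documents") or []:
            suffix = PurePath(document.get("documentName", "")).suffix
            if suffix:
                formats.add(suffix.lstrip(".").lower())
    return formats
```

Because the result is computed on demand, nothing extra has to be stored on `TaskResult`.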

---

## 3. TaskStep Class Changes

**File:** `gateway/modules/datamodels/datamodelChat.py` (lines 790-825)

### Modify
- Change `expectedFormat: Optional[str]` → `expectedFormats: Optional[List[str]]`
- Keep `dataType` and `qualityRequirements` as-is

### Modified Class:
```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field


class TaskStep(BaseModel):
    id: str
    objective: str
    dependencies: Optional[List[str]] = Field(default_factory=list)
    successCriteria: Optional[List[str]] = Field(default_factory=list)
    estimatedComplexity: Optional[str] = None
    userMessage: Optional[str] = Field(
        None, description="User-friendly message in user's language"
    )
    # Format details extracted from intent analysis
    dataType: Optional[str] = Field(
        None, description="Expected data type (text, numbers, documents, etc.)"
    )
    # Replaces the former single-value field `expectedFormat: Optional[str]`
    expectedFormats: Optional[List[str]] = Field(
        None,
        description=(
            "Expected output file format extensions (e.g., ['docx', 'pdf', 'xlsx']). "
            "Use actual file extensions, not conceptual terms."
        ),
    )
    qualityRequirements: Optional[Dict[str, Any]] = Field(
        None, description="Quality requirements and constraints"
    )
```

### Register Labels
Update the label entry for the renamed field:
```python
"expectedFormats": {"en": "Expected Formats", "fr": "Formats attendus"}
```
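
Stored plans or cached LLM responses may still carry the old single-value `expectedFormat` key. One possible way to bridge the rename is to normalize payloads before they reach `TaskStep`. The helper below is a hypothetical sketch operating on plain dicts, not an existing gateway function.

```python
from typing import Any, Dict


def migrate_expected_format(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a TaskStep payload that only uses 'expectedFormats' (a list)."""
    migrated = dict(payload)
    legacy = migrated.pop("expectedFormat", None)
    if legacy and not migrated.get("expectedFormats"):
        # Wrap the old single value so downstream code sees one shape.
        migrated["expectedFormats"] = [legacy]
    return migrated
```

If both keys are present, the sketch keeps the list and drops the legacy value.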

---

## 4. ChatWorkflow Class Changes (for Workflow-Level Tracking)

**File:** `gateway/modules/datamodels/datamodelChat.py` (find ChatWorkflow class)

### Add (if not already present)
```python
expectedFormats: Optional[List[str]] = Field(
    None,
    description="List of expected file format extensions from user request (e.g., ['xlsx', 'pdf']). Extracted during intent analysis."
)
```

Note: `_workflowIntent` is already stored as a dict (not a model field), so `expectedFormats` can be extracted from there, but having it as an explicit field makes it easier to query.
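
If both sources coexist for a while, a lookup could prefer the explicit field and fall back to the intent dict. The sketch below assumes the intent dict stores the list under an `expectedFormats` key; that key name and the function `resolve_expected_formats` are assumptions.

```python
from typing import Any, Dict, List, Optional


def resolve_expected_formats(
    explicit: Optional[List[str]], workflow_intent: Optional[Dict[str, Any]]
) -> List[str]:
    """Prefer the explicit field, then the intent dict, else an empty list."""
    if explicit:
        return list(explicit)
    # The 'expectedFormats' key inside the intent dict is an assumption.
    return list((workflow_intent or {}).get("expectedFormats") or [])
```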

---

## 5. ActionItem Class Review

**File:** `gateway/modules/datamodels/datamodelChat.py` (lines 652-715)

### Current Structure (ALL USED - KEEP):
- ✅ `id: str` - Used for action identification
- ✅ `execMethod: str` - Used for action execution
- ✅ `execAction: str` - Used for action execution
- ✅ `execParameters: Dict[str, Any]` - Used for action execution
- ✅ `execResultLabel: Optional[str]` - Used for document routing
- ✅ `expectedDocumentFormats: Optional[List[Dict[str, str]]]` - Used by action planning
- ✅ `userMessage: Optional[str]` - Used for user communication
- ✅ `status: TaskStatus` - Used for tracking
- ✅ `error: Optional[str]` - Used for error handling
- ✅ `retryCount: int` - Used for retry logic
- ✅ `retryMax: int` - Used for retry logic
- ✅ `processingTime: Optional[float]` - Used for performance tracking
- ✅ `timestamp: float` - Used for ordering/auditing
- ✅ `result: Optional[str]` - Used to store action result text

**NO CHANGES NEEDED** - All attributes are used

---

## 6. Summary of Changes

### Classes to Modify:
1. ✅ **TaskStep** - Change `expectedFormat` (str) → `expectedFormats` (List[str])
2. ✅ **ChatWorkflow** - Add `expectedFormats` (optional, for explicit tracking)

### Classes to Review (NO CHANGES):
- ✅ **ActionResult** - Keep as-is, documents already have metadata
- ✅ **TaskResult** - Keep as-is, no summary needed
- ✅ **ActionDocument** - Already correct (documentName, documentData, mimeType)
- ✅ **ActionItem** - All attributes used
- ✅ **Observation** - Already has contentValidation field
- ✅ **TaskItem** - Used for database storage, separate from TaskStep

---

## 7. Implementation Impact

### Files That Will Need Updates:

1. **datamodelChat.py** - Class definitions (this proposal)
   - Change `expectedFormat` → `expectedFormats` in TaskStep
   - Add `expectedFormats` to ChatWorkflow (optional)

2. **taskPlanner.py** - Populate `expectedFormats` list instead of single `expectedFormat`
   - **Adaptive planning:** Use ALL workflow data (messages, document metadata) to refine next task objective
   - Extract delivered formats from workflow documents
   - Compare what was delivered vs. what was planned

3. **contentValidator.py** - Use `expectedFormats` list for validation
   - **Action-level validation:** Check action results against task objective (already exists)
   - **Task-level validation:** Validate THIS task's deliverables against THIS task's expectations
   - Uses document metadata (name, size, format, mimeType) - no summaries needed

4. **intentAnalyzer.py** - Fix prompt to ask for actual file format extensions
   - Change from conceptual terms ("raw_data", "formatted") to actual extensions ("pdf", "docx", "xlsx")

5. **promptGenerationTaskplan.py** - Ask for `expectedFormats` in task planning
   - **Adaptive planning:** Include ALL workflow data (messages, document names/sizes/formats/metadata) when planning next task
   - Show what was actually delivered to help refine objective

6. **workflowManager.py** - Pass ALL workflow data to next task planning
   - Messages (text content)
   - Document metadata (names, sizes, formats, mimeTypes)
   - Validation results
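
For item 4 above, a prompt fix alone may not stop the model from occasionally returning conceptual terms. A defensive normalization step could filter them out before they reach `expectedFormats`. The allow-list `KNOWN_EXTENSIONS` and the function name are assumptions for this sketch.

```python
from typing import Iterable, List

# Assumed allow-list; the real set depends on which formats the gateway can produce.
KNOWN_EXTENSIONS = {"pdf", "docx", "xlsx", "pptx", "csv", "txt", "json", "html", "md"}


def normalize_formats(raw: Iterable[str]) -> List[str]:
    """Lowercase, strip dots and whitespace, drop unknown terms, de-duplicate in order."""
    result: List[str] = []
    for item in raw:
        ext = item.strip().lstrip(".").lower()
        if ext in KNOWN_EXTENSIONS and ext not in result:
            result.append(ext)
    return result
```

For example, `normalize_formats([" .PDF", "raw_data", "docx", "pdf"])` keeps only `pdf` and `docx`.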

### Key Implementation Points:

- **No summary fields:** Use existing document metadata (name, size, format, mimeType)
- **Adaptive task planning:** Next task receives ALL workflow data (messages + document metadata) to refine objective
- **Validation scope:** Task validation checks ONLY that task's actions, not all workflow actions
- **After each action:** Validate against task objective → decide if complete or next action needed

---

## 8. Validation Logic Alignment

### Action-Level Validation (Within Task):
- **When:** After each action execution within a task
- **Checks:** Task objective
- **Against:** Action documents (name, size, format, mimeType metadata)
- **Purpose:** Decide if task is complete or next action needed
- **Triggers:** Continue to next action if incomplete, complete task if done
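
The action-by-action loop could look roughly like the sketch below. Actions and the objective check are modeled as plain callables, which is a simplification: in the gateway the check would presumably be done by the content validator. All names here are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List

Document = Dict[str, Any]


def run_task_actions(
    actions: Iterable[Callable[[], List[Document]]],
    objective_met: Callable[[List[Document]], bool],
) -> Dict[str, Any]:
    """Run actions in order and stop as soon as the objective check passes."""
    delivered: List[Document] = []
    executed = 0
    for action in actions:
        delivered.extend(action())  # each action returns its documents
        executed += 1
        if objective_met(delivered):
            return {"complete": True, "actionsExecuted": executed, "documents": delivered}
    # All actions ran without satisfying the objective.
    return {"complete": False, "actionsExecuted": executed, "documents": delivered}
```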

### Task Planning (Adaptive - Uses ALL Workflow Data):
- **Input:** ALL workflow data available:
  - All messages (text content)
  - All document metadata (names, sizes, formats/extensions, mimeTypes)
  - Previous task validation results
- **Process:**
  - Extract delivered formats from all workflow documents
  - Compare what was ACTUALLY delivered vs. what was PLANNED
  - Refine next task objective:
    - Deliver MORE if previous tasks delivered less than expected
    - Deliver LESS if previous tasks already delivered more
    - Adapt to actual workflow progress
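
The planned-versus-delivered comparison reduces to set arithmetic on format extensions. A minimal sketch, with the function name and result keys chosen for illustration only:

```python
from typing import Dict, Iterable, List


def plan_adjustment(planned: Iterable[str], delivered: Iterable[str]) -> Dict[str, List[str]]:
    """Compare planned and delivered formats for the next planning round."""
    planned_set = {f.lower() for f in planned}
    delivered_set = {f.lower() for f in delivered}
    return {
        "stillNeeded": sorted(planned_set - delivered_set),       # deliver MORE
        "alreadyDelivered": sorted(planned_set & delivered_set),  # deliver LESS
        "unplanned": sorted(delivered_set - planned_set),
    }
```

The resulting dict could be rendered into the task planning prompt so the planner sees what is still outstanding.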

### Task-Level Validation (Task Completion):
- **When:** After ALL actions in a task complete
- **Checks:** Task objective, task `expectedFormats`, task `successCriteria`
- **Against:** Documents from THIS task only (extract formats from document metadata)
- **Purpose:** Verify THIS task delivered what was expected for THIS task scope
- **Output:** Validation result (used in workflow data for next task planning)

### Workflow-Level Validation (Final):
- **When:** After ALL tasks complete
- **Checks:** Original user request, workflow `expectedFormats`, workflow success criteria
- **Against:** ALL documents from ALL tasks (extract formats from document metadata)
- **Purpose:** Final verification that complete workflow delivered what user requested
- **Triggers:** New compensatory task if validation fails (missing deliverables)
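
Task-level and workflow-level validation can share one format check and differ only in which documents are passed in. The sketch below assumes documents can be tagged with the task that produced them (a hypothetical `taskId` key) and covers only the `expectedFormats` part of validation, not objectives or success criteria. `FormatValidation`, `validate_formats`, and `compensatory_task` are illustrative names.

```python
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class FormatValidation:
    success: bool
    missingFormats: List[str] = field(default_factory=list)


def validate_formats(
    expected: Iterable[str], documents: Iterable[Dict[str, Any]]
) -> FormatValidation:
    """Check that every expected extension appears among the given documents."""
    delivered = {PurePath(d["documentName"]).suffix.lstrip(".").lower() for d in documents}
    missing = sorted({e.lower() for e in expected} - delivered)
    return FormatValidation(success=not missing, missingFormats=missing)


def compensatory_task(result: FormatValidation) -> Optional[Dict[str, Any]]:
    """Describe a follow-up task for missing deliverables, or None if nothing is missing."""
    if result.success:
        return None
    return {
        "objective": "Deliver missing formats: " + ", ".join(result.missingFormats),
        "expectedFormats": result.missingFormats,
    }


# Usage: same check, two scopes.
docs = [
    {"taskId": "t1", "documentName": "data.xlsx"},
    {"taskId": "t2", "documentName": "notes.txt"},
]
task_result = validate_formats(["xlsx"], [d for d in docs if d["taskId"] == "t1"])
workflow_result = validate_formats(["xlsx", "pdf"], docs)
```

Here the task-scoped check passes, while the workflow-scoped check reports the missing format and would yield a compensatory task.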

---

## 9. Next Steps

1. **Review and approve this proposal**
2. **Implement class changes** in datamodelChat.py
3. **Update intent analyzer prompt** to request actual file format extensions
4. **Update task planning prompt** to request `expectedFormats` list
5. **Update task planning inputs** to include ALL workflow data (messages, document metadata, validation results)
6. **Implement format extraction** from document metadata (mimeType or file extensions) at task and workflow levels
7. **Implement workflow-level validation** method
8. **Update all references** from `expectedFormat` to `expectedFormats`

---

## Questions Answered

- ✅ **Document metadata:** Use existing document fields (name, size, format from mimeType/extensions) - no summaries needed
- ✅ **Format extraction:** Extract formats from document metadata (mimeType or file extensions)
- ✅ **Task validation scope:** Task validation checks ONLY actions in that task, not all workflow actions
- ✅ **Adaptive planning:** Next task uses ALL workflow data (messages + document metadata) to refine objective
- ✅ **After each action:** Validate against task objective → decide complete or next action needed

---

## 10. Validation Flow Clarification

### Simplified Flow:

1. **Within Task (Action-by-Action):**
   - Action executes → delivers documents with metadata
   - Validate action results against task objective
   - If incomplete → next action needed
   - If complete → task done

2. **Task Planning (Adaptive):**
   - Receives: ALL workflow data (messages, document metadata from all previous tasks)
   - Extracts: Delivered formats from document metadata (file extensions/mimeTypes)
   - Compares: What was actually delivered vs. what was planned
   - Refines: Next task objective (may need more/less based on actual progress)

3. **Task Completion:**
   - Validate: THIS task's documents (extract formats from metadata) against THIS task's expectations
   - Result: Used in workflow data for next task planning

4. **Workflow Completion:**
   - Final validation: All documents (extract formats from metadata) meet original user request
   - If missing: Create compensatory task

---

**Status:** Ready for implementation after approval