gateway/modules/datamodels/PROPOSAL_CLASS_ENHANCEMENTS.md
2025-11-02 23:31:10 +01:00


# Pydantic Class Enhancement Proposal
## Format Tracking & Validation Alignment
**Date:** 2025-11-02

**Purpose:** Align validation logic with prompt requirements, enable workflow-level validation, and track expected file formats

**Simplified Approach:** Use existing document metadata (name, size, format, mimeType); no summary fields are needed.

---
## Executive Summary
This proposal addresses:
1. **Validation alignment**: What prompts ask for matches what validators check
2. **Workflow-level validation**: Check ALL deliverables from ALL tasks against original user request
3. **Format tracking**: Track expected formats (list) at workflow and task levels
4. **Adaptive task planning**: Next task uses ALL workflow data (messages, document metadata) to refine objective

**Key Simplification:** Actions deliver documents with metadata (as today). No summary fields are needed; use existing document metadata.

---
## 1. ActionResult Class Changes
**File:** `gateway/modules/datamodels/datamodelChat.py` (lines 483-521)
### NO CHANGES NEEDED
**Current Structure (KEEP ALL - ALL USED):**
- `success: bool` - Used by validation
- `error: Optional[str]` - Used for error handling
- `documents: List[ActionDocument]` - Contains document metadata (name, data, mimeType)
- `resultLabel: Optional[str]` - Used for document routing

**Documents already provide all needed metadata:**
- `documentName` - File name
- `documentData` - Content
- `mimeType` - MIME type (can derive format from this)

**No summary field needed** - document metadata is sufficient.

---
## 2. TaskResult Class Changes
**File:** `gateway/modules/datamodels/datamodelChat.py` (lines 718-736)
### NO CHANGES NEEDED
**Current Structure (KEEP ALL - ALL USED):**
- `taskId: str` - Task identification
- `status: TaskStatus` - Task status tracking
- `success: bool` - Success flag
- `feedback: Optional[str]` - Task feedback
- `error: Optional[str]` - Error message

**Document metadata available from workflow:**
- Can extract delivered formats from documents in workflow messages
- No need to store separately - use existing document metadata
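A minimal sketch of how delivered formats might be derived from existing document metadata. The helper names, the lookup table, and the plain-dict document shape are assumptions for illustration, as is the choice to prefer the file extension in `documentName` over `mimeType`; the real `ActionDocument` objects would be read through their attributes instead.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Mapping, Optional

# Hypothetical lookup table; extend it to cover the MIME types actually emitted.
MIME_TO_EXTENSION = {
    "application/pdf": "pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "xlsx",
    "text/csv": "csv",
    "text/plain": "txt",
}


def deriveFormat(documentName: str, mimeType: Optional[str] = None) -> Optional[str]:
    """Return a lowercase extension such as 'pdf', or None if none can be derived."""
    # Assumption: a file extension in the name is more specific than the MIME type.
    ext = PurePosixPath(documentName).suffix[1:].lower()
    if ext:
        return ext
    if mimeType:
        # Drop parameters such as '; charset=utf-8' before the lookup.
        return MIME_TO_EXTENSION.get(mimeType.split(";")[0].strip().lower())
    return None


def extractDeliveredFormats(documents: Iterable[Mapping[str, str]]) -> List[str]:
    """Collect the distinct formats of the given documents in first-seen order."""
    formats: List[str] = []
    for doc in documents:
        fmt = deriveFormat(doc.get("documentName", ""), doc.get("mimeType"))
        if fmt and fmt not in formats:
            formats.append(fmt)
    return formats
```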
---
## 3. TaskStep Class Changes
**File:** `gateway/modules/datamodels/datamodelChat.py` (lines 790-825)
### Modify
- Change `expectedFormat: Optional[str]` → `expectedFormats: Optional[List[str]]`
- Keep `dataType` and `qualityRequirements` as-is
### Modified Class:
```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field


class TaskStep(BaseModel):
    id: str
    objective: str
    dependencies: Optional[list[str]] = Field(default_factory=list)
    successCriteria: Optional[list[str]] = Field(default_factory=list)
    estimatedComplexity: Optional[str] = None
    userMessage: Optional[str] = Field(
        None, description="User-friendly message in user's language"
    )
    # Format details extracted from intent analysis
    dataType: Optional[str] = Field(
        None, description="Expected data type (text, numbers, documents, etc.)"
    )
    # Changed from expectedFormat: Optional[str] to a list of extensions
    expectedFormats: Optional[List[str]] = Field(
        None,
        description="Expected output file format extensions (e.g., ['docx', 'pdf', 'xlsx']). Use actual file extensions, not conceptual terms.",
    )
    qualityRequirements: Optional[Dict[str, Any]] = Field(
        None, description="Quality requirements and constraints"
    )
```
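Plans created before this change may still carry a single `expectedFormat` string, and model output may include leading dots or mixed case. The following is a rough sketch of a normalization helper that could be called before constructing a `TaskStep` (or from a Pydantic validator, not shown here because it depends on the Pydantic version in use). The function name and the comma-splitting of legacy strings are assumptions, not part of the existing code.

```python
from typing import Iterable, List, Optional, Union


def normalizeExpectedFormats(
    value: Union[None, str, Iterable[str]],
) -> Optional[List[str]]:
    """Turn a legacy string or a list into a clean list of file extensions."""
    if value is None:
        return None
    # Assumption: a legacy string may hold several comma-separated formats.
    items = value.split(",") if isinstance(value, str) else list(value)
    cleaned: List[str] = []
    for item in items:
        ext = item.strip().lstrip(".").lower()  # '.PDF ' -> 'pdf'
        if ext and ext not in cleaned:          # drop blanks and duplicates
            cleaned.append(ext)
    return cleaned or None
```

Returning `None` for empty input keeps "no expectation" distinct from a concrete list, which matches the `Optional[List[str]]` type of the field.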
### Register Labels
Update:
```python
"expectedFormats": {"en": "Expected Formats", "fr": "Formats attendus"}
```
---
## 4. ChatWorkflow Class Changes (for Workflow-Level Tracking)
**File:** `gateway/modules/datamodels/datamodelChat.py` (find ChatWorkflow class)
### Add (if not already present)
```python
expectedFormats: Optional[List[str]] = Field(
    None,
    description="List of expected file format extensions from user request (e.g., ['xlsx', 'pdf']). Extracted during intent analysis.",
)
```
Note: `_workflowIntent` is already stored as a dict (not a model field), so `expectedFormats` could be read from there, but an explicit field makes it easier to query.

---
## 5. ActionItem Class Review
**File:** `gateway/modules/datamodels/datamodelChat.py` (lines 652-715)
### Current Structure (ALL USED - KEEP):
- `id: str` - Used for action identification
- `execMethod: str` - Used for action execution
- `execAction: str` - Used for action execution
- `execParameters: Dict[str, Any]` - Used for action execution
- `execResultLabel: Optional[str]` - Used for document routing
- `expectedDocumentFormats: Optional[List[Dict[str, str]]]` - Used by action planning
- `userMessage: Optional[str]` - Used for user communication
- `status: TaskStatus` - Used for tracking
- `error: Optional[str]` - Used for error handling
- `retryCount: int` - Used for retry logic
- `retryMax: int` - Used for retry logic
- `processingTime: Optional[float]` - Used for performance tracking
- `timestamp: float` - Used for ordering/auditing
- `result: Optional[str]` - Used to store action result text

**NO CHANGES NEEDED** - All attributes are used

---
## 6. Summary of Changes
### Classes to Modify:
1. **TaskStep** - Change `expectedFormat` (str) → `expectedFormats` (List[str])
2. **ChatWorkflow** - Add `expectedFormats` (optional, for explicit tracking)
### Classes to Review (NO CHANGES):
- **ActionResult** - Keep as-is, documents already have metadata
- **TaskResult** - Keep as-is, no summary needed
- **ActionDocument** - Already correct (documentName, documentData, mimeType)
- **ActionItem** - All attributes used
- **Observation** - Already has contentValidation field
- **TaskItem** - Used for database storage, separate from TaskStep
---
## 7. Implementation Impact
### Files That Will Need Updates:
1. **datamodelChat.py** - Class definitions (this proposal)
   - Change `expectedFormat` → `expectedFormats` in TaskStep
   - Add `expectedFormats` to ChatWorkflow (optional)
2. **taskPlanner.py** - Populate `expectedFormats` list instead of single `expectedFormat`
   - **Adaptive planning:** Use ALL workflow data (messages, document metadata) to refine next task objective
   - Extract delivered formats from workflow documents
   - Compare what was delivered vs. what was planned
3. **contentValidator.py** - Use `expectedFormats` list for validation
   - **Action-level validation:** Check action results against task objective (already exists)
   - **Task-level validation:** Validate THIS task's deliverables against THIS task's expectations
   - Uses document metadata (name, size, format, mimeType) - no summaries needed
4. **intentAnalyzer.py** - Fix prompt to ask for actual file format extensions
   - Change from conceptual terms ("raw_data", "formatted") to actual extensions ("pdf", "docx", "xlsx")
5. **promptGenerationTaskplan.py** - Ask for `expectedFormats` in task planning
   - **Adaptive planning:** Include ALL workflow data (messages, document names/sizes/formats/metadata) when planning next task
   - Show what was actually delivered to help refine objective
6. **workflowManager.py** - Pass ALL workflow data to next task planning
   - Messages (text content)
   - Document metadata (names, sizes, formats, mimeTypes)
   - Validation results
### Key Implementation Points:
- **No summary fields:** Use existing document metadata (name, size, format, mimeType)
- **Adaptive task planning:** Next task receives ALL workflow data (messages + document metadata) to refine objective
- **Validation scope:** Task validation checks ONLY that task's actions, not all workflow actions
- **After each action:** Validate against task objective → decide if complete or next action needed
---
## 8. Validation Logic Alignment
### Action-Level Validation (Within Task):
- **When:** After each action execution within a task
- **Checks:** Action results against task objective
- **Against:** Action documents (name, size, format, mimeType metadata)
- **Purpose:** Decide if task is complete or next action needed
- **Triggers:** Continue to next action if incomplete, complete task if done
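The action-by-action check described above can be sketched as a simple loop. This is an illustration under stated assumptions: `runTaskActions` and `isObjectiveMet` are hypothetical names, actions are modelled as plain callables returning document metadata dicts, and the real validator would assess the task objective rather than a simple predicate.

```python
from typing import Callable, Iterable, List, Mapping, Tuple

# Hypothetical document shape, e.g. {"documentName": "report.pdf", "mimeType": "application/pdf"}
Document = Mapping[str, str]


def runTaskActions(
    actions: Iterable[Callable[[], List[Document]]],
    isObjectiveMet: Callable[[List[Document]], bool],
) -> Tuple[bool, List[Document]]:
    """Run actions one by one and stop as soon as the task objective is met."""
    delivered: List[Document] = []
    for action in actions:
        delivered.extend(action())      # action delivers documents with metadata
        if isObjectiveMet(delivered):   # validate after each action
            return True, delivered      # complete: no further action needed
    return False, delivered             # actions exhausted, objective not met
```

Because the check runs after every action, later actions are never executed once the objective is satisfied.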
### Task Planning (Adaptive - Uses ALL Workflow Data):
- **Input:** ALL workflow data available:
  - All messages (text content)
  - All document metadata (names, sizes, formats/extensions, mimeTypes)
  - Previous task validation results
- **Process:**
  - Extract delivered formats from all workflow documents
  - Compare what was ACTUALLY delivered vs. what was PLANNED
  - Refine next task objective:
    - Deliver MORE if previous tasks delivered less than expected
    - Deliver LESS if previous tasks already delivered more
    - Adapt to actual workflow progress
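The delivered-versus-planned comparison might look like the sketch below. Both function names and the wording of the appended hint are hypothetical; in practice the refinement would more likely be fed into the task planning prompt than concatenated onto the objective string.

```python
from typing import Dict, Iterable, List


def compareDelivered(planned: Iterable[str], delivered: Iterable[str]) -> Dict[str, List[str]]:
    """Split formats into those still missing and those delivered beyond the plan."""
    plannedSet = {f.lower() for f in planned}
    deliveredSet = {f.lower() for f in delivered}
    return {
        "missing": sorted(plannedSet - deliveredSet),
        "surplus": sorted(deliveredSet - plannedSet),
    }


def refineObjective(objective: str, planned: Iterable[str], delivered: Iterable[str]) -> str:
    """Append a hint to the next task objective based on actual progress."""
    gap = compareDelivered(planned, delivered)
    hints: List[str] = []
    if gap["missing"]:
        hints.append("still to deliver: " + ", ".join(gap["missing"]))
    if gap["surplus"]:
        hints.append("already delivered beyond plan: " + ", ".join(gap["surplus"]))
    if not hints:
        return objective  # plan and delivery match, nothing to adapt
    return objective + " (" + "; ".join(hints) + ")"
```

For example, `refineObjective("Build report", ["pdf", "xlsx"], ["pdf"])` yields an objective that flags `xlsx` as still outstanding.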
### Task-Level Validation (Task Completion):
- **When:** After ALL actions in a task complete
- **Checks:** Task objective, task `expectedFormats`, task `successCriteria`
- **Against:** Documents from THIS task only (extract formats from document metadata)
- **Purpose:** Verify THIS task delivered what was expected for THIS task scope
- **Output:** Validation result (used in workflow data for next task planning)
### Workflow-Level Validation (Final):
- **When:** After ALL tasks complete
- **Checks:** Original user request, workflow `expectedFormats`, workflow success criteria
- **Against:** ALL documents from ALL tasks (extract formats from document metadata)
- **Purpose:** Final verification that complete workflow delivered what user requested
- **Triggers:** New compensatory task if validation fails (missing deliverables)
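The difference between task-level and workflow-level scope can be illustrated with one format check that is either restricted to a single task or run over all documents. This sketch covers only the `expectedFormats` part of validation, not objectives or success criteria. The `FormatValidation` type, the function names, and the `taskId`/`format` keys on each document are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Mapping, Optional


@dataclass
class FormatValidation:
    passed: bool
    missingFormats: List[str] = field(default_factory=list)


def validateFormats(
    expectedFormats: Optional[Iterable[str]],
    documents: Iterable[Mapping[str, str]],
    taskId: Optional[str] = None,
) -> FormatValidation:
    """Check expected formats against one task's documents (taskId set) or all documents."""
    delivered = {
        doc["format"].lower()
        for doc in documents
        if taskId is None or doc.get("taskId") == taskId  # task scope vs. workflow scope
    }
    missing = sorted({f.lower() for f in (expectedFormats or [])} - delivered)
    return FormatValidation(passed=not missing, missingFormats=missing)


def compensatoryObjective(result: FormatValidation) -> Optional[str]:
    """Describe a follow-up task when workflow-level validation finds gaps."""
    if result.passed:
        return None
    return "Deliver missing formats: " + ", ".join(result.missingFormats)
```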
---
## 9. Next Steps
1. **Review and approve this proposal**
2. **Implement class changes** in datamodelChat.py
3. **Update intent analyzer prompt** to request actual file format extensions
4. **Update task planning prompt** to request `expectedFormats` list
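5. **Update workflow manager and task planning** to pass ALL workflow data (messages, document metadata, validation results) to next task planning
6. **Implement format extraction** from document metadata (mimeType or file extensions) at task and workflow levels; no summary fields are needed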
7. **Implement workflow-level validation** method
8. **Update all references** from `expectedFormat` to `expectedFormats`
---
## Questions Answered
- **Document metadata:** Use existing document fields (name, size, format from mimeType/extensions) - no summaries needed
- **Format extraction:** Extract formats from document metadata (mimeType or file extensions)
- **Task validation scope:** Task validation checks ONLY actions in that task, not all workflow actions
- **Adaptive planning:** Next task uses ALL workflow data (messages + document metadata) to refine objective
- **After each action:** Validate against task objective → decide complete or next action needed
---
## 10. Validation Flow Clarification
### Simplified Flow:
1. **Within Task (Action-by-Action):**
   - Action executes → delivers documents with metadata
   - Validate action results against task objective
   - If incomplete → next action needed
   - If complete → task done
2. **Task Planning (Adaptive):**
   - Receives: ALL workflow data (messages, document metadata from all previous tasks)
   - Extracts: Delivered formats from document metadata (file extensions/mimeTypes)
   - Compares: What was actually delivered vs. what was planned
   - Refines: Next task objective (may need more/less based on actual progress)
3. **Task Completion:**
   - Validate: THIS task's documents (extract formats from metadata) against THIS task's expectations
   - Result: Used in workflow data for next task planning
4. **Workflow Completion:**
   - Final validation: All documents (extract formats from metadata) meet original user request
   - If missing: Create compensatory task
---
**Status:** Ready for implementation after approval