# Pydantic Class Enhancement Proposal

## Format Tracking & Validation Alignment

**Date:** 2025-11-02

**Purpose:** Align validation logic with prompt requirements, enable workflow-level validation, and track expected file formats

**Simplified Approach:** Use existing document metadata (name, size, format, mimeType) - no summary fields needed

---

## Executive Summary

This proposal addresses:

1. **Validation alignment**: Ensure that what prompts ask for matches what validators check
2. **Workflow-level validation**: Check ALL deliverables from ALL tasks against the original user request
3. **Format tracking**: Track expected formats (as a list) at workflow and task levels
4. **Adaptive task planning**: The next task uses ALL workflow data (messages, document metadata) to refine its objective

**Key Simplification:** Actions deliver documents with metadata (as today). No summary fields are needed; the existing document metadata is used instead.

---

## 1. ActionResult Class Changes

**File:** `gateway/modules/datamodels/datamodelChat.py` (lines 483-521)

### NO CHANGES NEEDED

**Current Structure (KEEP ALL - ALL USED):**

- ✅ `success: bool` - Used by validation
- ✅ `error: Optional[str]` - Used for error handling
- ✅ `documents: List[ActionDocument]` - Contains document metadata (name, data, mimeType)
- ✅ `resultLabel: Optional[str]` - Used for document routing

**Documents already provide all needed metadata:**

- `documentName` - File name (the extension gives the format)
- `documentData` - Content (size can be derived from this)
- `mimeType` - MIME type (can derive format from this)

**No summary field needed** - document metadata is sufficient.
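
The format can be derived from the document name or the `mimeType`, as described above. Here is a minimal sketch of that derivation. It uses a dataclass stand-in for `ActionDocument` with only the three fields listed above. It prefers the extension in `documentName` and falls back to a small, hypothetical MIME-to-extension table. The helper names `formatFromDocument` and `deliveredFormats` are illustrative and are not existing code.

```python
import os
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class ActionDocument:
    """Stand-in for the real ActionDocument model (only the fields used here)."""
    documentName: str
    documentData: Union[str, bytes]
    mimeType: str


# Hypothetical fallback table for documents whose name carries no extension.
MIME_TO_EXTENSION = {
    "application/pdf": "pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "xlsx",
    "text/csv": "csv",
    "text/plain": "txt",
}


def formatFromDocument(doc: ActionDocument) -> Optional[str]:
    """Return a lowercase extension such as 'pdf', or None if it cannot be determined."""
    _, ext = os.path.splitext(doc.documentName)
    if len(ext) > 1:  # ignore a bare trailing dot such as 'file.'
        return ext[1:].lower()  # drop the leading dot
    # Ignore MIME parameters such as '; charset=utf-8' before the lookup.
    baseType = doc.mimeType.split(";", 1)[0].strip().lower()
    return MIME_TO_EXTENSION.get(baseType)


def deliveredFormats(docs: List[ActionDocument]) -> List[str]:
    """Distinct formats in delivery order; documents with an unknown format are skipped."""
    formats: List[str] = []
    for doc in docs:
        fmt = formatFromDocument(doc)
        if fmt and fmt not in formats:
            formats.append(fmt)
    return formats


docs = [
    ActionDocument("report.PDF", b"%PDF-1.7 ...", "application/pdf"),
    ActionDocument("export", "a,b\n1,2\n", "text/csv; charset=utf-8"),
]
print(deliveredFormats(docs))  # ['pdf', 'csv']
```

The extension is checked first because it is usually the more specific signal. Generic MIME types such as `application/octet-stream` say little about the actual file format.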

---

## 2. TaskResult Class Changes

**File:** `gateway/modules/datamodels/datamodelChat.py` (lines 718-736)

### NO CHANGES NEEDED

**Current Structure (KEEP ALL - ALL USED):**

- ✅ `taskId: str` - Task identification
- ✅ `status: TaskStatus` - Task status tracking
- ✅ `success: bool` - Success flag
- ✅ `feedback: Optional[str]` - Task feedback
- ✅ `error: Optional[str]` - Error message

**Document metadata available from workflow:**

- Delivered formats can be extracted from the documents in workflow messages
- No need to store them separately - use existing document metadata

---

## 3. TaskStep Class Changes

**File:** `gateway/modules/datamodels/datamodelChat.py` (lines 790-825)

### Modify

- Change `expectedFormat: Optional[str]` → `expectedFormats: Optional[List[str]]`
- Keep `dataType` and `qualityRequirements` as-is

### Modified Class:

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field


class TaskStep(BaseModel):
    id: str
    objective: str
    dependencies: Optional[list[str]] = Field(default_factory=list)
    successCriteria: Optional[list[str]] = Field(default_factory=list)
    estimatedComplexity: Optional[str] = None
    userMessage: Optional[str] = Field(
        None, description="User-friendly message in user's language"
    )
    # Format details extracted from intent analysis
    dataType: Optional[str] = Field(
        None, description="Expected data type (text, numbers, documents, etc.)"
    )
    # Replaces the former single-valued `expectedFormat: Optional[str]`
    expectedFormats: Optional[List[str]] = Field(
        None,
        description="Expected output file format extensions (e.g., ['docx', 'pdf', 'xlsx']). Use actual file extensions, not conceptual terms.",
    )
    qualityRequirements: Optional[Dict[str, Any]] = Field(
        None, description="Quality requirements and constraints"
    )
```
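
The field description asks for real extensions rather than conceptual terms. Here is a minimal sketch of normalizing whatever the intent analyzer or planner returns before it is stored in `expectedFormats`. The sketch lowercases values, strips dots, removes duplicates, and sets aside anything not on an allowlist. The allowlist and the `normalizeFormats` name are assumptions. In the Pydantic model, this logic could be wired in through a validator such as Pydantic v2's `field_validator`.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical allowlist of extensions the workflow can actually produce.
KNOWN_EXTENSIONS = {"pdf", "docx", "xlsx", "csv", "txt", "md", "html", "json", "pptx", "png"}


def normalizeFormats(raw: Optional[Iterable[str]]) -> Tuple[List[str], List[str]]:
    """Split raw format strings into (valid extensions, rejected terms).

    '.PDF', 'pdf' and ' Pdf ' all become 'pdf'; duplicates are dropped while
    keeping first-seen order. Conceptual terms such as 'raw_data' are rejected.
    """
    valid: List[str] = []
    rejected: List[str] = []
    for item in raw or []:
        ext = item.strip().lstrip(".").lower()
        if ext in KNOWN_EXTENSIONS:
            if ext not in valid:
                valid.append(ext)
        elif item not in rejected:
            rejected.append(item)
    return valid, rejected


valid, rejected = normalizeFormats([".PDF", "docx", "raw_data", "pdf", "formatted"])
# valid == ['pdf', 'docx'], rejected == ['raw_data', 'formatted']
```

The sketch returns rejected terms instead of silently dropping them. That way, a caller could report them back to the intent analyzer as a sign that the prompt still produces conceptual labels.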

### Register Labels

Update the label entry for the renamed field (previously `expectedFormat`):

```python
"expectedFormats": {"en": "Expected Formats", "fr": "Formats attendus"}
```

---

## 4. ChatWorkflow Class Changes (for Workflow-Level Tracking)

**File:** `gateway/modules/datamodels/datamodelChat.py` (find ChatWorkflow class)

### Add (if not exists)

```python
# Field on ChatWorkflow
expectedFormats: Optional[List[str]] = Field(
    None,
    description="List of expected file format extensions from user request (e.g., ['xlsx', 'pdf']). Extracted during intent analysis."
)
```

Note: `_workflowIntent` is already stored as a dict (not a model field), so `expectedFormats` can be extracted from there, but having it as an explicit field makes it easier to query.
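
Until the explicit field is populated everywhere, code that reads the workflow could fall back to the intent dict. Here is a minimal sketch of that fallback. It assumes `_workflowIntent` stores the list under an `expectedFormats` key; that key name is an assumption. `resolveWorkflowFormats` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional


def resolveWorkflowFormats(
    explicitFormats: Optional[List[str]],
    workflowIntent: Optional[Dict[str, Any]],
) -> List[str]:
    """Prefer ChatWorkflow.expectedFormats; otherwise read the intent dict."""
    if explicitFormats:
        return list(explicitFormats)
    # Assumption: intent analysis stored the list under this key.
    fromIntent = (workflowIntent or {}).get("expectedFormats") or []
    if isinstance(fromIntent, str):
        # Tolerate a legacy single value such as 'pdf'.
        fromIntent = [fromIntent]
    return [str(f) for f in fromIntent]
```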

---

## 5. ActionItem Class Review

**File:** `gateway/modules/datamodels/datamodelChat.py` (lines 652-715)

### Current Structure (ALL USED - KEEP):

- ✅ `id: str` - Used for action identification
- ✅ `execMethod: str` - Used for action execution
- ✅ `execAction: str` - Used for action execution
- ✅ `execParameters: Dict[str, Any]` - Used for action execution
- ✅ `execResultLabel: Optional[str]` - Used for document routing
- ✅ `expectedDocumentFormats: Optional[List[Dict[str, str]]]` - Used by action planning
- ✅ `userMessage: Optional[str]` - Used for user communication
- ✅ `status: TaskStatus` - Used for tracking
- ✅ `error: Optional[str]` - Used for error handling
- ✅ `retryCount: int` - Used for retry logic
- ✅ `retryMax: int` - Used for retry logic
- ✅ `processingTime: Optional[float]` - Used for performance tracking
- ✅ `timestamp: float` - Used for ordering/auditing
- ✅ `result: Optional[str]` - Used to store action result text

**NO CHANGES NEEDED** - All attributes are used

---

## 6. Summary of Changes

### Classes to Modify:

1. ✅ **TaskStep** - Change `expectedFormat` (str) → `expectedFormats` (List[str])
2. ✅ **ChatWorkflow** - Add `expectedFormats` (optional, for explicit tracking)

### Classes to Review (NO CHANGES):

- ✅ **ActionResult** - Keep as-is, documents already have metadata
- ✅ **TaskResult** - Keep as-is, no summary needed
- ✅ **ActionDocument** - Already correct (documentName, documentData, mimeType)
- ✅ **ActionItem** - All attributes used
- ✅ **Observation** - Already has contentValidation field
- ✅ **TaskItem** - Used for database storage, separate from TaskStep

---

## 7. Implementation Impact

### Files That Will Need Updates:

1. **datamodelChat.py** - Class definitions (this proposal)
   - Change `expectedFormat` → `expectedFormats` in TaskStep
   - Add `expectedFormats` to ChatWorkflow (optional)

2. **taskPlanner.py** - Populate `expectedFormats` list instead of single `expectedFormat`
   - **Adaptive planning:** Use ALL workflow data (messages, document metadata) to refine next task objective
   - Extract delivered formats from workflow documents
   - Compare what was delivered vs. what was planned

3. **contentValidator.py** - Use `expectedFormats` list for validation
   - **Action-level validation:** Check action results against task objective (already exists)
   - **Task-level validation:** Validate THIS task's deliverables against THIS task's expectations
   - Uses document metadata (name, size, format, mimeType) - no summaries needed

4. **intentAnalyzer.py** - Fix prompt to ask for actual file format extensions
   - Change from conceptual terms ("raw_data", "formatted") to actual extensions ("pdf", "docx", "xlsx")

5. **promptGenerationTaskplan.py** - Ask for `expectedFormats` in task planning
   - **Adaptive planning:** Include ALL workflow data (messages, document names/sizes/formats/metadata) when planning next task
   - Show what was actually delivered to help refine objective

6. **workflowManager.py** - Pass ALL workflow data to next task planning
   - Messages (text content)
   - Document metadata (names, sizes, formats, mimeTypes)
   - Validation results
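
Item 6 is essentially a data-gathering step. Here is a minimal sketch of the payload that `workflowManager.py` could hand to task planning. It makes three assumptions: messages are dicts with `message` text, each message has a `documents` list of `documentName`/`documentData`/`mimeType` dicts, and validation results are plain dicts. All of these key names are hypothetical. Document content is reduced to metadata (name, size, extension, mimeType) so the planning prompt stays small.

```python
import os
from typing import Any, Dict, List


def buildPlanningContext(
    messages: List[Dict[str, Any]],
    validationResults: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Collect message text, document metadata and validation results for the next task."""
    texts: List[str] = []
    documents: List[Dict[str, Any]] = []
    for msg in messages:
        if msg.get("message"):
            texts.append(msg["message"])
        for doc in msg.get("documents", []):
            name = doc.get("documentName", "")
            data = doc.get("documentData") or ""
            documents.append({
                "name": name,
                # Length of the stored data (characters for str, bytes for bytes);
                # the content itself is not passed on.
                "size": len(data),
                "format": os.path.splitext(name)[1].lstrip(".").lower() or None,
                "mimeType": doc.get("mimeType"),
            })
    return {"messages": texts, "documents": documents, "validationResults": validationResults}
```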

### Key Implementation Points:

- **No summary fields:** Use existing document metadata (name, size, format, mimeType)
- **Adaptive task planning:** Next task receives ALL workflow data (messages + document metadata) to refine objective
- **Validation scope:** Task validation checks ONLY that task's actions, not all workflow actions
- **After each action:** Validate against task objective → decide if complete or next action needed

---

## 8. Validation Logic Alignment

### Action-Level Validation (Within Task):

- **When:** After each action execution within a task
- **Checks:** Action results against task objective
- **Against:** Action documents (name, size, format, mimeType metadata)
- **Purpose:** Decide if task is complete or next action needed
- **Triggers:** Continue to next action if incomplete, complete task if done
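
The action-level rule (validate after every action, then continue or stop) amounts to a bounded loop. Here is a minimal sketch of that loop, with the planner, executor and validator passed in as callables. Their names and signatures, the `maxActions` bound, and the `ActionOutcome` shape are assumptions for illustration. They are not the existing `contentValidator.py` API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ActionOutcome:
    """Hypothetical result of one action: success flag and delivered document names."""
    success: bool
    documentNames: List[str] = field(default_factory=list)


def runTaskActions(
    objective: str,
    planNextAction: Callable[[str, List[ActionOutcome]], Optional[str]],
    executeAction: Callable[[str], ActionOutcome],
    isObjectiveMet: Callable[[str, List[ActionOutcome]], bool],
    maxActions: int = 10,
) -> List[ActionOutcome]:
    """Execute actions one at a time until the validator reports the task objective as met."""
    outcomes: List[ActionOutcome] = []
    for _ in range(maxActions):
        action = planNextAction(objective, outcomes)
        if action is None:  # planner has nothing left to try
            break
        outcomes.append(executeAction(action))
        # Validate after EACH action against the task objective, not the workflow request.
        if isObjectiveMet(objective, outcomes):
            break
    return outcomes
```

The `maxActions` bound keeps a validator that never reports success from looping forever. It plays a role similar to `retryMax` on `ActionItem`.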

### Task Planning (Adaptive - Uses ALL Workflow Data):

- **Input:** ALL workflow data available:
  - All messages (text content)
  - All document metadata (names, sizes, formats/extensions, mimeTypes)
  - Previous task validation results
- **Process:**
  - Extract delivered formats from all workflow documents
  - Compare what was ACTUALLY delivered vs. what was PLANNED
  - Refine next task objective:
    - Deliver MORE if previous tasks delivered less than expected
    - Deliver LESS if previous tasks already delivered more
    - Adapt to actual workflow progress
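
The "deliver more / deliver less" comparison can be done on format counts. Here is a minimal sketch using `collections.Counter`. It assumes planned and delivered formats have already been extracted as lowercase extensions, with one entry per expected or delivered file. `compareDeliveries` and its result keys are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def compareDeliveries(planned: List[str], delivered: List[str]) -> Dict[str, List[str]]:
    """Return formats still missing and formats delivered beyond the plan.

    Counts matter: planning ['pdf', 'pdf'] and delivering ['pdf'] leaves one 'pdf' missing.
    """
    plannedCount = Counter(planned)
    deliveredCount = Counter(delivered)
    # Counter subtraction keeps only positive counts.
    missing = plannedCount - deliveredCount
    extra = deliveredCount - plannedCount
    return {
        "missing": sorted(missing.elements()),  # later tasks should deliver MORE of these
        "extra": sorted(extra.elements()),      # already covered; later tasks can deliver LESS
    }


print(compareDeliveries(["xlsx", "pdf", "pdf"], ["pdf", "docx"]))
# {'missing': ['pdf', 'xlsx'], 'extra': ['docx']}
```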

### Task-Level Validation (Task Completion):

- **When:** After ALL actions in a task complete
- **Checks:** Task objective, task `expectedFormats`, task `successCriteria`
- **Against:** Documents from THIS task only (extract formats from document metadata)
- **Purpose:** Verify THIS task delivered what was expected for THIS task scope
- **Output:** Validation result (used in workflow data for next task planning)
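
Scoping task-level validation to THIS task's documents mostly means filtering before comparing. Here is a minimal sketch of the format part only; the objective and `successCriteria` checks stay with the content validator. It assumes each document record carries the `taskId` of the task that produced it (a hypothetical field for this illustration) and an already-extracted `format`. `validateTask` and its result keys are illustrative.

```python
from typing import Any, Dict, List, Optional


def validateTask(
    taskId: str,
    expectedFormats: Optional[List[str]],
    documents: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Check only the documents produced by `taskId` against that task's expectedFormats."""
    ownDocs = [d for d in documents if d.get("taskId") == taskId]
    delivered = {d.get("format") for d in ownDocs if d.get("format")}
    missing = [f for f in (expectedFormats or []) if f not in delivered]
    return {
        "taskId": taskId,
        "success": not missing,
        "deliveredFormats": sorted(delivered),
        "missingFormats": missing,
    }
```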

### Workflow-Level Validation (Final):

- **When:** After ALL tasks complete
- **Checks:** Original user request, workflow `expectedFormats`, workflow success criteria
- **Against:** ALL documents from ALL tasks (extract formats from document metadata)
- **Purpose:** Final verification that complete workflow delivered what user requested
- **Triggers:** New compensatory task if validation fails (missing deliverables)
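
At workflow level, the same comparison runs over ALL delivered documents, and a failure turns into a compensatory task. Here is a minimal sketch that covers only the format part. When formats are missing, it returns a dict whose keys match `TaskStep` fields (`id`, `objective`, `expectedFormats`, `successCriteria`). The id scheme, the objective wording and `validateWorkflow` itself are assumptions.

```python
from typing import Any, Dict, List, Optional


def validateWorkflow(
    userRequest: str,
    expectedFormats: Optional[List[str]],
    allDeliveredFormats: List[str],
    taskCount: int,
) -> Optional[Dict[str, Any]]:
    """Return None if every expected format was delivered, else a compensatory task step."""
    delivered = {f.lower() for f in allDeliveredFormats}
    missing = [f for f in (expectedFormats or []) if f.lower() not in delivered]
    if not missing:
        return None
    return {
        "id": f"task_{taskCount + 1}",  # hypothetical id scheme
        "objective": (
            f"Deliver the missing {', '.join(missing)} output(s) for the original request: {userRequest}"
        ),
        "expectedFormats": missing,
        "successCriteria": [f"A .{f} document is delivered" for f in missing],
    }
```

Because the keys match `TaskStep` fields, the result could presumably be passed as `TaskStep(**step)` and planned like any other task.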

---

## 9. Next Steps

1. **Review and approve this proposal**
2. **Implement class changes** in datamodelChat.py
3. **Update intent analyzer prompt** to request actual file format extensions
4. **Update task planning prompt** to request `expectedFormats` list
5. **Pass all workflow data to next-task planning** (messages, document metadata, validation results) in workflowManager.py and promptGenerationTaskplan.py
6. **Implement format extraction** from document metadata (file extensions or mimeType) for task- and workflow-level checks
7. **Implement workflow-level validation** method
8. **Update all references** from `expectedFormat` to `expectedFormats`
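
Some persisted or in-flight `TaskStep` payloads may still use the old key. The proposal does not say whether TaskStep data is stored, so this is an assumption. If such payloads exist, a small shim could fold the old key into the new list field during the rename. Here is a minimal sketch. `migrateTaskStepPayload` is hypothetical. It carries legacy values over verbatim, so a legacy conceptual term such as `'formatted'` is not turned into an extension.

```python
from typing import Any, Dict


def migrateTaskStepPayload(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a TaskStep dict with legacy 'expectedFormat' folded into 'expectedFormats'."""
    migrated = dict(payload)  # never mutate the caller's dict
    legacy = migrated.pop("expectedFormat", None)
    # An existing non-empty 'expectedFormats' wins over the legacy single value.
    if legacy and not migrated.get("expectedFormats"):
        migrated["expectedFormats"] = [legacy] if isinstance(legacy, str) else list(legacy)
    return migrated
```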

---

## Questions Answered

✅ **Document metadata:** Use existing document fields (name, size, format from mimeType/extensions) - no summaries needed

✅ **Format extraction:** Extract formats from document metadata (mimeType or file extensions)

✅ **Task validation scope:** Task validation checks ONLY actions in that task, not all workflow actions

✅ **Adaptive planning:** Next task uses ALL workflow data (messages + document metadata) to refine objective

✅ **After each action:** Validate against task objective → decide complete or next action needed

---

## 10. Validation Flow Clarification

### Simplified Flow:

1. **Within Task (Action-by-Action):**
   - Action executes → delivers documents with metadata
   - Validate action results against task objective
   - If incomplete → next action needed
   - If complete → task done

2. **Task Planning (Adaptive):**
   - Receives: ALL workflow data (messages, document metadata from all previous tasks)
   - Extracts: Delivered formats from document metadata (file extensions/mimeTypes)
   - Compares: What was actually delivered vs. what was planned
   - Refines: Next task objective (may need more/less based on actual progress)

3. **Task Completion:**
   - Validate: THIS task's documents (extract formats from metadata) against THIS task's expectations
   - Result: Used in workflow data for next task planning

4. **Workflow Completion:**
   - Final validation: All documents (extract formats from metadata) meet original user request
   - If missing: Create compensatory task

---

**Status:** Ready for implementation after approval