346 lines
15 KiB
Markdown
346 lines
15 KiB
Markdown
# Critical Analysis: Loop Plan Refactoring
|
|
|
|
## Executive Summary
|
|
**Overall Assessment**: ✅ **The proposal is fundamentally sound and well-reasoned**, with some areas that need careful consideration. The core idea of separating JSON completeness from task completion is architecturally correct.
|
|
|
|
**Key Corrections from User**:
|
|
1. ✅ **JSON completeness**: Use **parsing** (`json.loads()`), NOT last character check (line 860 is wrong)
|
|
2. ✅ **Intent levels**: Keep separate checks for workflow, task, and action levels
|
|
3. ✅ **Task-level override**: Critical requirement - allow taskplan to override workflow intent
|
|
4. ✅ **Sections without ID**: Omit from summary (don't show "unknown")
|
|
5. ✅ **Summary truncation**: Show first 100 and last 100 items if too long
|
|
6. ✅ **Extraction algorithm**: Only one incomplete section, handle first element edge case, deliver mid-string/number as-is
|
|
7. ✅ **UserIntention prompt**: Adding 5 items is acceptable for AI
|
|
8. ✅ **DoD checking**: Verified in refinement phase (`_refineDecide`, ContentValidator)
|
|
|
|
---
|
|
|
|
## ✅ **CLEVER ASPECTS**
|
|
|
|
### 1. **Separation of Concerns: JSON Completeness vs. Task Completion**
|
|
**Rating**: ⭐⭐⭐⭐⭐ Excellent
|
|
|
|
**Why it's clever:**
|
|
- The current code already hints at this separation (line 916: "CRITICAL: This is ONLY about AI Loop Completion, NOT Action DoD!")
|
|
- **JSON completeness** = "Did the AI finish generating the JSON structure?" (technical)
|
|
- **Task completion** = "Does the content meet user requirements?" (semantic)
|
|
- These are **orthogonal concerns** - mixing them causes confusion
|
|
|
|
**Evidence from code:**
|
|
```python
|
|
# Current _shouldContinueGeneration already separates these!
|
|
# Line 916: "CRITICAL: This is ONLY about AI Loop Completion, NOT Action DoD!"
|
|
# Line 917: "Action DoD is checked AFTER the AI Loop completes in _refineDecide."
|
|
```
|
|
|
|
**Benefit**: Simpler, more predictable loop behavior. JSON completeness is objective and testable.
|
|
|
|
---
|
|
|
|
### 2. **Integrating Intent Analysis into Existing Prompts**
|
|
**Rating**: ⭐⭐⭐⭐ Very Good (with caveats)
|
|
|
|
**Why it's clever:**
|
|
- **Saves 3 AI calls** = significant performance improvement
|
|
- Reduces latency (fewer round trips)
|
|
- Reduces cost (fewer API calls)
|
|
- Information is still gathered, just more efficiently
|
|
|
|
**Potential Issues:**
|
|
- **UserIntention prompt complexity**: Adding 5 new fields (primaryGoal, dataType, expectedFormats, qualityRequirements, successCriteria) might make the prompt too complex
|
|
- **Prompt quality risk**: One prompt doing too much might reduce accuracy
|
|
- **Task-specific vs. workflow-level intent**: Task intent might differ from workflow intent (e.g., "Generate 2000 primes" vs. overall workflow goal)
|
|
|
|
**Recommendation**:
|
|
- ✅ Keep integration for workflow-level (userintention)
|
|
- ⚠️ Consider keeping task/action-level intent analysis separate OR make it optional in prompts
|
|
- Test prompt quality after integration
|
|
|
|
---
|
|
|
|
### 3. **JSON Completeness Check via Parsing**
|
|
**Rating**: ⭐⭐⭐⭐⭐ Excellent (CORRECTED)
|
|
|
|
**User Correction**: ❌ **NOT by last character check!** Last character could be `}` or `]` by chance, JSON still incomplete.
|
|
|
|
**Correct Approach**:
|
|
- **Use JSON parsing** to determine completeness
|
|
- If `json.loads()` succeeds → JSON is complete
|
|
- If `json.loads()` fails → JSON is incomplete/broken
|
|
- **Current implementation is WRONG** (line 860: `raw_normalized.endswith(('}', ']'))`)
|
|
|
|
**Why it's clever:**
|
|
- **Objective**: JSON either parses successfully or it doesn't
|
|
- **Reliable**: Handles nested structures, mid-string cuts, mid-number cuts
|
|
- **Debuggable**: Parsing errors clearly indicate where JSON is broken
|
|
|
|
**Benefit**: Predictable behavior, handles all edge cases correctly.
|
|
|
|
---
|
|
|
|
### 4. **New Continuation Summary Format**
|
|
**Rating**: ⭐⭐⭐⭐ Very Good
|
|
|
|
**Why it's clever:**
|
|
- **Clear progress indication**: "heading X with Y items" is more intuitive than "minSections: 5"
|
|
- **Contextual**: Shows what's actually been delivered, not abstract metrics
|
|
- **Actionable**: AI can see exactly what's missing
|
|
|
|
**Example format:**
|
|
```
|
|
Following data has already been delivered:
|
|
- heading "section_1" level 1: Introduction
|
|
- paragraph with 3 texts
|
|
- table "data_table" with 150 rows
|
|
- code_block "primes" with 2000 code lines
|
|
```
|
|
|
|
**User Clarifications**:
|
|
- ✅ **Missing IDs**: If section has no ID, **omit it** from summary (don't show "unknown")
|
|
- ✅ **Summary too long**: If summary exceeds token limit, truncate: show **first 100 items and last 100 items** (remove middle)
|
|
- ✅ **Format consistency**: Ensure consistent formatting across all section types
|
|
|
|
**Recommendation**:
|
|
- ✅ Implement ID check: Skip sections without ID
|
|
- ✅ Implement truncation: First 100 + last 100 items if > 200 items
|
|
|
|
---
|
|
|
|
### 5. **New Extraction Algorithm**
|
|
**Rating**: ⭐⭐⭐ Good (needs refinement)
|
|
|
|
**Why it's clever:**
|
|
- **Precise**: Finds exact cut-off point (section → element → sub-element)
|
|
- **Actionable**: Returns both cut-off element AND element before (gives AI context)
|
|
|
|
**Algorithm:**
|
|
1. Loop over all sections until finding incomplete section
|
|
2. In incomplete section, loop through elements until finding cut-off element
|
|
3. Return cut-off element AND element before it
|
|
|
|
**User Clarifications**:
|
|
- ✅ **Edge case - first element**: If cut-off is in first element, just show cut-off element (no element before exists)
|
|
- ✅ **Only one incomplete section**: There is always only **one section incomplete** (JSON cut-off point)
|
|
- ✅ **Mid-string/number cuts**: In 99% of cases, JSON is cut off mid-string or mid-number - deliver the cut-off part **as-is** (don't try to "complete" it)
|
|
- ✅ **Performance**: No problem - we only parse one AI response, not all accumulated sections
|
|
|
|
**Recommendation**:
|
|
- ✅ Handle first element edge case explicitly
|
|
- ✅ Extract cut-off element as-is (don't try to complete mid-string/number)
|
|
- ✅ Performance is fine (only parsing one response)
|
|
|
|
---
|
|
|
|
## ⚠️ **AREAS OF CONCERN**
|
|
|
|
### 1. **DoD Removal: Where Does Task Completion Checking Happen?**
|
|
**Rating**: ⚠️ Needs Clarification
|
|
|
|
**User Requirement**: Check deeply in code how completeness checks are handled
|
|
|
|
**Current State:**
|
|
- `_shouldContinueGeneration` already separates JSON completeness from DoD (line 916)
|
|
- `_analyzeTaskCompletion` exists but **verified: NOT called anywhere** in codebase
|
|
- Comment says "Action DoD is checked AFTER the AI Loop completes in _refineDecide"
|
|
- `_refineDecide` (modeDynamic.py line 693) uses:
|
|
- ContentValidator for content quality and requirements
|
|
- ProgressTracker for progress state
|
|
- Content validation and analysis (not DoD metrics)
|
|
|
|
**Findings**:
|
|
- ✅ `_analyzeTaskCompletion` is safe to remove (not called)
|
|
- ✅ DoD checking happens in refinement/validation phase (`_refineDecide`, ContentValidator)
|
|
- ✅ JSON completeness (parsing) is separate from task completion (validation)
|
|
|
|
**Recommendation**:
|
|
- ✅ Remove `_analyzeTaskCompletion` (verified safe)
|
|
- ✅ Ensure validation/refinement phase (`_refineDecide`, ContentValidator) still checks requirements
|
|
- ✅ Keep separation: JSON completeness (loop) vs. Task completion (refinement)
|
|
|
|
---
|
|
|
|
### 2. **UserIntention Prompt Complexity**
|
|
**Rating**: ⚠️ Moderate Risk
|
|
|
|
**Current**: UserIntention prompt does 4 things:
|
|
1. Language detection
|
|
2. Normalization
|
|
3. Intent extraction
|
|
4. Context item extraction
|
|
|
|
**Proposed**: Add 5 more things:
|
|
5. Primary goal
|
|
6. Data type
|
|
7. Expected formats
|
|
8. Quality requirements
|
|
9. Success criteria
|
|
|
|
**User Clarification**: ✅ **Adding 5 items is no problem for AI** - prompt complexity is acceptable
|
|
|
|
**Risk**:
|
|
- ~~Prompt bloat~~: User confirms this is acceptable
|
|
- ~~AI confusion~~: User confirms AI can handle this
|
|
- Token usage: Longer prompt = more tokens per call (acceptable trade-off)
|
|
|
|
**Mitigation Options**:
|
|
- ✅ User confirms: No mitigation needed, AI can handle it
|
|
- ✅ Keep integration as planned
|
|
|
|
**Recommendation**:
|
|
- ✅ Proceed with integration (user confirmed acceptable)
|
|
- ✅ Test after implementation to verify quality maintained
|
|
|
|
---
|
|
|
|
### 3. **Task-Specific vs. Workflow-Level Intent**
|
|
**Rating**: ⚠️ Needs Consideration
|
|
|
|
**Problem**:
|
|
- Workflow intent: "Generate comprehensive report" (high-level)
|
|
- Task intent: "Generate first 2000 prime numbers" (specific)
|
|
- Action intent: "Generate primes 1-1000" (very specific)
|
|
|
|
**Current Plan**:
|
|
- UserIntention → workflow-level intent ✅
|
|
- Taskplan → use workflow intent (no re-analysis) ⚠️
|
|
- Dynamic prompts → action-level intent analysis ✅
|
|
|
|
**Concern**:
|
|
- Task might need different dataType/expectedFormats than workflow
|
|
- Example: Workflow wants PDF, but task needs CSV for intermediate step
|
|
|
|
**User Clarification**: ✅ **YES, very important!** Allow taskplan to override workflow intent if task-specific needs differ
|
|
|
|
**Recommendation**:
|
|
- ✅ Keep workflow-level intent in userintention
|
|
- ✅ **CRITICAL**: Allow taskplan to override workflow intent (e.g., workflow wants PDF, task needs CSV)
|
|
- ✅ Keep action-level intent analysis in dynamic prompts (as planned)
|
|
- ✅ **CRITICAL**: Intent check should be different on workflow, task, and action levels (keep separate)
|
|
|
|
---
|
|
|
|
### 4. **Merge Logic Compatibility**
|
|
**Rating**: ✅ Low Risk
|
|
|
|
**Current Plan**: "Keep existing merge logic - it works well"
|
|
|
|
**Analysis**:
|
|
- ✅ Merge logic (`_mergeSectionsIntelligently`, `_mergeSectionContent`) is independent of extraction algorithm
|
|
- ✅ New extraction algorithm only affects continuation context, not merging
|
|
- ⚠️ Need to ensure extraction algorithm works with merged sections
|
|
|
|
**Recommendation**:
|
|
- ✅ Keep merge logic as-is
|
|
- ✅ Test extraction algorithm with merged sections
|
|
- ✅ Ensure extraction finds cut-off in merged structure correctly
|
|
|
|
---
|
|
|
|
### 5. **JSON Completeness Detection Edge Cases**
|
|
**Rating**: ⚠️ Needs Testing
|
|
|
|
**User Correction**: ❌ **Current check is WRONG!** `raw_normalized.endswith(('}', ']'))` is incorrect.
|
|
|
|
**Correct Approach**:
|
|
- **Use JSON parsing** (`json.loads()`) to determine completeness
|
|
- If parsing succeeds → JSON is complete
|
|
- If parsing fails → JSON is incomplete
|
|
- **Nested structures example**: `{"a": {"b": [1,2,3` should NOT pass parseJson (correctly fails)
|
|
|
|
**Current Code Issues**:
|
|
- Line 860: `raw_normalized.endswith(('}', ']'))` - **WRONG**, remove this check
|
|
- Line 864: `json.loads(extracted)` - **CORRECT**, use this as primary check
|
|
- Line 880: `except json.JSONDecodeError` - **CORRECT**, handles parsing failures
|
|
|
|
**Recommendation**:
|
|
- ✅ Remove last character check (line 860)
|
|
- ✅ Rely only on JSON parsing (`json.loads()`)
|
|
- ✅ Parsing handles all edge cases correctly (nested structures, mid-string cuts, etc.)
|
|
|
|
---
|
|
|
|
## 📊 **RISK ASSESSMENT**
|
|
|
|
| Aspect | Risk Level | Mitigation |
|
|
|--------|------------|------------|
|
|
| DoD removal | 🟢 Low | Verify checking happens in refinement phase |
|
|
| Intent integration | 🟡 Medium | Test prompt quality, have fallback plan |
|
|
| Extraction algorithm | 🟡 Medium | Handle edge cases, add tests |
|
|
| JSON completeness | 🟢 Low | Current implementation is robust |
|
|
| Merge compatibility | 🟢 Low | Merge logic is independent |
|
|
|
|
---
|
|
|
|
## 🎯 **RECOMMENDATIONS**
|
|
|
|
### High Priority
|
|
1. ✅ **Verify DoD checking location**: ✅ Confirmed in refinement phase (`_refineDecide`, ContentValidator)
|
|
2. ✅ **Test UserIntention prompt quality**: User confirmed acceptable, proceed with integration
|
|
3. ✅ **Handle extraction edge cases**: ✅ Clarified - first element (show only cut-off), only one incomplete section, deliver mid-string/number as-is
|
|
4. ✅ **Fix JSON completeness check**: ✅ Remove last character check, use parsing only
|
|
5. ✅ **Implement task-level intent override**: ✅ Critical - allow taskplan to override workflow intent
|
|
|
|
### Medium Priority
|
|
4. ✅ **Prompt splitting**: User confirmed not needed - AI can handle 5 additional fields
|
|
5. ✅ **Task-level intent override**: ✅ Critical requirement - must implement
|
|
6. ✅ **Summary truncation**: ✅ Clarified - show first 100 and last 100 items if too long
|
|
7. ✅ **Omit sections without ID**: ✅ Clarified - skip sections without ID in summary
|
|
|
|
### Low Priority
|
|
7. ✅ **Performance optimization**: Early exit in extraction algorithm if cut-off found early
|
|
8. ✅ **Add comprehensive tests**: Unit tests for all edge cases
|
|
9. ✅ **Documentation**: Update docs to explain new loop behavior
|
|
|
|
---
|
|
|
|
## 💡 **FINAL VERDICT**
|
|
|
|
**Overall**: ✅ **APPROVE WITH MODIFICATIONS**
|
|
|
|
**Strengths**:
|
|
- Clear separation of concerns (JSON completeness vs. task completion)
|
|
- Performance improvement (3 fewer AI calls)
|
|
- Simpler, more maintainable code
|
|
- Better continuation prompts (section counts vs. abstract metrics)
|
|
|
|
**Required Modifications** (UPDATED):
|
|
1. ✅ **Fix JSON completeness check**: Remove last character check, use parsing only
|
|
2. ✅ **Verify DoD checking**: Confirmed in refinement phase (`_refineDecide`, ContentValidator)
|
|
3. ✅ **UserIntention prompt**: User confirmed acceptable, proceed with integration
|
|
4. ✅ **Extraction algorithm**: Handle first element edge case, deliver mid-string/number as-is
|
|
5. ✅ **Task-level intent override**: Critical requirement - must implement
|
|
6. ✅ **Summary handling**: Omit sections without ID, truncate if too long (first 100 + last 100)
|
|
7. ✅ **Intent levels**: Keep separate checks for workflow, task, and action levels
|
|
|
|
**Risk Level**: 🟡 **Medium** - Well-reasoned proposal with manageable risks
|
|
|
|
---
|
|
|
|
## 🔍 **QUESTIONS ANSWERED**
|
|
|
|
1. ✅ **Where is `_analyzeTaskCompletion` called?** → **ANSWERED**: Not called anywhere, safe to remove
|
|
2. ✅ **Does refinement phase check DoD?** → **ANSWERED**: Yes, in `_refineDecide` via ContentValidator
|
|
3. ✅ **What happens if JSON is complete but wrong?** → **ANSWERED**: Validation happens in refinement phase
|
|
4. ✅ **Can tasks override workflow intent?** → **ANSWERED**: Yes, critical requirement - must implement
|
|
5. ✅ **What's the token limit for continuation summary?** → **ANSWERED**: Truncate if > 200 items (first 100 + last 100)
|
|
6. ✅ **How to check JSON completeness?** → **ANSWERED**: Use parsing (`json.loads()`), NOT last character check
|
|
7. ✅ **What if section has no ID?** → **ANSWERED**: Omit it from summary
|
|
8. ✅ **What if cut-off in first element?** → **ANSWERED**: Show only cut-off element (no element before)
|
|
9. ✅ **How many sections incomplete?** → **ANSWERED**: Always only one section incomplete
|
|
10. ✅ **What about mid-string/number cuts?** → **ANSWERED**: Deliver as-is (99% of cases)
|
|
|
|
---
|
|
|
|
## 📝 **CONCLUSION**
|
|
|
|
The proposal is **architecturally sound** and addresses real problems (complex DoD logic, multiple AI calls). The core idea of separating JSON completeness from task completion is **excellent design**.
|
|
|
|
Main concerns (UPDATED):
|
|
- ✅ **Prompt complexity** → User confirmed acceptable, proceed
|
|
- ✅ **Edge cases in extraction** → Clarified and addressed
|
|
- ✅ **Task-level intent** → Critical requirement, must implement override
|
|
- ✅ **JSON completeness check** → Must fix (remove last char check, use parsing)
|
|
- ✅ **Summary handling** → Clarified (omit no-ID sections, truncate if long)
|
|
|
|
**Recommendation**: ✅ **Proceed with implementation** - all concerns addressed and clarified by user.
|
|
|