# JSON String Accumulation Concept for Iterative AI Generation

## Problem Statement

Currently, the AI service processes each iteration's JSON string independently and then merges the parsed objects. The real-world behavior, however, is different:

1. The AI delivers a **STRING** containing JSON (not a parsed JSON object).
2. First iteration: the AI delivers a JSON string that is cut off somewhere (broken/incomplete).
3. Subsequent iterations: the AI delivers MORE JSON string fragments that must be **APPENDED** to the previous JSON string.
4. Challenge: handling incomplete JSON strings and merging them correctly.

## Core Principle

- **If iteration 1 returns complete, valid JSON** → use it directly (no accumulation needed)
- **If iteration 1 returns incomplete/broken JSON** → enter accumulation mode

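## State Management

The state class is defined in `datamodelAi.py`. Besides the four accumulation fields it needs a `lastKpi` field, because the KPI validation reads and writes `accumulationState.lastKpi` and a pydantic model rejects assignment to undeclared fields by default:

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel


class JsonAccumulationState(BaseModel):
    accumulatedJsonString: str  # Raw accumulated JSON string
    isAccumulationMode: bool  # True if we're accumulating fragments
    lastParsedResult: Optional[Dict[str, Any]]  # Last successfully parsed result (for prompt context)
    allSections: List[Dict[str, Any]]  # Sections extracted so far (for prompt context)
    lastKpi: Optional[int] = None  # Last KPI percentage answered by the AI (for progression check)
```
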
## Flow Logic

### Phase 1: First Iteration Check

```
1. Receive JSON string from AI
2. Try to parse:
   - SUCCESS + Complete → Extract sections → DONE (no accumulation)
   - FAILURE or INCOMPLETE → Enter accumulation mode
```

### Phase 2: Accumulation Mode (if needed)

```
For each iteration:
1. Receive newFragmentString
2. Concatenate with overlap handling:
   accumulatedJsonString = mergeJsonStringsWithOverlap(
       accumulatedJsonString,
       newFragmentString
   )
3. Try to parse accumulatedJsonString:
   - SUCCESS → Go to Phase 3 (completion)
   - FAILURE → Continue accumulation
4. Extract partial sections (for prompt context):
   - Use repairBrokenJson() to get best partial structure
   - Extract sections from partial structure
   - Update allSections (for next prompt)
5. Build continuation context for next prompt:
   - Extract deliveredSummary: Count of items/rows/lines per section
   - Extract cutOffElement: Incomplete element where JSON was cut off
   - Extract elementBeforeCutoff: Last complete element before cut-off
   - Store lastRawJson: Raw JSON string for reference
6. Keep accumulatedJsonString for next iteration
```

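The parse-or-continue core of Phase 2 can be sketched in a few lines. This is a minimal sketch under stated assumptions: the fragments arrive as a plain list, the hypothetical helper `accumulateUntilParsable` uses simple concatenation where the design calls `mergeJsonStringsWithOverlap()`, and partial-section extraction (step 4) and the continuation context (step 5) are left out.

```python
import json
from typing import Any, List, Optional, Tuple


def accumulateUntilParsable(fragments: List[str]) -> Tuple[str, Optional[Any], int]:
    """Append fragments one by one and stop at the first iteration whose
    accumulated string parses as JSON.

    Returns (accumulatedString, parsedResult or None, iterationsUsed).
    Hypothetical helper: plain concatenation stands in for
    mergeJsonStringsWithOverlap().
    """
    accumulated = ""
    for iteration, fragment in enumerate(fragments, start=1):
        accumulated += fragment  # Step 2 (simplified: no overlap handling)
        try:
            return accumulated, json.loads(accumulated), iteration  # Step 3: SUCCESS
        except json.JSONDecodeError:
            continue  # Step 3: FAILURE, keep accumulating
    return accumulated, None, len(fragments)


fragments = ['{"sections": [{"id": 1}, ', '{"id": 2}', "]}"]
accumulated, parsed, used = accumulateUntilParsable(fragments)
print(used, parsed)  # 3 {'sections': [{'id': 1}, {'id': 2}]}
```

Each intermediate string (`'{"sections": [{"id": 1}, '` and `'{"sections": [{"id": 1}, {"id": 2}'`) raises `JSONDecodeError`, so the loop only reports success once the closing `]}` arrives.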
### Phase 3: Completion (when parsing succeeds)

```
1. Analyze completeness:
   - Check if all structures are closed
   - Identify missing closing elements
2. Add closing elements if needed:
   - Close unclosed arrays/objects
   - Ensure proper JSON structure
3. Repair if corrupted:
   - Fix any remaining corruption
4. Extract final sections:
   - extractSectionsFromDocument()
5. DONE
```

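Steps 1 and 2 of Phase 3 can be done generically with a single scan that keeps a stack of open brackets and ignores brackets inside string literals. Once `json.loads` has succeeded every structure is already closed, so this analysis is only meaningful on the raw accumulated string before parsing; the sketch below assumes that. `closeUnclosedJson` is a hypothetical helper, not part of the existing code, and some cut positions (between a key and its value, or inside a `\uXXXX` escape) still yield invalid JSON and need the existing repair logic.

```python
import json
from typing import List


def closeUnclosedJson(raw: str) -> str:
    """Append the closing characters a truncated JSON string is missing.

    Hypothetical helper: scans the raw string once, tracking open '{' / '['
    and whether the cut happened inside a string literal.
    """
    stack: List[str] = []  # currently open '{' and '[' (innermost last)
    inString = False       # inside a string literal?
    escaped = False        # previous character was a backslash inside a string?
    for ch in raw:
        if inString:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                inString = False
        elif ch == '"':
            inString = True
        elif ch in "{[":
            stack.append(ch)
        elif ch in "}]" and stack:
            stack.pop()

    closed = raw
    if inString:
        if escaped:
            closed = closed[:-1]  # drop a dangling backslash
        closed += '"'  # terminate the cut-off string literal
    closed = closed.rstrip()
    if closed.endswith(","):
        closed = closed[:-1]  # a trailing comma would keep the result invalid
    closers = {"{": "}", "[": "]"}
    return closed + "".join(closers[opener] for opener in reversed(stack))


repaired = closeUnclosedJson('{"title": "Report", "rows": [[1, 2], [3')
print(repaired)              # {"title": "Report", "rows": [[1, 2], [3]]}
print(json.loads(repaired))  # {'title': 'Report', 'rows': [[1, 2], [3]]}
```

Closing in reverse stack order guarantees the innermost structure is closed first, which is what makes the appended suffix valid regardless of nesting depth.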
## Function Design

### Main Function: `accumulateAndParseJsonFragments`

```python
@staticmethod
def accumulateAndParseJsonFragments(
    accumulatedJsonString: str,
    newFragmentString: str,
    allSections: List[Dict[str, Any]],
    iteration: int
) -> Tuple[str, List[Dict[str, Any]], bool, Optional[Dict[str, Any]]]:
    """
    Accumulate JSON fragments and parse when complete.

    GENERIC function that handles:
    1. Concatenating JSON strings with overlap detection
    2. Parsing the accumulated string
    3. Extracting sections (partial if incomplete, final if complete)
    4. Determining completion status

    Args:
        accumulatedJsonString: Previously accumulated JSON string
        newFragmentString: New fragment string from current iteration
        allSections: Sections extracted so far (for prompt context)
        iteration: Current iteration number

    Returns:
        Tuple of:
        - accumulatedJsonString: Updated accumulated string
        - sections: Extracted sections (partial if incomplete, final if complete)
        - isComplete: True if JSON is complete and valid
        - parsedResult: Parsed JSON object (if parsing succeeded)
    """
    # The helpers below are static methods of JsonResponseHandler, so they are called qualified

    # Step 1: Clean encoding issues from the accumulated string (e.g. at the end of the previously delivered part)
    cleanedAccumulated = JsonResponseHandler.cleanEncodingIssues(accumulatedJsonString)

    # Step 2: Clean encoding issues from the new fragment
    cleanedFragment = JsonResponseHandler.cleanEncodingIssues(newFragmentString)

    # Step 3: Concatenate with overlap handling
    combinedString = JsonResponseHandler.mergeJsonStringsWithOverlap(
        cleanedAccumulated,
        cleanedFragment
    )

    # Step 4: Try to parse
    try:
        extracted = extractJsonString(combinedString)
        parsedResult = json.loads(extracted)

        # Step 5: Parsing succeeded - check completeness
        isComplete = JsonResponseHandler.isJsonComplete(parsedResult)

        if isComplete:
            # Step 6: Complete JSON - finalize
            finalizedJson = JsonResponseHandler.finalizeJson(parsedResult)
            sections = extractSectionsFromDocument(finalizedJson)
            return combinedString, sections, True, finalizedJson
        else:
            # Step 7: Incomplete but parseable - extract partial sections
            sections = extractSectionsFromDocument(parsedResult)
            return combinedString, sections, False, parsedResult

    except json.JSONDecodeError:
        # Step 8: Still broken - repair and extract partial sections
        repaired = repairBrokenJson(combinedString)
        if repaired:
            sections = extractSectionsFromDocument(repaired)
            return combinedString, sections, False, repaired
        else:
            # Repair failed - continue with the data from BEFORE merging the problematic piece:
            # return the previous accumulated string (without the new fragment)
            # so that previously accumulated data is not lost
            logger.warning(f"Iteration {iteration}: Repair failed, continuing with previous accumulated data")
            return accumulatedJsonString, [], False, None
```

## Helper Functions Needed

### 1. `mergeJsonStringsWithOverlap`

```python
@staticmethod
def mergeJsonStringsWithOverlap(
    accumulated: str,
    newFragment: str
) -> str:
    """
    GENERIC function to merge two JSON strings, handling overlaps intelligently.

    Works for ANY JSON structure - no specific logic for content types.

    Overlap scenarios (all handled generically):
    - Exact continuation: newFragment starts exactly where accumulated ends
    - Partial overlap: newFragment overlaps with end of accumulated
    - Full overlap: newFragment is subset of accumulated

    Strategy:
    1. Find longest common suffix/prefix match (string-based comparison)
    2. Remove duplicate content
    3. Concatenate remaining parts

    Args:
        accumulated: Previously accumulated JSON string
        newFragment: New fragment string to append

    Returns:
        Combined JSON string with overlaps removed
    """
    # Implementation:
    # - Find longest common suffix/prefix match
    # - Remove overlapping part
    # - Concatenate: accumulated + newFragment[overlapEnd:]
    pass
```

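The suffix/prefix strategy described in the docstring can be sketched as below. This is a sketch, not the final implementation: the extra `minOverlap` parameter (default 8) is an assumption not in the design, added because very short matches such as a lone `"` or `}` are likely accidental. Full overlap is handled only when the fragment repeats the *end* of the accumulated string; a fragment that matches somewhere in the middle is deliberately not dropped, since JSON output often repeats identical substrings (for example identical table rows).

```python
def mergeJsonStringsWithOverlap(accumulated: str, newFragment: str, minOverlap: int = 8) -> str:
    """Append newFragment to accumulated, dropping the longest prefix of
    newFragment that repeats the end of accumulated.

    minOverlap is an assumed threshold: shorter matches (a lone '"' or '}')
    are too likely to be accidental and are treated as no overlap.
    """
    maxLen = min(len(accumulated), len(newFragment))
    # Try the longest candidate first so the largest duplicate is removed
    for length in range(maxLen, minOverlap - 1, -1):
        if accumulated.endswith(newFragment[:length]):
            # Partial overlap, or full overlap when length == len(newFragment)
            return accumulated + newFragment[length:]
    # No overlap found: exact continuation
    return accumulated + newFragment


acc = '{"rows": [[1, 2], [3, 4], [5,'
print(mergeJsonStringsWithOverlap(acc, '[3, 4], [5, 6]]}'))  # {"rows": [[1, 2], [3, 4], [5, 6]]}
```

The search is quadratic in the fragment length in the worst case, which fits the later decision that overlap detection needs no performance tuning.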
### 2. `isJsonComplete`

```python
@staticmethod
def isJsonComplete(parsedJson: Dict[str, Any]) -> bool:
    """
    GENERIC function to check if parsed JSON structure is complete.

    Works for ANY JSON structure - no specific logic for content types.

    Completeness checks (all generic):
    - All arrays are properly closed
    - All objects are properly closed
    - No incomplete structures
    - Recursive validation of nested structures

    Args:
        parsedJson: Parsed JSON object

    Returns:
        True if JSON is complete, False otherwise
    """
    # Implementation:
    # - Recursively check all structures
    # - Verify no incomplete arrays/objects
    # - Generic validation (no content-type-specific logic)
    pass
```

### 3. `finalizeJson`

```python
@staticmethod
def finalizeJson(parsedJson: Dict[str, Any]) -> Dict[str, Any]:
    """
    GENERIC function to finalize complete JSON by adding missing closing elements and repairing corruption.

    Works for ANY JSON structure - no specific logic for content types.

    Steps (all generic):
    1. Analyze structure for missing closing elements (recursively)
    2. Add closing brackets/braces where needed
    3. Repair any remaining corruption
    4. Validate final structure

    Args:
        parsedJson: Parsed JSON object that needs finalization

    Returns:
        Finalized JSON object
    """
    # Implementation:
    # - Check for incomplete structures (generic recursive)
    # - Add missing closing elements
    # - Repair corruption using existing repair logic
    # - Return finalized structure
    pass
```

### 4. `cleanEncodingIssues`

```python
@staticmethod
def cleanEncodingIssues(jsonString: str) -> str:
    """
    GENERIC function to remove problematic encoding parts from JSON string.

    Works for ANY JSON structure - removes problematic characters/bytes.

    Args:
        jsonString: JSON string that may have encoding issues

    Returns:
        Cleaned JSON string
    """
    try:
        # Try to decode/encode to detect issues
        jsonString.encode('utf-8').decode('utf-8')
        return jsonString
    except UnicodeError:
        # Remove problematic parts
        cleaned = jsonString.encode('utf-8', errors='ignore').decode('utf-8', errors='ignore')
        logger.warning("Removed encoding issues from JSON string")
        return cleaned
```

### 5. `extractKpiFromResponse`

```python
@staticmethod
def extractKpiFromResponse(aiResponse: str) -> Optional[int]:
    """
    Extract KPI percentage from AI response.

    AI is asked: "Based on the delivered data so far, approximately what percentage (%)
    of the total required content has been delivered? Respond with an integer between 0-100."

    Args:
        aiResponse: AI response string that may contain percentage

    Returns:
        Integer percentage (0-100) or None if not found
    """
    # Implementation:
    # - Look for percentage pattern in response (e.g., "45%", "45 percent", "45")
    # - Extract integer value
    # - Validate range (0-100)
    # - Return integer or None
    pass
```

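The pattern search outlined in the implementation comments could look like the sketch below. It rests on assumptions the design leaves open: the KPI answer is taken to be the *last* percentage-like match in the response (the generated JSON content itself may contain percentages, so this is only a heuristic, and a dedicated marker in the prompt would be more robust), and a bare integer such as `"45"` is accepted only when it is the whole response, because JSON fragments are full of unrelated numbers.

```python
import re
from typing import Optional

# "45%", "45 %", "45 percent" (case-insensitive); 1-3 digits starting at a word boundary
_KPI_PATTERN = re.compile(r"\b(\d{1,3})\s*(?:%|percent\b)", re.IGNORECASE)


def extractKpiFromResponse(aiResponse: str) -> Optional[int]:
    """Return the KPI percentage (0-100) found in aiResponse, or None.

    Assumptions: the answer is the LAST percentage-like match in the response,
    and a bare integer is accepted only when it is the whole response.
    """
    matches = _KPI_PATTERN.findall(aiResponse)
    stripped = aiResponse.strip()
    if matches:
        value = int(matches[-1])
    elif re.fullmatch(r"\d{1,3}", stripped):
        value = int(stripped)
    else:
        return None
    return value if 0 <= value <= 100 else None  # validate range


print(extractKpiFromResponse('{"rows": [[1, 2]]}\nKPI: 45%'))  # 45
print(extractKpiFromResponse("about 60 percent"))              # 60
print(extractKpiFromResponse("72"))                            # 72
print(extractKpiFromResponse('{"count": 12}'))                 # None
```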
### 6. `validateKpiProgression`

```python
@staticmethod
def validateKpiProgression(
    accumulationState: JsonAccumulationState,
    currentKpi: int
) -> bool:
    """
    Validate KPI progression from AI response.

    Validation rules:
    - If % goes DOWN → Error (e.g., no data received, started new) → Return False
    - If % doesn't move (increment < 1%) → Error (no progress) → Return False
    - If % goes UP (increment >= 1%) → Good progress → Return True

    Args:
        accumulationState: Current accumulation state (contains lastKpi)
        currentKpi: Current KPI percentage from AI (integer 0-100)

    Returns:
        True if KPI progression is valid, False if error detected
    """
    # Implementation:
    # - Get lastKpi from accumulationState
    # - Calculate increment = currentKpi - lastKpi
    # - If increment < 0: return False (went down - error)
    # - If increment < 1: return False (no progress - error)
    # - If increment >= 1: return True (progress - good)
    pass
```

## Continuation Context for Next Prompt

### What is Delivered for Next Iteration Prompt

When accumulating JSON fragments, the system must give the AI context for the next iteration. This is handled by `buildContinuationContext()`, which extracts:

1. **deliveredSummary**: Summary of all sections with counts
   - Per section: content type, item/row/line counts
   - Example: `- bullet_list with 20 items`, `- table "section_table" with 8 rows`
   - Truncated if too long (first 10 + last 10 items)

2. **cutOffElement**: The incomplete element where the JSON was cut off
   - Extracted from `lastRawResponse` (raw JSON string)
   - Shows the AI where generation stopped
   - Used as the reference point for continuation

3. **elementBeforeCutoff**: The last complete element before the cut-off
   - Provides context of what was completed
   - Helps the AI understand the structure

4. **lastRawJson**: Raw JSON string from the last iteration
   - Stored for reference
   - Used to detect fragments vs. full JSON structures

5. **kpiQuestion**: Question the AI answers with the percentage delivered
   - "Based on the delivered data so far, approximately what percentage (%) of the total required content has been delivered? Respond with an integer between 0-100."
   - The AI must respond with an integer percentage (0-100)

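A possible shape for the `deliveredSummary` part is sketched below. The section layout is an assumption (hypothetical keys `content_type`, optional `id`, and one list field among `items`, `rows`, `lines`), chosen so the output matches the two example lines above. "First 10 + last 10 items" is read here as keeping the first 10 and last 10 summary entries, with a marker line for the omitted middle; the helper name `buildDeliveredSummary` is hypothetical.

```python
from typing import Any, Dict, List


def buildDeliveredSummary(
    sections: List[Dict[str, Any]], keepHead: int = 10, keepTail: int = 10
) -> str:
    """Summarize delivered sections as '- <type> ["<id>"] with <n> <field>' lines.

    Assumed section shape (hypothetical keys): "content_type", optional "id",
    and one list field among "items", "rows", "lines".
    """
    lines: List[str] = []
    for section in sections:
        contentType = section.get("content_type", "section")
        label = f'{contentType} "{section["id"]}"' if "id" in section else contentType
        for field in ("items", "rows", "lines"):
            if isinstance(section.get(field), list):
                lines.append(f"- {label} with {len(section[field])} {field}")
                break
        else:  # no countable list field
            lines.append(f"- {label}")

    # Truncate long summaries: keep the first keepHead and last keepTail entries
    if len(lines) > keepHead + keepTail:
        omitted = len(lines) - keepHead - keepTail
        marker = f"- ... ({omitted} sections omitted) ..."
        lines = lines[:keepHead] + [marker] + lines[len(lines) - keepTail:]
    return "\n".join(lines)


sections = [
    {"content_type": "bullet_list", "items": ["x"] * 20},
    {"content_type": "table", "id": "section_table", "rows": [[0]] * 8},
]
print(buildDeliveredSummary(sections))
# - bullet_list with 20 items
# - table "section_table" with 8 rows
```

The tail slice uses `len(lines) - keepTail` rather than `-keepTail`, so `keepTail=0` keeps no tail entries instead of the whole list.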
### Logic Flow

```
After each accumulation iteration:
1. Extract sections from accumulated JSON (even if incomplete)
2. Build continuation context:
   - Count items/rows/lines per section (for deliveredSummary)
   - Find incomplete section from allSections
   - Extract cut-off point from lastRawResponse
3. Pass context to prompt builder for next iteration
4. AI uses context to continue from cut-off point
```

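Extracting the cut-off point from the raw string can be done generically with the same kind of bracket scan. In this sketch the hypothetical helper `findCutOff` interprets "last complete element" as the most recently closed object or array, and everything after it (minus the separating comma) as the cut-off element. That is an assumption: scalar array elements, such as the strings of a bullet list, are not tracked, so a cut inside a flat list of strings returns no `elementBeforeCutoff`.

```python
from typing import List, Optional, Tuple


def findCutOff(raw: str) -> Tuple[Optional[str], str]:
    """Split a truncated JSON string at its last completely closed object/array.

    Returns (elementBeforeCutoff, cutOffElement):
    - elementBeforeCutoff: text of the most recently closed '{...}' or '[...]',
      or None if nothing was closed yet
    - cutOffElement: the text after it, without the separating comma/whitespace
    """
    openStarts: List[int] = []  # start index of every still-open '{' / '['
    lastClosed: Optional[Tuple[int, int]] = None  # (start, end) of the last closed structure
    inString = False
    escaped = False
    for i, ch in enumerate(raw):
        if inString:  # brackets inside string literals are ignored
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                inString = False
        elif ch == '"':
            inString = True
        elif ch in "{[":
            openStarts.append(i)
        elif ch in "}]" and openStarts:
            lastClosed = (openStarts.pop(), i + 1)

    if lastClosed is None:
        return None, raw.strip()
    start, end = lastClosed
    return raw[start:end], raw[end:].lstrip(" \t\r\n,")


before, cut = findCutOff('{"items": [{"id": 1}, {"id": 2}, {"id": 3, "na')
print(before)  # {"id": 2}
print(cut)     # {"id": 3, "na
```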
## Integration Point

### Modified `_extractSectionsFromResponse` in `mainServiceAi.py`

```python
def _extractSectionsFromResponse(
    self,
    result: str,
    iteration: int,
    debugPrefix: str,
    allSections: Optional[List[Dict[str, Any]]] = None,
    accumulationState: Optional[JsonAccumulationState] = None  # NEW: Track accumulation state
) -> Tuple[List[Dict[str, Any]], bool, Optional[Dict[str, Any]], Optional[JsonAccumulationState]]:
    """
    Extract sections from AI response, handling both valid and broken JSON.

    NEW BEHAVIOR:
    - First iteration: check whether the JSON is complete; if not, start accumulation
    - Subsequent iterations: accumulate strings, parse when complete

    Returns:
        Tuple of:
        - sections: Extracted sections
        - wasJsonComplete: True if JSON is complete
        - parsedResult: Parsed JSON object
        - updatedAccumulationState: Updated accumulation state (None if not in accumulation mode)
    """
    allSections = allSections or []  # Avoid None + list errors below

    if iteration == 1:
        # First iteration: check whether the response is already complete
        try:
            extracted = extractJsonString(result)
            parsed = json.loads(extracted)

            if JsonResponseHandler.isJsonComplete(parsed):
                # Complete JSON - no accumulation needed
                sections = extractSectionsFromDocument(parsed)
                return sections, True, parsed, None  # No accumulation
        except Exception:
            pass  # Broken/unextractable JSON: fall through to accumulation mode

        # Incomplete - start accumulation
        logger.info("Iteration 1: Incomplete JSON detected, starting accumulation mode")
        accumulationState = JsonAccumulationState(
            accumulatedJsonString=result,
            isAccumulationMode=True,
            lastParsedResult=None,
            allSections=[]
        )
        return [], False, None, accumulationState

    # Subsequent iterations: accumulate
    if accumulationState and accumulationState.isAccumulationMode:
        accumulated, sections, isComplete, parsedResult = \
            JsonResponseHandler.accumulateAndParseJsonFragments(
                accumulationState.accumulatedJsonString,
                result,
                allSections,
                iteration
            )

        # Update accumulation state
        accumulationState.accumulatedJsonString = accumulated
        accumulationState.lastParsedResult = parsedResult
        if sections:
            # Sections are extracted from the whole accumulated string, so they already
            # contain the earlier ones: replace instead of appending to avoid duplicates
            accumulationState.allSections = sections
        accumulationState.isAccumulationMode = not isComplete

        return sections, isComplete, parsedResult, accumulationState

    # No accumulation mode although iteration > 1 (should not happen)
    logger.warning(f"Iteration {iteration}: No accumulation state but iteration > 1")
    return [], False, None, None
```

### Modified Loop in `mainServiceAi.py`

```python
# In the iteration loop:
accumulationState = None  # Track accumulation state

for iteration in range(1, maxIterations + 1):
    # ... AI call ...

    # Extract sections with accumulation support
    extractedSections, wasJsonComplete, parsedResult, accumulationState = \
        self._extractSectionsFromResponse(
            result,
            iteration,
            debugPrefix,
            allSections,
            accumulationState  # Pass accumulation state object
        )

    # Update allSections for prompt context
    if extractedSections:
        allSections = JsonResponseHandler.mergeSectionsIntelligently(
            allSections,
            extractedSections,
            iteration
        )

    # Build continuation context for next prompt (if needed)
    if not wasJsonComplete and (allSections or result):
        continuationContext = buildContinuationContext(allSections, result)
        # Add KPI question for AI to answer (percentage delivered)
        continuationContext["kpiQuestion"] = (
            "Based on the delivered data so far, approximately what percentage (%) "
            "of the total required content has been delivered? "
            "Respond with an integer between 0-100."
        )
        # Use continuationContext in next prompt

    # Extract KPI from AI response and validate progression
    if accumulationState and accumulationState.isAccumulationMode:
        currentKpi = JsonResponseHandler.extractKpiFromResponse(result)  # Extract percentage from AI response
        if currentKpi is not None:
            if not JsonResponseHandler.validateKpiProgression(accumulationState, currentKpi):
                logger.warning(f"Iteration {iteration}: KPI validation failed, stopping accumulation")
                break
            # Store KPI in accumulation state
            accumulationState.lastKpi = currentKpi

    # Check completion
    if wasJsonComplete:
        break  # Done
```

## Key Considerations

### 1. Overlap Detection Strategy

**Question:** How do we detect overlaps between the accumulated string and the new fragment?

**GENERIC Approach:**

- Compare the end of the accumulated string with the start of the new fragment
- Find the longest matching suffix/prefix (string-based comparison)
- Remove duplicate content
- Works for ANY JSON structure (no content-type-specific logic)

### 2. Partial Section Extraction

**Question:** Should we extract sections from incomplete JSON for prompt context?

**Answer:** Yes, with a generic approach:

- Extract what's available (even if incomplete); works for ANY content type
- Use for continuation prompts (via `buildContinuationContext()`)
- Build delivered summary with counts per section (generic counting)
- Extract cut-off point from raw JSON string (generic detection)
- Keep accumulated string separate (for next append)

### 3. State Storage

**Question:** Where should `accumulatedJsonString` be stored?

**Answer:** In a `JsonAccumulationState` object, for traceability:

- Use the `JsonAccumulationState` class from `datamodelAi.py`
- Store accumulated string, mode flag, parsed result, and sections
- Better traceability and debugging
- Can be logged/persisted if needed

### 4. Completion Detection

**Question:** When is JSON considered "complete"?

**GENERIC Criteria:**

- Parses successfully without errors
- All structures are properly closed (recursive check)
- No incomplete arrays/objects
- Generic validation (no content-type-specific checks)

### 5. Error Handling

**Scenarios:**

- Repair fails → Continue accumulation (don't stop)
- Parsing fails after accumulation → Try repair, continue if repair succeeds
- Merge fails → Log error, continue with best available data

## Implementation Steps

1. **Add state class** in `datamodelAi.py`:
   - `JsonAccumulationState` (camelStyle naming)

2. **Create helper functions** in `subJsonResponseHandling.py`:
   - `mergeJsonStringsWithOverlap()` (generic, camelStyle)
   - `isJsonComplete()` (generic, camelStyle)
   - `finalizeJson()` (generic, camelStyle)

3. **Create main function** in `subJsonResponseHandling.py`:
   - `accumulateAndParseJsonFragments()` (generic, camelStyle)

4. **Modify `_extractSectionsFromResponse`** in `mainServiceAi.py`:
   - Add `accumulationState` parameter (JsonAccumulationState object)
   - Add first iteration check
   - Call accumulation function for subsequent iterations
   - Update accumulation state object

5. **Update iteration loop** in `mainServiceAi.py`:
   - Track `accumulationState` object (JsonAccumulationState)
   - Pass to `_extractSectionsFromResponse`
   - Build continuation context using `buildContinuationContext()`
   - Add KPI question to continuation context
   - Extract KPI from AI response and validate progression
   - Handle return values

6. **Create test file**:
   - Test string accumulation with overlaps
   - Test completion detection
   - Test partial section extraction
   - Test continuation context building

## Testing Strategy

### Test Cases

1. **Complete JSON on first iteration:**
   - Should NOT enter accumulation mode
   - Should extract sections directly

2. **Incomplete JSON on first iteration:**
   - Should enter accumulation mode
   - Should store string for next iteration

3. **Fragment with exact continuation:**
   - Should concatenate without duplicates
   - Should parse successfully

4. **Fragment with overlap:**
   - Should detect and remove overlap
   - Should concatenate correctly

5. **Fragment with full overlap:**
   - Should handle duplicate content
   - Should not add duplicates

6. **Multiple iterations:**
   - Should accumulate across all iterations
   - Should extract partial sections for prompts
   - Should complete when JSON is valid

## Open Questions - Answers

### 1. How to handle very large accumulated strings? (Memory concerns)

**Answer:** No memory problems expected

- System handles files up to ~1GB
- String accumulation is acceptable for this size
- No special memory management needed

### 2. Should we limit accumulation attempts? (Prevent infinite loops)

**Answer:** Yes, use KPI-based stopping

- Add generic KPI to iteration prompt showing remaining elements needed
- KPI calculation: Compare expected vs. delivered counts per section type
- Stop if KPI doesn't decrease in 3 consecutive iterations
- KPI is AI-provided (not calculated by system) - AI answers percentage question
- Simple integer comparison for validation (no fuzzy AI calculation)

**KPI Question for Iteration Prompt:**

```
=== PROGRESS INDICATOR ===
Based on the delivered data so far, approximately what percentage (%) of the total
required content has been delivered?

Respond with an integer between 0-100.

⚠️ IMPORTANT:
- If percentage goes DOWN in next iteration → Generation will stop (error detected)
- If percentage doesn't increase by at least 1% → Generation will stop (no progress)
- Only continue if percentage increases by 1% or more
```

**KPI Validation Logic:**

```python
def validateKpiProgression(
    accumulationState: JsonAccumulationState,
    currentKpi: int
) -> bool:
    """
    Validate KPI progression from AI response.

    Validation rules:
    - If % goes DOWN → Error (e.g., no data received, started new) → Return False
    - If % doesn't move (increment < 1%) → Error (no progress) → Return False
    - If % goes UP (increment >= 1%) → Good progress → Return True

    Args:
        accumulationState: Current accumulation state (contains lastKpi)
        currentKpi: Current KPI percentage from AI (integer 0-100)

    Returns:
        True if KPI progression is valid, False if error detected
    """
    lastKpi = accumulationState.lastKpi if accumulationState.lastKpi else 0
    increment = currentKpi - lastKpi

    if increment < 0:
        return False  # Went down - error
    if increment < 1:
        return False  # No progress - error
    return True  # Progress - good
```

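Tracing the rule over a sequence of answers makes its edge cases visible. The hypothetical helper `firstFailingIteration` applies the same comparison as `validateKpiProgression` to a list of KPI answers, treating a missing previous KPI as 0 like the original. One consequence worth noting: a first answer of 0% already counts as "no progress" and stops generation at iteration 1.

```python
from typing import List, Optional


def firstFailingIteration(kpis: List[int]) -> Optional[int]:
    """Apply the validateKpiProgression rule to successive KPI answers.

    Returns the 1-based iteration at which generation would stop, or None if
    every answer increased by at least 1. Like the original, a missing
    previous KPI counts as 0.
    """
    lastKpi = 0
    for iteration, currentKpi in enumerate(kpis, start=1):
        if currentKpi - lastKpi < 1:  # went down or did not move
            return iteration
        lastKpi = currentKpi
    return None


print(firstFailingIteration([20, 45, 70, 100]))  # None
print(firstFailingIteration([20, 45, 45]))       # 3 (no progress)
print(firstFailingIteration([20, 45, 30]))       # 3 (went down)
print(firstFailingIteration([0, 10]))            # 1 (first answer of 0)
```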
### 3. How to handle encoding issues in string concatenation?

**Answer:** Remove problematic parts

- Detect encoding errors during concatenation
- Remove problematic characters/bytes
- Continue with cleaned string
- Acceptable to lose some data rather than fail completely

**Implementation:**

```python
def cleanEncodingIssues(jsonString: str) -> str:
    """
    Remove problematic encoding parts from JSON string.

    Generic approach:
    - Detect encoding errors
    - Remove problematic characters/bytes
    - Return cleaned string
    """
    try:
        # Try to decode/encode to detect issues
        jsonString.encode('utf-8').decode('utf-8')
        return jsonString
    except UnicodeError:
        # Remove problematic parts
        cleaned = jsonString.encode('utf-8', errors='ignore').decode('utf-8', errors='ignore')
        logger.warning("Removed encoding issues from JSON string")
        return cleaned
```

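In Python 3 a `str` only fails a strict UTF-8 encode when it contains lone surrogate code points, for example after bytes were decoded with `errors="surrogateescape"`. The sample input below is an assumption chosen to trigger that path; it shows when the `except` branch of `cleanEncodingIssues` fires and what "losing some data" means in practice: the offending character is dropped, not replaced.

```python
import json

# A Latin-1 byte (0xE9, "é") inside otherwise UTF-8 data; decoding with
# surrogateescape turns the invalid byte into a lone surrogate (U+DCE9).
raw = b'{"title": "Caf\xe9"}'
text = raw.decode("utf-8", errors="surrogateescape")

try:
    text.encode("utf-8")  # strict encode fails on the lone surrogate
except UnicodeEncodeError as exc:
    print("encode failed:", exc.reason)  # encode failed: surrogates not allowed

# Same cleanup as cleanEncodingIssues: drop what cannot be encoded
cleaned = text.encode("utf-8", errors="ignore").decode("utf-8")
print(json.loads(cleaned))  # {'title': 'Caf'}
```

Characters that were already replaced by U+FFFD during an earlier decode encode without error, so they pass through this check unchanged.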
### 4. Should overlap detection be configurable? (Performance vs. accuracy)

**Answer:** No, automated mode only

- AI calls take 30-180 seconds (plenty of time for overlap detection)
- No performance concerns
- Always use automated overlap detection
- No configuration needed