fixed documents handling
This commit is contained in:
parent a3dd5f2feb
commit c135321aee
23 changed files with 2913 additions and 1747 deletions
@ -1,121 +0,0 @@
# Azure AD Consent Links

## Configuration

- **Client ID**: `c7e7112d-61dc-4f3a-8cd3-08cc4cd7504c`
- **Tenant ID**: `common` (multi-tenant)
- **Redirect URI (Prod)**: `https://gateway-prod.poweron-center.net/api/msft/auth/callback`
- **Redirect URI (Int)**: `https://gateway-int.poweron-center.net/api/msft/auth/callback`

## Permissions (Scopes)

- `Mail.ReadWrite` - read and write e-mails
- `Mail.Send` - send e-mails
- `Mail.ReadWrite.Shared` - access shared mailboxes
- `User.Read` - read the user profile
- `Sites.ReadWrite.All` - read and write all SharePoint sites
- `Files.ReadWrite.All` - read and write all files

## Admin Consent Link (for tenant administrators)

**IMPORTANT:** The admin consent endpoint returns `admin_consent` and `tenant`, not `code` and `state`.
The existing `/auth/callback` handler expects `code` and `state` for the regular OAuth flow.

**Option 1: Admin consent via the Azure Portal (for your own tenant)**

1. Go to Azure Portal → Azure Active Directory → App registrations
2. Select the app `c7e7112d-61dc-4f3a-8cd3-08cc4cd7504c`
3. Go to "API permissions"
4. Click "Grant admin consent for [Tenant Name]"

**Option 1b: Making the app available to other tenants**

To make the app visible to other tenants, the following steps are required:

1. **Verify the multi-tenant configuration:**
   - Azure Portal → Azure Active Directory → App registrations
   - Select the app `c7e7112d-61dc-4f3a-8cd3-08cc4cd7504c`
   - Go to "Authentication"
   - Make sure "Supported account types" is set to **"Accounts in any organizational directory and personal Microsoft accounts"** or **"Accounts in any organizational directory"**

2. **Make the app available to other tenants:**

   **Method A: Direct admin consent link (recommended)**
   - Administrators of other tenants can use the admin consent link:
     ```
     https://login.microsoftonline.com/{TENANT_ID}/adminconsent?client_id=c7e7112d-61dc-4f3a-8cd3-08cc4cd7504c&redirect_uri=https://gateway-prod.poweron-center.net/api/msft/adminconsent/callback
     ```
   - Replace `{TENANT_ID}` with the tenant ID of the target tenant (or use `common` for multi-tenant)

   **Method B: Manually via the Azure Portal (for other tenants)**
   - The tenant administrator of the other tenant:
     1. Goes to Azure Portal → Azure Active Directory → Enterprise applications
     2. Clicks "+ New application"
     3. Selects "Browse Azure AD Gallery" (optional) or "Create your own application"
     4. If not in the gallery: selects "Non-gallery application"
     5. Enters the client ID: `c7e7112d-61dc-4f3a-8cd3-08cc4cd7504c`
     6. Or uses this link directly:
        ```
        https://portal.azure.com/#blade/Microsoft_AAD_IAM/ManagedAppMenuBlade/Overview/objectId/{CLIENT_ID}
        ```
        (Replace `{CLIENT_ID}` with `c7e7112d-61dc-4f3a-8cd3-08cc4cd7504c`)
     7. Goes to "Permissions" → "Grant admin consent"

   **Method C: Publish the app in the Azure AD Gallery (optional)**
   - For broader visibility, the app can be published in the Azure AD App Gallery
   - Azure Portal → App registrations → App → "Branding & properties"
   - Contact Microsoft for gallery publication

3. **Important for multi-tenant apps:**
   - The redirect URIs must be publicly reachable
   - The app must declare the correct permissions
   - Tenant administrators must consent explicitly (admin consent)

**Option 2: Admin consent link (with callback handler)**

### Production
```
https://login.microsoftonline.com/common/adminconsent?client_id=c7e7112d-61dc-4f3a-8cd3-08cc4cd7504c&redirect_uri=https://gateway-prod.poweron-center.net/api/msft/adminconsent/callback
```

### Integration
```
https://login.microsoftonline.com/common/adminconsent?client_id=c7e7112d-61dc-4f3a-8cd3-08cc4cd7504c&redirect_uri=https://gateway-int.poweron-center.net/api/msft/adminconsent/callback
```

**Note:** The `/adminconsent/callback` endpoint is implemented and processes the `admin_consent` and `tenant` parameters. After a successful admin consent, a confirmation page is shown.

## User Consent Link (for individual users)

### Production
```
https://login.microsoftonline.com/common/oauth2/v2.0/authorize?client_id=c7e7112d-61dc-4f3a-8cd3-08cc4cd7504c&response_type=code&redirect_uri=https://gateway-prod.poweron-center.net/api/msft/auth/callback&response_mode=query&scope=Mail.ReadWrite Mail.Send Mail.ReadWrite.Shared User.Read Sites.ReadWrite.All Files.ReadWrite.All offline_access openid profile&state=login
```

### Integration
```
https://login.microsoftonline.com/common/oauth2/v2.0/authorize?client_id=c7e7112d-61dc-4f3a-8cd3-08cc4cd7504c&response_type=code&redirect_uri=https://gateway-int.poweron-center.net/api/msft/auth/callback&response_mode=query&scope=Mail.ReadWrite Mail.Send Mail.ReadWrite.Shared User.Read Sites.ReadWrite.All Files.ReadWrite.All offline_access openid profile&state=login
```

## Notes

1. **Admin consent**: Must be performed by a tenant administrator to approve the app for all users in the tenant
2. **User consent**: Each user can consent individually (if admin consent has not been performed)
3. **Multi-tenant**: Because `common` is used as the tenant, the app works for every Azure AD tenant
4. **Redirect URI**: Must be configured exactly as-is in the Azure AD app registration

## Azure Portal Configuration

Make sure the following is configured in the Azure AD app registration (`c7e7112d-61dc-4f3a-8cd3-08cc4cd7504c`):

1. **Redirect URIs**:
   - `https://gateway-prod.poweron-center.net/api/msft/auth/callback`
   - `https://gateway-int.poweron-center.net/api/msft/auth/callback`

2. **API Permissions** (Delegated):
   - ✅ Mail.ReadWrite
   - ✅ Mail.Send
   - ✅ Mail.ReadWrite.Shared
   - ✅ User.Read
   - ✅ Sites.ReadWrite.All
   - ✅ Files.ReadWrite.All

3. **Supported account types**:
   - "Accounts in any organizational directory and personal Microsoft accounts" (Multi-tenant)
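The consent links above are assembled by hand. As a sketch, the same URLs can be built with Python's standard library; the helper names and the `gatewayHost` parameter here are ours for illustration, not part of the gateway codebase:

```python
from urllib.parse import urlencode

CLIENT_ID = "c7e7112d-61dc-4f3a-8cd3-08cc4cd7504c"

SCOPES = ("Mail.ReadWrite Mail.Send Mail.ReadWrite.Shared User.Read "
          "Sites.ReadWrite.All Files.ReadWrite.All offline_access openid profile")

def buildUserConsentUrl(gatewayHost: str, tenant: str = "common") -> str:
    # User consent: the authorize endpoint returns code/state to /auth/callback
    params = {
        "client_id": CLIENT_ID,
        "response_type": "code",
        "redirect_uri": f"https://{gatewayHost}/api/msft/auth/callback",
        "response_mode": "query",
        "scope": SCOPES,
        "state": "login",
    }
    return f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0/authorize?{urlencode(params)}"

def buildAdminConsentUrl(gatewayHost: str, tenant: str = "common") -> str:
    # Admin consent: this endpoint returns admin_consent/tenant to /adminconsent/callback
    params = {
        "client_id": CLIENT_ID,
        "redirect_uri": f"https://{gatewayHost}/api/msft/adminconsent/callback",
    }
    return f"https://login.microsoftonline.com/{tenant}/adminconsent?{urlencode(params)}"
```

`urlencode` percent-encodes the redirect URI and joins the scopes, which avoids hand-editing the long query strings for each environment.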
@ -44,6 +44,8 @@ class AiAnthropic(BaseConnectorAi):
        return "anthropic"

    def getModels(self) -> List[AiModel]:
        return []  # TODO: DEBUG TO TURN ON AFTER TESTING

        """Get all available Anthropic models."""
        return [
            AiModel(
@ -827,6 +827,7 @@ class TaskContext(BaseModel):
    parametersContext: Optional[str] = Field(None, description="Context for parameter generation")
    learnings: Optional[list[str]] = Field(default_factory=list, description="Learnings from previous actions")
    stage1Selection: Optional[dict] = Field(None, description="Stage 1 selection data")
    nextActionGuidance: Optional[Dict[str, Any]] = Field(None, description="Guidance for the next action from previous refinement")

    def updateFromSelection(self, selection: Any):
        """Update context from Stage 1 selection
@ -167,8 +167,7 @@ Respond with ONLY a JSON object in this exact format:
        promptBuilder: Optional[callable] = None,
        promptArgs: Optional[Dict[str, Any]] = None,
        operationId: Optional[str] = None,
        userPrompt: Optional[str] = None,
        workflowIntent: Optional[Dict[str, Any]] = None
        userPrompt: Optional[str] = None
    ) -> str:
        """
        Shared core function for AI calls with repair-based looping system.
@ -212,17 +211,14 @@ Respond with ONLY a JSON object in this exact format:
            )

            # Build iteration prompt
            if len(allSections) > 0 and promptBuilder and promptArgs:
            # CRITICAL: Build continuation prompt if we have sections OR if we have a previous response (even if broken)
            # This ensures continuation prompts are built even when JSON is so broken that no sections can be extracted
            if (len(allSections) > 0 or lastRawResponse) and promptBuilder and promptArgs:
                # This is a continuation - build continuation context with raw JSON and rebuild prompt
                continuationContext = buildContinuationContext(allSections, lastRawResponse)
                if not lastRawResponse:
                    logger.warning(f"Iteration {iteration}: No previous response available for continuation!")

                # CRITICAL: Add workflowIntent (actionIntent) to continuationContext for DoD-based progress filtering
                # This allows buildGenerationPrompt to filter progress stats based on Definition of Done KPIs
                if workflowIntent:
                    continuationContext['taskIntent'] = workflowIntent  # Keep key name 'taskIntent' for compatibility

                # Filter promptArgs to only include parameters that buildGenerationPrompt accepts
                # buildGenerationPrompt accepts: outputFormat, userPrompt, title, extracted_content, continuationContext
                filteredPromptArgs = {
@ -277,14 +273,37 @@ Respond with ONLY a JSON object in this exact format:
                    # Don't break the main loop if stat storage fails
                    logger.warning(f"Failed to store workflow stat: {str(statError)}")

            # Check for error response using generic error detection (errorCount > 0 or modelName == "error")
            if hasattr(response, 'errorCount') and response.errorCount > 0:
                errorMsg = f"Iteration {iteration}: Error response detected (errorCount={response.errorCount}), stopping loop: {result[:200] if result else 'empty'}"
                logger.error(errorMsg)
                break

            if hasattr(response, 'modelName') and response.modelName == "error":
                errorMsg = f"Iteration {iteration}: Error response detected (modelName=error), stopping loop: {result[:200] if result else 'empty'}"
                logger.error(errorMsg)
                break

            if not result or not result.strip():
                logger.warning(f"Iteration {iteration}: Empty response, stopping")
                break

            # Check if this is a text response (not document generation)
            # Text responses don't need JSON parsing - return immediately after first successful response
            isTextResponse = (promptBuilder is None and promptArgs is None) or debugPrefix == "text"

            if isTextResponse:
                # For text responses, return the text immediately - no JSON parsing needed
                logger.info(f"Iteration {iteration}: Text response received, returning immediately")
                if iterationOperationId:
                    self.services.chat.progressLogFinish(iterationOperationId, True)
                return result

            # Store raw response for continuation (even if broken)
            lastRawResponse = result

            # Extract sections from response (handles both valid and broken JSON)
            # Only for document generation (JSON responses)
            extractedSections, wasJsonComplete, parsedResult = self._extractSectionsFromResponse(result, iteration, debugPrefix)

            # Extract document metadata from first iteration if available
@ -312,25 +331,12 @@ Respond with ONLY a JSON object in this exact format:
            allSections = self._mergeSectionsIntelligently(allSections, extractedSections, iteration)

            # Check if we should continue (completion detection)
            # Extract user prompt from promptArgs if available
            extractedUserPrompt = userPrompt
            if not extractedUserPrompt and promptArgs:
                extractedUserPrompt = promptArgs.get("userPrompt") or promptArgs.get("user_prompt")
            if not extractedUserPrompt:
                # Try to extract from original prompt
                if "User request:" in prompt:
                    try:
                        extractedUserPrompt = prompt.split("User request:")[1].split("\n")[0].strip('"')
                    except:
                        pass

            # Simple logic: JSON completeness determines continuation
            shouldContinue = self._shouldContinueGeneration(
                allSections,
                iteration,
                wasJsonComplete,
                result,
                userPrompt=extractedUserPrompt,
                workflowIntent=workflowIntent
                result
            )

            if shouldContinue:
@ -842,39 +848,22 @@ Respond with ONLY a JSON object in this exact format:
        Determines completion based on JSON structure (complete JSON = complete, broken/incomplete = incomplete).
        Returns (sections, wasJsonComplete, parsedResult)
        """

        # First, try to parse as valid JSON
        # CRITICAL: JSON completeness is determined by parsing, NOT by last character check!
        # Last character could be } or ] by chance, JSON still incomplete
        try:
            extracted = extractJsonString(result)

            # CRITICAL: Check if raw response suggests incomplete JSON BEFORE parsing
            # extractFirstBalancedJson can return partial but valid JSON if raw is incomplete
            from modules.shared.jsonUtils import stripCodeFences, normalizeJsonText
            raw_normalized = normalizeJsonText(stripCodeFences(result.strip())).strip()
            extracted_stripped = extracted.strip()

            # If extracted is shorter than raw, or raw doesn't end properly, it's incomplete
            is_raw_incomplete = False
            if len(extracted_stripped) < len(raw_normalized):
                is_raw_incomplete = True
                logger.info(f"Iteration {iteration}: Extracted JSON ({len(extracted_stripped)} chars) shorter than raw ({len(raw_normalized)} chars) - raw is incomplete")
            elif raw_normalized and not raw_normalized.endswith(('}', ']')):
                is_raw_incomplete = True
                logger.info(f"Iteration {iteration}: Raw response doesn't end with }} or ] - raw is incomplete")

            # Try to parse the extracted JSON
            # If parsing succeeds, JSON is complete
            parsed_result = json.loads(extracted)

            # Extract sections from parsed JSON
            sections = extractSectionsFromDocument(parsed_result)

            # CRITICAL: If raw response is incomplete, mark as incomplete
            # JSON structure determines completion, not any flag
            if is_raw_incomplete:
                logger.info(f"Iteration {iteration}: JSON parseable but raw response incomplete - marking as incomplete")
                return sections, False, parsed_result

            # JSON was parseable and has sections or complete structure
            # Raw response ends properly = complete
            logger.info(f"Iteration {iteration}: JSON parseable and raw response complete - marking as complete")
            # JSON parsed successfully = complete
            logger.info(f"Iteration {iteration}: JSON parsed successfully - marking as complete")
            return sections, True, parsed_result

        except json.JSONDecodeError as e:
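The parse-based completeness check in this hunk can be illustrated standalone. The sketch below uses hypothetical helper names and a simplified length heuristic, not the project's `extractJsonString`/`normalizeJsonText` utilities: a balanced prefix may parse even when the raw response was cut off, so parsing alone is combined with a length comparison.

```python
import json

def extractFirstBalancedJson(raw: str) -> str:
    """Return the first balanced {...} object in raw, or the open tail if it never closes."""
    start = raw.find("{")
    if start == -1:
        return raw
    depth = 0
    inString = False
    escape = False
    for i in range(start, len(raw)):
        ch = raw[i]
        if inString:
            if escape:
                escape = False
            elif ch == "\\":
                escape = True
            elif ch == '"':
                inString = False
        elif ch == '"':
            inString = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return raw[start:i + 1]
    return raw[start:]  # never closed -> will fail to parse

def checkCompleteness(raw: str):
    """Return (parsed, wasComplete); completeness is decided by parsing, not by the last char."""
    text = raw.strip()
    extracted = extractFirstBalancedJson(text).strip()
    try:
        parsed = json.loads(extracted)
    except json.JSONDecodeError:
        return None, False           # broken JSON -> incomplete
    if len(extracted) < len(text):   # balanced prefix parsed, but raw had trailing content
        return parsed, False
    return parsed, True
```

The string-state tracking matters: a `}` inside a JSON string must not close the object, which is exactly why a last-character check is unreliable.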
@ -906,9 +895,7 @@ Respond with ONLY a JSON object in this exact format:
        allSections: List[Dict[str, Any]],
        iteration: int,
        wasJsonComplete: bool,
        rawResponse: str = None,
        userPrompt: Optional[str] = None,
        workflowIntent: Optional[Dict[str, Any]] = None
        rawResponse: str = None
    ) -> bool:
        """
        Determine if AI generation loop should continue.
@ -917,23 +904,22 @@ Respond with ONLY a JSON object in this exact format:
        Action DoD is checked AFTER the AI Loop completes in _refineDecide.

        Simple logic:
        - If JSON is incomplete/broken → continue (needs more content)
        - If JSON is complete → stop (all content delivered)
        - If JSON parsing failed or incomplete → continue (needs more content)
        - If JSON parses successfully and is complete → stop (all content delivered)
        - Loop detection prevents infinite loops

        CRITICAL: JSON completeness is determined by parsing, NOT by last character check!
        Returns True if we should continue, False if AI Loop is done.
        """
        if len(allSections) == 0:
            return True  # No sections yet, continue

        # CRITERION 1: If JSON was incomplete/broken - continue to repair/complete
        # CRITERION 1: If JSON was incomplete/broken (parsing failed or incomplete) - continue to repair/complete
        if not wasJsonComplete:
            logger.info(f"Iteration {iteration}: JSON incomplete/broken - continuing to complete")
            return True

        # CRITERION 2: JSON is complete - check for loop detection
        # If JSON is complete, we're done (all content delivered)
        # But check for infinite loops first
        # CRITERION 2: JSON is complete (parsed successfully) - check for loop detection
        if self._isStuckInLoop(allSections, iteration):
            logger.warning(f"Iteration {iteration}: Detected potential infinite loop - stopping AI loop")
            return False
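The surrounding code follows a generate-check-continue pattern: call the model, extract sections, and loop until the JSON is complete, empty, or a cap is reached. A minimal standalone sketch of that control flow (function names are illustrative, not the project's API):

```python
import json

def runRepairLoop(generate, extract, maxIterations: int = 5):
    """generate(lastRaw) -> raw text; extract(raw) -> (sections, wasComplete).
    Keep requesting continuations until the JSON is complete or the cap is hit."""
    allSections = []
    lastRaw = ""
    for iteration in range(1, maxIterations + 1):
        result = generate(lastRaw)
        if not result or not result.strip():
            break  # empty response: stop instead of spinning
        lastRaw = result  # keep even broken output for the continuation prompt
        sections, wasComplete = extract(result)
        allSections.extend(sections)
        if wasComplete:
            break  # complete JSON means all content was delivered
    return allSections

def jsonExtract(raw):
    # Minimal extract step: complete iff the raw text parses as JSON
    try:
        doc = json.loads(raw)
        return doc.get("sections", []), True
    except json.JSONDecodeError:
        return [], False
```

Storing `lastRaw` even when extraction fails is the key design choice: the next prompt can reference the broken fragment instead of restarting from scratch.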
@ -942,153 +928,6 @@ Respond with ONLY a JSON object in this exact format:
        logger.info(f"Iteration {iteration}: JSON complete - AI loop done")
        return False

    def _analyzeTaskCompletion(
        self,
        allSections: List[Dict[str, Any]],
        userPrompt: Optional[str],
        iteration: int,
        workflowIntent: Optional[Dict[str, Any]] = None
    ) -> bool:
        """
        GENERIC task completion analysis using KPIs from Intent Analyzer.

        Uses definitionOfDone KPIs from workflowIntent to check completion.
        Falls back to simple heuristics if workflowIntent not available.

        Returns True if task appears complete, False otherwise.
        """
        if not allSections:
            return False

        # Calculate current metrics from JSON structure
        totalSections = len(allSections)
        totalContentSize = 0
        totalRows = 0
        totalItems = 0
        totalParagraphs = 0
        totalHeadings = 0
        totalCodeLines = 0
        contentTypes = set()
        lastSectionComplete = True

        for section in allSections:
            contentType = section.get("content_type", "")
            contentTypes.add(contentType)
            elements = section.get("elements", [])

            if isinstance(elements, list) and elements:
                lastElem = elements[-1] if elements else {}
            else:
                lastElem = elements if isinstance(elements, dict) else {}

            if isinstance(lastElem, dict):
                if contentType == "code_block":
                    code = lastElem.get("code", "")
                    if code:
                        lines = [l for l in code.split('\n') if l.strip()]
                        totalCodeLines += len(lines)
                        totalContentSize += len(code)
                    if code and not code.rstrip().endswith('\n'):
                        lastSectionComplete = False

                elif contentType == "table":
                    rows = lastElem.get("rows", [])
                    if isinstance(rows, list):
                        totalRows += len(rows)
                        totalContentSize += len(str(rows))
                    if not lastElem.get("headers"):
                        lastSectionComplete = False

                elif contentType in ["bullet_list", "numbered_list"]:
                    items = lastElem.get("items", [])
                    if isinstance(items, list):
                        totalItems += len(items)
                        totalContentSize += len(str(items))

                elif contentType == "heading":
                    totalHeadings += 1
                    text = lastElem.get("text", "")
                    if text:
                        totalContentSize += len(text)

                elif contentType == "paragraph":
                    totalParagraphs += 1
                    text = lastElem.get("text", "")
                    if text:
                        totalContentSize += len(text)
                    if text and not text.rstrip()[-1] in '.!?':
                        lastSectionComplete = False

        # STRATEGY 1: Use KPIs from Intent Analyzer (preferred method)
        if workflowIntent and isinstance(workflowIntent, dict):
            definitionOfDone = workflowIntent.get("definitionOfDone", {})
            if definitionOfDone:
                # Check all KPI thresholds
                allKPIsMet = True
                kpiChecks = []

                minSections = definitionOfDone.get("minSections", 0)
                if minSections > 0:
                    met = totalSections >= minSections
                    allKPIsMet = allKPIsMet and met
                    kpiChecks.append(f"sections: {totalSections}/{minSections}")

                minParagraphs = definitionOfDone.get("minParagraphs", 0)
                if minParagraphs > 0:
                    met = totalParagraphs >= minParagraphs
                    allKPIsMet = allKPIsMet and met
                    kpiChecks.append(f"paragraphs: {totalParagraphs}/{minParagraphs}")

                minHeadings = definitionOfDone.get("minHeadings", 0)
                if minHeadings > 0:
                    met = totalHeadings >= minHeadings
                    allKPIsMet = allKPIsMet and met
                    kpiChecks.append(f"headings: {totalHeadings}/{minHeadings}")

                minTableRows = definitionOfDone.get("minTableRows", 0)
                if minTableRows > 0:
                    met = totalRows >= minTableRows
                    allKPIsMet = allKPIsMet and met
                    kpiChecks.append(f"tableRows: {totalRows}/{minTableRows}")

                minListItems = definitionOfDone.get("minListItems", 0)
                if minListItems > 0:
                    met = totalItems >= minListItems
                    allKPIsMet = allKPIsMet and met
                    kpiChecks.append(f"listItems: {totalItems}/{minListItems}")

                minCodeLines = definitionOfDone.get("minCodeLines", 0)
                if minCodeLines > 0:
                    met = totalCodeLines >= minCodeLines
                    allKPIsMet = allKPIsMet and met
                    kpiChecks.append(f"codeLines: {totalCodeLines}/{minCodeLines}")

                minContentSize = definitionOfDone.get("minContentSize", 0)
                if minContentSize > 0:
                    met = totalContentSize >= minContentSize
                    allKPIsMet = allKPIsMet and met
                    kpiChecks.append(f"contentSize: {totalContentSize}/{minContentSize}")

                # Check required content types
                requiredContentTypes = definitionOfDone.get("requiredContentTypes", [])
                if requiredContentTypes:
                    met = all(ct in contentTypes for ct in requiredContentTypes)
                    allKPIsMet = allKPIsMet and met
                    kpiChecks.append(f"contentTypes: {list(contentTypes)}/{requiredContentTypes}")

                # If all KPIs met and last section is complete, task is done
                if allKPIsMet and lastSectionComplete:
                    logger.info(f"Task completion (KPI-based): All KPIs met - {', '.join(kpiChecks)}")
                    return True

        # STRATEGY 2: Fallback to simple heuristics if no workflowIntent
        # Only use if substantial content and last section complete
        if totalContentSize > 20000 and lastSectionComplete and iteration > 2:
            logger.info(f"Task completion (fallback heuristic): Large content ({totalContentSize} chars) over {iteration} iterations, last section complete")
            return True

        return False

    def _isStuckInLoop(
        self,
        allSections: List[Dict[str, Any]],
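The `_analyzeTaskCompletion` method shown in this hunk repeats the same threshold block once per KPI. A table-driven variant expresses the same check in one loop; this is only a sketch with hypothetical names, not code from the repository:

```python
def checkDefinitionOfDone(definitionOfDone: dict, metrics: dict):
    """Compare collected metrics against DoD thresholds in a single loop.
    Returns (allMet, checks) with the same 'name: value/threshold' strings
    as the per-KPI blocks above."""
    kpiToMetric = {
        "minSections": "sections",
        "minParagraphs": "paragraphs",
        "minHeadings": "headings",
        "minTableRows": "tableRows",
        "minListItems": "listItems",
        "minCodeLines": "codeLines",
        "minContentSize": "contentSize",
    }
    allMet = True
    checks = []
    for kpi, metric in kpiToMetric.items():
        threshold = definitionOfDone.get(kpi, 0)
        if threshold > 0:  # a threshold of 0 means the KPI is not required
            value = metrics.get(metric, 0)
            allMet = allMet and value >= threshold
            checks.append(f"{metric}: {value}/{threshold}")
    return allMet, checks
```

Mapping KPI keys to metric names in a dict keeps the threshold logic in one place, so adding a KPI is a one-line change instead of another copied block.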
@ -1436,36 +1275,14 @@ Respond with ONLY a JSON object in this exact format:
        if promptArgs:
            userPrompt = promptArgs.get("userPrompt") or promptArgs.get("user_prompt")

        # CRITICAL: Get actionIntent (not taskIntent or workflowIntent) for Definition of Done
        # Action Intent contains Definition of Done for THIS specific action
        # Each action needs its own DoD because actions have different completion criteria
        # Example: Action 1 "Generate 2000 primes" → DoD: 200 rows, Action 2 "Convert to CSV" → DoD: 1 document
        actionIntent = None
        if hasattr(self.services, 'workflow') and self.services.workflow:
            # Priority 1: Use actionIntent (most specific - for THIS action)
            actionIntent = getattr(self.services.workflow, '_actionIntent', None)
            if not actionIntent:
                # Priority 2: Fallback to taskIntent (for THIS task)
                actionIntent = getattr(self.services.workflow, '_taskIntent', None)
                if actionIntent:
                    logger.info("Action intent not found, using task intent as fallback")
            if not actionIntent:
                # Priority 3: Fallback to workflowIntent (for entire workflow)
                actionIntent = getattr(self.services.workflow, '_workflowIntent', None)
                logger.warning("Action and task intent not found, using workflow intent as fallback")

        # Store actionIntent separately (not in promptArgs - buildGenerationPrompt doesn't accept it)
        # actionIntent is passed to _callAiWithLooping for completion detection, not for prompt building

        generated_json = await self._callAiWithLooping(
            generation_prompt,
            options,
            "document_generation",
            buildGenerationPrompt,
            promptArgs,  # Does NOT contain taskIntent - buildGenerationPrompt doesn't accept it
            promptArgs,
            aiOperationId,
            userPrompt=userPrompt,
            workflowIntent=actionIntent  # Use actionIntent (contains Definition of Done for THIS action)
            userPrompt=userPrompt
        )

        self.services.chat.progressLogUpdate(aiOperationId, 0.7, "Parsing generated JSON")
@ -90,11 +90,15 @@ class ChatService:
        allDocuments = []
        for docRef in stringRefs:
            if docRef.startswith("docItem:"):
                # docItem:<id>:<filename> - extract ID and find document
                # docItem:<id>:<filename> or docItem:<id> (filename is optional)
                # ALWAYS try to match by documentId first (parts[1] is always the documentId when format is correct)
                parts = docRef.split(':')
                if len(parts) >= 2:
                    docId = parts[1]
                    # Find the document by ID
                    docId = parts[1]  # This should be the documentId (UUID)
                    docFound = False

                    # ALWAYS try to match by documentId first (regardless of number of parts)
                    # This handles: docItem:documentId and docItem:documentId:filename
                    for message in workflow.messages:
                        # Validate message belongs to this workflow
                        msgWorkflowId = getattr(message, 'workflowId', None)
@ -104,9 +108,42 @@ class ChatService:
                        if message.documents:
                            for doc in message.documents:
                                if doc.id == docId:
                                    docName = getattr(doc, 'fileName', 'unknown')
                                    allDocuments.append(doc)
                                    docFound = True
                                    logger.debug(f"Matched document reference '{docRef}' to document {doc.id} (fileName: {getattr(doc, 'fileName', 'unknown')}) by documentId")
                                    break
                        if docFound:
                            break

                    # Fallback: If not found by documentId and it looks like a filename (has file extension), try filename matching
                    # This handles cases where AI incorrectly generates docItem:filename.docx
                    if not docFound and '.' in docId and len(parts) == 2:
                        # Format: docItem:filename (AI generated wrong format) - try to match by filename
                        filename = parts[1]
                        logger.warning(f"Document reference '{docRef}' not found by documentId, attempting to match by filename: {filename}")

                        for message in workflow.messages:
                            # Validate message belongs to this workflow
                            msgWorkflowId = getattr(message, 'workflowId', None)
                            if not msgWorkflowId or msgWorkflowId != workflowId:
                                continue

                            if message.documents:
                                for doc in message.documents:
                                    docFileName = getattr(doc, 'fileName', '')
                                    # Match filename exactly or by base name (without path)
                                    if docFileName == filename or docFileName.endswith(filename):
                                        allDocuments.append(doc)
                                        docFound = True
                                        logger.info(f"Matched document reference '{docRef}' to document {doc.id} by filename {docFileName}")
                                        break
                            if docFound:
                                break

                        if not docFound:
                            logger.error(f"Could not resolve document reference '{docRef}' - no document found with filename '{filename}'")
                    elif not docFound:
                        logger.error(f"Could not resolve document reference '{docRef}' - no document found with documentId '{docId}'")
            elif docRef.startswith("docList:"):
                # docList:<messageId>:<label> or docList:<label> - extract message ID and find document list
                parts = docRef.split(':')
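The reference-resolution logic above (documentId first, filename fallback for the malformed `docItem:filename.docx` form) can be sketched as a pure parsing step. The UUID regex below is our assumption about the ID shape, not something the codebase states, and `parseDocItemRef` is a hypothetical helper:

```python
import re

# Assumed ID shape: canonical 8-4-4-4-12 UUID, case-insensitive
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
)

def parseDocItemRef(docRef: str):
    """Split 'docItem:<id>:<filename>', 'docItem:<id>', or the malformed
    'docItem:<filename>' into (docId, filename); either part may be None."""
    parts = docRef.split(":")
    if len(parts) < 2 or parts[0] != "docItem":
        return None, None
    if UUID_RE.match(parts[1]):
        return parts[1], parts[2] if len(parts) > 2 else None
    # Fallback: model produced docItem:<filename> - treat parts[1] as a filename
    if "." in parts[1] and len(parts) == 2:
        return None, parts[1]
    return parts[1], None
```

Deciding the reference shape up front keeps the two lookup passes (by ID, then by filename) from being interleaved with message iteration, which is what makes the hunk above hard to follow.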
@ -447,7 +447,19 @@ class RendererXlsx(BaseRenderer):
        if len(tableSections) > 1:
            # Create separate sheets for each table
            for i, section in enumerate(tableSections, 1):
                sectionTitle = section.get("title", f"Table {i}")
                # Try to get caption from table element first, then section title, then fallback
                sectionTitle = None
                elements = section.get("elements", [])
                if elements and isinstance(elements, list) and len(elements) > 0:
                    tableElement = elements[0]
                    sectionTitle = tableElement.get("caption")

                if not sectionTitle:
                    sectionTitle = section.get("title")

                if not sectionTitle:
                    sectionTitle = f"Table {i}"

                sheetNames.append(sectionTitle[:31])  # Excel sheet name limit
        else:
            # Single table or mixed content - create main sheet
@ -488,7 +500,15 @@ class RendererXlsx(BaseRenderer):
            if i < len(sheetNames):
                sheetName = sheetNames[i]
                sheet = sheets[sheetName]
                self._populateTableSheet(sheet, section, styles, f"Table {i+1}")
                # Use the caption from table element as sheet title, or fallback to sheet name
                sheetTitle = sheetName
                elements = section.get("elements", [])
                if elements and isinstance(elements, list) and len(elements) > 0:
                    tableElement = elements[0]
                    caption = tableElement.get("caption")
                    if caption:
                        sheetTitle = caption
                self._populateTableSheet(sheet, section, styles, sheetTitle)
        else:
            # Single table or mixed content - use original logic
            firstSheetName = sheetNames[0]
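The `sectionTitle[:31]` slice handles Excel's 31-character sheet-name limit, but Excel also rejects the characters `[ ] : * ? / \` and requires unique, non-empty names. A fuller sanitizer, as a sketch under those assumptions (this is not the renderer's actual helper):

```python
def safeSheetName(title: str, existing: set) -> str:
    """Truncate to Excel's 31-char limit, strip characters Excel rejects,
    and de-duplicate with a numeric suffix."""
    # Drop the characters Excel refuses in sheet names, then guard against empty
    cleaned = "".join(c for c in title if c not in '[]:*?/\\').strip() or "Sheet"
    name = cleaned[:31]
    counter = 2
    while name in existing:
        suffix = f" ({counter})"
        # Re-truncate so the suffixed name still fits in 31 characters
        name = cleaned[:31 - len(suffix)] + suffix
        counter += 1
    existing.add(name)
    return name
```

Tracking used names in a set matters because two captions that differ only after character 31 would otherwise collide after truncation.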
@ -38,132 +38,57 @@ async def buildGenerationPrompt(

    # Build prompt based on whether this is a continuation or first call
    # Check if we have valid continuation context with actual JSON fragment
    # CRITICAL: Allow continuation even if section_count is 0 (broken JSON that couldn't be parsed)
    # as long as we have last_raw_json - this handles cases where JSON is too broken to extract sections
    hasContinuation = (
        continuationContext
        and continuationContext.get("section_count", 0) > 0
        and continuationContext.get("last_raw_json", "")
        and continuationContext.get("last_raw_json", "").strip() != "{}"
    )

    if hasContinuation:
        # CONTINUATION PROMPT - user already received first part, continue from where it stopped
        lastItemObject = continuationContext.get("last_item_object", "")  # Last complete sub-element (row, item, line, etc.)
        totalItemsCount = continuationContext.get("total_items_count", 0)
        # CONTINUATION PROMPT - use new summary format from buildContinuationContext
        delivered_summary = continuationContext.get("delivered_summary", "")
        element_before_cutoff = continuationContext.get("element_before_cutoff")
        cut_off_element = continuationContext.get("cut_off_element")

        # CRITICAL: Only use lastItemObject - it contains the last complete sub-element
        # If extraction failed and lastItemObject is empty, we'll show a message that extraction failed
        # No need for fragmentSnippet - it's redundant and causes duplication
        # Build continuation text with delivered summary and cut-off information
        # CRITICAL: Always include cut-off information if available (per loop_plan.md)
        continuationText = f"{delivered_summary}\n\n"
        continuationText += "⚠️ CONTINUATION: Response was cut off. Generate ONLY the remaining content that comes AFTER the reference elements below.\n\n"

        # Build clear continuation guidance with PROGRESS STATISTICS from all accumulated sections
        # This helps AI understand completion status without seeing entire content
        # GENERIC approach: Works for all task types (books, reports, code, lists, tables, etc.)
        continuationGuidance = []
        # Add cut-off point information (per loop_plan.md: always add if available)
        # These are shown ONLY as REFERENCE to know where generation stopped
        if element_before_cutoff:
            continuationText += "# REFERENCE: Last complete element (already delivered - DO NOT repeat):\n"
            continuationText += f"{element_before_cutoff}\n\n"

        progressStats = continuationContext.get("progress_stats", {})
        totalRows = progressStats.get("total_rows", 0)
        totalItems = progressStats.get("total_items", 0)
        totalCodeLines = progressStats.get("total_code_lines", 0)
        totalParagraphs = progressStats.get("total_paragraphs", 0)
        totalHeadings = progressStats.get("total_headings", 0)
        sectionCount = progressStats.get("section_count", 0)
        contentTypeCount = progressStats.get("content_type_count", 0)
        lastContentType = progressStats.get("last_content_type")
        if cut_off_element:
            continuationText += "# REFERENCE: Incomplete element (cut off here - DO NOT repeat):\n"
            continuationText += f"{cut_off_element}\n\n"

        # CRITICAL: Filter progress stats based on Definition of Done from taskIntent
        # Only show KPIs that are relevant for this specific action/task
        taskIntent = continuationContext.get("taskIntent", {})
        definitionOfDone = taskIntent.get("definitionOfDone", {}) if isinstance(taskIntent, dict) else {}

        # Build comprehensive progress information (filtered by DoD if available)
        progressParts = []

        # Only show progress metrics that are relevant based on DoD KPIs
        # If DoD specifies minTableRows, show rows; if minListItems, show items; etc.
|
||||
if definitionOfDone:
|
||||
# Filter based on DoD KPIs - only show metrics that matter for this task
|
||||
if definitionOfDone.get("minTableRows", 0) > 0 and totalRows > 0:
|
||||
progressParts.append(f"{totalRows} row{'s' if totalRows > 1 else ''}")
|
||||
if definitionOfDone.get("minListItems", 0) > 0 and totalItems > 0:
|
||||
progressParts.append(f"{totalItems} item{'s' if totalItems > 1 else ''}")
|
||||
if definitionOfDone.get("minCodeLines", 0) > 0 and totalCodeLines > 0:
|
||||
progressParts.append(f"{totalCodeLines} line{'s' if totalCodeLines > 1 else ''} of code/data")
|
||||
if definitionOfDone.get("minParagraphs", 0) > 0 and totalParagraphs > 0:
|
||||
progressParts.append(f"{totalParagraphs} paragraph{'s' if totalParagraphs > 1 else ''}")
|
||||
if definitionOfDone.get("minHeadings", 0) > 0 and totalHeadings > 0:
|
||||
progressParts.append(f"{totalHeadings} heading{'s' if totalHeadings > 1 else ''}")
|
||||
if definitionOfDone.get("minSections", 0) > 0 and sectionCount > 0:
|
||||
progressParts.append(f"{sectionCount} section{'s' if sectionCount > 1 else ''}")
|
||||
# Only show contentSize if no other metrics are available (it's less informative)
|
||||
# Prefer showing rows/items/lines over characters
|
||||
if not progressParts and definitionOfDone.get("minContentSize", 0) > 0:
|
||||
totalContentSize = progressStats.get("total_content_size", 0)
|
||||
if totalContentSize > 0:
|
||||
progressParts.append(f"{totalContentSize} characters")
|
||||
else:
|
||||
# No DoD available - show all progress metrics (fallback)
|
||||
if sectionCount > 0:
|
||||
progressParts.append(f"{sectionCount} section{'s' if sectionCount > 1 else ''}")
|
||||
if totalHeadings > 0:
|
||||
progressParts.append(f"{totalHeadings} heading{'s' if totalHeadings > 1 else ''}")
|
||||
if totalParagraphs > 0:
|
||||
progressParts.append(f"{totalParagraphs} paragraph{'s' if totalParagraphs > 1 else ''}")
|
||||
if totalRows > 0:
|
||||
progressParts.append(f"{totalRows} row{'s' if totalRows > 1 else ''}")
|
||||
if totalItems > 0:
|
||||
progressParts.append(f"{totalItems} item{'s' if totalItems > 1 else ''}")
|
||||
if totalCodeLines > 0:
|
||||
progressParts.append(f"{totalCodeLines} line{'s' if totalCodeLines > 1 else ''} of code/data")
|
||||
if contentTypeCount > 1:
|
||||
progressParts.append(f"{contentTypeCount} different content types")
|
||||
|
||||
if progressParts:
|
||||
continuationGuidance.append(f"PROGRESS: You have already generated: {', '.join(progressParts)}.")
|
||||
elif totalItemsCount > 0:
|
||||
# Fallback to old totalItemsCount if progress_stats not available
|
||||
continuationGuidance.append(f"PROGRESS: You have already generated {totalItemsCount} items.")
|
||||
|
||||
# Show the last complete item AND cut item for continuation point
|
||||
# CRITICAL: AI needs both to know where to continue
|
||||
cutItemObject = continuationContext.get("cut_item_object")
|
||||
contentTypeForItems = continuationContext.get("content_type_for_items")
|
||||
|
||||
if lastItemObject:
|
||||
if cutItemObject:
|
||||
# Both complete and cut items available - show both
|
||||
continuationGuidance.append(f"Last complete {contentTypeForItems or 'item'} in previous response: {lastItemObject}")
|
||||
continuationGuidance.append(f"Incomplete/cut {contentTypeForItems or 'item'} at the end: {cutItemObject}")
|
||||
continuationGuidance.append(f"Continue from the incomplete item above - complete it first, then add NEW items.")
|
||||
else:
|
||||
# Only complete item available
|
||||
continuationGuidance.append(f"Last complete {contentTypeForItems or 'item'} in previous response: {lastItemObject}")
|
||||
continuationGuidance.append(f"Continue with the NEXT item after this.")
|
||||
|
||||
continuationText = "\n".join(continuationGuidance) if continuationGuidance else "Continue from where it stopped."
|
||||
continuationText += "⚠️ CRITICAL: The elements above are REFERENCE ONLY. They are already delivered.\n"
|
||||
continuationText += "Generate ONLY what comes AFTER these elements. DO NOT regenerate the entire JSON structure.\n"
|
||||
continuationText += "Start directly with the next element/section that should follow.\n\n"
|
||||
|
||||
# PROMPT FOR CONTINUATION
|
||||
generationPrompt = f"""User request: "{userPrompt}"
|
||||
|
||||
NOTE: The user already received part of the response.
|
||||
TASK: Continue generating the remaining content.
|
||||
⚠️ CONTINUATION MODE: Response was incomplete. Generate ONLY the remaining content.
|
||||
|
||||
{continuationText}
|
||||
|
||||
JSON structure template:
|
||||
{jsonTemplate}
|
||||
|
||||
Instructions:
|
||||
- Return ONLY valid JSON (strict). No comments of any kind (no //, /* */, or #). No trailing commas. Strings must use double quotes.
|
||||
- Arrays must contain ONLY JSON values; do not include comments or ellipses.
|
||||
- Use ONLY the element structures shown in the template.
|
||||
- Continue from where it stopped - add NEW items only; do not repeat existing items.
|
||||
- Generate remaining content to complete the user request. Do NOT just give an instruction or comments. Deliver the complete response.
|
||||
- Fill with actual content (no placeholders or instructional text such as "Add more...").
|
||||
- IMPORTANT: Ensure "filename" in each document has meaningful name with appropriate extension matching the content.
|
||||
Rules:
|
||||
- Return ONLY valid JSON (no comments, no trailing commas, double quotes only).
|
||||
- Reference elements shown above are ALREADY DELIVERED - DO NOT repeat them.
|
||||
- Generate ONLY the remaining content that comes AFTER the reference elements.
|
||||
- DO NOT regenerate the entire JSON structure - start directly with what comes next.
|
||||
- Output JSON only; no markdown fences or extra text.
|
||||
|
||||
IMPORTANT: Before responding, analyse the remaining data to fully satisfy user request.
|
||||
|
||||
Continue generating:
|
||||
Continue generating the remaining content now.
|
||||
"""
|
||||
else:
|
||||
|
||||
|
|
@@ -177,14 +102,13 @@ JSON structure template:
{jsonTemplate}

Instructions:
- Start with {{"metadata": ...}} - return COMPLETE, STRICT JSON.
- Return ONLY valid JSON (strict). No comments. No trailing commas. Use double quotes.
- Do NOT reuse example section IDs; create your own.
- Generate complete content based on the user request. Do NOT just give an instruction or comments. Deliver the complete response.
- IMPORTANT: Set a meaningful "filename" in each document with an appropriate file extension (e.g., "prime_numbers.txt", "report.docx", "data.json"). The filename should reflect the content and task objective.
- Output JSON only; no markdown fences or extra text.

Generate your complete response starting from {{"metadata": ...}}:
Generate your complete response.
"""

    # If we have extracted content, prepend it to the prompt

File diff suppressed because it is too large
@@ -178,20 +178,6 @@ class MethodAi(MethodBase):
                    mimeType=doc.mimeType or output_mime_type
                ))

            # Preserve structured content field for validation (if it exists)
            # Parse content JSON to check if it's structured data
            try:
                import json
                contentData = json.loads(aiResponse.content) if isinstance(aiResponse.content, str) else aiResponse.content
                if isinstance(contentData, (dict, list)):
                    action_documents.append(ActionDocument(
                        documentName="structured_content.json",
                        documentData=contentData,
                        mimeType="application/json"
                    ))
            except Exception:
                pass  # Content is not JSON, skip structured content

            final_documents = action_documents
        else:
            # Text response - create document from content
@@ -228,7 +214,7 @@ class MethodAi(MethodBase):

    @action
    async def extractContent(self, parameters: ExtractContentParameters) -> ActionResult:
    async def extractContent(self, parameters: Dict[str, Any]) -> ActionResult:
        """
        Extract content from documents (separate from AI calls).

@@ -236,8 +222,8 @@ class MethodAi(MethodBase):
        The extracted ContentParts can then be used by subsequent AI processing actions.

        Parameters:
        - documentList: DocumentReferenceList - Document references to extract content from
        - extractionOptions: Optional[ExtractionOptions] - Extraction options (if not provided, defaults are used)
        - documentList (list, required): Document reference(s) to extract content from.
        - extractionOptions (dict, optional): Extraction options (if not provided, defaults are used).

        Returns:
        - ActionResult with ActionDocument containing ContentExtracted objects
@@ -248,17 +234,33 @@ class MethodAi(MethodBase):
        workflowId = self.services.workflow.id if self.services.workflow else f"no-workflow-{int(time.time())}"
        operationId = f"ai_extract_{workflowId}_{int(time.time())}"

        # Extract documentList from parameters dict
        from modules.datamodels.datamodelDocref import DocumentReferenceList
        documentListParam = parameters.get("documentList")
        if not documentListParam:
            return ActionResult.isFailure(error="documentList is required")

        # Convert to DocumentReferenceList if needed
        if isinstance(documentListParam, DocumentReferenceList):
            documentList = documentListParam
        elif isinstance(documentListParam, str):
            documentList = DocumentReferenceList.from_string_list([documentListParam])
        elif isinstance(documentListParam, list):
            documentList = DocumentReferenceList.from_string_list(documentListParam)
        else:
            return ActionResult.isFailure(error=f"Invalid documentList type: {type(documentListParam)}")

        # Start progress tracking
        self.services.chat.progressLogStart(
            operationId,
            "Extracting content from documents",
            "Content Extraction",
            f"Documents: {len(parameters.documentList.references) if parameters.documentList else 0}"
            f"Documents: {len(documentList.references)}"
        )

        # Get ChatDocuments from documentList
        self.services.chat.progressLogUpdate(operationId, 0.2, "Loading documents")
        chatDocuments = self.services.chat.getChatDocumentsFromDocumentList(parameters.documentList)
        chatDocuments = self.services.chat.getChatDocumentsFromDocumentList(documentList)

        if not chatDocuments:
            self.services.chat.progressLogFinish(operationId, False)
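The parameter-normalization branch in `extractContent` dispatches on the runtime type of `documentList` before any work happens. A minimal standalone sketch of that dispatch, with the project's `DocumentReferenceList` replaced by a plain list of strings for illustration (the function name `normalize_document_refs` is hypothetical, not part of the codebase):

```python
from typing import Any, List


def normalize_document_refs(param: Any) -> List[str]:
    """Accept a single reference string or a list of references and
    return a uniform list, mirroring the dispatch in extractContent."""
    if param is None or param == []:
        # extractContent fails fast with "documentList is required"
        raise ValueError("documentList is required")
    if isinstance(param, str):
        return [param]
    if isinstance(param, list):
        return [str(ref) for ref in param]
    raise TypeError(f"Invalid documentList type: {type(param)}")
```

The point of the pattern is that every later line can rely on one shape (a reference list) instead of re-checking types.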
@@ -340,7 +342,8 @@ class MethodAi(MethodBase):
            return ActionResult.isFailure(error="Research prompt is required")

        # Init progress logger
        operationId = f"web_research_{self.services.workflow.id}_{int(time.time())}"
        workflowId = self.services.workflow.id if self.services.workflow else f"no-workflow-{int(time.time())}"
        operationId = f"web_research_{workflowId}_{int(time.time())}"

        # Start progress tracking
        self.services.chat.progressLogStart(
@@ -500,6 +503,348 @@ class MethodAi(MethodBase):
        return await self.process(processParams)


    @action
    async def convert(self, parameters: Dict[str, Any]) -> ActionResult:
        """
        GENERAL:
        - Purpose: Convert documents/data between different formats with specific formatting options (e.g., JSON→CSV with custom columns, delimiters).
        - Input requirements: documentList (required); inputFormat and outputFormat (required).
        - Output format: Document in target format with specified formatting options.
        - CRITICAL: If input is already in standardized JSON format, uses automatic rendering system (no AI call needed).

        Parameters:
        - documentList (list, required): Document reference(s) to convert.
        - inputFormat (str, required): Source format (json, csv, xlsx, txt, etc.).
        - outputFormat (str, required): Target format (csv, json, xlsx, txt, etc.).
        - columnsPerRow (int, optional): For CSV output, number of columns per row. Default: auto-detect.
        - delimiter (str, optional): For CSV output, delimiter character. Default: comma (,).
        - includeHeader (bool, optional): For CSV output, whether to include a header row. Default: True.
        - language (str, optional): Language for output (e.g., 'de', 'en', 'fr'). Default: 'en'.
        """
        documentList = parameters.get("documentList", [])
        if not documentList:
            return ActionResult.isFailure(error="documentList is required")

        inputFormat = parameters.get("inputFormat")
        outputFormat = parameters.get("outputFormat")
        if not inputFormat or not outputFormat:
            return ActionResult.isFailure(error="inputFormat and outputFormat are required")

        # Normalize formats (remove leading dot if present)
        normalizedInputFormat = inputFormat.strip().lstrip('.').lower()
        normalizedOutputFormat = outputFormat.strip().lstrip('.').lower()

        # Get documents
        from modules.datamodels.datamodelDocref import DocumentReferenceList
        if isinstance(documentList, DocumentReferenceList):
            docRefList = documentList
        elif isinstance(documentList, list):
            docRefList = DocumentReferenceList.from_string_list(documentList)
        else:
            docRefList = DocumentReferenceList.from_string_list([documentList])

        chatDocuments = self.services.chat.getChatDocumentsFromDocumentList(docRefList)
        if not chatDocuments:
            return ActionResult.isFailure(error="No documents found in documentList")

        # Check if input is standardized JSON format - if so, use direct rendering
        if normalizedInputFormat == "json" and len(chatDocuments) == 1:
            try:
                import json
                doc = chatDocuments[0]
                # ChatDocument doesn't have documentData - need to load file content using fileId
                docBytes = self.services.chat.getFileData(doc.fileId)
                if not docBytes:
                    raise ValueError(f"No file data found for fileId={doc.fileId}")

                # Decode bytes to string
                docData = docBytes.decode('utf-8')

                # Try to parse as JSON
                if isinstance(docData, str):
                    jsonData = json.loads(docData)
                elif isinstance(docData, dict):
                    jsonData = docData
                else:
                    jsonData = None

                # Check if it's standardized JSON format (has "documents" or "sections")
                if jsonData and (isinstance(jsonData, dict) and ("documents" in jsonData or "sections" in jsonData)):
                    # Use direct rendering - no AI call needed!
                    from modules.services.serviceGeneration.mainServiceGeneration import GenerationService
                    generationService = GenerationService(self.services)

                    # Ensure format is "documents" array
                    if "documents" not in jsonData:
                        jsonData = {"documents": [{"sections": jsonData.get("sections", []), "metadata": jsonData.get("metadata", {})}]}

                    # Get title
                    title = jsonData.get("metadata", {}).get("title", doc.documentName or "Converted Document")

                    # Render with options
                    renderOptions = {}
                    if normalizedOutputFormat == "csv":
                        renderOptions["delimiter"] = parameters.get("delimiter", ",")
                        renderOptions["columnsPerRow"] = parameters.get("columnsPerRow")
                        renderOptions["includeHeader"] = parameters.get("includeHeader", True)

                    rendered_content, mime_type = await generationService.renderReport(
                        jsonData, normalizedOutputFormat, title, None, None
                    )

                    # Apply CSV options if needed (renderer will handle them)
                    if normalizedOutputFormat == "csv" and renderOptions:
                        rendered_content = self._applyCsvOptions(rendered_content, renderOptions)

                    from modules.datamodels.datamodelChat import ActionDocument
                    actionDoc = ActionDocument(
                        documentName=f"{doc.documentName.rsplit('.', 1)[0] if '.' in doc.documentName else doc.documentName}.{normalizedOutputFormat}",
                        documentData=rendered_content,
                        mimeType=mime_type
                    )

                    return ActionResult.isSuccess(documents=[actionDoc])

            except Exception as e:
                logger.warning(f"Direct rendering failed, falling back to AI conversion: {str(e)}")
                # Fall through to AI-based conversion

        # Fallback: Use AI for conversion (for non-JSON inputs or complex conversions)
        columnsPerRow = parameters.get("columnsPerRow")
        delimiter = parameters.get("delimiter", ",")
        includeHeader = parameters.get("includeHeader", True)
        language = parameters.get("language", "en")

        aiPrompt = f"Convert the provided document(s) from {normalizedInputFormat.upper()} format to {normalizedOutputFormat.upper()} format."

        if normalizedOutputFormat == "csv":
            aiPrompt += f" Use '{delimiter}' as the delimiter character."
            if columnsPerRow:
                aiPrompt += f" Format the output with {columnsPerRow} columns per row."
            if not includeHeader:
                aiPrompt += " Do not include a header row."
            else:
                aiPrompt += " Include a header row with column names."

        if language and language != "en":
            aiPrompt += f" Use language: {language}."

        aiPrompt += " Preserve all data and ensure accurate conversion. Maintain data integrity and structure."

        return await self.process({
            "aiPrompt": aiPrompt,
            "documentList": documentList,
            "resultType": normalizedOutputFormat
        })

    def _applyCsvOptions(self, csvContent: str, options: Dict[str, Any]) -> str:
        """Apply CSV formatting options to rendered CSV content."""
        delimiter = options.get("delimiter", ",")
        columnsPerRow = options.get("columnsPerRow")
        includeHeader = options.get("includeHeader", True)

        # Check if any options need to be applied
        needsProcessing = (delimiter != ",") or (columnsPerRow is not None) or (not includeHeader)

        if not needsProcessing:
            return csvContent

        import csv
        import io
        # Re-read CSV with comma, write with new delimiter
        reader = csv.reader(io.StringIO(csvContent))
        output = io.StringIO()
        writer = csv.writer(output, delimiter=delimiter)

        rows = list(reader)

        # Handle header
        if not includeHeader and rows:
            rows = rows[1:]  # Skip header

        # Handle columnsPerRow
        if columnsPerRow:
            newRows = []
            for row in rows:
                # Split row into chunks of columnsPerRow
                for i in range(0, len(row), columnsPerRow):
                    chunk = row[i:i+columnsPerRow]
                    # Pad to columnsPerRow if needed
                    while len(chunk) < columnsPerRow:
                        chunk.append("")
                    newRows.append(chunk)
            rows = newRows

        for row in rows:
            writer.writerow(row)

        return output.getvalue()
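The CSV post-processing in `_applyCsvOptions` can be exercised in isolation. The sketch below reproduces the same three steps (re-delimiting, header removal, column chunking) as a standalone function; the name `apply_csv_options` is illustrative and not part of the codebase, and unlike the method above it always rewrites the CSV rather than short-circuiting when no option is set:

```python
import csv
import io
from typing import Any, Dict


def apply_csv_options(csv_content: str, options: Dict[str, Any]) -> str:
    """Re-delimit, optionally drop the header, and re-chunk rows of a CSV string."""
    delimiter = options.get("delimiter", ",")
    columns_per_row = options.get("columnsPerRow")
    include_header = options.get("includeHeader", True)

    rows = list(csv.reader(io.StringIO(csv_content)))
    if not include_header and rows:
        rows = rows[1:]  # drop the header row
    if columns_per_row:
        # Flatten every row into fixed-width chunks, padding the last chunk
        chunked = []
        for row in rows:
            for i in range(0, len(row), columns_per_row):
                chunk = row[i:i + columns_per_row]
                chunk += [""] * (columns_per_row - len(chunk))
                chunked.append(chunk)
        rows = chunked

    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

For example, `apply_csv_options("a,b,c,d\n1,2,3,4\n", {"delimiter": ";", "includeHeader": False, "columnsPerRow": 2})` yields `"1;2\n3;4\n"`: the header is dropped and the remaining four-column row is re-chunked into two two-column rows.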
    @action
    async def reformat(self, parameters: Dict[str, Any]) -> ActionResult:
        """
        GENERAL:
        - Purpose: Reformat/transform documents with specific transformation rules (e.g., extract arrays, reshape data, apply custom formatting).
        - Input requirements: documentList (required); inputFormat and outputFormat (required); transformationRules (optional).
        - Output format: Document in target format with applied transformation rules.
        - CRITICAL: If input is already in standardized JSON format, uses automatic rendering system with transformation rules.

        Parameters:
        - documentList (list, required): Document reference(s) to reformat.
        - inputFormat (str, required): Source format (json, csv, xlsx, txt, etc.).
        - outputFormat (str, required): Target format (csv, json, xlsx, txt, etc.).
        - transformationRules (str, optional): Specific transformation instructions (e.g., "Extract prime numbers array and format as CSV with 10 columns per row").
        - columnsPerRow (int, optional): For CSV output, number of columns per row. Default: auto-detect.
        - totalRows (int, optional): For CSV output, total number of rows to create. Default: auto-detect.
        - delimiter (str, optional): For CSV output, delimiter character. Default: comma (,).
        - includeHeader (bool, optional): For CSV output, whether to include a header row. Default: True.
        - language (str, optional): Language for output (e.g., 'de', 'en', 'fr'). Default: 'en'.
        """
        documentList = parameters.get("documentList", [])
        if not documentList:
            return ActionResult.isFailure(error="documentList is required")

        inputFormat = parameters.get("inputFormat")
        outputFormat = parameters.get("outputFormat")
        if not inputFormat or not outputFormat:
            return ActionResult.isFailure(error="inputFormat and outputFormat are required")

        transformationRules = parameters.get("transformationRules")
        columnsPerRow = parameters.get("columnsPerRow")
        totalRows = parameters.get("totalRows")
        delimiter = parameters.get("delimiter", ",")
        includeHeader = parameters.get("includeHeader", True)
        language = parameters.get("language", "en")

        # Normalize formats (remove leading dot if present)
        normalizedInputFormat = inputFormat.strip().lstrip('.').lower()
        normalizedOutputFormat = outputFormat.strip().lstrip('.').lower()

        # Get documents
        from modules.datamodels.datamodelDocref import DocumentReferenceList
        if isinstance(documentList, DocumentReferenceList):
            docRefList = documentList
        elif isinstance(documentList, list):
            docRefList = DocumentReferenceList.from_string_list(documentList)
        else:
            docRefList = DocumentReferenceList.from_string_list([documentList])

        chatDocuments = self.services.chat.getChatDocumentsFromDocumentList(docRefList)
        if not chatDocuments:
            return ActionResult.isFailure(error="No documents found in documentList")

        # Check if input is standardized JSON format - if so, use direct rendering with transformation
        if normalizedInputFormat == "json" and len(chatDocuments) == 1:
            try:
                import json
                doc = chatDocuments[0]
                # ChatDocument doesn't have documentData - need to load file content using fileId
                docBytes = self.services.chat.getFileData(doc.fileId)
                if not docBytes:
                    raise ValueError(f"No file data found for fileId={doc.fileId}")

                # Decode bytes to string
                docData = docBytes.decode('utf-8')

                # Try to parse as JSON
                if isinstance(docData, str):
                    jsonData = json.loads(docData)
                elif isinstance(docData, dict):
                    jsonData = docData
                else:
                    jsonData = None

                # Check if it's standardized JSON format (has "documents" or "sections")
                if jsonData and (isinstance(jsonData, dict) and ("documents" in jsonData or "sections" in jsonData)):
                    # Apply transformation rules if provided
                    if transformationRules:
                        # Use AI to apply transformation rules to JSON
                        aiPrompt = f"Apply the following transformation rules to the JSON document: {transformationRules}"
                        if normalizedOutputFormat == "csv":
                            aiPrompt += f" Output format: CSV with delimiter '{delimiter}'"
                            if columnsPerRow:
                                aiPrompt += f", {columnsPerRow} columns per row"
                            if totalRows:
                                aiPrompt += f", {totalRows} total rows"
                            if not includeHeader:
                                aiPrompt += ", no header row"

                        # Use process to apply transformation
                        return await self.process({
                            "aiPrompt": aiPrompt,
                            "documentList": documentList,
                            "resultType": normalizedOutputFormat
                        })
                    else:
                        # No transformation rules - use direct rendering
                        from modules.services.serviceGeneration.mainServiceGeneration import GenerationService
                        generationService = GenerationService(self.services)

                        # Ensure format is "documents" array
                        if "documents" not in jsonData:
                            jsonData = {"documents": [{"sections": jsonData.get("sections", []), "metadata": jsonData.get("metadata", {})}]}

                        # Get title
                        title = jsonData.get("metadata", {}).get("title", doc.documentName or "Reformatted Document")

                        # Render with options
                        renderOptions = {}
                        if normalizedOutputFormat == "csv":
                            renderOptions["delimiter"] = delimiter
                            renderOptions["columnsPerRow"] = columnsPerRow
                            renderOptions["includeHeader"] = includeHeader

                        rendered_content, mime_type = await generationService.renderReport(
                            jsonData, normalizedOutputFormat, title, None, None
                        )

                        # Apply CSV options if needed
                        if normalizedOutputFormat == "csv" and renderOptions:
                            rendered_content = self._applyCsvOptions(rendered_content, renderOptions)

                        from modules.datamodels.datamodelChat import ActionDocument
                        actionDoc = ActionDocument(
                            documentName=f"{doc.documentName.rsplit('.', 1)[0] if '.' in doc.documentName else doc.documentName}.{normalizedOutputFormat}",
                            documentData=rendered_content,
                            mimeType=mime_type
                        )

                        return ActionResult.isSuccess(documents=[actionDoc])

            except Exception as e:
                logger.warning(f"Direct rendering failed, falling back to AI reformatting: {str(e)}")
                # Fall through to AI-based reformatting

        # Fallback: Use AI for reformatting with transformation rules
        aiPrompt = f"Reformat the provided document(s) from {normalizedInputFormat.upper()} format to {normalizedOutputFormat.upper()} format."

        if transformationRules:
            aiPrompt += f" Apply the following transformation rules: {transformationRules}"

        if normalizedOutputFormat == "csv":
            aiPrompt += f" Use '{delimiter}' as the delimiter character."
            if columnsPerRow:
                aiPrompt += f" Format the output with {columnsPerRow} columns per row."
            if totalRows:
                aiPrompt += f" Create exactly {totalRows} rows total."
            if not includeHeader:
                aiPrompt += " Do not include a header row."
            else:
                aiPrompt += " Include a header row with column names."

        if language and language != "en":
            aiPrompt += f" Use language: {language}."

        aiPrompt += " Preserve all data and ensure accurate transformation. Maintain data integrity."

        return await self.process({
            "aiPrompt": aiPrompt,
            "documentList": documentList,
            "resultType": normalizedOutputFormat
        })


    @action
    async def convertDocument(self, parameters: Dict[str, Any]) -> ActionResult:
        """
@@ -1,9 +1,8 @@
# adaptive module for Dynamic mode
# Provides adaptive learning capabilities

from .intentAnalyzer import IntentAnalyzer
from .contentValidator import ContentValidator
from .learningEngine import LearningEngine
from .progressTracker import ProgressTracker

__all__ = ['IntentAnalyzer', 'ContentValidator', 'LearningEngine', 'ProgressTracker']
__all__ = ['ContentValidator', 'LearningEngine', 'ProgressTracker']
@ -126,10 +126,111 @@ class ContentValidator:
|
|||
# Fallback: assume 8KB available
|
||||
return 8 * 1024
|
||||
|
||||
def _summarizeJsonStructure(self, jsonData: Any) -> Dict[str, Any]:
|
||||
"""Summarize JSON document structure for validation - extracts main objects, statistics, captions, and IDs."""
|
||||
try:
|
||||
if not isinstance(jsonData, dict):
|
||||
                return {"type": "non-dict", "preview": str(jsonData)[:200]}

            summary = {
                "metadata": {},
                "sections": [],
                "statistics": {}
            }

            # Extract metadata
            metadata = jsonData.get("metadata", {})
            if metadata:
                summary["metadata"] = {
                    "title": metadata.get("title"),
                    "split_strategy": metadata.get("split_strategy"),
                    "extraction_method": metadata.get("extraction_method")
                }

            # Extract documents array (if present)
            documents = jsonData.get("documents", [])
            if documents:
                summary["statistics"]["documentCount"] = len(documents)
                # Process first document (most common case)
                if len(documents) > 0:
                    doc = documents[0]
                    docSections = doc.get("sections", [])
                    summary["statistics"]["sectionCount"] = len(docSections)

                    # Summarize sections
                    for section in docSections:
                        sectionSummary = {
                            "id": section.get("id"),
                            "content_type": section.get("content_type"),
                            "title": section.get("title"),
                            "order": section.get("order")
                        }

                        # For tables: extract caption and statistics
                        if section.get("content_type") == "table":
                            elements = section.get("elements", [])
                            if elements and isinstance(elements, list) and len(elements) > 0:
                                tableElement = elements[0]
                                sectionSummary["caption"] = tableElement.get("caption")
                                headers = tableElement.get("headers", [])
                                rows = tableElement.get("rows", [])
                                sectionSummary["columnCount"] = len(headers)
                                sectionSummary["rowCount"] = len(rows)
                                sectionSummary["headers"] = headers  # Include headers for context

                        # For lists: extract item count
                        elif section.get("content_type") == "list":
                            elements = section.get("elements", [])
                            if elements and isinstance(elements, list) and len(elements) > 0:
                                listElement = elements[0]
                                items = listElement.get("items", [])
                                sectionSummary["itemCount"] = len(items)

                        # For paragraphs/headings: extract text preview
                        elif section.get("content_type") in ["paragraph", "heading"]:
                            elements = section.get("elements", [])
                            if elements and isinstance(elements, list) and len(elements) > 0:
                                textElement = elements[0]
                                text = textElement.get("text", "")
                                if text:
                                    sectionSummary["textPreview"] = text[:100] + ("..." if len(text) > 100 else "")

                        summary["sections"].append(sectionSummary)
            else:
                # Fallback: check for sections directly in root
                sections = jsonData.get("sections", [])
                if sections:
                    summary["statistics"]["sectionCount"] = len(sections)
                    for section in sections:
                        sectionSummary = {
                            "id": section.get("id"),
                            "content_type": section.get("content_type"),
                            "title": section.get("title")
                        }

                        if section.get("content_type") == "table":
                            elements = section.get("elements", [])
                            if elements and isinstance(elements, list) and len(elements) > 0:
                                tableElement = elements[0]
                                sectionSummary["caption"] = tableElement.get("caption")
                                headers = tableElement.get("headers", [])
                                rows = tableElement.get("rows", [])
                                sectionSummary["columnCount"] = len(headers)
                                sectionSummary["rowCount"] = len(rows)
                                sectionSummary["headers"] = headers

                        summary["sections"].append(sectionSummary)

            return summary

        except Exception as e:
            logger.warning(f"Error summarizing JSON structure: {str(e)}")
            return {"error": str(e), "type": "error"}

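For reference, a standalone approximation of the summary shape this helper produces. The `summarize` function and the `sample` payload below are invented for illustration; field names mirror the method above, but this is not the class itself:

```python
# Invented sample payload, shaped like the documents handled above
sample = {
    "metadata": {"title": "Report", "split_strategy": "sections", "extraction_method": "docx"},
    "documents": [{
        "sections": [
            {"id": "s1", "content_type": "table", "title": "Employees", "order": 1,
             "elements": [{"caption": "Employees", "headers": ["Name", "Dept"],
                           "rows": [["Alice", "IT"], ["Bob", "HR"]]}]},
            {"id": "s2", "content_type": "paragraph", "title": None, "order": 2,
             "elements": [{"text": "Quarterly staffing overview."}]},
        ]
    }]
}

def summarize(data):
    # Simplified stand-in for _summarizeJsonStructure: structure only, no content
    doc = data["documents"][0]
    sections = []
    for sec in doc["sections"]:
        entry = {"id": sec["id"], "content_type": sec["content_type"]}
        if sec["content_type"] == "table":
            el = sec["elements"][0]
            entry["caption"] = el.get("caption")
            entry["columnCount"] = len(el.get("headers", []))
            entry["rowCount"] = len(el.get("rows", []))
        sections.append(entry)
    return {
        "statistics": {"documentCount": len(data["documents"]),
                       "sectionCount": len(doc["sections"])},
        "sections": sections,
    }

print(summarize(sample))
```

The point of the design: a validator downstream can check table captions and row counts without ever seeing the row data.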
    def _analyzeDocumentsWithSizeLimit(self, documents: List[Any], maxTotalBytes: int) -> List[Dict[str, Any]]:
        """
-        Analyze documents for validation - METADATA ONLY (no document content/previews).
-        For planning/validation, we only need metadata to assess format, type, and size compatibility.
+        Analyze documents for validation - includes metadata AND JSON structure summary.
+        JSON summary provides structure information (sections, tables with captions, IDs) without full content.
        """
        if not documents:
            return []

@@ -142,14 +243,25 @@ class ContentValidator:
                formatExt = self._detectFormat(doc)
                sizeInfo = self._calculateSize(doc)

                # Only include metadata - NO document content/previews
                # This keeps prompts small and focused on validation criteria
                summary = {
                    "name": name,
                    "mimeType": mimeType,
                    "format": formatExt,
                    "size": sizeInfo["readable"]
                }

+                # Extract JSON structure summary if documentData is available
+                data = getattr(doc, 'documentData', None)
+                if data is not None:
+                    if isinstance(data, dict):
+                        # Summarize JSON structure
+                        jsonSummary = self._summarizeJsonStructure(data)
+                        summary["jsonStructure"] = jsonSummary
+                    elif isinstance(data, list) and len(data) > 0 and isinstance(data[0], dict):
+                        # Handle list of documents
+                        jsonSummary = self._summarizeJsonStructure(data[0])
+                        summary["jsonStructure"] = jsonSummary

                summaries.append(summary)
            except Exception as e:
                logger.warning(f"Error analyzing document {getattr(doc, 'documentName', 'Unknown')}: {str(e)}")

@@ -296,27 +408,69 @@ class ContentValidator:
        successCriteria = intent.get('successCriteria', [])
        criteriaCount = len(successCriteria)

-        # Build action name context
+        # Build action name context with human-readable description
        actionContext = ""
        if actionName:
-            actionContext = f"\nACTION THAT CREATED DOCUMENTS: {actionName}"
+            # Convert action name to human-readable format
+            actionDescription = actionName.replace("ai.", "").replace(".", " ").title()
+            if "convert" in actionName.lower():
+                actionDescription = "Document format conversion"
+            elif "generate" in actionName.lower() or "create" in actionName.lower():
+                actionDescription = "Document generation"
+            elif "extract" in actionName.lower():
+                actionDescription = "Content extraction"
+            elif "process" in actionName.lower():
+                actionDescription = "Content processing"
+            actionContext = f"\nDOCUMENTS CREATED BY: {actionDescription} ({actionName})"

+        # Format success criteria for display
+        criteriaDisplay = json.dumps(successCriteria, ensure_ascii=False) if successCriteria else "[]"

+        # Build successCriteriaMet example - show proper array format
+        criteriaMetExample = json.dumps([False] * criteriaCount) if criteriaCount > 0 else "[]"

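One reason for the new `criteriaMetExample`: the earlier prompt template interpolated `{"[false]" * criteriaCount}`, which is Python string repetition, not a JSON array. A quick standalone check (the `criteriaCount` value is an assumed example):

```python
import json

criteriaCount = 3  # assumed example value
criteriaMetExample = json.dumps([False] * criteriaCount) if criteriaCount > 0 else "[]"
print(criteriaMetExample)         # [false, false, false]  - valid JSON array
print("[false]" * criteriaCount)  # [false][false][false]  - what string repetition yields
```

`json.dumps` also lowercases the booleans, so the model sees exactly the JSON shape it is asked to return.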
        promptBase = f"""TASK VALIDATION

{objectiveLabel}: '{objectiveText}'
EXPECTED DATA TYPE: {dataType}
EXPECTED FORMATS: {expectedFormats if expectedFormats else ['any']}
-SUCCESS CRITERIA ({criteriaCount} items): {successCriteria}{actionContext}
+SUCCESS CRITERIA ({criteriaCount} items): {criteriaDisplay}{actionContext}

VALIDATION RULES:
-IMPORTANT: You only have document METADATA (filename, format, size, mimeType) - NOT document content.
-Validate based on metadata only:
-1. Check if filenames are APPROXIMATELY meaningful (generic names like "generated.docx" are acceptable if format matches)
-2. Check if delivered formats are compatible with expected format
-3. Check if document sizes are reasonable for the task objective
-4. Assess if filename and size combination suggests correct data type
-5. Rate overall quality (0.0-1.0) based on metadata indicators, with format matching being the most important
-6. Identify specific gaps based on what the user requested (infer from filename, size, format - NOT content)
+You have document METADATA (filename, format, size, mimeType) AND JSON STRUCTURE SUMMARY (sections, tables with captions, IDs, statistics).
+
+What CAN be validated:
+- Format compatibility: Check if delivered format matches expected format (e.g., xlsx matches xlsx, docx matches docx)
+- Filename appropriateness: Check if filename suggests correct content type (e.g., "employee_data.xlsx" suggests employee data)
+- Document structure: Use JSON structure summary to validate:
+  * Number of sections/tables matches requirements
+  * Table captions are present and meaningful (if task requires specific tables)
+  * Section IDs are present (if needed)
+  * Table row/column counts are reasonable for the task
+  * Section types match expectations (e.g., task asks for tables, check if tables are present)
+- Document count: Check if number of documents matches expectations
+- Basic size sanity: Only flag size if EXTREMELY small (<1KB) or suspiciously large for the task type
+
+What CANNOT be validated:
+- Content quality, accuracy, or completeness of actual data values
+- Whether specific data values are correct
+- Whether formatting details are perfect
+- Whether content meets very detailed requirements that require reading actual data
+
+Validation approach:
+1. Format matching is PRIMARY - if format matches, qualityScore should be at least 0.7
+2. Structure validation using JSON summary is SECONDARY - check if structure matches requirements:
+   - If task asks for "two sheets" or "two tables", verify section count or table count from JSON summary
+   - If task asks for specific table captions, verify they exist in JSON summary
+   - If task asks for specific structure (e.g., "Employees table" and "Departments table"), verify section titles/captions match
+3. Filename appropriateness is TERTIARY - meaningful filenames increase score
+4. Size checks should be VERY conservative - only flag if clearly wrong (e.g., 0 bytes or <1KB for complex documents)
+5. For successCriteriaMet: Evaluate each criterion using metadata AND JSON structure:
+   - Format-related criteria: Can be evaluated (e.g., "Excel file" → check format)
+   - Structure-related criteria: Can be evaluated using JSON summary (e.g., "two sheets" → check section count, "table with caption X" → check JSON summary for caption)
+   - Content-related criteria: Set to false if cannot be determined from structure (don't guess data values)
+6. Only suggest improvements if there are CLEAR issues (wrong format, missing structure elements, etc.)
+7. If format matches, structure matches requirements (from JSON summary), and filename is reasonable, qualityScore should be 0.8-1.0

OUTPUT FORMAT - JSON ONLY (no prose):
{{

@@ -325,22 +479,25 @@ OUTPUT FORMAT - JSON ONLY (no prose):
  "dataTypeMatch": false,
  "formatMatch": false,
  "documentCount": {len(documents)},
-  "successCriteriaMet": {"[false]" * criteriaCount},
-  "gapAnalysis": "Describe what is missing or incorrect based on filename, size, format metadata",
-  "improvementSuggestions": ["General action to improve overall result"],
+  "successCriteriaMet": {criteriaMetExample},
+  "gapAnalysis": "Describe what is missing or incorrect based ONLY on metadata (format, filename, count, size). If format matches and filename is reasonable, state that validation is limited by metadata-only access.",
+  "improvementSuggestions": [],
  "validationDetails": [
    {{
      "documentName": "document.ext",
-      "issues": ["Issue inferred from metadata (e.g., filename doesn't match task, size too small for objective)"],
+      "issues": ["Issue inferred from metadata ONLY"],
      "suggestions": ["Specific fix based on metadata analysis"]
    }}
  ]
}}

Field explanations:
-- "improvementSuggestions": CONCRETE, EXECUTABLE actions to fix the issues. DO NOT just repeat the original task - suggest SPECIFIC, actionable steps that address the identified problems. Each suggestion should be a concrete action that can be executed, not a vague instruction to repeat the task.
- "validationDetails[].suggestions": Specific fixes for each document's individual issues (document-specific, detailed, actionable)
-- IMPORTANT: Improvement suggestions must be ACTIONABLE and SPECIFIC. Instead of saying "generate CSV again", suggest concrete steps like "convert existing JSON output to CSV format" or "regenerate with CSV format parameter". Focus on what needs to be done differently, not repeating the original request.
+- "successCriteriaMet": Array of {criteriaCount} boolean values, one per success criterion. Evaluate each based ONLY on metadata. If a criterion cannot be evaluated from metadata, set to false and explain in gapAnalysis.
+- "qualityScore": 0.0-1.0 score. If format matches and filename is reasonable, score should be 0.8-1.0. Only reduce score for clear metadata issues.
+- "overallSuccess": true if format matches AND (qualityScore >= 0.8 OR no clear metadata issues)
+- "improvementSuggestions": Only include if there are CLEAR metadata issues that can be fixed. If format matches and filename is reasonable, leave empty array [].
+- "gapAnalysis": Be honest about limitations - if validation is limited by metadata-only access, state this clearly.
+- IMPORTANT: Do NOT suggest improvements based on assumptions about content quality. Only suggest fixes for clear metadata problems (wrong format, missing documents, etc.).

DELIVERED DOCUMENTS ({len(documents)} items):
"""

@@ -354,8 +511,9 @@ DELIVERED DOCUMENTS ({len(documents)} items):
        documentSummaries = self._analyzeDocumentsWithSizeLimit(documents, availableBytes)

-        # Build final prompt with summaries at the end
-        documentsJson = json.dumps(documentSummaries, indent=2)
-        validationPrompt = promptBase + documentsJson
+        # Format document summaries with JSON structure prominently displayed
+        documentsJson = json.dumps(documentSummaries, indent=2, ensure_ascii=False)
+        validationPrompt = promptBase + documentsJson + "\n\nNOTE: The 'jsonStructure' field in each document summary contains the document structure (sections, tables with captions, IDs, statistics). Use this to validate structure requirements like number of tables, table captions, section types, etc."

        # Call AI service for validation
        response = await self.services.ai.callAiPlanning(

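The `ensure_ascii=False` added above keeps non-ASCII filenames readable in the prompt instead of `\uXXXX` escapes. A minimal sketch with an invented summary entry:

```python
import json

# Invented example entry; the real summaries come from _analyzeDocumentsWithSizeLimit
summaries = [{"name": "Bericht_Qualität.docx", "format": "docx"}]

print(json.dumps(summaries, indent=2))                      # escapes: "Bericht_Qualit\u00e4t.docx"
print(json.dumps(summaries, indent=2, ensure_ascii=False))  # keeps:   "Bericht_Qualität.docx"
```

For prompts read by a language model, the unescaped form is both shorter and closer to what the user actually named the file.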
@@ -1,179 +0,0 @@
# intentAnalyzer.py
# Intent analysis for adaptive Dynamic mode - AI-based, language-agnostic

import json
import logging
from typing import Dict, Any, List

logger = logging.getLogger(__name__)

class IntentAnalyzer:
    """Analyzes user intent using AI - language-agnostic and generic"""

    def __init__(self, services=None):
        self.services = services

    async def analyzeUserIntent(self, userPrompt: str, context: Any) -> Dict[str, Any]:
        """Analyzes user intent from prompt and context using AI (single attempt, no fallbacks)"""
        aiAnalysis = await self._analyzeIntentWithAI(userPrompt, context)
        if not aiAnalysis:
            raise ValueError("AI intent analysis failed: empty or invalid response")
        return aiAnalysis

    async def _analyzeIntentWithAI(self, userPrompt: str, context: Any) -> Dict[str, Any]:
        """Uses AI to analyze user intent - language-agnostic"""
        try:
            if not self.services or not hasattr(self.services, 'ai'):
                return None

            # Create AI analysis prompt
            # Determine if we're in task context (have taskStep) or workflow context
            isTaskContext = hasattr(context, 'taskStep') and context.taskStep is not None
            contextObjective = getattr(context.taskStep, 'objective', '') if isTaskContext else ''

            # Use appropriate label based on context
            if isTaskContext:
                # Task context: use OBJECTIVE label and only task objective
                requestLabel = "OBJECTIVE"
                contextInfo = f"OBJECTIVE: {self.services.utils.sanitizePromptContent(contextObjective, 'userinput')}"
            else:
                # Workflow context: use USER REQUEST label
                requestLabel = "USER REQUEST"
                contextInfo = f"CONTEXT: {self.services.utils.sanitizePromptContent(contextObjective, 'userinput') if contextObjective else 'None'}"

            analysisPrompt = f"""
You are an intent analyzer. Analyze the user's request to understand what they want delivered.

{requestLabel}: {self.services.utils.sanitizePromptContent(userPrompt, 'userinput')}

{contextInfo}

Analyze the user's intent and determine:
1. What type of data/content they want (numbers, text, documents, analysis, code, etc.)
2. What file format(s) they expect - provide matching file format extensions list
   - If multiple formats requested, list all of them (e.g., ["xlsx", "pdf"])
   - If format is unclear or not specified, use empty list []
3. What quality requirements they have (accuracy, completeness)
4. What specific success criteria define completion
5. What language the user is communicating in (detect from the user request)
6. DEFINITION OF DONE: Define measurable KPIs that can be checked against JSON structure metrics

CRITICAL: Respond with ONLY the JSON object below. Do not include any explanatory text, analysis, or other content before or after the JSON.

{{
    "primaryGoal": "The main objective the user wants to achieve",
    "dataType": "numbers|text|documents|analysis|code|unknown",
    "expectedFormats": ["pdf", "docx", "xlsx", "txt", "json", "csv", "html", "md"],
    "qualityRequirements": {{
        "accuracyThreshold": 0.0-1.0,
        "completenessThreshold": 0.0-1.0
    }},
    "successCriteria": ["specific criterion 1", "specific criterion 2"],
    "languageUserDetected": "en",
    "confidenceScore": 0.0-1.0,
    "definitionOfDone": {{
        "minSections": 0,
        "minParagraphs": 0,
        "minHeadings": 0,
        "minTableRows": 0,
        "minListItems": 0,
        "minCodeLines": 0,
        "minContentSize": 0,
        "requiredContentTypes": [],
        "completionType": "quantitative|qualitative|structural"
    }}
}}

DEFINITION OF DONE RULES:
- Extract quantitative requirements from user prompt (e.g., "4000 prime numbers" -> minTableRows: 4000)
- For qualitative tasks (books, reports), set structural requirements (minSections, minParagraphs, minHeadings)
- For code tasks, set minCodeLines based on requirements
- For lists, set minListItems based on requirements
- Set minContentSize as minimum expected content size in characters
- Set requiredContentTypes if specific content types are required (e.g., ["table"] for CSV, ["paragraph", "heading"] for books)
- Set completionType: "quantitative" for tasks with specific counts, "qualitative" for content quality tasks, "structural" for structured documents
- Use 0 for metrics that are not relevant for this task type
"""

            # Call AI service for analysis
            response = await self.services.ai.callAiPlanning(
                prompt=analysisPrompt,
                placeholders=None,
                debugType="intentanalysis"
            )

            # No retries or correction prompts here; parse-or-fail below

            if not response or not response.strip():
                logger.warning("AI intent analysis returned empty response")
                return None

            # Clean and extract JSON from response
            result = response.strip()
            logger.debug(f"AI intent analysis response length: {len(result)}")

            # Try to find JSON in the response with multiple strategies
            import re

            # Strategy 1: Look for JSON in markdown code blocks
            json_match = re.search(r'```(?:json)?\s*(\{.*?\})\s*```', result, re.DOTALL)
            if json_match:
                result = json_match.group(1)
                logger.debug(f"Extracted JSON from markdown code block: {result[:200]}...")
            else:
                # Strategy 2: Look for JSON object with proper structure
                json_match = re.search(r'\{[^{}]*"primaryGoal"[^{}]*\}', result, re.DOTALL)
                if not json_match:
                    # Strategy 3: Look for any JSON object
                    json_match = re.search(r'\{.*\}', result, re.DOTALL)

                if not json_match:
                    logger.warning(f"AI intent analysis failed - no JSON found in response: {result[:200]}...")
                    logger.debug(f"Full AI response: {result}")
                    return None

                result = json_match.group(0)
                logger.debug(f"Extracted JSON directly: {result[:200]}...")

            try:
                aiResult = json.loads(result)
                logger.info("AI intent analysis JSON parsed successfully")

                # Set language only if currentUserLanguage is empty
                detected_lang = (aiResult.get('languageUserDetected') or '').strip()
                if detected_lang and detected_lang.lower() != 'unknown' and self.services.currentUserLanguage == "":
                    self.services.currentUserLanguage = detected_lang
                    logger.info(f"Set currentUserLanguage from intent: {detected_lang}")

                    # Also set services.user.language if it's empty
                    if self.services.user and not self.services.user.language:
                        self.services.user.language = detected_lang
                        logger.info(f"Set services.user.language from intent: {detected_lang}")

                return aiResult

            except json.JSONDecodeError as json_error:
                logger.warning(f"AI intent analysis invalid JSON: {str(json_error)}")
                logger.debug(f"JSON content: {result}")
                return None

            return None

        except Exception as e:
            logger.error(f"AI intent analysis failed: {str(e)}")
            return None


    def _isValidJsonResponse(self, response: str) -> bool:
        """Checks if response contains valid JSON structure"""
        try:
            import re
            # Look for JSON with expected structure
            json_match = re.search(r'\{[^{}]*"primaryGoal"[^{}]*\}', response, re.DOTALL)
            if json_match:
                json.loads(json_match.group(0))
                return True
            return False
        except:
            return False
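The removed analyzer's three-stage JSON extraction can be exercised standalone. A self-contained sketch of the same fallback chain; the helper name is invented, the regexes are the ones from the method above:

```python
import json
import re

def extract_json(result):
    # Strategy 1: JSON inside a fenced ```json code block
    m = re.search(r'```(?:json)?\s*(\{.*?\})\s*```', result, re.DOTALL)
    if m:
        return json.loads(m.group(1))
    # Strategy 2: flat object containing "primaryGoal" (no nested braces)
    m = re.search(r'\{[^{}]*"primaryGoal"[^{}]*\}', result, re.DOTALL)
    if not m:
        # Strategy 3: any brace-delimited span (greedy, may over-match)
        m = re.search(r'\{.*\}', result, re.DOTALL)
    return json.loads(m.group(0)) if m else None

print(extract_json('noise ```json {"primaryGoal": "x"} ``` noise'))
```

Note that strategy 2's `[^{}]*` stops matching as soon as the object nests, and strategy 3 can hand malformed braces to `json.loads`, which is why the original wraps parsing in a `json.JSONDecodeError` handler.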
@@ -14,19 +14,19 @@ class LearningEngine:
        self.strategies = {}
        self.feedbackHistory = []

-    def learnFromFeedback(self, feedback: Dict[str, Any], context: Any, intent: Dict[str, Any]):
-        """Learns from feedback and updates strategies"""
+    def learnFromFeedback(self, feedback: Dict[str, Any], context: Any, taskIntent: Dict[str, Any]):
+        """Learns from feedback and updates strategies - works on TASK level, not workflow level"""
        try:
            # Store feedback
            self.feedbackHistory.append({
                "feedback": feedback,
                "context": self._serializeContext(context),
-                "intent": intent,
+                "taskIntent": taskIntent,  # Changed from intent to taskIntent
                "timestamp": datetime.now(timezone.utc).timestamp()
            })

-            # Update strategies based on feedback
-            self._updateStrategies(feedback, intent)
+            # Update strategies based on feedback (using taskIntent)
+            self._updateStrategies(feedback, taskIntent)

            # Normalize scores for safe logging
            _qs = feedback.get('qualityScore', 0.0)

@@ -47,11 +47,11 @@ class LearningEngine:
        except Exception as e:
            logger.error(f"Error learning from feedback: {str(e)}")

-    def getImprovedStrategy(self, context: Any, intent: Dict[str, Any]) -> Dict[str, Any]:
-        """Returns improved strategy based on learning"""
+    def getImprovedStrategy(self, context: Any, taskIntent: Dict[str, Any]) -> Dict[str, Any]:
+        """Returns improved strategy based on learning - works on TASK level"""
        try:
-            # Get strategy key based on intent
-            strategyKey = self._getStrategyKey(intent)
+            # Get strategy key based on taskIntent
+            strategyKey = self._getStrategyKey(taskIntent)

            # Get existing strategy or create default
            if strategyKey in self.strategies:

@@ -60,18 +60,18 @@ class LearningEngine:
                return strategy
            else:
                # Create default strategy
-                defaultStrategy = self._createDefaultStrategy(intent)
+                defaultStrategy = self._createDefaultStrategy(taskIntent)
                self.strategies[strategyKey] = defaultStrategy
                logger.info(f"Created default strategy for {strategyKey}")
                return defaultStrategy

        except Exception as e:
            logger.error(f"Error getting improved strategy: {str(e)}")
-            return self._createDefaultStrategy(intent)
+            return self._createDefaultStrategy(taskIntent)

-    def _updateStrategies(self, feedback: Dict[str, Any], intent: Dict[str, Any]):
-        """Updates strategies based on feedback"""
-        strategyKey = self._getStrategyKey(intent)
+    def _updateStrategies(self, feedback: Dict[str, Any], taskIntent: Dict[str, Any]):
+        """Updates strategies based on feedback - works on TASK level"""
+        strategyKey = self._getStrategyKey(taskIntent)
        actionAttempted = feedback.get('actionAttempted', 'unknown')
        # Coerce possibly None or non-numeric to floats
        qs_raw = feedback.get('qualityScore', 0.0)

@@ -87,7 +87,7 @@ class LearningEngine:

        # Get or create strategy
        if strategyKey not in self.strategies:
-            self.strategies[strategyKey] = self._createDefaultStrategy(intent)
+            self.strategies[strategyKey] = self._createDefaultStrategy(taskIntent)

        strategy = self.strategies[strategyKey]

@@ -113,17 +113,17 @@ class LearningEngine:
        # Update last modified
        strategy['lastModified'] = datetime.now(timezone.utc).timestamp()

-    def _getStrategyKey(self, intent: Dict[str, Any]) -> str:
-        """Gets strategy key based on intent"""
-        dataType = intent.get('dataType', 'unknown')
-        expectedFormats = intent.get('expectedFormats', [])
+    def _getStrategyKey(self, taskIntent: Dict[str, Any]) -> str:
+        """Gets strategy key based on taskIntent"""
+        dataType = taskIntent.get('dataType', 'unknown')
+        expectedFormats = taskIntent.get('expectedFormats', [])
        formatKey = '_'.join(expectedFormats) if expectedFormats else 'unknown'
        return f"{dataType}_{formatKey}"

-    def _createDefaultStrategy(self, intent: Dict[str, Any]) -> Dict[str, Any]:
-        """Creates a default strategy for the intent"""
-        dataType = intent.get('dataType', 'unknown')
-        expectedFormats = intent.get('expectedFormats', [])
+    def _createDefaultStrategy(self, taskIntent: Dict[str, Any]) -> Dict[str, Any]:
+        """Creates a default strategy for the taskIntent"""
+        dataType = taskIntent.get('dataType', 'unknown')
+        expectedFormats = taskIntent.get('expectedFormats', [])
        formatStr = ', '.join(expectedFormats) if expectedFormats else 'any'
        formatKey = '_'.join(expectedFormats) if expectedFormats else 'unknown'

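The key derivation in `_getStrategyKey` can be exercised standalone; the helper name below is invented, the logic is copied from the method:

```python
def strategy_key(task_intent):
    # Flat cache key: dataType plus underscore-joined expected formats
    data_type = task_intent.get('dataType', 'unknown')
    expected_formats = task_intent.get('expectedFormats', [])
    format_key = '_'.join(expected_formats) if expected_formats else 'unknown'
    return f"{data_type}_{format_key}"

print(strategy_key({"dataType": "documents", "expectedFormats": ["xlsx", "pdf"]}))  # documents_xlsx_pdf
print(strategy_key({}))  # unknown_unknown
```

One consequence of this scheme: the same dataType with a differently ordered format list (["pdf", "xlsx"] vs ["xlsx", "pdf"]) produces a different key, so such intents learn independently.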
@@ -170,10 +170,17 @@ class LearningEngine:
        }

    def _serializeContext(self, context: Any) -> Dict[str, Any]:
-        """Serializes context for storage"""
+        """Serializes context for storage - task-level context"""
        try:
+            taskObjective = ""
+            if hasattr(context, 'taskStep') and context.taskStep:
+                if hasattr(context.taskStep, 'objective'):
+                    taskObjective = context.taskStep.objective
+                elif isinstance(context.taskStep, dict):
+                    taskObjective = context.taskStep.get('objective', '')

            return {
-                "taskObjective": getattr(context, 'taskStep', {}).get('objective', '') if hasattr(context, 'taskStep') else '',
+                "taskObjective": taskObjective,
                "workflowId": getattr(context, 'workflowId', ''),
                "availableDocuments": getattr(context, 'availableDocuments', [])
            }

@@ -17,56 +17,59 @@ class ProgressTracker:
        self.learningInsights = []
        self.currentPhase = "plan"

-    def updateOperation(self, result: Any, validation: Dict[str, Any], intent: Dict[str, Any]):
-        """Updates progress tracking based on action result"""
+    def updateOperation(self, result: Any, validation: Dict[str, Any], taskIntent: Dict[str, Any]):
+        """Updates progress tracking based on action result - tracks per TASK, not workflow"""
        try:
            schemaCompliant = validation.get('schemaCompliant', True)
            overallSuccess = validation.get('overallSuccess', None)
            qualityScore = validation.get('qualityScore', None)
            improvementSuggestions = validation.get('improvementSuggestions', [])

+            # Get task objective from taskIntent (task-level, not workflow-level)
+            taskObjective = taskIntent.get('taskObjective', taskIntent.get('primaryGoal', 'Unknown'))

            # If validation is not schema compliant, treat as indeterminate (do not count as failure)
            if not schemaCompliant or overallSuccess is None or qualityScore is None:
                self.partialAchievements.append({
-                    "objective": intent.get('primaryGoal', 'Unknown'),
+                    "objective": taskObjective,
                    "partialAchievement": "Validation indeterminate (schema non-compliant or missing fields)",
                    "missingFields": validation.get('missingFields', []),
                    "timestamp": datetime.now(timezone.utc).timestamp()
                })
                self.currentPhase = "partial"
-                logger.info(f"Indeterminate validation (no penalty): {intent.get('primaryGoal', 'Unknown')}")
+                logger.info(f"Indeterminate validation (no penalty): {taskObjective}")
            elif overallSuccess and qualityScore > 0.7:
                # Successful completion
                self.completedObjectives.append({
-                    "objective": intent.get('primaryGoal', 'Unknown'),
+                    "objective": taskObjective,
                    "achievement": f"Quality score: {qualityScore:.2f}",
                    "qualityScore": qualityScore,
                    "timestamp": datetime.now(timezone.utc).timestamp()
                })
                self.currentPhase = "completed"
-                logger.info(f"Objective completed: {intent.get('primaryGoal', 'Unknown')}")
+                logger.info(f"Task objective completed: {taskObjective}")

            elif qualityScore > 0.3:
                # Partial achievement
                self.partialAchievements.append({
-                    "objective": intent.get('primaryGoal', 'Unknown'),
+                    "objective": taskObjective,
                    "partialAchievement": f"Quality score: {qualityScore:.2f}",
                    "missingParts": improvementSuggestions,
                    "timestamp": datetime.now(timezone.utc).timestamp()
                })
                self.currentPhase = "partial"
-                logger.info(f"Partial achievement: {intent.get('primaryGoal', 'Unknown')}")
+                logger.info(f"Partial achievement: {taskObjective}")

            else:
                # Failed attempt
                self.failedAttempts.append({
-                    "objective": intent.get('primaryGoal', 'Unknown'),
+                    "objective": taskObjective,
                    "failureReason": f"Quality score: {qualityScore:.2f}",
                    "learningOpportunity": improvementSuggestions,
                    "timestamp": datetime.now(timezone.utc).timestamp()
                })
                self.currentPhase = "failed"
-                logger.info(f"Failed attempt: {intent.get('primaryGoal', 'Unknown')}")
+                logger.info(f"Failed attempt: {taskObjective}")

            # Extract learning insights
            if improvementSuggestions:

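The branching above reduces to a small decision table with thresholds 0.7 and 0.3. A standalone sketch with an invented helper name:

```python
def classify(overall_success, quality_score, schema_compliant=True):
    # Indeterminate results are deliberately not counted as failures
    if not schema_compliant or overall_success is None or quality_score is None:
        return "indeterminate"
    if overall_success and quality_score > 0.7:
        return "completed"
    if quality_score > 0.3:
        return "partial"
    return "failed"

print(classify(True, 0.85))  # completed
print(classify(True, 0.5))   # partial
print(classify(False, 0.1))  # failed
print(classify(True, None))  # indeterminate
```

Worth noting: a run with `overall_success=False` but `quality_score=0.9` still lands in "partial", since the success flag only gates the top bucket.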
@@ -9,7 +9,6 @@ from modules.datamodels.datamodelAi import AiCallOptions, OperationTypeEnum, Pro
from modules.workflows.processing.shared.promptGenerationTaskplan import (
    generateTaskPlanningPrompt
)
-from modules.workflows.processing.adaptive import IntentAnalyzer
from modules.workflows.processing.shared.stateTools import checkWorkflowStopped

logger = logging.getLogger(__name__)

@@ -50,14 +49,14 @@ class TaskPlanner:
            cleanedObjective = actualUserPrompt
            workflowIntent = None
        else:
-            # This intent will be reused for workflow-level validation in executeTask
-            from modules.workflows.processing.adaptive import IntentAnalyzer
-            intentAnalyzer = IntentAnalyzer(self.services)
-            workflowIntent = await intentAnalyzer.analyzeUserIntent(actualUserPrompt, None)
-            # Store workflow intent for reuse in executeTask (avoid redundant analysis)
-            if not hasattr(workflow, '_workflowIntent'):
-                workflow._workflowIntent = workflowIntent
-            cleanedObjective = workflowIntent.get('primaryGoal', actualUserPrompt) if isinstance(workflowIntent, dict) else actualUserPrompt
+            # Use workflowIntent from workflow object (set in workflowManager from userintention analysis)
+            workflowIntent = getattr(workflow, '_workflowIntent', None)
+            if workflowIntent and isinstance(workflowIntent, dict):
+                cleanedObjective = workflowIntent.get('primaryGoal', actualUserPrompt)
+            else:
+                # Fallback: use user prompt directly if workflowIntent not available
+                cleanedObjective = actualUserPrompt
+                logger.warning("WorkflowIntent not found in workflow object, using user prompt directly")

        # Create proper context object for task planning using cleaned intent
        # For task planning, we need to create a minimal TaskStep since TaskContext requires it

@@ -22,7 +22,7 @@ from modules.workflows.processing.shared.promptGenerationActionsDynamic import (
generateDynamicRefinementPrompt
|
||||
)
|
||||
from modules.workflows.processing.shared.placeholderFactory import extractReviewContent
|
||||
from modules.workflows.processing.adaptive import IntentAnalyzer, ContentValidator, LearningEngine, ProgressTracker
|
||||
from modules.workflows.processing.adaptive import ContentValidator, LearningEngine, ProgressTracker
|
||||
from modules.workflows.processing.adaptive.adaptiveLearningEngine import AdaptiveLearningEngine
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
|
@@ -33,7 +33,6 @@ class DynamicMode(BaseMode):
def __init__(self, services):
super().__init__(services)
# Initialize adaptive components
self.intentAnalyzer = IntentAnalyzer(services)
self.learningEngine = LearningEngine()
self.adaptiveLearningEngine = AdaptiveLearningEngine() # New enhanced learning engine
self.contentValidator = ContentValidator(services, self.adaptiveLearningEngine)

@@ -56,23 +55,42 @@ class DynamicMode(BaseMode):
logger.info(f"=== STARTING TASK {taskIndex}: {taskStep.objective} ===")

# Use workflow-level intent from planning phase (stored in workflow object)
# This avoids redundant intent analysis - intent was already analyzed during task planning
# This avoids redundant intent analysis - intent was already analyzed during userintention phase
if hasattr(workflow, '_workflowIntent') and workflow._workflowIntent:
self.workflowIntent = workflow._workflowIntent
logger.info(f"Using workflow intent from planning phase")
logger.info(f"Using workflow intent from userintention phase")
else:
# Fallback: analyze if not available (shouldn't happen in normal flow)
original_prompt = self.services.currentUserPrompt if self.services and hasattr(self.services, 'currentUserPrompt') else taskStep.objective
self.workflowIntent = await self.intentAnalyzer.analyzeUserIntent(original_prompt, context)
logger.warning(f"Workflow intent not found in workflow object, analyzed fresh")
# Fallback: use empty dict if not available (shouldn't happen in normal flow)
self.workflowIntent = {}
logger.warning(f"Workflow intent not found in workflow object, using empty dict")

# Task-level intent: Use task-specific fields from TaskStep if available, otherwise inherit from workflow
# Task can override workflow intent (e.g., workflow wants PDF, task needs CSV)
# IMPORTANT: taskIntent is used for task-level tracking, not workflow-level
self.taskIntent = {}

# Add task objective - this is what we track progress against
self.taskIntent['taskObjective'] = taskStep.objective

if taskStep.dataType:
self.taskIntent['dataType'] = taskStep.dataType
elif self.workflowIntent.get('dataType'):
self.taskIntent['dataType'] = self.workflowIntent['dataType']

if taskStep.expectedFormats:
self.taskIntent['expectedFormats'] = taskStep.expectedFormats
elif self.workflowIntent.get('expectedFormats'):
self.taskIntent['expectedFormats'] = self.workflowIntent['expectedFormats']

if hasattr(taskStep, 'qualityRequirements') and taskStep.qualityRequirements:
self.taskIntent['qualityRequirements'] = taskStep.qualityRequirements
elif self.workflowIntent.get('qualityRequirements'):
self.taskIntent['qualityRequirements'] = self.workflowIntent['qualityRequirements']

# CRITICAL: Task-level intent analysis - each task needs its own Definition of Done
# Workflow intent is for overall planning, but each task has specific completion criteria
# This Definition of Done is needed for AI looping completion detection
self.taskIntent = await self.intentAnalyzer.analyzeUserIntent(taskStep.objective, context)
# Store taskIntent in workflow object so it's accessible from services
workflow._taskIntent = self.taskIntent
logger.info(f"Task intent: {self.taskIntent}")
logger.info(f"Task intent (task-level): {self.taskIntent}")
logger.info(f"Task objective: {taskStep.objective}")
logger.info(f"Task format info: dataType={taskStep.dataType}, expectedFormats={taskStep.expectedFormats}")

# NEW: Reset progress tracking for new task

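The inheritance in the hunk above (task-specific fields win, otherwise the workflow-level value is inherited) can be sketched as a small standalone function. This is an illustrative sketch only: `build_task_intent` and its dict-based inputs are hypothetical stand-ins for the real `TaskStep` and workflow intent objects in the diff.

```python
def build_task_intent(task_step, workflow_intent):
    """Sketch of the field-level inheritance above: a field set on the task
    overrides the workflow-level value; missing fields are inherited."""
    task_intent = {"taskObjective": task_step.get("objective")}
    for field in ("dataType", "expectedFormats", "qualityRequirements"):
        if task_step.get(field):
            task_intent[field] = task_step[field]
        elif workflow_intent.get(field):
            task_intent[field] = workflow_intent[field]
    return task_intent

# Workflow wants PDF overall, but this intermediate task needs CSV:
workflow_intent = {"dataType": "documents", "expectedFormats": ["pdf"]}
task_step = {"objective": "Export as CSV", "expectedFormats": ["csv"]}
print(build_task_intent(task_step, workflow_intent))
# {'taskObjective': 'Export as CSV', 'dataType': 'documents', 'expectedFormats': ['csv']}
```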
@@ -137,12 +155,12 @@ class DynamicMode(BaseMode):
step
)

# NEW: Learn from feedback
feedback = self._collectFeedback(result, validationResult, self.workflowIntent)
self.learningEngine.learnFromFeedback(feedback, context, self.workflowIntent)
# NEW: Learn from feedback - use taskIntent (task-level), not workflowIntent
feedback = self._collectFeedback(result, validationResult, self.taskIntent)
self.learningEngine.learnFromFeedback(feedback, context, self.taskIntent)

# NEW: Update progress
self.progressTracker.updateOperation(result, validationResult, self.workflowIntent)
# NEW: Update progress - use taskIntent (task-level), not workflowIntent
self.progressTracker.updateOperation(result, validationResult, self.taskIntent)

decision = await self._refineDecide(context, observation)

@@ -154,12 +172,12 @@ class DynamicMode(BaseMode):

# Store next action guidance from decision for use in next iteration
if decision and decision.status == "continue" and decision.nextAction:
# Use setattr for Pydantic models (TaskContext is a BaseModel)
setattr(context, 'nextActionGuidance', {
# Set nextActionGuidance directly (now defined in TaskContext model)
context.nextActionGuidance = {
"action": decision.nextAction,
"parameters": decision.nextActionParameters or {},
"objective": decision.nextActionObjective or decision.reason or ""
})
}
logger.info(f"Stored next action guidance: {decision.nextAction} with parameters {decision.nextActionParameters}")

# Update context with learnings from this step

@@ -218,10 +236,9 @@ class DynamicMode(BaseMode):
async def _planSelect(self, context: TaskContext) -> Dict[str, Any]:
"""Plan: select exactly one action. Returns {"action": {method, name}}"""
# Check if we have concrete next action guidance from previous refinement decision
# Check for nextActionGuidance (stored as dynamic attribute via setattr)
nextActionGuidance = getattr(context, 'nextActionGuidance', None)
if nextActionGuidance:
guidance = nextActionGuidance
# Check for nextActionGuidance (now defined in TaskContext model)
if context.nextActionGuidance:
guidance = context.nextActionGuidance
actionName = guidance.get("action")
parameters = guidance.get("parameters", {})
objective = guidance.get("objective", "")

@@ -235,7 +252,7 @@ class DynamicMode(BaseMode):
"parameters": parameters
}
# Clear guidance after use (one-time use)
setattr(context, 'nextActionGuidance', None)
context.nextActionGuidance = None
return selection

# Normal planning: use AI to select action

@@ -262,9 +279,23 @@ class DynamicMode(BaseMode):
)

# Parse response using structured parsing with ActionDefinition model
from modules.shared.jsonUtils import parseJsonWithModel
from modules.shared.jsonUtils import parseJsonWithModel, tryParseJson
from modules.datamodels.datamodelWorkflow import ActionDefinition

# CRITICAL: Extract requiredInputDocuments from raw JSON BEFORE parsing as ActionDefinition
# ActionDefinition model doesn't have requiredInputDocuments field, so it gets lost during parsing
# tryParseJson already handles markdown code blocks via extractJsonString internally
rawJson, parseError, _ = tryParseJson(response)
requiredInputDocuments = None
requiredConnection = None
if parseError:
logger.warning(f"Error parsing raw JSON for requiredInputDocuments extraction: {parseError}")
if isinstance(rawJson, dict):
requiredInputDocuments = rawJson.get('requiredInputDocuments')
requiredConnection = rawJson.get('requiredConnection')
if requiredInputDocuments:
logger.info(f"Extracted requiredInputDocuments from raw JSON: {requiredInputDocuments}")

try:
# Parse response string as ActionDefinition
actionDef = parseJsonWithModel(response, ActionDefinition)

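The extract-before-parse pattern in the hunk above exists because a strict model keeps only its declared fields, so `requiredInputDocuments` would be silently dropped. A minimal self-contained sketch (using stdlib `json` and a toy `ActionDefinition` stand-in, not the project's actual model or `tryParseJson`):

```python
import json

class ActionDefinition:
    """Toy stand-in for a strict model: only declared fields survive parsing."""
    KNOWN_FIELDS = {"action", "actionObjective"}

    def __init__(self, data):
        for key in self.KNOWN_FIELDS:
            setattr(self, key, data.get(key))

response = '{"action": "ai.convert", "actionObjective": "to csv", "requiredInputDocuments": ["docItem:abc:file.docx"]}'
rawJson = json.loads(response)
# Grab the extra field from the raw JSON BEFORE model parsing...
requiredInputDocuments = rawJson.get("requiredInputDocuments")
# ...because the model drops anything it does not declare:
actionDef = ActionDefinition(rawJson)

print(requiredInputDocuments)                         # ['docItem:abc:file.docx']
print(hasattr(actionDef, "requiredInputDocuments"))   # False
```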
@@ -278,28 +309,35 @@ class DynamicMode(BaseMode):
raise ValueError("Selection missing 'action' as string")

# Validate document references - prevent AI from inventing Message IDs
# Convert string references to typed DocumentReferenceList
if 'requiredInputDocuments' in selection:
stringRefs = selection['requiredInputDocuments']
if isinstance(stringRefs, list):
# Validate string references first
self._validateDocumentReferences(stringRefs, context)
# Convert to typed DocumentReferenceList
from modules.datamodels.datamodelDocref import DocumentReferenceList
selection['documentList'] = DocumentReferenceList.from_string_list(stringRefs)
# Remove old field
del selection['requiredInputDocuments']
elif stringRefs:
# Single string reference
self._validateDocumentReferences([stringRefs], context)
from modules.datamodels.datamodelDocref import DocumentReferenceList
selection['documentList'] = DocumentReferenceList.from_string_list([stringRefs])
del selection['requiredInputDocuments']
# Convert string references to typed DocumentReferenceList (from raw JSON, not from parsed model)
if requiredInputDocuments:
stringRefs = requiredInputDocuments
try:
if isinstance(stringRefs, list):
# Validate string references first
self._validateDocumentReferences(stringRefs, context)
# Convert to typed DocumentReferenceList
from modules.datamodels.datamodelDocref import DocumentReferenceList
docList = DocumentReferenceList.from_string_list(stringRefs)
selection['documentList'] = docList
logger.info(f"Converted requiredInputDocuments to documentList: {len(docList.references)} references")
elif stringRefs:
# Single string reference
self._validateDocumentReferences([stringRefs], context)
from modules.datamodels.datamodelDocref import DocumentReferenceList
docList = DocumentReferenceList.from_string_list([stringRefs])
selection['documentList'] = docList
logger.info(f"Converted requiredInputDocuments to documentList: {len(docList.references)} references")
except Exception as e:
logger.error(f"Error converting requiredInputDocuments to documentList: {e}")
raise # Re-raise to fail fast if document conversion fails
else:
# No documents required - this is normal for actions that don't need input documents
logger.debug(f"No requiredInputDocuments found in raw JSON response (normal for actions without document requirements)")

# Convert connection reference if present
if 'requiredConnection' in selection:
selection['connectionReference'] = selection.get('requiredConnection')
del selection['requiredConnection']
# Convert connection reference if present (from raw JSON, not from parsed model)
if requiredConnection:
selection['connectionReference'] = requiredConnection

# Enforce spec: Stage 1 must NOT include 'parameters'
if 'parameters' in selection:

@@ -336,10 +374,35 @@ class DynamicMode(BaseMode):

# Check if all provided references are valid and prefer non-empty
for ref in document_refs:
if ref not in preferred_refs:
logger.error(f"Invalid or empty document reference: {ref}")
logger.error(f"Available references: {valid_refs}")
raise ValueError(f"Document reference '{ref}' not found or refers to empty document. Use only non-empty references from AVAILABLE_DOCUMENTS_INDEX.")
if ref in preferred_refs:
# Exact match - valid
continue

# For docItem references, check if documentId matches (filename is optional)
if ref.startswith('docItem:'):
# Extract documentId from provided reference
provided_parts = ref[8:].split(':', 1) # Remove "docItem:" prefix
provided_doc_id = provided_parts[0] if provided_parts else None

if provided_doc_id:
# Check if any available reference has the same documentId
found_match = False
for valid_ref in valid_refs:
if valid_ref.startswith('docItem:'):
valid_parts = valid_ref[8:].split(':', 1)
valid_doc_id = valid_parts[0] if valid_parts else None
if valid_doc_id == provided_doc_id:
found_match = True
break

if found_match:
# DocumentId matches - valid (filename is optional)
continue

# No match found
logger.error(f"Invalid or empty document reference: {ref}")
logger.error(f"Available references: {valid_refs}")
raise ValueError(f"Document reference '{ref}' not found or refers to empty document. Use only non-empty references from AVAILABLE_DOCUMENTS_INDEX.")

except Exception as e:
logger.error(f"Error validating document references: {str(e)}")

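The relaxed matching introduced above (accept a `docItem:<documentId>:<filename>` reference when only the documentId matches, since the filename part is optional) can be condensed into a standalone predicate. A sketch mirroring the diff's logic; `doc_id_matches` is a hypothetical helper name, not a function from the codebase:

```python
def doc_id_matches(ref, valid_refs):
    """True if ref is an exact match, or a docItem reference whose
    documentId matches an available docItem (filename may differ)."""
    if ref in valid_refs:
        return True
    if not ref.startswith('docItem:'):
        return False
    # "docItem:" is 8 characters, matching ref[8:] in the diff
    provided_doc_id = ref[8:].split(':', 1)[0]
    for valid_ref in valid_refs:
        if valid_ref.startswith('docItem:'):
            valid_doc_id = valid_ref[8:].split(':', 1)[0]
            if valid_doc_id == provided_doc_id:
                return True
    return False

valid = ["docItem:5d8b7aee:report.docx", "docList:msg_1:results"]
print(doc_id_matches("docItem:5d8b7aee:renamed.docx", valid))  # True
print(doc_id_matches("docItem:deadbeef:x.docx", valid))        # False
```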
@@ -351,26 +414,35 @@ class DynamicMode(BaseMode):
compoundActionName = selection.get('action', '')
actionObjective = selection.get('actionObjective', '')

# CRITICAL: Create Action-level Intent with Definition of Done for THIS specific action
# Each action needs its own DoD because:
# - Action 1: "Generate first 2000 prime numbers" → DoD: 200 table rows
# - Action 2: "Generate remaining 2000 prime numbers" → DoD: 200 table rows
# - Action 3: "Convert to CSV" → DoD: 1 document, CSV format
# Without action-specific DoD, AI loops never know when THIS action is complete
actionIntent = None
# Action-level intent: Extract from dynamic plan selection prompt response
# Action intent analysis is now integrated into generateDynamicPlanSelectionPrompt
# Extract intent fields from selection response
actionIntent = {}
if actionObjective:
try:
actionIntent = await self.intentAnalyzer.analyzeUserIntent(actionObjective, context)
# Store actionIntent in workflow object so it's accessible from services
workflow._actionIntent = actionIntent
logger.info(f"Action intent created: {actionIntent.get('definitionOfDone', {}) if actionIntent else 'None'}")
except Exception as e:
logger.warning(f"Failed to create action intent: {e}, falling back to task intent")
# Fallback to task intent if action intent creation fails
actionIntent = getattr(workflow, '_taskIntent', None)
# Extract intent fields from selection response (if provided by AI)
if 'dataType' in selection:
actionIntent['dataType'] = selection.get('dataType')
if 'expectedFormats' in selection:
actionIntent['expectedFormats'] = selection.get('expectedFormats')
if 'qualityRequirements' in selection:
actionIntent['qualityRequirements'] = selection.get('qualityRequirements')
if 'successCriteria' in selection:
actionIntent['successCriteria'] = selection.get('successCriteria')

# If no intent fields in selection, inherit from task intent
if not actionIntent:
taskIntent = getattr(workflow, '_taskIntent', None)
if taskIntent:
actionIntent = taskIntent.copy()
logger.info(f"Using task intent as action intent (no intent fields in selection)")
else:
logger.info(f"Action intent extracted from selection: {actionIntent}")

# Store actionIntent in workflow object so it's accessible from services
workflow._actionIntent = actionIntent
else:
# No actionObjective - fallback to task intent
actionIntent = getattr(workflow, '_taskIntent', None)
actionIntent = getattr(workflow, '_taskIntent', None) or {}
logger.warning("No actionObjective provided, using task intent as fallback")

# Parse compound action name (e.g., "ai.webResearch" -> method="ai", action="webResearch")

@@ -447,27 +519,60 @@ class DynamicMode(BaseMode):
# Merge Stage 1 resource selections into Stage 2 parameters (only if action expects them)
try:
# Use typed documentList from selection (required)
# Check both top-level selection and selection['parameters'] (for guided actions)
from modules.datamodels.datamodelDocref import DocumentReferenceList
docList = selection.get('documentList')

# If not found at top level, check in selection['parameters'] (guided action case)
if not docList and isinstance(selection, dict) and 'parameters' in selection:
docListParam = selection['parameters'].get('documentList')
if docListParam:
# Convert string list back to DocumentReferenceList if needed
if isinstance(docListParam, list) and all(isinstance(x, str) for x in docListParam):
docList = DocumentReferenceList.from_string_list(docListParam)
elif isinstance(docListParam, DocumentReferenceList):
docList = docListParam

if docList and isinstance(docList, DocumentReferenceList):
# Only attach if target action defines 'documentList'
# Check if action actually has documentList parameter by checking action definition
methodName, actionName = compoundActionName.split('.', 1)
from modules.workflows.processing.shared.methodDiscovery import getActionParameterList, methods as _methods
expectedParams = getActionParameterList(methodName, actionName, _methods)
if 'documentList' in expectedParams:
# Pass DocumentReferenceList directly
parameters['documentList'] = docList
from modules.workflows.processing.shared.methodDiscovery import methods as _methods
if methodName in _methods:
methodInstance = _methods[methodName]['instance']
if actionName in methodInstance.actions:
action_info = methodInstance.actions[actionName]
docstring = action_info.get('description', '')
# Extract parameter names from docstring to check if documentList exists
paramDescriptions, _ = methodInstance._extractParameterDetails(docstring)
if 'documentList' in paramDescriptions:
# Convert DocumentReferenceList to string list for database serialization
# Action methods will convert it back to DocumentReferenceList when needed
parameters['documentList'] = docList.to_string_list()
logger.info(f"Added documentList to parameters: {len(docList.references)} references")
elif 'documentList' not in parameters and isinstance(selection, dict) and 'parameters' in selection:
# Fallback: if documentList is already in selection['parameters'] as a list, preserve it
# This handles guided actions where documentList is already in the right format
docListParam = selection['parameters'].get('documentList')
if docListParam and isinstance(docListParam, list):
parameters['documentList'] = docListParam
logger.info(f"Preserved documentList from selection parameters: {len(docListParam)} references")

# Use connectionReference from selection (required)
connectionRef = selection.get('connectionReference')
if connectionRef:
# Only attach if target action defines 'connectionReference'
# Check if action actually has connectionReference parameter
methodName, actionName = compoundActionName.split('.', 1)
from modules.workflows.processing.shared.methodDiscovery import getActionParameterList, methods as _methods
expectedParams = getActionParameterList(methodName, actionName, _methods)
if 'connectionReference' in expectedParams:
parameters['connectionReference'] = connectionRef
from modules.workflows.processing.shared.methodDiscovery import methods as _methods
if methodName in _methods:
methodInstance = _methods[methodName]['instance']
if actionName in methodInstance.actions:
action_info = methodInstance.actions[actionName]
docstring = action_info.get('description', '')
# Extract parameter names from docstring to check if connectionReference exists
paramDescriptions, _ = methodInstance._extractParameterDetails(docstring)
if 'connectionReference' in paramDescriptions:
parameters['connectionReference'] = connectionRef
logger.info(f"Added connectionReference to parameters: {connectionRef}")
except Exception as e:
logger.warning(f"Error merging Stage 1 resources into Stage 2 parameters: {e}")
pass

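The change above serializes the typed list to plain strings (`to_string_list()`) before it goes into `parameters`, so the payload stays database/JSON-safe, and actions rebuild the typed list later. A minimal round-trip sketch with a hypothetical stand-in class (the real `DocumentReferenceList` lives in `modules.datamodels.datamodelDocref` and is richer than this):

```python
class DocumentReferenceList:
    """Hypothetical stand-in illustrating the serialize/rebuild round trip."""

    def __init__(self, references):
        self.references = references

    @classmethod
    def from_string_list(cls, refs):
        return cls(list(refs))

    def to_string_list(self):
        return list(self.references)

doc_list = DocumentReferenceList.from_string_list(["docItem:abc:a.docx"])
# Plain strings only, so the parameters dict can be stored as JSON:
parameters = {"documentList": doc_list.to_string_list()}
# The action method rebuilds the typed list when it needs it:
restored = DocumentReferenceList.from_string_list(parameters["documentList"])
print(restored.references)  # ['docItem:abc:a.docx']
```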
@@ -650,7 +755,7 @@ class DynamicMode(BaseMode):

return True # Default to match for unknown types

def _collectFeedback(self, result: Any, validation: Dict[str, Any], intent: Dict[str, Any]) -> Dict[str, Any]:
def _collectFeedback(self, result: Any, validation: Dict[str, Any], taskIntent: Dict[str, Any]) -> Dict[str, Any]:
"""Collects comprehensive feedback from action execution"""
try:
# Extract content summary

@@ -53,44 +53,48 @@ def generateDynamicPlanSelectionPrompt(services, context: Any, learningEngine=No

template = """Select exactly one next action to advance the task incrementally.

OVERALL TASK CONTEXT:
{{KEY:OVERALL_TASK_CONTEXT}}
=== TASK ===
CONTEXT: {{KEY:OVERALL_TASK_CONTEXT}}
OBJECTIVE: {{KEY:TASK_OBJECTIVE}}

OBJECTIVE:
{{KEY:TASK_OBJECTIVE}}
=== AVAILABLE RESOURCES ===
DOCUMENTS: {{KEY:AVAILABLE_DOCUMENTS_SUMMARY}}
{{KEY:AVAILABLE_DOCUMENTS_INDEX}}
CONNECTIONS: {{KEY:AVAILABLE_CONNECTIONS_INDEX}}

AVAILABLE_DOCUMENTS_SUMMARY:
{{KEY:AVAILABLE_DOCUMENTS_SUMMARY}}

AVAILABLE_METHODS:
=== AVAILABLE ACTIONS ===
{{KEY:AVAILABLE_METHODS}}

WORKFLOW_HISTORY (reverse-chronological, enriched):
{{KEY:WORKFLOW_HISTORY}}
=== CONTEXT ===
HISTORY: {{KEY:WORKFLOW_HISTORY}}
GUIDANCE: {{KEY:ADAPTIVE_GUIDANCE}}
FAILURES: {{KEY:FAILURE_ANALYSIS}}
ESCALATION: {{KEY:ESCALATION_LEVEL}}

AVAILABLE_DOCUMENTS_INDEX:
{{KEY:AVAILABLE_DOCUMENTS_INDEX}}
=== SELECTION RULE ===
1. Read OBJECTIVE and identify what it requires
2. Check AVAILABLE_METHODS to find action whose PURPOSE matches that requirement
3. Select action that can DO what objective needs - do not select actions that do something different

AVAILABLE_CONNECTIONS_INDEX:
{{KEY:AVAILABLE_CONNECTIONS_INDEX}}
=== OUTPUT FORMAT ===
Return ONLY JSON (no markdown, no explanations). The chosen action MUST:
- Match the objective's requirement (verify action's purpose in AVAILABLE_METHODS)
- Be the next logical incremental step (not complete entire objective in one step)
- Target exactly one output format if producing files
- Use ONLY exact references from AVAILABLE_DOCUMENTS_INDEX (docList:... or docItem:...)
- Learn from previous validation feedback and avoid repeated mistakes
- Include intent analysis fields (dataType, expectedFormats, qualityRequirements, successCriteria)

LEARNING-BASED GUIDANCE:
{{KEY:ADAPTIVE_GUIDANCE}}

FAILURE ANALYSIS:
{{KEY:FAILURE_ANALYSIS}}

ESCALATION LEVEL: {{KEY:ESCALATION_LEVEL}}

REPLY: Return ONLY a JSON object with the following structure (no comments, no extra text). The chosen action MUST:
- be the next logical incremental step toward fulfilling the objective
- not attempt to complete the entire objective in one step
- if producing files, target exactly one output format for this step
- reference ONLY existing document IDs/labels from AVAILABLE_DOCUMENTS_INDEX
- learn from previous validation feedback and avoid repeated mistakes
{{
"action": "method.action_name",
"actionObjective": "...",
"dataType": "numbers|text|documents|analysis|code|unknown",
"expectedFormats": ["pdf", "docx", "xlsx", "txt", "json", "csv", "html", "md"],
"qualityRequirements": {{
"accuracyThreshold": 0.0-1.0,
"completenessThreshold": 0.0-1.0
}},
"successCriteria": ["specific criterion 1", "specific criterion 2"],
"userMessage": "User-friendly message in language '{{KEY:USER_LANGUAGE}}' explaining what this action will do (1 sentence, first person, friendly tone)",
"learnings": ["..."],
"requiredInputDocuments": ["docList:..."],

@@ -98,23 +102,23 @@ REPLY: Return ONLY a JSON object with the following structure (no comments, no e
"parametersContext": "concise text that Stage 2 will use to set business parameters"
}}

EXAMPLE how to assign references from AVAILABLE_DOCUMENTS_INDEX and AVAILABLE_CONNECTIONS_INDEX:
"requiredInputDocuments": ["docList:msg_47a7a578-e8f2-4ba8-ac66-0dbff40605e0:round8_task1_action1_results","docItem:5d8b7aee-b546-4487-b6a8-835c86f7b186:AI_Generated_Document_20251006-104256.docx"],
"requiredConnection": "connection:msft:p.motsch@valueon.ch",
=== INTENT ANALYSIS ===
Analyze actionObjective to determine:
- dataType: numbers|text|documents|analysis|code|unknown
- expectedFormats: array of format strings
- qualityRequirements: {accuracyThreshold: 0.0-1.0, completenessThreshold: 0.0-1.0}
- successCriteria: array of specific completion criteria

RULES:
=== RULES ===
1. Use EXACT action names from AVAILABLE_METHODS
2. Do NOT output a "parameters" object
3. parametersContext must be short and sufficient for Stage 2
2. Do NOT output "parameters" object
3. parametersContext: short, sufficient for Stage 2
4. Return ONLY JSON - no markdown, no explanations
5. For requiredInputDocuments, use ONLY exact references from AVAILABLE_DOCUMENTS_INDEX (docList:... or docItem:...)
- DO NOT invent or modify Message IDs
- DO NOT create new references
- Copy references EXACTLY as shown in AVAILABLE_DOCUMENTS_INDEX
6. For requiredConnection, use ONLY an exact label from AVAILABLE_CONNECTIONS_INDEX
7. Plan incrementally: if the overall intent needs multiple output formats (e.g., CSV and HTML), choose one format in this step and leave the other(s) for subsequent steps
8. CRITICAL: Learn from previous validation feedback - avoid repeating the same mistakes
9. If previous attempts failed, consider alternative approaches or more specific parameters
5. requiredInputDocuments: ONLY exact references from AVAILABLE_DOCUMENTS_INDEX (do not invent/modify)
6. requiredConnection: ONLY exact label from AVAILABLE_CONNECTIONS_INDEX
7. Plan incrementally: one output format per step
8. Learn from validation feedback - avoid repeating mistakes
9. If previous attempts failed, try alternative approaches
"""

return PromptBundle(prompt=template, placeholders=placeholders)

@@ -261,6 +265,10 @@ LEARNINGS (from prior attempts, if any):
REQUIRED PARAMETERS FOR THIS ACTION (use these exact parameter names):
{{KEY:ACTION_PARAMETERS}}

COMPLETION CRITERIA:
- Describe what "complete" means for this action in natural language
- Consider: What should be delivered? What quality level is expected? What format should the output be in?

INSTRUCTIONS:
- Use ONLY the parameter names listed in section REQUIRED PARAMETERS FOR THIS ACTION
- Fill in appropriate values based on the OVERALL TASK CONTEXT and THIS ACTION'S SPECIFIC OBJECTIVE

@@ -281,47 +289,65 @@ RULES:
return PromptBundle(prompt=template, placeholders=placeholders)

def generateDynamicRefinementPrompt(services, context: Any, reviewContent: str) -> PromptBundle:
"""Define placeholders first, then the template; return PromptBundle."""
"""Define placeholders first, then the template; return PromptBundle.

Review is per TASK, not per user prompt. Each task is handled independently.
"""
# Get task objective - this is what we're reviewing against
taskObjective = ""
if hasattr(context, 'taskStep') and context.taskStep and getattr(context.taskStep, 'objective', None):
taskObjective = context.taskStep.objective
else:
# Fallback to user prompt if task objective not available
taskObjective = extractUserPrompt(context)

placeholders: List[PromptPlaceholder] = [
PromptPlaceholder(label="USER_PROMPT", content=extractUserPrompt(context), summaryAllowed=False),
PromptPlaceholder(label="TASK_OBJECTIVE", content=taskObjective, summaryAllowed=False),
PromptPlaceholder(label="USER_LANGUAGE", content=extractUserLanguage(services), summaryAllowed=False),
PromptPlaceholder(label="REVIEW_CONTENT", content=reviewContent, summaryAllowed=True),
PromptPlaceholder(label="AVAILABLE_METHODS", content=extractAvailableMethods(services), summaryAllowed=False),
PromptPlaceholder(label="AVAILABLE_DOCUMENTS_INDEX", content=extractAvailableDocumentsIndex(services, context), summaryAllowed=True),
]

template = """TASK DECISION

OBJECTIVE: '{{KEY:USER_PROMPT}}'
=== TASK OBJECTIVE ===
{{KEY:TASK_OBJECTIVE}}

DECISION RULES:
1. "continue" = objective NOT fulfilled - MUST specify concrete next action
=== DECISION RULES ===
1. "continue" = objective NOT fulfilled → MUST specify next action
2. "success" = objective fulfilled
3. Return ONLY JSON - no other text

OUTPUT FORMAT (only JSON object to deliver):
=== AVAILABLE RESOURCES ===
ACTIONS: {{KEY:AVAILABLE_METHODS}}
DOCUMENTS: {{KEY:AVAILABLE_DOCUMENTS_INDEX}}

=== OBSERVATION ===
{{KEY:REVIEW_CONTENT}}

=== OUTPUT FORMAT ===
{{
"status": "continue",
"reason": "Brief reason for decision",
"reason": "Brief reason",
"nextAction": "ai.convert",
"nextActionParameters": {{
"fromFormat": "json",
"toFormat": "csv",
"targetDocument": "document.json"
"documentList": ["docItem:..."],
"inputFormat": "json",
"outputFormat": "csv",
"columnsPerRow": 10
}},
"nextActionObjective": "Convert the generated JSON document to CSV format with 10 columns per row"
"nextActionObjective": "Convert JSON to CSV with 10 columns per row"
}}

IMPORTANT RULES FOR NEXT ACTION:
- If status is "continue", you MUST provide "nextAction" and "nextActionParameters"
- "nextAction" must be a SPECIFIC, EXECUTABLE action (e.g., "ai.convert", "ai.process", "ai.reformat", "ai.generate")
- "nextActionParameters" must contain concrete parameters for that action
- "nextActionObjective" must describe what this specific action will achieve
- DO NOT suggest repeating the same action that already failed - suggest a DIFFERENT approach
- Use improvement suggestions from content validation to determine the next action
- If format conversion is needed, use "ai.convert" action
- If regeneration is needed with different parameters, use "ai.process" with specific format parameters
- If reformatting is needed, use "ai.reformat" action

OBSERVATION: {{KEY:REVIEW_CONTENT}}
=== RULES ===
- If "continue": MUST provide nextAction and nextActionParameters
- nextAction: SPECIFIC action from AVAILABLE_METHODS (do not invent)
- nextActionParameters: concrete parameters (check AVAILABLE_METHODS for valid names)
- documentList: ONLY exact references from AVAILABLE_DOCUMENTS_INDEX (do not invent)
- nextActionObjective: describe what this action will achieve
- Do NOT repeat failed actions - suggest DIFFERENT approach
- Use improvement suggestions from content validation

"""

@@ -20,11 +20,29 @@ def generateTaskPlanningPrompt(services, context: Any) -> PromptBundle:
    # Extract user language from services
    userLanguage = getattr(services, 'currentUserLanguage', None) or 'en'

    # Extract workflowIntent from workflow object if available
    workflowIntent = {}
    if hasattr(services, 'workflow') and services.workflow:
        workflowIntent = getattr(services.workflow, '_workflowIntent', {}) or {}

    # Format workflow intent fields for prompt context
    workflowIntentText = ""
    if workflowIntent:
        workflowIntentText = f"""Workflow-level intent (can be overridden by task-specific needs):
- Data Type: {workflowIntent.get('dataType', 'unknown')}
- Expected Formats: {workflowIntent.get('expectedFormats', [])}
- Quality Requirements: {workflowIntent.get('qualityRequirements', {})}
- Primary Goal: {workflowIntent.get('primaryGoal', '')}

Note: Tasks can override these if task-specific needs differ (e.g., workflow wants PDF, but task needs CSV for intermediate step).
"""

    placeholders: List[PromptPlaceholder] = [
        PromptPlaceholder(label="USER_PROMPT", content=extractUserPrompt(context), summaryAllowed=False),
        PromptPlaceholder(label="AVAILABLE_DOCUMENTS_SUMMARY", content=extractAvailableDocumentsSummary(services, context), summaryAllowed=True),
        PromptPlaceholder(label="WORKFLOW_HISTORY", content=extractWorkflowHistory(services), summaryAllowed=True),
        PromptPlaceholder(label="USER_LANGUAGE", content=userLanguage, summaryAllowed=False),
        PromptPlaceholder(label="WORKFLOW_INTENT", content=workflowIntentText, summaryAllowed=False),
    ]

    template = """# Task Planning
@@ -38,6 +56,9 @@ Break down user requests into logical, executable task steps.
### User Request
{{KEY:USER_PROMPT}}

### Workflow Intent
{{KEY:WORKFLOW_INTENT}}

### Available Documents
{{KEY:AVAILABLE_DOCUMENTS_SUMMARY}}

@@ -83,12 +104,22 @@ Break down user requests into logical, executable task steps.
      "successCriteria": ["measurable criteria 1", "measurable criteria 2"],
      "estimatedComplexity": "low|medium|high",
      "userMessage": "What this task will accomplish in language '{{KEY:USER_LANGUAGE}}'",
      "expectedFormats": ["pdf", "docx", "xlsx", "txt", "json", "csv", "html", "md",...]
      "dataType": "numbers|text|documents|analysis|code|unknown",
      "expectedFormats": ["pdf", "docx", "xlsx", "txt", "json", "csv", "html", "md"],
      "qualityRequirements": {{
        "accuracyThreshold": 0.0-1.0,
        "completenessThreshold": 0.0-1.0
      }}
    }}
  ]
}}
```

**Task Intent Fields**:
- **dataType**: Inherit from workflow intent if not task-specific, or override if the task needs a different type
- **expectedFormats**: Inherit from workflow intent if not task-specific, or override if the task needs a different format (e.g., workflow wants PDF, task needs CSV)
- **qualityRequirements**: Inherit from workflow intent if not task-specific, or override if the task has different quality needs

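The inheritance rules above can be sketched as a small merge helper. This is an illustrative sketch, not code from the commit: `resolveTaskIntent` is a hypothetical name, and it assumes intent dicts shaped like the JSON fields in the schema.

```python
def resolveTaskIntent(workflowIntent: dict, taskIntent: dict) -> dict:
    """Merge workflow-level intent with task-specific overrides.

    A task value wins when present; otherwise the workflow-level
    value is inherited. Hypothetical helper for illustration only.
    """
    return {
        # task may override the data type, else inherit from the workflow
        "dataType": taskIntent.get("dataType") or workflowIntent.get("dataType", "unknown"),
        # e.g. workflow wants PDF, but this task needs CSV for an intermediate step
        "expectedFormats": taskIntent.get("expectedFormats") or workflowIntent.get("expectedFormats", []),
        # thresholds merge key-by-key, task values taking precedence
        "qualityRequirements": {
            **workflowIntent.get("qualityRequirements", {}),
            **taskIntent.get("qualityRequirements", {}),
        },
    }
```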
## 🎯 Task Structure Guidelines

### Task ID Format
@@ -431,9 +431,9 @@ class WorkflowProcessor:
        from modules.datamodels.datamodelAi import AiCallOptions

        options = AiCallOptions(
            operationType=OperationTypeEnum.TEXT,
            operationType=OperationTypeEnum.DATA_ANALYSE,
            priority=PriorityEnum.BALANCED,
            processingMode=ProcessingModeEnum.STANDARD,
            processingMode=ProcessingModeEnum.BASIC,
            maxCost=0.10,  # Low cost for simple requests
            maxProcessingTime=15  # Fast path should complete in 15s
        )

@@ -469,8 +469,11 @@ class WorkflowProcessor:
            return result

        except Exception as e:
            logger.error(f"Error in fastPathExecute: {str(e)}")
            return ActionResult.isFailure(f"Fast path execution failed: {str(e)}")
            import traceback
            errorDetails = f"{type(e).__name__}: {str(e)}"
            logger.error(f"Error in fastPathExecute: {errorDetails}")
            logger.debug(f"Fast path error traceback:\n{traceback.format_exc()}")
            return ActionResult.isFailure(f"Fast path execution failed: {errorDetails}")

    # Workflow-Level Functions

@@ -606,16 +609,17 @@ class WorkflowProcessor:
            # Get file info
            fileInfo = self.services.chat.getFileInfo(fileItem.id)

            # Create ChatDocument
            chatDoc = ChatDocument(
                fileId=fileItem.id,
                fileName=fileInfo.get("fileName", actionDoc.documentName) if fileInfo else actionDoc.documentName,
                fileSize=fileInfo.get("size", len(actionDoc.documentData) if isinstance(actionDoc.documentData, bytes) else len(actionDoc.documentData.encode('utf-8'))) if fileInfo else (len(actionDoc.documentData) if isinstance(actionDoc.documentData, bytes) else len(actionDoc.documentData.encode('utf-8'))),
                mimeType=fileInfo.get("mimeType", actionDoc.mimeType) if fileInfo else actionDoc.mimeType,
                roundNumber=workflow.currentRound,
                taskNumber=workflow.getTaskIndex(),
                actionNumber=workflow.getActionIndex()
            )
            # Create ChatDocument as dict (messageId will be assigned by createMessage)
            # Don't create ChatDocument object directly - it requires messageId which doesn't exist yet
            chatDoc = {
                "fileId": fileItem.id,
                "fileName": fileInfo.get("fileName", actionDoc.documentName) if fileInfo else actionDoc.documentName,
                "fileSize": fileInfo.get("size", len(actionDoc.documentData) if isinstance(actionDoc.documentData, bytes) else len(actionDoc.documentData.encode('utf-8'))) if fileInfo else (len(actionDoc.documentData) if isinstance(actionDoc.documentData, bytes) else len(actionDoc.documentData.encode('utf-8'))),
                "mimeType": fileInfo.get("mimeType", actionDoc.mimeType) if fileInfo else actionDoc.mimeType,
                "roundNumber": workflow.currentRound,
                "taskNumber": workflow.getTaskIndex(),
                "actionNumber": workflow.getActionIndex()
            }
            chatDocuments.append(chatDoc)

            # Create documentsLabel for docList: references
@@ -251,16 +251,17 @@ class WorkflowManager:
            # Get file info
            fileInfo = self.services.chat.getFileInfo(fileItem.id)

            # Create ChatDocument
            chatDoc = ChatDocument(
                fileId=fileItem.id,
                fileName=fileInfo.get("fileName", actionDoc.documentName) if fileInfo else actionDoc.documentName,
                fileSize=fileInfo.get("size", len(actionDoc.documentData) if isinstance(actionDoc.documentData, bytes) else len(actionDoc.documentData.encode('utf-8'))) if fileInfo else (len(actionDoc.documentData) if isinstance(actionDoc.documentData, bytes) else len(actionDoc.documentData.encode('utf-8'))),
                mimeType=fileInfo.get("mimeType", actionDoc.mimeType) if fileInfo else actionDoc.mimeType,
                roundNumber=workflow.currentRound,
                taskNumber=0,  # Fast path doesn't have tasks
                actionNumber=0
            )
            # Create ChatDocument as dict (messageId will be assigned by createMessage)
            # Don't create ChatDocument object directly - it requires messageId which doesn't exist yet
            chatDoc = {
                "fileId": fileItem.id,
                "fileName": fileInfo.get("fileName", actionDoc.documentName) if fileInfo else actionDoc.documentName,
                "fileSize": fileInfo.get("size", len(actionDoc.documentData) if isinstance(actionDoc.documentData, bytes) else len(actionDoc.documentData.encode('utf-8'))) if fileInfo else (len(actionDoc.documentData) if isinstance(actionDoc.documentData, bytes) else len(actionDoc.documentData.encode('utf-8'))),
                "mimeType": fileInfo.get("mimeType", actionDoc.mimeType) if fileInfo else actionDoc.mimeType,
                "roundNumber": workflow.currentRound,
                "taskNumber": 0,  # Fast path doesn't have tasks
                "actionNumber": 0
            }
            chatDocuments.append(chatDoc)

            # Create ChatMessage with fast path response (in user's language)
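Both hunks above repeat the same inline fileSize fallback expression; it could be factored into a helper along these lines. This is a hypothetical sketch, not part of the commit: `fileInfo` is the dict returned by `getFileInfo` (or None), and `actionDoc` is the document object with `documentData` as `bytes` or `str`.

```python
def computeFileSize(fileInfo, actionDoc) -> int:
    """Return the file size, preferring fileInfo metadata and falling
    back to the raw payload length (bytes, or str encoded as UTF-8).

    Hypothetical helper factored out of the duplicated inline expression.
    """
    data = actionDoc.documentData
    fallback = len(data) if isinstance(data, bytes) else len(data.encode("utf-8"))
    if fileInfo:
        return fileInfo.get("size", fallback)
    return fallback
```

With this, both call sites would shrink to `"fileSize": computeFileSize(fileInfo, actionDoc)`.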
@@ -355,7 +356,12 @@ class WorkflowManager:
            "1) detectedLanguage: detect ISO 639-1 language code (e.g., de, en).\n"
            "2) normalizedRequest: full, explicit restatement of the user's request in the detected language; do NOT summarize; preserve ALL constraints and details.\n"
            "3) intent: concise single-paragraph core request in the detected language for high-level routing.\n"
            "4) contextItems: supportive data blocks to attach as separate documents if significantly larger than the intent (large literal content, long lists/tables, code/JSON blocks, transcripts, CSV fragments, detailed specs). Keep URLs in the intent unless they embed large pasted content.\n\n"
            "4) contextItems: supportive data blocks to attach as separate documents if significantly larger than the intent (large literal content, long lists/tables, code/JSON blocks, transcripts, CSV fragments, detailed specs). Keep URLs in the intent unless they embed large pasted content.\n"
            "5) primaryGoal: The main objective the user wants to achieve.\n"
            "6) dataType: What type of data/content they want (numbers|text|documents|analysis|code|unknown).\n"
            "7) expectedFormats: What file format(s) they expect - provide matching file format extensions list (e.g., [\"xlsx\", \"pdf\"]). If format is unclear or not specified, use empty list [].\n"
            "8) qualityRequirements: Quality requirements they have (accuracy, completeness) as {accuracyThreshold: 0.0-1.0, completenessThreshold: 0.0-1.0}.\n"
            "9) successCriteria: Specific success criteria that define completion (array of strings).\n\n"
            "Rules:\n"
            "- If total content (intent + data) is < 10% of model max tokens, do not extract; return empty contextItems and keep intent compact and self-contained.\n"
            "- If content exceeds that threshold, move bulky parts into contextItems; keep intent short and clear.\n"

@@ -372,7 +378,15 @@ class WorkflowManager:
            "      \"mimeType\": \"text/plain\",\n"
            "      \"content\": \"Full extracted content block here\"\n"
            "    }\n"
            "  ]\n"
            "  ],\n"
            "  \"primaryGoal\": \"The main objective the user wants to achieve\",\n"
            "  \"dataType\": \"numbers|text|documents|analysis|code|unknown\",\n"
            "  \"expectedFormats\": [\"pdf\", \"docx\", \"xlsx\", \"txt\", \"json\", \"csv\", \"html\", \"md\"],\n"
            "  \"qualityRequirements\": {\n"
            "    \"accuracyThreshold\": 0.0-1.0,\n"
            "    \"completenessThreshold\": 0.0-1.0\n"
            "  },\n"
            "  \"successCriteria\": [\"specific criterion 1\", \"specific criterion 2\"]\n"
            "}\n\n"
            f"User message:\n{self.services.utils.sanitizePromptContent(userInput.prompt, 'userinput')}"
        )
@@ -388,6 +402,7 @@ class WorkflowManager:
        normalizedRequest = None
        intentText = userInput.prompt
        contextItems = []
        workflowIntent = None

        # Parse analyzer response (JSON expected)
        try:

@@ -400,8 +415,23 @@ class WorkflowManager:
            if parsed.get('intent'):
                intentText = parsed.get('intent')
            contextItems = parsed.get('contextItems') or []

            # Extract intent analysis fields and store as workflowIntent
            workflowIntent = {
                'primaryGoal': parsed.get('primaryGoal'),
                'dataType': parsed.get('dataType', 'unknown'),
                'expectedFormats': parsed.get('expectedFormats', []),
                'qualityRequirements': parsed.get('qualityRequirements', {}),
                'successCriteria': parsed.get('successCriteria', []),
                'languageUserDetected': detectedLanguage
            }

            # Store workflowIntent in workflow object for reuse
            if hasattr(self.services, 'workflow') and self.services.workflow:
                self.services.workflow._workflowIntent = workflowIntent
        except Exception:
            contextItems = []
            workflowIntent = None

        # Update services state
        if detectedLanguage and isinstance(detectedLanguage, str):
@@ -1,219 +0,0 @@
# OpenAI Timeout Analysis: Why AiService Calls Take Much Longer

## Test Results Summary

From `test05_openai_timeout.py`:
- **Direct Connector**: 7.20s, **1783 characters** (partial response/explanation)
- **AiService**: 309.78s, **9034 characters** (complete structured result)

**Key Finding:** The direct connector returns a simple text response (likely explaining it can't generate all 5000 primes), while AiService delivers the **complete structured JSON result** with all the data properly formatted.

## Root Cause Analysis

### Direct Connector Call Flow (Fast: ~7s, Partial Result)
```
User Prompt → OpenAI API → Simple Text Response
```
**Steps:**
1. Create `AiModelCall` with prompt
2. Call `connector.callAiBasic(modelCall)`
3. HTTP POST to OpenAI API
4. Receive response (text explanation or partial data)
5. Return content as-is

**Result:** Simple text response (1783 chars) - likely explains limitations or provides partial data
**Total overhead:** Minimal - just HTTP call overhead
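Steps 1-5 amount to a single round trip. A minimal sketch, using the `AiModelCall` and `callAiBasic` names from the description (the dataclass stub, wrapper, and timing are illustrative assumptions, not the real gateway code):

```python
import time
from dataclasses import dataclass


@dataclass
class AiModelCall:
    """Stub standing in for the real AiModelCall (illustration only)."""
    prompt: str


def callDirect(connector, prompt: str):
    """Direct-connector flow from steps 1-5: build the call, do one
    HTTP round trip, return the content unchanged. `connector` is
    assumed to expose callAiBasic(modelCall)."""
    modelCall = AiModelCall(prompt=prompt)      # 1. create AiModelCall
    start = time.monotonic()
    content = connector.callAiBasic(modelCall)  # 2-4. single API call
    elapsed = time.monotonic() - start
    return content, elapsed                     # 5. return content as-is
```

Note there is no parsing, validation, or continuation here - which is exactly why the direct path is fast but may return partial results.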
---

### AiService Call Flow (Slow: ~310s)

#### Phase 1: Initialization & Prompt Building (~1-2s)
```
callAiContent()
→ _ensureAiObjectsInitialized()
→ progressLogStart()
→ buildGenerationPrompt() ← EXPENSIVE!
```

**`buildGenerationPrompt()` overhead:**
- Loads `jsonTemplateDocument` (large template)
- Processes continuation context if needed
- Builds complex prompt with instructions, examples, JSON schema
- String replacements and formatting
- **Result:** Much larger prompt sent to AI (2706 bytes vs ~200 bytes)

#### Phase 2: AI Looping with Continuation (~300s)
```
_callAiWithLooping()
→ Iteration 1:
  - Build prompt (if continuation)
  - Call AI (actual API call: ~70s for complex request)
  - Write debug file
  - Store workflow stat
  - Parse JSON response
  - Extract sections
  - Check completion flags
→ Iteration 2+ (if needed):
  - Build continuation prompt
  - Call AI again
  - Parse and merge results
  - ... (up to 50 iterations!)
```
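The continuation loop above can be condensed into a skeleton like the following. The callables and their signatures are hypothetical; the real `_callAiWithLooping` lives inside AiService, but the control flow (continuation context from the last 1500 chars, section merging, a 50-iteration cap) mirrors the description:

```python
MAX_ITERATIONS = 50  # upper bound mentioned above


def callAiWithLooping(callAi, buildPrompt, parseSections, isComplete):
    """Sketch of the iterate-until-complete flow (hypothetical API)."""
    sections = {}
    previousTail = None
    for iteration in range(1, MAX_ITERATIONS + 1):
        prompt = buildPrompt(previousTail)        # continuation prompt after iter 1
        response = callAi(prompt)                 # one full API call per iteration
        sections.update(parseSections(response))  # parse JSON, merge new sections
        if isComplete(response):                  # e.g. complete_response flag set
            break
        previousTail = response[-1500:]           # last 1500 chars as context
    return sections
```

Each pass pays for a full API call plus parsing and merging, which is where the bulk of the ~300s goes.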
**Key overhead sources:**

1. **Multiple Iterations** (up to 50 possible!)
   - Each iteration makes a full AI API call
   - Continuation logic rebuilds prompts with context
   - JSON parsing and repair on each iteration
   - Section extraction and merging

2. **Prompt Building Overhead**
   - First iteration: Full prompt with JSON template (~2700 bytes)
   - Continuation iterations: Rebuild prompt with last 1500 chars of previous response
   - Template processing and string manipulation

3. **JSON Processing**
   - Parse JSON response
   - Extract sections using `extractSectionsFromDocument()`
   - Repair broken JSON if needed (`repairBrokenJson()`)
   - Merge sections across iterations
   - Build final result structure

4. **Debug & Logging**
   - Write debug files for each iteration:
     - `document_generation_prompt.txt`
     - `document_generation_response.txt`
     - `document_generation_broken_json_iteration_X.txt` (if broken)
     - `document_generation_final_result.txt`
   - Progress logging updates
   - Workflow stat storage

5. **Completion Detection Logic**
   - Check for `complete_response` flag
   - Validate JSON completeness
   - Determine if continuation needed
   - Complex logic to decide when to stop

#### Phase 3: Post-Processing (~5-10s)
```
→ Parse final JSON
→ Extract metadata (title, filename)
→ Render to output format (if specified)
→ Build AiResponse object
→ progressLogFinish()
```

## Why the Difference? (Not "43x slower" - Different Results!)

### The Real Comparison

**Direct Connector:**
- Returns: Simple text response (1783 chars)
- Content: Likely explains limitations or provides partial/unstructured data
- Time: 7.20s
- **Use case:** Quick, simple responses

**AiService:**
- Returns: Complete structured JSON result (9034 chars)
- Content: Full structured document with proper JSON format, sections, metadata
- Time: 309.78s
- **Use case:** Production-ready structured output

### Why AiService Takes Longer (But Delivers More)

1. **Structured Output Generation**
   - **Direct:** AI returns whatever it wants (text explanation)
   - **AiService:** AI must generate structured JSON following a template
   - **Impact:** Structured generation takes longer but produces usable results

2. **Complete Result Delivery**
   - **Direct:** Single response, may be incomplete or truncated
   - **AiService:** Multiple iterations ensure a complete result
   - **Impact:** Iterations are **necessary** to deliver the full 9034-character structured result

3. **Quality Assurance**
   - **Direct:** Raw response, may have errors
   - **AiService:** Validates JSON, repairs if broken, merges sections
   - **Impact:** Ensures production-ready output

### The Iterations Were Necessary!

The test showed:
- **Response Length:** 9034 characters (complete structured result)
- **Iterations:** Multiple iterations were needed to generate the full structured JSON
- **Result:** Full, usable, structured document

**Conclusion:** The iterations were NOT unnecessary - they were required to deliver the complete structured result that the direct connector cannot provide.

---

## Breakdown of 309.78s for Complex Request

Based on code analysis, estimated breakdown:

1. **Initialization & Setup:** ~2s
   - Service initialization
   - Progress logging setup
   - Prompt building (first iteration)

2. **AI API Calls:** ~280-290s
   - Multiple iterations (likely 3-4 iterations)
   - Each iteration: ~70-80s API call
   - Continuation prompts add overhead

3. **Processing Per Iteration:** ~15-20s total
   - JSON parsing: ~1s × iterations
   - Section extraction: ~1s × iterations
   - Debug file writing: ~0.5s × iterations
   - Progress updates: ~0.1s × updates
   - Workflow stats: ~0.5s × iterations

4. **Final Processing:** ~5-10s
   - Final JSON parsing
   - Metadata extraction
   - Response building
   - Progress finish
---

## Recommendations

### 1. **Timeout Configuration** ✅ DONE
- Increased timeout from 120s to 600s (10 minutes)
- Provides sufficient headroom for complex requests

### 2. **Understanding the Trade-off**
- **Direct Connector:** Fast but simple/unstructured results
- **AiService:** Slower but delivers complete structured results
- **The iterations are necessary** to deliver the full structured output
- **The overhead is justified** by the quality and completeness of the result

### 3. **Monitoring**
- Add timing metrics for each phase:
  - Prompt building time
  - API call time per iteration
  - Processing time per iteration
  - Total iterations
- This will help identify bottlenecks
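One minimal way to collect the per-phase timings suggested above is a context manager around each phase. This is a sketch; the phase names mirror the bullets and are illustrative, not existing instrumentation:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# phase name -> list of observed durations in seconds
phaseTimings = defaultdict(list)


@contextmanager
def timed(phase: str):
    """Record the wall-clock duration of one phase run."""
    start = time.monotonic()
    try:
        yield
    finally:
        phaseTimings[phase].append(time.monotonic() - start)

# Hypothetical usage inside the loop:
#   with timed("promptBuilding"): prompt = buildGenerationPrompt(...)
#   with timed("apiCall"):        response = connector.callAiBasic(...)
# len(phaseTimings["apiCall"]) then doubles as the iteration count.
```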
---

## Conclusion

The difference is **not "slower"** - it's **different results**:

1. **Direct Connector:** Fast (7s) but delivers simple text (1783 chars) - partial/unstructured
2. **AiService:** Slower (310s) but delivers complete structured JSON (9034 chars) - full, usable result

The iterations were **necessary** to deliver the complete structured result. The overhead is **justified** because:
- ✅ Delivers **5x more content** (9034 vs 1783 chars)
- ✅ Provides **structured, usable output** (JSON with sections, metadata)
- ✅ Ensures **completeness** through iterative generation
- ✅ Handles **complex requests** that the direct connector cannot

**The 600-second timeout provides sufficient headroom** for even the most complex requests while ensuring complete, structured results.
466 tests/functional/test06_workflow_prompt_variations.py Normal file
@@ -0,0 +1,466 @@
#!/usr/bin/env python3
"""
Workflow Test with Prompt Variations - Tests different workflow scenarios:
1. Simple prompt for short answer (no documents)
2. Merge 2 documents and output as Word document
3. Structured data output as Excel file
"""

import asyncio
import json
import sys
import os
import time
from typing import Dict, Any, List, Optional

# Add the gateway to path (go up 2 levels from tests/functional/)
_gateway_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))
if _gateway_path not in sys.path:
    sys.path.insert(0, _gateway_path)

# Import the service initialization
from modules.services import getInterface as getServices
from modules.datamodels.datamodelChat import UserInputRequest, WorkflowModeEnum
from modules.datamodels.datamodelUam import User
from modules.features.chatPlayground.mainChatPlayground import chatStart
import modules.interfaces.interfaceDbChatObjects as interfaceDbChatObjects

class WorkflowPromptVariationsTester:
    def __init__(self):
        # Use root user for testing (has full access to everything)
        from modules.interfaces.interfaceDbAppObjects import getRootInterface
        rootInterface = getRootInterface()
        self.testUser = rootInterface.currentUser

        # Initialize services using the existing system
        self.services = getServices(self.testUser, None)  # Test user, no workflow
        self.testResults = {}

    async def initialize(self):
        """Initialize the test environment."""
        # Set logging level to INFO to see workflow progress
        import logging
        logging.getLogger().setLevel(logging.INFO)

        print(f"Initialized test with user: {self.testUser.id}")
        print(f"Mandate ID: {self.testUser.mandateId}")

    def _createFile(self, fileName: str, mimeType: str, content: str) -> str:
        """Helper method to create a file and return its ID."""
        fileItem = self.services.interfaceDbComponent.createFile(
            name=fileName,
            mimeType=mimeType,
            content=content.encode('utf-8')
        )
        self.services.interfaceDbComponent.createFileData(fileItem.id, content.encode('utf-8'))
        return fileItem.id
    async def _startWorkflow(self, prompt: str, fileIds: List[str] = None) -> Any:
        """Start a chat workflow with prompt and optional documents."""
        if fileIds is None:
            fileIds = []

        print(f"\nPrompt: {prompt}")
        print(f"Number of files: {len(fileIds)}")
        if fileIds:
            print(f"File IDs: {fileIds}")

        # Create UserInputRequest
        userInput = UserInputRequest(
            prompt=prompt,
            listFileId=fileIds,
            userLanguage="en"
        )

        # Start workflow (this is async and returns immediately)
        workflow = await chatStart(
            currentUser=self.testUser,
            userInput=userInput,
            workflowMode=WorkflowModeEnum.WORKFLOW_DYNAMIC,
            workflowId=None
        )

        print(f"✅ Workflow started with ID: {workflow.id}")
        print(f" Status: {workflow.status}")
        print(f" Mode: {workflow.workflowMode}")

        return workflow
    async def _waitForWorkflowCompletion(self, workflow: Any, maxWaitTime: Optional[int] = None) -> bool:
        """Wait for workflow to complete, checking status periodically.

        Args:
            workflow: The workflow object to wait for
            maxWaitTime: Maximum wait time in seconds. If None, wait indefinitely.
        """
        if maxWaitTime:
            print(f"Maximum wait time: {maxWaitTime} seconds")
        else:
            print("Waiting indefinitely (no timeout)")

        startTime = time.time()
        checkInterval = 2  # Check every 2 seconds
        lastStatus = None

        while True:
            # Check timeout if maxWaitTime is set
            if maxWaitTime is not None:
                elapsed = time.time() - startTime
                if elapsed >= maxWaitTime:
                    print(f"\n⚠️ Workflow did not complete within {maxWaitTime} seconds")
                    print(f" Final status: {workflow.status}")
                    return False

            # Get current workflow status
            interfaceDbChat = interfaceDbChatObjects.getInterface(self.testUser)
            currentWorkflow = interfaceDbChat.getWorkflow(workflow.id)

            if not currentWorkflow:
                print("❌ Workflow not found in database")
                return False

            currentStatus = currentWorkflow.status
            elapsed = int(time.time() - startTime)

            # Print status if it changed
            if currentStatus != lastStatus:
                print(f"Workflow status: {currentStatus} (elapsed: {elapsed}s)")
                lastStatus = currentStatus

            # Check if workflow is complete
            if currentStatus in ["completed", "stopped", "failed"]:
                print(f"\n✅ Workflow finished with status: {currentStatus} (elapsed: {elapsed}s)")
                return currentStatus == "completed"

            # Wait before next check
            await asyncio.sleep(checkInterval)
    def _analyzeWorkflowResults(self, workflow: Any) -> Dict[str, Any]:
        """Analyze workflow results and extract information."""
        interfaceDbChat = interfaceDbChatObjects.getInterface(self.testUser)
        workflow = interfaceDbChat.getWorkflow(workflow.id)

        if not workflow:
            return {"error": "Workflow not found"}

        # Get unified chat data
        chatData = interfaceDbChat.getUnifiedChatData(workflow.id, None)

        # Extract messages and documents from items
        items = chatData.get("items", [])
        messages = []
        allDocuments = []

        for item in items:
            if item.get("type") == "message":
                message = item.get("item")
                if message:
                    # Convert ChatMessage to dict if needed
                    if hasattr(message, 'dict'):
                        msgDict = message.dict()
                    elif hasattr(message, '__dict__'):
                        msgDict = message.__dict__
                    else:
                        msgDict = message if isinstance(message, dict) else {}

                    messages.append(msgDict)

                    # Extract documents from message
                    msgDocuments = msgDict.get("documents", [])
                    if msgDocuments:
                        for doc in msgDocuments:
                            # Convert ChatDocument to dict if needed
                            if hasattr(doc, 'dict'):
                                docDict = doc.dict()
                            elif hasattr(doc, '__dict__'):
                                docDict = doc.__dict__
                            else:
                                docDict = doc if isinstance(doc, dict) else {}

                            # Only add if not already in list (avoid duplicates)
                            docId = docDict.get("id") or docDict.get("fileId")
                            if docId and not any(d.get("id") == docId or d.get("fileId") == docId for d in allDocuments):
                                allDocuments.append(docDict)

        userMessages = [m for m in messages if m.get("role") == "user"]
        assistantMessages = [m for m in messages if m.get("role") == "assistant"]

        results = {
            "workflowId": workflow.id,
            "status": workflow.status,
            "workflowMode": str(workflow.workflowMode) if hasattr(workflow, 'workflowMode') else None,
            "currentRound": workflow.currentRound,
            "totalTasks": workflow.totalTasks,
            "totalActions": workflow.totalActions,
            "messageCount": len(messages),
            "userMessageCount": len(userMessages),
            "assistantMessageCount": len(assistantMessages),
            "documentCount": len(allDocuments),
            "documents": allDocuments
        }

        print(f" Workflow ID: {results['workflowId']}")
        print(f" Status: {results['status']}")
        print(f" Messages: {results['messageCount']} (User: {results['userMessageCount']}, Assistant: {results['assistantMessageCount']})")
        print(f" Documents: {results['documentCount']}")

        # Print document names
        if allDocuments:
            print(f" Generated documents:")
            for doc in allDocuments:
                fileName = doc.get("fileName") or doc.get("documentName") or "unknown"
                fileSize = doc.get("fileSize") or doc.get("size") or 0
                print(f" - {fileName} ({fileSize} bytes)")

        return results
    async def testSimplePrompt(self) -> Dict[str, Any]:
        """Test 1: Simple prompt for a short answer (no documents)."""
        print("\n" + "="*80)
        print("TEST 1: SIMPLE PROMPT FOR SHORT ANSWER")
        print("="*80)

        try:
            prompt = "What is the capital of France? Answer in one sentence."

            workflow = await self._startWorkflow(prompt, [])
            completed = await self._waitForWorkflowCompletion(workflow, maxWaitTime=120)
            results = self._analyzeWorkflowResults(workflow)

            return {
                "testName": "Simple Prompt",
                "completed": completed,
                "results": results
            }
        except Exception as e:
            import traceback
            print(f"❌ Test failed: {type(e).__name__}: {str(e)}")
            return {
                "testName": "Simple Prompt",
                "completed": False,
                "error": str(e),
                "traceback": traceback.format_exc()
            }
    async def testMergeDocumentsToWord(self) -> Dict[str, Any]:
        """Test 2: Merge 2 documents and output as Word document."""
        print("\n" + "="*80)
        print("TEST 2: MERGE 2 DOCUMENTS AND OUTPUT AS WORD")
        print("="*80)

        try:
            # Create first document
            doc1Content = """Project Overview

This document outlines the key objectives for our new software project.
The project aims to develop a modern web application with the following features:
- User authentication and authorization
- Real-time data synchronization
- Responsive design for mobile and desktop
- Integration with third-party APIs

Timeline: 6 months
Budget: $500,000
"""

            # Create second document
            doc2Content = """Technical Specifications

Architecture:
- Frontend: React with TypeScript
- Backend: Python with FastAPI
- Database: PostgreSQL
- Deployment: Docker containers on AWS

Key Requirements:
- Support for 10,000 concurrent users
- 99.9% uptime SLA
- End-to-end encryption for sensitive data
- Comprehensive logging and monitoring

Team Size: 8 developers, 2 designers, 1 project manager
"""

            print("\nCreating documents to merge...")
            doc1Id = self._createFile("project_overview.txt", "text/plain", doc1Content)
            print(f"✅ Created document 1 with ID: {doc1Id}")

            doc2Id = self._createFile("technical_specs.txt", "text/plain", doc2Content)
            print(f"✅ Created document 2 with ID: {doc2Id}")

            prompt = "Merge these two documents into a single comprehensive Word document. Include both the project overview and technical specifications in a well-formatted document with proper headings and sections."

            workflow = await self._startWorkflow(prompt, [doc1Id, doc2Id])
            completed = await self._waitForWorkflowCompletion(workflow, maxWaitTime=300)
            results = self._analyzeWorkflowResults(workflow)

            # Check if Word document was created
            wordDocFound = False
            if results.get("documents"):
                for doc in results["documents"]:
                    fileName = doc.get("fileName", "").lower()
                    if fileName.endswith(".docx") or fileName.endswith(".doc"):
                        wordDocFound = True
                        print(f" ✅ Word document found: {doc.get('fileName')}")

            if not wordDocFound:
                print(" ⚠️ Warning: No Word document (.docx or .doc) found in results")

            return {
                "testName": "Merge Documents to Word",
                "completed": completed,
                "wordDocumentFound": wordDocFound,
                "results": results
            }
        except Exception as e:
            import traceback
            print(f"❌ Test failed: {type(e).__name__}: {str(e)}")
            return {
                "testName": "Merge Documents to Word",
                "completed": False,
                "error": str(e),
                "traceback": traceback.format_exc()
            }
async def testStructuredDataToExcel(self) -> Dict[str, Any]:
|
||||
"""Test 3: Structured data output as Excel file."""
|
||||
print("\n" + "="*80)
|
||||
print("TEST 3: STRUCTURED DATA OUTPUT AS EXCEL")
|
||||
print("="*80)
|
||||
|
||||
try:
|
||||
# Create structured data as JSON
|
||||
structuredData = {
|
||||
"employees": [
|
||||
{"id": 1, "name": "John Doe", "department": "Engineering", "salary": 95000, "startDate": "2020-01-15"},
|
||||
{"id": 2, "name": "Jane Smith", "department": "Marketing", "salary": 85000, "startDate": "2019-03-20"},
|
||||
{"id": 3, "name": "Bob Johnson", "department": "Engineering", "salary": 100000, "startDate": "2018-06-10"},
|
||||
{"id": 4, "name": "Alice Williams", "department": "HR", "salary": 75000, "startDate": "2021-09-05"},
|
||||
{"id": 5, "name": "Charlie Brown", "department": "Sales", "salary": 80000, "startDate": "2020-11-12"},
|
||||
{"id": 6, "name": "Diana Prince", "department": "Engineering", "salary": 110000, "startDate": "2017-04-22"},
|
||||
{"id": 7, "name": "Edward Norton", "department": "Marketing", "salary": 90000, "startDate": "2019-08-30"},
|
||||
{"id": 8, "name": "Fiona Green", "department": "HR", "salary": 78000, "startDate": "2022-01-18"}
|
||||
],
|
||||
"departments": [
|
||||
{"name": "Engineering", "budget": 500000, "headCount": 3},
|
||||
{"name": "Marketing", "budget": 300000, "headCount": 2},
|
||||
{"name": "HR", "budget": 200000, "headCount": 2},
|
||||
{"name": "Sales", "budget": 250000, "headCount": 1}
|
||||
]
|
||||
}
|
||||
|
||||
jsonContent = json.dumps(structuredData, indent=2)
|
||||
|
||||
print("\nCreating structured data file...")
|
||||
dataFileId = self._createFile("employee_data.json", "application/json", jsonContent)
|
||||
print(f"✅ Created data file with ID: {dataFileId}")
|
||||
|
||||
prompt = "Create an Excel file from this structured data. Include two sheets: one for employees with all their details, and one for departments with summary information. Format the data nicely with proper column headers and make it easy to read."
|
||||
|
||||
workflow = await self._startWorkflow(prompt, [dataFileId])
|
||||
completed = await self._waitForWorkflowCompletion(workflow, maxWaitTime=300)
|
||||
results = self._analyzeWorkflowResults(workflow)
|
||||
|
||||
# Check if Excel document was created
|
||||
excelDocFound = False
|
||||
if results.get("documents"):
|
||||
for doc in results["documents"]:
|
||||
fileName = doc.get("fileName", "").lower()
|
||||
if fileName.endswith(".xlsx") or fileName.endswith(".xls"):
|
||||
excelDocFound = True
|
||||
print(f" ✅ Excel document found: {doc.get('fileName')}")
|
||||
|
||||
if not excelDocFound:
|
||||
print(" ⚠️ Warning: No Excel document (.xlsx or .xls) found in results")
|
||||
|
||||
return {
|
||||
"testName": "Structured Data to Excel",
|
||||
"completed": completed,
|
||||
"excelDocumentFound": excelDocFound,
|
||||
"results": results
|
||||
}
|
||||
except Exception as e:
|
||||
import traceback
|
||||
print(f"❌ Test failed: {type(e).__name__}: {str(e)}")
|
||||
return {
|
||||
"testName": "Structured Data to Excel",
|
||||
"completed": False,
|
||||
"error": str(e),
|
||||
"traceback": traceback.format_exc()
|
||||
}
|
||||
|
||||
async def runAllTests(self):
|
||||
"""Run all three test cases."""
|
||||
print("\n" + "="*80)
|
||||
print("WORKFLOW PROMPT VARIATIONS TEST SUITE")
|
||||
print("="*80)
|
||||
|
||||
try:
|
||||
# Initialize
|
||||
await self.initialize()
|
||||
|
||||
# Run all tests
|
||||
test1Results = await self.testSimplePrompt()
|
||||
test2Results = await self.testMergeDocumentsToWord()
|
||||
test3Results = await self.testStructuredDataToExcel()
|
||||
|
||||
self.testResults = {
|
||||
"test1": test1Results,
|
||||
"test2": test2Results,
|
||||
"test3": test3Results,
|
||||
"summary": {
|
||||
"totalTests": 3,
|
||||
"passedTests": sum([
|
||||
1 if test1Results.get("completed") else 0,
|
||||
1 if test2Results.get("completed") else 0,
|
||||
1 if test3Results.get("completed") else 0
|
||||
]),
|
||||
"failedTests": sum([
|
||||
1 if not test1Results.get("completed") else 0,
|
||||
1 if not test2Results.get("completed") else 0,
|
||||
1 if not test3Results.get("completed") else 0
|
||||
])
|
||||
}
|
||||
}
|
||||
|
||||
print("\n" + "="*80)
|
||||
print("TEST SUITE SUMMARY")
|
||||
print("="*80)
|
||||
print(f"Test 1 - Simple Prompt: {'✅ PASSED' if test1Results.get('completed') else '❌ FAILED'}")
|
||||
print(f"Test 2 - Merge to Word: {'✅ PASSED' if test2Results.get('completed') else '❌ FAILED'}")
|
||||
if test2Results.get('wordDocumentFound'):
|
||||
print(f" Word document created: ✅")
|
||||
print(f"Test 3 - Data to Excel: {'✅ PASSED' if test3Results.get('completed') else '❌ FAILED'}")
|
||||
if test3Results.get('excelDocumentFound'):
|
||||
print(f" Excel document created: ✅")
|
||||
print(f"\nTotal: {self.testResults['summary']['passedTests']}/{self.testResults['summary']['totalTests']} tests passed")
|
||||
|
||||
return self.testResults
|
||||
|
||||
except Exception as e:
|
||||
import traceback
|
||||
print(f"\n❌ Test suite failed with error: {type(e).__name__}: {str(e)}")
|
||||
print(f"Traceback:\n{traceback.format_exc()}")
|
||||
self.testResults = {
|
||||
"error": str(e),
|
||||
"traceback": traceback.format_exc()
|
||||
}
|
||||
return self.testResults
|
||||
|
||||
|
||||
async def main():
|
||||
"""Run workflow prompt variations test suite."""
|
||||
tester = WorkflowPromptVariationsTester()
|
||||
results = await tester.runAllTests()
|
||||
|
||||
# Print final results as JSON for easy parsing
|
||||
print("\n" + "="*80)
|
||||
print("FINAL RESULTS (JSON)")
|
||||
print("="*80)
|
||||
print(json.dumps(results, indent=2, default=str))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
|
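Each test above polls `_waitForWorkflowCompletion(workflow, maxWaitTime=...)` until the workflow finishes or the deadline passes. That helper is not part of this diff, so the sketch below only illustrates the poll-with-timeout pattern it presumably follows; the names `waitForCompletion`, `isDone`, and `pollInterval` are placeholders, not project API.

```python
import asyncio
import time

async def waitForCompletion(isDone, maxWaitTime: float = 120, pollInterval: float = 2.0) -> bool:
    """Poll isDone() until it returns True or maxWaitTime seconds elapse."""
    deadline = time.monotonic() + maxWaitTime
    while time.monotonic() < deadline:
        if isDone():
            return True
        await asyncio.sleep(pollInterval)
    return False  # timed out

# Example: a condition that becomes True on the third poll
state = {"calls": 0}

def isDone():
    state["calls"] += 1
    return state["calls"] >= 3

result = asyncio.run(waitForCompletion(isDone, maxWaitTime=10, pollInterval=0.01))
```

Returning a boolean rather than raising on timeout matches how the tests treat `completed` as a pass/fail flag in their result dictionaries.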
||||
517 tests/functional/test07_json_extraction.py Normal file
@@ -0,0 +1,517 @@
#!/usr/bin/env python3
"""
Test JSON Extraction from Incomplete/Broken JSON
Tests the extraction of lastItemObject and cutItemObject from incomplete JSON responses
"""

import asyncio
import json
import sys
import os
import shutil
from typing import Dict, Any, List

# Add the gateway to path
_gateway_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))
if _gateway_path not in sys.path:
    sys.path.insert(0, _gateway_path)

from modules.shared.jsonUtils import buildContinuationContext, extractSectionsFromDocument
from modules.shared.debugLogger import _getBaseDebugDir


class JsonExtractionTester:
    def __init__(self):
        self.testResults = {}

    def cleanupDebugFiles(self):
        """Delete debug folder and current log file before test run."""
        try:
            # Get debug directory path
            debug_dir = _getBaseDebugDir()

            # Delete debug folder if it exists
            if os.path.exists(debug_dir):
                print(f"Cleaning up debug folder: {debug_dir}")
                shutil.rmtree(debug_dir)
                print(f"  [OK] Debug folder deleted")

            # Also check for log file in the log directory
            from modules.shared.debugLogger import _resolveLogDir
            log_dir = _resolveLogDir()
            log_file = os.path.join(log_dir, "debug_workflow.log")
            if os.path.exists(log_file):
                print(f"Cleaning up log file: {log_file}")
                os.remove(log_file)
                print(f"  [OK] Log file deleted")

        except Exception as e:
            print(f"  [WARN] Error during cleanup: {e}")

    def createIncompleteTableJson(self) -> tuple[str, str]:
        """Create incomplete JSON with table that ends mid-row."""
        complete_json = """{
  "metadata": {
    "split_strategy": "single_document",
    "source_documents": [],
    "extraction_method": "ai_generation"
  },
  "documents": [
    {
      "id": "doc_1",
      "title": "First 4000 Prime Numbers",
      "filename": "prime_numbers_4000.csv",
      "sections": [
        {
          "id": "section_primes_csv",
          "content_type": "table",
          "elements": [
            {
              "headers": [],
              "rows": [
                ["2", "3", "5", "7", "11", "13", "17", "19", "23", "29"],
                ["31", "37", "41", "43", "47", "53", "59", "61", "67", "71"],
                ["73", "79", "83", "89", "97", "101", "103", "107", "109", "113"],
                ["16871", "16879", "16883", "16889", "16901", "16903", "16921", "16927", "16931", "16937"]
              ],
              "caption": ""
            }
          ],
          "order": 0
        }
      ]
    }
  ]
}"""

        # Incomplete JSON - cuts off mid-row (CRITICAL: must not end with } or ])
        # Remove all closing brackets and add incomplete row
        incomplete_json = complete_json.rstrip().rstrip('}').rstrip(']').rstrip('}').rstrip(']').rstrip('}') + ',\n ["16943", "16963", "16979", "16981", "16987", "16'

        return complete_json, incomplete_json

    def createIncompleteCodeBlockJson(self) -> tuple[str, str]:
        """Create incomplete JSON with code_block that ends mid-line."""
        complete_json = """{
  "metadata": {
    "split_strategy": "single_document",
    "source_documents": [],
    "extraction_method": "ai_generation"
  },
  "documents": [
    {
      "id": "doc_1",
      "title": "Prime Numbers CSV",
      "filename": "prime_numbers.csv",
      "sections": [
        {
          "id": "section_primes_csv",
          "content_type": "code_block",
          "elements": [
            {
              "code": "2,3,5,7,11,13,17,19,23,29\\n31,37,41,43,47,53,59,61,67,71\\n73,79,83,89,97,101,103,107,109,113\\n127,131,137,139,149,151,157,163,167,173\\n23773,23789,23801,23813,23819,23827,23831,23833,23857,23869",
              "language": "csv"
            }
          ],
          "order": 0
        }
      ]
    }
  ]
}"""

        # Incomplete JSON - cuts off mid-line (CRITICAL: must not end with } or ])
        # Remove all closing brackets and add incomplete line
        incomplete_json = complete_json.rstrip().rstrip('}').rstrip(']').rstrip('}').rstrip(']').rstrip('}') + '\\n23873'

        return complete_json, incomplete_json

    def createIncompleteListJson(self) -> tuple[str, str]:
        """Create incomplete JSON with list that ends mid-item."""
        complete_json = """{
  "metadata": {
    "split_strategy": "single_document",
    "source_documents": [],
    "extraction_method": "ai_generation"
  },
  "documents": [
    {
      "id": "doc_1",
      "title": "Prime Numbers List",
      "filename": "prime_numbers.txt",
      "sections": [
        {
          "id": "section_primes_list",
          "content_type": "bullet_list",
          "elements": [
            {
              "items": ["2", "3", "5", "7", "11", "13", "17", "19", "23", "29"]
            }
          ],
          "order": 0
        }
      ]
    }
  ]
}"""

        # Incomplete JSON - cuts off mid-item (CRITICAL: must not end with } or ])
        # Remove all closing brackets and add incomplete item
        incomplete_json = complete_json.rstrip().rstrip('}').rstrip(']').rstrip('}').rstrip(']').rstrip('}') + ',\n "31"'

        return complete_json, incomplete_json

    def testTableExtraction(self):
        """Test extraction from incomplete table JSON."""
        print("\n" + "="*80)
        print("TEST 1: Table Extraction (incomplete row)")
        print("="*80)

        complete_json, incomplete_json = self.createIncompleteTableJson()

        # Parse complete JSON to get allSections
        complete_obj = json.loads(complete_json)
        allSections = extractSectionsFromDocument(complete_obj)

        print(f"Complete JSON sections: {len(allSections)}")
        print(f"Last section content_type: {allSections[0].get('content_type') if allSections else 'None'}")

        # Debug: Check what extractFirstBalancedJson returns
        from modules.shared.jsonUtils import extractFirstBalancedJson, stripCodeFences
        raw_json = stripCodeFences(incomplete_json.strip())
        balanced_json = extractFirstBalancedJson(raw_json)
        balanced_length = len(balanced_json)
        cut_part = raw_json[balanced_length:].strip()
        print(f"\nDebug Info:")
        print(f"  raw_json length: {len(raw_json)}")
        print(f"  balanced_json length: {balanced_length}")
        print(f"  cut_part length: {len(cut_part)}")
        print(f"  cut_part content: {repr(cut_part[:200]) if cut_part else '(empty)'}")

        # Build continuation context
        continuationContext = buildContinuationContext(allSections, incomplete_json)

        print(f"\nExtraction Results:")
        print(f"  content_type_for_items: {continuationContext.get('content_type_for_items')}")
        print(f"  last_item_object: {continuationContext.get('last_item_object')}")
        print(f"  cut_item_object: {continuationContext.get('cut_item_object')}")
        print(f"  total_items_count: {continuationContext.get('total_items_count')}")

        # Validate results
        lastItem = continuationContext.get('last_item_object')
        cutItem = continuationContext.get('cut_item_object')
        contentType = continuationContext.get('content_type_for_items')

        success = True
        if contentType != "table":
            print(f"  [FAIL] Expected content_type 'table', got '{contentType}'")
            success = False
        if not lastItem:
            print(f"  [FAIL] last_item_object is empty")
            success = False
        if not cutItem:
            print(f"  [FAIL] cut_item_object is empty")
            success = False

        if success:
            print(f"  [PASS] All extractions successful")

        self.testResults['table'] = success
        return success

    def testCodeBlockExtraction(self):
        """Test extraction from incomplete code_block JSON."""
        print("\n" + "="*80)
        print("TEST 2: Code Block Extraction (incomplete line)")
        print("="*80)

        complete_json, incomplete_json = self.createIncompleteCodeBlockJson()

        # Parse complete JSON to get allSections
        complete_obj = json.loads(complete_json)
        allSections = extractSectionsFromDocument(complete_obj)

        print(f"Complete JSON sections: {len(allSections)}")
        print(f"Last section content_type: {allSections[0].get('content_type') if allSections else 'None'}")

        # Debug: Check what extractFirstBalancedJson returns
        from modules.shared.jsonUtils import extractFirstBalancedJson, stripCodeFences
        raw_json = stripCodeFences(incomplete_json.strip())
        balanced_json = extractFirstBalancedJson(raw_json)
        balanced_length = len(balanced_json)
        cut_part = raw_json[balanced_length:].strip()
        print(f"\nDebug Info:")
        print(f"  raw_json length: {len(raw_json)}")
        print(f"  balanced_json length: {balanced_length}")
        print(f"  cut_part length: {len(cut_part)}")
        print(f"  cut_part content: {repr(cut_part[:200]) if cut_part else '(empty)'}")

        # Build continuation context
        continuationContext = buildContinuationContext(allSections, incomplete_json)

        print(f"\nExtraction Results:")
        print(f"  content_type_for_items: {continuationContext.get('content_type_for_items')}")
        print(f"  last_item_object: {continuationContext.get('last_item_object')}")
        print(f"  cut_item_object: {continuationContext.get('cut_item_object')}")
        print(f"  total_items_count: {continuationContext.get('total_items_count')}")

        # Validate results
        lastItem = continuationContext.get('last_item_object')
        cutItem = continuationContext.get('cut_item_object')
        contentType = continuationContext.get('content_type_for_items')

        success = True
        if contentType != "code_block":
            print(f"  [FAIL] Expected content_type 'code_block', got '{contentType}'")
            success = False
        if not lastItem:
            print(f"  [FAIL] last_item_object is empty")
            success = False
        if not cutItem:
            print(f"  [FAIL] cut_item_object is empty")
            success = False

        if success:
            print(f"  [PASS] All extractions successful")

        self.testResults['code_block'] = success
        return success

    def testListExtraction(self):
        """Test extraction from incomplete list JSON."""
        print("\n" + "="*80)
        print("TEST 3: List Extraction (incomplete item)")
        print("="*80)

        complete_json, incomplete_json = self.createIncompleteListJson()

        # Parse complete JSON to get allSections
        complete_obj = json.loads(complete_json)
        allSections = extractSectionsFromDocument(complete_obj)

        print(f"Complete JSON sections: {len(allSections)}")
        print(f"Last section content_type: {allSections[0].get('content_type') if allSections else 'None'}")

        # Debug: Check what extractFirstBalancedJson returns
        from modules.shared.jsonUtils import extractFirstBalancedJson, stripCodeFences
        raw_json = stripCodeFences(incomplete_json.strip())
        balanced_json = extractFirstBalancedJson(raw_json)
        balanced_length = len(balanced_json)
        cut_part = raw_json[balanced_length:].strip()
        print(f"\nDebug Info:")
        print(f"  raw_json length: {len(raw_json)}")
        print(f"  balanced_json length: {balanced_length}")
        print(f"  cut_part length: {len(cut_part)}")
        print(f"  cut_part content: {repr(cut_part[:200]) if cut_part else '(empty)'}")

        # Build continuation context
        continuationContext = buildContinuationContext(allSections, incomplete_json)

        print(f"\nExtraction Results:")
        print(f"  content_type_for_items: {continuationContext.get('content_type_for_items')}")
        print(f"  last_item_object: {continuationContext.get('last_item_object')}")
        print(f"  cut_item_object: {continuationContext.get('cut_item_object')}")
        print(f"  total_items_count: {continuationContext.get('total_items_count')}")

        # Validate results
        lastItem = continuationContext.get('last_item_object')
        cutItem = continuationContext.get('cut_item_object')
        contentType = continuationContext.get('content_type_for_items')

        success = True
        if contentType not in ["bullet_list", "numbered_list"]:
            print(f"  [FAIL] Expected content_type 'bullet_list' or 'numbered_list', got '{contentType}'")
            success = False
        if not lastItem:
            print(f"  [FAIL] last_item_object is empty")
            success = False
        if not cutItem:
            print(f"  [FAIL] cut_item_object is empty")
            success = False

        if success:
            print(f"  [PASS] All extractions successful")

        self.testResults['list'] = success
        return success

    def createRealWorldTableJson(self) -> tuple[str, str]:
        """Create real-world incomplete JSON based on actual prompt pattern - table with many rows."""
        # Last complete row (exactly as in real scenario)
        last_complete_row = ["16871", "16879", "16883", "16889", "16901", "16903", "16921", "16927", "16931", "16937"]

        complete_json = f"""{{
  "metadata": {{
    "split_strategy": "single_document",
    "source_documents": [],
    "extraction_method": "ai_generation"
  }},
  "documents": [
    {{
      "id": "doc_1",
      "title": "First 4000 Prime Numbers",
      "filename": "prime_numbers_4000.csv",
      "sections": [
        {{
          "id": "section_primes_csv",
          "content_type": "table",
          "elements": [
            {{
              "headers": [],
              "rows": [
                ["2", "3", "5", "7", "11", "13", "17", "19", "23", "29"],
                ["31", "37", "41", "43", "47", "53", "59", "61", "67", "71"],
                {json.dumps(last_complete_row)}
              ],
              "caption": ""
            }}
          ],
          "order": 0
        }}
      ]
    }}
  ]
}}"""

        # Incomplete JSON - cuts off mid-row (exactly like real scenario)
        # CRITICAL: Must not end with } or ] to be detected as incomplete
        # Find the position where rows array ends and add incomplete row before closing
        rows_end_pos = complete_json.rfind(']')
        if rows_end_pos != -1:
            # Insert incomplete row before the closing bracket, remove all closing brackets after
            incomplete_json = complete_json[:rows_end_pos] + ',\n ["16943", "16963", "16979", "16981", "16987", "16'
        else:
            # Fallback: remove all closing brackets and append
            incomplete_json = complete_json.rstrip().rstrip('}').rstrip(']').rstrip('}').rstrip(']').rstrip('}') + ',\n ["16943", "16963", "16979", "16981", "16987", "16'

        return complete_json, incomplete_json

    def testRealWorldTableExtraction(self):
        """Test extraction from real-world incomplete table JSON (like from actual prompt)."""
        print("\n" + "="*80)
        print("TEST 4: Real-World Table Extraction (400 rows scenario, incomplete row)")
        print("="*80)

        complete_json, incomplete_json = self.createRealWorldTableJson()

        # Parse complete JSON to get allSections
        complete_obj = json.loads(complete_json)
        allSections = extractSectionsFromDocument(complete_obj)

        print(f"Complete JSON sections: {len(allSections)}")
        if allSections:
            print(f"Last section content_type: {allSections[0].get('content_type')}")
            elements = allSections[0].get('elements', [])
            if elements and isinstance(elements[0], dict) and 'rows' in elements[0]:
                rows = elements[0].get('rows', [])
                print(f"Total rows in complete JSON: {len(rows)}")
                if rows:
                    print(f"Last complete row: {rows[-1]}")

        # Test _extractSectionsRegex with incomplete JSON
        from modules.shared.jsonUtils import _extractSectionsRegex, repairBrokenJson
        print(f"\nTesting _extractSectionsRegex with incomplete JSON...")
        extracted_sections = _extractSectionsRegex(incomplete_json)
        print(f"Extracted sections: {len(extracted_sections)}")
        if extracted_sections:
            print(f"Extracted section content_type: {extracted_sections[0].get('content_type')}")

        # Test repairBrokenJson
        print(f"\nTesting repairBrokenJson...")
        repaired_json = repairBrokenJson(incomplete_json)
        if repaired_json:
            print(f"Repaired JSON successful")
            repaired_sections = extractSectionsFromDocument(repaired_json)
            print(f"Repaired sections: {len(repaired_sections)}")
        else:
            print(f"Repair failed")

        # Debug: Check what extractFirstBalancedJson returns
        from modules.shared.jsonUtils import extractFirstBalancedJson, stripCodeFences
        raw_json = stripCodeFences(incomplete_json.strip())
        balanced_json = extractFirstBalancedJson(raw_json)
        balanced_length = len(balanced_json)
        cut_part = raw_json[balanced_length:].strip()
        print(f"\nDebug Info:")
        print(f"  raw_json length: {len(raw_json)}")
        print(f"  balanced_json length: {balanced_length}")
        print(f"  cut_part length: {len(cut_part)}")
        print(f"  cut_part content: {repr(cut_part[:200]) if cut_part else '(empty)'}")

        # Build continuation context
        continuationContext = buildContinuationContext(allSections, incomplete_json)

        print(f"\nExtraction Results:")
        print(f"  content_type_for_items: {continuationContext.get('content_type_for_items')}")
        print(f"  last_item_object: {continuationContext.get('last_item_object')}")
        print(f"  cut_item_object: {continuationContext.get('cut_item_object')}")
        print(f"  total_items_count: {continuationContext.get('total_items_count')}")

        # Validate results
        lastItem = continuationContext.get('last_item_object')
        cutItem = continuationContext.get('cut_item_object')
        contentType = continuationContext.get('content_type_for_items')

        success = True
        if contentType != "table":
            print(f"  [FAIL] Expected content_type 'table', got '{contentType}'")
            success = False
        if not lastItem:
            print(f"  [FAIL] last_item_object is empty")
            success = False
        if not cutItem:
            print(f"  [FAIL] cut_item_object is empty")
            success = False

        if success:
            print(f"  [PASS] All extractions successful")
            print(f"  Last complete row: {lastItem}")
            print(f"  Cut row: {cutItem}")

        self.testResults['real_world_table'] = success
        return success

    def runAllTests(self):
        """Run all extraction tests."""
        print("\n" + "="*80)
        print("JSON EXTRACTION TESTS")
        print("Testing extraction of lastItemObject and cutItemObject from incomplete JSON")
        print("="*80)

        # Clean up debug folder and log file before starting tests
        print("\nCleaning up debug files...")
        self.cleanupDebugFiles()
        print("")

        results = []
        results.append(self.testTableExtraction())
        results.append(self.testCodeBlockExtraction())
        results.append(self.testListExtraction())
        results.append(self.testRealWorldTableExtraction())

        # Summary
        print("\n" + "="*80)
        print("TEST SUMMARY")
        print("="*80)
        print(f"Table extraction: {'[PASS]' if self.testResults.get('table') else '[FAIL]'}")
        print(f"Code block extraction: {'[PASS]' if self.testResults.get('code_block') else '[FAIL]'}")
        print(f"List extraction: {'[PASS]' if self.testResults.get('list') else '[FAIL]'}")
        print(f"Real-world table extraction: {'[PASS]' if self.testResults.get('real_world_table') else '[FAIL]'}")

        allPassed = all(results)
        print(f"\nOverall: {'[PASS] ALL TESTS PASSED' if allPassed else '[FAIL] SOME TESTS FAILED'}")

        return allPassed


async def main():
    """Main test execution."""
    tester = JsonExtractionTester()
    success = tester.runAllTests()
    return 0 if success else 1


if __name__ == "__main__":
    exit_code = asyncio.run(main())
    sys.exit(exit_code)
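These tests hinge on splitting a truncated JSON string into its longest parseable prefix (`extractFirstBalancedJson`) and the cut-off remainder (`cut_part`). The real implementation in `modules.shared.jsonUtils` is not part of this diff and may differ; the sketch below only illustrates one common way to find a balanced prefix by tracking bracket depth while skipping brackets inside string literals. The name `firstBalancedJson` is a placeholder.

```python
def firstBalancedJson(text: str) -> str:
    """Return the shortest prefix of text that forms a balanced JSON
    object/array; return "" if the brackets never balance (truncated input)."""
    depth = 0
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            # Inside a string literal: brackets don't count toward depth
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
            if depth == 0:
                return text[: i + 1]
    return ""  # never balanced: the input was cut off mid-structure

# Splitting a response into the balanced part and the cut-off tail
raw = '{"rows": [["2", "3"], ["5", "7"]]} trailing'
balanced = firstBalancedJson(raw)
cut = raw[len(balanced):].strip()
```

A fully truncated input such as `'{"a": [1, 2'` yields an empty balanced prefix, which is the case the `buildContinuationContext` tests exercise.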