From 5e27184cb1847be12cefe0bb2ed371ea48bd834d Mon Sep 17 00:00:00 2001 From: ValueOn AG Date: Fri, 2 Jan 2026 23:03:34 +0100 Subject: [PATCH] enhanced the ai flow for languages (prompt in de, user language en, to deliver documents in fr) and document delivery in different formats --- .../implementation_content_handling_done.md | 1766 +++++++++++++++++ .../implementation_taskintentions_done.md | 1591 +++++++++++++++ 2 files changed, 3357 insertions(+) create mode 100644 implementation/implementation_content_handling_done.md create mode 100644 implementation/implementation_taskintentions_done.md diff --git a/implementation/implementation_content_handling_done.md b/implementation/implementation_content_handling_done.md new file mode 100644 index 0000000..6620f11 --- /dev/null +++ b/implementation/implementation_content_handling_done.md @@ -0,0 +1,1766 @@ +# Implementation Plan: Content Handling Architecture Migration + +## Overview + +This document provides a detailed implementation plan for migrating to the target architecture for content extraction and document generation. The plan focuses on: + +- **Documents and Content Handling**: Intelligent merging of `documentList` and `contentParts` with deduplication +- **Output Document Formats**: Per-document format determination (not global) - AI determines formats from user prompt, multiple documents can have different formats +- **Languages Handling**: Per-document language determination (not global) - uses validated `currentUserLanguage` infrastructure +- **Clear Handover States**: Defined validation at each phase boundary using existing infrastructure +- **Structure Filling**: Two prompt types (with content vs. 
without content) + +## Verified Infrastructure (Ready to Use) + +The following infrastructure already exists and can be reused: + +- ✅ **Language Validation**: `currentUserLanguage` is validated at `workflowManager.py:695-727` - always valid 2-character ISO code (validates AI response, falls back to user language, then "en"). Safe to use via `self.services.currentUserLanguage` or `_getUserLanguage()` method. + +- ✅ **Format Validation**: Renderer registry exists at `mainServiceGeneration.py:529` (`_getFormatRenderer()` uses `getRenderer()`). Can be imported: `from modules.services.serviceGeneration.renderers.registry import getRenderer`. Returns None if format invalid, falls back to text renderer. + +- ✅ **Language Extraction**: `_getDocumentLanguage()` works correctly at `subStructureFilling.py:349` - extracts per-document language from structure. Used properly during section generation. + +## Context + +This implementation plan is based on the analysis documented in: +- `gateway/modules/services/serviceAi/CONTENT_EXTRACTION_ANALYSIS.md` (Section 9.3: Target State) + +The target architecture addresses architectural issues identified in the current implementation: +1. **Single extraction path** in AI service (no duplication in `ai.process`) +2. **Intelligent merging** of `contentParts` and `documentList` with deduplication +3. **Clear separation** of concerns: action layer delegates to service layer +4. **Consistent behavior** across all code paths +5. **Per-document format/language** determination (not global) + +--- + +## 1. 
Overview: Major Phases and Handover States + +### Phase Flow Diagram + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ PHASE 1: Document Intent Clarification │ +│ ────────────────────────────────────────────────────────────────── │ +│ INPUT: │ +│ - userPrompt: str (fenced) │ +│ - documentList: DocumentReferenceList (optional) │ +│ - contentParts: List[ContentPart] (optional) │ +│ - actionParameters: Dict (outputFormat, language, etc.) │ +│ │ +│ THROUGHPUT: │ +│ 1. Resolve documents from documentList │ +│ 2. Identify pre-extracted JSON documents │ +│ - Check if JSON contains ContentExtracted structure │ +│ - Map pre-extracted JSONs to original documents │ +│ 3. Filter out original documents covered by pre-extracted │ +│ 4. AI analyzes document purposes │ +│ 5. Map intents back to JSON doc IDs (if applicable) │ +│ │ +│ OUTPUT: │ +│ - documentIntents: List[DocumentIntent] │ +│ * documentId: str │ +│ * intents: List[str] (["extract", "render", "reference"]) │ +│ * extractionPrompt: str (optional) │ +│ * reasoning: str │ +│ Note: outputFormat and language are NOT determined here - │ +│ they're determined in Phase 3 (Structure Generation) │ +│ │ +│ HANDOVER STATE: │ +│ - documentIntents: Complete intent analysis │ +│ - documents: Resolved ChatDocuments │ +│ - preExtractedMapping: Map[originalDocId, jsonDocId] │ +└─────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────┐ +│ PHASE 2: Content Extraction and Preparation │ +│ ────────────────────────────────────────────────────────────────── │ +│ INPUT: │ +│ - documents: List[ChatDocument] │ +│ - documentIntents: List[DocumentIntent] │ +│ - contentParts: List[ContentPart] (optional, pre-extracted) │ +│ - preExtractedMapping: Map[originalDocId, jsonDocId] │ +│ │ +│ THROUGHPUT: │ +│ 1. 
Process pre-extracted JSON documents → ContentParts │ +│ - Extract ContentParts from JSON (not treat as regular JSON) │ +│ - Apply intents (extract, render, reference) │ +│ - Mark with isPreExtracted=True │ +│ 2. RAW extraction (NO AI) for regular documents │ +│ - Extract content using extraction service │ +│ - Create ContentParts with metadata │ +│ 3. Merge all ContentParts │ +│ - Pre-extracted parts (from JSON documents) │ +│ - Extracted parts (from regular documents) │ +│ - Provided parts (from contentParts parameter) │ +│ 4. Apply intents to ContentParts (extract, render, reference) │ +│ 5. Mark images for Vision AI extraction (deferred) │ +│ │ +│ OUTPUT: │ +│ - finalContentParts: List[ContentPart] │ +│ * id: str │ +│ * typeGroup: str │ +│ * mimeType: str │ +│ * data: Union[str, bytes] │ +│ * metadata: Dict │ +│ - documentId: str │ +│ - contentFormat: str ("extracted", "object", "reference") │ +│ - intent: str │ +│ - needsVisionExtraction: bool (for images) │ +│ - extractionPrompt: str (for Vision AI) │ +│ - originalFileName: str │ +│ - isPreExtracted: bool │ +│ Note: outputFormat and language are NOT propagated here - │ +│ they're determined in Phase 3 (Structure Generation) │ +│ │ +│ HANDOVER STATE: │ +│ - finalContentParts: Complete merged list │ +│ - All pre-extracted JSON documents processed → ContentParts │ +│ - All regular documents extracted → ContentParts │ +│ - All provided contentParts merged │ +│ - All documents processed (extracted or pre-extracted) │ +│ - Vision AI extraction deferred to Phase 4 │ +└─────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────┐ +│ PHASE 3: Structure Generation │ +│ ────────────────────────────────────────────────────────────────── │ +│ INPUT: │ +│ - userPrompt: str │ +│ - finalContentParts: List[ContentPart] │ +│ - outputFormat: Optional[str] (optional fallback, defaults to "txt") │ +│ - currentUserLanguage: str (always valid, 
validated during user intention analysis) │ +│ * From: self.services.currentUserLanguage (always valid, validated during user intention analysis) │ +│ │ +│ THROUGHPUT: │ +│ 1. Group ContentParts by documentId (for context) │ +│ 2. AI generates structure with documents and chapters │ +│ 3. AI determines per-document outputFormat in structure JSON │ +│ from user prompt → else optional outputFormat fallback (or "txt") │ +│ 4. AI determines per-document language in structure JSON │ +│ from user prompt → else validated currentUserLanguage (always valid) │ +│ 5. Assign ContentParts to chapters │ +│ │ +│ OUTPUT: │ +│ - chapterStructure: Dict │ +│ * documents: List[Dict] │ +│ - id: str │ +│ - title: str │ +│ - outputFormat: str (per-document) ← NEW │ +│ - language: str (per-document) ← NEW │ +│ - chapters: List[Dict] │ +│ * id: str │ +│ * level: int │ +│ * title: str │ +│ * generationHint: str │ +│ * contentParts: List[str] (ContentPart IDs) │ +│ │ +│ HANDOVER STATE: │ +│ - chapterStructure: Complete structure with ContentPart │ +│ assignments │ +│ - Per-document format/language determined │ +└─────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────┐ +│ PHASE 4: Structure Filling │ +│ ────────────────────────────────────────────────────────────────── │ +│ INPUT: │ +│ - chapterStructure: Dict (with per-document language from Phase 3)│ +│ - finalContentParts: List[ContentPart] │ +│ - userPrompt: str │ +│ │ +│ THROUGHPUT: │ +│ For each document (with per-document language): │ +│ For each chapter: │ +│ 1. Generate sections structure (parallel) │ +│ 2. For each section: │ +│ a. Extract per-document language from structure │ +│ b. Check if ContentParts need Vision AI extraction │ +│ c. If yes: Call Vision AI (Phase 2 deferred extraction) │ +│ d. 
Determine prompt type: │ +│ - WITH CONTENT: If contentParts assigned │ +│ → Use aggregation prompt (isAggregation=True) │ +│ → ContentParts passed as parameters │ +│ → Use per-document language for generation │ +│ - WITHOUT CONTENT: If no contentParts │ +│ → Use generation prompt (isAggregation=False) │ +│ → Only generationHint in prompt │ +│ → Use per-document language for generation │ +│ e. Generate section content with AI │ +│ │ +│ OUTPUT: │ +│ - filledStructure: Dict │ +│ * documents: List[Dict] │ +│ - language: str (preserved from input structure, per-document)│ +│ - chapters: List[Dict] │ +│ * sections: List[Dict] │ +│ - id: str │ +│ - content_type: str │ +│ - elements: List[Dict] │ +│ * type: str │ +│ * content: str (or base64 for images) │ +│ │ +│ HANDOVER STATE: │ +│ - filledStructure: Complete content, ready for rendering │ +│ - Per-document language preserved from structure │ +│ - All Vision AI extractions completed │ +└─────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────┐ +│ PHASE 5: Document Rendering │ +│ ────────────────────────────────────────────────────────────────── │ +│ INPUT: │ +│ - filledStructure: Dict │ +│ - per-document outputFormat (from Phase 3, determined from prompt) │ +│ - per-document language (from Phase 3, validated currentUserLanguage) │ +│ │ +│ THROUGHPUT: │ +│ 1. Group sections by document (from structure) │ +│ 2. For each document: │ +│ a. Use per-document outputFormat │ +│ b. Use per-document language │ +│ c. Render document in specified format │ +│ │ +│ OUTPUT: │ +│ - renderedDocuments: List[DocumentData] │ +│ * documentName: str │ +│ * documentData: bytes │ +│ * mimeType: str │ +│ │ +│ HANDOVER STATE: │ +│ - renderedDocuments: Final output ready for user │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## 2. 
Detailed Implementation Steps + +### Step 1: Update DocumentIntent Model + +**File**: `gateway/modules/datamodels/datamodelExtraction.py` + +**Changes**: +```python +class DocumentIntent(BaseModel): + documentId: str + intents: List[str] # ["extract", "render", "reference"] + extractionPrompt: Optional[str] = None + # Note: outputFormat and language are NOT here - determined during + # structure generation (Phase 3) in the chapter structure JSON + reasoning: str +``` + +**Rationale**: +- Intent clarification focuses on document purpose (extract, render, reference) +- Output format and language are determined later during structure generation (Phase 3) +- Structure generation has full context (user prompt, ContentParts, chapters) to determine format/language + +--- + +### Step 2: Update Intent Analysis Prompt + +**File**: `gateway/modules/services/serviceAi/subDocumentIntents.py` + +**Changes**: + +1. **Add fencing around userPrompt** (Security Fix): +```python +def _buildIntentAnalysisPrompt( + self, + userPrompt: str, + documents: List[ChatDocument], + actionParameters: Dict[str, Any] +) -> str: + # FENCE user input to prevent prompt injection + fencedUserPrompt = f"""```user_request +{userPrompt} +```""" + + prompt = f"""USER REQUEST: +{fencedUserPrompt} + +DOCUMENTS TO ANALYZE: +{docListText} + +TASK: For each document, determine: +1. Intents (can be multiple): "extract", "render", "reference" +Note: Output format and language are NOT determined here - they will be + determined during structure generation (Phase 3) in the chapter structure JSON + +OUTPUT FORMAT: {outputFormat} (global fallback - for reference only) + +RETURN JSON: +{{ + "intents": [ + {{ + "documentId": "doc_1", + "intents": ["extract"], + "extractionPrompt": "Extract all text content", + // Note: outputFormat and language are NOT here - determined during + // structure generation in the chapter structure JSON + "reasoning": "..." + }} + ] +}} +""" +``` + +2. 
**Demote the global outputFormat in the prompt to a reference-only fallback**: + - Output format is determined per document during structure generation (Phase 3), not during intent analysis + - The global format remains only as a fallback when no per-document format is returned + +--- + +### Step 3: Update ContentPart Metadata Propagation + +**File**: `gateway/modules/services/serviceAi/subContentExtraction.py` + +**Changes**: +```python +async def extractAndPrepareContent( + self, + documents: List[ChatDocument], + documentIntents: List[DocumentIntent], + parentOperationId: str, + getIntentForDocument: callable +) -> List[ContentPart]: + # ... existing extraction logic ... + + # Note: outputFormat and language are NOT propagated here - they're determined + # during structure generation (Phase 3) in the chapter structure JSON + # ContentParts are created with intent information only +``` + +**Rationale**: +- ContentParts carry intent and extraction information only +- Output format and language are determined during structure generation (Phase 3) +- Structure generation has full context to make format/language decisions + +--- + +### Step 4: Update Structure Generation + +**File**: `gateway/modules/services/serviceAi/subStructureGeneration.py` + +#### Global Format Source Chain + +**Note**: `outputFormat` parameter is **optional**. If omitted, formats are determined from user prompt by AI. + +**If outputFormat provided**: +1. Action parameters: `action_parameters.get("outputFormat")` or `action_parameters.get("resultType")` +2. Passed to `callAiContent(outputFormat=...)` → `generateStructure(outputFormat=...)` as parameter +3. Used as fallback in State 3 validation if AI doesn't return format per document +4. Final fallback: "txt" if global format is also missing/invalid + +**If outputFormat omitted**: +1. AI determines formats per document from user prompt +2. 
Validation fallback: "txt" (if AI doesn't return format per document) + +**Rationale**: With per-document format determination, AI can determine different formats for different documents based on user prompt. The `outputFormat` parameter is primarily a fallback for validation, not a requirement. + +#### Language Source Chain + +**Note**: `currentUserLanguage` is always valid (validated during user intention analysis). + +1. AI determines per-document language in structure JSON response +2. If AI doesn't return language: Use validated `currentUserLanguage` (always valid, validated during user intention analysis) +3. `currentUserLanguage` validation ensures: + - AI response `detectedLanguage` is validated (2-character ISO code) + - If AI didn't return language or invalid → uses user language (`self.services.user.language`) + - If user language not set → uses "en" + - Always safe to use directly without fallback logic + +**Changes**: + +1. **Make outputFormat optional in generateStructure method signature**: +```python +async def generateStructure( + self, + userPrompt: str, + contentParts: List[ContentPart], + parentOperationId: str, + outputFormat: Optional[str] = None # ← Optional, so declared after the required parameters: if omitted, formats determined from prompt by AI +) -> Dict[str, Any]: + """ + Generate document structure with per-document format determination. + + Multiple documents can be produced with different formats (e.g., one PDF, one HTML). + AI determines formats per-document from user prompt. The outputFormat parameter is + only a validation fallback - used if AI doesn't return format per document. + + Args: + outputFormat: Optional global format fallback. If omitted, formats are determined + from user prompt by AI. Used as validation fallback if AI doesn't + return format per document. Defaults to "txt" if not provided. 
+ """ + # If outputFormat not provided, use "txt" as fallback for validation + # AI will determine formats per document from user prompt + if not outputFormat: + outputFormat = "txt" + logger.debug("outputFormat not provided - using 'txt' as validation fallback, formats determined from prompt") + + # Group ContentParts by documentId (for context in prompt) + partsByDocument = {} + for part in contentParts: + docId = part.metadata.get("documentId", "default") + if docId not in partsByDocument: + partsByDocument[docId] = [] + partsByDocument[docId].append(part) + + # AI determines per-document format and language in structure JSON response + # Pass global fallback for AI to use if not specified per document + prompt = self._buildChapterStructurePrompt( + userPrompt=userPrompt, + contentParts=contentParts, + outputFormat=outputFormat # Fallback for validation (AI determines formats from prompt) + ) +``` + +**Note**: +- `outputFormat` is **optional**. If omitted, formats are determined from user prompt by AI. +- Used as validation fallback if AI doesn't return format per document. +- User prompt language comes from `self.services.currentUserLanguage` which is validated during user intention analysis (`workflowManager._sendFirstMessage()`). The validation ensures: + - AI response `detectedLanguage` is validated (2-character ISO code) + - If AI didn't return language or invalid → uses user language (`self.services.user.language`) + - If user language not set → uses "en" + - `currentUserLanguage` is always valid and safe to use directly without fallback logic + +2. 
**Update prompt to clarify format determination from prompt**: +```python +def _buildChapterStructurePrompt( + self, + userPrompt: str, + contentParts: List[ContentPart], + outputFormat: str # Global fallback (for validation only) +) -> str: + # Get language from services (validated currentUserLanguage infrastructure) + language = self._getUserLanguage() # Uses self.services.currentUserLanguage (always valid) + + # ... existing prompt building ... + + prompt += f""" +## OUTPUT FORMAT (per document) +- Each document can have its own output format (pdf, docx, html, etc.) +- **Determine the format for each document from the USER REQUEST above** +- Multiple documents can have different formats (e.g., one PDF, one HTML) +- Analyze user prompt to identify format requirements: + * Explicit format mentions (e.g., "as PDF", "in Excel", "HTML document") + * Document purpose (e.g., "spreadsheet" → xlsx, "presentation" → pptx) + * Content type requirements +- If format cannot be determined from prompt, use fallback: "{outputFormat}" (for validation only) +- Include "outputFormat" field in each document in the JSON structure +- **CRITICAL**: Formats are determined from user prompt, not from the fallback value + +## DOCUMENT LANGUAGE (per document) +- Each document can have its own language (ISO 639-1 code: "de", "en", "fr", etc.) +- Determine the language for each document based on: + * User prompt language/context + * Document content context + * User's explicit language requirements +- If not specified, use validated currentUserLanguage: "{language}" (always valid, validated during user intention analysis) +- Include "language" field in each document in the JSON structure + +EXAMPLE JSON STRUCTURE: +{{ + "documents": [ + {{ + "id": "doc_1", + "title": "Document Title", + "outputFormat": "pdf", // ← Determined by AI from user prompt + "language": "de", // ← Determined by AI from user prompt + "chapters": [...] 
+ }}, + {{ + "id": "doc_2", + "title": "Another Document", + "outputFormat": "html", // ← Different format for different document + "language": "en", // ← Different language for different document + "chapters": [...] + }} + ] +}} +""" +``` + +--- + +### Step 5: Update Structure Filling - Two Prompt Types + +**File**: `gateway/modules/services/serviceAi/subStructureFilling.py` + +**Changes**: + +1. **Ensure two prompt types are used** (already implemented, verify): +```python +async def _fillSingleSection( + self, + section: Dict[str, Any], + contentParts: List[ContentPart], + userPrompt: str, + generationHint: str, + document: Dict[str, Any], # ← NEW: Need document to get per-document language + # ... other params ... +) -> List[Dict[str, Any]]: + # Extract per-document language from structure + # Language MUST be defined in structure (validated in State 3) + # If missing, this is an error - should not happen after State 3 validation + if "language" not in document: + raise ValueError(f"Document {document.get('id')} missing 'language' field - should have been set in Phase 3 validation") + + docLanguage = document["language"] + + # Validate language format (should be 2-character ISO code) + if not isinstance(docLanguage, str) or len(docLanguage) != 2: + raise ValueError(f"Document {document.get('id')} has invalid language format: {docLanguage} - should be 2-character ISO 639-1 code") + + contentPartIds = section.get("contentPartIds", []) + hasContentParts = len(contentPartIds) > 0 + + if hasContentParts: + # PROMPT TYPE 1: WITH CONTENT (Aggregation) + # ContentParts passed as parameters, not in prompt text + isAggregation = True + relevantParts = [p for p in contentParts if p.id in contentPartIds] + + generationPrompt = self._buildSectionGenerationPrompt( + section=section, + contentParts=relevantParts, # Passed as parameters + userPrompt=userPrompt, + generationHint=generationHint, + isAggregation=True, # ← Key flag + language=docLanguage # ← Per-document language 
from structure + ) + else: + # PROMPT TYPE 2: WITHOUT CONTENT (Generation) + # Only generationHint in prompt, no ContentParts + isAggregation = False + + generationPrompt = self._buildSectionGenerationPrompt( + section=section, + contentParts=[], # Empty + userPrompt=userPrompt, + generationHint=generationHint, + isAggregation=False, # ← Key flag + language=docLanguage # ← Per-document language from structure + ) +``` + +**Note**: Language comes from the document in the structure (per-document), not a global parameter. Each document can have its own language as determined in Phase 3. The language MUST be defined and validated in Phase 3 (State 3 validation) - if missing here, it's an error. + +2. **Verify `_buildSectionGenerationPrompt` handles both cases**: +```python +def _buildSectionGenerationPrompt( + self, + section: Dict[str, Any], + contentParts: List[ContentPart], + userPrompt: str, + generationHint: str, + isAggregation: bool, # ← Determines prompt type + language: str +) -> str: + if isAggregation: + # TYPE 1: WITH CONTENT + # ContentParts are passed as parameters to AI call + # Don't include full content in prompt text (token efficiency) + prompt = f"""Generate content for section based on provided ContentParts. + +Section: {sectionTitle} +Generation Hint: {generationHint} +Language: {language} + +ContentParts are provided as parameters (not shown in prompt for efficiency). +Use the ContentParts data to generate the section content. +""" + else: + # TYPE 2: WITHOUT CONTENT + # Only generationHint, no ContentParts + prompt = f"""Generate content for section based on generation hint. + +Section: {sectionTitle} +Generation Hint: {generationHint} +Language: {language} + +Generate content based on the generation hint without referencing external content. 
+""" +``` + +**Rationale**: +- **Type 1 (with content)**: Efficient for large content (ContentParts as parameters) +- **Type 2 (without content)**: Simple generation based on hint only +- Already implemented via `isAggregation` flag, verify it's used correctly + +--- + +### Step 6: Update Document Rendering + +**File**: `gateway/modules/services/serviceAi/mainServiceAi.py` (renderResult method) +**File**: `gateway/modules/services/serviceGeneration/mainServiceGeneration.py` (renderReport method) + +**Current Implementation**: +- `renderResult()` calls `generationService.renderReport()` +- `renderReport()` already processes each document separately (line 385) +- Currently checks `doc.get("format", outputFormat)` (line 397) - but should check `outputFormat` field +- Language is not handled per-document + +**Changes**: + +1. **Update renderResult to pass language (from structure, validated before rendering)**: +```python +async def renderResult( + self, + filledStructure: Dict[str, Any], + outputFormat: str, # Global fallback + language: str, # ← NEW: Add language parameter (global fallback) + title: str, + userPrompt: str, + parentOperationId: str +) -> List[RenderedDocument]: + """ + Render filled structure to documents. + + Per-document format and language are extracted from structure (validated in State 3). + The outputFormat and language parameters are only used as global fallbacks. + Multiple documents can have different formats and languages. + """ + # Language comes from structure (per-document), validated in State 3 + # This parameter is only used as global fallback if structure validation fails + # Use validated currentUserLanguage as fallback (always valid) + if not language: + language = self._getUserLanguage() # Uses validated currentUserLanguage infrastructure + + # ... existing code ... 
+ + renderedDocuments = await generationService.renderReport( + filledStructure, + outputFormat, + language, # ← Pass language (global fallback, per-document extracted in renderReport) + title, + userPrompt, + self, + parentOperationId=renderOperationId + ) +``` + +**Note**: +- Language comes from structure (per-document) as determined in Phase 3 +- The `language` parameter here is only used as a global fallback +- Per-document language is validated in State 3 (Structure Generation) and extracted from structure in `renderReport()` +- Uses validated `currentUserLanguage` infrastructure if fallback needed + +2. **Update renderReport to handle per-document format and language**: +```python +async def renderReport( + self, + extractedContent: Dict[str, Any], + outputFormat: str, # Global fallback + language: str, # ← NEW: Add language parameter (global fallback) + title: str, + userPrompt: str = None, + aiService=None, + parentOperationId: Optional[str] = None +) -> List[RenderedDocument]: + # ... existing validation ... + + # Process EACH document separately + for docIndex, doc in enumerate(documents): + # ... existing validation ... 
+ + # Determine format for this document + # Check outputFormat field first (per-document), then format field (legacy), then global fallback + docFormat = doc.get("outputFormat") or doc.get("format") or outputFormat + + # Determine language for this document + # Extract per-document language from structure (validated in State 3), fallback to global + docLanguage = doc.get("language") or language + + # Validate language format (should be 2-character ISO code, validated in State 3) + if not isinstance(docLanguage, str) or len(docLanguage) != 2: + logger.warning(f"Document {doc.get('id')} has invalid language format: {docLanguage}, using fallback") + docLanguage = language # Use global fallback + + # Get renderer for this document's format (uses existing renderer registry) + renderer = self._getFormatRenderer(docFormat) + if not renderer: + logger.warning(f"Unsupported format '{docFormat}' for document {doc.get('id', docIndex)}, skipping") + continue + + # Create JSON structure with single document (preserving metadata) + singleDocContent = { + "metadata": {**metadata, "language": docLanguage}, # ← Add per-document language to metadata + "documents": [doc] + } + + # Render this document (can return multiple files, e.g., HTML + images) + renderedDocs = await renderer.render(singleDocContent, docTitle, userPrompt, aiService) + allRenderedDocuments.extend(renderedDocs) +``` + +**Note**: +- Per-document format and language are extracted from structure (validated in State 3) +- Renderers (`RendererPdf`, `RendererHtml`, etc.) 
receive the structure with language in metadata +- They can use it for language-specific formatting if needed +- Multiple documents can have different formats and languages + +--- + +### Step 7: Update ai.process to Pass documentList and Make outputFormat Optional + +**File**: `gateway/modules/workflows/methods/methodAi/actions/process.py` + +**Changes**: +```python +# Phase 7.3: Pass both documentList and contentParts to AI service +# (Remove extraction logic from here - handled by AI service) + +# resultType is optional - if omitted, formats determined from prompt by AI +# Default "txt" is validation fallback only +resultType = parameters.get("resultType") # Optional: if None, formats determined from prompt +if resultType: + normalized_result_type = (str(resultType).strip().lstrip('.').lower() or "txt") + output_format = normalized_result_type # Already stripped of the leading dot and defaulted to "txt" +else: + # No format specified - AI will determine formats from prompt + output_format = None + logger.debug("resultType not provided - formats will be determined from prompt by AI") + +# Use unified callAiContent method with BOTH parameters +aiResponse = await self.services.ai.callAiContent( + prompt=aiPrompt, + options=options, + documentList=documentList, # ← PASS documentList (was missing) + contentParts=contentParts, # ← PASS contentParts + outputFormat=output_format, # ← Optional: if None, formats determined from prompt + parentOperationId=operationId, + generationIntent=generationIntent +) +``` + +**Note**: +- `resultType` parameter is **optional**. If omitted, formats are determined from user prompt by AI. +- The "txt" default is used as a validation fallback only. +- Language detection from user prompt is already done and validated. `self.services.currentUserLanguage` is always valid (validated during user intention analysis in `workflowManager._sendFirstMessage()`). + + + +--- + +## 3. 
Handover State Definitions and Validation + +**Purpose**: These state definitions document the expected structure and validation rules at each phase boundary. + +**Implementation Approach**: +- **Inline validation** in each phase method +- **Auto-fix** where possible (use defaults, skip invalid items) +- **Stop with error** for critical structural issues +- **Log warnings** for skipped items + +**See**: Appendix "Validation Failure Handling Decisions" below for detailed Q&A on each validation + +**Summary of Validation Decisions**: +- **State 1**: Skip intents for unknown documents; documents without intents are OK +- **State 2**: Skip ContentParts with missing/invalid metadata (with warnings) +- **State 3**: Auto-fix format/language with fallbacks; error on missing structure fields +- **State 4**: Auto-fix missing elements field; allow empty elements +- **State 5**: Skip empty documents; infer mimeType from filename + +### State 1: After Intent Clarification + +**Location**: `gateway/modules/services/serviceAi/subDocumentIntents.py` - After `clarifyDocumentIntents()` returns (line 115) + +**Expected State**: +```python +documentIntents: List[DocumentIntent] # Complete intent analysis +documents: List[ChatDocument] # Resolved documents +preExtractedMapping: Dict[str, str] # Map[originalDocId, jsonDocId] +``` + +**Implementation Code** (add after line 115, before return): +```python +# Validation and auto-fix +documentIds = {d.id for d in documents} +validatedIntents = [] + +for intent in documentIntents: + # Validation 1.2: Skip intents for unknown documents + if intent.documentId not in documentIds: + logger.warning(f"Skipping intent for unknown document: {intent.documentId}") + continue + validatedIntents.append(intent) + +# Validation 1.1: Documents without intents are OK (not needed) +# Intents for non-existing documents are already filtered above +documentIntents = validatedIntents +``` + +### State 2: After Content Extraction + +**Location**: 
`gateway/modules/services/serviceAi/subContentExtraction.py` - After `extractAndPrepareContent()` returns (at end of method, before return) + +**Expected State**: +```python +finalContentParts: List[ContentPart] # All content parts ready +``` + +**Implementation Code** (add at end of method, before return): +```python +# Validation and auto-fix +validatedParts = [] +for part in finalContentParts: + # Validation 2.1: Skip ContentParts without documentId + if not part.metadata.get("documentId"): + logger.warning(f"Skipping ContentPart {part.id} - missing documentId in metadata") + continue + + # Validation 2.2: Skip ContentParts with invalid contentFormat + contentFormat = part.metadata.get("contentFormat") + if contentFormat not in ["extracted", "object", "reference"]: + logger.warning( + f"Skipping ContentPart {part.id} - invalid contentFormat: {contentFormat}" + ) + continue + + validatedParts.append(part) + +return validatedParts +``` + +### State 3: After Structure Generation + +**Location**: `gateway/modules/services/serviceAi/subStructureGeneration.py` - After `generateStructure()` returns (after parsing JSON, before return, around line 182) + +**Expected State**: +```python +chapterStructure: Dict[str, Any] # Complete structure with documents, chapters, outputFormat, language +``` + +**Implementation Code** (add after structure JSON is parsed, before return): +```python +# After structure JSON is parsed (around line 182) +# Validation and auto-fix + +# Validation 3.1: Structure missing 'documents' field +if "documents" not in structure: + raise ValueError("Structure missing 'documents' field - cannot auto-fix") + +documents = structure["documents"] + +# Validation 3.2: Structure has no documents +if not isinstance(documents, list) or len(documents) == 0: + raise ValueError("Structure has no documents - cannot generate without documents") + +# Import renderer registry for format validation (existing infrastructure) +from 
modules.services.serviceGeneration.renderers.registry import getRenderer + +# Validate and fix each document +for doc in documents: + # Validation 3.3 & 3.4: Document outputFormat + # outputFormat parameter is optional - if omitted, formats determined from prompt by AI + # Use as fallback only if AI doesn't return format per document + # Multiple documents can have different formats (e.g., one PDF, one HTML) + globalFormatFallback = outputFormat or "txt" # Fallback for validation + + if "outputFormat" not in doc or not doc["outputFormat"]: + # AI didn't return format or returned empty - use global fallback + doc["outputFormat"] = globalFormatFallback + logger.info(f"Document {doc.get('id')} missing outputFormat - using fallback: {doc['outputFormat']}") + else: + # AI returned format - validate using existing renderer registry + formatName = str(doc["outputFormat"]).lower().strip() + renderer = getRenderer(formatName) # Uses existing infrastructure + + if not renderer: + # Format doesn't match any renderer - use txt (simple approach) + logger.warning(f"Document {doc.get('id')} has format without renderer: {formatName}, using 'txt'") + doc["outputFormat"] = "txt" + else: + # Valid format with renderer - normalize and keep AI result + doc["outputFormat"] = formatName + logger.debug(f"Document {doc.get('id')} using AI-determined format: {formatName}") + + # Validation 3.5 & 3.6: Document language + # Use validated currentUserLanguage (always valid, validated during user intention analysis) + # Access via _getUserLanguage() which uses self.services.currentUserLanguage + userPromptLanguage = self._getUserLanguage() # Uses validated currentUserLanguage infrastructure + + # Capture the AI-provided value before it is overwritten, so the correct log branch is taken + aiLanguage = doc.get("language") + if not isinstance(aiLanguage, str) or len(aiLanguage) != 2: + # AI didn't return language or invalid format - use validated currentUserLanguage + doc["language"] = userPromptLanguage + if aiLanguage is None: + logger.info(f"Document {doc.get('id')} missing language - 
using currentUserLanguage: {doc['language']}") + else: + logger.warning(f"Document {doc.get('id')} has invalid language format from AI: {aiLanguage}, using currentUserLanguage") + else: + # AI returned valid language format - normalize + doc["language"] = aiLanguage.lower().strip()[:2] + logger.debug(f"Document {doc.get('id')} using AI-determined language: {doc['language']}") + + # Validation 3.7: Document missing 'chapters' field + if "chapters" not in doc: + raise ValueError(f"Document {doc.get('id')} missing 'chapters' field - cannot auto-fix") + + # Validation 3.8: Chapter missing 'contentParts' field + for chapter in doc["chapters"]: + if "contentParts" not in chapter: + raise ValueError(f"Chapter {chapter.get('id')} missing 'contentParts' field - cannot auto-fix") + +return structure +``` + +### State 4: After Structure Filling + +**Location**: `gateway/modules/services/serviceAi/subStructureFilling.py` - After `fillStructure()` returns (at end of method, before return, around line 204) + +**Expected State**: +```python +filledStructure: Dict[str, Any] # Complete content with elements +``` + +**Implementation Code** (add at end of method, before return): +```python +# Validation and auto-fix + +# Validation 4.1: Filled structure missing 'documents' field +if "documents" not in filledStructure: + raise ValueError("Filled structure missing 'documents' field - cannot auto-fix") + +for doc in filledStructure["documents"]: + # Validation 4.4: Verify language is preserved from input structure + # Language MUST be preserved from Phase 3 structure (validated in State 3) + if "language" not in doc: + raise ValueError(f"Document {doc.get('id')} missing language in filled structure - should have been preserved from Phase 3") + + # Validate language format + if not isinstance(doc["language"], str) or len(doc["language"]) != 2: + raise ValueError(f"Document {doc.get('id')} has invalid language format in filled structure: {doc['language']} - should be 2-character 
ISO 639-1 code") + + for chapter in doc.get("chapters", []): + for section in chapter.get("sections", []): + # Validation 4.2: Section missing 'elements' field + if "elements" not in section: + section["elements"] = [] + logger.info(f"Section {section.get('id')} missing 'elements' - created empty list") + + # Validation 4.3: Section has empty elements list - ALLOW (intentionally empty is OK) + # No action needed - empty elements are allowed + +return filledStructure +``` + +### State 5: After Document Rendering + +**Location**: `gateway/modules/services/serviceGeneration/paths/documentPath.py` - After the `renderResult()` call (line 151) returns; insert after line 157, before building `documentDataList` + +**Expected State**: +```python +renderedDocuments: List[RenderedDocument] # Final output +``` + +**Implementation Code** (add after line 157, before building documentDataList): +```python +# Imported once, outside the loop (existing utility) +from modules.services.serviceGeneration.subDocumentUtility import getMimeTypeFromExtension + +# Validation 5.1: Already implemented at lines 175-176 +if not renderedDocuments: + raise ValueError("No documents were rendered") + +# Validation 5.2 & 5.3: Validate and filter rendered documents +validatedRenderedDocs = [] +for doc in renderedDocuments: + # Validation 5.2: Skip documents with empty documentData + if not doc.documentData: + logger.warning(f"Skipping rendered document {doc.filename} - empty documentData") + continue + + # Validation 5.3: Infer mimeType from filename if missing + if not doc.mimeType: + if doc.filename: + inferredMimeType = getMimeTypeFromExtension(doc.filename) + if inferredMimeType: + doc.mimeType = inferredMimeType + logger.info(f"Inferred mimeType '{inferredMimeType}' from filename '{doc.filename}'") + else: + logger.warning(f"Could not infer mimeType from filename '{doc.filename}' - keeping as None") + else: + logger.warning("Rendered document missing mimeType and filename - cannot infer") + + validatedRenderedDocs.append(doc) + +# Use validated list 
+renderedDocuments = validatedRenderedDocs + +# Re-check after filtering +if not renderedDocuments: + raise ValueError("No valid documents after validation") +``` + +--- + +## 4. Migration Checklist + +### Phase 1: Model Updates +- [ ] Verify `DocumentIntent` model does NOT include `outputFormat` or `language` +- [ ] Intent clarification focuses only on document purpose (intents, extractionPrompt) +- [ ] Note: outputFormat and language are determined during structure generation (Phase 3) + +### Phase 2: Intent Analysis Updates +- [ ] **CRITICAL**: Add fencing around `userPrompt` in intent analysis prompt + - [ ] Fence user input with code blocks: ```user_request\n{userPrompt}\n``` + - [ ] Test with various user inputs (special chars, JSON, newlines, prompt injection attempts) +- [ ] Update prompt to focus only on document intents (extract, render, reference) +- [ ] Remove any outputFormat/language determination from intent analysis prompt +- [ ] Keep global outputFormat/language as reference only (not for determination) +- [ ] **Verify intent mapping logic** (already implemented in `clarifyDocumentIntents`): + - [ ] Step 1: Map pre-extracted JSONs to original documents (lines 63-83) + - [ ] Step 2: AI analyzes intents for original documents (line 86) + - [ ] Step 3: Map intents back to JSON doc IDs (lines 96-104) + - [ ] Test with pre-extracted JSONs to verify mapping works correctly + +### Phase 3: Content Extraction Updates +- [ ] Verify ContentParts do NOT include outputFormat or language in metadata +- [ ] ContentParts carry only intent and extraction information +- [ ] Verify pre-extracted JSON handling preserves intent information +- [ ] **Add filtering to Data Extraction Path** (`_handleDataExtraction`): + **Current State (BEFORE filtering)**: + ```python + # Line 708: Get documents directly from documentList + documents = self.services.chat.getChatDocumentsFromDocumentList(documentList) + # Line 721: Call extractAndPrepareContent() with ALL documents + 
preparedContentParts = await self.extractAndPrepareContent(documents, ...) + ``` + **Problem**: If `documentList` contains both: + - Original document: `original_pdf_123.pdf` + - Pre-extracted JSON: `pre_extracted_456.json` (contains ContentParts from `original_pdf_123.pdf`) + → Both are processed → **DUPLICATE ContentParts created** + + **How Filtering Works (Reference: `documentPath.py` lines 62-87)**: + + **Step 1: Identify Pre-Extracted JSONs and Map to Originals** + ```python + # Collect all original document IDs that are covered by pre-extracted JSONs + originalDocIdsCoveredByPreExtracted = set() + for doc in documents: + preExtracted = self.intentAnalyzer.resolvePreExtractedDocument(doc) + if preExtracted: + # Pre-extracted JSON found - get the original document ID it covers + originalDocId = preExtracted["originalDocument"]["id"] + originalDocIdsCoveredByPreExtracted.add(originalDocId) + ``` + **Result**: `originalDocIdsCoveredByPreExtracted = {"original_pdf_123"}` (if pre-extracted JSON covers it) + + **Step 2: Filter Documents List** + ```python + filteredDocuments = [] + for doc in documents: + preExtracted = self.intentAnalyzer.resolvePreExtractedDocument(doc) + if preExtracted: + # Pre-extracted JSON - KEEP IT (will be processed as ContentParts) + filteredDocuments.append(doc) + elif doc.id in originalDocIdsCoveredByPreExtracted: + # Original document covered by pre-extracted JSON - REMOVE IT + logger.info(f"Skipping original document {doc.id} - already covered") + # Do NOT append - skip this document + else: + # Regular document (not pre-extracted, not covered) - KEEP IT + filteredDocuments.append(doc) + + documents = filteredDocuments # Use filtered list + ``` + **Result**: + - ✅ Pre-extracted JSON: `pre_extracted_456.json` → KEPT + - ❌ Original document: `original_pdf_123.pdf` → REMOVED (covered by pre-extracted JSON) + - ✅ Regular document: `other_doc.pdf` → KEPT (not covered) + + **Step 3: Use Filtered Documents** + ```python + # Now call 
extractAndPrepareContent() with filtered documents only + preparedContentParts = await self.extractAndPrepareContent( + documents, # Only pre-extracted JSONs + regular docs (no originals covered by JSONs) + documentIntents or [], + extractOperationId + ) + ``` + **Result**: No duplicates - original documents already filtered out + + **Implementation Steps**: + - [ ] Add filtering logic between line 708 (get documents) and line 710 (clarify intents) + - [ ] Copy filtering code from `documentPath.py` lines 62-87 + - [ ] Adapt to use `self.intentAnalyzer.resolvePreExtractedDocument()` (same method) + - [ ] **Filtering Logic**: + ```python + # Step 1: Identify all original document IDs covered by pre-extracted JSONs + originalDocIdsCoveredByPreExtracted = set() + for doc in documents: + preExtracted = self.intentAnalyzer.resolvePreExtractedDocument(doc) + if preExtracted: + originalDocId = preExtracted["originalDocument"]["id"] + originalDocIdsCoveredByPreExtracted.add(originalDocId) + logger.debug(f"Found pre-extracted JSON {doc.id} covering original document {originalDocId}") + + # Step 2: Filter documents - remove originals covered by pre-extracted JSONs + filteredDocuments = [] + for doc in documents: + preExtracted = self.intentAnalyzer.resolvePreExtractedDocument(doc) + if preExtracted: + filteredDocuments.append(doc) # Keep pre-extracted JSON + elif doc.id in originalDocIdsCoveredByPreExtracted: + logger.info(f"Skipping original document {doc.id} ({doc.fileName}) - already covered by pre-extracted JSON") + else: + filteredDocuments.append(doc) # Keep regular document + + documents = filteredDocuments # Use filtered list + ``` + - [ ] Test with scenario: original document + pre-extracted JSON → verify no duplicates +- [ ] **Remove redundant check from `extractAndPrepareContent()`**: + - [ ] Remove pre-extracted JSON check (line 77 in `subContentExtraction.py`) + - [ ] Trust that filtering is done upstream + - [ ] Cleaner code, single responsibility +- [ ] Test 
merging logic +- [ ] Test that both document generation and data extraction paths handle pre-extracted JSONs correctly +- [ ] Note: outputFormat and language are NOT propagated here - determined in structure generation + +### Phase 4: Structure Generation Updates +- [ ] **Make outputFormat optional in generateStructure() method signature**: + - [ ] Update `subStructureGeneration.py` method signature (line 47): `outputFormat: Optional[str] = None` + - [ ] Update `mainServiceAi.py` wrapper method (line 444): Make `outputFormat` optional + - [ ] If `outputFormat` not provided, use "txt" as validation fallback (AI determines formats from prompt) + - [ ] Add logging: "outputFormat not provided - using 'txt' as validation fallback, formats determined from prompt" + - [ ] **Context**: `outputFormat` is only a validation fallback - AI determines per-document formats from user prompt. Multiple documents can have different formats (e.g., one PDF, one HTML). +- [ ] **Note on language handling**: Language is accessed via `self.services.currentUserLanguage` (always valid, validated during user intention analysis). No language parameter needed in `generateStructure()` method signature - language is accessed directly from services within the method. 
+ - [ ] Verify `currentUserLanguage` is used correctly in `subStructureGeneration.py` (via `self.services.currentUserLanguage`) + - [ ] Verify `currentUserLanguage` is used correctly in prompt building (via `self.services.currentUserLanguage`) + - [ ] Note: `mainServiceGeneration.py` uses different service - verify if update needed +- [ ] Group ContentParts by documentId (for context in prompt) +- [ ] Update `_buildChapterStructurePrompt()` to access language via `self.services.currentUserLanguage` (no parameter needed) +- [ ] Update structure generation prompt to ask AI to determine per-document outputFormat + - [ ] Explicitly require `outputFormat` field in each document JSON structure + - [ ] Update example structure to show `outputFormat` field (not just filename) + - [ ] Clarify that multiple documents can have different formats +- [ ] Update structure generation prompt to ask AI to determine per-document language + - [ ] Explicitly require `language` field in each document JSON structure + - [ ] Clarify that multiple documents can have different languages +- [ ] Provide global fallbacks (outputFormat, language) for AI to use if not specified + - [ ] `outputFormat` fallback: from parameter or "txt" + - [ ] `language` fallback: use `self._getUserLanguage()` (validated currentUserLanguage infrastructure) +- [ ] **Parse and validate format/language from AI response**: + - [ ] Extract `outputFormat` and `language` from each document in structure JSON + - [ ] **Format validation (use existing renderer registry infrastructure)**: + - [ ] Import: `from modules.services.serviceGeneration.renderers.registry import getRenderer` + - [ ] If `outputFormat` missing or empty → use global fallback (`outputFormat` or "txt") + - [ ] If `outputFormat` exists → check if it has a renderer using `getRenderer(formatName)` (existing infrastructure) + - [ ] Normalize format name: `formatName.lower().strip()` + - [ ] If format doesn't match any renderer → use "txt" (simple approach, no 
global fallback attempt) + - [ ] Log warnings for invalid formats + - [ ] **Note**: Infrastructure exists at `mainServiceGeneration.py:529` - reuse `getRenderer()` function + - [ ] **Language validation (use existing validated infrastructure)**: + - [ ] Validate language (must be 2-character ISO 639-1 code) + - [ ] **If language missing**: Set to `self._getUserLanguage()` which uses validated `currentUserLanguage` (always valid, validated during user intention analysis at `workflowManager.py:695-727`) + - [ ] **If language invalid format**: Use `self._getUserLanguage()` (always valid) + - [ ] Normalize language: `language.lower().strip()[:2]` + - [ ] Log warnings for invalid/missing values + - [ ] **Note**: `currentUserLanguage` is always valid - safe to use directly via `_getUserLanguage()` method +- [ ] **Error handling**: + - [ ] If structure JSON is malformed → raise error with details + - [ ] If no documents in structure → raise error + - [ ] If AI doesn't return format → use global `outputFormat` fallback (or "txt" if not provided), log warning + - [ ] If AI doesn't return language → use validated `currentUserLanguage` (always valid), log warning +- [ ] Verify structure output includes per-document format and language (from AI in JSON response) + +### Phase 5: Structure Filling Verification +- [ ] Verify two prompt types are correctly used: + - [ ] `isAggregation=True`: ContentParts as parameters + - [ ] `isAggregation=False`: Only generationHint +- [ ] **Verify per-document language is extracted and used**: + - [ ] Language MUST be defined in structure (validated in State 3) + - [ ] Language extracted from document in structure (per-document) - NO fallback to "en" + - [ ] If language missing: Raise error (should not happen after State 3 validation) + - [ ] If language invalid format: Raise error (should not happen after State 3 validation) + - [ ] Language passed to `_buildSectionGenerationPrompt()` for each section + - [ ] Language preserved in filled 
structure (State 4 validation) +- [ ] Test both prompt types with various scenarios +- [ ] Verify Vision AI extraction happens during filling phase +- [ ] Test with multi-document scenarios (different languages per document) + +### Phase 6: Document Rendering Updates +- [ ] **Add language parameter to renderResult() method**: + - [ ] Update `mainServiceAi.py` renderResult() signature (line 460) + - [ ] Pass language to `generationService.renderReport()` (as global fallback) +- [ ] **Update renderResult call site** (`documentPath.py` line 151): + - [ ] Language comes from structure (per-document), validated in State 3 + - [ ] Use validated `currentUserLanguage` as global fallback (always valid) + - [ ] Per-document language will be extracted in `renderReport()` from filledStructure + - [ ] Code example: + ```python + # Language is already validated in structure (State 3) and preserved in filled structure (State 4) + # Per-document language will be extracted in renderReport() from filledStructure + # Use validated currentUserLanguage as global fallback (always valid infrastructure) + language = self.services.currentUserLanguage or "en" # Uses validated infrastructure + + renderedDocuments = await self.services.ai.renderResult( + filledStructure, + outputFormat, + language, # ← Global fallback (per-document language extracted from structure in renderReport) + title or "Generated Document", + userPrompt, + docOperationId + ) + ``` +- [ ] **Update renderReport() to handle per-document format and language**: + - [ ] Add language parameter to method signature (line 349): `language: str` (global fallback) + - [ ] Extract per-document format: `docFormat = doc.get("outputFormat") or doc.get("format") or outputFormat` (check `outputFormat` field first) + - [ ] Extract per-document language: `docLanguage = doc.get("language") or language` (from structure, validated in State 3) + - [ ] Validate language format (should be 2-character ISO code, validated in State 3) + - [ ] Add 
language to metadata passed to renderers: `metadata["language"] = docLanguage` + - [ ] **Note**: Per-document format and language are extracted from structure (validated in State 3). Multiple documents can have different formats and languages. +- [ ] **Error handling**: + - [ ] If no documents in structure → raise error + - [ ] If filtering removes all documents → raise error + - [ ] If format not supported → log warning, skip document +- [ ] Test multi-document rendering with different formats/languages + +### Phase 7: ai.process Refactoring +- [ ] Remove extraction logic from `ai.process` (lines 72-119) +- [x] **Make resultType optional**: ✅ **IMPLEMENTED** + - [x] Update `ai.process`: Make `resultType` optional (can be `None`) - ✅ **COMPLETED** + - [x] Update `ai.generateDocument`: Make `resultType` optional, removed auto-detection - ✅ **COMPLETED** + - [x] Update `ai.generateCode`: Make `resultType` optional, removed auto-detection - ✅ **COMPLETED** + - [x] If `resultType` omitted → pass `None` to `callAiContent()` (formats determined from prompt) - ✅ **COMPLETED** + - [x] Updated action parameter definitions in `methodAi.py` - ✅ **COMPLETED** + + **Implementation Status**: + - ✅ **ai.process**: `resultType` optional, passes `None` if omitted + - ✅ **ai.generateDocument**: `resultType` optional, passes `None` if omitted + - ✅ **ai.generateCode**: `resultType` optional, passes `None` if omitted + - ✅ **callAiContent**: Already supports optional `outputFormat` (defaults to "txt") + - [ ] **generateStructure**: Make `outputFormat` optional (see Phase 4 checklist) + +- [ ] **Add filtering to Data Extraction Path** (`_handleDataExtraction`): + - [ ] **Location**: `mainServiceAi.py` between line 708 (get documents) and line 721 (extract content) + - [ ] **Purpose**: Prevent duplicate ContentParts when both original document and pre-extracted JSON are provided + - [ ] **Implementation**: Copy filtering logic from `documentPath.py:62-87` + - [ ] Filter out original 
documents covered by pre-extracted JSONs before calling `extractAndPrepareContent()` + - [ ] See Phase 3 checklist for detailed filtering code +- [ ] Pass `documentList` to `callAiContent()` (currently missing, line 155-162 in `process.py`) + - [ ] `documentList` is available in `process.py` (lines 43-55) but not passed to `callAiContent()` + - [ ] Add `documentList=documentList` parameter to `callAiContent()` call +- [ ] Pass `contentParts` to `callAiContent()` (already done) +- [ ] **Error handling**: + - [ ] If no documents and no contentParts → raise error + - [ ] If filtering removes all documents → raise error +- [ ] Verify intelligent merging in AI service works correctly + +### Phase 8: Testing +- [ ] Test with pre-extracted JSON documents +- [ ] Test with mixed `documentList` + `contentParts` +- [ ] Test per-document format/language determination +- [ ] Test two prompt types in structure filling +- [ ] Test multi-document output with different formats/languages +- [ ] Test security: prompt injection attempts with fenced input +- [ ] **Test optional outputFormat handling**: + - [ ] Test with `resultType` provided → formats used as fallback + - [ ] Test with `resultType` omitted → AI determines formats from prompt + - [ ] Test format validation: invalid format → uses "txt" + - [ ] Test format validation: format without renderer → uses "txt" + +### Phase 9: Documentation +- [ ] Update API documentation +- [ ] Update developer documentation +- [ ] Update user documentation (if applicable) + +--- + +## Priority Order + +**High Priority (Security & Critical Path)**: +1. **Phase 2**: Intent Analysis Updates - Security fix (fencing) is CRITICAL +2. **Phase 7**: ai.process Refactoring - Add filtering to Data Extraction Path (prevents duplicate ContentParts) +3. **Phase 1**: Model Updates - Foundation for all other changes + +**Medium Priority (Architectural Improvements)**: +4. 
**Phase 4**: Structure Generation Updates + - Make outputFormat optional (AI determines per-document formats) + - Implement State 3 validation (use existing renderer registry and language infrastructure) + - Update prompt to require outputFormat field per document +5. **Phase 6**: Document Rendering Updates + - Extract per-document format/language from structure + - Add language parameter to renderResult() and renderReport() +6. **Phase 3**: Content Extraction Updates + - Remove redundant pre-extracted check AFTER filtering added upstream + +**Low Priority (Verification & Polish)**: +7. **Phase 5**: Structure Filling Verification (already implemented, verify) +8. **Phase 8**: Testing +9. **Phase 9**: Documentation + +--- + +## Notes + +- The two prompt types in Phase 4 (Structure Filling) are already implemented via the `isAggregation` flag. This step focuses on verification and documentation. +- Per-document format/language determination follows the same pattern as existing per-document language handling. +- The security fix (fencing user input) should be implemented immediately as it addresses a potential prompt injection vulnerability. + +--- + +## Architectural Note: Filtering and Redundant Pre-Extracted JSON Checks + +### Problem Statement + +When a user provides both an original document and a pre-extracted JSON containing ContentParts from that original document, we need to prevent duplicate ContentParts from being created. + +### Current State + +The pre-extracted JSON check happens **twice**: + +1. **Phase 1** (`documentPath.py` lines 67-87): Filters documents before intent clarification +2. **Phase 2** (`subContentExtraction.py` line 77): Checks again during extraction loop + +### Why Filtering is Necessary + +**The redundant check in `extractAndPrepareContent()` only identifies if a document IS a pre-extracted JSON. 
It does NOT identify if a document is an ORIGINAL covered by a pre-extracted JSON.** + +**Example**: +```python +# In extractAndPrepareContent loop: +for document in [original_pdf_123, pre_extracted_456]: + # Check document 1: original_pdf_123 + preExtracted = resolvePreExtractedDocument(original_pdf_123) + # Returns: None (it's not a pre-extracted JSON) + # → Processes original_pdf_123 → extracts ContentParts + + # Check document 2: pre_extracted_456 + preExtracted = resolvePreExtractedDocument(pre_extracted_456) + # Returns: {originalDocument: {id: "original_pdf_123"}, ...} + # → Processes pre_extracted_456 → extracts ContentParts + + # Result: BOTH processed → DUPLICATES +``` + +**The redundant check doesn't help because**: +- It only looks at ONE document at a time +- It doesn't know about OTHER documents in the list +- It can't compare documents to find relationships + +### Why Filtering Works + +Filtering happens BEFORE the extraction loop, so it can: +1. Look at ALL documents at once +2. Identify relationships between documents +3. 
Remove originals BEFORE extraction starts + +### Code Path Analysis + +#### Path 1: Document Generation Path (`documentPath.py`) + +**Location**: Line 103 +**Filtering**: ✅ YES (lines 62-87) +- Identifies pre-extracted JSONs +- Filters out original documents covered by pre-extracted JSONs +- Only passes filtered documents to `extractAndPrepareContent()` + +**Result**: ✅ **NO DUPLICATES** - Original document already filtered out + +#### Path 2: Data Extraction Path (`mainServiceAi.py` `_handleDataExtraction`) + +**Location**: Line 721 +**Filtering**: ❌ **NO** +- Gets documents directly from `documentList` (line 708) +- Calls `extractAndPrepareContent()` without any filtering +- Does NOT filter out original documents covered by pre-extracted JSONs + +**Result**: ❌ **DUPLICATES CREATED** - Both documents processed, same content extracted twice + +### Visual Flow Comparison + +#### Document Generation Path (WITH Filtering - CURRENT) +``` +documentList: [original_pdf_123, pre_extracted_456] + ↓ +[FILTERING] Identify relationships, remove originals + ↓ +filteredDocuments: [pre_extracted_456] ← original_pdf_123 removed + ↓ +extractAndPrepareContent([pre_extracted_456]) + ↓ +ContentParts from pre_extracted_456 only + ↓ +✅ NO DUPLICATES +``` + +#### Data Extraction Path (WITHOUT Filtering - CURRENT) +``` +documentList: [original_pdf_123, pre_extracted_456] + ↓ +[NO FILTERING] Pass all documents + ↓ +extractAndPrepareContent([original_pdf_123, pre_extracted_456]) + ↓ +Process original_pdf_123 → ContentParts +Process pre_extracted_456 → ContentParts + ↓ +❌ DUPLICATES (same content twice) +``` + +#### Data Extraction Path (WITH Filtering - TARGET) +``` +documentList: [original_pdf_123, pre_extracted_456] + ↓ +[FILTERING] Identify relationships, remove originals + ↓ +filteredDocuments: [pre_extracted_456] ← original_pdf_123 removed + ↓ +extractAndPrepareContent([pre_extracted_456]) + ↓ +ContentParts from pre_extracted_456 only + ↓ +✅ NO DUPLICATES +``` + +### Solution + 
+**Target State**: Add filtering to Data Extraction Path, then remove redundant check + +**Steps**: +1. **Add filtering logic to `_handleDataExtraction`** (between line 708 and line 721) + - Copy filtering code from `documentPath.py` lines 62-87 + - Filter out original documents covered by pre-extracted JSONs +2. **Remove redundant check from `extractAndPrepareContent()`** (line 77) + - Trust that filtering is done upstream + - Cleaner code, single responsibility + +**Risk Assessment**: +- **If we remove redundant check WITHOUT adding filtering**: ⚠️ Duplicates still occur (no change from current state) +- **If we add filtering THEN remove redundant check**: ✅ No duplicates, cleaner code + +### Conclusion + +1. **Filtering is necessary** because it can look at ALL documents and identify relationships +2. **Redundant check is insufficient** because it only looks at ONE document at a time +3. **Current state**: Document Generation Path filters → safe. Data Extraction Path doesn't filter → duplicates possible +4. **Solution**: Add filtering to Data Extraction Path, then remove redundant check (it's not needed if filtering is done) +5. **Risk of removing redundant check**: None IF filtering is added first. High IF filtering is NOT added (but duplicates already exist anyway) + +--- + +## Appendix: Pre-Extracted JSON Document Check Locations + +### Where the Check is Done + +**1. Phase 1 (Before Intent Clarification)**: +- **File**: `gateway/modules/services/serviceGeneration/paths/documentPath.py` +- **Lines**: 67-87 +- **Purpose**: Filter documents before intent analysis +- **Method**: `self.services.ai.intentAnalyzer.resolvePreExtractedDocument(doc)` +- **Action**: Identifies pre-extracted JSONs and filters out original documents covered by them + +**2. 
Phase 2 (During Content Extraction)**: +- **File**: `gateway/modules/services/serviceAi/subContentExtraction.py` +- **Line**: 77 +- **Purpose**: Process each document during the extraction loop +- **Method**: `self.intentAnalyzer.resolvePreExtractedDocument(document)` +- **Action**: Extracts ContentParts from the pre-extracted JSON (instead of treating it as a regular JSON file) +- **Note**: ⚠️ **REDUNDANT** - This check happens again even though Phase 1 already filtered documents +- **Reason**: `extractAndPrepareContent()` is called from multiple code paths: + - Document generation path (`documentPath.py`) - filtering already done + - Data extraction path (`mainServiceAi.py`) - filtering is not done currently + - The extraction service needs to handle pre-extracted JSONs defensively +- **Optimization Opportunity**: Could pass filtered documents or a flag to skip redundant checks + +**3. Check Implementation**: +- **File**: `gateway/modules/services/serviceAi/subDocumentIntents.py` +- **Line**: 122 +- **Method**: `resolvePreExtractedDocument(document: ChatDocument)` +- **Logic**: + - Checks if `mimeType == "application/json"` + - Parses JSON and checks for `validationMetadata.actionType == "context.extractContent"` + - Extracts `ContentExtracted` structure from `documentData` + - Returns dict with `originalDocument` and `contentExtracted` info + +### Where Final Merged List is Available + +**After Phase 2 (Content Extraction)**: +- **File**: `gateway/modules/services/serviceGeneration/paths/documentPath.py` +- **Line**: 119 +- **Code**: `contentParts = preparedContentParts` +- **State**: + - ✅ All pre-extracted JSON documents processed → ContentParts + - ✅ All regular documents extracted → ContentParts + - ✅ All provided contentParts merged + - ✅ Final clean merged list ready for Phase 3 (Structure Generation) + +**Before Phase 3 (Structure Generation)**: +- **File**: `gateway/modules/services/serviceGeneration/paths/documentPath.py` +- **Line**: 129 +- **Usage**: `contentParts or []` passed to 
`generateStructure()` +- **Note**: This is the clean merged list containing all ContentParts from all sources + +--- + +## Appendix: Intent Mapping Logic for Pre-Extracted JSONs + +### How Intent Mapping Works + +**Problem**: When a pre-extracted JSON document is provided, we need to: +1. Analyze intents for the **original document** (not the JSON file itself) +2. Map the intents back to the **JSON document ID** (so they can be applied to the ContentParts extracted from the JSON) + +### Implementation Logic (Already in `clarifyDocumentIntents`) + +**Location**: `gateway/modules/services/serviceAi/subDocumentIntents.py` lines 63-104 + +**Step 1: Build Mapping** (lines 63-83) +```python +documentMapping = {} # Maps original doc ID → JSON doc ID +resolvedDocuments = [] + +for doc in documents: + preExtracted = self.resolvePreExtractedDocument(doc) + if preExtracted: + # This is a pre-extracted JSON + originalDocId = preExtracted["originalDocument"]["id"] + jsonDocId = doc.id # Current document is the JSON + + # Map: original doc ID → JSON doc ID + documentMapping[originalDocId] = jsonDocId + + # Create temporary ChatDocument for original document + originalDoc = ChatDocument( + id=originalDocId, + fileName=preExtracted["originalDocument"]["fileName"], + mimeType=preExtracted["originalDocument"]["mimeType"], + # ... 
other fields from preExtracted["originalDocument"] + ) + resolvedDocuments.append(originalDoc) # Use original doc for intent analysis + else: + resolvedDocuments.append(doc) # Regular document, use as-is +``` + +**Result**: +- `documentMapping = {"original_pdf_123": "pre_extracted_456"}` +- `resolvedDocuments = [ChatDocument(id="original_pdf_123"), ChatDocument(id="other_doc")]` + +**Step 2: AI Analyzes Intents** (line 86) +```python +# AI analyzes intents for resolvedDocuments (original documents, not JSONs) +intentPrompt = self._buildIntentAnalysisPrompt(userPrompt, resolvedDocuments, actionParameters) +aiResponse = await self.aiService.callAiPlanning(prompt=intentPrompt, ...) +``` + +**AI Response**: +```json +{ + "intents": [ + { + "documentId": "original_pdf_123", // ← Original document ID + "intents": ["extract"], + "extractionPrompt": "Extract all text", + "reasoning": "..." + } + ] +} +``` + +**Step 3: Map Intents Back to JSON Doc IDs** (lines 96-104) +```python +intentsData = json.loads(self.services.utils.jsonExtractString(aiResponse)) +documentIntents = [] + +for intent in intentsData.get("intents", []): + docId = intent.get("documentId") # "original_pdf_123" + + # If intent is for an original document covered by a pre-extracted JSON + if docId in documentMapping: + # Map back to JSON document ID + intent["documentId"] = documentMapping[docId] # "pre_extracted_456" + + documentIntents.append(DocumentIntent(**intent)) +``` + +**Result**: +- `DocumentIntent(documentId="pre_extracted_456", intents=["extract"], ...)` +- Intent is now mapped to the JSON document ID, so it can be applied to ContentParts extracted from the JSON + +### Why This Works + +1. **AI analyzes original documents**: More meaningful context (file name, MIME type, etc.) +2. **Intents mapped to JSON IDs**: ContentParts extracted from JSON can be tagged with correct intents +3. 
**Consistent with filtering**: Original documents are filtered out, but their intents are preserved via mapping + +### Example Flow + +``` +Input: +- documentList: [original_pdf_123.pdf, pre_extracted_456.json] + +Step 1: Filtering (Phase 1) +- Identify: pre_extracted_456.json covers original_pdf_123.pdf +- Filter: Remove original_pdf_123.pdf +- Result: documents = [pre_extracted_456.json] + +Step 2: Intent Mapping (Phase 1) +- Build mapping: {"original_pdf_123": "pre_extracted_456"} +- Resolve: resolvedDocuments = [ChatDocument(id="original_pdf_123")] +- AI analyzes: intents for "original_pdf_123" +- Map back: intents for "pre_extracted_456" + +Step 3: Content Extraction (Phase 2) +- Extract ContentParts from pre_extracted_456.json +- Apply intents (from Step 2) to ContentParts +- Result: ContentParts with correct intents +``` + +--- + +## Implementation Notes + +### Infrastructure Available + +The following infrastructure already exists and should be reused: + +- **Language Validation**: `currentUserLanguage` is validated at `workflowManager.py:695-727` - always valid 2-character ISO code. Access via `self.services.currentUserLanguage` or `_getUserLanguage()` method. + +- **Format Validation**: Renderer registry exists at `mainServiceGeneration.py:529` (`_getFormatRenderer()` uses `getRenderer()`). Import: `from modules.services.serviceGeneration.renderers.registry import getRenderer`. Returns None if format invalid, falls back to text renderer. + +- **Language Extraction**: `_getDocumentLanguage()` works correctly at `subStructureFilling.py:349` - extracts per-document language from structure. Used properly during section generation. + +### Key Implementation Points + +1. **Per-Document Format/Language**: Multiple documents can have different formats and languages. AI determines these from user prompt. Parameters are only validation fallbacks. + +2. **Filtering**: Must filter pre-extracted JSONs before content extraction to prevent duplicate ContentParts. 
Filtering logic exists in `documentPath.py:62-87` and should be copied to data extraction path. + +3. **State 3 Validation**: Use existing infrastructure (`getRenderer()`, `_getUserLanguage()`) for validation. Infrastructure exists, just needs to be called. + +4. **Rendering**: Extract per-document `outputFormat` and `language` from structure (validated in State 3). Check `outputFormat` field first, then `format` field (legacy), then global fallback. + +--- + +## Appendix: Validation Failure Handling Decisions + +This appendix documents the decision-making process for how to handle each validation failure. The actual implementation code is integrated into Section 3 above. + +### Approach +- **Try to fix automatically** (use defaults) when validation fails +- **All validations are critical** (must not fail - fix or error) +- **Validation happens inline** in each phase method + +### State 1: After Intent Clarification + +#### Validation 1.1: Intent count mismatch +**Check**: `len(documentIntents) != len(documents)` +**Decision**: Documents without intents are OK. Intents for non-existing documents should be skipped. +**Rationale**: Not all documents need intents (some may be reference-only). Intents referencing unknown documents are invalid and should be removed. + +#### Validation 1.2: Intent references unknown document +**Check**: `intent.documentId not in documentIds` +**Decision**: Skip this intent (remove it) +**Rationale**: Cannot map intent to non-existent document. Better to skip than fail. + +--- + +### State 2: After Content Extraction + +#### Validation 2.1: ContentPart missing documentId +**Check**: `not part.metadata.get("documentId")` +**Decision**: Skip this ContentPart (remove it) with warning in logger +**Rationale**: ContentPart without documentId cannot be properly assigned. Skip with warning for debugging. 
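A minimal sketch of the Validation 2.1 decision (skip with a logged warning) might look like the following. The `Part` dataclass and the function name `dropPartsWithoutDocumentId` are hypothetical stand-ins for illustration; the real `ContentPart` class presumably carries more fields.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


@dataclass
class Part:
    """Hypothetical stand-in for ContentPart; only `metadata` matters here."""
    label: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def dropPartsWithoutDocumentId(parts: List[Part]) -> List[Part]:
    """Keep parts that carry a documentId; warn about and skip the rest."""
    kept: List[Part] = []
    for part in parts:
        # Same check as Validation 2.1: a missing or empty documentId fails
        if not part.metadata.get("documentId"):
            logger.warning("Skipping ContentPart without documentId: %s", part.label)
            continue
        kept.append(part)
    return kept
```

Because the check uses `.get()`, an empty-string `documentId` is treated the same as a missing one.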
+
+#### Validation 2.2: ContentPart has invalid contentFormat
+**Check**: `contentFormat not in ["extracted", "object", "reference"]`
+**Decision**: Skip this ContentPart (remove it) with warning in logger
+**Rationale**: Invalid contentFormat indicates corrupted data. Skip with warning for debugging.
+
+---
+
+### State 3: After Structure Generation
+
+#### Validation 3.1: Structure missing 'documents' field
+**Check**: `"documents" not in chapterStructure`
+**Decision**: Stop with error (cannot auto-fix - structure is invalid)
+**Rationale**: Structure without documents field is fundamentally broken. Cannot proceed.
+
+#### Validation 3.2: Structure has no documents
+**Check**: `len(documents) == 0`
+**Decision**: Stop with error (cannot generate without documents)
+**Rationale**: Cannot generate output without documents. Must have at least one document.
+
+#### Validation 3.3: Document missing 'outputFormat' field
+**Check**: `"outputFormat" not in doc`
+**Decision**: Use global fallback format (from parameters), if not available use default "txt"
+**Rationale**: Format is required for rendering. Use fallback chain: per-document → global → default.
+
+#### Validation 3.4: Document has invalid outputFormat
+**Check**: `outputFormat not in valid formats`
+**Decision**: Use renderer registry to check if format has a renderer. If no renderer exists, try global fallback, then default "txt"
+**Rationale**: Use dynamic renderer registry (not hardcoded list) to check format validity. Fallback chain ensures we always have a valid format.
+
+#### Validation 3.5: Document missing 'language' field
+**Check**: `"language" not in doc`
+**Decision**: Use user prompt language (from `self.services.currentUserLanguage` via `_getUserLanguage()`), not "en" fallback
+**Rationale**: Language is required for content generation. Use user prompt language (detected from user intention analysis) as fallback, not hardcoded "en".
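The fallback chains in Validations 3.3 to 3.5 can be sketched roughly as below. This is an illustration under stated assumptions: `hasRenderer` is a hypothetical callable standing in for a check such as `getRenderer(fmt) is not None`, and `resolveOutputFormat` / `resolveLanguage` are names invented here, not functions from the codebase.

```python
from typing import Any, Callable, Dict, Optional

DEFAULT_FORMAT = "txt"  # final fallback named in Validation 3.3


def resolveOutputFormat(
    doc: Dict[str, Any],
    globalFormat: Optional[str],
    hasRenderer: Callable[[str], bool],
) -> str:
    """Walk the chain per-document → global → default, keeping the first renderable format."""
    for candidate in (doc.get("outputFormat"), globalFormat, DEFAULT_FORMAT):
        if candidate and hasRenderer(candidate):
            return candidate
    # Reached only if even the default has no renderer (assumed not to happen)
    return DEFAULT_FORMAT


def resolveLanguage(doc: Dict[str, Any], userLanguage: str) -> str:
    """Fall back to the validated user language, never to a hardcoded 'en'."""
    if "language" not in doc:
        return userLanguage
    return doc["language"]
```

Passing the registry lookup in as a callable keeps the sketch independent of the real renderer registry, so a hardcoded format list is never needed.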
+
+#### Validation 3.6: Document has invalid language
+**Check**: `len(doc["language"]) != 2`
+**Decision**: Use validated `currentUserLanguage` (always valid, validated during user intention analysis)
+**Rationale**: `currentUserLanguage` is validated during user intention analysis and is always a valid 2-character ISO 639-1 code. Safe to use directly.
+
+#### Validation 3.7: Document missing 'chapters' field
+**Check**: `"chapters" not in doc`
+**Decision**: Stop with error (cannot auto-fix - document structure invalid)
+**Rationale**: Document without chapters is structurally invalid. Cannot proceed.
+
+#### Validation 3.8: Chapter missing 'contentParts' field
+**Check**: `"contentParts" not in chapter`
+**Decision**: Stop with error (cannot auto-fix - chapter structure invalid)
+**Rationale**: Chapter without contentParts field is structurally invalid. Cannot proceed.
+
+---
+
+### State 4: After Structure Filling
+
+#### Validation 4.1: Filled structure missing 'documents' field
+**Check**: `"documents" not in filledStructure`
+**Decision**: Stop with error (cannot auto-fix - structure is invalid)
+**Rationale**: Structure without documents field is fundamentally broken. Cannot proceed.
+
+#### Validation 4.2: Section missing 'elements' field
+**Check**: `"elements" not in section`
+**Decision**: Create empty elements list: `section["elements"] = []`
+**Rationale**: Section can be intentionally empty. Create empty list to maintain structure.
+
+#### Validation 4.3: Section has empty elements list
+**Check**: `not section["elements"]` (empty list)
+**Decision**: Allow empty elements (section might be intentionally empty)
+**Rationale**: Empty sections are valid (e.g., placeholder sections). No action needed.
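Validations 4.1 to 4.3 combine one hard stop with one automatic repair, which could be sketched as follows. The nesting is an assumption: this sketch expects sections directly under each document (`documents[i].sections`), and the function name `validateFilledStructure` is hypothetical.

```python
from typing import Any, Dict


def validateFilledStructure(filledStructure: Dict[str, Any]) -> Dict[str, Any]:
    """Stop on a missing 'documents' field; repair sections that lack 'elements'."""
    # Validation 4.1: not auto-fixable, so raise instead of guessing
    if "documents" not in filledStructure:
        raise ValueError("Filled structure is missing the 'documents' field")

    for doc in filledStructure["documents"]:
        # Assumed layout: sections sit directly under each document
        for section in doc.get("sections", []):
            # Validation 4.2: create the list if absent.
            # Validation 4.3: an existing empty list is valid and left untouched.
            section.setdefault("elements", [])
    return filledStructure
```

`setdefault` only acts when the key is absent, which mirrors the `"elements" not in section` check exactly.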
+
+#### Validation 4.4: Document missing 'language' field in filled structure
+**Check**: `"language" not in doc` (in filledStructure)
+**Decision**: Stop with error (language MUST be preserved from Phase 3)
+**Rationale**: Language is validated and set in Phase 3 (State 3). If missing in filled structure, it's a critical error - language must be preserved.
+
+#### Validation 4.5: Document has invalid language format in filled structure
+**Check**: `not isinstance(doc["language"], str) or len(doc["language"]) != 2`
+**Decision**: Stop with error (language format MUST be valid)
+**Rationale**: Language format is validated in Phase 3 (State 3). If invalid in filled structure, it's a critical error.
+
+---
+
+### State 5: After Document Rendering
+
+#### Validation 5.1: No documents rendered
+**Check**: `len(renderedDocuments) == 0`
+**Decision**: Stop with error (already implemented in documentPath.py line 176)
+**Rationale**: Cannot return empty result. Error already implemented.
+
+#### Validation 5.2: Rendered document has empty documentData
+**Check**: `not doc.documentData`
+**Decision**: Skip this document (remove from list)
+**Rationale**: Empty document is not useful. Skip it rather than fail entire operation.
+
+#### Validation 5.3: Rendered document missing mimeType
+**Check**: `not doc.mimeType`
+**Decision**: Infer mimeType from filename extension
+**Rationale**: mimeType can be inferred from filename. Use utility function to detect.
diff --git a/implementation/implementation_taskintentions_done.md b/implementation/implementation_taskintentions_done.md
new file mode 100644
index 0000000..29cb810
--- /dev/null
+++ b/implementation/implementation_taskintentions_done.md
@@ -0,0 +1,1591 @@
+# Task Intentions & Generic Looping System - Refactoring Architecture
+
+## Executive Summary
+
+This document outlines a comprehensive refactoring to enhance the generation system with:
+1. **AI Service-Level Intent Detection**: Detect intent (document vs code) when `DATA_GENERATE` operation is called - workflow level remains unchanged
+2. **Generic Looping System**: Parametrized looping infrastructure supporting different JSON formats and use cases
+3. **Multiple Generation Paths**: Document, code, and image generation paths within the generation service, all unified as action result documents
+4. **Smart Code Generation**: Multi-file projects with dependency handling, requirements.txt/package.json generation, and proper cross-file references
+
+---
+
+## Part 1: AI Service-Level Intent Detection
+
+### 1.1 Current State
+
+**Problem**:
+- `DATA_GENERATE` operation type is used for both document and code generation
+- No distinction at AI service level - always routes to document generation pipeline
+- Code generation requests treated as document generation
+- `IMAGE_GENERATE` already works correctly (no changes needed)
+
+**Current Flow**:
+```
+User Request
+    ↓
+Task Planning (unchanged)
+    ↓
+Action Planning (selects ai.process)
+    ↓
+ai.process → callAiContent(operationType=DATA_GENERATE)
+    ↓
+Document Generation Pipeline (always) ❌ Wrong for code!
+```
+
+**Key Insight**:
+- **Workflow level (task/action planning)**: Remains unchanged ✅
+- **AI Service level**: Need to detect intent when `DATA_GENERATE` is called
+- **Operation Types**:
+  - `IMAGE_GENERATE` → Already handles images correctly ✅
+  - `DATA_GENERATE` → Needs to split: document vs code
+
+**Current Issue with `ai.process`**:
+- `ai.process` creates `AiCallOptions(resultFormat=output_format)` - **no operationType set**
+- `callAiContent()` defaults to `DATA_GENERATE` if operationType not set (line 623)
+- If `resultType="png"` or `"jpg"` → still uses `DATA_GENERATE`, NOT `IMAGE_GENERATE` ❌
+- Image generation requests go through document pipeline instead of image pipeline
+
+**Solution**: Detect image generation intent and set `operationType=IMAGE_GENERATE` when appropriate
+
+### 1.2 Proposed Architecture
+
+#### Intent Detection at AI Service Level
+
+**Location**: `gateway/modules/services/serviceAi/mainServiceAi.py`, `callAiContent()` method
+
+**Principle**: When `DATA_GENERATE` operation is called, detect from prompt/content whether it's:
+- **Document generation**: Reports, articles, formatted documents (existing behavior)
+- **Code generation**: Executable code files (new behavior)
+
+**No changes needed**:
+- Task planning (remains unchanged)
+- Action planning (remains unchanged)
+- `IMAGE_GENERATE` operation (already works)
+
+#### Intent Detection Logic
+
+**NO AUTO-DETECTION**: Intent detection is NOT used in the new architecture.
+
+**Architecture Principle**:
+- **NO auto-detection**: Actions must explicitly provide `generationIntent`
+- **Clear use cases**: Each action defines its intent explicitly
+- **No fallback**: No fallback to old processing or detection logic
+- **Fail fast**: If `generationIntent` is missing, raise error immediately
+- **Explicit over implicit**: All intent must be explicitly specified - no guessing or inference
+- **Format detection vs Intent detection**:
+  - ✅ **Format detection is acceptable**: Detecting image formats from explicit `resultType` parameter (e.g., "png", "jpg") is acceptable because it's based on an explicit parameter, not prompt analysis
+  - ❌ **Intent detection is NOT acceptable**: Detecting intent from prompt content or other inferred sources is not allowed - intent must be explicit
+
+**Implementation**:
+- All actions must pass explicit `generationIntent` parameter
+- `callAiContent()` requires `generationIntent` for `DATA_GENERATE` operations
+- No IntentDetector class needed - intent comes from action definition
+- Image generation detection: `ai.process` detects image formats from `resultType` and sets `operationType=IMAGE_GENERATE` automatically (this is format detection based on explicit parameter, not intent detection from prompt)
+
+#### AI Service Integration
+
+**Modify**: `mainServiceAi.py` - `callAiContent()` method
+
+```python
+async def callAiContent(
+    self,
+    prompt: str,
+    options: Optional[AiCallOptions] = None,
+    documentList: Optional[DocumentReferenceList] = None,
+    contentParts: Optional[List[ContentPart]] = None,
+    outputFormat: Optional[str] = None,
+    title: Optional[str] = None,
+    parentOperationId: Optional[str] = None,
+    generationIntent: Optional[str] = None  # NEW: Explicit intent from action (no auto-detection)
+) -> AiResponse:
+    """
+    Unified AI content generation with explicit intent requirement.
+
+    Args:
+        generationIntent: REQUIRED explicit intent ("document" | "code" | "image") from action.
+            NO auto-detection - actions must explicitly specify intent.
+    """
+    options = options or AiCallOptions()
+    operationType = options.operationType or OperationTypeEnum.DATA_GENERATE
+
+    # Route based on operation type
+    if operationType == OperationTypeEnum.IMAGE_GENERATE:
+        # Image generation - already works correctly, no changes needed
+        return await self._handleImageGeneration(prompt, options, outputFormat)
+
+    elif operationType == OperationTypeEnum.DATA_GENERATE:
+        # Data generation - REQUIRES explicit generationIntent
+        if not generationIntent:
+            raise ValueError(
+                "generationIntent is required for DATA_GENERATE operation. "
+                "Actions must explicitly specify 'document' or 'code' intent. "
+                "No auto-detection - use qualified actions (ai.generateDocument, ai.generateCode)."
+            )
+
+        # Route based on explicit intent (no auto-detection, no fallback)
+        if generationIntent == "code":
+            # Route to code generation path
+            return await self._handleCodeGeneration(
+                prompt=prompt,
+                options=options,
+                contentParts=contentParts,
+                outputFormat=outputFormat,
+                title=title,
+                parentOperationId=parentOperationId
+            )
+        else:
+            # Route to document generation path (existing behavior)
+            return await self._handleDocumentGeneration(
+                prompt=prompt,
+                options=options,
+                documentList=documentList,
+                contentParts=contentParts,
+                outputFormat=outputFormat,
+                title=title,
+                parentOperationId=parentOperationId
+            )
+
+    # Other operation types (DATA_ANALYSE, DATA_EXTRACT, etc.) - existing logic
+    # ...
+```
+
+#### Generation Path Handlers
+
+**New Methods in `mainServiceAi.py`**:
+
+```python
+async def _handleCodeGeneration(
+    self,
+    prompt: str,
+    options: AiCallOptions,
+    contentParts: Optional[List[ContentPart]],
+    outputFormat: str,
+    title: str,
+    parentOperationId: Optional[str]
+) -> AiResponse:
+    """Handle code generation using code generation path."""
+    from modules.services.serviceGeneration.paths.codePath import CodeGenerationPath
+
+    codePath = CodeGenerationPath(self.services)
+    return await codePath.generateCode(
+        userPrompt=prompt,
+        outputFormat=outputFormat,
+        contentParts=contentParts
+    )
+
+async def _handleDocumentGeneration(
+    self,
+    prompt: str,
+    options: AiCallOptions,
+    documentList: Optional[DocumentReferenceList],
+    contentParts: Optional[List[ContentPart]],
+    outputFormat: str,
+    title: str,
+    parentOperationId: Optional[str]
+) -> AiResponse:
+    """Handle document generation using existing document path."""
+    # Existing document generation logic (unchanged)
+    # ...
+```
+
+#### Action Integration
+
+**Enhancement**: Actions can pass explicit `generationIntent` to skip detection
+
+**1. Enhance `ai.generateDocument` Action**
+
+**Modify**: `generateDocument.py`
+
+```python
+async def generateDocument(self, parameters: Dict[str, Any]) -> ActionResult:
+    """Generate documents - explicitly sets intent to 'document'."""
+    # ... existing code ...
+
+    aiResponse: AiResponse = await self.services.ai.callAiContent(
+        prompt=prompt,
+        options=options,
+        documentList=docRefList,
+        outputFormat=resultType,
+        title=title,
+        parentOperationId=parentOperationId,
+        generationIntent="document"  # NEW: Explicit intent, skips detection
+    )
+
+    # ... rest of method ...
+```
+
+**2. Create New `ai.generateCode` Action**
+
+**New File**: `generateCode.py`
+
+```python
+@action
+async def generateCode(self, parameters: Dict[str, Any]) -> ActionResult:
+    """
+    Generate code files - explicitly sets intent to 'code'.
+
+    Parameters:
+    - prompt (str, required): Description of code to generate
+    - documentList (list, optional): Reference documents
+    - resultType (str, optional): Output format (html, js, py, etc.). Default: based on prompt
+    """
+    prompt = parameters.get("prompt")
+    if not prompt:
+        return ActionResult.isFailure(error="prompt is required")
+
+    documentList = parameters.get("documentList", [])
+    resultType = parameters.get("resultType")
+
+    # Auto-detect format from prompt if not provided
+    if not resultType:
+        promptLower = prompt.lower()
+        if ".html" in promptLower or "html file" in promptLower:
+            resultType = "html"
+        elif ".js" in promptLower or "javascript" in promptLower:
+            resultType = "js"
+        elif ".py" in promptLower or "python" in promptLower:
+            resultType = "py"
+        else:
+            resultType = "txt"  # Default
+
+    # Prepare title
+    title = "Generated Code"
+
+    # Call AI service with explicit code intent
+    options = AiCallOptions(
+        operationType=OperationTypeEnum.DATA_GENERATE,
+        priority=PriorityEnum.BALANCED,
+        processingMode=ProcessingModeEnum.DETAILED
+    )
+
+    # NOTE: docRefList and parentOperationId are not defined in this excerpt;
+    # presumably they are prepared the same way as in generateDocument.
+    aiResponse: AiResponse = await self.services.ai.callAiContent(
+        prompt=prompt,
+        options=options,
+        documentList=docRefList,
+        outputFormat=resultType,
+        title=title,
+        parentOperationId=parentOperationId,
+        generationIntent="code"  # Explicit intent, skips detection
+    )
+
+    # Convert to ActionResult (same as generateDocument)
+    # ...
+```
+
+**3. Enhance `ai.process` Action**
+
+**Modify**: `process.py` - Detect image generation from resultType, require generationIntent for DATA_GENERATE
+
+**Important**: Image format detection (png, jpg, etc.) is **format detection**, not intent detection. This is acceptable because it's based on explicit `resultType` parameter, not prompt analysis.
+
+```python
+async def process(self, parameters: Dict[str, Any]) -> ActionResult:
+    """Universal AI document processing action."""
+    # ... existing code ...
+
+    # Detect image generation from resultType (format detection, not intent detection)
+    # This is acceptable because resultType is an explicit parameter, not inferred from prompt
+    resultType = parameters.get("resultType", "txt")
+    normalized_result_type = (str(resultType).strip().lstrip('.').lower() or "txt")
+    imageFormats = ["png", "jpg", "jpeg", "gif", "webp"]
+    isImageGeneration = normalized_result_type in imageFormats
+
+    # Build options with correct operationType
+    output_format = normalized_result_type.replace('.', '') or 'txt'
+    options = AiCallOptions(
+        resultFormat=output_format,
+        operationType=OperationTypeEnum.IMAGE_GENERATE if isImageGeneration else OperationTypeEnum.DATA_GENERATE
+    )
+
+    # Get generationIntent from parameters (REQUIRED for DATA_GENERATE)
+    generationIntent = parameters.get("generationIntent")
+
+    # For DATA_GENERATE, generationIntent is REQUIRED (no auto-detection, no fallback)
+    if options.operationType == OperationTypeEnum.DATA_GENERATE and not generationIntent:
+        raise ValueError(
+            "ai.process called with DATA_GENERATE but no generationIntent. "
+            "Use qualified actions (ai.generateDocument, ai.generateCode) instead, "
+            "or explicitly pass generationIntent parameter."
+        )
+
+    # ... existing code ...
+
+    # Pass generationIntent to callAiContent (REQUIRED for DATA_GENERATE)
+    if contentParts:
+        aiResponse = await self.services.ai.callAiContent(
+            prompt=aiPrompt,
+            options=options,
+            contentParts=contentParts,
+            outputFormat=output_format,
+            parentOperationId=operationId,
+            generationIntent=generationIntent  # REQUIRED for DATA_GENERATE
+        )
+    else:
+        aiResponse = await self.services.ai.callAiContent(
+            prompt=aiPrompt,
+            options=options,
+            documentList=documentList,
+            outputFormat=output_format,
+            parentOperationId=operationId,
+            generationIntent=generationIntent  # REQUIRED for DATA_GENERATE
+        )
+
+    # ... rest of method ...
+```
+
+**Behavior**:
+- If `resultType` is image format (png, jpg, etc.) → Sets `operationType=IMAGE_GENERATE` ✅
+- For `DATA_GENERATE`: `generationIntent` is REQUIRED (no auto-detection, no fallback)
+- If `generationIntent` not provided → Raises ValueError (fail fast)
+- **Best Practice**: Use qualified actions (`ai.generateDocument`, `ai.generateCode`) instead of `ai.process`
+
+**Rationale**:
+- `ai.process` detects image generation from `resultType` and sets correct operationType
+- For DATA_GENERATE, explicit intent is required - no auto-detection, no fallback
+- Wrapper actions (`translateDocument`, `summarizeDocument`) must pass explicit `generationIntent`
+- Clear use cases - no ambiguity
+
+**4. `ai.translateDocument` and `ai.summarizeDocument` Actions**
+
+**Current**: Both wrap `ai.process()` with specific prompts
+**Enhancement**: Pass `generationIntent="document"` when calling `process()` internally
+
+**Modify**: `translateDocument.py` and `summarizeDocument.py`
+
+```python
+# In translateDocument.py
+processParams = {
+    "aiPrompt": aiPrompt,
+    "documentList": documentList,
+    "generationIntent": "document"  # NEW: Explicit intent
+}
+if resultType:
+    processParams["resultType"] = resultType
+return await self.process(processParams)
+
+# In summarizeDocument.py
+return await self.process({
+    "aiPrompt": aiPrompt,
+    "documentList": documentList,
+    "resultType": resultType,
+    "generationIntent": "document"  # NEW: Explicit intent
+})
+```
+
+**Summary**:
+
+| Action | generationIntent | Behavior |
+|--------|------------------|----------|
+| `ai.generateDocument` | `"document"` | Explicit intent, skips detection ✅ |
+| `ai.generateCode` | `"code"` | Explicit intent, skips detection ✅ |
+| `ai.translateDocument` | `"document"` | Explicit intent (via process) ✅ |
+| `ai.summarizeDocument` | `"document"` | Explicit intent (via process) ✅ |
+| `ai.process` | REQUIRED | Must provide `generationIntent` for DATA_GENERATE, raises error if missing ❌ |
+
+**Benefits**:
+- **Efficiency**: Qualified actions skip detection (saves AI call)
+- **Clarity**: Intent is explicit in action name
+- **No Ambiguity**: Always clear use case - no auto-detection, no fallback
+- **Consistency**: All actions must explicitly define intent
+
+**Critical Requirements**:
+- **NO auto-detection**: `callAiContent()` requires explicit `generationIntent` for DATA_GENERATE
+- **NO fallback**: No fallback to old processing logic - raises error if intent missing
+- **Clear use cases**: Always explicit - no ambiguity
+- **Use qualified actions**: Prefer `ai.generateDocument`, `ai.generateCode` over generic `ai.process`
+- **Fail fast**: Missing `generationIntent` raises ValueError immediately
+
+---
+
+## Part 2: Generic Looping System
+
+### 2.1 Current State
+
+**Current System**: `subAiCallLooping.py`
+- Handles different JSON formats through early detection
+- Format-specific routing (elements, chapters, sections)
+- Continuation context built for sections (not generic)
+- No parametrized configuration
+
+**Issues**:
+- Hard-coded format detection
+- Continuation context mismatch for different formats
+- No accumulation support for all formats
+- Not easily extensible for new formats
+
+### 2.2 Proposed Generic Looping System
+
+#### Looping Use Case Configuration
+
+**New Class**: `LoopingUseCase`
+
+```python
+@dataclass
+class LoopingUseCase:
+    """Configuration for a specific looping use case."""
+
+    # Identification
+    useCaseId: str  # "section_content", "chapter_structure", "document_structure", "code_structure", "code_content", "image_batch"
+
+    # JSON Format Detection
+    jsonTemplate: Dict[str, Any]  # Expected JSON structure template
+    detectionKeys: List[str]  # Keys to check for format detection (e.g., ["elements"], ["chapters"], ["files"])
+    detectionPath: str  # JSONPath to check (e.g., "documents[0].chapters", "files[0].content")
+
+    # Prompt Building
+    initialPromptBuilder: Callable  # Function to build initial prompt
+    continuationPromptBuilder: Callable  # Function to build continuation prompt
+
+    # Accumulation & Merging
+    accumulator: Optional[Callable] = None  # Function to accumulate fragments
+    merger: Optional[Callable] = None  # Function to merge accumulated data
+
+    # Continuation Context
+    continuationContextBuilder: Optional[Callable] = None  # Build continuation context for this format
+
+    # Result Building
+    resultBuilder: Optional[Callable] = None  # Build final result from accumulated data
+
+    # Metadata
+    supportsAccumulation: bool = True  # Whether this use case supports accumulation
+    requiresExtraction: bool = False  # Whether this requires extraction (like sections)
+```
+
+#### Use Case Registry
+
+**New Module**: `gateway/modules/services/serviceAi/subLoopingUseCases.py`
+
+```python
+class LoopingUseCaseRegistry:
+    """Registry of all looping use cases."""
+
+    def __init__(self):
+        self.useCases: Dict[str, LoopingUseCase] = {}
+        self._registerDefaultUseCases()
+
+    def register(self, useCase: LoopingUseCase):
+        """Register a new use case."""
+        self.useCases[useCase.useCaseId] = useCase
+
+    def get(self, useCaseId: str) -> Optional[LoopingUseCase]:
+        """Get use case by ID."""
+        return self.useCases.get(useCaseId)
+
+    def detectUseCase(self, parsedJson: Dict[str, Any]) -> Optional[str]:
+        """Detect which use case matches the JSON structure."""
+        for useCaseId, useCase in self.useCases.items():
+            if self._matchesFormat(parsedJson, useCase):
+                return useCaseId
+        return None
+
+    def _matchesFormat(self, parsedJson: Dict[str, Any], useCase: LoopingUseCase) -> bool:
+        """Check if JSON matches use case format."""
+        for key in useCase.detectionKeys:
+            if key in parsedJson:
+                return True
+
+        # Check nested path
+        if useCase.detectionPath:
+            try:
+                from jsonpath_ng import parse
+                jsonpath_expr = parse(useCase.detectionPath)
+                matches = [match.value for match in jsonpath_expr.find(parsedJson)]
+                if matches:
+                    return True
+            except Exception:
+                # Invalid path or missing library: treat as "no match"
+                pass
+
+        return False
+
+    def _registerDefaultUseCases(self):
+        """Register default use cases."""
+
+        # Use Case 1: Section Content Generation
+        self.register(LoopingUseCase(
+            useCaseId="section_content",
+            jsonTemplate={"elements": []},
+            detectionKeys=["elements"],
+            detectionPath="",
+            initialPromptBuilder=buildSectionContentPrompt,
+            continuationPromptBuilder=buildSectionContentContinuationPrompt,
+            accumulator=None,  # Direct return, no accumulation
+            merger=None,
+            continuationContextBuilder=buildSectionContinuationContext,
+            resultBuilder=None,  # Return JSON directly
+            supportsAccumulation=False,
+            requiresExtraction=False
+        ))
+
+        # Use Case 2: Chapter Structure Generation
+        self.register(LoopingUseCase(
+            useCaseId="chapter_structure",
+            jsonTemplate={"documents": [{"chapters": []}]},
+            detectionKeys=["chapters"],
+            detectionPath="documents[0].chapters",
+            initialPromptBuilder=buildChapterStructurePrompt,
+            continuationPromptBuilder=buildChapterStructureContinuationPrompt,
+            accumulator=None,  # Direct return, no accumulation
+            merger=None,
+            continuationContextBuilder=buildChapterContinuationContext,
+            resultBuilder=None,  # Return JSON directly
+            supportsAccumulation=False,
+            requiresExtraction=False
+        ))
+
+        # Use Case 3: Document Structure Generation
+        self.register(LoopingUseCase(
+            useCaseId="document_structure",
+            jsonTemplate={"documents": [{"sections": []}]},
+            detectionKeys=["sections"],
+            detectionPath="documents[0].sections",
+            initialPromptBuilder=buildDocumentStructurePrompt,
+            continuationPromptBuilder=buildDocumentStructureContinuationPrompt,
+            accumulator=accumulateDocumentSections,
+            merger=mergeDocumentSections,
+            continuationContextBuilder=buildDocumentContinuationContext,
+            resultBuilder=buildDocumentResultFromSections,
+            supportsAccumulation=True,
+            requiresExtraction=True
+        ))
+
+        # Use Case 4: Code Structure Generation (NEW)
+        self.register(LoopingUseCase(
+            useCaseId="code_structure",
+            jsonTemplate={
+                "metadata": {
+                    "language": "",
+                    "projectType": "single_file|multi_file",
+                    "projectName": ""
+                },
+                "files": [
+                    {
+                        "id": "",
+                        "filename": "",
+                        "fileType": "",
+                        "dependencies": [],  # List of file IDs this file depends on
+                        "imports": [],  # List of import statements (for dependency extraction)
+                        "functions": [],  # Function signatures for cross-file references
+                        "classes": []  # Class definitions for cross-file references
+                    }
+                ]
+            },
+            detectionKeys=["files"],
+            detectionPath="files",
+            initialPromptBuilder=buildCodeStructurePrompt,
+            continuationPromptBuilder=buildCodeStructureContinuationPrompt,
+            accumulator=None,  # Direct return
+            merger=None,
+            continuationContextBuilder=buildCodeContinuationContext,
+            resultBuilder=None,
+            supportsAccumulation=False,
+            requiresExtraction=False
+        ))
+
+        # Use Case 5: Code Content Generation (NEW)
+        self.register(LoopingUseCase(
+            useCaseId="code_content",
+            jsonTemplate={"files": [{"content": "", "functions": []}]},
+            detectionKeys=["content", "functions"],
+            detectionPath="files[0].content",
+            initialPromptBuilder=buildCodeContentPrompt,
+            continuationPromptBuilder=buildCodeContentContinuationPrompt,
+            accumulator=accumulateCodeContent,
+            merger=mergeCodeContent,
+            continuationContextBuilder=buildCodeContentContinuationContext,
+            resultBuilder=buildCodeResultFromContent,
+            supportsAccumulation=True,
+            requiresExtraction=False
+        ))
+
+        # Use Case 6: Image Batch Generation (NEW)
+        self.register(LoopingUseCase(
+            useCaseId="image_batch",
+            jsonTemplate={"images": []},
+            detectionKeys=["images"],
+            detectionPath="images",
+            initialPromptBuilder=buildImageBatchPrompt,
+            continuationPromptBuilder=buildImageBatchContinuationPrompt,
+            accumulator=None,  # Direct return
+            merger=None,
+            continuationContextBuilder=buildImageContinuationContext,
+            resultBuilder=None,
+            supportsAccumulation=False,
+            requiresExtraction=False
+        ))
+```
+
+#### Refactored Looping System
+
+**Refactor**: `subAiCallLooping.py`
+
+```python
+class AiCallLooper:
+    """Generic looping system with parametrized use cases."""
+
+    def __init__(self, services, aiService, responseParser):
+        self.services = services
+        self.aiService = aiService
+        self.responseParser = responseParser
+        self.useCaseRegistry = LoopingUseCaseRegistry()
+
+    async def callAiWithLooping(
+        self,
+        prompt: str,
+        options: AiCallOptions,
+        useCaseId: str,  # REQUIRED: Explicit use case ID
+        debugPrefix: str = "ai_call",
+        promptArgs: Optional[Dict[str, Any]] = None,
+        operationId: Optional[str] = None,
+        userPrompt: Optional[str] = None,
+        contentParts: Optional[List[ContentPart]] = None
+    ) -> str:
+        """
+        Generic looping system with parametrized use case.
+
+        Args:
+            useCaseId: REQUIRED explicit use case ID (e.g., "code_structure", "document_structure", "section_content")
+            promptArgs: Optional arguments for prompt builders
+            ... (other args)
+        """
+        maxIterations = 50
+        iteration = 0
+        accumulatedData = {}  # Generic accumulation (replaces allSections)
+        lastRawResponse = None
+
+        # Get use case (REQUIRED - no auto-detection)
+        useCase = self.useCaseRegistry.get(useCaseId)
+        if not useCase:
+            raise ValueError(f"Use case '{useCaseId}' not found in registry. 
Available use cases: {list(self.useCaseRegistry.useCases.keys())}") + + while iteration < maxIterations: + iteration += 1 + + # Build prompt using use case + if iteration == 1: + # Initial prompt + currentPrompt = useCase.initialPromptBuilder( + prompt=prompt, + **promptArgs or {} + ) + else: + # Continuation prompt + continuationContext = None + if useCase.continuationContextBuilder: + continuationContext = useCase.continuationContextBuilder( + accumulatedData, + lastRawResponse + ) + + currentPrompt = useCase.continuationPromptBuilder( + prompt=prompt, + continuationContext=continuationContext, + **promptArgs or {} + ) + + # Make AI call + result = await self._makeAiCall(currentPrompt, options, iteration, operationId, debugPrefix) + lastRawResponse = result + + # Process response based on use case + processedResult, isComplete, shouldContinue = await self._processUseCaseResponse( + result, + useCase, + accumulatedData, + iteration, + debugPrefix + ) + + if not shouldContinue: + return processedResult + + # Max iterations reached + logger.warning(f"Max iterations ({maxIterations}) reached") + return accumulatedData.get("finalResult", lastRawResponse) + + async def _processUseCaseResponse( + self, + result: str, + useCase: LoopingUseCase, + accumulatedData: Dict[str, Any], + iteration: int, + debugPrefix: str + ) -> Tuple[str, bool, bool]: + """Process response according to use case configuration.""" + + # Parse JSON + extractedJson = extractJsonString(result) + parsedJson, parseError, _ = tryParseJson(extractedJson) + + if parseError: + # JSON parsing failed - continue + return result, False, True + + # Check if use case requires extraction + if useCase.requiresExtraction: + # Extract data (e.g., sections from document structure) + extracted = self._extractData(parsedJson, useCase) + accumulatedData.setdefault("extracted", []).extend(extracted) + + # Check completeness + isComplete = self._isJsonComplete(parsedJson, useCase) + + # Accumulate if supported + if 
useCase.supportsAccumulation and useCase.accumulator: + accumulatedData = useCase.accumulator(accumulatedData, parsedJson, iteration) + + # Merge if supported + if useCase.merger and accumulatedData.get("extracted"): + accumulatedData["merged"] = useCase.merger(accumulatedData["extracted"], iteration) + + # Build result if complete + if isComplete: + if useCase.resultBuilder: + finalResult = useCase.resultBuilder(accumulatedData, useCase) + else: + # Direct return + finalResult = json.dumps(parsedJson, indent=2, ensure_ascii=False) + + accumulatedData["finalResult"] = finalResult + return finalResult, True, False + + # Not complete - continue + return result, False, True +``` + +--- + +## Part 3: Multiple Generation Paths + +### 3.1 Current State + +**Current**: Single document generation path in `serviceGeneration` + +**Structure**: +``` +serviceGeneration/ +├── mainServiceGeneration.py +├── subStructureGeneration.py (chapter structure) +├── subStructureFilling.py (section structure + content) +└── renderers/ (document rendering) +``` + +### 3.2 Proposed Multi-Path Architecture + +#### Enhanced Generation Service Structure + +``` +serviceGeneration/ +├── mainServiceGeneration.py # Main entry point, routes by intent +├── paths/ +│ ├── documentPath.py # Document generation path +│ ├── codePath.py # Code generation path (NEW) +│ ├── imagePath.py # Image generation path (NEW) +│ ├── videoPath.py # Video generation path (FUTURE) +│ └── audioPath.py # Audio generation path (FUTURE) +├── shared/ +│ ├── subStructureGeneration.py # Shared structure generation (if applicable) +│ ├── subContentGeneration.py # Shared content generation (if applicable) +│ └── subPromptBuilder.py # Shared prompt builders +└── renderers/ # Format-specific renderers + ├── document/ # Document renderers (existing) + ├── code/ # Code renderers (NEW) + └── image/ # Image renderers (NEW) +``` + +#### Main Service Entry Point + +**Refactor**: `mainServiceGeneration.py` + +```python +class 
GenerationService: + """Main generation service with multiple paths.""" + + def __init__(self, services): + self.services = services + self.documentPath = DocumentGenerationPath(services) + self.codePath = CodeGenerationPath(services) + self.imagePath = ImageGenerationPath(services) + # Future: videoPath, audioPath + + async def generate( + self, + userPrompt: str, + generationIntent: str, # "document" | "code" | "image" (detected at AI service level) + documentList: Optional[DocumentReferenceList] = None, + contentParts: Optional[List[ContentPart]] = None, + outputFormat: str = None, + **kwargs + ) -> AiResponse: + """ + Main entry point - routes to appropriate generation path. + + Args: + generationIntent: Intent detected at AI service level ("document" | "code" | "image") + + Returns: AiResponse with documents list (unified format) + """ + # Route to appropriate path based on generationIntent + if generationIntent == "code": + return await self.codePath.generateCode( + userPrompt=userPrompt, + contentParts=contentParts, + outputFormat=outputFormat, + **kwargs + ) + + elif generationIntent == "image": + return await self.imagePath.generateImages( + userPrompt=userPrompt, + outputFormat=outputFormat, + **kwargs + ) + + elif generationIntent == "document": + return await self.documentPath.generateDocument( + userPrompt=userPrompt, + documentList=documentList, + contentParts=contentParts, + outputFormat=outputFormat, + **kwargs + ) + + # Future paths... 
+ else: + raise ValueError(f"Unsupported generationIntent: {generationIntent}") +``` + +#### Document Generation Path (Existing, Refactored) + +**File**: `paths/documentPath.py` + +```python +class DocumentGenerationPath: + """Document generation path (existing functionality, refactored).""" + + def __init__(self, services): + self.services = services + + async def generateDocument( + self, + userPrompt: str, + documentList: Optional[DocumentReferenceList] = None, + contentParts: Optional[List[ContentPart]] = None, + outputFormat: str = "txt", + **kwargs + ) -> AiResponse: + """ + Generate document using existing chapter/section model. + + Returns: AiResponse with documents list + """ + # Phase 1: Chapter structure generation (with looping) + chapterStructure = await self._generateChapterStructure( + userPrompt=userPrompt, + contentParts=contentParts, + outputFormat=outputFormat + ) + + # Phase 2: Section structure generation (parallel) + sectionStructure = await self._generateSectionStructures(chapterStructure) + + # Phase 3: Content generation (with looping, parallel) + filledStructure = await self._generateContent(sectionStructure) + + # Phase 4: Rendering + renderedDocuments = await self._renderDocuments(filledStructure, outputFormat) + + # Return unified format + # Assumption: title and filename are supplied by the caller via kwargs (None if absent) + return AiResponse( + documents=renderedDocuments, + content=None, + metadata=AiResponseMetadata(title=kwargs.get("title"), filename=kwargs.get("filename")) + ) +``` + +#### Code Generation Path (NEW) + +**File**: `paths/codePath.py` + +```python +class CodeGenerationPath: + """Code generation path.""" + + def __init__(self, services): + self.services = services + + async def generateCode( + self, + userPrompt: str, + language: Optional[str] = None, + fileTypes: Optional[List[str]] = None, + projectType: str = "single_file", + outputFormat: Optional[str] = None, + **kwargs + ) -> AiResponse: + """ + Generate code files. 
+ + Returns: AiResponse with code files as documents + """ + # Phase 1: Code structure generation (with looping) + codeStructure = await self._generateCodeStructure( + userPrompt=userPrompt, + language=language, + fileTypes=fileTypes, + projectType=projectType + ) + + # Phase 2: Code content generation (with looping, parallel per file) + codeFiles = await self._generateCodeContent(codeStructure) + + # Phase 3: Code formatting & validation + formattedFiles = await self._formatAndValidateCode(codeFiles) + + # Convert to unified document format + documents = [] + for file in formattedFiles: + documents.append(DocumentData( + documentName=file["filename"], + documentData=file["content"].encode('utf-8'), + mimeType=self._getMimeType(file["fileType"]), + sourceJson=file + )) + + return AiResponse( + documents=documents, + content=None, + metadata=AiResponseMetadata(title="Generated Code", filename=None) + ) + + async def _generateCodeStructure( + self, + userPrompt: str, + language: str, + fileTypes: List[str], + projectType: str + ) -> Dict[str, Any]: + """Generate code structure using looping system.""" + prompt = buildCodeStructurePrompt( + userPrompt=userPrompt, + language=language, + fileTypes=fileTypes, + projectType=projectType + ) + + # Use generic looping system with code_structure use case + structureJson = await self.services.ai._callAiWithLooping( + prompt=prompt, + options=AiCallOptions(operationType=OperationTypeEnum.DATA_GENERATE), + useCaseId="code_structure", # Use parametrized use case + debugPrefix="code_structure_generation", + promptArgs={ + "userPrompt": userPrompt, + "language": language, + "fileTypes": fileTypes + } + ) + + return json.loads(structureJson) + + async def _generateCodeContent( + self, + codeStructure: Dict[str, Any] + ) -> List[Dict[str, Any]]: + """Generate code content for each file with dependency handling.""" + files = codeStructure.get("files", []) + metadata = codeStructure.get("metadata", {}) + + # Step 1: Resolve dependency 
order + orderedFiles = self._resolveDependencyOrder(files) + + # Step 2: Generate dependency files first (requirements.txt, package.json, etc.) + dependencyFiles = await self._generateDependencyFiles(metadata, orderedFiles) + + # Step 3: Generate code files in dependency order (not fully parallel) + codeFiles = [] + generatedFileContext = {} # Track what's been generated for cross-file references + + for fileStructure in orderedFiles: + # Provide context about already-generated files for proper imports + fileContext = self._buildFileContext(generatedFileContext, fileStructure) + + # Generate this file with context + fileContent = await self._generateSingleFileContent( + fileStructure, + fileContext=fileContext, + allFilesStructure=orderedFiles + ) + + codeFiles.append(fileContent) + + # Update context with generated file info (for next files) + generatedFileContext[fileStructure["id"]] = { + "filename": fileContent.get("filename"), + "functions": fileContent.get("functions", []), + "classes": fileContent.get("classes", []), + "exports": fileContent.get("exports", []) + } + + # Combine dependency files and code files + return dependencyFiles + codeFiles + + def _resolveDependencyOrder(self, files: List[Dict[str, Any]]) -> List[Dict[str, Any]]: + """Resolve file generation order based on dependencies.""" + # Build dependency graph + fileMap = {f["id"]: f for f in files} + dependencies = {} + + for file in files: + fileId = file["id"] + deps = file.get("dependencies", []) # List of file IDs this file depends on + dependencies[fileId] = deps + + # Topological sort + ordered = [] + visited = set() + tempMark = set() + + def visit(fileId: str): + if fileId in tempMark: + # Circular dependency detected - break it + logger.warning(f"Circular dependency detected involving {fileId}") + return + if fileId in visited: + return + + tempMark.add(fileId) + for depId in dependencies.get(fileId, []): + if depId in fileMap: + visit(depId) + tempMark.remove(fileId) + 
visited.add(fileId) + ordered.append(fileMap[fileId]) + + for file in files: + if file["id"] not in visited: + visit(file["id"]) + + return ordered + + async def _generateDependencyFiles( + self, + metadata: Dict[str, Any], + files: List[Dict[str, Any]] + ) -> List[Dict[str, Any]]: + """Generate dependency files (requirements.txt, package.json, etc.).""" + language = metadata.get("language", "").lower() + dependencyFiles = [] + + # Extract all dependencies from files + allDependencies = set() + for file in files: + fileDeps = file.get("dependencies", []) + if isinstance(fileDeps, list): + allDependencies.update(fileDeps) + + # Generate requirements.txt for Python + if language in ["python", "py"]: + requirementsContent = await self._generateRequirementsTxt(files, allDependencies) + if requirementsContent: + dependencyFiles.append({ + "filename": "requirements.txt", + "content": requirementsContent, + "fileType": "txt", + "id": "requirements_txt" + }) + + # Generate package.json for JavaScript/TypeScript + elif language in ["javascript", "typescript", "js", "ts"]: + packageJson = await self._generatePackageJson(files, allDependencies, metadata) + if packageJson: + dependencyFiles.append({ + "filename": "package.json", + "content": json.dumps(packageJson, indent=2), + "fileType": "json", + "id": "package_json" + }) + + return dependencyFiles + + async def _generateRequirementsTxt( + self, + files: List[Dict[str, Any]], + dependencies: set + ) -> str: + """Generate requirements.txt content.""" + # Extract Python imports from file structures + pythonPackages = set() + for file in files: + imports = file.get("imports", []) + if isinstance(imports, list): + for imp in imports: + # Extract package name from import (e.g., "from flask import" -> "flask") + if isinstance(imp, str): + # Simple extraction - can be enhanced + if "import" in imp: + parts = imp.split("import") + if len(parts) > 0: + package = parts[0].strip().split("from")[-1].strip() + if package and not 
package.startswith("."): + pythonPackages.add(package) + + # Generate requirements.txt (empty string if no packages were found) + if pythonPackages: + return "\n".join(sorted(pythonPackages)) + return "" + + async def _generatePackageJson( + self, + files: List[Dict[str, Any]], + dependencies: set, + metadata: Dict[str, Any] + ) -> Optional[Dict[str, Any]]: + """Generate package.json content.""" + # Extract npm packages from file structures + npmPackages = {} + for file in files: + imports = file.get("imports", []) + if isinstance(imports, list): + for imp in imports: + # Extract npm package (e.g., "import express from 'express'" -> "express") + if isinstance(imp, str) and ("from" in imp or "require" in imp): + # Simple extraction - can be enhanced + package = "" + if "require(" in imp: + # CommonJS: const express = require('express') + package = imp.split("require(", 1)[1].split(")", 1)[0] + elif "from" in imp: + # ES module: take the text after the last "from" + package = imp.rsplit("from", 1)[1] + # Drop whitespace, a trailing semicolon, and surrounding quotes + package = package.strip().rstrip(";").strip().strip("'\"") + if package and not package.startswith("."): + npmPackages[package] = "*" # Default version + + if npmPackages: + return { + "name": metadata.get("projectName", "generated-project"), + "version": "1.0.0", + "dependencies": npmPackages + } + return None + + def _buildFileContext( + self, + generatedFileContext: Dict[str, Dict[str, Any]], + currentFile: Dict[str, Any] + ) -> Dict[str, Any]: + """Build context about other files for proper imports/references.""" + context = { + "availableFiles": [], + "availableFunctions": {}, + "availableClasses": {} + } + + # Add info about already-generated files + for fileId, fileInfo in generatedFileContext.items(): + context["availableFiles"].append({ + "id": fileId, + "filename": fileInfo["filename"], + "functions": fileInfo.get("functions", []), + "classes": fileInfo.get("classes", []), + "exports": fileInfo.get("exports", []) + }) + + # Build function/class maps for easy lookup + for func in fileInfo.get("functions", []): + funcName = func.get("name", "") + if funcName: + context["availableFunctions"][funcName] = { + "file": fileInfo["filename"], + "signature": func.get("signature", "") + } + + for cls in 
fileInfo.get("classes", []): + className = cls.get("name", "") + if className: + context["availableClasses"][className] = { + "file": fileInfo["filename"] + } + + return context + + async def _generateSingleFileContent( + self, + fileStructure: Dict[str, Any], + fileContext: Optional[Dict[str, Any]] = None, + allFilesStructure: Optional[List[Dict[str, Any]]] = None + ) -> Dict[str, Any]: + """Generate code content for a single file with context about other files.""" + # Build prompt with context about other files for proper imports + prompt = buildCodeContentPrompt( + fileStructure, + fileContext=fileContext, + allFilesStructure=allFilesStructure + ) + + # Use generic looping system with code_content use case + contentJson = await self.services.ai._callAiWithLooping( + prompt=prompt, + options=AiCallOptions(operationType=OperationTypeEnum.DATA_GENERATE), + useCaseId="code_content", # Use parametrized use case + debugPrefix=f"code_content_{fileStructure['id']}", + promptArgs={ + "fileStructure": fileStructure, + "fileContext": fileContext, + "allFilesStructure": allFilesStructure + } + ) + + parsed = json.loads(contentJson) + + # Flatten the first generated file entry; guard against a missing or empty "files" list + generatedFiles = parsed.get("files") or [{}] + generated = generatedFiles[0] + + # Merge with the planned structure so id, filename and fileType are available to callers, + # and keep function/class info at the top level for context building + return { + **fileStructure, + **generated, + "functions": generated.get("functions", []), + "classes": generated.get("classes", []) + } +``` + +#### Image Generation Path (NEW) + +**File**: `paths/imagePath.py` + +```python +class ImageGenerationPath: + """Image generation path.""" + + def __init__(self, services): + self.services = services + + async def generateImages( + self, + userPrompt: str, + count: int = 1, + style: Optional[str] = None, + format: str = "png", + **kwargs + ) -> AiResponse: + """ + Generate image files. 
+ + Returns: AiResponse with image files as documents + """ + # Phase 1: Image prompt generation (if multiple images) + if count > 1: + imagePrompts = await self._generateImagePrompts(userPrompt, count, style) + else: + imagePrompts = [userPrompt] + + # Phase 2: Generate images (parallel) + images = await self._generateImagesParallel(imagePrompts, format) + + # Convert to unified document format + documents = [] + for i, imageData in enumerate(images): + documents.append(DocumentData( + documentName=f"image_{i+1}.{format}", + documentData=imageData, # Already bytes + mimeType=f"image/{format}", + sourceJson={"prompt": imagePrompts[i], "index": i} + )) + + return AiResponse( + documents=documents, + content=None, + metadata=AiResponseMetadata(title="Generated Images", filename=None) + ) + + async def _generateImagesParallel( + self, + imagePrompts: List[str], + format: str + ) -> List[bytes]: + """Generate multiple images in parallel.""" + tasks = [] + for prompt in imagePrompts: + task = self._generateSingleImage(prompt, format) + tasks.append(task) + + images = await asyncio.gather(*tasks) + return images + + async def _generateSingleImage( + self, + prompt: str, + format: str + ) -> bytes: + """Generate a single image.""" + # Use IMAGE_GENERATE operation + request = AiCallRequest( + prompt=prompt, + options=AiCallOptions( + operationType=OperationTypeEnum.IMAGE_GENERATE, + resultFormat="base64" + ) + ) + + response = await self.services.ai.callAi(request) + + # Decode base64 to bytes + import base64 + imageBytes = base64.b64decode(response.content) + return imageBytes +``` + +--- + +## Part 4: Unified Document Output + +### 4.1 Current State + +**Current State**: ✅ All actions already return unified `ActionResult` format with `ActionDocument` objects + +**Note**: The unification needed is at the **AI Service level** (`AiResponse`), not at the action level. Actions already convert `AiResponse` to `ActionResult` consistently. 
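
As a rough illustration of the unified output described in this part, the sketch below wraps a generated code file and a generated image in the same document shape. `DocumentDataSketch`, `guessMimeType`, `codeFileToDocument`, and `imageToDocument` are hypothetical names used only for this example, not the project's actual classes. The MIME lookup relies on Python's standard `mimetypes` module, which maps `jpg` to `image/jpeg`, whereas a plain `image/{format}` string would not.

```python
import mimetypes
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class DocumentDataSketch:
    """Hypothetical stand-in mirroring the fields of DocumentData."""
    documentName: str
    documentData: bytes
    mimeType: str
    sourceJson: Optional[Dict[str, Any]] = None


def guessMimeType(filename: str, fallback: str = "application/octet-stream") -> str:
    """Look up the MIME type from the file extension; use the fallback if unknown."""
    mimeType, _ = mimetypes.guess_type(filename)
    return mimeType or fallback


def codeFileToDocument(file: Dict[str, Any]) -> DocumentDataSketch:
    """Wrap a generated code file; its text content is encoded as UTF-8 bytes."""
    return DocumentDataSketch(
        documentName=file["filename"],
        documentData=file["content"].encode("utf-8"),
        mimeType=guessMimeType(file["filename"], fallback="text/plain"),
        sourceJson=file,
    )


def imageToDocument(imageBytes: bytes, index: int, extension: str, prompt: str) -> DocumentDataSketch:
    """Wrap already-decoded image bytes; the name follows the image_<n>.<ext> pattern."""
    name = f"image_{index + 1}.{extension}"
    return DocumentDataSketch(
        documentName=name,
        documentData=imageBytes,
        mimeType=guessMimeType(name),
        sourceJson={"prompt": prompt, "index": index},
    )
```

Both helpers return the same type, so a caller can collect their results into one `documents` list regardless of which generation path produced them.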
+ +### 4.2 AI Service Level Format + +**Current**: ✅ All AI service paths already return unified `AiResponse` format + +**Format** (already exists): +```python +@dataclass +class DocumentData: + """Unified document data structure (already exists).""" + documentName: str # Filename + documentData: bytes # File content (bytes) + mimeType: str # MIME type (e.g., "text/html", "image/png", "application/pdf") + sourceJson: Optional[Dict[str, Any]] = None # Source JSON structure (if applicable) + +@dataclass +class AiResponse: + """Unified AI response format (already exists).""" + documents: List[DocumentData] # List of generated documents + content: Optional[str] = None # Optional text content + metadata: Optional[AiResponseMetadata] = None +``` + +**Requirement**: Ensure all new generation paths (code, image) return `AiResponse` in this format (same as document path) + +### 4.3 Action Result Integration + +**Current**: ✅ All actions already convert `AiResponse` to `ActionResult` consistently + +**Pattern** (already implemented in all actions): +```python +# All actions follow this pattern (existing code): + +async def execute(self, parameters: Dict[str, Any]) -> ActionResult: + # Call AI service - returns AiResponse + aiResponse = await self.services.ai.callAiContent(...) + + # Convert AiResponse to ActionDocument (unified format) + documents = [] + for docData in aiResponse.documents: + documents.append(ActionDocument( + documentName=docData.documentName, + documentData=docData.documentData, + mimeType=docData.mimeType, + sourceJson=docData.sourceJson + )) + + return ActionResult.isSuccess(documents=documents) # ✅ Already unified +``` + +**Note**: +- ✅ Actions already return unified `ActionResult` format +- ✅ No changes needed at action level +- ✅ Focus: Ensure new AI service paths (code, image) return `AiResponse` consistently + +--- + +## Part 5: Implementation Plan + +### Phase 1: Foundation (Weeks 1-2) + +1. 
**Explicit Intent Requirement at AI Service Level** + - **Note**: No `IntentDetector` class needed - intent comes explicitly from actions + - Integrate `generationIntent` parameter into `callAiContent()` method + - Add `_handleCodeGeneration()` and `_handleDocumentGeneration()` methods + - Update `ai.process` to detect image formats from `resultType` (format detection, not intent detection) + - Require explicit `generationIntent` for all `DATA_GENERATE` operations + - Test with various actions (generateDocument, generateCode, process) + - Verify `IMAGE_GENERATE` still works correctly (no changes) + +2. **Generic Looping System** + - Create `LoopingUseCase` dataclass + - Create `LoopingUseCaseRegistry` + - Register existing use cases (section_content, chapter_structure, document_structure) + - Refactor `subAiCallLooping.py` to use registry + +### Phase 2: Code Generation (Weeks 3-4) + +1. **Code Generation Path** + - Create `paths/codePath.py` + - Implement code structure generation + - Implement code content generation + - Register code use cases in looping registry + - Create `ai.generateCode` action + +2. **Integration** + - Integrate code path into `mainServiceGeneration.py` + - Test code generation end-to-end + - Validate code output quality + +### Phase 3: Image Generation (Weeks 5-6) + +1. **Image Generation Path** + - Create `paths/imagePath.py` + - Implement standalone image generation + - Support batch image generation + - Register image use cases in looping registry + - Create `ai.generateImages` action + +2. **Integration** + - Integrate image path into `mainServiceGeneration.py` + - Test image generation end-to-end + - Validate image output quality + +### Phase 4: Refinement (Weeks 7-8) + +1. **Unified Output** + - Ensure all paths return unified `AiResponse` format + - Standardize action result handling + - Test cross-path compatibility + +2. 
**Documentation & Testing** + - Document all use cases + - Add unit tests for looping system + - Add integration tests for each path + - Performance testing + +--- + +## Part 6: Migration Strategy + +### Clean Implementation + +1. **No Legacy Code**: Remove old prompt builder parameters completely +2. **Clear Use Cases**: All calls must specify explicit `useCaseId` +3. **No Fallback**: Fail fast if use case not found or intent missing + +### Testing Strategy + +1. **Unit Tests**: Test each use case independently +2. **Integration Tests**: Test full generation flows +3. **Use Case Tests**: Test all registered use cases +4. **Performance Tests**: Compare performance before/after + +--- + +## Part 7: Future Extensions + +### Video Generation Path (Future) + +- Similar structure to image path +- Video structure planning (scenes, transitions) +- Frame-by-frame generation +- Video encoding + +### Audio Generation Path (Future) + +- Similar structure to image path +- Text-to-speech generation +- Music generation +- Audio file output + +### Additional Use Cases + +- Easy to add new use cases to registry +- Just register new `LoopingUseCase` configuration +- No changes to core looping system needed + +--- + +## Part 8: Critical Cross-Check + +### 8.1 Codebase Verification + +**✅ Multiple Files Support**: +- Current system already supports multiple documents via `renderReport()` → returns `List[RenderedDocument]` +- HTML renderer creates multiple files (HTML + images) as separate documents +- Code generation path enhanced to generate multiple code files + dependency files + +**✅ Code Generation Intelligence**: + +1. **Dependency Handling**: + - ✅ Code structure includes `dependencies` field (list of file IDs) + - ✅ `_resolveDependencyOrder()` implements topological sort for proper generation order + - ✅ Handles circular dependencies gracefully + - ✅ Files generated sequentially based on dependencies (not fully parallel) + +2. 
**Requirements/Dependencies Files**: + - ✅ `_generateDependencyFiles()` generates: + - `requirements.txt` for Python projects (extracts packages from imports) + - `package.json` for JavaScript/TypeScript projects (extracts npm packages) + - ✅ Dependency files generated BEFORE code files + - ✅ Extracts dependencies from file structures' `imports` field + +3. **Cross-File References**: + - ✅ `_buildFileContext()` provides context about already-generated files + - ✅ Tracks functions, classes, and exports from each file + - ✅ Context passed to each file generation for proper imports + - ✅ `fileContext` includes: + - Available files and their exports + - Function signatures for proper imports + - Class definitions for proper imports + +4. **File Structure Template**: + ```json + { + "metadata": { + "language": "python|javascript|typescript", + "projectType": "single_file|multi_file", + "projectName": "..." + }, + "files": [ + { + "id": "file_1", + "filename": "main.py", + "fileType": "py", + "dependencies": ["file_2"], // File IDs this depends on + "imports": ["from utils import helper"], // For dependency extraction + "functions": [{"name": "main", "signature": "..."}], + "classes": [{"name": "MyClass", "signature": "..."}] + } + ] + } + ``` + +### 8.2 Architecture Validation + +**✅ Smart Enough for Multi-File Projects**: +- ✅ Dependency resolution ensures proper order +- ✅ Requirements.txt/package.json automatically generated +- ✅ Cross-file context enables proper imports/references +- ✅ Function/class tracking enables accurate references +- ✅ Sequential generation with context accumulation + +**✅ Current Codebase Compatibility**: +- ✅ Uses existing `List[RenderedDocument]` pattern +- ✅ Follows existing `AiResponse` → `ActionResult` conversion +- ✅ Compatible with existing document processing pipeline +- ✅ No breaking changes to existing document generation + +**✅ Potential Enhancements** (Future): +- More sophisticated import parsing (AST-based) +- Support for more 
dependency file types (Cargo.toml, go.mod, etc.) +- Parallel generation of independent files (files without dependencies) +- Validation of imports against generated files + +--- + +## Conclusion + +This refactoring provides: + +1. ✅ **AI Service-Level Intent Routing**: Document vs. code generation is routed by the explicit `generationIntent` passed with each `DATA_GENERATE` call - workflow unchanged +2. ✅ **Generic Looping System**: Parametrized, extensible, supports every registered JSON format +3. ✅ **Multiple Generation Paths**: Document, code, image paths (extensible to video/audio) +4. ✅ **Unified Output**: All paths return the same `AiResponse` format, which actions convert to action result documents +5. ✅ **Smart Code Generation**: Multi-file projects with dependencies, `requirements.txt`/`package.json`, and proper cross-file references + +**Benefits**: +- **Minimal Changes**: Workflow level (task/action planning) remains unchanged +- **Correct Level**: Intent is handled at the AI service level, where generation happens +- **Clean Architecture**: Separation of concerns - workflow handles planning, AI service handles generation +- **Easy to Extend**: New intents can be added by adding a generation path and registering its use cases +- **Clear Code**: No legacy code, no deprecated parameters, no fallback logic +- **Isolated Changes**: Changes are confined to the AI service layer +- **Smart Code Generation**: Handles complex multi-file projects with dependencies + +**Next Steps**: +1. Review and approve architecture +2. Start Phase 1 implementation +3. Iterate based on feedback +
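
Section 8.2 lists AST-based import parsing as a possible future enhancement over the string splitting in `_generateRequirementsTxt()`. A minimal sketch of that idea, assuming the generated Python source is available as a string, might look like the following. `extractTopLevelPackages` and `thirdPartyPackages` are hypothetical helper names, and filtering via `sys.stdlib_module_names` assumes Python 3.10 or newer (on older versions the filter is skipped).

```python
import ast
import sys
from typing import Set


def extractTopLevelPackages(source: str) -> Set[str]:
    """Return the top-level package names imported by the given Python source."""
    packages: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b.c" contributes the top-level name "a"
            for alias in node.names:
                packages.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import such as "from . import x"; skip those
            if node.level == 0 and node.module:
                packages.add(node.module.split(".")[0])
    return packages


def thirdPartyPackages(source: str) -> Set[str]:
    """Drop standard-library modules from the extracted names (Python 3.10+)."""
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return {name for name in extractTopLevelPackages(source) if name not in stdlib}
```

Unlike the string-based extraction, this handles plain `import x` statements and ignores relative imports. Import names do not always match distribution names on the package index (for example `yaml` is provided by `PyYAML`), so a mapping step would still be needed before writing `requirements.txt`.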