# Detailed Design: Hierarchical Document Generation with Image Integration

## Table of Contents

1. [Architecture Overview](#architecture-overview)
2. [Data Structures](#data-structures)
3. [Component Design](#component-design)
4. [API Design](#api-design)
5. [Image Handling](#image-handling)
6. [Progress Logging](#progress-logging)
7. [Error Handling](#error-handling)
8. [Performance Considerations](#performance-considerations)
9. [Testing Strategy](#testing-strategy)

## Architecture Overview

### System Flow

```
┌─────────────────────────────────────────────────────────────┐
│  User Request: generateDocument                             │
│  Parameters: prompt, documentList, resultType, etc.         │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│  Phase 1: Structure Generation                              │
│  - Extract content from documentList (if provided)          │
│  - Cache extracted content                                  │
│  - Generate document skeleton with sections                 │
│  - Identify section complexity                              │
│  - Create generation hints                                  │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│  Phase 2: Content Generation (Parallel)                     │
│                                                             │
│  Simple Sections (heading, short paragraph):                │
│  ┌────────────────────────────────────────┐                 │
│  │ Generate content directly via AI       │                 │
│  │ Populate elements array                │                 │
│  └────────────────────────────────────────┘                 │
│                                                             │
│  Complex Sections (image, long chapter):                    │
│  ┌────────────────────────────────────────┐                 │
│  │ Create sub-prompt                      │                 │
│  │ Generate content (text or image)       │                 │
│  │ Store in elements array                │                 │
│  └────────────────────────────────────────┘                 │
│                                                             │
│  Progress Updates:                                          │
│  - "Generating section X/Y..."                              │
│  - "Generating image for section X..."                      │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│  Phase 3: Integration & Rendering                           │
│  - Validate all sections have content                       │
│  - Merge generated content into structure                   │
│  - Replace placeholders with actual data                    │
│  - Render to target format (docx, pdf, html, etc.)          │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│  Final Document(s)                                          │
│  - Single document (docx, pdf, html, etc.)                  │
│  - Or multiple files (html + image files)                   │
└─────────────────────────────────────────────────────────────┘
```

## Data Structures

### Document Structure (Phase 1 Output)

```python
{
    "metadata": {
        "title": str,
        "split_strategy": str,  # "single_document" | "multi_document"
        "source_documents": List[str],
        "extraction_method": str
    },
    "documents": [
        {
            "id": str,
            "title": str,
            "filename": str,
            "sections": [
                {
                    "id": str,
                    "content_type": str,  # "heading" | "paragraph" | "image" | "table" | "bullet_list" | "code_block"
                    "complexity": str,  # "simple" | "complex"
                    "generation_hint": str,
                    "image_prompt": Optional[str],  # Only for image sections
                    "order": int,
                    "elements": [],  # Empty initially, populated in Phase 2
                    "metadata": Optional[Dict[str, Any]]
                }
            ]
        }
    ]
}
```

### Section Content (Phase 2 Output)

**Simple Section (heading)**:

```python
{
    "id": "section_title",
    "content_type": "heading",
    "elements": [
        {
            "level": int,
            "text": str
        }
    ],
    "order": 1
}
```

**Simple Section (paragraph)**:

```python
{
    "id": "section_intro",
    "content_type": "paragraph",
    "elements": [
        {
            "text": str
        }
    ],
    "order": 2
}
```

**Complex Section (image)**:

```python
{
    "id": "section_image_1",
    "content_type": "image",
    "elements": [
        {
            "url": "data:image/png;base64,",  # Data URI prefix; the generator appends base64Data
            "base64Data": str,  # Full base64 encoded image
            "altText": str,
            "caption": Optional[str]
        }
    ],
    "order": 3
}
```

**Error Section**:

```python
{
    "id": "section_failed_4",
    "content_type": "paragraph",
    "elements":
        [{"text": f"[ERROR: Failed to generate content for this section. Error: {error_message}]"}],
    "order": 4,
    "error": True,
    "errorMessage": str,
    "originalContentType": str  # Original content_type that failed
}
```

### Content Cache

```python
{
    "extractedContent": List[ContentPart],  # From extraction service
    "extractionTimestamp": float,
    "sourceDocuments": List[str]  # Document IDs
}
```

### Generation Context

```python
{
    "userPrompt": str,
    "cachedContent": ContentCache,
    "previousSections": List[Dict[str, Any]],  # Already generated sections
    "targetSection": Dict[str, Any],  # Section to generate
    "documentMetadata": Dict[str, Any],
    "operationId": Optional[str]  # Parent operation ID, read by _generateImageSection
}
```

## Component Design

### 1. StructureGenerator

**Purpose**: Generate document skeleton with section placeholders

**Location**: `poweron/gateway/modules/services/serviceGeneration/subStructureGenerator.py`

**Methods**:

```python
class StructureGenerator:
    async def generateStructure(
        self,
        userPrompt: str,
        documentList: Optional[DocumentReferenceList],
        cachedContent: Optional[ContentCache],
        services: Any
    ) -> Dict[str, Any]:
        """
        Generate document structure with sections.

        Returns:
            Document structure with empty elements arrays
        """

    def _createStructurePrompt(
        self,
        userPrompt: str,
        cachedContent: Optional[ContentCache],
        services: Any
    ) -> str:
        """
        Create prompt for structure generation.
        """

    def _identifySectionComplexity(
        self,
        section: Dict[str, Any],
        userPrompt: str
    ) -> str:
        """
        Identify if section is simple or complex.

        Rules:
        - Images: always complex
        - Long chapters (>maxSectionLength words): complex
        - Others: simple
        """

    def _extractImagePrompts(
        self,
        structure: Dict[str, Any],
        userPrompt: str
    ) -> Dict[str, str]:
        """
        Extract image generation prompts from structure and user prompt.
        Maps section_id -> image_prompt
        """
```

### 2. ContentGenerator

**Purpose**: Generate content for each section

**Location**: `poweron/gateway/modules/services/serviceGeneration/subContentGenerator.py`

**Methods**:

```python
class ContentGenerator:
    async def generateContent(
        self,
        structure: Dict[str, Any],
        cachedContent: Optional[ContentCache],
        userPrompt: str,
        services: Any,
        progressCallback: Optional[Callable] = None
    ) -> Dict[str, Any]:
        """
        Generate content for all sections in structure.

        Args:
            structure: Document structure from Phase 1
            cachedContent: Extracted content cache
            userPrompt: Original user prompt
            services: Services instance
            progressCallback: Function to call for progress updates

        Returns:
            Complete document structure with populated elements
        """

    async def _generateSectionContent(
        self,
        section: Dict[str, Any],
        context: GenerationContext,
        services: Any
    ) -> Dict[str, Any]:
        """
        Generate content for a single section.

        Returns:
            Section with populated elements array
        """

    async def _generateSimpleSection(
        self,
        section: Dict[str, Any],
        context: GenerationContext,
        services: Any
    ) -> Dict[str, Any]:
        """
        Generate content for simple section (heading, paragraph).
        """

    async def _generateImageSection(
        self,
        section: Dict[str, Any],
        context: GenerationContext,
        services: Any
    ) -> Dict[str, Any]:
        """
        Generate image for image section.
        Calls ai.generate action with image generation.
        """

    async def _generateComplexTextSection(
        self,
        section: Dict[str, Any],
        context: GenerationContext,
        services: Any
    ) -> Dict[str, Any]:
        """
        Generate content for complex text section (long chapter).
        Uses focused sub-prompt.
        """

    async def _generateSectionsParallel(
        self,
        sections: List[Dict[str, Any]],
        context: GenerationContext,
        services: Any,
        progressCallback: Optional[Callable] = None
    ) -> List[Dict[str, Any]]:
        """
        Generate content for multiple sections in parallel.
        Uses asyncio.gather for parallel execution.
        """

    def _createSectionPrompt(
        self,
        section: Dict[str, Any],
        context: GenerationContext
    ) -> str:
        """
        Create sub-prompt for section content generation.
        """
```

### 3. ContentIntegrator

**Purpose**: Merge generated content and render final document

**Location**: `poweron/gateway/modules/services/serviceGeneration/subContentIntegrator.py`

**Methods**:

```python
class ContentIntegrator:
    def integrateContent(
        self,
        structure: Dict[str, Any],
        generatedSections: List[Dict[str, Any]]
    ) -> Dict[str, Any]:
        """
        Merge generated sections into document structure.

        Returns:
            Complete document structure ready for rendering
        """

    def validateCompleteness(
        self,
        document: Dict[str, Any]
    ) -> Tuple[bool, List[str]]:
        """
        Validate that all sections have content.

        Returns:
            (is_complete, list_of_missing_sections)
        """

    def createErrorSection(
        self,
        originalSection: Dict[str, Any],
        errorMessage: str
    ) -> Dict[str, Any]:
        """
        Create error placeholder section.
        """
```

### 4. Modified generateDocument Action

**Location**: `poweron/gateway/modules/workflows/methods/methodAi/actions/generateDocument.py`

**Changes**:

```python
import logging
import time

logger = logging.getLogger(__name__)


@action
async def generateDocument(self, parameters: Dict[str, Any]) -> ActionResult:
    """
    Generate documents using hierarchical approach.
    """
    # Extract parameters
    prompt = parameters.get("prompt")
    documentList = parameters.get("documentList", [])
    resultType = parameters.get("resultType", "docx")
    # NOTE: maxSectionLength and parallelGeneration are read here but are not
    # yet passed on to the generators below.
    maxSectionLength = parameters.get("maxSectionLength", 500)
    parallelGeneration = parameters.get("parallelGeneration", True)
    progressLogging = parameters.get("progressLogging", True)

    # Create operation ID for progress tracking
    operationId = f"doc_gen_{self.services.workflow.id}_{int(time.time())}"
    parentOperationId = parameters.get('parentOperationId')

    try:
        # Phase 1: Structure Generation
        if progressLogging:
            self.services.chat.progressLogStart(
                operationId,
                "Document",
                "Structure Generation",
                "Generating document structure...",
                parentOperationId=parentOperationId
            )

        structureGenerator = StructureGenerator(self.services)

        # Extract and cache content if documentList provided
        cachedContent = None
        if documentList:
            # Extract content once
            chatDocuments = self.services.chat.getChatDocumentsFromDocumentList(documentList)
            if chatDocuments:
                extractionOptions = ExtractionOptions(
                    prompt="Extract all content from documents",
                    mergeStrategy=MergeStrategy(mergeType="concatenate")
                )
                extractedResults = self.services.extraction.extractContent(
                    chatDocuments,
                    extractionOptions
                )
                cachedContent = {
                    "extractedContent": extractedResults,
                    "extractionTimestamp": time.time(),
                    "sourceDocuments": [doc.id for doc in chatDocuments]
                }

        # Generate structure
        structure = await structureGenerator.generateStructure(
            userPrompt=prompt,
            documentList=documentList,
            cachedContent=cachedContent,
            services=self.services
        )

        if progressLogging:
            self.services.chat.progressLogUpdate(operationId, 0.33, "Structure generated")

        # Phase 2: Content Generation
        if progressLogging:
            self.services.chat.progressLogUpdate(
                operationId,
                0.34,
                "Starting content generation..."
            )

        contentGenerator = ContentGenerator(self.services)

        def progressCallback(sectionIndex: int, totalSections: int, message: str):
            if progressLogging:
                # Guard against an empty structure (totalSections == 0)
                fraction = sectionIndex / totalSections if totalSections else 1.0
                progress = 0.34 + (0.56 * fraction)
                self.services.chat.progressLogUpdate(
                    operationId,
                    progress,
                    f"Section {sectionIndex}/{totalSections}: {message}"
                )

        completeStructure = await contentGenerator.generateContent(
            structure=structure,
            cachedContent=cachedContent,
            userPrompt=prompt,
            services=self.services,
            progressCallback=progressCallback
        )

        if progressLogging:
            self.services.chat.progressLogUpdate(operationId, 0.90, "Content generated")

        # Phase 3: Integration & Rendering
        if progressLogging:
            self.services.chat.progressLogUpdate(
                operationId,
                0.91,
                "Rendering final document..."
            )

        # Use existing renderReport method
        title = structure.get("metadata", {}).get("title", "Generated Document")
        renderedContent, mimeType = await self.services.generation.renderReport(
            extractedContent=completeStructure,
            outputFormat=resultType,
            title=title,
            userPrompt=prompt,
            aiService=self.services.ai
        )

        # Create document (the returned object is not referenced again below;
        # the ActionDocument in the result carries the rendered payload)
        document = self.services.generation._createDocument(
            fileName=f"document.{resultType}",
            mimeType=mimeType,
            content=renderedContent,
            base64encoded=(mimeType not in ["text/plain", "text/html", "text/markdown"]),
            messageId=None
        )

        if progressLogging:
            self.services.chat.progressLogFinish(operationId, True)

        return ActionResult.isSuccess(
            documents=[ActionDocument(
                documentName=f"document.{resultType}",
                documentData=renderedContent,
                mimeType=mimeType
            )]
        )

    except Exception as e:
        logger.error(f"Error in hierarchical document generation: {str(e)}")
        if progressLogging:
            self.services.chat.progressLogFinish(operationId, False)
        return ActionResult.isFailure(error=str(e))
```

## API Design

### Structure Generation Prompt

```python
def _createStructurePrompt(
    self,
    userPrompt: str,
    cachedContent: Optional[ContentCache],
    services: Any
) -> str:
    """
    Create prompt for structure generation.
    """
    prompt = f"""
{'='*80}
USER REQUEST:
{'='*80}
{userPrompt}

{'='*80}
TASK: Generate a document STRUCTURE (skeleton) with sections.
Do NOT generate actual content yet - only the structure.

{'='*80}
EXTRACTED CONTENT (if available):
{'='*80}
{_formatCachedContent(cachedContent) if cachedContent else "No source documents provided."}

{'='*80}
INSTRUCTIONS:
1. Analyze the user request and extracted content
2. Create a document structure with sections
3. For each section, specify:
   - id: Unique identifier
   - content_type: "heading" | "paragraph" | "image" | "table" | "bullet_list" | "code_block"
   - complexity: "simple" (can generate directly) or "complex" (needs sub-prompt)
   - generation_hint: Brief description of what content should be generated
   - image_prompt: (only for image sections) Detailed prompt for image generation
   - order: Section order number
   - elements: [] (empty array - will be populated later)

4. Identify image sections:
   - If user requests illustrations/images, create image sections
   - Add image_prompt field with detailed description
   - Set complexity to "complex"

5. Identify complex text sections:
   - Long chapters (>500 words expected) should be marked as "complex"
   - Short paragraphs/headings should be "simple"

6. Return ONLY valid JSON following this structure:
{{
  "metadata": {{
    "title": "Document Title",
    "split_strategy": "single_document",
    "source_documents": [],
    "extraction_method": "ai_generation"
  }},
  "documents": [
    {{
      "id": "doc_1",
      "title": "Document Title",
      "filename": "document.json",
      "sections": [
        {{
          "id": "section_1",
          "content_type": "heading",
          "complexity": "simple",
          "generation_hint": "Main title",
          "order": 1,
          "elements": []
        }},
        {{
          "id": "section_2",
          "content_type": "image",
          "complexity": "complex",
          "generation_hint": "Illustration for chapter 1",
          "image_prompt": "Detailed description for image generation",
          "order": 2,
          "elements": []
        }}
      ]
    }}
  ]
}}

Return ONLY the JSON structure. No explanations.
"""
    return prompt
```

### Section Content Generation Prompt

```python
def _createSectionPrompt(
    self,
    section: Dict[str, Any],
    context: GenerationContext
) -> str:
    """
    Create sub-prompt for section content generation.
    """
    sectionType = section.get("content_type")
    generationHint = section.get("generation_hint", "")

    prompt = f"""
{'='*80}
SECTION TO GENERATE:
{'='*80}
Type: {sectionType}
Hint: {generationHint}

{'='*80}
CONTEXT:
- User Request: {context.userPrompt}
- Previous Sections: {len(context.previousSections)} sections already generated
- Document Title: {context.documentMetadata.get('title', 'Unknown')}

{'='*80}
EXTRACTED CONTENT (if available):
{'='*80}
{_formatCachedContent(context.cachedContent) if context.cachedContent else "None"}

{'='*80}
TASK: Generate content for this section ONLY.

INSTRUCTIONS:
1. Generate content appropriate for section type: {sectionType}
2. Use the generation hint: {generationHint}
3. Consider previous sections for continuity
4. Use extracted content if relevant
5. Return ONLY the elements array for this section:

For heading:
{{
  "elements": [
    {{"level": 1, "text": "Heading Text"}}
  ]
}}

For paragraph:
{{
  "elements": [
    {{"text": "Paragraph text content"}}
  ]
}}

For image:
{{
  "elements": [
    {{
      "url": "data:image/png;base64,",
      "base64Data": "",
      "altText": "Image description",
      "caption": "Optional caption"
    }}
  ]
}}

Return ONLY the elements array as JSON. No other text.
"""
    return prompt
```

## Image Handling

### Image Generation Flow

```python
async def _generateImageSection(
    self,
    section: Dict[str, Any],
    context: GenerationContext,
    services: Any
) -> Dict[str, Any]:
    """
    Generate image for image section.
    """
    imagePrompt = section.get("image_prompt")
    if not imagePrompt:
        raise ValueError(f"Image section {section.get('id')} missing image_prompt")

    # Call ai.generate action with image generation
    from modules.workflows.methods.methodAi.actions.generate import generate

    generateParams = {
        "prompt": imagePrompt,
        "resultType": "png",
        # Assumes the GenerationContext carries the current operation ID
        "parentOperationId": context.operationId
    }

    result = await generate(self=services.ai, parameters=generateParams)

    if not result.success or not result.documents:
        raise ValueError(f"Image generation failed: {result.error}")

    # Extract base64 image data
    imageDoc = result.documents[0]
    base64Data = imageDoc.documentData

    # Create image element
    section["elements"] = [{
        "url": f"data:image/png;base64,{base64Data}",
        "base64Data": base64Data,
        "altText": section.get("generation_hint", "Image"),
        "caption": section.get("metadata", {}).get("caption")
    }]

    return section
```

### HTML Renderer Image Handling

**Location**: `poweron/gateway/modules/services/serviceGeneration/renderers/rendererHtml.py`

**Changes**:

```python
import base64


async def render(
    self,
    extractedContent: Dict[str, Any],
    title: str,
    userPrompt: str = None,
    aiService=None
) -> Tuple[str, str]:
    """
    Render HTML with separate image files.

    Returns:
        (html_content, mime_type)
    """
    # Generate HTML
    htmlContent = await self._generateHtmlFromJson(...)

    # Extract images and create separate files
    images = self._extractImages(extractedContent)

    if images:
        # Create image files and an HTML variant that links to them
        imageFiles = []
        linkedHtml = htmlContent
        for idx, imageData in enumerate(images):
            base64Data = imageData.get("base64Data")
            if base64Data:
                # Decode base64
                imageBytes = base64.b64decode(base64Data)

                # Create filename
                filename = f"image_{idx + 1}.png"

                # Point the image at a relative path instead of the data URI
                linkedHtml = linkedHtml.replace(
                    f'data:image/png;base64,{base64Data}',
                    filename
                )

                imageFiles.append({
                    "filename": filename,
                    "content": imageBytes,
                    "mimeType": "image/png"
                })

        # Returning linkedHtml + imageFiles requires changing the return type
        # to support multiple files. For now, return the HTML with the base64
        # data still embedded (will be updated in implementation).
        return htmlContent, "text/html"

    return htmlContent, "text/html"


def _extractImages(self, jsonContent: Dict[str, Any]) -> List[Dict[str, Any]]:
    """
    Extract all images from JSON structure.
    """
    images = []

    documents = jsonContent.get("documents", [])
    if not documents:
        # Fall back to a flat structure with top-level sections
        sections = jsonContent.get("sections", [])
        documents = [{"sections": sections}]

    for doc in documents:
        sections = doc.get("sections", [])
        for section in sections:
            if section.get("content_type") == "image":
                elements = section.get("elements", [])
                for element in elements:
                    if element.get("base64Data"):
                        images.append(element)

    return images
```

## Progress Logging

### Progress Stages

```python
PROGRESS_STAGES = {
    "structure_generation": {
        "start": 0.0,
        "end": 0.33,
        "messages": [
            "Extracting content from documents...",
            "Generating document structure...",
            "Structure generated"
        ]
    },
    "content_generation": {
        "start": 0.34,
        "end": 0.90,
        "messages": [
            "Starting content generation...",
            "Generating section {current}/{total}...",
            "Generating image for section {section_id}...",
            "Content generated"
        ]
    },
    "integration_rendering": {
        "start": 0.91,
        "end": 1.0,
        "messages": [
            "Rendering final document...",
            "Document complete"
        ]
    }
}
```

### Progress Callback Implementation

```python
def createProgressCallback(
    operationId: str,
    totalSections: int,
    services: Any
) -> Callable:
    """
    Create progress callback function.
    """
    def progressCallback(
        sectionIndex: int,
        totalSections: int,
        message: str
    ):
        # Calculate progress
        baseProgress = 0.34  # Start of content generation phase
        phaseProgress = 0.56  # Length of content generation phase (0.90 - 0.34)
        # Guard against an empty structure (totalSections == 0)
        fraction = sectionIndex / totalSections if totalSections else 1.0
        sectionProgress = fraction * phaseProgress
        currentProgress = baseProgress + sectionProgress

        # Update progress log
        services.chat.progressLogUpdate(
            operationId,
            currentProgress,
            f"Section {sectionIndex}/{totalSections}: {message}"
        )

    return progressCallback
```

## Error Handling

### Error Section Creation

```python
def createErrorSection(
    originalSection: Dict[str, Any],
    errorMessage: str
) -> Dict[str, Any]:
    """
    Create error placeholder section.
    """
    return {
        "id": originalSection.get("id", "unknown"),
        "content_type": "paragraph",  # Change to paragraph for error display
        "elements": [{
            "text": f"[ERROR: Failed to generate {originalSection.get('content_type', 'content')} for section '{originalSection.get('id', 'unknown')}'. Error: {errorMessage}]"
        }],
        "order": originalSection.get("order", 0),
        "error": True,
        "errorMessage": errorMessage,
        "originalContentType": originalSection.get("content_type")
    }
```

### Error Handling in Content Generation

```python
async def _generateSectionContent(
    self,
    section: Dict[str, Any],
    context: GenerationContext,
    services: Any
) -> Dict[str, Any]:
    """
    Generate content for a single section with error handling.
    """
    try:
        complexity = section.get("complexity", "simple")
        contentType = section.get("content_type")

        if contentType == "image":
            return await self._generateImageSection(section, context, services)
        elif complexity == "complex":
            return await self._generateComplexTextSection(section, context, services)
        else:
            return await self._generateSimpleSection(section, context, services)

    except Exception as e:
        logger.error(f"Error generating section {section.get('id')}: {str(e)}")
        return createErrorSection(section, str(e))
```

## Performance Considerations

### Parallel Generation

```python
async def _generateSectionsParallel(
    self,
    sections: List[Dict[str, Any]],
    context: GenerationContext,
    services: Any,
    progressCallback: Optional[Callable] = None
) -> List[Dict[str, Any]]:
    """
    Generate content for multiple sections in parallel.
    """
    completed = 0

    async def generateWithProgress(section: Dict[str, Any]):
        nonlocal completed
        result = await self._generateSectionContent(section, context, services)
        # Report on completion: reporting at task start would fire every
        # callback immediately, because gather starts all tasks at once.
        completed += 1
        if progressCallback:
            progressCallback(completed, len(sections), f"Completed {section.get('content_type')}")
        return result

    # Generate all sections in parallel
    results = await asyncio.gather(
        *[generateWithProgress(section) for section in sections],
        return_exceptions=True
    )

    # Handle exceptions
    generatedSections = []
    for idx, result in enumerate(results):
        if isinstance(result, Exception):
            logger.error(f"Error generating section {idx}: {str(result)}")
            generatedSections.append(
                createErrorSection(sections[idx], str(result))
            )
        else:
            generatedSections.append(result)

    return generatedSections
```

### Batch Processing for Large Documents

```python
async def generateContent(
    self,
    structure: Dict[str, Any],
    cachedContent: Optional[ContentCache],
    userPrompt: str,
    services: Any,
    progressCallback: Optional[Callable] = None,
    batchSize: int = 10
) -> Dict[str, Any]:
    """
    Generate content with batching for large documents.
    """
    # Build the shared generation context. Assumes GenerationContext accepts
    # its documented fields as keyword arguments.
    context = GenerationContext(
        userPrompt=userPrompt,
        cachedContent=cachedContent,
        previousSections=[],
        targetSection=None,
        documentMetadata=structure.get("metadata", {})
    )

    documents = structure.get("documents", [])

    for doc in documents:
        sections = doc.get("sections", [])
        totalSections = len(sections)

        # Process in batches
        for batchStart in range(0, totalSections, batchSize):
            batch = sections[batchStart:batchStart + batchSize]

            # Report progress relative to the whole document, not the batch
            def batchProgress(indexInBatch: int, _batchTotal: int, message: str, offset: int = batchStart):
                progressCallback(offset + indexInBatch, totalSections, message)

            # Generate batch in parallel
            generatedBatch = await self._generateSectionsParallel(
                batch,
                context,
                services,
                batchProgress if progressCallback else None
            )

            # Update sections
            for idx, generated in enumerate(generatedBatch):
                sections[batchStart + idx] = generated

    return structure
```

## Testing Strategy

### Unit Tests

1. **StructureGenerator Tests**:
   - Test structure generation with/without source documents
   - Test complexity identification
   - Test image prompt extraction

2. **ContentGenerator Tests**:
   - Test simple section generation
   - Test image section generation
   - Test complex text section generation
   - Test parallel generation
   - Test error handling

3. **ContentIntegrator Tests**:
   - Test content merging
   - Test validation
   - Test error section creation

### Integration Tests

1. **End-to-End Tests**:
   - Test complete document generation flow
   - Test with images
   - Test with long documents
   - Test error scenarios

2. **Renderer Tests**:
   - Test HTML renderer with separate image files
   - Test PDF renderer with embedded images
   - Test XLSX/PPTX renderers with images

### Performance Tests

1. **Large Document Tests**:
   - Test with 100+ sections
   - Test parallel generation performance
   - Test memory usage

2. **Image Generation Tests**:
   - Test multiple images
   - Test large images
   - Test image generation failures