Detailed Design: Hierarchical Document Generation with Image Integration
Table of Contents
- Architecture Overview
- Data Structures
- Component Design
- API Design
- Image Handling
- Progress Logging
- Error Handling
- Performance Considerations
- Testing Strategy
Architecture Overview
System Flow
┌─────────────────────────────────────────────────────────────┐
│ User Request: generateDocument                              │
│ Parameters: prompt, documentList, resultType, etc.          │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 1: Structure Generation                               │
│ - Extract content from documentList (if provided)           │
│ - Cache extracted content                                   │
│ - Generate document skeleton with sections                  │
│ - Identify section complexity                               │
│ - Create generation hints                                   │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 2: Content Generation (Parallel)                      │
│                                                             │
│ Simple Sections (heading, short paragraph):                 │
│   ┌────────────────────────────────────────┐                │
│   │ Generate content directly via AI       │                │
│   │ Populate elements array                │                │
│   └────────────────────────────────────────┘                │
│                                                             │
│ Complex Sections (image, long chapter):                     │
│   ┌────────────────────────────────────────┐                │
│   │ Create sub-prompt                      │                │
│   │ Generate content (text or image)       │                │
│   │ Store in elements array                │                │
│   └────────────────────────────────────────┘                │
│                                                             │
│ Progress Updates:                                           │
│ - "Generating section X/Y..."                               │
│ - "Generating image for section X..."                       │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 3: Integration & Rendering                            │
│ - Validate all sections have content                        │
│ - Merge generated content into structure                    │
│ - Replace placeholders with actual data                     │
│ - Render to target format (docx, pdf, html, etc.)           │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│ Final Document(s)                                           │
│ - Single document (docx, pdf, html, etc.)                   │
│ - Or multiple files (html + image files)                    │
└─────────────────────────────────────────────────────────────┘
Data Structures
Document Structure (Phase 1 Output)
{
    "metadata": {
        "title": str,
        "split_strategy": str,  # "single_document" | "multi_document"
        "source_documents": List[str],
        "extraction_method": str
    },
    "documents": [
        {
            "id": str,
            "title": str,
            "filename": str,
            "sections": [
                {
                    "id": str,
                    "content_type": str,  # "heading" | "paragraph" | "image" | "table" | "bullet_list" | "code_block"
                    "complexity": str,  # "simple" | "complex"
                    "generation_hint": str,
                    "image_prompt": Optional[str],  # Only for image sections
                    "order": int,
                    "elements": [],  # Empty initially, populated in Phase 2
                    "metadata": Optional[Dict[str, Any]]
                }
            ]
        }
    ]
}
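A minimal sketch of a Phase 1 sanity check, run after the model's JSON has been parsed and before Phase 2 starts. `validateStructure` is a hypothetical helper name. It checks only rules stated in this document: a title is present, section ids are unique within a document, `content_type` and `complexity` take the listed values, image sections carry an `image_prompt`, and `elements` is still empty.

```python
from typing import Any, Dict, List

# Allowed values, taken from the schema comments above
CONTENT_TYPES = {"heading", "paragraph", "image", "table", "bullet_list", "code_block"}
COMPLEXITIES = {"simple", "complex"}


def validateStructure(structure: Dict[str, Any]) -> List[str]:
    """Return the problems found in a Phase 1 structure; an empty list means valid."""
    problems: List[str] = []
    if not (structure.get("metadata") or {}).get("title"):
        problems.append("metadata.title is missing")
    for doc in structure.get("documents", []):
        seenIds = set()  # Ids must be unique within one document
        for section in doc.get("sections", []):
            sid = section.get("id")
            if not sid:
                problems.append("section without id")
                continue
            if sid in seenIds:
                problems.append(f"{sid}: duplicate id")
            seenIds.add(sid)
            contentType = section.get("content_type")
            if contentType not in CONTENT_TYPES:
                problems.append(f"{sid}: unknown content_type {contentType!r}")
            if section.get("complexity") not in COMPLEXITIES:
                problems.append(f"{sid}: unknown complexity {section.get('complexity')!r}")
            if contentType == "image" and not section.get("image_prompt"):
                problems.append(f"{sid}: image section without image_prompt")
            if section.get("elements"):
                problems.append(f"{sid}: elements must be empty after Phase 1")
    return problems
```

Returning a list of problems instead of raising on the first one makes it possible to log every defect at once, or to send them back to the model in a repair prompt.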
Section Content (Phase 2 Output)
Simple Section (heading):
{
    "id": "section_title",
    "content_type": "heading",
    "elements": [
        {
            "level": int,
            "text": str
        }
    ],
    "order": 1
}
Simple Section (paragraph):
{
    "id": "section_intro",
    "content_type": "paragraph",
    "elements": [
        {
            "text": str
        }
    ],
    "order": 2
}
Complex Section (image):
{
    "id": "section_image_1",
    "content_type": "image",
    "elements": [
        {
            "url": "data:image/png;base64,<base64_data>",
            "base64Data": str,  # Full base64-encoded image
            "altText": str,
            "caption": Optional[str]
        }
    ],
    "order": 3
}
Error Section:
{
    "id": "section_4",  # Keeps the id of the section that failed
    "content_type": "paragraph",  # Always rendered as a paragraph
    "elements": [
        {
            "text": "[ERROR: Failed to generate <originalContentType> for section '<id>'. Error: <errorMessage>]"
        }
    ],
    "order": 4,
    "error": True,
    "errorMessage": str,
    "originalContentType": str  # Original content_type that failed
}
Content Cache
{
    "extractedContent": List[ContentPart],  # From extraction service
    "extractionTimestamp": float,
    "sourceDocuments": List[str]  # Document IDs
}
Generation Context
{
    "userPrompt": str,
    "cachedContent": Optional[ContentCache],  # None when no documentList was provided
    "previousSections": List[Dict[str, Any]],  # Already generated sections
    "targetSection": Dict[str, Any],  # Section to generate
    "documentMetadata": Dict[str, Any],
    "operationId": Optional[str]  # Parent progress-log operation, used by image generation
}
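The code in this document reads the generation context with attribute access (`context.userPrompt`, `context.cachedContent`, `context.operationId`), while `generateDocument` builds the content cache as a plain dict literal. A minimal sketch that serves both usages, assuming `ContentPart` stays opaque (typed as `Any` here): a `TypedDict` for the cache, which is an ordinary dict at runtime, and a dataclass for the context.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, TypedDict


class ContentCache(TypedDict):
    # A plain dict at runtime, so the dict literal built in generateDocument fits
    extractedContent: List[Any]  # ContentPart objects from the extraction service
    extractionTimestamp: float
    sourceDocuments: List[str]  # Document IDs


@dataclass
class GenerationContext:
    # Read with attribute access, e.g. context.userPrompt
    userPrompt: str
    cachedContent: Optional[ContentCache] = None
    # default_factory gives every context its own list/dict instead of a shared one
    previousSections: List[Dict[str, Any]] = field(default_factory=list)
    targetSection: Optional[Dict[str, Any]] = None
    documentMetadata: Dict[str, Any] = field(default_factory=dict)
    operationId: Optional[str] = None  # Parent operation for progress logging
```

The mutable defaults use `field(default_factory=...)`. This keeps contexts created for different batches from sharing one `previousSections` list.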
Component Design
1. StructureGenerator
Purpose: Generate document skeleton with section placeholders
Location: poweron/gateway/modules/services/serviceGeneration/subStructureGenerator.py
Methods:
from __future__ import annotations

from typing import Any, Dict, Optional


class StructureGenerator:
    def __init__(self, services: Any):
        self.services = services

    async def generateStructure(
        self,
        userPrompt: str,
        documentList: Optional[DocumentReferenceList],
        cachedContent: Optional[ContentCache],
        services: Any
    ) -> Dict[str, Any]:
        """
        Generate document structure with sections.

        Returns:
            Document structure with empty elements arrays
        """

    def _createStructurePrompt(
        self,
        userPrompt: str,
        cachedContent: Optional[ContentCache],
        services: Any
    ) -> str:
        """
        Create prompt for structure generation.
        """

    def _identifySectionComplexity(
        self,
        section: Dict[str, Any],
        userPrompt: str
    ) -> str:
        """
        Identify whether a section is simple or complex.

        Rules:
            - Images: always complex
            - Long chapters (more than maxSectionLength words, default 500): complex
            - Others: simple
        """

    def _extractImagePrompts(
        self,
        structure: Dict[str, Any],
        userPrompt: str
    ) -> Dict[str, str]:
        """
        Extract image generation prompts from structure and user prompt.

        Maps section_id -> image_prompt
        """
2. ContentGenerator
Purpose: Generate content for each section
Location: poweron/gateway/modules/services/serviceGeneration/subContentGenerator.py
Methods:
from __future__ import annotations

from typing import Any, Callable, Dict, List, Optional


class ContentGenerator:
    def __init__(self, services: Any):
        self.services = services

    async def generateContent(
        self,
        structure: Dict[str, Any],
        cachedContent: Optional[ContentCache],
        userPrompt: str,
        services: Any,
        progressCallback: Optional[Callable] = None
    ) -> Dict[str, Any]:
        """
        Generate content for all sections in structure.

        Args:
            structure: Document structure from Phase 1
            cachedContent: Extracted content cache
            userPrompt: Original user prompt
            services: Services instance
            progressCallback: Called as progressCallback(sectionIndex, totalSections, message)

        Returns:
            Complete document structure with populated elements
        """

    async def _generateSectionContent(
        self,
        section: Dict[str, Any],
        context: GenerationContext,
        services: Any
    ) -> Dict[str, Any]:
        """
        Generate content for a single section.

        Returns:
            Section with populated elements array
        """

    async def _generateSimpleSection(
        self,
        section: Dict[str, Any],
        context: GenerationContext,
        services: Any
    ) -> Dict[str, Any]:
        """
        Generate content for simple section (heading, paragraph).
        """

    async def _generateImageSection(
        self,
        section: Dict[str, Any],
        context: GenerationContext,
        services: Any
    ) -> Dict[str, Any]:
        """
        Generate image for image section.
        Calls the ai.generate action with image generation.
        """

    async def _generateComplexTextSection(
        self,
        section: Dict[str, Any],
        context: GenerationContext,
        services: Any
    ) -> Dict[str, Any]:
        """
        Generate content for complex text section (long chapter).
        Uses a focused sub-prompt.
        """

    async def _generateSectionsParallel(
        self,
        sections: List[Dict[str, Any]],
        context: GenerationContext,
        services: Any,
        progressCallback: Optional[Callable] = None
    ) -> List[Dict[str, Any]]:
        """
        Generate content for multiple sections concurrently.
        Uses asyncio.gather to run the AI calls concurrently.
        """

    def _createSectionPrompt(
        self,
        section: Dict[str, Any],
        context: GenerationContext
    ) -> str:
        """
        Create sub-prompt for section content generation.
        """
3. ContentIntegrator
Purpose: Merge generated content and render final document
Location: poweron/gateway/modules/services/serviceGeneration/subContentIntegrator.py
Methods:
from typing import Any, Dict, List, Tuple


class ContentIntegrator:
    def integrateContent(
        self,
        structure: Dict[str, Any],
        generatedSections: List[Dict[str, Any]]
    ) -> Dict[str, Any]:
        """
        Merge generated sections into document structure.

        Returns:
            Complete document structure ready for rendering
        """

    def validateCompleteness(
        self,
        document: Dict[str, Any]
    ) -> Tuple[bool, List[str]]:
        """
        Validate that all sections have content.

        Returns:
            (is_complete, list_of_missing_sections)
        """

    def createErrorSection(
        self,
        originalSection: Dict[str, Any],
        errorMessage: str
    ) -> Dict[str, Any]:
        """
        Create error placeholder section.
        """
4. Modified generateDocument Action
Location: poweron/gateway/modules/workflows/methods/methodAi/actions/generateDocument.py
Changes:
import logging
import time
from typing import Any, Dict

logger = logging.getLogger(__name__)


@action
async def generateDocument(self, parameters: Dict[str, Any]) -> ActionResult:
    """
    Generate documents using hierarchical approach.
    """
    # Extract parameters
    prompt = parameters.get("prompt")
    documentList = parameters.get("documentList", [])
    resultType = parameters.get("resultType", "docx")
    # Note: maxSectionLength and parallelGeneration are read here but not yet
    # passed to StructureGenerator / ContentGenerator below.
    maxSectionLength = parameters.get("maxSectionLength", 500)
    parallelGeneration = parameters.get("parallelGeneration", True)
    progressLogging = parameters.get("progressLogging", True)

    # Create operation ID for progress tracking
    operationId = f"doc_gen_{self.services.workflow.id}_{int(time.time())}"
    parentOperationId = parameters.get("parentOperationId")

    try:
        # Phase 1: Structure Generation
        if progressLogging:
            self.services.chat.progressLogStart(
                operationId,
                "Document",
                "Structure Generation",
                "Generating document structure...",
                parentOperationId=parentOperationId
            )

        structureGenerator = StructureGenerator(self.services)

        # Extract and cache content once if documentList is provided
        cachedContent = None
        if documentList:
            chatDocuments = self.services.chat.getChatDocumentsFromDocumentList(documentList)
            if chatDocuments:
                extractionOptions = ExtractionOptions(
                    prompt="Extract all content from documents",
                    mergeStrategy=MergeStrategy(mergeType="concatenate")
                )
                extractedResults = self.services.extraction.extractContent(
                    chatDocuments,
                    extractionOptions
                )
                cachedContent = {
                    "extractedContent": extractedResults,
                    "extractionTimestamp": time.time(),
                    "sourceDocuments": [doc.id for doc in chatDocuments]
                }

        # Generate structure
        structure = await structureGenerator.generateStructure(
            userPrompt=prompt,
            documentList=documentList,
            cachedContent=cachedContent,
            services=self.services
        )
        if progressLogging:
            self.services.chat.progressLogUpdate(operationId, 0.33, "Structure generated")

        # Phase 2: Content Generation
        if progressLogging:
            self.services.chat.progressLogUpdate(operationId, 0.34, "Starting content generation...")

        contentGenerator = ContentGenerator(self.services)

        def progressCallback(sectionIndex: int, totalSections: int, message: str):
            # Content generation spans 0.34 -> 0.90; skip if there are no sections
            if progressLogging and totalSections > 0:
                progress = 0.34 + (0.56 * (sectionIndex / totalSections))
                self.services.chat.progressLogUpdate(
                    operationId,
                    progress,
                    f"Section {sectionIndex}/{totalSections}: {message}"
                )

        completeStructure = await contentGenerator.generateContent(
            structure=structure,
            cachedContent=cachedContent,
            userPrompt=prompt,
            services=self.services,
            progressCallback=progressCallback
        )
        if progressLogging:
            self.services.chat.progressLogUpdate(operationId, 0.90, "Content generated")

        # Phase 3: Integration & Rendering
        if progressLogging:
            self.services.chat.progressLogUpdate(operationId, 0.91, "Rendering final document...")

        # Use existing renderReport method
        title = structure.get("metadata", {}).get("title", "Generated Document")
        renderedContent, mimeType = await self.services.generation.renderReport(
            extractedContent=completeStructure,
            outputFormat=resultType,
            title=title,
            userPrompt=prompt,
            aiService=self.services.ai
        )

        # Create document (non-text formats are flagged as base64-encoded).
        # Note: `document` is not used below; the ActionResult returns renderedContent directly.
        document = self.services.generation._createDocument(
            fileName=f"document.{resultType}",
            mimeType=mimeType,
            content=renderedContent,
            base64encoded=(mimeType not in ["text/plain", "text/html", "text/markdown"]),
            messageId=None
        )

        if progressLogging:
            self.services.chat.progressLogFinish(operationId, True)

        return ActionResult.isSuccess(
            documents=[ActionDocument(
                documentName=f"document.{resultType}",
                documentData=renderedContent,
                mimeType=mimeType
            )]
        )

    except Exception as e:
        logger.error(f"Error in hierarchical document generation: {str(e)}")
        if progressLogging:
            self.services.chat.progressLogFinish(operationId, False)
        return ActionResult.isFailure(error=str(e))
API Design
Structure Generation Prompt
def _createStructurePrompt(
    self,
    userPrompt: str,
    cachedContent: Optional[ContentCache],
    services: Any
) -> str:
    """
    Create prompt for structure generation.
    """
    prompt = f"""
{'='*80}
USER REQUEST:
{'='*80}
{userPrompt}
{'='*80}
TASK: Generate a document STRUCTURE (skeleton) with sections.
Do NOT generate actual content yet - only the structure.
{'='*80}
EXTRACTED CONTENT (if available):
{'='*80}
{_formatCachedContent(cachedContent) if cachedContent else "No source documents provided."}
{'='*80}
INSTRUCTIONS:
1. Analyze the user request and extracted content
2. Create a document structure with sections
3. For each section, specify:
   - id: Unique identifier
   - content_type: "heading" | "paragraph" | "image" | "table" | "bullet_list" | "code_block"
   - complexity: "simple" (can generate directly) or "complex" (needs sub-prompt)
   - generation_hint: Brief description of what content should be generated
   - image_prompt: (only for image sections) Detailed prompt for image generation
   - order: Section order number
   - elements: [] (empty array - will be populated later)
4. Identify image sections:
   - If user requests illustrations/images, create image sections
   - Add image_prompt field with detailed description
   - Set complexity to "complex"
5. Identify complex text sections:
   - Long chapters (>500 words expected) should be marked as "complex"
   - Short paragraphs/headings should be "simple"
6. Return ONLY valid JSON following this structure:
{{
  "metadata": {{
    "title": "Document Title",
    "split_strategy": "single_document",
    "source_documents": [],
    "extraction_method": "ai_generation"
  }},
  "documents": [
    {{
      "id": "doc_1",
      "title": "Document Title",
      "filename": "document.json",
      "sections": [
        {{
          "id": "section_1",
          "content_type": "heading",
          "complexity": "simple",
          "generation_hint": "Main title",
          "order": 1,
          "elements": []
        }},
        {{
          "id": "section_2",
          "content_type": "image",
          "complexity": "complex",
          "generation_hint": "Illustration for chapter 1",
          "image_prompt": "Detailed description for image generation",
          "order": 2,
          "elements": []
        }}
      ]
    }}
  ]
}}
Return ONLY the JSON structure. No explanations.
"""
    return prompt
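The prompt asks for JSON only, but model replies are sometimes wrapped in a Markdown code fence or surrounded by a sentence of prose. Below is a minimal sketch of a tolerant parser for the reply. `parseJsonResponse` is a hypothetical helper. The brace-slicing fallback is a heuristic that assumes the reply contains a single top-level object.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # Markdown code fence marker
# Captures the body of a fenced block, with or without a "json" language tag
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parseJsonResponse(raw: str) -> Any:
    """Parse a model reply that should be pure JSON but may be fenced or padded."""
    text = raw.strip()
    fenced = FENCED_BLOCK.search(text)
    if fenced:
        text = fenced.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span, e.g. when prose surrounds the JSON
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            raise
        return json.loads(text[start:end + 1])
```

If the reply still cannot be parsed, the error propagates to the caller. The caller can then retry or fail the phase.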
Section Content Generation Prompt
def _createSectionPrompt(
    self,
    section: Dict[str, Any],
    context: GenerationContext
) -> str:
    """
    Create sub-prompt for section content generation.

    Image sections do not use this prompt; _generateSectionContent routes
    them to _generateImageSection.
    """
    sectionType = section.get("content_type")
    generationHint = section.get("generation_hint", "")
    prompt = f"""
{'='*80}
SECTION TO GENERATE:
{'='*80}
Type: {sectionType}
Hint: {generationHint}
{'='*80}
CONTEXT:
- User Request: {context.userPrompt}
- Previous Sections: {len(context.previousSections)} sections already generated
- Document Title: {context.documentMetadata.get('title', 'Unknown')}
{'='*80}
EXTRACTED CONTENT (if available):
{'='*80}
{_formatCachedContent(context.cachedContent) if context.cachedContent else "None"}
{'='*80}
TASK: Generate content for this section ONLY.
INSTRUCTIONS:
1. Generate content appropriate for section type: {sectionType}
2. Use the generation hint: {generationHint}
3. Consider previous sections for continuity
4. Use extracted content if relevant
5. Return ONLY a JSON object containing the elements array for this section:
   For heading:
   {{
     "elements": [
       {{"level": 1, "text": "Heading Text"}}
     ]
   }}
   For paragraph:
   {{
     "elements": [
       {{"text": "Paragraph text content"}}
     ]
   }}
Return ONLY the JSON object. No other text.
"""
    return prompt
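Model replies can be malformed or incomplete, so the parsed reply can be checked before its elements are stored on the section. Below is a minimal sketch. It assumes the element shapes given in this document: heading has `level` and `text`, paragraph has `text`, and image has `url`, `base64Data` and `altText`. `extractElements` is a hypothetical helper. The document defines no element schema for `table`, `bullet_list` or `code_block`, so for those types the sketch only requires a non-empty list of objects. If it runs inside the generator methods, a `ValueError` raised here is caught by `_generateSectionContent` and becomes an error section.

```python
from typing import Any, Dict, List

# Required keys per element, from the element shapes in this document
REQUIRED_KEYS = {
    "heading": {"level", "text"},
    "paragraph": {"text"},
    "image": {"url", "base64Data", "altText"},
}


def extractElements(contentType: str, reply: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the validated elements list from a parsed section reply."""
    elements = reply.get("elements")
    if not isinstance(elements, list) or not elements:
        raise ValueError(f"{contentType}: reply has no non-empty 'elements' list")
    required = REQUIRED_KEYS.get(contentType, set())  # Unknown types: no key check
    for idx, element in enumerate(elements):
        if not isinstance(element, dict):
            raise ValueError(f"{contentType}: element {idx} is not an object")
        missing = required - set(element)
        if missing:
            raise ValueError(f"{contentType}: element {idx} missing {sorted(missing)}")
    return elements
```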
Image Handling
Image Generation Flow
async def _generateImageSection(
    self,
    section: Dict[str, Any],
    context: GenerationContext,
    services: Any
) -> Dict[str, Any]:
    """
    Generate image for image section.
    """
    imagePrompt = section.get("image_prompt")
    if not imagePrompt:
        raise ValueError(f"Image section {section.get('id')} missing image_prompt")

    # Call ai.generate action with image generation
    from modules.workflows.methods.methodAi.actions.generate import generate

    generateParams = {
        "prompt": imagePrompt,
        "resultType": "png",
        "parentOperationId": context.operationId
    }
    result = await generate(self=services.ai, parameters=generateParams)
    if not result.success or not result.documents:
        raise ValueError(f"Image generation failed: {result.error}")

    # Extract image data (expected to be a base64-encoded PNG string)
    imageDoc = result.documents[0]
    base64Data = imageDoc.documentData

    # Create image element; section metadata is optional and may be None
    section["elements"] = [{
        "url": f"data:image/png;base64,{base64Data}",
        "base64Data": base64Data,
        "altText": section.get("generation_hint", "Image"),
        "caption": (section.get("metadata") or {}).get("caption")
    }]
    return section
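If `documentData` is not actually base64-encoded PNG data, the renderer embeds a broken image without any warning. A check before the data URL is built avoids that. Below is a minimal sketch, assuming the image action returns a base64 string. `buildImageElement` is a hypothetical helper that produces the same element shape as the code above.

```python
import base64
from typing import Any, Dict, Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # First 8 bytes of every PNG file


def buildImageElement(base64Data: str, altText: str, caption: Optional[str] = None) -> Dict[str, Any]:
    """Validate base64 PNG data and build the image element used in Phase 2 output."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        raw = base64.b64decode(base64Data, validate=True)
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        raise ValueError(f"image data is not valid base64: {exc}") from exc
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("image data is not a PNG")
    return {
        "url": f"data:image/png;base64,{base64Data}",
        "base64Data": base64Data,
        "altText": altText,
        "caption": caption,
    }
```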
HTML Renderer Image Handling
Location: poweron/gateway/modules/services/serviceGeneration/renderers/rendererHtml.py
Changes:
import base64
from typing import Any, Dict, List, Optional, Tuple


async def render(
    self,
    extractedContent: Dict[str, Any],
    title: str,
    userPrompt: Optional[str] = None,
    aiService=None
) -> Tuple[str, str]:
    """
    Render HTML. Images stay embedded as data URLs until render() can
    return separate image files.

    Returns:
        (html_content, mime_type)
    """
    # Generate HTML
    htmlContent = await self._generateHtmlFromJson(...)

    # Extract images and prepare separate files
    images = self._extractImages(extractedContent)
    imageFiles = []
    htmlWithFileRefs = htmlContent
    for idx, imageData in enumerate(images):
        base64Data = imageData.get("base64Data")
        if not base64Data:
            continue
        # Create filename and point the HTML at it instead of the inline data URL
        filename = f"image_{idx + 1}.png"
        htmlWithFileRefs = htmlWithFileRefs.replace(
            f"data:image/png;base64,{base64Data}",
            filename
        )
        imageFiles.append({
            "filename": filename,
            "content": base64.b64decode(base64Data),
            "mimeType": "image/png"
        })

    # render() currently returns a single file, so return the HTML with images
    # still embedded. Returning htmlWithFileRefs without imageFiles would leave
    # broken image references; switch to it once multiple files can be returned.
    return htmlContent, "text/html"


def _extractImages(self, jsonContent: Dict[str, Any]) -> List[Dict[str, Any]]:
    """
    Extract all images from JSON structure.
    """
    images = []
    documents = jsonContent.get("documents", [])
    if not documents:
        # Fall back to a flat structure with top-level sections
        sections = jsonContent.get("sections", [])
        documents = [{"sections": sections}]
    for doc in documents:
        for section in doc.get("sections", []):
            if section.get("content_type") == "image":
                for element in section.get("elements", []):
                    if element.get("base64Data"):
                        images.append(element)
    return images
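Once multi-file output is supported, one option is to derive the image files from the finished HTML rather than from the section elements. That way only the data URLs actually present in the markup are rewritten. Below is a minimal sketch, assuming images appear as `data:image/png;base64,...` URLs. `splitInlineImages` is a hypothetical helper. As a design choice, identical images share a single file.

```python
import base64
import re
from typing import Dict, List, Tuple

# Matches inline PNG data URLs such as src="data:image/png;base64,iVBOR..."
DATA_URL = re.compile(r"data:image/png;base64,([A-Za-z0-9+/=]+)")


def splitInlineImages(html: str) -> Tuple[str, List[Dict[str, object]]]:
    """Replace inline PNG data URLs with relative file names and return the image files."""
    files: List[Dict[str, object]] = []
    nameByData: Dict[str, str] = {}

    def toFile(match: re.Match) -> str:
        data = match.group(1)
        if data not in nameByData:  # Identical images share one file
            nameByData[data] = f"image_{len(nameByData) + 1}.png"
            files.append({
                "filename": nameByData[data],
                "content": base64.b64decode(data),
                "mimeType": "image/png",
            })
        return nameByData[data]

    return DATA_URL.sub(toFile, html), files
```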
Progress Logging
Progress Stages
PROGRESS_STAGES = {
    "structure_generation": {
        "start": 0.0,
        "end": 0.33,
        "messages": [
            "Extracting content from documents...",
            "Generating document structure...",
            "Structure generated"
        ]
    },
    "content_generation": {
        "start": 0.34,
        "end": 0.90,
        "messages": [
            "Starting content generation...",
            "Generating section {current}/{total}...",
            "Generating image for section {section_id}...",
            "Content generated"
        ]
    },
    "integration_rendering": {
        "start": 0.91,
        "end": 1.0,
        "messages": [
            "Rendering final document...",
            "Document complete"
        ]
    }
}
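The stage table maps directly onto a helper that converts progress within a stage to the overall 0.0 to 1.0 scale. The hard-coded `0.34 + 0.56 * fraction` in the content-generation callback is the `content_generation` case of this mapping. Below is a minimal sketch with a hypothetical `overallProgress` function and a copy of the stage bounds.

```python
from typing import Dict, Tuple

# Stage bounds copied from PROGRESS_STAGES
STAGE_BOUNDS: Dict[str, Tuple[float, float]] = {
    "structure_generation": (0.0, 0.33),
    "content_generation": (0.34, 0.90),
    "integration_rendering": (0.91, 1.0),
}


def overallProgress(stage: str, fraction: float) -> float:
    """Map progress within one stage (0.0-1.0) onto the overall 0.0-1.0 scale."""
    start, end = STAGE_BOUNDS[stage]
    fraction = min(max(fraction, 0.0), 1.0)  # Clamp so a caller cannot overshoot its stage
    return start + (end - start) * fraction
```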
Progress Callback Implementation
from typing import Any, Callable


def createProgressCallback(
    operationId: str,
    totalSections: int,
    services: Any
) -> Callable[[int, int, str], None]:
    """
    Create progress callback function.
    """
    baseProgress = 0.34  # Start of content generation phase
    phaseProgress = 0.56  # Length of content generation phase (0.34 -> 0.90)

    def progressCallback(sectionIndex: int, sectionTotal: int, message: str) -> None:
        # Use the total passed by the caller, falling back to the one given at creation
        total = sectionTotal or totalSections
        fraction = sectionIndex / total if total else 1.0
        currentProgress = baseProgress + fraction * phaseProgress
        # Update progress log
        services.chat.progressLogUpdate(
            operationId,
            currentProgress,
            f"Section {sectionIndex}/{total}: {message}"
        )

    return progressCallback
Error Handling
Error Section Creation
def createErrorSection(
    originalSection: Dict[str, Any],
    errorMessage: str
) -> Dict[str, Any]:
    """
    Create error placeholder section.
    """
    return {
        "id": originalSection.get("id", "unknown"),
        "content_type": "paragraph",  # Change to paragraph for error display
        "elements": [{
            "text": f"[ERROR: Failed to generate {originalSection.get('content_type', 'content')} for section '{originalSection.get('id', 'unknown')}'. Error: {errorMessage}]"
        }],
        "order": originalSection.get("order", 0),
        "error": True,
        "errorMessage": errorMessage,
        "originalContentType": originalSection.get("content_type")
    }
Error Handling in Content Generation
async def _generateSectionContent(
    self,
    section: Dict[str, Any],
    context: GenerationContext,
    services: Any
) -> Dict[str, Any]:
    """
    Generate content for a single section with error handling.
    """
    try:
        complexity = section.get("complexity", "simple")
        contentType = section.get("content_type")
        if contentType == "image":
            return await self._generateImageSection(section, context, services)
        elif complexity == "complex":
            return await self._generateComplexTextSection(section, context, services)
        else:
            return await self._generateSimpleSection(section, context, services)
    except Exception as e:
        logger.error(f"Error generating section {section.get('id')}: {str(e)}")
        return createErrorSection(section, str(e))
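Below is a self-contained sketch that demonstrates this isolation. The real generators are replaced by a hypothetical `fakeGenerate` that always fails for image sections. The failing section comes back as an error placeholder and the other sections keep their content, so one bad section does not abort the whole document.

```python
import asyncio
from typing import Any, Dict, List

Section = Dict[str, Any]


def createErrorSection(originalSection: Section, errorMessage: str) -> Section:
    # Same shape as createErrorSection above
    contentType = originalSection.get("content_type", "content")
    sectionId = originalSection.get("id", "unknown")
    return {
        "id": sectionId,
        "content_type": "paragraph",
        "elements": [{"text": f"[ERROR: Failed to generate {contentType} for section '{sectionId}'. Error: {errorMessage}]"}],
        "order": originalSection.get("order", 0),
        "error": True,
        "errorMessage": errorMessage,
        "originalContentType": originalSection.get("content_type"),
    }


async def fakeGenerate(section: Section) -> Section:
    # Hypothetical stand-in for the real generators: the image service is "down"
    if section["content_type"] == "image":
        raise RuntimeError("image service unavailable")
    return {**section, "elements": [{"text": f"Content for {section['id']}"}]}


async def generateSectionContent(section: Section) -> Section:
    try:
        return await fakeGenerate(section)
    except Exception as exc:
        return createErrorSection(section, str(exc))


async def generateAll(sections: List[Section]) -> List[Section]:
    return list(await asyncio.gather(*(generateSectionContent(s) for s in sections)))


sections = [
    {"id": "s1", "content_type": "paragraph", "order": 1, "elements": []},
    {"id": "s2", "content_type": "image", "order": 2, "elements": []},
]
result = asyncio.run(generateAll(sections))
print([s.get("error", False) for s in result])  # [False, True]
```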
Performance Considerations
Parallel Generation
import asyncio
import logging
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger(__name__)


async def _generateSectionsParallel(
    self,
    sections: List[Dict[str, Any]],
    context: GenerationContext,
    services: Any,
    progressCallback: Optional[Callable] = None
) -> List[Dict[str, Any]]:
    """
    Generate content for multiple sections concurrently.

    Progress is reported as sections finish, so the counter only moves forward.
    """
    completed = 0

    async def generateWithProgress(section: Dict[str, Any]) -> Dict[str, Any]:
        nonlocal completed
        result = await self._generateSectionContent(section, context, services)
        # Runs on the event loop thread, so this increment is not interleaved
        completed += 1
        if progressCallback:
            progressCallback(completed, len(sections), f"Generated {section.get('content_type')}")
        return result

    # Generate all sections concurrently; results keep the input order
    results = await asyncio.gather(
        *[generateWithProgress(section) for section in sections],
        return_exceptions=True
    )

    # _generateSectionContent already turns failures into error sections;
    # this catches anything that escapes it (e.g. a failing progressCallback)
    generatedSections = []
    for idx, result in enumerate(results):
        if isinstance(result, Exception):
            logger.error(f"Error generating section {sections[idx].get('id')}: {str(result)}")
            generatedSections.append(createErrorSection(sections[idx], str(result)))
        else:
            generatedSections.append(result)
    return generatedSections
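The index-based mapping above (`sections[idx]`) relies on `asyncio.gather` returning results in the order the awaitables were passed, not the order in which they finish. Below is a small standalone demonstration, with `asyncio.sleep` standing in for AI calls of different latency. `fakeSectionCall` and `runSections` are illustrative names only.

```python
import asyncio
from typing import List, Tuple


async def fakeSectionCall(sectionId: str, delay: float, finished: List[str]) -> str:
    await asyncio.sleep(delay)  # Stand-in for an AI call with this latency
    finished.append(sectionId)  # Records completion order
    return sectionId


async def runSections() -> Tuple[List[str], List[str]]:
    finished: List[str] = []
    results = await asyncio.gather(
        fakeSectionCall("s1", 0.05, finished),
        fakeSectionCall("s2", 0.01, finished),
        fakeSectionCall("s3", 0.03, finished),
    )
    return list(results), finished


results, finished = asyncio.run(runSections())
print(results)   # ['s1', 's2', 's3'] (input order)
print(finished)  # completion order, typically ['s2', 's3', 's1']
```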
Batch Processing for Large Documents
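from typing import Any, Callable, Dict, Optional


async def generateContent(
    self,
    structure: Dict[str, Any],
    cachedContent: Optional[ContentCache],
    userPrompt: str,
    services: Any,
    progressCallback: Optional[Callable] = None,
    batchSize: int = 10,
    operationId: Optional[str] = None
) -> Dict[str, Any]:
    """
    Generate content with batching for large documents.
    """
    documents = structure.get("documents", [])
    totalSections = sum(len(doc.get("sections", [])) for doc in documents)
    doneBefore = 0  # Sections finished in earlier batches (across all documents)

    for doc in documents:
        sections = doc.get("sections", [])
        # Process in batches
        for batchStart in range(0, len(sections), batchSize):
            batch = sections[batchStart:batchStart + batchSize]
            # Earlier batches are complete, so they can serve as continuity context
            context = GenerationContext(
                userPrompt=userPrompt,
                cachedContent=cachedContent,
                previousSections=sections[:batchStart],
                targetSection=None,  # Each section is passed to the generator explicitly
                documentMetadata=structure.get("metadata", {}),
                operationId=operationId
            )

            def batchProgress(done: int, batchTotal: int, message: str, offset: int = doneBefore) -> None:
                # Translate batch-local progress into document-wide progress
                if progressCallback:
                    progressCallback(offset + done, totalSections, message)

            # Generate batch in parallel
            generatedBatch = await self._generateSectionsParallel(
                batch,
                context,
                services,
                batchProgress
            )

            # Update sections
            for idx, generated in enumerate(generatedBatch):
                sections[batchStart + idx] = generated
            doneBefore += len(batch)

    return structure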
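With fixed batches, every batch waits for its slowest section before the next batch starts. One alternative, which is not part of this design, is to cap concurrency with `asyncio.Semaphore`, so a new section starts as soon as any slot frees up. Below is a minimal sketch. `generateBounded` and `generateOne` are hypothetical names, and `generateOne` stands in for `_generateSectionContent`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Section = Dict[str, Any]


async def generateBounded(
    sections: List[Section],
    generateOne: Callable[[Section], Awaitable[Section]],
    maxConcurrency: int = 10,
) -> List[Section]:
    """Run generateOne over all sections with at most maxConcurrency calls in flight."""
    semaphore = asyncio.Semaphore(maxConcurrency)

    async def guarded(section: Section) -> Section:
        async with semaphore:  # Waits while maxConcurrency calls are already running
            return await generateOne(section)

    # gather keeps input order, so results can be written back by index
    return list(await asyncio.gather(*(guarded(s) for s in sections)))
```

Here `maxConcurrency` plays the role of `batchSize`. It limits load on the AI service in the same way but avoids idle slots between batches.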
Testing Strategy
Unit Tests
- StructureGenerator Tests:
  - Test structure generation with/without source documents
  - Test complexity identification
  - Test image prompt extraction
- ContentGenerator Tests:
  - Test simple section generation
  - Test image section generation
  - Test complex text section generation
  - Test parallel generation
  - Test error handling
- ContentIntegrator Tests:
  - Test content merging
  - Test validation
  - Test error section creation
Integration Tests
- End-to-End Tests:
  - Test complete document generation flow
  - Test with images
  - Test with long documents
  - Test error scenarios
- Renderer Tests:
  - Test HTML renderer with separate image files
  - Test PDF renderer with embedded images
  - Test XLSX/PPTX renderers with images
Performance Tests
- Large Document Tests:
  - Test with 100+ sections
  - Test parallel generation performance
  - Test memory usage
- Image Generation Tests:
  - Test multiple images
  - Test large images
  - Test image generation failures