# Concept: Hierarchical Document Generation with Image Integration

## Executive Summary

This concept proposes a **three-phase hierarchical approach** to document generation that enables proper image integration and handles complex documents efficiently.

**Key Decisions**:

- ✅ **Performance**: Parallel processing with ChatLog progress messages
- ✅ **Error Handling**: Skip failed sections, show error messages
- ✅ **Image Storage**: Store as base64 in JSON (renderers need direct access)
- ✅ **Backward Compatibility**: Not needed - implement as new default

**Renderer Status**:

- ✅ **Ready**: Text, Markdown, DOCX renderers
- ⚠️ **Needs Update**: HTML (create separate image files), PDF (embed images)
- ⚠️ **Needs Implementation**: XLSX, PPTX (add image support)

## Problem Statement

Currently, the document generation system has the following limitations:

1. **No Image Integration**: Images are generated separately but cannot be embedded into document structures
2. **Single-Pass Generation**: Documents are generated in one AI call, making it difficult to handle complex sections (long text, images, chapters)
3. **Repeated Extraction**: Content extraction may happen multiple times unnecessarily
4. **No Structured Approach**: No mechanism to first define document structure, then populate sections

## Current Architecture Analysis

### Current Flow:

```
User Request → ai.generateDocument → ai.process → AI JSON Generation → Renderer → Final Document
```

### Issues:

- AI generates the complete JSON structure in one pass
- Images are generated separately via the `ai.generate` action
- No mechanism to integrate generated images into the document structure
- The JSON schema supports the `image` content_type, but the AI rarely generates it
- Content extraction happens per action, not cached/reused

### Current Image Handling:

- Images can be rendered IF they exist in the JSON structure (`content_type: "image"`)
- Image data is expected as `base64Data` in elements
- Renderers support image rendering (DOCX, PDF, HTML, etc.)
- But images are never generated WITHIN document generation

## Proposed Solution: Hierarchical Document Generation

### Core Concept

**Three-Phase Approach:**

1. **Structure Generation Phase**: Generate the document skeleton with section placeholders
2. **Content Generation Phase**: Generate content for each section (text or image) via sub-prompts
3. **Integration Phase**: Merge all generated content into the final document structure

### Architecture Overview

```
┌──────────────────────────────────────────────────────────────
│ Phase 1: Structure Generation
│   - Generate document skeleton
│   - Identify sections (text, image, complex)
│   - Create section placeholders with metadata
└──────────────────────────────────────────────────────────────
                               ↓
┌──────────────────────────────────────────────────────────────
│ Phase 2: Content Generation (tree-like)
│
│   Section 1: Heading (simple)
│     → Generate directly
│
│   Section 2: Paragraph (simple)
│     → Generate directly
│
│   Section 3: Image (complex)
│     → Sub-prompt: Generate image
│     → Store image data
│     → Create image section with base64Data
│
│   Section 4: Long Chapter (complex)
│     → Sub-prompt: Generate chapter content
│     → Split into subsections if needed
└──────────────────────────────────────────────────────────────
                               ↓
┌──────────────────────────────────────────────────────────────
│ Phase 3: Integration
│   - Merge all generated content
│   - Replace placeholders with actual data
│   - Validate structure completeness
│   - Render to final format
└──────────────────────────────────────────────────────────────
```

## Detailed Design

### Phase 1: Structure Generation

**Purpose**: Create the document skeleton with section metadata

**Process**:

1. The AI generates the document structure with sections
2. Each section includes:
   - `id`: Unique identifier
   - `content_type`: Type (heading, paragraph, image, table, etc.)
   - `complexity`: "simple" or "complex"
   - `generation_hint`: Instructions for content generation
   - `order`: Section order
   - `elements`: Empty or placeholder

**Example Structure**:

```json
{
  "metadata": {
    "title": "Children's Bedtime Story",
    "split_strategy": "single_document"
  },
  "documents": [{
    "id": "doc_1",
    "sections": [
      {
        "id": "section_title",
        "content_type": "heading",
        "complexity": "simple",
        "generation_hint": "Story title",
        "order": 1,
        "elements": []
      },
      {
        "id": "section_intro",
        "content_type": "paragraph",
        "complexity": "simple",
        "generation_hint": "Introduction paragraph",
        "order": 2,
        "elements": []
      },
      {
        "id": "section_image_1",
        "content_type": "image",
        "complexity": "complex",
        "generation_hint": "Illustration: Rabbit meeting owl in moonlit forest",
        "image_prompt": "A small brown rabbit sitting in a peaceful forest clearing under moonlight with stars, meeting a wise owl perched on a branch",
        "order": 3,
        "elements": []
      },
      {
        "id": "section_chapter_1",
        "content_type": "paragraph",
        "complexity": "complex",
        "generation_hint": "First chapter: Rabbit's adventure begins",
        "order": 4,
        "elements": []
      }
    ]
  }]
}
```

### Phase 2: Content Generation

**Purpose**: Generate the actual content for each section

**Process**:

1. Iterate through sections in order
2. For each section:
   - **Simple sections** (heading, short paragraph):
     - Generate content directly via AI
     - Populate the `elements` array
   - **Complex sections** (image, long chapter):
     - Create a sub-prompt based on `generation_hint` and `image_prompt`
     - Generate content via a specialized action:
       - Images: `ai.generate` with image generation
       - Long text: `ai.process` with a focused prompt
     - Store the generated content
     - Populate the `elements` array

**Content Caching**:

- Extract content from source documents ONCE at the start
- Cache the extracted content for reuse across all sections
- Pass the cached content to sub-prompts to avoid re-extraction

**Image Generation**:

- For `content_type: "image"` sections:
  - Use the `image_prompt` from the structure
  - Call the `ai.generate` action with image generation
  - Receive base64 image data
  - Create an image element:

```json
{
  "url": "data:image/png;base64,<base64-data>",
  "base64Data": "<base64-data>",
  "altText": "<alt text>",
  "caption": "<caption>"
}
```

### Phase 3: Integration

**Purpose**: Merge all content into the final document structure

**Process**:

1. Validate that all sections have content
2. Merge the generated content into the structure
3. Replace placeholders with actual data
4. Finalize the JSON structure
5. Render to the target format (DOCX, PDF, HTML, etc.)

## Implementation Strategy

### New Components Needed

1. **Structure Generator** (`structureGenerator.py`)
   - Generates the document skeleton
   - Identifies section complexity
   - Creates generation hints
2. **Content Generator** (`contentGenerator.py`)
   - Generates content for each section
   - Handles simple vs. complex sections
   - Manages sub-prompts and image generation
   - Caches extracted content
3. **Content Integrator** (`contentIntegrator.py`)
   - Merges generated content
   - Validates completeness
   - Finalizes the document structure

### Modified Components

1. **`generateDocument` action**
   - Implement hierarchical generation as the default
   - Orchestrate the three phases
   - Add progress logging for each phase
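As a minimal sketch (not the real implementation), the three-phase flow that `generateDocument` orchestrates could look like the following. The phase helpers (`generate_structure`, `generate_section`, `integrate`) are hypothetical and injected as callables, since their real signatures are not defined in this concept; the `log` callback stands in for ChatLog progress messages.

```python
def generate_document(request, generate_structure, generate_section, integrate, log=print):
    """Sketch of the three-phase orchestration for a single document.

    The phase helpers are hypothetical placeholders for the Phase 1-3
    components (structureGenerator, contentGenerator, contentIntegrator).
    """
    log("Generating structure...")                      # Phase 1: skeleton
    structure = generate_structure(request)
    sections = structure["documents"][0]["sections"]
    for i, section in enumerate(sorted(sections, key=lambda s: s["order"]), start=1):
        log(f"Generating content for section {i}/{len(sections)}...")  # Phase 2
        section["elements"] = generate_section(section)
    log("Merging content...")                           # Phase 3: integrate
    return integrate(structure)
```

The helpers are injected rather than imported so that each phase can be developed and tested independently.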
2. **`process` action**
   - Support content caching (extract once, reuse)
   - Support sub-prompt generation for sections
3. **Prompt Builder** (`subPromptBuilderGeneration.py`)
   - Add a structure generation prompt
   - Add section-specific content prompts
   - Add image generation prompt templates
4. **Renderers** (update required):
   - **HTML Renderer**: Create separate image files and link them
   - **PDF Renderer**: Embed images using reportlab
   - **XLSX Renderer**: Add image embedding support
   - **PPTX Renderer**: Add image embedding support

### New Action Parameters

**For `generateDocument`**:

- `enableImageIntegration`: boolean (default: true)
- `maxSectionLength`: int (threshold for "complex" sections, default: 500 words)
- `parallelGeneration`: boolean (default: true) - enable parallel section generation
- `progressLogging`: boolean (default: true) - send ChatLog progress updates

**For sub-prompts**:

- `sectionContext`: Previous sections for context
- `cachedContent`: Extracted content cache (to avoid re-extraction)
- `targetSection`: Section metadata
- `previousSections`: Array of already-generated sections for continuity

## Benefits

1. **Image Integration**: Images can be generated and embedded into documents
2. **Structured Approach**: Clear separation of structure and content
3. **Efficiency**: Content is extracted once and reused across sections
4. **Scalability**: Can handle very long documents by splitting into sections
5. **Quality**: Better control over complex sections (images, long chapters)
6. **Flexibility**: Can generate different content types per section

## Migration Strategy

**Note**: No backward compatibility needed - can implement directly as the new default.

1. **Phase 1**: Implement hierarchical generation as the new default
2. **Phase 2**: Update renderers (HTML, PDF, XLSX, PPTX) for image support
3. **Phase 3**: Testing and refinement
4. **Phase 4**: Remove the old single-pass mode (or keep it as an internal fallback only)

## Example Workflow

**User Request**: "Create a children's bedtime story with 5 illustrations"

**Phase 1 Output**:

```json
{
  "metadata": {"title": "Flöckchen's Adventure"},
  "documents": [{
    "sections": [
      {"id": "title", "content_type": "heading", "complexity": "simple", ...},
      {"id": "intro", "content_type": "paragraph", "complexity": "simple", ...},
      {"id": "img1", "content_type": "image", "complexity": "complex", "image_prompt": "Rabbit meeting owl", ...},
      {"id": "chapter1", "content_type": "paragraph", "complexity": "complex", ...},
      {"id": "img2", "content_type": "image", "complexity": "complex", ...},
      ...
    ]
  }]
}
```

**Phase 2 Process**:

- Generate title → populate elements
- Generate intro → populate elements
- Generate image 1 → call `ai.generate`, store base64 → populate elements
- Generate chapter 1 → sub-prompt → populate elements
- Generate image 2 → call `ai.generate`, store base64 → populate elements
- ...

**Phase 3 Output**: Complete document with all sections populated, ready for rendering

## Renderer Readiness Assessment

### Current Renderer Status for Image Handling:

1. **Text Renderer** (`rendererText.py`): ✅ **READY**
   - Skips images, shows a placeholder: `[Image: altText]`
   - No changes needed
2. **Markdown Renderer** (`rendererMarkdown.py`): ✅ **READY**
   - Shows a placeholder with truncated base64: `![altText](data:image/png;base64,...)`
   - No changes needed (markdown limitation)
3. **HTML Renderer** (`rendererHtml.py`): ⚠️ **NEEDS UPDATE**
   - Currently: Embeds base64 directly in the `<img>` tag as a data URI
   - **Required Change**: Create separate image files and link to them
   - Implementation: Generate image files (e.g., `image_1.png`, `image_2.png`) alongside the HTML
   - Update `<img>` tags to use relative paths, e.g. `<img src="image_1.png">`
   - Return multiple files: HTML file + image files
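The HTML renderer change described above can be sketched with the standard library alone. This is a hedged illustration, not the renderer's actual code: the function name `export_images` is hypothetical, and it assumes the section/element fields shown in the JSON examples in this concept (`content_type`, `elements[0]["base64Data"]`).

```python
import base64
from pathlib import Path

def export_images(sections, output_dir):
    """Write each image section's base64 payload to its own PNG file and
    return {section_id: file_name}, so the HTML renderer can emit
    <img src="image_1.png"> instead of an inline data URI.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    file_names = {}
    image_sections = [s for s in sections if s.get("content_type") == "image"]
    for i, section in enumerate(image_sections, start=1):
        name = f"image_{i}.png"
        # Decode the base64 payload and write it next to the HTML file.
        data = base64.b64decode(section["elements"][0]["base64Data"])
        (out / name).write_bytes(data)
        file_names[section["id"]] = name
    return file_names
```

The returned mapping lets the renderer rewrite `src` attributes and report the full list of files to deliver alongside the HTML.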
4. **PDF Renderer** (`rendererPdf.py`): ⚠️ **NEEDS UPDATE**
   - Currently: Shows a placeholder `[Image: altText]`
   - **Required Change**: Embed images directly in the PDF using reportlab
   - Implementation: Use `reportlab.platypus.Image()` with the base64-decoded bytes
5. **DOCX Renderer** (`rendererDocx.py`): ✅ **READY**
   - Embeds images directly using `doc.add_picture()`
   - Adds captions below images
   - No changes needed
6. **XLSX Renderer** (`rendererXlsx.py`): ⚠️ **NEEDS IMPLEMENTATION**
   - Currently: No image handling found
   - **Required Change**: Add image support using openpyxl
   - Implementation: Use `openpyxl.drawing.image.Image()` to embed images
   - Store images in worksheet cells or as floating images
7. **PPTX Renderer** (`rendererPptx.py`): ⚠️ **NEEDS IMPLEMENTATION**
   - Currently: No image handling found
   - **Required Change**: Add image support using python-pptx
   - Implementation: Use `slide.shapes.add_picture()` to add images to slides

### Renderer Update Requirements:

**Priority 1 (Critical for HTML output)**:

- HTML Renderer: Create separate image files and link them

**Priority 2 (Important for document formats)**:

- PDF Renderer: Embed images using reportlab
- XLSX Renderer: Add image embedding support
- PPTX Renderer: Add image embedding support

## Answers to Open Questions

### 1. Performance: How to handle very large documents (100+ sections)?

**Answer**: Use parallel processing where possible, with progress ChatLog messages.

**Implementation Strategy**:

- **Parallel Section Generation**: Generate independent sections in parallel using asyncio
- **Batch Processing**: Process sections in batches (e.g., 10 sections at a time)
- **Progress Tracking**: Send ChatLog progress updates:
  - "Generating structure..." (Phase 1)
  - "Generating content for section X/Y..." (Phase 2)
  - "Generating image for section X..." (Phase 2 - images)
  - "Merging content..." (Phase 3)
  - "Rendering final document..." (Phase 3)
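The batched parallel generation and the skip-on-failure behavior discussed in this concept can be combined in one asyncio sketch. This is an illustrative assumption, not the real implementation: `generate_one` is a hypothetical async callable that returns a section's elements, and the error-placeholder shape follows the examples in this document.

```python
import asyncio

async def generate_all_sections(sections, generate_one, batch_size=10, log=print):
    """Generate section contents in parallel batches.

    Failed sections are skipped gracefully: they keep their metadata and
    receive an error placeholder instead of aborting the whole document.
    """
    done = 0
    for start in range(0, len(sections), batch_size):
        batch = sections[start:start + batch_size]
        # Run one batch of independent sections concurrently; collect
        # exceptions as results instead of letting one failure cancel the rest.
        results = await asyncio.gather(
            *(generate_one(s) for s in batch), return_exceptions=True
        )
        for section, result in zip(batch, results):
            if isinstance(result, Exception):
                section["error"] = True
                section["elements"] = [{"text": f"[ERROR: {result}]"}]
            else:
                section["elements"] = result
            done += 1
            log(f"Generated section {done}/{len(sections)}")
    return sections
```

`return_exceptions=True` is what makes the graceful degradation work: a raised exception becomes an ordinary result for that section.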
- **Streaming**: For very large documents, consider streaming partial results

**Example Progress Messages**:

```
Phase 1: Structure Generation (0% → 33%)
Phase 2: Content Generation (33% → 90%)
  - Section 1/10: Heading (34%)
  - Section 2/10: Paragraph (40%)
  - Section 3/10: Image generation (50%)
  - Section 4/10: Chapter (60%)
  ...
Phase 3: Integration & Rendering (90% → 100%)
```

### 2. Error Handling: What if one section fails?

**Answer**: Skip failed sections, keep the section title and type, and show an error message in the section.

**Implementation Strategy**:

- **Graceful Degradation**: Continue processing the remaining sections
- **Error Section**: Create an error placeholder section:

```json
{
  "id": "section_failed_3",
  "content_type": "paragraph",
  "elements": [{
    "text": "[ERROR: Failed to generate content for this section. Error: <error message>]"
  }],
  "order": 3,
  "error": true,
  "errorMessage": "<error message>"
}
```

- **Logging**: Log errors for debugging, but don't fail the entire document
- **User Notification**: Include the error count in the final progress message

### 3. Image Storage: Where to store generated images?

**Answer**: Store images in the JSON as base64, as the renderers need them afterwards.

**Implementation Strategy**:

- **In-Memory Storage**: Keep base64 strings in the JSON structure during generation
- **JSON Structure**: Store in section elements:

```json
{
  "url": "data:image/png;base64,<base64-data>",
  "base64Data": "<base64-data>",
  "altText": "Image description",
  "caption": "Optional caption"
}
```

- **Memory Management**: For very large images, consider compression or chunking
- **Renderer Access**: All renderers can access `base64Data` directly from the JSON
- **HTML Special Case**: The HTML renderer will extract the base64 data, decode it, and save it as separate files during rendering

### 4. Backward Compatibility: How to ensure existing workflows still work?

**Answer**: No backward compatibility needed.
**Implementation Strategy**:

- **New Default**: Hierarchical generation becomes the default mode
- **Clean Migration**: All document generation uses the hierarchical approach
- **No Fallback**: Remove the single-pass mode (or keep it as an internal fallback only)
- **Breaking Change**: Acceptable, since this is a new feature/enhancement

## Next Steps

1. **Review and Approval**: Get feedback on the concept
2. **Detailed Design**: Design the API and data structures
3. **Prototype**: Implement Phase 1 (structure generation)
4. **Testing**: Test with real use cases
5. **Full Implementation**: Implement all phases
6. **Migration**: Migrate existing workflows
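To close with one concrete piece of the design: the Phase 3 completeness check ("validate structure completeness" before rendering) could look like the following minimal sketch. The function name is hypothetical and the field names are assumed from the JSON examples in this concept; sections with an explicit `error` flag pass validation, matching the skip-failed-sections decision.

```python
def validate_structure(structure):
    """Phase 3 sketch: sort sections by order and verify every section
    either has content or is an explicit error placeholder."""
    missing = []
    for document in structure["documents"]:
        document["sections"].sort(key=lambda s: s["order"])
        for section in document["sections"]:
            # An empty section is only acceptable if it was marked as failed.
            if not section["elements"] and not section.get("error"):
                missing.append(section["id"])
    if missing:
        raise ValueError(f"Sections without content: {missing}")
    return structure
```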