# Concept: Hierarchical Document Generation with Image Integration

## Executive Summary

This concept proposes a **three-phase hierarchical approach** to document generation that enables proper image integration and handles complex documents efficiently.

**Key Decisions**:

- ✅ **Performance**: Parallel processing with ChatLog progress messages
- ✅ **Error Handling**: Skip failed sections, show error messages
- ✅ **Image Storage**: Store as base64 in JSON (renderers need direct access)
- ✅ **Backward Compatibility**: Not needed - implement as new default

**Renderer Status**:

- ✅ **Ready**: Text, Markdown, DOCX renderers
- ⚠️ **Needs Update**: HTML (create separate image files), PDF (embed images)
- ⚠️ **Needs Implementation**: XLSX, PPTX (add image support)

## Problem Statement

Currently, the document generation system has the following limitations:

1. **No Image Integration**: Images are generated separately but cannot be embedded into document structures
2. **Single-Pass Generation**: Documents are generated in one AI call, making it difficult to handle complex sections (long text, images, chapters)
3. **Repeated Extraction**: Content extraction may happen multiple times unnecessarily
4. **No Structured Approach**: No mechanism to first define document structure, then populate sections

## Current Architecture Analysis

### Current Flow:

```
User Request → ai.generateDocument → ai.process → AI JSON Generation → Renderer → Final Document
```

### Issues:

- AI generates the complete JSON structure in one pass
- Images are generated separately via the `ai.generate` action
- No mechanism to integrate generated images into the document structure
- The JSON schema supports the `image` content_type, but the AI rarely generates it
- Content extraction happens per action, not cached/reused

### Current Image Handling:

- Images can be rendered IF they exist in the JSON structure (`content_type: "image"`)
- Image data is expected as `base64Data` in elements
- Renderers support image rendering (DOCX, PDF, HTML, etc.)
- But images are never generated WITHIN document generation

## Proposed Solution: Hierarchical Document Generation

### Core Concept

**Three-Phase Approach:**

1. **Structure Generation Phase**: Generate the document skeleton with section placeholders
2. **Content Generation Phase**: Generate content for each section (text or image) via sub-prompts
3. **Integration Phase**: Merge all generated content into the final document structure

### Architecture Overview

```
┌──────────────────────────────────────────────────────────────
│ Phase 1: Structure Generation
│   - Generate document skeleton
│   - Identify sections (text, image, complex)
│   - Create section placeholders with metadata
└──────────────────────────────────────────────────────────────
                               ↓
┌──────────────────────────────────────────────────────────────
│ Phase 2: Content Generation (tree-like)
│
│   Section 1: Heading (simple)
│     → Generate directly
│
│   Section 2: Paragraph (simple)
│     → Generate directly
│
│   Section 3: Image (complex)
│     → Sub-prompt: Generate image
│     → Store image data
│     → Create image section with base64Data
│
│   Section 4: Long Chapter (complex)
│     → Sub-prompt: Generate chapter content
│     → Split into subsections if needed
└──────────────────────────────────────────────────────────────
                               ↓
┌──────────────────────────────────────────────────────────────
│ Phase 3: Integration
│   - Merge all generated content
│   - Replace placeholders with actual data
│   - Validate structure completeness
│   - Render to final format
└──────────────────────────────────────────────────────────────
```

## Detailed Design

### Phase 1: Structure Generation

**Purpose**: Create the document skeleton with section metadata

**Process**:

1. The AI generates the document structure with sections
2. Each section includes:
   - `id`: Unique identifier
   - `content_type`: Type (heading, paragraph, image, table, etc.)
   - `complexity`: "simple" or "complex"
   - `generation_hint`: Instructions for content generation
   - `order`: Section order
   - `elements`: Empty or placeholder

**Example Structure**:

```json
{
  "metadata": {
    "title": "Children's Bedtime Story",
    "split_strategy": "single_document"
  },
  "documents": [{
    "id": "doc_1",
    "sections": [
      {
        "id": "section_title",
        "content_type": "heading",
        "complexity": "simple",
        "generation_hint": "Story title",
        "order": 1,
        "elements": []
      },
      {
        "id": "section_intro",
        "content_type": "paragraph",
        "complexity": "simple",
        "generation_hint": "Introduction paragraph",
        "order": 2,
        "elements": []
      },
      {
        "id": "section_image_1",
        "content_type": "image",
        "complexity": "complex",
        "generation_hint": "Illustration: Rabbit meeting owl in moonlit forest",
        "image_prompt": "A small brown rabbit sitting in a peaceful forest clearing under moonlight with stars, meeting a wise owl perched on a branch",
        "order": 3,
        "elements": []
      },
      {
        "id": "section_chapter_1",
        "content_type": "paragraph",
        "complexity": "complex",
        "generation_hint": "First chapter: Rabbit's adventure begins",
        "order": 4,
        "elements": []
      }
    ]
  }]
}
```

### Phase 2: Content Generation

**Purpose**: Generate the actual content for each section

**Process**:

1. Iterate through sections in order
2. For each section:
   - **Simple sections** (heading, short paragraph):
     - Generate content directly via AI
     - Populate the `elements` array
   - **Complex sections** (image, long chapter):
     - Create a sub-prompt based on `generation_hint` and `image_prompt`
     - Generate content via a specialized action:
       - Images: `ai.generate` with image generation
       - Long text: `ai.process` with a focused prompt
     - Store the generated content
     - Populate the `elements` array

**Content Caching**:

- Extract content from source documents ONCE at the start
- Cache the extracted content for reuse across all sections
- Pass the cached content to sub-prompts to avoid re-extraction

**Image Generation**:

- For `content_type: "image"` sections:
  - Use the `image_prompt` from the structure
  - Call the `ai.generate` action with image generation
  - Receive base64 image data
  - Create an image element:

```json
{
  "url": "data:image/png;base64,<base64-data>",
  "base64Data": "<base64-data>",
  "altText": "<alt text>",
  "caption": "<caption>"
}
```

### Phase 3: Integration

**Purpose**: Merge all content into the final document structure

**Process**:

1. Validate that all sections have content
2. Merge the generated content into the structure
3. Replace placeholders with actual data
4. Finalize the JSON structure
5. Render to the target format (DOCX, PDF, HTML, etc.)

## Implementation Strategy

### New Components Needed

1. **Structure Generator** (`structureGenerator.py`)
   - Generates the document skeleton
   - Identifies section complexity
   - Creates generation hints
2. **Content Generator** (`contentGenerator.py`)
   - Generates content for each section
   - Handles simple vs. complex sections
   - Manages sub-prompts and image generation
   - Caches extracted content
3. **Content Integrator** (`contentIntegrator.py`)
   - Merges generated content
   - Validates completeness
   - Finalizes the document structure

### Modified Components

1. **`generateDocument` action**
   - Implement hierarchical generation as the default
   - Orchestrate the three phases
   - Add progress logging for each phase
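As a minimal sketch (not the real implementation), the three-phase flow that `generateDocument` orchestrates could look like the following. The phase helpers (`generate_structure`, `generate_section`, `integrate`) are hypothetical and injected as callables, since their real signatures are not defined in this concept; the `log` callback stands in for ChatLog progress messages.

```python
def generate_document(request, generate_structure, generate_section, integrate, log=print):
    """Sketch of the three-phase orchestration for a single document.

    The phase helpers are hypothetical placeholders for the Phase 1-3
    components (structureGenerator, contentGenerator, contentIntegrator).
    """
    log("Generating structure...")                      # Phase 1: skeleton
    structure = generate_structure(request)
    sections = structure["documents"][0]["sections"]
    for i, section in enumerate(sorted(sections, key=lambda s: s["order"]), start=1):
        log(f"Generating content for section {i}/{len(sections)}...")  # Phase 2
        section["elements"] = generate_section(section)
    log("Merging content...")                           # Phase 3: integrate
    return integrate(structure)
```

The helpers are injected rather than imported so that each phase can be developed and tested independently.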
2. **`process` action**
   - Support content caching (extract once, reuse)
   - Support sub-prompt generation for sections
3. **Prompt Builder** (`subPromptBuilderGeneration.py`)
   - Add a structure generation prompt
   - Add section-specific content prompts
   - Add image generation prompt templates
4. **Renderers** (update required):
   - **HTML Renderer**: Create separate image files and link them
   - **PDF Renderer**: Embed images using reportlab
   - **XLSX Renderer**: Add image embedding support
   - **PPTX Renderer**: Add image embedding support

### New Action Parameters

**For `generateDocument`**:

- `enableImageIntegration`: boolean (default: true)
- `maxSectionLength`: int (threshold for "complex" sections, default: 500 words)
- `parallelGeneration`: boolean (default: true) - enable parallel section generation
- `progressLogging`: boolean (default: true) - send ChatLog progress updates

**For sub-prompts**:

- `sectionContext`: Previous sections for context
- `cachedContent`: Extracted content cache (to avoid re-extraction)
- `targetSection`: Section metadata
- `previousSections`: Array of already-generated sections for continuity

## Benefits

1. **Image Integration**: Images can be generated and embedded into documents
2. **Structured Approach**: Clear separation of structure and content
3. **Efficiency**: Content is extracted once and reused across sections
4. **Scalability**: Can handle very long documents by splitting into sections
5. **Quality**: Better control over complex sections (images, long chapters)
6. **Flexibility**: Can generate different content types per section

## Migration Strategy

**Note**: No backward compatibility needed - can implement directly as the new default.

1. **Phase 1**: Implement hierarchical generation as the new default
2. **Phase 2**: Update renderers (HTML, PDF, XLSX, PPTX) for image support
3. **Phase 3**: Testing and refinement
4. **Phase 4**: Remove the old single-pass mode (or keep it as an internal fallback only)

## Example Workflow

**User Request**: "Create a children's bedtime story with 5 illustrations"

**Phase 1 Output**:

```json
{
  "metadata": {"title": "Flöckchen's Adventure"},
  "documents": [{
    "sections": [
      {"id": "title", "content_type": "heading", "complexity": "simple", ...},
      {"id": "intro", "content_type": "paragraph", "complexity": "simple", ...},
      {"id": "img1", "content_type": "image", "complexity": "complex", "image_prompt": "Rabbit meeting owl", ...},
      {"id": "chapter1", "content_type": "paragraph", "complexity": "complex", ...},
      {"id": "img2", "content_type": "image", "complexity": "complex", ...},
      ...
    ]
  }]
}
```

**Phase 2 Process**:

- Generate title → populate elements
- Generate intro → populate elements
- Generate image 1 → call `ai.generate`, store base64 → populate elements
- Generate chapter 1 → sub-prompt → populate elements
- Generate image 2 → call `ai.generate`, store base64 → populate elements
- ...

**Phase 3 Output**: Complete document with all sections populated, ready for rendering

## Renderer Readiness Assessment

### Current Renderer Status for Image Handling:

1. **Text Renderer** (`rendererText.py`): ✅ **READY**
   - Skips images, shows a placeholder: `[Image: altText]`
   - No changes needed
2. **Markdown Renderer** (`rendererMarkdown.py`): ✅ **READY**
   - Shows a placeholder with truncated base64: `![altText](data:image/png;base64,...)`
   - No changes needed (markdown limitation)
3. **HTML Renderer** (`rendererHtml.py`): ⚠️ **NEEDS UPDATE**
   - Currently: Embeds base64 directly in the `<img>` tag as a data URI
   - **Required Change**: Create separate image files and link to them
   - Implementation: Generate image files (e.g., `image_1.png`, `image_2.png`) alongside the HTML
   - Update `<img>` tags to use relative paths, e.g. `<img src="image_1.png">`
   - Return multiple files: HTML file + image files
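The HTML renderer change described above can be sketched with the standard library alone. This is a hedged illustration, not the renderer's actual code: the function name `export_images` is hypothetical, and it assumes the section/element fields shown in the JSON examples in this concept (`content_type`, `elements[0]["base64Data"]`).

```python
import base64
from pathlib import Path

def export_images(sections, output_dir):
    """Write each image section's base64 payload to its own PNG file and
    return {section_id: file_name}, so the HTML renderer can emit
    <img src="image_1.png"> instead of an inline data URI.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    file_names = {}
    image_sections = [s for s in sections if s.get("content_type") == "image"]
    for i, section in enumerate(image_sections, start=1):
        name = f"image_{i}.png"
        # Decode the base64 payload and write it next to the HTML file.
        data = base64.b64decode(section["elements"][0]["base64Data"])
        (out / name).write_bytes(data)
        file_names[section["id"]] = name
    return file_names
```

The returned mapping lets the renderer rewrite `src` attributes and report the full list of files to deliver alongside the HTML.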
4. **PDF Renderer** (`rendererPdf.py`): ⚠️ **NEEDS UPDATE**
   - Currently: Shows a placeholder `[Image: altText]`
   - **Required Change**: Embed images directly in the PDF using reportlab
   - Implementation: Use `reportlab.platypus.Image()` with the base64-decoded bytes
5. **DOCX Renderer** (`rendererDocx.py`): ✅ **READY**
   - Embeds images directly using `doc.add_picture()`
   - Adds captions below images
   - No changes needed
6. **XLSX Renderer** (`rendererXlsx.py`): ⚠️ **NEEDS IMPLEMENTATION**
   - Currently: No image handling found
   - **Required Change**: Add image support using openpyxl
   - Implementation: Use `openpyxl.drawing.image.Image()` to embed images
   - Store images in worksheet cells or as floating images
7. **PPTX Renderer** (`rendererPptx.py`): ⚠️ **NEEDS IMPLEMENTATION**
   - Currently: No image handling found
   - **Required Change**: Add image support using python-pptx
   - Implementation: Use `slide.shapes.add_picture()` to add images to slides

### Renderer Update Requirements:

**Priority 1 (Critical for HTML output)**:

- HTML Renderer: Create separate image files and link them

**Priority 2 (Important for document formats)**:

- PDF Renderer: Embed images using reportlab
- XLSX Renderer: Add image embedding support
- PPTX Renderer: Add image embedding support

## Answers to Open Questions

### 1. Performance: How to handle very large documents (100+ sections)?

**Answer**: Use parallel processing where possible, with progress ChatLog messages.

**Implementation Strategy**:

- **Parallel Section Generation**: Generate independent sections in parallel using asyncio
- **Batch Processing**: Process sections in batches (e.g., 10 sections at a time)
- **Progress Tracking**: Send ChatLog progress updates:
  - "Generating structure..." (Phase 1)
  - "Generating content for section X/Y..." (Phase 2)
  - "Generating image for section X..." (Phase 2 - images)
  - "Merging content..." (Phase 3)
  - "Rendering final document..." (Phase 3)
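The batched parallel generation and the skip-on-failure behavior discussed in this concept can be combined in one asyncio sketch. This is an illustrative assumption, not the real implementation: `generate_one` is a hypothetical async callable that returns a section's elements, and the error-placeholder shape follows the examples in this document.

```python
import asyncio

async def generate_all_sections(sections, generate_one, batch_size=10, log=print):
    """Generate section contents in parallel batches.

    Failed sections are skipped gracefully: they keep their metadata and
    receive an error placeholder instead of aborting the whole document.
    """
    done = 0
    for start in range(0, len(sections), batch_size):
        batch = sections[start:start + batch_size]
        # Run one batch of independent sections concurrently; collect
        # exceptions as results instead of letting one failure cancel the rest.
        results = await asyncio.gather(
            *(generate_one(s) for s in batch), return_exceptions=True
        )
        for section, result in zip(batch, results):
            if isinstance(result, Exception):
                section["error"] = True
                section["elements"] = [{"text": f"[ERROR: {result}]"}]
            else:
                section["elements"] = result
            done += 1
            log(f"Generated section {done}/{len(sections)}")
    return sections
```

`return_exceptions=True` is what makes the graceful degradation work: a raised exception becomes an ordinary result for that section.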
- **Streaming**: For very large documents, consider streaming partial results

**Example Progress Messages**:

```
Phase 1: Structure Generation (0% → 33%)
Phase 2: Content Generation (33% → 90%)
  - Section 1/10: Heading (34%)
  - Section 2/10: Paragraph (40%)
  - Section 3/10: Image generation (50%)
  - Section 4/10: Chapter (60%)
  ...
Phase 3: Integration & Rendering (90% → 100%)
```

### 2. Error Handling: What if one section fails?

**Answer**: Skip failed sections, keep the section title and type, and show an error message in the section.

**Implementation Strategy**:

- **Graceful Degradation**: Continue processing the remaining sections
- **Error Section**: Create an error placeholder section:

```json
{
  "id": "section_failed_3",
  "content_type": "paragraph",
  "elements": [{
    "text": "[ERROR: Failed to generate content for this section. Error: <error message>]"
  }],
  "order": 3,
  "error": true,
  "errorMessage": "<error message>"
}
```

- **Logging**: Log errors for debugging, but don't fail the entire document
- **User Notification**: Include the error count in the final progress message

### 3. Image Storage: Where to store generated images?

**Answer**: Store images in the JSON as base64, as the renderers need them afterwards.

**Implementation Strategy**:

- **In-Memory Storage**: Keep base64 strings in the JSON structure during generation
- **JSON Structure**: Store in section elements:

```json
{
  "url": "data:image/png;base64,<base64-data>",
  "base64Data": "<base64-data>",
  "altText": "Image description",
  "caption": "Optional caption"
}
```

- **Memory Management**: For very large images, consider compression or chunking
- **Renderer Access**: All renderers can access `base64Data` directly from the JSON
- **HTML Special Case**: The HTML renderer will extract the base64 data, decode it, and save it as separate files during rendering

### 4. Backward Compatibility: How to ensure existing workflows still work?

**Answer**: No backward compatibility needed.
**Implementation Strategy**:

- **New Default**: Hierarchical generation becomes the default mode
- **Clean Migration**: All document generation uses the hierarchical approach
- **No Fallback**: Remove the single-pass mode (or keep it as an internal fallback only)
- **Breaking Change**: Acceptable, since this is a new feature/enhancement

## Next Steps

1. **Review and Approval**: Get feedback on the concept
2. **Detailed Design**: Design the API and data structures
3. **Prototype**: Implement Phase 1 (structure generation)
4. **Testing**: Test with real use cases
5. **Full Implementation**: Implement all phases
6. **Migration**: Migrate existing workflows
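To close with one concrete piece of the design: the Phase 3 completeness check ("validate structure completeness" before rendering) could look like the following minimal sketch. The function name is hypothetical and the field names are assumed from the JSON examples in this concept; sections with an explicit `error` flag pass validation, matching the skip-failed-sections decision.

```python
def validate_structure(structure):
    """Phase 3 sketch: sort sections by order and verify every section
    either has content or is an explicit error placeholder."""
    missing = []
    for document in structure["documents"]:
        document["sections"].sort(key=lambda s: s["order"])
        for section in document["sections"]:
            # An empty section is only acceptable if it was marked as failed.
            if not section["elements"] and not section.get("error"):
                missing.append(section["id"])
    if missing:
        raise ValueError(f"Sections without content: {missing}")
    return structure
```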