Concept: Hierarchical Document Generation with Image Integration
Executive Summary
This concept proposes a three-phase hierarchical approach to document generation that enables proper image integration and handles complex documents efficiently.
Key Decisions:
- ✅ Performance: Parallel processing with ChatLog progress messages
- ✅ Error Handling: Skip failed sections, show error messages
- ✅ Image Storage: Store as base64 in JSON (renderers need direct access)
- ✅ Backward Compatibility: Not needed - implement as new default
Renderer Status:
- ✅ Ready: Text, Markdown, DOCX renderers
- ⚠️ Needs Update: HTML (create separate image files), PDF (embed images)
- ⚠️ Needs Implementation: XLSX, PPTX (add image support)
Problem Statement
Currently, the document generation system has the following limitations:
- No Image Integration: Images are generated separately but cannot be embedded into document structures
- Single-Pass Generation: Documents are generated in one AI call, making it difficult to handle complex sections (long text, images, chapters)
- Repeated Extraction: Content extraction may happen multiple times unnecessarily
- No Structured Approach: No mechanism to first define document structure, then populate sections
Current Architecture Analysis
Current Flow:
User Request → ai.generateDocument → ai.process → AI JSON Generation → Renderer → Final Document
Issues:
- AI generates complete JSON structure in one pass
- Images are generated separately via the ai.generate action
- No mechanism exists to integrate generated images into the document structure
- The JSON schema supports the image content_type, but the AI rarely generates it
- Content extraction happens per action, not cached/reused
Current Image Handling:
- Images can be rendered IF they exist in the JSON structure (content_type: "image")
- Image data is expected as base64Data in elements
- Renderers support image rendering (DOCX, PDF, HTML, etc.)
- But images are never generated WITHIN document generation
Proposed Solution: Hierarchical Document Generation
Core Concept
Three-Phase Approach:
- Structure Generation Phase: Generate document skeleton with section placeholders
- Content Generation Phase: Generate content for each section (text or image) via sub-prompts
- Integration Phase: Merge all generated content into final document structure
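The three phases above can be sketched as a single orchestration function. This is a minimal sketch: the function names, field layout, and stubbed return values are illustrative placeholders, not the final API.

```python
# Minimal sketch of the three-phase pipeline. All function names and the
# stubbed return values are illustrative, not the final API.

def generate_structure(request: str) -> dict:
    """Phase 1: produce a skeleton with empty section placeholders (stubbed)."""
    return {"documents": [{"sections": [
        {"id": "title", "content_type": "heading", "complexity": "simple", "elements": []},
        {"id": "img1", "content_type": "image", "complexity": "complex", "elements": []},
    ]}]}

def generate_section_content(section: dict) -> list:
    """Phase 2: fill one section (stubbed; real code calls the AI or image action)."""
    if section["content_type"] == "image":
        return [{"base64Data": "<base64>", "altText": section["id"]}]
    return [{"text": f"Generated text for {section['id']}"}]

def generate_document(request: str) -> dict:
    structure = generate_structure(request)            # Phase 1: skeleton
    for doc in structure["documents"]:                 # Phase 2: per-section content
        for section in doc["sections"]:
            section["elements"] = generate_section_content(section)
    # Phase 3: completeness check before handing off to a renderer
    assert all(s["elements"] for d in structure["documents"] for s in d["sections"])
    return structure
```

The key design point is that Phase 2 operates on one section at a time, so each section can be dispatched to a different specialized action.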
Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│ Phase 1: Structure Generation │
│ - Generate document skeleton │
│ - Identify sections (text, image, complex) │
│ - Create section placeholders with metadata │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 2: Content Generation (Tree-like) │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Section 1: Heading (simple) │ │
│ │ → Generate directly │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Section 2: Paragraph (simple) │ │
│ │ → Generate directly │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Section 3: Image (complex) │ │
│ │ → Sub-prompt: Generate image │ │
│ │ → Store image data │ │
│ │ → Create image section with base64Data │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Section 4: Long Chapter (complex) │ │
│ │ → Sub-prompt: Generate chapter content │ │
│ │ → Split into subsections if needed │ │
│ └──────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 3: Integration │
│ - Merge all generated content │
│ - Replace placeholders with actual data │
│ - Validate structure completeness │
│ - Render to final format │
└─────────────────────────────────────────────────────────────┘
Detailed Design
Phase 1: Structure Generation
Purpose: Create document skeleton with section metadata
Process:
- AI generates document structure with sections
- Each section includes:
  - id: Unique identifier
  - content_type: Type (heading, paragraph, image, table, etc.)
  - complexity: "simple" or "complex"
  - generation_hint: Instructions for content generation
  - order: Section order
  - elements: Empty or placeholder
Example Structure:
{
"metadata": {
"title": "Children's Bedtime Story",
"split_strategy": "single_document"
},
"documents": [{
"id": "doc_1",
"sections": [
{
"id": "section_title",
"content_type": "heading",
"complexity": "simple",
"generation_hint": "Story title",
"order": 1,
"elements": []
},
{
"id": "section_intro",
"content_type": "paragraph",
"complexity": "simple",
"generation_hint": "Introduction paragraph",
"order": 2,
"elements": []
},
{
"id": "section_image_1",
"content_type": "image",
"complexity": "complex",
"generation_hint": "Illustration: Rabbit meeting owl in moonlit forest",
"image_prompt": "A small brown rabbit sitting in a peaceful forest clearing under moonlight with stars, meeting a wise owl perched on a branch",
"order": 3,
"elements": []
},
{
"id": "section_chapter_1",
"content_type": "paragraph",
"complexity": "complex",
"generation_hint": "First chapter: Rabbit's adventure begins",
"order": 4,
"elements": []
}
]
}]
}
Phase 2: Content Generation
Purpose: Generate actual content for each section
Process:
- Iterate through sections in order
- For each section:
  - Simple sections (heading, short paragraph):
    - Generate content directly via AI
    - Populate the elements array
  - Complex sections (image, long chapter):
    - Create a sub-prompt based on generation_hint and image_prompt
    - Generate content via a specialized action:
      - Images: ai.generate with image generation
      - Long text: ai.process with a focused prompt
    - Store generated content
    - Populate the elements array
Content Caching:
- Extract content from source documents ONCE at the start
- Cache extracted content for reuse across all sections
- Pass cached content to sub-prompts to avoid re-extraction
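The extract-once behavior can be captured in a small cache wrapper. This is a sketch: `ContentCache` and `extractor` are hypothetical names standing in for the real extraction component.

```python
class ContentCache:
    """Extract source content once and reuse it for every section sub-prompt.

    `extractor` stands in for the real extraction step; the class itself is
    a sketch, not an existing component.
    """

    def __init__(self, extractor):
        self._extractor = extractor
        self._cache: dict = {}

    def get(self, source_id: str) -> str:
        if source_id not in self._cache:  # extract only on the first request
            self._cache[source_id] = self._extractor(source_id)
        return self._cache[source_id]
```

Every sub-prompt then calls `cache.get(...)` instead of re-running extraction, so a 50-section document still extracts each source exactly once.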
Image Generation:
- For content_type: "image" sections:
  - Use the image_prompt from the structure
  - Call the ai.generate action with image generation
  - Receive base64 image data
  - Create an image element:
    { "url": "data:image/png;base64,<base64_data>", "base64Data": "<base64_data>", "altText": "<alt_text>", "caption": "<caption>" }
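Building that element from raw image bytes is mechanical. A sketch, with a hypothetical helper name (the real code would live in the content generator):

```python
import base64

def make_image_element(image_bytes: bytes, alt_text: str, caption: str = "") -> dict:
    """Build the image element stored in a section's elements array.

    The payload is stored both as a data URI (url) and as raw base64
    (base64Data) so each renderer can pick whichever form it needs.
    Hypothetical helper; PNG is assumed here.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "url": f"data:image/png;base64,{b64}",
        "base64Data": b64,
        "altText": alt_text,
        "caption": caption,
    }
```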
Phase 3: Integration
Purpose: Merge all content into final document structure
Process:
- Validate all sections have content
- Merge generated content into structure
- Replace placeholders with actual data
- Finalize JSON structure
- Render to target format (docx, pdf, html, etc.)
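The merge-and-validate steps can be sketched as one function. Names are illustrative; the key point is that content is matched back to placeholders by section id, and an incomplete document fails loudly before rendering.

```python
def integrate(document: dict, generated: dict) -> dict:
    """Phase 3 sketch: merge generated content into the skeleton by section
    id, then fail on any section left unfilled. Names are illustrative."""
    for section in document["sections"]:
        section["elements"] = generated.get(section["id"], section["elements"])
    missing = [s["id"] for s in document["sections"] if not s["elements"]]
    if missing:
        raise ValueError(f"sections without content: {missing}")
    return document
```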
Implementation Strategy
New Components Needed
- Structure Generator (structureGenerator.py)
  - Generates document skeleton
  - Identifies section complexity
  - Creates generation hints
- Content Generator (contentGenerator.py)
  - Generates content for each section
  - Handles simple vs. complex sections
  - Manages sub-prompts and image generation
  - Caches extracted content
- Content Integrator (contentIntegrator.py)
  - Merges generated content
  - Validates completeness
  - Finalizes document structure
Modified Components
- generateDocument action
  - Implement hierarchical generation as the default
  - Orchestrate the three phases
  - Add progress logging for each phase
- process action
  - Support content caching (extract once, reuse)
  - Support sub-prompt generation for sections
- Prompt Builder (subPromptBuilderGeneration.py)
  - Add structure generation prompt
  - Add section-specific content prompts
  - Add image generation prompt templates
- Renderers (update required):
- HTML Renderer: Create separate image files and link them
- PDF Renderer: Embed images using reportlab
- XLSX Renderer: Add image embedding support
- PPTX Renderer: Add image embedding support
New Action Parameters
For generateDocument:
- enableImageIntegration: boolean (default: true)
- maxSectionLength: int (threshold for "complex" sections, default: 500 words)
- parallelGeneration: boolean (default: true) - enable parallel section generation
- progressLogging: boolean (default: true) - send ChatLog progress updates
For sub-prompts:
- sectionContext: Previous sections for context
- cachedContent: Extracted content cache (to avoid re-extraction)
- targetSection: Section metadata
- previousSections: Array of already-generated sections for continuity
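The generateDocument parameter set above maps naturally onto a small dataclass. A sketch only; the class name is hypothetical and the defaults are those listed above.

```python
from dataclasses import dataclass

@dataclass
class GenerateDocumentParams:
    """Proposed generateDocument parameters (hypothetical container class)."""
    enableImageIntegration: bool = True
    maxSectionLength: int = 500   # word-count threshold for "complex" sections
    parallelGeneration: bool = True
    progressLogging: bool = True
```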
Benefits
- Image Integration: Images can be generated and embedded into documents
- Structured Approach: Clear separation of structure and content
- Efficiency: Content extracted once, reused across sections
- Scalability: Can handle very long documents by splitting into sections
- Quality: Better control over complex sections (images, long chapters)
- Flexibility: Can generate different content types per section
Migration Strategy
Note: No backwards compatibility needed - can implement directly as new default.
- Phase 1: Implement hierarchical generation as new default
- Phase 2: Update renderers (HTML, PDF, XLSX, PPTX) for image support
- Phase 3: Testing and refinement
- Phase 4: Remove old single-pass mode (or keep as internal fallback only)
Example Workflow
User Request: "Create a children's bedtime story with 5 illustrations"
Phase 1 Output:
{
"metadata": {"title": "Flöckchen's Adventure"},
"documents": [{
"sections": [
{"id": "title", "content_type": "heading", "complexity": "simple", ...},
{"id": "intro", "content_type": "paragraph", "complexity": "simple", ...},
{"id": "img1", "content_type": "image", "complexity": "complex",
"image_prompt": "Rabbit meeting owl", ...},
{"id": "chapter1", "content_type": "paragraph", "complexity": "complex", ...},
{"id": "img2", "content_type": "image", "complexity": "complex", ...},
...
]
}]
}
Phase 2 Process:
- Generate title → populate elements
- Generate intro → populate elements
- Generate image 1 → call ai.generate, store base64 → populate elements
- Generate chapter 1 → sub-prompt → populate elements
- Generate image 2 → call ai.generate, store base64 → populate elements
- ...
Phase 3 Output: Complete document with all sections populated, ready for rendering
Renderer Readiness Assessment
Current Renderer Status for Image Handling:
- Text Renderer (rendererText.py): ✅ READY
  - Skips images, shows placeholder: [Image: altText]
  - No changes needed
- Markdown Renderer (rendererMarkdown.py): ✅ READY
  - Shows a placeholder with truncated base64 data
  - No changes needed (Markdown limitation)
- HTML Renderer (rendererHtml.py): ⚠️ NEEDS UPDATE
  - Currently: Embeds base64 directly in the <img> tag as a data URI
  - Required Change: Create separate image files and link to them
  - Implementation: Generate image files (e.g., image_1.png, image_2.png) alongside the HTML
  - Update <img> tags to use relative paths: <img src="image_1.png" alt="...">
  - Return multiple files: the HTML file plus its image files
- PDF Renderer (rendererPdf.py): ⚠️ NEEDS UPDATE
  - Currently: Shows placeholder [Image: altText]
  - Required Change: Embed images directly in the PDF using reportlab
  - Implementation: Use reportlab.platypus.Image() with base64-decoded bytes
- DOCX Renderer (rendererDocx.py): ✅ READY
  - Embeds images directly using doc.add_picture()
  - Adds captions below images
  - No changes needed
- XLSX Renderer (rendererXlsx.py): ⚠️ NEEDS IMPLEMENTATION
  - Currently: No image handling found
  - Required Change: Add image support using openpyxl
  - Implementation: Use openpyxl.drawing.image.Image() to embed images, anchored to worksheet cells or as floating images
- PPTX Renderer (rendererPptx.py): ⚠️ NEEDS IMPLEMENTATION
  - Currently: No image handling found
  - Required Change: Add image support using python-pptx
  - Implementation: Use slide.shapes.add_picture() to add images to slides
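The HTML renderer change (decode base64 elements to files, rewrite urls to relative paths) can be sketched with the standard library alone. The function name and file-naming scheme are illustrative, and PNG output is assumed.

```python
import base64
from pathlib import Path

def export_images_for_html(sections: list, out_dir: str) -> list:
    """Decode base64Data elements to image files and rewrite their urls to
    relative paths, as the HTML renderer update requires. Sketch only;
    assumes PNG and a sequential image_<n>.png naming scheme."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    counter = 1
    for section in sections:
        for element in section.get("elements", []):
            if "base64Data" in element:
                name = f"image_{counter}.png"
                (out / name).write_bytes(base64.b64decode(element["base64Data"]))
                element["url"] = name   # rendered as <img src="image_1.png">
                written.append(name)
                counter += 1
    return written
```

The renderer would then return the HTML file together with the `written` list, satisfying the "multiple files" requirement.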
Renderer Update Requirements:
Priority 1 (Critical for HTML output):
- HTML Renderer: Create separate image files and link them
Priority 2 (Important for document formats):
- PDF Renderer: Embed images using reportlab
- XLSX Renderer: Add image embedding support
- PPTX Renderer: Add image embedding support
Answers to Open Questions
1. Performance: How to handle very large documents (100+ sections)?
Answer: Use parallel processing where possible, with progress ChatLog messages.
Implementation Strategy:
- Parallel Section Generation: Generate independent sections in parallel using asyncio
- Batch Processing: Process sections in batches (e.g., 10 sections at a time)
- Progress Tracking: Send ChatLog progress updates:
- "Generating structure..." (Phase 1)
- "Generating content for section X/Y..." (Phase 2)
- "Generating image for section X..." (Phase 2 - images)
- "Merging content..." (Phase 3)
- "Rendering final document..." (Phase 3)
- Streaming: For very large documents, consider streaming partial results
Example Progress Messages:
Phase 1: Structure Generation (0% → 33%)
Phase 2: Content Generation (33% → 90%)
- Section 1/10: Heading (34%)
- Section 2/10: Paragraph (40%)
- Section 3/10: Image generation (50%)
- Section 4/10: Chapter (60%)
...
Phase 3: Integration & Rendering (90% → 100%)
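The parallel-with-progress strategy can be sketched with asyncio and a semaphore as the batching cap. `worker` stands in for the real per-section AI call, and the print is a stand-in for the ChatLog update.

```python
import asyncio

async def generate_sections(sections, worker, max_parallel=10):
    """Generate independent sections concurrently with a concurrency cap,
    reporting progress per finished section. Sketch: `worker` stands in for
    the real per-section AI call; print() stands in for ChatLog."""
    sem = asyncio.Semaphore(max_parallel)
    done = 0

    async def run_one(section):
        nonlocal done
        async with sem:                 # at most max_parallel in flight
            result = await worker(section)
        done += 1
        print(f"Generating content for section {done}/{len(sections)}...")
        return result

    # gather() preserves input order, so results line up with sections
    return await asyncio.gather(*(run_one(s) for s in sections))
```

With a cap of 10 this behaves like the batch-of-10 strategy, but without idle waiting at batch boundaries.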
2. Error Handling: What if one section fails?
Answer: Skip failed sections, keep section title and type, show error message in the section.
Implementation Strategy:
- Graceful Degradation: Continue processing remaining sections
- Error Section: Create an error placeholder section:
  {
    "id": "section_failed_3",
    "content_type": "paragraph",
    "elements": [{ "text": "[ERROR: Failed to generate content for this section. Error: <error_message>]" }],
    "order": 3,
    "error": true,
    "errorMessage": "<detailed_error>"
  }
- Logging: Log errors for debugging but don't fail the entire document
- User Notification: Include error count in final progress message
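The skip-and-placeholder behavior amounts to a try/except around each section's generator. A sketch with hypothetical names:

```python
def generate_with_fallback(section: dict, generator) -> dict:
    """Run a section generator; on failure keep the section's id and type
    but swap in an error placeholder so the rest of the document still
    renders. Sketch: `generator` stands in for the real per-section call."""
    try:
        section["elements"] = generator(section)
    except Exception as exc:
        section["elements"] = [{
            "text": f"[ERROR: Failed to generate content for this section. Error: {exc}]"
        }]
        section["error"] = True
        section["errorMessage"] = str(exc)
    return section
```

Counting sections where `section.get("error")` is true after Phase 2 gives the error count for the final progress message.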
3. Image Storage: Where to store generated images?
Answer: Store images in JSON as base64, as renderers need them afterwards.
Implementation Strategy:
- In-Memory Storage: Keep base64 strings in JSON structure during generation
- JSON Structure: Store in section elements:
  {
    "url": "data:image/png;base64,<base64_data>",
    "base64Data": "<full_base64_string>",
    "altText": "Image description",
    "caption": "Optional caption"
  }
- Memory Management: For very large images, consider compression or chunking
- Renderer Access: All renderers can access base64Data directly from the JSON
- HTML Special Case: The HTML renderer will extract the base64, decode it, and save it as separate files during rendering
4. Backward Compatibility: How to ensure existing workflows still work?
Answer: No backwards compatibility needed.
Implementation Strategy:
- New Default: Hierarchical generation becomes the default mode
- Clean Migration: All document generation uses hierarchical approach
- No Fallback: Remove single-pass mode (or keep as internal fallback only)
- Breaking Change: Acceptable since this is a new feature/enhancement
Next Steps
- Review and Approval: Get feedback on concept
- Detailed Design: Design API and data structures
- Prototype: Implement Phase 1 (structure generation)
- Testing: Test with real use cases
- Full Implementation: Implement all phases
- Migration: Migrate existing workflows