
Document Generation Architecture Analysis

Current Flow

1. Document Input → ContentParts (extractAndPrepareContent)

Location: gateway/modules/services/serviceAi/subContentExtraction.py

Flow:

  • Regular documents → Calls extractContent() (NON-AI extraction) → Creates contentParts with raw extracted text
  • BUT THEN:
    • Images with "extract" intent → Calls Vision AI (line 190) → AI extraction
    • Text with "extract" intent + extractionPrompt → Calls AI processing (line 265) → AI extraction
  • Pre-extracted JSON → Uses contentParts directly (no AI)
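
The branching above can be modelled roughly as follows. This is a simplified sketch, not the code in subContentExtraction.py. The ContentPart fields, the intent/extraction_prompt names and the two injected callables (vision_ai, text_ai) are hypothetical stand-ins for the real Vision AI call (line 190) and AI processing call (line 265).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ContentPart:
    # Hypothetical shape; the real contentParts structure is not shown in this document.
    kind: str                               # e.g. "text", "table", "image"
    data: str                               # extracted text, or an image reference
    intent: Optional[str] = None            # e.g. "extract"
    extraction_prompt: Optional[str] = None
    ai_processed: bool = False


def prepare_current(
    parts: List[ContentPart],
    vision_ai: Callable[[str], str],
    text_ai: Callable[[str, str], str],
) -> List[ContentPart]:
    """Simplified model of the current extractAndPrepareContent branching."""
    out = []
    for part in parts:
        if part.kind == "image" and part.intent == "extract":
            # Vision AI runs during extraction (cf. line 190).
            out.append(ContentPart("text", vision_ai(part.data), ai_processed=True))
        elif part.kind == "text" and part.intent == "extract" and part.extraction_prompt:
            # Text AI runs during extraction (cf. line 265).
            result = text_ai(part.extraction_prompt, part.data)
            out.append(ContentPart("text", result, ai_processed=True))
        else:
            # Raw output of the NON-AI extractContent() passes through unchanged.
            out.append(part)
    return out
```

Any part flagged ai_processed=True reaches structure generation already AI-processed, which is the result stated next.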

Result: ContentParts may already be AI-processed before structure generation

2. Structure Generation

Location: gateway/modules/services/serviceAi/subStructureGeneration.py

Flow:

  • Uses contentParts (may already be AI-processed)
  • Generates document structure (chapters, sections)

3. Section Generation (_processSingleSection)

Location: gateway/modules/services/serviceAi/subStructureFilling.py

Flow:

  • Uses contentParts (which may already be AI-processed)
  • Aggregates "extracted" contentParts with AI (lines 554-682)
  • Generates section content using callAiWithLooping with useCaseId="section_content"

Issues Identified

Issue 1: Duplicate AI Processing

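  • AI extraction happens in extractAndPrepareContent (for images and for text with an extractionPrompt)
  • The same content goes through AI again in section generation (aggregation at lines 554-682, then callAiWithLooping)
  • This is redundant: affected parts are processed by AI at least twice, which adds cost and latency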
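
A toy trace makes the duplication concrete. It records, per part, which stages invoke AI. The stage labels and the needs_ai_extract flag are illustrative assumptions, and the model assumes section generation uses every part once; no real AI is called.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TracedPart:
    name: str
    needs_ai_extract: bool  # image with "extract" intent, or text with an extractionPrompt
    ai_passes: List[str] = field(default_factory=list)


def trace_pipeline(parts: List[TracedPart], ai_in_extraction: bool) -> Dict[str, List[str]]:
    """Record which stages run AI over each part (toy model)."""
    for part in parts:
        # Stage 1: extractAndPrepareContent
        if ai_in_extraction and part.needs_ai_extract:
            part.ai_passes.append("extraction")
        # Stage 3: section generation (aggregation + section_content generation)
        part.ai_passes.append("section")
    return {part.name: part.ai_passes for part in parts}


def make_parts() -> List[TracedPart]:
    return [TracedPart("scan.png", True), TracedPart("report.docx", False)]


current = trace_pipeline(make_parts(), ai_in_extraction=True)
deferred = trace_pipeline(make_parts(), ai_in_extraction=False)
print(current)   # {'scan.png': ['extraction', 'section'], 'report.docx': ['section']}
print(deferred)  # {'scan.png': ['section'], 'report.docx': ['section']}
```

With AI removed from extraction, every part passes through AI exactly once.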

Issue 2: Architecture Inconsistency

  • Pre-extracted JSON files → contentParts directly (no AI)
  • Regular documents → contentParts + AI extraction (inconsistent)
  • User wants: Documents → contentParts (like pre-extracted JSON) → AI only in section generation
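
The requested consistency can be expressed as one contract: both entry points return the same raw part type and neither takes an AI dependency. A minimal sketch, assuming a JSON list of {kind, data, mime_type} objects for pre-extracted files and a hypothetical (kind, data, mime_type) tuple shape for the output of extractContent(); neither format is confirmed by this document.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ContentPart:
    kind: str       # "text", "table", "image", ...
    data: str       # raw text or an image reference; never AI output at this stage
    mime_type: str


def parts_from_pre_extracted_json(payload: str) -> List[ContentPart]:
    # Pre-extracted JSON already contains parts: parse it, no AI.
    return [ContentPart(**item) for item in json.loads(payload)]


def parts_from_document(extracted: Iterable[Tuple[str, str, str]]) -> List[ContentPart]:
    # Wrap the (assumed) NON-AI extractContent() output in the same type, no AI.
    return [ContentPart(kind, data, mime) for kind, data, mime in extracted]
```

Because neither function accepts an AI client, reintroducing AI at this stage would require a visible signature change.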

Issue 3: Image Processing

  • Images need Vision AI to extract text
  • Currently happens in extractAndPrepareContent
  • Question: Should this happen during section generation instead?

Proposed Architecture

Option A: Remove All AI from extractAndPrepareContent

  • Documents → extractContent() → Raw contentParts (text, tables, etc.)
  • Images → Keep as image contentParts (no Vision AI extraction)
  • Section generation → Handle images with Vision AI when needed
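
Under Option A the preparation step becomes a pure mapping. In the sketch below, images become image parts that carry a reference and MIME type instead of Vision AI text. The raw-item dictionary keys (mime_type, text, ref) are assumptions made for illustration, and tables and other types are folded into "text" for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ContentPart:
    kind: str
    data: str        # raw text, or a reference to the image bytes
    mime_type: str


def prepare_option_a(raw_items: List[Dict[str, str]]) -> List[ContentPart]:
    """Option A: no AI here; images stay images until section generation."""
    parts = []
    for item in raw_items:
        mime = item["mime_type"]
        if mime.startswith("image/"):
            # Keep only a reference; Vision AI is deferred to section generation.
            parts.append(ContentPart("image", item["ref"], mime))
        else:
            parts.append(ContentPart("text", item["text"], mime))
    return parts
```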

Pros:

  • Consistent with pre-extracted JSON flow
  • Single point of AI processing (section generation)
  • Clear separation of concerns

Cons:

  • Images won't have extracted text until section generation
  • May need to handle images differently in section generation

Option B: Keep Vision AI for Images Only

  • Documents → extractContent() → Raw contentParts
  • Images → Vision AI extraction → Text contentParts
  • Section generation → Uses text contentParts (no additional AI extraction)
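
Option B differs only in the image branch: the vision call moves into preparation, so section generation never sees an image part. A sketch, with vision_ai as a hypothetical injected callable standing in for the existing Vision AI call and the same assumed raw-item keys as in Option A.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ContentPart:
    kind: str
    data: str
    mime_type: str


def prepare_option_b(
    raw_items: List[Dict[str, str]],
    vision_ai: Callable[[str], str],
) -> List[ContentPart]:
    """Option B: Vision AI for images only; all other content stays raw."""
    parts = []
    for item in raw_items:
        mime = item["mime_type"]
        if mime.startswith("image/"):
            # The only AI call before structure generation.
            parts.append(ContentPart("text", vision_ai(item["ref"]), "text/plain"))
        else:
            parts.append(ContentPart("text", item["text"], mime))
    return parts
```

The first Con listed below shows up directly in the signature: preparation needs an AI dependency again.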

Pros:

  • Images get text extracted early
  • Section generation can use text directly

Cons:

  • Still has AI extraction before structure generation
  • Inconsistent with the user's request

Recommendation

Follow Option A and remove all AI extraction from extractAndPrepareContent:

  1. Documents → ContentParts (like pre-extracted JSON):

    • Call extractContent() (NON-AI)
    • Create contentParts with raw extracted content
    • Images remain as image contentParts (no Vision AI)
  2. Section Generation:

    • Handle images with Vision AI when needed
    • Aggregate all contentParts with AI
    • Single point of AI processing
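
Under this recommendation, section generation is the only place that talks to AI. The sketch below shows one way a section could assemble its input: text parts go in verbatim, an extractionPrompt becomes an instruction inside the section prompt rather than a separate AI pass, and image parts are attached for a vision-capable model. SectionRequest and the generate callable are hypothetical; how this maps onto callAiWithLooping with useCaseId="section_content" is not specified in this document.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ContentPart:
    kind: str                                # "text" or "image"
    data: str                                # raw text or image reference
    extraction_prompt: Optional[str] = None


@dataclass
class SectionRequest:
    prompt: str
    images: List[str] = field(default_factory=list)


def build_section_request(title: str, parts: List[ContentPart]) -> SectionRequest:
    """Assemble one AI request for a section from raw content parts."""
    lines = [f"Write the section '{title}' using the sources below."]
    images = []
    for i, part in enumerate(parts, start=1):
        if part.kind == "image":
            # Images travel with the same request to a vision-capable model.
            images.append(part.data)
            lines.append(f"[Source {i}: attached image {part.data}]")
        else:
            if part.extraction_prompt:
                # The former pre-extraction step becomes an instruction here.
                lines.append(f"[Source {i}: apply instruction: {part.extraction_prompt}]")
            lines.append(f"[Source {i} text]\n{part.data}")
    return SectionRequest("\n".join(lines), images)


def generate_section(title: str, parts: List[ContentPart],
                     generate: Callable[[SectionRequest], str]) -> str:
    # Single point of AI processing: exactly one call per section.
    return generate(build_section_request(title, parts))
```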

Benefits:

  • Clear architecture: Documents = raw contentParts
  • Consistent with pre-extracted JSON flow
  • AI processing only where needed (section generation)
  • Easier to understand and maintain

Questions to Resolve

  1. Image handling: How should images be processed during section generation?

    • Option 1: Vision AI extraction happens automatically when image contentParts are used
    • Option 2: Images are passed directly to a vision-capable model as part of the section generation request
    • Option 3: Images remain as binary and are rendered directly (no text extraction)
  2. Text with extractionPrompt: Should text contentParts with extractionPrompt be processed differently?

    • Currently: AI processing in extractAndPrepareContent
    • Proposed: Raw text → AI processing during section generation
  3. Performance: Will deferring image extraction to section generation cause performance issues?

    • Need to test with multiple images
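
If Option 1 is chosen (Vision AI on demand), two measures bear on the performance question: cache each image's extraction so an image shared by several sections is processed once, and extract the images a section needs concurrently with a cap on in-flight calls. A sketch using asyncio; vision_ai is a hypothetical async callable and the default limit of 4 is arbitrary, to be tuned by the tests mentioned above. Create the object inside a running event loop.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


class LazyImageText:
    """Extract text from each image at most once, with bounded concurrency."""

    def __init__(self, vision_ai: Callable[[str], Awaitable[str]], max_concurrent: int = 4):
        self._vision_ai = vision_ai
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._tasks: Dict[str, asyncio.Task] = {}

    async def _extract(self, ref: str) -> str:
        # At most max_concurrent vision calls run at the same time.
        async with self._semaphore:
            return await self._vision_ai(ref)

    async def get(self, ref: str) -> str:
        # One shared task per image, so concurrent sections do not duplicate work.
        if ref not in self._tasks:
            self._tasks[ref] = asyncio.create_task(self._extract(ref))
        return await self._tasks[ref]

    async def get_many(self, refs: List[str]) -> List[str]:
        # Results come back in the same order as refs.
        return list(await asyncio.gather(*(self.get(r) for r in refs)))
```

A failed extraction stays cached as a failed task in this sketch; a real version would need a retry policy.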