Document Generation Architecture Analysis
Current Flow
1. Document Input → ContentParts (extractAndPrepareContent)
Location: gateway/modules/services/serviceAi/subContentExtraction.py
Flow:
- Regular documents → Calls extractContent() (NON-AI extraction) → Creates contentParts with raw extracted text. BUT THEN:
- Images with "extract" intent → Calls Vision AI (line 190) → AI extraction
- Text with "extract" intent + extractionPrompt → Calls AI processing (line 265) → AI extraction
- Pre-extracted JSON → Uses contentParts directly (no AI)
Result: ContentParts may already be AI-processed before structure generation
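The branching in step 1 is easier to compare against the proposals below when written out. This is a simplified sketch, not the actual code. The ContentPart and Document shapes, the stub functions callVisionAi and callTextAi (standing in for the AI calls near lines 190 and 265), and the name extractAndPrepareContentToday are all hypothetical. The stubs only record that an AI call happened.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ContentPart:
    # Hypothetical shape; the real contentPart model may differ.
    typeGroup: str              # e.g. "text", "image"
    data: str                   # raw text, or an image reference
    aiProcessed: bool = False   # True if an AI call produced this part

@dataclass
class Document:
    # Hypothetical document descriptor, for illustration only.
    kind: str                                # "regular", "image", "preExtractedJson"
    intent: Optional[str] = None             # e.g. "extract"
    extractionPrompt: Optional[str] = None
    payload: str = ""
    parts: list = field(default_factory=list)  # only used for pre-extracted JSON

aiCallLog = []  # records every AI call made during extraction

def callVisionAi(data: str) -> str:
    # Stub for the Vision AI call (around line 190).
    aiCallLog.append("vision")
    return f"<text from image {data}>"

def callTextAi(text: str, prompt: str) -> str:
    # Stub for the AI processing call (around line 265).
    aiCallLog.append("text")
    return f"<{prompt}: {text}>"

def extractContent(doc: Document) -> list:
    # Stub for the NON-AI extractor: returns raw parts only.
    group = "image" if doc.kind == "image" else "text"
    return [ContentPart(group, doc.payload)]

def extractAndPrepareContentToday(doc: Document) -> list:
    """Simplified mirror of the current branching in step 1."""
    if doc.kind == "preExtractedJson":
        return list(doc.parts)  # used directly, no AI
    result = []
    for part in extractContent(doc):
        if part.typeGroup == "image" and doc.intent == "extract":
            # AI call #1: Vision AI before structure generation.
            result.append(ContentPart("text", callVisionAi(part.data), aiProcessed=True))
        elif part.typeGroup == "text" and doc.intent == "extract" and doc.extractionPrompt:
            # AI call #2: text AI processing before structure generation.
            text = callTextAi(part.data, doc.extractionPrompt)
            result.append(ContentPart("text", text, aiProcessed=True))
        else:
            result.append(part)  # raw part, no AI
    return result
```

Running it on an image document with the "extract" intent leaves an entry in aiCallLog before structure generation has even started. That is the early AI path that section generation later repeats.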
2. Structure Generation
Location: gateway/modules/services/serviceAi/subStructureGeneration.py
Flow:
- Uses contentParts (may already be AI-processed)
- Generates document structure (chapters, sections)
3. Section Generation (_processSingleSection)
Location: gateway/modules/services/serviceAi/subStructureFilling.py
Flow:
- Uses contentParts (which may already be AI-processed)
- Aggregates "extracted" contentParts with AI (lines 554-682)
- Generates section content using callAiWithLooping with useCaseId="section_content"
Issues Identified
Issue 1: Duplicate AI Processing
- AI extraction happens in extractAndPrepareContent (for images/text)
- AI generation happens again in section generation
- This is redundant and inefficient: the same content can pass through AI twice before it reaches the final document
Issue 2: Architecture Inconsistency
- Pre-extracted JSON files → contentParts directly (no AI)
- Regular documents → contentParts + AI extraction (inconsistent)
- User wants: Documents → contentParts (like pre-extracted JSON) → AI only in section generation
Issue 3: Image Processing
- Images need Vision AI to extract text
- Currently happens in extractAndPrepareContent
- Question: Should this happen during section generation instead?
Proposed Architecture
Option A: Remove All AI from extractAndPrepareContent
- Documents → extractContent() → Raw contentParts (text, tables, etc.)
- Images → Keep as image contentParts (no Vision AI extraction)
- Section generation → Handle images with Vision AI when needed
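A minimal sketch of the Option A extraction path follows. It assumes a ContentPart with typeGroup, mimeType and data fields and a toy extractContent() that decodes text files and passes image bytes through unchanged. Both, and the name extractAndPrepareContentOptionA, are hypothetical, and the real extractContent() is not reproduced here. The point is only that no AI call appears anywhere in the path.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass
class ContentPart:
    # Hypothetical contentPart shape; field names are assumptions.
    typeGroup: str              # "text", "table", "image", ...
    mimeType: str
    data: Union[bytes, str]     # str for text, raw bytes for images

def extractContent(fileName: str, raw: bytes) -> List[ContentPart]:
    # Toy NON-AI extractor: images pass through as-is, everything else is decoded as text.
    lower = fileName.lower()
    if lower.endswith((".png", ".jpg", ".jpeg")):
        mime = "image/png" if lower.endswith(".png") else "image/jpeg"
        return [ContentPart("image", mime, raw)]
    return [ContentPart("text", "text/plain", raw.decode("utf-8"))]

def extractAndPrepareContentOptionA(files: Dict[str, bytes]) -> List[ContentPart]:
    """Option A: every document becomes raw contentParts; no AI call anywhere."""
    parts: List[ContentPart] = []
    for fileName, raw in files.items():
        parts.extend(extractContent(fileName, raw))
    return parts
```

Because images stay as image contentParts, this output has the same "raw parts, no AI" character as the pre-extracted JSON flow.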
Pros:
- Consistent with pre-extracted JSON flow
- Single point of AI processing (section generation)
- Clear separation of concerns
Cons:
- Images won't have extracted text until section generation
- May need to handle images differently in section generation
Option B: Keep Vision AI for Images Only
- Documents → extractContent() → Raw contentParts
- Images → Vision AI extraction → Text contentParts
- Section generation → Uses text contentParts (no additional AI extraction)
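For comparison, Option B can be sketched as a pass over the raw contentParts that converts only images to text. The ContentPart shape, the name prepareOptionB, and the visionExtract hook (standing in for the Vision AI call) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ContentPart:
    # Hypothetical contentPart shape; field names are assumptions.
    typeGroup: str   # "text" or "image"
    data: object     # str for text, bytes for images

def prepareOptionB(rawParts: List[ContentPart],
                   visionExtract: Callable[[bytes], str]) -> List[ContentPart]:
    """Option B: images become text contentParts up front; other parts pass through.

    visionExtract is a hypothetical hook standing in for the Vision AI call.
    """
    prepared: List[ContentPart] = []
    for part in rawParts:
        if part.typeGroup == "image":
            prepared.append(ContentPart("text", visionExtract(part.data)))
        else:
            prepared.append(part)
    return prepared
```

Since this pass runs before structure generation, the Vision AI call still happens early, which is the first listed con.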
Pros:
- Images get text extracted early
- Section generation can use text directly
Cons:
- Still has AI extraction before structure generation
- Inconsistent with user's request
Recommendation
Follow Option A: remove all AI extraction from extractAndPrepareContent.
- Documents → ContentParts (like pre-extracted JSON):
  - Call extractContent() (NON-AI)
  - Create contentParts with raw extracted content
  - Images remain as image contentParts (no Vision AI)
- Section Generation:
  - Handle images with Vision AI when needed
  - Aggregate all contentParts with AI
  - Single point of AI processing
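Under the recommendation, section generation becomes the only place AI runs. The sketch below assumes that structure generation assigns contentPart IDs to each section, which is an assumption. processSingleSectionSketch is a hypothetical name, not _processSingleSection. callVision and callAi are hypothetical hooks. callAi stands in for callAiWithLooping with useCaseId="section_content", and the real signature of callAiWithLooping is not assumed here.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ContentPart:
    # Hypothetical contentPart shape; field names are assumptions.
    id: str
    typeGroup: str   # "text", "table", "image"
    data: object

def processSingleSectionSketch(
    sectionTitle: str,
    partIds: List[str],
    allParts: List[ContentPart],
    callVision: Callable[[bytes], str],
    callAi: Callable[[str, str], str],
) -> str:
    """Single point of AI processing: raw parts in, generated section text out."""
    byId = {p.id: p for p in allParts}
    sources: List[str] = []
    for partId in partIds:
        part = byId[partId]
        if part.typeGroup == "image":
            # Vision AI runs only for images this section actually uses.
            sources.append(callVision(part.data))
        else:
            sources.append(str(part.data))
    prompt = f"Write section '{sectionTitle}' using:\n" + "\n---\n".join(sources)
    # One aggregation call per section.
    return callAi(prompt, "section_content")
```

Images that no section references are never sent to Vision AI, which is one reading of "when needed".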
Benefits:
- Clear architecture: Documents = raw contentParts
- Consistent with pre-extracted JSON flow
- AI processing only where needed (section generation)
- Easier to understand and maintain
Questions to Resolve
- Image handling: How should images be processed during section generation?
  - Option 1: Vision AI extraction happens automatically when image contentParts are used
  - Option 2: Images are passed directly to a Vision-capable AI model during section generation
  - Option 3: Images remain as binary and are rendered directly (no text extraction)
- Text with extractionPrompt: Should text contentParts with extractionPrompt be processed differently?
  - Currently: AI processing in extractAndPrepareContent
  - Proposed: Raw text → AI processing during section generation
- Performance: Will deferring image extraction to section generation cause performance issues?
  - Need to test with multiple images
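Image-handling Option 1 could be implemented as lazy extraction. The Vision AI call runs the first time a section needs an image's text, and the result is memoized. The content-hash cache is an assumption added here, not something the current code is known to do. It would keep an image used by several sections from being sent to Vision AI more than once, but it does not answer the performance question, which still needs the multi-image test. LazyImageText and visionExtract are hypothetical names.

```python
import hashlib
from typing import Callable, Dict

class LazyImageText:
    """Option 1 sketch: Vision AI runs only when an image's text is first requested.

    Results are cached by SHA-256 of the image bytes, so repeated requests for the
    same image reuse the first result. visionExtract is a hypothetical hook.
    """

    def __init__(self, visionExtract: Callable[[bytes], str]) -> None:
        self._visionExtract = visionExtract
        self._cache: Dict[str, str] = {}

    def textFor(self, imageBytes: bytes) -> str:
        key = hashlib.sha256(imageBytes).hexdigest()
        if key not in self._cache:
            # First request for this image: call Vision AI once and remember the result.
            self._cache[key] = self._visionExtract(imageBytes)
        return self._cache[key]
```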