# Document Generation Architecture Analysis

## Current Flow

### 1. Document Input → ContentParts (`extractAndPrepareContent`)

**Location**: `gateway/modules/services/serviceAi/subContentExtraction.py`

**Flow**:
- Regular documents → Calls `extractContent()` (NON-AI extraction) → Creates contentParts with raw extracted text
- **BUT THEN**:
  - Images with "extract" intent → Calls Vision AI (line 190) → AI extraction
  - Text with "extract" intent + extractionPrompt → Calls AI processing (line 265) → AI extraction
- Pre-extracted JSON → Uses contentParts directly (no AI)

**Result**: ContentParts may already be AI-processed before structure generation

### 2. Structure Generation

**Location**: `gateway/modules/services/serviceAi/subStructureGeneration.py`

**Flow**:
- Uses contentParts (may already be AI-processed)
- Generates document structure (chapters, sections)

### 3. Section Generation (`_processSingleSection`)

**Location**: `gateway/modules/services/serviceAi/subStructureFilling.py`

**Flow**:
- Uses contentParts (which may already be AI-processed)
- Aggregates "extracted" contentParts with AI (line 554-682)
- Generates section content using `callAiWithLooping` with `useCaseId="section_content"`

## Issues Identified

### Issue 1: Duplicate AI Processing

- AI extraction happens in `extractAndPrepareContent` (for images/text)
- AI generation happens again in section generation
- This is redundant and inefficient

### Issue 2: Architecture Inconsistency

- Pre-extracted JSON files → contentParts directly (no AI)
- Regular documents → contentParts + AI extraction (inconsistent)
- User wants: Documents → contentParts (like pre-extracted JSON) → AI only in section generation

### Issue 3: Image Processing

- Images need Vision AI to extract text
- Currently happens in `extractAndPrepareContent`
- Question: Should this happen during section generation instead?
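
The duplicate processing described in Issue 1 can be illustrated with a small model of the current flow. This is only a sketch: `ContentPart`, `ai_calls_before_structure`, and `total_ai_calls` are hypothetical names invented for illustration, not the actual code in `subContentExtraction.py`, and the model assumes section generation makes exactly one AI call per part.

```python
from dataclasses import dataclass


@dataclass
class ContentPart:
    """Hypothetical stand-in for a contentPart; field names are assumptions."""
    kind: str                    # "text", "image", or "json" (pre-extracted)
    intent: str = ""             # e.g. "extract"
    extraction_prompt: str = ""  # only meaningful for text parts


def ai_calls_before_structure(part: ContentPart) -> int:
    """Count AI calls made for one part before structure generation starts."""
    if part.kind == "json":
        return 0  # pre-extracted JSON is used directly, no AI
    if part.kind == "image" and part.intent == "extract":
        return 1  # Vision AI extraction
    if part.kind == "text" and part.intent == "extract" and part.extraction_prompt:
        return 1  # AI processing driven by the extraction prompt
    return 0      # plain non-AI extraction only


def total_ai_calls(part: ContentPart) -> int:
    """Add the section-generation call (assumed to be one call per part)."""
    return ai_calls_before_structure(part) + 1


image_part = ContentPart(kind="image", intent="extract")
json_part = ContentPart(kind="json")
print(total_ai_calls(image_part), total_ai_calls(json_part))
```

Under these assumptions an image with "extract" intent passes through AI twice, while a pre-extracted JSON part passes through it once, which is the inconsistency noted in Issue 2.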

## Proposed Architecture

### Option A: Remove All AI from `extractAndPrepareContent`

- Documents → `extractContent()` → Raw contentParts (text, tables, etc.)
- Images → Keep as image contentParts (no Vision AI extraction)
- Section generation → Handle images with Vision AI when needed

**Pros**:
- Consistent with pre-extracted JSON flow
- Single point of AI processing (section generation)
- Clear separation of concerns

**Cons**:
- Images won't have extracted text until section generation
- May need to handle images differently in section generation

### Option B: Keep Vision AI for Images Only

- Documents → `extractContent()` → Raw contentParts
- Images → Vision AI extraction → Text contentParts
- Section generation → Uses text contentParts (no additional AI extraction)

**Pros**:
- Images get text extracted early
- Section generation can use text directly

**Cons**:
- Still has AI extraction before structure generation
- Inconsistent with user's request

## Recommendation

**Follow Option A** - Remove all AI extraction from `extractAndPrepareContent`:

1. **Documents → ContentParts** (like pre-extracted JSON):
   - Call `extractContent()` (NON-AI)
   - Create contentParts with raw extracted content
   - Images remain as image contentParts (no Vision AI)

2. **Section Generation**:
   - Handle images with Vision AI when needed
   - Aggregate all contentParts with AI
   - Single point of AI processing

**Benefits**:
- Clear architecture: Documents = raw contentParts
- Consistent with pre-extracted JSON flow
- AI processing only where needed (section generation)
- Easier to understand and maintain

## Questions to Resolve

1. **Image handling**: How should images be processed during section generation?
   - Option 1: Vision AI extraction happens automatically when image contentParts are used
   - Option 2: Images are passed to AI with Vision models during section generation
   - Option 3: Images remain as binary and are rendered directly (no text extraction)

2. **Text with extractionPrompt**: Should text contentParts with extractionPrompt be processed differently?
   - Currently: AI processing in `extractAndPrepareContent`
   - Proposed: Raw text → AI processing during section generation

3. **Performance**: Will deferring image extraction to section generation cause performance issues?
   - Need to test with multiple images
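
As a rough illustration of Option A, the sketch below keeps the preparation step free of AI and defers image handling to section generation, which corresponds to Option 1 of the image-handling question. All names here (`RawPart`, `prepare_content`, `generate_section`, and the two stub callables) are hypothetical and stand in for the real Vision AI and `callAiWithLooping` calls; it is a minimal sketch under those assumptions, not the gateway's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RawPart:
    """Hypothetical raw contentPart: no AI has touched it yet."""
    kind: str  # "text" or "image"
    data: str  # raw text, or a reference to the image


def prepare_content(documents: List[Dict[str, str]]) -> List[RawPart]:
    """Option A preparation step: build raw parts only, with no AI calls."""
    return [RawPart(kind=d["kind"], data=d["data"]) for d in documents]


def generate_section(
    parts: List[RawPart],
    describe_image: Callable[[str], str],  # stand-in for a Vision AI call
    generate: Callable[[str], str],        # stand-in for the section AI call
) -> str:
    """Single point of AI processing: images are resolved here, on demand."""
    texts = []
    for part in parts:
        if part.kind == "image":
            texts.append(describe_image(part.data))
        else:
            texts.append(part.data)
    return generate("\n".join(texts))


# Stubs that record calls so the order of AI usage is visible.
calls: List[str] = []


def fake_vision(ref: str) -> str:
    calls.append("vision")
    return f"[text from {ref}]"


def fake_generate(prompt: str) -> str:
    calls.append("generate")
    return prompt.upper()


parts = prepare_content([
    {"kind": "text", "data": "intro"},
    {"kind": "image", "data": "chart.png"},
])
calls_after_prepare = list(calls)  # still empty: preparation used no AI
section = generate_section(parts, fake_vision, fake_generate)
```

Passing the AI calls in as parameters keeps the sketch testable without a model and makes the performance question easier to probe, since the Vision stub can be swapped for a timed or batched variant when testing with multiple images.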