# Architecture Changes Summary

## Problem Identified

The architecture had AI extraction happening in TWO places:

1. **`extractAndPrepareContent`**: Vision AI for images, AI processing for text with extractionPrompt
2. **Section generation**: AI aggregation of contentParts

This was:

- Redundant (double AI processing)
- Inconsistent (pre-extracted JSON had no AI, regular documents had AI)
- Against the desired architecture (documents should become contentParts like pre-extracted JSON)

## Solution Implemented

### 1. Removed AI Extraction from `extractAndPrepareContent`

**File**: `gateway/modules/services/serviceAi/subContentExtraction.py`

**Changes**:

- **Removed**: Vision AI extraction for images (lines 186-246)
- **Removed**: AI text processing with extractionPrompt (lines 260-334)
- **Updated**: Images with extract intent are now marked with `needsVisionExtraction=True` flag
- **Updated**: Regular documents mark images with `needsVisionExtraction=True` when extract intent is present

**Result**: Documents → contentParts (raw extraction only, no AI)

### 2. Added Vision AI Extraction in Section Generation

**File**: `gateway/modules/services/serviceAi/subStructureFilling.py`

**Changes**:

- **Added**: Vision AI extraction logic before aggregation (lines 553-610)
- **Added**: Vision AI extraction logic for single-part processing (lines 1074-1115)
- **Logic**:
  - Checks if `part.typeGroup == "image"` AND `needsVisionExtraction == True` AND `intent == "extract"`
  - Extracts text using Vision AI (`IMAGE_ANALYSE` operation)
  - Replaces image part with text part for further processing
  - Images with `contentFormat == "object"` (render intent) are rendered directly (no extraction)

**Result**: AI extraction happens ONLY during section generation

## Architecture Flow (After Changes)

### Document Input → ContentParts

1. **Regular documents**: `extractContent()` (NON-AI) → Raw contentParts
   - Images with extract intent: `contentFormat="extracted"`, `needsVisionExtraction=True`
   - Images with render intent: `contentFormat="object"` (rendered directly)
   - Text: `contentFormat="extracted"` (raw text, no AI processing)
2. **Pre-extracted JSON**: Direct contentParts (no changes)

### Section Generation → AI Processing

1. **Images with extract intent**: Vision AI extraction → Text part → AI aggregation
2. **Images with render intent**: Rendered directly (no extraction)
3. **Text contentParts**: AI aggregation with extractionPrompt (if provided)

## Key Benefits

1. **Consistent Architecture**: Documents = raw contentParts (like pre-extracted JSON)
2. **Single Point of AI Processing**: Only in section generation
3. **Clear Separation**: Extraction vs Generation
4. **Intent-Based Logic**:
   - `intent == "extract"` → Vision AI extraction during section generation
   - `intent == "render"` → Direct rendering (no extraction)
   - `contentFormat == "object"` → Embedded/referenced images (no extraction)

## Testing Checklist

- [ ] Regular documents create contentParts without AI extraction
- [ ] Images with extract intent are marked with `needsVisionExtraction=True`
- [ ] Images with render intent are marked with `contentFormat="object"`
- [ ] Section generation extracts images with Vision AI when needed
- [ ] Section generation renders images with object format directly
- [ ] Text contentParts are processed with AI during section generation
- [ ] Pre-extracted JSON flow still works correctly
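
The intent-based check described under "Added Vision AI Extraction in Section Generation" can be illustrated with a small sketch. This is not the code from `subStructureFilling.py`: the `ContentPart` dataclass, `needs_vision`, `resolve_part`, and `fake_vision` are hypothetical names introduced here, and the real Vision AI call (the `IMAGE_ANALYSE` operation) is replaced by a plain callable so the example runs on its own. Only the field names (`typeGroup`, `contentFormat`, `intent`, `needsVisionExtraction`) are taken from this summary.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ContentPart:
    """Hypothetical stand-in for a contentPart; field names follow this summary."""
    typeGroup: str                       # e.g. "image" or "text"
    contentFormat: str                   # "extracted" or "object"
    data: str                            # raw text, or a reference to an image
    intent: Optional[str] = None         # "extract" or "render"
    needsVisionExtraction: bool = False


def needs_vision(part: ContentPart) -> bool:
    """True only when all three conditions listed in the summary hold."""
    return (
        part.typeGroup == "image"
        and part.needsVisionExtraction
        and part.intent == "extract"
    )


def resolve_part(part: ContentPart, vision_extract: Callable[[str], str]) -> ContentPart:
    """Return the part to hand on to aggregation or rendering.

    Image parts flagged for extraction are replaced by a text part;
    every other part (render-intent images, raw text) passes through unchanged.
    """
    if needs_vision(part):
        text = vision_extract(part.data)  # assumption: stands in for the Vision AI call
        return ContentPart(
            typeGroup="text",
            contentFormat="extracted",
            data=text,
            intent=part.intent,
        )
    return part


def fake_vision(image_ref: str) -> str:
    """Placeholder extractor used only for this example."""
    return f"text extracted from {image_ref}"


parts = [
    ContentPart("image", "extracted", "scan.png", intent="extract", needsVisionExtraction=True),
    ContentPart("image", "object", "logo.png", intent="render"),
    ContentPart("text", "extracted", "raw paragraph", intent="extract"),
]
resolved = [resolve_part(p, fake_vision) for p in parts]
```

In this sketch only the first part is converted to text; the render-intent image and the raw text part are returned as-is, which mirrors the "single point of AI processing" goal: the extraction stage sets flags, and section generation is the only place that acts on them.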