Architecture Changes Summary
Problem Identified
The architecture had AI extraction happening in TWO places:
- extractAndPrepareContent: Vision AI for images, AI processing for text with extractionPrompt
- Section generation: AI aggregation of contentParts
This was:
- Redundant (double AI processing)
- Inconsistent (pre-extracted JSON skipped AI extraction at this stage, while regular documents went through it)
- Against the desired architecture (documents should become contentParts like pre-extracted JSON)
Solution Implemented
1. Removed AI Extraction from extractAndPrepareContent
File: gateway/modules/services/serviceAi/subContentExtraction.py
Changes:
- Removed: Vision AI extraction for images (lines 186-246)
- Removed: AI text processing with extractionPrompt (lines 260-334)
- Updated: Images with extract intent are now marked with the needsVisionExtraction=True flag
- Updated: Regular documents mark images with needsVisionExtraction=True when extract intent is present
Result: Documents → contentParts (raw extraction only, no AI)
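The flag-only behaviour described above can be sketched as follows. This is a minimal illustration, not the code in subContentExtraction.py: the ContentPart dataclass, its field layout, and the preparePart helper are hypothetical stand-ins, and only the names typeGroup, intent, contentFormat, and needsVisionExtraction come from this summary.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ContentPart:
    # Hypothetical stand-in for the real contentPart structure.
    typeGroup: str                      # e.g. "image" or "text"
    data: Any                           # raw bytes or raw text
    intent: str = "extract"             # "extract" or "render"
    contentFormat: str = "extracted"
    needsVisionExtraction: bool = False


def preparePart(part: ContentPart) -> ContentPart:
    """Raw preparation only: set flags and formats, never call an AI model."""
    if part.typeGroup == "image":
        if part.intent == "extract":
            # Defer the Vision AI call to section generation.
            part.contentFormat = "extracted"
            part.needsVisionExtraction = True
        elif part.intent == "render":
            # Render-intent images are passed through as objects.
            part.contentFormat = "object"
    # Text parts keep their raw text and contentFormat="extracted".
    return part
```

The point of the sketch is that this stage only annotates parts, so regular documents end up in the same shape as pre-extracted JSON.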
2. Added Vision AI Extraction in Section Generation
File: gateway/modules/services/serviceAi/subStructureFilling.py
Changes:
- Added: Vision AI extraction logic before aggregation (lines 553-610)
- Added: Vision AI extraction logic for single-part processing (lines 1074-1115)
- Logic:
  - Checks if part.typeGroup == "image" AND needsVisionExtraction == True AND intent == "extract"
  - Extracts text using Vision AI (IMAGE_ANALYSE operation)
  - Replaces image part with text part for further processing
  - Images with contentFormat == "object" (render intent) are rendered directly (no extraction)
Result: AI extraction happens ONLY during section generation
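As a rough sketch of the check-extract-replace step listed above, assuming a simplified ContentPart dataclass and a caller-supplied analyseImage callable that stands in for the IMAGE_ANALYSE operation (both are hypothetical; the actual logic lives in subStructureFilling.py and may differ):

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, List


@dataclass
class ContentPart:
    # Hypothetical stand-in for the real contentPart structure.
    typeGroup: str
    data: Any
    intent: str = "extract"
    contentFormat: str = "extracted"
    needsVisionExtraction: bool = False


def resolveImageParts(
    parts: List[ContentPart],
    analyseImage: Callable[[Any], str],  # assumed wrapper around IMAGE_ANALYSE
) -> List[ContentPart]:
    """Replace flagged image parts with text parts before aggregation."""
    resolved = []
    for part in parts:
        if (
            part.typeGroup == "image"
            and part.needsVisionExtraction
            and part.intent == "extract"
        ):
            text = analyseImage(part.data)
            # Swap the image part for a text part carrying the extracted text.
            resolved.append(
                replace(part, typeGroup="text", data=text,
                        needsVisionExtraction=False)
            )
        else:
            # Render-intent images and text parts pass through unchanged.
            resolved.append(part)
    return resolved
```

Passing the Vision AI call in as a parameter keeps the sketch runnable without a model; a fake such as `lambda data: "caption"` is enough to exercise it.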
Architecture Flow (After Changes)
Document Input → ContentParts
- Regular documents: extractContent() (NON-AI) → Raw contentParts
  - Images with extract intent: contentFormat="extracted", needsVisionExtraction=True
  - Images with render intent: contentFormat="object" (rendered directly)
  - Text: contentFormat="extracted" (raw text, no AI processing)
- Pre-extracted JSON: Direct contentParts (no changes)
Section Generation → AI Processing
- Images with extract intent: Vision AI extraction → Text part → AI aggregation
- Images with render intent: Rendered directly (no extraction)
- Text contentParts: AI aggregation with extractionPrompt (if provided)
Key Benefits
- Consistent Architecture: Documents = raw contentParts (like pre-extracted JSON)
- Single Point of AI Processing: Only in section generation
- Clear Separation: Extraction vs Generation
- Intent-Based Logic:
  - intent == "extract" → Vision AI extraction during section generation
  - intent == "render" → Direct rendering (no extraction)
  - contentFormat == "object" → Embedded/referenced images (no extraction)
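The intent-based rules can be summarised as a small routing function. This is an illustrative sketch only: the Action enum and routePart are hypothetical names, and sending an extract-intent image without the needsVisionExtraction flag to aggregation is an assumption, since this summary does not say how that case is handled.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical labels for the three paths described in this summary.
    VISION_EXTRACT = "vision_extract"  # Vision AI, then AI aggregation
    RENDER = "render"                  # rendered directly, no extraction
    AGGREGATE = "aggregate"            # AI aggregation of text contentParts


def routePart(
    typeGroup: str,
    intent: str,
    contentFormat: str,
    needsVisionExtraction: bool = False,
) -> Action:
    """Pick the section-generation path for one contentPart."""
    if typeGroup == "image":
        if intent == "render" or contentFormat == "object":
            return Action.RENDER
        if intent == "extract" and needsVisionExtraction:
            return Action.VISION_EXTRACT
    # Assumption: everything else (including text) goes to aggregation.
    return Action.AGGREGATE
```

For example, `routePart("image", "extract", "extracted", True)` selects the Vision AI path, while `routePart("image", "render", "object")` selects direct rendering.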
Testing Checklist
- Regular documents create contentParts without AI extraction
- Images with extract intent are marked with needsVisionExtraction=True
- Images with render intent are marked with contentFormat="object"
- Section generation extracts images with Vision AI when needed
- Section generation renders images with object format directly
- Text contentParts are processed with AI during section generation
- Pre-extracted JSON flow still works correctly