Architecture Changes Summary

Problem Identified

The architecture had AI extraction happening in TWO places:

- extractAndPrepareContent: Vision AI for images, AI processing for text with extractionPrompt
- Section generation: AI aggregation of contentParts

This was:

- Redundant (the same content could be processed by AI twice)
- Inconsistent (pre-extracted JSON bypassed AI processing, while regular documents went through it)
- Against the desired architecture (documents should become contentParts, like pre-extracted JSON)

File: gateway/modules/services/serviceAi/subContentExtraction.py

Changes:

Removed: Vision AI extraction for images (lines 186-246)
Removed: AI text processing with extractionPrompt (lines 260-334)
Updated: Images with extract intent are now marked with needsVisionExtraction=True flag
Updated: Regular documents mark images with needsVisionExtraction=True when extract intent is present

Result: Documents → contentParts (raw extraction only, no AI)
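
The raw extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the code in subContentExtraction.py: the ContentPart dataclass and the build_raw_part helper are hypothetical, and storing intent and needsVisionExtraction as direct attributes of the part is an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ContentPart:
    """Hypothetical stand-in for the project's contentPart structure."""
    typeGroup: str                       # "image", "text", ...
    data: Any                            # raw bytes or raw text only
    intent: str                          # "extract" or "render"
    contentFormat: str = "extracted"
    needsVisionExtraction: bool = False  # assumed to live on the part itself


def build_raw_part(type_group: str, data: Any, intent: str) -> ContentPart:
    """Build a contentPart from raw document content without calling any AI."""
    if type_group == "image" and intent == "render":
        # Render intent: keep the image as an object so it is embedded as-is.
        return ContentPart(type_group, data, intent, contentFormat="object")

    part = ContentPart(type_group, data, intent, contentFormat="extracted")
    if type_group == "image" and intent == "extract":
        # Defer Vision AI: only flag the image for the section generation step.
        part.needsVisionExtraction = True
    return part
```

The point of the sketch is that no model is called here: an image with extract intent only carries a flag that a later stage can act on.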

File: gateway/modules/services/serviceAi/subStructureFilling.py

Changes:

Added: Vision AI extraction logic before aggregation (lines 553-610)
Added: Vision AI extraction logic for single-part processing (lines 1074-1115)
Logic:
- Checks if part.typeGroup == "image" AND needsVisionExtraction == True AND intent == "extract"
- Extracts text using Vision AI (IMAGE_ANALYSE operation)
- Replaces image part with text part for further processing
- Images with contentFormat == "object" (render intent) are rendered directly (no extraction)

Result: AI extraction happens ONLY during section generation
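
A rough sketch of the check-and-replace logic listed above, under stated assumptions: the ContentPart dataclass and resolve_vision_parts are hypothetical names, and analyse_image is a caller-supplied callable standing in for the IMAGE_ANALYSE operation, whose real signature is not shown in this summary.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, List


@dataclass(frozen=True)
class ContentPart:
    """Hypothetical stand-in for the project's contentPart structure."""
    typeGroup: str
    data: Any
    intent: str
    contentFormat: str = "extracted"
    needsVisionExtraction: bool = False


def resolve_vision_parts(
    parts: List[ContentPart],
    analyse_image: Callable[[Any], str],
) -> List[ContentPart]:
    """Replace flagged image parts with text parts before aggregation."""
    resolved = []
    for part in parts:
        if (
            part.typeGroup == "image"
            and part.needsVisionExtraction
            and part.intent == "extract"
        ):
            # Stand-in for the Vision AI call (IMAGE_ANALYSE in the source).
            text = analyse_image(part.data)
            resolved.append(
                replace(part, typeGroup="text", data=text,
                        needsVisionExtraction=False)
            )
        else:
            # Render-intent images and text parts pass through unchanged.
            resolved.append(part)
    return resolved
```

Passing the Vision AI call in as a parameter keeps the sketch runnable without the real service and makes the single point of AI processing explicit.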

Extraction phase (no AI):

Regular documents: extractContent() (NON-AI) → Raw contentParts
- Images with extract intent: contentFormat="extracted", needsVisionExtraction=True
- Images with render intent: contentFormat="object" (rendered directly)
- Text: contentFormat="extracted" (raw text, no AI processing)
Pre-extracted JSON: Direct contentParts (no changes)

Section generation phase (AI):

Images with extract intent: Vision AI extraction → Text part → AI aggregation
Images with render intent: Rendered directly (no extraction)
Text contentParts: AI aggregation with extractionPrompt (if provided)

Benefits:

Consistent Architecture: Documents = raw contentParts (like pre-extracted JSON)
Single Point of AI Processing: Only in section generation
Clear Separation: Extraction vs Generation
Intent-Based Logic:
- intent == "extract" → Vision AI extraction during section generation
- intent == "render" → Direct rendering (no extraction)
- contentFormat == "object" → Embedded/referenced images (no extraction)
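
The three rules above can be read as a small decision function. The sketch below is illustrative only: choose_image_handling and its return labels are hypothetical, and both the precedence of contentFormat == "object" over intent and the "passthrough" fallback for unlisted combinations are assumptions.

```python
def choose_image_handling(intent: str, content_format: str) -> str:
    """Return a label for how an image part is handled in section generation."""
    # Assumed precedence: an object-format image is never sent to Vision AI.
    if content_format == "object" or intent == "render":
        return "render"
    if intent == "extract":
        return "vision_extract"
    # Assumed fallback for combinations the summary does not list.
    return "passthrough"
```

For example, choose_image_handling("extract", "extracted") selects Vision AI extraction, while any image with contentFormat "object" is rendered directly.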