gateway/modules/services/serviceGeneration/paths/ARCHITECTURE_CHANGES.md

Architecture Changes Summary

Problem Identified

The architecture had AI extraction happening in TWO places:

  1. extractAndPrepareContent: Vision AI for images, AI processing for text with extractionPrompt
  2. Section generation: AI aggregation of contentParts

This was:

  • Redundant (double AI processing)
  • Inconsistent (pre-extracted JSON bypassed AI extraction, while regular documents went through it)
  • Contrary to the intended architecture (documents should become raw contentParts, just like pre-extracted JSON)

Solution Implemented

1. Removed AI Extraction from extractAndPrepareContent

File: gateway/modules/services/serviceAi/subContentExtraction.py

Changes:

  • Removed: Vision AI extraction for images (lines 186-246)
  • Removed: AI text processing with extractionPrompt (lines 260-334)
  • Updated: Images with extract intent are now marked with the needsVisionExtraction=True flag
  • Updated: Regular documents now mark their images with needsVisionExtraction=True when extract intent is present

Result: Documents → contentParts (raw extraction only, no AI)
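
A minimal sketch of the non-AI marking step, for illustration only. It assumes a simplified ContentPart dataclass that has needsVisionExtraction as a plain field, and it assumes intent arrives as the string "extract" or "render". Both ContentPart and mark_image_part are hypothetical stand-ins. The real structures in subContentExtraction.py may look different. The point is that no model is called here: the part is only flagged so that section generation can pick it up later.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ContentPart:
    # Hypothetical, simplified stand-in for the real contentPart structure.
    typeGroup: str                       # e.g. "image" or "text"
    contentFormat: str = "extracted"     # "extracted" or "object"
    data: Any = None                     # raw bytes for images, raw text for text
    intent: Optional[str] = None         # "extract" or "render"
    needsVisionExtraction: bool = False  # set here, consumed in section generation


def mark_image_part(image_bytes: bytes, intent: str) -> ContentPart:
    """Build an image contentPart without calling any AI model (hypothetical helper).

    extract intent -> contentFormat="extracted", flagged for Vision AI later
    render intent  -> contentFormat="object", rendered as-is later
    """
    if intent == "extract":
        return ContentPart(typeGroup="image", contentFormat="extracted",
                           data=image_bytes, intent=intent,
                           needsVisionExtraction=True)
    if intent == "render":
        return ContentPart(typeGroup="image", contentFormat="object",
                           data=image_bytes, intent=intent)
    raise ValueError(f"unknown intent: {intent!r}")
```

For example, mark_image_part(png_bytes, "extract") returns a part with contentFormat="extracted" and needsVisionExtraction=True. No text is produced at this stage.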

2. Added Vision AI Extraction in Section Generation

File: gateway/modules/services/serviceAi/subStructureFilling.py

Changes:

  • Added: Vision AI extraction logic before aggregation (lines 553-610)
  • Added: Vision AI extraction logic for single-part processing (lines 1074-1115)
  • Logic:
    • Checks if part.typeGroup == "image" AND needsVisionExtraction == True AND intent == "extract"
    • Extracts text using Vision AI (IMAGE_ANALYSE operation)
    • Replaces the image part with a text part for further processing
    • Images with contentFormat == "object" (render intent) are rendered directly (no extraction)

Result: AI extraction happens ONLY during section generation
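
The check-and-replace logic above can be sketched as follows. This is a sketch under assumptions, not the code in subStructureFilling.py. The Vision AI call (the IMAGE_ANALYSE operation) is modelled as a hypothetical callable that takes image bytes and returns text. ContentPart is a simplified, hypothetical stand-in.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ContentPart:
    # Hypothetical, simplified stand-in for the real contentPart structure.
    typeGroup: str
    contentFormat: str = "extracted"
    data: Any = None
    intent: Optional[str] = None
    needsVisionExtraction: bool = False


# Hypothetical signature for the Vision AI call: image bytes in, extracted text out.
VisionExtractor = Callable[[bytes], str]


def resolve_image_part(part: ContentPart, vision_extract: VisionExtractor) -> ContentPart:
    """Swap an image part for a text part when Vision AI extraction is required."""
    needs_extraction = (
        part.typeGroup == "image"
        and part.needsVisionExtraction
        and part.intent == "extract"
    )
    if not needs_extraction:
        # Text parts and render-intent images (contentFormat="object") pass through.
        return part
    text = vision_extract(part.data)
    # The replacement is an ordinary text part, so later steps treat it like any other text.
    return ContentPart(typeGroup="text", contentFormat="extracted",
                       data=text, intent=part.intent)
```

Returning a plain text part means the aggregation step does not need a separate image code path. This matches the "replaces image part with text part" step.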

Architecture Flow (After Changes)

Document Input → ContentParts

  1. Regular documents: extractContent() (NON-AI) → Raw contentParts

    • Images with extract intent: contentFormat="extracted", needsVisionExtraction=True
    • Images with render intent: contentFormat="object" (rendered directly)
    • Text: contentFormat="extracted" (raw text, no AI processing)
  2. Pre-extracted JSON: Used directly as contentParts (flow unchanged)

Section Generation → AI Processing

  1. Images with extract intent: Vision AI extraction → Text part → AI aggregation
  2. Images with render intent: Rendered directly (no extraction)
  3. Text contentParts: AI aggregation with extractionPrompt (if provided)
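
Here is a minimal end-to-end sketch of the section-generation routing above, assuming a hypothetical generate_section entry point. It also assumes two injected callables: one for Vision AI extraction, and one for AI aggregation that takes the collected texts plus an optional extractionPrompt. None of these names are confirmed by the source; they only illustrate the three routing rules.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class ContentPart:
    # Hypothetical, simplified stand-in for the real contentPart structure.
    typeGroup: str
    contentFormat: str = "extracted"
    data: Any = None
    intent: Optional[str] = None
    needsVisionExtraction: bool = False


@dataclass
class SectionResult:
    text: Optional[str]          # aggregated AI output, or None if there was no text
    rendered: List[ContentPart]  # image parts embedded/referenced as-is


def generate_section(
    parts: List[ContentPart],
    vision_extract: Callable[[bytes], str],                   # hypothetical Vision AI call
    aggregate: Callable[[List[str], Optional[str]], str],     # hypothetical AI aggregation
    extraction_prompt: Optional[str] = None,
) -> SectionResult:
    texts: List[str] = []
    rendered: List[ContentPart] = []
    for part in parts:
        if part.typeGroup == "image" and part.contentFormat == "object":
            # Render intent: no extraction, the image goes into the output directly.
            rendered.append(part)
        elif (part.typeGroup == "image" and part.needsVisionExtraction
              and part.intent == "extract"):
            # Extract intent: Vision AI turns the image into text first.
            texts.append(vision_extract(part.data))
        elif part.typeGroup == "text":
            texts.append(part.data)
        # Any other part type is ignored in this sketch.
    # Single AI aggregation over all text, using extractionPrompt if provided.
    text = aggregate(texts, extraction_prompt) if texts else None
    return SectionResult(text=text, rendered=rendered)
```

All AI calls (vision_extract and aggregate) live in this one function. That is the "single point of AI processing" the change aims for. The extraction stage only produces the raw parts it consumes.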

Key Benefits

  1. Consistent Architecture: Documents = raw contentParts (like pre-extracted JSON)
  2. Single Point of AI Processing: Only in section generation
  3. Clear Separation: Extraction vs Generation
  4. Intent-Based Logic:
    • intent == "extract" → Vision AI extraction during section generation
    • intent == "render" → Direct rendering (no extraction)
    • contentFormat == "object" → Embedded/referenced images (no extraction)

Testing Checklist

  • Regular documents create contentParts without AI extraction
  • Images with extract intent are marked with needsVisionExtraction=True
  • Images with render intent are marked with contentFormat="object"
  • Section generation extracts images with Vision AI when needed
  • Section generation renders images with object format directly
  • Text contentParts are processed with AI during section generation
  • Pre-extracted JSON flow still works correctly