gateway/modules/services/serviceGeneration/paths/ARCHITECTURE_ANALYSIS.md

# Document Generation Architecture Analysis
## Current Flow
### 1. Document Input → ContentParts (`extractAndPrepareContent`)
**Location**: `gateway/modules/services/serviceAi/subContentExtraction.py`
**Flow**:
- Regular documents → Calls `extractContent()` (NON-AI extraction) → Creates contentParts with raw extracted text
- **BUT THEN**:
  - Images with "extract" intent → Calls Vision AI (line 190) → AI extraction
  - Text with "extract" intent + extractionPrompt → Calls AI processing (line 265) → AI extraction
- Pre-extracted JSON → Uses contentParts directly (no AI)
**Result**: ContentParts may already be AI-processed before structure generation
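
The branching above can be sketched roughly as follows. This is a minimal illustration, not the code in `subContentExtraction.py`: the `ContentPart` fields, the `aiProcessed` flag, and the `prepareCurrent` function are hypothetical names introduced here, and the AI calls are replaced by placeholder strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContentPart:
    # Hypothetical shape; the real contentPart fields are not shown in this document.
    kind: str                               # "text", "image", or "json"
    data: str
    intent: Optional[str] = None            # e.g. "extract"
    extractionPrompt: Optional[str] = None
    aiProcessed: bool = False               # illustration only: marks early AI processing


def prepareCurrent(part: ContentPart) -> ContentPart:
    """Mirror the current branching: some parts are AI-processed early."""
    if part.kind == "json":
        return part  # pre-extracted JSON: used directly, no AI
    if part.kind == "image" and part.intent == "extract":
        # Stand-in for the Vision AI call
        return ContentPart("text", f"<vision text of {part.data}>", aiProcessed=True)
    if part.kind == "text" and part.intent == "extract" and part.extractionPrompt:
        # Stand-in for AI processing driven by extractionPrompt
        return ContentPart("text", f"<AI extract of {part.data}>", aiProcessed=True)
    return part  # raw result of non-AI extraction


parts = [
    ContentPart("json", '{"title": "Report"}'),
    ContentPart("image", "chart.png", intent="extract"),
    ContentPart("text", "raw body", intent="extract", extractionPrompt="List the dates"),
    ContentPart("text", "raw body"),
]
flags = [prepareCurrent(p).aiProcessed for p in parts]
```

In this sketch only the image and the prompted text part come back flagged as AI-processed, which is the mixed state that structure generation currently receives.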
### 2. Structure Generation
**Location**: `gateway/modules/services/serviceAi/subStructureGeneration.py`
**Flow**:
- Uses contentParts (may already be AI-processed)
- Generates document structure (chapters, sections)
### 3. Section Generation (`_processSingleSection`)
**Location**: `gateway/modules/services/serviceAi/subStructureFilling.py`
**Flow**:
- Uses contentParts (which may already be AI-processed)
- Aggregates "extracted" contentParts with AI (lines 554-682)
- Generates section content using `callAiWithLooping` with `useCaseId="section_content"`
## Issues Identified
### Issue 1: Duplicate AI Processing
- AI extraction happens in `extractAndPrepareContent` (for images/text)
- AI generation happens again in section generation
- The same content can therefore pass through AI twice, which is redundant and inefficient
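
To make the double pass concrete, here is a small sketch that counts AI invocations per stage. The functions `visionExtract`, `generateSection`, and `currentFlow` are hypothetical stand-ins, not the real service functions, and no model is actually called.

```python
from collections import Counter
from typing import List

aiCalls = Counter()  # counts hypothetical AI invocations per stage


def visionExtract(image: str) -> str:
    """Stand-in for AI extraction in extractAndPrepareContent."""
    aiCalls["extraction"] += 1
    return f"text from {image}"


def generateSection(texts: List[str]) -> str:
    """Stand-in for AI generation during section generation."""
    aiCalls["section"] += 1
    return " | ".join(texts)


def currentFlow(images: List[str]) -> str:
    texts = [visionExtract(i) for i in images]  # AI pass 1, before structure generation
    return generateSection(texts)               # AI pass 2, over AI-derived text


section = currentFlow(["a.png", "b.png"])
```

Each image's content reaches an AI stage once during extraction and again inside the section call, which is the overlap this issue describes.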
### Issue 2: Architecture Inconsistency
- Pre-extracted JSON files → contentParts directly (no AI)
- Regular documents → contentParts + AI extraction (inconsistent)
- User wants: Documents → contentParts (like pre-extracted JSON) → AI only in section generation
### Issue 3: Image Processing
- Images need Vision AI to extract text
- Currently happens in `extractAndPrepareContent`
- Question: Should this happen during section generation instead?
## Proposed Architecture
### Option A: Remove All AI from `extractAndPrepareContent`
- Documents → `extractContent()` → Raw contentParts (text, tables, etc.)
- Images → Keep as image contentParts (no Vision AI extraction)
- Section generation → Handle images with Vision AI when needed
**Pros**:
- Consistent with pre-extracted JSON flow
- Single point of AI processing (section generation)
- Clear separation of concerns
**Cons**:
- Images won't have extracted text until section generation
- May need to handle images differently in section generation
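
A minimal sketch of Option A, assuming a simplified `ContentPart` with only `kind` and `data`. The names `prepareRaw`, `sectionInputs`, and `fakeVision` are hypothetical and exist only for illustration; the Vision AI call is replaced by a stub.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ContentPart:
    kind: str   # "text" or "image" (hypothetical values)
    data: str


def prepareRaw(parts: List[ContentPart]) -> List[ContentPart]:
    """Option A preparation: pass parts through without any AI call."""
    return list(parts)


def sectionInputs(parts: List[ContentPart],
                  visionExtract: Callable[[str], str]) -> List[str]:
    """Resolve parts to text at section-generation time; only images use Vision AI."""
    return [visionExtract(p.data) if p.kind == "image" else p.data for p in parts]


def fakeVision(image: str) -> str:
    """Stub standing in for a Vision AI call."""
    return f"<text of {image}>"


prepared = prepareRaw([ContentPart("text", "intro"), ContentPart("image", "fig1.png")])
inputs = sectionInputs(prepared, fakeVision)
```

Passing the extractor in as a parameter keeps the preparation step free of AI dependencies, so image handling can change without touching it.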
### Option B: Keep Vision AI for Images Only
- Documents → `extractContent()` → Raw contentParts
- Images → Vision AI extraction → Text contentParts
- Section generation → Uses text contentParts (no additional AI extraction)
**Pros**:
- Images get text extracted early
- Section generation can use text directly
**Cons**:
- Still has AI extraction before structure generation
- Inconsistent with the user's request
## Recommendation
**Follow Option A** - Remove all AI extraction from `extractAndPrepareContent`:
1. **Documents → ContentParts** (like pre-extracted JSON):
   - Call `extractContent()` (NON-AI)
   - Create contentParts with raw extracted content
   - Images remain as image contentParts (no Vision AI)
2. **Section Generation**:
   - Handle images with Vision AI when needed
   - Aggregate all contentParts with AI
   - Single point of AI processing
**Benefits**:
- Clear architecture: Documents = raw contentParts
- Consistent with pre-extracted JSON flow
- AI processing only where needed (section generation)
- Easier to understand and maintain
## Questions to Resolve
1. **Image handling**: How should images be processed during section generation?
   - Option 1: Vision AI extraction happens automatically when image contentParts are used
   - Option 2: Images are passed to AI with Vision models during section generation
   - Option 3: Images remain as binary and are rendered directly (no text extraction)
2. **Text with extractionPrompt**: Should text contentParts with extractionPrompt be processed differently?
   - Currently: AI processing in `extractAndPrepareContent`
   - Proposed: Raw text → AI processing during section generation
3. **Performance**: Will deferring image extraction to section generation cause performance issues?
   - Need to test with multiple images
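
One possible way to approach the performance question is to extract each distinct image once and bound the number of concurrent Vision AI calls. The sketch below is an assumption about how that could look, not an existing part of the service: `visionExtract`, `extractImages`, and `maxConcurrent` are hypothetical names, and the model call is simulated with a short sleep.

```python
import asyncio
from typing import Dict, List


async def visionExtract(image: str) -> str:
    """Stand-in for a Vision AI call; sleeps instead of calling a model."""
    await asyncio.sleep(0.01)
    return f"<text of {image}>"


async def extractImages(images: List[str], maxConcurrent: int = 4) -> Dict[str, str]:
    """Extract each distinct image once, with at most maxConcurrent calls in flight."""
    semaphore = asyncio.Semaphore(maxConcurrent)

    async def worker(image: str) -> str:
        async with semaphore:
            return await visionExtract(image)

    unique = list(dict.fromkeys(images))  # de-duplicate while keeping order
    results = await asyncio.gather(*(worker(i) for i in unique))
    return dict(zip(unique, results))


texts = asyncio.run(extractImages(["a.png", "b.png", "a.png"]))
```

Whether this is sufficient would still need to be measured with realistic image counts, as noted above.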