# Document Generation Architecture Analysis

## Current Flow

### 1. Document Input → ContentParts (`extractAndPrepareContent`)

**Location**: `gateway/modules/services/serviceAi/subContentExtraction.py`

**Flow**:
- Regular documents → Calls `extractContent()` (non-AI extraction) → Creates contentParts with raw extracted text
- **However, AI is then applied in some cases**:
  - Images with "extract" intent → Calls Vision AI (line 190) → AI extraction
  - Text with "extract" intent + `extractionPrompt` → Calls AI processing (line 265) → AI extraction
- Pre-extracted JSON → Uses contentParts directly (no AI)

**Result**: ContentParts may already be AI-processed before structure generation begins.

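The branching described above can be summarized in a small routing sketch. This is only an illustration of the documented behavior, not the code in `subContentExtraction.py`: the `ContentPart` class, its field names, and `current_route` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContentPart:
    """Hypothetical stand-in for a contentPart; field names are assumptions."""
    kind: str                               # "text", "image", ...
    intent: str = "reference"               # "extract" triggers AI in the current flow
    extraction_prompt: Optional[str] = None
    pre_extracted: bool = False             # True for parts loaded from pre-extracted JSON


def current_route(part: ContentPart) -> str:
    """Return the processing path the current flow takes for one part."""
    if part.pre_extracted:
        return "direct"                     # used as-is, no AI
    if part.kind == "image" and part.intent == "extract":
        return "vision_ai"                  # AI runs before structure generation
    if part.kind == "text" and part.intent == "extract" and part.extraction_prompt:
        return "text_ai"                    # AI runs before structure generation
    return "raw"                            # only the non-AI extraction output
```

Two of the four paths invoke AI before structure generation, which is the source of the issues listed later in this document.
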
### 2. Structure Generation

**Location**: `gateway/modules/services/serviceAi/subStructureGeneration.py`

**Flow**:
- Uses contentParts (may already be AI-processed)
- Generates document structure (chapters, sections)

### 3. Section Generation (`_processSingleSection`)

**Location**: `gateway/modules/services/serviceAi/subStructureFilling.py`

**Flow**:
- Uses contentParts (which may already be AI-processed)
- Aggregates "extracted" contentParts with AI (lines 554-682)
- Generates section content using `callAiWithLooping` with `useCaseId="section_content"`

## Issues Identified

### Issue 1: Duplicate AI Processing
- AI extraction happens in `extractAndPrepareContent` (for images and text with "extract" intent)
- AI generation then runs again during section generation
- The same content can pass through AI twice, which is redundant and inefficient

### Issue 2: Architecture Inconsistency
- Pre-extracted JSON files → contentParts directly (no AI)
- Regular documents → contentParts + AI extraction (inconsistent with the pre-extracted JSON path)
- User wants: Documents → contentParts (like pre-extracted JSON) → AI only in section generation

### Issue 3: Image Processing
- Images need Vision AI to extract text
- This currently happens in `extractAndPrepareContent`
- Open question: should it happen during section generation instead?

## Proposed Architecture

### Option A: Remove All AI from `extractAndPrepareContent`
- Documents → `extractContent()` → Raw contentParts (text, tables, etc.)
- Images → Keep as image contentParts (no Vision AI extraction)
- Section generation → Handle images with Vision AI when needed

**Pros**:
- Consistent with pre-extracted JSON flow
- Single point of AI processing (section generation)
- Clear separation of concerns

**Cons**:
- Images won't have extracted text until section generation
- May need to handle images differently in section generation

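A minimal sketch of what an AI-free preparation step under Option A might look like. `RawPart`, `prepare_content`, and the `extract` callable are hypothetical names; `extract` merely stands in for the non-AI `extractContent()` step, whose real signature is not shown in this analysis.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class RawPart:
    """Hypothetical raw contentPart; no AI has touched `data`."""
    kind: str        # "text", "table", "image", ...
    data: object     # extracted text, table rows, or image bytes
    source: str      # name of the originating document


def prepare_content(
    documents: Iterable[str],
    extract: Callable[[str], List[RawPart]],
) -> List[RawPart]:
    """Collect raw parts from every document without calling any AI service."""
    parts: List[RawPart] = []
    for doc in documents:
        # Images are kept as image parts; text extraction from them is deferred.
        parts.extend(extract(doc))
    return parts
```

Because the function receives the extractor as a parameter and never touches an AI client, the "no AI before section generation" rule is visible in its signature.
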
### Option B: Keep Vision AI for Images Only
- Documents → `extractContent()` → Raw contentParts
- Images → Vision AI extraction → Text contentParts
- Section generation → Uses text contentParts (no additional AI extraction)

**Pros**:
- Images get text extracted early
- Section generation can use text directly

**Cons**:
- Still has AI extraction before structure generation
- Inconsistent with user's request

## Recommendation

**Follow Option A** - Remove all AI extraction from `extractAndPrepareContent`:

1. **Documents → ContentParts** (like pre-extracted JSON):
   - Call `extractContent()` (non-AI)
   - Create contentParts with raw extracted content
   - Images remain as image contentParts (no Vision AI)

2. **Section Generation**:
   - Handle images with Vision AI when needed
   - Aggregate all contentParts with AI
   - Single point of AI processing

**Benefits**:
- Clear architecture: Documents = raw contentParts
- Consistent with pre-extracted JSON flow
- AI processing only where needed (section generation)
- Easier to understand and maintain

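One way the section-generation side of this recommendation could handle images on demand is sketched below. It assumes a simplified `(kind, data)` tuple for contentParts and a `describe_image` callable standing in for the Vision AI call; both are hypothetical and not taken from `subStructureFilling.py`. The cache is an added assumption so that an image used by several sections is only processed once.

```python
from typing import Callable, Dict, List, Tuple

Part = Tuple[str, object]   # (kind, data); simplified, hypothetical contentPart


def build_section_input(
    parts: List[Part],
    describe_image: Callable[[bytes], str],
    cache: Dict[bytes, str],
) -> str:
    """Turn raw parts into the text handed to the section-generation AI call.

    Image parts are resolved through `describe_image` only when a section
    actually uses them; results are cached by the image bytes.
    """
    chunks: List[str] = []
    for kind, data in parts:
        if kind == "image":
            if data not in cache:
                cache[data] = describe_image(data)
            chunks.append(cache[data])
        else:
            chunks.append(str(data))
    return "\n\n".join(chunks)
```

Passing the same `cache` dictionary to every section keeps Vision AI usage proportional to the number of distinct images rather than the number of sections that reference them.
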
## Questions to Resolve

1. **Image handling**: How should images be processed during section generation?
   - Option 1: Vision AI extraction happens automatically when image contentParts are used
   - Option 2: Images are passed to AI with Vision models during section generation
   - Option 3: Images remain as binary and are rendered directly (no text extraction)

2. **Text with extractionPrompt**: Should text contentParts with `extractionPrompt` be processed differently?
   - Currently: AI processing in `extractAndPrepareContent`
   - Proposed: Raw text → AI processing during section generation

3. **Performance**: Will deferring image extraction to section generation cause performance issues?
   - Need to test with multiple images
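
For the performance question, a small harness along the following lines could be used to compare sequential and concurrent image extraction. It is a sketch under assumptions: the Vision call is assumed to be awaitable, `fake_vision` simulates latency with a sleep, and none of the names come from the codebase. Real measurements would require the actual Vision AI client.

```python
import asyncio
import time
from typing import Awaitable, Callable, List


async def extract_images_bounded(
    images: List[bytes],
    vision_call: Callable[[bytes], Awaitable[str]],
    max_concurrent: int = 4,
) -> List[str]:
    """Run one (hypothetical) async Vision call per image, with at most
    `max_concurrent` calls in flight, preserving input order."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(image: bytes) -> str:
        async with semaphore:
            return await vision_call(image)

    return list(await asyncio.gather(*(run_one(img) for img in images)))


async def fake_vision(image: bytes) -> str:
    """Stand-in for a Vision AI request; the sleep simulates network latency."""
    await asyncio.sleep(0.05)
    return f"text from {len(image)} bytes"


def time_run(n_images: int, max_concurrent: int) -> float:
    """Return wall-clock seconds for extracting `n_images` simulated images."""
    images = [bytes([i]) * (i + 1) for i in range(n_images)]
    start = time.perf_counter()
    asyncio.run(extract_images_bounded(images, fake_vision, max_concurrent))
    return time.perf_counter() - start
```

Comparing `time_run(n, 1)` with `time_run(n, 4)` for a realistic number of images gives a first indication of whether deferred extraction needs bounded concurrency to stay acceptable.
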