diff --git a/appdoc/doc_concept_callAiContent_flow.md b/appdoc/doc_concept_callAiContent_flow.md
new file mode 100644
index 0000000..6eb96c1
--- /dev/null
+++ b/appdoc/doc_concept_callAiContent_flow.md
@@ -0,0 +1,82 @@
+# callAiContent Process Flow
+
+## Function: `callAiContent(prompt, options, contentParts, outputFormat, title, parentOperationId)`
+
+### High-Level Flow
+
+```
+START callAiContent
+│
+├─> Initialize operation tracking (aiOperationId)
+├─> Determine operationType (if not set)
+│
+├─> [BRANCH 1] IMAGE_GENERATE → Handle image generation → RETURN
+│
+├─> [BRANCH 2] WEB_SEARCH/WEB_CRAWL → Handle web operations → RETURN
+│
+└─> [BRANCH 3] Unified Document Generation Path (outputFormat always set, defaults to "txt")
+    │
+    ├─> IF contentParts provided:
+    │   │
+    │   ├─> SEPARATE parts:
+    │   │   ├─> imageParts = [typeGroup=="image" OR mimeType starts "image/"]
+    │   │   ├─> textParts = [typeGroup in ["text","table","structure"] OR mimeType starts "text/"]
+    │   │   └─> otherParts = [rest] → SKIP
+    │   │
+    │   ├─> [PART 2.1] PROCESS IMAGES:
+    │   │   FOR each imagePart:
+    │   │   ├─> Create AiCallRequest with USER PROMPT + IMAGE_ANALYSE
+    │   │   ├─> Call aiObjects.call() → vision model extracts data
+    │   │   └─> Replace imagePart with text ContentPart (extracted data)
+    │   │
+    │   ├─> [PART 2.2] ADD TEXT PARTS as-is
+    │   │
+    │   └─> Convert processedParts to text string → content_for_generation
+    │
+    ├─> Build generation prompt (includes content_for_generation)
+    ├─> Call _callAiWithLooping for JSON generation
+    ├─> Parse JSON and render to outputFormat (txt, docx, xlsx, pdf, etc.)
+    └─> RETURN AiResponse with rendered document
+```
+
+### Detailed Flow: Unified Document Generation Path (lines 1055-1257)
+
+```
+IF outputFormat specified:
+│
+├─> IF contentParts provided:
+│   │
+│   ├─> SEPARATE parts (same logic as text path):
+│   │   ├─> imageParts, textParts, otherParts
+│   │
+│   ├─> [PART 2.1] PROCESS IMAGES:
+│   │   └─> Same as text path: vision models with USER PROMPT
+│   │
+│   ├─> [PART 2.2] ADD TEXT PARTS as-is
+│   │
+│   └─> Convert processedParts to text:
+│       └─> content_for_generation = "\n\n".join([...])
+│
+├─> Build generation prompt:
+│   └─> generation_prompt = buildGenerationPrompt(outputFormat, prompt, title, content_for_generation, None)
+│
+├─> Call _callAiWithLooping(generation_prompt, ...) for JSON generation
+├─> Parse generated JSON
+├─> Render to outputFormat (xlsx, docx, pdf, etc.)
+└─> RETURN AiResponse with rendered document
+```
+
+### Key Points
+
+1. **Unified Path**: All formats (txt, docx, xlsx, pdf, etc.) use the same document generation path. If `outputFormat` is not specified, it defaults to "txt".
+
+2. **User Prompt is Critical**: The prompt from `ai.process` (user's intention) is used to extract data from images, not a generic "extract all text" prompt.
+
+3. **Vision Models**: Images are processed with `IMAGE_ANALYSE` operation type, which routes to vision-capable models.
+
+4. **Text Extraction**: After vision processing, images are replaced with text ContentParts containing the extracted data.
+
+5. **JSON Structure**: All formats generate JSON structure first, then render to the target format (txt rendering extracts text from JSON structure).
+
+6. **Binary Skipping**: Binary and other non-text parts are excluded from processing.
+
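
The part-separation and text-assembly steps of the unified path can be sketched as follows. This is a minimal illustration, not the actual implementation: the `ContentPart` dataclass, its `data` field, the helper names `separate_parts` and `build_content_for_generation`, and the `analyse_image` callback (standing in for the `aiObjects.call()` vision request) are all hypothetical. Only the `typeGroup` and `mimeType` rules and the `"\n\n".join(...)` step come from the flow described above. The sketch also assumes that a part matching both rules is treated as an image, and that converted image parts precede the text parts.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# typeGroup values the flow treats as text-like.
TEXT_TYPE_GROUPS = {"text", "table", "structure"}


@dataclass
class ContentPart:
    """Hypothetical stand-in for the real ContentPart type."""
    typeGroup: Optional[str] = None
    mimeType: Optional[str] = None
    data: str = ""


def separate_parts(
    parts: List[ContentPart],
) -> Tuple[List[ContentPart], List[ContentPart], List[ContentPart]]:
    """Split parts into (imageParts, textParts, otherParts)."""
    image_parts: List[ContentPart] = []
    text_parts: List[ContentPart] = []
    other_parts: List[ContentPart] = []
    for part in parts:
        mime = part.mimeType or ""
        # Assumption: the image rule is checked first.
        if part.typeGroup == "image" or mime.startswith("image/"):
            image_parts.append(part)
        elif part.typeGroup in TEXT_TYPE_GROUPS or mime.startswith("text/"):
            text_parts.append(part)
        else:
            other_parts.append(part)  # skipped later (binary and similar)
    return image_parts, text_parts, other_parts


def build_content_for_generation(
    parts: List[ContentPart],
    analyse_image: Callable[[ContentPart], str],
) -> str:
    """Replace image parts with extracted text, keep text parts as-is, join."""
    image_parts, text_parts, _skipped = separate_parts(parts)
    # Each image becomes a text part holding whatever the vision call returned.
    processed = [
        ContentPart(typeGroup="text", mimeType="text/plain", data=analyse_image(p))
        for p in image_parts
    ]
    processed.extend(text_parts)
    return "\n\n".join(p.data for p in processed)
```

In the real flow the callback would carry the user's prompt together with the `IMAGE_ANALYSE` operation type, so the extracted text reflects the user's intention and not a generic "extract all text" request.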