From ab8a18200142f365e6add9502eef358126415f3b Mon Sep 17 00:00:00 2001
From: ValueOn AG
Date: Mon, 1 Dec 2025 19:16:01 +0100
Subject: [PATCH] rev ai process
---
appdoc/doc_concept_callAiContent_flow.md | 82 ++++++++++++++++++++++++
1 file changed, 82 insertions(+)
create mode 100644 appdoc/doc_concept_callAiContent_flow.md
diff --git a/appdoc/doc_concept_callAiContent_flow.md b/appdoc/doc_concept_callAiContent_flow.md
new file mode 100644
index 0000000..6eb96c1
--- /dev/null
+++ b/appdoc/doc_concept_callAiContent_flow.md
@@ -0,0 +1,82 @@
+# callAiContent Process Flow
+
+## Function: `callAiContent(prompt, options, contentParts, outputFormat, title, parentOperationId)`
+
+### High-Level Flow
+
+```
+START callAiContent
+│
+├─> Initialize operation tracking (aiOperationId)
+├─> Determine operationType (if not set)
+│
+├─> [BRANCH 1] IMAGE_GENERATE → Handle image generation → RETURN
+│
+├─> [BRANCH 2] WEB_SEARCH/WEB_CRAWL → Handle web operations → RETURN
+│
+└─> [BRANCH 3] Unified Document Generation Path (outputFormat always set, defaults to "txt")
+ │
+ ├─> IF contentParts provided:
+ │ │
+ │ ├─> SEPARATE parts:
+ │ │ ├─> imageParts = [typeGroup=="image" OR mimeType starts with "image/"]
+ │ │ ├─> textParts = [typeGroup in ["text","table","structure"] OR mimeType starts with "text/"]
+ │ │ └─> otherParts = [rest] → SKIP
+ │ │
+ │ ├─> [PART 2.1] PROCESS IMAGES:
+ │ │ FOR each imagePart:
+ │ │ ├─> Create AiCallRequest with USER PROMPT + IMAGE_ANALYSE
+ │ │ ├─> Call aiObjects.call() → vision model extracts data
+ │ │ └─> Replace imagePart with text ContentPart (extracted data)
+ │ │
+ │ ├─> [PART 2.2] ADD TEXT PARTS as-is
+ │ │
+ │ └─> Convert processedParts to text string → content_for_generation
+ │
+ ├─> Build generation prompt (includes content_for_generation)
+ ├─> Call _callAiWithLooping for JSON generation
+ ├─> Parse JSON and render to outputFormat (txt, docx, xlsx, pdf, etc.)
+ └─> RETURN AiResponse with rendered document
+```
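+
+The separation and image-replacement steps (PART 2.1 and 2.2) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the project's code. The `ContentPart` dataclass and its fields are hypothetical, and so is the `analyse_image` callable, which stands in for building an `AiCallRequest` with `IMAGE_ANALYSE` and sending it through `aiObjects.call()`. Processed parts are returned in diagram order: image extractions first, then text parts.
+
+```python
+from dataclasses import dataclass
+from typing import Callable, List, Optional, Tuple
+
+TEXT_GROUPS = {"text", "table", "structure"}
+
+
+@dataclass
+class ContentPart:
+    # Hypothetical stand-in for the real ContentPart; field names are assumptions.
+    typeGroup: str
+    mimeType: Optional[str]
+    data: str
+
+
+def separate_parts(
+    parts: List[ContentPart],
+) -> Tuple[List[ContentPart], List[ContentPart], List[ContentPart]]:
+    """Split parts into (imageParts, textParts, otherParts); image rules are checked first."""
+    image_parts, text_parts, other_parts = [], [], []
+    for part in parts:
+        mime = part.mimeType or ""
+        if part.typeGroup == "image" or mime.startswith("image/"):
+            image_parts.append(part)
+        elif part.typeGroup in TEXT_GROUPS or mime.startswith("text/"):
+            text_parts.append(part)
+        else:
+            other_parts.append(part)  # SKIP: never reaches the generation prompt
+    return image_parts, text_parts, other_parts
+
+
+def process_parts(
+    parts: List[ContentPart],
+    user_prompt: str,
+    analyse_image: Callable[[str, ContentPart], str],  # hypothetical vision hook
+) -> List[ContentPart]:
+    image_parts, text_parts, _skipped = separate_parts(parts)
+    processed: List[ContentPart] = []
+    # PART 2.1: one vision call per image, driven by the USER prompt.
+    for image in image_parts:
+        extracted = analyse_image(user_prompt, image)
+        processed.append(ContentPart("text", "text/plain", extracted))
+    # PART 2.2: text parts are added unchanged.
+    processed.extend(text_parts)
+    return processed
+```
+
+For example, given an image part, a `text/plain` part and an `application/octet-stream` part, `process_parts` returns two text parts (the vision result, then the original text) and drops the binary part.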
+
+### Detailed Flow: Unified Document Generation Path (lines 1055-1257)
+
+```
+ENTER unified path (outputFormat always set; defaults to "txt"):
+│
+├─> IF contentParts provided:
+│ │
+│ ├─> SEPARATE parts (same rules as in the High-Level Flow):
+│ │ └─> imageParts, textParts, otherParts (otherParts → SKIP)
+│ │
+│ ├─> [PART 2.1] PROCESS IMAGES:
+│ │ └─> FOR each imagePart: IMAGE_ANALYSE vision call with USER PROMPT → replace with text ContentPart
+│ │
+│ ├─> [PART 2.2] ADD TEXT PARTS as-is
+│ │
+│ └─> Convert processedParts to text:
+│ └─> content_for_generation = "\n\n".join([...])
+│
+├─> Build generation prompt:
+│ └─> generation_prompt = buildGenerationPrompt(outputFormat, prompt, title, content_for_generation, None)
+│
+├─> Call _callAiWithLooping(generation_prompt, ...) for JSON generation
+├─> Parse generated JSON
+├─> Render to outputFormat (txt, docx, xlsx, pdf, etc.)
+└─> RETURN AiResponse with rendered document
+```
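+
+The detailed path can be summarised as the Python sketch below. It is illustrative only. `build_generation_prompt`, `call_ai_with_looping` and `render` are hypothetical callables standing in for `buildGenerationPrompt`, `_callAiWithLooping` and the format renderers. Their signatures are assumed, and a plain dict replaces `AiResponse`.
+
+```python
+import json
+from typing import Any, Callable, List, Optional
+
+
+def generate_document(
+    prompt: str,
+    processed_texts: List[str],
+    output_format: Optional[str],
+    title: Optional[str],
+    build_generation_prompt: Callable[..., str],  # hypothetical stand-in
+    call_ai_with_looping: Callable[[str], str],   # hypothetical stand-in
+    render: Callable[[Any, str], Any],            # hypothetical stand-in
+) -> dict:
+    # outputFormat is always set: fall back to "txt" when none is given.
+    output_format = output_format or "txt"
+    # Text of the processed parts (vision extractions + text parts), blank-line separated.
+    content_for_generation = "\n\n".join(processed_texts)
+    generation_prompt = build_generation_prompt(
+        output_format, prompt, title, content_for_generation, None
+    )
+    # The model is asked for JSON; parse it before rendering.
+    document = json.loads(call_ai_with_looping(generation_prompt))
+    return {"format": output_format, "content": render(document, output_format)}
+```
+
+Injecting the three steps as callables keeps the sketch runnable without a model. It also mirrors the fact that every format, including txt, goes through the same JSON-then-render sequence.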
+
+### Key Points
+
+1. **Unified Path**: All formats (txt, docx, xlsx, pdf, etc.) use the same document generation path. If `outputFormat` is not specified, it defaults to "txt".
+
+2. **User Prompt is Critical**: The prompt passed in from `ai.process` (the user's intention) is what the vision model uses to extract data from images, instead of a generic "extract all text" prompt, so extraction targets what the user actually asked for.
+
+3. **Vision Models**: Images are processed with `IMAGE_ANALYSE` operation type, which routes to vision-capable models.
+
+4. **Text Extraction**: After vision processing, images are replaced with text ContentParts containing the extracted data.
+
+5. **JSON Structure**: Every format first generates a JSON structure, which is then rendered to the target format; for txt, rendering extracts the text from that JSON structure.
+
+6. **Binary Skipping**: Parts that are neither images nor text (`otherParts`, such as binary data) are skipped and never reach `content_for_generation`.
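+
+Key point 5 can be illustrated with a txt renderer. The JSON schema produced by the generation step is not documented here, so this sketch assumes nothing about it. It walks the parsed structure depth-first and emits every non-empty string and every number, one per line. The name `render_txt` is hypothetical, and the real renderer may well pick specific fields instead.
+
+```python
+import json
+from typing import Any, List
+
+
+def _collect(node: Any, out: List[str]) -> None:
+    # Depth-first walk: dict values in insertion order, list items in order.
+    if isinstance(node, str):
+        if node.strip():
+            out.append(node.strip())
+    elif isinstance(node, dict):
+        for value in node.values():
+            _collect(value, out)
+    elif isinstance(node, list):
+        for item in node:
+            _collect(item, out)
+    elif isinstance(node, (int, float)) and not isinstance(node, bool):
+        out.append(str(node))  # keep numbers, e.g. table cells
+    # booleans and None are dropped in this sketch
+
+
+def render_txt(generated_json: str) -> str:
+    lines: List[str] = []
+    _collect(json.loads(generated_json), lines)
+    return "\n".join(lines)
+```
+
+For `{"title": "Report", "rows": [["a", 1]]}` this yields the lines `Report`, `a` and `1`.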
+