callAiContent Process Flow

Function: callAiContent(prompt, options, contentParts, outputFormat, title, parentOperationId)

High-Level Flow

START callAiContent
│
├─> Initialize operation tracking (aiOperationId)
├─> Determine operationType (if not set)
│
├─> [BRANCH 1] IMAGE_GENERATE → Handle image generation → RETURN
│
├─> [BRANCH 2] WEB_SEARCH/WEB_CRAWL → Handle web operations → RETURN
│
└─> [BRANCH 3] Unified Document Generation Path (outputFormat always set, defaults to "txt")
    │
    ├─> IF contentParts provided:
    │   │
    │   ├─> SEPARATE parts:
    │   │   ├─> imageParts = [typeGroup=="image" OR mimeType starts "image/"]
    │   │   ├─> textParts = [typeGroup in ["text","table","structure"] OR mimeType starts "text/"]
    │   │   └─> otherParts = [rest] → SKIP
    │   │
    │   ├─> [PART 2.1] PROCESS IMAGES:
    │   │   FOR each imagePart:
    │   │   ├─> Create AiCallRequest with USER PROMPT + IMAGE_ANALYSE
    │   │   ├─> Call aiObjects.call() → vision model extracts data
    │   │   └─> Replace imagePart with text ContentPart (extracted data)
    │   │
    │   ├─> [PART 2.2] ADD TEXT PARTS as-is
    │   │
    │   └─> Convert processedParts to text string → content_for_generation
    │
    ├─> Build generation prompt (includes content_for_generation)
    ├─> Call _callAiWithLooping for JSON generation
    ├─> Parse JSON and render to outputFormat (txt, docx, xlsx, pdf, etc.)
    └─> RETURN AiResponse with rendered document
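
The SEPARATE step above can be sketched in Python as follows. This is a minimal illustration, not the actual implementation: the `ContentPart` dataclass is a hypothetical stand-in for the real type (only `typeGroup` and `mimeType` are named in this document), `separate_parts` is an illustrative name, and checking the image rule before the text rule is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

# typeGroup values that count as text, as listed in the flow above
TEXT_TYPE_GROUPS = {"text", "table", "structure"}


@dataclass
class ContentPart:
    """Hypothetical stand-in for the real ContentPart type."""
    typeGroup: str = ""
    mimeType: str = ""
    data: str = ""


def separate_parts(
    parts: List[ContentPart],
) -> Tuple[List[ContentPart], List[ContentPart], List[ContentPart]]:
    """Split parts into (imageParts, textParts, otherParts)."""
    image_parts: List[ContentPart] = []
    text_parts: List[ContentPart] = []
    other_parts: List[ContentPart] = []
    for part in parts:
        mime = part.mimeType or ""
        # Assumption: the image rule is checked first
        if part.typeGroup == "image" or mime.startswith("image/"):
            image_parts.append(part)
        elif part.typeGroup in TEXT_TYPE_GROUPS or mime.startswith("text/"):
            text_parts.append(part)
        else:
            # Everything else (e.g. binary) is skipped by the caller
            other_parts.append(part)
    return image_parts, text_parts, other_parts
```

With this sketch, a part with typeGroup "table" and mimeType "application/json" lands in textParts because either condition is sufficient, while an "application/octet-stream" part lands in otherParts and is skipped.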

Detailed Flow: Unified Document Generation Path (lines 1055-1257)

outputFormat resolved (always set; defaults to "txt"):
│
├─> IF contentParts provided:
│   │
│   ├─> SEPARATE parts (same logic as in the High-Level Flow):
│   │   └─> imageParts, textParts, otherParts (otherParts are skipped)
│   │
│   ├─> [PART 2.1] PROCESS IMAGES:
│   │   └─> Same as in the High-Level Flow: vision model with USER PROMPT + IMAGE_ANALYSE
│   │
│   ├─> [PART 2.2] ADD TEXT PARTS as-is
│   │
│   └─> Convert processedParts to text:
│       └─> content_for_generation = "\n\n".join([...])
│
├─> Build generation prompt:
│   └─> generation_prompt = buildGenerationPrompt(outputFormat, prompt, title, content_for_generation, None)
│
├─> Call _callAiWithLooping(generation_prompt, ...) for JSON generation
├─> Parse generated JSON
├─> Render to outputFormat (xlsx, docx, pdf, etc.)
└─> RETURN AiResponse with rendered document
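
Steps 2.1 and 2.2 and the final join can be sketched as below. This is a sketch under stated assumptions rather than the real code: `Part` is a hypothetical simplified part type, `analyse_image` is a hypothetical callable standing in for the `aiObjects.call()` request with IMAGE_ANALYSE, and placing the extracted image text before the original text parts is an assumption based on the step order shown above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Part:
    """Hypothetical simplified content part."""
    typeGroup: str
    data: str


def build_content_for_generation(
    image_parts: List[Part],
    text_parts: List[Part],
    user_prompt: str,
    analyse_image: Callable[[str, Part], str],
) -> str:
    """Replace images with extracted text, then join all parts into one string."""
    processed: List[Part] = []
    for image in image_parts:
        # The USER PROMPT drives extraction, not a generic "extract all text" prompt
        extracted = analyse_image(user_prompt, image)
        processed.append(Part(typeGroup="text", data=extracted))
    # Text parts are added as-is
    processed.extend(text_parts)
    # Blank-line separated, matching the "\n\n".join(...) in the flow
    return "\n\n".join(p.data for p in processed if p.data)
```

Passing the analyser in as a callable keeps the sketch runnable without a vision model; in a test, a lambda that echoes the prompt and the image data is enough to check the ordering and the join.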

Key Points

  1. Unified Path: All formats (txt, docx, xlsx, pdf, etc.) use the same document generation path. If outputFormat is not specified, it defaults to "txt".

  2. User Prompt is Critical: Images are analysed with the prompt passed in from ai.process (the user's intent), not with a generic "extract all text" prompt.

  3. Vision Models: Images are processed with the IMAGE_ANALYSE operation type, which routes the request to a vision-capable model.

  4. Text Extraction: After vision processing, images are replaced with text ContentParts containing the extracted data.

  5. JSON Structure: Every format is first generated as a JSON structure and then rendered to the target format; the txt renderer extracts the text from that JSON structure.

  6. Binary Skipping: Parts that are neither image nor text (otherParts, e.g. binary data) are skipped and never reach the generation prompt.
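
Key point 5 (JSON first, then render) can be illustrated with a small sketch. The JSON schema used here (`title`, `sections`, `heading`, `paragraphs`) is purely hypothetical, as are the names `render_txt`, `RENDERERS`, and `render`; the real structure produced by `_callAiWithLooping` is not described in this document.

```python
import json
from typing import Callable, Dict, List


def render_txt(document: dict) -> str:
    """Extract plain text from the (hypothetical) JSON document structure."""
    lines: List[str] = []
    title = document.get("title")
    if title:
        lines.append(title)
    for section in document.get("sections", []):
        heading = section.get("heading")
        if heading:
            lines.append(heading)
        lines.extend(section.get("paragraphs", []))
    return "\n\n".join(lines)


# One renderer per output format; only txt is sketched here
RENDERERS: Dict[str, Callable[[dict], str]] = {"txt": render_txt}


def render(raw_json: str, output_format: str = "txt") -> str:
    """Parse the generated JSON, then hand it to the renderer for the format."""
    document = json.loads(raw_json)
    renderer = RENDERERS.get(output_format)
    if renderer is None:
        raise ValueError(f"no renderer sketched for format: {output_format}")
    return renderer(document)
```

Because every format shares the same intermediate JSON, supporting another format in a design like this would mean adding one renderer to the lookup table rather than a separate generation path.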