ValueOn AG 8bd4e67be6 docs: complete wiki restructuring - new folder hierarchy, canonical reference pages, archive old docs

Made-with: Cursor

2026-04-05 23:28:14 +02:00


Implementation Concept: Chapter-Based Generation Structure

Overview

Switch from a section-based to a chapter-based structure to solve the following problems:

Section generation prompts do not know the standard JSON schema
Mixed element types cannot be processed correctly
Sections are too rigid: they cannot contain multiple element types

Critical Analysis: Aggregating Multiple ContentParts

Identified problem:

Certain content_type values (e.g. table, bullet_list) require aggregating multiple ContentParts
Example: 20 expense receipts → one Excel table
Current behavior: each ContentPart is processed individually → no aggregation possible

Implemented solution:

A generic _needsAggregation() function detects when aggregation is needed
When aggregation is needed, all ContentParts are passed to callAi together
callAi is used instead of callAiPlanning because it supports ContentParts
Automatic chunking also works for aggregated parts

Supported aggregation types:

table: multiple parts → one table (e.g. an Excel list)
bullet_list: multiple parts → one list
Further types can easily be added

Core Principles

Chapter-based structure: structure generation defines chapters, not sections
Chapters contain sections: each chapter can contain multiple sections of different types
Standard JSON schema in prompts: chapter generation prompts include the complete standard JSON schema
Flexible content processing: chapters can contain mixed ContentParts
Hierarchical headings: chapters are hierarchical headings (level 1, 2, 3, etc.)

Architecture

Chapters as a Helper Structure

Chapters are an intermediate helper structure used only during generation. The final output structure remains unchanged:

Final output structure:

Document
  └── Sections[]
      └── content_type
      └── elements[]

Chapter structure (helper):

ChapterStructure
  └── Chapters[]
      └── level, title
      └── contentPartIds[]
      └── contentPartInstructions{}
      └── generationHint
      └── Sections[] (generated)
          └── content_type
          └── elements[]

Important:

A chapter is a container for generating one part of a document as a set of sections
Each chapter has a predefined heading section (chapter title + level)
The final output structure has no chapters, only sections
Chapters are flattened into sections for the final output

Workflow Phases

Important: Debug File Logging

All AI calls and responses are written to debug files
Prompts: {operationType}_{identifier}_prompt.txt
Responses: {operationType}_{identifier}_response.txt
Examples:
- Phase 5C: chapter_structure_generation_prompt.txt / chapter_structure_generation_response.txt
- Phase 5D.1: chapter_structure_{chapterId}_prompt.txt / chapter_structure_{chapterId}_response.txt
- Phase 5D.2: section_content_{sectionId}_prompt.txt / section_content_{sectionId}_response.txt
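The naming convention above can be sketched as a small helper. `debugFileName` is a hypothetical name, and the document does not say whether `writeDebugFile` itself appends `.txt`; the sketch only illustrates the `{operationType}_{identifier}_{prompt|response}.txt` pattern for the three phases.

```python
from typing import Literal


# Hypothetical helper: builds a debug file name following the
# {operationType}_{identifier}_{prompt|response}.txt convention.
def debugFileName(
    operationType: str,
    identifier: str,
    kind: Literal["prompt", "response"],
) -> str:
    if kind not in ("prompt", "response"):
        raise ValueError(f"unknown kind: {kind}")
    return f"{operationType}_{identifier}_{kind}.txt"


# Phase 5C
print(debugFileName("chapter_structure", "generation", "prompt"))
# chapter_structure_generation_prompt.txt

# Phase 5D.1 for a chapter with id "chapter_1"
print(debugFileName("chapter_structure", "chapter_1", "response"))
# chapter_structure_chapter_1_response.txt

# Phase 5D.2 for a section with id "section_1"
print(debugFileName("section_content", "section_1", "prompt"))
# section_content_section_1_prompt.txt
```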

Phase 5B: Content Extraction

What happens:

Extracts content based on intents
Prepares ContentParts with metadata
All extraction happens BEFORE structure generation

Output:

A list of ContentParts with complete metadata
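The later phases read only a few fields from a ContentPart: `part.data`, `part.metadata["contentFormat"]` (`"extracted"`, `"reference"` or `"object"`) and `part.metadata["documentReference"]`. A minimal stand-in, assuming exactly those fields; the real `ContentPart` class likely carries more, and `findContentPartById` is a hypothetical free-function version of `_findContentPartById`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


# Hypothetical minimal ContentPart, inferred from the fields Phase 5D.2 reads.
@dataclass
class ContentPart:
    id: str
    data: Any = None  # extracted text, or base64 image data for "object" parts
    metadata: Dict[str, Any] = field(default_factory=dict)


def findContentPartById(partId: str, parts: List[ContentPart]) -> Optional[ContentPart]:
    """Linear lookup; returns None when the id is unknown."""
    for part in parts:
        if part.id == partId:
            return part
    return None


# Sample parts for illustration only
parts = [
    ContentPart("part_ext_1", "Receipt 1: 12.50 CHF", {"contentFormat": "extracted"}),
    ContentPart("part_ref_1", None, {"contentFormat": "reference",
                                     "documentReference": "doc_1"}),
]
print(findContentPartById("part_ref_1", parts).metadata["contentFormat"])  # reference
print(findContentPartById("missing", parts))  # None
```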

Phase 5C: Chapter Structure Generation

What happens:

Generates the chapter structure (table of contents)
Defines for each chapter:
- level, title
- contentPartIds
- contentPartInstructions
- generationHint

Input:

userPrompt: the user request
contentParts: all prepared ContentParts (already extracted)
outputFormat: the target format

Process:

async def _generateChapterStructure(
    self,
    userPrompt: str,
    contentParts: List[ContentPart],
    outputFormat: str,
    parentOperationId: str
) -> Dict[str, Any]:
    structurePrompt = self._buildChapterStructurePrompt(
        userPrompt=userPrompt,
        contentParts=contentParts,
        outputFormat=outputFormat
    )
    
    # Debug: Log Prompt
    self.services.utils.writeDebugFile(
        structurePrompt,
        "chapter_structure_generation_prompt"
    )
    
    aiResponse = await self.services.ai.callAiPlanning(
        prompt=structurePrompt
    )
    
    # Debug: Log Response
    self.services.utils.writeDebugFile(
        aiResponse,
        "chapter_structure_generation_response"
    )
    
    structure = json.loads(
        self.services.utils.jsonExtractString(aiResponse)
    )
    
    return structure

Prompt format:

USER REQUEST: {userPrompt}

AVAILABLE CONTENT PARTS:
{contentPartsIndex}

TASK: Generate the chapter structure for the documents to be generated.

For each chapter:
- chapter id
- level (1, 2, 3, etc.)
- title
- contentPartIds: [list of ContentPart IDs]
- contentPartInstructions: {
    "partId": {
        "instruction": "How the content should be structured"
    }
}
- generationHint: description of the content

RETURN JSON:
{
  "metadata": {...},
  "documents": [{
    "chapters": [
      {
        "id": "chapter_1",
        "level": 1,
        "title": "Introduction",
        "contentPartIds": ["part_ext_1"],
        "contentPartInstructions": {...},
        "generationHint": "...",
        "sections": []
      }
    ]
  }]
}
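The prompt interpolates a `{contentPartsIndex}` whose exact format the document does not specify. A minimal sketch, assuming one line per part with its ID and `contentFormat` and no content preview; `buildContentPartsIndex` is a hypothetical helper, and parts are plain dicts here to keep the example self-contained.

```python
from typing import Any, Dict, List


def buildContentPartsIndex(parts: List[Dict[str, Any]]) -> str:
    # One line per ContentPart: "- <id> (format: <contentFormat>)"
    lines = []
    for part in parts:
        fmt = part.get("metadata", {}).get("contentFormat", "unknown")
        lines.append(f"- {part['id']} (format: {fmt})")
    return "\n".join(lines)


# Shortened template: only the two interpolated fields from the prompt above
PROMPT_TEMPLATE = (
    "USER REQUEST: {userPrompt}\n\n"
    "AVAILABLE CONTENT PARTS:\n{contentPartsIndex}\n"
)

parts = [
    {"id": "part_ext_1", "metadata": {"contentFormat": "extracted"}},
    {"id": "part_obj_1", "metadata": {"contentFormat": "object"}},
]
print(PROMPT_TEMPLATE.format(
    userPrompt="Summarize the documents",
    contentPartsIndex=buildContentPartsIndex(parts),
))
```

`str.format` treats every `{` and `}` as a placeholder; the full prompt contains literal JSON braces, so those would have to be doubled (`{{`, `}}`) or the values substituted with `str.replace` instead.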

Output structure:

{
  "metadata": {"title": "...", "language": "de"},
  "documents": [{
    "chapters": [
      {
        "id": "chapter_summary",
        "level": 1,
        "title": "Summary",
        "contentPartIds": ["extracted_doc1_part1"],
        "contentPartInstructions": {
          "extracted_doc1_part1": {
            "instruction": "Create a summary paragraph"
          }
        },
        "generationHint": "Create summary",
        "sections": []
      }
    ]
  }]
}
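Because the chapter structure comes back from the model as free-form JSON, it can be worth checking it before Phase 5D. A sketch, assuming the `documents[].chapters[]` layout above and the 1 to 6 level range of the `Chapter` Pydantic model; `findChapterProblems` is a hypothetical helper, not part of the described pipeline.

```python
import json
from typing import Any, Dict, List, Set


def findChapterProblems(structure: Dict[str, Any], knownPartIds: Set[str]) -> List[str]:
    problems = []
    for doc in structure.get("documents", []):
        for chapter in doc.get("chapters", []):
            chapterId = chapter.get("id", "unknown")
            partIds = chapter.get("contentPartIds") or []
            level = chapter.get("level")
            if not isinstance(level, int) or not 1 <= level <= 6:
                problems.append(f"{chapterId}: invalid level {level!r}")
            # Every referenced part must exist in the Phase 5B output
            for partId in partIds:
                if partId not in knownPartIds:
                    problems.append(f"{chapterId}: unknown contentPartId {partId}")
            # Instructions should only refer to parts the chapter actually uses
            for partId in chapter.get("contentPartInstructions") or {}:
                if partId not in partIds:
                    problems.append(f"{chapterId}: instruction for unassigned part {partId}")
    return problems


response = '''{"metadata": {"title": "Report", "language": "de"},
  "documents": [{"chapters": [
    {"id": "chapter_summary", "level": 1, "title": "Summary",
     "contentPartIds": ["extracted_doc1_part1", "extracted_doc9_part1"],
     "contentPartInstructions": {}, "generationHint": "Create summary",
     "sections": []}]}]}'''
print(findChapterProblems(json.loads(response), {"extracted_doc1_part1"}))
# ['chapter_summary: unknown contentPartId extracted_doc9_part1']
```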

Phase 5D: Chapter Content Generation

Two-phase approach:

Phase 5D.1: Generate the Sections Structure

What happens:

Generates the sections structure for each chapter (without content)
Sections contain: content_type, contentPartIds, generationHint, useAiCall
The AI sets the useAiCall flag directly in the JSON; if it omits the flag, the rule below is applied as a fallback

useAiCall flag:

useAiCall = true if:
- content_type != "paragraph" (a transformation is needed)
- or contentPartInstructions contain a specific instruction (only parts of the content are used)
useAiCall = false otherwise (the content is inserted directly)
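The fallback rule can be written as a pure function, which makes it testable without any AI call. The passthrough phrases are the ones used in the Phase 5D.1 code below; the name `decideUseAiCall` is hypothetical.

```python
from typing import Dict, List

# Instructions meaning "use the content as-is" (taken from the Phase 5D.1 fallback)
PASSTHROUGH_INSTRUCTIONS = {"include full text", "include all content"}


def decideUseAiCall(
    contentType: str,
    contentPartIds: List[str],
    contentPartInstructions: Dict[str, Dict[str, str]],
) -> bool:
    # Any content type other than paragraph needs a transformation
    if contentType != "paragraph":
        return True
    # A paragraph needs the AI only if some part has a specific instruction
    for partId in contentPartIds:
        instruction = contentPartInstructions.get(partId, {}).get("instruction", "")
        if instruction and instruction.lower() not in PASSTHROUGH_INSTRUCTIONS:
            return True
    return False


instructions = {"p1": {"instruction": "Include full text"},
                "p2": {"instruction": "Only use the totals"}}
print(decideUseAiCall("table", ["p1"], instructions))      # True
print(decideUseAiCall("paragraph", ["p1"], instructions))  # False
print(decideUseAiCall("paragraph", ["p2"], instructions))  # True
```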

Process:

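# Phase 5D.1. Note: this method shares its name (and its prompt builder's name,
# _buildChapterStructurePrompt) with Phase 5C above. If both live on the same
# class, the later definition silently replaces the earlier one, so each phase
# would need its own name.
async def _generateChapterStructure(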
    self,
    chapterStructure: Dict[str, Any],
    contentParts: List[ContentPart],
    userPrompt: str,
    parentOperationId: str
) -> Dict[str, Any]:
    for doc in chapterStructure.get("documents", []):
        for chapter in doc.get("chapters", []):
            chapterId = chapter.get("id", "unknown")
            chapterPrompt = self._buildChapterStructurePrompt(
                chapter=chapter,
                contentPartIds=chapter.get("contentPartIds"),
                contentPartInstructions=chapter.get("contentPartInstructions"),
                userPrompt=userPrompt
            )
            
            # Debug: Log Prompt
            self.services.utils.writeDebugFile(
                chapterPrompt,
                f"chapter_structure_{chapterId}_prompt"
            )
            
            aiResponse = await self.services.ai.callAiPlanning(
                prompt=chapterPrompt
            )
            
            # Debug: Log Response
            self.services.utils.writeDebugFile(
                aiResponse,
                f"chapter_structure_{chapterId}_response"
            )
            
            sectionsStructure = json.loads(
                self.services.utils.jsonExtractString(aiResponse)
            )
            
            chapter["sections"] = sectionsStructure.get("sections", [])
            
            # Set the useAiCall flag if the AI did not set it
            for section in chapter["sections"]:
                if "useAiCall" not in section:
                    contentType = section.get("content_type", "paragraph")
                    useAiCall = contentType != "paragraph"
                    
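                    # Check contentPartInstructions: a specific instruction means only parts of the content are used
                    if not useAiCall:
                        contentPartInstructions = chapter.get("contentPartInstructions") or {}
                        for partId in section.get("contentPartIds", []):
                            instruction = contentPartInstructions.get(partId, {}).get("instruction", "")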
                            if instruction and instruction.lower() not in ["include full text", "include all content"]:
                                useAiCall = True
                                break
                    
                    section["useAiCall"] = useAiCall
    
    return chapterStructure

Prompt format:

TASK: Generate Chapter Sections Structure

CHAPTER METADATA:
- Chapter ID: {chapterId}
- Chapter Level: {chapterLevel}
- Chapter Title: {chapterTitle}
- Generation Hint: {generationHint}

IMPORTANT: The chapter already has a predefined heading section.
Do NOT generate a heading section for the chapter title!

AVAILABLE CONTENT PARTS:
{contentPartIds}  # IDs only, NO previews!

For each ContentPart:
- ContentPart ID: {partId}
- Format: {contentFormat}
- Instruction: {contentPartInstructions[partId].instruction}

STANDARD JSON SCHEMA FOR SECTIONS:
[... Standard JSON Schema ...]

Return JSON:
{
  "sections": [
    {
      "id": "section_1",
      "content_type": "paragraph",
      "contentPartIds": ["part_ext_1"],
      "generationHint": "...",
      "useAiCall": false,  # set directly by the AI
      "elements": []
    }
  ]
}
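The per-part block of this prompt can be rendered from a chapter produced in Phase 5C. A sketch, assuming the part formats are available as an id → contentFormat mapping; `buildChapterPartsBlock` is a hypothetical helper, and rendering a missing instruction as `(none)` is an arbitrary choice.

```python
from typing import Any, Dict


def buildChapterPartsBlock(chapter: Dict[str, Any], partFormats: Dict[str, str]) -> str:
    # IDs, formats and instructions only: no content previews
    instructions = chapter.get("contentPartInstructions") or {}
    blocks = []
    for partId in chapter.get("contentPartIds") or []:
        instruction = instructions.get(partId, {}).get("instruction", "(none)")
        blocks.append(
            f"- ContentPart ID: {partId}\n"
            f"- Format: {partFormats.get(partId, 'unknown')}\n"
            f"- Instruction: {instruction}"
        )
    return "\n\n".join(blocks)


chapter = {
    "id": "chapter_summary",
    "contentPartIds": ["extracted_doc1_part1"],
    "contentPartInstructions": {
        "extracted_doc1_part1": {"instruction": "Create a summary paragraph"}
    },
}
print(buildChapterPartsBlock(chapter, {"extracted_doc1_part1": "extracted"}))
```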

Phase 5D.2: Fill Sections with ContentParts

What happens:

Fills each section separately with its ContentParts
Based on the useAiCall flag:
- useAiCall = true: a separate AI call with the ContentPart(s) (chunking for large parts)
- useAiCall = false: the content is inserted directly
Rendering and reference content: always inserted directly, without an AI call

Aggregating multiple ContentParts:

Certain content_type values require aggregating multiple parts:
- table: multiple parts → one table (e.g. 20 receipts → an Excel list)
- bullet_list: multiple parts → one list
- paragraph: could also be aggregated (e.g. comparing several documents), but _needsAggregation currently returns False for paragraph
When aggregation is needed: all parts are passed to the AI together (not one at a time)
callAi is used instead of callAiPlanning because it supports ContentParts
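The rule that rendering and reference content bypasses the AI can be enforced by partitioning a section's parts before any call. In the `_fillChapterSections` code below, the aggregation branch passes every found part to `callAi`; partitioning first is one possible way (an assumption, not the documented behavior) to keep `object` and `reference` parts out of the aggregated call. `partitionParts` is hypothetical, and parts with an unknown format are sent to the AI.

```python
from typing import Any, Dict, List, Tuple

# Formats inserted directly, never sent to the AI
# (mirrors the "reference" and "object" branches of _fillChapterSections)
DIRECT_FORMATS = {"reference", "object"}


def partitionParts(
    parts: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    aiParts, directParts = [], []
    for part in parts:
        fmt = part.get("metadata", {}).get("contentFormat")
        (directParts if fmt in DIRECT_FORMATS else aiParts).append(part)
    return aiParts, directParts


parts = [
    {"id": "p1", "metadata": {"contentFormat": "extracted"}},
    {"id": "p2", "metadata": {"contentFormat": "object"}},
    {"id": "p3", "metadata": {"contentFormat": "extracted"}},
]
aiParts, directParts = partitionParts(parts)
print([p["id"] for p in aiParts], [p["id"] for p in directParts])
# ['p1', 'p3'] ['p2']
```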

Process:

async def _fillChapterSections(
    self,
    chapterStructure: Dict[str, Any],
    contentParts: List[ContentPart],
    userPrompt: str,
    parentOperationId: str
) -> Dict[str, Any]:
    for doc in chapterStructure.get("documents", []):
        for chapter in doc.get("chapters", []):
            for section in chapter.get("sections", []):
                elements = []
                useAiCall = section.get("useAiCall", False)
                contentType = section.get("content_type", "paragraph")
                contentPartIds = section.get("contentPartIds", [])
                
                # Check whether aggregation is needed
                needsAggregation = self._needsAggregation(
                    contentType=contentType,
                    contentPartCount=len(contentPartIds)
                )
                
                if needsAggregation and useAiCall:
                    # Aggregation: process all parts together
                    sectionParts = [
                        self._findContentPartById(pid, contentParts)
                        for pid in contentPartIds
                    ]
                    sectionParts = [p for p in sectionParts if p is not None]
                    
                    if sectionParts:
                        sectionId = section.get("id", "unknown")
                        sectionPrompt = self._buildSectionContentPrompt(
                            section=section,
                            contentParts=sectionParts,  # ALL PARTS!
                            generationHint=section.get("generationHint"),
                            userPrompt=userPrompt
                        )
                        
                        # Debug: Log Prompt
                        self.services.utils.writeDebugFile(
                            sectionPrompt,
                            f"section_content_{sectionId}_prompt"
                        )
                        
                        # Use callAi because it supports ContentParts
                        request = AiCallRequest(
                            prompt=sectionPrompt,
                            contentParts=sectionParts,  # ALL PARTS!
                            options=AiCallOptions(
                                operationType=OperationTypeEnum.DATA_ANALYSE,
                                priority=PriorityEnum.BALANCED,
                                processingMode=ProcessingModeEnum.DETAILED
                            )
                        )
                        aiResponse = await self.services.ai.callAi(request)
                        
                        # Debug: Log Response
                        self.services.utils.writeDebugFile(
                            aiResponse.content,
                            f"section_content_{sectionId}_response"
                        )
                        
                        elements.extend(parseElements(aiResponse.content))
                
                else:
                    # Individual processing: each part on its own
                    for partId in contentPartIds:
                        part = self._findContentPartById(partId, contentParts)
                        if not part:
                            continue
                        
                        contentFormat = part.metadata.get("contentFormat")
                        
                        if contentFormat == "extracted":
                            if useAiCall:
                                # AI call with a single ContentPart
                                sectionId = section.get("id", "unknown")
                                sectionPrompt = self._buildSectionContentPrompt(
                                    section=section,
                                    contentParts=[part],  # ONE PART
                                    generationHint=section.get("generationHint"),
                                    userPrompt=userPrompt
                                )
                                
                                # Debug: Log Prompt
                                self.services.utils.writeDebugFile(
                                    sectionPrompt,
                                    f"section_content_{sectionId}_prompt"
                                )
                                
                                request = AiCallRequest(
                                    prompt=sectionPrompt,
                                    contentParts=[part],
                                    options=AiCallOptions(...)
                                )
                                aiResponse = await self.services.ai.callAi(request)
                                
                                # Debug: Log Response
                                self.services.utils.writeDebugFile(
                                    aiResponse.content,
                                    f"section_content_{sectionId}_response"
                                )
                                
                                elements.extend(parseElements(aiResponse.content))
                            else:
                                # Insert content directly
                                elements.append({
                                    "type": "paragraph",
                                    "content": part.data or ""
                                })
                        
                        elif contentFormat == "reference":
                            elements.append({
                                "type": "reference",
                                "documentReference": part.metadata.get("documentReference")
                            })
                        
                        elif contentFormat == "object":
                            elements.append({
                                "type": "image",
                                "base64Data": part.data
                            })
                
                section["elements"] = elements
    
    return chapterStructure

def _needsAggregation(
    self,
    contentType: str,
    contentPartCount: int
) -> bool:
    """
    Determines whether multiple ContentParts must be aggregated.
    
    Aggregation is needed when:
    - the content_type requires aggregation (table, bullet_list)
    - AND more than one ContentPart is present (> 1)
    
    Args:
        contentType: Section content_type
        contentPartCount: Number of ContentParts in this section
        
    Returns:
        True if aggregation is needed, False otherwise
    """
    aggregationTypes = ["table", "bullet_list"]
    
    if contentType in aggregationTypes and contentPartCount > 1:
        return True
    
    # Optional: paragraphs with multiple parts could also be aggregated
    # (e.g. comparing several documents)
    if contentType == "paragraph" and contentPartCount > 1:
        # Could inspect generationHint for signs of aggregation
        # (e.g. "compare", "summary", "list")
        return False  # Default: no aggregation for paragraph
    
    return False

Prompt format (for the AI call):

Single processing (one ContentPart):

TASK: Generate Section Content

SECTION METADATA:
- Section ID: {sectionId}
- Content Type: {contentType}
- Generation Hint: {generationHint}

CONTEXT:
- User Request: {userPrompt}
- What to generate: {generationHint}

CONTENT PART:
- ContentPart ID: {partId}
- Format: extracted
- The ContentPart is passed as a parameter (not embedded in the prompt!)
- Can be very large (e.g. 200MB) → chunked automatically

STANDARD JSON SCHEMA FOR ELEMENTS:
[... Standard JSON Schema ...]

Return JSON:
{
  "elements": [
    {"type": "paragraph", "content": "..."},
    {"type": "table", "headers": [...], "rows": [...]}
  ]
}

Aggregation (multiple ContentParts):

TASK: Generate Section Content

SECTION METADATA:
- Section ID: {sectionId}
- Content Type: {contentType}  # e.g. "table"
- Generation Hint: {generationHint}  # e.g. "Create an Excel list of all expense receipts"

CONTEXT:
- User Request: {userPrompt}
- What to generate: {generationHint}

CONTENT PARTS (Aggregation):
- Count: {contentPartCount} ContentParts
- All ContentParts are passed as parameters (not embedded in the prompt!)
- Each part can be very large → chunked automatically
- IMPORTANT: Aggregate ALL parts into one element (e.g. one table)

ContentPart IDs:
{contentPartIds}  # list of all IDs

STANDARD JSON SCHEMA FOR ELEMENTS:
[... Standard JSON Schema ...]

Return JSON:
{
  "elements": [
    {
      "type": "table",
      "headers": ["Column1", "Column2", ...],
      "rows": [
        ["Data from part 1", ...],
        ["Data from part 2", ...],
        ...
      ]
    }
  ]
}
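The `_fillChapterSections` code calls a `parseElements` helper that this document does not define. A minimal sketch, assuming the model returns the JSON shape above, possibly surrounded by prose; it takes the outermost `{...}` span, a simplification of whatever `jsonExtractString` does in the real service, and silently drops elements without a `type` (also an assumption).

```python
import json
from typing import Any, Dict, List


def parseElements(responseText: str) -> List[Dict[str, Any]]:
    # Take the outermost JSON object; tolerates text before and after it
    start = responseText.find("{")
    end = responseText.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in AI response")
    payload = json.loads(responseText[start:end + 1])  # JSONDecodeError is a ValueError
    elements = payload.get("elements", [])
    if not isinstance(elements, list):
        raise ValueError("'elements' must be a list")
    # Keep only dict elements that declare a type
    return [e for e in elements if isinstance(e, dict) and "type" in e]


response = ('Here is the table: {"elements": [{"type": "table", '
            '"headers": ["Date", "Amount"], "rows": [["2024-01-03", "12.50"]]}]} Done.')
print(parseElements(response))
```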

Main function:

async def _generateChapterContent(
    self,
    chapterStructure: Dict[str, Any],
    contentParts: List[ContentPart],
    userPrompt: str,
    parentOperationId: str
) -> Dict[str, Any]:
    # Phase 5D.1: generate the sections structure
    structureWithSections = await self._generateChapterStructure(
        chapterStructure, contentParts, userPrompt, parentOperationId
    )
    
    # Phase 5D.2: fill the sections with ContentParts
    filledStructure = await self._fillChapterSections(
        structureWithSections, contentParts, userPrompt, parentOperationId
    )
    
    return filledStructure

Standard JSON Schema

Supported Section Types

supportedSectionTypes = [
    "table",
    "bullet_list",
    "heading",
    "paragraph",
    "code_block",
    "image"
]

Section Element Types

Standard Elements:
- heading, paragraph, table, bullet_list, code_block, image
Special Elements:
- extracted_text: extracted text with its source
- reference: document reference
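The two element lists above can back a simple check that every element returned by the model uses a known type. A sketch; the required keys per type are assumptions inferred from the examples in this document (`content`/`level` for heading, `content` for paragraph, `headers`/`rows` for table, `base64Data` for image, `documentReference` for reference), not a published schema, and `elementErrors` is a hypothetical name.

```python
from typing import Any, Dict, List

STANDARD_ELEMENT_TYPES = {"heading", "paragraph", "table", "bullet_list", "code_block", "image"}
SPECIAL_ELEMENT_TYPES = {"extracted_text", "reference"}
ALL_ELEMENT_TYPES = STANDARD_ELEMENT_TYPES | SPECIAL_ELEMENT_TYPES

# Assumed required keys, only for types whose shape appears in this document
REQUIRED_KEYS = {
    "heading": {"content", "level"},
    "paragraph": {"content"},
    "table": {"headers", "rows"},
    "image": {"base64Data"},
    "reference": {"documentReference"},
}


def elementErrors(elements: List[Dict[str, Any]]) -> List[str]:
    errors = []
    for index, element in enumerate(elements):
        elementType = element.get("type")
        if elementType not in ALL_ELEMENT_TYPES:
            errors.append(f"element {index}: unknown type {elementType!r}")
            continue
        missing = REQUIRED_KEYS.get(elementType, set()) - set(element)
        if missing:
            errors.append(f"element {index} ({elementType}): missing {sorted(missing)}")
    return errors


print(elementErrors([
    {"type": "paragraph", "content": "Totals per month"},
    {"type": "table", "headers": ["Date", "Amount"]},
    {"type": "chart"},
]))
# ["element 1 (table): missing ['rows']", "element 2: unknown type 'chart'"]
```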

Flattening: Chapters to Sections

Important: the final output structure has no chapters, only sections.

from typing import Any, Dict

def flattenChapterStructureToSections(
    chapterStructure: Dict[str, Any]
) -> Dict[str, Any]:
    result = {
        "metadata": chapterStructure.get("metadata", {}),
        "documents": []
    }
    
    for doc in chapterStructure.get("documents", []):
        flattened_doc = {
            "id": doc.get("id"),
            "title": doc.get("title"),
            "filename": doc.get("filename"),
            "sections": []
        }
        
        for chapter in doc.get("chapters", []):
            # 1. Predefined heading section
            heading_section = {
                "id": f"{chapter['id']}_heading",
                "content_type": "heading",
                "elements": [{
                    "type": "heading",
                    "content": chapter.get("title"),
                    "level": chapter.get("level", 1)
                }]
            }
            flattened_doc["sections"].append(heading_section)
            
            # 2. Generated sections
            flattened_doc["sections"].extend(chapter.get("sections", []))
        
        result["documents"].append(flattened_doc)
    
    return result

Pydantic Models

from typing import Any, Dict, List

from pydantic import BaseModel, Field

class ContentPartInstruction(BaseModel):
    instruction: str = Field(
        description="Instruction for how the already extracted content should be structured"
    )

class Chapter(BaseModel):
    id: str
    level: int = Field(ge=1, le=6)
    title: str
    contentPartIds: List[str] = Field(default_factory=list)
    contentPartInstructions: Dict[str, ContentPartInstruction] = Field(default_factory=dict)
    generationHint: str
    sections: List[Dict[str, Any]] = Field(default_factory=list)

class ChapterStructure(BaseModel):
    metadata: Dict[str, Any]
    documents: List[Dict[str, Any]]
    
    def flattenToSections(self) -> Dict[str, Any]:
        # Flattening logic
        ...

Chunking for Large ContentParts

Important: ContentParts can be very large (e.g. 200MB). Chunking happens automatically.

Flow:

callAi with ContentParts → routes to processContentPartsWithAi
processContentPartsWithAi → processContentPartWithFallback for each part
If a part is too large → chunkContentPartForAi → chunking happens ONCE
Chunked parts are processed sequentially
_callWithModel does no further chunking

No recursion:

Chunking happens once per ContentPart
Chunked parts are processed sequentially (not recursively)
_callWithModel only calls the model (no chunking)
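The flow above (chunk once, process chunks sequentially, and never chunk inside the model call) can be sketched as follows. All names are hypothetical: `chunkText` splits by character count, whereas the real `chunkContentPartForAi` presumably works with model limits, and `fakeModel` stands in for `_callWithModel`.

```python
import asyncio
from typing import Awaitable, Callable, List


def chunkText(text: str, maxChars: int) -> List[str]:
    # Split exactly once into fixed-size slices; chunks are never re-split
    if maxChars <= 0:
        raise ValueError("maxChars must be positive")
    return [text[i:i + maxChars] for i in range(0, len(text), maxChars)]


async def processPartWithChunking(
    text: str,
    maxChars: int,
    callModel: Callable[[str], Awaitable[str]],
) -> List[str]:
    # Small parts go straight to the model
    if len(text) <= maxChars:
        return [await callModel(text)]
    results = []
    # Chunk ONCE, then await each chunk in order: no recursion, no re-chunking
    for chunk in chunkText(text, maxChars):
        results.append(await callModel(chunk))
    return results


async def fakeModel(prompt: str) -> str:
    # Stand-in for _callWithModel: only "calls the model", never chunks
    return f"<{len(prompt)} chars>"


print(asyncio.run(processPartWithChunking("x" * 25, 10, fakeModel)))
# ['<10 chars>', '<10 chars>', '<5 chars>']
```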

Implementation Requirements

Phase 5C: generates chapters instead of sections
Phase 5D.1: generates the sections structure with the useAiCall flag
Phase 5D.2: fills sections based on the useAiCall flag
Flattening: converts chapters into the final section structure
Pydantic models: define the ChapterStructure model
Standard JSON schema: included in the chapter prompts
Renderers: remain unchanged (they use the final section structure)

Key Points

ContentParts integration:
- ContentParts come from Phase 5B (already extracted)
- Phase 5D.1: only IDs in the prompt (no previews)
- Phase 5D.2: ContentParts are passed as parameters (not embedded in the prompt)
useAiCall flag:
- The AI sets the flag directly in the JSON
- Fallback: set automatically based on content_type and instructions
- Generic and language-independent (no keyword matching)
Aggregating multiple ContentParts:
- Certain content_types require aggregation (table, bullet_list)
- When multiple parts are present: all of them are passed to the AI together
- Uses callAi instead of callAiPlanning because it supports ContentParts
- Automatic chunking for large aggregated parts
Chunking:
- Automatic for large ContentParts
- Also works when multiple parts are aggregated
- No recursion possible
- Chunks are processed sequentially
Multiple documents:
- The structure supports multiple documents, each with its own chapters
ContentPart instructions:
- ContentParts are already extracted
- Instructions provide context for structuring
- No "usage" field (the format is already given by contentFormat)
Debug file logging:
- All AI calls and responses are written to debug files
- Prompts: {operationType}_{identifier}_prompt.txt
- Responses: {operationType}_{identifier}_response.txt
- Examples:
  - Phase 5C: chapter_structure_generation_prompt.txt / chapter_structure_generation_response.txt
  - Phase 5D.1: chapter_structure_{chapterId}_prompt.txt / chapter_structure_{chapterId}_response.txt
  - Phase 5D.2: section_content_{sectionId}_prompt.txt / section_content_{sectionId}_response.txt

Example Scenarios

Example 1: Excel List of Expense Receipts

User prompt: "Create an Excel list of the expense receipts"
Input: 20 PDF documents, each containing a photo of a receipt

Phase 5B:

20 PDFs are extracted → 20 ContentParts (contentFormat: "extracted")

Phase 5C:

Generates a chapter with all 20 contentPartIds

Phase 5D.1:

Generates a section with:
- content_type: "table"
- contentPartIds: [part_1, ..., part_20]
- useAiCall: true

Phase 5D.2:

_needsAggregation("table", 20) → True
All 20 ContentParts are passed to callAi together
The AI generates one table with the data of all receipts

Result: ✅ Works with the aggregation logic

Example 2: Comparing Multiple Documents

User prompt: "Compare the three contracts"
Input: 3 PDF documents (contracts)

Phase 5B:

3 PDFs are extracted → 3 ContentParts

Phase 5C:

Generates a chapter with all 3 contentPartIds

Phase 5D.1:

Generates a section with:
- content_type: "table" (for a comparison table)
- contentPartIds: [part_1, part_2, part_3]
- useAiCall: true

Phase 5D.2:

_needsAggregation("table", 3) → True
All 3 ContentParts are passed to callAi together
The AI generates a comparison table

Result: ✅ Works with the aggregation logic

Example 3: List of Products

User prompt: "Create a list of all products from the catalogs"
Input: 5 PDF documents (product catalogs)

Phase 5B:

5 PDFs are extracted → 5 ContentParts

Phase 5C:

Generates a chapter with all 5 contentPartIds

Phase 5D.1:

Generates a section with:
- content_type: "bullet_list"
- contentPartIds: [part_1, ..., part_5]
- useAiCall: true

Phase 5D.2:

_needsAggregation("bullet_list", 5) → True
All 5 ContentParts are passed to callAi together
The AI generates one list of all products

Result: ✅ Works with the aggregation logic

Example 4: Single Document

User prompt: "Summary of the document"
Input: 1 PDF document

Phase 5B:

1 PDF is extracted → 1 ContentPart

Phase 5C:

Generates a chapter with 1 contentPartId

Phase 5D.1:

Generates a section with:
- content_type: "paragraph"
- contentPartIds: [part_1]
- useAiCall: true

Phase 5D.2:

_needsAggregation("paragraph", 1) → False
Individual processing: 1 ContentPart is passed to callAi
The AI generates the summary

Result: ✅ Works (no aggregation needed)
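The aggregation decisions in the four scenarios can be checked mechanically. The sketch restates `_needsAggregation` as a free function with the same effective logic (the paragraph branch returns False either way, so it is folded away) and asserts the outcome of each example.

```python
def needsAggregation(contentType: str, contentPartCount: int) -> bool:
    # Same effective rule as _needsAggregation: list-like types with more than one part
    return contentType in {"table", "bullet_list"} and contentPartCount > 1


scenarios = [
    ("Example 1: expense receipts", "table", 20, True),
    ("Example 2: contract comparison", "table", 3, True),
    ("Example 3: product list", "bullet_list", 5, True),
    ("Example 4: single document", "paragraph", 1, False),
]
for name, contentType, count, expected in scenarios:
    result = needsAggregation(contentType, count)
    assert result == expected, name
    print(f"{name}: aggregate={result}")
```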
