ValueOn AG 8bd4e67be6 docs: complete wiki restructuring - new folder hierarchy, canonical reference pages, archive old docs

Made-with: Cursor

2026-04-05 23:28:14 +02:00

35 KiB


# Expenses Workflow Definition

## Overview

This document describes the implementation of an automated workflow that reads expenses from PDF documents in SharePoint and stores them as TrusteePosition entries in the database.

## 1. New Action: `getExpensesFromPdf`

### 1.1 File Structure

```
gateway/modules/workflows/methods/methodSharepoint/
├── actions/
│   └── getExpensesFromPdf.py   # NEW FILE
└── methodSharepoint.py         # add the action registration
```

### 1.2 Action Definition in `methodSharepoint.py`

```python
from .actions.getExpensesFromPdf import getExpensesFromPdf

# In MethodSharepoint.__init__, inside the _actions dict.
# WorkflowActionDefinition, WorkflowActionParameter and FrontendType are assumed
# to be imported in methodSharepoint.py already (imports not shown here).
"getExpensesFromPdf": WorkflowActionDefinition(
    actionId="sharepoint.getExpensesFromPdf",
    description="Extract expenses from PDF documents in SharePoint folder and save to TrusteePosition",
    dynamicMode=False,  # IMPORTANT: not available in dynamic workflows
    parameters={
        "connectionReference": WorkflowActionParameter(
            name="connectionReference",
            type="str",
            frontendType=FrontendType.USER_CONNECTION,
            required=True,
            description="Microsoft connection label for SharePoint access"
        ),
        "sharepointFolder": WorkflowActionParameter(
            name="sharepointFolder",
            type="str",
            frontendType=FrontendType.TEXT,
            required=True,
            description="SharePoint folder path containing PDF expense documents (e.g., /sites/MySite/Documents/Expenses)"
        ),
        "featureInstanceId": WorkflowActionParameter(
            name="featureInstanceId",
            type="str",
            frontendType=FrontendType.TEXT,
            required=True,
            description="Feature Instance ID for the Trustee feature where positions will be stored"
        ),
        "prompt": WorkflowActionParameter(
            name="prompt",
            type="str",
            frontendType=FrontendType.TEXTAREA,
            required=True,
            description="AI prompt for extracting expense data from PDF content"
        )
    },
    # Bind the module-level function to this instance so it is called as a method
    execute=getExpensesFromPdf.__get__(self, self.__class__)
),
```
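Binding with `__get__` is Python's descriptor protocol at work: a function's `__get__(instance, owner)` returns a bound method, so the registry can later call `execute(parameters)` without passing `self`. A minimal sketch with a hypothetical, stripped-down `MethodSharepoint` class (it stores the bound coroutine in a plain attribute instead of a `WorkflowActionDefinition`, and the returned string is purely illustrative):

```python
import asyncio
from typing import Any, Dict


async def getExpensesFromPdf(self, parameters: Dict[str, Any]) -> str:
    # Module-level coroutine function that expects the method instance as `self`
    return f"{type(self).__name__} handles {parameters['sharepointFolder']}"


class MethodSharepoint:
    def __init__(self) -> None:
        # function.__get__(instance, owner) returns a method bound to `self`
        self.execute = getExpensesFromPdf.__get__(self, self.__class__)


method = MethodSharepoint()
# Calling the bound method only needs `parameters`; `self` is supplied by the binding
result = asyncio.run(method.execute({"sharepointFolder": "/sites/MySite/Documents/Expenses"}))
print(result)  # MethodSharepoint handles /sites/MySite/Documents/Expenses
```

`types.MethodType(getExpensesFromPdf, self)` creates the same bound method and may read more explicitly.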

### 1.3 Action Logic (`getExpensesFromPdf.py`)

```python
# Copyright (c) 2025 Patrick Motsch
# All rights reserved.

import logging
import time
import json
import csv
import io
import base64
from datetime import datetime, UTC  # datetime.UTC requires Python 3.11+
from typing import Dict, Any, List, Optional
from modules.datamodels.datamodelChat import ActionResult, ActionDocument

logger = logging.getLogger(__name__)

# Allowed tags for TrusteePosition
ALLOWED_TAGS = ["customer", "meeting", "license", "subscription", "fuel", "food", "material"]

# Upper bound per execution; the remaining files are picked up by the next run
MAX_FILES_PER_EXECUTION = 50


async def getExpensesFromPdf(self, parameters: Dict[str, Any]) -> ActionResult:
    """
    Extract expenses from PDF documents in SharePoint and save them to TrusteePosition.

    Process:
    1. Read PDF files from the SharePoint folder (max 50 files per execution)
    2. FOR EACH PDF document:
       a. AI call to extract expense data in CSV format
       b. If 0 records: skip the document with a warning, move it to the "error" subfolder
       c. Validate/calculate VAT, complete valuta/transactionDateTime
       d. Save all records to TrusteePosition
       e. Move the document to the "processed" subfolder with a timestamp prefix
       Download failures, AI failures and unexpected exceptions also move the file to "error".

    Parameters:
        - connectionReference (str): Microsoft connection label
        - sharepointFolder (str): SharePoint folder path
        - featureInstanceId (str): Feature instance ID for TrusteePosition
        - prompt (str): AI prompt for content extraction

    Returns:
        ActionResult with success status and processing summary
    """
    operationId = None
    processedDocuments = []
    skippedDocuments = []
    errorDocuments = []
    totalPositions = 0

    # Extract and validate required parameters before progress tracking starts,
    # so an early return never leaves an unfinished progress log behind
    connectionReference = parameters.get("connectionReference")
    sharepointFolder = parameters.get("sharepointFolder")
    featureInstanceId = parameters.get("featureInstanceId")
    prompt = parameters.get("prompt")
    for name, value in (
        ("connectionReference", connectionReference),
        ("sharepointFolder", sharepointFolder),
        ("featureInstanceId", featureInstanceId),
        ("prompt", prompt),
    ):
        if not value:
            return ActionResult.isFailure(error=f"{name} is required")

    try:
        # Initialize progress tracking
        workflowId = self.services.workflow.id if self.services.workflow else f"no-workflow-{int(time.time())}"
        operationId = f"sharepoint_expenses_{workflowId}_{int(time.time())}"

        self.services.chat.progressLogStart(
            operationId,
            "Extract Expenses from PDF",
            "SharePoint PDF Processing",
            "Initializing expense extraction",
            parentOperationId=parameters.get("parentOperationId")
        )

        # Step 1: Get Microsoft connection
        self.services.chat.progressLogUpdate(operationId, 0.1, "Getting Microsoft connection")
        connection = self.connection.getMicrosoftConnection(connectionReference)
        if not connection:
            self.services.chat.progressLogFinish(operationId, False)
            return ActionResult.isFailure(error="No valid Microsoft connection found")

        # Step 2: Find PDF files in the folder via findDocumentPath
        self.services.chat.progressLogUpdate(operationId, 0.15, "Finding PDF files in SharePoint folder")
        findResult = await self.findDocumentPath({
            "connectionReference": connectionReference,
            "searchQuery": f"{sharepointFolder}:files:.pdf",
            "maxResults": 1000
        })
        if not findResult.success:
            self.services.chat.progressLogFinish(operationId, False)
            return ActionResult.isFailure(error=f"Failed to find PDF files: {findResult.error}")

        # Parse found documents
        pdfFiles = _extractPdfFilesFromResult(findResult)
        if not pdfFiles:
            self.services.chat.progressLogFinish(operationId, True)
            return ActionResult.isSuccess(
                documents=[ActionDocument(
                    documentName="expense_extraction_result.json",
                    documentData=json.dumps({
                        "status": "no_documents",
                        "message": "No PDF files found in the specified folder",
                        "folder": sharepointFolder
                    }, indent=2),
                    mimeType="application/json",
                    validationMetadata={"actionType": "sharepoint.getExpensesFromPdf"}
                )]
            )

        # Limit to MAX_FILES_PER_EXECUTION PDFs; remember the full count for the summary
        totalFilesFound = len(pdfFiles)
        if totalFilesFound > MAX_FILES_PER_EXECUTION:
            logger.warning(f"Found {totalFilesFound} PDFs, limiting to {MAX_FILES_PER_EXECUTION}")
            pdfFiles = pdfFiles[:MAX_FILES_PER_EXECUTION]

        async def moveToError(pdfFile: Dict[str, Any], fileName: str) -> str:
            """Move a failed PDF to error/ (name unchanged) and report where it ended up."""
            moved = await _moveToErrorFolder(
                self,
                connectionReference,
                pdfFile.get("siteId"),
                pdfFile.get("folderPath", ""),
                fileName
            )
            return "error/" if moved else "move_failed"

        # Step 3: Process each PDF (70% of the progress bar)
        totalFiles = len(pdfFiles)
        progressPerFile = 0.7 / totalFiles

        for idx, pdfFile in enumerate(pdfFiles):
            fileName = pdfFile.get("name", f"file_{idx}")
            fileId = pdfFile.get("id")
            siteId = pdfFile.get("siteId")

            self.services.chat.progressLogUpdate(
                operationId,
                0.2 + idx * progressPerFile,
                f"Processing {idx + 1}/{totalFiles}: {fileName}"
            )

            try:
                # 3a: Download PDF content
                fileContent = await self.services.sharepoint.downloadFile(siteId, fileId)
                if not fileContent:
                    errorDocuments.append({
                        "file": fileName,
                        "error": "Failed to download",
                        "movedTo": await moveToError(pdfFile, fileName)
                    })
                    continue

                # 3b: AI call to extract expense data
                aiResult = await _extractExpensesWithAi(self.services, fileContent, fileName, prompt)
                if not aiResult.get("success"):
                    errorDocuments.append({
                        "file": fileName,
                        "error": aiResult.get("error", "AI extraction failed"),
                        "movedTo": await moveToError(pdfFile, fileName)
                    })
                    continue

                # 3c: No records extracted: move to error folder, keep the original filename
                records = aiResult.get("records", [])
                if not records:
                    logger.warning(f"Document {fileName}: No records extracted, moving to error folder")
                    skippedDocuments.append({
                        "file": fileName,
                        "reason": "No expense records extracted",
                        "movedTo": await moveToError(pdfFile, fileName)
                    })
                    continue

                # 3d: Validate and enrich records
                validatedRecords = _validateAndEnrichRecords(records, fileName)

                # 3e: Save to TrusteePosition
                savedCount = await _saveToTrusteePosition(self.services, validatedRecords, featureInstanceId)
                totalPositions += savedCount

                # 3f: Move document to the "processed" subfolder with a UTC timestamp prefix
                timestamp = datetime.now(UTC).strftime("%Y%m%d-%H%M%S")
                newFileName = f"{timestamp}_{fileName}"
                moveSuccess = await _moveToProcessedFolder(
                    self,
                    connectionReference,
                    siteId,
                    pdfFile.get("folderPath", ""),
                    fileName,
                    newFileName
                )
                if not moveSuccess:
                    # The file stays in the source folder and would be extracted again next run
                    logger.warning(f"Document {fileName}: positions saved, but moving to processed/ failed")

                processedDocuments.append({
                    "file": fileName,
                    "newLocation": f"processed/{newFileName}" if moveSuccess else "move_failed",
                    "recordsExtracted": len(validatedRecords),
                    "recordsSaved": savedCount
                })

            except Exception as e:
                logger.error(f"Error processing {fileName}: {str(e)}")
                errorDocuments.append({
                    "file": fileName,
                    "error": str(e),
                    "movedTo": await moveToError(pdfFile, fileName)
                })

        # Step 4: Create result summary
        self.services.chat.progressLogUpdate(operationId, 0.95, "Creating result summary")

        # Files left over because of MAX_FILES_PER_EXECUTION
        remainingFiles = max(0, totalFilesFound - totalFiles)

        resultSummary = {
            "status": "completed",
            "folder": sharepointFolder,
            "featureInstanceId": featureInstanceId,
            "summary": {
                "totalFilesFound": totalFilesFound,
                "filesProcessedThisRun": totalFiles,
                "remainingFiles": remainingFiles,
                "successfulDocuments": len(processedDocuments),
                "skippedDocuments": len(skippedDocuments),
                "errorDocuments": len(errorDocuments),
                "totalPositionsSaved": totalPositions
            },
            "processedDocuments": processedDocuments,
            "skippedDocuments": skippedDocuments,
            "errorDocuments": errorDocuments,
            "note": f"{remainingFiles} files remaining for next execution" if remainingFiles > 0 else None
        }

        self.services.chat.progressLogFinish(operationId, True)

        return ActionResult.isSuccess(
            documents=[ActionDocument(
                documentName="expense_extraction_result.json",
                documentData=json.dumps(resultSummary, indent=2),
                mimeType="application/json",
                validationMetadata={
                    "actionType": "sharepoint.getExpensesFromPdf",
                    "sharepointFolder": sharepointFolder,
                    "featureInstanceId": featureInstanceId,
                    "totalPositions": totalPositions
                }
            )]
        )

    except Exception as e:
        logger.error(f"Error in getExpensesFromPdf: {str(e)}")
        if operationId:
            self.services.chat.progressLogFinish(operationId, False)
        return ActionResult.isFailure(error=str(e))
```


```python
def _extractPdfFilesFromResult(findResult: ActionResult) -> List[Dict[str, Any]]:
    """
    Extract PDF file information from the findDocumentPath result.

    Each entry needs at least "id", "name", "siteId" and "folderPath",
    which getExpensesFromPdf uses to download and move the file.
    """
    pdfFiles = []
    # Implementation: parse the ActionDocument data to extract file IDs, names, site IDs and paths
    # ...
    return pdfFiles


async def _extractExpensesWithAi(
    services,
    fileContent: bytes,
    fileName: str,
    prompt: str
) -> Dict[str, Any]:
    """
    Call the AI service to extract expense data from PDF content.
    The AI service handles retries internally, so no retry logic is needed here.

    Returns a dict with:
        - success: bool
        - records: List[Dict] - extracted records in TrusteePosition format
        - error: str (if success=False)
    """
    try:
        # Base64-encode the raw PDF bytes; the AI service does the PDF extraction itself
        base64Content = base64.b64encode(fileContent).decode("utf-8")

        aiResponse = await services.ai.processDocument(
            documentContent=base64Content,
            documentName=fileName,
            mimeType="application/pdf",
            prompt=prompt,
            outputFormat="csv"
        )

        if not aiResponse or not aiResponse.get("success"):
            # aiResponse may be None, so only read the error when a response exists
            error = aiResponse.get("error") if aiResponse else None
            return {"success": False, "error": error or "AI call failed"}

        # Parse the CSV response into records
        records = _parseCsvToRecords(aiResponse.get("content") or "")
        return {"success": True, "records": records}

    except Exception as e:
        return {"success": False, "error": str(e)}


async def _handleRateLimitError(waitSeconds: int = 60):
    """
    Handle SharePoint rate limiting by waiting before continuing.
    Not called yet: it still has to be wired into the SharePoint calls
    (download, copy, delete) once they report rate-limit errors.
    """
    import asyncio
    logger.warning(f"Rate limit hit, waiting {waitSeconds} seconds before continuing")
    await asyncio.sleep(waitSeconds)


def _parseCsvToRecords(csvContent: str) -> List[Dict[str, Any]]:
    """Parse the AI's CSV output into a list of expense records."""
    records = []
    try:
        # Models often wrap CSV output in a Markdown code fence; drop the fence lines
        lines = csvContent.strip().splitlines()
        if lines and lines[0].startswith("```"):
            lines = lines[1:]
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]

        reader = csv.DictReader(io.StringIO("\n".join(lines)))
        for row in reader:
            # Skip rows without any values (e.g. a line consisting only of commas)
            if any(value for key, value in row.items() if key is not None):
                records.append(row)
    except Exception as e:
        logger.error(f"Error parsing CSV: {str(e)}")
    return records
```
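The `desc` column is supposed to carry the complete document text, so it will regularly contain commas and line breaks. The sketch below (the helper name `parseRows` is illustrative) shows why `_parseCsvToRecords` relies on `csv.DictReader` rather than splitting lines on commas, and why a row with missing trailing columns yields `None` values that the enrichment and save steps have to tolerate:

```python
import csv
import io
from typing import Dict, List, Optional

HEADER = ("valuta,transactionDateTime,company,desc,tags,bookingCurrency,"
          "bookingAmount,originalCurrency,originalAmount,vatPercentage,vatAmount")


def parseRows(csvText: str) -> List[Dict[str, Optional[str]]]:
    # DictReader honours CSV quoting: commas and line breaks inside a quoted
    # field stay part of that field; missing trailing columns default to None
    return list(csv.DictReader(io.StringIO(csvText)))


sample = (
    HEADER + "\n"
    '2026-01-15,1768489200,Migros AG,"Einkauf Migros Zürich\nTotal CHF 57.80, paid by card",'
    "food,CHF,45.50,CHF,45.50,2.6,1.15\n"
    "2026-01-15,,Migros AG\n"  # truncated row: only three columns
)
rows = parseRows(sample)
print(rows[0]["desc"])  # the multi-line, comma-containing description in one field
print(rows[1]["desc"])  # None
```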


```python
def _validateAndEnrichRecords(
    records: List[Dict[str, Any]],
    sourceFileName: str
) -> List[Dict[str, Any]]:
    """
    Validate and enrich expense records:
    1. Calculate/correct the VAT amount (bookingAmount is treated as the VAT-inclusive gross)
    2. Complete valuta/transactionDateTime if one of them is missing
    3. Validate tags against ALLOWED_TAGS
    4. Prefix the description with the source file name
    """
    enrichedRecords = []

    for record in records:
        enriched = record.copy()

        # VAT calculation/validation
        vatPercentage = _parseFloat(record.get("vatPercentage", 0))
        vatAmount = _parseFloat(record.get("vatAmount", 0))
        bookingAmount = _parseFloat(record.get("bookingAmount", 0))

        if vatPercentage > 0 and bookingAmount > 0:
            # VAT contained in a gross amount: gross * p / (100 + p)
            expectedVat = bookingAmount * vatPercentage / (100 + vatPercentage)

            # Recalculate if vatAmount is missing or differs by more than one cent
            if vatAmount == 0 or abs(vatAmount - expectedVat) > 0.01:
                enriched["vatAmount"] = round(expectedVat, 2)
                logger.info(f"VAT amount corrected: {vatAmount} -> {enriched['vatAmount']}")

        # Valuta / transactionDateTime completion
        valuta = record.get("valuta")
        transactionDateTime = record.get("transactionDateTime")

        if valuta and not transactionDateTime:
            # Interpret the valuta date as 12:00 UTC; the explicit timezone keeps the
            # result independent of the server's local timezone
            try:
                dt = datetime.strptime(str(valuta).strip(), "%Y-%m-%d")
                enriched["transactionDateTime"] = dt.replace(hour=12, tzinfo=UTC).timestamp()
            except ValueError:
                logger.warning(f"Invalid valuta '{valuta}' in {sourceFileName}")
        elif transactionDateTime and not valuta:
            # Convert the Unix timestamp to a UTC valuta date
            try:
                dt = datetime.fromtimestamp(float(transactionDateTime), UTC)
                enriched["valuta"] = dt.strftime("%Y-%m-%d")
            except (ValueError, TypeError, OverflowError, OSError):
                logger.warning(f"Invalid transactionDateTime '{transactionDateTime}' in {sourceFileName}")

        # Validate tags: keep allowed tags only, drop duplicates, preserve order
        tags = record.get("tags") or ""
        if tags:
            tagList = [t.strip().lower() for t in tags.split(",")]
            validTags = list(dict.fromkeys(t for t in tagList if t in ALLOWED_TAGS))
            enriched["tags"] = ",".join(validTags)

        # Store source file info in the description (missing CSV columns come back as None)
        existingDesc = record.get("desc") or ""
        if sourceFileName and sourceFileName not in existingDesc:
            enriched["desc"] = f"[Source: {sourceFileName}]\n{existingDesc}"

        enrichedRecords.append(enriched)

    return enrichedRecords


def _parseFloat(value) -> float:
    """Safely parse a float value; returns 0.0 for empty or invalid input."""
    try:
        return float(value) if value else 0.0
    except (ValueError, TypeError):
        return 0.0
```
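The valuta/timestamp completion only works reliably with a fixed timezone convention. A sketch of the convention assumed here (a bare valuta date means 12:00 UTC); the function names are illustrative. Noon UTC falls on the same calendar date for every offset from UTC-12:00 to UTC+11:59, so Swiss local time (UTC+1/UTC+2) never shifts the date. The sketch uses `timezone.utc` instead of `datetime.UTC` so it also runs on Python versions before 3.11.

```python
from datetime import datetime, timezone


def valutaToTimestamp(valuta: str) -> float:
    """Interpret a YYYY-MM-DD valuta as 12:00 UTC and return Unix seconds."""
    dt = datetime.strptime(valuta, "%Y-%m-%d").replace(hour=12, tzinfo=timezone.utc)
    return dt.timestamp()


def timestampToValuta(ts: float) -> str:
    """Convert Unix seconds to the UTC calendar date (YYYY-MM-DD)."""
    return datetime.fromtimestamp(ts, timezone.utc).strftime("%Y-%m-%d")


ts = valutaToTimestamp("2026-01-15")
print(ts)                     # 1768478400.0
print(timestampToValuta(ts))  # 2026-01-15 (round trip)
```

Without the explicit `tzinfo`, `datetime.timestamp()` interprets a naive datetime in the server's local timezone, so the same valuta could produce different timestamps on differently configured hosts.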


```python
async def _saveToTrusteePosition(
    services,
    records: List[Dict[str, Any]],
    featureInstanceId: str
) -> int:
    """Save validated records to the TrusteePosition table; returns the number of saved rows."""
    savedCount = 0

    # Get the Trustee interface
    from modules.features.trustee.interfaceFeatureTrustee import getInterface
    trusteeInterface = getInterface(
        services.user,
        mandateId=services.mandateId,
        featureInstanceId=featureInstanceId
    )

    for record in records:
        try:
            # `or` defaults also cover None values from CSV rows with missing columns
            position = {
                "valuta": record.get("valuta") or None,
                # CSV values are strings; store the timestamp as a number (None if missing)
                "transactionDateTime": _parseFloat(record.get("transactionDateTime")) or None,
                "company": record.get("company") or "",
                "desc": record.get("desc") or "",
                "tags": record.get("tags") or "",
                "bookingCurrency": record.get("bookingCurrency") or "CHF",
                "bookingAmount": _parseFloat(record.get("bookingAmount", 0)),
                "originalCurrency": record.get("originalCurrency") or "CHF",
                "originalAmount": _parseFloat(record.get("originalAmount", 0)),
                "vatPercentage": _parseFloat(record.get("vatPercentage", 0)),
                "vatAmount": _parseFloat(record.get("vatAmount", 0)),
                "featureInstanceId": featureInstanceId,
                "mandateId": services.mandateId
            }

            result = trusteeInterface.createPosition(position)
            if result:
                savedCount += 1

        except Exception as e:
            logger.error(f"Failed to save position: {str(e)}")

    return savedCount


async def _moveToProcessedFolder(
    self,
    connectionReference: str,
    siteId: str,
    sourceFolderPath: str,
    sourceFileName: str,
    destFileName: str
) -> bool:
    """Move a processed PDF to the 'processed' subfolder under its new name."""
    try:
        processedFolder = f"{(sourceFolderPath or '').rstrip('/')}/processed"

        # Ensure the 'processed' folder exists (create it if not)
        if not await _ensureFolderExists(self, connectionReference, siteId, processedFolder):
            return False

        # Copy the file to the new location. If copyFile wraps Graph's asynchronous
        # /copy action, it must wait for the copy to complete before reporting success.
        copyResult = await self.copyFile({
            "connectionReference": connectionReference,
            "siteId": siteId,
            "sourceFolder": sourceFolderPath,
            "sourceFile": sourceFileName,
            "destFolder": processedFolder,
            "destFile": destFileName
        })
        if not copyResult.success:
            return False

        # Delete the original only after a successful copy; if deletion fails, the
        # file stays in the source folder and would be processed again next run
        return await _deleteFile(self, connectionReference, siteId, sourceFolderPath, sourceFileName)

    except Exception as e:
        logger.error(f"Failed to move file to processed: {str(e)}")
        return False


async def _moveToErrorFolder(
    self,
    connectionReference: str,
    siteId: str,
    sourceFolderPath: str,
    sourceFileName: str  # Keep original filename
) -> bool:
    """Move a failed PDF to the 'error' subfolder (filename unchanged)."""
    try:
        errorFolder = f"{(sourceFolderPath or '').rstrip('/')}/error"

        # Ensure the 'error' folder exists (create it if not)
        if not await _ensureFolderExists(self, connectionReference, siteId, errorFolder):
            return False

        # Copy the file to the error folder, keeping its original name
        copyResult = await self.copyFile({
            "connectionReference": connectionReference,
            "siteId": siteId,
            "sourceFolder": sourceFolderPath,
            "sourceFile": sourceFileName,
            "destFolder": errorFolder,
            "destFile": sourceFileName  # Same filename
        })
        if not copyResult.success:
            return False

        # Delete the original only after a successful copy
        return await _deleteFile(self, connectionReference, siteId, sourceFolderPath, sourceFileName)

    except Exception as e:
        logger.error(f"Failed to move file to error folder: {str(e)}")
        return False


async def _ensureFolderExists(
    self,
    connectionReference: str,
    siteId: str,
    folderPath: str
) -> bool:
    """Create the folder if it doesn't exist."""
    try:
        # Use the Graph API to create the folder below its parent:
        #   POST /sites/{siteId}/drive/root:/{parentPath}:/children
        #   body: {"name": folderName, "folder": {}, "@microsoft.graph.conflictBehavior": "fail"}
        # With "fail", a 409 Conflict means the folder already exists and counts as success.
        # ... implementation ...
        return True
    except Exception as e:
        logger.error(f"Failed to ensure folder exists: {str(e)}")
        return False


async def _deleteFile(
    self,
    connectionReference: str,
    siteId: str,
    folderPath: str,
    fileName: str
) -> bool:
    """Delete a file from SharePoint."""
    try:
        # Use the SharePoint (Graph) API to delete the file
        # Graph API: DELETE /sites/{siteId}/drive/root:/{folderPath}/{fileName}
        # ... implementation ...
        return True
    except Exception as e:
        logger.error(f"Failed to delete file: {str(e)}")
        return False
```
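The folder and delete helpers above are still stubs. The sketch below builds (but does not send) the Microsoft Graph v1.0 requests they could issue, assuming folder paths are relative to the site's default document library (`/drive`); all function names are hypothetical. Besides the folder-creation call referenced in `_ensureFolderExists`, it shows that Graph can move and rename a driveItem with a single `PATCH` on the item, an alternative to copy-then-delete that avoids a window in which the file exists twice.

```python
from typing import Any, Dict, Tuple
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"

Request = Tuple[str, str, Dict[str, Any]]  # (HTTP method, URL, JSON body)


def buildCreateFolderRequest(siteId: str, parentPath: str, folderName: str) -> Request:
    """Create folderName below parentPath; 'fail' makes an existing folder return 409."""
    path = quote(parentPath.strip("/"))  # quote() keeps '/' and escapes spaces etc.
    if path:
        url = f"{GRAPH_BASE}/sites/{siteId}/drive/root:/{path}:/children"
    else:
        url = f"{GRAPH_BASE}/sites/{siteId}/drive/root/children"
    body = {"name": folderName, "folder": {}, "@microsoft.graph.conflictBehavior": "fail"}
    return "POST", url, body


def buildMoveRequest(siteId: str, itemId: str, destFolderId: str, newName: str) -> Request:
    """Move a driveItem into another folder and rename it in one call."""
    url = f"{GRAPH_BASE}/sites/{siteId}/drive/items/{itemId}"
    body = {"parentReference": {"id": destFolderId}, "name": newName}
    return "PATCH", url, body


def folderAlreadyExists(statusCode: int) -> bool:
    # With conflictBehavior "fail", an existing folder is reported as 409 Conflict
    return statusCode == 409


print(buildCreateFolderRequest("site-1", "/Shared Documents/Expenses", "processed")[1])
# https://graph.microsoft.com/v1.0/sites/site-1/drive/root:/Shared%20Documents/Expenses:/children
```

Sending these requests requires an access token from the user's Microsoft connection; `destFolderId` would come from the folder-creation response or from a `GET` on the folder path.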

## 2. Automation Template: `getExpenses`

### 2.1 Template Definition (add to `subAutomationTemplates.py`)

```python
{
    "template": {
        "overview": "Expenses PDF Extraction",
        "tasks": [
            {
                "id": "Task01",
                "title": "Extract Expenses from SharePoint PDFs",
                "description": "Reads PDF expense documents from SharePoint folder and saves extracted data to TrusteePosition",
                "objective": "Extract expense data from PDF documents and store in Trustee database",
                "actionList": [
                    {
                        "execMethod": "sharepoint",
                        "execAction": "getExpensesFromPdf",
                        "execParameters": {
                            "connectionReference": "{{KEY:connectionName}}",
                            "sharepointFolder": "{{KEY:sharepointFolder}}",
                            "featureInstanceId": "{{KEY:featureInstanceId}}",
                            "prompt": "{{KEY:extractionPrompt}}"
                        },
                        "execResultLabel": "expense_extraction_result"
                    }
                ]
            }
        ]
    },
    "parameters": {
        "connectionName": "",
        "sharepointFolder": "",
        "featureInstanceId": "",
        "extractionPrompt": """You are a specialist in extracting expense data from PDF documents.

TASK:
Extract all expense entries from the provided PDF document and return them in CSV format.

IMPORTANT RULES:
1. Create a separate record for each VAT rate
2. All records together must add up to the document's total amount
3. The complete extracted text of the document must be captured in the "desc" field
4. The "company" field contains the supplier/vendor of the booking
5. Tags must be chosen from this list: customer, meeting, license, subscription, fuel, food, material
   - Separate multiple applicable tags with commas

CSV COLUMNS (in this order):
valuta,transactionDateTime,company,desc,tags,bookingCurrency,bookingAmount,originalCurrency,originalAmount,vatPercentage,vatAmount

DATA FORMAT:
- valuta: YYYY-MM-DD (value date)
- transactionDateTime: Unix timestamp in seconds (time of the transaction)
- company: supplier/vendor name
- desc: complete extracted text of the document
- tags: comma-separated tags from the allowed list
- bookingCurrency: currency code (CHF, EUR, USD, GBP)
- bookingAmount: booking amount as a decimal number (gross, including VAT)
- originalCurrency: original currency code
- originalAmount: original amount as a decimal number
- vatPercentage: VAT rate (e.g. 8.1 for 8.1%)
- vatAmount: VAT amount as a decimal number

EXAMPLE OUTPUT:
valuta,transactionDateTime,company,desc,tags,bookingCurrency,bookingAmount,originalCurrency,originalAmount,vatPercentage,vatAmount
2026-01-15,1768489200,Migros AG,"Einkauf Migros Zürich...",food,CHF,45.50,CHF,45.50,2.6,1.15
2026-01-15,1768489200,Migros AG,"Einkauf Migros Zürich...",material,CHF,12.30,CHF,12.30,8.1,0.92

NOTES:
- If only one VAT rate is present, create one record
- If several VAT rates are present (e.g. food 2.6% and non-food 8.1%), create separate records
- For missing information: leave the field empty or use a default value
- No quotation marks around numeric values
- Return only the CSV (header and data rows), without Markdown code fences or explanations"""
    }
}
```
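The VAT correction in `_validateAndEnrichRecords` treats `bookingAmount` as the VAT-inclusive gross amount, so the contained VAT is `gross * p / (100 + p)`. A quick check, using an illustrative helper `vatFromGross`, that the prompt's example rows are consistent with that formula:

```python
def vatFromGross(grossAmount: float, vatPercentage: float) -> float:
    """VAT contained in a VAT-inclusive amount, rounded to cents."""
    if grossAmount <= 0 or vatPercentage <= 0:
        return 0.0
    return round(grossAmount * vatPercentage / (100 + vatPercentage), 2)


# Example rows from the prompt: (bookingAmount, vatPercentage, vatAmount)
exampleRows = [(45.50, 2.6, 1.15), (12.30, 8.1, 0.92)]
for gross, rate, vat in exampleRows:
    print(gross, rate, vatFromGross(gross, rate), vat)  # computed VAT equals the example VAT

documentTotal = round(sum(gross for gross, _, _ in exampleRows), 2)
print(documentTotal)  # 57.8, the total both records must add up to
```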


### 2.2 Placeholder Description

| Placeholder | Description | Example value |
|-------------|-------------|---------------|
| `connectionName` | User connection reference for SharePoint | `connection:msft:user@company.ch` |
| `sharepointFolder` | SharePoint folder path containing the PDFs | `/sites/MySite/Documents/Expenses` |
| `featureInstanceId` | Feature instance ID of the Trustee feature | `fi_abc123` |
| `extractionPrompt` | AI prompt for the extraction | (see above) |

---

## 3. Frontend: New Page in the Trustee Feature

### 3.1 Component Structure

```
frontend_nyla/src/features/trustee/
├── pages/
│   └── TrusteeExpenseImport.tsx       # NEW PAGE
└── components/
    └── SharepointFolderSelect.tsx     # reusable component
```


### 3.2 Page Requirements

1. **Microsoft Connection Button**
   - Icon: Microsoft logo (as on the User Connections page)
   - Clicking opens an OAuth popup for Microsoft sign-in
   - Uses `useConnections.createMicrosoftConnectionAndAuth()`
   - Status display: connected/not connected

2. **SharePoint Folder Dropdown**
   - Dropdown for selecting a SharePoint folder
   - Loads the folder list via the `/api/sharepoint/folders` endpoint
   - Shows the site name and folder path
   - Reference: the Neutralization feature has a similar dropdown

3. **Activate Button**
   - Creates an `AutomationDefinition` with:
     - Template: "getExpenses"
     - Placeholders: the filled-in values
     - Schedule: daily (e.g. `0 22 * * *`)
     - Active: true
   - Saves it via the `/api/automation/definitions` endpoint

### 3.3 Example Implementation

```tsx
// TrusteeExpenseImport.tsx
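import React, { useState, useEffect } from 'react';
import { useConnections } from '@/hooks/useConnections';
import { useFeatureInstance } from '@/hooks/useFeatureInstance';
import { Button } from '@/components/ui/button';
import { Select } from '@/components/ui/select';
import { MicrosoftIcon } from '@/components/icons';
import api from '@/api';
// Hypothetical module: must export the same text as the template's extractionPrompt
import { DEFAULT_EXTRACTION_PROMPT } from './expenseExtractionPrompt';

// Assumed shapes; align them with the actual connection and folder API types
interface Connection {
  id: string;
  type: string;
  status: string;
  accountName: string;
}

interface SharepointFolder {
  path: string;
  name: string;
  siteName: string;
}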

export function TrusteeExpenseImport() {
  const { connections, createMicrosoftConnectionAndAuth } = useConnections();
  const { featureInstanceId } = useFeatureInstance();
  
  const [msftConnection, setMsftConnection] = useState<Connection | null>(null);
  const [folders, setFolders] = useState<SharepointFolder[]>([]);
  const [selectedFolder, setSelectedFolder] = useState<string>('');
  const [isActivating, setIsActivating] = useState(false);
  
  // Find active Microsoft connection
  useEffect(() => {
    const conn = connections.find(c => 
      c.type === 'msft' && c.status === 'active'
    );
    setMsftConnection(conn || null);
  }, [connections]);
  
  // Load SharePoint folders when connected
  useEffect(() => {
    if (msftConnection) {
      loadSharepointFolders();
    }
  }, [msftConnection]);
  
  const loadSharepointFolders = async () => {
    try {
      const response = await api.get('/api/sharepoint/folders', {
        params: { connectionId: msftConnection?.id }
      });
      setFolders(response.data.folders || []);
    } catch (error) {
      console.error('Failed to load folders:', error);
    }
  };
  
  const handleConnect = async () => {
    try {
      await createMicrosoftConnectionAndAuth();
    } catch (error) {
      console.error('Connection failed:', error);
    }
  };
  
  const handleActivate = async () => {
    if (!selectedFolder || !msftConnection || !featureInstanceId) return;
    
    setIsActivating(true);
    try {
      await api.post('/api/automation/definitions', {
        label: 'Expense Import',
        schedule: '0 22 * * *',  // Daily at 22:00
        templateName: 'getExpenses',
        placeholders: {
          connectionName: `connection:msft:${msftConnection.accountName}`,
          sharepointFolder: selectedFolder,
          featureInstanceId: featureInstanceId,
          extractionPrompt: DEFAULT_EXTRACTION_PROMPT
        },
        active: true,
        featureInstanceId: featureInstanceId
      });
      
      // Show success message
    } catch (error) {
      console.error('Activation failed:', error);
    } finally {
      setIsActivating(false);
    }
  };
  
  return (
    <div className="p-6 space-y-6">
      <h1 className="text-2xl font-bold">Expense Import Setup</h1>
      
      {/* Microsoft Connection */}
      <div className="space-y-2">
        <label className="text-sm font-medium">Microsoft Connection</label>
        {msftConnection ? (
          <div className="flex items-center gap-2">
            <MicrosoftIcon className="w-5 h-5" />
            <span className="text-green-600">
              Connected as {msftConnection.accountName}
            </span>
          </div>
        ) : (
          <Button onClick={handleConnect}>
            <MicrosoftIcon className="w-4 h-4 mr-2" />
            Connect Microsoft Account
          </Button>
        )}
      </div>
      
      {/* SharePoint Folder Selection */}
      {msftConnection && (
        <div className="space-y-2">
          <label className="text-sm font-medium">SharePoint Expense Folder</label>
          <Select
            value={selectedFolder}
            onValueChange={setSelectedFolder}
            placeholder="Select a folder..."
          >
            {folders.map(folder => (
              <Select.Option key={folder.path} value={folder.path}>
                {folder.siteName} - {folder.name}
              </Select.Option>
            ))}
          </Select>
        </div>
      )}
      
      {/* Activate Button */}
      {selectedFolder && (
        <Button
          onClick={handleActivate}
          disabled={isActivating}
          className="w-full"
        >
          {isActivating ? 'Activating...' : 'Activate Daily Import'}
        </Button>
      )}
    </div>
  );
}
```

## 4. Backend: API Endpoints

### 4.1 SharePoint Folder List Endpoint

New endpoint in `routeSharepoint.py`:

```python
# In routeSharepoint.py; `router` is assumed to be the module's existing APIRouter
from fastapi import Query, Request


@router.get("/api/sharepoint/folders")
async def listSharepointFolders(
    request: Request,
    connectionId: str = Query(..., description="Connection ID")
):
    """List available SharePoint folders for the user's connection."""
    # Implementation: use the Graph API to list sites and their root folders
    ...
```
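One possible core step of this endpoint, sketched under assumptions: the site and its document library have already been resolved (for example via Graph's `GET /sites?search=...`), the folder candidates come from `GET /sites/{site-id}/drive/root/children`, and the returned fields match what `TrusteeExpenseImport.tsx` reads (`path`, `name`, `siteName`). `extractFolders` and the `libraryPath` format are hypothetical. Graph marks folders with a `folder` facet and files with a `file` facet, which is what the filter relies on:

```python
from typing import Any, Dict, List


def extractFolders(childrenResponse: Dict[str, Any], siteName: str, libraryPath: str) -> List[Dict[str, str]]:
    """Keep only driveItems with a 'folder' facet and shape them for the dropdown."""
    folders: List[Dict[str, str]] = []
    for item in childrenResponse.get("value", []):
        if "folder" not in item:
            continue  # files carry a "file" facet instead
        folders.append({
            "id": item["id"],
            "name": item["name"],
            "siteName": siteName,
            # Hypothetical path format matching the sharepointFolder parameter
            "path": f"{libraryPath.rstrip('/')}/{item['name']}",
        })
    return folders


response = {"value": [
    {"id": "1", "name": "Expenses", "folder": {"childCount": 3}},
    {"id": "2", "name": "readme.docx", "file": {"mimeType": "application/msword"}},
]}
print(extractFolders(response, "MySite", "/sites/MySite/Documents"))
```

Large libraries return paged results; the response's `@odata.nextLink` would have to be followed before filtering.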

### 4.2 Automation Definition Endpoint

Extend `routeFeatureAutomation.py` to support template-based creation.
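Template-based creation essentially means substituting the `{{KEY:name}}` markers in the template's `execParameters` with the placeholder values sent by the frontend. A minimal sketch; the function name and the decision to fail on unknown keys are assumptions, not the existing `routeFeatureAutomation.py` behaviour:

```python
import re
from typing import Any, Dict

PLACEHOLDER_PATTERN = re.compile(r"\{\{KEY:(\w+)\}\}")


def fillPlaceholders(node: Any, values: Dict[str, str]) -> Any:
    """Recursively replace {{KEY:name}} markers in all strings of a nested template."""
    if isinstance(node, dict):
        return {key: fillPlaceholders(value, values) for key, value in node.items()}
    if isinstance(node, list):
        return [fillPlaceholders(item, values) for item in node]
    if isinstance(node, str):
        def substitute(match: re.Match) -> str:
            name = match.group(1)
            if name not in values:
                raise KeyError(f"No value for placeholder {name!r}")
            return values[name]
        return PLACEHOLDER_PATTERN.sub(substitute, node)
    return node  # numbers, booleans, None stay unchanged


execParameters = {"sharepointFolder": "{{KEY:sharepointFolder}}", "maxResults": 50}
print(fillPlaceholders(execParameters, {"sharepointFolder": "/sites/MySite/Documents/Expenses"}))
```

A replacement function (instead of a replacement string) inserts values literally, so backslashes in long values such as the extraction prompt are not treated as escape sequences. The template's `parameters` dict could act as defaults, e.g. `values = {**template["parameters"], **userValues}`.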

## 5. Database Changes

No schema changes are required. The TrusteePosition table is used as defined.

## 6. Design Decisions

### 6.1 Resolved Points

| Topic | Decision |
|-------|----------|
| PDF parsing | The AI service processes PDFs directly (including images, scans, etc.); no preprocessing needed |
| Folder creation | The "processed" and "error" subfolders are created automatically if they do not exist |
| Error handling | Failed PDFs are moved to the "error" subfolder; the file name stays unchanged |
| Duplicate detection | None; a repeated document is intentional (the customer uploads it again) |

### 6.2 Risk Management

| Risk | Handling |
|------|----------|
| AI costs | The customer pays per call; no further restriction needed |
| SharePoint rate limiting | On a rate-limit error: wait, then continue |
| Timeout | Already implemented in the system and working |
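Microsoft Graph signals throttling with HTTP 429 (sometimes 503), usually together with a `Retry-After` header in seconds. A sketch of how the wait time for `_handleRateLimitError` could be derived; the function name and the 60-second fallback (the helper's current default) are assumptions:

```python
from typing import Mapping, Optional

DEFAULT_WAIT_SECONDS = 60


def retryAfterSeconds(statusCode: int, headers: Mapping[str, str],
                      default: int = DEFAULT_WAIT_SECONDS) -> Optional[int]:
    """Seconds to wait before retrying, or None if the response is not throttled."""
    if statusCode not in (429, 503):
        return None
    value = headers.get("Retry-After")
    if value is None:
        return default
    try:
        return max(1, int(value))  # never busy-loop on "0"
    except ValueError:
        return default  # Retry-After may also be an HTTP date; fall back to the default


print(retryAfterSeconds(429, {"Retry-After": "10"}))  # 10
print(retryAfterSeconds(200, {}))                     # None
```

HTTP client header objects (for example in `requests` or `httpx`) are case-insensitive; the plain `dict` used in this sketch is not.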

### 6.3 Implementation Requirements

| Requirement | Value |
|-------------|-------|
| Max PDFs per execution | 50 files (limit) |
| Retry logic | No; the AI service handles retries internally |
| Preview mode | No; not needed |
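The 50-file limit works across runs because each handled PDF normally leaves the source folder (to `processed/` or `error/`), so the next scheduled run only sees the remainder. The batch/remaining counting used for the summary, as an illustrative sketch (`selectBatch` is a hypothetical name):

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
MAX_FILES_PER_EXECUTION = 50


def selectBatch(files: Sequence[T], limit: int = MAX_FILES_PER_EXECUTION) -> Tuple[List[T], int]:
    """Return the files to process in this run and how many are left for later runs."""
    batch = list(files[:limit])
    return batch, len(files) - len(batch)


files = [f"receipt_{i}.pdf" for i in range(120)]
batch, remaining = selectBatch(files)
print(len(batch), remaining)  # 50 70
```

Sorting the list before slicing (for example by name or modification time) would make the batch order deterministic; the action currently keeps the order returned by `findDocumentPath`.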

## 7. Implementation Order

Phase 1: Backend Action
- Create `getExpensesFromPdf.py`
- Register it in `methodSharepoint.py`
- Write unit tests

Phase 2: Automation Template
- Add the template to `subAutomationTemplates.py`
- Optimize and test the prompt

Phase 3: API Endpoints
- SharePoint folder-list endpoint
- Automation definition extension

Phase 4: Frontend
- `TrusteeExpenseImport.tsx` page
- Add navigation/routing
- Test the integration

## 8. Test Plan

Unit tests
- VAT calculation
- Valuta/DateTime completion
- CSV parsing
- Tag validation

Integration tests
- SharePoint connection
- PDF download
- AI extraction
- TrusteePosition storage
- File moves

E2E tests
- Complete workflow from PDF to stored position
- Automation schedule execution
