# Expenses Workflow Definition

## Overview

This document describes the implementation of an automated workflow that reads expenses from PDF documents in SharePoint and stores them as `TrusteePosition` entries in the database.

---

## 1. New Action: `getExpensesFromPdf`

### 1.1 File Structure

```
gateway/modules/workflows/methods/methodSharepoint/
├── actions/
│   └── getExpensesFromPdf.py   # NEW FILE
├── methodSharepoint.py         # Add action registration
```

### 1.2 Action Definition in `methodSharepoint.py`

```python
from .actions.getExpensesFromPdf import getExpensesFromPdf

# In __init__ of the MethodSharepoint class, inside the _actions dict:
"getExpensesFromPdf": WorkflowActionDefinition(
    actionId="sharepoint.getExpensesFromPdf",
    description="Extract expenses from PDF documents in SharePoint folder and save to TrusteePosition",
    dynamicMode=False,  # IMPORTANT: not available in dynamic workflows
    parameters={
        "connectionReference": WorkflowActionParameter(
            name="connectionReference",
            type="str",
            frontendType=FrontendType.USER_CONNECTION,
            required=True,
            description="Microsoft connection label for SharePoint access"
        ),
        "sharepointFolder": WorkflowActionParameter(
            name="sharepointFolder",
            type="str",
            frontendType=FrontendType.TEXT,
            required=True,
            description="SharePoint folder path containing PDF expense documents (e.g., /sites/MySite/Documents/Expenses)"
        ),
        "featureInstanceId": WorkflowActionParameter(
            name="featureInstanceId",
            type="str",
            frontendType=FrontendType.TEXT,
            required=True,
            description="Feature Instance ID for the Trustee feature where positions will be stored"
        ),
        "prompt": WorkflowActionParameter(
            name="prompt",
            type="str",
            frontendType=FrontendType.TEXTAREA,
            required=True,
            description="AI prompt for extracting expense data from PDF content"
        )
    },
    execute=getExpensesFromPdf.__get__(self, self.__class__)
)
```

### 1.3 Action Logic (`getExpensesFromPdf.py`)

```python
# Copyright (c) 2025 Patrick Motsch
# All rights reserved.

import logging
import time
import json
import csv
import io
import base64
from datetime import datetime, UTC
from typing import Dict, Any, List, Optional
from modules.datamodels.datamodelChat import ActionResult, ActionDocument

logger = logging.getLogger(__name__)

# Allowed tags for TrusteePosition
ALLOWED_TAGS = ["customer", "meeting", "license", "subscription", "fuel", "food", "material"]


async def getExpensesFromPdf(self, parameters: Dict[str, Any]) -> ActionResult:
    """
    Extract expenses from PDF documents in SharePoint and save to TrusteePosition.

    Process:
    1. Read PDF files from SharePoint folder (max 50 files per execution)
    2. FOR EACH PDF document:
       a. AI call to extract expense data in CSV format
       b. If 0 records: skip document with warning, move to "error" folder
       c. Validate/calculate VAT, complete valuta/transactionDateTime
       d. Save all records to TrusteePosition
       e. Move document to "processed" subfolder with timestamp prefix

    Parameters:
    - connectionReference (str): Microsoft connection label
    - sharepointFolder (str): SharePoint folder path
    - featureInstanceId (str): Feature instance ID for TrusteePosition
    - prompt (str): AI prompt for content extraction

    Returns:
        ActionResult with success status and processing summary
    """
    operationId = None
    processedDocuments = []
    skippedDocuments = []
    errorDocuments = []
    totalPositions = 0

    try:
        # Initialize progress tracking
        workflowId = self.services.workflow.id if self.services.workflow else f"no-workflow-{int(time.time())}"
        operationId = f"sharepoint_expenses_{workflowId}_{int(time.time())}"

        parentOperationId = parameters.get('parentOperationId')
        self.services.chat.progressLogStart(
            operationId,
            "Extract Expenses from PDF",
            "SharePoint PDF Processing",
            "Initializing expense extraction",
            parentOperationId=parentOperationId
        )

        # Extract parameters
        connectionReference = parameters.get("connectionReference")
        sharepointFolder = parameters.get("sharepointFolder")
        featureInstanceId = parameters.get("featureInstanceId")
        prompt = parameters.get("prompt")

        # Validate required parameters
        if not connectionReference:
            return ActionResult.isFailure(error="connectionReference is required")
        if not sharepointFolder:
            return ActionResult.isFailure(error="sharepointFolder is required")
        if not featureInstanceId:
            return ActionResult.isFailure(error="featureInstanceId is required")
        if not prompt:
            return ActionResult.isFailure(error="prompt is required")

        # Step 1: Get Microsoft connection
        self.services.chat.progressLogUpdate(operationId, 0.1, "Getting Microsoft connection")
        connection = self.connection.getMicrosoftConnection(connectionReference)
        if not connection:
            return ActionResult.isFailure(error="No valid Microsoft connection found")

        # Step 2: Find PDF files in folder
        self.services.chat.progressLogUpdate(operationId, 0.15, "Finding PDF files in SharePoint folder")

        # Use findDocumentPath to locate PDFs
        findParams = {
            "connectionReference": connectionReference,
            "searchQuery": f"{sharepointFolder}:files:.pdf",
            "maxResults": 1000
        }
        findResult = await self.findDocumentPath(findParams)
        if not findResult.success:
            return ActionResult.isFailure(error=f"Failed to find PDF files: {findResult.error}")

        # Parse found documents
        pdfFiles = _extractPdfFilesFromResult(findResult)
        if not pdfFiles:
            return ActionResult.isSuccess(
                documents=[ActionDocument(
                    documentName="expense_extraction_result.json",
                    documentData=json.dumps({
                        "status": "no_documents",
                        "message": "No PDF files found in the specified folder",
                        "folder": sharepointFolder
                    }, indent=2),
                    mimeType="application/json",
                    validationMetadata={"actionType": "sharepoint.getExpensesFromPdf"}
                )]
            )

        # Limit to max 50 PDFs per execution; keep the full count for the summary
        MAX_FILES_PER_EXECUTION = 50
        totalFilesFound = len(pdfFiles)
        if totalFilesFound > MAX_FILES_PER_EXECUTION:
            logger.warning(f"Found {totalFilesFound} PDFs, limiting to {MAX_FILES_PER_EXECUTION}")
            pdfFiles = pdfFiles[:MAX_FILES_PER_EXECUTION]

        # Step 3: Process each PDF
        totalFiles = len(pdfFiles)
        progressPerFile = 0.7 / totalFiles  # 70% for file processing

        for idx, pdfFile in enumerate(pdfFiles):
            currentProgress = 0.2 + (idx * progressPerFile)
            fileName = pdfFile.get("name", f"file_{idx}")
            fileId = pdfFile.get("id")
            siteId = pdfFile.get("siteId")

            self.services.chat.progressLogUpdate(
                operationId,
                currentProgress,
                f"Processing {idx + 1}/{totalFiles}: {fileName}"
            )

            try:
                # 3a: Download PDF content
                fileContent = await self.services.sharepoint.downloadFile(siteId, fileId)
                if not fileContent:
                    # Move to error folder on download failure
                    await _moveToErrorFolder(
                        self,
                        connectionReference,
                        siteId,
                        pdfFile.get("folderPath", ""),
                        fileName
                    )
                    errorDocuments.append({
                        "file": fileName,
                        "error": "Failed to download",
                        "movedTo": "error/"
                    })
                    continue

                # 3b: AI call to extract expense data
                aiResult = await _extractExpensesWithAi(
                    self.services,
                    fileContent,
                    fileName,
                    prompt
                )

                if not aiResult.get("success"):
                    # Move to error folder on AI failure
                    await _moveToErrorFolder(
                        self,
                        connectionReference,
                        siteId,
                        pdfFile.get("folderPath", ""),
                        fileName
                    )
                    errorDocuments.append({
                        "file": fileName,
                        "error": aiResult.get("error", "AI extraction failed"),
                        "movedTo": "error/"
                    })
                    continue

                records = aiResult.get("records", [])

                # 3c: Check for empty records - move to error folder
                if not records:
                    logger.warning(f"Document {fileName}: No records extracted, moving to error folder")
                    await _moveToErrorFolder(
                        self,
                        connectionReference,
                        siteId,
                        pdfFile.get("folderPath", ""),
                        fileName  # Keep original filename
                    )
                    skippedDocuments.append({
                        "file": fileName,
                        "reason": "No expense records extracted",
                        "movedTo": "error/"
                    })
                    continue

                # 3d: Validate and enrich records
                validatedRecords = _validateAndEnrichRecords(records, fileName)

                # 3e: Save to TrusteePosition
                savedCount = await _saveToTrusteePosition(
                    self.services,
                    validatedRecords,
                    featureInstanceId
                )
                totalPositions += savedCount

                # 3f: Move document to "processed" subfolder
                timestamp = datetime.now(UTC).strftime("%Y%m%d-%H%M%S")
                newFileName = f"{timestamp}_{fileName}"

                moveSuccess = await _moveToProcessedFolder(
                    self,
                    connectionReference,
                    siteId,
                    pdfFile.get("folderPath", ""),
                    fileName,
                    newFileName
                )

                processedDocuments.append({
                    "file": fileName,
                    "newLocation": f"processed/{newFileName}" if moveSuccess else "move_failed",
                    "recordsExtracted": len(validatedRecords),
                    "recordsSaved": savedCount
                })

            except Exception as e:
                logger.error(f"Error processing {fileName}: {str(e)}")
                # Move to error folder on exception
                await _moveToErrorFolder(
                    self,
                    connectionReference,
                    siteId,
                    pdfFile.get("folderPath", ""),
                    fileName
                )
                errorDocuments.append({
                    "file": fileName,
                    "error": str(e),
                    "movedTo": "error/"
                })

        # Step 4: Create result summary
        self.services.chat.progressLogUpdate(operationId, 0.95, "Creating result summary")

        # Files left over for the next run (non-zero only if the 50-file limit applied)
        remainingFiles = totalFilesFound - totalFiles

        resultSummary = {
            "status": "completed",
            "folder": sharepointFolder,
            "featureInstanceId": featureInstanceId,
            "summary": {
                "totalFilesFound": totalFilesFound,
                "filesProcessedThisRun": totalFiles,
                "remainingFiles": remainingFiles,
                "successfulDocuments": len(processedDocuments),
                "skippedDocuments": len(skippedDocuments),
                "errorDocuments": len(errorDocuments),
                "totalPositionsSaved": totalPositions
            },
            "processedDocuments": processedDocuments,
            "skippedDocuments": skippedDocuments,
            "errorDocuments": errorDocuments,
            "note": f"{remainingFiles} files remaining for next execution" if remainingFiles > 0 else None
        }

        self.services.chat.progressLogFinish(operationId, True)

        return ActionResult.isSuccess(
            documents=[ActionDocument(
                documentName="expense_extraction_result.json",
                documentData=json.dumps(resultSummary, indent=2),
                mimeType="application/json",
                validationMetadata={
                    "actionType": "sharepoint.getExpensesFromPdf",
                    "sharepointFolder": sharepointFolder,
                    "featureInstanceId": featureInstanceId,
                    "totalPositions": totalPositions
                }
            )]
        )

    except Exception as e:
        logger.error(f"Error in getExpensesFromPdf: {str(e)}")
        if operationId:
            self.services.chat.progressLogFinish(operationId, False)
        return ActionResult.isFailure(error=str(e))


def _extractPdfFilesFromResult(findResult: ActionResult) -> List[Dict[str, Any]]:
    """Extract PDF file information from findDocumentPath result."""
    pdfFiles = []
    # Implementation: Parse ActionDocument data to extract file IDs, names, paths
    # ...
    return pdfFiles


async def _extractExpensesWithAi(
    services,
    fileContent: bytes,
    fileName: str,
    prompt: str
) -> Dict[str, Any]:
    """
    Call AI service to extract expense data from PDF content.
    AI service handles retries internally - no retry logic needed here.

    Returns dict with:
    - success: bool
    - records: List[Dict] - extracted records in TrusteePosition format
    - error: str (if success=False)
    """
    try:
        # Base64-encode the raw PDF bytes for the AI service
        base64Content = base64.b64encode(fileContent).decode('utf-8')

        # Call AI service with prompt (AI service handles PDF extraction internally)
        aiResponse = await services.ai.processDocument(
            documentContent=base64Content,
            documentName=fileName,
            mimeType="application/pdf",
            prompt=prompt,
            outputFormat="csv"
        )

        if not aiResponse or not aiResponse.get("success"):
            # aiResponse may be None here, so guard the .get() call
            return {"success": False, "error": (aiResponse or {}).get("error", "AI call failed")}

        # Parse CSV response to records
        csvContent = aiResponse.get("content", "")
        records = _parseCsvToRecords(csvContent)

        return {"success": True, "records": records}

    except Exception as e:
        return {"success": False, "error": str(e)}


async def _handleRateLimitError(waitSeconds: int = 60):
    """Handle SharePoint rate limit by waiting."""
    import asyncio
    logger.warning(f"Rate limit hit, waiting {waitSeconds} seconds before continuing")
    await asyncio.sleep(waitSeconds)


def _parseCsvToRecords(csvContent: str) -> List[Dict[str, Any]]:
    """Parse CSV content to list of expense records."""
    records = []
    try:
        reader = csv.DictReader(io.StringIO(csvContent))
        for row in reader:
            records.append(row)
    except Exception as e:
        logger.error(f"Error parsing CSV: {str(e)}")
    return records


def _validateAndEnrichRecords(
    records: List[Dict[str, Any]],
    sourceFileName: str
) -> List[Dict[str, Any]]:
    """
    Validate and enrich expense records:
    1. Calculate/correct VAT amount
    2. Complete valuta/transactionDateTime if one is missing
    3. Validate tags
    """
    enrichedRecords = []

    for record in records:
        enriched = record.copy()

        # VAT calculation/validation
        vatPercentage = _parseFloat(record.get("vatPercentage", 0))
        vatAmount = _parseFloat(record.get("vatAmount", 0))
        bookingAmount = _parseFloat(record.get("bookingAmount", 0))

        if vatPercentage > 0 and bookingAmount > 0:
            # Expected VAT share of a gross (VAT-inclusive) booking amount
            expectedVat = bookingAmount * vatPercentage / (100 + vatPercentage)

            # If vatAmount is missing or significantly different, recalculate
            if vatAmount == 0 or abs(vatAmount - expectedVat) > 0.01:
                enriched["vatAmount"] = round(expectedVat, 2)
                logger.info(f"VAT amount corrected: {vatAmount} -> {enriched['vatAmount']}")

        # Valuta / transactionDateTime completion
        valuta = record.get("valuta")
        transactionDateTime = record.get("transactionDateTime")

        if valuta and not transactionDateTime:
            # Convert valuta date to a timestamp at noon UTC, so the result does not
            # depend on the server's local timezone and matches the UTC branch below
            try:
                dt = datetime.strptime(valuta, "%Y-%m-%d").replace(hour=12, tzinfo=UTC)
                enriched["transactionDateTime"] = dt.timestamp()
            except (ValueError, TypeError):
                pass
        elif transactionDateTime and not valuta:
            # Convert timestamp to valuta date
            try:
                ts = float(transactionDateTime)
                dt = datetime.fromtimestamp(ts, UTC)
                enriched["valuta"] = dt.strftime("%Y-%m-%d")
            except (ValueError, TypeError, OverflowError, OSError):
                pass

        # Validate tags
        tags = record.get("tags", "")
        if tags:
            tagList = [t.strip().lower() for t in tags.split(",")]
            validTags = [t for t in tagList if t in ALLOWED_TAGS]
            enriched["tags"] = ",".join(validTags)

        # Store source file info in description
        existingDesc = record.get("desc", "")
        if sourceFileName and sourceFileName not in existingDesc:
            enriched["desc"] = f"[Source: {sourceFileName}]\n{existingDesc}"

        enrichedRecords.append(enriched)

    return enrichedRecords


def _parseFloat(value) -> float:
    """Safely parse float value."""
    try:
        return float(value) if value else 0.0
    except (ValueError, TypeError):
        return 0.0


async def _saveToTrusteePosition(
    services,
    records: List[Dict[str, Any]],
    featureInstanceId: str
) -> int:
    """Save validated records to TrusteePosition table."""
    savedCount = 0

    # Get Trustee interface
    from modules.features.trustee.interfaceFeatureTrustee import getInterface
    trusteeInterface = getInterface(
        services.user,
        mandateId=services.mandateId,
        featureInstanceId=featureInstanceId
    )

    for record in records:
        try:
            position = {
                "valuta": record.get("valuta"),
                "transactionDateTime": record.get("transactionDateTime"),
                "company": record.get("company", ""),
                "desc": record.get("desc", ""),
                "tags": record.get("tags", ""),
                "bookingCurrency": record.get("bookingCurrency", "CHF"),
                "bookingAmount": _parseFloat(record.get("bookingAmount", 0)),
                "originalCurrency": record.get("originalCurrency", "CHF"),
                "originalAmount": _parseFloat(record.get("originalAmount", 0)),
                "vatPercentage": _parseFloat(record.get("vatPercentage", 0)),
                "vatAmount": _parseFloat(record.get("vatAmount", 0)),
                "featureInstanceId": featureInstanceId,
                "mandateId": services.mandateId
            }

            result = trusteeInterface.createPosition(position)
            if result:
                savedCount += 1

        except Exception as e:
            logger.error(f"Failed to save position: {str(e)}")

    return savedCount


async def _moveToProcessedFolder(
    self,
    connectionReference: str,
    siteId: str,
    sourceFolderPath: str,
    sourceFileName: str,
    destFileName: str
) -> bool:
    """Move processed PDF to 'processed' subfolder."""
    try:
        processedFolder = f"{sourceFolderPath}/processed"

        # Ensure 'processed' folder exists (create if not)
        await _ensureFolderExists(self, connectionReference, siteId, processedFolder)

        # Copy file to new location
        copyResult = await self.copyFile({
            "connectionReference": connectionReference,
            "siteId": siteId,
            "sourceFolder": sourceFolderPath,
            "sourceFile": sourceFileName,
            "destFolder": processedFolder,
            "destFile": destFileName
        })

        if copyResult.success:
            # Delete original file after successful copy
            await _deleteFile(self, connectionReference, siteId, sourceFolderPath, sourceFileName)
            return True

        return False

    except Exception as e:
        logger.error(f"Failed to move file to processed: {str(e)}")
        return False


async def _moveToErrorFolder(
    self,
    connectionReference: str,
    siteId: str,
    sourceFolderPath: str,
    sourceFileName: str  # Keep original filename
) -> bool:
    """Move failed PDF to 'error' subfolder (filename unchanged)."""
    try:
        errorFolder = f"{sourceFolderPath}/error"

        # Ensure 'error' folder exists (create if not)
        await _ensureFolderExists(self, connectionReference, siteId, errorFolder)

        # Copy file to error folder (keep original name)
        copyResult = await self.copyFile({
            "connectionReference": connectionReference,
            "siteId": siteId,
            "sourceFolder": sourceFolderPath,
            "sourceFile": sourceFileName,
            "destFolder": errorFolder,
            "destFile": sourceFileName  # Same filename
        })

        if copyResult.success:
            # Delete original file after successful copy
            await _deleteFile(self, connectionReference, siteId, sourceFolderPath, sourceFileName)
            return True

        return False

    except Exception as e:
        logger.error(f"Failed to move file to error folder: {str(e)}")
        return False


async def _ensureFolderExists(
    self,
    connectionReference: str,
    siteId: str,
    folderPath: str
) -> bool:
    """Create folder if it doesn't exist."""
    try:
        # Use SharePoint API to create folder
        # Graph API: POST /sites/{siteId}/drive/root:/{parentFolderPath}:/children
        # with body: {"name": folderName, "folder": {}, "@microsoft.graph.conflictBehavior": "fail"}
        # With conflictBehavior "fail", a conflict response means the folder already
        # exists and should be treated as success.
        # ... implementation ...
        return True
    except Exception as e:
        logger.error(f"Failed to ensure folder exists: {str(e)}")
        return False


async def _deleteFile(
    self,
    connectionReference: str,
    siteId: str,
    folderPath: str,
    fileName: str
) -> bool:
    """Delete file from SharePoint."""
    try:
        # Use SharePoint API to delete file
        # Graph API: DELETE /sites/{siteId}/drive/root:/{folderPath}/{fileName}
        # ... implementation ...
        return True
    except Exception as e:
        logger.error(f"Failed to delete file: {str(e)}")
        return False
```
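
The VAT check in `_validateAndEnrichRecords` treats `bookingAmount` as a gross amount that already includes VAT, so the VAT share is `bookingAmount * vatPercentage / (100 + vatPercentage)`. The following is a minimal sketch of that arithmetic using `Decimal` instead of `float`, which avoids binary floating-point rounding when comparing cent values. The helper name `vatFromGross` is hypothetical and not part of the codebase.

```python
from decimal import Decimal, ROUND_HALF_UP


def vatFromGross(grossAmount: str, vatPercentage: str) -> Decimal:
    """Return the VAT share contained in a gross (VAT-inclusive) amount."""
    gross = Decimal(grossAmount)
    rate = Decimal(vatPercentage)
    vat = gross * rate / (Decimal("100") + rate)
    # Round to whole cents, half up
    return vat.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(vatFromGross("45.50", "2.6"))  # 1.15
print(vatFromGross("12.30", "8.1"))  # 0.92
```

Both results match the `vatAmount` values in the example CSV rows of the extraction prompt, so those rows would pass the 0.01 tolerance check unchanged.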

---

## 2. Automation Template: `getExpenses`

### 2.1 Template Definition (add to `subAutomationTemplates.py`)

````python
{
    "template": {
        "overview": "Expenses PDF Extraction",
        "tasks": [
            {
                "id": "Task01",
                "title": "Extract Expenses from SharePoint PDFs",
                "description": "Reads PDF expense documents from SharePoint folder and saves extracted data to TrusteePosition",
                "objective": "Extract expense data from PDF documents and store in Trustee database",
                "actionList": [
                    {
                        "execMethod": "sharepoint",
                        "execAction": "getExpensesFromPdf",
                        "execParameters": {
                            "connectionReference": "{{KEY:connectionName}}",
                            "sharepointFolder": "{{KEY:sharepointFolder}}",
                            "featureInstanceId": "{{KEY:featureInstanceId}}",
                            "prompt": "{{KEY:extractionPrompt}}"
                        },
                        "execResultLabel": "expense_extraction_result"
                    }
                ]
            }
        ]
    },
    "parameters": {
        "connectionName": "",
        "sharepointFolder": "",
        "featureInstanceId": "",
        "extractionPrompt": """You are a specialist in extracting expense data from PDF documents.

TASK:
Extract all expense entries from the provided PDF document and return them in CSV format.

IMPORTANT RULES:
1. Create a separate record for each VAT percentage
2. All records together must add up to the total amount of the document
3. The entire extracted text of the document must be captured in the "desc" field
4. The "company" field contains the supplier/vendor of the booking
5. Tags must be chosen from this list: customer, meeting, license, subscription, fuel, food, material
   - Separate multiple applicable tags with commas

CSV COLUMNS (in this order):
valuta,transactionDateTime,company,desc,tags,bookingCurrency,bookingAmount,originalCurrency,originalAmount,vatPercentage,vatAmount

DATA FORMAT:
- valuta: YYYY-MM-DD (value date)
- transactionDateTime: Unix timestamp in seconds (time of the transaction)
- company: supplier/vendor name
- desc: complete extracted text of the document
- tags: comma-separated tags from the allowed list
- bookingCurrency: currency code (CHF, EUR, USD, GBP)
- bookingAmount: booking amount as a decimal number
- originalCurrency: original currency code
- originalAmount: original amount as a decimal number
- vatPercentage: VAT percentage (e.g. 8.1 for 8.1%)
- vatAmount: VAT amount as a decimal number

EXAMPLE OUTPUT:
```csv
valuta,transactionDateTime,company,desc,tags,bookingCurrency,bookingAmount,originalCurrency,originalAmount,vatPercentage,vatAmount
2026-01-15,1768478400,Migros AG,"Einkauf Migros Zürich...",food,CHF,45.50,CHF,45.50,2.6,1.15
2026-01-15,1768478400,Migros AG,"Einkauf Migros Zürich...",material,CHF,12.30,CHF,12.30,8.1,0.92
```

NOTES:
- If only one VAT rate is present, create one record
- If several VAT rates are present (e.g. food 2.6% and non-food 8.1%), create separate records
- For missing information: empty field or default value
- No quotation marks around numeric values"""
    }
}
````
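
The template refers to its parameters through `{{KEY:name}}` markers. How the automation engine resolves them is not shown in this document; the following is a minimal sketch of one way such a substitution could work, assuming a plain textual replacement. The function `resolvePlaceholders` and the regular expression are illustrative assumptions, not the engine's actual implementation.

```python
import re
from typing import Any, Dict

# Matches markers such as {{KEY:sharepointFolder}}
PLACEHOLDER = re.compile(r"\{\{KEY:([A-Za-z0-9_]+)\}\}")


def resolvePlaceholders(node: Any, values: Dict[str, str]) -> Any:
    """Recursively replace {{KEY:name}} markers in strings, lists and dicts."""
    if isinstance(node, str):
        def substitute(match: re.Match) -> str:
            key = match.group(1)
            if key not in values:
                raise KeyError(f"No value for placeholder '{key}'")
            return values[key]
        return PLACEHOLDER.sub(substitute, node)
    if isinstance(node, list):
        return [resolvePlaceholders(item, values) for item in node]
    if isinstance(node, dict):
        return {key: resolvePlaceholders(value, values) for key, value in node.items()}
    return node


execParameters = {
    "connectionReference": "{{KEY:connectionName}}",
    "sharepointFolder": "{{KEY:sharepointFolder}}",
}
resolved = resolvePlaceholders(execParameters, {
    "connectionName": "connection:msft:user@company.ch",
    "sharepointFolder": "/sites/MySite/Documents/Expenses",
})
print(resolved["sharepointFolder"])  # /sites/MySite/Documents/Expenses
```

Raising on an unknown key, rather than leaving the marker in place, surfaces an incomplete automation definition before the action runs with a literal `{{KEY:...}}` string as its folder path.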

### 2.2 Placeholder Description

| Placeholder | Description | Example value |
|------------|--------------|--------------|
| `connectionName` | User connection reference for SharePoint | `connection:msft:user@company.ch` |
| `sharepointFolder` | SharePoint folder path containing the PDFs | `/sites/MySite/Documents/Expenses` |
| `featureInstanceId` | Feature instance ID of the Trustee feature | `fi_abc123` |
| `extractionPrompt` | AI prompt for the extraction | (see section 2.1) |

---

## 3. Frontend: New Page in the Trustee Feature

### 3.1 Component Structure

```
frontend_nyla/src/features/trustee/
├── pages/
│   └── TrusteeExpenseImport.tsx      # NEW PAGE
├── components/
│   └── SharepointFolderSelect.tsx    # Reusable component
```

### 3.2 Page Requirements

1. **Microsoft Connection Button**
   - Icon: Microsoft logo (same as on the User Connections page)
   - A click opens an OAuth popup for the Microsoft sign-in
   - Uses `useConnections.createMicrosoftConnectionAndAuth()`
   - Status indicator: connected / not connected

2. **SharePoint Folder Dropdown**
   - Dropdown for selecting a SharePoint folder
   - Loads the folder list from the `/api/sharepoint/folders` endpoint
   - Shows the site name and folder path
   - Reference: the Neutralization feature has a similar dropdown

3. **Activate Button**
   - Creates an `AutomationDefinition` with:
     - Template: "getExpenses"
     - Placeholders: the filled-in values
     - Schedule: daily (e.g. `0 22 * * *`)
     - Active: true
   - Saves it via the `/api/automation/definitions` endpoint
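
The schedule string `0 22 * * *` uses the five standard cron fields (minute, hour, day of month, month, day of week), so it fires once a day at 22:00. As a rough illustration of what that means for the first run after activation, the sketch below computes the next firing time for a daily schedule. The helper `nextDailyRun` is hypothetical, and which timezone the scheduler applies is an assumption that this document does not settle.

```python
from datetime import datetime, timedelta, timezone


def nextDailyRun(now: datetime, hour: int = 22, minute: int = 0) -> datetime:
    """Next firing time of a daily schedule such as '0 22 * * *'."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow
        candidate += timedelta(days=1)
    return candidate


activatedAt = datetime(2026, 1, 15, 21, 30, tzinfo=timezone.utc)
print(nextDailyRun(activatedAt).isoformat())  # 2026-01-15T22:00:00+00:00
```

Because each run processes at most 50 files, a folder with a larger backlog is worked off over several consecutive nights under this schedule.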

### 3.3 Example Implementation

```tsx
// TrusteeExpenseImport.tsx
import React, { useState, useEffect } from 'react';
import { useConnections } from '@/hooks/useConnections';
import { useFeatureInstance } from '@/hooks/useFeatureInstance';
import { Button } from '@/components/ui/button';
import { Select } from '@/components/ui/select';
import { MicrosoftIcon } from '@/components/icons';
import api from '@/api';

// NOTE: the Connection and SharepointFolder types and DEFAULT_EXTRACTION_PROMPT
// (the prompt from section 2.1) are assumed to be defined elsewhere in the frontend.

export function TrusteeExpenseImport() {
  const { connections, createMicrosoftConnectionAndAuth } = useConnections();
  const { featureInstanceId } = useFeatureInstance();

  const [msftConnection, setMsftConnection] = useState<Connection | null>(null);
  const [folders, setFolders] = useState<SharepointFolder[]>([]);
  const [selectedFolder, setSelectedFolder] = useState<string>('');
  const [isActivating, setIsActivating] = useState(false);

  // Find active Microsoft connection
  useEffect(() => {
    const conn = connections.find(c =>
      c.type === 'msft' && c.status === 'active'
    );
    setMsftConnection(conn || null);
  }, [connections]);

  // Load SharePoint folders when connected
  useEffect(() => {
    if (msftConnection) {
      loadSharepointFolders();
    }
  }, [msftConnection]);

  const loadSharepointFolders = async () => {
    try {
      const response = await api.get('/api/sharepoint/folders', {
        params: { connectionId: msftConnection?.id }
      });
      setFolders(response.data.folders || []);
    } catch (error) {
      console.error('Failed to load folders:', error);
    }
  };

  const handleConnect = async () => {
    try {
      await createMicrosoftConnectionAndAuth();
    } catch (error) {
      console.error('Connection failed:', error);
    }
  };

  const handleActivate = async () => {
    if (!selectedFolder || !msftConnection || !featureInstanceId) return;

    setIsActivating(true);
    try {
      await api.post('/api/automation/definitions', {
        label: 'Expense Import',
        schedule: '0 22 * * *', // Daily at 22:00
        templateName: 'getExpenses',
        placeholders: {
          connectionName: `connection:msft:${msftConnection.accountName}`,
          sharepointFolder: selectedFolder,
          featureInstanceId: featureInstanceId,
          extractionPrompt: DEFAULT_EXTRACTION_PROMPT
        },
        active: true,
        featureInstanceId: featureInstanceId
      });

      // Show success message
    } catch (error) {
      console.error('Activation failed:', error);
    } finally {
      setIsActivating(false);
    }
  };

  return (
    <div className="p-6 space-y-6">
      <h1 className="text-2xl font-bold">Expense Import Setup</h1>

      {/* Microsoft Connection */}
      <div className="space-y-2">
        <label className="text-sm font-medium">Microsoft Connection</label>
        {msftConnection ? (
          <div className="flex items-center gap-2">
            <MicrosoftIcon className="w-5 h-5" />
            <span className="text-green-600">
              Connected as {msftConnection.accountName}
            </span>
          </div>
        ) : (
          <Button onClick={handleConnect}>
            <MicrosoftIcon className="w-4 h-4 mr-2" />
            Connect Microsoft Account
          </Button>
        )}
      </div>

      {/* SharePoint Folder Selection */}
      {msftConnection && (
        <div className="space-y-2">
          <label className="text-sm font-medium">SharePoint Expense Folder</label>
          <Select
            value={selectedFolder}
            onValueChange={setSelectedFolder}
            placeholder="Select a folder..."
          >
            {folders.map(folder => (
              <Select.Option key={folder.path} value={folder.path}>
                {folder.siteName} - {folder.name}
              </Select.Option>
            ))}
          </Select>
        </div>
      )}

      {/* Activate Button */}
      {selectedFolder && (
        <Button
          onClick={handleActivate}
          disabled={isActivating}
          className="w-full"
        >
          {isActivating ? 'Activating...' : 'Activate Daily Import'}
        </Button>
      )}
    </div>
  );
}
```

---

## 4. Backend: API Endpoints

### 4.1 SharePoint Folder List Endpoint

New endpoint in `routeSharepoint.py`:

```python
from fastapi import Query, Request

# `router` is the existing router object of routeSharepoint.py
@router.get("/api/sharepoint/folders")
async def listSharepointFolders(
    request: Request,  # injected by the framework, so it needs no default value
    connectionId: str = Query(..., description="Connection ID")
):
    """List available SharePoint folders for the user's connection."""
    # Implementation: Use Graph API to list sites and root folders
    ...
```
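
The frontend in section 3.3 reads `siteName`, `name`, and `path` from each entry of the `folders` array. A minimal sketch of how the endpoint could map Microsoft Graph drive items to that shape is shown below. It assumes the items come from a children listing of a site's drive, where folders carry a `folder` facet and `parentReference.path` contains the parent path after `root:`. The helper name `foldersFromDriveChildren` and the exact path format returned to the client are assumptions.

```python
from typing import Any, Dict, List


def foldersFromDriveChildren(siteName: str, items: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Keep only folder items and map them to the shape the dropdown expects."""
    folders = []
    for item in items:
        if "folder" not in item:  # files carry a "file" facet instead
            continue
        parentPath = item.get("parentReference", {}).get("path", "")
        # Graph paths look like "/drive/root:/Documents"; keep the part after "root:"
        relativeParent = parentPath.split("root:", 1)[-1]
        folders.append({
            "siteName": siteName,
            "name": item["name"],
            "path": f"{relativeParent}/{item['name']}",
        })
    return folders


sampleItems = [
    {"name": "Expenses", "folder": {"childCount": 3}, "parentReference": {"path": "/drive/root:/Documents"}},
    {"name": "receipt.pdf", "file": {"mimeType": "application/pdf"}, "parentReference": {"path": "/drive/root:/Documents"}},
]
print(foldersFromDriveChildren("MySite", sampleItems))
```

Only the `Expenses` folder survives the filter; the PDF is dropped because it has no `folder` facet.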

### 4.2 Automation Definition Endpoint

Extension of `routeFeatureAutomation.py` to support template-based creation.

---

## 5. Database Changes

No schema changes are required. The `TrusteePosition` table is used as currently defined.

---

## 6. Design Decisions

### 6.1 Resolved Points

| Topic | Decision |
|-------|--------------|
| **PDF parsing** | The AI service processes PDFs directly (including images, scans, etc.); no preprocessing is needed |
| **Folder creation** | The "processed" and "error" subfolders are created automatically if they do not exist |
| **Error handling** | Failed PDFs are moved to the "error" subfolder; the file name stays unchanged |
| **Duplicate detection** | None: a repeated document is intentional (the customer uploaded it again) |
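
The action in section 1.3 moves files by copying them and then deleting the original. Microsoft Graph also offers a move as a single `PATCH` on the drive item that updates its `parentReference` and, optionally, its `name`; this avoids a window in which the original is deleted while a copy is still in progress. The sketch below only builds such a request and sends nothing. The helper name `buildMoveRequest` and the IDs in the usage example are hypothetical, and whether the existing SharePoint service exposes item and folder IDs in this form is an assumption.

```python
from typing import Any, Dict, Tuple

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def buildMoveRequest(
    siteId: str,
    itemId: str,
    destFolderId: str,
    newName: str,
) -> Tuple[str, str, Dict[str, Any]]:
    """Return (method, url, jsonBody) for moving and renaming a drive item."""
    url = f"{GRAPH_BASE}/sites/{siteId}/drive/items/{itemId}"
    body = {"parentReference": {"id": destFolderId}, "name": newName}
    return "PATCH", url, body


# Hypothetical IDs; the timestamp prefix mirrors the "processed" naming scheme
method, url, body = buildMoveRequest("site-1", "item-42", "folder-7", "20260115-220000_receipt.pdf")
print(method, url)
print(body)
```

For the "error" subfolder the same request applies with `newName` set to the original file name, since failed files keep their name.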

### 6.2 Risk Management

| Risk | Handling |
|--------|----------|
| **AI costs** | The customer pays per call; no further restriction is needed |
| **SharePoint rate limiting** | On a rate-limit error: wait, then continue |
| **Timeout** | Already implemented in the system and working |
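
The "wait, then continue" handling for SharePoint rate limits can be sketched as a small wrapper around any SharePoint call. The code in section 1.3 defines `_handleRateLimitError` with a fixed 60-second wait but does not show where it is invoked, so the wiring below is an assumption: `RateLimitError` is a hypothetical exception that the SharePoint client would raise on HTTP 429, optionally carrying the server's `Retry-After` value.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised by the SharePoint client on HTTP 429."""

    def __init__(self, retryAfterSeconds: Optional[float] = None):
        super().__init__("rate limited")
        self.retryAfterSeconds = retryAfterSeconds


async def callWithRateLimitWait(
    operation: Callable[[], Awaitable[T]],
    defaultWaitSeconds: float = 60.0,
    maxAttempts: int = 3,
) -> T:
    """Run operation; on a rate limit, wait (Retry-After if given) and try again."""
    for attempt in range(1, maxAttempts + 1):
        try:
            return await operation()
        except RateLimitError as error:
            if attempt == maxAttempts:
                raise  # give up and let the per-file error handling take over
            waitSeconds = (
                error.retryAfterSeconds
                if error.retryAfterSeconds is not None
                else defaultWaitSeconds
            )
            await asyncio.sleep(waitSeconds)
    raise ValueError("maxAttempts must be at least 1")
```

A download could then be wrapped as `await callWithRateLimitWait(lambda: self.services.sharepoint.downloadFile(siteId, fileId))`. The attempt cap keeps a persistently throttled run from blocking indefinitely.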

### 6.3 Implementation Requirements

| Requirement | Value |
|---------|------|
| **Max PDFs per execution** | 50 files (limit) |
| **Retry logic** | NO: the AI service handles retries internally |
| **Preview mode** | NO: not needed |

---

## 7. Implementation Order

1. **Phase 1: Backend Action**
   - Create `getExpensesFromPdf.py`
   - Register it in `methodSharepoint.py`
   - Write unit tests

2. **Phase 2: Automation Template**
   - Add the template to `subAutomationTemplates.py`
   - Tune and test the prompt

3. **Phase 3: API Endpoints**
   - SharePoint folder list endpoint
   - Automation definition extension

4. **Phase 4: Frontend**
   - `TrusteeExpenseImport.tsx` page
   - Add navigation/routing
   - Test the integration

---

## 8. Test Plan

1. **Unit Tests**
   - VAT calculation
   - Valuta/DateTime completion
   - CSV parsing
   - Tag validation

2. **Integration Tests**
   - SharePoint connection
   - PDF download
   - AI extraction
   - TrusteePosition storage
   - File move

3. **E2E Tests**
   - Complete workflow from PDF to stored position
   - Automation schedule execution
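
As a starting point for the unit tests, the sketch below covers CSV parsing and tag validation. To stay self-contained it re-declares simplified stand-ins for `_parseCsvToRecords` and the tag filter in `_validateAndEnrichRecords`; a real test module would import the originals instead. The `test_` prefix assumes a pytest-style runner, which this document does not specify.

```python
import csv
import io
from typing import Dict, List

ALLOWED_TAGS = ["customer", "meeting", "license", "subscription", "fuel", "food", "material"]


def parseCsvToRecords(csvContent: str) -> List[Dict[str, str]]:
    """Simplified stand-in for _parseCsvToRecords."""
    return list(csv.DictReader(io.StringIO(csvContent)))


def filterTags(tags: str) -> str:
    """Simplified stand-in for the tag validation in _validateAndEnrichRecords."""
    cleaned = [tag.strip().lower() for tag in tags.split(",")]
    return ",".join(tag for tag in cleaned if tag in ALLOWED_TAGS)


def test_csvParsingKeepsQuotedCommas():
    content = 'company,desc,bookingAmount\nMigros AG,"Einkauf, Zürich",45.50\n'
    records = parseCsvToRecords(content)
    assert len(records) == 1
    assert records[0]["desc"] == "Einkauf, Zürich"
    assert records[0]["bookingAmount"] == "45.50"


def test_unknownTagsAreDropped():
    assert filterTags("Food, travel ,FUEL") == "food,fuel"


def test_emptyCsvYieldsNoRecords():
    # A header-only response leads to the "no records" path of the action
    assert parseCsvToRecords("company,desc,bookingAmount\n") == []
```

The quoted-comma case matters because the prompt asks for the full document text in `desc`, which will almost always contain commas.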