logging
All checks were successful
Deploy LLM Service / deploy (push) Successful in 22s

This commit is contained in:
ValueOn AG 2026-06-02 09:42:04 +02:00
parent 5b66aaea0f
commit 1e2a6bea70
3 changed files with 47 additions and 41 deletions

View file

@ -1,21 +1,24 @@
# Private LLM - für Neutralisierung
KI-Dokumentenanalyse mit lokalen Ollama Vision-Modellen.
KI-Dokumentenanalyse und Text-Neutralisierung mit lokalen Ollama-Modellen.
## Integrierte MVP Features
## Features
- Rechnungen, Belege, Bankauszüge analysieren
- Text-Neutralisierung für den Gateway (kein Datenabfluss an externe APIs)
- Rechnungen, Belege, Bankauszüge analysieren (Vision)
- Handschrift erkennen
- PDF-Support
- 100% lokal - keine Cloud-APIs
- PDF-Support (PyMuPDF)
- 100% lokal auf Schweizer Infrastruktur
## Tech Stack
- **Backend:** Python Flask
- **AI:** Ollama Vision Models
- **Server:** Infomaniak Swiss Cloud (GPU)
- **Backend:** FastAPI (Python), Uvicorn
- **AI:** Ollama (qwen2.5:7b, qwen2.5vl:7b, granite3.2-vision)
- **Server:** Infomaniak Swiss Cloud (NVIDIA L4, 24 GB VRAM)
- **TLS:** Let's Encrypt (`llm.poweron.swiss`)
## Deployment
Automatisches Deployment via GitHub Actions bei Push zu `main`.
Automatisches Deployment via Forgejo Actions bei Push zu `main`.
Ziel: `/opt/ollama-webapp/app/` auf `83.228.200.109`.

View file

@ -153,12 +153,24 @@ rateLimiter = RateLimiter(
# Model Mapping
# ============================================================================
# Current models (L4 24 GB VRAM — Infomaniak)
MODEL_MAPPING = {
"poweron-text-general": "qwen2.5:7b",
"poweron-vision-general": "qwen2.5vl:7b",
"poweron-vision-deep": "granite3.2-vision",
}
# Next-gen models (RTX PRO 6000 96 GB VRAM — prepared, activate after migration)
# Uncomment and remove the old entries above once the new hardware is live.
# MODEL_MAPPING = {
# "poweron-text-general": "qwen3:14b",
# "poweron-text-reasoning": "deepseek-r1:70b",
# "poweron-vision-general": "llama4:scout",
# "poweron-vision-deep": "qwen2.5vl:72b",
# "poweron-embed": "nomic-embed-text",
# "poweron-transcribe": "whisper-large-v3-turbo",
# }
INTERNAL_TO_EXTERNAL = {v: k for k, v in MODEL_MAPPING.items()}
@ -260,7 +272,7 @@ def _isVisionModel(modelName: str) -> bool:
if not modelName:
return False
modelLower = modelName.lower()
visionIndicators = ["vision", "vl", "llava", "bakllava", "granite"]
visionIndicators = ["vision", "vl", "llava", "bakllava", "granite", "scout", "llama4"]
return any(indicator in modelLower for indicator in visionIndicators)

View file

@ -51,25 +51,25 @@ Connect: ssh -i "C:\Users\pmots\Downloads\ollama-deploy-key.pem" ubuntu@83.228.2
| **GPU** | NVIDIA L4 (24GB VRAM) |
| **OS** | Ubuntu 24.04 LTS |
| **SSH User** | `ubuntu` |
| **App Port** | `5000` |
| **App Port** | `8000` (HTTPS) |
| **Ollama Port** | `11434` |
| **GitHub Repo** | `https://github.com/valueonag/private-llm` |
| **GitHub Repo** | `https://git.poweron.swiss` (Forgejo) |
### Installierte Modelle
| Modell | Verwendung |
|--------|------------|
| `granite3.2-vision` | Rechnungen, Belege, Dokumente |
| `qwen2.5vl:7b` | Handschrift |
| `deepseek-ocr` | OCR / Text-Extraktion |
| Modell | Ollama-Name | Verwendung |
|--------|-------------|------------|
| `poweron-text-general` | `qwen2.5:7b` | Text-Neutralisierung |
| `poweron-vision-general` | `qwen2.5vl:7b` | Handschrift, Dokumente |
| `poweron-vision-deep` | `granite3.2-vision` | Rechnungen, Belege |
### URLs
| Service | URL |
|---------|-----|
| **App** | http://83.228.200.109:5000 |
| **Health Check** | http://83.228.200.109:5000/api/health |
| **Ollama Status** | http://83.228.200.109:5000/api/ollama/status |
| **App** | https://llm.poweron.swiss:8000 |
| **Health Check** | https://llm.poweron.swiss:8000/api/health |
| **Ollama Status** | https://llm.poweron.swiss:8000/api/ollama/status |
---
@ -573,14 +573,14 @@ sudo systemctl enable ollama
## D.5 Modelle herunterladen
```bash
# Fuer Dokumente (Rechnungen, Belege)
ollama pull granite3.2-vision
# Text-Neutralisierung
ollama pull qwen2.5:7b
# Fuer Handschrift
# Vision: Handschrift, Dokumente
ollama pull qwen2.5vl:7b
# OCR-Spezialist
ollama pull deepseek-ocr
# Vision: Rechnungen, Belege
ollama pull granite3.2-vision
```
### Modelle pruefen
@ -606,7 +606,7 @@ python3 -m venv /opt/ollama-webapp/venv
# Basis-Pakete installieren
/opt/ollama-webapp/venv/bin/pip install --upgrade pip
/opt/ollama-webapp/venv/bin/pip install flask flask-cors requests pymupdf gunicorn
/opt/ollama-webapp/venv/bin/pip install -r /opt/ollama-webapp/app/requirements.txt
```
---
@ -621,26 +621,17 @@ Inhalt:
```ini
[Unit]
Description=Belegscanner Flask App
After=network.target ollama.service
Wants=ollama.service
Description=PowerOn Private-LLM Service
After=network.target
[Service]
Type=simple
User=ubuntu
Group=ubuntu
WorkingDirectory=/opt/ollama-webapp/app
Environment="PATH=/opt/ollama-webapp/venv/bin:/usr/bin"
Environment="FLASK_ENV=production"
ExecStart=/opt/ollama-webapp/venv/bin/gunicorn \
--bind 0.0.0.0:5000 \
--workers 2 \
--timeout 3600 \
--access-logfile /opt/ollama-webapp/logs/access.log \
--error-logfile /opt/ollama-webapp/logs/error.log \
app:app
ExecStart=/opt/ollama-webapp/venv/bin/uvicorn app:app --host 0.0.0.0 --port 8000 --ssl-keyfile /etc/letsencrypt/live/llm.poweron.swiss/privkey.pem --ssl-certfile /etc/letsencrypt/live/llm.poweron.swiss/fullchain.pem
Restart=always
RestartSec=5
Environment=PYTHONUNBUFFERED=1
[Install]
WantedBy=multi-user.target
@ -772,10 +763,10 @@ ollama run granite3.2-vision "Beschreibe dieses Bild"
sudo systemctl status ollama-webapp
# Logs pruefen
tail -50 /opt/ollama-webapp/logs/error.log
sudo journalctl -u ollama-webapp -n 50
# Port pruefen
sudo netstat -tlnp | grep 5000
sudo netstat -tlnp | grep 8000
```
## Ollama nicht erreichbar