
# Local LLM Server - Complete Setup Guide

From GitHub repo setup via Cursor to automatic deployment on Infomaniak.

---

## Access

## Server Data

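| Field | Value |
|-------|-------|
| **IP (SSH)** | `83.228.200.109` |
| **Instance** | `local-llm` |
| **Image** | Ubuntu 24.04 LTS Noble Numbat |
| **Instance addresses** | `83.228.226.58`, `2001:1600:16:10::7e3` |
| **Flavor** | `nvl4-a8-ram16-disk0` |
| **Key pair** | `ollama-deploy-key` |
| **Status** | Active |
| **Availability zone** | `az-1` |

Connect (from Windows): `ssh -i "C:\Users\pmots\Downloads\ollama-deploy-key.pem" ubuntu@83.228.200.109`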


## Overview

```
┌─────────────────┐                    ┌─────────────────┐
│     Cursor      │◄──── sync ────────▶│     GitHub      │
│   (local dev)   │                    │   private-llm   │
└─────────────────┘                    └────────┬────────┘
                                                │
                                                │ Push to main
                                                ▼
                                       ┌─────────────────┐
                                       │  GitHub Actions │
                                       └────────┬────────┘
                                                │
                                                │ SSH Deploy
                                                ▼
                                       ┌─────────────────┐
                                       │ Infomaniak GPU  │
                                       │     Server      │
                                       │  ┌───────────┐  │
                                       │  │  Ollama   │  │
                                       │  │  + Flask  │  │
                                       │  │  (LLM +   │  │
                                       │  │  Vision)  │  │
                                       │  └───────────┘  │
                                       └─────────────────┘
```

---

# Part A: GitHub Repository Setup

## A.1 Clone the repository in Cursor

Open a terminal in Cursor or on your machine:

```bash
# Change to your projects folder
cd ~/Projects  # or wherever you keep your projects

# Clone the repository
git clone https://github.com/valueonag/private-llm.git

# Change into the folder
cd private-llm
```

## A.2 Connect Cursor to the repo

**Option 1: Open the folder in Cursor**
1. Open Cursor
2. **File → Open Folder**
3. Select the `private-llm` folder

**Option 2: From the terminal** (requires Cursor's `cursor` shell command)
```bash
cd ~/Projects/private-llm
cursor .
```

## A.3 Create the project structure

Create the following structure:

```
private-llm/
├── app.py                      # Your Flask app
├── requirements.txt            # Python dependencies
├── templates/
│   └── index.html              # Frontend template
├── static/                     # CSS, JS, images (optional)
├── .github/
│   └── workflows/
│       └── deploy.yml          # CI/CD Pipeline
├── .gitignore
└── README.md
```

### A.3.1 requirements.txt

Create `requirements.txt`:

```txt
flask>=3.0.0
flask-cors>=4.0.0
requests>=2.31.0
pymupdf>=1.24.0
gunicorn>=21.0.0
```

### A.3.2 .gitignore

Create `.gitignore`:

```gitignore
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
venv/
env/
.venv/

# IDE
.idea/
.vscode/
*.swp
*.swo
.cursor/

# OS
.DS_Store
Thumbs.db

# Logs
*.log
logs/

# Environment
.env
.env.local

# Test
.pytest_cache/
.coverage
htmlcov/
```

### A.3.3 README.md

Create `README.md`:

```markdown
# Private LLM - Belegscanner

AI document analysis with local Ollama vision models.

## Features

- Analyze invoices, receipts and bank statements
- Recognize handwriting
- PDF support
- 100% local - no cloud APIs

## Tech Stack

- **Backend:** Python Flask
- **AI:** Ollama Vision Models
- **Server:** Infomaniak Swiss Cloud (GPU)

## Deployment

Automatic deployment via GitHub Actions on every push to `main`.
```

---

# Part B: GitHub Actions Deploy Workflow

## B.1 Create the workflow file

Create `.github/workflows/deploy.yml`:

```yaml
name: Deploy to Infomaniak

on:
  push:
    branches:
      - main
  workflow_dispatch:  # Allows manual runs from the Actions tab

env:
  APP_DIR: /opt/ollama-webapp
  SERVICE_NAME: ollama-webapp

jobs:
  deploy:
    runs-on: ubuntu-latest

    steps:
      # 1. Check out the code
      - name: Checkout code
        uses: actions/checkout@v4

      # 2. SSH Setup
      - name: Setup SSH
        run: |
          mkdir -p ~/.ssh
          echo "${{ secrets.SSH_PRIVATE_KEY }}" > ~/.ssh/deploy_key
          chmod 600 ~/.ssh/deploy_key
          ssh-keyscan -H ${{ secrets.SERVER_HOST }} >> ~/.ssh/known_hosts

      # 3. Copy files to the server
      - name: Deploy files to server
        run: |
          rsync -avz --delete \
            -e "ssh -i ~/.ssh/deploy_key -o StrictHostKeyChecking=no" \
            --exclude '.git' \
            --exclude '.github' \
            --exclude '__pycache__' \
            --exclude '*.pyc' \
            --exclude 'venv' \
            --exclude '.env' \
            --exclude 'logs' \
            ./ ${{ secrets.SERVER_USER }}@${{ secrets.SERVER_HOST }}:${{ env.APP_DIR }}/app/

      # 4. Install dependencies and restart the service
      - name: Install dependencies and restart service
        run: |
          ssh -i ~/.ssh/deploy_key -o StrictHostKeyChecking=no \
            ${{ secrets.SERVER_USER }}@${{ secrets.SERVER_HOST }} << 'ENDSSH'

            echo "📦 Installing dependencies..."
            cd /opt/ollama-webapp
            ./venv/bin/pip install -r app/requirements.txt --quiet --upgrade

            echo "🔄 Restarting service..."
            sudo systemctl restart ollama-webapp

            echo "⏳ Waiting for service to start..."
            sleep 5

            echo "📊 Service status:"
            sudo systemctl status ollama-webapp --no-pager -l

            echo "✅ Deployment complete!"
          ENDSSH

      # 5. Health Check
      - name: Health Check
        run: |
          echo "🏥 Running health check..."
          sleep 3

          # curl already prints "000" when it cannot connect; "|| true" only stops bash -e from aborting
          HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
            http://${{ secrets.SERVER_HOST }}:5000/api/health || true)

          if [ "$HTTP_STATUS" = "200" ]; then
            echo "✅ Health check passed! (HTTP $HTTP_STATUS)"
          else
            echo "❌ Health check failed! (HTTP $HTTP_STATUS)"
            exit 1
          fi

      # 6. Deployment Summary
      - name: Deployment Summary
        if: success()
        run: |
          echo "🎉 Deployment successful!"
          echo ""
          echo "📍 App URL: http://${{ secrets.SERVER_HOST }}:5000"
          echo "📍 Health:  http://${{ secrets.SERVER_HOST }}:5000/api/health"
          echo "📍 Ollama:  http://${{ secrets.SERVER_HOST }}:5000/api/ollama/status"
```
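The `Health Check` step above waits a fixed three seconds and then tries once, so a service that needs a little longer to come up fails the whole deploy. A minimal standard-library Python sketch of a retrying check; the file name `health_check.py`, the function `wait_for_health` and the retry/delay defaults are assumptions, not part of the workflow:

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url: str, attempts: int = 10, delay: float = 3.0,
                    timeout: float = 5.0) -> bool:
    """Poll `url` until it answers HTTP 200 or `attempts` is used up.

    Hypothetical helper: same check as the curl step, but with retries.
    """
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    print(f"Health check passed (HTTP 200, attempt {attempt})")
                    return True
                print(f"Attempt {attempt}: HTTP {resp.status}")
        except urllib.error.HTTPError as exc:
            # urllib raises HTTPError for 4xx/5xx answers, e.g. a 500 from the app
            print(f"Attempt {attempt}: HTTP {exc.code}")
        except OSError as exc:
            # URLError, refused connections and timeouts are all OSError subclasses
            print(f"Attempt {attempt}: {exc}")
        if attempt < attempts:
            time.sleep(delay)
    return False
```

Committed as `health_check.py` in the repository root, a workflow step could replace the single `curl` call with `python3 -c 'import sys, health_check; sys.exit(0 if health_check.wait_for_health(sys.argv[1]) else 1)' "http://${{ secrets.SERVER_HOST }}:5000/api/health"`; Python 3 is preinstalled on `ubuntu-latest` runners. `rsync` would then also copy the file to the server, which is harmless.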

## B.2 Set up GitHub secrets

1. Go to: `https://github.com/valueonag/private-llm/settings/secrets/actions`
2. Click **New repository secret**
3. Create these 3 secrets:

| Secret name | Value | Description |
|-------------|-------|-------------|
| `SERVER_HOST` | `185.xxx.xxx.xxx` | Your server IP |
| `SERVER_USER` | `ubuntu` | SSH user |
| `SSH_PRIVATE_KEY` | `-----BEGIN OPENSSH...` | The private key (entire content, including the BEGIN/END lines) |

---

# Part C: Infomaniak Server Setup

## C.1 Horizon Dashboard Login

1. Open: https://api.pub1.infomaniak.cloud/horizon
2. Login details:

| Field | Value |
|-------|-------|
| **Domain** | `PCU-MPXPVCR` |
| **User Name** | `PCU-MPXPVCR` |
| **Password** | Your OpenStack password |

---

## C.2 Create an SSH key pair

1. Go to: **Compute → Key Pairs**
2. Click: **Create Key Pair**
3. Fill in:

| Field | Value |
|-------|-------|
| **Key Pair Name** | `ollama-deploy-key` |
| **Key Type** | `SSH Key` |

4. Click **Create Key Pair**
5. ⚠️ **IMPORTANT:** The `.pem` file is downloaded automatically
   - Store it somewhere safe (e.g. `~/Downloads/ollama-deploy-key.pem`)
   - You will need it later for SSH access
   - On Linux/macOS, restrict its permissions with `chmod 600 ~/Downloads/ollama-deploy-key.pem`; `ssh` refuses private key files that other users can read

---

## C.3 Create a security group

1. Go to: **Network → Security Groups**
2. Click: **Create Security Group**
3. Fill in:

| Field | Value |
|-------|-------|
| **Name** | `ollama-webapp` |
| **Description** | `Ports for Ollama and Flask app` |

4. Click **Create Security Group**
5. In the list, click **Manage Rules** next to `ollama-webapp`
6. Click **Add Rule** and create these rules:

| Rule | Direction | Ether Type | IP Protocol | Port Range | CIDR |
|------|-----------|------------|-------------|------------|------|
| 1 | Ingress | IPv4 | TCP | 22 | `0.0.0.0/0` |
| 2 | Ingress | IPv4 | TCP | 80 | `0.0.0.0/0` |
| 3 | Ingress | IPv4 | TCP | 443 | `0.0.0.0/0` |
| 4 | Ingress | IPv4 | TCP | 5000 | `0.0.0.0/0` |
| 5 | Ingress | IPv4 | TCP | 11434 | `0.0.0.0/0` |

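**For each rule:**
- Click **Add Rule**
- Rule: select `Custom TCP Rule`
- Direction: `Ingress`
- Port: enter the port number from the table
- CIDR: `0.0.0.0/0`
- Click **Add**

Rule 5 exposes the Ollama API on port 11434, which has no authentication of its own, to the whole internet. If only the Flask app on the same server talks to Ollama, this rule can be left out or its CIDR limited to your own IP address.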
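The same group can also be created from the command line with the OpenStack CLI (`python-openstackclient`, authenticated by sourcing the OpenStack RC file that Horizon offers for download). The Python sketch below only prints the `openstack security group create` and `openstack security group rule create` commands for the five rules so they can be reviewed before running them; the helper name `rule_commands` is hypothetical, and restricting a CIDR (for example for port 11434) only means editing the `RULES` list.

```python
import shlex

GROUP = "ollama-webapp"
# (port, remote CIDR) for each ingress rule in the table above
RULES = [
    (22, "0.0.0.0/0"),
    (80, "0.0.0.0/0"),
    (443, "0.0.0.0/0"),
    (5000, "0.0.0.0/0"),
    (11434, "0.0.0.0/0"),
]


def rule_commands(group, rules):
    """Return shell-quoted OpenStack CLI commands for the group and its TCP rules."""
    cmds = [shlex.join(["openstack", "security", "group", "create", group,
                        "--description", "Ports for Ollama and Flask app"])]
    for port, cidr in rules:
        cmds.append(shlex.join([
            "openstack", "security", "group", "rule", "create",
            "--ingress", "--ethertype", "IPv4", "--protocol", "tcp",
            "--dst-port", str(port), "--remote-ip", cidr, group,
        ]))
    return cmds


# Print only; paste the commands into a shell after checking them
for cmd in rule_commands(GROUP, RULES):
    print(cmd)
```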

---

## C.4 Create the GPU instance

1. Go to: **Compute → Instances**
2. Click: **Launch Instance**

### Tab 1: Details

| Field | Value |
|-------|-------|
| **Instance Name** | `local-llm` |
| **Description** | `Local LLM Server` |
| **Availability Zone** | `nova` (leave the default) |
| **Count** | `1` |

→ Click **Next**

### Tab 2: Source

| Field | Value |
|-------|-------|
| **Select Boot Source** | `Image` |
| **Create New Volume** | `Yes` ✓ |
| **Volume Size (GB)** | `150` |
| **Delete Volume on Instance Delete** | `No` ✗ |

**Select the image:**
1. In the "Available" list at the bottom, search for `Ubuntu 24.04 LTS Noble Numbat`
2. Click the **↑ arrow** next to it
3. The image now appears at the top under "Allocated"

→ Click **Next**

### Tab 3: Flavor

**Select a GPU flavor:**

In the "Available" list, look for GPU flavors (their names start with `nvl4-`, `t4-` or `a2-`):

| Flavor | GPU | VRAM | vCPUs | RAM | Recommendation |
|--------|-----|------|-------|-----|----------------|
| `nvl4-12-46-0` | L4 | 24 GB | 12 | 46 GB | ✓ Best choice |
| `t4-8-32-0` | T4 | 16 GB | 8 | 32 GB | Budget |
| `a2-8-32-0` | A2 | 16 GB | 8 | 32 GB | Alternative |

1. Find the flavor you want (e.g. with "L4" or "nvl4" in its name)
2. Click the **↑ arrow** next to it
3. The flavor now appears at the top under "Allocated"

→ Click **Next**

### Tab 4: Networks

| Field | Value |
|-------|-------|
| **Network** | `ext-net1` |

1. In the "Available" list, find `ext-net1`
2. Click the **↑ arrow**
3. `ext-net1` now appears under "Allocated"

→ Click **Next**

### Tab 5: Network Ports

Skip this tab (leave it empty).

→ Click **Next**

### Tab 6: Security Groups

1. If `default` is listed under "Allocated", click the **↓ arrow** to remove it
2. In "Available", find `ollama-webapp`
3. Click the **↑ arrow**
4. Only `ollama-webapp` should now be listed under "Allocated"

→ Click **Next**

### Tab 7: Key Pair

1. In "Available", find `ollama-deploy-key`
2. Click the **↑ arrow**
3. `ollama-deploy-key` is now listed under "Allocated"

→ Click **Next**

### Tab 8: Configuration (optional)

Skip this tab (leave it empty).

→ Click **Next**

### Tab 9: Server Groups (optional)

Skip this tab (leave it empty).

→ Click **Next**

### Tab 10: Scheduler Hints (optional)

Skip this tab (leave it empty).

→ Click **Next**

### Tab 11: Metadata (optional)

Skip this tab (leave it empty).

---

### Launch the instance

Click: **Launch Instance**

⏳ Wait until the status changes from `Build` to `Active` (about 1-3 minutes).

---

## C.5 Assign a floating IP

The instance needs a public IP address:

1. Go to: **Network → Floating IPs**
2. Click: **Allocate IP to Project**
3. Select:

| Field | Value |
|-------|-------|
| **Pool** | `ext-net1` |
| **Description** | `IP for Ollama server` |

4. Click **Allocate IP**
5. In the list, click **Associate** next to the new IP
6. Select:

| Field | Value |
|-------|-------|
| **Port to be associated** | `local-llm` (your instance) |

7. Click **Associate**

📝 **Note down the IP address** (e.g. `185.132.xxx.xxx`); you need it for:
- SSH access
- GitHub secrets
- Browser access to the app

---

## C.6 Summary of your instance

After a successful setup you have:

| Component | Value |
|-----------|-------|
| **Instance Name** | `local-llm` |
| **Description** | `Local LLM Server` |
| **Image** | Ubuntu 24.04 LTS |
| **Flavor** | GPU with L4/T4/A2 |
| **Disk** | 150 GB |
| **Network** | `ext-net1` |
| **Security Group** | `ollama-webapp` |
| **Key Pair** | `ollama-deploy-key` |
| **Floating IP** | `185.xxx.xxx.xxx` |

## C.7 Basic server setup

Connect to the server via SSH:

```bash
ssh -i ~/Downloads/ollama-deploy-key.pem ubuntu@185.xxx.xxx.xxx
```

### Update the system

```bash
sudo apt update && sudo apt upgrade -y
```

### Install the NVIDIA driver

```bash
sudo apt install -y nvidia-driver-550
sudo reboot
```

After the reboot, reconnect and check that the GPU is detected:

```bash
nvidia-smi
```

### Install Ollama

```bash
# Install
curl -fsSL https://ollama.com/install.sh | sh

# Configure (opens an override file for the ollama service)
sudo systemctl edit ollama
```

Add the following to the override file:

```ini
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_ORIGINS=*"
Environment="OLLAMA_NUM_PARALLEL=4"
Environment="OLLAMA_MAX_LOADED_MODELS=2"
```

Restart Ollama to apply the settings and enable it at boot:

```bash
sudo systemctl restart ollama
sudo systemctl enable ollama
```
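With `OLLAMA_HOST=0.0.0.0`, Ollama serves its REST API on port 11434, and `GET /api/tags` returns the locally installed models. A minimal standard-library sketch of a status check that a route such as `/api/ollama/status` could wrap; `app.py` is not shown in this guide, so the function name `ollama_status` and the shape of the returned dict are assumptions:

```python
import json
import urllib.request


def ollama_status(base_url: str = "http://localhost:11434",
                  timeout: float = 5.0) -> dict:
    """Return reachability and installed model names of an Ollama server.

    Hypothetical helper built on Ollama's GET /api/tags endpoint.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, ValueError) as exc:
        # OSError: unreachable or HTTP error; ValueError: body is not valid JSON
        return {"reachable": False, "error": str(exc), "models": []}
    names = [m.get("name", "") for m in data.get("models", [])]
    return {"reachable": True, "models": sorted(names)}
```

Once the models from the next step are pulled, `ollama_status()` on the server should list them; `curl http://localhost:11434/api/tags` shows the same data as raw JSON.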

### Download the models

```bash
ollama pull granite3.2-vision
ollama pull qwen2.5vl:7b
ollama pull deepseek-ocr
```
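The vision models are used through Ollama's `POST /api/generate` endpoint, which accepts base64-encoded images in an `images` array; with `"stream": false` the whole answer arrives as one JSON object whose `response` field holds the text. A standard-library sketch, assuming `qwen2.5vl:7b` as the default model and a generous timeout; the function names are illustrative, and PDF pages would first have to be rendered to images (for example with PyMuPDF from `requirements.txt`):

```python
import base64
import json
import urllib.request


def build_vision_request(image_bytes: bytes, prompt: str,
                         model: str = "qwen2.5vl:7b") -> dict:
    """JSON body for POST /api/generate with a single image."""
    return {
        "model": model,
        "prompt": prompt,
        # Plain base64 strings, without a data: URL prefix
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # One JSON object instead of a stream of partial responses
        "stream": False,
    }


def analyze_image(image_bytes: bytes, prompt: str, model: str = "qwen2.5vl:7b",
                  base_url: str = "http://localhost:11434",
                  timeout: float = 600.0) -> str:
    """Send one image to an Ollama vision model and return its text answer."""
    body = json.dumps(build_vision_request(image_bytes, prompt, model)).encode("utf-8")
    req = urllib.request.Request(f"{base_url}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

For example, `analyze_image(open("receipt.jpg", "rb").read(), "List the date, vendor and total.")` (file name and prompt hypothetical); `granite3.2-vision` or `deepseek-ocr` can be passed as `model` to compare results.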

## C.8 Prepare the Python environment

```bash
# Install packages
sudo apt install -y python3-pip python3-venv git

# Create the app directory
sudo mkdir -p /opt/ollama-webapp/{app,venv,logs}
sudo chown -R ubuntu:ubuntu /opt/ollama-webapp

# Create the virtual environment
python3 -m venv /opt/ollama-webapp/venv

# Install base packages
/opt/ollama-webapp/venv/bin/pip install --upgrade pip
/opt/ollama-webapp/venv/bin/pip install flask flask-cors requests pymupdf gunicorn
```

## C.9 Create a deploy SSH key (for GitHub Actions)

```bash
# Create the key
ssh-keygen -t ed25519 -C "github-actions-deploy" -f ~/.ssh/github_deploy_key -N ""

# Add it to authorized_keys
cat ~/.ssh/github_deploy_key.pub >> ~/.ssh/authorized_keys

# Show the private key - COPY THIS INTO GITHUB SECRETS!
echo ""
echo "=========================================="
echo "SAVE THIS KEY AS 'SSH_PRIVATE_KEY' IN GITHUB:"
echo "=========================================="
cat ~/.ssh/github_deploy_key
echo ""
echo "=========================================="
```

**Copy the complete private key** (including the `-----BEGIN...` and `-----END...` lines) and save it as the GitHub secret `SSH_PRIVATE_KEY`.
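A secret that lost its first or last line, or picked up blank lines while copying, makes the `Setup SSH` step write an unusable key file. A small sketch that checks only the framing of the OpenSSH format produced by `ssh-keygen` above, not whether the key itself is valid (`ssh-keygen -y -f <file>` does that by deriving the public key). The helper name is hypothetical, and a valid key in another format, for example a PEM file starting with `-----BEGIN RSA PRIVATE KEY-----`, fails this check:

```python
BEGIN = "-----BEGIN OPENSSH PRIVATE KEY-----"
END = "-----END OPENSSH PRIVATE KEY-----"


def looks_like_openssh_key(text: str) -> bool:
    """Check header, footer and that no blank lines sit inside the base64 body."""
    lines = [line.strip() for line in text.strip().splitlines()]
    return (
        len(lines) >= 3
        and lines[0] == BEGIN
        and lines[-1] == END
        and all(lines[1:-1])
    )
```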

## C.10 Create the systemd service

```bash
sudo nano /etc/systemd/system/ollama-webapp.service
```

Content:

```ini
[Unit]
Description=Belegscanner Flask App
After=network.target ollama.service
Wants=ollama.service

[Service]
Type=simple
User=ubuntu
Group=ubuntu
WorkingDirectory=/opt/ollama-webapp/app
Environment="PATH=/opt/ollama-webapp/venv/bin:/usr/bin"
# FLASK_ENV is ignored by Flask >= 2.3; it only matters if app.py reads it itself
Environment="FLASK_ENV=production"
ExecStart=/opt/ollama-webapp/venv/bin/gunicorn \
    --bind 0.0.0.0:5000 \
    --workers 2 \
    --timeout 3600 \
    --access-logfile /opt/ollama-webapp/logs/access.log \
    --error-logfile /opt/ollama-webapp/logs/error.log \
    app:app
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Enable the service (the first deployment in Part D starts it through `systemctl restart`):

```bash
sudo systemctl daemon-reload
sudo systemctl enable ollama-webapp
```
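Alternatively, the gunicorn flags could live in a `gunicorn.conf.py` next to `app.py`, so changes to them are versioned and deployed with the code. gunicorn reads `./gunicorn.conf.py` from its working directory by default, which here is `/opt/ollama-webapp/app`, so `ExecStart` could shrink to `/opt/ollama-webapp/venv/bin/gunicorn app:app`. A sketch with the same values as the unit file above:

```python
# gunicorn.conf.py - read by gunicorn from its working directory
bind = "0.0.0.0:5000"
workers = 2
# Workers silent for longer than this (seconds) are killed and restarted;
# long vision-model calls need a generous value
timeout = 3600
accesslog = "/opt/ollama-webapp/logs/access.log"
errorlog = "/opt/ollama-webapp/logs/error.log"
```

Flags on the command line override the file, so if both exist the `ExecStart` flags win.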

## C.11 Sudo rights for GitHub Actions

```bash
sudo visudo
```

Add at the end:

```
ubuntu ALL=(ALL) NOPASSWD: /bin/systemctl restart ollama-webapp
ubuntu ALL=(ALL) NOPASSWD: /bin/systemctl status ollama-webapp
ubuntu ALL=(ALL) NOPASSWD: /bin/systemctl stop ollama-webapp
ubuntu ALL=(ALL) NOPASSWD: /bin/systemctl start ollama-webapp
```
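Instead of appending to `/etc/sudoers` itself, the same rules can go into a drop-in file edited with `sudo visudo -f /etc/sudoers.d/ollama-webapp`, which keeps them separate from the distribution's file. `command -v systemctl` shows which path your shell finds; on Ubuntu 24.04 that is `/usr/bin/systemctl`, with `/bin` a symlink to `/usr/bin`. A small Python sketch that prints the four rules for the resolved path; the function name and the fallback path are assumptions:

```python
import shutil
from typing import List, Optional

ACTIONS = ("restart", "status", "stop", "start")


def sudoers_lines(user: str, service: str,
                  systemctl: Optional[str] = None) -> List[str]:
    """NOPASSWD rules that allow only these systemctl actions on one unit."""
    # Prefer an explicit path, then the one on the current PATH, then Ubuntu's default
    path = systemctl or shutil.which("systemctl") or "/usr/bin/systemctl"
    return [f"{user} ALL=(ALL) NOPASSWD: {path} {action} {service}"
            for action in ACTIONS]


print("\n".join(sudoers_lines("ubuntu", "ollama-webapp")))
```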

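## C.12 Create the templates folder

If your Flask app uses templates:

```bash
mkdir -p /opt/ollama-webapp/app/templates
mkdir -p /opt/ollama-webapp/app/static
```

The deploy's `rsync --delete` makes `app/` mirror the repository, so these folders only survive a deploy if they also exist in the repository (Git does not track empty folders).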

---

# Part D: First Commit and Deploy

## D.1 In Cursor: commit the code

Open a terminal in Cursor (in the `private-llm` folder):

```bash
# Check the status
git status

# Stage all files
git add .

# Create the commit
git commit -m "Initial setup: Flask app with CI/CD"

# Push to GitHub
git push origin main
```

## D.2 Watch GitHub Actions

1. Go to: `https://github.com/valueonag/private-llm/actions`
2. You should see a running workflow
3. Click it to follow the progress

## D.3 Test the app

After a successful deploy:

```bash
# Health Check
curl http://185.xxx.xxx.xxx:5000/api/health

# Ollama status
curl http://185.xxx.xxx.xxx:5000/api/ollama/status

# Open in the browser (macOS; on Linux use xdg-open)
open http://185.xxx.xxx.xxx:5000
```

---

# Part E: Development Workflow

## E.1 Local development in Cursor

```bash
# Create a virtual environment (once)
cd ~/Projects/private-llm
python3 -m venv venv
source venv/bin/activate  # Mac/Linux
# or: .\venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# Start locally (expects a local Ollama and an app.run() call in app.py)
python app.py
```
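For local development the app may need to reach a different Ollama than on the server, for example the GPU server's own Ollama on port 11434, which is open in the security group. Reading the base URL from an environment variable keeps one code path for both. A sketch; the variable name `OLLAMA_URL` is an assumption (it is not one of Ollama's own settings such as `OLLAMA_HOST`), and `app.py` itself is not part of this guide:

```python
import os
from typing import Mapping, Optional

DEFAULT_OLLAMA_URL = "http://localhost:11434"


def ollama_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Ollama base URL from the hypothetical OLLAMA_URL variable, else localhost."""
    source = os.environ if env is None else env
    # Strip a trailing slash so f"{base}/api/tags" never contains "//"
    return source.get("OLLAMA_URL", DEFAULT_OLLAMA_URL).rstrip("/")
```

On Mac/Linux, `OLLAMA_URL=http://185.xxx.xxx.xxx:11434 python app.py` would then point a local run at the server's models, while the deployed service keeps the localhost default.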

## E.2 Deploy changes

```bash
# Stage changes
git add .

# Commit with a description
git commit -m "Feature: new function XYZ"

# Push to GitHub → automatic deploy!
git push origin main
```

## E.3 Cursor Git integration

You can also use the GUI in Cursor:

1. **Source Control** tab (left sidebar)
2. Review the changes
3. **+** to stage files
4. Enter a commit message
5. **✓** to commit
6. **...** → **Push** to push

---

# Part F: Useful Commands

## Server commands

```bash
# SSH to the server
ssh -i ~/Downloads/ollama-deploy-key.pem ubuntu@185.xxx.xxx.xxx

# Service status
sudo systemctl status ollama-webapp
sudo systemctl status ollama

# View logs
tail -f /opt/ollama-webapp/logs/access.log
tail -f /opt/ollama-webapp/logs/error.log
sudo journalctl -u ollama-webapp -f

# Restart the service
sudo systemctl restart ollama-webapp

# GPU status
nvidia-smi

# Ollama models
ollama list
ollama pull <model>
```
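To get an overview instead of scrolling through `access.log`, a short Python sketch can count requests per path and status code. It assumes gunicorn's default access log format (`%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"`), which the unit file does not change; the function names are hypothetical:

```python
import re
from collections import Counter

# Request line and status code of gunicorn's default access log format
LINE_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>[^ ?"]+)[^"]*" (?P<status>\d{3}) ')


def summarize(lines) -> Counter:
    """Count requests per (path, status); query strings are ignored."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match["path"], int(match["status"]))] += 1
    return counts


def print_top(log_path: str = "/opt/ollama-webapp/logs/access.log", n: int = 10) -> None:
    """Print the n most frequent (status, path) pairs of a log file."""
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for (path, status), count in summarize(log).most_common(n):
            print(f"{count:6d}  {status}  {path}")
```

A cluster of 5xx statuses on one path points to the app, while an empty log during a failed health check suggests gunicorn never started (see `error.log`).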

## Git commands

```bash
# Status
git status

# Show changes
git diff

# Commit and push
git add .
git commit -m "Description"
git push origin main

# Fetch from GitHub (if someone else has pushed)
git pull origin main
```

---

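# Part G: Troubleshooting

## Deploy fails

1. **Check the GitHub Actions tab** - read the error message
2. **Test SSH** with the deploy key (it lives on the server as `~/.ssh/github_deploy_key`; to test from your machine, first save the content of `SSH_PRIVATE_KEY` to that path locally and `chmod 600` it):
   ```bash
   ssh -i ~/.ssh/github_deploy_key ubuntu@185.xxx.xxx.xxx
   ```
3. **Check the secrets** - are all 3 secrets correct?

## App does not start

```bash
# On the server: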
sudo systemctl status ollama-webapp -l
cat /opt/ollama-webapp/logs/error.log
```

## Ollama not reachable

```bash
# Check the status
sudo systemctl status ollama

# Restart
sudo systemctl restart ollama

# Logs
sudo journalctl -u ollama -f
```

---

# Checklist

## GitHub setup
- [ ] Repository cloned
- [ ] Cursor connected to the repo
- [ ] `requirements.txt` created
- [ ] `.gitignore` created
- [ ] `.github/workflows/deploy.yml` created
- [ ] GitHub secrets configured:
  - [ ] `SERVER_HOST`
  - [ ] `SERVER_USER`
  - [ ] `SSH_PRIVATE_KEY`

## Server setup
- [ ] GPU instance created
- [ ] Floating IP assigned
- [ ] NVIDIA driver installed
- [ ] Ollama installed and configured
- [ ] Models downloaded
- [ ] Python venv created
- [ ] Deploy SSH key created
- [ ] systemd service created
- [ ] Sudo rights configured

## First deploy
- [ ] Code committed and pushed
- [ ] GitHub Actions run succeeded
- [ ] Health check passes
- [ ] App reachable in the browser

---

# URLs

| Service | URL |
|---------|-----|
| App | `http://YOUR-IP:5000` |
| Health Check | `http://YOUR-IP:5000/api/health` |
| Ollama Status | `http://YOUR-IP:5000/api/ollama/status` |
| GitHub Repo | `https://github.com/valueonag/private-llm` |
| GitHub Actions | `https://github.com/valueonag/private-llm/actions` |