service-llm-private/docu/setupserver.md

# Local LLM Server - Complete Setup Guide

From GitHub repository setup and Cursor to automated deployment on Infomaniak.

---

## Access

## Server Data

- **IP:** `83.228.200.109`
- **Instance:** `local-llm`, Ubuntu 24.04 LTS Noble Numbat, `83.228.226.58`, `2001:1600:16:10::7e3`, flavor `nvl4-a8-ram16-disk0`, key pair `ollama-deploy-key`, status Active, availability zone `az-1`
- **Connect:** `ssh -i "C:\Users\pmots\Downloads\ollama-deploy-key.pem" ubuntu@83.228.200.109`


## Overview

```
┌─────────────────┐                    ┌─────────────────┐
│     Cursor      │◄──── sync ────────▶│     GitHub      │
│  (local dev)    │                    │   private-llm   │
└─────────────────┘                    └────────┬────────┘
                                                │
                                                │ Push to main
                                                ▼
                                       ┌─────────────────┐
                                       │  GitHub Actions │
                                       └────────┬────────┘
                                                │
                                                │ SSH Deploy
                                                ▼
                                       ┌─────────────────┐
                                       │ Infomaniak GPU  │
                                       │ 83.228.200.109  │
                                       │  ┌───────────┐  │
                                       │  │  Ollama   │  │
                                       │  │  + Flask  │  │
                                       │  │  (LLM +   │  │
                                       │  │  Vision)  │  │
                                       │  └───────────┘  │
                                       └─────────────────┘
```

---

## Final Configuration

| Component | Value |
|------------|------|
| **Server IP** | `83.228.200.109` |
| **GPU** | NVIDIA L4 (24GB VRAM) |
| **OS** | Ubuntu 24.04 LTS |
| **SSH User** | `ubuntu` |
| **App Port** | `8000` (HTTPS) |
| **Ollama Port** | `11434` |
| **GitHub Repo** | `https://git.poweron.swiss` (Forgejo) |

### Installed Models

| Model | Ollama Name | Use |
|--------|-------------|------------|
| `poweron-text-general` | `qwen2.5:7b` | Text neutralization |
| `poweron-vision-general` | `qwen2.5vl:7b` | Handwriting, documents |
| `poweron-vision-deep` | `granite3.2-vision` | Invoices, receipts |
| `poweron-embed` | `mxbai-embed-large` | Embedding (1024 dim, RAG failover) |
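
The aliases in the table could be resolved to Ollama model names with a simple lookup. The following is a minimal sketch only: `MODEL_ALIASES` and `resolve_model` are hypothetical names and are not taken from `app.py`; the mapping itself mirrors the table.

```python
# Hypothetical alias table mirroring the "Installed Models" section.
MODEL_ALIASES = {
    "poweron-text-general": "qwen2.5:7b",
    "poweron-vision-general": "qwen2.5vl:7b",
    "poweron-vision-deep": "granite3.2-vision",
    "poweron-embed": "mxbai-embed-large",
}


def resolve_model(alias: str) -> str:
    """Return the Ollama model name for a PowerOn alias."""
    try:
        return MODEL_ALIASES[alias]
    except KeyError:
        known = ", ".join(sorted(MODEL_ALIASES))
        raise ValueError(
            f"Unknown model alias {alias!r}; known aliases: {known}"
        ) from None


if __name__ == "__main__":
    print(resolve_model("poweron-vision-deep"))
```

Keeping the mapping in one place means a model can be swapped on the server without changing the alias that clients use.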

### URLs

| Service | URL |
|---------|-----|
| **App** | https://llm.poweron.swiss:8000 |
| **Health Check** | https://llm.poweron.swiss:8000/api/health |
| **Ollama Status** | https://llm.poweron.swiss:8000/api/ollama/status |
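
To check these endpoints from a script rather than a browser, something like the following sketch could be used. It relies only on the Python standard library; the base URL and paths come from the table, while `endpoint_url` and `http_status` are illustrative helper names.

```python
import urllib.error
import urllib.request

BASE_URL = "https://llm.poweron.swiss:8000"  # from the URL table
ENDPOINTS = {"health": "/api/health", "ollama": "/api/ollama/status"}


def endpoint_url(name: str, base_url: str = BASE_URL) -> str:
    """Build the full URL for one of the documented endpoints."""
    return base_url.rstrip("/") + ENDPOINTS[name]


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return 0  # DNS failure, refused connection, TLS error or timeout


if __name__ == "__main__":
    for name in ENDPOINTS:
        url = endpoint_url(name)
        print(f"{name}: {url} -> HTTP {http_status(url)}")
```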

---

# Part A: GitHub Repository Setup

## A.1 Clone the Repository in Cursor

```bash
cd ~/Projects
git clone https://github.com/valueonag/private-llm.git
cd private-llm
cursor .
```

## A.2 Project Structure

```
private-llm/
├── app.py                      # Flask App
├── requirements.txt            # Python Dependencies
├── templates/
│   └── index.html              # Frontend Template
├── static/                     # CSS, JS, images (optional)
├── .github/
│   └── workflows/
│       └── deploy.yml          # CI/CD Pipeline
├── .gitignore
└── README.md
```

## A.3 Important Files

### requirements.txt

```txt
flask>=3.0.0
flask-cors>=4.0.0
requests>=2.31.0
pymupdf>=1.24.0
gunicorn>=21.0.0
```

### .gitignore

```gitignore
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
venv/
env/
.venv/

# IDE
.idea/
.vscode/
*.swp
*.swo
.cursor/

# OS
.DS_Store
Thumbs.db

# Logs
*.log
logs/

# Environment
.env
.env.local

# Test
.pytest_cache/
.coverage
htmlcov/
```

---

# Part B: GitHub Actions Deploy Workflow

## B.1 Set Up the GitHub Secret

1. Go to: `https://github.com/valueonag/private-llm/settings/secrets/actions`
2. Click **New repository secret**
3. Name: `SSH_PRIVATE_KEY`
4. Value: contents of the `ollama-deploy-key.pem` file

**This is the only secret required.**

## B.2 Deploy Workflow (.github/workflows/deploy.yml)

```yaml
name: Deploy to Infomaniak

on:
  push:
    branches:
      - main
  workflow_dispatch:

env:
  APP_DIR: /opt/ollama-webapp
  SERVICE_NAME: ollama-webapp
  SERVER_HOST: 83.228.200.109
  SERVER_USER: ubuntu

jobs:
  deploy:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup SSH
        run: |
          mkdir -p ~/.ssh
          echo "${{ secrets.SSH_PRIVATE_KEY }}" > ~/.ssh/deploy_key
          chmod 600 ~/.ssh/deploy_key
          ssh-keyscan -H ${{ env.SERVER_HOST }} >> ~/.ssh/known_hosts

      - name: Deploy files to server
        run: |
          rsync -avz --delete \
            -e "ssh -i ~/.ssh/deploy_key -o StrictHostKeyChecking=no" \
            --exclude '.git' \
            --exclude '.github' \
            --exclude '__pycache__' \
            --exclude '*.pyc' \
            --exclude 'venv' \
            --exclude '.env' \
            --exclude 'logs' \
            ./ ${{ env.SERVER_USER }}@${{ env.SERVER_HOST }}:${{ env.APP_DIR }}/app/

      - name: Install dependencies and restart service
        run: |
          ssh -i ~/.ssh/deploy_key -o StrictHostKeyChecking=no \
            ${{ env.SERVER_USER }}@${{ env.SERVER_HOST }} << 'ENDSSH'

            echo "Installing dependencies..."
            cd /opt/ollama-webapp
            ./venv/bin/pip install -r app/requirements.txt --quiet --upgrade

            echo "Restarting service..."
            sudo systemctl restart ollama-webapp

            echo "Waiting for service to start..."
            sleep 5

            echo "Service status:"
            sudo systemctl status ollama-webapp --no-pager -l

            echo "Deployment complete!"
          ENDSSH

      - name: Health Check
        run: |
          echo "Running health check..."
          sleep 3

          # The service listens on port 8000 with TLS, so use the certificate's hostname.
          # curl already prints 000 via -w when the request fails.
          HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
            https://llm.poweron.swiss:8000/api/health || true)

          if [ "$HTTP_STATUS" = "200" ]; then
            echo "Health check passed! (HTTP $HTTP_STATUS)"
          else
            echo "Health check failed! (HTTP $HTTP_STATUS)"
            exit 1
          fi

      - name: Deployment Summary
        if: success()
        run: |
          echo "Deployment successful!"
          echo ""
          echo "App URL: https://llm.poweron.swiss:8000"
          echo "Health:  https://llm.poweron.swiss:8000/api/health"
```
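
The workflow waits a fixed number of seconds before its health check. A polling loop is a common alternative when start-up time varies. The sketch below is one possible shape, not part of the actual pipeline: `wait_until_healthy` is a hypothetical helper, and the URL in the example is taken from the "URLs" table.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    attempts: int = 10,
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() until it returns True or the attempts are used up."""
    for attempt in range(1, attempts + 1):
        try:
            if probe():
                return True
        except OSError:
            pass  # treat connection errors like a failed probe
        if attempt < attempts:
            sleep(delay_seconds)
    return False


if __name__ == "__main__":
    import sys
    import urllib.request

    def probe() -> bool:
        # URL from the "URLs" table; adjust it if the service listens elsewhere.
        url = "https://llm.poweron.swiss:8000/api/health"
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200

    sys.exit(0 if wait_until_healthy(probe) else 1)
```

Passing the probe in as a callable keeps the retry logic independent of how the service is reached, which also makes it easy to test without a network.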

---

# Part C: Infomaniak Server Setup

## C.1 Horizon Dashboard Login

1. Open: https://api.pub1.infomaniak.cloud/horizon
2. Enter the login credentials:

| Field | Value |
|------|------|
| **Domain** | `PCU-MPXPVCR` |
| **User Name** | `PCU-MPXPVCR` |
| **Password** | Your OpenStack password |

---

## C.2 Create an SSH Key Pair

1. Go to: **Compute → Key Pairs**
2. Click: **Create Key Pair**
3. Fill in:

| Field | Value |
|------|------|
| **Key Pair Name** | `ollama-deploy-key` |
| **Key Type** | `SSH Key` |

4. Click **Create Key Pair**
5. **IMPORTANT:** The `.pem` file is downloaded automatically. Store it in a safe place!

---

## C.3 Create a Security Group

1. Go to: **Network → Security Groups**
2. Click: **Create Security Group**
3. Fill in:

| Field | Value |
|------|------|
| **Name** | `ollama-webapp` |
| **Description** | `Ports for Ollama and Flask app` |

4. Click **Create Security Group**
5. In the list, click **Manage Rules** next to `ollama-webapp`
6. Click **Add Rule** and create these rules:

| Rule | Direction | Ether Type | IP Protocol | Port Range | CIDR |
|------|-----------|------------|-------------|------------|------|
| 1 | Ingress | IPv4 | TCP | 22 | `0.0.0.0/0` |
| 2 | Ingress | IPv4 | TCP | 80 | `0.0.0.0/0` |
| 3 | Ingress | IPv4 | TCP | 443 | `0.0.0.0/0` |
| 4 | Ingress | IPv4 | TCP | 8000 | `0.0.0.0/0` |
| 5 | Ingress | IPv4 | TCP | 11434 | `0.0.0.0/0` |

**For each rule:**
- Click **Add Rule**
- Direction: `Ingress`
- Under "Rule", select: `Custom TCP Rule`
- Port: enter the respective port number
- CIDR: `0.0.0.0/0`
- Click **Add**
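
Once the instance is running, the rules can be checked from an outside machine by attempting a TCP connection to each port. This is a rough sketch using only the standard library; `port_is_open` and `check_ports` are illustrative names, and the host and ports are passed on the command line.

```python
import socket
import sys


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, filtered, unreachable or timed out


def check_ports(host: str, ports: list[int]) -> dict[int, bool]:
    """Map each port to whether it accepted a connection."""
    return {port: port_is_open(host, port) for port in ports}


if __name__ == "__main__":
    # Example: python check_ports.py 83.228.200.109 22 80 443
    host, *port_args = sys.argv[1:]
    for port, is_open in check_ports(host, [int(p) for p in port_args]).items():
        print(f"{host}:{port} {'open' if is_open else 'closed'}")
```

A port is only reported as open if a service is actually listening on it, so a "closed" result can mean either a missing rule or a stopped service.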

---

## C.4 Create a Private Network

Because Infomaniak provides only external networks, a private network has to be created.

### Step 1: Create the Network

1. **Network → Networks → Create Network**

| Tab | Field | Value |
|-----|------|------|
| **Network** | Network Name | `private-network` |
| | Enable Admin State | Yes |
| | Create Subnet | Yes |
| **Subnet** | Subnet Name | `private-subnet` |
| | Network Address | `192.168.100.0/24` |
| | IP Version | `IPv4` |
| | Gateway IP | `192.168.100.1` |
| **Subnet Details** | Enable DHCP | Yes |
| | DNS Name Servers | `8.8.8.8` |

2. Click **Create**
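
The subnet values can be sanity-checked with Python's `ipaddress` module before entering them in Horizon. This is a small sketch; `describe_subnet` is an illustrative helper, and the CIDR and gateway are the values from the table.

```python
import ipaddress


def describe_subnet(cidr: str, gateway_ip: str) -> dict:
    """Summarize a subnet and check that the gateway lies inside it."""
    network = ipaddress.ip_network(cidr)
    gateway = ipaddress.ip_address(gateway_ip)
    hosts = list(network.hosts())  # excludes network and broadcast address
    return {
        "gateway_in_subnet": gateway in network,
        "usable_addresses": len(hosts),
        "first_host": str(hosts[0]),
        "last_host": str(hosts[-1]),
    }


if __name__ == "__main__":
    for key, value in describe_subnet("192.168.100.0/24", "192.168.100.1").items():
        print(f"{key}: {value}")
```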

### Step 2: Create the Router

1. **Network → Routers → Create Router**

| Field | Value |
|------|------|
| **Router Name** | `main-router` |
| **Enable Admin State** | Yes |
| **External Network** | `ext-floating1` |

2. Click **Create Router**

### Step 3: Connect the Router to the Subnet

1. Click `main-router`
2. Go to the tab: **Interfaces**
3. Click: **Add Interface**

| Field | Value |
|------|------|
| **Subnet** | `private-subnet` |

4. Click **Submit**

---

## C.5 Create the GPU Instance

1. Go to: **Compute → Instances**
2. Click: **Launch Instance**

### Tab 1: Details

| Field | Value |
|------|------|
| **Instance Name** | `local-llm` |
| **Description** | `Local LLM Server` |
| **Availability Zone** | `nova` (leave the default) |
| **Count** | `1` |

→ Click **Next**

### Tab 2: Source

| Field | Value |
|------|------|
| **Select Boot Source** | `Image` |
| **Create New Volume** | `Yes` |
| **Volume Size (GB)** | `150` |
| **Delete Volume on Instance Delete** | `No` |

**Select the image:**
1. In the lower "Available" list, search for: `Ubuntu 24.04 LTS Noble Numbat`
2. Click the **up arrow** to its right
3. The image then appears at the top under "Allocated"

→ Click **Next**

### Tab 3: Flavor

**Select the GPU flavor:**

| Flavor | GPU | VRAM | vCPUs | RAM |
|--------|-----|------|-------|-----|
| `nvl4-a8-ram16-disk0` | L4 | 24GB | 8 | 16GB |

1. Search for `nvl4` in the list
2. Select `nvl4-a8-ram16-disk0` (or similar)
3. Click the **up arrow**

→ Click **Next**

### Tab 4: Networks

| Field | Value |
|------|------|
| **Network** | `private-network` |

**IMPORTANT:** Select `private-network`, NOT `ext-net1`!

1. In the "Available" list, find: `private-network`
2. Click the **up arrow**

→ Click **Next**

### Tab 5: Network Ports

Skip this tab (leave it empty).

→ Click **Next**

### Tab 6: Security Groups

1. If `default` is listed under "Allocated", click the **down arrow** to remove it
2. In "Available", find: `ollama-webapp`
3. Click the **up arrow**
4. Only `ollama-webapp` should be listed under "Allocated"

→ Click **Next**

### Tab 7: Key Pair

1. In "Available", find: `ollama-deploy-key`
2. Click the **up arrow**
3. "Allocated" now lists: `ollama-deploy-key`

→ Click **Next**

### Tab 8-11: Optional

Skip these tabs (leave them empty).

### Launch the Instance

Click: **Launch Instance**

Wait until the status changes from `Build` to `Active` (about 1-3 minutes).

---

## C.6 Assign a Floating IP

1. Go to: **Network → Floating IPs**
2. Click: **Allocate IP to Project**
3. Select:

| Field | Value |
|------|------|
| **Pool** | `ext-floating1` |
| **Description** | `IP for Local LLM Server` |

4. Click **Allocate IP**
5. In the list, click **Associate** next to the new IP
6. Select your instance: `local-llm`
7. Click **Associate**

**Note the IP address** (e.g. `83.228.200.109`)

---

## C.7 Instance Summary

| Component | Value |
|------------|------|
| **Instance Name** | `local-llm` |
| **Description** | `Local LLM Server` |
| **Image** | Ubuntu 24.04 LTS |
| **Flavor** | `nvl4-a8-ram16-disk0` (NVIDIA L4) |
| **Disk** | 150 GB |
| **Network** | `private-network` |
| **Security Group** | `ollama-webapp` |
| **Key Pair** | `ollama-deploy-key` |
| **Floating IP** | `83.228.200.109` |

---

# Part D: Server Configuration

## D.1 SSH Connection

**Windows PowerShell:**

```powershell
ssh -i "C:\Users\pmots\Downloads\ollama-deploy-key.pem" ubuntu@83.228.200.109
```

**Mac/Linux:**

```bash
chmod 400 ~/Downloads/ollama-deploy-key.pem
ssh -i ~/Downloads/ollama-deploy-key.pem ubuntu@83.228.200.109
```
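
OpenSSH refuses private keys that are readable by other users, which is why the key is set to mode `400` above. The same check and fix can be scripted; the sketch below assumes a POSIX system, and `key_permissions_ok` and `restrict_key` are illustrative names.

```python
import os
import stat


def key_permissions_ok(path: str) -> bool:
    """True if neither group nor others have any access to the key file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def restrict_key(path: str) -> None:
    """Equivalent of `chmod 400`: read-only for the owner, nothing else."""
    os.chmod(path, stat.S_IRUSR)


if __name__ == "__main__":
    key_path = os.path.expanduser("~/Downloads/ollama-deploy-key.pem")
    if not key_permissions_ok(key_path):
        restrict_key(key_path)
    print(f"{key_path}: permissions ok = {key_permissions_ok(key_path)}")
```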

---

## D.2 Prepare the System

```bash
# In case dpkg errors occur
sudo dpkg --configure -a

# Update the system
sudo apt update
sudo apt upgrade -y
```

---

## D.3 Check the GPU

```bash
nvidia-smi
```

The output should show an NVIDIA L4 with 24GB VRAM (drivers are preinstalled).

---

## D.4 Install Ollama

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

### Configure Ollama for Network Access

```bash
sudo systemctl edit ollama
```

Insert the following (between the comments):

```ini
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_ORIGINS=*"
Environment="OLLAMA_NUM_PARALLEL=4"
Environment="OLLAMA_MAX_LOADED_MODELS=2"
```

Save: `Ctrl+X`, then `Y`, then `Enter`

```bash
sudo systemctl restart ollama
sudo systemctl enable ollama
```

---

## D.5 Download the Models

```bash
# Text neutralization
ollama pull qwen2.5:7b

# Vision: handwriting, documents
ollama pull qwen2.5vl:7b

# Vision: invoices, receipts
ollama pull granite3.2-vision

# Embedding: RAG multi-provider failover (1024 dim)
ollama pull mxbai-embed-large
```

### Verify the Models

```bash
ollama list
```
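
The same check can be automated against Ollama's HTTP API, whose `/api/tags` endpoint lists the locally installed models. The following is a sketch under the assumption that Ollama is reachable on `localhost:11434`; the helper names are illustrative, and the required list mirrors the pull commands above.

```python
import json
import urllib.request

REQUIRED_MODELS = [
    "qwen2.5:7b",
    "qwen2.5vl:7b",
    "granite3.2-vision",
    "mxbai-embed-large",
]


def with_default_tag(name: str) -> str:
    """Ollama lists untagged models with the ':latest' tag."""
    return name if ":" in name else f"{name}:latest"


def installed_models(tags_payload: dict) -> set[str]:
    """Extract the model names from an /api/tags response."""
    return {model["name"] for model in tags_payload.get("models", [])}


def missing_models(required: list[str], installed: set[str]) -> list[str]:
    """Return the required models that are not installed, in order."""
    return [name for name in required if with_default_tag(name) not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Query the local Ollama instance for its installed models."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    missing = missing_models(REQUIRED_MODELS, installed_models(fetch_tags()))
    print("All models installed" if not missing else f"Missing: {missing}")
```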

---

## D.6 Set Up the Python Environment

```bash
# Install packages
sudo apt install -y python3-pip python3-venv python3.12-venv git

# Create the app directory
sudo mkdir -p /opt/ollama-webapp/{app,venv,logs}
sudo chown -R ubuntu:ubuntu /opt/ollama-webapp

# Create the virtual environment
python3 -m venv /opt/ollama-webapp/venv

# Install base packages (requirements.txt must already exist in app/, e.g. after the first deploy)
/opt/ollama-webapp/venv/bin/pip install --upgrade pip
/opt/ollama-webapp/venv/bin/pip install -r /opt/ollama-webapp/app/requirements.txt
```

---

## D.7 Create the Systemd Service

```bash
sudo nano /etc/systemd/system/ollama-webapp.service
```

Contents:

```ini
[Unit]
Description=PowerOn Private-LLM Service
After=network.target

[Service]
Type=simple
User=ubuntu
WorkingDirectory=/opt/ollama-webapp/app
ExecStart=/opt/ollama-webapp/venv/bin/uvicorn app:app --host 0.0.0.0 --port 8000 --ssl-keyfile /etc/letsencrypt/live/llm.poweron.swiss/privkey.pem --ssl-certfile /etc/letsencrypt/live/llm.poweron.swiss/fullchain.pem
Restart=always
RestartSec=5
Environment=PYTHONUNBUFFERED=1

[Install]
WantedBy=multi-user.target
```

Save: `Ctrl+X`, then `Y`, then `Enter`

```bash
sudo systemctl daemon-reload
sudo systemctl enable ollama-webapp
```

---

## D.8 Sudo Permissions for GitHub Actions

```bash
sudo visudo
```

Append at the end:

```
ubuntu ALL=(ALL) NOPASSWD: /bin/systemctl restart ollama-webapp
ubuntu ALL=(ALL) NOPASSWD: /bin/systemctl status ollama-webapp
ubuntu ALL=(ALL) NOPASSWD: /bin/systemctl stop ollama-webapp
ubuntu ALL=(ALL) NOPASSWD: /bin/systemctl start ollama-webapp
```

Save: `Ctrl+X`, then `Y`, then `Enter`

---

# Part E: Development Workflow

## E.1 Local Development

```bash
# In Cursor: open a terminal
cd ~/Projects/private-llm

# Virtual environment (one-time)
python3 -m venv venv
source venv/bin/activate  # Mac/Linux
# or: .\venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# Start locally
python app.py
```

## E.2 Deployment (automatic)

```bash
# Commit the changes
git add .
git commit -m "Description of the change"
git push origin main

# GitHub Actions deploys automatically!
```

## E.3 Check the Deployment Status

1. Go to: https://github.com/valueonag/private-llm/actions
2. Click the latest workflow run
3. Green = success, red = failure

---

# Part F: Server Commands

## SSH Connection

```bash
ssh -i "C:\Users\pmots\Downloads\ollama-deploy-key.pem" ubuntu@83.228.200.109
```

## Service Management

```bash
# Check status
sudo systemctl status ollama-webapp
sudo systemctl status ollama

# Restart
sudo systemctl restart ollama-webapp
sudo systemctl restart ollama

# View logs
tail -f /opt/ollama-webapp/logs/access.log
tail -f /opt/ollama-webapp/logs/error.log
sudo journalctl -u ollama-webapp -f
sudo journalctl -u ollama -f
```

## GPU Status

```bash
nvidia-smi
```

## Ollama Models

```bash
# List
ollama list

# Add a new model
ollama pull <model-name>

# Remove a model
ollama rm <model-name>

# Test a model
ollama run granite3.2-vision "Describe this image"
```

---

# Part G: Troubleshooting

## App Not Reachable

```bash
# Check the service status
sudo systemctl status ollama-webapp

# Check the logs
sudo journalctl -u ollama-webapp -n 50

# Check the port (ss ships with Ubuntu 24.04; netstat requires the net-tools package)
sudo ss -tlnp | grep 8000
```

## Ollama Not Reachable

```bash
# Check status
sudo systemctl status ollama

# Restart
sudo systemctl restart ollama

# Check the logs
sudo journalctl -u ollama -f
```

## GitHub Actions Failed

1. Go to: https://github.com/valueonag/private-llm/actions
2. Click the failed run
3. Click the failed step
4. Read the error message

**Common problems:**
- Wrong SSH key: check the `SSH_PRIVATE_KEY` secret
- Server not reachable: check the floating IP and the security group
- Syntax error in the code: test locally before pushing

## GPU Not Detected

```bash
# Check the driver
nvidia-smi

# If it is missing
sudo apt install -y nvidia-driver-550
sudo reboot
```

---

# Checklist

## One-Time Setup

- [x] Infomaniak Public Cloud account
- [x] GPU quota enabled
- [x] SSH key pair created (`ollama-deploy-key`)
- [x] Security group created (`ollama-webapp`)
- [x] Private network created (`private-network`)
- [x] Router created (`main-router`)
- [x] GPU instance created (`local-llm`)
- [x] Floating IP assigned (`83.228.200.109`)
- [x] Ollama installed and configured
- [x] Models downloaded
- [x] Python environment set up
- [x] Systemd service created
- [x] GitHub secret configured (`SSH_PRIVATE_KEY`)
- [x] GitHub Actions workflow created

## On Every Deploy

- [ ] Change the code
- [ ] `git add .`
- [ ] `git commit -m "Description"`
- [ ] `git push origin main`
- [ ] Check GitHub Actions
- [ ] Test the app

---

# Costs

## Infomaniak Public Cloud (estimated)

| Component | Approx. price |
|------------|-----------|
| GPU L4 instance (24/7) | ~CHF 580/month |
| GPU L4 instance (8h/day, Mon-Fri) | ~CHF 140/month |
| Block Storage 150GB | ~CHF 15/month |
| Floating IP | ~CHF 3/month |
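
The two GPU rows are consistent with a price that scales linearly with running hours. The sketch below reproduces that arithmetic, assuming hourly billing and an average month of 730 hours (both are assumptions, not confirmed pricing rules); the function names are illustrative.

```python
HOURS_PER_FULL_MONTH = 24 * 365 / 12  # average month: 730 hours


def monthly_hours(hours_per_day: float, days_per_week: float) -> float:
    """Average running hours per month for a weekly schedule."""
    return hours_per_day * days_per_week * 52 / 12


def part_time_cost(
    full_month_price: float, hours_per_day: float, days_per_week: float
) -> float:
    """Scale the 24/7 price by the share of hours the instance runs."""
    share = monthly_hours(hours_per_day, days_per_week) / HOURS_PER_FULL_MONTH
    return full_month_price * share


if __name__ == "__main__":
    # 8 hours per day, Monday to Friday, at the estimated 24/7 price of CHF 580
    print(f"~CHF {part_time_cost(580, 8, 5):.0f}/month")
```

With these assumptions the 8h/day schedule comes to roughly 173 hours per month, or about 24% of the full-time price, which matches the estimate in the table.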

**Tip:** Stop the instance when it is not needed!

```bash
# In the Horizon dashboard: Compute → Instances → Shut Off Instance
# Or via the CLI
openstack server stop local-llm
```