Merge branch 'main' of https://github.com/valueonag/wiki

2026-05-06 23:28:14 +02:00 · 2026-05-06 23:28:14 +02:00 · 7af8751aa9
commit 7af8751aa9
parent 5f3a444c88 980d505a41
3 changed files with 576 additions and 27 deletions
--- a/c-work/1-plan/2026-04-formgenerator-grouping.md
+++ b/c-work/1-plan/2026-04-formgenerator-grouping.md
@@ -0,0 +1,460 @@
+<!-- status: plan -->
+<!-- started: 2026-04-29 -->
+<!-- component: frontend-nyla | gateway -->
+
+# FormGenerator: Persistent User Grouping
+
+## Description and Context
+
+The `FormGeneratorTable` is used on many pages of the platform. Users should be able to organize entries into named, nested (recursive) groups, with persistent storage and full compatibility with pagination, search, filters and all action buttons.
+
+**Core principle: grouping is a built-in feature of `PaginationParams` and `PaginatedResponse`. There is no separate call, no dedicated route and no dedicated API module. The existing `refetch()` mechanism is the only transport.**
+
+---
+
+## Architecture Core: How It Works
+
+### Grouping rides on the existing pagination call
+
+`PaginationParams` (the JSON parameter of every list endpoint) gets two new optional fields:
+
+```
+saveGroupTree  → if set: the backend saves this tree BEFORE the fetch
+groupId        → if set: the backend restricts the items to those of this group
+```
+
+`PaginatedResponse` gets one new optional field:
+
+```
+groupTree  → the user's current group tree for this endpoint (always included)
+```
+
+**A single call therefore does three things at once:**
+1. Saves the new group tree (if `saveGroupTree` is set)
+2. Filters to one group (if `groupId` is set)
+3. Returns the current items plus the current group tree
+
+### End-to-end flow
+
+```
+Page load (first load):
+  GET /api/connections/?pagination={"page":1,"pageSize":20}
+  ←  { items: [...], pagination: {...}, groupTree: [{id, name, itemIds, subGroups}] }
+
+User creates a group (visible locally at once, then a debounced save via refetch):
+  GET /api/connections/?pagination={"page":1,"pageSize":20,"saveGroupTree":[{newTree}]}
+  ←  { items: [...], pagination: {...}, groupTree: [{newTree, confirmed by the backend}] }
+
+User enters group "Customers" (id: "g1"):
+  GET /api/connections/?pagination={"page":1,"pageSize":20,"groupId":"g1"}
+  ←  { items: [group items only], pagination: {totalItems: 3, ...}, groupTree: [...] }
+  → search, filters, sorting, mode=ids, mode=filterValues: everything works
+     within the group scope, because the backend knows the IN list
+```
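The three requests above differ only in the JSON carried by the `pagination` query parameter. Below is a minimal Python sketch of how a client or a test could build them. It assumes the parameter is sent as URL-encoded JSON, as shown. `buildListUrl` is a hypothetical helper and is not part of the gateway.

```python
import json
from typing import Any, Dict, List, Optional
from urllib.parse import parse_qs, urlencode, urlparse


def buildListUrl(
    basePath: str,
    page: int,
    pageSize: int,
    groupId: Optional[str] = None,
    saveGroupTree: Optional[List[Dict[str, Any]]] = None,
) -> str:
    """Build a list URL whose `pagination` JSON optionally carries the grouping fields."""
    params: Dict[str, Any] = {"page": page, "pageSize": pageSize}
    if groupId is not None:
        params["groupId"] = groupId              # scope: only items of this group
    if saveGroupTree is not None:
        params["saveGroupTree"] = saveGroupTree  # persist this tree before fetching
    # Compact JSON keeps the query short; urlencode percent-encodes it.
    query = urlencode({"pagination": json.dumps(params, separators=(",", ":"))})
    return f"{basePath}?{query}"


# Decoding the query shows what the backend receives for "enter group g1".
url = buildListUrl("/api/connections/", 1, 20, groupId="g1")
received = json.loads(parse_qs(urlparse(url).query)["pagination"][0])
print(received)  # {'page': 1, 'pageSize': 20, 'groupId': 'g1'}
```

All three requests hit the same endpoint. The only server-side prerequisite is that `PaginationParams` accepts the two extra keys.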
+
+### Backend: two helper calls and one response field per route
+
+The entire grouping mechanism is implemented as a shared helper in `routeHelpers.py`. Every route that should support grouping calls it:
+
+```python
+# At the start of the route function (BEFORE the items are built):
+groupCtx = handleGroupingInRequest(paginationParams, interface, "connections")
+# → saves saveGroupTree if present
+# → returns the group's itemIds if groupId is set
+
+# Build items (unchanged)...
+
+# If a group scope is active: restrict the items to the group
+items = applyGroupScopeFilter(items, groupCtx.itemIds)
+
+# At the end: embed groupTree in the response
+return {**result, "groupTree": groupCtx.groupTree}
+```
+
+**No new route file. No new interface file. No new URL.**
+
+---
+
+## Affected Modules
+
+- **Gateway:**
+  - `modules/datamodels/datamodelPagination.py`: `PaginationParams` gains `groupId` and `saveGroupTree`; `PaginatedResponse` gains `groupTree`; new classes `TableGroupNode` + `TableGrouping` in **this file**
+  - `modules/interfaces/interfaceDbApp.py`: extend `AppObjects` with `getTableGrouping(contextKey)` + `upsertTableGrouping(contextKey, rootGroups)`; new table `table_groupings` in `poweron_app` (auto-created)
+  - `modules/routes/routeHelpers.py`: add `handleGroupingInRequest(paginationParams, interface, contextKey)` + `applyGroupScopeFilter(items, itemIds)`
+  - Every list route that should support grouping: **2 helper calls** (`handleGroupingInRequest` at the start, `applyGroupScopeFilter` before pagination) + **1 field** in the response (`groupTree`)
+
+- **Frontend:**
+  - `FormGeneratorTable.tsx`: `groupingConfig` prop; internal grouping state; uses `hookData.refetch()` as the only transport
+  - `FormGeneratorControls.tsx`: group toolbar button
+  - `FormGenerator/GroupingManager/GroupRow.tsx`: group header row (new component)
+  - `FormGenerator/GroupingManager/GroupingManager.tsx`: side panel (new component)
+  - **No new hook, no new API module, no changes to existing feature hooks**
+
+- **DB migration:** No (auto-create via DatabaseConnector)
+
+---
+
+## Data Model
+
+### Additions in `datamodelPagination.py`
+
+```python
+from typing import Any, Dict, Generic, List, Optional
+
+from pydantic import BaseModel, ConfigDict, Field
+
+# SortField, PaginationMetadata and T = TypeVar("T") already exist in this module.
+
+# --- Grouping models (new, in the same file) ---
+
+class TableGroupNode(BaseModel):
+    id: str
+    name: str
+    itemIds: List[str] = Field(default_factory=list)
+    subGroups: List['TableGroupNode'] = Field(default_factory=list)  # recursive
+    order: int = 0
+    isExpanded: bool = True
+
+TableGroupNode.model_rebuild()  # resolve the self-reference
+
+class TableGrouping(BaseModel):
+    """DB table table_groupings in poweron_app."""
+    id: str
+    userId: str
+    contextKey: str   # derived from the route prefix, e.g. "connections", "prompts", "admin/users"
+    rootGroups: List[TableGroupNode] = Field(default_factory=list)
+    updatedAt: Optional[float] = None
+
+
+# --- PaginationParams extension (2 new optional fields) ---
+
+class PaginationParams(BaseModel):
+    page: int = Field(ge=1)
+    pageSize: int = Field(ge=1, le=1000)
+    sort: List[SortField] = Field(default_factory=list)
+    filters: Optional[Dict[str, Any]] = None
+    # NEW:
+    groupId: Optional[str] = None                          # scope: only items of this group
+    saveGroupTree: Optional[List[Dict[str, Any]]] = None   # persist: store this tree
+
+
+# --- PaginatedResponse extension (1 new optional field) ---
+
+class PaginatedResponse(BaseModel, Generic[T]):
+    items: List[T]
+    pagination: Optional[PaginationMetadata]
+    groupTree: Optional[List[TableGroupNode]] = None   # NEW: always included when available
+
+    model_config = ConfigDict(arbitrary_types_allowed=True)
+```
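`saveGroupTree` arrives as a plain `List[Dict[str, Any]]`, so the backend still has to validate it before persisting it. Below is a stdlib-only sketch of that step. It uses a dataclass that mirrors `TableGroupNode`; in the gateway, `TableGroupNode.model_validate` would do the parsing. Two parts are assumptions rather than part of the plan: the duplicate-id check, and `treeDepth`, which would matter for open question 4 (maximum depth).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class GroupNode:
    """Stdlib mirror of TableGroupNode (same field names and defaults)."""
    id: str
    name: str
    itemIds: List[str] = field(default_factory=list)
    subGroups: List["GroupNode"] = field(default_factory=list)
    order: int = 0
    isExpanded: bool = True


def parseGroupTree(raw: List[Dict[str, Any]]) -> List[GroupNode]:
    """Convert saveGroupTree dicts into nodes, rejecting duplicate group ids."""
    seen: Set[str] = set()

    def parse(d: Dict[str, Any]) -> GroupNode:
        gid = str(d["id"])
        if gid in seen:
            raise ValueError(f"duplicate group id: {gid}")
        seen.add(gid)
        return GroupNode(
            id=gid,
            name=str(d["name"]),
            itemIds=[str(i) for i in d.get("itemIds", [])],
            subGroups=[parse(s) for s in d.get("subGroups", [])],
            order=int(d.get("order", 0)),
            isExpanded=bool(d.get("isExpanded", True)),
        )

    return [parse(d) for d in raw]


def treeDepth(nodes: List[GroupNode]) -> int:
    """Number of nesting levels (0 for an empty tree)."""
    return max((1 + treeDepth(n.subGroups) for n in nodes), default=0)


tree = parseGroupTree([
    {"id": "g1", "name": "Customers", "itemIds": ["a", "b"],
     "subGroups": [{"id": "g2", "name": "Active", "itemIds": ["c"]}]},
    {"id": "g3", "name": "Internal"},
])
print(treeDepth(tree))  # 2
```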
+
+---
+
+## Backend Implementation
+
+### `routeHelpers.py`: new shared helper
+
+```python
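+from dataclasses import dataclass
+from typing import Any, Dict, List, Optional, Set
+
+from modules.datamodels.datamodelPagination import PaginationParams
+
+
+@dataclass
+class GroupingContext:
+    groupTree: Optional[list]     # for the response
+    itemIds: Optional[Set[str]]   # set when groupId is given: the IN-filter set
+
+
+def _findGroup(nodes: List[Dict[str, Any]], groupId: str) -> Optional[Dict[str, Any]]:
+    """Depth-first search for a group by id, including subgroups."""
+    for node in nodes:
+        if node.get("id") == groupId:
+            return node
+        found = _findGroup(node.get("subGroups") or [], groupId)
+        if found is not None:
+            return found
+    return None
+
+
+def _collectItemIds(node: Dict[str, Any]) -> Set[str]:
+    """itemIds of a group plus those of all its subgroups."""
+    ids = {str(i) for i in node.get("itemIds") or []}
+    for sub in node.get("subGroups") or []:
+        ids |= _collectItemIds(sub)
+    return ids
+
+
+def handleGroupingInRequest(
+    paginationParams: Optional[PaginationParams],
+    interface,              # AppObjects
+    contextKey: str,
+) -> GroupingContext:
+    """
+    Central grouping handler, called at the start of every list route.
+
+    1. If paginationParams.saveGroupTree is set: persist it via
+       interface.upsertTableGrouping(contextKey, saveGroupTree) and remove it
+       from the params (it is not processed any further).
+    2. If paginationParams.groupId is set: find the group in the stored tree
+       (recursively, including subgroups), return the itemIds of the group and
+       all its subgroups as a set, and remove groupId from the params (it is
+       not processed as a normal filter).
+    3. Load the current groupTree and provide it for the response.
+    """
+    if paginationParams is not None and paginationParams.saveGroupTree is not None:
+        interface.upsertTableGrouping(contextKey, paginationParams.saveGroupTree)
+        paginationParams.saveGroupTree = None
+
+    # Assumed return value: a TableGrouping row, or None if the user has no groups yet.
+    grouping = interface.getTableGrouping(contextKey)
+    groupTree = [g.model_dump() for g in grouping.rootGroups] if grouping else []
+
+    itemIds: Optional[Set[str]] = None
+    if paginationParams is not None and paginationParams.groupId:
+        group = _findGroup(groupTree, paginationParams.groupId)
+        itemIds = _collectItemIds(group) if group else set()  # unknown group: empty scope
+        paginationParams.groupId = None
+
+    return GroupingContext(groupTree=groupTree, itemIds=itemIds)
+
+
+def applyGroupScopeFilter(items: list, itemIds: Optional[Set[str]]) -> list:
+    """
+    Applies the group scope filter.
+    Returns items unchanged if itemIds is None (no scope active).
+    Otherwise keeps only the items whose item["id"] is in itemIds.
+    """
+    if itemIds is None:
+        return items
+    return [item for item in items if str(item.get("id", "")) in itemIds]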
+```
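For the API tests T1/T2, the two `AppObjects` methods can be exercised against an in-memory stand-in. The sketch below makes two assumptions:

- there is one row per `(userId, contextKey)`, as the `TableGrouping` fields suggest;
- the interface object is already bound to the current user.

`InMemoryGroupingStore`, `GroupingRow` and `scopeItemIds` are hypothetical names. The rows hold plain dicts instead of `TableGrouping` models, so this mirrors the contract rather than being a drop-in replacement.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set, Tuple


@dataclass
class GroupingRow:
    """Mirrors TableGrouping: one row per (userId, contextKey)."""
    userId: str
    contextKey: str
    rootGroups: List[Dict[str, Any]] = field(default_factory=list)
    updatedAt: Optional[float] = None


class InMemoryGroupingStore:
    """Hypothetical test double for getTableGrouping / upsertTableGrouping."""

    def __init__(self, userId: str) -> None:
        self.userId = userId  # assumed: the interface is bound to the current user
        self._rows: Dict[Tuple[str, str], GroupingRow] = {}

    def getTableGrouping(self, contextKey: str) -> Optional[GroupingRow]:
        return self._rows.get((self.userId, contextKey))

    def upsertTableGrouping(self, contextKey: str, rootGroups: List[Dict[str, Any]]) -> GroupingRow:
        # Upsert replaces the whole tree; there is no partial merge.
        row = GroupingRow(self.userId, contextKey, list(rootGroups), time.time())
        self._rows[(self.userId, contextKey)] = row
        return row


def scopeItemIds(nodes: List[Dict[str, Any]], groupId: str) -> Optional[Set[str]]:
    """itemIds of the group plus all its subgroups; None if the group does not exist."""
    for node in nodes:
        if node["id"] == groupId:
            ids = set(node.get("itemIds", []))
            for sub in node.get("subGroups", []):
                ids |= scopeItemIds([sub], sub["id"]) or set()
            return ids
        found = scopeItemIds(node.get("subGroups", []), groupId)
        if found is not None:
            return found
    return None


store = InMemoryGroupingStore(userId="u1")
store.upsertTableGrouping("connections", [
    {"id": "g1", "name": "Customers", "itemIds": ["a", "b"],
     "subGroups": [{"id": "g2", "name": "Active", "itemIds": ["c"], "subGroups": []}]},
])
row = store.getTableGrouping("connections")
print(sorted(scopeItemIds(row.rootGroups, "g1")))  # ['a', 'b', 'c']
print(store.getTableGrouping("prompts"))            # None: other context key, other row
```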
+
+### Route extension: pattern (2 helper calls + 1 response field)
+
+```python
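+import json
+from typing import Optional
+
+from fastapi import Depends, Request
+
+from modules.datamodels.datamodelPagination import PaginationParams
+from modules.routes.routeHelpers import applyGroupScopeFilter, handleGroupingInRequest
+
+# router, getCurrentUser, getInterface and the *InMemory helpers come from the existing route module.
+
+@router.get("/")
+async def get_connections(
+    request: Request,
+    pagination: Optional[str] = None,   # JSON-encoded PaginationParams
+    mode: Optional[str] = None,
+    column: Optional[str] = None,
+    currentUser=Depends(getCurrentUser),
+):
+    interface = getInterface(currentUser)
+    CONTEXT_KEY = "connections"
+
+    # Stand-in for the route's existing parsing (normalize_pagination_dict etc.)
+    paginationParams = PaginationParams(**json.loads(pagination)) if pagination else None
+
+    # 1. Process grouping (save if needed, resolve scope); removes groupId/saveGroupTree from the params
+    groupCtx = handleGroupingInRequest(paginationParams, interface, CONTEXT_KEY)
+
+    # mode=filterValues / mode=ids: existing handlers, but applied to the group-scoped items
+    if mode in ("filterValues", "ids"):
+        items = applyGroupScopeFilter(_buildEnhancedItems(), groupCtx.itemIds)  # scope here too!
+        if mode == "filterValues":
+            return handleFilterValuesInMemory(items, column)  # argument list abbreviated; as in the existing route
+        return handleIdsInMemory(items, paginationParams)     # parsed params, so groupId is not seen again
+
+    # Build items (unchanged; route-specific)
+    items = _buildEnhancedItems()
+
+    # 2. Apply the group scope filter
+    items = applyGroupScopeFilter(items, groupCtx.itemIds)
+
+    # Pagination (unchanged)
+    result = paginateInMemory(items, paginationParams)
+
+    # 3. Embed groupTree in the response
+    return {**result.model_dump(), "groupTree": groupCtx.groupTree}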
+```
+
+**`mode=filterValues` and `mode=ids` are scoped to the group as well**, provided each branch calls `applyGroupScopeFilter` before its handler. Because `groupId` has already been removed from `paginationParams`, the handlers never treat it as a normal filter. Filter dropdowns and "Select All Filtered" therefore refer only to the items of the current group.
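The order of operations matters here. Distinct values for a filter dropdown, or the IDs for "Select All Filtered", that are computed *after* the scope filter can only come from the group. Below is a self-contained sketch of that ordering:

- `distinctValues` and `selectAllIds` are hypothetical stand-ins for `handleFilterValuesInMemory` and `handleIdsInMemory`, whose real signatures are not shown in this plan.
- `applyGroupScopeFilter` is copied from the helper.
- The items are example data.

```python
from typing import Any, Dict, List, Optional, Set


def applyGroupScopeFilter(items: list, itemIds: Optional[Set[str]]) -> list:
    # Same logic as the helper in routeHelpers.py.
    if itemIds is None:
        return items
    return [item for item in items if str(item.get("id", "")) in itemIds]


def distinctValues(items: List[Dict[str, Any]], column: str) -> List[Any]:
    """Hypothetical stand-in for mode=filterValues: sorted distinct values of one column."""
    return sorted({item[column] for item in items if item.get(column) is not None})


def selectAllIds(items: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical stand-in for mode=ids: every id matching the current filters."""
    return [str(item["id"]) for item in items]


items = [
    {"id": "1", "authority": "google"},
    {"id": "2", "authority": "msft"},
    {"id": "3", "authority": "clickup"},
]
scope = {"1", "3"}  # itemIds resolved from groupId

print(distinctValues(applyGroupScopeFilter(items, scope), "authority"))  # ['clickup', 'google']
print(selectAllIds(applyGroupScopeFilter(items, scope)))                 # ['1', '3']
print(distinctValues(applyGroupScopeFilter(items, None), "authority"))   # ['clickup', 'google', 'msft']
```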
+
+---
+
+## Frontend Implementation
+
+### `FormGeneratorTable`: internal grouping state
+
+```typescript
+// New prop (added to FormGeneratorTableProps):
+interface FormGeneratorTableProps {
+  // ...existing props
+  groupingConfig?: {
+    contextKey: string;  // Only for RBAC/logging; the effective key is derived server-side from the endpoint
+    enabled: boolean;
+  };
+}
+
+// Internal state (only in FormGeneratorTable, not in the hook):
+const [groupTree, setGroupTree]               = useState<TableGroupNode[]>([]);
+const [activeGroupId, setActiveGroupId]       = useState<string | null>(null);
+const [pendingGroupTree, setPendingGroupTree] = useState<TableGroupNode[] | null>(null);
+```
+
+### Data flow: `hookData.refetch()` as the only transport
+
+```typescript
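+import { useEffect, useMemo, useRef } from 'react';
+import debounce from 'lodash/debounce';  // assumed debounce utility (provides .flush())
+
+// groupTree arrives with the normal refetch response. PaginatedResponse carries it at the
+// top level (next to `pagination`); this assumes the hook exposes it as hookData.groupTree.
+useEffect(() => {
+  if (hookData?.groupTree) {
+    setGroupTree(hookData.groupTree);
+    setPendingGroupTree(null);  // saved: clear pending
+  }
+}, [hookData]);
+
+// Entering a group scope:
+const _enterGroup = (groupId: string) => {
+  setActiveGroupId(groupId);
+  hookData.refetch({ ...currentPaginationParams, page: 1, groupId });
+};
+
+// Leaving the scope:
+const _exitGroup = () => {
+  setActiveGroupId(null);
+  hookData.refetch({ ...currentPaginationParams, page: 1, groupId: undefined });
+};
+
+// Keep the latest params and refetch in refs, so a pending save never re-sends an
+// outdated page or scope when the user paginates during the debounce window.
+const latestParamsRef = useRef(currentPaginationParams);
+latestParamsRef.current = currentPaginationParams;
+const refetchRef = useRef(hookData.refetch);
+refetchRef.current = hookData.refetch;
+
+// Debounced save: normal refetch + saveGroupTree in the pagination param.
+// Created once (empty deps), so no stale debounce instance with old params can fire later.
+const _debouncedSave = useMemo(() => debounce((tree: TableGroupNode[]) => {
+  refetchRef.current({
+    ...latestParamsRef.current,
+    saveGroupTree: tree,   // backend saves and confirms
+  });
+}, 500), []);
+
+// Send a still-pending save when the table unmounts instead of dropping it.
+useEffect(() => () => _debouncedSave.flush(), [_debouncedSave]);
+
+// On group mutation (create, rename, delete, assign item):
+const _mutateGroupTree = (newTree: TableGroupNode[]) => {
+  setGroupTree(newTree);          // visible locally at once (optimistic)
+  setPendingGroupTree(newTree);   // marked for the next save
+  _debouncedSave(newTree);        // debounced: saved via refetch after 500 ms
+};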
+```
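Acceptance criterion 2 relies on trailing-edge debouncing. Several mutations within the window collapse into one save that carries the last tree. The Python `asyncio` sketch below models only that timing behaviour; it is not the frontend code. `Debouncer` is a hypothetical class, and the 50 ms delay stands in for the 500 ms used in the table.

```python
import asyncio
from typing import Any, Callable, List, Optional


class Debouncer:
    """Trailing-edge debounce: only the last call within `delay` seconds is executed."""

    def __init__(self, fn: Callable[[Any], None], delay: float) -> None:
        self._fn = fn
        self._delay = delay
        self._task: Optional["asyncio.Task[None]"] = None

    def __call__(self, arg: Any) -> None:
        if self._task is not None and not self._task.done():
            self._task.cancel()  # a newer mutation supersedes the pending save
        self._task = asyncio.get_running_loop().create_task(self._run(arg))

    async def _run(self, arg: Any) -> None:
        await asyncio.sleep(self._delay)
        self._fn(arg)


async def main() -> List[Any]:
    saves: List[Any] = []
    save = Debouncer(lambda tree: saves.append({"saveGroupTree": tree}), delay=0.05)
    save(["g1"])                # create a group
    save(["g1", "g2"])          # create a second one
    save(["g1-renamed", "g2"])  # rename: all three calls fall into one window
    await asyncio.sleep(0.2)    # let the window elapse
    return saves


saves = asyncio.run(main())
print(saves)  # [{'saveGroupTree': ['g1-renamed', 'g2']}]
```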
+
+**No new API module. No separate fetch call. `hookData.refetch()` is all it takes.**
+
+### Render structure
+
+```
+Root view (no activeGroupId):
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+▼ Customers  (3 items)   [Delete all] [Download all] [+ Subgroup] [Rename] [×]
+  ▶ Active (1)           [Delete all] [+ Subgroup] [Rename] [×]
+  — Item A            [Edit] [Delete] [Download]
+  — Item B            [Edit] [Delete] [Download]
+  — Item C (Active)   [Edit] [Delete] [Download]
+▶ Internal            [5 items · open group →]
+── Unassigned (2)
+  — Item X            [Edit] [Delete] [Download]
+
+Group scope (activeGroupId = "Internal"):
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+← Back   |  Internal  (page 1/1 · 5 entries)   [Delete all] [Download all]
+  — Item C            [Edit] [Delete] [Download]
+  — Item D            [Edit] [Delete] [Download]
+  ...
+```
+
+When `activeGroupId` is set, the refetch carries `groupId` → the backend filters → pagination, search, filters and `mode=ids` are all limited to the group.
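The root view can be derived by walking the tree depth-first. For each group the walk emits a header row and, if the group is expanded, the rows of its subgroups and items; the items no group claims come last. Below is a minimal sketch of that algorithm on the items of the current page. The row tuples and the name `buildRootRows` are hypothetical. The sketch assumes an item is shown under every group that lists it, and that collapsed groups hide both their subgroups and their items.

```python
from typing import Any, Dict, Iterator, List, Tuple

Row = Tuple[str, int, str]  # (kind, depth, label)


def _allItemIds(nodes: List[Dict[str, Any]]) -> Iterator[str]:
    for node in nodes:
        yield from node.get("itemIds", [])
        yield from _allItemIds(node.get("subGroups", []))


def buildRootRows(tree: List[Dict[str, Any]], pageItemIds: List[str]) -> List[Row]:
    """Flatten the group tree and the items of the current page into display rows."""
    visible = set(pageItemIds)
    assigned = set(_allItemIds(tree))  # also counts items of collapsed groups
    rows: List[Row] = []

    def walk(nodes: List[Dict[str, Any]], depth: int) -> None:
        for node in sorted(nodes, key=lambda n: n.get("order", 0)):
            rows.append(("group", depth, node["name"]))
            if not node.get("isExpanded", True):
                continue  # collapsed: header only, subgroups and items hidden
            walk(node.get("subGroups", []), depth + 1)
            for itemId in node.get("itemIds", []):
                if itemId in visible:  # items on other pages are not rendered here
                    rows.append(("item", depth + 1, itemId))

    walk(tree, 0)
    unassigned = [i for i in pageItemIds if i not in assigned]
    if unassigned:
        rows.append(("unassigned", 0, f"Unassigned ({len(unassigned)})"))
        rows.extend(("item", 1, i) for i in unassigned)
    return rows


tree = [
    {"id": "g1", "name": "Customers", "itemIds": ["A", "B"], "order": 0,
     "subGroups": [{"id": "g2", "name": "Active", "itemIds": ["C"], "isExpanded": False}]},
    {"id": "g3", "name": "Internal", "itemIds": ["D"], "order": 1, "isExpanded": False},
]
for row in buildRootRows(tree, ["A", "B", "C", "X"]):
    print(row)
```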
+
+---
+
+## Pagination, Search, Filters: Fully Correct
+
+| Scenario | What happens |
+|----------|-------------|
+| Root view, paging | Normal refetch. `groupTree` arrives with the response. Group counters come from `groupNode.itemIds.length`. |
+| Group scope active | `groupId` in PaginationParams → backend IN filter → the total count comes from the backend (correct). |
+| Search in the scope | `groupId` + `filters.search` → the backend filters the group's items by the search text. |
+| "Select All Filtered" (mode=ids) | `groupId` in params → `applyGroupScopeFilter` is applied BEFORE `handleIdsInMemory` → only IDs of the group are returned. |
+| Filter dropdown (mode=filterValues) | `groupId` in params → `applyGroupScopeFilter` before `handleFilterValuesInMemory` → distinct values come only from the group's items. |
+| Saving the group tree while paging | `saveGroupTree` + page params in the same call → the backend saves the tree AND returns the current page. |
+
+---
+
+## Decisions
+
+| Date | Decision | Rationale |
+|-------|-------------|------------|
+| 2026-04-29 | `saveGroupTree` + `groupId` in `PaginationParams` instead of a dedicated endpoint | One call, one transport, no second API path; grouping is an integral part of data retrieval |
+| 2026-04-29 | `groupTree` in `PaginatedResponse` | Always in sync with the current items; no separate load call needed |
+| 2026-04-29 | Shared helper in `routeHelpers.py`, 2 helper calls per route | DRY; all complexity in one place; no custom logic per route |
+| 2026-04-29 | Optimistic UI + debounced save via the normal refetch | Instant UX; no flicker; persistence without an extra call |
+| 2026-04-29 | Models in `datamodelPagination.py` | `PaginatedResponse` lives there; no new file; the import graph stays minimal |
+| 2026-04-29 | `applyGroupScopeFilter` also for `mode=ids` and `mode=filterValues` | Filter dropdown and bulk select work correctly in the group scope without special handling |
+| 2026-04-29 | Group state only in `FormGeneratorTable`, not in the feature hook | No changes to existing hooks needed; grouping is transparent to page code |
+
+---
+
+## Implementation Checklist
+
+### Phase 1: Backend core
+
+- [ ] `datamodelPagination.py`: add the `TableGroupNode` and `TableGrouping` classes
+- [ ] `datamodelPagination.py`: extend `PaginationParams` with `groupId` + `saveGroupTree`
+- [ ] `datamodelPagination.py`: extend `PaginatedResponse` with `groupTree`
+- [ ] `datamodelPagination.py`: extend `normalize_pagination_dict` so that `saveGroupTree` and `groupId` are parsed correctly
+- [ ] `interfaceDbApp.py`: add `getTableGrouping(contextKey)` + `upsertTableGrouping(contextKey, rootGroups)` to `AppObjects`
+- [ ] `routeHelpers.py`: implement the `GroupingContext` dataclass + `handleGroupingInRequest()` + `applyGroupScopeFilter()`
+
+### Phase 2: Route extensions
+
+Per route: `handleGroupingInRequest` at the start + `applyGroupScopeFilter` before pagination + `groupTree` in the response
+
+- [ ] `routeDataConnections.py` (inkl. `mode=ids`, `mode=filterValues`)
+- [ ] `routeDataPrompts.py`
+- [ ] `routeDataUsers.py`
+- [ ] `routeDataMandates.py`
+- [ ] `routeDataFiles.py`
+- [ ] Further routes as needed (Trustee, RealEstate, Invitations, …)
+
+### Phase 3: Frontend types + FormGeneratorTable scaffold
+
+- [ ] TypeScript type `TableGroupNode` directly in `FormGeneratorTable.tsx` or in `src/types/tableGrouping.ts`
+- [ ] Add the `groupingConfig` prop to `FormGeneratorTableProps`
+- [ ] Parse `groupTree` from the `hookData` response and keep it in internal state
+- [ ] `activeGroupId` state + `_enterGroup` / `_exitGroup`
+- [ ] `pendingGroupTree` state + `_mutateGroupTree` + debounced save via `hookData.refetch()`
+- [ ] Extend the frontend `PaginatedResponse` type with `groupTree`
+
+### Phase 4: Render logic
+
+- [ ] `GroupRow` component: `src/components/FormGenerator/GroupingManager/GroupRow.tsx`
+- [ ] Render algorithm: root view with group header rows and DataRows; "Unassigned" section
+- [ ] Group scope view: breadcrumb, "Back" button, normal table layout
+- [ ] Dimmed groups without visible items (root view) with an "Open group" button
+- [ ] Expand/collapse per group (local; `isExpanded` comes from `groupTree`)
+
+### Phase 5: Actions and interaction
+
+- [ ] `actionButtons` + `customActions` at `GroupRow` level (batch on all items via `mode=ids` within the scope)
+- [ ] Delete group with `useConfirm`: "Group only" vs. "Group + delete all items"
+- [ ] "Move to group" as a BatchAction for multi-select
+- [ ] Context menu button per DataRow → group dropdown
+- [ ] Drag-and-drop DataRow → GroupRow
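Deleting a group (acceptance criterion 8) and "Move to group" are pure tree mutations followed by `_mutateGroupTree`. Below is a Python sketch of both operations on the dict form of the tree. The function names are hypothetical. The sketch assumes "move" means the items first leave every other group.

```python
import copy
from typing import Any, Dict, List, Set, Tuple

Tree = List[Dict[str, Any]]


def _collectIds(node: Dict[str, Any]) -> Set[str]:
    ids = set(node.get("itemIds", []))
    for sub in node.get("subGroups", []):
        ids |= _collectIds(sub)
    return ids


def removeGroup(tree: Tree, groupId: str) -> Tuple[Tree, Set[str]]:
    """Return a new tree without the group, plus the item ids it and its subgroups held.

    For "group only" the ids are simply dropped; for "group + all items" the
    caller deletes those items before saving the new tree.
    """
    newTree: Tree = []
    removed: Set[str] = set()
    for node in tree:
        if node["id"] == groupId:
            removed |= _collectIds(node)
            continue
        subs, subRemoved = removeGroup(node.get("subGroups", []), groupId)
        removed |= subRemoved
        newTree.append({**node, "subGroups": subs})
    return newTree, removed


def moveItemsToGroup(tree: Tree, itemIds: Set[str], targetId: str) -> Tree:
    """Return a new tree in which `itemIds` belong only to the target group."""
    newTree = copy.deepcopy(tree)

    def walk(nodes: Tree) -> None:
        for node in nodes:
            kept = [i for i in node.get("itemIds", []) if i not in itemIds]
            if node["id"] == targetId:
                kept += sorted(itemIds)
            node["itemIds"] = kept
            walk(node.get("subGroups", []))

    walk(newTree)
    return newTree


tree = [
    {"id": "g1", "name": "Customers", "itemIds": ["a"],
     "subGroups": [{"id": "g2", "name": "Active", "itemIds": ["b"], "subGroups": []}]},
    {"id": "g3", "name": "Internal", "itemIds": ["c"], "subGroups": []},
]
rest, toDelete = removeGroup(tree, "g1")
print([g["id"] for g in rest], sorted(toDelete))  # ['g3'] ['a', 'b']
```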
+
+### Phase 6: GroupingManager panel
+
+- [ ] `GroupingManager` component: group tree panel
+- [ ] "Groups" button in `FormGeneratorControls` (only if `groupingConfig.enabled`)
+- [ ] Create a new group (`usePrompt`)
+- [ ] Rename a group (inline)
+- [ ] Create a subgroup
+- [ ] Delete a group (`useConfirm`)
+- [ ] Reorder up/down
+
+### Wrap-up
+
+- [ ] i18n: all new UI texts tagged with `t('...')`
+- [ ] CSS Modules for all new components
+- [ ] Update `b-reference/frontend-nyla/formgenerator.md`
+- [ ] `b-reference/gateway/architecture.md`: document the `PaginationParams`/`PaginatedResponse` extension
+
+---
+
+## Acceptance Criteria
+
+| # | Criterion (Given-When-Then) | Priority |
+|---|---------------------------|------|
+| 1 | Given `FormGeneratorTable` with `groupingConfig.enabled`, When the page loads, Then `groupTree` arrives with the normal list response; **no second API call** | must |
+| 2 | Given the user creates a group, When 500 ms have passed since the last change, Then exactly one `refetch` with `saveGroupTree` is sent; after a reload the group still exists | must |
+| 3 | Given the user clicks group "Customers", When `_enterGroup` is called, Then a refetch with `groupId` is sent; the backend filters; pagination and total count refer to the group | must |
+| 4 | Given an active group scope, When the user searches for "Test", Then `groupId` + search go out in one call; the backend returns only matches within the group | must |
+| 5 | Given an active group scope, When the user clicks "Select all" (mode=ids), Then the IDs come only from the group, not from the full list | must |
+| 6 | Given the filter dropdown is opened in a group scope, When the values are loaded, Then they come only from the group's items (mode=filterValues correct) | should |
+| 7 | Given `FormGeneratorTable` without `groupingConfig`, Then the behavior is identical to before the feature | must |
+| 8 | Given a group with 5 items, When the user clicks Delete on the group header, Then confirm → all 5 items are deleted; the group is removed from the tree; one `refetch` with the current `saveGroupTree` | must |
+| 9 | Given the user drags a DataRow onto a group header, Then the item is assigned to the group; `_mutateGroupTree` + debounced save | should |
+
+---
+
+## Test Plan
+
+| ID | AC | Type | Automated | Repo path | Status |
+|----|----|----|--------------|-----------|--------|
+| T1 | 1, 2 | api | yes | `gateway/tests/test_grouping_helpers.py` | pending |
+| T2 | 3, 4, 5, 6 | api | yes | `gateway/tests/test_grouping_helpers.py` | pending |
+| T3 | 7 | component | no | manual | pending |
+| T4 | 8 | api + component | no | manual | pending |
+| T5 | 9 | component | no | manual | pending |
+
+---
+
+## Open Questions
+
+1. **`scope: 'user' | 'mandate'`**: should the `TableGrouping` model already prepare for later mandate-wide sharing?
+2. **CSV export**: should it include a group column?
+3. **`activeGroupId` in URL state** (`?group=g1`) for deep links?
+4. **Max depth**: configurable (`groupingConfig.maxDepth`) or a fixed warning after 3 levels?
+
+---
+
+## Links
+
+- PR: —
+- FormGenerator reference: `b-reference/frontend-nyla/formgenerator.md`
+- Gateway architecture reference: `b-reference/gateway/architecture.md`
+- DB architecture reference: `b-reference/platform/database-architecture.md`
+
+---
+
+## Wrap-up
+
+- [ ] `b-reference/frontend-nyla/formgenerator.md`: grouping section
+- [ ] `b-reference/gateway/architecture.md`: `PaginationParams`/`PaginatedResponse` extension
+- [ ] TOPICS.md checked
+- [ ] This document moved to `z-archive/`
--- a/c-work/4-done/2026-04-id-unified-knowledge-indexing-rag-concept.md
+++ b/c-work/4-done/2026-04-id-unified-knowledge-indexing-rag-concept.md
@@ -1,6 +1,6 @@
 <!-- status: build -->
 <!-- started: 2026-04-16 -->
-<!-- lastReviewed: 2026-04-21 -->
+<!-- lastReviewed: 2026-04-24 -->
 <!-- component: gateway | platform | frontend-nyla -->

 # Unified Knowledge Indexing — One RAG Corpus for All Platform Information
@@ -15,7 +15,7 @@
 | **Teil 3** | **Feature injection** split into **retrieval** (agent + `buildAgentContext`) vs **corpus** (`indexFile`); **matrix** per `modules/features/*` product; real **gaps** vs false “non-injection”. |
 | **Implementation phases · Ziele · AC · Testplan** | Rollout, explicit non-goals, acceptance criteria, verification. |

-**Single sentence summary:** Keep **retrieval** on **`AgentService`**; unify **when and how** the shared **`interfaceDbKnowledge`** corpus is **filled** (routes, **user connections** / integrations, features, snapshots) behind one **ingestion contract**, without assuming every product uses the workspace agent.
+**Single sentence summary:** Keep **retrieval** on **`AgentService`**; unify **when and how** the shared **`interfaceDbKnowledge`** corpus is **filled** (routes, **user connections**, **feature commit points**) behind one **ingestion contract**. **Current roadmap scope:** user-connection lifecycle (**P1a/P1b**), **daily refresh** to close the post-connect delta gap (**P1c**), **explicit user consent + per-connection ingestion preferences** (incl. optional **neutralization**) in **frontend + API**, then **scalable event bus** (**P3**). **Out of current roadmap:** standalone **profile/mandate snapshot** ingestion (former roadmap **P2** — content remains in Teil 2.3 as future option only).

 ## Beschreibung und Kontext

@@ -150,7 +150,7 @@ The first end-to-end AC4 test on a 500-page PDF revealed **three** independent b
 2. **Pre-upserts must preserve `_ingestion` metadata and the `indexed` status.** `routeDataFiles._autoIndexFile` persisted a fresh `FileContentIndex` from the pre-scan **before** calling `requestIngestion`, overwriting `structure._ingestion.hash` and `status="indexed"` from any prior successful run. The duplicate check saw a row with empty metadata and re-ran the whole embedding stage. **Rule:** any upsert on the idempotency row taken outside `requestIngestion` MUST read the existing row first and merge forward both `_ingestion` and (where applicable) the terminal `indexed` status.
 3. **Extraction-pipeline defaults must preserve granularity for RAG.** `ExtractionOptions.mergeStrategy` defaulted to concatenating every text `ContentPart` into one blob, collapsing a 500-page PDF into a single chunk whose embedding is a blurred average of the whole document — unusable for targeted retrieval. **Rule:** every ingestion lane passes `mergeStrategy=None` explicitly until the default itself can be safely flipped after auditing non-RAG callers. (Tests: `tests/unit/services/test_extraction_merge_strategy.py`.)

-**Deferred to P1** (uncovered during P0, not blocking AC1–AC5):
+**Deferred (ingestion idempotency hardening)** (uncovered during P0, not blocking AC1–AC5; naming here is **not** the same milestone as **P1 user-connection hooks** below):

 - **In-flight duplicate detection.** The current duplicate check only matches when `status == "indexed"`, so two nearly-simultaneous calls for the same `sourceId` both run full embedding. Fix candidates: accept `status ∈ {"extracted", "embedding", "indexed"}` with matching hash as "already in progress", or a per-`sourceId` `asyncio.Lock` in `KnowledgeService`.
 - **Pre-extraction byte-hash shortcut.** `requestIngestion`'s duplicate check runs **after** extraction, so re-indexing a 1.6 MB PDF still spends ~15 s in `runExtraction` before the content hash is computed. The file-bytes SHA already exists in `interfaceDbManagement` for upload-dedup — a short-circuit in `_autoIndexFile` (and symmetric paths) could skip extraction entirely for an unchanged file.
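Both fix candidates for in-flight duplicates fit in a few lines, and they can be combined. A per-`sourceId` lock makes the check-and-mark step atomic. Treating `extracted`/`embedding`/`indexed` with a matching hash as "already in progress" lets the second caller skip. The sketch assumes a status store keyed by `sourceId` that holds `(status, hash)`. `IngestionGate` and its in-memory dict are hypothetical stand-ins for `KnowledgeService` and the `FileContentIndex` row.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict, Tuple

# Statuses treated as "already in progress or done" for an identical content hash.
IN_PROGRESS_OR_DONE = {"extracted", "embedding", "indexed"}


class IngestionGate:
    """Hypothetical stand-in for the duplicate check in KnowledgeService.requestIngestion."""

    def __init__(self) -> None:
        self._locks: Dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self.rows: Dict[str, Tuple[str, str]] = {}  # sourceId -> (status, contentHash)

    async def requestIngestion(
        self, sourceId: str, contentHash: str, run: Callable[[], Awaitable[None]]
    ) -> bool:
        """Return True if the pipeline ran, False if it was skipped as a duplicate."""
        async with self._locks[sourceId]:  # check-and-mark is atomic per sourceId
            row = self.rows.get(sourceId)
            if row is not None and row[1] == contentHash and row[0] in IN_PROGRESS_OR_DONE:
                return False
            self.rows[sourceId] = ("embedding", contentHash)
        try:
            await run()  # the expensive part runs outside the lock
        except Exception:
            self.rows.pop(sourceId, None)  # let a later call retry
            raise
        self.rows[sourceId] = ("indexed", contentHash)
        return True


async def main():
    gate = IngestionGate()
    calls = []

    async def embed() -> None:
        calls.append("embed")
        await asyncio.sleep(0.01)

    results = await asyncio.gather(
        gate.requestIngestion("file-1", "h1", embed),
        gate.requestIngestion("file-1", "h1", embed),
    )
    return results, calls


print(asyncio.run(main()))  # ([True, False], ['embed'])
```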
@@ -231,12 +231,14 @@ The first end-to-end AC4 test on a 500-page PDF revealed **three** independent b

 **Email and messaging (Outlook + Gmail via Microsoft / Google user connections) — shared cautions**

- Default tiers: **metadata only** → **snippet** → **full body** → **attachments** (most expensive / sensitive).
+- Default tiers: **metadata only** → **snippet** → **full body** → **attachments** (most expensive / sensitive). **Product default** vs **user override** is defined in **§2.6** (per-connection mail depth + attachments).
 - Apply **quoted-thread stripping**, **signature removal**, and **max body length** before embed.
 - **Legal hold / retention:** ingestion must respect mandate **delete** and **export** rules; **disconnecting** or **revoking** the mail **connection** must **purge** mail-sourced chunks.
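The tiers are cumulative, so one ordered enum is enough to decide what a walker passes on. The sketch makes these assumptions:

- `MailTier`, `mailTextForTier` and the 4,000-character cap are hypothetical.
- The quote and signature handling is a naive placeholder for the platform's `cleanEmailBody`.
- Attachments are only flagged as allowed, not extracted.

Note that cleaning happens before the length cap.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Tuple


class MailTier(IntEnum):
    """Cumulative tiers: each level includes everything below it."""
    METADATA = 0
    SNIPPET = 1
    FULL_BODY = 2
    ATTACHMENTS = 3  # most expensive / sensitive


@dataclass
class Mail:
    subject: str
    sender: str
    date: str
    snippet: str
    body: str


def _naiveClean(body: str) -> str:
    """Naive placeholder for cleanEmailBody: drop '>' quoted lines, cut at the '-- ' signature marker."""
    kept = []
    for line in body.splitlines():
        if line.rstrip() == "--":
            break
        if not line.lstrip().startswith(">"):
            kept.append(line)
    return "\n".join(kept)


def mailTextForTier(mail: Mail, tier: MailTier, maxBodyChars: int = 4000) -> Tuple[str, bool]:
    """Return (text to embed, whether attachments may be indexed) for the chosen tier."""
    parts = [f"Subject: {mail.subject}", f"From: {mail.sender}", f"Date: {mail.date}"]
    if tier >= MailTier.FULL_BODY:
        parts.append(_naiveClean(mail.body)[:maxBodyChars])  # clean first, then cap
    elif tier >= MailTier.SNIPPET:
        parts.append(mail.snippet)
    return "\n".join(parts), tier >= MailTier.ATTACHMENTS


mail = Mail("Offer", "a@example.com", "2026-04-20", "Please find the offer",
            "Hi,\nsee offer.\n> old quote\n-- \nSignature")
text, withAttachments = mailTextForTier(mail, MailTier.FULL_BODY)
print(text)
print(withAttachments)  # False
```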

 ### 2.3 “Account and stuff” — what to index vs. what never to index

+**Roadmap note:** Standalone **profile/mandate snapshot** ingestion (formerly roadmap **P2**) is **out of current scope**; the table below remains the **target model** when that work is picked up again.
+
 **Goal:** Give agents **useful, permission-safe** context (“who is this user in this mandate”, “which features are on”, “preferred language”) without creating a **second copy of sensitive credentials** in the vector store.

 | Data | Typical treatment |
@@ -256,6 +258,60 @@ Snapshots should be stored with the same **scope model** as file chunks (`person

 **Storage (already implemented — not redesigned here):** The platform already uses **one** knowledge persistence stack: **`FileContentIndex`** (incl. `mandateId`, `scope`, status) and **`ContentChunk`** (pgvector embeddings, `fileId`, `userId`, `featureInstanceId`, `contextRef`, optional **`chunkMetadata`**), accessed via **`interfaceDbKnowledge`**. Chunks are **file-anchored** today; **connection- / source-specific** provenance (e.g. `connectionId`, external ids) can ride in **`contextRef` / `chunkMetadata`** until optional schema extensions are justified. **This document targets ingestion triggers and lifecycles**, not a second corpus or a duplicate storage model.

+### 2.5 Lifecycle gap and daily refresh (roadmap **P1c**, v1)
+
+**Gap:** After a successful connect, **bootstrap** runs once (initial fill). **New** mail, files, or tasks that arrive **after** that run are **not** indexed automatically until a **delta** path exists (webhook, `historyId` / `changes` cursors, etc. — see Teil **2.1** row *“Sync for an existing connection”*).
+
+**Pragmatic mitigation (deliberately simple):** A **daily scheduler** (e.g. once per night, staggered by tenant/load) re-invokes the same **bootstrap walkers** for every **active** `UserConnection` that has **knowledge ingestion enabled** (see **§2.6**). Idempotency + fast-path skips unchanged items; **new** and **changed** items are picked up.
+
+- **Pros:** No new external dependencies (Pub/Sub, watch renewal) in v1; fits existing BackgroundJob + cron/feature-flag patterns.
+- **Con:** Data can lag up to **~24 h** before it appears in RAG — acceptable for v1 product choice.
+- **Later (without replacing P1c):** Add per-authority **delta APIs** (Gmail `users.history.list`, Drive `changes.list`, ClickUp tighter polling) to reduce latency and API cost.
+
+### 2.6 User consent, frontend flow, and per-connection preferences (incl. neutralization)
+
+**Goal:** The user **explicitly** chooses whether this connection may feed the **shared knowledge store** used for AI/RAG — and **how much**. Without consent, **no** knowledge bootstrap is started for that connection (OAuth may still unlock other product features; that split must be obvious in the UI).
+
+**Frontend (`frontend_nyla`):** extend the **add connection** flow (and later **connection settings**) with the dialog and controls below; persist choices via Gateway API **before** or **when** triggering knowledge ingestion.
+
+#### UX when adding a connection
+
+1. User starts OAuth as today.
+2. **Before** or **immediately after** successful authorization: a **dialog** that clearly separates “establish connection” from “add to knowledge base”.
+3. **No:** Connection remains usable for other features; either skip `KnowledgeIngestionConsumer.onConnectionEstablished` for the knowledge lane or persist `knowledgeIngestionEnabled=false` and never schedule walkers.
+4. **Yes:** Show **advanced settings** (second step or accordion) per **settings catalog** below; persist **per `connectionId`** (or a dedicated preferences row); only then enqueue **bootstrap** (and later **P1c** refresh) with allowed surfaces and tiers.
+
+**Suggested copy (DE — pick one tone / A-B test):**
+
+- **Formal:** „Möchten Sie Inhalte aus dieser Verbindung in Ihre **Wissensdatenbank** übernehmen? KI-Funktionen können dann passender auf **Ihre** Dokumente und Nachrichten Bezug nehmen — **nur** mit Ihrer ausdrücklichen Zustimmung und in dem Umfang, den Sie festlegen.“
+- **Approachable:** „Sollen wir aus dieser Verbindung ausgewählte Inhalte sicher in Ihre **persönliche Wissensdatenbank** legen, damit die KI für Sie **besser helfen** kann? Sie entscheiden **was** und **wie stark anonymisiert** — und können das jederzeit in den Einstellungen ändern oder die Daten entfernen.“
+
+Mirror in EN if the UI is bilingual.
+
+#### Minimum settings catalog (all **per connection** where technically applicable)
+
+| Layer | Setting | Meaning |
+|--------|-----------|---------|
+| **Master** | **Knowledge ingestion for this connection** | `off` / `on`: gates bootstrap + **§2.5** (P1c) refresh for the knowledge store. |
+| **Protection** | **Neutralize / anonymize before embedding** | When `on`: apply the same (or stricter) **neutralization** pipeline as for uploads (`FileItem.neutralize` / platform rules) to connector-sourced text **before** chunking — names, e-mail addresses, phone-like patterns, IBAN-like patterns, per policy. User-facing label **„anonymisiert“** maps to this pipeline (not a cryptographic guarantee). |
+| **Mail** (Outlook / Gmail) | **Content depth** | At least: **metadata only** (subject, participants, dates — no body) / **snippet** / **full cleaned body** (after `cleanEmailBody` and caps). |
+| **Mail** | **Index attachments** | `off` / `on` (with size/type caps). |
+| **Files** (Drive / SharePoint / OneDrive) | **Index binary files** | `off` / `on`; optional **MIME allowlist** (Office/PDF/text only) as a simplified UX preset. |
+| **ClickUp** | **Scope** | `titles only` / `title + description` / `+ comments` / optional `attachments`. |
+| **Microsoft** | **Parity** | Same dimensions where Graph surfaces mirror Google (mailbox / drive-like). |
+| **General** | **Time window** | “Only index items from the last **N** days” (aligns with existing walker caps; slider with a sensible max). |
+| **General** | **Help: what RAG is not** | Short explainer: not real-time mail; delay until next scheduled run (**§2.5**). |
+
+**Optional power-user toggles (same screen, collapsed):** per authority **which surfaces** ingest (e.g. **Google:** Gmail on/off, Drive on/off; **Microsoft:** SharePoint on/off, Outlook on/off — when product exposes both). Reduces accidental over-breadth without extra wizard steps.
+
+**Backend consequence:** Walkers read persisted preferences for `connectionId` each run and **filter** surfaces and payload tiers **before** `indexFile`. On preference change, product decision: trigger **re-sync**, or apply only to **new** items — document the chosen rule.
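Below is a sketch of how a walker could apply the persisted choices before calling `indexFile`. It assumes one preferences record per `connectionId`, with fields named after the settings catalog. All field names and the candidate-item shape are hypothetical. The power-user surface toggles are reduced to a `surfaces` set. Mail depth and neutralization are carried along but applied later, when the text is built.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class IngestionPreferences:
    connectionId: str
    enabled: bool = False                        # master switch: off until the user consents
    neutralize: bool = False                     # applied later, before chunking
    mailDepth: str = "metadata"                  # "metadata" | "snippet" | "fullBody"; applied when building mail text
    indexMailAttachments: bool = False
    indexBinaryFiles: bool = False
    mimeAllowlist: FrozenSet[str] = frozenset()  # empty = no MIME restriction
    surfaces: FrozenSet[str] = frozenset()       # enabled surfaces, e.g. {"gmail", "drive"}
    maxAgeDays: Optional[int] = None             # "only index items from the last N days"


@dataclass
class Candidate:
    surface: str        # "gmail", "drive", ...
    kind: str           # "mail", "attachment" or "file"
    modifiedAt: datetime
    mimeType: str = ""


def selectForIngestion(prefs: IngestionPreferences, items: List[Candidate], now: datetime) -> List[Candidate]:
    """Filter walker output by the user's choices; without consent nothing is ingested."""
    if not prefs.enabled:
        return []
    cutoff = now - timedelta(days=prefs.maxAgeDays) if prefs.maxAgeDays is not None else None
    selected = []
    for item in items:
        if item.surface not in prefs.surfaces:
            continue
        if cutoff is not None and item.modifiedAt < cutoff:
            continue
        if item.kind == "attachment" and not prefs.indexMailAttachments:
            continue
        if item.kind == "file":
            if not prefs.indexBinaryFiles:
                continue
            if prefs.mimeAllowlist and item.mimeType not in prefs.mimeAllowlist:
                continue
        selected.append(item)
    return selected


now = datetime(2026, 5, 1)
prefs = IngestionPreferences("conn-1", enabled=True, surfaces=frozenset({"gmail", "drive"}),
                             indexBinaryFiles=True, mimeAllowlist=frozenset({"application/pdf"}),
                             maxAgeDays=30)
items = [
    Candidate("gmail", "mail", datetime(2026, 4, 20)),
    Candidate("gmail", "attachment", datetime(2026, 4, 20), "application/pdf"),
    Candidate("drive", "file", datetime(2026, 4, 25), "application/pdf"),
    Candidate("drive", "file", datetime(2026, 4, 25), "image/png"),
    Candidate("drive", "file", datetime(2026, 1, 1), "application/pdf"),
    Candidate("calendar", "mail", datetime(2026, 4, 30)),
]
print([(c.surface, c.kind) for c in selectForIngestion(prefs, items, now)])
```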
+
+#### Neutralization when the user opts in
+
+- **Ingestion on** + **neutralization on:** After content is obtained (virtual text or extraction output), apply the **neutralization stage** **before** chunking/embedding; **that** text is what gets embedded.
+- **Neutralization off:** Still apply baseline **hygiene** where already defined (e.g. `cleanEmailBody` for quotes/signatures) — hygiene **≠** full PII removal.
+- **Compliance copy:** If the user chooses **full body**, state clearly that **perfect** anonymization is not guaranteed without neutralization.
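
The ordering above (hygiene, then optional neutralization, then chunking) can be sketched as follows. The regexes are a deliberately crude stand-in for the platform neutralization pipeline, which this plan does not specify; they miss most names and produce false positives (date-like digit runs can match the phone pattern, for example). Only the ordering is the point, and `prepareForEmbedding` is a hypothetical name.

```python
import re
from typing import List

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}(?:\s?[A-Z0-9]{4}){2,7}(?:\s?[A-Z0-9]{1,4})?\b")
PHONE_RE = re.compile(r"\+?\d[\d\s/().-]{7,}\d")


def hygiene(text: str) -> str:
    """Stand-in for baseline hygiene (cleanEmailBody does far more): collapse whitespace."""
    return re.sub(r"\s+", " ", text).strip()


def neutralize(text: str) -> str:
    """Toy placeholder replacement; NOT the platform neutralization pipeline."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = IBAN_RE.sub("[IBAN]", text)  # before phones: IBANs are mostly digits
    return PHONE_RE.sub("[PHONE]", text)


def chunk(text: str, size: int = 200) -> List[str]:
    """Fixed-size character chunks, only to show where chunking sits."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def prepareForEmbedding(raw: str, neutralizeOn: bool) -> List[str]:
    cleaned = hygiene(raw)  # hygiene always runs
    if neutralizeOn:
        cleaned = neutralize(cleaned)
    # Only this (possibly neutralized) text reaches chunking and embedding.
    return chunk(cleaned)
```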
+
 ---

 ## Teil 3 — Feature injection: retrieval vs corpus, agent loop, and real gaps
@ -338,11 +394,11 @@ Then add **`requestIngestion` / `indexFile`** at the **feature commit point** (o
 3. **Unified façade** — one ingestion API; avoid a second embedding pipeline.  
 4. **Purge** — tie to **`fileId`**, business key, or future connector purge keys on revoke/delete.

-### 3.7 Phasing
+### 3.7 Phasing (feature matrix; the **FM** numbering is independent of the roadmap phases **P1c/P1d/P3** below)

- **P0:** For **each** row in §3.3, confirm **retrieval** vs **corpus** paths; document “satisfied by agent+upload+tools” vs “needs feature hook.”  
- **P1:** Implement **feature-native corpus** for one domain with a clear §3.5 gap (e.g. **trustee** entity text, **teamsbot** persisted transcript).  
- **P2:** **Chatbot** architecture decision: integrate **`serviceKnowledge`** or keep parallel retrieval; if integrate, add explicit **corpus** rules for config/FAQ.
+- **FM0:** For **each** row in §3.3, confirm **retrieval** vs **corpus** paths; document “satisfied by agent+upload+tools” vs “needs feature hook.”  
+- **FM1:** Implement **feature-native corpus** for one domain with a clear §3.5 gap (e.g. **trustee** entity text, **teamsbot** persisted transcript).  
+- **FM2:** **Chatbot** architecture decision: integrate **`serviceKnowledge`** or keep parallel retrieval; if integrate, add explicit **corpus** rules for config/FAQ.

 ---

@ -350,12 +406,41 @@ Then add **`requestIngestion` / `indexFile`** at the **feature commit point** (o

 Phases align with **Teil 1** (façade), **Teil 2** (connector + trigger catalog), and **Teil 3.7** (feature matrix and feature-native corpus pilots). **P0** overlaps **Teil 3.7 FM0** (complete the per-feature matrix before large builds).

+**Authority rollout (2026-04-24):** The **user-connection ingestion lane** (bootstrap + purge tied to **`UserConnection`**) is delivered **per OAuth authority**: **`msft` (P1a)**, then **`google`** + **`clickup` (P1b)**, all sharing the same consumer, dispatcher fan-out, purge-by-`connectionId`, and unit tests for walkers + consumer. **Next product slices:** **P1c** (daily refresh, **§2.5**), **consent + per-connection preferences + frontend** (**§2.6**), then **P3** (event bus at scale).
+
 | Phase | Outcome |
 |-------|---------|
 | **P0 — Façade + idempotency** *(done, 2026-04-21)* | Single `requestIngestion` / `getIngestionStatus` entry point on `KnowledgeService` with content-hash idempotency, provenance in `structure._ingestion`, and structured logging (`ingestion.queued` / `ingestion.indexed` / `ingestion.skipped.duplicate` / `ingestion.failed`). All prior `indexFile` call sites now route through the façade: `routeDataFiles._autoIndexFile`, `commcoach/serviceCommcoachIndexer.indexSessionData`, `serviceAgent/coreTools/_workspaceTools.readFile`, `serviceAgent/coreTools/_documentTools.describeImage`. Agent tools no longer carry on-demand extraction + ingestion fallbacks — they are pure consumers of the knowledge store. **Teil 3.3** matrix audited. Three implementation bugs fixed during verification: stable content hash, pre-upsert `_ingestion` preservation, `mergeStrategy=None` for per-page granularity (see **§1.4 Implementation pitfalls**). |
-| **P1 — User-connection hooks** *(done, 2026-04-21)* | `connection.established` / `connection.revoked` callbacks emitted from every OAuth callback (`routeSecurityMsft`, `routeSecurityGoogle`, `routeSecurityClickup`) and from `routeDataConnections.disconnect_service` / `delete_connection`; the `ConnectionStatus.INACTIVE` enum bug (the value did not exist) was fixed by switching the disconnect path to `ConnectionStatus.REVOKED`. A new central `KnowledgeIngestionConsumer` (`subConnectorIngestConsumer.py`, registered in `app.py` lifespan) maps `established` to a `connection.bootstrap` BackgroundJob and `revoked` to a synchronous purge through `KnowledgeService.purgeConnection` → `interfaceDbKnowledge.deleteFileContentIndexByConnectionId`. `FileContentIndex` gained `connectionId` and `sourceKind` columns (auto-applied by `connectorDbPostgre`); `IngestionJob` carries both end-to-end so every chunk is purgeable by connection. **All three OAuth authorities are wired up** with one bootstrap module per service: `subConnectorSyncSharepoint.py` (`sourceKind="sharepoint_item"`, `eTag` as `contentVersion`, walks sites with the `@odata.nextLink` paginated `SharepointAdapter.browse`), `subConnectorSyncOutlook.py` (virtual `outlook_message` documents — header / snippet / cleaned body via the shared `cleanEmailBody` utility — with `changeKey` revisions and optional `outlook_attachment` child jobs), `subConnectorSyncGdrive.py` (`gdrive_item`, `modifiedTime` revisions, recursive walk from My Drive root with depth/age caps and Google-Doc export support inherited from `DriveAdapter.download`), `subConnectorSyncGmail.py` (virtual `gmail_message` documents with `historyId` revisions, walks `INBOX + SENT` by default, MIME-tree body extraction prefers `text/plain` and falls back to `text/html`, optional `gmail_attachment` child jobs), `subConnectorSyncClickup.py` (virtual `clickup_task` documents with `date_updated` revisions, walks teams → spaces → folder/folderless lists → tasks with workspace and per-workspace list caps, header carries name/status/list/space/assignees/tags/url so search prompts retrieve task context without a live API call). The dispatcher `_bootstrapJobHandler` fans out per authority (msft → sharepoint+outlook in parallel, google → drive+gmail in parallel, clickup → tasks); unsupported authorities log `ingestion.connection.bootstrap.skipped reason=unsupported_authority`. Structured-log schema (started / progress / done / purged) defined in **§ Structured ingestion logs** below. Eight new unit tests (purge, consumer dispatch + per-authority routing, `cleanEmailBody`, bootstrapSharepoint, bootstrapOutlook, bootstrapGmail, bootstrapGdrive, bootstrapClickup) lock the contract. **Retrieval threshold calibration (2026-04-21):** during UI verification `buildAgentContext` returned `instanceChunks=0` despite 640 correctly-indexed rows — root cause was overly aggressive `minScore` thresholds (Layer 1 `0.65`, Layer 1.5 `0.55`, Layer 3 `0.70`) versus realistic `text-embedding-3-small` cosine similarities in the `0.30`–`0.55` range. All three thresholds lowered to `0.35`; agent then correctly synthesized answers from indexed Outlook/SharePoint content without resorting to live tools. |
-| **P2 — Profile & mandate snapshots** | Allowlisted fields only (**Teil 2.3**); regenerate on events; explicit admin toggle per mandate if needed. |
-| **P3 — Event bus** | Move direct calls to async consumer where load requires it (**Teil 2.4** scalable target). |
+| **P1a — User-connection hooks (Microsoft `msft`)** *(done, 2026-04-21)* | **`connection.established`** / **`connection.revoked`** emitted from **Microsoft** data-OAuth success paths and from **disconnect/delete** when the row is **`msft`** (incl. **`ConnectionStatus.REVOKED`** fix where **`INACTIVE`** was invalid). Central **`KnowledgeIngestionConsumer`** (`subConnectorIngestConsumer.py`, **`app.py`** lifespan) maps **`established`** → **`connection.bootstrap`** BackgroundJob and **`revoked`** → synchronous **`KnowledgeService.purgeConnection`** → **`interfaceDbKnowledge.deleteFileContentIndexByConnectionId`**. **`FileContentIndex.connectionId`** + **`sourceKind`** (and **`IngestionJob`** carrying both) make connector-sourced rows purgeable. **Bootstrap modules live for Microsoft:** **`subConnectorSyncSharepoint.py`** (`sourceKind="sharepoint_item"`, **`eTag`** as `contentVersion`, **`SharepointAdapter.browse`** with **`@odata.nextLink`** pagination) and **`subConnectorSyncOutlook.py`** (virtual **`outlook_message`** docs — header / snippet / cleaned body via **`cleanEmailBody`**, **`changeKey`** revisions, optional **`outlook_attachment`** child jobs). Dispatcher **`_bootstrapJobHandler`** runs **SharePoint + Outlook in parallel** for **`msft`**. Structured logs: **§ Structured ingestion logs**. **Retrieval threshold calibration (2026-04-21):** **`buildAgentContext`** **`minScore`** layers lowered to **`0.35`** so **`text-embedding-3-small`** matches real cosine scores; validated on **Outlook/SharePoint–indexed** content. **Tests (P1a):** purge, consumer **msft** dispatch, **`cleanEmailBody`**, **`bootstrapSharepoint`**, **`bootstrapOutlook`**. |
+| **P1b — User-connection hooks (Google + ClickUp)** *(done, 2026-04)* | Parity with **`msft`**: **`routeSecurityGoogle`** / **`routeSecurityClickup`** call **`KnowledgeIngestionConsumer.onConnectionEstablished`** after token save; **`routeDataConnections`** disconnect/delete call **`onConnectionRevoked`** for **all** authorities. **`_bootstrapJobHandler`** fans out **google → `bootstrapGdrive` + `bootstrapGmail`** in parallel and **clickup → `bootstrapClickup`**. Walkers: `subConnectorSyncGdrive.py`, `subConnectorSyncGmail.py`, `subConnectorSyncClickup.py` + `subTextClean.py`. Unit tests: `test_bootstrap_gdrive.py`, `test_bootstrap_gmail.py`, `test_bootstrap_clickup.py`, extended `test_knowledge_ingest_consumer.py`. |
+| **P1c — Connection refresh (lifecycle v1)** *(next)* | **Daily** (or nightly) **scheduled** re-run of the same bootstrap walkers for connections with **knowledge ingestion enabled** (**§2.6**). Reuses idempotency + fast-path; closes the **post-connect delta gap** without webhooks in v1. Observability: same log family as bootstrap; optional `event` suffix or `reason=scheduled_refresh` for shippers. |
+| **P1d — Consent + preferences + UI** *(done; see P1d checklist below)* | Persist **§2.6** settings **per `connectionId`**; gate **`onConnectionEstablished`** / P1c jobs on user choice; **`frontend_nyla`** connection wizard + settings screen; walkers honor mail/file/ClickUp depth and the **neutralization** flag. |
+| **~~P2 — Profile & mandate snapshots~~** | **Removed from active roadmap** (focus: connections + feature corpus + scale). Target content remains documented in **§2.3** for a future re-entry when needed. |
+| **P3 — Event bus** | Move direct calls to async consumer where load requires it (**Teil 2.4** scalable target). Remains in scope. |
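
A sketch of the dispatcher fan-out for one `connection.bootstrap` job, assuming each walker is an async callable that takes a `connectionId` and returns an indexed count (a hypothetical signature; the real `_bootstrapJobHandler` and walker interfaces may differ). `asyncio.gather(..., return_exceptions=True)` provides the isolation the log schema describes: one failing part is recorded while the other parts still complete.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Dict

logger = logging.getLogger("ingestion.sketch.bootstrap")

# Assumed walker shape: takes a connectionId, returns the number of indexed items.
Walker = Callable[[str], Awaitable[int]]


async def runBootstrap(
    connectionId: str,
    authority: str,
    walkersByAuthority: Dict[str, Dict[str, Walker]],
) -> Dict[str, object]:
    """Run all bootstrap parts of one authority concurrently.

    A failing part is logged and returned as its exception; it neither
    cancels nor hides the results of the other parts.
    """
    walkers = walkersByAuthority.get(authority)
    if not walkers:
        logger.info(
            "bootstrap skipped",
            extra={"event": "ingestion.connection.bootstrap.skipped",
                   "connectionId": connectionId, "authority": authority,
                   "reason": "unsupported_authority"},
        )
        return {}

    parts = list(walkers)
    results = await asyncio.gather(
        *(walkers[part](connectionId) for part in parts),
        return_exceptions=True,
    )
    outcome: Dict[str, object] = {}
    for part, result in zip(parts, results):
        if isinstance(result, BaseException):
            logger.error(
                "bootstrap part failed",
                extra={"event": "ingestion.connection.bootstrap.failed",
                       "part": part, "connectionId": connectionId,
                       "error": str(result)},
            )
        outcome[part] = result
    return outcome
```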
+
+### P1b checklist *(completed — kept for audit trail)*
+
+1. **`routeSecurityGoogle`:** after successful **data** OAuth, enqueue the **same** ingestion consumer path as Microsoft (pass **`connectionId`**, **`AuthAuthority.google`**, mandate/user scope).  
+2. **`routeSecurityClickup`:** after successful OAuth / token persistence, do the same.  
+3. **`routeDataConnections`:** verify **disconnect_service** / **delete_connection** emit **revoke** (or call **`purgeConnection`**) for **google** and **clickup** rows, not only **msft**.  
+4. **`_bootstrapJobHandler`:** remove any **“unsupported_authority”** skip for **`google`** / **`clickup`** once walkers are registered; keep skip only for **future** authorities.  
+5. **Quality bar:** extend T10/T12–T15 in the test plan from **Microsoft-only** assumptions to **all three** **`routeDataConnections`** OAuth authorities.
+
+### P1c / P1d checklist *(P1c next; P1d done)*
+
+1. **P1c:** BackgroundJob or cron entry; feature flag; per-tenant stagger; only connections with **knowledge ingestion = on**; metrics on `indexed` vs `skippedDup` per run.  
+2. **P1d ✅ — implemented:**
+   - [x] **`UserConnection`** extended with `knowledgeIngestionEnabled: bool` (default `False` = strict opt-in) and `knowledgePreferences: Optional[Dict]` (`schemaVersion=1`); DB auto-migration adds columns on startup.
+   - [x] **`routeDataConnections` `create_connection`** accepts `knowledgeIngestionEnabled` + `knowledgePreferences` in request body and persists them before returning.
+   - [x] **OAuth callbacks** (`routeSecurityGoogle`, `routeSecurityMsft`, `routeSecurityClickup`) gate `callbackRegistry.trigger("connection.established", …)` on `connection.knowledgeIngestionEnabled`; emit structured log `ingestion.connection.bootstrap.skipped reason=consent_disabled` when disabled.
+   - [x] **`_bootstrapJobHandler`** defensive re-check: loads connection via `getUserConnectionById` and no-ops if flag was disabled after OAuth (race protection).
+   - [x] **`IngestionJob.neutralize: bool`** added; `requestIngestion` + `_indexFileInternal` thread it through; for `sourceKind != "file"` the flag drives `_shouldNeutralize` directly; for `sourceKind == "file"` the `FileItem.neutralize` column remains authoritative.
+   - [x] **`subConnectorPrefs.py`** — `loadConnectionPrefs(connectionId)` helper + `ConnectionIngestionPrefs` dataclass with safe defaults for all §2.6 keys.
+   - [x] **All five walkers** (Gmail, GDrive, ClickUp, Outlook, SharePoint) load prefs at bootstrap start; limits structs gain `mailContentDepth` + `neutralize` (mail walkers), `filesIndexBinaries` (Drive), `clickupScope` (ClickUp), and `neutralize` (all).
+   - [x] **Unit tests** (`test_p1d_consent_prefs.py` — 10 tests): consent gate no-op, prefs defaults + full mapping, Gmail depth modes (metadata/snippet/full), ClickUp scope (titles vs description).
+   - [x] **Frontend** (`frontend_nyla`): `AddConnectionWizard` 4-step modal (connector → consent → preferences → summary + OAuth); the old three-button row is replaced with a single „Verbindung hinzufügen“ (Add connection) button; `createConnectionAndAuth` hook method; `KnowledgePreferences` type in `connectionApi.ts`.
+
+   **Default policy (document for deploy):** `knowledgeIngestionEnabled` defaults to `False` for all new connections. Connections created before the P1d deploy have the column set to `NULL`/`False`, so **no bootstrap is triggered retroactively**. Users must explicitly opt in via the wizard or the connection settings. If the team decides to migrate existing connections to `True`, a one-time migration script must be run and announced in a release note.
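
The consent gate, the defensive re-check, and the neutralization precedence from the P1d checklist can be sketched as three small functions. `UserConnectionView`, the injected `trigger` callable (standing in for `callbackRegistry.trigger`, whose exact signature is not given here) and all function names are hypothetical. Treating `NULL` the same as `False` mirrors the default policy above.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger("ingestion.sketch.consent")


@dataclass
class UserConnectionView:
    """Hypothetical, trimmed view of a UserConnection row."""
    id: str
    authority: str
    knowledgeIngestionEnabled: Optional[bool] = False  # None models a NULL column


def emitEstablishedIfConsented(conn: UserConnectionView, trigger: Callable[[str, dict], None]) -> bool:
    """OAuth-callback side: emit connection.established only after explicit opt-in."""
    if conn.knowledgeIngestionEnabled is not True:
        logger.info(
            "bootstrap skipped",
            extra={"event": "ingestion.connection.bootstrap.skipped",
                   "connectionId": conn.id, "authority": conn.authority,
                   "reason": "consent_disabled"},
        )
        return False
    trigger("connection.established", {"connectionId": conn.id, "authority": conn.authority})
    return True


def bootstrapAllowed(loadConnection: Callable[[str], Optional[UserConnectionView]], connectionId: str) -> bool:
    """Job-handler side: re-check consent in case it was withdrawn after OAuth."""
    conn = loadConnection(connectionId)
    return conn is not None and conn.knowledgeIngestionEnabled is True


def shouldNeutralize(sourceKind: str, jobNeutralize: bool, fileItemNeutralize: Optional[bool]) -> bool:
    """Uploads follow FileItem.neutralize; connector-sourced jobs follow the job flag."""
    if sourceKind == "file":
        return bool(fileItemNeutralize)
    return jobNeutralize
```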

 ---

@ -366,7 +451,8 @@ Phases align with **Teil 1** (façade), **Teil 2** (connector + trigger catalog)
 - One **ingestion contract** for all features and connector lifecycles.  
 - Indexing **decoupled** from the agent loop (agents may still *invoke* tools that ultimately call ingestion, but ingestion must not *depend* on an agent run).  
 - **Explicit** handling of connection establishment, sync, and revocation.  
- **Bounded** indexing of user/mandate context with a clear PII policy.
+- **Bounded** indexing of user/mandate context with a clear PII policy.  
+- **Explicit user consent** and **per-connection** ingestion preferences (incl. optional **neutralization**) before connector content enters the knowledge store (**§2.6**).

 **Explicitly NOT:**

@ -379,7 +465,8 @@ Phases align with **Teil 1** (façade), **Teil 2** (connector + trigger catalog)
 ## Affected modules (expected)

 - **Gateway:** `serviceKnowledge`, file upload routes, connector OAuth handlers, sync workers, possibly new `serviceKnowledgeIngest` or package under `modules/serviceCenter/services/`.  
- **Interfaces:** `interfaceDbKnowledge` extensions for source metadata if needed.  
+- **Interfaces:** `interfaceDbKnowledge` extensions for source metadata if needed; **`interfaceDbApp`** (or adjacent) for **per-`connectionId`** ingestion preferences from **§2.6**.  
+- **Frontend:** `frontend_nyla` — connection wizard + connection detail settings (consent, depth toggles, neutralization, time window).  
 - **Wiki / Reference:** `b-reference/gateway/ai-agent.md` (ingestion vs. retrieval) after implementation.

 ---
@ -388,19 +475,19 @@ Phases align with **Teil 1** (façade), **Teil 2** (connector + trigger catalog)

 | Topic | Options |
 |-------|----------|
-| **Email bodies** | Full text vs. summary-only vs. attachment-only |
+| **Email bodies** | Default product stance is **user-configurable per connection** (**§2.6** table: metadata / snippet / full cleaned body); mandate policy may still cap the maximum tier. |
 | **Multi-tenant isolation audits** | Periodic job to verify chunk `mandateId` matches connection |
 | **Cost caps** | Per-mandate embedding budget; defer large backfills |
-| **Neutralization** | Mandatory for certain `sourceKind`s even when not file-upload |
+| **Neutralization** | **User opt-in** per connection (**§2.6**); optional **mandate floor** (“never below snippet+neutralize for mail”) remains a separate governance decision. |
 | **Provenance shape** | First-class DB columns vs **documented `chunkMetadata` keys** for `connectionId`, external id, revision (must support **Teil 2** purge rules). |
-| **In-flight duplicate handling** | Accept `status ∈ {"extracted","embedding","indexed"}` with matching hash as in-progress (cheap, lossy under failure) **vs** per-`sourceId` `asyncio.Lock` in `KnowledgeService` (strict, requires singleton) — see **§1.4 Deferred to P1**. |
-| **Pre-extraction dedup shortcut** | Short-circuit `_autoIndexFile` via the file-bytes SHA in `interfaceDbManagement` before running `runExtraction` (~15 s saved per re-index of a large PDF) — see **§1.4 Deferred to P1**. |
+| **In-flight duplicate handling** | Accept `status ∈ {"extracted","embedding","indexed"}` with matching hash as in-progress (cheap, lossy under failure) **vs** per-`sourceId` `asyncio.Lock` in `KnowledgeService` (strict, requires singleton) — see **§1.4 Deferred (ingestion idempotency hardening)**. |
+| **Pre-extraction dedup shortcut** | Short-circuit `_autoIndexFile` via the file-bytes SHA in `interfaceDbManagement` before running `runExtraction` (~15 s saved per re-index of a large PDF) — see **§1.4 Deferred (ingestion idempotency hardening)**. |
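
For the in-flight duplicate question, the strict option could look like the following sketch: a per-`sourceId` `asyncio.Lock` around a content-hash check. `IngestionGate` and its in-memory hash map are hypothetical; in the real service the idempotency state lives in `structure._ingestion`. The lock only serializes work inside one process and one event loop, which is exactly the "requires singleton" caveat.

```python
import asyncio
import hashlib
from collections import defaultdict
from typing import Awaitable, Callable, Dict


class IngestionGate:
    """Hypothetical per-sourceId serialization with content-hash idempotency."""

    def __init__(self) -> None:
        # Locks are created lazily, inside the running event loop.
        self._locks: Dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self._indexedHash: Dict[str, str] = {}

    async def ingest(self, sourceId: str, content: bytes,
                     embed: Callable[[bytes], Awaitable[None]]) -> str:
        digest = hashlib.sha256(content).hexdigest()
        async with self._locks[sourceId]:
            # A concurrent caller for the same sourceId waits here, then
            # sees the hash recorded by the first caller.
            if self._indexedHash.get(sourceId) == digest:
                return "skipped.duplicate"
            await embed(content)
            self._indexedHash[sourceId] = digest
            return "indexed"
```

The cheaper alternative in the table (treating matching-hash rows in `extracted` / `embedding` as in progress) needs no lock, but leaves a stuck row behind if a run dies mid-embedding.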

 ---

 ## Structured ingestion logs (P1 schema)

-The connection-lifecycle lane emits the following structured log events. Each event is a single `logger.info` / `.warning` / `.error` call with a stable `extra={"event": ...}` field so downstream log shippers can route on `event` without parsing the message string.
+The connection-lifecycle lane emits the following structured log events. **`part`** values **`sharepoint`**, **`outlook`**, **`gdrive`**, **`gmail`**, and **`clickup`** are all **implemented** for bootstrap; **P1c** may add the same events with a distinguishable `reason` / `jobType` for **scheduled refresh** (exact field TBD in implementation). Each event is a single `logger.info` / `.warning` / `.error` call with a stable `extra={"event": ...}` field so downstream log shippers can route on `event` without parsing the message string.

 | `event` | Severity | Emitter | Required `extra` keys | Meaning |
 |---------|----------|---------|------------------------|---------|
@ -409,7 +496,7 @@ The connection-lifecycle lane emits the following structured log events. Each ev
 | `ingestion.connection.bootstrap.progress` | info | bootstrap walkers | `connectionId`, `part`, `processed`, `skippedDup`, `failed` | Heart-beat every ~50 items so long-running runs are observable. |
 | `ingestion.connection.bootstrap.done` | info | bootstrap walkers + façade-level totals | `connectionId`, `part`, `indexed`, `skippedDup`, `skippedPolicy`, `failed`, `durationMs` (Outlook/Gmail add `attachmentsIndexed`; SharePoint/Drive add `bytes`; ClickUp adds `workspaces` + `lists`) | Walker finished cleanly. |
 | `ingestion.connection.bootstrap.failed` | error | `_bootstrapJobHandler` | `part`, `connectionId`, `error` | One bootstrap part raised — recorded but the other parts still complete. |
-| `ingestion.connection.bootstrap.skipped` | info | `_bootstrapJobHandler` | `connectionId`, `authority`, `reason` (`unsupported_authority`) | Authority has no bootstrap module registered (e.g. a future provider). |
+| `ingestion.connection.bootstrap.skipped` | info | `_bootstrapJobHandler` (incl. its defensive consent re-check) + OAuth callbacks | `connectionId`, `authority`, `reason` (`unsupported_authority` or `consent_disabled`) | Either the authority has no bootstrap module registered (e.g. a future provider), **or** the user has not consented (`knowledgeIngestionEnabled=False`). |
 | `ingestion.connection.purged` | info | `_onConnectionRevoked` | `connectionId`, `authority`, `reason`, `indexRows`, `chunks` | Knowledge purge for a revoked connection completed; numbers reflect the deleted rows. |
 | `ingestion.connection.purged.failed` | error | `_onConnectionRevoked` | `connectionId`, `error` | Purge raised; the revoke event was still acknowledged upstream. |
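
The contract of a stable `extra={"event": ...}` field can be sketched with the standard `logging` module: the emitter attaches structured fields, and a handler (here a toy stand-in for a log shipper) routes on `record.event` without parsing the message string. `EventCollector` and `logBootstrapDone` are hypothetical helpers; the required keys follow the table above.

```python
import logging
from typing import Any, Dict, List


class EventCollector(logging.Handler):
    """Toy stand-in for a log shipper that routes on record.event."""

    def __init__(self) -> None:
        super().__init__()
        self.events: List[Dict[str, Any]] = []

    def emit(self, record: logging.LogRecord) -> None:
        event = getattr(record, "event", None)
        if event is None:
            return  # unstructured records are ignored, never parsed
        self.events.append({
            "event": event,
            "level": record.levelname,
            "connectionId": getattr(record, "connectionId", None),
            "part": getattr(record, "part", None),
        })


logger = logging.getLogger("ingestion.sketch.logs")
logger.setLevel(logging.INFO)
collector = EventCollector()
logger.addHandler(collector)


def logBootstrapDone(connectionId: str, part: str, indexed: int, skippedDup: int,
                     skippedPolicy: int, failed: int, durationMs: int) -> None:
    logger.info(
        "bootstrap done for %s (%s)", connectionId, part,
        extra={"event": "ingestion.connection.bootstrap.done",
               "connectionId": connectionId, "part": part, "indexed": indexed,
               "skippedDup": skippedDup, "skippedPolicy": skippedPolicy,
               "failed": failed, "durationMs": durationMs},
    )
```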

@ -421,16 +508,17 @@ All events should keep field naming consistent with the existing `ingestion.queu
 - **Gateway reference (retrieval + knowledge):** `wiki/b-reference/gateway/architecture.md`, `wiki/b-reference/gateway/ai-agent.md`  
 - **Implementation touchpoints (indicative):** `gateway/modules/serviceCenter/services/serviceKnowledge/mainServiceKnowledge.py`, `gateway/modules/routes/routeDataFiles.py`, `gateway/modules/features/commcoach/serviceCommcoachIndexer.py`, agent `coreTools` `_documentTools` / `_workspaceTools`, `gateway/modules/datamodels/datamodelExtraction.py` (`ExtractionOptions.mergeStrategy: Optional[MergeStrategy]`).
 - **Unit tests (P0 guardrails):** `gateway/tests/unit/services/test_ingestion_hash_stability.py`, `gateway/tests/unit/services/test_extraction_merge_strategy.py`.
- **Unit tests (P1 guardrails):** `gateway/tests/unit/services/test_connection_purge.py`, `gateway/tests/unit/services/test_knowledge_ingest_consumer.py`, `gateway/tests/unit/services/test_clean_email_body.py`, `gateway/tests/unit/services/test_bootstrap_sharepoint.py`, `gateway/tests/unit/services/test_bootstrap_outlook.py`, `gateway/tests/unit/services/test_bootstrap_gmail.py`, `gateway/tests/unit/services/test_bootstrap_gdrive.py`, `gateway/tests/unit/services/test_bootstrap_clickup.py`.
- **P1 implementation touchpoints:** `gateway/modules/serviceCenter/services/serviceKnowledge/subConnectorIngestConsumer.py`, `gateway/modules/serviceCenter/services/serviceKnowledge/subConnectorSyncSharepoint.py`, `gateway/modules/serviceCenter/services/serviceKnowledge/subConnectorSyncOutlook.py`, `gateway/modules/serviceCenter/services/serviceKnowledge/subConnectorSyncGdrive.py`, `gateway/modules/serviceCenter/services/serviceKnowledge/subConnectorSyncGmail.py`, `gateway/modules/serviceCenter/services/serviceKnowledge/subConnectorSyncClickup.py`, `gateway/modules/serviceCenter/services/serviceKnowledge/subTextClean.py`, `gateway/modules/interfaces/interfaceDbKnowledge.py` (`deleteFileContentIndexByConnectionId`), `gateway/modules/datamodels/datamodelKnowledge.py` (`FileContentIndex.connectionId` + `sourceKind`), `gateway/modules/connectors/providerMsft/connectorMsft.py` (`@odata.nextLink`-loop in `SharepointAdapter.browse`, `eTag` in `_graphItemToExternalEntry`), `gateway/modules/routes/routeSecurityMsft.py` / `routeSecurityGoogle.py` / `routeSecurityClickup.py` / `routeDataConnections.py` (callback emission + `ConnectionStatus.REVOKED` fix), `gateway/app.py` (consumer registration in lifespan).
+- **Unit tests (P1a — Microsoft, done):** `gateway/tests/unit/services/test_connection_purge.py`, `gateway/tests/unit/services/test_knowledge_ingest_consumer.py` (incl. **msft** fan-out), `gateway/tests/unit/services/test_clean_email_body.py`, `gateway/tests/unit/services/test_bootstrap_sharepoint.py`, `gateway/tests/unit/services/test_bootstrap_outlook.py`.
+- **Unit tests (P1b — Google + ClickUp, done):** **`test_knowledge_ingest_consumer`** (google / clickup fan-out), **`test_bootstrap_gmail.py`**, **`test_bootstrap_gdrive.py`**, **`test_bootstrap_clickup.py`**. **P1d (done):** **`test_p1d_consent_prefs.py`** (10 tests: consent gate, prefs parsing, Gmail depth modes, ClickUp scope). **P1c:** add scheduler tests when implemented.
+- **P1 implementation touchpoints:** `gateway/modules/serviceCenter/services/serviceKnowledge/subConnectorIngestConsumer.py`, `gateway/modules/serviceCenter/services/serviceKnowledge/subConnectorSyncSharepoint.py`, `gateway/modules/serviceCenter/services/serviceKnowledge/subConnectorSyncOutlook.py`, `gateway/modules/serviceCenter/services/serviceKnowledge/subConnectorSyncGdrive.py`, `gateway/modules/serviceCenter/services/serviceKnowledge/subConnectorSyncGmail.py`, `gateway/modules/serviceCenter/services/serviceKnowledge/subConnectorSyncClickup.py`, `gateway/modules/serviceCenter/services/serviceKnowledge/subTextClean.py`, `gateway/modules/interfaces/interfaceDbKnowledge.py` (`deleteFileContentIndexByConnectionId`), `gateway/modules/datamodels/datamodelKnowledge.py` (`FileContentIndex.connectionId` + `sourceKind`), `gateway/modules/connectors/providerMsft/connectorMsft.py` (`@odata.nextLink`-loop in `SharepointAdapter.browse`, `eTag` in `_graphItemToExternalEntry`), `gateway/modules/connectors/providerGoogle/connectorGoogle.py` (P1b: Drive + Gmail revision keys and download/export paths), `gateway/modules/routes/routeSecurityMsft.py` (P1a callbacks), `gateway/modules/routes/routeSecurityGoogle.py` and `gateway/modules/routes/routeSecurityClickup.py` (P1b: parity callbacks), `gateway/modules/routes/routeDataConnections.py` (revoke for **all** authorities), `gateway/app.py` (consumer registration in lifespan).

 ## Acceptance criteria (plan level)

 | # | Criterion | Priority |
 |---|-----------|------|
 | 1 | Every new **file** that should be searchable triggers ingestion **without** requiring an agent session. | must |
-| 2 | **User connection** connect / disconnect has defined ingestion or purge behavior documented and implementable. | must |
-| 3 | **Profile/mandate** snapshots use an explicit allowlist; secrets never enter the embedding pipeline. | must |
+| 2 | **User connection** connect / disconnect has defined ingestion or purge behavior **for each** OAuth authority that **`routeDataConnections`** supports (**P1a** **`msft`**, **P1b** **`google`** / **`clickup`**), **plus** user-controlled **opt-in** and a **preference bundle** before ingestion (**P1d**, **§2.6**). | must |
+| 3 | **Profile/mandate** snapshot ingestion (**former roadmap P2**) is **deferred**; when re-opened, snapshots must use an explicit allowlist and never embed secrets. Until then, **§2.6** consent + neutralization covers connector-sourced PII risk. | should (reactivated when P2 returns) |
 | 4 | Ingestion is **idempotent** for unchanged content (no duplicate embedding work). Verified 2026-04-21 on a 500-page PDF: second re-index trigger logs `ingestion.skipped.duplicate` with a stable hash, zero embedding API calls. See **§1.4 pitfalls** for the three bug classes that had to be fixed first. | must |
 | 5 | **Teil 3.3** matrix completed: every `modules/features/*` product row has **retrieval** (agent vs none), **corpus** (upload / tools / feature indexer), and **gap** explicitly stated—not “non-injecting” if **`AgentService`** already provides retrieval injection. | should |
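
Criterion 4 depends on a content hash that stays the same when the content does. One common way to get that, sketched here under assumptions, is to hash a canonical JSON serialization with sorted keys and without volatile fields, while keeping page order significant. The field names in `VOLATILE_KEYS` are hypothetical; this plan does not say which fields caused the original instability (see §1.4).

```python
import hashlib
import json
from typing import Any, Iterable, Mapping

# Hypothetical fields that change on every extraction run.
VOLATILE_KEYS = {"extractedAt", "id"}


def stableContentHash(contentObjects: Iterable[Mapping[str, Any]]) -> str:
    """SHA-256 over a canonical form: list order kept, dict key order ignored."""
    canonical = [
        {k: v for k, v in obj.items() if k not in VOLATILE_KEYS}
        for obj in contentObjects
    ]
    payload = json.dumps(canonical, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```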

@ -449,9 +537,10 @@ All events should keep field naming consistent with the existing `ingestion.queu
 | T7 | Are per-page chunks preserved for multi-page PDFs (no `MergeStrategy` concatenation)? | Unit: `tests/unit/services/test_extraction_merge_strategy.py`. Live: 500-page PDF → 563 ContentObjects, 567 embedding chunks in 24 batches (verified 2026-04-21). |
 | T8 | Do `_ingestion.hash` and `status="indexed"` survive a pre-scan re-upsert in `_autoIndexFile`? | Review `routeDataFiles._autoIndexFile` line ~127: the existing row is read before the upsert, and `_ingestion` + `indexed` are merged into the fresh `contentIndex`. Live: second trigger → `ingestion.skipped.duplicate` instead of re-embedding. |
 | T9 | Does a `connection.revoked` event remove **all** `FileContentIndex` rows + `ContentChunk`s of one connection and **nothing else** (uploads without `connectionId` and other connections stay intact)? | Unit: `tests/unit/services/test_connection_purge.py` (3 cases: positive purge, empty-`connectionId` no-op, unknown `connectionId`). |
-| T10 | Dispatcht der `KnowledgeIngestionConsumer` `connection.established` korrekt als asynchroner `connection.bootstrap` Job (msft → SharePoint + Outlook parallel; google → Drive + Gmail parallel; clickup → Tasks; unbekannte Authorities `skipped.reason="unsupported_authority"`) und `connection.revoked` synchron als Purge? | Unit: `tests/unit/services/test_knowledge_ingest_consumer.py` (8 Cases: established enqueue, missing-id ignore, revoked purge, missing-id ignore, skip-unsupported, msft fan-out, google fan-out, clickup dispatch). |
+| T10 | Does the `KnowledgeIngestionConsumer` dispatch `connection.established` correctly as an asynchronous `connection.bootstrap` job (**P1a:** **msft** → SharePoint + Outlook in parallel; **P1b:** **google** → Drive + Gmail in parallel; **clickup** → Tasks) and `connection.revoked` synchronously as a purge, **for each** of the three **`routeDataConnections`** authorities? | **P1a + P1b (done):** `test_knowledge_ingest_consumer.py` covers all three authorities + revoke; unknown authorities yield `skipped.reason="unsupported_authority"`. **P1d:** additionally, dispatch happens only when **consent = yes**. |
 | T11 | Does `cleanEmailBody` reduce a realistic Outlook HTML message to its own body text (HTML strip, quote strip EN+DE, signature strip, whitespace collapse, `maxChars` truncation)? | Unit: `tests/unit/services/test_clean_email_body.py` (8 cases). Consequence: `bootstrapOutlook` never sends HTML, quoted replies, or signatures into the embedding pipeline step. |
 | T12 | Are the SharePoint and Outlook bootstrap walkers idempotent on a second run with unchanged `eTag` / `changeKey`? | Unit: `tests/unit/services/test_bootstrap_sharepoint.py` + `tests/unit/services/test_bootstrap_outlook.py`. Mock adapters return stable revisions; the KnowledgeService fake reports `duplicate`, and the result object counts it as `skippedDuplicate`. |
-| T13 | Walked `bootstrapGmail` `INBOX + SENT`, parsed MIME-Bodies (preferring `text/plain`, falling back to `text/html`), folgt `nextPageToken`-Pagination und ist idempotent gegen identische `historyId` Revisions? | Unit: `tests/unit/services/test_bootstrap_gmail.py` (6 Cases: header/snippet/body content-objects, MIME plain-vs-html preference, HTML fallback, multi-label fan-out, `nextPageToken` pagination, duplicate accounting). |
-| T14 | Walked `bootstrapGdrive` My Drive rekursiv (Folder-MIME-Erkennung, `maxDepth`), respektiert den `maxAgeDays`-Recency-Filter und ist idempotent gegen identische `modifiedTime` Revisions? | Unit: `tests/unit/services/test_bootstrap_gdrive.py` (4 Cases: site/subfolder walk, duplicate accounting, recency-skip via `skippedPolicy`, provenance carries `authority="google"` + `service="drive"`). |
-| T15 | Walked `bootstrapClickup` Workspaces → Spaces → Folder/Folderless Lists → Tasks unter `maxWorkspaces` / `maxListsPerWorkspace` / `maxTasks` Caps, respektiert den `maxAgeDays`-Recency-Filter und ist idempotent gegen identische `date_updated` Revisions? | Unit: `tests/unit/services/test_bootstrap_clickup.py` (4 Cases: hierarchy walk indexes 4 tasks across 2 lists, duplicate accounting, recency-skip via `skippedPolicy`, `maxTasks` cap). |
+| T13 | Does `bootstrapGmail` walk `INBOX + SENT`, parse MIME bodies (preferring `text/plain`, falling back to `text/html`), follow `nextPageToken` pagination, and stay idempotent for identical `historyId` revisions? | **P1b (done):** Unit `test_bootstrap_gmail.py`. **P1d:** the walker honors **content depth** from **§2.6** (metadata / snippet / body). |
+| T14 | Does `bootstrapGdrive` walk My Drive recursively (folder MIME detection, `maxDepth`), honor the `maxAgeDays` recency filter, and stay idempotent for identical `modifiedTime` revisions? | **P1b (done):** Unit `test_bootstrap_gdrive.py`. **P1d:** “binary files” toggle / MIME allowlist from **§2.6**. |
+| T15 | Does `bootstrapClickup` walk workspaces → spaces → folder/folderless lists → tasks within the `maxWorkspaces` / `maxListsPerWorkspace` / `maxTasks` caps, honor the `maxAgeDays` recency filter, and stay idempotent for identical `date_updated` revisions? | **P1b (done):** Unit `test_bootstrap_clickup.py`. **P1d:** ClickUp **scope** (titles / description / comments) from **§2.6**. |
+| T16 | Does the **P1c** daily job process only connections with **knowledge ingestion = on**, and do idempotency + fast path keep costs and API limits manageable? | Integration or unit test with a fake clock: second run → mostly `skippedDup`; `ingestion.connection.bootstrap.*` logs carry a recognizable scheduled `reason` (if implemented). |
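
For T16 and the P1c checklist (only opted-in connections, per-tenant stagger), a deterministic refresh plan is easy to test with a fixed start time instead of a real clock. The sketch assumes a hypothetical `ConnSketch` record, a four-hour nightly window, and a hash-based offset per `mandateId`; none of these are decided in the plan.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ConnSketch:
    """Hypothetical minimal connection record for scheduling."""
    id: str
    mandateId: str
    knowledgeIngestionEnabled: Optional[bool]


def staggerOffset(mandateId: str, window: timedelta) -> timedelta:
    """Deterministic per-tenant offset inside the nightly window."""
    bucket = int(hashlib.sha256(mandateId.encode("utf-8")).hexdigest(), 16)
    return timedelta(seconds=bucket % int(window.total_seconds()))


def planNightlyRefresh(conns: Iterable[ConnSketch], windowStart: datetime,
                       window: timedelta = timedelta(hours=4)) -> List[Tuple[datetime, str]]:
    """(start time, connectionId) pairs for opted-in connections only, sorted by time."""
    plan = [
        (windowStart + staggerOffset(c.mandateId, window), c.id)
        for c in conns
        if c.knowledgeIngestionEnabled is True  # NULL / False are never refreshed
    ]
    return sorted(plan)


if __name__ == "__main__":
    start = datetime(2026, 5, 1, 1, 0, tzinfo=timezone.utc)
    for when, connectionId in planNightlyRefresh([ConnSketch("a", "m1", True)], start):
        print(when.isoformat(), connectionId)
```

Hashing the `mandateId` spreads tenants across the window without shared state, and all connections of one tenant land in the same slot, which keeps per-tenant API usage bursty but predictable.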
--- a/d-guides/deployment/poweron-sec.kdbx
+++ b/d-guides/deployment/poweron-sec.kdbx