# Functional Differences Analysis: Legacy vs Current Chatbot
## Executive Summary
The **legacy implementation works correctly** because the LLM **calls the `send_streaming_message` tool** as instructed. The **current implementation fails** because the LLM **writes status updates as regular text** instead of calling the tool. The system's handling of those text messages routes control back to the agent, producing a loop that only stops at the iteration cap (15).

---
## Core Functional Difference
### Legacy: Tool-Based Status Updates (WORKS)
**How it works:**
1. System prompt instructs: "Use `send_streaming_message` tool for status updates"
2. LLM (ChatAnthropic) **follows instructions** and calls the tool
3. Event handler listens **ONLY** for `on_tool_start` events with `send_streaming_message`
4. When tool is called → routes to tools node → tool executes → routes back to agent
5. Agent then calls SQL tools → processes results → generates final answer

**Code Evidence:**
```python
# legacy/chatbot.py line 267
if etype == "on_tool_start" and ename == "send_streaming_message":
    tool_in = edata.get("input") or {}
    msg = tool_in.get("message")
    if isinstance(msg, str) and msg.strip():
        yield {"type": "status", "label": msg.strip()}
    continue
```
**Key Point:** Legacy **ONLY** handles tool calls. It doesn't try to detect status messages in regular text.
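
The legacy handler can be exercised in isolation. The following is a minimal sketch, not the project's code: `fake_events`, `status_updates` and `collect` are hypothetical names, and the event shape (`event`, `name`, `data` keys) is an assumption about how `etype`, `ename` and `edata` are derived. It shows that only `send_streaming_message` tool starts become status events, while agent text and other tools are ignored.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List


async def fake_events() -> AsyncIterator[Dict[str, Any]]:
    """Hypothetical stand-in for the runtime's event stream (assumed shape)."""
    yield {"event": "on_tool_start", "name": "send_streaming_message",
           "data": {"input": {"message": "  Durchsuche Datenbank nach LEDs...  "}}}
    # Plain agent text: a legacy-style handler does not look at it.
    yield {"event": "on_chain_stream", "name": "agent",
           "data": {"chunk": "Ich werde die Datenbank durchsuchen..."}}
    # A different tool: not a status update.
    yield {"event": "on_tool_start", "name": "sqlite_query",
           "data": {"input": {"query": "SELECT 1"}}}


async def status_updates(
    events: AsyncIterator[Dict[str, Any]],
) -> AsyncIterator[Dict[str, str]]:
    """Yield a status event only for send_streaming_message tool starts."""
    async for event in events:
        etype = event.get("event")
        ename = event.get("name")
        edata = event.get("data") or {}
        if etype == "on_tool_start" and ename == "send_streaming_message":
            tool_in = edata.get("input") or {}
            msg = tool_in.get("message")
            if isinstance(msg, str) and msg.strip():
                yield {"type": "status", "label": msg.strip()}


async def collect() -> List[Dict[str, str]]:
    return [status async for status in status_updates(fake_events())]


statuses = asyncio.run(collect())
print(statuses)
```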

---
### Current: Text-Based Status Updates (BROKEN)
**How it fails:**
1. System prompt instructs: "MUST use `send_streaming_message` tool; writing text messages is VERBOTEN (forbidden)"
2. LLM (AICenterChatModel) **ignores instructions** and generates text messages like "Ich werde die Datenbank nach Artikeln durchsuchen..." ("I will search the database for articles...")
3. Event handler tries to handle these text messages by detecting them as "status messages"
4. When status message detected → routes back to agent (to "fix" it)
5. Agent generates another text status message (still not using tool)
6. **Loop repeats** until max iterations (15) reached

**Code Evidence:**
```python
# gateway/modules/features/chatbot/chatbotStreaming.py line 198-227
if etype == "on_chain_stream" and ename == "agent":
    # Tries to detect status messages in regular text
    if content and is_status_message(content):
        await _emit_status_event(...)  # Convert to status event
        continue  # Don't store as message
```
```python
# gateway/modules/features/chatbot/chatbotLangGraph.py line 292-296
if is_status:
    # Status message without tool calls - route back to agent
    # The agent should then call actual tools (like sqlite_query)
    logger.info(f"Status message detected without tool calls, routing back to agent...")
    return "agent"  # THIS CAUSES THE LOOP
```
**Key Point:** The current implementation tries to **compensate** for the LLM not following instructions, and that compensation is what creates the loop.

---
## Why They Differ: Root Causes
### 1. Model Behavior Difference
| Aspect | Legacy (ChatAnthropic) | Current (AICenterChatModel) |
|--------|----------------------|---------------------------|
| **Tool Calling** | Follows prompt, uses `send_streaming_message` tool | Ignores prompt, generates text instead |
| **Instruction Following** | Strong adherence to system prompt | Weak adherence to system prompt |
| **Model Type** | Direct LangChain integration | Bridge to AI center (may use different models) |
**Impact:** The current model doesn't follow the instruction to use the tool, so it generates text messages that break the workflow.

---
### 2. Event Handling Strategy
#### Legacy Event Handling
```python
# Simple: Only listen for tool calls
if etype == "on_tool_start" and ename == "send_streaming_message":
    # Handle tool call
    yield {"type": "status", "label": msg}
    continue  # Done, move on
```
**Strategy:** Trust the LLM to use the tool. Only handle tool calls.
#### Current Event Handling
```python
# Complex: Try to handle both tool calls AND text messages
if etype == "on_tool_start" and ename == "send_streaming_message":
    # Handle tool call (same as legacy)
    await _emit_status_event(...)
if etype == "on_chain_stream" and ename == "agent":
    # ALSO try to detect status messages in text
    if is_status_message(content):
        await _emit_status_event(...)  # Convert text to status
```
**Strategy:** Don't trust the LLM. Try to compensate by detecting status messages in text.
**Problem:** Detected status messages trigger re-routing to the agent, which produces another status message, so the cycle repeats until the iteration cap.

---
### 3. Workflow Routing Logic
#### Legacy Routing (`should_continue`)
```python
# Simple logic
def should_continue(state: ChatState) -> str:
    last_message = state.messages[-1]
    tool_calls = getattr(last_message, "tool_calls", None)
    if tool_calls:
        return "tools"  # Has tool calls → execute tools
    else:
        return END  # No tool calls → done
```
**Key Point:** No special handling for status messages. If there are tool calls, execute them. Otherwise, end.
#### Current Routing (`should_continue`)
```python
# Complex logic with status detection
def should_continue(state: ChatState) -> str:
    last_message = state.messages[-1]
    tool_calls = getattr(last_message, "tool_calls", None)
    if tool_calls:
        return "tools"
    # NEW: Check if it's a status message
    if isinstance(last_message, AIMessage):
        content = last_message.content
        if is_status_message(content):
            return "agent"  # Route back to agent (CAUSES LOOP!)
    return END
```
**Key Point:** The current routing tries to "fix" status messages by routing back to the agent, but the agent just generates another status message.

---
### 4. Message Filtering
#### Legacy: No Filtering
- All messages are stored in memory
- Status messages from tool calls are handled, but messages themselves are stored
- No filtering of "status-like" text messages
#### Current: Aggressive Filtering
```python
# chatbotMemory.py - Filters out status messages
if content:
    content_lower = content.lower().strip()
    status_patterns = ["ich werde", "ich suche", ...]
    if len(content) < 150 and any(pattern in content_lower for pattern in status_patterns):
        logger.debug(f"Skipping status update message...")
        continue  # Don't store
```
```python
# chatbotLangGraph.py - Filters from conversation window
if content and is_status_message(content):
    logger.debug(f"Filtering out status message from conversation window...")
    # Skip this message
```
**Problem:** Filtering keeps status messages out of memory, but it does nothing to stop the agent from generating them. Each re-entry produces another status message, so the loop continues while the filter hides the evidence.
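
The excerpts call `is_status_message()` without showing it. A minimal sketch of such a detector, assuming it mirrors the length-and-substring heuristic from the `chatbotMemory.py` excerpt, might look like the following. Only the two patterns visible in the excerpt are used because the rest of the list is elided, and the names `STATUS_PATTERNS` and `MAX_STATUS_LENGTH` are hypothetical. A substring heuristic like this also matches any short genuine answer that happens to contain one of the patterns, which is one reason text detection is fragile.

```python
from typing import Sequence

# Only the two patterns visible in the excerpt; the original list is elided.
STATUS_PATTERNS: Sequence[str] = ("ich werde", "ich suche")
MAX_STATUS_LENGTH = 150  # threshold taken from the chatbotMemory.py excerpt


def is_status_message(content: str, patterns: Sequence[str] = STATUS_PATTERNS) -> bool:
    """Heuristic: short text containing a known status phrase (assumed logic)."""
    if not content:
        return False
    if len(content) >= MAX_STATUS_LENGTH:
        return False
    content_lower = content.lower().strip()
    return any(pattern in content_lower for pattern in patterns)


print(is_status_message("Ich werde die Datenbank nach Artikeln durchsuchen..."))  # True
print(is_status_message("Der Lagerbestand ist leer."))  # False
print(is_status_message("Ich werde " + "x" * 200))  # False: too long
```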

---
## The Infinite Loop Explained
### What Happens in Current Implementation
1. **User asks:** "wie viele leds haben wir auf lager" ("how many LEDs do we have in stock")
2. **Agent generates:** "Ich werde die Datenbank nach Artikeln durchsuchen..." ("I will search the database for articles...") as a text message, with NO tool call
3. **Status detection:** `is_status_message()` returns `True`
4. **Routing:** `should_continue()` returns `"agent"` (route back)
5. **Memory filtering:** Message is filtered out (not stored)
6. **Agent called again:** Generates another status message (still no tool call)
7. **Repeat steps 3-6** until max iterations (15) reached
8. **Workflow ends:** No final answer, only status messages
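
The loop above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the project's graph: `Message`, `non_compliant_agent` and `run` are hypothetical stand-ins, `is_status_message` is reduced to a prefix check, and tool execution is not modeled. Only the iteration cap of 15 and the routing rule come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_ITERATIONS = 15  # iteration cap named in the text
END = "__end__"      # placeholder for the graph's END marker


@dataclass
class Message:
    content: str = ""
    tool_calls: List[dict] = field(default_factory=list)


def is_status_message(content: str) -> bool:
    # Simplified stand-in for the real detector.
    return content.lower().startswith("ich werde")


def should_continue(last: Message) -> str:
    if last.tool_calls:
        return "tools"
    if is_status_message(last.content):
        return "agent"  # route back: this is the step that loops
    return END


def non_compliant_agent() -> Message:
    # Always answers with status-like text and never calls a tool.
    return Message(content="Ich werde die Datenbank nach Artikeln durchsuchen...")


def run(agent: Callable[[], Message]) -> int:
    """Return how many times the agent ran before the workflow stopped."""
    calls = 0
    while calls < MAX_ITERATIONS:
        calls += 1
        if should_continue(agent()) != "agent":
            break
    return calls


print(run(non_compliant_agent))  # 15
```

An agent that returns ordinary text (or a tool call) on its first turn leaves the loop after a single invocation, so the cap is only hit when every turn is classified as a status message.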
### What Should Happen (Like Legacy)
1. **User asks:** "wie viele leds haben wir auf lager" ("how many LEDs do we have in stock")
2. **Agent calls tool:** `send_streaming_message("Durchsuche Datenbank nach LEDs...")` (tool call; the message means "Searching database for LEDs...")
3. **Tool execution:** Tool node executes, emits status event
4. **Routing:** `should_continue()` returns `"tools"` → tools execute → back to agent
5. **Agent calls SQL tool:** `sqlite_query("SELECT ...")` (tool call)
6. **SQL execution:** Tool node executes query, returns results
7. **Agent processes:** Generates final answer with results
8. **Workflow ends:** Final answer returned
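
The expected flow can be traced the same way. In this sketch, `compliant_agent` is a hypothetical scripted agent that emits the two tool calls and then a final text answer; the tool nodes themselves are not executed, only the routing decisions are recorded.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List

END = "__end__"  # placeholder for the graph's END marker


@dataclass
class Message:
    content: str = ""
    tool_calls: List[Dict[str, str]] = field(default_factory=list)


def should_continue(last: Message) -> str:
    # Legacy-style routing: tool calls go to the tools node, anything else ends.
    return "tools" if last.tool_calls else END


def compliant_agent() -> Iterator[Message]:
    """Scripted turns of an agent that follows the tool instructions."""
    yield Message(tool_calls=[{"name": "send_streaming_message"}])
    yield Message(tool_calls=[{"name": "sqlite_query"}])
    yield Message(content="final answer")


def run() -> List[str]:
    """Record which tools would run and where the workflow ends."""
    trace: List[str] = []
    for message in compliant_agent():
        route = should_continue(message)
        for call in message.tool_calls:
            trace.append(f"tool:{call['name']}")
        if route == END:
            trace.append("end")
            break
    return trace


print(run())
```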
---
## Why Legacy Works: Model Compliance
### ChatAnthropic Behavior
- **Strong tool calling:** When instructed to use a tool, it actually uses it
- **Prompt following:** Adheres to system prompt instructions
- **Tool-first approach:** Prefers tool calls over text for structured operations
### Evidence from Legacy Logs
```
Denke nach.. ← Tool call
Durchsuche Datenbank nach LEDs... ← Tool call
Berechne Gesamtlagerbestand... ← Tool call
Formuliere finale Antwort... ← Tool call
Aus der Datenbank habe ich 801... ← Final text answer
```
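Each status update is a **tool call**, not a text message. In English, the four status labels read "Thinking..", "Searching database for LEDs...", "Calculating total stock..." and "Formulating final answer...".

---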
## Why Current Fails: Model Non-Compliance
### AICenterChatModel Behavior
- **Weak tool calling:** Doesn't reliably use tools when instructed
- **Text-first approach:** Generates text messages instead of tool calls
- **Ignores prohibitions:** Doesn't follow the "VERBOTEN" (forbidden) instructions in the prompt
### Evidence from Current Logs
```
Ich werde die Datenbank nach Artikeln durchsuchen... ← Text message (WRONG!)
Skipping status update message... ← Filtered out
Status message detected without tool calls... ← Detected as status
Routing back to agent... ← Causes loop
[Repeats 15 times]
```
Each status update is a **text message**, not a tool call, which triggers the loop. The first log line is the German status text "I will search the database for articles...".

---
## System Prompt Comparison
### Legacy Prompt (Works)
```
STREAMING-UPDATES: Du hast Zugriff auf das Tool "send_streaming_message",
mit dem du dem Nutzer kurze Status-Updates senden kannst.
Nutze dieses Tool, um den Nutzer über deine aktuellen Aktivitäten zu informieren.
```
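**Tone:** Informative, suggests using the tool. In English, roughly: "STREAMING UPDATES: You have access to the tool `send_streaming_message`, which lets you send the user short status updates. Use this tool to keep the user informed about your current activities."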
### Current Prompt (Doesn't Work)
```
STREAMING-UPDATES - ABSOLUT KRITISCH:
⚠️⚠️⚠️ WICHTIG: Du MUSST das Tool "send_streaming_message" verwenden,
um Status-Updates zu senden. VERBOTEN ist es, normale Text-Nachrichten
für Status-Updates zu schreiben!
VERBOTEN: Text-Nachrichten wie "Ich werde die Datenbank durchsuchen..."
ERLAUBT: Nur das Tool "send_streaming_message" für Status-Updates verwenden!
```
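**Tone:** Aggressive, forbids text messages, but the model ignores it anyway. In English, roughly: "STREAMING UPDATES - ABSOLUTELY CRITICAL: IMPORTANT: You MUST use the tool `send_streaming_message` to send status updates. Writing normal text messages for status updates is FORBIDDEN! FORBIDDEN: text messages like 'I will search the database...' ALLOWED: only the `send_streaming_message` tool for status updates!"

**Irony:** Making the prompt more explicit did not make the model comply. Because the two prompts ran against different models, the comparison shows that stricter wording alone was not enough, not that stricter wording made things worse.

---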
## Functional Differences Summary
| Aspect | Legacy | Current | Impact |
|--------|--------|---------|--------|
| **Model Tool Calling** | ✅ Uses tool | ❌ Generates text | **CRITICAL** |
| **Event Handling** | Tool calls only | Tool calls + text detection | Creates complexity |
| **Routing Logic** | Simple (tool calls → tools, else → end) | Complex (status detection → route back) | Creates loop |
| **Message Filtering** | None | Aggressive filtering | Hides the problem |
| **Prompt Style** | Informative | Aggressive/forbidding | Model ignores anyway |
---
## Why This Matters
### Legacy Success Factors
1. **Model compliance:** ChatAnthropic follows instructions
2. **Simple event handling:** Only handles what's expected (tool calls)
3. **No compensation logic:** Doesn't try to "fix" model behavior
4. **Trust-based:** Assumes model will use tools correctly
### Current Failure Factors
1. **Model non-compliance:** AICenterChatModel doesn't follow instructions
2. **Complex event handling:** Tries to handle both tool calls and text
3. **Compensation logic:** Tries to "fix" model behavior, creates loops
4. **Distrust-based:** Assumes model won't use tools, tries to compensate
---
## The Real Problem
The current implementation is trying to **compensate for model non-compliance** by:
1. Detecting status messages in text
2. Converting them to status events
3. Routing back to agent to "fix" it

But this creates a **feedback loop** because:
- Agent generates text status message
- System detects it and routes back
- Agent generates another text status message
- Loop continues

**The solution is NOT to add more compensation logic.** The solution is to **fix the root cause**: Make the model actually use the tool.

---
## Recommendations
### Short-Term Fix
1. **Remove status message detection** from `should_continue()` - don't route back
2. **Remove text-to-status conversion** - only handle tool calls
3. **Let status messages be stored** - don't filter them aggressively
4. **Simplify routing** - if no tool calls, end (like legacy)
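
As a sketch of what the simplified routing could look like once status detection is removed, assuming the same `should_continue` contract as the legacy excerpt: `FakeMessage` and `FakeState` are hypothetical stand-ins for the real message and `ChatState` types, and `END` is a placeholder for the graph's end marker.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

END = "__end__"  # placeholder for the graph's END marker


@dataclass
class FakeMessage:
    content: str = ""
    tool_calls: List[Dict[str, Any]] = field(default_factory=list)


@dataclass
class FakeState:
    messages: List[FakeMessage]


def should_continue(state: FakeState) -> str:
    """Route on tool calls only; text is never inspected or re-routed."""
    last_message = state.messages[-1]
    if getattr(last_message, "tool_calls", None):
        return "tools"
    return END


status_text = FakeState([FakeMessage(content="Ich werde die Datenbank durchsuchen...")])
tool_call = FakeState([FakeMessage(tool_calls=[{"name": "sqlite_query", "args": {}}])])
print(should_continue(status_text))  # __end__
print(should_continue(tool_call))    # tools
```

With this routing, a status-like text message ends the run after one turn instead of looping. The answer may still be unhelpful, but the failure becomes visible immediately rather than after 15 iterations.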
### Long-Term Fix
1. **Fix model behavior** - Ensure AICenterChatModel actually uses tools
2. **Improve prompt** - Test different prompt styles to get tool usage
3. **Model selection** - Use a model that reliably follows tool-calling instructions
4. **Tool binding** - Verify tools are properly bound and available to model
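
When comparing prompt styles or models, it helps to measure how often a response actually contains the expected tool call. The helper below is a hypothetical sketch: `tool_call_rate` is not part of the codebase, and it assumes each response has been reduced to a plain dict with an optional `tool_calls` list whose entries carry a `name` key.

```python
from typing import Any, Dict, Iterable


def tool_call_rate(responses: Iterable[Dict[str, Any]], tool_name: str) -> float:
    """Fraction of responses that include at least one call to tool_name."""
    items = list(responses)
    if not items:
        return 0.0
    hits = sum(
        1
        for response in items
        if any(call.get("name") == tool_name
               for call in response.get("tool_calls") or [])
    )
    return hits / len(items)


# Hypothetical sample: one of four responses uses the status tool.
sample = [
    {"content": "", "tool_calls": [{"name": "send_streaming_message"}]},
    {"content": "Ich werde die Datenbank durchsuchen...", "tool_calls": []},
    {"content": "Ich werde die Datenbank durchsuchen..."},
    {"content": "", "tool_calls": [{"name": "sqlite_query"}]},
]
print(tool_call_rate(sample, "send_streaming_message"))  # 0.25
```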
---
## Conclusion
The functional difference is **not in the architecture** but in **model behavior**:
- **Legacy:** Model uses tools → Simple event handling → Works
- **Current:** Model doesn't use tools → Complex compensation → Breaks

The current implementation is **over-engineered** to compensate for model non-compliance, but this compensation creates more problems than it solves.

**The fix is simple:** Make the model use the tool (like legacy), then simplify the event handling to match legacy's simplicity.