# Functional Differences Analysis: Legacy vs Current Chatbot

## Executive Summary

The **legacy implementation works correctly** because the LLM **actually calls the `send_streaming_message` tool** as instructed. The **current implementation fails** because the LLM **writes its status updates as regular text** instead of calling the tool. The current code detects that text as a status message and routes back to the agent. The agent then writes another status message, so the workflow loops until it hits the iteration limit.

---
## Core Functional Difference

### Legacy: Tool-Based Status Updates (WORKS)

**How it works:**

1. System prompt instructs: "Use the `send_streaming_message` tool for status updates"
2. LLM (ChatAnthropic) **follows the instruction** and calls the tool
3. Event handler listens **ONLY** for `on_tool_start` events named `send_streaming_message`
4. When the tool is called → route to the tools node → tool executes → route back to the agent
5. Agent then calls the SQL tools → processes the results → generates the final answer

**Code Evidence:**

```python
# legacy/chatbot.py line 267 (excerpt from the loop over streamed events)
if etype == "on_tool_start" and ename == "send_streaming_message":
    tool_in = edata.get("input") or {}
    msg = tool_in.get("message")
    if isinstance(msg, str) and msg.strip():
        yield {"type": "status", "label": msg.strip()}
    continue
```
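The tool-only strategy can be exercised without LangChain. Below is a minimal sketch. It assumes event dicts shaped like LangChain's `astream_events` output, with `event`, `name` and `data` keys. The function name `legacy_status_events` and the sample events are hypothetical.

```python
from typing import Any, Iterable, Iterator


def legacy_status_events(events: Iterable[dict[str, Any]]) -> Iterator[dict[str, str]]:
    """Yield status events only for send_streaming_message tool calls (legacy strategy)."""
    for event in events:
        etype = event.get("event")
        ename = event.get("name")
        edata = event.get("data") or {}
        if etype == "on_tool_start" and ename == "send_streaming_message":
            tool_in = edata.get("input") or {}
            msg = tool_in.get("message")
            if isinstance(msg, str) and msg.strip():
                yield {"type": "status", "label": msg.strip()}
            continue
        # Any other event (including plain agent text) is not inspected for
        # status content by this sketch.


# Hypothetical event sequence: one tool-based status update, one plain text chunk.
sample_events = [
    {"event": "on_tool_start", "name": "send_streaming_message",
     "data": {"input": {"message": "  Durchsuche Datenbank nach LEDs...  "}}},
    {"event": "on_chain_stream", "name": "agent",
     "data": {"chunk": "Ich werde die Datenbank durchsuchen..."}},
]

print(list(legacy_status_events(sample_events)))
# [{'type': 'status', 'label': 'Durchsuche Datenbank nach LEDs...'}]
```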

**Key Point:** Legacy **ONLY** handles tool calls. It doesn't try to detect status messages in regular text.

---
### Current: Text-Based Status Updates (BROKEN)

**How it fails:**

1. System prompt instructs: "MUST use the `send_streaming_message` tool; writing text messages is VERBOTEN (forbidden)"
2. LLM (AICenterChatModel) **ignores the instruction** and generates text messages such as "Ich werde die Datenbank nach Artikeln durchsuchen..." ("I will search the database for articles...")
3. Event handler tries to handle these text messages by detecting them as "status messages"
4. When a status message is detected → route back to the agent (to "fix" it)
5. Agent generates another text status message (still not using the tool)
6. **Infinite loop** until the maximum of 15 iterations is reached

**Code Evidence:**

```python
# gateway/modules/features/chatbot/chatbotStreaming.py lines 198-227 (excerpt)
if etype == "on_chain_stream" and ename == "agent":
    # Tries to detect status messages in regular text
    if content and is_status_message(content):
        await _emit_status_event(...)  # Convert to status event
        continue  # Don't store as message
```

```python
# gateway/modules/features/chatbot/chatbotLangGraph.py lines 292-296
if is_status:
    # Status message without tool calls - route back to agent
    # The agent should then call actual tools (like sqlite_query)
    logger.info(f"Status message detected without tool calls, routing back to agent...")
    return "agent"  # THIS CAUSES THE LOOP
```

**Key Point:** The current implementation tries to **compensate** for the LLM not following instructions, but the compensation itself creates the loop.

---
## Why They Differ: Root Causes

### 1. Model Behavior Difference

| Aspect | Legacy (ChatAnthropic) | Current (AICenterChatModel) |
|--------|------------------------|-----------------------------|
| **Tool Calling** | Follows prompt, uses `send_streaming_message` tool | Ignores prompt, generates text instead |
| **Instruction Following** | Strong adherence to system prompt | Weak adherence to system prompt |
| **Model Type** | Direct LangChain integration | Bridge to AI center (may use different models) |

**Impact:** The current model doesn't follow the instruction to use the tool, so it generates text messages that break the workflow.

---
### 2. Event Handling Strategy

#### Legacy Event Handling

```python
# Simple: Only listen for tool calls
if etype == "on_tool_start" and ename == "send_streaming_message":
    # Handle tool call (msg is the tool's "message" input)
    yield {"type": "status", "label": msg}
    continue  # Done, move on
```

**Strategy:** Trust the LLM to use the tool. Only handle tool calls.

#### Current Event Handling

```python
# Complex: Try to handle both tool calls AND text messages
if etype == "on_tool_start" and ename == "send_streaming_message":
    # Handle tool call (same as legacy)
    await _emit_status_event(...)

if etype == "on_chain_stream" and ename == "agent":
    # ALSO try to detect status messages in text
    if is_status_message(content):
        await _emit_status_event(...)  # Convert text to status
```
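The dual path in this excerpt can be exercised on its own. Below is a minimal sketch with three assumptions. Events are shaped like LangChain's `astream_events` output. As a simplification, agent text arrives as a plain string in `data["chunk"]`. `looks_like_status` and `current_stream` are hypothetical names, and the heuristic borrows the length and phrase check from the `chatbotMemory.py` excerpt in section 4.

```python
from typing import Any, Iterable, Iterator

# Hypothetical stand-in for is_status_message(); the project's real rules are not shown.
STATUS_PATTERNS = ("ich werde", "ich suche")  # "I will", "I am searching"


def looks_like_status(text: str) -> bool:
    lowered = text.lower().strip()
    return len(text) < 150 and any(p in lowered for p in STATUS_PATTERNS)


def current_stream(events: Iterable[dict[str, Any]]) -> Iterator[dict[str, str]]:
    """Emit status events for tool calls AND for status-like agent text."""
    for event in events:
        etype, ename = event.get("event"), event.get("name")
        data = event.get("data") or {}
        if etype == "on_tool_start" and ename == "send_streaming_message":
            msg = (data.get("input") or {}).get("message")
            if isinstance(msg, str) and msg.strip():
                yield {"type": "status", "label": msg.strip()}
            continue
        if etype == "on_chain_stream" and ename == "agent":
            content = data.get("chunk") or ""  # simplification: chunk is plain text
            if not content:
                continue
            if looks_like_status(content):
                yield {"type": "status", "label": content}  # text relabelled as status
                continue  # never surfaced as a normal message
            yield {"type": "message", "content": content}


text_event = {"event": "on_chain_stream", "name": "agent",
              "data": {"chunk": "Ich werde die Datenbank nach Artikeln durchsuchen..."}}
print(list(current_stream([text_event])))
# [{'type': 'status', 'label': 'Ich werde die Datenbank nach Artikeln durchsuchen...'}]
```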

**Strategy:** Don't trust the LLM. Try to compensate by detecting status messages in text.

**Problem:** The routing logic also uses the same `is_status_message()` check. There, a detected status message triggers re-routing back to the agent. Together, the two checks create the feedback loop.

---

### 3. Workflow Routing Logic

#### Legacy Routing (`should_continue`)

```python
from langgraph.graph import END


# Simple logic (ChatState is the project's graph-state class)
def should_continue(state: ChatState) -> str:
    last_message = state.messages[-1]
    tool_calls = getattr(last_message, "tool_calls", None)

    if tool_calls:
        return "tools"  # Has tool calls → execute tools
    else:
        return END  # No tool calls → done
```

**Key Point:** No special handling for status messages. If there are tool calls, execute them. Otherwise, end.

#### Current Routing (`should_continue`)

```python
from langchain_core.messages import AIMessage
from langgraph.graph import END


# Complex logic with status detection (is_status_message is a project helper)
def should_continue(state: ChatState) -> str:
    last_message = state.messages[-1]
    tool_calls = getattr(last_message, "tool_calls", None)

    if tool_calls:
        return "tools"

    # NEW: Check if it's a status message
    if isinstance(last_message, AIMessage):
        content = last_message.content
        if is_status_message(content):
            return "agent"  # Route back to agent (CAUSES LOOP!)

    return END
```
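Running both routing rules on the same text-only message shows exactly where they diverge. This is a minimal stdlib sketch with these stand-ins:

- `Msg` and `State` are hypothetical stand-ins for `AIMessage` and `ChatState`.
- `looks_like_status` stands in for `is_status_message()`.
- `END` reuses the `"__end__"` string that LangGraph's `END` constant holds.

```python
from dataclasses import dataclass, field

END = "__end__"  # same string value LangGraph uses for its END sentinel


@dataclass
class Msg:  # stand-in for an AIMessage
    content: str
    tool_calls: list = field(default_factory=list)


@dataclass
class State:  # stand-in for ChatState
    messages: list


def looks_like_status(text: str) -> bool:  # stand-in for is_status_message()
    return len(text) < 150 and "ich werde" in text.lower()


def legacy_should_continue(state: State) -> str:
    last = state.messages[-1]
    return "tools" if getattr(last, "tool_calls", None) else END


def current_should_continue(state: State) -> str:
    last = state.messages[-1]
    if getattr(last, "tool_calls", None):
        return "tools"
    if isinstance(last, Msg) and looks_like_status(last.content):
        return "agent"  # the re-routing that feeds the loop
    return END


state = State([Msg("Ich werde die Datenbank nach Artikeln durchsuchen...")])
print(legacy_should_continue(state))   # __end__
print(current_should_continue(state))  # agent
```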

**Key Point:** The current version tries to "fix" status messages by routing back to the agent, but the agent just generates another status message.

---

### 4. Message Filtering

#### Legacy: No Filtering

- All messages are stored in memory
- Status updates sent via the tool are emitted as status events, and the messages themselves are still stored
- No filtering of "status-like" text messages

#### Current: Aggressive Filtering

```python
# chatbotMemory.py - Filters out status messages (excerpt from the loop over messages)
if content:
    content_lower = content.lower().strip()
    status_patterns = ["ich werde", "ich suche", ...]  # "I will", "I am searching", ...
    if len(content) < 150 and any(pattern in content_lower for pattern in status_patterns):
        logger.debug(f"Skipping status update message...")
        continue  # Don't store
```

```python
# chatbotLangGraph.py - Filters from conversation window (excerpt)
if content and is_status_message(content):
    logger.debug(f"Filtering out status message from conversation window...")
    continue  # Skip this message
```
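Lifting the memory filter into a standalone function shows its effect on the history. This sketch mirrors the `chatbotMemory.py` excerpt and assumes messages are `(role, content)` tuples. The function name is hypothetical. The pattern list is only a subset, since the source elides the rest with `...`.

```python
# Hypothetical subset of the pattern list; the real list is elided with "..." above.
STATUS_PATTERNS = ["ich werde", "ich suche"]


def filter_status_messages(messages: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop short messages containing a status phrase, mirroring the chatbotMemory.py excerpt."""
    kept = []
    for role, content in messages:
        if content:
            content_lower = content.lower().strip()
            if len(content) < 150 and any(p in content_lower for p in STATUS_PATTERNS):
                continue  # not stored
        kept.append((role, content))
    return kept


before = [("user", "wie viele leds haben wir auf lager")]
after_status = before + [("assistant", "Ich werde die Datenbank nach Artikeln durchsuchen...")]

# The status text is dropped, so the next agent call sees the same history as the first one.
print(filter_status_messages(after_status) == before)  # True
```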

**Problem:** Because status messages are filtered out, they never accumulate in memory or in the conversation window. Each retry therefore starts from essentially the same context as the previous call. The agent keeps generating the same kind of status message, and the loop continues while the filtering hides the symptom.

---

## The Infinite Loop Explained

### What Happens in Current Implementation

1. **User asks:** "wie viele leds haben wir auf lager" ("how many LEDs do we have in stock")
2. **Agent generates:** "Ich werde die Datenbank nach Artikeln durchsuchen..." ("I will search the database for articles..."; text message, NO tool call)
3. **Status detection:** `is_status_message()` returns `True`
4. **Routing:** `should_continue()` returns `"agent"` (route back)
5. **Memory filtering:** Message is filtered out (not stored)
6. **Agent called again:** Generates another status message (still no tool call)
7. **Repeat steps 3-6** until the maximum of 15 iterations is reached
8. **Workflow ends:** No final answer, only status messages
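The eight steps can be reproduced with a toy driver loop. This is a sketch, not the project's graph. `fake_agent` is a hypothetical model that always answers with status text. `route` mirrors the current `should_continue()`, and `MAX_ITERATIONS = 15` mirrors the limit quoted above.

```python
from typing import Optional

MAX_ITERATIONS = 15  # the iteration limit quoted in this document


def fake_agent(history: list[str]) -> dict:
    """Hypothetical non-compliant model: always status text, never a tool call."""
    return {"content": "Ich werde die Datenbank nach Artikeln durchsuchen...", "tool_calls": []}


def is_status(text: str) -> bool:  # hypothetical stand-in for is_status_message()
    return len(text) < 150 and "ich werde" in text.lower()


def route(message: dict) -> str:  # mirrors the current should_continue()
    if message["tool_calls"]:
        return "tools"
    if is_status(message["content"]):
        return "agent"
    return "end"


def run(question: str) -> tuple[int, Optional[str]]:
    history = [question]
    for step in range(1, MAX_ITERATIONS + 1):
        reply = fake_agent(history)
        decision = route(reply)
        if decision == "end":
            return step, reply["content"]
        # "agent": the status text is filtered, so history stays unchanged and
        # the next call sees the same input ("tools" never occurs with this fake).
    return MAX_ITERATIONS, None  # limit reached without a final answer


print(run("wie viele leds haben wir auf lager"))  # (15, None)
```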

### What Should Happen (Like Legacy)

1. **User asks:** "wie viele leds haben wir auf lager"
2. **Agent calls tool:** `send_streaming_message("Durchsuche Datenbank nach LEDs...")` (tool call; "Searching database for LEDs...")
3. **Routing:** `should_continue()` returns `"tools"` because the message contains a tool call
4. **Tool execution:** The tools node runs `send_streaming_message`, the event handler emits the status event on `on_tool_start`, and control returns to the agent
5. **Agent calls SQL tool:** `sqlite_query("SELECT ...")` (tool call)
6. **SQL execution:** Tool node executes the query and returns the results
7. **Agent processes:** Generates the final answer from the results
8. **Workflow ends:** Final answer returned
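The compliant flow can be traced the same way. This sketch uses a scripted, hypothetical model that calls `send_streaming_message`, then `sqlite_query`, and then answers. Routing follows the legacy rule (tool calls → tools, otherwise end). The SQL string and the final answer are placeholders.

```python
from typing import Optional

MAX_ITERATIONS = 15

# Hypothetical scripted turns for a compliant model.
SCRIPT = [
    {"content": "", "tool_calls": [{"name": "send_streaming_message",
                                    "args": {"message": "Durchsuche Datenbank nach LEDs..."}}]},
    {"content": "", "tool_calls": [{"name": "sqlite_query", "args": {"query": "SELECT ..."}}]},
    {"content": "<final answer built from the query result>", "tool_calls": []},
]


def route(message: dict) -> str:  # legacy should_continue(): tool calls -> tools, else end
    return "tools" if message["tool_calls"] else "end"


def run() -> tuple[list[str], Optional[str]]:
    statuses: list[str] = []
    for turn in range(min(MAX_ITERATIONS, len(SCRIPT))):
        reply = SCRIPT[turn]  # stand-in for calling the model
        if route(reply) == "end":
            return statuses, reply["content"]
        for call in reply["tool_calls"]:  # the "tools" node
            if call["name"] == "send_streaming_message":
                statuses.append(call["args"]["message"])  # surfaced as a status event
            # a real sqlite_query call would run here and feed its rows back to the agent
    return statuses, None


print(run())
# (['Durchsuche Datenbank nach LEDs...'], '<final answer built from the query result>')
```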

---

## Why Legacy Works: Model Compliance

### ChatAnthropic Behavior

- **Strong tool calling:** When instructed to use a tool, it actually uses it
- **Prompt following:** Adheres to system prompt instructions
- **Tool-first approach:** Prefers tool calls over text for structured operations

### Evidence from Legacy Logs

```
Denke nach.. ← Tool call
Durchsuche Datenbank nach LEDs... ← Tool call
Berechne Gesamtlagerbestand... ← Tool call
Formuliere finale Antwort... ← Tool call
Aus der Datenbank habe ich 801... ← Final text answer
```

Each status update is a **tool call**, not a text message. (In English, the log lines read: "Thinking..", "Searching database for LEDs...", "Calculating total stock...", "Formulating final answer...", and "From the database, I have 801...".)

---
## Why Current Fails: Model Non-Compliance

### AICenterChatModel Behavior

- **Weak tool calling:** Doesn't reliably use tools when instructed
- **Text-first approach:** Generates text messages instead of tool calls
- **Prompt ignoring:** Doesn't follow the "VERBOTEN" (forbidden) instructions

### Evidence from Current Logs

```
Ich werde die Datenbank nach Artikeln durchsuchen... ← Text message (WRONG!)
Skipping status update message... ← Filtered out
Status message detected without tool calls... ← Detected as status
Routing back to agent... ← Causes loop
[Repeats 15 times]
```

Each status update is a **text message** ("I will search the database for articles..."), not a tool call, and that is what causes the loop.

---
## System Prompt Comparison

### Legacy Prompt (Works)

```
STREAMING-UPDATES: Du hast Zugriff auf das Tool "send_streaming_message",
mit dem du dem Nutzer kurze Status-Updates senden kannst.
Nutze dieses Tool, um den Nutzer über deine aktuellen Aktivitäten zu informieren.
```

**Translation:** "STREAMING UPDATES: You have access to the tool `send_streaming_message`, which lets you send the user short status updates. Use this tool to inform the user about your current activities."

**Tone:** Informative, suggests using the tool.

### Current Prompt (Doesn't Work)

```
STREAMING-UPDATES - ABSOLUT KRITISCH:
⚠️⚠️⚠️ WICHTIG: Du MUSST das Tool "send_streaming_message" verwenden,
um Status-Updates zu senden. VERBOTEN ist es, normale Text-Nachrichten
für Status-Updates zu schreiben!

VERBOTEN: Text-Nachrichten wie "Ich werde die Datenbank durchsuchen..."
ERLAUBT: Nur das Tool "send_streaming_message" für Status-Updates verwenden!
```

**Translation:** "STREAMING UPDATES - ABSOLUTELY CRITICAL: ⚠️⚠️⚠️ IMPORTANT: You MUST use the tool `send_streaming_message` to send status updates. It is FORBIDDEN to write normal text messages for status updates! FORBIDDEN: text messages such as 'I will search the database...' ALLOWED: only use the tool `send_streaming_message` for status updates!"

**Tone:** Aggressive, forbids text messages, but the model ignores it anyway.

**Irony:** The current prompt is far more explicit than the legacy one, yet the model still does not follow it.

---
## Functional Differences Summary

| Aspect | Legacy | Current | Impact |
|--------|--------|---------|--------|
| **Model Tool Calling** | ✅ Uses tool | ❌ Generates text | **CRITICAL** |
| **Event Handling** | Tool calls only | Tool calls + text detection | Creates complexity |
| **Routing Logic** | Simple (tool calls → tools, else → end) | Complex (status detection → route back) | Creates loop |
| **Message Filtering** | None | Aggressive filtering | Hides the problem |
| **Prompt Style** | Informative | Aggressive/forbidding | Model ignores anyway |

---
## Why This Matters

### Legacy Success Factors

1. **Model compliance:** ChatAnthropic follows instructions
2. **Simple event handling:** Only handles what's expected (tool calls)
3. **No compensation logic:** Doesn't try to "fix" model behavior
4. **Trust-based:** Assumes model will use tools correctly

### Current Failure Factors

1. **Model non-compliance:** AICenterChatModel doesn't follow instructions
2. **Complex event handling:** Tries to handle both tool calls and text
3. **Compensation logic:** Tries to "fix" model behavior, creates loops
4. **Distrust-based:** Assumes model won't use tools, tries to compensate

---
## The Real Problem

The current implementation is trying to **compensate for model non-compliance** by:

1. Detecting status messages in text
2. Converting them to status events
3. Routing back to agent to "fix" it

But this creates a **feedback loop** because:

- Agent generates text status message
- System detects it and routes back
- Agent generates another text status message
- Loop continues

**The solution is NOT to add more compensation logic.** The solution is to **fix the root cause**: Make the model actually use the tool.

---
## Recommendations

### Short-Term Fix

1. **Remove status message detection** from `should_continue()` - don't route back
2. **Remove text-to-status conversion** - only handle tool calls
3. **Let status messages be stored** - don't filter them aggressively
4. **Simplify routing** - if no tool calls, end (like legacy)
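Items 1 and 4 amount to restoring the legacy routing rule. Below is a minimal sketch with a hypothetical non-compliant model that shows what changes. The loop is gone after a single step. However, the model's status text now becomes the final reply, which is why the long-term fix is still needed. All names here are illustrative.

```python
from typing import Optional

MAX_ITERATIONS = 15


def non_compliant_agent(history: list[str]) -> dict:
    """Hypothetical model that still writes status text instead of calling the tool."""
    return {"content": "Ich werde die Datenbank nach Artikeln durchsuchen...", "tool_calls": []}


def should_continue(message: dict) -> str:
    """Short-term fix: no status detection; no tool calls means the run ends."""
    return "tools" if message["tool_calls"] else "end"


def run(question: str) -> tuple[int, Optional[str]]:
    history = [question]
    for step in range(1, MAX_ITERATIONS + 1):
        reply = non_compliant_agent(history)
        history.append(reply["content"])  # stored, not filtered (item 3)
        if should_continue(reply) == "end":
            return step, reply["content"]
        # "tools" would execute the requested tools here
    return MAX_ITERATIONS, None


print(run("wie viele leds haben wir auf lager"))
# (1, 'Ich werde die Datenbank nach Artikeln durchsuchen...')
```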

### Long-Term Fix

1. **Fix model behavior** - Ensure AICenterChatModel actually uses tools
2. **Improve prompt** - Test different prompt styles until the model reliably produces tool calls
3. **Model selection** - Use a model that reliably follows tool-calling instructions
4. **Tool binding** - Verify that the tools are properly bound and available to the model
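When comparing prompts or models (items 2 and 3), it can help to count turn types in recorded traces instead of reading logs by eye. This is a hypothetical sketch. The trace format (`content`/`tool_calls` dicts) and the status heuristic are assumptions, not part of the existing code.

```python
from collections import Counter


def looks_like_status(text: str) -> bool:  # hypothetical heuristic, as in chatbotMemory.py
    lowered = text.lower().strip()
    return len(text) < 150 and any(p in lowered for p in ("ich werde", "ich suche"))


def classify_turn(turn: dict) -> str:
    """Label one assistant turn: tool call, status written as text, or plain answer."""
    if turn.get("tool_calls"):
        return "tool_call"
    if looks_like_status(turn.get("content", "")):
        return "text_status"  # the non-compliant case this document describes
    return "answer"


def compliance_report(trace: list[dict]) -> Counter:
    return Counter(classify_turn(turn) for turn in trace)


trace = [
    {"content": "Ich werde die Datenbank nach Artikeln durchsuchen...", "tool_calls": []},
    {"content": "", "tool_calls": [{"name": "send_streaming_message"}]},
    {"content": "", "tool_calls": [{"name": "sqlite_query"}]},
]
report = compliance_report(trace)
print(report["tool_call"], report["text_status"])  # 2 1
```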

---

## Conclusion

The functional difference is **not in the architecture** but in **model behavior**:

- **Legacy:** Model uses tools → Simple event handling → Works
- **Current:** Model doesn't use tools → Complex compensation → Breaks

The current implementation is **over-engineered** to compensate for model non-compliance, but this compensation creates more problems than it solves.

**The fix is simple:** Make the model use the tool (like legacy), then simplify the event handling to match legacy's simplicity.