Functional Differences Analysis: Legacy vs Current Chatbot

Executive Summary

The legacy implementation works because the LLM calls the send_streaming_message tool as instructed. The current implementation fails because the LLM writes status updates as regular text instead of calling the tool. The system's attempt to handle those text messages routes control back to the agent, which loops until the iteration limit (15) is reached.


Core Functional Difference

Legacy: Tool-Based Status Updates (WORKS)

How it works:

  1. System prompt instructs: "Use send_streaming_message tool for status updates"
  2. LLM (ChatAnthropic) follows instructions and calls the tool
  3. Event handler listens ONLY for on_tool_start events with send_streaming_message
  4. When tool is called → routes to tools node → tool executes → routes back to agent
  5. Agent then calls SQL tools → processes results → generates final answer

Code Evidence:

# legacy/chatbot.py line 267
if etype == "on_tool_start" and ename == "send_streaming_message":
    tool_in = edata.get("input") or {}
    msg = tool_in.get("message")
    if isinstance(msg, str) and msg.strip():
        yield {"type": "status", "label": msg.strip()}
    continue

Key Point: Legacy ONLY handles tool calls. It doesn't try to detect status messages in regular text.
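
The excerpt above is a fragment from inside a larger loop. As a minimal, self-contained sketch of the same idea, the generator below filters a stream of events and emits a status only for send_streaming_message tool starts. The function name iter_status_updates, the event dict layout (keys "event", "name", "data"), and the sample events are assumptions made for illustration; they are not taken from legacy/chatbot.py.

```python
from typing import Any, Dict, Iterable, Iterator


def iter_status_updates(events: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, str]]:
    """Yield one status dict per send_streaming_message tool start; ignore all other events."""
    for event in events:
        etype = event.get("event")
        ename = event.get("name")
        edata = event.get("data") or {}
        if etype == "on_tool_start" and ename == "send_streaming_message":
            tool_in = edata.get("input") or {}
            msg = tool_in.get("message")
            if isinstance(msg, str) and msg.strip():
                yield {"type": "status", "label": msg.strip()}


# Hypothetical event stream: one status tool call, one plain-text agent chunk,
# and one unrelated tool call.
sample_events = [
    {
        "event": "on_tool_start",
        "name": "send_streaming_message",
        "data": {"input": {"message": "  Durchsuche Datenbank nach LEDs...  "}},
    },
    {
        "event": "on_chain_stream",
        "name": "agent",
        "data": {"chunk": "Ich werde die Datenbank nach Artikeln durchsuchen..."},
    },
    {
        "event": "on_tool_start",
        "name": "sqlite_query",
        "data": {"input": {"query": "SELECT 1"}},
    },
]

statuses = list(iter_status_updates(sample_events))
print(statuses)
```

In this sketch the plain-text agent chunk is simply ignored, so status-like text never feeds back into any other logic.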


Current: Text-Based Status Updates (BROKEN)

How it fails:

  1. System prompt instructs: "MUST use send_streaming_message tool, VERBOTEN (forbidden) to write text messages"
  2. LLM (AICenterChatModel) ignores instructions and generates text messages like "Ich werde die Datenbank nach Artikeln durchsuchen..." ("I will search the database for articles...")
  3. Event handler tries to handle these text messages by detecting them as "status messages"
  4. When status message detected → routes back to agent (to "fix" it)
  5. Agent generates another text status message (still not using tool)
  6. The loop repeats until the iteration limit (15) is reached

Code Evidence:

# gateway/modules/features/chatbot/chatbotStreaming.py line 198-227
if etype == "on_chain_stream" and ename == "agent":
    # Tries to detect status messages in regular text
    if content and is_status_message(content):
        await _emit_status_event(...)  # Convert to status event
        continue  # Don't store as message

# gateway/modules/features/chatbot/chatbotLangGraph.py line 292-296
if is_status:
    # Status message without tool calls - route back to agent
    # The agent should then call actual tools (like sqlite_query)
    logger.info(f"Status message detected without tool calls, routing back to agent...")
    return "agent"  # THIS CAUSES THE LOOP

Key Point: Current tries to compensate for LLM not following instructions, but this creates a loop.


Why They Differ: Root Causes

1. Model Behavior Difference

| Aspect | Legacy (ChatAnthropic) | Current (AICenterChatModel) |
|---|---|---|
| Tool Calling | Follows prompt, uses send_streaming_message tool | Ignores prompt, generates text instead |
| Instruction Following | Strong adherence to system prompt | Weak adherence to system prompt |
| Model Type | Direct LangChain integration | Bridge to AI center (may use different models) |

Impact: The current model doesn't follow the instruction to use the tool, so it generates text messages that break the workflow.


2. Event Handling Strategy

Legacy Event Handling

# Simple: Only listen for tool calls
if etype == "on_tool_start" and ename == "send_streaming_message":
    # Handle tool call
    yield {"type": "status", "label": msg}
    continue  # Done, move on

Strategy: Trust the LLM to use the tool. Only handle tool calls.

Current Event Handling

# Complex: Try to handle both tool calls AND text messages
if etype == "on_tool_start" and ename == "send_streaming_message":
    # Handle tool call (same as legacy)
    await _emit_status_event(...)
    
if etype == "on_chain_stream" and ename == "agent":
    # ALSO try to detect status messages in text
    if is_status_message(content):
        await _emit_status_event(...)  # Convert text to status

Strategy: Don't trust the LLM. Try to compensate by detecting status messages in text.

Problem: The same text detection is also used in the routing logic (section 3), where a detected status message sends control back to the agent. That re-routing is what creates the feedback loop.


3. Workflow Routing Logic

Legacy Routing (should_continue)

# Simple logic
def should_continue(state: ChatState) -> str:
    last_message = state.messages[-1]
    tool_calls = getattr(last_message, "tool_calls", None)
    
    if tool_calls:
        return "tools"  # Has tool calls → execute tools
    else:
        return END  # No tool calls → done

Key Point: No special handling for status messages. If there are tool calls, execute them. Otherwise, end.

Current Routing (should_continue)

# Complex logic with status detection
def should_continue(state: ChatState) -> str:
    last_message = state.messages[-1]
    tool_calls = getattr(last_message, "tool_calls", None)
    
    if tool_calls:
        return "tools"
    
    # NEW: Check if it's a status message
    if isinstance(last_message, AIMessage):
        content = last_message.content
        if is_status_message(content):
            return "agent"  # Route back to agent (CAUSES LOOP!)
    
    return END

Key Point: Tries to "fix" status messages by routing back to agent, but agent just generates another status message.


4. Message Filtering

Legacy: No Filtering

  • All messages are stored in memory
  • Status messages from tool calls are handled, but messages themselves are stored
  • No filtering of "status-like" text messages

Current: Aggressive Filtering

# chatbotMemory.py - Filters out status messages
if content:
    content_lower = content.lower().strip()
    status_patterns = ["ich werde", "ich suche", ...]
    if len(content) < 150 and any(pattern in content_lower for pattern in status_patterns):
        logger.debug(f"Skipping status update message...")
        continue  # Don't store

# chatbotLangGraph.py - Filters from conversation window
if content and is_status_message(content):
    logger.debug(f"Filtering out status message from conversation window...")
    # Skip this message

Problem: Filtering keeps status messages out of memory and out of the conversation window, so they never accumulate, but it does nothing to stop the agent from generating them. The loop continues, and the filtering hides it.
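
To make the effect of the filter concrete, here is a minimal sketch of the heuristic shown in the chatbotMemory.py excerpt. It uses only the two patterns visible in that excerpt (the rest of the list is elided in the source and is not reconstructed here). The names looks_like_status and filter_history and the sample history are hypothetical.

```python
from typing import List

# Only the two patterns visible in the excerpt; the original list is longer (elided as "...").
STATUS_PATTERNS = ["ich werde", "ich suche"]
MAX_STATUS_LENGTH = 150  # length threshold taken from the excerpt


def looks_like_status(content: str) -> bool:
    """Return True for short messages that contain one of the known status phrases."""
    if not content:
        return False
    content_lower = content.lower().strip()
    return len(content) < MAX_STATUS_LENGTH and any(
        pattern in content_lower for pattern in STATUS_PATTERNS
    )


def filter_history(messages: List[str]) -> List[str]:
    """Drop every message that the heuristic classifies as a status update."""
    return [m for m in messages if not looks_like_status(m)]


# Hypothetical history after two loop iterations.
history = [
    "wie viele leds haben wir auf lager",
    "Ich werde die Datenbank nach Artikeln durchsuchen...",
    "Ich werde die Datenbank nach Artikeln durchsuchen...",
]

visible = filter_history(history)
print(visible)
```

After filtering, only the user question remains. One plausible consequence is that the agent sees the same context on every iteration, so nothing in its input prompts it to behave differently.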


The Infinite Loop Explained

What Happens in Current Implementation

  1. User asks: "wie viele leds haben wir auf lager" ("how many LEDs do we have in stock")
  2. Agent generates the text message "Ich werde die Datenbank nach Artikeln durchsuchen..." ("I will search the database for articles..."), with NO tool call
  3. Status detection: is_status_message() returns True
  4. Routing: should_continue() returns "agent" (route back)
  5. Memory filtering: Message is filtered out (not stored)
  6. Agent called again: Generates another status message (still no tool call)
  7. Repeat steps 3-6 until max iterations (15) reached
  8. Workflow ends: No final answer, only status messages
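
The steps above can be reproduced in a small simulation. This is a sketch under stated assumptions, not the project's code: Message, text_only_agent, and run are hypothetical stand-ins, is_status_message is reduced to a simple prefix check, and the tools node is not modeled. Only the routing rule and the limit of 15 come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

END = "__end__"       # stand-in for the graph library's END sentinel
MAX_ITERATIONS = 15   # iteration limit named in the text


@dataclass
class Message:
    content: str = ""
    tool_calls: List[dict] = field(default_factory=list)


def is_status_message(content: str) -> bool:
    # Simplified stand-in for the real detector.
    return content.lower().startswith("ich werde")


def should_continue_current(last: Message) -> str:
    """Routing rule of the current implementation, as described above."""
    if last.tool_calls:
        return "tools"
    if is_status_message(last.content):
        return "agent"  # route back to the agent
    return END


def text_only_agent() -> Message:
    """A model that always answers with status-like text and never calls a tool."""
    return Message(content="Ich werde die Datenbank nach Artikeln durchsuchen...")


def run(agent: Callable[[], Message], route: Callable[[Message], str]) -> Tuple[int, str]:
    """Call the agent until routing stops returning 'agent' or the limit is hit."""
    calls = 0
    decision = "agent"
    while decision == "agent" and calls < MAX_ITERATIONS:
        last = agent()
        calls += 1
        decision = route(last)
    return calls, decision


calls, decision = run(text_only_agent, should_continue_current)
print(calls, decision)  # 15 agent
```

The run stops only because of the iteration limit: the routing decision is still "agent" after the 15th call, and no final answer was ever produced.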

What Should Happen (Like Legacy)

  1. User asks: "wie viele leds haben wir auf lager" ("how many LEDs do we have in stock")
  2. Agent calls tool: send_streaming_message("Durchsuche Datenbank nach LEDs...") (tool call; the message means "Searching database for LEDs...")
  3. Tool execution: Tool node executes, emits status event
  4. Routing: should_continue() returns "tools" → tools execute → back to agent
  5. Agent calls SQL tool: sqlite_query("SELECT ...") (tool call)
  6. SQL execution: Tool node executes query, returns results
  7. Agent processes: Generates final answer with results
  8. Workflow ends: Final answer returned

Why Legacy Works: Model Compliance

ChatAnthropic Behavior

  • Strong tool calling: When instructed to use a tool, it actually uses it
  • Prompt following: Adheres to system prompt instructions
  • Tool-first approach: Prefers tool calls over text for structured operations

Evidence from Legacy Logs

Denke nach..                                    ← Tool call
Durchsuche Datenbank nach LEDs...              ← Tool call
Berechne Gesamtlagerbestand...                ← Tool call
Formuliere finale Antwort...                   ← Tool call
Aus der Datenbank habe ich 801...             ← Final text answer

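In English, the four status lines read "Thinking..", "Searching database for LEDs...", "Calculating total stock...", and "Formulating final answer...". Each status update is a tool call, not a text message.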


Why Current Fails: Model Non-Compliance

AICenterChatModel Behavior

  • Weak tool calling: Doesn't reliably use tools when instructed
  • Text-first approach: Generates text messages instead of tool calls
  • Prompt ignoring: Doesn't follow "VERBOTEN" instructions

Evidence from Current Logs

Ich werde die Datenbank nach Artikeln durchsuchen...  ← Text message (WRONG!)
Skipping status update message...                    ← Filtered out
Status message detected without tool calls...        ← Detected as status
Routing back to agent...                            ← Causes loop
[Repeats 15 times]

Each status update is a text message, not a tool call, causing the loop.


System Prompt Comparison

Legacy Prompt (Works)

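STREAMING-UPDATES: Du hast Zugriff auf das Tool "send_streaming_message", 
mit dem du dem Nutzer kurze Status-Updates senden kannst. 
Nutze dieses Tool, um den Nutzer über deine aktuellen Aktivitäten zu informieren.

English translation:

STREAMING UPDATES: You have access to the tool "send_streaming_message", 
which you can use to send the user short status updates. 
Use this tool to inform the user about your current activities.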

Tone: Informative, suggests using the tool.

Current Prompt (Doesn't Work)

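STREAMING-UPDATES - ABSOLUT KRITISCH:
⚠️⚠️⚠️ WICHTIG: Du MUSST das Tool "send_streaming_message" verwenden, 
um Status-Updates zu senden. VERBOTEN ist es, normale Text-Nachrichten 
für Status-Updates zu schreiben!

VERBOTEN: Text-Nachrichten wie "Ich werde die Datenbank durchsuchen..."
ERLAUBT: Nur das Tool "send_streaming_message" für Status-Updates verwenden!

English translation:

STREAMING UPDATES - ABSOLUTELY CRITICAL:
⚠️⚠️⚠️ IMPORTANT: You MUST use the tool "send_streaming_message" 
to send status updates. Writing normal text messages 
for status updates is FORBIDDEN!

FORBIDDEN: Text messages such as "I will search the database..."
ALLOWED: Use only the tool "send_streaming_message" for status updates!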

Tone: Aggressive, forbids text messages, but model ignores it anyway.

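Note: Making the prompt more explicit did not improve compliance. Because the two prompts ran against different models, this comparison alone cannot separate the effect of prompt style from the effect of the model.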


Functional Differences Summary

| Aspect | Legacy | Current | Impact |
|---|---|---|---|
| Model Tool Calling | Uses tool | Generates text | CRITICAL |
| Event Handling | Tool calls only | Tool calls + text detection | Creates complexity |
| Routing Logic | Simple (tool calls → tools, else → end) | Complex (status detection → route back) | Creates loop |
| Message Filtering | None | Aggressive filtering | Hides the problem |
| Prompt Style | Informative | Aggressive/forbidding | Model ignores anyway |

Why This Matters

Legacy Success Factors

  1. Model compliance: ChatAnthropic follows instructions
  2. Simple event handling: Only handles what's expected (tool calls)
  3. No compensation logic: Doesn't try to "fix" model behavior
  4. Trust-based: Assumes model will use tools correctly

Current Failure Factors

  1. Model non-compliance: AICenterChatModel doesn't follow instructions
  2. Complex event handling: Tries to handle both tool calls and text
  3. Compensation logic: Tries to "fix" model behavior, creates loops
  4. Distrust-based: Assumes model won't use tools, tries to compensate

The Real Problem

The current implementation is trying to compensate for model non-compliance by:

  1. Detecting status messages in text
  2. Converting them to status events
  3. Routing back to agent to "fix" it

But this creates a feedback loop because:

  • Agent generates text status message
  • System detects it and routes back
  • Agent generates another text status message
  • Loop continues

The solution is NOT to add more compensation logic. The solution is to fix the root cause: Make the model actually use the tool.


Recommendations

Short-Term Fix

  1. Remove status message detection from should_continue() - don't route back
  2. Remove text-to-status conversion - only handle tool calls
  3. Let status messages be stored - don't filter them aggressively
  4. Simplify routing - if no tool calls, end (like legacy)
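
As a rough sketch of what items 1 and 4 amount to, the routing below drops status detection entirely and mirrors the legacy rule. Message and END are hypothetical stand-ins so the snippet runs on its own; in the real graph they would come from the project's state and graph library.

```python
from dataclasses import dataclass, field
from typing import List

END = "__end__"  # stand-in for the graph library's END sentinel


@dataclass
class Message:
    content: str = ""
    tool_calls: List[dict] = field(default_factory=list)


def should_continue(last_message: Message) -> str:
    """Legacy-style routing: tool calls go to the tools node, everything else ends."""
    if getattr(last_message, "tool_calls", None):
        return "tools"
    return END


# A status written as plain text now ends the run instead of re-entering the agent.
text_status = Message(content="Ich werde die Datenbank nach Artikeln durchsuchen...")

# A status sent through the tool is routed to the tools node as before.
tool_status = Message(
    tool_calls=[
        {"name": "send_streaming_message", "args": {"message": "Durchsuche Datenbank nach LEDs..."}}
    ]
)

print(should_continue(text_status), should_continue(tool_status))
```

With this rule a text-only status message terminates the workflow after a single agent call. The user may still receive a status-like sentence instead of an answer, which is why the long-term fix below is still needed.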

Long-Term Fix

  1. Fix model behavior - Ensure AICenterChatModel actually uses tools
  2. Improve prompt - Test different prompt styles to get tool usage
  3. Model selection - Use a model that reliably follows tool-calling instructions
  4. Tool binding - Verify tools are properly bound and available to model
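
When testing prompt styles or candidate models (items 2 and 3), it helps to have a number to compare. The helper below is one possible measure: the fraction of responses that contain a call to a given tool. The function name, the response dict layout, and the sample responses are assumptions made up for this sketch; the sample is not measured data.

```python
from typing import Any, Dict, List


def tool_call_rate(responses: List[Dict[str, Any]], tool_name: str) -> float:
    """Return the fraction of responses containing at least one call to tool_name."""
    if not responses:
        return 0.0
    hits = 0
    for response in responses:
        calls = response.get("tool_calls") or []
        if any(call.get("name") == tool_name for call in calls):
            hits += 1
    return hits / len(responses)


# Invented sample: one status tool call, two text-only replies, one SQL tool call.
sample_responses = [
    {
        "content": "",
        "tool_calls": [{"name": "send_streaming_message", "args": {"message": "Denke nach.."}}],
    },
    {"content": "Ich werde die Datenbank nach Artikeln durchsuchen...", "tool_calls": []},
    {"content": "Ich werde die Datenbank nach Artikeln durchsuchen...", "tool_calls": None},
    {"content": "", "tool_calls": [{"name": "sqlite_query", "args": {"query": "SELECT 1"}}]},
]

rate = tool_call_rate(sample_responses, "send_streaming_message")
print(rate)  # 0.25
```

A rate near zero for every prompt variant would point toward a binding or model problem rather than a wording problem.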

Conclusion

The root difference lies in model behavior, not in the architecture. The architectural differences exist only to compensate for it:

  • Legacy: Model uses tools → Simple event handling → Works
  • Current: Model doesn't use tools → Complex compensation → Breaks

The current implementation is over-engineered to compensate for model non-compliance, but this compensation creates more problems than it solves.

The fix is simple: Make the model use the tool (like legacy), then simplify the event handling to match legacy's simplicity.