Ida Dittrich fbb17f1828 Dokumentation zur Roadmap

2026-02-24 13:05:06 +01:00

13 KiB

Raw Blame History

Functional Differences Analysis: Legacy vs Current Chatbot

Executive Summary

The legacy implementation works correctly because the LLM actually uses the send_streaming_message tool as instructed. The current implementation fails because the LLM generates status messages as regular text instead of using the tool, causing an infinite loop when the system tries to handle these text messages.

Core Functional Difference

Legacy: Tool-Based Status Updates (WORKS)

How it works:

System prompt instructs: "Use send_streaming_message tool for status updates"
LLM (ChatAnthropic) follows instructions and calls the tool
Event handler listens ONLY for on_tool_start events with send_streaming_message
When tool is called → routes to tools node → tool executes → routes back to agent
Agent then calls SQL tools → processes results → generates final answer

Code Evidence:

# legacy/chatbot.py line 267
if etype == "on_tool_start" and ename == "send_streaming_message":
    tool_in = edata.get("input") or {}
    msg = tool_in.get("message")
    if isinstance(msg, str) and msg.strip():
        yield {"type": "status", "label": msg.strip()}
    continue

Key Point: Legacy ONLY handles tool calls. It doesn't try to detect status messages in regular text.

Current: Text-Based Status Updates (BROKEN)

How it fails:

System prompt instructs: "MUST use send_streaming_message tool, VERBOTEN to write text messages"
LLM (AICenterChatModel) ignores instructions and generates text messages like "Ich werde die Datenbank nach Artikeln durchsuchen..."
Event handler tries to handle these text messages by detecting them as "status messages"
When status message detected → routes back to agent (to "fix" it)
Agent generates another text status message (still not using tool)
Infinite loop until max iterations (15) reached

Code Evidence:

# gateway/modules/features/chatbot/chatbotStreaming.py line 198-227
if etype == "on_chain_stream" and ename == "agent":
    # Tries to detect status messages in regular text
    if content and is_status_message(content):
        await _emit_status_event(...)  # Convert to status event
        continue  # Don't store as message

# gateway/modules/features/chatbot/chatbotLangGraph.py line 292-296
if is_status:
    # Status message without tool calls - route back to agent
    # The agent should then call actual tools (like sqlite_query)
    logger.info(f"Status message detected without tool calls, routing back to agent...")
    return "agent"  # THIS CAUSES THE LOOP

Key Point: Current tries to compensate for LLM not following instructions, but this creates a loop.

Why They Differ: Root Causes

1. Model Behavior Difference

Aspect	Legacy (ChatAnthropic)	Current (AICenterChatModel)
Tool Calling	Follows prompt, uses `send_streaming_message` tool	Ignores prompt, generates text instead
Instruction Following	Strong adherence to system prompt	Weak adherence to system prompt
Model Type	Direct LangChain integration	Bridge to AI center (may use different models)

Impact: The current model doesn't follow the instruction to use the tool, so it generates text messages that break the workflow.

2. Event Handling Strategy

Legacy Event Handling

# Simple: Only listen for tool calls
if etype == "on_tool_start" and ename == "send_streaming_message":
    # Handle tool call
    yield {"type": "status", "label": msg}
    continue  # Done, move on

Strategy: Trust the LLM to use the tool. Only handle tool calls.

Current Event Handling

# Complex: Try to handle both tool calls AND text messages
if etype == "on_tool_start" and ename == "send_streaming_message":
    # Handle tool call (same as legacy)
    await _emit_status_event(...)
    
if etype == "on_chain_stream" and ename == "agent":
    # ALSO try to detect status messages in text
    if is_status_message(content):
        await _emit_status_event(...)  # Convert text to status

Strategy: Don't trust the LLM. Try to compensate by detecting status messages in text.

Problem: This creates a feedback loop where status messages trigger re-routing, causing infinite loops.

3. Workflow Routing Logic

Legacy Routing (`should_continue`)

# Simple logic
def should_continue(state: ChatState) -> str:
    last_message = state.messages[-1]
    tool_calls = getattr(last_message, "tool_calls", None)
    
    if tool_calls:
        return "tools"  # Has tool calls → execute tools
    else:
        return END  # No tool calls → done

Key Point: No special handling for status messages. If there are tool calls, execute them. Otherwise, end.

Current Routing (`should_continue`)

# Complex logic with status detection
def should_continue(state: ChatState) -> str:
    last_message = state.messages[-1]
    tool_calls = getattr(last_message, "tool_calls", None)
    
    if tool_calls:
        return "tools"
    
    # NEW: Check if it's a status message
    if isinstance(last_message, AIMessage):
        content = last_message.content
        if is_status_message(content):
            return "agent"  # Route back to agent (CAUSES LOOP!)
    
    return END

Key Point: Tries to "fix" status messages by routing back to agent, but agent just generates another status message.

4. Message Filtering

Legacy: No Filtering

All messages are stored in memory
Status messages from tool calls are handled, but messages themselves are stored
No filtering of "status-like" text messages

Current: Aggressive Filtering

# chatbotMemory.py - Filters out status messages
if content:
    content_lower = content.lower().strip()
    status_patterns = ["ich werde", "ich suche", ...]
    if len(content) < 150 and any(pattern in content_lower for pattern in status_patterns):
        logger.debug(f"Skipping status update message...")
        continue  # Don't store

# chatbotLangGraph.py - Filters from conversation window
if content and is_status_message(content):
    logger.debug(f"Filtering out status message from conversation window...")
    # Skip this message

Problem: Status messages are filtered out, so they don't accumulate in memory, but the agent keeps generating them, creating a loop.

The Infinite Loop Explained

What Happens in Current Implementation

User asks: "wie viele leds haben wir auf lager"
Agent generates: "Ich werde die Datenbank nach Artikeln durchsuchen..." (text message, NO tool call)
Status detection: is_status_message() returns True
Routing: should_continue() returns "agent" (route back)
Memory filtering: Message is filtered out (not stored)
Agent called again: Generates another status message (still no tool call)
Repeat steps 3-6 until max iterations (15) reached
Workflow ends: No final answer, only status messages

What Should Happen (Like Legacy)

User asks: "wie viele leds haben wir auf lager"
Agent calls tool: send_streaming_message("Durchsuche Datenbank nach LEDs...") (tool call)
Tool execution: Tool node executes, emits status event
Routing: should_continue() returns "tools" → tools execute → back to agent
Agent calls SQL tool: sqlite_query("SELECT ...") (tool call)
SQL execution: Tool node executes query, returns results
Agent processes: Generates final answer with results
Workflow ends: Final answer returned

Why Legacy Works: Model Compliance

ChatAnthropic Behavior

Strong tool calling: When instructed to use a tool, it actually uses it
Prompt following: Adheres to system prompt instructions
Tool-first approach: Prefers tool calls over text for structured operations

Evidence from Legacy Logs

Denke nach..                                    ← Tool call
Durchsuche Datenbank nach LEDs...              ← Tool call
Berechne Gesamtlagerbestand...                ← Tool call
Formuliere finale Antwort...                   ← Tool call
Aus der Datenbank habe ich 801...             ← Final text answer

Each status update is a tool call, not a text message.

Why Current Fails: Model Non-Compliance

AICenterChatModel Behavior

Weak tool calling: Doesn't reliably use tools when instructed
Text-first approach: Generates text messages instead of tool calls
Prompt ignoring: Doesn't follow "VERBOTEN" instructions

Evidence from Current Logs

Ich werde die Datenbank nach Artikeln durchsuchen...  ← Text message (WRONG!)
Skipping status update message...                    ← Filtered out
Status message detected without tool calls...        ← Detected as status
Routing back to agent...                            ← Causes loop
[Repeats 15 times]

Each status update is a text message, not a tool call, causing the loop.

System Prompt Comparison

Legacy Prompt (Works)

STREAMING-UPDATES: Du hast Zugriff auf das Tool "send_streaming_message", 
mit dem du dem Nutzer kurze Status-Updates senden kannst. 
Nutze dieses Tool, um den Nutzer über deine aktuellen Aktivitäten zu informieren.

Tone: Informative, suggests using the tool.

Current Prompt (Doesn't Work)

STREAMING-UPDATES - ABSOLUT KRITISCH:
⚠️⚠️⚠️ WICHTIG: Du MUSST das Tool "send_streaming_message" verwenden, 
um Status-Updates zu senden. VERBOTEN ist es, normale Text-Nachrichten 
für Status-Updates zu schreiben!

VERBOTEN: Text-Nachrichten wie "Ich werde die Datenbank durchsuchen..."
ERLAUBT: Nur das Tool "send_streaming_message" für Status-Updates verwenden!

Tone: Aggressive, forbids text messages, but model ignores it anyway.

Irony: The more explicit the prompt, the more the model ignores it.

Functional Differences Summary

Aspect	Legacy	Current	Impact
Model Tool Calling	✅ Uses tool	❌ Generates text	CRITICAL
Event Handling	Tool calls only	Tool calls + text detection	Creates complexity
Routing Logic	Simple (tool calls → tools, else → end)	Complex (status detection → route back)	Creates loop
Message Filtering	None	Aggressive filtering	Hides the problem
Prompt Style	Informative	Aggressive/forbidding	Model ignores anyway

Why This Matters

Legacy Success Factors

Model compliance: ChatAnthropic follows instructions
Simple event handling: Only handles what's expected (tool calls)
No compensation logic: Doesn't try to "fix" model behavior
Trust-based: Assumes model will use tools correctly

Current Failure Factors

Model non-compliance: AICenterChatModel doesn't follow instructions
Complex event handling: Tries to handle both tool calls and text
Compensation logic: Tries to "fix" model behavior, creates loops
Distrust-based: Assumes model won't use tools, tries to compensate

The Real Problem

The current implementation is trying to compensate for model non-compliance by:

Detecting status messages in text
Converting them to status events
Routing back to agent to "fix" it

But this creates a feedback loop because:

Agent generates text status message
System detects it and routes back
Agent generates another text status message
Loop continues

The solution is NOT to add more compensation logic. The solution is to fix the root cause: Make the model actually use the tool.

Recommendations

Short-Term Fix

Remove status message detection from should_continue() - don't route back
Remove text-to-status conversion - only handle tool calls
Let status messages be stored - don't filter them aggressively
Simplify routing - if no tool calls, end (like legacy)

Long-Term Fix

Fix model behavior - Ensure AICenterChatModel actually uses tools
Improve prompt - Test different prompt styles to get tool usage
Model selection - Use a model that reliably follows tool-calling instructions
Tool binding - Verify tools are properly bound and available to model

Conclusion

The functional difference is not in the architecture but in model behavior:

Legacy: Model uses tools → Simple event handling → Works
Current: Model doesn't use tools → Complex compensation → Breaks

The current implementation is over-engineered to compensate for model non-compliance, but this compensation creates more problems than it solves.

The fix is simple: Make the model use the tool (like legacy), then simplify the event handling to match legacy's simplicity.

13 KiB Raw Blame History