Botpress Cost Control: AI Task Retry Loops, Bot Handoff Cycles, Knowledge Base Re-Query Fan-Out, and Autonomous Agent Action Spirals
Botpress is an open-source conversational AI platform with a managed cloud hosting option — Botpress Cloud — that powers customer service bots, internal support assistants, and multi-step AI agents for thousands of teams. The platform has evolved from a deterministic intent-matching engine into an LLM-first system where most conversational decisions are made by calling foundation models: Claude, GPT-4o, or open-weight models routed through Botpress's model hub. Botpress Cloud bills these LLM calls as AI credits — a unified credit unit that abstracts across model families, with larger and more capable models costing more credits per call. Credits are consumed against your subscription tier's monthly allotment or against a purchased credit pack; when your allocation is exhausted, bot conversations stop processing.
The billing model makes explicit what every AI-agent platform eventually discovers: the cost of a bot deployment is not the cost of a single conversation turn — it is the sum, over every node in the conversation graph that touches a model, of that node's LLM invocations multiplied by the number of times it fires across all sessions. Four structural patterns in Botpress's node model can push this total well above what's visible when designing the conversation flow in the studio:
- AI Task retry loops — An AI Task node in Botpress calls the LLM to perform a structured transformation or extraction, then validates the result against an output schema or custom condition. If validation fails — because the model produced an out-of-bounds value, a malformed JSON response, or output that fails a regex check — Botpress can be configured to retry the same AI Task node automatically. Each retry is a full LLM invocation at full credit cost. A validation condition that is impossible to satisfy for a given input (schema too strict, condition too narrow, model systematically producing a disallowed output format for a class of inputs) causes the flow to keep retrying until a session timeout or a platform-level retry cap stops it.
- Bot-to-bot handoff cycles — Botpress's Orchestrator feature routes conversations between multiple specialized bots based on the user's current intent. The Orchestrator itself is an LLM call — it reads the conversation context and selects which bot should handle the next turn. If Bot A handles product questions and Bot B handles billing questions, and a user's message is ambiguous, the Orchestrator may route to Bot A. Bot A's fallback for unrecognized intents transfers back to the Orchestrator. The Orchestrator evaluates again and routes to Bot B. Bot B's fallback routes back to the Orchestrator. Each Orchestrator decision and each bot's intent classification is a billed LLM invocation. A conversation that neither bot can resolve cycles until the session ends or the Orchestrator's loop detection trips.
- Knowledge Base re-query fan-out — Botpress's Knowledge Base node fires a semantic search call against your uploaded documents using an embedding model, which is billed as AI credits separately from the conversation LLM call. When a Search KB node appears inside a retry or re-evaluation loop construct (an Execute Code node that checks confidence and loops back if the threshold isn't met, or a Transition that sends low-confidence answers back to a Search KB node for a second attempt), each loop iteration fires a full KB search call plus a new synthesis call over the retrieved context. Loops with unsatisfiable confidence thresholds, or with KB content that cannot reliably answer the question, run the search and synthesis calls back-to-back until a maximum iteration count stops them.
- Autonomous agent action spirals — Botpress's Autonomous mode lets an AI agent decide which Actions (tools) to call to complete a user's goal. The planning loop is an LLM call per turn: the model reads the current state, selects the next action, observes the result, and calls the LLM again to decide what to do next. If the available actions are insufficient to complete the stated goal, the model loops — it tries Action A, sees the result doesn't achieve the goal, tries Action A again with a rephrased input, sees the same insufficient result, tries Action B, and cycles back to Action A. Each planning call is a billed LLM invocation. An agent given a goal like "find the customer's order status" with no order-lookup action available can run dozens of planning cycles before a timeout stops it.
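The multiplication in the billing model can be made concrete with a small cost model. This is a sketch under assumed numbers. The node names, the one-credit-per-call rate, and the fire counts are hypothetical placeholders, not Botpress prices. The example shows how a single node that fires six times instead of once pushes the total from 3,000 to 8,000 credits per 1,000 sessions.

```python
from dataclasses import dataclass


@dataclass
class ModelNode:
    """A conversation-graph node that calls a model. All numbers used below are hypothetical."""
    name: str
    credits_per_call: float   # assumed credit cost of one model call
    calls_per_fire: int       # model calls made each time the node runs
    fires_per_session: float  # average number of times the node runs per session


def deployment_credits(nodes: list[ModelNode], sessions: int) -> float:
    """Total credits = sessions x sum over nodes of (credits/call x calls/fire x fires/session)."""
    per_session = sum(n.credits_per_call * n.calls_per_fire * n.fires_per_session for n in nodes)
    return per_session * sessions


kb = ModelNode("search_kb", credits_per_call=1.0, calls_per_fire=2, fires_per_session=1.0)  # embed + synthesize
healthy = deployment_credits([ModelNode("extract_order", 1.0, 1, 1.0), kb], sessions=1_000)
# Same graph, but a retry loop makes the AI Task fire six times per session (1 call + 5 retries).
looping = deployment_credits([ModelNode("extract_order", 1.0, 1, 6.0), kb], sessions=1_000)
print(healthy, looping)  # 3000.0 8000.0
```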
Failure Mode 1 — AI Task Retry Loops
AI Task nodes are Botpress's primary mechanism for LLM-driven structured extraction: extract the user's intent entities into a typed schema, classify a message into one of N categories, or generate a structured response object that downstream nodes consume. The node sends a prompt to the configured model, receives a response, and optionally validates the response against an output schema (JSON Schema or a custom validation expression) before continuing to the next node.
The retry failure mode is straightforward to trigger inadvertently: a developer configures an AI Task to extract an order number from a user message, validates the output against a regex like ^ORD-\d{8}$, and sets the node to retry up to 5 times on validation failure. The regex works for the expected input "my order ORD-12345678 is delayed." It fails for every message where no order number is present — "I haven't received my package yet" — because the model cannot produce an ORD-XXXXXXXX-formatted number from a message that contains no order number. Each of the 5 retries calls the LLM with the same input and receives a similarly invalid output, so five extra calls at full credit cost are spent to learn what the first call already established: this message contains no order number.
At scale, the credit burn compounds. A support bot handling 10,000 conversations per day, where 30% of messages mention a package issue without specifying an order number, produces 3,000 failing extractions per day × 5 retries each = 15,000 additional LLM calls. If the base conversation would have cost 3 credits to resolve, each failing session adds 5 × the per-AI-Task credit cost before the retry cap trips. Even at one credit per call, that is 5 credits, so the AI Task retry cost exceeds the base conversation cost for that message class, and models with higher per-call credit rates widen the gap.
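A toy simulation makes the waste visible. `fake_extract` is a hypothetical, deterministic stand-in for the AI Task's LLM call. A real model's output varies between calls, but no amount of variation produces a valid order number from a message that does not contain one. The loop counts one initial call plus up to five retries.

```python
import re
from typing import Optional, Tuple

ORDER_FORMAT = re.compile(r"^ORD-\d{8}$")  # the validation regex from the example above


def fake_extract(message: str) -> str:
    """Hypothetical stand-in for the AI Task's LLM call: returns the order number if present, else a guess."""
    found = re.search(r"ORD-\d{8}", message)
    return found.group(0) if found else "order not found"


def run_with_retries(message: str, max_retries: int) -> Tuple[Optional[str], int]:
    """One initial call plus up to max_retries retries; returns (valid value or None, model calls made)."""
    calls = 0
    for _ in range(1 + max_retries):
        calls += 1
        value = fake_extract(message)
        if ORDER_FORMAT.match(value):
            return value, calls
    return None, calls


print(run_with_retries("my order ORD-12345678 is delayed.", max_retries=5))  # ('ORD-12345678', 1)
print(run_with_retries("I haven't received my package yet", max_retries=5))  # (None, 6)
```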
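The same arithmetic can serve as a quick check. It assumes one failing message per conversation. For the comparison with the base cost, it also assumes a hypothetical rate of one credit per AI Task call.

```python
daily_conversations = 10_000
pct_without_order_number = 30   # share of conversations whose extraction fails
retries_per_failure = 5
base_conversation_credits = 3
credits_per_ai_task_call = 1    # assumption: cheapest rate, one credit per call

failing_extractions = daily_conversations * pct_without_order_number // 100      # 3,000 per day
extra_llm_calls = failing_extractions * retries_per_failure                        # 15,000 per day
retry_credits_per_failing_session = retries_per_failure * credits_per_ai_task_call  # 5 credits

print(failing_extractions, extra_llm_calls,
      retry_credits_per_failing_session > base_conversation_credits)  # 3000 15000 True
```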
The retry rule: Every AI Task node with a validation condition that can fail on a class of valid user inputs must gate the retry on a structural pre-check. Before retrying the LLM call, verify that the input message contains the prerequisite signal the extraction is looking for. If the pre-check fails — no order number pattern, no date string, no numeric value — route to a clarification message node instead of retrying the same AI Task with the same input. The LLM cannot extract what is not in the message.
import re
import time
import sqlite3
import threading
from flask import Flask, request, jsonify
app = Flask(__name__)
db_lock = threading.Lock()
DB_PATH = "botpress_ai_task_guard.db"
MAX_RETRIES_PER_SESSION_PER_TASK = 2
SESSION_WINDOW_SECONDS = 3600
def init_db():
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS ai_task_retries (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                session_id TEXT NOT NULL,
                task_name TEXT NOT NULL,
                attempt INTEGER NOT NULL,
                input_pre_check TEXT,
                validation_error TEXT,
                recorded_at REAL
            )
        """)
        conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_session_task "
            "ON ai_task_retries (session_id, task_name, recorded_at)"
        )
# Pre-check patterns: the signal each AI Task needs to find in the user message.
# If the pattern does not match, the extraction will fail regardless of retries.
PRE_CHECKS = {
    # Matches "ORD-12345678", "order 1234567", "order #12345678", or a bare "#12345678".
    "extract_order_number": re.compile(r"(?:\b(?:ORD|order)\s*#?|#)\s*-?\s*\d{5,}", re.I),
    "extract_date": re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b|\b(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\w*\s+\d{1,2}", re.I),
    "extract_amount": re.compile(r"\$\s*\d+|\d+\s*(?:dollar|usd|cent)", re.I),
}
class AITaskRetryGuard:
    """
    Call check() before allowing an AI Task node to retry.
    Returns allow=False when the retry cap is reached or when a pre-check
    signals that the input cannot satisfy the extraction requirement.
    """
    @staticmethod
    def check(session_id: str, task_name: str,
              user_message: str, validation_error: str) -> dict:
        now = time.time()
        window_start = now - SESSION_WINDOW_SECONDS
        # Structural pre-check: does the input contain the signal we're extracting?
        pre_check_pattern = PRE_CHECKS.get(task_name)
        if pre_check_pattern and not pre_check_pattern.search(user_message):
            return {
                "allow": False,
                "reason": "input_missing_prerequisite_signal",
                "session_id": session_id,
                "task_name": task_name,
                "validation_error": validation_error,
                "message": (
                    f"AI Task {task_name!r} validation failed and the user message "
                    "does not contain the signal pattern required for extraction. "
                    "Retrying the LLM call will not produce a valid output. "
                    "Route to a clarification node asking the user to provide "
                    "the missing information instead of retrying the same input."
                ),
            }
        with db_lock:
            with sqlite3.connect(DB_PATH) as conn:
                recent_retries = conn.execute(
                    "SELECT COUNT(*) FROM ai_task_retries "
                    "WHERE session_id = ? AND task_name = ? AND recorded_at > ?",
                    (session_id, task_name, window_start)
                ).fetchone()[0]
                if recent_retries >= MAX_RETRIES_PER_SESSION_PER_TASK:
                    return {
                        "allow": False,
                        "reason": "retry_cap_reached",
                        "session_id": session_id,
                        "task_name": task_name,
                        "retries_in_window": recent_retries,
                        "cap": MAX_RETRIES_PER_SESSION_PER_TASK,
                        "message": (
                            f"AI Task {task_name!r} has retried {recent_retries} times "
                            f"in this session (cap: {MAX_RETRIES_PER_SESSION_PER_TASK}). "
                            "Further retries will consume credits without changing the outcome. "
                            "Route to a fallback node that collects the required information "
                            "directly from the user via a structured capture card."
                        ),
                    }
                # Record this retry; the connection context manager commits on exit.
                conn.execute(
                    "INSERT INTO ai_task_retries "
                    "(session_id, task_name, attempt, input_pre_check, "
                    "validation_error, recorded_at) VALUES (?, ?, ?, ?, ?, ?)",
                    (session_id, task_name,
                     recent_retries + 1,
                     "passed" if pre_check_pattern else "no_pattern_configured",
                     validation_error, now)
                )
        return {
            "allow": True,
            "session_id": session_id,
            "task_name": task_name,
            "attempt_number": recent_retries + 1,
            "remaining_retries": MAX_RETRIES_PER_SESSION_PER_TASK - recent_retries - 1,
        }
@app.route("/guard/ai-task-retry", methods=["POST"])
def ai_task_retry_guard():
    data = request.get_json(force=True)
    result = AITaskRetryGuard.check(
        session_id=data.get("session_id", ""),
        task_name=data.get("task_name", ""),
        user_message=data.get("user_message", ""),
        validation_error=data.get("validation_error", ""),
    )
    return jsonify(result), 200 if result["allow"] else 429
if __name__ == "__main__":
    init_db()
    app.run(port=8100)
Wire this guard into your Botpress flow using an Execute Code node placed between the AI Task's failure output and the Retry transition. In the Execute Code node, call the /guard/ai-task-retry endpoint via a fetch call, passing the current event.conversationId as session_id, the AI Task node identifier as task_name (it must match a key in PRE_CHECKS, such as extract_order_number, or the pre-check is skipped), the last user message from event.preview as user_message, and the validation error from the AI Task's output variable as validation_error. If the response allow field is false, transition to a clarification or escalation node instead of looping back to the AI Task. The remaining_retries field in successful responses lets downstream logic show progressively more explicit clarification prompts before the cap is reached.
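The Execute Code card itself runs JavaScript. To keep this article in one language, the branching it should perform on the guard's response is sketched here in Python. The node names and prompt wording are hypothetical. Mapping remaining_retries to increasingly explicit wording is one possible policy, not a Botpress feature.

```python
from typing import Any, Dict

# Hypothetical clarification prompts, ordered from gentle to explicit.
PROMPTS = [
    "Could you share your order number?",
    "Please paste the order number from your confirmation email; it looks like ORD-12345678.",
]


def next_step(guard_response: Dict[str, Any]) -> Dict[str, str]:
    """Map a /guard/ai-task-retry response body to the next flow step (node names are hypothetical)."""
    if guard_response.get("allow"):
        remaining = guard_response.get("remaining_retries", 0)
        # Fewer retries left -> more explicit wording, clamped to the available prompts.
        idx = max(0, min(len(PROMPTS) - 1, len(PROMPTS) - 1 - remaining))
        return {"node": "retry_ai_task", "prompt": PROMPTS[idx]}
    if guard_response.get("reason") == "input_missing_prerequisite_signal":
        # The message never contained the value: ask for it instead of retrying.
        return {"node": "clarify_missing_value", "prompt": PROMPTS[0]}
    # Retry cap reached: collect the value directly through a capture card.
    return {"node": "capture_card_fallback", "prompt": PROMPTS[-1]}


print(next_step({"allow": True, "remaining_retries": 0})["prompt"])
```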
Failure Mode 2 — Bot-to-Bot Handoff Cycles
Botpress's multi-bot architecture supports an Orchestrator bot that routes conversations to specialized bots based on detected intent. The Orchestrator uses an LLM to read the conversation state and decide which bot handles the next turn — this decision is itself a billed AI credit, separate from the credits consumed by the selected bot's own nodes. Specialized bots handle a defined domain: product questions, billing inquiries, technical support, returns processing. Each bot has a fallback path for messages outside its domain — typically a transition back to the Orchestrator so it can re-route.
The handoff cycle emerges when the Orchestrator routes a conversation whose intent doesn't map cleanly to any bot's domain. A user asking "I want to use my store credit toward a subscription upgrade" sits at the boundary between the billing bot (which handles payment methods) and the subscription bot (which handles plan changes). The billing bot's intent classifier doesn't match "subscription upgrade" and returns to the Orchestrator. The Orchestrator evaluates the updated context — now containing the billing bot's failure — and routes to the subscription bot. The subscription bot's classifier doesn't match "store credit" and returns to the Orchestrator. The Orchestrator evaluates again and routes back to the billing bot. Each Orchestrator routing decision and each bot's intent classification consumes AI credits. The cycle repeats until the session times out or a human takeover is triggered.
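Here is a toy model of the ping-pong. It assumes the Orchestrator simply alternates between the two bots, which is the failure pattern described above. It also assumes every turn costs exactly two model calls: one routing decision and one intent classification. The bot names are the hypothetical ones from the example.

```python
from itertools import cycle
from typing import Dict, List, Set


def simulate_handoffs(bots: List[str], can_resolve: Set[str], max_turns: int) -> Dict[str, object]:
    """Toy model: each turn is one Orchestrator routing call plus one bot intent-classification call."""
    chain: List[str] = []
    llm_calls = 0
    for _, bot in zip(range(max_turns), cycle(bots)):
        llm_calls += 2  # routing decision + the selected bot's classification
        chain.append(bot)
        if bot in can_resolve:
            return {"resolved": True, "llm_calls": llm_calls, "chain": chain}
    return {"resolved": False, "llm_calls": llm_calls, "chain": chain}


stuck = simulate_handoffs(["billing-bot", "subscription-bot"], can_resolve=set(), max_turns=10)
print(stuck["resolved"], stuck["llm_calls"], stuck["chain"][:3])
# False 20 ['billing-bot', 'subscription-bot', 'billing-bot']
```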
The handoff cost grows with bot chain length. An enterprise Botpress deployment with six specialized bots, each routing failures back to a central Orchestrator, can cycle through all six bots in sequence for an unresolvable query before the Orchestrator finally gives up. Six bot intent-classification calls plus six Orchestrator routing decisions = twelve LLM invocations, or at least twelve credits at one credit per call, for a conversation that delivered zero value to the user.
The handoff rule: Track the full handoff chain for each conversation by its root session ID. If the Orchestrator routes to a bot that already appears in the current handoff chain for this conversation, the routing is likely circular. Cap how many times any one bot can be visited per session (the guard below allows two visits), and once a bot reaches that cap, block the handoff and route directly to a human escalation node instead of re-entering the cycle. A bot that has already failed to handle this conversation context is unlikely to succeed on a later visit.
import time
import sqlite3
import threading
from flask import Flask, request, jsonify
app = Flask(__name__)
db_lock = threading.Lock()
DB_PATH = "botpress_handoff_guard.db"
SESSION_WINDOW_SECONDS = 1800  # 30-minute conversation window
MAX_HANDOFFS_PER_SESSION = 6   # total transfers before escalation
MAX_VISITS_PER_BOT = 2         # same bot can be visited at most twice
def init_db():
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS handoff_log (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                session_id TEXT NOT NULL,
                source_bot TEXT NOT NULL,
                target_bot TEXT NOT NULL,
                handoff_at REAL NOT NULL
            )
        """)
        conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_session_time "
            "ON handoff_log (session_id, handoff_at)"
        )
class BotHandoffCycleGuard:
    """
    Call check() before the Orchestrator executes a bot transfer.
    Returns allow=False when a cycle is detected or total handoff count
    for this session exceeds the safety ceiling.
    """
    @staticmethod
    def check(session_id: str, source_bot: str, target_bot: str) -> dict:
        now = time.time()
        window_start = now - SESSION_WINDOW_SECONDS
        with db_lock:
            with sqlite3.connect(DB_PATH) as conn:
                rows = conn.execute(
                    "SELECT source_bot, target_bot FROM handoff_log "
                    "WHERE session_id = ? AND handoff_at > ? "
                    "ORDER BY handoff_at ASC",
                    (session_id, window_start)
                ).fetchall()
                total_handoffs = len(rows)
                # Destinations visited so far in this session, in order
                visited_chain = [tgt for _, tgt in rows]
                if total_handoffs >= MAX_HANDOFFS_PER_SESSION:
                    return {
                        "allow": False,
                        "reason": "handoff_ceiling_reached",
                        "session_id": session_id,
                        "total_handoffs": total_handoffs,
                        "ceiling": MAX_HANDOFFS_PER_SESSION,
                        "handoff_chain": visited_chain,
                        "message": (
                            f"Session {session_id!r} has transferred between bots "
                            f"{total_handoffs} times (ceiling: {MAX_HANDOFFS_PER_SESSION}). "
                            "The conversation is unresolvable by the current bot set. "
                            "Route to human escalation immediately — "
                            "further LLM routing decisions will not change the outcome."
                        ),
                    }
                # Count how many times target_bot has appeared as a destination
                target_visits = visited_chain.count(target_bot)
                if target_visits >= MAX_VISITS_PER_BOT:
                    return {
                        "allow": False,
                        "reason": "cycle_detected",
                        "session_id": session_id,
                        "target_bot": target_bot,
                        "target_visits": target_visits,
                        "ceiling": MAX_VISITS_PER_BOT,
                        "handoff_chain": visited_chain,
                        "message": (
                            f"Routing to {target_bot!r} would be visit #{target_visits + 1} "
                            f"to that bot in this session (ceiling: {MAX_VISITS_PER_BOT}). "
                            f"Handoff chain so far: {' → '.join(visited_chain)}. "
                            f"{target_bot!r} has already failed to resolve this "
                            "conversation context — re-routing will cycle. "
                            "Escalate to a human agent instead."
                        ),
                    }
                conn.execute(
                    "INSERT INTO handoff_log "
                    "(session_id, source_bot, target_bot, handoff_at) "
                    "VALUES (?, ?, ?, ?)",
                    (session_id, source_bot, target_bot, now)
                )
        return {
            "allow": True,
            "session_id": session_id,
            "source_bot": source_bot,
            "target_bot": target_bot,
            "handoffs_used": total_handoffs + 1,
            "handoffs_remaining": MAX_HANDOFFS_PER_SESSION - total_handoffs - 1,
        }
@app.route("/guard/handoff", methods=["POST"])
def handoff_guard():
    data = request.get_json(force=True)
    result = BotHandoffCycleGuard.check(
        session_id=data.get("session_id", ""),
        source_bot=data.get("source_bot", ""),
        target_bot=data.get("target_bot", ""),
    )
    return jsonify(result), 200 if result["allow"] else 429
if __name__ == "__main__":
    init_db()
    app.run(port=8101)
Place this guard in the Orchestrator bot's Execute Code node immediately before every "Transfer to Bot" card. Pass event.conversationId as session_id, the current Orchestrator bot's identifier as source_bot, and the intended transfer target as target_bot. If the response allow field is false, execute your human-agent handoff action (Botpress's built-in Human Takeover card or your CRM escalation hook) and end the current node execution without proceeding to the Transfer card. The handoff_chain array in cycle-detected responses gives the Orchestrator's reasoning block useful context when writing the escalation summary passed to the human agent — "Conversation visited [billing-bot → subscription-bot → billing-bot] without resolution" conveys exactly what the agent needs to pick up the thread without re-investigating the same intent space.
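Here is a minimal sketch of composing that escalation summary from a blocked guard response. The function name and summary wording are illustrative. The function accepts either `handoff_chain` or `chain` as the key for the visited-bot list, so it works whichever key a given response shape uses.

```python
from typing import Any, Dict


def escalation_summary(guard_response: Dict[str, Any]) -> str:
    """One-line summary for the human agent, built from a blocked /guard/handoff response."""
    # Accept either key name for the visited-bot list.
    chain = guard_response.get("handoff_chain") or guard_response.get("chain") or []
    summary = f"Conversation visited [{' → '.join(chain)}] without resolution"
    if guard_response.get("reason") == "cycle_detected":
        summary += f"; blocked another transfer to {guard_response.get('target_bot')}"
    return summary + "."


print(escalation_summary({
    "reason": "cycle_detected",
    "target_bot": "billing-bot",
    "handoff_chain": ["billing-bot", "subscription-bot", "billing-bot", "subscription-bot"],
}))
```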
Failure Mode 3 — Knowledge Base Re-Query Fan-Out
Botpress's Knowledge Base feature lets you upload documents, URLs, or structured data that the bot searches using a semantic embedding model. When a Search KB node fires, Botpress sends the user's query to an embedding model, retrieves the most relevant document chunks, synthesizes an answer using the conversation LLM, and returns the result. That makes two AI credit events per search: one for the embedding search and one for the synthesis call. Both are billed regardless of whether the search finds a relevant answer.
The fan-out failure mode appears when the Search KB node is placed inside a confidence-check loop. A common pattern: Search KB → Execute Code checks the kb.confidence output → if confidence is below threshold (e.g., 0.75), route back to the Search KB node with a slightly rephrased query generated from the conversation context, hoping a different phrasing retrieves a better chunk. If the KB genuinely does not contain the answer — the user is asking about a topic not covered in any uploaded document — the confidence score will never reach 0.75 regardless of how many rephrasing iterations are attempted. Each iteration fires two billed events (embed + synthesize). A loop that allows up to 10 re-query attempts for a knowledge gap costs 20 credits for a conversation that produced no useful answer.
The fan-out becomes a billing crisis during high-traffic events. A product launch announcement that generates 5,000 user questions about a feature not yet documented in the KB produces 5,000 conversations × 10 re-query loop iterations × 2 credits per iteration = 100,000 credits consumed in the event window — for zero successful answers. The KB's inability to answer scales with traffic in the worst possible way.
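Here is the launch-day arithmetic, compared against a hypothetical cap of two re-query iterations per conversation. As in the text, each iteration is assumed to cost one credit for the embedding search and one for synthesis.

```python
def kb_loop_credits(conversations: int, iterations: int, credits_per_iteration: int = 2) -> int:
    """Each loop iteration is one embedding search plus one synthesis call (assumed 1 credit each)."""
    return conversations * iterations * credits_per_iteration


uncapped = kb_loop_credits(5_000, iterations=10)  # 10-iteration confidence loop
capped = kb_loop_credits(5_000, iterations=2)     # hypothetical two-iteration cap
print(uncapped, capped, uncapped - capped)  # 100000 20000 80000
```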
The re-query rule: Limit KB re-query attempts per session, per topic, to a maximum of 2 regardless of confidence score. If the second re-query also fails the confidence threshold, treat the KB as not having the answer to this question in this session — route to a fallback that either escalates to a human agent or presents the closest available result with an explicit confidence caveat. A third re-query consumes two more credits and, when the KB genuinely lacks the answer, is unlikely to score meaningfully higher than the attempts before it.
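Here is a minimal in-process sketch of the rule. It assumes `search` returns an (answer, confidence) pair and `rephrase` produces a new query string. Both are hypothetical callables: `search` stands in for the Search KB node, and `rephrase` stands in for whatever node generates the rephrased query. The loop runs the initial query plus at most two re-queries. It keeps the best answer seen so the fallback can present it with a caveat, and it only rephrases when another re-query is still permitted.

```python
from typing import Callable, Dict, Optional, Tuple

SearchFn = Callable[[str], Tuple[str, float]]  # hypothetical: query -> (answer, confidence)


def answer_with_bounded_requery(question: str, search: SearchFn,
                                rephrase: Callable[[str, int], str],
                                threshold: float = 0.75, max_requeries: int = 2) -> Dict[str, object]:
    """Initial query plus at most max_requeries re-queries; fall back with the best answer seen."""
    best_answer: Optional[str] = None
    best_confidence = -1.0
    total_queries = 1 + max_requeries
    query = question
    for attempt in range(1, total_queries + 1):
        answer, confidence = search(query)
        if confidence > best_confidence:
            best_answer, best_confidence = answer, confidence
        if confidence >= threshold:
            return {"status": "answered", "answer": answer, "confidence": confidence, "queries": attempt}
        if attempt < total_queries:
            query = rephrase(question, attempt)  # only rephrase when another query is allowed
    return {"status": "fallback", "answer": best_answer, "confidence": best_confidence,
            "queries": total_queries}


result = answer_with_bounded_requery(
    "How do I enable the beta feature?",
    search=lambda q: ("See the release notes.", 0.52),  # KB lacks the answer
    rephrase=lambda q, n: f"{q} (rephrasing {n})",
)
print(result["status"], result["queries"], result["confidence"])  # fallback 3 0.52
```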
import time
import sqlite3
import threading
from flask import Flask, request, jsonify
app = Flask(__name__)
db_lock = threading.Lock()
DB_PATH = "botpress_kb_guard.db"
MAX_KB_QUERIES_PER_SESSION_PER_TOPIC = 2
TOPIC_WINDOW_SECONDS = 900  # 15-minute window per topic within a session
LOW_CONFIDENCE_THRESHOLD = 0.75
def init_db():
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS kb_query_log (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                session_id TEXT NOT NULL,
                topic_hash TEXT NOT NULL,
                confidence_score REAL,
                query_text TEXT,
                queried_at REAL
            )
        """)
        conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_session_topic "
            "ON kb_query_log (session_id, topic_hash, queried_at)"
        )
class KBReQueryGuard:
    """
    Call check() before issuing a re-query to the Knowledge Base.
    topic_hash should be a stable fingerprint of the user's original question
    (e.g., first 64 chars lowercased, or a hash of the initial query).
    Returns allow=False when the re-query cap for this topic is reached.
    """
    @staticmethod
    def check(session_id: str, topic_hash: str,
              query_text: str, last_confidence: float) -> dict:
        now = time.time()
        window_start = now - TOPIC_WINDOW_SECONDS
        if last_confidence >= LOW_CONFIDENCE_THRESHOLD:
            return {
                "allow": False,
                "reason": "confidence_already_met",
                "session_id": session_id,
                "topic_hash": topic_hash,
                "last_confidence": last_confidence,
                "threshold": LOW_CONFIDENCE_THRESHOLD,
                "message": (
                    "Re-query guard called but last_confidence already meets the threshold. "
                    "This is a guard misconfiguration — the loop condition should have "
                    "exited before calling the guard. No re-query needed."
                ),
            }
        with db_lock:
            with sqlite3.connect(DB_PATH) as conn:
                query_count, best_logged = conn.execute(
                    "SELECT COUNT(*), MAX(confidence_score) FROM kb_query_log "
                    "WHERE session_id = ? AND topic_hash = ? AND queried_at > ?",
                    (session_id, topic_hash, window_start)
                ).fetchone()
                # Highest score seen so far, including the result that just failed the threshold
                best_confidence_so_far = max(best_logged or 0.0, last_confidence)
                if query_count >= MAX_KB_QUERIES_PER_SESSION_PER_TOPIC:
                    return {
                        "allow": False,
                        "reason": "kb_requery_cap_reached",
                        "session_id": session_id,
                        "topic_hash": topic_hash,
                        "queries_in_window": query_count,
                        "cap": MAX_KB_QUERIES_PER_SESSION_PER_TOPIC,
                        "best_confidence_seen": best_confidence_so_far,
                        "message": (
                            f"Session {session_id!r} has queried the Knowledge Base "
                            f"{query_count} times for topic {topic_hash!r} "
                            f"(cap: {MAX_KB_QUERIES_PER_SESSION_PER_TOPIC}). "
                            f"Best confidence seen: {best_confidence_so_far:.2f} "
                            f"(threshold: {LOW_CONFIDENCE_THRESHOLD}). "
                            "The KB does not have a sufficient answer for this question. "
                            "Route to human escalation or present the best available result "
                            "with an explicit low-confidence caveat."
                        ),
                    }
                conn.execute(
                    "INSERT INTO kb_query_log "
                    "(session_id, topic_hash, confidence_score, query_text, queried_at) "
                    "VALUES (?, ?, ?, ?, ?)",
                    (session_id, topic_hash, last_confidence, query_text[:500], now)
                )
        return {
            "allow": True,
            "session_id": session_id,
            "topic_hash": topic_hash,
            "queries_used": query_count + 1,
            "queries_remaining": MAX_KB_QUERIES_PER_SESSION_PER_TOPIC - query_count - 1,
        }
@app.route("/guard/kb-requery", methods=["POST"])
def kb_requery_guard():
    data = request.get_json(force=True)
    result = KBReQueryGuard.check(
        session_id=data.get("session_id", ""),
        topic_hash=data.get("topic_hash", ""),
        query_text=data.get("query_text", ""),
        last_confidence=float(data.get("last_confidence", 0.0)),
    )
    return jsonify(result), 200 if result["allow"] else 429
if __name__ == "__main__":
    init_db()
    app.run(port=8102)
In your Botpress flow, the Execute Code node that evaluates the KB confidence score should call this guard before looping back to the Search KB node. Compute a stable topic_hash from the user's original question — using the first 64 characters of the message lowercased, stripped of punctuation — so that rephrased re-queries on the same underlying topic share the same counter. Pass the current kb.confidence score as last_confidence. A 429 response routes to your fallback node; a 200 with queries_remaining: 0 means this is the last permitted re-query and the flow should prepare the fallback path even before the next KB response arrives. The best_confidence_seen value in blocked responses is useful for composing the low-confidence caveat message — "I found a partial answer (confidence: 0.52) but I'm not confident it fully addresses your question" is more useful to the user than a generic "I don't know."
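Here is a sketch of the topic fingerprint and the caveat message. Taking the first 64 characters, lowercasing, and stripping punctuation follow the description above. Collapsing whitespace and hashing the result with SHA-256 (truncated to 16 hex characters) are extra assumptions that keep the key short and stable.

```python
import hashlib
import string

_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def topic_hash(original_question: str) -> str:
    """First 64 chars, lowercased, punctuation stripped; whitespace collapse and SHA-256 are assumptions."""
    normalized = original_question[:64].lower().translate(_STRIP_PUNCT)
    normalized = " ".join(normalized.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def low_confidence_caveat(best_confidence_seen: float) -> str:
    """Caveat text for a blocked re-query, using the guard's best_confidence_seen value."""
    return (f"I found a partial answer (confidence: {best_confidence_seen:.2f}) "
            "but I'm not confident it fully addresses your question.")


print(topic_hash("Where is my order?") == topic_hash("where is my order"))  # True
print(low_confidence_caveat(0.52))
```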
Failure Mode 4 — Autonomous Agent Action Spirals
Botpress's Autonomous mode (the Autonomous Node in Botpress Cloud) lets a bot operate as a tool-using agent: it reads the user's goal, selects from a defined set of Actions (HTTP calls, database queries, integrations), executes the selected action, reads the result, and calls the LLM again to decide what to do next. This planning loop continues until the agent concludes the goal is achieved and returns a final response, or until the configured maximum turn count stops it. Each iteration of the planning loop — the "think, act, observe" cycle — is at minimum one LLM call billed as AI credits.
The action spiral failure mode occurs when the goal description and available actions are mismatched in a way that the planning model cannot detect. Consider an agent tasked with "retrieve the customer's current subscription tier and confirm whether they are eligible for the annual discount." The agent has two actions available: get_customer_profile (returns name, email, account creation date) and get_recent_orders (returns the last 5 order records). Neither action returns subscription tier or discount eligibility. The planning model calls get_customer_profile, reads the result, determines the profile doesn't contain tier information, calls get_recent_orders, reads the orders, determines tier is still not visible, calls get_customer_profile again with a different parameter combination (or the same parameters — the model sometimes tries the same tool with minor variations hoping for a different result), observes the same profile data, and loops. Each cycle is one or two LLM invocations, so a maximum turn count of 20 allows 20 to 40 billed LLM calls to be spent proving that a goal requiring unavailable information cannot be completed with the available action set.
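Here is a toy version of the spiral. It assumes a round-robin stand-in for the planning model, whereas a real model chooses actions itself. The field names for what each action returns are also hypothetical. Because neither action exposes the goal fields, the loop only ends at the turn limit, and the billed calls scale with calls_per_turn.

```python
from itertools import cycle
from typing import Dict, List, Set


def run_agent(goal_fields: Set[str], actions: Dict[str, Set[str]],
              max_turns: int, calls_per_turn: int = 1) -> Dict[str, object]:
    """Toy planning loop: each turn bills calls_per_turn LLM calls and runs the next action round-robin."""
    observed: Set[str] = set()
    trace: List[str] = []
    llm_calls = 0
    for _, action in zip(range(max_turns), cycle(actions)):
        llm_calls += calls_per_turn
        trace.append(action)
        observed |= actions[action]  # fields this action's result exposes
        if goal_fields <= observed:
            return {"done": True, "llm_calls": llm_calls, "trace": trace}
    return {"done": False, "llm_calls": llm_calls, "trace": trace}


ACTIONS = {
    "get_customer_profile": {"name", "email", "account_created"},
    "get_recent_orders": {"orders"},
}
GOAL = {"subscription_tier", "discount_eligible"}
print(run_agent(GOAL, ACTIONS, max_turns=20)["llm_calls"])                    # 20
print(run_agent(GOAL, ACTIONS, max_turns=20, calls_per_turn=2)["llm_calls"])  # 40
```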
The spiral rule: Track which actions the autonomous agent has called per session and how many times each action has been called. If the same action appears in three consecutive planning iterations, the agent is in a repetition spiral — the action is not producing the information needed to advance toward the goal, and the planning model is not breaking out of the pattern on its own. Trip the breaker, surface a "goal not achievable with available tools" response to the user, and log the goal description and available action set for the team to review and add the missing action.
import time
import sqlite3
import threading
from collections import Counter
from flask import Flask, request, jsonify
app = Flask(__name__)
db_lock = threading.Lock()
DB_PATH = "botpress_agent_spiral_guard.db"
MAX_TOTAL_ACTIONS_PER_SESSION = 20
MAX_CONSECUTIVE_SAME_ACTION = 3
MAX_SAME_ACTION_TOTAL = 6  # same action called this many times total = spiral
SESSION_WINDOW_SECONDS = 1800
def init_db():
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS agent_action_log (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                session_id TEXT NOT NULL,
                action_name TEXT NOT NULL,
                action_params_hash TEXT,
                result_summary TEXT,
                called_at REAL
            )
        """)
        conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_session_time "
            "ON agent_action_log (session_id, called_at)"
        )
class AgentActionSpiralGuard:
    """
    Call check() before the autonomous agent executes each planned action.
    Returns allow=False when spiral or total-turn ceiling is detected.
    """
    @staticmethod
    def check(session_id: str, action_name: str,
              action_params_hash: str = "") -> dict:
        now = time.time()
        window_start = now - SESSION_WINDOW_SECONDS
        with db_lock:
            with sqlite3.connect(DB_PATH) as conn:
                rows = conn.execute(
                    "SELECT action_name FROM agent_action_log "
                    "WHERE session_id = ? AND called_at > ? "
                    "ORDER BY called_at ASC",
                    (session_id, window_start)
                ).fetchall()
                action_sequence = [r[0] for r in rows]
                total_actions = len(action_sequence)
                if total_actions >= MAX_TOTAL_ACTIONS_PER_SESSION:
                    action_counts = Counter(action_sequence)
                    return {
                        "allow": False,
                        "reason": "total_action_ceiling",
                        "session_id": session_id,
                        "action_name": action_name,
                        "total_actions": total_actions,
                        "ceiling": MAX_TOTAL_ACTIONS_PER_SESSION,
                        "action_distribution": dict(action_counts),
                        "message": (
                            f"Session {session_id!r} autonomous agent has called "
                            f"{total_actions} actions (ceiling: {MAX_TOTAL_ACTIONS_PER_SESSION}). "
                            "The goal has not been achieved within the allowed turn budget. "
                            "Terminate the agent loop and inform the user that the task "
                            "could not be completed. Review the goal description and available "
                            "actions to identify the missing capability."
                        ),
                    }
                # Check consecutive same-action pattern
                if action_sequence:
                    consecutive_count = 0
                    for prev_action in reversed(action_sequence):
                        if prev_action == action_name:
                            consecutive_count += 1
                        else:
                            break
                    if consecutive_count >= MAX_CONSECUTIVE_SAME_ACTION:
                        return {
                            "allow": False,
                            "reason": "consecutive_same_action_spiral",
                            "session_id": session_id,
                            "action_name": action_name,
                            "consecutive_count": consecutive_count,
                            "ceiling": MAX_CONSECUTIVE_SAME_ACTION,
                            "message": (
                                f"Action {action_name!r} has been called {consecutive_count} "
                                f"consecutive times in session {session_id!r} "
                                f"(ceiling: {MAX_CONSECUTIVE_SAME_ACTION}). "
                                "The planning model is repeating the same action without "
                                "advancing toward the goal — the action output is not providing "
                                "the information needed to complete the task. "
                                "Trip the agent loop and surface a 'tool unavailable' response. "
                                "The missing action must be added to the agent's action set."
                            ),
                        }
                # Check total same-action frequency
                same_action_total = sum(1 for a in action_sequence if a == action_name)
                if same_action_total >= MAX_SAME_ACTION_TOTAL:
                    return {
                        "allow": False,
                        "reason": "same_action_frequency_spiral",
                        "session_id": session_id,
                        "action_name": action_name,
                        "same_action_total": same_action_total,
                        "ceiling": MAX_SAME_ACTION_TOTAL,
                        "message": (
                            f"Action {action_name!r} has been called {same_action_total} times "
                            f"total in session {session_id!r} (ceiling: {MAX_SAME_ACTION_TOTAL}). "
                            "This action is not converging the agent toward goal completion. "
                            "Terminate the agent loop — repeated calls to the same insufficient "
                            "action are consuming credits without progress."
                        ),
                    }
                conn.execute(
                    "INSERT INTO agent_action_log "
                    "(session_id, action_name, action_params_hash, called_at) "
                    "VALUES (?, ?, ?, ?)",
                    (session_id, action_name, action_params_hash, now)
                )
        return {
            "allow": True,
            "session_id": session_id,
            "action_name": action_name,
            "total_actions_used": total_actions + 1,
            "actions_remaining": MAX_TOTAL_ACTIONS_PER_SESSION - total_actions - 1,
        }
    @staticmethod
    def record_result(session_id: str, action_name: str,
                      result_summary: str) -> bool:
        with db_lock:
            with sqlite3.connect(DB_PATH) as conn:
                conn.execute(
                    "UPDATE agent_action_log SET result_summary = ? "
                    "WHERE session_id = ? AND action_name = ? "
                    "AND id = (SELECT MAX(id) FROM agent_action_log "
                    "          WHERE session_id = ? AND action_name = ?)",
                    (result_summary[:200], session_id, action_name,
                     session_id, action_name)
                )
        return True
@app.route("/guard/agent-action", methods=["POST"])
def agent_action_guard():
    data = request.get_json(force=True)
    result = AgentActionSpiralGuard.check(
        session_id=data.get("session_id", ""),
        action_name=data.get("action_name", ""),
        action_params_hash=data.get("action_params_hash", ""),
    )
    return jsonify(result), 200 if result["allow"] else 429
@app.route("/guard/agent-action/result", methods=["POST"])
def agent_action_result():
    data = request.get_json(force=True)
    AgentActionSpiralGuard.record_result(
        session_id=data.get("session_id", ""),
        action_name=data.get("action_name", ""),
        result_summary=data.get("result_summary", ""),
    )
    return "", 204
if __name__ == "__main__":
    init_db()
    app.run(port=8103)
In Botpress's Autonomous agent configuration, the agent's action execution lifecycle has pre-action and post-action hooks available via Execute Code nodes injected into the flow. Call /guard/agent-action in the pre-action hook with the current conversation ID as session_id and the planned action name as action_name. Compute action_params_hash as a short hash of the serialized action parameters — this lets you detect not just same-action repetition but same-action-same-params repetition, which is a stronger spiral signal. If the response is 429, set the Botpress agent.isDone flag to true and set the agent's final message to the message field from the guard response — this terminates the planning loop gracefully and gives the user an actionable explanation rather than a generic failure. Call /guard/agent-action/result in the post-action hook with a 200-character summary of the action result; this populates the observability log used by your team when reviewing which actions are spiraling on which goal types.
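The request and response handling on the caller's side is small enough to sketch. The Python below is a minimal illustration, assuming the action parameters can be serialized to JSON; the helper names (action_params_hash, build_guard_payload, interpret_guard_response) and the 16-character hash length are hypothetical choices, not part of Botpress's API, and in a real flow this logic would be ported to the Execute Code node's language. Sorting keys before hashing is the important design choice: two calls with the same parameters in a different key order must produce the same hash, or same-params repetition goes undetected.

```python
import hashlib
import json
from typing import Optional, Tuple


def action_params_hash(params: dict, length: int = 16) -> str:
    """Hash action parameters so identical calls produce identical hashes.

    sort_keys makes the hash independent of dict key order; default=str
    turns values json cannot encode (dates, UUIDs) into strings.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


def build_guard_payload(conversation_id: str, action_name: str, params: dict) -> dict:
    """Body for POST /guard/agent-action (field names match the guard above)."""
    return {
        "session_id": conversation_id,
        "action_name": action_name,
        "action_params_hash": action_params_hash(params),
    }


def interpret_guard_response(status_code: int, body: dict) -> Tuple[bool, Optional[str]]:
    """Return (stop_agent, final_message) for the pre-action hook."""
    if status_code == 429:
        # The guard's message field explains which ceiling fired.
        return True, body.get("message", "The agent stopped to avoid a repeated-action loop.")
    return False, None


if __name__ == "__main__":
    a = action_params_hash({"order_id": "A-17", "verbose": False})
    b = action_params_hash({"verbose": False, "order_id": "A-17"})
    print(a == b)  # True: key order does not change the hash
```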
State Table
| Failure mode | Guard class | Ceiling / trigger | What to watch |
|---|---|---|---|
| AI Task retry loop: a validation failure re-invokes the LLM on input that cannot satisfy the schema | AITaskRetryGuard | 2 retries per session per task; the input pre-check blocks structural mismatches immediately | Frequency of reason: input_missing_prerequisite_signal; a high rate means the validation condition is too strict for the real input distribution |
| Bot handoff cycle: the Orchestrator routes between bots whose fallbacks send the conversation in a circle | BotHandoffCycleGuard | 2 visits per bot per session; 6 total handoffs before escalation | handoff_chain patterns in blocked responses; recurring chains reveal domain gaps between adjacent bots |
| Knowledge Base re-query fan-out: a Search KB node loops on an unsatisfiable confidence threshold, consuming embedding and synthesis credits | KBReQueryGuard | 2 queries per session per topic; the cap routes to fallback on the third attempt | best_confidence_seen histogram; topics with persistent sub-0.5 scores are documentation gaps to fill |
| Autonomous agent action spiral: the planning model repeats the same action or exceeds the total turn budget without reaching the goal | AgentActionSpiralGuard | 3 consecutive same-action calls; 6 total same-action calls; 20 total actions per session | action_distribution in ceiling responses; actions with high skew ratios point to missing capabilities the goal requires |
Checklist Before Going Live
- Audit every AI Task node that has a retry-on-failure configuration. For each node, define the structural pre-check condition: what signal in the user's message is required for the extraction to succeed? A phone number extraction node requires a digit sequence; an address extraction node requires geographic terms; an order number extraction node requires the order ID pattern. Add the pre-check to the guard's PRE_CHECKS map before deploying. Nodes without a pre-check still benefit from the retry cap, but adding the pre-check eliminates the retry cost entirely for the most common failure class.
- Map the fallback path of every bot in your multi-bot deployment. Draw the directed graph: Bot A's fallback → Orchestrator → Bot B's fallback → Orchestrator → ? Walk every path looking for cycles: any sequence in which a bot appears twice in the chain is a cycle waiting to be triggered. For each cycle found, decide whether to widen one bot's domain (so it handles the edge case that currently falls through), create a dedicated edge-case bot for the gap, or accept that the gap goes to human escalation immediately. The handoff guard catches cycles at runtime, but eliminating them structurally removes Orchestrator LLM calls on every ambiguous query.
- Set an explicit maximum iteration count on every KB confidence-check loop. Botpress's Execute Code loop construct does not enforce a default iteration limit — the developer sets the exit condition. Wherever your flow has a loop that includes a Search KB node, add a counter variable that increments on every loop iteration and add a counter-ceiling exit branch (≥ 2 → route to fallback) alongside the confidence-ceiling exit branch (≥ threshold → route to answer). The guard provides a network-level safety net, but a flow-level counter eliminates the network round-trip for every loop iteration at the cost of one in-memory counter check.
- Verify that your Autonomous agent's goal descriptions are achievable with its available action set before deploying. For each goal type the agent handles, enumerate which actions are needed to satisfy it and confirm all of those actions are in the agent's action list. A goal that requires an action not in the list will spiral. Maintain a goal-to-required-actions mapping document and update it whenever you add a new goal type or modify an existing action's data contract. When the spiral guard fires in production on a new goal type, add the missing action before re-enabling that goal path.
- Monitor AI credit consumption per bot and per node type daily. Botpress Cloud's usage dashboard breaks down credit consumption by workspace and bot. Set an alert for when consumption reaches 60% of your monthly allocation. When a spike appears, cross-reference it with your guard logs: spikes that correlate with many retry_cap_reached events point to AI Task configuration issues; spikes that correlate with many cycle_detected events point to Orchestrator routing gaps; spikes that correlate with total_action_ceiling events point to missing agent actions. Guard logs turn a generic "you spent a lot of credits" alert into an actionable "here is which failure mode fired and on which input class."
- Test every bot flow with adversarial inputs that match the failure mode triggers before launch. For AI Task retry loops: send messages that match your extraction task's topic but lack the prerequisite signal (ask about an order without including an order number). For bot handoff cycles: craft a message that sits at the exact boundary between two bots' domains. For KB re-query loops: ask a question that is topically related to your KB content but not actually answerable from the uploaded documents. For agent spirals: give the agent a goal that requires information only available from an action not in its action set. Run all four tests against the staging deployment before promoting any bot to production. The guards are the runtime defense; these tests are the pre-flight check.
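The fallback-mapping item in the checklist above can be scripted once the graph is written down. A minimal sketch, assuming you record the graph by hand as a Python dict that maps each bot to the bots the Orchestrator may hand its fallback traffic to (the Orchestrator hop is folded into each edge); the bot names are hypothetical. The depth-first walk is exhaustive rather than fast, which is fine for the handful of bots a deployment typically has.

```python
from typing import Dict, List, Set, Tuple


def find_fallback_cycles(fallback_routes: Dict[str, List[str]]) -> List[List[str]]:
    """Return each distinct cycle in a bot-to-bot fallback graph.

    A cycle is reported as the bots visited, starting and ending at the same
    bot, rotated so the alphabetically smallest bot comes first.
    """
    cycles: List[List[str]] = []
    seen: Set[Tuple[str, ...]] = set()

    def visit(bot: str, path: List[str]) -> None:
        for nxt in fallback_routes.get(bot, []):
            if nxt in path:
                # Back-edge: the slice from nxt's first visit is a cycle.
                body = path[path.index(nxt):]
                start = body.index(min(body))
                key = tuple(body[start:] + body[:start])
                if key not in seen:  # same loop reached from another start bot
                    seen.add(key)
                    cycles.append(list(key) + [key[0]])
            else:
                visit(nxt, path + [nxt])

    for bot in fallback_routes:
        visit(bot, [bot])
    return cycles


if __name__ == "__main__":
    routes = {
        "billing": ["shipping"],
        "shipping": ["returns"],
        "returns": ["billing"],
        "faq": ["billing"],
    }
    print(find_fallback_cycles(routes))
    # [['billing', 'shipping', 'returns', 'billing']]
```

Each cycle it reports is a candidate for one of the three fixes in the checklist: widen a domain, add an edge-case bot, or escalate.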
FAQ
How does Botpress Cloud's AI credit billing model compare to other conversational AI platforms?
Botpress Cloud bills per LLM invocation as AI credits, where the credit cost per call scales with the model tier — lower-cost models consume fewer credits per call. This is structurally similar to Flowise and LangFlow's per-execution credit model and to Mistral AI's per-token API billing, but distinct from Copilot Studio's per-message model (which bills per completed conversation turn, not per underlying LLM call) and Salesforce Agentforce's per-conversation model (which bills per full resolved session). The Botpress model means that every internal retry, re-query, and planning iteration inside a single conversation turn consumes additional credits — costs that would be invisible under a per-message or per-session billing model are explicitly metered by Botpress Cloud.
Does Botpress's built-in maximum turn count on Autonomous agents make the action spiral guard redundant?
No. The built-in turn count is a last-resort ceiling, not a pattern detector. A maximum turn count of 20 still allows 20 credit-consuming LLM calls before the loop stops, all of them wasted when the spiral was detectable after 3 consecutive identical actions. The spiral guard trips at the third consecutive same-action call and at the sixth total same-action call, intercepting the spiral after 3 to 6 calls of the repeated action rather than after 20 turns. The guard also produces structured diagnostic output (the action distribution and the consecutive count) that the built-in turn-count error does not provide. Use both: the built-in count as the absolute last-resort ceiling, and the guard as the early trip that catches the spiral cheaply.
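To make the comparison concrete, here is a small simulation that reports which planned action a guard would block first. It assumes the trip semantics described in this answer (the third consecutive call and the sixth call of the same action are blocked) plus a 20-action session budget; it is a simplified model for reasoning about ceilings, not the guard's actual code, and the action names are hypothetical.

```python
from typing import List, Optional


def first_blocked_call(actions: List[str], max_consecutive: int = 3,
                       max_same_total: int = 6, max_total: int = 20) -> Optional[int]:
    """Return the 1-based index of the first planned action a guard with these
    ceilings would block, or None if every action is allowed.

    An action is blocked when it would be the max_consecutive-th identical
    action in a row, the max_same_total-th call of that action overall, or
    a call beyond max_total actions in the session.
    """
    for i, action in enumerate(actions, start=1):
        history = actions[: i - 1]
        run = 1  # this call plus identical calls immediately before it
        for prev in reversed(history):
            if prev != action:
                break
            run += 1
        if run >= max_consecutive:
            return i
        if history.count(action) + 1 >= max_same_total:
            return i
        if i > max_total:
            return i
    return None


if __name__ == "__main__":
    print(first_blocked_call(["lookup_order"] * 20))                # 3
    print(first_blocked_call(["lookup_order", "search_kb"] * 10))   # 11
```

The second case is why the total same-action ceiling exists: an alternating spiral never trips the consecutive check, yet the sixth lookup_order call (the 11th call overall) is still stopped well before a 20-turn cap would end the loop.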
How do we pick the right MAX_KB_QUERIES_PER_SESSION_PER_TOPIC ceiling for our KB?
Start at 2. In practice, when the first query on a topic scores below 0.5, a second query with different phrasing rarely produces a materially better confidence score. A rephrased embedding query retrieves different chunks, but if those chunks do not contain the answer either, the synthesis call produces a similarly low-confidence result. If your KB is dense and well-indexed for the topics your users ask about, second queries may produce meaningful confidence improvements; in that case, a ceiling of 2 still permits the one productive re-query while blocking the third and later queries, which produce diminishing returns. If your KB has significant documentation gaps, even a ceiling of 2 wastes one re-query per gap. Monitor best_confidence_seen across all blocked queries: topics whose best confidence never exceeds 0.4 across multiple sessions are documentation gaps to fill, not re-query ceiling calibration problems.
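The same ceiling can also be enforced at flow level with a counter exit next to the confidence exit. A minimal sketch of that loop shape, assuming search_kb is a hypothetical stand-in for whatever performs the Search KB step and returns an answer with a confidence score; the 0.5 threshold echoes the scores discussed above but is an assumption, as are the function and field names.

```python
from typing import Callable, List, Tuple


def answer_with_kb(phrasings: List[str],
                   search_kb: Callable[[str], Tuple[str, float]],
                   max_queries: int = 2,
                   confidence_threshold: float = 0.5) -> dict:
    """Query the KB at most max_queries times, stopping early on a confident answer."""
    best_confidence_seen = 0.0
    attempts = phrasings[:max_queries]  # counter ceiling: later phrasings are never sent
    for attempt, phrasing in enumerate(attempts, start=1):
        answer, confidence = search_kb(phrasing)
        best_confidence_seen = max(best_confidence_seen, confidence)
        if confidence >= confidence_threshold:  # confidence exit
            return {"route": "answer", "answer": answer, "queries_used": attempt}
    # Ceiling reached without a confident answer: fall back, and keep the
    # best score so documentation gaps show up in the histogram.
    return {"route": "fallback", "queries_used": len(attempts),
            "best_confidence_seen": best_confidence_seen}
```

A third phrasing is never sent, so the embedding and synthesis credits for it are never spent, however many rephrasings the flow could generate.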
Can we run these guards inside the Botpress flow itself without an external endpoint?
Yes, for simple counters. Botpress's Execute Code node has access to the conversation's variable store, and you can implement retry counting, handoff chain tracking, and KB query counting using flow variables without an external HTTP endpoint. The limitation is persistence: flow variables exist within a single conversation session and are not accessible for cross-session analytics, alerting, or operational visibility into guard fire rates. The external endpoint pattern used in these examples gives you a queryable log of every guard decision — which is the observability layer that tells you which failure modes are firing at scale and which bot configurations need adjustment. For development and low-volume deployments, flow-variable-based guards are a reasonable starting point; for production deployments where the cost of a widespread spiral is significant, the external log is the mechanism that turns a runtime block into a configuration insight.
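As an illustration of the flow-variable approach, here is a minimal sketch of handoff-chain tracking, using a plain dict in place of the conversation's variable store. The ceilings mirror the State Table (2 visits per bot, 6 total handoffs), read here as "block the third visit" and "escalate on the seventh handoff"; that reading, the variable name handoff_chain, and the reason strings are assumptions for illustration.

```python
from typing import Dict


def record_handoff(conv_vars: Dict, target_bot: str,
                   max_visits_per_bot: int = 2, max_handoffs: int = 6) -> Dict:
    """Check and record one handoff in a per-conversation variable store.

    Appends target_bot to the stored chain and returns {"allow": True, ...},
    or returns {"allow": False, ...} without modifying the chain when the
    handoff would exceed a ceiling.
    """
    chain = conv_vars.setdefault("handoff_chain", [])
    if len(chain) >= max_handoffs:
        return {"allow": False, "reason": "handoff_ceiling", "handoff_chain": list(chain)}
    if chain.count(target_bot) >= max_visits_per_bot:
        return {"allow": False, "reason": "cycle_detected", "handoff_chain": list(chain)}
    chain.append(target_bot)
    return {"allow": True, "handoff_chain": list(chain)}
```

Because the state lives only in the conversation, nothing here survives for cross-session analysis; that is the gap the external log fills.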
How do I integrate RunGuard's SDK with Botpress to implement these guards?
Deploy RunGuard on a server accessible from Botpress Cloud's outbound HTTP connections and use RunGuard's LoopDetector as the core engine for the handoff cycle detection and action spiral detection guards. Construct the conversation's action sequence as a list of action name strings and call detector.record(session_id=conversationId, tool=action_name) after each planned action — RunGuard's configurable cycle detection will identify repetition patterns across cycle lengths 1–8 without you maintaining your own action sequence history. For the retry cap and KB re-query limit, use RunGuard's BudgetTracker with cap=2 per session per task — tracker.check() returns the remaining budget before each retry or re-query, and tracker.record() decrements the counter after each allowed invocation. Install with pip install runguard for the Python SDK or npm install @runguard/sdk for TypeScript, and call from Botpress's Execute Code nodes via HTTP fetch.