Guard rails failing?

edorta.garcia · April 13, 2025, 2:27pm

Hello,

We have activated all guardrails for the rephrasing feature. For example, in the “restrict topics” guardrail configuration we have listed some topics to be avoided (war, politics and religion).

When we ask the VA for information or opinions about any of those topics, one of the dialog tasks or the fallback is triggered and the topic is avoided. However, in the debug window we can’t see any indication that a guardrail has been triggered to avoid the topic.

Is this correct? The documentation says a message about this should be visible in the debug console.

What is happening? Are the guardrails really working?

Is there an example we can implement to check it?

Best regards,
Edo.

jaswanth.narne · September 9, 2025, 3:59pm

Hey Edorta,

Welcome to the Kore.ai Community!

To answer your question, it seems like the guardrails are working perfectly fine, but I understand that you’re not seeing a specific message in the debug window when a guardrail is triggered.
Here’s a suggestion to verify if the guardrails are functioning as expected:
Try running a query after initiating a task.
After that, ask a question related to one of the restricted topics (like politics, war, or religion).

If the guardrails are working, the bot should avoid responding to those topics and trigger the fallback message you’ve configured. The debug window may not always show an explicit “Guardrail triggered” message, but you should see that the input is flagged as invalid in the is_valid field, and the restricted topic will have a high score under RestrictTopics.
Here’s an example JSON output that you should see after a query containing a restricted topic like politics:
{
  "source": "Guardrails",
  "pN": "NLPGuard",
  "req": {
    "scanner": {
      "Regex": [
        "politics",
        "hacking",
        "age"
      ],
      "PromptInjection": {
        "enabled": true
      },
      "Toxicity": {
        "enabled": true
      },
      "RestrictTopics": {
        "Topics": [
          "badwords",
          "politics",
          "hacking"
        ]
      }
    },
    "query": "What to do you think about the politics in india ?",
    "req_type": "llm_request",
    "output": ""
  },
  "sT": "2025-09-09T13:29:15.224Z",
  "eT": "2025-09-09T13:29:15.354Z",
  "tt": 130,
  "sC": 200,
  "res": {
    "is_valid": false,
    "scanners": {
      "PromptInjection": 0,
      "Toxicity": 0,
      "RestrictTopics": 0.89
    },
    "meta": null
  }
}
As per this JSON response, the RestrictTopics scanner flagged the query due to the word politics, and the scanner score for RestrictTopics is 0.89, indicating that the guardrails have detected the restricted topic.
The is_valid: false confirms that the system correctly blocked the input with the restricted topic, and fallback actions should trigger.
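
If you copy an entry like the one above out of the debug window, a small script can read off the same two signals (is_valid and the scanner scores) for you. This is only a rough sketch: the field names come from the JSON above, but the function name and the 0.5 cut-off for a "high" score are my own assumptions, since the actual threshold isn't stated here.

```python
import json

# Debug entry copied from the example above, trimmed to the fields used here.
ENTRY_JSON = """
{
  "source": "Guardrails",
  "pN": "NLPGuard",
  "res": {
    "is_valid": false,
    "scanners": {"PromptInjection": 0, "Toxicity": 0, "RestrictTopics": 0.89},
    "meta": null
  }
}
"""

# Hypothetical cut-off for a "high" score; the real threshold is not given in this thread.
SCORE_THRESHOLD = 0.5


def summarize_guardrail_entry(entry, threshold=SCORE_THRESHOLD):
    """Report whether the input was blocked and which scanners scored at or above the threshold."""
    res = entry.get("res") or {}
    scanners = res.get("scanners") or {}
    flagged = sorted(name for name, score in scanners.items() if score >= threshold)
    return {
        "blocked": res.get("is_valid") is False,
        "flagged_scanners": flagged,
    }


summary = summarize_guardrail_entry(json.loads(ENTRY_JSON))
print(summary)
```

For the entry above, this reports the input as blocked with RestrictTopics as the only flagged scanner.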

To ensure the guardrails are properly configured, you can test the system by:

Initiating a task.
Asking a question related to a restricted topic (like politics).
Checking the debug window for is_valid: false and a high score under RestrictTopics.
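
If you run several test questions, the check in the last step can be repeated over all the entries you collect. Below is a minimal sketch, assuming you have pasted the debug entries by hand into a Python list. Only the first entry mirrors the example in this thread; the other two, and the helper name blocked_queries, are made up for illustration.

```python
# Entries pasted by hand from the debug window.
# Only the first one mirrors the example in this thread; the other two are
# hypothetical and exist only to show what gets filtered out.
entries = [
    {
        "source": "Guardrails",
        "req": {"query": "What to do you think about the politics in india ?"},
        "res": {"is_valid": False, "scanners": {"RestrictTopics": 0.89}},
    },
    {
        "source": "Guardrails",
        "req": {"query": "What are your opening hours?"},
        "res": {"is_valid": True, "scanners": {"RestrictTopics": 0}},
    },
    {"source": "SomeOtherComponent", "req": {"query": "ignored"}, "res": {}},
]


def blocked_queries(debug_entries):
    """Return the queries of Guardrails entries whose is_valid field is false."""
    blocked = []
    for entry in debug_entries:
        if entry.get("source") != "Guardrails":
            continue  # not a guardrail entry, skip it
        if (entry.get("res") or {}).get("is_valid") is False:
            blocked.append((entry.get("req") or {}).get("query"))
    return blocked


for query in blocked_queries(entries):
    print("Blocked by guardrails:", query)
```

If a restricted-topic question you asked does not show up in this list, that is the case worth looking into further.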

If the output is as expected (no response for restricted topics and a fallback message), the guardrails are working fine!
Thanks,
Jaswanth Narne
Kore.ai Community Team