flow jailbreak detection heuristics
  """
  Heuristic checks to assess whether the user's prompt is an attempted jailbreak.
  """
  $is_jailbreak = await JailbreakDetectionHeuristicsAction
  if $is_jailbreak
    if $system.config.enable_rails_exceptions
      send JailbreakDetectionRailException(message="Jailbreak attempt detected. The user's prompt was identified as an attempted jailbreak. Please ensure your prompt adheres to the guidelines.")
    else
      bot refuse to respond
    abort

flow jailbreak detection model
  """
  Assess whether the user's prompt is an attempted jailbreak using embedding-based jailbreak detection models.
  """
  $is_jailbreak = await JailbreakDetectionModelAction
  if $is_jailbreak
    if $system.config.enable_rails_exceptions
      send JailbreakDetectionRailException(message="Jailbreak attempt detected. The user's prompt was identified as an attempted jailbreak. Please ensure your prompt adheres to the guidelines.")
    else
      bot refuse to respond
    abort
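Both flows follow the same control path: run a detection action, and if it flags the prompt, either send a JailbreakDetectionRailException or have the bot refuse, then abort. The Python sketch below models only that branching and is not NeMo Guardrails code. The names `FlowAborted`, `RailOutcome`, and `run_jailbreak_rail` are hypothetical, the detector is reduced to a boolean argument, and `abort` is assumed to run for every flagged prompt regardless of which branch fired.

```python
from dataclasses import dataclass, field


class FlowAborted(Exception):
    """Hypothetical stand-in for Colang's `abort` statement."""


@dataclass
class RailOutcome:
    """Hypothetical container recording what the rail emitted."""
    events: list = field(default_factory=list)        # exception events sent
    bot_messages: list = field(default_factory=list)  # bot refusals issued


def run_jailbreak_rail(is_jailbreak: bool,
                       enable_rails_exceptions: bool,
                       outcome: RailOutcome) -> None:
    """Mirror the branching of the two flows for one user prompt.

    `is_jailbreak` plays the role of the value returned by the detection
    action; `enable_rails_exceptions` plays the role of
    `$system.config.enable_rails_exceptions`.
    """
    if not is_jailbreak:
        return  # prompt passes the rail; processing continues normally

    if enable_rails_exceptions:
        # Corresponds to `send JailbreakDetectionRailException(...)`.
        outcome.events.append("JailbreakDetectionRailException")
    else:
        # Corresponds to `bot refuse to respond`.
        outcome.bot_messages.append("refuse to respond")

    # Assumption: abort applies to both branches of a flagged prompt.
    raise FlowAborted()
```

In this model a caller would catch `FlowAborted` to stop handling the request, then inspect `outcome` to see whether an exception event or a refusal was produced.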
Jailbreak Detection (NeMo Guardrails)
Flagged for review by the automated scan
- AST01: Safety bypass / jailbreak (Malicious Skills · High)
This is a heuristic text match and may be a false positive (security and coding prompts often mention these terms). Review the content before use.
- Provider: NVIDIA
- Origin: Official
- Type: Guardrails
- License: Apache-2.0
- Language: English
- Added: 2025-08-25
#guardrail #nemo #rails #colang