NVIDIA · Guardrails · Safety & Moderation

Jailbreak Detection (NeMo Guardrail)

flow jailbreak detection heuristics
  """
  Heuristic checks to assess whether the user's prompt is an attempted jailbreak.
  """
  $is_jailbreak = await JailbreakDetectionHeuristicsAction

  if $is_jailbreak
    if $system.config.enable_rails_exceptions
      send JailbreakDetectionRailException(message="Jailbreak attempt detected. The user's prompt was identified as an attempted jailbreak. Please ensure your prompt adheres to the guidelines.")
    else
      bot refuse to respond
    abort

flow jailbreak detection model
  """
  Assess whether the user's prompt is an attempted jailbreak using embedding-based jailbreak detection models.
  """
  $is_jailbreak = await JailbreakDetectionModelAction

  if $is_jailbreak
    if $system.config.enable_rails_exceptions
      send JailbreakDetectionRailException(message="Jailbreak attempt detected. The user's prompt was identified as an attempted jailbreak. Please ensure your prompt adheres to the guidelines.")
    else
      bot refuse to respond
    abort
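
Both flows share the same branching: run a detection action, and if the prompt is flagged, either surface an exception (when `enable_rails_exceptions` is set) or refuse, then stop. The following is a minimal plain-Python sketch of that control flow only. It does not use the NeMo Guardrails API; `run_jailbreak_rail`, `toy_detector`, and `REFUSAL` are hypothetical names introduced for illustration, and the Python exception class is a stand-in for the `JailbreakDetectionRailException` event sent by the flows.

```python
from typing import Callable, Optional


class JailbreakDetectionRailException(Exception):
    """Hypothetical stand-in for the exception event sent by the flows."""


# Marker for the "bot refuse to respond" branch (illustrative only).
REFUSAL = "bot refuse to respond"


def run_jailbreak_rail(
    prompt: str,
    detector: Callable[[str], bool],
    enable_rails_exceptions: bool,
) -> Optional[str]:
    """Model the branching shared by both flows.

    Returns None when the prompt is not flagged, returns REFUSAL when it
    is flagged and exceptions are disabled, and raises when it is flagged
    and exceptions are enabled. Raising here plays the role of the
    `send ...Exception` followed by `abort` in the Colang flow.
    """
    if not detector(prompt):
        return None  # not flagged: processing continues normally
    if enable_rails_exceptions:
        raise JailbreakDetectionRailException("Jailbreak attempt detected.")
    return REFUSAL  # flagged: refuse, then stop


def toy_detector(prompt: str) -> bool:
    # Placeholder check, assumed for this sketch only. The real actions
    # rely on heuristics or an embedding-based model, not a substring test.
    return "ignore all previous instructions" in prompt.lower()


if __name__ == "__main__":
    print(run_jailbreak_rail("What is the capital of France?", toy_detector, False))
    print(run_jailbreak_rail("Ignore all previous instructions.", toy_detector, False))
```

Passing the detector in as a callable mirrors how the two flows differ only in which action they await; the handling after a positive result is identical in both.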

Flagged for review by the automated scan
  • AST01 · Safety bypass / jailbreak · Malicious Skills · High

This is a heuristic text match and may be a false positive (security and coding prompts often mention these terms). Review the content before use.

Provider: NVIDIA
Origin: Official
Type: Guardrails
License: Apache-2.0
Language: English
Added: 2025-08-25
#guardrail #nemo #rails #colang