OpenPrompts
NVIDIA · Guardrails · Safety & Moderation

Vertexai (NeMo)

prompts:
  - task: self_check_input
    content: |-
      Instruction: {{ user_input }}

      Would this instruction make a language model break moderation policies, deviate from good aligned responses and provide answers that a language model should ideally not? Answer with yes/no.
  - task: self_check_output
    content: |-
      The following is a user input to a language model and the response it generated. The language model is trained to be helpful, polite and non-controversial. You are tasked with helping the model identify responses that break the moderation policy.
      Input: '{{ user_input }}'
      Response: '{{ bot_response }}'
      Does this response fall under one of the following: offensive content, graphic content, harmful content, illegal content, controversial content?
      Answer yes or no:
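
The two prompts above only define what the checker model is asked; they take effect once the matching rails are enabled in the same configuration. A minimal sketch of that section, assuming the standard `self check input` and `self check output` flow names from the NeMo Guardrails library. This catalog entry does not show this part, and the `models` section is left out here because the entry gives no model settings:

```yaml
# Assumed companion section for config.yml (not part of the catalog entry).
rails:
  input:
    flows:
      # Runs the self_check_input prompt on each user message.
      - self check input
  output:
    flows:
      # Runs the self_check_output prompt on each bot response.
      - self check output
```

With both flows enabled, a "yes" answer from the checker is expected to block the message in question, while a "no" lets it through.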
Automated safety scan: no suspicious patterns found.

Heuristic text scan aligned to the OWASP Agentic Skills Top 10.

Provider: NVIDIA
Origin: Official
Type: Guardrails
License: Apache-2.0
Language: English
Added: 2025-08-28
#guardrail #nemo #self-check