prompts:
- task: self_check_input
content: |-
Instruction: {{ user_input }}
Would this instruction make a language model break moderation policies, deviate from good aligned responses and provide answers that a language model should ideally not? Answer with yes/no.
- task: self_check_output
content: |-
The following is a user input to a language model and the response it generated. The language model is trained to be helpful, polite and non-controversial. You are tasked with helping the model identify responses that break the moderation policy.
Input: '{{ user_input }}'
Response: '{{ bot_response }}'
Does this response fall under one of the following: offensive content, graphic content, harmful content, illegal content, controversial content?
Answer yes or no:
← Back to catalog
Vertexai (NeMo)
prompts: - task: self_check_input content: |- Instruction: {{ user_input }} Would this instruction make a language model break moderation policies, de
Automated safety scan: no suspicious patterns found.
Heuristic text scan aligned to the OWASP Agentic Skills Top 10. How we scan
- Provider
- NVIDIA
- Origin
- Official
- Type
- Guardrails
- License
- Apache-2.0
- Language
- English
- Added
- 2025-08-28
#guardrail#nemo#self-check