Basic Usage
To provide your Agent with the Prompt Injection Guardrail, you need to import it and pass it to the Agent using thepre_hooks
parameter:
Injection patterns
The Prompt Injection Guardrail works by detecting patterns in the input that are likely to be used to inject malicious instructions into your system. The default list of injection patterns handled by the guardrail are:- “ignore previous instructions”
- “ignore your instructions”
- “you are now a”
- “forget everything above”
- “developer mode”
- “override safety”
- “disregard guidelines”
- “system prompt”
- “jailbreak”
- “act as if”
- “pretend you are”
- “roleplay as”
- “simulate being”
- “bypass restrictions”
- “ignore safeguards”
- “admin override”
- “root access”