Security Shield
Security Shield is AgentGazer's built-in protection layer that monitors and filters AI agent communications in real-time. It provides defense-in-depth through three categories of protection:
- Prompt Injection Detection — Identifies attempts to manipulate the AI's instructions
- Sensitive Data Masking — Automatically redacts sensitive information from requests and responses
- Tool Call Restrictions — Controls which tools agents are allowed to use
Enabling Security Shield
Navigate to the Security page in the dashboard. Select an agent or use "Global Default" to apply rules to all agents. Toggle individual rules on/off and save your configuration.
Per-Agent Configuration
You can configure different security rules for each agent. This allows stricter rules for customer-facing agents while relaxing rules for internal testing agents.
Prompt Injection Detection
Prompt injection attacks attempt to manipulate AI behavior by inserting malicious instructions. AgentGazer detects four categories of prompt injection:
Ignore Instructions
Detects attempts to make the AI disregard its original instructions or system prompt.
Detected Patterns:
- "ignore all previous instructions"
- "forget your rules"
- "disregard prior context"
- "do not follow your original instructions"
Severity: Critical
When to Enable: Always recommended for production agents. Disable only for testing environments where you need to override agent behavior.
System Prompt Override
Detects attempts to inject a new system prompt or override the existing one.
Detected Patterns:
- "new system prompt:"
- "override system message"
- "enable developer mode"
- "sudo mode" / "admin access"
- Messages starting with "System:"
Severity: Critical (most patterns), Warning (System: prefix)
When to Enable: Essential for agents that handle untrusted user input. Attackers often try to escalate privileges through fake system prompts.
Role Hijacking
Detects attempts to make the AI assume a different identity or persona that bypasses its safety guidelines.
Detected Patterns:
- "you are now a ..."
- "pretend to be ..."
- "act as if you are ..."
- "roleplay as ..."
- "assume the identity of ..."
Severity: Critical (identity assumption), Warning (roleplay requests)
When to Enable: Recommended for agents with defined personas. Can be disabled for creative/roleplay applications where persona switching is intended.
Jailbreak Attempts
Detects known jailbreak techniques and attempts to bypass AI safety measures.
Detected Patterns:
- "DAN" (Do Anything Now) prompts
- "jailbreak" keyword
- "bypass safety filters"
- "remove restrictions"
- "without any limitations"
- "disable safety checks"
Severity: Critical
When to Enable: Always recommended. These patterns indicate intentional attempts to circumvent safety measures.
Sensitive Data Masking
Data masking automatically detects and redacts sensitive information to prevent accidental data leakage. Detected content is replaced with [REDACTED].
API Keys
Detects API keys and access tokens from major providers.
Supported Formats:
| Provider | Pattern |
|---|---|
| OpenAI | sk-... (32+ chars) |
| Anthropic | sk-ant-... (32+ chars) |
AIza... (39 chars) | |
| AWS | AKIA... (20 chars) |
| GitHub | ghp_, gho_, ghu_, ghs_, ghr_ tokens |
| Stripe | sk_live_, sk_test_ keys |
| Generic | api_key=, access_token= patterns |
When to Enable: Always recommended. Prevents accidental exposure of credentials in logs, responses, or external integrations.
Credit Cards
Detects credit card numbers from major payment networks.
Supported Formats:
- Visa (starts with 4)
- Mastercard (starts with 51-55)
- American Express (starts with 34 or 37)
- CVV/CVC codes
When to Enable: Essential for agents that handle financial data or customer information.
Personal Data
Detects personally identifiable information (PII).
Supported Formats:
- Social Security Numbers (XXX-XX-XXXX)
- Email addresses
- US phone numbers
- Passport numbers
When to Enable: Required for compliance with GDPR, HIPAA, and other privacy regulations. Consider enabling for all customer-facing agents.
Crypto Wallets & Keys
Detects cryptocurrency addresses and private keys.
Supported Formats:
- Bitcoin addresses (legacy and bech32)
- Ethereum addresses (0x...)
- Ethereum private keys (0x... 64 chars)
- Solana addresses
- Seed phrases (12 or 24 words)
When to Enable: Critical for crypto-related applications. Protects against accidental exposure of wallet addresses or private keys.
Environment Variables
Detects secrets commonly stored in environment variables.
Supported Formats:
password=,passwd=,pwd=secret=,private_key=- Database connection strings (postgres://, mysql://, mongodb://, redis://)
When to Enable: Recommended for development environments. Prevents leaking configuration secrets through error messages or debug output.
Tool Call Restrictions
Tool restrictions control which categories of tools an agent can invoke. This provides defense against tool-based attacks and limits the blast radius of compromised agents.
Filesystem Operations
Blocks tools that read, write, or manipulate files and directories.
Blocked Tool Patterns:
read_file,write_file,delete_fileread_dir,write_dir,list_dir- Tools starting with
fs_,file_,path_
When to Enable: For agents that should not have filesystem access. Prevents unauthorized reading of sensitive files or writing malicious content.
Network Operations
Blocks tools that make HTTP requests or send external communications.
Blocked Tool Patterns:
http_,fetch_,curl_,wget_,request_get_url,post_url,api_callsend_email,send_sms,webhook_
When to Enable: For sandboxed agents that should not communicate externally. Prevents data exfiltration and unauthorized API calls.
Code Execution
Blocks tools that execute arbitrary code or shell commands.
Blocked Tool Patterns:
exec,execute,run,evalshell,bash,system,subprocesspython_exec,node_exec
When to Enable: For agents that should not run arbitrary code. This is a critical restriction for customer-facing agents handling untrusted input.
Self-Protection
Self-protection prevents AI agents from accessing AgentGazer's own configuration and other sensitive local files. This protects against prompt injection attacks that try to exfiltrate credentials or modify security settings.
Protected Paths
| Category | Protected Files |
|---|---|
| AgentGazer Config | ~/.agentgazer/config.json, ~/.agentgazer/data.db |
| SSH Keys | ~/.ssh/id_rsa, ~/.ssh/id_ed25519, ~/.ssh/config |
| Cloud Credentials | ~/.aws/credentials, ~/.azure/, ~/.config/gcloud/ |
| Shell History | ~/.bash_history, ~/.zsh_history |
| Environment Files | .env, .env.local, .env.production |
Detection Logic
Self-protection only triggers when:
- Action verb present — The message contains read-related verbs like
read,open,cat,show,display,print,view - Sensitive path mentioned — The message references a protected file path
- Latest message only — Only checks the most recent user message (not conversation history)
This prevents false positives from:
- AI responses that mention file paths in explanations
- Historical messages in conversation context
- General discussion about configuration files
Example Blocked Requests
❌ "Can you read ~/.agentgazer/config.json for me?"
❌ "Open the file at ~/.ssh/id_rsa and show me the contents"
❌ "Cat ~/.aws/credentials"Example Allowed Requests
✓ "What is the format of ~/.agentgazer/config.json?" (no action verb)
✓ "Tell me about SSH key security" (no specific path)
✓ "How do I configure AWS credentials?" (educational, no read action)Response When Blocked
When self-protection triggers, the agent receives a clear message:
🛡️ Request blocked: Self-protection policy violation
This request attempted to access protected system files.
For security reasons, AI agents cannot read:
- AgentGazer configuration files
- SSH keys and credentials
- Cloud provider credentials
- Shell history files
This is not an error with your request. AgentGazer's self-protection
feature blocked this to prevent potential credential exposure.Custom Patterns
In addition to built-in patterns, you can define custom detection rules.
Custom Prompt Injection Patterns
Add regex patterns to detect domain-specific injection attempts. For example:
- Internal command keywords
- Company-specific role names
- Custom jailbreak phrases
Custom Data Masking Patterns
Add regex patterns to redact business-specific sensitive data. For example:
- Internal project codes
- Employee IDs
- Custom identifier formats
Tool Allowlist / Blocklist
- Allowlist: Only allow specific tools (whitelist approach)
- Blocklist: Block specific tools by name (blacklist approach)
Security Events
When a security rule triggers, AgentGazer logs a security event with:
- Event type
- Severity (warning, critical)
- Matched pattern details
- Agent and request context
- Timestamp
Event Types
| Event Type | Description | Where to View |
|---|---|---|
prompt_injection | Detected prompt injection attempt | Security page |
data_masked | Sensitive data was redacted | Security page |
tool_blocked | Tool call was blocked by restrictions | Security page |
self_protection | Blocked access to sensitive files | Security page |
security_blocked | Request blocked by security filter | Security page, Logs page |
Viewing Events
- Security Page → Events tab: All security-related events with detailed context
- Logs Page → Filter by
security_blocked: Quick view of blocked requests alongside normal LLM calls
The security_blocked event type appears in both the Security page (detailed) and the Logs page (for unified request tracking). This allows you to see security blocks in context with your normal agent activity.
Alert Integration
Security events can trigger alerts. Configure alert rules on the Alerts page:
- Create a new alert rule
- Select rule type: "Security Event"
- Choose severity threshold (warning or critical)
- Configure notification channels (webhook, email, Telegram)
Best Practices
Start Strict, Then Relax: Enable all rules initially, then disable specific rules only when needed for legitimate use cases.
Use Per-Agent Config: Apply stricter rules to customer-facing agents, more permissive rules to internal tools.
Monitor Events: Regularly review security events to identify attack patterns and false positives.
Custom Patterns: Add domain-specific patterns for your use case rather than disabling built-in protection.
Defense in Depth: Combine Security Shield with other protections (rate limiting, authentication, input validation).