Security Shield

Security Shield is AgentGazer's built-in protection layer that monitors and filters AI agent communications in real time. It provides defense in depth through three categories of protection:

Prompt Injection Detection — Identifies attempts to manipulate the AI's instructions
Sensitive Data Masking — Automatically redacts sensitive information from requests and responses
Tool Call Restrictions — Controls which tools agents are allowed to use

Enabling Security Shield

Navigate to the Security page in the dashboard. Select an agent or use "Global Default" to apply rules to all agents. Toggle individual rules on/off and save your configuration.

Per-Agent Configuration

You can configure different security rules for each agent. This lets you apply stricter rules to customer-facing agents and more relaxed rules to internal testing agents.
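As a rough illustration of how per-agent rules might layer over the Global Default, here is a minimal sketch. The rule names, the agent ID, and the `effective_rules` helper are hypothetical, and the merge behavior (per-agent values override the global ones) is an assumption; the actual configuration is managed through the dashboard.

```python
# Hypothetical rule names; AgentGazer's real configuration schema may differ.
GLOBAL_DEFAULT = {
    "ignore_instructions": True,
    "role_hijacking": True,
    "block_code_execution": True,
}

# Per-agent overrides list only the rules that differ from the global default.
AGENT_OVERRIDES = {
    "internal-test-bot": {"role_hijacking": False},
}


def effective_rules(agent_id: str) -> dict:
    """Return the global default with any per-agent overrides applied on top."""
    rules = dict(GLOBAL_DEFAULT)
    rules.update(AGENT_OVERRIDES.get(agent_id, {}))
    return rules
```

An agent without an entry in `AGENT_OVERRIDES` simply inherits every global rule.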

Prompt Injection Detection

Prompt injection attacks attempt to manipulate AI behavior by inserting malicious instructions. AgentGazer detects four categories of prompt injection:

Ignore Instructions

Detects attempts to make the AI disregard its original instructions or system prompt.

Detected Patterns:

"ignore all previous instructions"
"forget your rules"
"disregard prior context"
"do not follow your original instructions"

Severity: Critical

When to Enable: Always recommended for production agents. Disable only for testing environments where you need to override agent behavior.
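To make the idea of pattern-based detection concrete, the phrases listed above could be approximated with case-insensitive regular expressions. This is only a sketch: the regexes and the `matches_ignore_instructions` helper are assumptions for illustration, not the rules AgentGazer actually ships.

```python
import re

# Illustrative regexes only; the built-in rules are not published in this form.
IGNORE_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"forget\s+your\s+rules", re.IGNORECASE),
    re.compile(r"disregard\s+prior\s+context", re.IGNORECASE),
    re.compile(r"do\s+not\s+follow\s+your\s+original\s+instructions", re.IGNORECASE),
]


def matches_ignore_instructions(message: str) -> bool:
    """Return True if any illustrative pattern occurs anywhere in the message."""
    return any(pattern.search(message) for pattern in IGNORE_PATTERNS)
```

Using `\s+` between words tolerates extra spaces and line breaks, which are a common way to slip past exact string matching.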

System Prompt Override

Detects attempts to inject a new system prompt or override the existing one.

Detected Patterns:

"new system prompt:"
"override system message"
"enable developer mode"
"sudo mode" / "admin access"
Messages starting with "System:"

Severity: Critical (most patterns), Warning (System: prefix)

When to Enable: Essential for agents that handle untrusted user input. Attackers often try to escalate privileges through fake system prompts.
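The split between Critical and Warning severities can be pictured as a list of (pattern, severity) pairs where the strongest match wins. The sketch below assumes that behavior; the regexes and the `override_severity` function are hypothetical.

```python
import re
from typing import Optional

# (pattern, severity) pairs; the regexes are assumptions based on the list above.
OVERRIDE_RULES = [
    (re.compile(r"new\s+system\s+prompt\s*:", re.IGNORECASE), "critical"),
    (re.compile(r"override\s+system\s+message", re.IGNORECASE), "critical"),
    (re.compile(r"enable\s+developer\s+mode", re.IGNORECASE), "critical"),
    (re.compile(r"sudo\s+mode|admin\s+access", re.IGNORECASE), "critical"),
    # A bare "System:" prefix is weaker evidence, so it maps to a warning.
    (re.compile(r"^\s*system\s*:", re.IGNORECASE), "warning"),
]


def override_severity(message: str) -> Optional[str]:
    """Return the highest severity among matching rules, or None if none match."""
    severities = {sev for pattern, sev in OVERRIDE_RULES if pattern.search(message)}
    if "critical" in severities:
        return "critical"
    if "warning" in severities:
        return "warning"
    return None
```

Anchoring the `System:` rule to the start of the message keeps ordinary sentences that merely contain the word "system" from raising a warning.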

Role Hijacking

Detects attempts to make the AI assume a different identity or persona that bypasses its safety guidelines.

Detected Patterns:

"you are now a ..."
"pretend to be ..."
"act as if you are ..."
"roleplay as ..."
"assume the identity of ..."

Severity: Critical (identity assumption), Warning (roleplay requests)

When to Enable: Recommended for agents with defined personas. Can be disabled for creative/roleplay applications where persona switching is intended.

Jailbreak Attempts

Detects known jailbreak techniques and attempts to bypass AI safety measures.

Detected Patterns:

"DAN" (Do Anything Now) prompts
"jailbreak" keyword
"bypass safety filters"
"remove restrictions"
"without any limitations"
"disable safety checks"

Severity: Critical

When to Enable: Always recommended. These patterns indicate intentional attempts to circumvent safety measures.

Sensitive Data Masking

Data masking automatically detects and redacts sensitive information to prevent accidental data leakage. Detected content is replaced with [REDACTED].

API Keys

Detects API keys and access tokens from major providers.

Supported Formats:

Provider	Pattern
OpenAI	`sk-...` (32+ chars)
Anthropic	`sk-ant-...` (32+ chars)
Google	`AIza...` (39 chars)
AWS	`AKIA...` (20 chars)
GitHub	`ghp_`, `gho_`, `ghu_`, `ghs_`, `ghr_` tokens
Stripe	`sk_live_`, `sk_test_` keys
Generic	`api_key=`, `access_token=` patterns

When to Enable: Always recommended. Prevents accidental exposure of credentials in logs, responses, or external integrations.
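A simplified sketch of prefix-based key masking follows. The regexes approximate a few rows of the table above; the character classes, the reading of "32+ chars" as the length after the prefix, and the loose GitHub token length are all assumptions, and `mask_api_keys` is a hypothetical helper.

```python
import re

# Approximate patterns for some providers in the table; real key formats vary.
API_KEY_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{32,}"),        # OpenAI and Anthropic (sk-ant-...) style keys
    re.compile(r"AIza[A-Za-z0-9_-]{35}"),        # Google: 4-char prefix + 35 = 39 chars
    re.compile(r"AKIA[A-Z0-9]{16}"),             # AWS: 4-char prefix + 16 = 20 chars
    re.compile(r"gh[pousr]_[A-Za-z0-9]{20,}"),   # GitHub tokens (length is a loose guess)
]


def mask_api_keys(text: str) -> str:
    """Replace anything that looks like a provider key with [REDACTED]."""
    for pattern in API_KEY_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Because the `sk-` character class includes hyphens, the same pattern also covers the `sk-ant-` prefix in this sketch.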

Credit Cards

Detects credit card numbers from major payment networks.

Supported Formats:

Visa (starts with 4)
Mastercard (starts with 51-55)
American Express (starts with 34 or 37)
CVV/CVC codes

When to Enable: Essential for agents that handle financial data or customer information.
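The network prefixes above translate naturally into a single regular expression. The sketch below handles only unseparated digit runs and ignores CVV/CVC codes; the standard card lengths (13 or 16 digits for Visa, 16 for Mastercard, 15 for American Express) are filled in here and are not stated by the source, and `mask_cards` is a hypothetical name.

```python
import re

# Prefix and length rules for the networks listed above.
CARD_PATTERN = re.compile(
    r"\b(?:"
    r"4\d{12}(?:\d{3})?"   # Visa: starts with 4, 13 or 16 digits
    r"|5[1-5]\d{14}"       # Mastercard: starts with 51-55, 16 digits
    r"|3[47]\d{13}"        # American Express: starts with 34 or 37, 15 digits
    r")\b"
)


def mask_cards(text: str) -> str:
    """Replace unseparated card numbers with the [REDACTED] placeholder."""
    return CARD_PATTERN.sub("[REDACTED]", text)
```

A production detector would typically also accept spaces or dashes between digit groups, and might add a checksum test to reduce false positives on other long numbers.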

Personal Data

Detects personally identifiable information (PII).

Supported Formats:

Social Security Numbers (XXX-XX-XXXX)
Email addresses
US phone numbers
Passport numbers

When to Enable: Recommended wherever privacy regulations such as GDPR or HIPAA apply; masking supports compliance but does not guarantee it on its own. Consider enabling for all customer-facing agents.
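For a sense of what PII detection involves, the sketch below finds three of the listed formats and reports what matched. The regexes are deliberately simple assumptions (passport numbers are omitted because their formats differ by country), and `find_pii` is a hypothetical helper.

```python
import re

# Simplified patterns; real-world PII detection usually needs broader coverage.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_phone": re.compile(
        r"(?<!\d)(?:\+1[ .-]?)?\(?\d{3}\)?[ .-]\d{3}[ .-]\d{4}(?!\d)"
    ),
}


def find_pii(text: str) -> dict:
    """Map each PII kind found in the text to the list of matching substrings."""
    return {
        kind: pattern.findall(text)
        for kind, pattern in PII_PATTERNS.items()
        if pattern.search(text)
    }
```

Reporting the kind alongside the match is useful when reviewing false positives, even if the redacted output only ever shows [REDACTED].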

Crypto Wallets & Keys

Detects cryptocurrency addresses and private keys.

Supported Formats:

Bitcoin addresses (legacy and bech32)
Ethereum addresses (0x...)
Ethereum private keys (0x... 64 chars)
Solana addresses
Seed phrases (12 or 24 words)

When to Enable: Critical for crypto-related applications. Protects against accidental exposure of wallet addresses or private keys.
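The two Ethereum formats differ only in length, which a short sketch can make explicit. An address is `0x` plus 40 hexadecimal characters and a private key is `0x` plus 64; the `classify_eth` function is hypothetical, and whether AgentGazer distinguishes the two this way is an assumption.

```python
import re
from typing import Optional

ETH_ADDRESS = re.compile(r"0x[0-9a-fA-F]{40}")      # 20 bytes, hex-encoded
ETH_PRIVATE_KEY = re.compile(r"0x[0-9a-fA-F]{64}")  # 32 bytes, hex-encoded


def classify_eth(value: str) -> Optional[str]:
    """Classify a whole token as an Ethereum address, a private key, or neither."""
    if ETH_PRIVATE_KEY.fullmatch(value):
        return "private_key"
    if ETH_ADDRESS.fullmatch(value):
        return "address"
    return None
```

Note that a transaction hash is also `0x` plus 64 hexadecimal characters, so a length-only rule like this one may redact hashes as if they were private keys.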

Environment Variables

Detects secrets commonly stored in environment variables.

Supported Formats:

password=, passwd=, pwd=
secret=, private_key=
Database connection strings (postgres://, mysql://, mongodb://, redis://)

When to Enable: Recommended for development environments. Prevents leaking configuration secrets through error messages or debug output.
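One plausible way to handle these formats is to keep the variable name and redact only the value, while redacting connection strings entirely because credentials can be embedded in the URI. The sketch below assumes that behavior; the regexes and `mask_env_secrets` are hypothetical.

```python
import re

# key=value secrets: keep the key so logs stay readable, redact the value.
ASSIGNMENT = re.compile(
    r"(password|passwd|pwd|secret|private_key)=(\S+)", re.IGNORECASE
)
# Connection strings: redact the whole URI, since it may embed user:password.
CONNECTION = re.compile(r"\b(?:postgres|mysql|mongodb|redis)://\S+", re.IGNORECASE)


def mask_env_secrets(text: str) -> str:
    """Redact connection strings first, then the values of secret assignments."""
    text = CONNECTION.sub("[REDACTED]", text)
    return ASSIGNMENT.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)
```

The assignment pattern is not anchored to a word boundary, so prefixed names such as `DB_PASSWORD=` are covered as well.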

Tool Call Restrictions

Tool restrictions control which categories of tools an agent can invoke. This provides defense against tool-based attacks and limits the blast radius of compromised agents.

Filesystem Operations

Blocks tools that read, write, or manipulate files and directories.

Blocked Tool Patterns:

read_file, write_file, delete_file
read_dir, write_dir, list_dir
Tools starting with fs_, file_, path_

When to Enable: For agents that should not have filesystem access. Prevents unauthorized reading of sensitive files or writing malicious content.

Network Operations

Blocks tools that make HTTP requests or send external communications.

Blocked Tool Patterns:

http_, fetch_, curl_, wget_, request_
get_url, post_url, api_call
send_email, send_sms, webhook_

When to Enable: For sandboxed agents that should not communicate externally. Prevents data exfiltration and unauthorized API calls.

Code Execution

Blocks tools that execute arbitrary code or shell commands.

Blocked Tool Patterns:

exec, execute, run, eval
shell, bash, system, subprocess
python_exec, node_exec

When to Enable: For agents that should not run arbitrary code. This is a critical restriction for customer-facing agents handling untrusted input.
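The three tool categories above can be sketched as a lookup that checks exact names and name prefixes. The split between exact names and prefixes follows the trailing underscores in the lists, which is an interpretation rather than documented behavior; `BLOCKED` and `blocked_category` are hypothetical names.

```python
from typing import Iterable, Optional

# Exact names and prefixes taken from the lists above; matching rules are assumed.
BLOCKED = {
    "filesystem": {
        "exact": {"read_file", "write_file", "delete_file",
                  "read_dir", "write_dir", "list_dir"},
        "prefixes": ("fs_", "file_", "path_"),
    },
    "network": {
        "exact": {"get_url", "post_url", "api_call", "send_email", "send_sms"},
        "prefixes": ("http_", "fetch_", "curl_", "wget_", "request_", "webhook_"),
    },
    "code_execution": {
        "exact": {"exec", "execute", "run", "eval", "shell", "bash",
                  "system", "subprocess", "python_exec", "node_exec"},
        "prefixes": (),
    },
}


def blocked_category(tool_name: str, enabled: Iterable[str]) -> Optional[str]:
    """Return the first enabled category that blocks this tool, or None."""
    name = tool_name.lower()
    for category in enabled:
        rule = BLOCKED[category]
        if name in rule["exact"] or name.startswith(rule["prefixes"]):
            return category
    return None
```

Only the categories passed in `enabled` are consulted, which mirrors toggling each restriction on or off per agent.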

Self-Protection

Self-protection prevents AI agents from accessing AgentGazer's own configuration and other sensitive local files. This protects against prompt injection attacks that try to exfiltrate credentials or modify security settings.

Protected Paths

Category	Protected Files
AgentGazer Config	`~/.agentgazer/config.json`, `~/.agentgazer/data.db`
SSH Keys	`~/.ssh/id_rsa`, `~/.ssh/id_ed25519`, `~/.ssh/config`
Cloud Credentials	`~/.aws/credentials`, `~/.azure/`, `~/.config/gcloud/`
Shell History	`~/.bash_history`, `~/.zsh_history`
Environment Files	`.env`, `.env.local`, `.env.production`

Detection Logic

Self-protection triggers only when all of the following conditions are met:

Action verb present — The message contains read-related verbs like read, open, cat, show, display, print, view
Sensitive path mentioned — The message references a protected file path
Latest message only — Only checks the most recent user message (not conversation history)

This prevents false positives from:

AI responses that mention file paths in explanations
Historical messages in conversation context
General discussion about configuration files
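The detection logic above can be sketched in a few lines. The message format (a list of dictionaries with `role` and `content` keys) is an assumption borrowed from common chat APIs, only a subset of the protected paths is included, and `should_block` is a hypothetical name.

```python
import re
from typing import Dict, List

# Read-related verbs from the list above, matched as whole words.
ACTION_VERBS = re.compile(r"\b(read|open|cat|show|display|print|view)\b", re.IGNORECASE)

# A subset of the protected paths, matched as plain substrings.
PROTECTED_PATHS = (
    "~/.agentgazer/config.json",
    "~/.agentgazer/data.db",
    "~/.ssh/id_rsa",
    "~/.ssh/id_ed25519",
    "~/.aws/credentials",
    "~/.bash_history",
)


def should_block(messages: List[Dict[str, str]]) -> bool:
    """Block only if the latest user message has both a verb and a protected path."""
    user_messages = [m for m in messages if m.get("role") == "user"]
    if not user_messages:
        return False
    latest = user_messages[-1]["content"]
    has_verb = ACTION_VERBS.search(latest) is not None
    has_path = any(path in latest for path in PROTECTED_PATHS)
    return has_verb and has_path
```

Because earlier turns and assistant replies are never inspected, a path mentioned in an explanation several messages ago does not trigger a block later in the conversation.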

Example Blocked Requests

❌ "Can you read ~/.agentgazer/config.json for me?"
❌ "Open the file at ~/.ssh/id_rsa and show me the contents"
❌ "Cat ~/.aws/credentials"

Example Allowed Requests

✓ "What is the format of ~/.agentgazer/config.json?" (no action verb)
✓ "Tell me about SSH key security" (no specific path)
✓ "How do I configure AWS credentials?" (educational, no read action)

Response When Blocked

When self-protection triggers, the agent receives a clear message:

🛡️ Request blocked: Self-protection policy violation

This request attempted to access protected system files.
For security reasons, AI agents cannot read:
- AgentGazer configuration files
- SSH keys and credentials
- Cloud provider credentials
- Shell history files

This is not an error with your request. AgentGazer's self-protection
feature blocked this to prevent potential credential exposure.

Custom Patterns

In addition to built-in patterns, you can define custom detection rules.

Custom Prompt Injection Patterns

Add regex patterns to detect domain-specific injection attempts. For example:

Internal command keywords
Company-specific role names
Custom jailbreak phrases

Custom Data Masking Patterns

Add regex patterns to redact business-specific sensitive data. For example:

Internal project codes
Employee IDs
Custom identifier formats
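Since custom rules are regular expressions, it can help to check that each one compiles before relying on it. The sketch below shows one way to do that and to apply the result; the employee ID and project code formats are invented examples, and both helper functions are hypothetical.

```python
import re
from typing import List, Pattern, Tuple


def compile_custom_patterns(sources: List[str]) -> Tuple[List[Pattern[str]], List[str]]:
    """Compile user-supplied regex strings, collecting any that fail to parse."""
    compiled, rejected = [], []
    for source in sources:
        try:
            compiled.append(re.compile(source))
        except re.error:
            rejected.append(source)
    return compiled, rejected


def mask_custom(text: str, patterns: List[Pattern[str]]) -> str:
    """Apply each custom pattern in turn, replacing matches with [REDACTED]."""
    for pattern in patterns:
        text = pattern.sub("[REDACTED]", text)
    return text
```

For example, an invented employee ID rule such as `\bEMP-\d{6}\b` compiles and redacts `EMP-004217`, while a malformed rule such as `PRJ-(` is rejected instead of failing at request time.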

Tool Allowlist / Blocklist

Allowlist: Permit only the tools you name; every other tool is blocked.
Blocklist: Block the tools you name; every other tool remains available.
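The interaction between the two lists can be sketched as follows. The precedence shown here (a blocklist entry always wins, and an allowlist, when present, excludes everything not on it) is an assumption, as is the `is_tool_allowed` helper.

```python
from typing import Optional, Set


def is_tool_allowed(
    tool_name: str,
    allowlist: Optional[Set[str]] = None,
    blocklist: Optional[Set[str]] = None,
) -> bool:
    """Decide whether a tool may run under optional allowlist and blocklist."""
    # Assumed precedence: an explicit block always wins.
    if blocklist is not None and tool_name in blocklist:
        return False
    # With an allowlist configured, anything not listed is denied.
    if allowlist is not None:
        return tool_name in allowlist
    # With neither list configured, the tool is allowed.
    return True
```

An allowlist is the stricter choice because newly added tools stay blocked until someone explicitly permits them.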

Security Events

When a security rule triggers, AgentGazer logs a security event with:

Event type
Severity (warning, critical)
Matched pattern details
Agent and request context
Timestamp

Event Types

Event Type	Description	Where to View
`prompt_injection`	Detected prompt injection attempt	Security page
`data_masked`	Sensitive data was redacted	Security page
`tool_blocked`	Tool call was blocked by restrictions	Security page
`self_protection`	Blocked access to sensitive files	Security page
`security_blocked`	Request blocked by security filter	Security page, Logs page
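If you consume these events in your own tooling, a small record type can capture the fields listed above. The field names, the validation, and the `SecurityEvent` class are hypothetical; only the event types and severities come from this page.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

EVENT_TYPES = {
    "prompt_injection", "data_masked", "tool_blocked",
    "self_protection", "security_blocked",
}
SEVERITIES = {"warning", "critical"}


@dataclass
class SecurityEvent:
    """Hypothetical in-memory shape for one logged security event."""
    event_type: str
    severity: str
    matched_pattern: str
    agent_id: str
    request_id: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        # Reject values that are not documented on this page.
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event_type}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity}")
```

Storing the timestamp in UTC avoids ambiguity when events from several agents are compared side by side.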

Viewing Events

Security Page → Events tab: All security-related events with detailed context
Logs Page → Filter by security_blocked: Quick view of blocked requests alongside normal LLM calls

The security_blocked event type appears in both the Security page (detailed) and the Logs page (for unified request tracking). This allows you to see security blocks in context with your normal agent activity.

Alert Integration

Security events can trigger alerts. Configure alert rules on the Alerts page:

Create a new alert rule
Select rule type: "Security Event"
Choose severity threshold (warning or critical)
Configure notification channels (webhook, email, Telegram)
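A severity threshold is usually read as "this level or higher". The sketch below assumes that interpretation, so a warning threshold also fires on critical events; `SEVERITY_ORDER` and `should_alert` are hypothetical names.

```python
# Higher numbers mean more severe; only the two documented levels are included.
SEVERITY_ORDER = {"warning": 1, "critical": 2}


def should_alert(event_severity: str, threshold: str) -> bool:
    """Return True when the event is at or above the configured threshold."""
    return SEVERITY_ORDER[event_severity] >= SEVERITY_ORDER[threshold]
```

Under this reading, choosing "critical" as the threshold suppresses notifications for warning-level events such as a bare `System:` prefix.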

Best Practices

Start Strict, Then Relax: Enable all rules initially, then disable specific rules only when needed for legitimate use cases.
Use Per-Agent Config: Apply stricter rules to customer-facing agents, more permissive rules to internal tools.
Monitor Events: Regularly review security events to identify attack patterns and false positives.
Custom Patterns: Add domain-specific patterns for your use case rather than disabling built-in protection.
Defense in Depth: Combine Security Shield with other protections (rate limiting, authentication, input validation).
