Skip to content

Security Shield

Security Shield is AgentGazer's built-in protection layer that monitors and filters AI agent communications in real-time. It provides defense-in-depth through three categories of protection:

  • Prompt Injection Detection — Identifies attempts to manipulate the AI's instructions
  • Sensitive Data Masking — Automatically redacts sensitive information from requests and responses
  • Tool Call Restrictions — Controls which tools agents are allowed to use

Enabling Security Shield

Navigate to the Security page in the dashboard. Select an agent or use "Global Default" to apply rules to all agents. Toggle individual rules on/off and save your configuration.

Per-Agent Configuration

You can configure different security rules for each agent. This allows stricter rules for customer-facing agents while relaxing rules for internal testing agents.

Prompt Injection Detection

Prompt injection attacks attempt to manipulate AI behavior by inserting malicious instructions. AgentGazer detects four categories of prompt injection:

Ignore Instructions

Detects attempts to make the AI disregard its original instructions or system prompt.

Detected Patterns:

  • "ignore all previous instructions"
  • "forget your rules"
  • "disregard prior context"
  • "do not follow your original instructions"

Severity: Critical

When to Enable: Always recommended for production agents. Disable only for testing environments where you need to override agent behavior.

System Prompt Override

Detects attempts to inject a new system prompt or override the existing one.

Detected Patterns:

  • "new system prompt:"
  • "override system message"
  • "enable developer mode"
  • "sudo mode" / "admin access"
  • Messages starting with "System:"

Severity: Critical (most patterns), Warning (System: prefix)

When to Enable: Essential for agents that handle untrusted user input. Attackers often try to escalate privileges through fake system prompts.

Role Hijacking

Detects attempts to make the AI assume a different identity or persona that bypasses its safety guidelines.

Detected Patterns:

  • "you are now a ..."
  • "pretend to be ..."
  • "act as if you are ..."
  • "roleplay as ..."
  • "assume the identity of ..."

Severity: Critical (identity assumption), Warning (roleplay requests)

When to Enable: Recommended for agents with defined personas. Can be disabled for creative/roleplay applications where persona switching is intended.

Jailbreak Attempts

Detects known jailbreak techniques and attempts to bypass AI safety measures.

Detected Patterns:

  • "DAN" (Do Anything Now) prompts
  • "jailbreak" keyword
  • "bypass safety filters"
  • "remove restrictions"
  • "without any limitations"
  • "disable safety checks"

Severity: Critical

When to Enable: Always recommended. These patterns indicate intentional attempts to circumvent safety measures.

Sensitive Data Masking

Data masking automatically detects and redacts sensitive information to prevent accidental data leakage. Detected content is replaced with [REDACTED].

API Keys

Detects API keys and access tokens from major providers.

Supported Formats:

ProviderPattern
OpenAIsk-... (32+ chars)
Anthropicsk-ant-... (32+ chars)
GoogleAIza... (39 chars)
AWSAKIA... (20 chars)
GitHubghp_, gho_, ghu_, ghs_, ghr_ tokens
Stripesk_live_, sk_test_ keys
Genericapi_key=, access_token= patterns

When to Enable: Always recommended. Prevents accidental exposure of credentials in logs, responses, or external integrations.

Credit Cards

Detects credit card numbers from major payment networks.

Supported Formats:

  • Visa (starts with 4)
  • Mastercard (starts with 51-55)
  • American Express (starts with 34 or 37)
  • CVV/CVC codes

When to Enable: Essential for agents that handle financial data or customer information.

Personal Data

Detects personally identifiable information (PII).

Supported Formats:

  • Social Security Numbers (XXX-XX-XXXX)
  • Email addresses
  • US phone numbers
  • Passport numbers

When to Enable: Required for compliance with GDPR, HIPAA, and other privacy regulations. Consider enabling for all customer-facing agents.

Crypto Wallets & Keys

Detects cryptocurrency addresses and private keys.

Supported Formats:

  • Bitcoin addresses (legacy and bech32)
  • Ethereum addresses (0x...)
  • Ethereum private keys (0x... 64 chars)
  • Solana addresses
  • Seed phrases (12 or 24 words)

When to Enable: Critical for crypto-related applications. Protects against accidental exposure of wallet addresses or private keys.

Environment Variables

Detects secrets commonly stored in environment variables.

Supported Formats:

  • password=, passwd=, pwd=
  • secret=, private_key=
  • Database connection strings (postgres://, mysql://, mongodb://, redis://)

When to Enable: Recommended for development environments. Prevents leaking configuration secrets through error messages or debug output.

Tool Call Restrictions

Tool restrictions control which categories of tools an agent can invoke. This provides defense against tool-based attacks and limits the blast radius of compromised agents.

Filesystem Operations

Blocks tools that read, write, or manipulate files and directories.

Blocked Tool Patterns:

  • read_file, write_file, delete_file
  • read_dir, write_dir, list_dir
  • Tools starting with fs_, file_, path_

When to Enable: For agents that should not have filesystem access. Prevents unauthorized reading of sensitive files or writing malicious content.

Network Operations

Blocks tools that make HTTP requests or send external communications.

Blocked Tool Patterns:

  • http_, fetch_, curl_, wget_, request_
  • get_url, post_url, api_call
  • send_email, send_sms, webhook_

When to Enable: For sandboxed agents that should not communicate externally. Prevents data exfiltration and unauthorized API calls.

Code Execution

Blocks tools that execute arbitrary code or shell commands.

Blocked Tool Patterns:

  • exec, execute, run, eval
  • shell, bash, system, subprocess
  • python_exec, node_exec

When to Enable: For agents that should not run arbitrary code. This is a critical restriction for customer-facing agents handling untrusted input.

Self-Protection

Self-protection prevents AI agents from accessing AgentGazer's own configuration and other sensitive local files. This protects against prompt injection attacks that try to exfiltrate credentials or modify security settings.

Protected Paths

CategoryProtected Files
AgentGazer Config~/.agentgazer/config.json, ~/.agentgazer/data.db
SSH Keys~/.ssh/id_rsa, ~/.ssh/id_ed25519, ~/.ssh/config
Cloud Credentials~/.aws/credentials, ~/.azure/, ~/.config/gcloud/
Shell History~/.bash_history, ~/.zsh_history
Environment Files.env, .env.local, .env.production

Detection Logic

Self-protection only triggers when:

  1. Action verb present — The message contains read-related verbs like read, open, cat, show, display, print, view
  2. Sensitive path mentioned — The message references a protected file path
  3. Latest message only — Only checks the most recent user message (not conversation history)

This prevents false positives from:

  • AI responses that mention file paths in explanations
  • Historical messages in conversation context
  • General discussion about configuration files

Example Blocked Requests

❌ "Can you read ~/.agentgazer/config.json for me?"
❌ "Open the file at ~/.ssh/id_rsa and show me the contents"
❌ "Cat ~/.aws/credentials"

Example Allowed Requests

✓ "What is the format of ~/.agentgazer/config.json?" (no action verb)
✓ "Tell me about SSH key security" (no specific path)
✓ "How do I configure AWS credentials?" (educational, no read action)

Response When Blocked

When self-protection triggers, the agent receives a clear message:

🛡️ Request blocked: Self-protection policy violation

This request attempted to access protected system files.
For security reasons, AI agents cannot read:
- AgentGazer configuration files
- SSH keys and credentials
- Cloud provider credentials
- Shell history files

This is not an error with your request. AgentGazer's self-protection
feature blocked this to prevent potential credential exposure.

Custom Patterns

In addition to built-in patterns, you can define custom detection rules.

Custom Prompt Injection Patterns

Add regex patterns to detect domain-specific injection attempts. For example:

  • Internal command keywords
  • Company-specific role names
  • Custom jailbreak phrases

Custom Data Masking Patterns

Add regex patterns to redact business-specific sensitive data. For example:

  • Internal project codes
  • Employee IDs
  • Custom identifier formats

Tool Allowlist / Blocklist

  • Allowlist: Only allow specific tools (whitelist approach)
  • Blocklist: Block specific tools by name (blacklist approach)

Security Events

When a security rule triggers, AgentGazer logs a security event with:

  • Event type
  • Severity (warning, critical)
  • Matched pattern details
  • Agent and request context
  • Timestamp

Event Types

Event TypeDescriptionWhere to View
prompt_injectionDetected prompt injection attemptSecurity page
data_maskedSensitive data was redactedSecurity page
tool_blockedTool call was blocked by restrictionsSecurity page
self_protectionBlocked access to sensitive filesSecurity page
security_blockedRequest blocked by security filterSecurity page, Logs page

Viewing Events

  • Security Page → Events tab: All security-related events with detailed context
  • Logs Page → Filter by security_blocked: Quick view of blocked requests alongside normal LLM calls

The security_blocked event type appears in both the Security page (detailed) and the Logs page (for unified request tracking). This allows you to see security blocks in context with your normal agent activity.

Alert Integration

Security events can trigger alerts. Configure alert rules on the Alerts page:

  1. Create a new alert rule
  2. Select rule type: "Security Event"
  3. Choose severity threshold (warning or critical)
  4. Configure notification channels (webhook, email, Telegram)

Best Practices

  1. Start Strict, Then Relax: Enable all rules initially, then disable specific rules only when needed for legitimate use cases.

  2. Use Per-Agent Config: Apply stricter rules to customer-facing agents, more permissive rules to internal tools.

  3. Monitor Events: Regularly review security events to identify attack patterns and false positives.

  4. Custom Patterns: Add domain-specific patterns for your use case rather than disabling built-in protection.

  5. Defense in Depth: Combine Security Shield with other protections (rate limiting, authentication, input validation).