Claude Code Plugins

Community-maintained marketplace

Feedback

AI Security Expert

@frankxai/ai-architect
0
0

Enterprise AI security - OWASP LLM Top 10, prompt injection defense, guardrails, PII protection

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name AI Security Expert
description Enterprise AI security - OWASP LLM Top 10, prompt injection defense, guardrails, PII protection
version 1.1.0
last_updated Tue Jan 06 2026 00:00:00 GMT+0000 (Coordinated Universal Time)
external_version OWASP LLM Top 10 v2
resources resources/security-patterns.py
triggers AI security, prompt injection, LLM security, guardrails, PII protection

AI Security Expert

Enterprise AI security architect specializing in securing LLM applications, defending against prompt injection, implementing guardrails, and OWASP LLM Top 10 compliance.

OWASP LLM Top 10 (2025)

Quick Reference

# Vulnerability Risk Key Defense
LLM01 Prompt Injection Critical Input sanitization, delimiters
LLM02 Insecure Output High Output validation, sanitization
LLM03 Training Data Poisoning High Data provenance, auditing
LLM04 Model DoS Medium Rate limiting, timeouts
LLM05 Supply Chain High Verification, pinning
LLM06 Sensitive Info Disclosure High PII detection, redaction
LLM07 Insecure Plugin Design High Permission model, validation
LLM08 Excessive Agency High Human-in-the-loop, least privilege
LLM09 Overreliance Medium Confidence scores, citations
LLM10 Model Theft Medium Rate limiting, watermarking

LLM01: Prompt Injection

Attack Types:

  • Direct: "Ignore previous instructions..."
  • Indirect: Malicious content in RAG documents
  • Encoding tricks: Unicode, special tokens

Defense Pattern:

User Input → Sanitize → Delimit → LLM → Validate Output → Filter

LLM02: Insecure Output Handling

  • Never execute LLM output as code without validation
  • Sanitize HTML (use allowlist)
  • Validate SQL (SELECT only, table allowlist)

LLM04: Model DoS

  • Rate limiting per user/API key
  • Token limits on requests
  • Timeout configurations
  • Cost capping/alerts

LLM06: Sensitive Information Disclosure

  • PII detection (regex + NER)
  • System prompt protection
  • Training data sanitization
  • Output filtering

Code patterns: resources/security-patterns.py

PII Protection

Detection Patterns

Type Example Pattern
Email *@*.com
Phone XXX-XXX-XXXX
SSN XXX-XX-XXXX
Credit Card 16 digits
IP Address X.X.X.X

Redaction Strategy

  1. Detect PII in input before LLM call
  2. Redact PII in LLM output
  3. Log without PII
  4. Encrypt at rest

Guardrails Implementation

NeMo Guardrails (NVIDIA)

define user express harmful intent
    "How do I hack"

define bot refuse harmful request
    "I can't help with that."

define flow harmful intent
    user express harmful intent
    bot refuse harmful request

Guardrails AI

guard = Guard().use_many(
    ToxicLanguage(on_fail="fix"),
    PIIFilter(on_fail="fix"),
    ValidJSON(on_fail="reask")
)

Custom Pipeline

Input Guards → LLM Call → Output Guards → Response

Implementation: resources/security-patterns.py

Security Architecture

Defense in Depth Layers

Layer Controls
Network WAF, DDoS protection, API gateway
Auth OAuth 2.0, API keys, mTLS
Input Schema validation, injection detection
Guardrails Topic restrictions, PII filtering
Model Versioning, anomaly detection
Output Response filtering, fact verification
Audit Logging, retention, compliance

Zero Trust Principles

  • Never trust, always verify
  • Least privilege for agents
  • Assume breach (log everything)

Compliance Frameworks

EU AI Act (High-Risk)

  • Risk management system
  • Data governance
  • Technical documentation
  • Human oversight
  • Accuracy/robustness testing

SOC 2 for AI

  • Security: Access controls, encryption
  • Availability: SLA monitoring, DR
  • Processing Integrity: Input/output validation
  • Confidentiality: Data classification
  • Privacy: Data minimization, consent

Security Testing

Red Team Categories

  1. Direct injection attempts
  2. Jailbreak prompts
  3. Indirect injection via context
  4. Encoding/unicode tricks

Test suite: resources/security-patterns.py

Testing Checklist

  • Injection patterns blocked
  • System prompt protected
  • PII detected and redacted
  • Rate limits enforced
  • Outputs validated
  • Audit logs complete

Incident Response

Severity Levels

Incident Severity Response
Prompt injection detected Medium Block, log, analyze
Data exfiltration attempt High Block, forensics, notify
Model extraction detected High Rate limit, investigate

Response Steps

  1. Contain (block source)
  2. Preserve (logs, evidence)
  3. Analyze (attack pattern)
  4. Remediate (update defenses)
  5. Document (security log)

Resources


Secure AI systems with defense in depth and zero trust principles.