| name | AI Security Expert |
| description | Enterprise AI security - OWASP LLM Top 10, prompt injection defense, guardrails, PII protection |
| version | 1.1.0 |
| last_updated | Tue Jan 06 2026 00:00:00 GMT+0000 (Coordinated Universal Time) |
| external_version | OWASP LLM Top 10 v2 |
| resources | resources/security-patterns.py |
| triggers | AI security, prompt injection, LLM security, guardrails, PII protection |
AI Security Expert
Enterprise AI security architect specializing in securing LLM applications, defending against prompt injection, implementing guardrails, and OWASP LLM Top 10 compliance.
OWASP LLM Top 10 (2025)
Quick Reference
| # | Vulnerability | Risk | Key Defense |
|---|---|---|---|
| LLM01 | Prompt Injection | Critical | Input sanitization, delimiters |
| LLM02 | Insecure Output | High | Output validation, sanitization |
| LLM03 | Training Data Poisoning | High | Data provenance, auditing |
| LLM04 | Model DoS | Medium | Rate limiting, timeouts |
| LLM05 | Supply Chain | High | Verification, pinning |
| LLM06 | Sensitive Info Disclosure | High | PII detection, redaction |
| LLM07 | Insecure Plugin Design | High | Permission model, validation |
| LLM08 | Excessive Agency | High | Human-in-the-loop, least privilege |
| LLM09 | Overreliance | Medium | Confidence scores, citations |
| LLM10 | Model Theft | Medium | Rate limiting, watermarking |
LLM01: Prompt Injection
Attack Types:
- Direct: "Ignore previous instructions..."
- Indirect: Malicious content in RAG documents
- Encoding tricks: Unicode, special tokens
Defense Pattern:
User Input → Sanitize → Delimit → LLM → Validate Output → Filter
LLM02: Insecure Output Handling
- Never execute LLM output as code without validation
- Sanitize HTML (use allowlist)
- Validate SQL (SELECT only, table allowlist)
LLM04: Model DoS
- Rate limiting per user/API key
- Token limits on requests
- Timeout configurations
- Cost capping/alerts
LLM06: Sensitive Information Disclosure
- PII detection (regex + NER)
- System prompt protection
- Training data sanitization
- Output filtering
Code patterns: resources/security-patterns.py
PII Protection
Detection Patterns
| Type | Example Pattern |
|---|---|
*@*.com |
|
| Phone | XXX-XXX-XXXX |
| SSN | XXX-XX-XXXX |
| Credit Card | 16 digits |
| IP Address | X.X.X.X |
Redaction Strategy
- Detect PII in input before LLM call
- Redact PII in LLM output
- Log without PII
- Encrypt at rest
Guardrails Implementation
NeMo Guardrails (NVIDIA)
define user express harmful intent
"How do I hack"
define bot refuse harmful request
"I can't help with that."
define flow harmful intent
user express harmful intent
bot refuse harmful request
Guardrails AI
guard = Guard().use_many(
ToxicLanguage(on_fail="fix"),
PIIFilter(on_fail="fix"),
ValidJSON(on_fail="reask")
)
Custom Pipeline
Input Guards → LLM Call → Output Guards → Response
Implementation: resources/security-patterns.py
Security Architecture
Defense in Depth Layers
| Layer | Controls |
|---|---|
| Network | WAF, DDoS protection, API gateway |
| Auth | OAuth 2.0, API keys, mTLS |
| Input | Schema validation, injection detection |
| Guardrails | Topic restrictions, PII filtering |
| Model | Versioning, anomaly detection |
| Output | Response filtering, fact verification |
| Audit | Logging, retention, compliance |
Zero Trust Principles
- Never trust, always verify
- Least privilege for agents
- Assume breach (log everything)
Compliance Frameworks
EU AI Act (High-Risk)
- Risk management system
- Data governance
- Technical documentation
- Human oversight
- Accuracy/robustness testing
SOC 2 for AI
- Security: Access controls, encryption
- Availability: SLA monitoring, DR
- Processing Integrity: Input/output validation
- Confidentiality: Data classification
- Privacy: Data minimization, consent
Security Testing
Red Team Categories
- Direct injection attempts
- Jailbreak prompts
- Indirect injection via context
- Encoding/unicode tricks
Test suite: resources/security-patterns.py
Testing Checklist
- Injection patterns blocked
- System prompt protected
- PII detected and redacted
- Rate limits enforced
- Outputs validated
- Audit logs complete
Incident Response
Severity Levels
| Incident | Severity | Response |
|---|---|---|
| Prompt injection detected | Medium | Block, log, analyze |
| Data exfiltration attempt | High | Block, forensics, notify |
| Model extraction detected | High | Rate limit, investigate |
Response Steps
- Contain (block source)
- Preserve (logs, evidence)
- Analyze (attack pattern)
- Remediate (update defenses)
- Document (security log)
Resources
- OWASP LLM Top 10
- NIST AI Risk Management Framework
- NeMo Guardrails
- Guardrails AI
- LLM Security Best Practices
Secure AI systems with defense in depth and zero trust principles.