---
name: llm-safety-patterns
description: Use when securing LLM integrations against prompt injection, hallucination, or data leakage. Implements context separation, input validation, and output filtering patterns.
version: 1.0.0
---

# LLM Safety Patterns
## The Core Principle
Identifiers flow AROUND the LLM, not THROUGH it. The LLM sees only content. Attribution happens deterministically.
## Why This Matters

When identifiers appear in prompts, several failure modes open up:

- Hallucination: the LLM invents IDs that don't exist
- Confusion: the LLM mixes up which ID belongs where
- Injection: an attacker manipulates IDs via prompt injection
- Leakage: IDs appear in logs, caches, and traces
- Cross-tenant access: the LLM could reference other tenants' data
## The Architecture

```
┌──────────────────────────────────────────────────────────────┐
│            SYSTEM CONTEXT (flows around the LLM)             │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │user_id │ tenant_id │ analysis_id │ trace_id │ permissions│ │
│ └──────────────────────────────────────────────────────────┘ │
│       │                                         │            │
│       ▼                                         ▼            │
│  ┌──────────┐     ┌──────────────────┐    ┌───────────┐      │
│  │ PRE-LLM  │     │       LLM        │    │ POST-LLM  │      │
│  │ FILTER   │────▶│                  │───▶│ ATTRIBUTE │      │
│  │          │     │ Sees ONLY:       │    │           │      │
│  │ Returns  │     │ - content text   │    │ Adds:     │      │
│  │ CONTENT  │     │ - context text   │    │ - IDs     │      │
│  │ (no IDs) │     │   (NO IDs!)      │    │ - refs    │      │
│  └──────────┘     └──────────────────┘    └───────────┘      │
└──────────────────────────────────────────────────────────────┘
```
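The examples below pass identifiers through a `RequestContext` and collect source IDs in a `SourceRefs` record. Neither type is defined in this document, so here is a minimal sketch containing only the fields the examples actually touch; treat the exact shape as an assumption:

```python
# Minimal sketch of the system-context types used in the examples below.
# Only fields referenced in this document are shown; the real types may
# carry more (permissions, locale, ...) -- they ride around the LLM the same way.
from dataclasses import dataclass, field
from uuid import UUID


@dataclass(frozen=True)
class RequestContext:
    """Identifiers that flow AROUND the LLM; never serialized into prompts."""
    user_id: UUID
    tenant_id: UUID
    resource_id: UUID  # e.g. the analysis being produced
    trace_id: str


@dataclass
class SourceRefs:
    """Deterministic record of what the LLM was shown, kept outside the prompt."""
    document_ids: list[UUID] = field(default_factory=list)
    chunk_ids: list[UUID] = field(default_factory=list)
```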
## What NEVER Goes in Prompts

### SkillForge Forbidden Parameters

| Parameter | Type | Why Forbidden |
|---|---|---|
| `user_id` | UUID | Can be hallucinated; enables cross-user access |
| `tenant_id` | UUID | Critical for multi-tenant isolation |
| `analysis_id` | UUID | Job tracking, not for the LLM |
| `document_id` | UUID | Source tracking, not for the LLM |
| `artifact_id` | UUID | Output tracking, not for the LLM |
| `chunk_id` | UUID | RAG reference, not for the LLM |
| `session_id` | str | Auth context, not for the LLM |
| `trace_id` | str | Observability, not for the LLM |
| Any UUID | UUID | Pattern: `[0-9a-f]{8}-...` |
### Detection Pattern

```python
import re

FORBIDDEN_PATTERNS = [
    r'user[_-]?id',
    r'tenant[_-]?id',
    r'analysis[_-]?id',
    r'document[_-]?id',
    r'artifact[_-]?id',
    r'chunk[_-]?id',
    r'session[_-]?id',
    r'trace[_-]?id',
    r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}',
]


def audit_prompt(prompt: str) -> list[str]:
    """Check for forbidden patterns in prompt"""
    violations = []
    for pattern in FORBIDDEN_PATTERNS:
        if re.search(pattern, prompt, re.IGNORECASE):
            violations.append(pattern)
    return violations
```
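A quick sanity check of the audit (the UUID here is made up for illustration):

```python
# Both the named parameter and the bare UUID trip the audit.
leaky = "Analyze this for user_id 3f2a8c1e-9b4d-4e6a-8c2f-1d5e7a9b0c3d"
assert len(audit_prompt(leaky)) == 2  # 'user[_-]?id' plus the UUID pattern
assert audit_prompt("Analyze this content about photosynthesis.") == []
```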
## The Three-Phase Pattern

### Phase 1: Pre-LLM (Filter & Extract)

```python
async def prepare_for_llm(
    query: str,
    ctx: RequestContext,
) -> tuple[str, list[str], SourceRefs]:
    """
    Filter data and extract content for the LLM.

    Returns: (content, context_texts, source_references)
    """
    # 1. Retrieve with tenant filter
    documents = await semantic_search(
        query_embedding=embed(query),
        ctx=ctx,  # Filters by tenant_id and user_id
    )
    chunks = [chunk for doc in documents for chunk in doc.chunks]

    # 2. Save references for deterministic attribution later
    source_refs = SourceRefs(
        document_ids=[d.id for d in documents],
        chunk_ids=[c.id for c in chunks],
    )

    # 3. Extract content only (no IDs)
    content_texts = [d.content for d in documents]

    return query, content_texts, source_refs
```
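The internals of `semantic_search` are out of scope here, but the property that matters is that the tenant and user filters live in the retrieval query itself, never in the prompt. A hypothetical sketch, assuming a pgvector-style store (the table, columns, and `db.fetch_all` call are illustrative, not SkillForge's actual schema or API):

```python
# Hypothetical: the isolation filter is part of the SQL, parameterized
# from RequestContext. The LLM never sees or influences these values.
TENANT_SCOPED_SEARCH_SQL = """
    SELECT c.id, c.document_id, c.content
    FROM chunks AS c
    WHERE c.tenant_id = %(tenant_id)s             -- hard isolation boundary
      AND c.user_id   = %(user_id)s               -- per-user scoping
    ORDER BY c.embedding <=> %(query_embedding)s  -- pgvector cosine distance
    LIMIT %(top_k)s
"""

# Illustrative execution -- db.fetch_all is a stand-in for your driver:
# rows = await db.fetch_all(TENANT_SCOPED_SEARCH_SQL, {
#     "tenant_id": ctx.tenant_id,
#     "user_id": ctx.user_id,
#     "query_embedding": query_embedding,
#     "top_k": 8,
# })
```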
### Phase 2: LLM Call (Content Only)

```python
def build_prompt(content: str, context_texts: list[str]) -> str:
    """
    Build prompt with ONLY content, no identifiers.
    """
    prompt = f"""
Analyze the following content and provide insights.

CONTENT:
{content}

RELEVANT CONTEXT:
{chr(10).join(f"- {text}" for text in context_texts)}

Provide analysis covering:
1. Key concepts
2. Prerequisites
3. Learning objectives
"""

    # AUDIT: Verify no IDs leaked
    violations = audit_prompt(prompt)
    if violations:
        raise SecurityError(f"IDs leaked to prompt: {violations}")

    return prompt


async def call_llm(prompt: str) -> dict:
    """The LLM only sees content, never IDs."""
    response = await llm.generate(prompt)
    return parse_response(response)
```
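`parse_response` is not defined in this document; a minimal sketch, assuming the model is instructed to reply with a single JSON object (production code would also handle code fences, retries, and truncation):

```python
import json


def parse_response(raw: str) -> dict:
    """Parse the model's reply as strict JSON; refuse anything else."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"LLM returned non-JSON output: {e}") from e
    if not isinstance(parsed, dict):
        raise ValueError("Expected a JSON object at the top level")
    return parsed
```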
### Phase 3: Post-LLM (Attribute)

```python
from datetime import datetime, timezone
from uuid import uuid4


async def save_with_attribution(
    llm_output: dict,
    ctx: RequestContext,
    source_refs: SourceRefs,
) -> Analysis:
    """
    Attach context and references to LLM output.

    Attribution is deterministic, not LLM-generated.
    """
    return await Analysis.create(
        # Generated
        id=uuid4(),

        # From RequestContext (system-provided)
        user_id=ctx.user_id,
        tenant_id=ctx.tenant_id,
        analysis_id=ctx.resource_id,
        trace_id=ctx.trace_id,

        # From Pre-LLM refs (deterministic)
        source_document_ids=source_refs.document_ids,
        source_chunk_ids=source_refs.chunk_ids,

        # From LLM (content only)
        content=llm_output["analysis"],
        key_concepts=llm_output["key_concepts"],
        difficulty=llm_output["difficulty"],

        # Metadata
        created_at=datetime.now(timezone.utc),
        model_used=MODEL_NAME,
    )
```
## Output Validation

After the LLM returns, validate before using the output:

- Schema: the response matches the expected structure
- Guardrails: no toxic or harmful content
- Grounding: claims are supported by the provided context
- No IDs: the LLM didn't hallucinate any identifiers
```python
from pydantic import ValidationError


async def validate_output(
    llm_output: dict,
    context_texts: list[str],
) -> ValidationResult:
    """Validate LLM output before use."""
    # 1. Schema validation
    try:
        parsed = AnalysisOutput.model_validate(llm_output)
    except ValidationError as e:
        return ValidationResult(valid=False, reason=f"Schema error: {e}")

    # 2. Guardrails
    if await contains_toxic_content(parsed.analysis):
        return ValidationResult(valid=False, reason="Toxic content detected")

    # 3. Grounding check
    if not is_grounded(parsed.analysis, context_texts):
        return ValidationResult(valid=False, reason="Ungrounded claims")

    # 4. No hallucinated IDs
    if contains_uuid_pattern(parsed.analysis):
        return ValidationResult(valid=False, reason="Hallucinated IDs")

    return ValidationResult(valid=True)
```
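`AnalysisOutput` and `contains_uuid_pattern` are referenced above but not defined; a sketch, assuming Pydantic v2 and field names taken from the Phase 3 example (`contains_toxic_content` and `is_grounded` are left to your guardrail stack):

```python
import re

from pydantic import BaseModel

UUID_RE = re.compile(
    r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}',
    re.IGNORECASE,
)


class AnalysisOutput(BaseModel):
    """Expected response shape; field names mirror the Phase 3 example."""
    analysis: str
    key_concepts: list[str]
    difficulty: str


def contains_uuid_pattern(text: str) -> bool:
    """True if the text contains anything shaped like a UUID."""
    return UUID_RE.search(text) is not None
```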
## Integration Points in SkillForge

### Content Analysis Workflow

```
backend/app/workflows/
├── agents/
│   ├── execution.py          # Add context separation
│   └── prompts/              # Audit all prompts
├── tasks/
│   └── generate_artifact.py  # Add attribution
```

### Services

```
backend/app/services/
├── embeddings/   # Pre-LLM filtering
└── analysis/     # Post-LLM attribution
```
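A practical way to keep "Audit all prompts" enforced is a CI test that runs `audit_prompt` over every template under `prompts/`. A hedged sketch -- the glob, file extension, and import path are assumptions about your layout:

```python
# Hypothetical CI guard: fail the build if any prompt template contains
# a forbidden pattern. Adjust the path, extension, and import to taste.
from pathlib import Path

from app.security.prompt_audit import audit_prompt  # wherever audit_prompt lives

PROMPTS_DIR = Path("backend/app/workflows/agents/prompts")


def test_no_identifiers_in_prompt_templates():
    for template in PROMPTS_DIR.glob("**/*.txt"):
        violations = audit_prompt(template.read_text())
        assert not violations, f"{template}: forbidden patterns {violations}"
```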
## Checklist Before Any LLM Call
- RequestContext available
- Data filtered by tenant_id and user_id
- Content extracted without IDs
- Source references saved
- Prompt passes audit (no forbidden patterns)
- Output validated before use
- Attribution uses context, not LLM output
Version: 1.0.0 (December 2025)