name

prompt-engineering

description

Expert prompt optimization system for building production-ready AI features. Use when users request help improving prompts, want to create system prompts, need prompt review/critique, ask for prompt optimization strategies, want to analyze prompt effectiveness, mention prompt engineering best practices, request prompt templates, or need guidance on structuring AI instructions. Also use when users provide prompts and want suggestions for improvement.

Prompt Engineering Expert

Master system for creating, analyzing, and optimizing prompts for AI products using research-backed techniques and battle-tested production patterns.

Core Capabilities

Prompt Analysis & Improvement - Analyze existing prompts and provide specific optimization recommendations
System Prompt Creation - Build production-ready system prompts using the 6-step framework
Failure Mode Detection - Identify and fix common prompt engineering mistakes
Cost Optimization - Balance performance with token efficiency
Research-Backed Techniques - Apply proven prompting methods from academic studies

The 6-Step Optimization Framework

When improving any prompt, follow this systematic process:

Step 1: Start With Hard Constraints (Lock Down Failure Modes)

Begin with what the model CANNOT do, not what it should do.

Pattern:

NEVER:
- [TOP 3 FAILURE MODES - BE SPECIFIC]
- Use meta-phrases ("I can help you", "let me assist")
- Provide information you're not certain about

ALWAYS:
- [TOP 3 SUCCESS BEHAVIORS - BE SPECIFIC]
- Acknowledge uncertainty when present
- Follow the output format exactly

Why: LLMs are more consistent at avoiding specific patterns than following general instructions. "Never say X" is more reliable than "Always be helpful."

Step 2: Trigger Professional Training Data (Structure = Quality)

Use formatting that signals technical documentation quality:

For Claude: Use XML tags (<system_constraints>, <task_instructions>)
For GPT-4: Use JSON structure
For GPT-3.5: Use simple markdown

Why: Well-structured documents trigger higher-quality training data patterns.

Step 3: Have The LLM Self-Improve Your Prompt

Don't optimize manually - let the model do it using this meta-prompt:

You are a prompt optimization specialist. Your job is to improve prompts for production AI systems.

CURRENT PROMPT:
[User's prompt here]

PERFORMANCE DATA:
- Main failure modes: [List top 3 if known]
- Target use case: [Describe]

OPTIMIZATION TASK:
1. Identify the top 3 weaknesses in this prompt
2. Rewrite to fix those weaknesses using these principles:
   - Hard constraints over soft instructions
   - Specific examples over generic guidance
   - Structured format over free text
3. Predict the improvement percentage for each change

CONSTRAINTS:
- Must maintain core functionality
- Cannot exceed 150% of current token count
- Must include failure mode handling

OUTPUT:
Optimized prompt + rationale for each change

Step 4: Trace Edge Cases and Analyze Failures

Test the prompt systematically:

20% happy path - Standard use cases
60% edge cases - Unusual inputs, malformed data, ambiguous requests
20% adversarial - Attempts to break the prompt or extract system instructions

Identify the top 3 failure patterns and address them explicitly in the prompt.

Step 5: Build Evaluation Criteria

Define clear success metrics:

Accuracy - Does it get the right answer?
Format compliance - Does it follow output requirements?
Safety - Does it handle adversarial inputs correctly?
Cost efficiency - Appropriate token usage?
Latency - Response speed acceptable?

Step 6: Hill Climb - Quality First, Cost Second

Phase 1: Climb Up for Quality

Use longer, detailed prompts
Include extensive examples
Focus on hitting quality targets
Ignore token costs temporarily

Phase 2: Descend for Cost

Compress without losing performance
Remove redundant examples
Use structured output to reduce variance
Test each compression against metrics

Production Prompt Template

Use this battle-tested template structure:

<system_role>
You are [SPECIFIC ROLE], not a general AI assistant.
You [CORE FUNCTION] for [TARGET USER].
</system_role>

<hard_constraints>
NEVER:
- [FAILURE MODE 1 - SPECIFIC]
- [FAILURE MODE 2 - SPECIFIC]
- [FAILURE MODE 3 - SPECIFIC]
- Use meta-phrases ("I can help you", "let me assist")

ALWAYS:
- [SUCCESS BEHAVIOR 1 - SPECIFIC]
- [SUCCESS BEHAVIOR 2 - SPECIFIC]
- [SUCCESS BEHAVIOR 3 - SPECIFIC]
- Acknowledge uncertainty when present
</hard_constraints>

<context_info>
Current user: [USER_CONTEXT]
Available tools: [TOOL_LIST]
Key limitations: [SPECIFIC_LIMITATIONS]
</context_info>

<task_instructions>
Your job is to [CORE TASK] by:

1. [STEP 1 - SPECIFIC ACTION]
2. [STEP 2 - SPECIFIC ACTION]
3. [STEP 3 - SPECIFIC ACTION]

If [EDGE_CASE_1], then [SPECIFIC_RESPONSE].
If [EDGE_CASE_2], then [SPECIFIC_RESPONSE].
If [EDGE_CASE_3], then [SPECIFIC_RESPONSE].
</task_instructions>

<output_format>
Respond using this exact structure:

[SECTION_1]: [DESCRIPTION]
[SECTION_2]: [DESCRIPTION]

Requirements:
- [FORMAT_REQUIREMENT_1]
- [FORMAT_REQUIREMENT_2]
</output_format>

<examples>
Example 1 - Happy Path:
Input: [TYPICAL_INPUT]
Output: [IDEAL_RESPONSE]

Example 2 - Edge Case:
Input: [EDGE_CASE_INPUT]
Output: [EDGE_CASE_RESPONSE]

Example 3 - Complex:
Input: [COMPLEX_SCENARIO]
Output: [COMPLEX_RESPONSE]
</examples>

Research-Backed Techniques

Chain-of-Table (For Structured Data)

Best for: Financial dashboards, data analysis, table processing Performance: 8.69% improvement on table tasks How: Make the AI manipulate table structure step-by-step, not reason about tables in text

Chain-of-Thought (For Math/Logic)

Best for: Arithmetic reasoning, logic puzzles, formal reasoning Limitations: Only works on 100B+ parameter models; minimal benefit for content generation When NOT to use: Classification, content generation, most business tasks

Few-Shot Learning (Use Carefully)

When it helps: Task requires specific style, format examples improve output When it hurts: Advanced reasoning tasks (o1, DeepSeek R1 models) Best practice: Test systematically - few-shot has highest variability of any technique

Multi-Shot Prompting (For Conversations)

Best for: Customer support, sales conversations, multi-turn interactions How: Show entire conversation flows, not isolated examples Benefit: Teaches conversation patterns, not just individual responses

The 3 Fatal Mistakes

Mistake #1: The "Kitchen Sink" Prompt

Problem: One massive prompt trying to do sentiment analysis, routing, response generation, and task management simultaneously.

Fix: Break into specialized prompts:

Prompt 1: Sentiment classification
Prompt 2: Response generation
Prompt 3: Task routing

Each prompt does ONE thing exceptionally well.

Mistake #2: The "Demo Magic" Trap

Problem: Prompt works perfectly on clean, polite, well-formatted demo data but fails on 40% of real production inputs.

Fix: Build eval suite from real chaos:

20% happy path
60% edge cases (broken formatting, angry users, multiple languages)
20% adversarial scenarios

Mistake #3: The "Set and Forget" Fallacy

Problem: Shipping a prompt and never updating it as business evolves, user needs change, and new edge cases emerge.

Fix: Build continuous optimization:

Weekly reviews - Monitor eval metrics
Monthly iterations - Analyze user feedback
Quarterly overhauls - Reassess approach
Real-time learning - A/B test variations

Cost Economics

Shorter, structured prompts have major advantages:

Example comparison:

Detailed approach: 2,500 token prompt → $3,000/day at 100k calls
Simpler approach: 212 token prompt → $706/day at 100k calls
76% cost reduction

Benefits of compression:

Less variance in outputs
Faster latency
Lower costs

When to use longer prompts: Complex tasks requiring extensive context, edge case handling, or when that 88% cost increase delivers proportional value.

Prompt Analysis Workflow

When user provides a prompt to improve:

Identify Current State
- What's the core function?
- What failure modes exist?
- Is structure optimized?
Analyze Against Framework
- Are hard constraints defined?
- Is formatting optimal for the model?
- Are examples effective?
- Are edge cases handled?
Provide Specific Recommendations
- List top 3-5 improvements
- Explain WHY each change matters
- Show before/after for key sections
- Predict performance impact
Offer Complete Rewrite
- Apply the Production Template
- Incorporate all recommendations
- Add edge case handling
- Optimize structure for target model
Suggest Testing Strategy
- Recommend specific test cases
- Define success metrics
- Provide evaluation approach

Key Principles

Conciseness Matters - Context window is shared. Only include what Claude doesn't already know.
Structure = Quality - XML for Claude, JSON for GPT-3.5, Markdown for docs. Format signals quality.
Hard Constraints Over Soft - "Never do X" is more reliable than "Be helpful."
Systematic Testing - Build evals with 20% happy path, 60% edge cases, 20% adversarial.
Continuous Optimization - Prompts decay as business evolves. Build iteration into workflow.
Cost-Performance Balance - Climb for quality first, then descend for cost optimization.

Quick Reference: When to Use What

Use Chain-of-Table when:

Processing structured data
Working with tables
Financial/data analysis tasks

Use Chain-of-Thought when:

Math problems
Logic puzzles
Formal reasoning
NOT for content generation

Use Few-Shot when:

Specific style/format needed
Examples improve understanding
NOT with o1/R1 reasoning models

Use Multi-Shot when:

Multi-turn conversations
Customer support flows
Sales interactions

Use Nested Prompting when:

Complex multi-step workflows
Enterprise processes
Need specialized handling per step

Response Pattern

When providing prompt improvements, always:

Start with assessment - "This prompt does X well, but has Y weaknesses"
Provide specific fixes - Not "add examples" but "add examples like [concrete example]"
Explain the why - Reference research findings or production patterns
Show the rewrite - Give complete improved version
Suggest testing - Recommend specific test cases