Claude Code Plugins

Community-maintained marketplace


Master context engineering for AI features - the skill that separates AI products that work from ones that hallucinate. Use when speccing new AI features, diagnosing underperforming AI features, or doing quality checks before shipping. Helps PMs define what context AI needs, where to get it, and what to do when it fails. Based on the 4D Context Canvas framework.

Install Skill

1. Download skill

2. Enable skills in Claude

   Open claude.ai/settings/capabilities and find the "Skills" section

3. Upload to Claude

   Click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reviewing its instructions before using it.

SKILL.md

name: context-engineering
description: Master context engineering for AI features - the skill that separates AI products that work from ones that hallucinate. Use when speccing new AI features, diagnosing underperforming AI features, or doing quality checks before shipping. Helps PMs define what context AI needs, where to get it, and what to do when it fails. Based on the 4D Context Canvas framework.

Context Engineering for AI Products

Core Philosophy

Context engineering is the art of giving AI exactly the right information to do its job.

Models are commodities—your context is your moat.

Most AI features fail before they reach the model. They fail because:

  • Nobody defined the model's actual job
  • Nobody mapped what context it needs
  • Nobody figured out how to get that context at runtime
  • Nobody designed what happens when it breaks

This skill prevents those failures.

The 90/10 Mismatch

Teams spend 90% of their time on model selection and prompts. But 90% of AI quality comes from context quality.

When AI fails, teams blame the model. But the real causes:

  • System doesn't know what file the user is working on
  • System doesn't see the user's preferences
  • System isn't aware of entities or relationships in the workspace
  • System cannot recognize the user's role
  • System retrieves irrelevant documents
  • System misses crucial logs or state

Fix the context, fix the AI.

PM's Role in Context Engineering

Context engineering is NOT purely an engineering problem. It sits at the intersection of product strategy, user understanding, and system design.

PMs own three critical layers:

  1. Defining "intelligence" - What should the AI know? What's essential vs nice-to-have? How much personalization helps without feeling creepy?

  2. Mapping context requirements to user value - Translating "users want better suggestions" into "system needs access to past rejections, current workspace state, and team preferences"

  3. Designing degradation strategy - When context is missing, stale, or incomplete: Block the feature? Show partial answer? Ask clarifying questions? Fall back to non-personalized?

Engineers own the implementation: Retrieval architecture, vector databases, embedding pipelines, API integrations, performance optimization.

But they need you to define the what and why before they can build the how.


Entry Point

When this skill is invoked, start with:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 CONTEXT ENGINEERING
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

What are you working on?

  1. Spec a new AI feature
     → Define what context it needs before engineering starts

  2. Diagnose an existing AI feature
     → Figure out why it's underperforming or hallucinating

  3. Quick quality check
     → Validate context before shipping or during review

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Route to the appropriate path based on user selection.


Path 1: Spec New Feature (4D Context Canvas)

Purpose

Walk through four dimensions that determine whether an AI feature ships successfully or dies in production. Use BEFORE engineering starts.

Starting Point

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 SPEC NEW FEATURE — 4D Context Canvas
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

We'll walk through 4 dimensions. Most AI features fail before
they reach the model—this prevents that.

How do you want to start?

  1. From a Linear issue (I'll pull the details)
  2. Describe it manually

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

If Linear: Use Linear MCP to pull issue details. Pre-populate what's available.

If Manual: Ask user to describe the AI feature in 1-2 sentences.

D1 — DEMAND: What's the Model's Job?

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 D1: DEMAND — What's the model's job?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 If you can't articulate the job precisely, the model can't do it.
   "Make it smart" is not a spec. Neither is "personalized."

Questions to ask:

  1. What should the model produce?

    • Push for specificity: Not "recommendations" → "3 ranked options with rationale"
    • Not "a summary" → "2-paragraph executive summary with key metrics"
  2. For whom? (User segment, role, context)

  3. Under what assumptions? (What must be true for this to work?)

  4. What constraints apply? (Tone, format, length, boundaries, prohibited content)

  5. What defines success? (Measurable outcome, not "users like it")

The transformation to aim for:

VAGUE: "Draft a status update"

PRECISE: "Summarize the key changes in project X since the last report,
structured for stakeholder Y, using the user's preferred tone,
adhering to the product's reporting format, in under 200 words."
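
A precise job spec can be captured as a structured, reviewable artifact. A minimal sketch in Python (the dataclass and its field names are illustrative assumptions, not part of the 4D framework):

```python
from dataclasses import dataclass, field

@dataclass
class ModelJobSpec:
    """D1 Demand: a precise, reviewable definition of the model's job."""
    output: str                  # e.g. "3 ranked options with rationale"
    audience: str                # user segment, role, context
    assumptions: list[str] = field(default_factory=list)  # what must be true
    constraints: list[str] = field(default_factory=list)  # tone, format, length, boundaries
    success_criteria: str = ""   # measurable outcome, not "users like it"

status_update = ModelJobSpec(
    output="Key changes in project X since the last report, under 200 words",
    audience="Stakeholder Y",
    assumptions=["Last report date is known", "Change history is accessible"],
    constraints=["User's preferred tone", "Product reporting format"],
    success_criteria="Stakeholder forwards the draft with fewer than 2 edits",
)
```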

Education moment:

💡 PM vs Engineer: You own the what and why. Engineers own the how.
   Without this spec, they build impressive systems that feel hollow.

Capture and display D1 summary before moving on.

D2 — DATA: What Context Is Required?

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 D2: DATA — What context does the model need?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 Every piece of context costs tokens. More tokens = higher cost +
   slower responses. Include only what's essential for the job.

Build a Context Requirements Table together:

For each piece of context needed:

  1. Data Needed - The entity, document, metric, or signal
  2. Source - Where it lives (DB, API, user input, cache, logs)
  3. Availability:
    • Always (can fetch 100% of the time)
    • Sometimes (depends on user actions or data freshness)
    • Never (must be requested explicitly or cannot be assumed)
  4. Sensitivity - PII, internal-only, restricted, public

Example output:

| Data Needed          | Source      | Availability | Sensitivity |
|----------------------|-------------|--------------|-------------|
| User equity estimate | Internal DB | Always       | PII         |
| Browsing history     | Analytics   | Always       | Internal    |
| Stated goals         | User input  | Sometimes    | Internal    |
| Local market trends  | API         | Always       | Public      |

Flag problems immediately:

  • "Sometimes" availability needs a decision: What happens when it's missing?
  • "Never" availability is a blocker: Can't build without resolving this

Education moment:

💡 Hidden dependencies live here. When you map honestly, you discover
   critical data that doesn't exist, sources that are unreliable, or
   assumptions that will break at scale.

D3 — DISCOVERY: How Will You Get Context at Runtime?

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 D3: DISCOVERY — How will you get the context at runtime?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 Knowing what data you need ≠ knowing how to get it at runtime.
   This is where "it worked in the demo" dies in production.

For each piece of context from D2:

  1. How will the system fetch this?

    • Real-time query
    • Pre-computed/cached
    • User provides it
    • Inferred from behavior
  2. What's the latency budget?

  3. What if the source is slow or unavailable?

Discovery strategies to consider:

  • Search-Based: Vector search (semantic), keyword search (precision), hybrid
  • Graph-Based: Follow relationships through knowledge graph
  • Precomputed: Daily/weekly jobs, materialized views, caches
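
A minimal sketch of the fresh-vs-stale trade-off as a single fetch path: try real-time first, fall back to cache, and report what the caller actually got (the function shape and TTL are illustrative assumptions):

```python
import time

_cache: dict[str, tuple[float, object]] = {}  # key -> (fetched_at, value)

def get_context(key, fetch_live, ttl_seconds=300):
    """Real-time first (fresh but slow); cache as fallback (fast but maybe stale)."""
    try:
        value = fetch_live(key)                 # real-time query
        _cache[key] = (time.monotonic(), value)
        return value, "fresh"
    except Exception:                           # source slow, down, or erroring
        if key in _cache:
            fetched_at, value = _cache[key]
            age = time.monotonic() - fetched_at
            return value, ("cached" if age <= ttl_seconds else "stale")
        return None, "missing"                  # hand off to D4: block, ask, or degrade
```

The returned status matters: "stale" and "missing" are exactly the cases D4 must design for.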

Education moment:

💡 Trade-off: Real-time = fresh but slow. Cached = fast but stale.
   Know which context needs which strategy.

D4 — DEFENSE: What Happens When It Fails?

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 D4: DEFENSE — What happens when it fails?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 AI will fail. Context will be missing. Data will be stale. The
   model will hallucinate confidently. Design for failure first.

Four defense mechanisms to define:

1. Pre-Checks (before calling model):

  • What must be true before calling the model?
  • Enough context present?
  • Data fresh enough?
  • Required entities available?
  • If checks fail → block generation or ask clarifying questions

2. Post-Checks (after generation):

  • Did output follow constraints?
  • Does it match required schema?
  • Is it logically consistent?
  • Does it violate any rules?

3. Fallback Paths (when things break):

  • Partial answer with caveats?
  • Clarifying questions to user?
  • Conservative defaults?
  • Non-AI fallback experience?

4. Feedback Loops (how to improve):

  • Explicit ratings (thumbs up/down)
  • Implicit behavior (edits, corrections, abandonment)
  • Pattern detection across failures
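
A minimal sketch wiring all four mechanisms around one model call (the specific checks, the fallback copy, and the logging hook are illustrative assumptions, not the framework's prescribed implementation):

```python
def log_for_review(context, output):
    """Feedback loop stub: capture (context, output) pairs for failure analysis."""
    pass

def guarded_generate(context: dict, call_model) -> dict:
    """D4 Defense: pre-check, generate, post-check, degrade gracefully."""
    # 1. Pre-checks: refuse to generate on thin context
    if not context.get("required_entities"):
        return {"status": "clarify", "message": "Which project should I report on?"}

    output = call_model(context)

    # 2. Post-checks: validate constraints before the user sees anything
    if len(output.split()) > 200:
        # 3. Fallback path: conservative default instead of a raw failure
        return {"status": "fallback",
                "message": "Here's a shortened summary; the full draft broke a constraint."}

    # 4. Feedback loop: record what context produced this output
    log_for_review(context, output)
    return {"status": "ok", "message": output}
```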

Education moment:

💡 The best AI features degrade gracefully. Users trust systems
   that know their limits.

Path 1 Output

After completing all four dimensions, generate summary:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 4D CONTEXT CANVAS COMPLETE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Feature: [Name]

  D1 Demand:      [CLEAR / NEEDS WORK / BLOCKED]
  D2 Data:        [CLEAR / NEEDS WORK / BLOCKED]
  D3 Discovery:   [CLEAR / NEEDS WORK / BLOCKED]
  D4 Defense:     [CLEAR / NEEDS WORK / BLOCKED]

Overall: [READY FOR ENGINEERING / NEEDS WORK / BLOCKED]

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

If blocked or needs work, list specific items to resolve.

Output options:

  1. Add to Linear issue as comment
  2. Create new Linear story with spec
  3. Export as markdown
  4. Export Context Requirements Table as spreadsheet format
  5. Just show the summary

Path 2: Diagnose Existing Feature (Context Audit)

Purpose

Figure out why an existing AI feature is underperforming, hallucinating, or behaving inconsistently. Work backwards from symptoms to root cause.

Starting Point

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 DIAGNOSE EXISTING FEATURE — Context Audit
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 When AI features fail, teams blame the model. But 90% of failures
   are context failures—wrong data, missing data, stale data, or
   poorly structured data.

Let's find the root cause.

How do you want to start?

  1. From a Linear issue (I'll pull the details)
  2. Describe the feature and symptoms manually

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

⚠️ TOKEN MANAGEMENT (for Claude): When pulling from Linear, use get_issue for a single issue ID—don't search broadly. If searching, always use limit: 10 and get titles first before fetching full details.

Scope Check (for multi-issue features)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 SCOPE — What are we diagnosing?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 AI features often span multiple issues—a parent spec plus
   implementation tasks and bug reports. Diagnosing without the
   full picture leads to incomplete answers.

What's the scope?

  1. Single issue — One specific problem to diagnose
  2. Entire feature — A feature that spans multiple issues

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

If "Entire feature":

Ask for parent/overview issue ID, then use Linear MCP to find related issues.

⚠️ IMPLEMENTATION NOTE FOR CLAUDE: Linear queries can return massive amounts of data that exceed token limits. ALWAYS follow this pattern:

  1. First query: titles only — Use list_issues with limit: 20 max
  2. Count results — Report how many issues were found
  3. Ask user preference — Before fetching full details
  4. Selective fetch — Only get_issue on specifically selected issues

NEVER try to read all issue details in one query. This will fail.

Example display:

Found 12 related issues:
  • 3 sub-issues
  • 2 blocked-by relations
  • 4 bugs referencing this feature
  • 3 other relations

⚠️  Loading all of them may be slow and increase cost.

How do you want to proceed?

  1. Smart summary — Pull titles + key details, summarize each
     (faster, cheaper, usually sufficient)

  2. Full context — Pull everything including comments
     (slower, more expensive, use for deep dives)

  3. Let me pick — Show me the list, I'll select what's relevant

Education moment:

💡 This is context engineering in action—we're deciding what's
   relevant vs. what's noise. Same trade-off you'll make for
   your AI features.

Symptoms Collection

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 SYMPTOMS — What's going wrong?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

What are you seeing? (Select all that apply)

  □ Hallucinations — Confidently wrong facts, made-up data
  □ Inconsistency — Different outputs for similar inputs
  □ Generic outputs — Feels like it doesn't know the user/context
  □ Wrong tone/format — Output doesn't match expectations
  □ Slow responses — Taking too long
  □ High costs — Token usage is out of control
  □ Works in demo, fails in prod — Different behavior in real conditions
  □ Other: ___

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Symptom-to-cause mapping:

| Symptom           | Likely Root Cause                    | Focus Area |
|-------------------|--------------------------------------|------------|
| Hallucinations    | Missing domain context, no grounding | D2, D4     |
| Inconsistency     | Vague job definition, missing rules  | D1, D4     |
| Generic outputs   | Missing user/environment context     | D2         |
| Wrong tone/format | Missing constraints, no examples     | D1, D4     |
| Slow responses    | Too much context, bad discovery      | D2, D3     |
| High costs        | Dumping everything in prompt         | D2, D3     |
| Demo vs prod      | Discovery strategy broken            | D3, D4     |
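
If you script triage (for example, inside this skill), the mapping is a simple lookup. A minimal sketch mirroring the table above (the keys are illustrative):

```python
SYMPTOM_MAP = {
    "hallucinations":  ("Missing domain context, no grounding", ["D2", "D4"]),
    "inconsistency":   ("Vague job definition, missing rules",  ["D1", "D4"]),
    "generic_outputs": ("Missing user/environment context",     ["D2"]),
    "wrong_tone":      ("Missing constraints, no examples",     ["D1", "D4"]),
    "slow_responses":  ("Too much context, bad discovery",      ["D2", "D3"]),
    "high_costs":      ("Dumping everything into the prompt",   ["D2", "D3"]),
    "demo_vs_prod":    ("Discovery strategy broken",            ["D3", "D4"]),
}

cause, focus = SYMPTOM_MAP["hallucinations"]  # -> start the audit at D2 and D4
```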

Audit D1 — Was the Job Defined?

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 AUDIT D1: Was the model's job clearly defined?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 Vague jobs cause vague outputs. "Make it personalized" is not
   a spec—it's a wish.

Diagnostic questions:

  1. Can you articulate exactly what the model should produce?

    • Hesitation = 🚨 Gap: Job never properly defined
  2. Is there a written spec for inputs, outputs, constraints, success criteria?

    • No = 🚨 Gap: No spec exists
  3. Do engineers and PMs agree on what "good" looks like?

    • No = 🚨 Gap: Misaligned expectations
D1 Assessment: [CLEAR / GAP FOUND / CRITICAL GAP]

Audit D2 — Is the Model Getting the Right Context?

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 AUDIT D2: Is the model getting the right context?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 Most hallucinations are context failures, not model failures.
   The model can only reason about what it sees.

Diagnostic questions:

  1. What context is the model actually receiving today?

  2. Walk through the 6 layers—what's present vs missing?

| Layer       | What It Is                                          | Present? |
|-------------|-----------------------------------------------------|----------|
| Intent      | What user actually wants (not just what they typed) | ?        |
| User        | Preferences, patterns, history, proficiency         | ?        |
| Domain      | Entities, rules, relationships, definitions         | ?        |
| Rules       | Constraints, policies, formats, permissions         | ?        |
| Environment | Current state, time, location, recent actions       | ?        |
| Exposition  | Structured, labeled, clean final payload            | ?        |

  3. Is context structured or dumped as raw text?

    • Dumped = 🚨 Gap: Unstructured context confuses models
  4. Is there too much context? (Token bloat)

    • Yes = 🚨 Gap: Over-stuffed prompt, model loses focus

Education moment:

💡 Common failure: Teams dump everything into the prompt hoping the
   model will "figure it out." It won't. Curate ruthlessly.
D2 Assessment: [CLEAR / GAP FOUND / CRITICAL GAP]

Audit D3 — Is Context Being Fetched Reliably?

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 AUDIT D3: Is context being fetched reliably at runtime?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 This is where "it worked in the demo" dies. Demo uses hardcoded
   context. Production must fetch it live—and things break.

Diagnostic questions:

  1. How is each piece of context being fetched?

    • Hardcoded? = 🚨 Gap: Won't work at scale
    • API that times out? = 🚨 Gap: Latency/reliability issue
    • Cache that goes stale? = 🚨 Gap: Freshness issue
  2. What happens when a data source is unavailable?

    • Feature crashes? = 🚨 Gap: No fallback
    • Silent failure? = 🚨 Gap: Model hallucinates to fill gap
  3. Is there visibility into what context is being used per request?

    • No = 🚨 Gap: Can't debug failures
D3 Assessment: [CLEAR / GAP FOUND / CRITICAL GAP]

Audit D4 — Are Failures Being Caught?

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 AUDIT D4: Are failures being caught and handled?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 AI will fail. The question is whether users see raw failures or
   graceful degradation. Trust comes from knowing your limits.

Diagnostic questions:

  1. Are there pre-checks before calling the model?

    • No = 🚨 Gap: No validation before generation
  2. Are there post-checks validating output?

    • No = 🚨 Gap: No output validation
  3. What's the fallback UX when things break?

    • None designed = 🚨 Gap: Users see raw failures
  4. Is there a feedback loop capturing failures?

    • No = 🚨 Gap: Same failures repeat
D4 Assessment: [CLEAR / GAP FOUND / CRITICAL GAP]

Path 2 Output

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 CONTEXT AUDIT COMPLETE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Feature: [Name]
Symptoms: [What was reported]

  D1 Demand:      [CLEAR / GAP / CRITICAL]
  D2 Data:        [CLEAR / GAP / CRITICAL]
  D3 Discovery:   [CLEAR / GAP / CRITICAL]
  D4 Defense:     [CLEAR / GAP / CRITICAL]

Primary Issue: [e.g., "Missing user context (D2) + no fallback (D4)"]

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

RECOMMENDED FIXES (prioritized):

1. [Highest impact fix]
2. [Second fix]
3. [Third fix]

Quick Win: [Smallest change that would improve things]

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Output options:

  1. Add to Linear issue as comment
  2. Create fix stories in Linear
  3. Export as markdown
  4. Just show the summary

Path 3: Quick Quality Check

Purpose

Fast 5-check validation of context quality. Use before shipping, during code review, or when reviewing a prompt/payload.

Starting Point

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 QUICK QUALITY CHECK
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 All hallucinations are context failures before they're model
   failures. This checklist catches problems before users do.

5 checks. 5 minutes. Use before shipping or during review.

What are you checking?

  1. A prompt/context payload (paste it)
  2. A feature spec (describe it)
  3. A Linear issue (I'll pull it)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Check 1: RELEVANCE

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 CHECK 1: RELEVANCE — Is everything here necessary?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 More context ≠ better. Irrelevant context confuses the model,
   increases cost, and slows responses.
  • Does every piece of context directly contribute to the task?
  • Is there anything "kind of related" that could be cut?
  • Is there decorative metadata that doesn't help reasoning?
Relevance: [PASS / NEEDS WORK]

Check 2: FRESHNESS

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 CHECK 2: FRESHNESS — Is the data current enough?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 Stale context = stale outputs. A model reasoning about yesterday's
   data will give yesterday's answers.
  • Are timestamps recent enough for this task?
  • Are metrics, dashboards, logs up to date?
  • Could cached data be invalid for this request?
Freshness: [PASS / NEEDS WORK]

Check 3: SUFFICIENCY

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 CHECK 3: SUFFICIENCY — Does the model have enough to reason?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 Missing context forces the model to guess. Guessing = hallucinating.
   If the model needs it to reason, it must be provided.
  • Are all required entities present?
  • Are dependencies, relationships, history included?
  • Could the model answer correctly without guessing?
Sufficiency: [PASS / NEEDS WORK]

Check 4: STRUCTURE

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 CHECK 4: STRUCTURE — Is context organized clearly?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 Dumping raw text forces the model to parse meaning. Structured,
   labeled sections reduce ambiguity and improve accuracy.
  • Is context broken into labeled sections?
  • Are relationships explicitly described (not implied)?
  • Is domain knowledge structured (not prose blobs)?
Structure: [PASS / NEEDS WORK]
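
A minimal sketch of what "structured and labeled" means in practice (the tag convention and section names are illustrative assumptions):

```python
def render_context(sections: dict[str, str]) -> str:
    """Labeled sections instead of one prose blob: less parsing, less ambiguity."""
    return "\n".join(f"<{name}>\n{body}\n</{name}>" for name, body in sections.items())

payload = render_context({
    "user_profile": "Role: PM. Prefers concise, metric-first summaries.",
    "domain": "Project X: 14 open issues, 3 blocked on the API migration.",
    "rules": "Max 200 words. No customer names. Stakeholder-facing tone.",
    "task": "Summarize key changes since the last report.",
})
```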

Check 5: CONSTRAINTS

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 CHECK 5: CONSTRAINTS — Are the rules explicit?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 Prompts are suggestions. The model will eventually ignore them.
   Hard rules must be enforced outside the prompt or stated as
   non-negotiable constraints.
  • Are business rules explicitly stated?
  • Are tone, formatting, domain rules included?
  • Is permission logic represented accurately?
  • Are prohibited actions clearly listed?
Constraints: [PASS / NEEDS WORK]
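
Because prompts are suggestions, hard rules belong in code outside the prompt. A minimal sketch of an enforcement pass (the specific rules and permission names are illustrative assumptions):

```python
def enforce_hard_rules(output: str, permissions: set[str]) -> list[str]:
    """Return violations; an empty list means the output may ship."""
    violations = []
    if len(output.split()) > 200:
        violations.append("exceeds 200-word limit")
    if "salary" in output.lower() and "compensation_data" not in permissions:
        violations.append("mentions compensation without permission")
    return violations

draft = "Project X shipped the API migration. Salary bands were also updated."
if enforce_hard_rules(draft, {"project_data"}):
    print("Blocked: route to the D4 fallback path instead of showing raw output")
```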

Path 3 Output

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 CONTEXT QUALITY CHECK COMPLETE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  ┌─────────────┬──────────────┐
  │ Check       │ Result       │
  ├─────────────┼──────────────┤
  │ Relevance   │ ✓ PASS       │
  │ Freshness   │ ✓ PASS       │
  │ Sufficiency │ ⚠ NEEDS WORK │
  │ Structure   │ ✓ PASS       │
  │ Constraints │ ⚠ NEEDS WORK │
  └─────────────┴──────────────┘

  Overall: 3/5 PASSING — Fix issues before shipping

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

ISSUES TO FIX:

[List specific issues found with concrete recommendations]

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

💡 Pro tip: Run this check again after fixes to confirm resolution.

Output options:

  1. Add to Linear issue as comment
  2. Export as markdown
  3. Just show the summary

Linear Integration

When Linear MCP is available:

Pulling issues:

  • Use mcp__plugin_hb-tools_linear__get_issue to fetch issue details
  • Use mcp__plugin_hb-tools_linear__list_issues with parent filter to find related issues
  • Check for blocking/blocked-by relations

Creating output:

  • Use mcp__plugin_hb-tools_linear__create_comment to add canvas/audit as comment
  • Use mcp__plugin_hb-tools_linear__create_issue to create new stories from specs
  • Apply appropriate labels: context-engineering, ai-feature

Integration with Other Commands

Before /context-engineering:

  • /four-risks - Validate the feature is worth building at all

After /context-engineering:

  • /ai-cost-check - Model the unit economics
  • /ai-health-check - Pre-launch validation

The sequence:

  1. Is this worth building? (/four-risks)
  2. How do we spec it correctly? (/context-engineering)
  3. Can we afford it? (/ai-cost-check)
  4. Is it ready to ship? (/ai-health-check)

Key Concepts Reference

The 6 Layers of Context

Every AI system needs these layers (bottom to top):

  1. Intent - What user actually means, not what they typed
  2. User - Preferences, patterns, history, proficiency
  3. Domain - Entities, rules, relationships, definitions
  4. Rules - Constraints, policies, formats, permissions
  5. Environment - Current state, time, location, recent actions
  6. Exposition - Final structured, clean payload the model sees
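
A minimal sketch of the layers assembled bottom-to-top into that final payload (the layer contents and label format are illustrative assumptions):

```python
layers = {
    "intent": "Wants a stakeholder-ready status update, not a raw changelog.",
    "user": "PM; prefers bullets; has rejected verbose drafts twice.",
    "domain": "Project X depends on billing; 'launch' means GA, not beta.",
    "rules": "Under 200 words; no internal codenames; reporting template v2.",
    "environment": "Sprint day 8; last report sent 7 days ago.",
}

# Exposition is the output of the other five: the structured, labeled,
# clean payload the model actually sees.
exposition = "\n\n".join(f"[{k.upper()}]\n{v}" for k, v in layers.items())
```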

The 4D Canvas Summary

  • Demand - What the model must do (precise job spec)
  • Data - What context is required (requirements table)
  • Discovery - How to get context at runtime (fetch strategy)
  • Defense - What happens when it fails (guardrails)

Context Quality Checklist

  1. Relevance - Only what's necessary
  2. Freshness - Current enough for the task
  3. Sufficiency - Everything needed to reason
  4. Structure - Organized with clear labels
  5. Constraints - Rules explicitly stated

Attribution

Source framework: 4D Context Canvas, 6 Layers of Context, C.E.O. Framework
Authors: Aakash Gupta & Miqdad Jaffer (OpenAI)
Publication: "The Ultimate Guide to Context Engineering for PMs" - Product Growth Newsletter, 2025


Remember

  • Context engineering > prompt engineering
  • 90% of AI quality comes from context quality
  • PMs own the what and why; engineers own the how
  • Design for failure first
  • Context is your only durable moat