| name | add-embedding-support |
| description | Add Qdrant embedding support to v3 WordPress components for RAG chatbot. Implements component-level content chunking for searchable, structured embeddings. Use when adding embedding to new or existing v3 components. |
Add Embedding Support Skill
You are helping add Qdrant embedding support to WordPress v3 components. This enables component content to be indexed and searched via a RAG-based chatbot powered by Claude's API.
System Overview
The embedding system:
- Chunks at component level: Each component becomes one or more embedding chunks
- Avoids sub-component loading: Write extraction code directly in the component class
- Supports sections: Complex components add multiple sections (sub-chunks) per instance
- Tracks metadata: Links, dates, and custom metadata stored separately
- Respects skip markers: Components can opt out via ComponentEmbeddingSkipAwareInterface
How It Works
- CLI command wp vendi embedding:generate runs
- Sets global constant VENDI_RENDER_CONTEXT to RenderingContextEnum::EMBEDDING
- Loads each component via vendi_load_component_v3()
- Template detects context and returns component instance (no HTML rendering)
- Component's getEmbedding() method extracts structured data
- ComponentEmbedding DTO formats data into JSON chunks for Qdrant
Output Format
Each component produces a JSON object like this:
{
"content": "Heading: Ask a Researcher\nBody: Are you a CRNA with research questions?\nLinks: Contact us",
"metadata": {
"type": "page",
"url": "https://example.com/page/",
"created": "2022-11-29T21:01:08+00:00",
"updated": "2024-03-07T09:07:06+00:00",
"links": [
{
"text": "Contact us",
"url": "https://example.com/contact/"
}
],
"component_type": "content_callout_full_width"
},
"id": "660-3"
}
Component Type Classification
Embeddable Components
- Implements ComponentEmbeddingAwareInterface
- Provides getEmbedding() method
- Content is indexed for chatbot
Skippable Components
- Implements ComponentEmbeddingSkipAwareInterface (marker interface)
- No getEmbedding() method needed
- Ignored during embedding generation
- Use for: ads, navigation, forms, decorative elements
Simple Components
- Single chunk with heading and/or body
- No repeater fields
- Auto-extraction via interfaces
Complex Components
- Multiple sections from repeater/flexible content
- Each item becomes a separate section (sub-chunk)
- May include links/CTAs tracked in metadata
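The embeddable/skippable split above comes down to an instanceof check against the two interfaces. The following is a minimal sketch of that idea only: the Demo* interfaces and classes and the classify function are hypothetical stand-ins, and the order of the checks (skip marker first) is an assumption, not something confirmed by the theme's generator.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for ComponentEmbeddingAwareInterface and
// ComponentEmbeddingSkipAwareInterface (the real ones are namespaced).
interface DemoEmbeddingAware
{
    public function getEmbedding(): ?array;
}

interface DemoEmbeddingSkipAware
{
    // Marker interface: intentionally empty.
}

final class DemoCopyBlock implements DemoEmbeddingAware
{
    public function getEmbedding(): ?array
    {
        return ['content' => 'Heading: Hello'];
    }
}

final class DemoAdRow implements DemoEmbeddingSkipAware
{
}

final class DemoLegacyWidget
{
}

/** Classifies a component as 'embed', 'skip', or 'unsupported'. */
function demo_classify_component(object $component): string
{
    // Assumption: an explicit opt-out is checked before anything else.
    if ($component instanceof DemoEmbeddingSkipAware) {
        return 'skip';
    }
    if ($component instanceof DemoEmbeddingAware) {
        return 'embed';
    }
    return 'unsupported';
}
```

A component that implements neither interface falls into the third bucket, which is a useful signal that it still needs to be classified.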
Implementation Patterns
Pattern 1: Simple Component (Single Chunk)
When to use: Component has just heading and/or body copy, no repeater fields
Choosing the Right Interfaces
IMPORTANT: Inspect the actual template file to determine which interfaces to implement:
- PrimaryHeadingInterface - Use when template displays a component-level heading (outside loops)
  - Example: <h2><?php esc_html_e(get_sub_field('headline')); ?></h2> at the top level
  - NOT for headings inside repeater loops
- PrimaryCopyInterface - Use when template displays component-level body/intro copy (outside loops)
  - Example: <?php echo wp_kses_post(get_sub_field('intro_copy')); ?> before any repeaters
  - NOT for copy inside repeater loops
The interfaces should map to what actually exists in the template structure.
Required Interfaces
use Vendi\Theme\ComponentInterfaces\ComponentEmbeddingAwareInterface;
use Vendi\Theme\ComponentInterfaces\PrimaryHeadingInterface; // If template has top-level heading
use Vendi\Theme\ComponentInterfaces\PrimaryCopyInterface; // If template has top-level copy
use Vendi\Theme\DTO\Embedding\ComponentEmbedding;
use Vendi\Theme\DTO\Embedding\ComponentEmbeddingInterface;
Class Implementation
class simple_component extends BaseComponent implements
ComponentEmbeddingAwareInterface,
PrimaryHeadingInterface, // Only if template has top-level heading
PrimaryCopyInterface // Only if template has top-level copy
{
public function getEmbedding(): ?ComponentEmbeddingInterface
{
return ComponentEmbedding::fromComponent($this);
}
public function getPrimaryHeadingText(): ?string
{
// Return the field that corresponds to the top-level heading in template
return get_sub_field('headline');
}
public function getPrimaryCopy(): ?string
{
// Return the field that corresponds to the top-level copy in template
return get_sub_field('copy');
}
}
Output
Heading: [from getPrimaryHeadingText() if interface implemented]
Body: [from getPrimaryCopy() if interface implemented]
Key Points:
- Inspect template first to determine which interfaces are needed
- fromComponent() auto-extracts heading and body via interfaces
- Single chunk per component instance
- No manual section creation needed
- Don't guess at structure - base decision on actual template code
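To make the "auto-extracts via interfaces" point concrete, here is a self-contained sketch of how an interface-driven factory could assemble the Heading/Body lines. It is not the theme's fromComponent() code: the Demo* interfaces mirror the method names from this document, and everything else (class names, the helper function, the trimming rules) is assumed for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for PrimaryHeadingInterface and PrimaryCopyInterface.
interface DemoPrimaryHeading
{
    public function getPrimaryHeadingText(): ?string;
}

interface DemoPrimaryCopy
{
    public function getPrimaryCopy(): ?string;
}

final class DemoCallout implements DemoPrimaryHeading, DemoPrimaryCopy
{
    public function getPrimaryHeadingText(): ?string
    {
        return 'Ask a Researcher';
    }

    public function getPrimaryCopy(): ?string
    {
        return 'Are you a CRNA with research questions?';
    }
}

final class DemoHeadingOnly implements DemoPrimaryHeading
{
    public function getPrimaryHeadingText(): ?string
    {
        return 'Research Topics';
    }
}

/** Builds labelled lines only for the interfaces the component implements. */
function demo_extract_primary_text(object $component): string
{
    $lines = [];
    if ($component instanceof DemoPrimaryHeading) {
        $heading = trim((string) $component->getPrimaryHeadingText());
        if ($heading !== '') {
            $lines[] = 'Heading: ' . $heading;
        }
    }
    if ($component instanceof DemoPrimaryCopy) {
        $copy = trim((string) $component->getPrimaryCopy());
        if ($copy !== '') {
            $lines[] = 'Body: ' . $copy;
        }
    }
    return implode("\n", $lines);
}
```

A component that implements only the heading interface yields a single Heading line, which is why the interfaces should match what the template actually shows.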
Pattern 2: Skippable Component (No Embedding)
When to use: Ads, navigation, forms, decorative/visual-only elements
Required Interface
use Vendi\Theme\ComponentInterfaces\ComponentEmbeddingSkipAwareInterface;
Class Implementation
class ad_component extends VendiComponent implements ComponentEmbeddingSkipAwareInterface
{
// No getEmbedding() method needed
// Component completely ignored during embedding generation
}
Key Points:
- Empty marker interface
- No embedding logic required
- Still add template boilerplate (see Template Requirements)
Pattern 3: Complex Component with Sections
When to use: Component has repeater or flexible content fields where each item should be a separate section
Required Interfaces
use Vendi\Theme\ComponentInterfaces\ComponentEmbeddingAwareInterface;
use Vendi\Theme\ComponentInterfaces\PrimaryHeadingInterface;
use Vendi\Theme\ComponentInterfaces\PrimaryCopyInterface;
use Vendi\Theme\DTO\Embedding\ComponentEmbedding;
use Vendi\Theme\DTO\Embedding\ComponentEmbeddingInterface;
Class Implementation
public function getEmbedding(): ?ComponentEmbeddingInterface
{
// Start with base embedding (auto-extracts heading/body from interfaces)
$ret = ComponentEmbedding::fromComponent($this);
// Loop through repeater field
while (have_rows('items')) {
the_row();
$layout = get_row_layout();
// CRITICAL: Filter to relevant layouts only
if (!in_array($layout, ['content_item', 'text_block'], true)) {
continue;
}
$heading = get_sub_field('heading');
$copy = get_sub_field('copy');
// CRITICAL: Always clean HTML from user content
$cleanCopy = ComponentEmbedding::stripAllHtmlFromText($copy);
// Add section with optional custom label
$ret->addSection(
$heading . PHP_EOL . $cleanCopy,
'Section' // Optional: 'FAQ Item', 'Testimonial', etc.
);
}
return $ret;
}
Output
Heading: [component main heading]
Body: [component intro copy]
Section 1: Item 1 Heading
[item 1 copy]
Section 2: Item 2 Heading
[item 2 copy]
Key Points:
- Filter layouts to process only relevant types
- Use stripAllHtmlFromText() for all HTML content
- Each addSection() creates a separate sub-chunk
- Sections are auto-numbered (Section 1, Section 2, etc.)
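The auto-numbering behaviour can be pictured with a small stand-alone class. This is a simplified sketch, not ComponentEmbedding itself: DemoSectionList is a hypothetical name, and the choice of one running counter shared across all labels is an assumption based on the sample outputs in this document.

```php
<?php
declare(strict_types=1);

/** Hypothetical, simplified stand-in for the section handling in ComponentEmbedding. */
final class DemoSectionList
{
    /** @var list<string> */
    private array $sections = [];

    public function addSection(string $text, string $sectionLabel = 'Section'): void
    {
        // Assumption: a single running counter, regardless of label.
        $number = count($this->sections) + 1;
        $this->sections[] = sprintf('%s %d: %s', $sectionLabel, $number, trim($text));
    }

    public function toText(): string
    {
        return implode("\n", $this->sections);
    }
}

$list = new DemoSectionList();
$list->addSection("Item 1 Heading\n[item 1 copy]");
$list->addSection("Name: John Doe\nBio: Follow me on Twitter", 'Person');
$text = $list->toText();
```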
Pattern 4: Component with Links/CTAs
When to use: Component has call-to-action buttons or links that should be tracked in metadata
Class Implementation
public function getEmbedding(): ?ComponentEmbeddingInterface
{
$ret = ComponentEmbedding::fromComponent($this);
while (have_rows('cards')) {
the_row();
$heading = get_sub_field('heading');
$copy = get_sub_field('copy');
$link = get_sub_field('cta');
// Build structured content with labels
$contentParts = [];
if ($heading) {
$contentParts[] = 'Heading: ' . $heading;
}
if ($copy) {
$contentParts[] = 'Body: ' . ComponentEmbedding::stripAllHtmlFromText($copy);
}
if ($link && is_array($link)) {
$contentParts[] = 'Link: ' . $link['title'];
}
// Only add section if there's content
if ($content = implode(PHP_EOL, array_filter($contentParts))) {
$ret->addSection($content);
}
// CRITICAL: Track link separately in metadata
if ($link && is_array($link)) {
$ret->addLink(
linkText: $link['title'] ?? '',
linkUrl: $link['url'] ?? ''
);
}
}
return $ret;
}
Output
{
"content": "Heading: Component Title\nBody: Intro text\nLinks: Card 1 CTA, Card 2 CTA\nSection 1:\nHeading: Card 1\nBody: Card 1 copy\nLink: Card 1 CTA",
"metadata": {
"links": [
{
"text": "Card 1 CTA",
"url": "/page1/"
},
{
"text": "Card 2 CTA",
"url": "/page2/"
}
],
"component_type": "card_navigation"
}
}
Key Points:
- Links appear in both content text and metadata
- Metadata links enable advanced RAG features
- Use structured content with labels (Heading:, Body:, Link:)
- Filter empty content before adding sections
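The "labels plus empty filtering" idea can be isolated into a small helper. The sketch below is illustrative only: demo_card_content is a hypothetical function, strip_tags() stands in for the theme's stripAllHtmlFromText(), and the link argument is assumed to follow the array shape used in this document ('title' and 'url' keys).

```php
<?php
declare(strict_types=1);

/**
 * Builds labelled section text from optional parts, dropping empty ones.
 * Returns an empty string when nothing usable was supplied.
 */
function demo_card_content(?string $heading, ?string $copy, mixed $link): string
{
    $parts = [
        'Heading' => trim((string) $heading),
        // strip_tags() is a simple stand-in for stripAllHtmlFromText().
        'Body'    => trim(strip_tags((string) $copy)),
        'Link'    => is_array($link) ? trim((string) ($link['title'] ?? '')) : '',
    ];

    $lines = [];
    foreach ($parts as $label => $value) {
        if ($value !== '') {
            $lines[] = $label . ': ' . $value;
        }
    }
    return implode(PHP_EOL, $lines);
}
```

Because the helper returns an empty string when every part is empty, the caller can guard addSection() with a single truthiness check.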
Pattern 5: Component with HTML Content Containing Links
When to use: Component has HTML content (bios, articles, descriptions) with embedded <a> tags that should be tracked
Class Implementation
public function getEmbedding(): ?ComponentEmbeddingInterface
{
$ret = ComponentEmbedding::fromComponent($this);
while (have_rows('items')) {
the_row();
$name = get_sub_field('name');
$bio = get_sub_field('bio'); // Contains HTML with links
// CRITICAL: Extract links BEFORE stripping HTML
// Use name as prefix for context
ComponentEmbedding::extractAndAddLinksFromHtml($ret, $bio, $name);
// Now strip HTML for text content
$cleanBio = ComponentEmbedding::stripAllHtmlFromText($bio);
$ret->addSection(
'Name: ' . $name . PHP_EOL . 'Bio: ' . $cleanBio,
'Person'
);
}
return $ret;
}
Output
If bio contains: <p>Follow me on <a href="https://twitter.com/jdoe">Twitter</a></p>
{
"content": "Person 1: Name: John Doe\nBio: Follow me on Twitter",
"metadata": {
"links": [
{"text": "John Doe Twitter", "url": "https://twitter.com/jdoe"}
]
}
}
Key Points:
- Call extractAndAddLinksFromHtml() BEFORE stripAllHtmlFromText()
- Use contextual prefix (name, title, etc.) to avoid duplicate generic link text
- Links preserved in metadata even after HTML is stripped from content
Pattern 6: Component with Related Posts
When to use: Component displays content from related WP_Post objects (testimonials, people, etc.)
Class Implementation
public function getEmbedding(): ?ComponentEmbeddingInterface
{
$ret = ComponentEmbedding::fromComponent($this);
foreach ($this->getRelatedPosts() as $post) {
// CRITICAL: Validate post object before accessing fields
if (!$post instanceof WP_Post) {
continue;
}
$name = get_field('name', $post->ID);
$bio = get_field('bio', $post->ID);
// Clean HTML and add with custom section label
$ret->addSection(
$name . PHP_EOL . ComponentEmbedding::stripAllHtmlFromText($bio),
'Person' // Custom label: 'Testimonial', 'Team Member', etc.
);
}
return $ret;
}
Key Points:
- Always check instanceof WP_Post before accessing post fields
- Access fields with post ID: get_field('field_name', $post->ID)
- Use descriptive section labels
Template File Requirements
CRITICAL: Every embeddable component template must include this boilerplate at the top.
Required Boilerplate
<?php
use Vendi\Theme\Component\{component_name};
use Vendi\Theme\ComponentUtility;
use Vendi\Theme\Enums\RenderingContextEnum;
/** @var {component_name} $component */
$component = ComponentUtility::get_new_component_instance({component_name}::class);
// CRITICAL: Early return for embedding context
if (defined('VENDI_RENDER_CONTEXT') && VENDI_RENDER_CONTEXT === RenderingContextEnum::EMBEDDING->value) {
return $component;
}
if (!$component->renderComponentWrapperStart()) {
return;
}
?>
<!-- HTML template here -->
<?php
$component->renderComponentWrapperEnd();
Why This Matters
Without the embedding context check:
- Template will render HTML instead of returning component instance
- getEmbedding() method will never be called
- Component will be skipped in embedding output
This boilerplate is required even for skippable components (for consistency).
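The early return works because of a core PHP behaviour: a return statement inside an included file stops that file and becomes the value of the include expression. The sketch below demonstrates just that mechanism with a throwaway file; it assumes nothing about how vendi_load_component_v3() is actually written, and the array stands in for a component instance.

```php
<?php
declare(strict_types=1);

// Write a throwaway "template" that returns a value before its HTML.
$templatePath = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents(
    $templatePath,
    '<?php return ["heading" => "Ask a Researcher"]; ?><h2>never reached</h2>'
);

ob_start();
$returned = include $templatePath; // include evaluates to the file's return value
$html = ob_get_clean();            // nothing after the return was printed
unlink($templatePath);
```

Here $returned holds the returned array and $html is an empty string, which mirrors a template handing back its component without rendering.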
Key Methods & Utilities
ComponentEmbedding Static Factory
fromComponent($this)
Purpose: Create base embedding with auto-extraction
Auto-extracts:
- Component type (class short name)
- Post ID and URL
- Creation and modification dates
- Primary heading (if PrimaryHeadingInterface implemented - based on template inspection)
- Primary body copy (if PrimaryCopyInterface implemented - based on template inspection)
Usage: Always first line of getEmbedding()
public function getEmbedding(): ?ComponentEmbeddingInterface
{
$ret = ComponentEmbedding::fromComponent($this);
// ... add sections, links, etc.
return $ret;
}
Note: The heading and body auto-extraction only works if you've implemented the corresponding interfaces based on what actually exists in the template (see Pattern 1 for details).
Content Building Methods
addSection(string $text, string $sectionLabel = 'Section')
Adds a labeled section to the embedding. Sections are auto-numbered (Section 1, Section 2, etc.).
Best Practice: Use descriptive labels
// Good: Descriptive
$ret->addSection($content, 'Testimonial');
$ret->addSection($content, 'FAQ Item');
$ret->addSection($content, 'Team Member');
// Acceptable: Default auto-numbering
$ret->addSection($content); // "Section 1", "Section 2", etc.
addLink(string $linkText, string $linkUrl)
Adds a link to metadata. Links stored separately from content text for advanced RAG features.
if ($link && is_array($link)) {
$ret->addLink(
linkText: $link['title'] ?? '',
linkUrl: $link['url'] ?? ''
);
}
extractAndAddLinksFromHtml(ComponentEmbedding $embedding, ?string $html, string $linkPrefix = '')
Purpose: Extracts all <a> tags from HTML content and adds them to the embedding's link metadata.
When to use: When content contains HTML with embedded links that should be tracked separately (e.g., biographical text with social media links, articles with reference links).
Parameters:
- $embedding - The ComponentEmbedding instance to add links to
- $html - HTML content to parse for links
- $linkPrefix - Optional prefix to add context to link text (e.g., person name)
Features:
- Uses DOMDocument for reliable HTML parsing
- Extracts both href and link text
- Filters out links missing href or text
- Adds contextual prefix when provided (useful for avoiding duplicate generic link text)
Usage:
// Basic usage - extract links from HTML
ComponentEmbedding::extractAndAddLinksFromHtml($ret, $htmlContent);
// With prefix for context (recommended when looping through items)
foreach ($persons as $person) {
$name = $person->name;
$bio = $person->bio; // Contains <a href="...">Twitter</a>, <a href="...">LinkedIn</a>
// Prefix links with person name: "John Doe Twitter", "John Doe LinkedIn"
ComponentEmbedding::extractAndAddLinksFromHtml($ret, $bio, $name);
// Clean HTML after extracting links
$cleanBio = ComponentEmbedding::stripAllHtmlFromText($bio);
$ret->addSection("Name: $name\nBio: $cleanBio", 'Person');
}
Why use linkPrefix: Without a prefix, 20 people with Twitter links produce 20 identical "Twitter" entries. With a prefix, you get "Chris Haas Twitter", "Jane Smith Twitter", etc., providing essential context.
Important: Call extractAndAddLinksFromHtml() BEFORE stripAllHtmlFromText() to preserve the links before HTML is removed.
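For readers who want to see the technique in isolation, the following sketch extracts links with DOMDocument and applies an optional prefix. It is an approximation of the behaviour described above, not the theme's code: demo_extract_links is a hypothetical function that returns a plain array instead of writing to a ComponentEmbedding instance.

```php
<?php
declare(strict_types=1);

/**
 * Returns [['text' => ..., 'url' => ...], ...] for every usable <a> in $html.
 * Anchors without an href or without text are skipped.
 */
function demo_extract_links(?string $html, string $linkPrefix = ''): array
{
    if ($html === null || trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep warnings for sloppy markup quiet
    // The XML encoding hint asks libxml to read the fragment as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $url = trim($anchor->getAttribute('href'));
        $text = trim($anchor->textContent);
        if ($url === '' || $text === '') {
            continue;
        }
        $links[] = [
            'text' => trim($linkPrefix . ' ' . $text),
            'url'  => $url,
        ];
    }
    return $links;
}
```

Running this on the stripped text instead of the original HTML would return no links at all, which is the reason for the ordering rule.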
HTML Cleaning Utility
stripAllHtmlFromText(?string $text, bool $preserveLists = false)
CRITICAL: Always use this for user-entered HTML content
Features:
- Removes <script>, <style>, <form> tags and HTML comments
- Strips all remaining HTML tags
- Decodes HTML entities (&amp; → &)
- Collapses whitespace
- Optional: Preserves list structure with proper formatting
Usage:
// DO THIS:
$cleanCopy = ComponentEmbedding::stripAllHtmlFromText($copy);
$ret->addSection($cleanCopy);
// NOT THIS:
$ret->addSection($copy); // May contain <div>, <p>, <br> tags
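As a rough picture of what such a cleaner does, the sketch below walks through the listed steps with standard PHP functions. It is an approximation under stated assumptions, not the theme's stripAllHtmlFromText(): demo_strip_html is a hypothetical name, the list of block-level tags is chosen for illustration, and the list-preserving option is omitted.

```php
<?php
declare(strict_types=1);

/** Approximates the cleaning steps: drop unsafe blocks, strip tags, decode, collapse. */
function demo_strip_html(?string $text): string
{
    if ($text === null) {
        return '';
    }
    // Remove script/style/form blocks together with their contents, plus HTML comments.
    $text = preg_replace('#<(script|style|form)\b[^>]*>.*?</\1>#is', ' ', $text) ?? '';
    $text = preg_replace('/<!--.*?-->/s', ' ', $text) ?? '';
    // Replace common block-level tags with a space so adjacent blocks do not fuse.
    $text = preg_replace('#</?(p|div|br|li|ul|ol|h[1-6]|tr|td|th|table|blockquote)\b[^>]*>#i', ' ', $text) ?? '';
    // Strip whatever tags remain (inline tags such as <a> or <strong>).
    $text = strip_tags($text);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace into single spaces.
    $text = preg_replace('/\s+/u', ' ', $text) ?? $text;
    return trim($text);
}
```

Decoding entities after stripping tags means text such as an escaped angle bracket survives as literal characters rather than being mistaken for markup.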
Best Practices
1. Avoid Loading Sub-Components
VERY IMPORTANT: Write extraction code directly in getEmbedding(). Do NOT load sub-components.
Strongly Preferred:
public function getEmbedding(): ?ComponentEmbeddingInterface
{
$ret = ComponentEmbedding::fromComponent($this);
// Write code directly - NO sub-component loading
while (have_rows('items')) {
the_row();
$ret->addSection(ComponentEmbedding::stripAllHtmlFromText(get_sub_field('copy')));
}
return $ret;
}
Avoid:
// DON'T load sub-components during embedding
vendi_load_component_v3(['parent', 'child']);
Why: The system hasn't found a good pattern for sub-component loading in embeddings yet. Keep it simple and direct.
2. Always Clean HTML from User Content
// CORRECT:
$cleanCopy = ComponentEmbedding::stripAllHtmlFromText($copy);
$ret->addSection($cleanCopy);
// WRONG:
$ret->addSection($copy); // HTML tags leak into embedding
3. Use Structured Content with Labels
Makes content more parseable by the RAG system:
$contentParts = [];
if ($heading) {
$contentParts[] = 'Heading: ' . $heading;
}
if ($subheading) {
$contentParts[] = 'Subheading: ' . $subheading;
}
if ($copy) {
$contentParts[] = 'Body: ' . ComponentEmbedding::stripAllHtmlFromText($copy);
}
if ($link) {
$contentParts[] = 'Link: ' . $link['title'];
}
$ret->addSection(implode(PHP_EOL, $contentParts));
Implementation Checklist
Step 1: Inspect Template File
- Read the component's template file (.php) to understand its structure
- Identify if there's a top-level heading (outside any loops) → Consider PrimaryHeadingInterface
- Identify if there's top-level body/intro copy (outside any loops) → Consider PrimaryCopyInterface
- Note any repeater fields that should become sections
- Note any links/CTAs that should be tracked in metadata
Step 2: Class File Changes
- Add use statements at top of file:
  use Vendi\Theme\ComponentInterfaces\ComponentEmbeddingAwareInterface;
  use Vendi\Theme\DTO\Embedding\ComponentEmbedding;
  use Vendi\Theme\DTO\Embedding\ComponentEmbeddingInterface;
- Only if template has top-level heading: Add interface use statement:
  use Vendi\Theme\ComponentInterfaces\PrimaryHeadingInterface;
- Only if template has top-level copy: Add interface use statement:
  use Vendi\Theme\ComponentInterfaces\PrimaryCopyInterface;
- Implement ComponentEmbeddingAwareInterface in class declaration
- Only if template has top-level heading: Implement PrimaryHeadingInterface
- Only if template has top-level copy: Implement PrimaryCopyInterface
- Add getEmbedding(): ?ComponentEmbeddingInterface method
- If using PrimaryHeadingInterface: Add getPrimaryHeadingText(): ?string returning the appropriate field
- If using PrimaryCopyInterface: Add getPrimaryCopy(): ?string returning the appropriate field
Step 3: Template File Changes
- Add use statement at top:
  use Vendi\Theme\Enums\RenderingContextEnum;
- Add embedding context check after component instantiation:
  if (defined('VENDI_RENDER_CONTEXT') && VENDI_RENDER_CONTEXT === RenderingContextEnum::EMBEDDING->value) { return $component; }
Step 4: getEmbedding() Implementation
- Start with $ret = ComponentEmbedding::fromComponent($this);
- Loop through any repeater/flexible content fields
- Filter layouts to relevant types only (in_array() check)
- Use stripAllHtmlFromText() for all HTML content
- Add sections with addSection() for each logical chunk
- Add links with addLink() if component has CTAs
- Validate WP_Post objects with instanceof before accessing fields
- Filter empty content before adding sections
- Write code directly (do NOT load sub-components)
- Return $ret
For Skippable Components Only
- Add use statement: use Vendi\Theme\ComponentInterfaces\ComponentEmbeddingSkipAwareInterface;
- Implement ComponentEmbeddingSkipAwareInterface in class declaration
- Do NOT implement ComponentEmbeddingAwareInterface
- Still add template boilerplate (for consistency)
- No getEmbedding() method needed
Testing
After implementation, test with the CLI command:
wp vendi embedding:generate
This command:
- Iterates through all published posts/pages
- Sets VENDI_RENDER_CONTEXT to EMBEDDING
- Loads each component
- Calls getEmbedding() on embeddable components
- Outputs structured JSON for Qdrant
Verify Output
Check the JSON output for:
- ✅ Component appears in embedding data
- ✅ Heading and body extracted correctly
- ✅ Sections appear as separate chunks (Section 1, Section 2, etc.)
- ✅ Links tracked in metadata
- ✅ HTML stripped from content (no <div>, <p>, <br> tags)
- ✅ Content is readable and well-structured
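Some of these checks can be automated. The sketch below validates one decoded chunk against the JSON shape shown in this document; demo_check_chunk is a hypothetical helper, the strip_tags() comparison is only a heuristic for leftover markup, and how the CLI command exposes its output (file, stdout, or otherwise) is not assumed here.

```php
<?php
declare(strict_types=1);

/** Returns a list of problems found in one decoded chunk; an empty list means it looks fine. */
function demo_check_chunk(array $chunk): array
{
    $problems = [];

    $content = $chunk['content'] ?? '';
    if (!is_string($content) || trim($content) === '') {
        $problems[] = 'content is empty';
    } elseif ($content !== strip_tags($content)) {
        $problems[] = 'content still contains HTML tags';
    }

    if (empty($chunk['metadata']['component_type'])) {
        $problems[] = 'metadata.component_type is missing';
    }

    foreach ($chunk['metadata']['links'] ?? [] as $i => $link) {
        if (empty($link['text']) || empty($link['url'])) {
            $problems[] = "link #$i is missing text or url";
        }
    }

    return $problems;
}

// Example input in the shape used throughout this document.
$json = '{"content":"Heading: Research Topics\nSection 1: Example","metadata":{"component_type":"accordion","links":[{"text":"Contact us","url":"https://example.com/contact/"}]},"id":"660-2"}';
$chunk = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
$problems = demo_check_chunk($chunk);
```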
Sample Output Format
{
"content": "Heading: Research Topics\nSection 1: AANA's Current Priorities\nWhat are healthcare executives' perceptions...",
"metadata": {
"type": "page",
"url": "https://example.com/page/",
"created": "2022-11-29T21:01:08+00:00",
"updated": "2024-03-07T09:07:06+00:00",
"component_type": "accordion"
},
"id": "660-2"
}
Common Pitfalls
- Forgetting to clean HTML: Always use stripAllHtmlFromText() on user content
- Loading sub-components: Write extraction code directly in getEmbedding()
- Missing template boilerplate: Component will render HTML instead of being embedded
- Not filtering layouts: Process only relevant flexible content layouts
- Not validating WP_Post: Check instanceof WP_Post before accessing post fields
- Adding empty sections: Filter content before calling addSection()
- Forgetting to return component: Template must return $component; in embedding context
- Extracting links after stripping HTML: Call extractAndAddLinksFromHtml() BEFORE stripAllHtmlFromText()
- Missing link context: Use linkPrefix parameter when looping through items to avoid duplicate generic link text
Reference Examples
Examine these components for real-world patterns:
- basic_copy_block - Simple: Single chunk with heading/body
- ad_row - Skippable: Marked with skip interface
- accordion - Complex: Multiple accordion items as sections
- card_navigation - Complex: Cards with CTAs tracked as links
- testimonial - Related Posts: WP_Post objects as sections with custom label
- people_image_grid - Complex: Loops through people, extracts links from bio HTML with name prefix, creates person sections
All located in: vendi-theme-parts/components/[component_name]/[component_name].class.php
Your Role
Guide the user through implementing embedding support for a v3 component:
- Read the template file: Inspect the actual .php template to understand structure
- Identify top-level content: Determine if component has top-level heading and/or copy (outside loops)
- Determine pattern: Is it simple, complex, skippable? Does it have repeaters? Links?
- Choose interfaces: Based on template inspection, decide which interfaces to implement
- Present implementation plan: Describe changes needed with specific field names from template
- Implement changes: Update class and template files
- Test: Run wp vendi embedding:generate and verify output
Remember:
- Always start by reading the template file - don't guess at structure
- Implement PrimaryHeadingInterface only if template has top-level heading (outside loops)
- Implement PrimaryCopyInterface only if template has top-level copy (outside loops)
- User handles git add and git commit - you should NOT run these
- Write embedding extraction code directly (avoid sub-component loading)
- Always clean HTML from user content
- Use structured content with labels for better RAG performance