Claude Code Plugins

Community-maintained marketplace

Feedback

Master feature stores - Feast, data validation, versioning, online/offline serving

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name feature-stores
version 2.0.0
sasmp_version 1.3.0
description Master feature stores - Feast, data validation, versioning, online/offline serving
bonded_agent 03-data-pipelines
bond_type PRIMARY_BOND
category data_engineering
difficulty intermediate_to_advanced
estimated_hours 35
prerequisites mlops-basics
validation [object Object]
observability [object Object]

Feature Stores Skill

Learn: Build production feature stores for ML systems.

Skill Overview

Attribute Value
Bonded Agent 03-data-pipelines
Difficulty Intermediate to Advanced
Duration 35 hours
Prerequisites mlops-basics

Learning Objectives

  1. Understand feature store architecture
  2. Implement features with Feast
  3. Validate data quality with Great Expectations
  4. Serve features online and offline
  5. Version datasets with DVC

Topics Covered

Module 1: Feature Store Architecture (8 hours)

Components:

┌─────────────────────────────────────────────────────────────┐
│                   FEATURE STORE ARCHITECTURE                 │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐     │
│  │ Offline     │    │  Feature    │    │  Online     │     │
│  │ Store       │───▶│  Registry   │◀───│  Store      │     │
│  │ (Parquet)   │    │  (Metadata) │    │  (Redis)    │     │
│  └─────────────┘    └─────────────┘    └─────────────┘     │
│        │                   │                   │            │
│        ▼                   ▼                   ▼            │
│   [Training]         [Discovery]        [Inference]        │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Exercises:

  • Design feature store for e-commerce use case
  • Compare Feast vs Tecton vs Hopsworks

Module 2: Feast Implementation (12 hours)

Feature Definition Example:

from feast import Entity, Feature, FeatureView, FileSource
from feast.types import Float32, Int64
from datetime import timedelta

# Entity definition
customer = Entity(
    name="customer_id",
    value_type=ValueType.INT64,
    description="Customer identifier"
)

# Feature view
customer_features = FeatureView(
    name="customer_features",
    entities=["customer_id"],
    ttl=timedelta(days=7),
    schema=[
        Feature(name="total_purchases", dtype=Float32),
        Feature(name="avg_order_value", dtype=Float32),
        Feature(name="days_since_last_order", dtype=Int64),
    ],
    online=True,
    source=customer_stats_source
)

Exercises:

  • Set up Feast repository locally
  • Create entity and feature views
  • Materialize features to online store
  • Retrieve features for training and inference

Module 3: Data Validation (8 hours)

Great Expectations Setup:

import great_expectations as gx

# Create validation suite
suite = context.add_expectation_suite("ml_data_validation")

# Add expectations
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToNotBeNull(
        column="target",
        mostly=0.99
    )
)

suite.add_expectation(
    gx.expectations.ExpectColumnMeanToBeBetween(
        column="feature_a",
        min_value=0.0,
        max_value=100.0
    )
)

Module 4: Data Versioning (7 hours)

DVC Workflow:

# Initialize DVC
dvc init

# Add data to tracking
dvc add data/training_data.parquet

# Push to remote storage
dvc push

# Checkout specific version
git checkout v1.0.0
dvc checkout

Code Templates

Template: Feature Engineering Pipeline

# templates/feature_pipeline.py
from sklearn.base import BaseEstimator, TransformerMixin
import pandas as pd

class FeaturePipeline(BaseEstimator, TransformerMixin):
    """Production feature engineering pipeline."""

    def __init__(self, config: dict):
        self.config = config
        self.feature_names = []

    def fit(self, X: pd.DataFrame, y=None):
        """Learn feature statistics."""
        self.means = X.select_dtypes(include=['number']).mean()
        self.stds = X.select_dtypes(include=['number']).std()
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        """Apply feature transformations."""
        X = X.copy()

        # Numerical normalization
        for col in X.select_dtypes(include=['number']).columns:
            X[f"{col}_normalized"] = (X[col] - self.means[col]) / self.stds[col]

        # Temporal features
        for col in self.config.get("datetime_columns", []):
            X[f"{col}_hour"] = pd.to_datetime(X[col]).dt.hour
            X[f"{col}_dow"] = pd.to_datetime(X[col]).dt.dayofweek

        return X

Troubleshooting Guide

Issue Cause Solution
Slow feature serving Online store bottleneck Scale Redis, add caching
Training-serving skew Different transformations Use unified feature pipeline
Stale features Materialization lag Increase refresh frequency

Resources


Version History

Version Date Changes
2.0.0 2024-12 Production-grade with Feast examples
1.0.0 2024-11 Initial release