name	feature-stores
version	2.0.0
sasmp_version	1.3.0
description	Master feature stores - Feast, data validation, versioning, online/offline serving
bonded_agent	03-data-pipelines
bond_type	PRIMARY_BOND
category	data_engineering
difficulty	intermediate_to_advanced
estimated_hours	35
prerequisites	mlops-basics
validation	[object Object]
observability	[object Object]

Feature Stores Skill

Name: feature-stores
Author: pluginagentmarketplace

Learn: Build production feature stores for ML systems.

Skill Overview

Attribute	Value
Bonded Agent	03-data-pipelines
Difficulty	Intermediate to Advanced
Duration	35 hours
Prerequisites	mlops-basics

Learning Objectives

Understand feature store architecture
Implement features with Feast
Validate data quality with Great Expectations
Serve features online and offline
Version datasets with DVC

Topics Covered

Module 1: Feature Store Architecture (8 hours)

Components:

┌─────────────────────────────────────────────────────────────┐
│                   FEATURE STORE ARCHITECTURE                 │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐     │
│  │ Offline     │    │  Feature    │    │  Online     │     │
│  │ Store       │───▶│  Registry   │◀───│  Store      │     │
│  │ (Parquet)   │    │  (Metadata) │    │  (Redis)    │     │
│  └─────────────┘    └─────────────┘    └─────────────┘     │
│        │                   │                   │            │
│        ▼                   ▼                   ▼            │
│   [Training]         [Discovery]        [Inference]        │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Exercises:

Design feature store for e-commerce use case
Compare Feast vs Tecton vs Hopsworks

Module 2: Feast Implementation (12 hours)

Feature Definition Example:

from feast import Entity, Feature, FeatureView, FileSource
from feast.types import Float32, Int64
from datetime import timedelta

# Entity definition
customer = Entity(
    name="customer_id",
    value_type=ValueType.INT64,
    description="Customer identifier"
)

# Feature view
customer_features = FeatureView(
    name="customer_features",
    entities=["customer_id"],
    ttl=timedelta(days=7),
    schema=[
        Feature(name="total_purchases", dtype=Float32),
        Feature(name="avg_order_value", dtype=Float32),
        Feature(name="days_since_last_order", dtype=Int64),
    ],
    online=True,
    source=customer_stats_source
)

Exercises:

Set up Feast repository locally
Create entity and feature views
Materialize features to online store
Retrieve features for training and inference

Module 3: Data Validation (8 hours)

Great Expectations Setup:

import great_expectations as gx

# Create validation suite
suite = context.add_expectation_suite("ml_data_validation")

# Add expectations
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToNotBeNull(
        column="target",
        mostly=0.99
    )
)

suite.add_expectation(
    gx.expectations.ExpectColumnMeanToBeBetween(
        column="feature_a",
        min_value=0.0,
        max_value=100.0
    )
)

Module 4: Data Versioning (7 hours)

DVC Workflow:

# Initialize DVC
dvc init

# Add data to tracking
dvc add data/training_data.parquet

# Push to remote storage
dvc push

# Checkout specific version
git checkout v1.0.0
dvc checkout

Code Templates

Template: Feature Engineering Pipeline

# templates/feature_pipeline.py
from sklearn.base import BaseEstimator, TransformerMixin
import pandas as pd

class FeaturePipeline(BaseEstimator, TransformerMixin):
    """Production feature engineering pipeline."""

    def __init__(self, config: dict):
        self.config = config
        self.feature_names = []

    def fit(self, X: pd.DataFrame, y=None):
        """Learn feature statistics."""
        self.means = X.select_dtypes(include=['number']).mean()
        self.stds = X.select_dtypes(include=['number']).std()
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        """Apply feature transformations."""
        X = X.copy()

        # Numerical normalization
        for col in X.select_dtypes(include=['number']).columns:
            X[f"{col}_normalized"] = (X[col] - self.means[col]) / self.stds[col]

        # Temporal features
        for col in self.config.get("datetime_columns", []):
            X[f"{col}_hour"] = pd.to_datetime(X[col]).dt.hour
            X[f"{col}_dow"] = pd.to_datetime(X[col]).dt.dayofweek

        return X

Troubleshooting Guide

Issue	Cause	Solution
Slow feature serving	Online store bottleneck	Scale Redis, add caching
Training-serving skew	Different transformations	Use unified feature pipeline
Stale features	Materialization lag	Increase refresh frequency

Resources

Feast Documentation
Great Expectations Docs
DVC Documentation
[See: training-pipelines] - Use features in training

Version History

Version	Date	Changes
2.0.0	2024-12	Production-grade with Feast examples
1.0.0	2024-11	Initial release

feature-stores

Install Skill

SKILL.md

Feature Stores Skill

Skill Overview

Learning Objectives

Topics Covered

Module 1: Feature Store Architecture (8 hours)

Module 2: Feast Implementation (12 hours)

Module 3: Data Validation (8 hours)

Module 4: Data Versioning (7 hours)

Code Templates

Template: Feature Engineering Pipeline

Troubleshooting Guide

Resources

Version History