| name | ff-statistical-methods |
| description | Expert guidance on statistical analysis methodologies and Monte Carlo simulation for fantasy football. Use this skill when selecting regression approaches, designing simulations, performing variance analysis, or conducting hypothesis tests. Covers regression types (OLS, Ridge, Lasso, GAMs), Monte Carlo frameworks, regression-to-mean analysis, and statistical best practices for player performance modeling. |
Statistical Analysis & Simulation for Fantasy Football
Overview
Provide expert guidance on statistical methodologies and simulation techniques for fantasy football analytics. Apply appropriate regression methods, design Monte Carlo simulations, perform variance analysis, and conduct hypothesis tests using research-backed approaches.
When to Use This Skill
Trigger this skill for queries involving:
- Regression selection: "Should I use OLS or Lasso?" "When to use GAMs?" "Ridge vs Elastic Net?"
- Monte Carlo simulation: "How do I simulate rest-of-season?" "Calculate championship probability?" "Quantify trade impact?"
- Variance analysis: "Identify regression-to-mean candidates?" "Calculate confidence intervals?" "Prediction intervals?"
- Statistical testing: "Hypothesis test for performance trends?" "Is this improvement significant?"
- Aging curves: "Model non-linear age effects?" "GAMs for position-specific curves?"
- Uncertainty quantification: "Error bars on projections?" "Probability of outcomes?"
Note: For ML model selection and feature engineering, use ff-ml-modeling. For dynasty strategy domain knowledge, use ff-dynasty-strategy.
Core Capabilities
1. Regression Methods
Decision Framework:
Linear Regression (OLS): Baseline, interpretability, small samples
Ridge (L2): Multicollinearity, keep all features, shrink coefficients
Lasso (L1): High-dimensional data, automatic feature selection, sparse models
Elastic Net: Best default for fantasy (combines Ridge + Lasso)
GAMs: Non-linear relationships (aging curves), interpretable smooth functions
Reference: references/regression_methods.md for detailed comparisons and Python code.
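As a rough illustration of the Ridge entry in the framework above, the sketch below fits OLS and Ridge in closed form on synthetic data with two nearly collinear features. The feature names (`targets`, `routes`), the penalty `alpha=5.0`, and the data itself are assumptions made up for this example and do not come from the reference file.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares via a least-squares solve (data assumed centered)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def fit_ridge(X, y, alpha):
    """Closed-form ridge estimate: (X'X + alpha * I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
n = 60
targets = rng.normal(size=n)                        # hypothetical usage feature
routes = targets + rng.normal(scale=0.05, size=n)   # nearly collinear with targets
X = np.column_stack([targets, routes])
X = X - X.mean(axis=0)                              # center so no intercept is needed
y = 3.0 * targets + rng.normal(scale=1.0, size=n)
y = y - y.mean()

ols_coef = fit_ols(X, y)
ridge_coef = fit_ridge(X, y, alpha=5.0)
print("OLS coefficients:  ", ols_coef.round(2))
print("Ridge coefficients:", ridge_coef.round(2))
```

With collinear inputs the OLS coefficients tend to be large and unstable, while the ridge penalty pulls them toward smaller, more similar values. In practice `sklearn.linear_model.Ridge` would normally replace the hand-written solve.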
2. Monte Carlo Simulation
Applications:
- Rest-of-season projections with uncertainty
- Championship probability estimation
- Trade scenario impact analysis
- Lineup optimization under uncertainty
Core Approach:
# Simulate player week: Normal(projection, std_dev)
import numpy as np
simulated_points = np.random.normal(projection, std_dev, size=10000)  # 10,000 simulated weeks
simulated_points = np.maximum(simulated_points, 0)  # Floor at zero
Key Considerations:
- Iterations: 10,000 default (SE ≈ 0.5 percentage points for a probability near 50%), 100,000 for critical decisions
- Error correlation: a QB and his WRs have correlated scores; draw them jointly rather than as independent players
- Path dependence: Update team ratings within simulations (FiveThirtyEight approach)
- Flaw of averages: Analyze full distribution, not just mean
Reference: references/simulation_design.md for frameworks, templates, and best practices.
Asset: assets/monte_carlo_template.py - Python templates for common simulations.
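The iteration guidance above follows from the binomial standard error of an estimated probability, sqrt(p(1 - p) / n). A minimal check, assuming the quantity being estimated is a probability near 50% (the worst case); the helper name `mc_standard_error` is hypothetical:

```python
import math

def mc_standard_error(p, n_sims):
    """Standard error of a Monte Carlo probability estimate: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1.0 - p) / n_sims)

for n_sims in (1_000, 10_000, 100_000):
    se = mc_standard_error(0.5, n_sims)
    print(f"{n_sims:>7} simulations: SE = {se:.4f}")
```

Because the error shrinks with the square root of the iteration count, going from 10,000 to 100,000 simulations cuts the standard error by only about a factor of three.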
3. Regression to the Mean
Concept: Extreme values tend toward average in subsequent measurements
Fantasy Application:
- Positive TDOE (Touchdowns Over Expected): declines the following year about 86% of the time
- Negative TDOE: improves the following year about 93% of the time
- High TD rates regress downward, low TD rates improve
Position-Specific Sample Sizes (50% regression):
- QB: 21 games
- RB: 29-30 games
- WR: 13-14 games
- TE: ~20 games
Implementation:
regression_factor = sample_size / (sample_size + n_50[position])
regressed_estimate = (regression_factor * current_stat) + ((1 - regression_factor) * position_mean)
Reference: references/regression_methods.md section on regression to the mean.
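A small worked version of the formula above, as a sketch. The `N_50` table reuses the sample sizes listed in this section, with midpoints for the RB and WR ranges (an assumption of this example); the player and the position mean are hypothetical.

```python
# Games needed for 50% regression, from the list above.
# Midpoints are used for the RB (29-30) and WR (13-14) ranges: an assumption.
N_50 = {"QB": 21, "RB": 29.5, "WR": 13.5, "TE": 20}

def regress_to_mean(current_stat, sample_size, position, position_mean):
    """Shrink an observed stat toward the position mean based on sample size."""
    weight = sample_size / (sample_size + N_50[position])
    return weight * current_stat + (1 - weight) * position_mean

# Hypothetical WR: 10% TD rate over 14 games, against a 5% position mean.
estimate = regress_to_mean(0.10, 14, "WR", 0.05)
print(f"Regressed TD rate: {estimate:.3f}")
```

At exactly `N_50` games the observed stat and the position mean receive equal weight, which is what "50% regression" means here.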
4. Confidence vs Prediction Intervals
Confidence Interval: Uncertainty in estimated mean (narrow)
Prediction Interval: Uncertainty for new observation (wider - use this for player projections!)
Why it matters: Individual player performance has more variability than average performance
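# Prediction interval accounts for both parameter uncertainty AND individual variance
# leverage = x0' (X'X)^-1 x0 for the new observation x0 (the parameter-uncertainty term)
se_pred = residual_standard_error * (1 + leverage) ** 0.5
margin = t_score * se_pred
lower, upper = prediction - margin, prediction + margin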
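To make the width difference concrete, the sketch below computes both half-widths for a simple one-feature regression using the textbook formulas. The data are synthetic, the feature (targets per game) is a hypothetical choice, and the t critical value is hard-coded for 18 degrees of freedom to keep the example dependency-light; `scipy.stats.t.ppf` would normally supply it.

```python
import numpy as np

def interval_half_widths(x, y, x0, t_score):
    """Return (confidence, prediction) half-widths at x0 for simple linear regression."""
    n = len(x)
    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    slope = np.sum((x - x_bar) * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x_bar
    residuals = y - (intercept + slope * x)
    s = np.sqrt(np.sum(residuals ** 2) / (n - 2))    # residual standard error
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / sxx     # parameter-uncertainty term
    ci = t_score * s * np.sqrt(leverage)             # uncertainty in the mean
    pi = t_score * s * np.sqrt(1.0 + leverage)       # uncertainty for a new observation
    return ci, pi

rng = np.random.default_rng(1)
x = rng.uniform(4, 12, size=20)                      # hypothetical targets per game
y = 1.5 * x + rng.normal(scale=3.0, size=20)         # synthetic fantasy points
T_975_DF18 = 2.101                                   # 95% two-sided t value, 18 df
ci, pi = interval_half_widths(x, y, x0=8.0, t_score=T_975_DF18)
print(f"CI half-width: {ci:.2f}, PI half-width: {pi:.2f}")
```

The prediction interval is always the wider of the two because it adds the full residual variance on top of the uncertainty in the fitted line.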
5. Generalized Additive Models (GAMs)
When to use: Non-linear relationships like aging curves
How it works: Fit smooth spline for each feature: y = β₀ + f₁(age) + f₂(experience) + ...
Fantasy use cases:
- Aging curves (inverted-U shapes for position-specific performance)
- Experience effects on production
- Visualize smooth trends
Research finding: GAMs reveal QB peaks at 28-33, RB declines post-27
Python:
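from pygam import LinearGAM, s, f
import matplotlib.pyplot as plt
# s() = smooth (non-linear), f() = factor (categorical, integer-encoded)
gam = LinearGAM(s(0) + s(1) + f(2))  # columns: age, experience, position code
gam.fit(X_train, y_train)
# Visualize the age curve with a 95% band
XX = gam.generate_X_grid(term=0)
pdep, confi = gam.partial_dependence(term=0, X=XX, width=0.95)
plt.plot(XX[:, 0], pdep)
plt.plot(XX[:, 0], confi, linestyle="--")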
Reference: references/regression_methods.md section on GAMs with Python and R code.
Workflows
Choosing a Regression Method
Step 1: Define Goal
- Interpretation needed? → OLS or GAMs
- Prediction focus? → Consider regularization
Step 2: Check Data Characteristics
- Small sample (<100)? → OLS or Ridge
- High-dimensional (many features)? → Lasso or Elastic Net
- Multicollinearity (VIF > 5)? → Ridge or Elastic Net
- Non-linear patterns? → GAMs
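The VIF check in Step 2 can be computed directly: regress each feature on the others and take 1 / (1 - R²). The sketch below is a minimal NumPy version on synthetic data; the feature names are hypothetical, and `statsmodels` also ships a ready-made `variance_inflation_factor`.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the rest."""
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid.var() / target.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

rng = np.random.default_rng(3)
n = 200
targets = rng.normal(size=n)                       # hypothetical feature
routes = targets + rng.normal(scale=0.2, size=n)   # strongly tied to targets
age = rng.normal(size=n)                           # unrelated feature
X = np.column_stack([targets, routes, age])

vifs = variance_inflation_factors(X)
print(dict(zip(["targets", "routes", "age"], vifs.round(1))))
```

Features whose VIF exceeds the threshold of 5 used in this workflow are the ones that point toward Ridge or Elastic Net.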
Step 3: Baseline
- Always start with OLS to establish a baseline that more complex models must beat
Step 4: Regularization
- If overfitting, try Ridge/Lasso/Elastic Net
- Use cross-validation to select regularization strength
Step 5: Non-linearity
- If residuals show patterns, consider GAMs
- Particularly for aging curves
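Step 4's "use cross-validation to select regularization strength" might look like the following sketch: a hand-rolled k-fold loop over a small grid of ridge penalties. The data, the grid of `alphas`, and the fold count are all assumptions for illustration; `sklearn.linear_model.RidgeCV` or `ElasticNetCV` would be the usual tools.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge coefficients (no intercept; data assumed roughly centered)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def cv_mse(X, y, alpha, n_folds=5, seed=0):
    """Mean squared error of ridge, averaged over shuffled k-fold splits."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[i] for i in range(n_folds) if i != k])
        coef = ridge_fit(X[train], y[train], alpha)
        errors.append(np.mean((y[test] - X[test] @ coef) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(42)
X = rng.normal(size=(80, 30))              # many features relative to samples
true_coef = np.zeros(30)
true_coef[:3] = [2.0, -1.0, 0.5]           # only three features actually matter
y = X @ true_coef + rng.normal(scale=2.0, size=80)

alphas = [0.0, 0.1, 1.0, 10.0, 100.0]      # alpha = 0.0 is plain OLS
scores = {alpha: cv_mse(X, y, alpha) for alpha in alphas}
best_alpha = min(scores, key=scores.get)
print("CV MSE by alpha:", {a: round(s, 2) for a, s in scores.items()})
print("Selected alpha:", best_alpha)
```

Including `alpha = 0.0` in the grid keeps the OLS baseline from Step 3 in the same comparison, so regularization is adopted only if it lowers out-of-sample error.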
Designing a Monte Carlo Simulation
Step 1: Define Scenario
- What are you simulating? (Rest-of-season, trade impact, championship probability)
- What's the time horizon? (Weeks remaining)
Step 2: Gather Inputs
- Player projections (expected values)
- Standard deviations (from historical performance or model residuals)
- Correlations (QB-WR pairs, teammates)
Step 3: Build Simulation
- Use assets/monte_carlo_template.py as starting point
- Implement correlated errors for teammates
- Consider path dependence if multi-week
Step 4: Run Simulations
- 10,000 iterations default
- 100,000 for final decisions
Step 5: Analyze Distribution
- Don't just report mean!
- Show percentiles (10th, 50th, 90th)
- Probability of exceeding thresholds
- Visualize histograms and CDFs
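The five steps above can be sketched for a single QB-WR stack as follows. This is not the contents of `assets/monte_carlo_template.py`; the function name `simulate_stack`, the projections, standard deviations, and the 40-point threshold are all hypothetical, and the correlation of 0.6 is simply an illustrative value.

```python
import numpy as np

def simulate_stack(means, std_devs, corr, n_sims=10_000, seed=0):
    """Draw correlated weekly scores for a group of players, floored at zero."""
    means = np.asarray(means, dtype=float)
    std_devs = np.asarray(std_devs, dtype=float)
    # Covariance = correlation scaled by each pair of standard deviations.
    cov = np.outer(std_devs, std_devs) * np.asarray(corr, dtype=float)
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(means, cov, size=n_sims)
    return np.maximum(draws, 0.0)

# Steps 1-2: hypothetical QB and WR projections with a teammate correlation.
corr = [[1.0, 0.6], [0.6, 1.0]]
scores = simulate_stack(means=[18.0, 13.0], std_devs=[7.0, 6.0], corr=corr)

# Step 5: summarize the distribution of the combined score, not just its mean.
total = scores.sum(axis=1)
p10, p50, p90 = np.percentile(total, [10, 50, 90])
prob_over_40 = np.mean(total > 40.0)
print(f"10th/50th/90th percentiles: {p10:.1f} / {p50:.1f} / {p90:.1f}")
print(f"P(stack > 40 points): {prob_over_40:.3f}")
```

Setting the off-diagonal correlation to zero and rerunning shows how much the independence assumption narrows the spread of the combined score.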
Analyzing Regression to the Mean
Step 1: Identify Extreme Performers
- Find players with unusually high/low TD rates
- Calculate TDOE (Touchdowns Over Expected)
Step 2: Check Sample Size
- How many games/opportunities?
- Compare to position-specific threshold (QB: 21, RB: 30, WR: 14, TE: ~20)
Step 3: Apply Regression Formula
regression_factor = n / (n + n_50)
regressed = (regression_factor * current) + ((1 - regression_factor) * mean)
Step 4: Identify Buy-Low / Sell-High
- Positive TDOE → Likely to regress down (sell high)
- Negative TDOE → Likely to improve (buy low)
- Volume matters more than TDs!
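Steps 1 and 4 of this workflow can be sketched as a simple labeling function. TDOE is computed as actual minus expected touchdowns; the expected values are assumed to come from some external expected-TD model, and the `threshold` of 2.0 touchdowns, the `PlayerSeason` class, and the players are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PlayerSeason:
    name: str
    touchdowns: float
    expected_touchdowns: float  # assumed output of an external expected-TD model

def classify_tdoe(player, threshold=2.0):
    """Return (TDOE, label), where TDOE = actual TDs - expected TDs."""
    tdoe = player.touchdowns - player.expected_touchdowns
    if tdoe >= threshold:
        return tdoe, "sell high"   # scoring above expectation, likely to regress down
    if tdoe <= -threshold:
        return tdoe, "buy low"     # scoring below expectation, likely to improve
    return tdoe, "hold"

players = [
    PlayerSeason("Player A", touchdowns=12, expected_touchdowns=7.5),
    PlayerSeason("Player B", touchdowns=3, expected_touchdowns=6.8),
    PlayerSeason("Player C", touchdowns=6, expected_touchdowns=5.5),
]
labels = {p.name: classify_tdoe(p) for p in players}
for name, (tdoe, label) in labels.items():
    print(f"{name}: TDOE {tdoe:+.1f} -> {label}")
```

The label only flags candidates; the sample-size check in Step 2 still determines how much weight the observed touchdown total deserves.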
Identifying Data Requirements
For Regression Analysis:
- Dependent variable (fantasy points, production metrics)
- Independent variables (age, usage, efficiency stats)
- Historical data (3+ years for robust estimates)
- Position labels (for position-specific models)
For Monte Carlo Simulation:
- Player projections (weekly expected values)
- Historical variance (to estimate std dev)
- Roster compositions (for team simulations)
- Correlation structures (teammate relationships)
For Variance Analysis:
- Historical performance distributions
- Sample sizes (games played, opportunities)
- Position-specific baselines (for regression to mean)
Integrating with Other Skills
Complement with ff-ml-modeling when:
- Choosing between regression types and tree-based models
- Feature engineering informs what to include in regression
- Validation strategies apply to statistical models too
Complement with ff-dynasty-strategy when:
- Identifying TD regression candidates (TDOE analysis)
- Applying aging curves to trade decisions
- Understanding domain context for statistical findings
Best Practices
Start Simple - OLS baseline before complex methods
Regularize for High Dimensions - Use Lasso/Elastic Net when features > samples
Use GAMs for Clear Non-Linearity - Aging curves, experience effects
Model Correlations in Simulations - QB and WRs are correlated (ρ ≈ 0.6)
Sufficient Iterations - 10,000 minimum for stable estimates
Analyze Full Distribution - Percentiles and probabilities, not just means
Validate Assumptions - Plot residuals, check for patterns
Regression to Mean is Powerful - TDs regress, volume is king
Common Pitfalls
Ignoring non-linearity - Age curves aren't linear, use GAMs
Too few simulations - <1,000 gives unstable estimates
Independence assumptions - Teammates are correlated
Flaw of averages - Non-linear outcomes make "average" misleading
Over-interpreting small samples - NFL has only 17 games/season
Forgetting regression to mean - Extreme TDs will regress
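The flaw-of-averages pitfall is easy to see in a head-to-head matchup: comparing only mean projections says the favored team wins, while simulating the full distributions shows the matchup is close to a coin flip. The projections and standard deviations below are hypothetical, and the two teams are treated as independent for simplicity.

```python
import numpy as np

rng = np.random.default_rng(7)
n_sims = 10_000

# Hypothetical weekly team totals: a 5-point edge in the mean projection.
my_team = rng.normal(110.0, 25.0, size=n_sims)
opponent = rng.normal(105.0, 25.0, size=n_sims)

# Comparing means alone implies a certain win; the distribution says otherwise.
win_by_means = 110.0 > 105.0
win_prob = np.mean(my_team > opponent)
print(f"Win if judged by means only: {win_by_means}")
print(f"Simulated win probability:   {win_prob:.3f}")
```

The same gap appears whenever the outcome of interest (a win, a playoff berth, a threshold bonus) is a non-linear function of points scored.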
Python Libraries
# Regression
from sklearn.linear_model import Ridge, Lasso, ElasticNet, LinearRegression
import statsmodels.api as sm # For statistical inference
from pygam import LinearGAM, s, f # GAMs
# Simulation
import numpy as np
import scipy.stats
# Analysis
from sklearn.metrics import mean_absolute_error, mean_squared_error
import matplotlib.pyplot as plt
References
- references/regression_methods.md - OLS, Ridge, Lasso, Elastic Net, GAMs, regression to mean, confidence/prediction intervals
- references/simulation_design.md - Monte Carlo frameworks, championship probability, trade impact, path dependence, error correlation
Assets
- assets/monte_carlo_template.py - Python templates for rest-of-season simulation, championship probability, and trade impact analysis