| name | QuantConnect Validation |
| description | QuantConnect walk-forward validation and Phase 5 robustness testing (project) |
QuantConnect Validation Skill (Phase 5)
Purpose: Walk-forward validation for Phase 5 robustness testing before deployment.
Progressive Disclosure: This primer contains essentials only. Full details available via qc_validate.py docs command.
When to Use This Skill
Load when:
- Running the /qc-validate command
- Testing out-of-sample performance
- Evaluating strategy robustness
- Making deployment decisions
Tool: Use python SCRIPTS/qc_validate.py for walk-forward validation
Walk-Forward Validation Overview
Purpose: Detect overfitting and ensure strategy generalizes to new data.
Approach:
- Training (in-sample): Develop/optimize on 80% of data
- Testing (out-of-sample): Validate on remaining 20%
- Compare: Measure performance degradation
Example (5-year backtest 2019-2023):
- In-sample: 2019-2022 (4 years) - Training period
- Out-of-sample: 2023 (1 year) - Testing period
Key Metrics
1. Performance Degradation
Formula: (IS Sharpe - OOS Sharpe) / IS Sharpe
| Degradation | Quality | Decision |
|---|---|---|
| < 15% | Excellent | Deploy with confidence |
| 15-30% | Acceptable | Deploy but monitor |
| 30-40% | Concerning | Escalate to human |
| > 40% | Severe | Abandon (overfit) |
Key Insight: < 15% degradation indicates robust strategy that generalizes well.
2. Robustness Score
Formula: OOS Sharpe / IS Sharpe
| Score | Quality | Interpretation |
|---|---|---|
| > 0.75 | High | Strategy robust across periods |
| 0.60-0.75 | Moderate | Acceptable but monitor |
| < 0.60 | Low | Strategy unstable |
Key Insight: > 0.75 indicates strategy maintains performance out-of-sample.
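Both formulas reduce to one-line calculations. A minimal sketch (not the actual qc_validate.py code; it assumes a positive in-sample Sharpe):

```python
def degradation(is_sharpe: float, oos_sharpe: float) -> float:
    """Fraction of the in-sample Sharpe lost out-of-sample (assumes IS Sharpe > 0)."""
    return (is_sharpe - oos_sharpe) / is_sharpe


def robustness(is_sharpe: float, oos_sharpe: float) -> float:
    """Ratio of out-of-sample to in-sample Sharpe (assumes IS Sharpe > 0)."""
    return oos_sharpe / is_sharpe


# Numbers from the Example Decision section below
print(f"Degradation: {degradation(0.97, 0.89):.1%}")  # ~8.2%
print(f"Robustness:  {robustness(0.97, 0.89):.2f}")   # ~0.92
```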
Quick Usage
Run Walk-Forward Validation
```bash
# From hypothesis directory with iteration_state.json
python SCRIPTS/qc_validate.py run --strategy strategy.py

# Custom split ratio (default 80/20)
python SCRIPTS/qc_validate.py run --strategy strategy.py --split 0.70
```
What it does:
- Reads project_id from iteration_state.json
- Splits date range (80/20 default)
- Runs in-sample backtest
- Runs out-of-sample backtest
- Calculates degradation and robustness
- Saves results to PROJECT_LOGS/validation_result.json
Analyze Results
```bash
python SCRIPTS/qc_validate.py analyze --results PROJECT_LOGS/validation_result.json
```
Output:
- Performance comparison table
- Degradation percentage
- Robustness assessment
- Deployment recommendation
Decision Integration
After validation, the decision framework evaluates:
DEPLOY_STRATEGY (Deploy with confidence):
- Degradation < 15% AND
- Robustness > 0.75 AND
- OOS Sharpe > 0.7
PROCEED_WITH_CAUTION (Deploy but monitor):
- Degradation < 30% AND
- Robustness > 0.60 AND
- OOS Sharpe > 0.5
ABANDON_HYPOTHESIS (Too unstable):
- Degradation > 40% OR
- Robustness < 0.5 OR
- OOS Sharpe < 0
ESCALATE_TO_HUMAN (Borderline):
- Results don't clearly fit above criteria
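A hedged sketch of how these thresholds could be combined into a single decision function; the function name and ordering are illustrative, not the actual decision-framework implementation:

```python
def phase5_decision(degradation: float, robustness: float, oos_sharpe: float) -> str:
    """Map walk-forward metrics to a Phase 5 decision using the thresholds above."""
    # Hard failures first: severe degradation, unstable strategy, or negative OOS Sharpe
    if degradation > 0.40 or robustness < 0.5 or oos_sharpe < 0:
        return "ABANDON_HYPOTHESIS"
    if degradation < 0.15 and robustness > 0.75 and oos_sharpe > 0.7:
        return "DEPLOY_STRATEGY"
    if degradation < 0.30 and robustness > 0.60 and oos_sharpe > 0.5:
        return "PROCEED_WITH_CAUTION"
    # Borderline results get a human decision
    return "ESCALATE_TO_HUMAN"
```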
Best Practices
1. Time Splits
- Standard: 80/20 (4 years training, 1 year testing)
- Conservative: 70/30 (more OOS testing)
- Very Conservative: 60/40 (extensive testing)
Minimum OOS period: 6 months (1 year preferred)
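A minimal sketch of an 80/20 calendar split; the function name is illustrative, and the actual --split handling in qc_validate.py may differ (for example, counting trading days rather than calendar days):

```python
from datetime import date, timedelta

def split_date_range(start: date, end: date, train_ratio: float = 0.80):
    """Split [start, end] into in-sample and out-of-sample windows by calendar days."""
    total_days = (end - start).days
    cutoff = start + timedelta(days=int(total_days * train_ratio))
    return (start, cutoff), (cutoff + timedelta(days=1), end)

# 5-year backtest, 80/20: in-sample ends around late 2022, out-of-sample covers 2023
insample, oos = split_date_range(date(2019, 1, 1), date(2023, 12, 31))
```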
2. Never Peek at Out-of-Sample
CRITICAL RULE: Never adjust strategy based on OOS results.
- OOS is for testing only
- Adjusting based on OOS defeats validation purpose
- If you adjust, OOS becomes in-sample
3. Check Trade Count
Both periods need sufficient trades:
- In-sample: Minimum 30 trades (50+ preferred)
- Out-of-sample: Minimum 10 trades (20+ preferred)
Too few trades = unreliable validation.
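A small sketch of this guard (thresholds from the list above; the function is illustrative, not part of qc_validate.py):

```python
def enough_trades(is_trades: int, oos_trades: int,
                  min_is: int = 30, min_oos: int = 10) -> bool:
    """True only if both periods meet the minimum trade counts."""
    return is_trades >= min_is and oos_trades >= min_oos

# Example from Common Issues below: 120 in-sample trades but only 3 out-of-sample
assert not enough_trades(120, 3)  # -> escalate rather than trust the validation
```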
4. Compare Multiple Metrics
Don't just look at Sharpe:
- Sharpe Ratio degradation
- Max Drawdown increase
- Win Rate change
- Profit Factor degradation
- Trade Count consistency
All metrics should degrade similarly for a robust strategy.
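An illustrative sketch of comparing degradation across several metrics at once; the metric names and dictionary layout are assumptions, not the qc_validate.py output format:

```python
def per_metric_degradation(insample: dict, oos: dict) -> dict:
    """Relative change for every metric present in both result dictionaries."""
    return {
        name: (insample[name] - oos[name]) / abs(insample[name])
        for name in insample
        if name in oos and insample[name] != 0
    }

is_metrics  = {"sharpe": 0.97, "win_rate": 0.56, "profit_factor": 1.8}
oos_metrics = {"sharpe": 0.89, "win_rate": 0.53, "profit_factor": 1.6}
# Broadly similar degradation (here roughly 5-11%) suggests a robust edge rather than curve-fitting
print(per_metric_degradation(is_metrics, oos_metrics))
```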
Common Issues
Severe Degradation (> 40%)
Cause: Strategy overfit to in-sample period
Example:
- IS Sharpe: 1.5 → OOS Sharpe: 0.6
- Degradation: 60%
Decision: ABANDON_HYPOTHESIS
Fix for next hypothesis: Simplify the strategy (fewer parameters) and use a longer training period
Different Market Regimes
Cause: IS was bull market, OOS was bear market
Example:
- 2019-2022 (bull): Sharpe 1.2
- 2023 (bear): Sharpe -0.3
Decision: Not necessarily overfit, but not robust across regimes
Fix: Test across multiple regimes, add regime detection
Low Trade Count in OOS
Cause: Strategy stops trading in OOS period
Example:
- IS: 120 trades → OOS: 3 trades
Decision: ESCALATE_TO_HUMAN (insufficient OOS data)
Integration with /qc-validate
The /qc-validate command workflow:
- Read iteration_state.json for project_id and parameters
- Load this skill for validation approach
- Modify strategy for time splits (80/20)
- Run in-sample and OOS backtests
- Calculate degradation and robustness
- Evaluate using decision framework
- Update iteration_state.json with results
- Git commit with validation summary
Reference Documentation
Need implementation details? All reference documentation accessible via --help:
```bash
python SCRIPTS/qc_validate.py --help
```
That's the only way to access complete reference documentation.
Topics covered in --help:
- Walk-forward validation methodology
- Performance degradation thresholds
- Monte Carlo validation techniques
- PSR/DSR statistical metrics
- Common errors and fixes
- Phase 5 decision criteria
The primer above covers 90% of use cases. Use --help for edge cases and detailed analysis.
Related Skills
- quantconnect - Core strategy development
- quantconnect-backtest - Phase 3 backtesting (qc_backtest.py)
- quantconnect-optimization - Phase 4 optimization (qc_optimize.py)
- decision-framework - Decision thresholds
- backtesting-analysis - Metric interpretation
Key Principles
- OOS is sacred - Never adjust strategy based on OOS results
- Degradation < 15% is excellent - Strategy generalizes well
- Robustness > 0.75 is target - Maintains performance OOS
- Trade count matters - Need sufficient trades in both periods
- Multiple metrics - All should degrade similarly for robustness
Example Decision
In-Sample (2019-2022):
Sharpe: 0.97, Drawdown: 18%, Trades: 142
Out-of-Sample (2023):
Sharpe: 0.89, Drawdown: 22%, Trades: 38
Degradation: 8.2% (< 15%)
Robustness: 0.92 (> 0.75)
→ DEPLOY_STRATEGY (minimal degradation, high robustness)
Version: 2.0.0 (Progressive Disclosure) | Last Updated: November 13, 2025 | Lines: ~190 (was 463) | Context Reduction: 59%