| name | crypto-hard-filter-simplification |
| description | Simplify crypto hard filters to essential checks only. Trigger when: (1) crypto symbols fail multiple filters, (2) data_quality/spread/trading_status fail for crypto, (3) yfinance data gaps causing false failures. |
| author | Claude Code |
| date | 2024-12-28 |
Crypto Hard Filter Simplification (v2.5.0)
Experiment Overview
| Item | Details |
|---|---|
| Date | 2024-12-28 |
| Goal | Stop crypto symbols from failing irrelevant filters |
| Environment | alpaca_trading/selection/filters/hard_filters.py |
| Status | Success |
Context
A user reported that all 18 crypto symbols failed selection, each on a different filter:
- BTCUSD: failed price (BTC trades near $87k; max_price is $10k)
- ETHUSD: failed data_quality (51% zero-volume bars; maximum allowed is 5%)
- SOLUSD: failed trading_status (50% activity; 80% required)
- DOGEUSD: failed volume (28% consistency; 70% required)
Each filter encoded its own asset-type assumption, none of which holds for crypto.
Root Cause Analysis
| Filter | Problem for Crypto | Reality |
|---|---|---|
| price | max_price=$10,000 | BTC is $87k+, no upper limit |
| data_quality | max 5% zero-volume | yfinance has 50% gaps |
| trading_status | 80% activity required | yfinance gaps cause 50% |
| spread | 0.5% max spread | Crypto volatility is higher |
| volume | 70% consistency | yfinance has ~30% consistency |
Key Insight: These filters were designed to catch BAD ASSETS, but they were flagging BAD DATA from yfinance. When using Alpaca API with quality data, these filters would pass.
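To make the bad-data point concrete, the sketch below measures the share of zero-volume bars in a price frame, which is the quantity the data_quality threshold is compared against. The helper name `zero_volume_fraction` and the synthetic bars are illustrative assumptions, not code from hard_filters.py.

```python
import pandas as pd


def zero_volume_fraction(df: pd.DataFrame) -> float:
    """Return the share of bars whose volume is zero or missing.

    Hypothetical helper for illustration; the project's own
    data_quality check may compute this differently.
    """
    if df is None or len(df) == 0:
        return 1.0  # Treat "no data" as fully gapped.
    volume = df["volume"].fillna(0)
    return float((volume == 0).mean())


# Synthetic bars: every other bar has zero volume, mimicking a gappy feed.
bars = pd.DataFrame({
    "close": [100.0, 101.0, 100.5, 102.0, 101.5, 103.0],
    "volume": [1200, 0, 900, 0, 1500, 0],
})

frac = zero_volume_fraction(bars)
print(f"zero-volume fraction: {frac:.0%}")  # zero-volume fraction: 50%
```

A healthy asset served through a gappy feed shows the same 50% figure as a dead one, so a 5% ceiling rejects it on data grounds alone.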
v2.5.0 Solution: Separate Crypto Path
Instead of patching each filter, we created a separate code path for crypto that only checks essentials:
    def apply_hard_filters(symbol, df, ..., is_crypto=False):
        result = HardFilterResult(symbol=symbol, passed=True)

        # v2.5.0: For crypto, only check essential filters
        # Skip spread/data_quality/trading_status - they just flag yfinance gaps
        if is_crypto:
            # Essential: Has enough data?
            if df is None or len(df) < min_bars:
                result.add_result("min_bars", False, {...})
                return result
            result.add_result("min_bars", True, {"n_bars": len(df)})

            # Essential: Price above minimum? (filter dead coins)
            current_price = df['close'].iloc[-1] if 'close' in df.columns else 0
            passed = current_price >= min_price
            result.add_result("price", passed, {...})

            # Essential: Has reasonable volume? (relaxed for yfinance gaps)
            passed, details = check_volume_filter(df, min_daily_volume_usd, is_crypto=True)
            result.add_result("volume", passed, details)

            return result  # Skip spread, data_quality, trading_status

        # Equities: Apply full filter chain
        # ... existing code ...
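The excerpt above elides several details, so here is a minimal runnable sketch of the same early-return pattern. Everything in it is an assumption made for illustration: the `HardFilterResult` class is a stand-in for the project's own, the function name `crypto_essential_checks` is hypothetical, the default thresholds are invented, and the volume check is left out to keep the example short.

```python
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class HardFilterResult:
    """Hypothetical stand-in for the project's result class."""
    symbol: str
    passed: bool = True
    checks: dict = field(default_factory=dict)

    def add_result(self, name: str, ok: bool, details: dict) -> None:
        # Assumption: a single failed check fails the whole symbol.
        self.checks[name] = {"passed": ok, **details}
        self.passed = self.passed and ok


def crypto_essential_checks(symbol, df, min_bars=100, min_price=0.01):
    """Run only the bar-count and minimum-price checks (volume omitted)."""
    result = HardFilterResult(symbol=symbol)

    n_bars = 0 if df is None else len(df)
    if n_bars < min_bars:
        result.add_result("min_bars", False, {"n_bars": n_bars, "required": min_bars})
        return result  # Short-circuit: the price check needs bars to read.
    result.add_result("min_bars", True, {"n_bars": n_bars})

    current_price = float(df["close"].iloc[-1]) if "close" in df.columns else 0.0
    # No max_price comparison: a high price is not a defect for crypto.
    result.add_result("price", current_price >= min_price, {"price": current_price})
    return result


btc = pd.DataFrame({"close": [87_000.0] * 120})
print(crypto_essential_checks("BTCUSD", btc).passed)           # True
print(crypto_essential_checks("BTCUSD", btc.head(10)).passed)  # False
```

The early return on `min_bars` keeps later checks from indexing into an empty or missing frame, which is why it comes first in the chain.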
Filter Changes for Crypto
| Filter | Equities | Crypto | Reason |
|---|---|---|---|
| min_bars | Check | Check | Essential for training |
| price | min/max | min only | No upper limit (BTC $87k+) |
| volume | 70% consistency | 30% consistency | yfinance gaps |
| spread | Check | SKIP | Flags yfinance volatility |
| data_quality | Check | SKIP | Flags yfinance gaps |
| trading_status | Check | SKIP | Flags yfinance gaps |
Failed Attempts (Critical)
| Attempt | Why it Failed | Lesson Learned |
|---|---|---|
| Patch max_price check for crypto | Still fails data_quality | Whack-a-mole approach |
| Lower data_quality thresholds | Still fails trading_status | Same problem |
| Add is_crypto to each filter | Complex, hard to maintain | Separate code path is cleaner |
| Remove all filters for crypto | No quality control | Keep essential checks |
volume_filter Adjustments for Crypto
    def check_volume_filter(df, min_daily_volume_usd, min_volume_consistency=0.7, is_crypto=False):
        # v2.5.0: Cap consistency requirement for crypto (yfinance has many zero-volume bars)
        if is_crypto:
            min_volume_consistency = min(min_volume_consistency, 0.30)  # Cap at 30%
        # ... rest of calculation ...
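Since the rest of the calculation is elided, the following sketch shows how the cap might interact with a consistency metric. It assumes consistency means the fraction of bars with non-zero volume, which the source does not confirm; both function names are hypothetical.

```python
import pandas as pd


def volume_consistency(df: pd.DataFrame) -> float:
    """Assumed definition: fraction of bars with non-zero volume."""
    if df is None or len(df) == 0:
        return 0.0
    return float((df["volume"].fillna(0) > 0).mean())


def effective_consistency_threshold(min_volume_consistency: float = 0.7,
                                    is_crypto: bool = False) -> float:
    """Apply the 30% crypto cap; equities keep the requested threshold."""
    if is_crypto:
        return min(min_volume_consistency, 0.30)
    return min_volume_consistency


# 4 of 10 bars carry volume, so the assumed consistency is 0.4.
bars = pd.DataFrame({"volume": [0, 500, 0, 0, 800, 0, 300, 0, 0, 700]})
consistency = volume_consistency(bars)

for is_crypto in (False, True):
    threshold = effective_consistency_threshold(is_crypto=is_crypto)
    print(is_crypto, threshold, consistency >= threshold)
```

With these synthetic bars the equity threshold of 0.7 rejects the symbol while the capped crypto threshold of 0.30 accepts it. Because the cap uses `min`, a caller who passes a stricter-than-default value for equities is unaffected, and a caller who already passes a value below 0.30 for crypto keeps that lower value.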
Key Insights
- Separate code paths are cleaner - Don't patch each filter individually
- Filters should catch bad assets, not bad data - Use quality data source instead
- Keep essential checks - min_bars, min_price, volume still matter
- Skip irrelevant checks - spread/data_quality/trading_status flag data issues, not asset issues
- With Alpaca API, these filters would pass - The real fix is quality data
Files Modified
alpaca_trading/selection/filters/hard_filters.py:
- Lines 362-378: New crypto-specific path in apply_hard_filters()
- Lines 66-68: Volume consistency cap for crypto
Best Practice
Prefer Alpaca API over filter simplification.
The simplified filters are a fallback for when yfinance must be used. With Alpaca API:
- Volume data is complete
- All filters pass naturally
- No special crypto handling needed
See the data-source-priority skill for how to ensure the Alpaca API is used.
References
- alpaca_trading/selection/filters/hard_filters.py: Filter implementations
- Skill: data-source-priority - Ensure quality data source
- Skill: symbol-selection-asset-filters - Asset-type filter patterns