| name | data-source-priority |
| description | Ensure Alpaca API is used for quality data, not yfinance fallback. Trigger when: (1) crypto volume filter fails unexpectedly, (2) zero-volume bars in data, (3) API key configuration issues. |
| author | Claude Code |
| date | Sat Dec 28 2024 00:00:00 GMT+0000 (Coordinated Universal Time) |
Data Source Priority - Alpaca vs yfinance (v2.5.0)
Experiment Overview
| Item | Details |
|---|---|
| Date | 2024-12-28 |
| Goal | Fix crypto selection failing due to poor data quality |
| Environment | alpaca_trading/data/fetcher.py, training notebook |
| Status | Success - v2.5.0 now fails early |
Context
User reported ALL 18 crypto symbols failing selection filters:
BTCUSD: failed price (max_price $10k but BTC is $87k)
ETHUSD: failed data_quality (51% zero-volume bars)
Others: failed volume, trading_status
Root Cause: Alpaca API keys weren't configured, so DataFetcher fell back to yfinance. yfinance crypto data has:
- ~50% zero-volume bars (data gaps)
- Missing volume data on many bars
- Causes ALL filters to fail
v2.5.0 Solution: Fail Fast
OLD BEHAVIOR (v2.4.x): Warned about yfinance but continued anyway, wasting time debugging filter failures.
NEW BEHAVIOR (v2.5.0): Training notebook FAILS IMMEDIATELY if crypto is enabled without Alpaca API:
# In training notebook cell-16
alpaca_enabled = data_fetcher._use_alpaca_data
if not alpaca_enabled and selection_config.crypto.enabled:
raise ValueError(
'CRYPTO TRAINING REQUIRES ALPACA API. '
'yfinance crypto data has ~50% missing volume. '
'Set API keys or disable crypto training.'
)
Clear Error Message
When crypto is enabled without Alpaca API:
======================================================================
ERROR: CRYPTO TRAINING REQUIRES ALPACA API
======================================================================
yfinance crypto data is unusable for training:
- ~50% zero-volume bars
- Causes all symbols to fail volume/data_quality filters
- Training on bad data produces bad models
FIX OPTIONS:
1. Set Alpaca API keys in Colab Secrets:
- APCA_API_KEY_ID = your_key
- APCA_API_SECRET_KEY = your_secret
2. Set API_KEYS_FILE in previous cell:
API_KEYS_FILE = "/content/Alpaca_trading/API_key_500Paper.txt"
3. Disable crypto and train only equities:
selection_config.crypto.enabled = False
selection_config.equities.enabled = True
Failed Attempts (Critical)
| Attempt | Why it Failed | Lesson Learned |
|---|---|---|
| Lower volume consistency to 30% | Masks real problem, still fails | Fix data source, not thresholds |
| Skip data quality filter for crypto | Still fails price/trading_status | Garbage in = garbage out |
| Simplify all crypto filters | Works but produces bad models | Use quality data, not workarounds |
| Just warn about yfinance | User ignores warning, wastes time | FAIL FAST is better |
Data Quality Comparison
| Data Source | Volume Data | Zero-Volume Bars | Crypto Support |
|---|---|---|---|
| Alpaca API | Complete | <1% | Excellent |
| yfinance | 50% missing | ~50% | Unusable |
API Key Configuration
For Google Colab (RECOMMENDED)
- Go to Colab Secrets (key icon in left sidebar)
- Add
APCA_API_KEY_ID= your API key - Add
APCA_API_SECRET_KEY= your secret key - Enable access to notebook
For Training Notebook
# Cell 14: Option 1 - Environment variables (recommended)
# Keys are read from Colab Secrets automatically
# Cell 15: Option 2 - Keys file (after unzipping repo)
API_KEYS_FILE = '/content/Alpaca_trading/API_key_500Paper.txt'
For Local Development
# Add to ~/.bashrc or ~/.zshrc
export APCA_API_KEY_ID="your_key"
export APCA_API_SECRET_KEY="your_secret"
Diagnostic Check
from alpaca_trading.data.fetcher import DataFetcher
fetcher = DataFetcher(keys_file=API_KEYS_FILE)
# This is checked BEFORE selection in v2.5.0
if not fetcher._use_alpaca_data:
print('Alpaca API: NOT AVAILABLE')
# Notebook will fail here if crypto enabled
else:
print('Alpaca API: ENABLED')
Key Insights
- Don't work around bad data - Fix the data source
- Fail fast, fail loud - Silent fallbacks waste debugging time
- yfinance is equities-only - Acceptable for stocks, unusable for crypto
- Environment variables are best - Work everywhere, no path issues
- Check logs for "yfinance fetched" - This means you're using bad data
Files Modified (v2.5.0)
notebooks/training.ipynb:
- Cell 16: Added fail-fast check before symbol selection
- Cell 0: Updated header to v2.5.0 with changelog
alpaca_trading/selection/filters/hard_filters.py:
- apply_hard_filters(): Simplified crypto path (skip yfinance-specific checks)
References
alpaca_trading/data/fetcher.py: DataFetcher implementationnotebooks/training.ipynb: Training notebook with fail-fast check- Alpaca API docs: https://docs.alpaca.markets/docs/
- Skill:
reward-function-hold-bias- Related v2.5.0 fix