Claude Code Plugins

Community-maintained marketplace

account-aware-training

@majiayu000/claude-skill-registry

Add account state (P&L, win rate, drawdown) to RL observations + drawdown penalty in rewards. Trigger when: (1) model needs account awareness, (2) training should penalize drawdowns, (3) upgrading obs_dim 5300→5600.

Install Skill

1. Download the skill
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by going through its instructions before using it.

SKILL.md

name: account-aware-training
description: Add account state (P&L, win rate, drawdown) to RL observations + drawdown penalty in rewards. Trigger when: (1) model needs account awareness, (2) training should penalize drawdowns, (3) upgrading obs_dim 5300→5600.
author: Claude Code
date: 2024-12-26

Account-Aware RL Training (v2.4)

Experiment Overview

| Item | Details |
| --- | --- |
| Date | 2024-12-26 |
| Goal | Make RL model learn from account state (P&L, win rate, drawdown) |
| Environment | vectorized_env.py, inference_obs_builder.py, training notebook |
| Status | Success |

Context

Prior to v2.4, the RL model was "blind" to account performance. It received:

  • 53 features: price action, technicals, regime probabilities, calendar effects
  • No information about cumulative P&L, win rate, or drawdown

Problem: The model could generate signals that were individually good but led to excessive drawdowns at the account level. It had no incentive to trade conservatively after losses.

Solution: Add 3 account-level features + drawdown penalty in rewards.

Verified Workflow

1. Config Parameters (GPUEnvConfig)

# In vectorized_env.py GPUEnvConfig dataclass (~line 405)
# Account-aware training (v2.4)
drawdown_penalty_threshold: float = 0.15  # Penalize when drawdown > 15%
drawdown_penalty_weight: float = 0.10     # Weight in reward function

2. Equity Tracking Tensors

# In _init_state_tensors() after line 712
# Account-level equity tracking (v2.4)
self.initial_equity = torch.ones(self.n_envs, dtype=self.dtype, device=self.device)
self.peak_equity = torch.ones(self.n_envs, dtype=self.dtype, device=self.device)
self.current_equity = torch.ones(self.n_envs, dtype=self.dtype, device=self.device)

3. Reset Equity Tensors

# In reset() after line 850
# Reset account-level equity tracking
self.initial_equity[env_ids] = 1.0
self.peak_equity[env_ids] = 1.0
self.current_equity[env_ids] = 1.0

4. Update Equity in step()

# In step() after line 926
# Update account-level equity tracking (v2.4)
self.current_equity = self.initial_equity + self.total_pnl / (current_prices + 1e-8)
self.peak_equity = torch.maximum(self.peak_equity, self.current_equity)
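
Why torch.maximum() matters here (see also Failed Attempts below): a toy sketch with made-up equity values, not the real env tensors, showing the peak holding as a high-water mark while equity falls.

import torch

# Toy walk of current equity for one env; peak_equity must be a high-water
# mark, so it is updated with torch.maximum rather than direct assignment.
peak = torch.tensor(1.0)
for eq in [1.00, 1.08, 1.15, 1.05, 0.98]:
    cur = torch.tensor(eq)
    peak = torch.maximum(peak, cur)
    drawdown = (peak - cur) / (peak + 1e-8)
    print(f"equity={eq:.2f}  peak={peak.item():.2f}  drawdown={drawdown.item():.3f}")
# peak stays at 1.15 after the high, so drawdown grows to ~0.148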

5. Feature Count Update

# In _calculate_obs_features() line 682
# Add account features
account = 3  # total_pnl_pct, rolling_win_rate, current_drawdown_pct
return base + technical + intraday + temporal + markov + extended + multi_window + account
# Result: 53 + 3 = 56 features

6. Account Features in Observations

# In _get_observations() after line 1258, before sanitization

# === ACCOUNT-LEVEL FEATURES (3) - v2.4 ===

# Feature 1: Total P&L % (normalized to [-1, 1])
total_pnl_pct = self.total_pnl / (self.initial_equity + 1e-8)
total_pnl_pct_norm = torch.tanh(total_pnl_pct * 10)
obs[:, :, feat_idx] = total_pnl_pct_norm[env_ids].unsqueeze(1).expand(-1, self.config.window)
feat_idx += 1

# Feature 2: Rolling win rate (0.5 if no trades)
win_rate = torch.where(
    self.n_trades[env_ids] > 0,
    self.n_wins[env_ids].float() / self.n_trades[env_ids].float(),
    torch.full((n_envs,), 0.5, dtype=self.dtype, device=self.device)
)
obs[:, :, feat_idx] = win_rate.unsqueeze(1).expand(-1, self.config.window)
feat_idx += 1

# Feature 3: Current drawdown % [0, 1]
drawdown = (self.peak_equity[env_ids] - self.current_equity[env_ids]) / (self.peak_equity[env_ids] + 1e-8)
drawdown = torch.clamp(drawdown, 0.0, 1.0)
obs[:, :, feat_idx] = drawdown.unsqueeze(1).expand(-1, self.config.window)
feat_idx += 1
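
As a quick sanity check, this standalone sketch (toy values, not the real environment state) reproduces the three normalizations above and confirms each feature stays in its documented range.

import torch

# Toy per-env account state (illustrative only, not the real env tensors).
total_pnl   = torch.tensor([-0.30, 0.00, 0.45])
init_equity = torch.ones(3)
n_trades    = torch.tensor([0, 10, 25])
n_wins      = torch.tensor([0, 4, 15])
peak_eq     = torch.tensor([1.00, 1.20, 1.50])
cur_eq      = torch.tensor([0.85, 1.10, 1.50])

# Feature 1: total P&L %, squashed into [-1, 1]
pnl_pct = torch.tanh(total_pnl / (init_equity + 1e-8) * 10)

# Feature 2: rolling win rate, 0.5 neutral prior when there are no trades yet
win_rate = torch.where(
    n_trades > 0,
    n_wins.float() / n_trades.float(),
    torch.full((3,), 0.5),
)

# Feature 3: drawdown from the high-water mark, clamped to [0, 1]
drawdown = torch.clamp((peak_eq - cur_eq) / (peak_eq + 1e-8), 0.0, 1.0)

print(pnl_pct)   # ~[-0.995, 0.000, 0.9998]
print(win_rate)  # [0.50, 0.40, 0.60]
print(drawdown)  # [0.15, 0.0833, 0.00]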

7. Drawdown Penalty in Rewards

# In _calculate_rewards() after line 1618

# COMPONENT 7: Drawdown penalty (v2.4)
current_drawdown = (self.peak_equity - self.current_equity) / (self.peak_equity + 1e-8)
current_drawdown = torch.clamp(current_drawdown, 0.0, 1.0)

# Quadratic penalty when over threshold
drawdown_over_threshold = torch.clamp(current_drawdown - self.config.drawdown_penalty_threshold, min=0.0)
drawdown_penalty = -drawdown_over_threshold ** 2 * 10

# Add to reward combination:
reward = (
    self.config.direction_weight * direction_reward +
    self.config.magnitude_weight * magnitude_reward +
    self.config.pnl_weight * pnl_reward +
    self.config.stop_tp_weight * stop_tp_reward +
    self.config.exploration_weight * exploration_bonus +
    self.config.slippage_weight * slippage_penalty +
    self.config.drawdown_penalty_weight * drawdown_penalty  # NEW
) * risk_adjustment
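
To get a feel for the quadratic scaling, a small illustrative sketch (toy drawdown values; threshold and weight are the v2.4 defaults from step 1) evaluates the weighted penalty at a few drawdown levels.

import torch

# Illustrative only: weighted drawdown penalty at a few drawdown levels,
# using the v2.4 defaults (threshold 0.15, weight 0.10) from GPUEnvConfig.
threshold, weight = 0.15, 0.10
drawdown = torch.tensor([0.10, 0.16, 0.20, 0.25, 0.40])
over = torch.clamp(drawdown - threshold, min=0.0)
penalty = -over ** 2 * 10
print(weight * penalty)
# -> approximately [-0.0000, -0.0001, -0.0025, -0.0100, -0.0625]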

8. Inference Observation Builder

# In inference_obs_builder.py get_target_features_from_obs_dim()
if obs_dim == 5600:
    return 56  # v2.4 with account awareness
elif obs_dim == 5300:
    return 53  # v2.3
# ... legacy support

# In build_inference_observation() after line 624
# === ACCOUNT-LEVEL FEATURES (3) - v2.4 ===
# Use neutral defaults during inference
if target_features >= 56:
    obs[:, feat_idx] = 0.0   # total_pnl_pct (no prior trades)
    feat_idx += 1
    obs[:, feat_idx] = 0.5   # win_rate (neutral prior)
    feat_idx += 1
    obs[:, feat_idx] = 0.0   # drawdown (no drawdown)
    feat_idx += 1

Failed Attempts (Critical)

| Attempt | Why it failed | Lesson learned |
| --- | --- | --- |
| Account features with raw P&L values | P&L scale varies by price level | Use P&L percentage normalized with tanh |
| Win rate = 0 when no trades | Invalid input during initial episodes | Default to 0.5 (neutral prior) |
| Peak equity not tracked as a running maximum | Logical error in the update | Use torch.maximum() to track the high-water mark |
| Linear drawdown penalty | Too harsh at moderate drawdown levels | Quadratic scaling is gentler below the threshold |
| Live inference with account state | Would need a real account connection | Use neutral defaults (0, 0.5, 0) for inference |

Final Parameters

# GPUEnvConfig (v2.4)
n_features: 56  # Was 53 in v2.3
drawdown_penalty_threshold: 0.15  # 15% drawdown starts penalty
drawdown_penalty_weight: 0.10     # Moderate weight in reward

# Feature breakdown (56 total)
base_features: 7              # price action basics
technical_features: 4         # intraday technicals
temporal_features: 7          # calendar features
markov_features: 12           # 4-chain regime probabilities
extended_features: 14         # extended technicals
multi_window_features: 9      # 20/50/100 bar windows
account_features: 3           # P&L %, win rate, drawdown %

# obs_dim = n_features * window = 56 * 100 = 5600
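
A minimal arithmetic check of the breakdown above (the dictionary keys are illustrative labels, not config field names):

# Illustrative check: the v2.4 feature breakdown sums to 56 features and
# obs_dim = n_features * window matches the 5600 used throughout this skill.
breakdown = {"base": 7, "technical": 4, "temporal": 7, "markov": 12,
             "extended": 14, "multi_window": 9, "account": 3}
n_features = sum(breakdown.values())   # 56
obs_dim = n_features * 100             # window = 100 -> 5600
assert (n_features, obs_dim) == (56, 5600)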

Key Insights

  • Breaking Change: obs_dim 5300 → 5600 means v2.3 models CANNOT be used with v2.4 environments (see the guard sketch after this list)
  • Neutral Inference: Live trading uses neutral defaults (0, 0.5, 0) since account state isn't tracked per-prediction
  • Quadratic Penalty: The ** 2 makes penalty gentle at 16% drawdown but harsh at 25%+
  • Normalized P&L: tanh(pnl * 10) keeps values in [-1, 1] even for large P&L swings
  • 0.5 Win Rate Prior: Prevents model confusion during initial trades with no history
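
A minimal guard sketch for the breaking change in the first bullet (the helper below is hypothetical, not part of the repo): fail fast when a checkpoint's expected observation size does not match the environment.

# Hypothetical guard, not part of alpaca_trading: refuse to run a v2.3
# checkpoint (obs_dim 5300) against a v2.4 environment (obs_dim 5600).
def check_obs_dim(model_obs_dim: int, env_obs_dim: int) -> None:
    if model_obs_dim != env_obs_dim:
        raise ValueError(
            f"Model expects obs_dim={model_obs_dim} but the environment "
            f"produces obs_dim={env_obs_dim}; v2.3 models (5300) cannot "
            f"run in v2.4 environments (5600)."
        )

check_obs_dim(5600, 56 * 100)  # OK
# check_obs_dim(5300, 5600)    # raises ValueError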

Expected Model Behavior

With account awareness, the model should learn to:

  1. Reduce position sizing after losses (sees drawdown feature)
  2. Be more selective after poor win rate (sees win rate feature)
  3. Avoid compounding losses (drawdown penalty kicks in at 15%)
  4. Trade more aggressively when profitable (sees positive P&L)

References

  • alpaca_trading/gpu/vectorized_env.py: Lines 405 (config), 712 (tensors), 850 (reset), 926 (step), 1258 (obs)
  • alpaca_trading/gpu/inference_obs_builder.py: Lines 61-108 (feature detection), 624+ (account features)
  • notebooks/VSCode_Colab_Training_NATIVE.ipynb: Training notebook with v2.4 settings