Definition of Features
I asked ChatGPT "provide full name and definitions and relevance of each of these features ret, vol20, vol_z, pos_range20, ret_lag1"
- Daily Logarithmic Return = the natural log of the ratio of consecutive closing prices, ln(P_t / P_{t-1}); for small moves this is approximately the percentage price change from one day to the next
- Captures day-to-day price movement
- Good for detecting heavy-tailed behavior (fat-tailed distributions in high-vol regimes)
- Useful input to clustering because return distribution shifts during shocks, events, and turbulence.
- Low-vol regime → return values cluster narrowly near zero
- High-vol regime → return distribution spreads out (large +/- swings)
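As a minimal sketch of how `ret` could be computed, assuming it is built from a plain list of daily closing prices (the `closes` values below are made up for illustration, and `log_returns` is a hypothetical helper, not the notebook's actual code):

```python
import math

def log_returns(closes):
    """Daily log returns ln(P_t / P_{t-1}); the result is one element shorter than the input."""
    return [math.log(curr / prev) for prev, curr in zip(closes, closes[1:])]

closes = [100.0, 101.0, 99.0, 99.0]  # hypothetical closing prices
ret = log_returns(closes)
```

One property worth noting: log returns add up across days, so `sum(ret)` equals the log return over the whole period.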
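- 20-Day Rolling Annualized Volatility = the standard deviation of daily returns over the last 20 trading days, scaled to a yearly figure (commonly by √252); the classic measure of short-term volatility used in quant finance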
- Volatility is the single most important driver of regime segmentation in financial markets
- High-vol and low-vol regimes are structurally different states; in high-vol regimes:
- spreads widen
- volume increases
- returns behave differently
- market microstructure shifts
- This feature typically provides the clearest separation between regimes
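A rough sketch of `vol20`, under the assumptions that it is the sample standard deviation of daily log returns over a 20-day window and that it is annualized with 252 trading days per year (both are common conventions, not confirmed by the notes above; `rolling_annualized_vol` is a hypothetical helper):

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualization factor

def rolling_annualized_vol(returns, window=20):
    """Rolling sample std of returns, annualized; None until a full window is available."""
    out = [None] * len(returns)
    for i in range(window - 1, len(returns)):
        win = returns[i - window + 1 : i + 1]
        out[i] = statistics.stdev(win) * math.sqrt(TRADING_DAYS)
    return out

rets = [0.01, -0.01] * 10  # hypothetical daily log returns
vol20 = rolling_annualized_vol(rets)
```

The first 19 entries are `None` because a 20-day window is not yet full, so those rows would need to be dropped before clustering.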
- Volume Z-Score (20-Day) = Measures how unusual today’s volume is compared to the last 20 trading days
- Trading volume spikes during:
- breakouts
- earnings
- news shocks
- panic selling
- Volume is strongly correlated with volatility, but carries independent information about market participation and order flow imbalance
- High-vol regime → vol_z is consistently elevated
- Low-vol regime → volume stays near its 20-day mean (vol_z centered around 0)
- Helps the model distinguish “quiet low-volatility days” vs. “active trending or panic days.”
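One way `vol_z` might be computed, assuming the 20-day window includes today and the sample standard deviation is used (the notes do not say either way, so treat both as assumptions; `volume_zscore` is a hypothetical helper):

```python
import statistics

def volume_zscore(volumes, window=20):
    """Z-score of today's volume against its trailing window (window includes today)."""
    out = [None] * len(volumes)
    for i in range(window - 1, len(volumes)):
        win = volumes[i - window + 1 : i + 1]
        mean = statistics.fmean(win)
        sd = statistics.stdev(win)
        # A perfectly flat window has no dispersion; report 0 rather than divide by zero.
        out[i] = (volumes[i] - mean) / sd if sd > 0 else 0.0
    return out

volumes = [100.0] * 19 + [200.0]  # hypothetical: 19 quiet days, then a spike
vol_z = volume_zscore(volumes)
```

Excluding today from the window would make spikes stand out even more, since the spike would no longer inflate its own mean and standard deviation.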
- Price Position in 20-Day Range = This places the price on a 0 to 1 scale within its recent high–low range
- 0 → price at the 20-day low
- 1 → price at the 20-day high
- 0.5 → middle of the recent range
- Detects breakout vs. mean-reverting structure:
- trending regime → prices often near upper range
- fear/panic regime → price near lower range
- sideways regime → price oscillates in the middle
- Not as strong a separator as volatility or volume, but adds valuable directional context
- High-vol spikes usually pull the price out of its recent range → shifts the model likelihood toward the high-vol regime
- Low-vol periods often drift within a narrower band
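A small sketch of `pos_range20`, assuming the range is taken from closing prices over a window that includes today (using intraday highs and lows instead would be an equally plausible reading; `position_in_range` is a hypothetical helper):

```python
def position_in_range(closes, window=20):
    """Where today's close sits in its trailing range: 0 = window low, 1 = window high."""
    out = [None] * len(closes)
    for i in range(window - 1, len(closes)):
        win = closes[i - window + 1 : i + 1]
        lo, hi = min(win), max(win)
        # If the window is completely flat the position is undefined; 0.5 is an arbitrary choice.
        out[i] = (closes[i] - lo) / (hi - lo) if hi > lo else 0.5
    return out

pos = position_in_range([10.0, 20.0, 15.0], window=3)  # hypothetical prices, short window
```

Because today's close is part of the window here, the value always stays within 0 to 1.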
- Lagged Daily Log Return (1 Day) = Just the previous day’s return
- This adds time structure into the feature space:
- Captures short-term momentum
- Captures short-term mean reversion
- Helps model transitions:
- Large negative return → more likely entering high-vol regime
- Consecutive small returns → more likely in calm regime
- While returns alone are not great for clustering, lagged information improves regime persistence (like a pseudo-HMM effect)
- If yesterday was shocking (big negative return), today’s probability of the high-vol regime typically increases
- If yesterday was calm, model likelihood shifts toward low-vol
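The lag itself is just a one-step shift. A minimal sketch, with made-up return values and a hypothetical `lag` helper:

```python
def lag(values, periods=1):
    """Shift a series so that position t holds the value from t - periods (None where unknown)."""
    shifted = [None] * periods + list(values)
    return shifted[: len(values)]

ret = [0.004, -0.031, 0.002, 0.001]  # hypothetical daily log returns
ret_lag1 = lag(ret)

# Pair today's return with yesterday's, skipping the first day (no previous return).
rows = [(r, r1) for r, r1 in zip(ret, ret_lag1) if r1 is not None]
```

Each row in `rows` gives the model both today's move and yesterday's, which is how a memoryless clustering model gets a bit of time structure.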
- 14-day Relative Strength Index = Wilder's RSI
- RSI near 70 → strong upward momentum
- RSI near 30 → strong downward momentum
- Measures speed and magnitude of recent price changes
- Signals overbought/oversold conditions
- Separates bullish/bearish phases
- RSI correlates with mean-reversion dynamics
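A sketch of Wilder's RSI on closing prices, assuming the standard recipe: seed the average gain and loss with a simple mean over the first 14 changes, then apply Wilder's smoothing. `wilder_rsi` is a hypothetical helper and may differ in small details from whatever library produced the notebook's values:

```python
def wilder_rsi(closes, period=14):
    """Wilder's RSI; None until `period` price changes are available."""
    changes = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    rsi = [None] * len(closes)
    if len(changes) < period:
        return rsi
    # Seed with simple averages over the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for i in range(period, len(closes)):
        if i > period:
            # Wilder smoothing: new average = (old * (n - 1) + latest) / n
            avg_gain = (avg_gain * (period - 1) + gains[i - 1]) / period
            avg_loss = (avg_loss * (period - 1) + losses[i - 1]) / period
        if avg_loss == 0:
            rsi[i] = 100.0  # no losses in the lookback
        else:
            rsi[i] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return rsi

choppy = [10.0 + (t % 2) for t in range(15)]  # hypothetical: alternating +1 / -1 moves
rsi14 = wilder_rsi(choppy)
```

With equal average gains and losses the RSI sits at 50, the neutral midpoint between the 30 and 70 levels mentioned above.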
- MACD Histogram = the MACD line minus its signal line; measures momentum acceleration
- Positive histogram → upward momentum strengthening
- Negative histogram → downward momentum strengthening
- Zero crossing → momentum inflection point
- A smoothed trend signal from technical analysis
- Helps define momentum-dominated vs. correction phases
- Informative for regime boundaries around trend changes
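A sketch of the MACD histogram, assuming the conventional 12/26/9 spans and exponential moving averages seeded with the first value (the notes do not state the parameters, so these are assumptions; `ema` and `macd_histogram` are hypothetical helpers):

```python
def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1), seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def macd_histogram(closes, fast=12, slow=26, signal=9):
    """MACD line (fast EMA - slow EMA) minus its signal line (EMA of the MACD line)."""
    macd_line = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    signal_line = ema(macd_line, signal)
    return [m - s for m, s in zip(macd_line, signal_line)]

uptrend = [100.0 + t for t in range(60)]  # hypothetical steadily rising prices
hist = macd_histogram(uptrend)
```

In a steady uptrend the histogram stays positive after the first day; it turns negative once the trend reverses and the MACD line drops below its signal line.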
- Market Beta = sensitivity to the broader market (S&P 500)
- Beta > 1 → NVDA amplifies market moves
- Beta < 1 → NVDA moves less than the market
- Systematic risk indicator
- High-beta periods often align with volatile or speculative market phases
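Beta is the covariance of the stock's returns with the market's returns, divided by the variance of the market's returns. A minimal sketch follows; the rolling window length of 60 days is an assumption (the notes do not give one), and `beta` and `rolling_beta` are hypothetical helpers:

```python
import statistics

def beta(asset_returns, market_returns):
    """Beta = cov(asset, market) / var(market), computed over paired return series."""
    mean_a = statistics.fmean(asset_returns)
    mean_m = statistics.fmean(market_returns)
    cov = sum((a - mean_a) * (m - mean_m) for a, m in zip(asset_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    if var == 0:
        raise ValueError("market returns have zero variance; beta is undefined")
    return cov / var  # the 1 / (n - 1) factors cancel

def rolling_beta(asset_returns, market_returns, window=60):
    """Beta over a trailing window; None until a full window is available."""
    out = [None] * len(asset_returns)
    for i in range(window - 1, len(asset_returns)):
        lo = i - window + 1
        out[i] = beta(asset_returns[lo : i + 1], market_returns[lo : i + 1])
    return out

market = [0.01, -0.02, 0.015, 0.0, -0.005]  # hypothetical S&P 500 daily returns
stock = [2.0 * m for m in market]           # hypothetical stock that doubles every market move
b = beta(stock, market)
```

A rolling version is what makes "high-beta periods" visible, since a single full-sample beta would be one number for the whole history.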