This Is Not a Tutorial. This Is a Map.
NOVOSKY already runs. It trades. It makes money. The ensemble of Random Forest, XGBoost, and LightGBM is live on BTC/USD M15 with a 71.7% OOS win rate and a 5.46 profit factor.
But that's a starting point, not a ceiling.
The quantitative finance research space is vast, and most of the tools in it haven't been applied to NOVOSKY yet. This document is a systematic survey of 17 methods — each one with the math, a working Python sketch, and a precise answer to the question: what does this actually buy NOVOSKY?
We're not collecting buzzwords. We're building a backlog.
Part I — Statistical Foundations
1. Pearson Correlation — Cleaning the Feature Space
Before anything else, you need to know which of your features are actually telling you different things.
Pearson's $r$ measures the linear dependence between two zero-mean variables $x$ and $y$:

$$r_{xy} = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}}$$

At $|r| > 0.9$, two features are essentially measuring the same thing, and one of them is noise from the model's perspective. For tree-based ensembles, multicollinear features don't cause instability the way they do in regression, but they do dilute SHAP importance scores and inflate feature count with no signal gain.
```python
import numpy as np
import pandas as pd

def prune_correlated_features(df: pd.DataFrame, threshold: float = 0.90) -> list[str]:
    """Return columns whose |Pearson r| with an earlier column exceeds threshold."""
    corr = df.corr(method="pearson").abs()
    # Keep only the strict upper triangle so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return to_drop

# Usage against NOVOSKY's feature matrix
features = pd.read_parquet("ml/feature_cache/latest.parquet")
redundant = prune_correlated_features(features, threshold=0.90)
print(f"Candidates for removal: {redundant}")
# e.g. ['rsi_7', 'atr_7', 'ema_slope_fast']: shorter lookback duplicates longer
```

NOVOSKY benefit: NOVOSKY currently has 62 features. A correlation audit will identify duplicates (short-window vs long-window RSI, multiple ATR periods) and let the model spend its capacity on genuinely orthogonal signals instead of re-learning the same pattern under multiple names.
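To make the Pearson formula concrete, here is a minimal NumPy sketch that computes $r$ by hand and checks it on toy data. The feature names and the synthetic series are assumptions for illustration only; they are not NOVOSKY's real features.

```python
import numpy as np

def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson r: centre both series, then take the normalised dot product."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical toy features: rsi_14 is a noisy copy of rsi_7, volume is unrelated
rng = np.random.default_rng(0)
rsi_7 = rng.normal(size=1_000)
rsi_14 = rsi_7 + 0.1 * rng.normal(size=1_000)
volume = rng.normal(size=1_000)

r_dup = pearson_r(rsi_7, rsi_14)   # near 1: a pruning candidate at threshold 0.90
r_ind = pearson_r(rsi_7, volume)   # near 0: carries different information
```

The hand-rolled value should agree with `np.corrcoef`, which is a quick sanity check before trusting a pruning threshold.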
2. ARIMA / SARIMA — The Statistically Honest Baseline
Before training any ML model on a financial time series, you must first defeat ARIMA. If your gradient boosted tree can't beat an ARIMA forecast, you don't have a signal — you have a curve fit.
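ARIMA$(p, d, q)$ models a time series $y_t$ that becomes stationary after $d$ differences:

$$\phi(B)\,(1 - B)^d\, y_t = \theta(B)\,\varepsilon_t$$

Where $B$ is the backshift operator ($B y_t = y_{t-1}$), $\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p$ is the autoregressive polynomial, and $\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^q$ is the moving-average polynomial. The $d$-th differencing term $(1 - B)^d$ achieves stationarity.
SARIMA extends this to capture periodic seasonality of period $s$:

$$\Phi(B^s)\,\phi(B)\,(1 - B)^d (1 - B^s)^D\, y_t = \Theta(B^s)\,\theta(B)\,\varepsilon_t$$

For BTC/USD M15, $s = 96$ (one trading day = 96 fifteen-minute bars) is the natural seasonal period.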
```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.stattools import adfuller

close = df["close"].values
log_ret = np.diff(np.log(close))

# 1. Test stationarity
adf_stat, p_val, *_ = adfuller(log_ret)
print(f"ADF stat={adf_stat:.3f} p={p_val:.4f}")  # want p < 0.05

# 2. Fit SARIMA on log-returns (a seasonal period of 96 makes the fit slow and memory-heavy)
model = SARIMAX(log_ret, order=(2, 0, 2), seasonal_order=(1, 0, 1, 96))
res = model.fit(disp=False)
print(res.summary())

# 3. Forecast 1 step ahead (next M15 candle direction)
fc = res.get_forecast(steps=1)
# With a NumPy input, predicted_mean is an ndarray, so index it positionally
next_ret = float(np.asarray(fc.predicted_mean)[0])
direction = "LONG" if next_ret > 0 else "SHORT"
conf_int = fc.conf_int(alpha=0.10)  # 90% CI
```

NOVOSKY benefit: Use ARIMA as a meta-feature: feed arima_forecast_1step and arima_residual into the ensemble. The residual is the part of price movement ARIMA couldn't explain, which is exactly what ML is good at capturing.
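The "defeat the baseline first" rule can be sketched with something even simpler than SARIMA. The block below fits an AR(1) by least squares and scores its out-of-sample directional accuracy; any ML model would need to beat that number on the same split. This swaps in AR(1) for full ARIMA to stay dependency-free, and the synthetic series is an assumption standing in for real M15 log-returns.

```python
import numpy as np

def fit_ar1(returns: np.ndarray) -> float:
    """Least-squares estimate of phi in r_t = phi * r_{t-1} + e_t (zero mean assumed)."""
    prev, curr = returns[:-1], returns[1:]
    return float((prev @ curr) / (prev @ prev))

def directional_accuracy(pred: np.ndarray, actual: np.ndarray) -> float:
    """Share of bars where the predicted and realised signs agree."""
    return float(np.mean(np.sign(pred) == np.sign(actual)))

# Synthetic AR(1) returns (assumption: illustrative only, not BTC data)
rng = np.random.default_rng(42)
phi_true, n = 0.5, 5_000
r = np.zeros(n)
for t in range(1, n):
    r[t] = phi_true * r[t - 1] + rng.normal(scale=0.001)

train, test = r[:4_000], r[4_000:]          # chronological split, no shuffling
phi_hat = fit_ar1(train)
baseline_acc = directional_accuracy(phi_hat * test[:-1], test[1:])
```

On real returns the autocorrelation is far weaker than in this toy series, so the baseline accuracy will sit much closer to 50%; the point is that the comparison is made on the same held-out bars.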
Part II — Volatility and Risk Architecture
3. GARCH — Modeling Conditional Volatility
Financial returns exhibit volatility clustering: large moves tend to cluster together, and calm periods cluster together. This heteroskedasticity is not noise — it's structure. GARCH models it explicitly.
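The GARCH(1,1) model:

$$r_t = \mu + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t, \qquad \sigma_t^2 = \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2$$

Where $\omega > 0$, $\alpha \ge 0$, $\beta \ge 0$, and $\alpha + \beta < 1$ (stationarity). $\sigma_t^2$ is the conditional variance: the market's expected volatility given everything up to $t-1$.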
```python
import numpy as np
from arch import arch_model

log_ret = np.diff(np.log(df["close"].values)) * 100  # percent returns, scaled for numerical stability
garch = arch_model(log_ret, vol="Garch", p=1, q=1, dist="t")  # Student-t for fat tails
res = garch.fit(disp="off")

# Forecast 1-step conditional volatility
fc = res.forecast(horizon=1)
sigma_next = np.sqrt(fc.variance.iloc[-1, 0])

# Use as a feature: high sigma → widen ATR multiplier, reduce lot size
regime = "HIGH VOL" if sigma_next > 1.5 else "LOW VOL"
print(f"σ_t+1 = {sigma_next:.4f}% → regime: {regime}")
```

NOVOSKY benefit: Add garch_sigma as a feature and use it to dynamically scale the ATR multiplier for stop-loss placement. When GARCH forecasts high conditional variance, widen the SL to avoid getting stopped out by noise. This is cheap edge: NOVOSKY already computes ATR, which is a backward-looking average of true range, while GARCH produces an explicit one-step-ahead volatility forecast.
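The GARCH(1,1) recursion itself is only a few lines. The sketch below filters a residual series through it for given parameters, which shows how one shock raises the next bar's variance and then decays. The parameter values are illustrative assumptions, not estimates fitted to BTC.

```python
import numpy as np

def garch11_variance(eps: np.ndarray, omega: float, alpha: float, beta: float) -> np.ndarray:
    """Conditional variances from residuals eps[0..T-1].

    Returns T+1 values; sigma2[t] uses information up to t-1, so the last
    entry is the one-step-ahead forecast.
    """
    sigma2 = np.empty(len(eps) + 1)
    sigma2[0] = omega / (1.0 - alpha - beta)   # start at the unconditional variance
    for t in range(len(eps)):
        sigma2[t + 1] = omega + alpha * eps[t] ** 2 + beta * sigma2[t]
    return sigma2

omega, alpha, beta = 0.02, 0.08, 0.90          # illustrative values (assumption)
calm = np.full(50, 0.2)
shock = np.concatenate([calm, [3.0], calm])    # one large residual in a calm series
sigma2 = garch11_variance(shock, omega, alpha, beta)
```

With these values `sigma2[51]` jumps right after the shock and then decays geometrically at rate `beta`, which is the volatility-clustering behaviour the model is built to capture.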
4. Monte Carlo Simulation — Forward-Looking Risk
Monte Carlo projects a distribution of possible future equity curves by drawing thousands of random return paths. Each path samples from the empirical return distribution of your strategy:

$$E_n = E_0 \prod_{i=1}^{n} (1 + r_i), \qquad r_i \sim \hat{F}$$

Where $\hat{F}$ is the empirical CDF of your strategy's per-trade returns and $E_0$ is the starting balance.
```python
import numpy as np

def monte_carlo_equity(trade_returns: np.ndarray, n_sims=10_000, n_trades=200,
                       starting_balance=500.0) -> np.ndarray:
    """Returns shape (n_sims, n_trades+1): equity paths."""
    idx = np.random.randint(0, len(trade_returns), size=(n_sims, n_trades))
    draws = trade_returns[idx]                 # bootstrap resample
    paths = np.cumprod(1 + draws, axis=1)
    return starting_balance * np.hstack([np.ones((n_sims, 1)), paths])

trade_rets = np.array(backtest_results["trade_returns"])
paths = monte_carlo_equity(trade_rets, n_sims=50_000)
final_eq = paths[:, -1]

# Max drawdown per path: largest drop relative to the running peak at that moment
sample = paths[:1000]                          # sample for speed
running_peak = np.maximum.accumulate(sample, axis=1)
max_dd_sim = ((running_peak - sample) / running_peak).max(axis=1)

print(f"Median final equity: ${np.median(final_eq):.0f}")
print(f"P5 final equity: ${np.percentile(final_eq, 5):.0f}")
print(f"P95 Max Drawdown: {np.percentile(max_dd_sim, 95)*100:.1f}%")
```

NOVOSKY benefit: Run Monte Carlo on live trade returns monthly to get a forward-looking drawdown distribution. If P95 max drawdown breaches 15%, tighten the circuit breaker before a real drawdown hits. This is pre-emptive risk management.
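Because the circuit-breaker decision hinges on the drawdown number, it helps to pin the definition down on a path small enough to check by hand. This is a minimal sketch; the five-point equity curve is made up for illustration, and the 15% limit is the one quoted above.

```python
import numpy as np

def max_drawdown(equity: np.ndarray) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    running_peak = np.maximum.accumulate(equity)
    return float(((running_peak - equity) / running_peak).max())

equity = np.array([100.0, 120.0, 90.0, 110.0, 80.0])
dd = max_drawdown(equity)          # trough of 80 against the running peak of 120
breaches_limit = dd > 0.15         # the 15% circuit-breaker threshold from the text
```

Dividing by the running peak at each bar, rather than by the path's overall maximum, matters when the deepest trough occurs before the highest peak.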
5. Value at Risk (VaR) and Expected Shortfall
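VaR answers: what loss should I expect not to exceed at confidence level $\alpha$?
Parametric (Gaussian) VaR, with $\mu$ and $\sigma$ the mean and standard deviation of returns and $\Phi^{-1}$ the standard normal quantile function:

$$\mathrm{VaR}_\alpha = -\left(\mu + \sigma\,\Phi^{-1}(1 - \alpha)\right)$$

Historical VaR makes no distributional assumption and uses the empirical quantile:

$$\mathrm{VaR}_\alpha = -\hat{q}_{1-\alpha}(r)$$

Expected Shortfall (CVaR) is the expected loss given that VaR is breached. It is more informative than VaR for fat-tailed assets:

$$\mathrm{ES}_\alpha = -\mathbb{E}\left[\, r \mid r \le -\mathrm{VaR}_\alpha \,\right]$$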
```python
import numpy as np

def compute_var_cvar(returns: np.ndarray, alpha: float = 0.95) -> dict:
    sorted_r = np.sort(returns)
    var_idx = int(np.floor((1 - alpha) * len(sorted_r)))
    var = -sorted_r[var_idx]
    # Include the VaR observation itself so the tail is never empty on short samples
    cvar = -sorted_r[: var_idx + 1].mean()
    return {"VaR": var, "CVaR": cvar, "alpha": alpha}

daily_returns = compute_daily_strategy_returns(backtest_results)
risk = compute_var_cvar(daily_returns, alpha=0.95)
print(f"95% Daily VaR: {risk['VaR']*100:.2f}%")
print(f"95% Daily CVaR: {risk['CVaR']*100:.2f}%")
```

NOVOSKY benefit: Use CVaR, not just drawdown, as the optimization objective in Optuna hyperparameter tuning. Minimizing CVaR directly pushes the model toward strategies that perform well in the worst 5% of scenarios, not just on average.
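The parametric formula is not covered by the code above, so here is a small sketch of it next to the historical quantile. It assumes Gaussian returns by construction; the simulated sample is an illustration, not strategy data.

```python
import numpy as np
from statistics import NormalDist

def parametric_var(returns: np.ndarray, alpha: float = 0.95) -> float:
    """Gaussian VaR: -(mu + sigma * z), with z the (1 - alpha) normal quantile."""
    mu, sigma = returns.mean(), returns.std(ddof=1)
    z = NormalDist().inv_cdf(1 - alpha)     # about -1.645 for alpha = 0.95
    return float(-(mu + sigma * z))

def historical_var(returns: np.ndarray, alpha: float = 0.95) -> float:
    """Empirical VaR: negative of the (1 - alpha) sample quantile."""
    return float(-np.quantile(returns, 1 - alpha))

# Simulated Gaussian daily returns (assumption: illustrative only)
rng = np.random.default_rng(7)
sample = rng.normal(loc=0.0, scale=0.01, size=100_000)
p_var = parametric_var(sample)
h_var = historical_var(sample)
```

On Gaussian data the two estimates nearly coincide. On real, fat-tailed returns the Gaussian figure tends to understate the tail at high confidence levels, which is the argument for preferring the historical estimate and Expected Shortfall.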
6. Copula Dependency Model — When Correlation Isn't Enough
Pearson correlation measures linear dependency. Copulas model the full dependence structure, including tail dependency: the probability that two variables both take extreme values simultaneously. This matters enormously for Bitcoin, where assets can crash together in ways their normal correlation doesn't predict.
Sklar's theorem: any joint distribution $F$ can be decomposed into marginals $F_X$, $F_Y$ and a copula $C$:

$$F(x, y) = C\big(F_X(x),\, F_Y(y)\big)$$

Where $C$ captures the dependence structure separately from the marginal distributions. The Clayton copula has strong lower-tail dependency (both variables crash together):

$$C_\theta(u, v) = \left(u^{-\theta} + v^{-\theta} - 1\right)^{-1/\theta}, \qquad \theta > 0$$
```python
import numpy as np

# Model BTC/ETH joint tail dependency for correlation-based features
# (df_btc and df_eth must be aligned on the same timestamps)
btc_ret = np.diff(np.log(df_btc["close"].values))
eth_ret = np.diff(np.log(df_eth["close"].values))

# Convert to uniform marginals via empirical CDF (rank transform)
u = (np.argsort(np.argsort(btc_ret)) + 1) / (len(btc_ret) + 1)
v = (np.argsort(np.argsort(eth_ret)) + 1) / (len(eth_ret) + 1)

# Lower-tail co-movement: joint crash probability P(U < 0.05, V < 0.05)
joint_crash = np.mean((u < 0.05) & (v < 0.05))
marginal_crash = 0.05 * 0.05  # joint probability if independent
print(f"Joint crash prob: {joint_crash:.4f} vs independence: {marginal_crash:.4f}")
print(f"Tail dependency ratio: {joint_crash/marginal_crash:.1f}x")
```

NOVOSKY benefit: If NOVOSKY is ever extended to multi-instrument trading (BTC + ETH, or BTC + XAUUSD), copula models ensure position sizing accounts for tail co-movement: the risk that both positions lose simultaneously is higher than naive correlation predicts.
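One simple way to put a number on Clayton-style lower-tail dependence is moment matching through Kendall's tau. For the Clayton family $\tau = \theta / (\theta + 2)$ and the lower-tail coefficient is $\lambda_L = 2^{-1/\theta}$. The sketch below is one possible estimator, not necessarily how a production fit would be done, and it only applies when $\tau > 0$.

```python
import numpy as np

def kendall_tau(x: np.ndarray, y: np.ndarray) -> float:
    """Kendall's tau-a by pair counting; O(n^2), fine for a few thousand points."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return float((dx * dy).sum() / (n * (n - 1)))

def clayton_from_tau(tau: float) -> tuple[float, float]:
    """Clayton theta and lower-tail dependence from Kendall's tau (needs tau > 0)."""
    theta = 2.0 * tau / (1.0 - tau)          # inverts tau = theta / (theta + 2)
    lambda_lower = 2.0 ** (-1.0 / theta)     # P(both extreme | one extreme) in the limit
    return theta, lambda_lower

theta, lam = clayton_from_tau(0.5)
```

A `lambda_lower` well above zero says joint crashes remain likely even far out in the tail, which a Gaussian dependence model would not capture.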
Part III — State and Structural Models
7. Hidden Markov Model — Regime Detection
Financial markets don't have one static return distribution. They switch between regimes: trending up, trending down, mean-reverting sideways. A Hidden Markov Model (HMM) models this as:
- A latent state $s_t$ (e.g. Bull, Bear, Sideways) that follows a Markov chain with transition matrix $A$, where $A_{ij} = P(s_t = j \mid s_{t-1} = i)$
- Observations $o_t$ (e.g. daily returns) emitted from state-conditional distributions $p(o_t \mid s_t)$
The Viterbi algorithm recovers the most probable state sequence given observations:

$$\hat{s}_{1:T} = \arg\max_{s_{1:T}} P(s_{1:T} \mid o_{1:T})$$
```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

log_ret = np.diff(np.log(df["close"].values)).reshape(-1, 1)
hmm = GaussianHMM(n_components=3, covariance_type="full", n_iter=200, random_state=42)
hmm.fit(log_ret)

# Predict current regime
# Note: decoding the full history uses later bars to label earlier ones, so
# build training features from data up to each bar only to avoid look-ahead
states = hmm.predict(log_ret)
proba = hmm.predict_proba(log_ret)  # soft assignment
curr_state = states[-1]

# Map states to labels by sorting on mean return
order = np.argsort(hmm.means_.flatten())  # lowest mean first
label_map = {order[0]: "BEAR", order[1]: "SIDEWAYS", order[2]: "BULL"}
print(f"Current regime: {label_map[curr_state]}")
print(f"P(Bear)={proba[-1, order[0]]:.2f} "
      f"P(Sideways)={proba[-1, order[1]]:.2f} "
      f"P(Bull)={proba[-1, order[2]]:.2f}")
```

NOVOSKY benefit: The single biggest weakness NOVOSKY has is taking trades during sideways/choppy regimes where no directional edge exists. An HMM regime filter (hmm_regime_bull_prob, hmm_regime_bear_prob added as features) lets the model learn to go quiet during structureless markets. This is probably the single highest-leverage improvement available.
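For readers who want to see what Viterbi decoding does under the hood, here is a compact log-space sketch for a discrete-state HMM. The two-state transition and emission numbers are invented for the example; a fitted model would supply them.

```python
import numpy as np

def viterbi(log_pi: np.ndarray, log_A: np.ndarray, log_B: np.ndarray) -> np.ndarray:
    """Most probable state path.

    log_pi: (K,) initial log-probs; log_A: (K, K) with log_A[i, j] = log P(j | i);
    log_B: (T, K) log-likelihood of each observation under each state.
    """
    T, K = log_B.shape
    delta = np.empty((T, K))             # best log-prob of any path ending in state k
    back = np.zeros((T, K), dtype=int)   # argmax predecessor for backtracking
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A      # scores[i, j]: come from i, go to j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

# Hypothetical 2-state example: sticky regimes, emissions favour state 0 then state 1
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.9, 0.1]] * 3 + [[0.1, 0.9]] * 3)
path = viterbi(np.log(pi), np.log(A), np.log(B))
```

Working in log space avoids numerical underflow on long sequences, which matters for tens of thousands of M15 bars.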
8. Kalman Filter — Noise-Reduced Price Tracking
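The Kalman filter is the optimal linear estimator for a noisy linear dynamical system. Applied to price, it separates the true underlying trend from the observation noise:
State transition (predict):

$$\hat{x}_{t|t-1} = F\,\hat{x}_{t-1|t-1}, \qquad P_{t|t-1} = F\,P_{t-1|t-1}\,F^\top + Q$$

Measurement update:

$$K_t = P_{t|t-1} H^\top \left(H P_{t|t-1} H^\top + R\right)^{-1}$$

$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t\left(z_t - H\,\hat{x}_{t|t-1}\right), \qquad P_{t|t} = (I - K_t H)\,P_{t|t-1}$$

The state vector $x_t = [p_t,\ \dot{p}_t]^\top$ gives a noise-filtered price and its current rate-of-change simultaneously.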
```python
import numpy as np

def kalman_filter_price(prices: np.ndarray,
                        Q: float = 1e-4,   # process noise
                        R: float = 1e-2    # measurement noise
                        ) -> tuple[np.ndarray, np.ndarray]:
    """Returns (filtered_price, filtered_velocity)."""
    n = len(prices)
    F = np.array([[1, 1], [0, 1]])       # state transition
    H = np.array([[1, 0]])               # observation matrix
    Q_m = np.eye(2) * Q
    R_m = np.array([[R]])
    x = np.array([[prices[0]], [0.0]])   # initial state
    P = np.eye(2)
    filtered_px = np.zeros(n)
    filtered_vel = np.zeros(n)
    for t in range(n):
        # Predict
        x = F @ x
        P = F @ P @ F.T + Q_m
        # Update
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R_m)
        x = x + K * (prices[t] - (H @ x)[0, 0])
        P = (np.eye(2) - K @ H) @ P
        filtered_px[t] = x[0, 0]
        filtered_vel[t] = x[1, 0]
    return filtered_px, filtered_vel

kpx, kvel = kalman_filter_price(df["close"].values)
df["kalman_price"] = kpx
df["kalman_velocity"] = kvel  # sign = direction, magnitude = trend strength
```

NOVOSKY benefit: Replace or supplement EMA trend slope with kalman_velocity. The Kalman gain $K_t$ sets how strongly each new price corrects the estimate. With fixed $Q$ and $R$ the gain settles to a constant after a warm-up period, so the practical advantage over an EMA is that the $Q/R$ ratio gives an explicit, tunable trade-off between responsiveness and smoothness (and can be made volatility-dependent). The filter also gives velocity and uncertainty estimates simultaneously.
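How the Kalman gain behaves is easiest to see in the scalar case. The sketch below uses a simplified random-walk-plus-noise model ($F = H = 1$), which is an assumption made for clarity and not the two-state model above. With constant $Q$ and $R$ the gain starts high and converges to a closed-form steady state, at which point this scalar filter behaves like an EMA whose smoothing factor is the gain.

```python
import numpy as np

def gain_sequence(Q: float, R: float, P0: float = 1.0, steps: int = 200) -> np.ndarray:
    """Kalman gains over time for a scalar random-walk-plus-noise model (F = H = 1)."""
    P, gains = P0, []
    for _ in range(steps):
        P_pred = P + Q                  # predict: uncertainty grows by process noise
        K = P_pred / (P_pred + R)       # gain: share of the innovation to trust
        P = (1.0 - K) * P_pred          # update: uncertainty shrinks
        gains.append(K)
    return np.array(gains)

Q, R = 1e-4, 1e-2
gains = gain_sequence(Q, R)

# Steady state solves P^2 - Q*P - Q*R = 0 for the predicted variance P
P_ss = (Q + np.sqrt(Q**2 + 4 * Q * R)) / 2
K_ss = P_ss / (P_ss + R)
```

Raising `Q` relative to `R` increases the steady-state gain, so the filter tracks price more tightly; lowering it smooths more. Tying `R` to a volatility estimate is one way to make the gain genuinely time-varying.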
Part IV — Deep Sequence Learning
9. LSTM Price Forecast
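Long Short-Term Memory networks solve the vanishing gradient problem that breaks vanilla RNNs on long sequences. The core is a cell state $c_t$ gated by three learned gates:

$$f_t = \sigma(W_f[h_{t-1}, x_t] + b_f), \quad i_t = \sigma(W_i[h_{t-1}, x_t] + b_i), \quad o_t = \sigma(W_o[h_{t-1}, x_t] + b_o)$$

$$\tilde{c}_t = \tanh(W_c[h_{t-1}, x_t] + b_c), \qquad c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad h_t = o_t \odot \tanh(c_t)$$

The forget gate $f_t$ decides what to erase from memory. When $f_t \approx 0$, the network has learned that the past is irrelevant, which is critical for regime transitions.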
```python
import torch
import torch.nn as nn

class TradingLSTM(nn.Module):
    def __init__(self, input_size=62, hidden=128, layers=2, dropout=0.3):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden, layers,
                            batch_first=True, dropout=dropout)
        self.head = nn.Sequential(
            nn.Linear(hidden, 64), nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(64, 3)              # LONG / SHORT / FLAT
        )

    def forward(self, x):                 # x: (B, T, F)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])   # last timestep only

# Training sketch
model = TradingLSTM()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 1.0, 0.5]))  # down-weight FLAT

# Walk-forward training on NOVOSKY feature windows
for epoch in range(50):
    for X_batch, y_batch in train_loader:  # X: (B, 20, 62), a 20-bar window
        logits = model(X_batch)
        loss = criterion(logits, y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

NOVOSKY benefit: The ensemble currently uses static feature windows with no memory across bars. An LSTM layer added as a meta-learner on top of the RF/XGB/LGB probability outputs could capture sequential signal patterns ("three bars of increasing confidence followed by a regime shift") that the tree models cannot represent.
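The gate equations can be traced with a single hand-written LSTM step in NumPy. This is a didactic sketch, not how `nn.LSTM` lays out its weights internally; the stacked weight matrix, the gate ordering, and the toy sizes are assumptions chosen for readability.

```python
import numpy as np

def sigmoid(a: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W: (4H, H + D), b: (4H,). Gate order: forget, input, output, candidate."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0:H])                 # forget gate: how much old memory to keep
    i = sigmoid(z[H:2 * H])             # input gate: how much new candidate to write
    o = sigmoid(z[2 * H:3 * H])         # output gate: how much memory to expose
    c_tilde = np.tanh(z[3 * H:4 * H])   # candidate memory
    c = f * c_prev + i * c_tilde
    h = o * np.tanh(c)
    return h, c

H, D = 4, 3
W, b = np.zeros((4 * H, H + D)), np.zeros(4 * H)
x, h0, c0 = np.ones(D), np.zeros(H), np.full(H, 5.0)

_, c_keep = lstm_step(x, h0, c0, W, b)          # all gates at 0.5: half the memory kept
b_forget = b.copy()
b_forget[:H] = -20.0                            # strongly negative forget bias: f is ~0
_, c_erased = lstm_step(x, h0, c0, W, b_forget)
```

Driving the forget gate toward zero wipes the stored cell state in one step, which is the mechanism the text describes for dropping stale context at a regime transition.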
10. Transformer Price Forecast
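The Transformer replaces recurrence entirely with scaled dot-product attention, which allows every position in a sequence to attend to every other position in parallel:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V$$

Multi-head attention runs $h$ parallel attention heads and concatenates:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\,W^O$$

Where $\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q,\ K W_i^K,\ V W_i^V)$.
The $1/\sqrt{d_k}$ scaling prevents softmax saturation in high-dimensional key spaces.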
```python
import torch
import torch.nn as nn

class TemporalTransformer(nn.Module):
    def __init__(self, d_model=64, nhead=4, n_layers=3, seq_len=48, n_features=62):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        self.pos_enc = nn.Parameter(torch.randn(seq_len, d_model))  # learned positions
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=256,
            dropout=0.1, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 3)

    def forward(self, x):                 # x: (B, T, F)
        x = self.embed(x) + self.pos_enc[:x.size(1)]
        x = self.encoder(x)               # attend across all T bars
        return self.head(x[:, -1, :])     # classify from last token
```

NOVOSKY benefit: Attention heads can learn to focus on candles from hours ago that structurally resemble current conditions, something neither LSTM nor trees can do cleanly. A Transformer trained on M15 bars with a 48-bar (12-hour) window could learn intra-day opening patterns that the current feature set can only approximate.
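Scaled dot-product attention is short enough to write out directly. The NumPy sketch below implements a single head with no learned projections, which is a simplification of what `nn.TransformerEncoderLayer` does; the 48-bar window matches the text and the random inputs are placeholders.

```python
import numpy as np

def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray):
    """Single-head attention. Q, K: (T, d_k); V: (T, d_v). Returns (output, weights)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # similarity of each bar to every bar
    scores = scores - scores.max(axis=-1, keepdims=True) # stabilise the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
T, d = 48, 16                        # 48 M15 bars, 16-dim embeddings (placeholder sizes)
X = rng.normal(size=(T, d))
out, w = scaled_dot_product_attention(X, X, X)   # self-attention
```

Row `t` of `w` shows how much bar `t` draws on every other bar in the window, which is what makes attention maps inspectable after training.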
Part V — Pattern Detection and Signal Decomposition
11. CNN Pattern Recognition
Convolutional filters are pattern matchers. A 1D convolution of kernel $k$ (length $K$) over signal $x$:

$$(x * k)[t] = \sum_{j=0}^{K-1} k[j]\, x[t - j]$$

Applied to OHLCV windows, a CNN learns to detect candlestick patterns (engulfing bars, doji sequences, breakouts) as filters that activate on the right shape.
```python
import torch
import torch.nn as nn

class CandleCNN(nn.Module):
    def __init__(self, n_features=5, seq_len=30):   # raw OHLCV
        super().__init__()
        self.conv1 = nn.Conv1d(n_features, 32, kernel_size=5, padding=2)
        self.conv2 = nn.Conv1d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool1d(8)
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8, 128), nn.ReLU(),
            nn.Linear(128, 3)
        )

    def forward(self, x):    # x: (B, C, T), channels-first for Conv1d
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = self.pool(x)
        return self.head(x)

# Inspect the learned weights of filter 0 after training:
# filter_weights = model.conv1.weight[0].detach().numpy()
```

NOVOSKY benefit: CNN outputs (cnn_long_prob, cnn_short_prob) added as features to the existing ensemble give it access to shape-based pattern information that the current engineered features (ratios, indicators) can't capture. Think of it as a learned candlestick recognition module bolted onto the existing system.
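A hand-picked kernel shows what "a filter that activates on the right shape" means before any learning is involved. In this sketch the two-tap kernel and the toy price series are assumptions; a trained CNN would learn its own kernels.

```python
import numpy as np

close = np.array([100.0, 100.0, 100.0, 103.0, 103.0, 103.0])   # toy series with one jump
kernel = np.array([-1.0, 1.0])                                 # hand-picked "step up" detector

# np.correlate slides the kernel without flipping it, which is also what
# deep-learning "convolution" layers compute in practice
response = np.correlate(close, kernel, mode="valid")
breakout_bar = int(np.argmax(response)) + 1    # index of the first bar at the new level
```

The response is zero wherever price is flat and spikes exactly where the shape matches the kernel. A CNN stacks many such learned detectors over OHLCV channels.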
12. Wavelet Decomposition — Multi-Scale Signal Separation
A Discrete Wavelet Transform (DWT) decomposes a time series into approximation (low-frequency trend) and detail (high-frequency noise) components at multiple scales simultaneously:

$$a_{j+1}[n] = \sum_k h[k - 2n]\, a_j[k], \qquad d_{j+1}[n] = \sum_k g[k - 2n]\, a_j[k]$$

Where $h$ is a low-pass filter and $g$ is its quadrature mirror (high-pass). At level $j = 1$, each bar represents 2 bars of the original; at $j = 3$, each bar is 8 M15 bars = 2 hours.
```python
import numpy as np
import pywt

def wavelet_features(close: np.ndarray, wavelet="db4", level=3) -> dict:
    # wavedec returns [cA_n, cD_n, ..., cD_1]: coarsest detail first, finest last
    cA, *cDs = pywt.wavedec(np.log(close), wavelet, level=level)
    # Reconstruct trend from approximation only
    rec_components = [cA] + [np.zeros_like(d) for d in cDs]
    trend = pywt.waverec(rec_components, wavelet)[:len(close)]
    # High-frequency noise energy at level 1 (finest scale, last in the list)
    noise_energy = np.std(cDs[-1][-20:])  # most recent 20 level-1 coefficients
    return {
        "wavelet_trend_slope": np.polyfit(range(10), trend[-10:], 1)[0],
        "wavelet_noise_energy": noise_energy,
        "wavelet_detail_l2_std": np.std(cDs[-2][-10:]),  # level-2 detail
    }
```

NOVOSKY benefit: wavelet_noise_energy is a direct signal quality metric. When high-frequency detail coefficients are large, the market is noisy and trades should require higher confidence. When the L3 approximation has a clear slope, NOVOSKY can lean into the trend. This is multi-timeframe analysis without requiring a second data feed.
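The filter-and-downsample idea is easiest to follow with the Haar wavelet, where the low-pass filter is a pairwise average and the high-pass filter a pairwise difference. This sketch swaps in Haar for the `db4` wavelet used above purely for readability, and the eight-point signal is made up.

```python
import numpy as np

def haar_dwt(x: np.ndarray):
    """One level of the Haar DWT on an even-length signal."""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)    # low-pass: local average (trend)
    detail = (even - odd) / np.sqrt(2)    # high-pass: local difference (noise)
    return approx, detail

def haar_idwt(approx: np.ndarray, detail: np.ndarray) -> np.ndarray:
    """Invert one Haar level, interleaving the reconstructed samples."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 3.0, 5.0])
a1, d1 = haar_dwt(x)      # level 1: each coefficient spans 2 bars
a2, d2 = haar_dwt(a1)     # level 2: each coefficient spans 4 bars
```

Each level halves the length and doubles the time span per coefficient, and the transform is lossless: the original signal comes back exactly from the approximation plus details.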
Part VI — Generative and Probabilistic Models
13. GAN Price Simulation — Stress Testing
A Generative Adversarial Network pits two networks against each other: a Generator $G$ that creates synthetic data, and a Discriminator $D$ that tries to tell real from fake. The minimax objective:

$$\min_G \max_D\; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]$$

At Nash equilibrium, $G$ produces samples indistinguishable from real data. For trading, this means generating synthetic BTC return sequences with the same statistical properties as the real market: fat tails, volatility clustering, autocorrelation.
```python
import torch
import torch.nn as nn

class PriceGenerator(nn.Module):
    def __init__(self, latent=32, seq_len=96, n_features=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, seq_len * n_features),
            nn.Tanh()
        )
        self.seq_len = seq_len
        self.n_features = n_features

    def forward(self, z):
        return self.net(z).view(-1, self.seq_len, self.n_features)

# Use trained GAN to augment training data with rare-event scenarios
gen = PriceGenerator().eval()   # load the trained generator weights first (omitted here)
noise = torch.randn(1000, 32)
with torch.no_grad():
    synthetic_sequences = gen(noise).numpy()  # 1000 synthetic market scenarios
# Run NOVOSKY backtest on synthetic sequences to measure tail performance
```

NOVOSKY benefit: NOVOSKY's training data covers ~1 year of BTC/USD. GANs can generate thousands of realistic synthetic scenarios including market conditions that never occurred in the training window: flash crashes, sustained trend days, low-liquidity weekends. Training on augmented data makes the model more robust to unseen regimes.
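Before trusting synthetic sequences, it seems prudent to check that they actually show the statistical properties named above. The sketch below computes two simple diagnostics: excess kurtosis (fat tails) and lag-1 autocorrelation of squared returns (volatility clustering). The function names are hypothetical, and the Gaussian sample is only a reference that should score near zero on both.

```python
import numpy as np

def excess_kurtosis(r: np.ndarray) -> float:
    """Fourth standardised moment minus 3; about 0 for Gaussian data."""
    z = (r - r.mean()) / r.std()
    return float((z ** 4).mean() - 3.0)

def acf_lag1(x: np.ndarray) -> float:
    """Lag-1 sample autocorrelation."""
    xc = x - x.mean()
    return float((xc[:-1] @ xc[1:]) / (xc @ xc))

def stylized_facts(returns: np.ndarray) -> dict:
    return {
        "excess_kurtosis": excess_kurtosis(returns),   # > 0 suggests fat tails
        "acf_sq_lag1": acf_lag1(returns ** 2),         # > 0 suggests volatility clustering
    }

# Reference case: i.i.d. Gaussian noise has neither property
rng = np.random.default_rng(1)
facts = stylized_facts(rng.normal(size=100_000))
```

Running the same function on real BTC returns and on generator output, then comparing the two dictionaries, gives a cheap acceptance test for the GAN before any backtest is run on its samples.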
14. Autoencoder Anomaly Detection
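An autoencoder learns to compress data $x$ into a latent vector $z$ and reconstruct it: $\hat{x} = g(f(x))$, with encoder $z = f(x)$ and decoder $g$. It's trained to minimize reconstruction error:

$$\mathcal{L} = \frac{1}{n}\,\lVert x - \hat{x} \rVert_2^2$$

Because the autoencoder is trained on normal market conditions, it reconstructs them well. When an anomaly occurs (flash crash, extreme volume spike, data feed error), the reconstruction error spikes:

$$\text{anomaly if}\quad \frac{1}{n}\,\lVert x - \hat{x} \rVert_2^2 > \tau$$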
```python
import numpy as np
import torch
import torch.nn as nn

class MarketAutoencoder(nn.Module):
    def __init__(self, n_features=62, latent=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 48), nn.ReLU(),
            nn.Linear(48, latent)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent, 48), nn.ReLU(),
            nn.Linear(48, n_features)
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

    def anomaly_score(self, x: torch.Tensor) -> torch.Tensor:
        return ((x - self.forward(x)) ** 2).mean(dim=-1)

# Threshold: 99th percentile of training reconstruction error
ae = MarketAutoencoder().eval()   # train the model first (training loop omitted here)
scores_train = ae.anomaly_score(X_train_tensor).detach().numpy()
tau = np.percentile(scores_train, 99)

# At inference: block trade if anomaly detected
score = ae.anomaly_score(current_features_tensor).item()
if score > tau:
    signal = "BLOCKED: anomalous market conditions"
```

NOVOSKY benefit: NOVOSKY currently relies on ATR and circuit breakers for risk management. An autoencoder guard, trained on the feature distribution of profitable trade setups, provides a learned abnormality filter that blocks trades when the market looks nothing like historical conditions the model was trained on.
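The thresholding logic can be tried end to end without training a network by swapping in a linear autoencoder, which is equivalent to PCA. This is a stand-in for the neural model above, not a replacement for it, and the five-feature synthetic data is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical "normal" data: 5 features driven by 2 latent factors plus small noise
Z = rng.normal(size=(500, 2))
W = rng.normal(size=(2, 5))
X_train = Z @ W + 0.01 * rng.normal(size=(500, 5))

# Linear autoencoder via SVD: the top-2 principal directions act as encoder and decoder
mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
components = Vt[:2]

def anomaly_score(x: np.ndarray) -> np.ndarray:
    """Mean squared reconstruction error after projecting onto the learned subspace."""
    centred = x - mean
    recon = centred @ components.T @ components
    return ((centred - recon) ** 2).mean(axis=-1)

tau = np.percentile(anomaly_score(X_train), 99)   # 99th percentile of training error
outlier = X_train[0] + 5.0 * Vt[-1]               # push one sample off the learned subspace
blocked = anomaly_score(outlier) > tau
```

By construction about 1% of ordinary training samples also exceed `tau`, so the percentile directly sets the false-block rate on normal conditions.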
Part VII — Forecasting and External Data
15. Prophet Forecast — Decomposable Time Series
Prophet (Facebook/Meta) decomposes time series into trend, seasonality, and holiday effects:

$$y(t) = g(t) + s(t) + h(t) + \varepsilon_t$$

Where $g(t)$ is a piecewise linear or logistic trend with automatic changepoint detection, $s(t)$ is Fourier-series seasonality at multiple frequencies, and $h(t)$ models scheduled events (news releases, options expiry).
```python
import pandas as pd
from prophet import Prophet

# Aggregate M15 data to hourly for Prophet
hourly = df.resample("1h", on="datetime").agg({"close": "last"}).dropna()
ph_df = hourly.reset_index().rename(columns={"datetime": "ds", "close": "y"})

m = Prophet(
    changepoint_prior_scale=0.05,       # flexibility of trend
    seasonality_mode="multiplicative",  # BTC vol scales with price
    weekly_seasonality=True,
    daily_seasonality=True,
)
m.add_seasonality(name="session", period=1/4, fourier_order=8)  # period is in days: 6h session cycle
m.fit(ph_df)

future = m.make_future_dataframe(periods=4, freq="h")  # 4h ahead
fc = m.predict(future)
trend_slope = fc["trend"].diff().iloc[-1]  # +ve = uptrending, -ve = downtrending
```

NOVOSKY benefit: Prophet's trend, weekly, and daily decomposition components can be injected as features, especially trend_slope (macro direction) and weekly_seasonality (Monday vs Friday behavior). These are longer-horizon signals that complement the short M15-based indicators already in the model.
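The Fourier-series seasonality term can be reproduced as plain features: for period $P$ and order $N$, the regressors are $\cos(2\pi n t / P)$ and $\sin(2\pi n t / P)$ for $n = 1, \dots, N$. The sketch below builds them in NumPy so they can feed any model; the function name and the chosen orders are assumptions, not Prophet's internals.

```python
import numpy as np

def fourier_features(t_days: np.ndarray, period_days: float, order: int) -> np.ndarray:
    """Columns cos(2*pi*n*t/P) then sin(2*pi*n*t/P) for n = 1..order."""
    n = np.arange(1, order + 1)
    angles = 2.0 * np.pi * np.outer(t_days, n) / period_days
    return np.hstack([np.cos(angles), np.sin(angles)])

t = np.arange(0, 14 * 24) / 24.0                  # two weeks of hourly timestamps, in days
weekly = fourier_features(t, period_days=7.0, order=3)
daily = fourier_features(t, period_days=1.0, order=4)
```

A linear fit on these columns yields a smooth periodic curve; higher `order` allows sharper intra-period shapes at the cost of more parameters to overfit.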
16. Sentiment Analysis (NLP) — External Signal Intelligence
Price doesn't move in a vacuum. Macro events, Elon tweets, ETF news, exchange hacks — all move BTC. Sentiment models convert unstructured text into numerical signals.
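Two candidates are worth combining: VADER (rule-based, fast, no GPU needed), which yields a compound polarity score per headline, and FinBERT (BERT fine-tuned on financial text, more accurate), which yields a positive / neutral / negative label per headline.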
```python
import numpy as np
from transformers import pipeline
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

vader = SentimentIntensityAnalyzer()
finbert = pipeline("sentiment-analysis",
                   model="ProsusAI/finbert",
                   truncation=True, max_length=512)

def fetch_btc_sentiment(lookback_minutes=30) -> dict:
    # Pull from CryptoPanic API or Reddit /r/bitcoin RSS
    headlines = fetch_recent_news(lookback_minutes)
    if not headlines:
        return {"vader_compound": 0.0, "finbert_positive": 0.5, "news_count": 0}
    # polarity_scores returns neg / neu / pos / compound; compound lies in [-1, 1]
    vader_scores = [vader.polarity_scores(h)["compound"] for h in headlines]
    finbert_out = finbert(headlines[:10])  # top 10, GPU budget
    # Map labels to a 0..1 score: positive = 1, neutral = 0.5, negative = 0
    fb_positive = np.mean([1 if r["label"] == "positive" else
                           (0.5 if r["label"] == "neutral" else 0)
                           for r in finbert_out])
    return {
        "vader_compound": float(np.mean(vader_scores)),
        "finbert_positive": float(fb_positive),
        "news_count": len(headlines),
    }
```

NOVOSKY benefit: Currently NOVOSKY uses a ForexFactory news blocker (rule-based: block X minutes before/after high-impact events). Replacing it with a live sentiment score as an actual feature lets the model learn nuance: a negative headline during a bullish regime behaves differently from the same headline during a ranging market. News count alone (news_count feature) may also prove a useful volatility predictor, which is worth testing on its own.
17. Neural Network Classification Zoo
Beyond LSTM and Transformers, several other models are worth benchmarking as drop-in classifiers. NGBoost, a probabilistic gradient-boosting method rather than a neural network, is one example:

```python
# NGBoost: probabilistic gradient boosting; as a classifier it yields class probabilities
from ngboost import NGBClassifier
from ngboost.distns import k_categorical

ngb = NGBClassifier(Dist=k_categorical(3), n_estimators=500,
                    verbose_eval=50, random_state=42)
ngb.fit(X_train, y_train)

# Returns probability distribution over [LONG, FLAT, SHORT]
# (assumes labels are encoded 0 = LONG, 1 = FLAT, 2 = SHORT)
proba = ngb.predict_proba(X_test)  # shape (N, 3)
print(f"P(LONG)={proba[0,0]:.3f} P(FLAT)={proba[0,1]:.3f} P(SHORT)={proba[0,2]:.3f}")
```

Part VIII — The NOVOSKY Integration Map
Every method above maps to one of three improvement areas. Here is the priority ranking:
| Method | Use In NOVOSKY | Priority |
|---|---|---|
| HMM Regime | Regime filter feature — block trades in Sideways state | P0 — do now |
| Kalman Filter | Replace EMA slope with Kalman velocity | P0 — do now |
| GARCH | Volatility regime feature + dynamic ATR multiplier | P0 — do now |
| Pearson Pruning | Remove pairs with $\lvert r \rvert > 0.90$ from the 62-feature set | P1 — easy win |
| Wavelet | Multi-scale noise energy + trend slope features | P1 — easy win |
| Autoencoder | Anomaly guard: block trades on out-of-distribution features | P1 — easy win |
| Monte Carlo | Monthly drawdown distribution audit | P2 — tooling |
| VaR / CVaR | Optimization objective for Optuna runs | P2 — tooling |
| ARIMA residual | Meta-feature: what ARIMA couldn't predict | P2 — tooling |
| NLP Sentiment | Replace hard news blocker with soft sentiment feature | P3 — research |
| CNN Patterns | Candlestick shape feature extractor | P3 — research |
| LSTM meta | Sequence meta-learner over ensemble probabilities | P4 — future |
| Transformer | Full sequence model on 48-bar M15 windows | P4 — future |
| Prophet | Long-horizon trend decomposition feature | P4 — future |
| GAN augmentation | Synthetic training data for rare regimes | P5 — research |
| Copula | Multi-instrument tail-risk sizing | P5 — multi-instrument |
| BNN / NGBoost | Uncertainty-aware classification | P3 — drop-in test |
The Research Principle
Every method here shares one constraint: it must beat the current ensemble on walk-forward out-of-sample data, not just in-sample cross-validation. A GARCH feature that adds 0.3% to OOS win rate after 3 months of live data is worth more than a Transformer that looks spectacular in backtest and falls apart in production.
The stack isn't a to-do list. It's a hypothesis bank.
Each row in the table above is a falsifiable claim: "adding this feature / model / filter will improve the strategy's edge on unseen BTC/USD M15 data." The job of the research loop is to test each claim as cheaply as possible — and throw away the ones that don't survive contact with reality.
That's the only kind of research that matters.