The Quantitative Research Arsenal: 17 Advanced Methods NOVOSKY Should Explore

This Is Not a Tutorial. This Is a Map.

NOVOSKY already runs. It trades. It makes money. The ensemble of Random Forest, XGBoost, and LightGBM is live on BTC/USD M15 with a 71.7% OOS win rate and a 5.46 profit factor.

But that's a starting point, not a ceiling.

The quantitative finance research space is vast, and most of the tools in it haven't been applied to NOVOSKY yet. This document is a systematic survey of 17 methods — each one with the math, a working Python sketch, and a precise answer to the question: what does this actually buy NOVOSKY?

We're not collecting buzzwords. We're building a backlog.

Part I — Statistical Foundations

1. Pearson Correlation — Cleaning the Feature Space

Before anything else, you need to know which of your features are actually telling you different things.

Pearson's $r$ measures the strength of the linear relationship between two variables:

$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \cdot \sum_{i=1}^{n} (y_i - \bar{y})^2}}$

$r \in [-1, 1]$. At $|r| > 0.90$, two features are essentially measuring the same thing, and one of them adds almost no new information from the model's perspective. For tree-based ensembles, multicollinear features don't cause instability the way they do in regression, but they do dilute SHAP importance scores and inflate feature count with no signal gain.
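To make the formula concrete, here is a minimal sketch that evaluates $r$ term by term and compares a near-duplicate feature pair with an independent one. The three synthetic series are illustrative assumptions, not NOVOSKY features.

```python
import numpy as np

def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation computed directly from the definition."""
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

rng = np.random.default_rng(0)
base = rng.normal(size=500)                     # hypothetical indicator
near_copy = base + 0.1 * rng.normal(size=500)   # hypothetical near-duplicate of it
unrelated = rng.normal(size=500)                # hypothetical independent feature

r_dup = pearson_r(base, near_copy)
r_ind = pearson_r(base, unrelated)
print(f"near-duplicate pair: r = {r_dup:.3f}")
print(f"independent pair:    r = {r_ind:.3f}")
```

The hand-rolled value should agree with `np.corrcoef` to floating-point precision; in practice `DataFrame.corr` does the same computation for every column pair at once.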

Illustrative Feature Correlation Heatmap — High correlation clusters

python

import pandas as pd
import numpy as np

def prune_correlated_features(df: pd.DataFrame, threshold: float = 0.90) -> list[str]:
    corr = df.corr(method="pearson").abs()
    upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
    to_drop = [col for col in upper.columns if any(upper[col] > threshold)]
    return to_drop

# Usage against NOVOSKY's feature matrix
features = pd.read_parquet("ml/feature_cache/latest.parquet")
redundant = prune_correlated_features(features, threshold=0.90)
print(f"Candidates for removal: {redundant}")
# e.g. ['rsi_7', 'atr_7', 'ema_slope_fast'] — shorter lookback duplicates longer

NOVOSKY benefit: NOVOSKY currently has 62 features. A correlation audit will identify duplicates (short-window vs long-window RSI, multiple ATR periods) and let the model spend its capacity on genuinely orthogonal signals instead of re-learning the same pattern under multiple names.

2. ARIMA / SARIMA — The Statistically Honest Baseline

Before training any ML model on a financial time series, you must first defeat ARIMA. If your gradient boosted tree can't beat an ARIMA forecast, you don't have a signal — you have a curve fit.

ARIMA$(p, d, q)$ models a time series that becomes stationary after $d$ rounds of differencing:

$ϕ (B) (1 - B)^{d} y_{t} = θ (B) ϵ_{t}$

Where $B$ is the backshift operator, $ϕ (B) = 1 - ϕ_{1} B - \dots - ϕ_{p} B^{p}$ is the autoregressive polynomial, and $θ (B) = 1 + θ_{1} B + \dots + θ_{q} B^{q}$ is the moving-average polynomial. The d-th differencing term $(1 - B)^{d}$ achieves stationarity.

SARIMA extends this to capture periodic seasonality of period $s$ :

$SARIMA (p, d, q) (P, D, Q)_{s} : Φ_{P} (B^{s}) ϕ_{p} (B) (1 - B^{s})^{D} (1 - B)^{d} y_{t} = Θ_{Q} (B^{s}) θ_{q} (B) ϵ_{t}$

For BTC/USD M15, $s = 96$ (one trading day = 96 fifteen-minute bars) is the natural seasonal period.

python

import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.stattools import adfuller

close = df["close"].values           # df: M15 OHLCV frame loaded elsewhere
log_ret = np.diff(np.log(close))

# 1. Test stationarity of the log-returns
adf_stat, p_val, *_ = adfuller(log_ret)
print(f"ADF stat={adf_stat:.3f}  p={p_val:.4f}")  # want p < 0.05

# 2. Fit SARIMA on log-returns
#    (a seasonal period of 96 can be slow and memory-hungry; consider a recent window)
model = SARIMAX(log_ret, order=(2, 0, 2), seasonal_order=(1, 0, 1, 96))
res = model.fit(disp=False)
print(res.summary())

# 3. Forecast 1 step ahead (next M15 candle direction)
fc = res.get_forecast(steps=1)
next_ret = float(np.asarray(fc.predicted_mean)[0])  # ndarray input gives ndarray output, so no .iloc
direction = "LONG" if next_ret > 0 else "SHORT"
conf_int = fc.conf_int(alpha=0.10)  # 90% CI

NOVOSKY benefit: Use ARIMA as a meta-feature — feed arima_forecast_1step and arima_residual into the ensemble. The residual is the part of price movement ARIMA couldn't explain, which is exactly what ML is good at capturing.
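One way to prototype these two meta-features before paying for a full SARIMA fit is sketched below. It swaps in a plain AR(p) model fitted by least squares on a rolling window (no MA or seasonal terms), so it is a simplified stand-in rather than ARIMA itself. The function name `ar_walk_forward`, the window length, and the simulated returns are illustrative assumptions.

```python
import numpy as np

def ar_walk_forward(returns: np.ndarray, p: int = 2, train: int = 500):
    """One-step AR(p) forecasts refit on a rolling window, plus residuals.

    Both outputs are aligned with returns[train:]; each forecast only
    uses data strictly before the bar it predicts.
    """
    n = len(returns)
    forecast = np.empty(n - train)
    for i, t in enumerate(range(train, n)):
        window = returns[t - train:t]
        # Design matrix: intercept plus lags 1..p of the window
        X = np.column_stack([np.ones(train - p)] +
                            [window[p - k:train - k] for k in range(1, p + 1)])
        y = window[p:]
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        last_lags = window[-1:-p - 1:-1]          # r[t-1], ..., r[t-p]
        forecast[i] = coef[0] + coef[1:] @ last_lags
    residual = returns[train:] - forecast
    return forecast, residual

# Simulated AR(1) returns stand in for real log-returns
rng = np.random.default_rng(1)
r = np.zeros(1500)
for t in range(1, len(r)):
    r[t] = 0.5 * r[t - 1] + rng.normal(scale=0.01)

fc, resid = ar_walk_forward(r, p=2, train=500)
ratio = resid.var() / r[500:].var()
print(f"residual variance / return variance = {ratio:.2f}")
```

A ratio well below 1 means the linear model explains part of the movement; on real M15 returns expect it to sit much closer to 1, which is exactly why the residual carries most of what the ensemble has to learn.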

Part II — Volatility and Risk Architecture

3. GARCH — Modeling Conditional Volatility

Financial returns exhibit volatility clustering: large moves tend to cluster together, and calm periods cluster together. This heteroskedasticity is not noise — it's structure. GARCH models it explicitly.

The GARCH(1,1) model:

$r_t = \mu + \epsilon_t, \quad \epsilon_t = \sigma_t z_t, \quad z_t \overset{\text{iid}}{\sim} \mathcal{N}(0, 1)$

$\sigma_t^2 = \omega + \alpha_1 \epsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2$

Where $ω > 0$ , $α_{1} \geq 0$ , $β_{1} \geq 0$ , and $α_{1} + β_{1} < 1$ (stationarity). $σ_{t}^{2}$ is the conditional variance — the market's expected volatility given everything up to $t - 1$ .
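The recursion is easy to see in a simulation. The sketch below generates returns from a GARCH(1,1) process with hand-picked parameters (illustrative values, not fitted to BTC) and checks two consequences of the model: the long-run variance $ω / (1 - α_1 - β_1)$ and positive autocorrelation in squared shocks, which is the volatility clustering described above.

```python
import numpy as np

def simulate_garch11(n: int, omega: float, alpha: float, beta: float, seed: int = 0):
    """Simulate shocks eps[t] and conditional variances sigma2[t] from GARCH(1,1)."""
    rng = np.random.default_rng(seed)
    sigma2 = np.empty(n)
    eps = np.empty(n)
    sigma2[0] = omega / (1.0 - alpha - beta)          # start at the unconditional variance
    eps[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
    for t in range(1, n):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return eps, sigma2

omega, alpha, beta = 0.02, 0.10, 0.85                 # illustrative, alpha + beta < 1
eps, sigma2 = simulate_garch11(20_000, omega, alpha, beta)

sq = eps ** 2 - (eps ** 2).mean()
acf1_squared = float((sq[1:] * sq[:-1]).sum() / (sq ** 2).sum())
print(f"sample variance:          {eps.var():.3f}")
print(f"unconditional variance:   {omega / (1 - alpha - beta):.3f}")
print(f"lag-1 ACF of eps squared: {acf1_squared:.3f}")
```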

GARCH(1,1) — Volatility Clustering on BTC/USD M15 Returns

python

from arch import arch_model

log_ret = np.diff(np.log(df["close"].values)) * 100  # scale for numerical stability

garch = arch_model(log_ret, vol="Garch", p=1, q=1, dist="t")  # Student-t for fat tails
res   = garch.fit(disp="off")

# Forecast 1-step conditional volatility
fc    = res.forecast(horizon=1)
sigma_next = np.sqrt(fc.variance.iloc[-1, 0])

# Use as a feature: high sigma → widen ATR multiplier, reduce lot size
print(f"σ_t+1 = {sigma_next:.4f}%  →  regime: {'HIGH VOL' if sigma_next > 1.5 else 'LOW VOL'}")

NOVOSKY benefit: Add garch_sigma as a feature and use it to dynamically scale the ATR multiplier for stop-loss placement. When GARCH forecasts high conditional variance, widen the SL to avoid getting stopped out by noise. This is a cheap experiment: NOVOSKY already computes ATR, which is a backward-looking average, whereas GARCH produces an explicit one-step-ahead volatility forecast.

4. Monte Carlo Simulation — Forward-Looking Risk

Monte Carlo projects a distribution of possible future equity curves by drawing thousands of random return paths. Each path samples from the empirical return distribution of your strategy:

$E_{t+k}^{(i)} = E_t \cdot \prod_{j=1}^{k} (1 + r_j^{(i)}), \quad r_j^{(i)} \overset{\text{iid}}{\sim} \hat{F}_r$

Where $\hat{F}_{r}$ is the empirical CDF of your strategy's per-trade returns.

python

import numpy as np

def monte_carlo_equity(trade_returns: np.ndarray, n_sims=10_000, n_trades=200,
                       starting_balance=500.0) -> np.ndarray:
    """Returns shape (n_sims, n_trades+1) — equity paths."""
    idx   = np.random.randint(0, len(trade_returns), size=(n_sims, n_trades))
    draws = trade_returns[idx]                       # bootstrap resample
    paths = np.cumprod(1 + draws, axis=1)
    return starting_balance * np.hstack([np.ones((n_sims, 1)), paths])

trade_rets = np.array(backtest_results["trade_returns"])
paths = monte_carlo_equity(trade_rets, n_sims=50_000)

final_eq   = paths[:, -1]
max_dd_sim = np.array([
    ((np.maximum.accumulate(p) - p) / np.maximum.accumulate(p)).max()  # drawdown relative to the running peak
    for p in paths[:1000]  # sample for speed
])

print(f"Median final equity: ${np.median(final_eq):.0f}")
print(f"P5  final equity:    ${np.percentile(final_eq, 5):.0f}")
print(f"P95 Max Drawdown:    {np.percentile(max_dd_sim, 95)*100:.1f}%")

NOVOSKY benefit: Run Monte Carlo on live trade returns monthly to get a forward-looking drawdown distribution. If P95 max drawdown breaches 15%, tighten the circuit breaker before a real drawdown hits. This is pre-emptive risk management.
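The monthly check can be reduced to a single number: the share of simulated paths whose maximum drawdown exceeds the limit. The sketch below computes drawdowns for all paths at once instead of looping. The win/loss mix used to build `trade_returns` is a made-up placeholder for real trade history, and the 15% limit is taken from the paragraph above.

```python
import numpy as np

def max_drawdowns(paths: np.ndarray) -> np.ndarray:
    """Max peak-to-trough drawdown per path, as a fraction of the running peak."""
    running_peak = np.maximum.accumulate(paths, axis=1)
    return ((running_peak - paths) / running_peak).max(axis=1)

# Hypothetical per-trade returns: 60% winners of +1%, 40% losers of -1%
rng = np.random.default_rng(7)
trade_returns = np.where(rng.random(400) < 0.6, 0.01, -0.01)

# Bootstrap 5,000 equity paths of 200 trades each, starting at 1.0
idx = rng.integers(0, len(trade_returns), size=(5_000, 200))
paths = np.cumprod(1 + trade_returns[idx], axis=1)
paths = np.hstack([np.ones((5_000, 1)), paths])

dd = max_drawdowns(paths)
breach_prob = float((dd > 0.15).mean())
print(f"P95 max drawdown: {np.percentile(dd, 95) * 100:.1f}%")
print(f"P(max drawdown > 15%): {breach_prob:.3f}")
```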

5. Value at Risk (VaR) and Expected Shortfall

VaR answers: what is the maximum loss I should expect at confidence level $α$ ?

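Parametric (Gaussian) VaR, with $μ$ and $σ$ the mean and standard deviation of returns and $z_{α} = Φ^{-1} (α)$ the standard normal quantile:

$VaR_{α} = - (μ + σ z_{1 - α}) = - μ + σ z_{α}$

Historical VaR (no distributional assumption; uses the empirical quantile of returns):

$VaR_{α} = - Q_{1 - α} (\{r_{1}, r_{2}, \dots, r_{n}\})$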

Expected Shortfall (CVaR) — the expected loss given that VaR is breached. It is more informative than VaR for fat-tailed assets:

$CVaR_{α} = E [L ∣ L > VaR_{α}] = \frac{1}{1 - α} \int_{α}^{1} VaR_{u} d u$
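A small sketch of the three quantities side by side, under the sign convention that VaR and CVaR are reported as positive losses. The Student-t sample is a hypothetical stand-in for fat-tailed strategy returns, and the helper names are illustrative.

```python
import numpy as np
from statistics import NormalDist

def gaussian_var(returns: np.ndarray, alpha: float = 0.95) -> float:
    """Parametric VaR: minus the (1 - alpha) quantile of a fitted normal."""
    mu, sigma = returns.mean(), returns.std(ddof=1)
    return float(-(mu + sigma * NormalDist().inv_cdf(1 - alpha)))

def historical_var_cvar(returns: np.ndarray, alpha: float = 0.95):
    """Historical VaR and CVaR from the empirical (1 - alpha) quantile."""
    q = np.quantile(returns, 1 - alpha)
    tail = returns[returns <= q]              # the worst (1 - alpha) share of outcomes
    return float(-q), float(-tail.mean())

rng = np.random.default_rng(3)
returns = 0.01 * rng.standard_t(df=3, size=20_000)   # hypothetical fat-tailed returns

for a in (0.95, 0.99):
    h_var, h_cvar = historical_var_cvar(returns, a)
    print(f"alpha={a}: Gaussian VaR={gaussian_var(returns, a):.4f}  "
          f"historical VaR={h_var:.4f}  CVaR={h_cvar:.4f}")
```

CVaR is always at least as large as VaR at the same level, because it averages the losses beyond the VaR cutoff.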

VaR vs CVaR — Return distribution with 95% confidence level

python

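def compute_var_cvar(returns: np.ndarray, alpha: float = 0.95) -> dict:
    sorted_r = np.sort(returns)                        # ascending: worst returns first
    # Keep at least one observation in the tail so the CVaR mean is never taken over an empty slice
    var_idx   = max(int(np.floor((1 - alpha) * len(sorted_r))), 1)
    var       = -sorted_r[var_idx]
    cvar      = -sorted_r[:var_idx].mean()             # average of the returns worse than VaR
    return {"VaR": var, "CVaR": cvar, "alpha": alpha}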

daily_returns = compute_daily_strategy_returns(backtest_results)
risk = compute_var_cvar(daily_returns, alpha=0.95)
print(f"95% Daily VaR:  {risk['VaR']*100:.2f}%")
print(f"95% Daily CVaR: {risk['CVaR']*100:.2f}%")

NOVOSKY benefit: Use CVaR — not just drawdown — as the optimization objective in Optuna hyperparameter tuning. Minimizing CVaR directly pushes the model toward strategies that perform well in the worst 5% of scenarios, not just on average.

6. Copula Dependency Model — When Correlation Isn't Enough

Pearson correlation measures linear dependency. Copulas model tail dependency — the probability that two variables both take extreme values simultaneously. This matters enormously for Bitcoin, where assets can crash together in ways their normal correlation doesn't predict.

Sklar's theorem: any joint distribution $H (x_{1}, x_{2})$ can be decomposed into marginals and a copula $C$ :

$H (x_{1}, x_{2}) = C (F_{1} (x_{1}), F_{2} (x_{2}))$

Where $C : [0, 1]^{2} \to [0, 1]$ captures the dependence structure separately from the marginal distributions. The Clayton copula has strong lower-tail dependency (both variables crash together):

$C_{θ} (u, v) = (u^{- θ} + v^{- θ} - 1)^{- 1/ θ}, θ > 0$
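For the Clayton copula the lower-tail dependence coefficient has a closed form, $λ_L = 2^{-1/θ}$. The sketch below draws pairs from a Clayton copula by conditional inversion and compares the empirical conditional crash probability with that value. The choice $θ = 2$, the 1% tail cutoff, and the function name are illustrative assumptions.

```python
import numpy as np

def sample_clayton(n: int, theta: float, seed: int = 0):
    """Draw n pairs (u, v) from a Clayton copula via conditional inversion."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)          # level of the conditional CDF of v given u
    v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, v

theta = 2.0
u, v = sample_clayton(200_000, theta)

q = 0.01                                         # tail cutoff
empirical = float(np.mean(v[u < q] < q))         # P(V < q | U < q)
theoretical = 2.0 ** (-1.0 / theta)              # limit as q -> 0
print(f"empirical  P(V<q | U<q): {empirical:.3f}")
print(f"theoretical lambda_L:    {theoretical:.3f}")
```

Under independence the same conditional probability would be just $q$, so the gap between the two numbers is the tail co-movement that a plain correlation coefficient does not report.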

python

import numpy as np

# Model BTC/ETH joint tail dependency for correlation-based features
# (df_btc and df_eth must be aligned on the same timestamps and have equal length)
btc_ret = np.diff(np.log(df_btc["close"].values))
eth_ret = np.diff(np.log(df_eth["close"].values))

# Convert to uniform marginals via empirical CDF (ranks scaled into (0, 1))
u = (np.argsort(np.argsort(btc_ret)) + 1) / (len(btc_ret) + 1)
v = (np.argsort(np.argsort(eth_ret)) + 1) / (len(eth_ret) + 1)

# Lower-tail co-movement: joint probability P(U < 0.05, V < 0.05) of a simultaneous crash
joint_crash = np.mean((u < 0.05) & (v < 0.05))
marginal_crash = 0.05 * 0.05  # if independent
print(f"Joint crash prob: {joint_crash:.4f} vs independence: {marginal_crash:.4f}")
print(f"Tail dependency ratio: {joint_crash/marginal_crash:.1f}x")

NOVOSKY benefit: If NOVOSKY is ever extended to multi-instrument trading (BTC + ETH, or BTC + XAUUSD), copula models ensure position sizing accounts for tail co-movement — the risk that both positions lose simultaneously is higher than naive correlation predicts.

Part III — State and Structural Models

7. Hidden Markov Model — Regime Detection

Financial markets don't have one static return distribution. They switch between regimes — trending up, trending down, mean-reverting sideways. A Hidden Markov Model (HMM) models this as:

A latent state $S_{t} \in \{1, \dots, K\}$ (e.g. Bull, Bear, Sideways) that follows a Markov chain with transition matrix $A$
Observations $O_{t}$ (e.g. daily returns) emitted from state-conditional distributions

$P (S_{t} = j ∣ S_{t - 1} = i) = A_{ij}$

$P (O_{t} ∣ S_{t} = k) = N (μ_{k}, σ_{k}^{2})$

The Viterbi algorithm recovers the most probable state sequence given observations:

$\delta_t(k) = \max_{s_{1:t-1}} P(s_1, \dots, s_{t-1}, S_t = k, O_1, \dots, O_t)$
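The recursion behind $δ_t(k)$ fits in a few lines. Below is a minimal log-space sketch for Gaussian emissions, assuming the transition matrix, means, and standard deviations are already known; all parameter values and the short observation series are made up for illustration.

```python
import numpy as np

def viterbi_gaussian(obs, start_prob, trans, means, stds):
    """Most probable state path for a Gaussian-emission HMM (log space)."""
    obs = np.asarray(obs, dtype=float)
    T, K = len(obs), len(means)
    # log N(obs[t] | means[k], stds[k]^2) for every (t, k)
    log_emit = (-0.5 * np.log(2 * np.pi * stds ** 2)
                - 0.5 * ((obs[:, None] - means) / stds) ** 2)
    log_trans = np.log(trans)

    delta = np.empty((T, K))              # best log-probability of a path ending in k at t
    back = np.zeros((T, K), dtype=int)    # best predecessor of state k at t
    delta[0] = np.log(start_prob) + log_emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans   # scores[i, j]: from state i to state j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]

    # Backtrack from the best final state
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

means = np.array([-0.02, 0.0, 0.02])      # state 0 = bear, 1 = sideways, 2 = bull
stds = np.array([0.01, 0.005, 0.01])
trans = np.array([[0.90, 0.05, 0.05],
                  [0.05, 0.90, 0.05],
                  [0.05, 0.05, 0.90]])
start = np.full(3, 1 / 3)
obs = np.array([0.021, 0.019, 0.022, 0.001, -0.001, 0.000, -0.020, -0.019])

path = viterbi_gaussian(obs, start, trans, means, stds)
print(path)
```

Because the path is chosen jointly over the whole sequence, the label assigned to an early bar can change when later bars arrive, which matters when regime labels are used as historical training features.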

HMM — 3-State Regime Transition Graph

python

from hmmlearn.hmm import GaussianHMM

log_ret = np.diff(np.log(df["close"].values)).reshape(-1, 1)

hmm = GaussianHMM(n_components=3, covariance_type="full", n_iter=200, random_state=42)
hmm.fit(log_ret)

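# Predict current regime. Both predict() (Viterbi) and predict_proba() (forward-backward)
# condition on the whole sequence, so states for past bars use later data; only the last bar is lookahead-free.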
states     = hmm.predict(log_ret)
proba      = hmm.predict_proba(log_ret)   # soft assignment
curr_state = states[-1]

# Map states to labels by sorting on mean return
means = hmm.means_.flatten()
label_map = {np.argsort(means)[0]: "BEAR",
             np.argsort(means)[1]: "SIDEWAYS",
             np.argsort(means)[2]: "BULL"}
print(f"Current regime: {label_map[curr_state]}")
print(f"P(Bear)={proba[-1,np.argsort(means)[0]]:.2f}  "
      f"P(Sideways)={proba[-1,np.argsort(means)[1]]:.2f}  "
      f"P(Bull)={proba[-1,np.argsort(means)[2]]:.2f}")

NOVOSKY benefit: The single biggest weakness NOVOSKY has is taking trades during sideways/choppy regimes where no directional edge exists. An HMM regime filter — hmm_regime_bull_prob, hmm_regime_bear_prob added as features — lets the model learn to go quiet during structureless markets. This is probably the single highest-leverage improvement available.

8. Kalman Filter — Noise-Reduced Price Tracking

The Kalman filter is the optimal linear estimator for a noisy linear dynamical system. Applied to price, it separates the true underlying trend from the observation noise:

State transition (predict):

$\hat{x}_{t \mid t-1} = F \hat{x}_{t-1 \mid t-1}$

$P_{t \mid t-1} = F P_{t-1 \mid t-1} F^{\top} + Q$

Measurement update:

$K_t = P_{t \mid t-1} H^{\top} (H P_{t \mid t-1} H^{\top} + R)^{-1}$

$\hat{x}_{t \mid t} = \hat{x}_{t \mid t-1} + K_t (z_t - H \hat{x}_{t \mid t-1})$

$P_{t \mid t} = (I - K_t H) P_{t \mid t-1}$

The state vector $x = [price, velocity]^{⊤}$ gives a noise-filtered price and its current rate-of-change simultaneously.

python

def kalman_filter_price(prices: np.ndarray,
                        Q: float = 1e-4,   # process noise
                        R: float = 1e-2    # measurement noise
                        ) -> tuple[np.ndarray, np.ndarray]:
    """Returns (filtered_price, filtered_velocity)."""
    n   = len(prices)
    F   = np.array([[1, 1], [0, 1]])        # state transition
    H   = np.array([[1, 0]])                # observation matrix
    Q_m = np.eye(2) * Q
    R_m = np.array([[R]])

    x   = np.array([[prices[0]], [0.0]])    # initial state
    P   = np.eye(2)

    filtered_px  = np.zeros(n)
    filtered_vel = np.zeros(n)

    for t in range(n):
        x   = F @ x
        P   = F @ P @ F.T + Q_m
        K   = P @ H.T @ np.linalg.inv(H @ P @ H.T + R_m)
        x   = x + K * (prices[t] - (H @ x)[0, 0])
        P   = (np.eye(2) - K @ H) @ P
        filtered_px[t]  = x[0, 0]
        filtered_vel[t] = x[1, 0]

    return filtered_px, filtered_vel

kpx, kvel = kalman_filter_price(df["close"].values)
df["kalman_price"]    = kpx
df["kalman_velocity"] = kvel   # sign = direction, magnitude = trend strength

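NOVOSKY benefit: Replace or supplement EMA trend slope with kalman_velocity. The Kalman gain $K_t$ sets how much each new price moves the estimate, and the filter returns a smoothed price, a velocity, and an uncertainty estimate ($P$) in a single pass. One caveat: with constant $Q$ and $R$ the gain converges to a fixed value after a short warm-up, so from then on the filter applies fixed coefficients, much as an EMA does. To respond faster when markets change and smooth harder during noise, $Q$ or $R$ has to vary with conditions, for example by scaling $R$ with recent realized volatility.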
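How adaptive the gain really is can be checked without any price data, because the covariance recursion does not depend on the observations. The sketch below iterates it with the same constant-velocity model and the same $Q$ and $R$ defaults as the function above; the 500-step horizon is an arbitrary choice.

```python
import numpy as np

F = np.array([[1.0, 1.0], [0.0, 1.0]])    # constant-velocity state transition
H = np.array([[1.0, 0.0]])                # only price is observed
Q = np.eye(2) * 1e-4                      # process noise
R = np.array([[1e-2]])                    # measurement noise

P = np.eye(2)
gains = []
for _ in range(500):
    P = F @ P @ F.T + Q                               # predict covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)      # Kalman gain
    P = (np.eye(2) - K @ H) @ P                       # update covariance
    gains.append(K[:, 0].copy())
gains = np.array(gains)

print("gain at step   1:", gains[0])
print("gain at step  50:", gains[49])
print("gain at step 500:", gains[-1])
```

With fixed noise settings the gain settles to a constant, so any regime-dependent behavior has to come from changing $Q$ or $R$ over time.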

Part IV — Deep Sequence Learning

9. LSTM Price Forecast

Long Short-Term Memory networks solve the vanishing gradient problem that breaks vanilla RNNs on long sequences. The core is a cell state $c_{t}$ gated by three learned gates:

$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i) \quad \text{(input gate)}$

$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) \quad \text{(forget gate)}$

$\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c) \quad \text{(candidate)}$

$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \quad \text{(cell state)}$

$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o) \quad \text{(output gate)}$

$h_t = o_t \odot \tanh(c_t) \quad \text{(hidden state)}$

The forget gate $f_{t}$ decides what to erase from memory. When $f_{t} \approx 0$ , the network has learned that the past is irrelevant — critical for regime transitions.
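A single cell step written out in NumPy shows the effect of the forget gate directly. This is a sketch with random weights (nothing is trained); the stacked weight layout and the bias trick used to force $f_t \approx 0$ are illustrative assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step; W maps [h_prev, x] to stacked (i, f, candidate, o) pre-activations."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    i = sigmoid(z[0:hidden])                      # input gate
    f = sigmoid(z[hidden:2 * hidden])             # forget gate
    c_tilde = np.tanh(z[2 * hidden:3 * hidden])   # candidate
    o = sigmoid(z[3 * hidden:4 * hidden])         # output gate
    c = f * c_prev + i * c_tilde
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
n_in, hidden = 4, 3
W = rng.normal(scale=0.5, size=(4 * hidden, hidden + n_in))
b_open = np.zeros(4 * hidden)
b_closed = b_open.copy()
b_closed[hidden:2 * hidden] = -30.0               # drives the forget gate to ~0

x = rng.normal(size=n_in)
h_prev = rng.normal(size=hidden)
mem_a, mem_b = np.full(hidden, 5.0), np.full(hidden, -5.0)   # two very different memories

_, c_closed_a = lstm_step(x, h_prev, mem_a, W, b_closed)
_, c_closed_b = lstm_step(x, h_prev, mem_b, W, b_closed)
_, c_open_a = lstm_step(x, h_prev, mem_a, W, b_open)
_, c_open_b = lstm_step(x, h_prev, mem_b, W, b_open)

print("forget gate closed:", c_closed_a, c_closed_b)
print("forget gate open:  ", c_open_a, c_open_b)
```

With the gate closed the new cell state is the same whatever the old memory was; with it open the old memory carries through.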

LSTM Cell — Information Flow

python

import torch
import torch.nn as nn

class TradingLSTM(nn.Module):
    def __init__(self, input_size=62, hidden=128, layers=2, dropout=0.3):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden, layers,
                            batch_first=True, dropout=dropout)
        self.head = nn.Sequential(
            nn.Linear(hidden, 64), nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(64, 3)       # LONG / SHORT / FLAT
        )

    def forward(self, x):                        # x: (B, T, F)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])          # last timestep only

# Training sketch
model     = TradingLSTM()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 1.0, 0.5]))  # down-weight FLAT

# Walk-forward training on NOVOSKY feature windows
for epoch in range(50):
    for X_batch, y_batch in train_loader:       # X: (B, 20, 62) — 20-bar window
        logits = model(X_batch)
        loss   = criterion(logits, y_batch)
        optimizer.zero_grad(); loss.backward(); optimizer.step()

NOVOSKY benefit: The ensemble currently uses static feature windows with no memory across bars. An LSTM layer added as a meta-learner on top of the RF/XGB/LGB probability outputs could capture sequential signal patterns — "three bars of increasing confidence followed by a regime shift" — that the tree models cannot represent.

10. Transformer Price Forecast

The Transformer replaces recurrence entirely with scaled dot-product attention, which allows every position in a sequence to attend to every other position in parallel:

$Attention(Q, K, V) = softmax\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$

Multi-head attention runs $h$ parallel attention heads and concatenates:

$MultiHead (Q, K, V) = Concat (head_{1}, \dots, head_{h}) W^{O}$

Where $head_{i} = Attention (Q W_{i}^{Q}, K W_{i}^{K}, V W_{i}^{V})$ .

The $\sqrt{d_k}$ scaling prevents softmax saturation in high-dimensional key spaces: without it, the dot products grow with $d_k$ and the softmax collapses onto a single position.
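The attention formula and the saturation effect can both be seen in a few lines of NumPy. This is a single-head sketch on random matrices; the sizes (48 positions, key dimension 64) are illustrative choices.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)     # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention for one head; returns (output, weights)."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
T, d_k = 48, 64
Q = rng.normal(size=(T, d_k))
K = rng.normal(size=(T, d_k))
V = rng.normal(size=(T, d_k))

out, w_scaled = attention(Q, K, V)
w_unscaled = softmax(Q @ K.T, axis=-1)          # same scores without the square-root scaling

print("mean of largest weight per row, scaled:  ", w_scaled.max(axis=1).mean())
print("mean of largest weight per row, unscaled:", w_unscaled.max(axis=1).mean())
```

Without scaling, almost all of each row's weight lands on one position, which leaves the softmax with vanishing gradients during training.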

python

class TemporalTransformer(nn.Module):
    def __init__(self, d_model=64, nhead=4, n_layers=3, seq_len=48, n_features=62):
        super().__init__()
        self.embed   = nn.Linear(n_features, d_model)
        self.pos_enc = nn.Parameter(torch.randn(seq_len, d_model))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=256,
            dropout=0.1, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head    = nn.Linear(d_model, 3)

    def forward(self, x):                        # x: (B, T, F)
        x = self.embed(x) + self.pos_enc[:x.size(1)]
        x = self.encoder(x)                      # attend across all T bars
        return self.head(x[:, -1, :])            # classify from last token

NOVOSKY benefit: Attention heads can learn to focus on candles from hours ago that structurally resemble current conditions — something neither LSTM nor trees can do cleanly. A Transformer trained on M15 bars with a 48-bar (12-hour) window could learn intra-day opening patterns that the current feature set can only approximate.

Part V — Pattern Detection and Signal Decomposition

11. CNN Pattern Recognition

Convolutional filters are pattern matchers. A 1D convolution of kernel $w$ over signal $x$ :

$(x * w)[n] = \sum_{k=0}^{K-1} w[k] \, x[n-k]$

Applied to OHLCV windows, a CNN learns to detect candlestick patterns — engulfing bars, doji sequences, breakouts — as filters that activate on the right shape.
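To see a filter act as a pattern matcher, the sketch below evaluates the sum directly with a hand-written two-tap kernel that responds to jumps. The toy price series and the kernel are illustrative; a trained CNN learns its kernels instead.

```python
import numpy as np

def conv1d_full(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Direct evaluation of (x * w)[n] = sum_k w[k] x[n - k], treating x as zero outside its range."""
    K, N = len(w), len(x)
    out = np.zeros(N + K - 1)
    for n in range(N + K - 1):
        for k in range(K):
            if 0 <= n - k < N:
                out[n] += w[k] * x[n - k]
    return out

x = np.array([1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 1.0, 1.0])   # toy closes: step up, then step down
w = np.array([1.0, -1.0])                                # difference kernel: x[n] - x[n-1]

response = conv1d_full(x, w)
print(response)
print("strongest up-move at index:  ", int(np.argmax(response)))
print("strongest down-move at index:", int(np.argmin(response)))
```

The result matches `np.convolve(x, w)`. Note that PyTorch's `nn.Conv1d` computes cross-correlation (the kernel is not flipped); since the kernels are learned, the distinction does not change what the layer can represent.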

python

class CandleCNN(nn.Module):
    def __init__(self, n_features=5, seq_len=30):    # raw OHLCV
        super().__init__()
        self.conv1 = nn.Conv1d(n_features, 32, kernel_size=5, padding=2)
        self.conv2 = nn.Conv1d(32, 64, kernel_size=3, padding=1)
        self.pool  = nn.AdaptiveAvgPool1d(8)
        self.head  = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64*8, 128), nn.ReLU(),
            nn.Linear(128, 3)
        )

    def forward(self, x):            # x: (B, C, T) — channels-first for Conv1d
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = self.pool(x)
        return self.head(x)

# Inspect the learned weights of filter 0 after training:
# filter_weights = model.conv1.weight[0].detach().numpy()   # shape (n_features, kernel_size)

NOVOSKY benefit: CNN outputs (cnn_long_prob, cnn_short_prob) added as features to the existing ensemble give it access to shape-based pattern information that the current engineered features (ratios, indicators) can't capture. Think of it as a learned candlestick recognition module bolted onto the existing system.

12. Wavelet Decomposition — Multi-Scale Signal Separation

A Discrete Wavelet Transform (DWT) decomposes a time series into approximation (low-frequency trend) and detail (high-frequency noise) components at multiple scales simultaneously:

$cA_j[n] = \sum_k h[k] \, cA_{j-1}[2n - k] \quad \text{(approximation at scale } j\text{)}$

$cD_j[n] = \sum_k g[k] \, cA_{j-1}[2n - k] \quad \text{(detail/noise at scale } j\text{)}$

Where $h[k]$ is a low-pass filter and $g[k]$ is its quadrature mirror (high-pass). Each level halves the number of coefficients: at level $j = 1$, each coefficient summarizes 2 bars of the original; at $j = 3$, each coefficient covers 8 M15 bars = 2 hours.
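The filter-and-downsample step is easiest to follow with the Haar wavelet, whose filters have only two taps. The sketch below uses Haar in place of db4 purely for readability; the detail sign convention and the random-walk input are illustrative assumptions.

```python
import numpy as np

def haar_dwt(x):
    """One Haar DWT level: (approximation, detail), each half the length of x."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_idwt(cA, cD):
    """Invert one Haar level."""
    x = np.empty(2 * len(cA))
    x[0::2] = (cA + cD) / np.sqrt(2)
    x[1::2] = (cA - cD) / np.sqrt(2)
    return x

rng = np.random.default_rng(0)
signal = np.cumsum(rng.normal(size=64))     # hypothetical log-price path, 64 bars

cA1, cD1 = haar_dwt(signal)                 # 32 coefficients, 2 bars each
cA2, cD2 = haar_dwt(cA1)                    # 16 coefficients, 4 bars each
cA3, cD3 = haar_dwt(cA2)                    # 8 coefficients, 8 bars each

energy_in = (signal ** 2).sum()
energy_out = sum((c ** 2).sum() for c in (cA3, cD3, cD2, cD1))
print(len(cA1), len(cA2), len(cA3))
print(f"energy before: {energy_in:.4f}  after: {energy_out:.4f}")
```

The transform is orthonormal, so energy is preserved and the signal can be rebuilt exactly; that is what allows "noise energy" at one scale to be read off independently of the trend.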

Wavelet Multi-Scale Decomposition — M15 BTC Price

python

import pywt

def wavelet_features(close: np.ndarray, wavelet="db4", level=3) -> dict:
    # wavedec returns [cA_level, cD_level, ..., cD_1]: coarsest detail first, finest last
    cA, *cDs = pywt.wavedec(np.log(close), wavelet, level=level)

    # Reconstruct trend from approximation only
    rec_components = [cA] + [np.zeros_like(d) for d in cDs]
    trend = pywt.waverec(rec_components, wavelet)[:len(close)]

    # High-frequency noise energy at level 1 (the finest scale is the last element)
    noise_energy = np.std(cDs[-1][-20:])         # most recent 20 level-1 coefficients

    return {
        "wavelet_trend_slope": np.polyfit(range(10), trend[-10:], 1)[0],
        "wavelet_noise_energy": noise_energy,
        "wavelet_detail_l2_std": np.std(cDs[-2][-10:]),   # level-2 detail
    }

NOVOSKY benefit: wavelet_noise_energy is a direct signal quality metric. When high-frequency detail coefficients are large, the market is noisy and trades should require higher confidence. When the L3 approximation has a clear slope, NOVOSKY can lean into the trend. This is multi-timeframe analysis without requiring a second data feed.

Part VI — Generative and Probabilistic Models

13. GAN Price Simulation — Stress Testing

A Generative Adversarial Network pits two networks against each other: a Generator $G$ that creates synthetic data, and a Discriminator $D$ that tries to tell real from fake. The minimax objective:

$\min_G \max_D \; \mathbb{E}_{x \sim p_{data}} [\log D(x)] + \mathbb{E}_{z \sim p_z} [\log(1 - D(G(z)))]$

At Nash equilibrium, $G$ produces samples indistinguishable from real data. For trading, this means generating synthetic BTC return sequences with the same statistical properties as the real market — fat tails, volatility clustering, autocorrelation.

python

class PriceGenerator(nn.Module):
    def __init__(self, latent=32, seq_len=96, n_features=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 256),   nn.LeakyReLU(0.2),
            nn.Linear(256, seq_len * n_features),
            nn.Tanh()
        )
        self.seq_len   = seq_len
        self.n_features = n_features

    def forward(self, z):
        return self.net(z).view(-1, self.seq_len, self.n_features)

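# Use a trained GAN to augment training data with rare-event scenarios
# (load trained weights first, e.g. with gen.load_state_dict; an untrained generator only emits noise)
gen   = PriceGenerator().eval()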
noise = torch.randn(1000, 32)
with torch.no_grad():
    synthetic_sequences = gen(noise).numpy()   # 1000 synthetic market scenarios

# Run NOVOSKY backtest on synthetic sequences to measure tail performance

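NOVOSKY benefit: NOVOSKY's training data covers ~1 year of BTC/USD. A GAN trained on it can generate thousands of synthetic sequences that recombine observed behavior (flash-crash-like drops, sustained trend days, low-liquidity weekends) into paths that never occurred in that order. Because a generator only approximates the distribution it was trained on, it cannot be relied on to produce regimes that are entirely absent from the data, so the synthetic set should be checked against real-market statistics before use. Training and stress-testing on validated augmented data may make the model more robust to unseen regimes.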
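Before synthetic sequences go anywhere near training, it is worth checking that they reproduce the properties named above. The sketch below measures fat tails (excess kurtosis) and volatility clustering (lag-1 autocorrelation of squared returns). The function name and both test series are illustrative assumptions; in practice the inputs would be real returns and generator output.

```python
import numpy as np

def stylized_facts(returns) -> dict:
    """Fat-tail and volatility-clustering diagnostics for a return series."""
    r = np.asarray(returns, dtype=float)
    r = r - r.mean()

    def acf1(a):
        a = a - a.mean()
        return float((a[1:] * a[:-1]).sum() / (a ** 2).sum())

    return {
        "excess_kurtosis": float((r ** 4).mean() / (r ** 2).mean() ** 2 - 3.0),
        "acf1_returns": acf1(r),
        "acf1_squared": acf1(r ** 2),
    }

rng = np.random.default_rng(0)
n = 20_000
gaussian = rng.normal(size=n)                               # no fat tails, no clustering

# Volatility alternates between calm and turbulent blocks of 100 bars
sigma = np.where((np.arange(n) // 100) % 2 == 0, 0.5, 2.0)
clustered = sigma * rng.normal(size=n)

facts_gauss = stylized_facts(gaussian)
facts_clustered = stylized_facts(clustered)
print("gaussian: ", facts_gauss)
print("clustered:", facts_clustered)
```

A generator whose output scores like the Gaussian series on these two metrics has not learned the market's structure, however realistic individual paths may look.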

14. Autoencoder Anomaly Detection

An autoencoder learns to compress data into a latent vector $z = f_{θ} (x)$ and reconstruct it: $\hat{x} = g_{ϕ} (z)$ . It's trained to minimize reconstruction error:

$L_{recon} = ∥ x - g_{ϕ} (f_{θ} (x)) ∥^{2}$

Because the autoencoder is trained on normal market conditions, it reconstructs them well. When an anomaly occurs (flash crash, extreme volume spike, data feed error), the reconstruction error spikes:

$s(x) = \lVert x - \hat{x} \rVert^{2} \Rightarrow \text{anomaly if } s(x) > τ$
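The scoring rule can be tried without training a network by using a linear autoencoder, which is equivalent to PCA: the top principal directions serve as both encoder and decoder. This is a deliberately simplified stand-in for the neural version, and the synthetic data, dimensions, and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, latent = 10, 3

# Hypothetical "normal" data lying close to a 3-dimensional subspace
mixing = rng.normal(size=(latent, n_features))
X_train = (rng.normal(size=(2_000, latent)) @ mixing
           + 0.05 * rng.normal(size=(2_000, n_features)))

mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
components = Vt[:latent]                     # shared encoder/decoder weights

def anomaly_score(x):
    z = (x - mean) @ components.T            # encode
    x_hat = z @ components + mean            # decode
    return ((x - x_hat) ** 2).sum(axis=-1)   # squared reconstruction error s(x)

tau = np.percentile(anomaly_score(X_train), 99)

normal_point = rng.normal(size=latent) @ mixing     # follows the learned structure
odd_point = 3.0 * rng.normal(size=n_features)       # ignores it

print(f"tau = {tau:.4f}")
print(f"normal point score: {anomaly_score(normal_point):.4f}")
print(f"odd point score:    {anomaly_score(odd_point):.4f}")
```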

python

class MarketAutoencoder(nn.Module):
    def __init__(self, n_features=62, latent=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 48), nn.ReLU(),
            nn.Linear(48, latent)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent, 48), nn.ReLU(),
            nn.Linear(48, n_features)
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

    def anomaly_score(self, x: torch.Tensor) -> torch.Tensor:
        return ((x - self.forward(x)) ** 2).mean(dim=-1)

# Threshold: 99th percentile of training reconstruction error
ae   = MarketAutoencoder().eval()
scores_train = ae.anomaly_score(X_train_tensor).detach().numpy()
tau  = np.percentile(scores_train, 99)

# At inference: block trade if anomaly detected
score = ae.anomaly_score(current_features_tensor).item()
if score > tau:
    signal = "BLOCKED — anomalous market conditions"

NOVOSKY benefit: NOVOSKY currently relies on ATR and circuit breakers for risk management. An autoencoder guard — trained on the feature distribution of profitable trade setups — provides a learned abnormality filter that blocks trades when the market looks nothing like historical conditions the model was trained on.

Part VII — Forecasting and External Data

15. Prophet Forecast — Decomposable Time Series

Prophet (Facebook/Meta) decomposes time series into trend, seasonality, and holiday effects:

$y (t) = g (t) + s (t) + h (t) + ϵ_{t}$

Where $g (t)$ is a piecewise linear or logistic trend with automatic changepoint detection, $s (t)$ is Fourier-series seasonality at multiple frequencies, and $h (t)$ models scheduled events (news releases, options expiry).
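The seasonal term is a truncated Fourier series, $s(t) = \sum_{n=1}^{N} a_n \cos(2πnt/P) + b_n \sin(2πnt/P)$. The sketch below builds those columns and fits a straight-line trend plus a daily cycle by ordinary least squares. It is a simplification: Prophet itself uses a piecewise trend with changepoints and a Bayesian fit, and the synthetic hourly series here is an assumption.

```python
import numpy as np

def fourier_features(t_days, period_days: float, order: int) -> np.ndarray:
    """Columns cos(2*pi*n*t/P) for n = 1..order, followed by the matching sin columns."""
    t = np.asarray(t_days, dtype=float)[:, None]
    n = np.arange(1, order + 1)[None, :]
    angle = 2.0 * np.pi * n * t / period_days
    return np.hstack([np.cos(angle), np.sin(angle)])

# Hypothetical hourly series over 30 days: linear trend + daily cycle + noise
rng = np.random.default_rng(0)
t = np.arange(24 * 30) / 24.0                           # time in days
y = 0.5 * t + 2.0 * np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=t.size)

order = 3
X = np.column_stack([np.ones_like(t), t, fourier_features(t, 1.0, order)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

trend_slope = coef[1]                  # units of y per day
daily_sin_amp = coef[2 + order]        # coefficient on sin(2*pi*t / 1 day)
print(f"trend slope: {trend_slope:.3f}  daily sine amplitude: {daily_sin_amp:.3f}")
```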

python

from prophet import Prophet
import pandas as pd

# Aggregate M15 data to hourly for Prophet
hourly = df.resample("1h", on="datetime").agg({"close": "last"}).dropna()
ph_df  = hourly.reset_index().rename(columns={"datetime": "ds", "close": "y"})

m = Prophet(
    changepoint_prior_scale=0.05,       # flexibility of trend
    seasonality_mode="multiplicative",  # BTC vol scales with price
    weekly_seasonality=True,
    daily_seasonality=True,
)
m.add_seasonality(name="session", period=1/4, fourier_order=3)   # 6h cycle (period is in days); hourly data has 6 points per cycle, so at most 3 harmonics are identifiable
m.fit(ph_df)

future  = m.make_future_dataframe(periods=4, freq="h")           # 4h ahead
fc      = m.predict(future)
trend_slope = fc["trend"].diff().iloc[-1]   # +ve = uptrending, -ve = downtrending

NOVOSKY benefit: Prophet's trend, weekly, and daily decomposition components can be injected as features — especially trend_slope (macro direction) and weekly_seasonality (Monday vs Friday behavior). These are longer-horizon signals that complement the short M15-based indicators already in the model.

16. Sentiment Analysis (NLP) — External Signal Intelligence

Price doesn't move in a vacuum. Macro events, Elon tweets, ETF news, exchange hacks — all move BTC. Sentiment models convert unstructured text into numerical signals.

VADER (rule-based, fast, no GPU needed):

$compound(t) = \frac{x}{\sqrt{x^{2} + α}}, \quad x = \sum_{i} s_{i}, \quad α = 15$

FinBERT (BERT fine-tuned on financial text — more accurate):

$P (positive ∣ t) = softmax (W BERT (t))_{pos}$
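The VADER compound score is just a squashing of the summed word valences into $(-1, 1)$. A minimal sketch of that normalization step, assuming the library's default constant of 15; the function name and the example valence sums are illustrative.

```python
import math

def vader_normalize(valence_sum: float, alpha: float = 15.0) -> float:
    """Map a summed valence score into (-1, 1)."""
    return valence_sum / math.sqrt(valence_sum ** 2 + alpha)

for total in (-8.0, -2.0, 0.0, 2.0, 8.0):
    print(f"sum of valences {total:+.1f} -> compound {vader_normalize(total):+.3f}")
```

Because the mapping saturates, a headline stuffed with strong words scores only slightly higher than one with a few, which is worth remembering when averaging compound scores across headlines.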

python

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from transformers import pipeline

vader = SentimentIntensityAnalyzer()
finbert = pipeline("sentiment-analysis",
                   model="ProsusAI/finbert",
                   truncation=True, max_length=512)

def fetch_btc_sentiment(lookback_minutes=30) -> dict:
    # Pull from CryptoPanic API or Reddit /r/bitcoin RSS
    headlines = fetch_recent_news(lookback_minutes)
    if not headlines:
        return {"vader_compound": 0.0, "finbert_positive": 0.5, "news_count": 0}

    vader_scores = [vader.polarity_scores(h)["compound"] for h in headlines]
    finbert_out  = finbert(headlines[:10])    # top 10, GPU budget
    fb_positive  = np.mean([1 if r["label"] == "positive" else
                            (0.5 if r["label"] == "neutral" else 0)
                            for r in finbert_out])
    return {
        "vader_compound":   float(np.mean(vader_scores)),
        "finbert_positive": fb_positive,
        "news_count":       len(headlines),
    }

NOVOSKY benefit: Currently NOVOSKY uses a ForexFactory news blocker (rule-based: block X minutes before/after high-impact events). Replacing it with a live sentiment score as an actual feature lets the model learn nuance: a negative headline during a bullish regime behaves differently from the same headline during a ranging market. Even the raw headline count (the news_count feature) is worth testing on its own as a volatility predictor.

17. Neural Network Classification Zoo

Beyond LSTM and Transformers, several architectures are worth benchmarking as drop-in classifiers:

Category	Model	Notes
TABULAR	TabNet	Attentive feature selection per decision step. Interpretable. Reported to match or beat XGBoost on some tabular datasets.
SEQUENTIAL	TCN	Temporal Convolutional Network. Dilated causal convolutions. Trains in parallel across time steps, so it is often faster than LSTM at comparable accuracy.
ENSEMBLE	NGBoost	Gradient boosting with probabilistic outputs (a full predictive distribution, not just a class label). Trained with a proper scoring rule, which tends to help calibration.
ROBUST	CatBoost	Handles categorical features natively. Symmetric trees reduce overfitting on small datasets.
BAYESIAN	BNN	Bayesian Neural Net (approximated with MC Dropout). Produces uncertainty estimates, so "I don't know" is a valid output.
ATTENTION	FT-Transformer	Transformer on feature tokens (not time tokens). Each feature = one attention token. Strong on tabular data.

python

# NGBoost: probabilistic gradient boosting that outputs a full class-probability distribution
from ngboost import NGBClassifier
from ngboost.distns import k_categorical

ngb = NGBClassifier(Dist=k_categorical(3), n_estimators=500,
                    verbose_eval=50, random_state=42)
ngb.fit(X_train, y_train)

# Column order follows the integer labels in y_train
# (assumed here: 0=LONG, 1=SHORT, 2=FLAT, matching the network heads above)
proba = ngb.predict_proba(X_test)        # shape (N, 3)
print(f"P(LONG)={proba[0,0]:.3f}  P(SHORT)={proba[0,1]:.3f}  P(FLAT)={proba[0,2]:.3f}")
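Probabilistic outputs only pay off if something acts on the uncertainty. One simple option, sketched below, is to abstain whenever the predictive entropy is close to its maximum. The 0.9 cutoff, the function name, and the example probabilities are illustrative assumptions that would need tuning on out-of-sample data.

```python
import numpy as np

def abstain_mask(proba: np.ndarray, max_entropy_frac: float = 0.9) -> np.ndarray:
    """True where predictive entropy exceeds a fraction of its maximum, log(n_classes)."""
    p = np.clip(proba, 1e-12, 1.0)                       # avoid log(0)
    entropy = -(p * np.log(p)).sum(axis=1)
    return entropy > max_entropy_frac * np.log(proba.shape[1])

# Hypothetical class probabilities for three bars
proba = np.array([
    [0.80, 0.10, 0.10],    # confident
    [0.36, 0.33, 0.31],    # nearly uniform: the model does not know
    [0.05, 0.90, 0.05],    # confident
])

mask = abstain_mask(proba)
print(mask)
```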

Part VIII — The NOVOSKY Integration Map

Every method above maps to one of three improvement areas. Here is the priority ranking:

NOVOSKY Research Backlog — Impact vs Implementation Cost

Method	Use In NOVOSKY	Priority
HMM Regime	Regime filter feature — block trades in Sideways state	P0 — do now
Kalman Filter	Replace EMA slope with Kalman velocity	P0 — do now
GARCH	Volatility regime feature + dynamic ATR multiplier	P0 — do now
Pearson Pruning	Drop one feature from each pair with $|r| > 0.90$ in the 62-feature set	P1 — easy win
Wavelet	Multi-scale noise energy + trend slope features	P1 — easy win
Autoencoder	Anomaly guard: block trades on out-of-distribution features	P1 — easy win
Monte Carlo	Monthly drawdown distribution audit	P2 — tooling
VaR / CVaR	Optimization objective for Optuna runs	P2 — tooling
ARIMA residual	Meta-feature: what ARIMA couldn't predict	P2 — tooling
NLP Sentiment	Replace hard news blocker with soft sentiment feature	P3 — research
CNN Patterns	Candlestick shape feature extractor	P3 — research
BNN / NGBoost	Uncertainty-aware classification	P3 — drop-in test
LSTM meta	Sequence meta-learner over ensemble probabilities	P4 — future
Transformer	Full sequence model on 48-bar M15 windows	P4 — future
Prophet	Long-horizon trend decomposition feature	P4 — future
GAN augmentation	Synthetic training data for rare regimes	P5 — research
Copula	Multi-instrument tail-risk sizing	P5 — multi-instrument

The Research Principle

Every method here shares one constraint: it must beat the current ensemble on walk-forward out-of-sample data, not just in-sample cross-validation. A GARCH feature that adds 0.3% to OOS win rate after 3 months of live data is worth more than a Transformer that looks spectacular in backtest and falls apart in production.

The stack isn't a to-do list. It's a hypothesis bank.

Each row in the table above is a falsifiable claim: "adding this feature / model / filter will improve the strategy's edge on unseen BTC/USD M15 data." The job of the research loop is to test each claim as cheaply as possible — and throw away the ones that don't survive contact with reality.
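A cheap test harness starts with the split logic. The sketch below generates rolling walk-forward index ranges in which every test block lies strictly after its training block, with an optional gap between them. The function name, the window sizes, and the embargo length are illustrative assumptions, not NOVOSKY's actual configuration.

```python
def walk_forward_splits(n_samples: int, train_size: int, test_size: int, embargo: int = 0):
    """Yield (train_range, test_range) pairs; each test range starts after its train range.

    embargo: bars skipped between train and test so that labels built from
    future bars cannot leak across the boundary.
    """
    start = 0
    while start + train_size + embargo + test_size <= n_samples:
        train = range(start, start + train_size)
        test_start = start + train_size + embargo
        test = range(test_start, test_start + test_size)
        yield train, test
        start += test_size                       # roll the window forward by one test block

splits = list(walk_forward_splits(n_samples=1000, train_size=600, test_size=100, embargo=4))
for train, test in splits:
    print(f"train [{train[0]}, {train[-1]}]  test [{test[0]}, {test[-1]}]")
```

Each candidate feature or model is then scored only on the concatenated test ranges, against the current ensemble evaluated on exactly the same ranges.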

That's the only kind of research that matters.