Market-regime filter null result: SPX percentile gate on quant backtests

TL;DR — KreamEdge built a market-regime gate following the standard percent-rank approach (SPX 252-day rank, three terciles T / R / B with hysteresis). On 14 backtested strategies — 6 from our reference bench, 8 from the JSON corpus — the gate did not improve a single strategy’s risk-adjusted score above a realistic noise floor. Re-optimizing weights for the non-bull tape (R+B) did not change that either — gating and specialist re-optimization both fail to beat the all-regime baseline on our universe. The most likely explanation is twofold: (a) over the 2016–2026 window, the SPX percent-rank label tags ~78 % of bars as T, so the all-regime weights are already T-specialist weights de facto, and the remaining ~22 % of bars carry too little signal to support a useful specialist; (b) strategies that already rotate across a globally diversified universe absorb regional and trend regimes implicitly, so an SPX-only label adds little. The strategies do not condition on VIX either, which is consistent with the same observation. The natural next experiment is to enrich the label with rate-curve context; a fresh slope-based or rate-based label that splits the dominant T regime into sub-buckets is more likely to unlock heterogeneity than a finer-grained gate on the same SPX percentile.

This article reports backtest research. It is informational and educational. Past performance is not indicative of future results. Nothing here is financial advice. See our community page for the full disclaimer.

1. What we built

A “market regime” in the practitioner sense is a coarse label attached to each bar that says, roughly, “what kind of tape is this?” The two textbook references for retail-scale systematic work are the Better System Trader Live episode with Cesar Alvarez and Alan Clement’s earlier BST 063 episode. Both converge on the same recipe: a slow, smoothed measure of the broad market, bucketed into a small number of states, not optimized against your backtest results.

We implemented the percent-rank variant Alvarez describes on the show.

Component	Implementation
Underlying	Raw `SPX:USD` daily close (not FX-normalized — we explicitly forbid FX-normalizing the regime context, to avoid contaminating the trend signal with EUR/USD or CHF/USD moves).
Indicator	252-day percent rank of close.
Bucketing	Top tercile = `T` (bull), bottom tercile = `B` (bear), middle band = `R` (range / transitional).
Hysteresis	Two-tier: enter `T` at rank ≥ 0.67, exit to `R` at ≤ 0.55. Enter `B` at ≤ 0.33, exit to `R` at ≥ 0.45. Inside either dead-band (0.55–0.67 and 0.33–0.45) the previous label persists.
Diagnostic axis	`vix_ratio = VIX / VIX3M`, 252-day percent-rank, bucketed `Q / N / V` (quiet / normal / volatile). Tracked but not used as a production gate.
Application	Entry-only mask. A position acquired in `T` is not force-closed when the regime flips to `R`; trailing stops and exit rules continue to govern.
Lag	One bar. All regime computations are delayed by one bar before being consumed by signal generation.

The four percentile cutoffs (0.67 / 0.55 / 0.33 / 0.45) are intentionally fixed, not gridded against backtest results. This is a deliberate methodological choice — Alvarez is explicit on the show that optimizing the regime filter against backtest outcomes is one of the easiest ways to overfit a strategy. The filter is a coarse contextual switch, not a parameter to tune.
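The recipe in the table can be sketched in a few lines of pandas. This is a minimal illustration, not the production code: the function names, the exact percent-rank convention (share of trailing closes at or below the latest close), the first-bar initialization, and the direct `T` to `B` jump when the rank crosses both bands in one bar are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

ENTER_T, EXIT_T = 0.67, 0.55  # fixed cutoffs from the table above
ENTER_B, EXIT_B = 0.33, 0.45


def percent_rank(close: pd.Series, window: int = 252) -> pd.Series:
    """Share of the trailing `window` closes that are <= the latest close."""
    return close.rolling(window).apply(lambda w: (w <= w[-1]).mean(), raw=True)


def regime_labels(rank: pd.Series, lag: int = 1) -> pd.Series:
    """Map percent ranks to T / R / B with hysteresis, then delay by `lag` bars."""
    state, labels = None, []
    for r in rank.to_numpy(dtype=float):
        if np.isnan(r):
            labels.append(state)  # warm-up or gap: keep the previous label
            continue
        if state is None:  # first valid bar (assumed initialization)
            state = "T" if r >= ENTER_T else "B" if r <= ENTER_B else "R"
        elif state == "T":
            if r <= ENTER_B:
                state = "B"  # assumed: a collapse can skip R
            elif r <= EXIT_T:
                state = "R"
        elif state == "B":
            if r >= ENTER_T:
                state = "T"
            elif r >= EXIT_B:
                state = "R"
        else:  # state == "R"
            if r >= ENTER_T:
                state = "T"
            elif r <= ENTER_B:
                state = "B"
        labels.append(state)
    # The one-bar lag keeps signal generation from seeing the same-bar label.
    return pd.Series(labels, index=rank.index, dtype=object).shift(lag)


ranks = pd.Series([0.70, 0.60, 0.56, 0.55, 0.50, 0.66, 0.67, 0.30, 0.40, 0.45])
print(regime_labels(ranks).tolist())
```

In the toy series, the rank of 0.60 stays `T` because it sits inside the 0.55–0.67 dead-band, and 0.66 stays `R` for the same reason; the label only flips once an entry or exit cutoff is actually touched.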

The per-strategy regime_label field is hashed into the strategy UID so two otherwise-identical strategies with different gates are tracked as separate runs.
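One way to picture the UID behavior is a hash over a canonical serialization of the strategy config. The sketch below is hypothetical: only `regime_label` and `zone=major_indices` come from this article; the other field names, the placeholder expressions, the SHA-256 choice, and the 16-character truncation are assumptions for illustration.

```python
import hashlib
import json


def strategy_uid(config: dict) -> str:
    """Hash a canonical JSON form of the config so key order does not matter."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical config; the buy/sell strings are placeholders, not real expressions.
ungated = {
    "zone": "major_indices",
    "strategy_buy": "placeholder_buy_expr",
    "strategy_sell": "placeholder_sell_expr",
    "regime_label": None,
}
gated_t = {**ungated, "regime_label": "T"}

print(strategy_uid(ungated), strategy_uid(gated_t))
```

Because the gate is part of the hashed payload, the two configs map to different UIDs even though every other field is identical.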

2. How we tested whether the gate helps

For each of 14 candidate strategies, we ran four variants:

baseline — no regime gate (legacy behavior).
T-mode — accept entries only when label = T.
R-mode — accept entries only when label = R.
B-mode — accept entries only when label = B.

The weights inside each strategy are the production weights — the ones tuned on full-bar averages, without any regime conditioning. The question is exclusively: does masking entries by regime improve a pre-optimized strategy? It is a deliberately weaker question than “could we re-optimize the strategy inside each regime and beat the baseline” — that one is addressed separately in section 4.
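The four variants differ only in which entries are allowed through. A minimal sketch of an entry-only mask follows, assuming boolean entry and exit signals and an already-lagged regime series; `gate_entries` and `hold_positions` are illustrative names, and the position logic is deliberately simplified to a single long/flat state.

```python
import pandas as pd

# None means "no gate"; otherwise only entries under that label are accepted.
MODES = {"baseline": None, "T-mode": "T", "R-mode": "R", "B-mode": "B"}


def gate_entries(entries: pd.Series, regime: pd.Series, mode: str) -> pd.Series:
    """Mask entry signals by regime. Exit signals are never touched."""
    allowed = MODES[mode]
    if allowed is None:
        return entries.copy()
    return entries & (regime == allowed)


def hold_positions(entries: pd.Series, exits: pd.Series) -> pd.Series:
    """Toy long/flat state machine: enter on an entry, leave only on an exit."""
    held, out = False, []
    for e, x in zip(entries, exits):
        if held and x:
            held = False
        elif not held and e:
            held = True
        out.append(held)
    return pd.Series(out, index=entries.index)


regime = pd.Series(["T", "T", "R", "R", "B"])
entries = pd.Series([True, False, False, True, False])
exits = pd.Series([False, False, False, False, True])

for mode in MODES:
    gated = gate_entries(entries, regime, mode)
    print(mode, hold_positions(gated, exits).tolist())
```

In T-mode the position opened on the first bar stays open through the two `R` bars and is closed only by the exit signal, which is the entry-only behavior described in section 1.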

The capital-efficiency metric we score on is composite / exposure. The composite already bakes Sharpe and a drawdown penalty; dividing by exposure rewards strategies that deliver risk-adjusted return without sitting in the market the whole time. A regime gate that simply turns the strategy off most of the time would still need to deliver higher per-bar quality to beat the baseline.

A strategy is flagged as a regime specialist candidate if some gated mode’s composite / exposure exceeds the baseline by ≥ 0.10, while also clearing two noise floors: exposure ≥ 20 % of the backtest window and Sharpe ≥ 0.5 in the gated mode. The thresholds are not the point — what is important is that they are set before looking at the results.
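The flagging rule is simple enough to write down directly. The sketch below assumes a small `ModeStats` container (a hypothetical name) and uses the `rbench:288` rows reported in section 3 as inputs; the real bucketing pipeline may apply further checks, such as a trade-count floor, that are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeStats:
    sharpe: float
    exposure_pct: float
    composite_per_exposure: float


def is_specialist_candidate(
    baseline: ModeStats,
    gated: ModeStats,
    min_uplift: float = 0.10,
    min_exposure_pct: float = 20.0,
    min_sharpe: float = 0.5,
) -> bool:
    """True if the gated mode beats baseline by the uplift and clears both floors."""
    uplift = gated.composite_per_exposure - baseline.composite_per_exposure
    return (
        uplift >= min_uplift
        and gated.exposure_pct >= min_exposure_pct
        and gated.sharpe >= min_sharpe
    )


# rbench:288 figures as reported in the section 3 table.
base_288 = ModeStats(sharpe=1.19, exposure_pct=81.5, composite_per_exposure=0.018)
t_only = ModeStats(sharpe=0.69, exposure_pct=66.1, composite_per_exposure=-0.001)
b_only = ModeStats(sharpe=0.24, exposure_pct=13.8, composite_per_exposure=0.142)

print(is_specialist_candidate(base_288, t_only))
print(is_specialist_candidate(base_288, b_only))
print(is_specialist_candidate(base_288, b_only, min_exposure_pct=5.0, min_sharpe=0.0))
```

The B-only row clears the uplift test but fails both noise floors at the default settings; it only passes once the floors are loosened, which is why the floors have to be fixed before the results are inspected.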

3. The result

Across the strategy corpus we tested (reference-bench rows from our quality-filtered set plus production JSONs covering crypto, ETF/PEA, euro, European-ex-EUR, US-USD, and global-major-indices):

No strategy crossed the +0.10 composite / exposure uplift threshold at the realistic floors.
No strategy showed a gated Sharpe meaningfully above its baseline.
Most strategies bucketed as all-regime (baseline already dominates).
A minority bucketed as gated-invalid (gated modes collapse below the exposure / trade-count noise floor) or baseline-invalid (baseline itself in noise territory before regime testing).

Loosening the floors to exposure ≥ 5 % and Sharpe ≥ 0 promotes 2 of 14 strategies to “B-friendly”, but the underlying B-mode Sharpe is borderline at best and we would not act on it.

Representative numbers from the global-indices reference bench:

Strategy	Mode	CAGR %	Sharpe	Exposure %	Composite / Exposure
`rbench:265`	baseline	32.2	1.73	52.9	0.058
`rbench:265`	T-only	20.8	1.17	39.0	0.064
`rbench:265`	B-only	7.9	0.38	13.2	0.163
`rbench:288`	baseline	36.1	1.19	81.5	0.018
`rbench:288`	T-only	21.3	0.69	66.1	-0.001
`rbench:288`	B-only	7.5	0.24	13.8	0.142

Composite already incorporates a drawdown penalty. CAGR figures above are bench-window in-sample values; do not transpose them to forward expectations.

The numbers tell a consistent story. Gating to T shrinks both return and Sharpe — the strategy was already doing fine in trending tape, and the regime label removes some of the entries that contributed. Gating to B looks attractive on capital-efficiency (composite/exposure jumps from 0.018 to 0.142 on rbench:288), but the residual trade count and exposure are so small that the metric is being lifted by a handful of trades, not by a robust regime-specific edge. None of the gated modes produced a Sharpe high enough above its baseline to justify a regime specialist; the small uplifts on the composite/exposure ratio come from drastically reduced exposure, not from sharper per-bar quality.

4. Why nothing improved — the structural hypothesis

The null result is not a bug. It is a constraint of the system we tested. Before reaching the structural argument, there is a base-rate observation that does most of the work.

The base rates are extremely lopsided

The flagship production strategy was backtested on the period 2016-01-01 → 2026-04-30 — roughly ten years, 2 592 trading bars. Across that window the SPX percent-rank label assigned bars to the three regimes as follows:

Regime	Bars	Share of window
`T` (bull, top tercile)	2 013	77.7 %
`R` (range, middle band)	196	7.6 %
`B` (bear, bottom tercile)	383	14.8 %

These are not “tercile” frequencies in the textbook sense. By construction each bucket should contain one-third of bars over a stationary window; in a stretch of market history dominated by the post-2020 recovery and the 2023–2025 rally, SPX spent most of its days inside its trailing-year top. The label is doing exactly what it should — the distribution is the message.

The implication for any gating experiment is severe. A strategy whose weights were optimized on the full window is, mechanically, already a T-specialist: 78 % of the data the optimizer saw was T tape, so the weight vector minimizes objective penalties under T conditions and treats R + B bars as a small perturbation. Switching the gate to T only does not unlock anything new; switching it to R + B cuts the training data by roughly 4.5× (579 of 2 592 bars), with a comparable drop in trade count, and exposes the strategy to bars that were always a minority of the optimizer’s signal.
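The base-rate arithmetic can be checked directly from the bar counts in the table above; nothing here is assumed beyond those three numbers.

```python
# Bar counts per regime, copied from the table above.
bars = {"T": 2013, "R": 196, "B": 383}

total = sum(bars.values())
shares = {label: round(100 * n / total, 1) for label, n in bars.items()}

non_bull = bars["R"] + bars["B"]   # bars available to an R + B specialist
shrink = total / non_bull          # how much smaller the R + B sample is

print(total, shares)
print(non_bull, round(shrink, 2), round(100 * non_bull / total, 1))
```

The R + B sample is 579 bars, about 22 % of the window, so a specialist fitted on it sees between a quarter and a fifth of the data the all-regime optimizer saw.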

We also tested the harder version of the question: take the strategy and re-optimize the weights with the gate restricted to R + B, i.e. fit specialist weights for the non-bull tape from the ground up. That run did not improve the all-regime baseline either. The two plausible reasons are not mutually exclusive:

Statistical room. A specialist trained on ~580 bars and the trades that survive a 22 % exposure window has very few degrees of freedom against an 11-weight model. The risk of fitting noise dominates whatever genuine R + B edge might exist.
Universe rotation already exits the bad tape. Even without a regime gate, the strategy’s ranking step de-allocates from names whose individual trends have rolled over. By the time the broad SPX label flips to B, the strategy is already mostly out of US large-cap exposure and tilted toward whatever is still ranking well — global ETFs, defensives, commodity-linked names. The cross-sectional rotation does the regime work that an SPX-percentile gate is trying to add.

The structural argument

KreamEdge production strategies share two structural properties that make an SPX-based regime gate redundant:

a) The universe already encodes regional regimes. The flagship production strategy operates on zone=major_indices, a basket of large-cap names across US, Europe, UK, Switzerland, Japan, and ETF wrappers covering APAC. The ranking step inside the strategy (“which 5–10 names look strongest right now?”) is already a regime filter — it just expresses itself through the cross-section of which tickers rank, not through a global on/off switch. When US large-caps are in a drawdown and emerging-market or commodity-linked names are leading, the strategy already rotates toward the latter. Adding an “SPX is in its bottom tercile, stop trading” overlay on top of that throws away the rotation it is being paid to do.

b) The signal logic does not condition on VIX. None of the production strategy_buy or strategy_sell expressions contain the vix0a–vix1b atoms that the parser supports, so the strategies do not use volatility state as a signal-level input. That is a deliberate property (they were tuned on entry/exit rules that are pure price + trend + momentum), but it is informative. A strategy that doesn’t lean on VIX at the entry layer is unlikely to learn anything from a coarser SPX-percentile state at the gating layer: the percentile is largely a rescaling of trailing SPX price action, which the entry rules already see directly.

In other words, the regime gate is trying to solve a problem the strategy already solves implicitly. That is not the same as saying market regime is unimportant — it is saying the specific gate variant we tested does not add information that the strategy was not already extracting.

5. What this does not prove

A few important caveats, in the spirit of being precise about what the data actually says:

Gating and a first pass of specialist re-optimization both failed. The headline experiment was gate-only. We did also run the harder version — re-optimizing weights inside the R + B mask, from scratch — on the flagship global-indices strategy. That run did not beat the all-regime baseline either. That is one data point on one strategy on one universe; it is not a proof that no per-regime specialist exists, but it removes the obvious “we just didn’t fit the right weights” rebuttal. A more aggressive specialist study — B alone with a much larger weight space, or a different universe with more B trades — could still produce a positive result.
One regime taxonomy, one window. The percent-rank cutoffs, the 252-day window, the tercile bucketing — all are practitioner defaults, not optimized. A volatility-crossed grid (T×V, B×Q, etc.) is the obvious next axis. The diagnostic Q/N/V vol state is already computed; it is just not promoted to a gate.
An SPX-anchored gate is one of several possible regime definitions. The current label is blind to interest-rate state, credit spreads, term-structure, breadth, and inter-asset correlations. The codebase now ships per-currency benchmark interest-rate curves. Folding rate state — for instance “global rates rising vs. falling”, or “real rates positive vs. negative” — into the regime label is the most obvious enrichment to try next.
No strategy was materially hurt on the scoring metric either. With one exception (rbench:288 T-only flipping composite/exposure negative), the gate is a neutral overlay on composite / exposure, even though raw CAGR and Sharpe fall under gating as shown in section 3. Strategies that already do not want a gate quietly ignore it; strategies that might want one in principle do not yet have the weights to express that preference.
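The volatility-crossed grid mentioned in the caveats above amounts to combining the trend label with the Q / N / V state per bar. The sketch below is illustrative only: the article does not state the Q / N / V cutoffs, so tercile cutoffs on the `vix_ratio` percent rank are assumed, as are the function names and the `TxQ`-style label format.

```python
import pandas as pd


def vol_state(vix_ratio_rank: pd.Series, low: float = 1 / 3, high: float = 2 / 3) -> pd.Series:
    """Bucket the vix_ratio percent rank into Q / N / V (assumed tercile cutoffs)."""
    def bucket(r):
        if pd.isna(r):
            return None
        return "Q" if r <= low else "V" if r >= high else "N"
    return vix_ratio_rank.map(bucket)


def crossed_label(trend: pd.Series, vol: pd.Series) -> pd.Series:
    """Combine trend and vol labels; a missing label on either axis yields None."""
    labels = [
        f"{t}x{v}" if isinstance(t, str) and isinstance(v, str) else None
        for t, v in zip(trend, vol)
    ]
    return pd.Series(labels, index=trend.index, dtype=object)


trend = pd.Series(["T", "T", "B", None])
vol = vol_state(pd.Series([0.1, 0.9, 0.5, 0.2]))
grid = crossed_label(trend, vol)
print(grid.tolist())
print(grid.value_counts().to_dict())
```

Counting bars per cell before running any backtest is a cheap sanity check: a 3×3 grid splits the already thin `R` and `B` samples further, so some cells may fall below any reasonable noise floor.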

6. What we will try next

In rough order of expected information yield per unit of compute:

Rate-aware regime label. Promote the per-currency benchmark interest-rate infrastructure already in the codebase to a regime input. A label that splits the dominant T bucket into sub-states — e.g. “bull tape with rising real rates” vs. “bull tape with cutting cycle” — directly attacks the base-rate problem identified above. The SPX-only label cannot split T; a rate-conditioned label can. This is now ranked above specialist re-optimization on the existing label, because the first round of specialist re-optimization on R + B did not deliver.
Volatility-crossed grid. The diagnostic VIX/VIX3M percentile (Q / N / V) already exists; promoting it to a gate axis is a one-config change. Useful diagnostic and a likely complement to the rate-aware label.
Specialist re-optimization on B alone, with a different universe. The R + B re-optimization on the global-indices strategy did not improve things, but the trade count was tight. Re-running on a strategy with a much larger baseline trade count — or on a universe where B bars are more frequent (the bench window we used is anomalously bull-heavy) — is a sharper version of the same test.
Cesar Alvarez’s diversification trick. Run two copies of the same strategy with different regime filters and ensemble the equity curves. Even if no single filter is a clear win, ensembling reduces regime-classification risk because no two filters time the same transition the same way.
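The ensembling idea in the last item can be sketched as a blend of per-bar returns. How the copies should be weighted is not specified in this article, so the equal-weight, rebalanced-every-bar blend below is an assumption, and `ensemble_equity` is an illustrative name.

```python
import pandas as pd


def ensemble_equity(curves: list[pd.Series], start: float = 1.0) -> pd.Series:
    """Equal-weight blend of per-bar returns from several equity curves."""
    returns = pd.concat([c / c.shift(1) - 1.0 for c in curves], axis=1)
    blended = returns.mean(axis=1).fillna(0.0)  # first bar has no prior close
    return start * (1.0 + blended).cumprod()


# Toy curves: one copy gains 10 % per bar, the other is flat (gated out).
copy_a = pd.Series([100.0, 110.0, 121.0])
copy_b = pd.Series([100.0, 100.0, 100.0])

print(ensemble_equity([copy_a, copy_b]).round(4).tolist())
```

Blending returns rather than averaging the raw equity levels keeps the weights equal on every bar, so one copy's past gains do not let it dominate the ensemble later in the window.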

What we will deliberately not do: optimize the regime cutoffs (0.67 / 0.55 / 0.33 / 0.45) against backtest results. The point of fixing them up front is to keep the filter generalizable. If a fixed-threshold filter does not help, the answer is a different kind of filter, not a re-tuned version of the same one.

7. Takeaways for builders

If you are building your own systematic strategy and considering adding a regime filter, the practitioner-level conclusions from this experiment generalize beyond our specific stack:

Test the filter as a gate before reaching for a full re-optimization. It is cheap, and if gating does not help, that already tells you something about the strategy: its entry rules are probably absorbing the same information the gate would express.
Diversified universes need stronger regime evidence to justify a gate. Single-asset and single-region strategies have the most to gain from a regime overlay; global-rotation strategies have the least, because rotation is a regime response.
A regime gate is a hypothesis about which conditions you want to be active in, not a way to boost in-sample numbers. If you find yourself tuning the percentile cutoffs to find a configuration that lifts your CAGR, you are no longer doing regime analysis — you are doing in-sample fitting through a different parameter.
Negative results are part of the work. A regime gate that does not improve a strategy is not wasted research. It tells you that the strategy is already extracting the relevant context, and it focuses the next iteration on where information is actually missing — for us, that points toward interest-rate state, not equity-percentile state.

The KreamEdge codebase will keep the regime gate available — it costs nothing to leave in place, it gives us the empirical apparatus to test the next label definition, and it satisfies an important property in the meantime: the regime_label is now part of the strategy UID, so any future regime-aware strategy is unambiguously distinguished from its all-regime ancestor in the run history. Architecture in place; the answer to which regime label matters is still open.

This post describes backtest research and methodology. All figures are taken from out-of-sample-aware backtests with transaction costs and slippage modeled, but they remain historical simulations. Backtest results do not represent live trading, do not guarantee future results, and should not be interpreted as recommendations to buy, sell, or hold any instrument. KreamEdge is informational and educational only — not financial advice, not a regulated advisory service, and not a signal-distribution product. Discussion of methodology continues on our community channels.


Published by KreamEdge on May 18, 2026
