Read your verdict
After your campaign sends and engagement data flows back, liftstack analyses the results and produces a verdict for each slot.
Verdict types
There are four possible verdicts:
Winner Found
One variant outperformed the others with high confidence. The verdict card shows:
- Which variant won and how many others it beat
- The conversion rates compared (winner vs control)
- The uplift: additional conversions and estimated additional revenue generated by the winning variant
- Confidence level and probability of being best
- Revenue range (best case to worst case)
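liftstack's exact model is not shown here, but for binomial conversion data the "probability of being best" is typically computed by sampling each variant's conversion rate from its posterior and counting wins. A minimal sketch, assuming uniform Beta(1, 1) priors and Monte Carlo sampling (the function name and draw count are illustrative):

```python
import random

def prob_best(conversions, exposures, draws=20000, seed=42):
    """Monte Carlo estimate of each variant's probability of being best.

    Each variant's conversion rate gets a Beta(1 + conversions,
    1 + non-conversions) posterior; we sample all posteriors together
    and count how often each variant's sampled rate is the highest.
    """
    rng = random.Random(seed)
    wins = [0] * len(conversions)
    for _ in range(draws):
        samples = [
            rng.betavariate(1 + c, 1 + n - c)
            for c, n in zip(conversions, exposures)
        ]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]

# Example: control converts 40/1000, a variant converts 62/1000
probs = prob_best([40, 62], [1000, 1000])
```

With a clear gap like this, the leading variant's probability lands well above the 95% "Winner Found" threshold.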
Equivalent
All variants performed within a negligible range of each other (inside the ROPE, the region of practical equivalence). There is no performance-based reason to choose one over another, so pick whichever fits your brand preference.
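The ROPE check can be sketched the same way: sample both posteriors and measure how much of the difference's probability mass falls inside the equivalence band. The 0.2-percentage-point ROPE width below is illustrative, not liftstack's actual setting:

```python
import random

def rope_equivalent(c_a, n_a, c_b, n_b, rope=0.002, draws=20000, seed=7):
    """Probability that the difference in conversion rates between two
    variants lies inside the ROPE (here +/- `rope` in absolute rate).
    A probability near 1 supports an "Equivalent" verdict."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + c_a, 1 + n_a - c_a)
        rate_b = rng.betavariate(1 + c_b, 1 + n_b - c_b)
        if abs(rate_a - rate_b) < rope:
            inside += 1
    return inside / draws

# Two large variants with near-identical observed rates
p_equiv = rope_equivalent(5000, 100000, 5050, 100000)
```

Note that declaring equivalence needs a lot of data: the posteriors must both be narrow enough to fit inside the band.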
Insufficient Data
No conclusion can be reached yet. One variant is leading but not decisively. The verdict card shows:
- Which variant is currently leading
- The current probability of being best (below the decision threshold)
- An estimate of how many more exposures are needed before a conclusion can be reached
Common reasons for insufficient data:
- The audience is small
- The variants perform very similarly (requiring more data to distinguish them)
- The campaign is still early in its tracking period
- Not enough conversions have been recorded yet (each variant needs at least 3 conversions before a verdict can be computed)
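The minimum-conversions rule from the last bullet is a simple gate before any verdict maths runs. A sketch (the function name is hypothetical):

```python
def verdict_eligible(conversions_per_variant, min_conversions=3):
    """Apply the minimum-data rule: every variant needs at least
    `min_conversions` conversions before a verdict is computed."""
    return all(c >= min_conversions for c in conversions_per_variant)

verdict_eligible([5, 2])   # one variant below the threshold -> False
verdict_eligible([5, 4])   # all variants qualify -> True
```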
Guardrail Violation
A variant triggered a safety guardrail. This happens when a variant causes a meaningful increase in unsubscribe rates, spam complaint rates, or bounce rates compared to the control. Even if the variant has a high probability of being best on the primary metric, it will not be declared a winner because it is damaging your audience or sender reputation.
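A guardrail check of this kind can be sketched as a one-sided posterior comparison on the negative metric: flag the variant when it is very likely to have a worse rate than the control. The 95% flag threshold here is illustrative, not liftstack's actual value:

```python
import random

def guardrail_violated(bad_v, n_v, bad_c, n_c,
                       prob_threshold=0.95, draws=20000, seed=11):
    """Estimate the probability that a variant's negative-event rate
    (unsubscribes, spam complaints, or bounces) exceeds the control's,
    and flag a violation when it clears `prob_threshold`."""
    rng = random.Random(seed)
    worse = 0
    for _ in range(draws):
        rate_v = rng.betavariate(1 + bad_v, 1 + n_v - bad_v)
        rate_c = rng.betavariate(1 + bad_c, 1 + n_c - bad_c)
        if rate_v > rate_c:
            worse += 1
    return worse / draws >= prob_threshold

# Variant unsubscribes 30/1000 vs control 10/1000 -> flagged
flag = guardrail_violated(30, 1000, 10, 1000)
```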
Confidence levels
Confidence levels translate the raw probability into a human-readable label:
| Probability of Being Best | Confidence Level | What it means |
|---|---|---|
| 95% or higher | Very High | Extremely likely this is the true best variant. Declare a winner. |
| 85% to below 95% | High | Very probably the best, but there is a small chance you are wrong. Consider collecting more data if the stakes are high. |
| 70% to below 85% | Moderate | Leading, but with meaningful uncertainty. Likely needs more data. |
| Below 70% | Low | Too early to tell. Keep testing. |
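The table above is a straightforward threshold mapping. As a sketch:

```python
def confidence_label(prob_best):
    """Map a probability-of-being-best to the confidence labels
    in the table above."""
    if prob_best >= 0.95:
        return "Very High"
    if prob_best >= 0.85:
        return "High"
    if prob_best >= 0.70:
        return "Moderate"
    return "Low"

confidence_label(0.93)  # -> "High"
```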
Reading the numbers
The verdict card shows the probability each variant is best (for example, “93% probability Variant B is the winner”), the estimated uplift range, and the revenue impact.
Results update in real time as more data arrives, so you can watch confidence build over hours. There is no statistical penalty for checking results early. Unlike frequentist methods, the Bayesian approach used by liftstack handles continuous monitoring naturally without inflating error rates.
Understanding credible intervals
Throughout the report you will see ranges labelled as credible intervals. These are not the same as traditional confidence intervals. A 95% credible interval means: “there is a 95% probability the true value falls within this range.” This is a direct probability statement about where the true value lies, which makes it more intuitive to interpret. A narrow range means liftstack is quite certain; a wide range means there is still meaningful uncertainty.
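For a conversion rate with a Beta posterior, an equal-tailed credible interval can be read straight off the sorted posterior samples. A minimal sketch, again assuming a uniform Beta(1, 1) prior:

```python
import random

def credible_interval(conversions, exposures, level=0.95,
                      draws=20000, seed=3):
    """Equal-tailed credible interval for a conversion rate, taken
    from the Beta(1 + conversions, 1 + failures) posterior by
    sorting Monte Carlo samples and slicing off the tails."""
    rng = random.Random(seed)
    samples = sorted(
        rng.betavariate(1 + conversions, 1 + exposures - conversions)
        for _ in range(draws)
    )
    lo_idx = int((1 - level) / 2 * draws)
    hi_idx = int((1 + level) / 2 * draws) - 1
    return samples[lo_idx], samples[hi_idx]

# 62 conversions from 1000 exposures
lo, hi = credible_interval(62, 1000)
```

The interval narrows as exposures accumulate, which is exactly the "narrow range means liftstack is quite certain" behaviour described above.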