Read your verdict
After your campaign sends and engagement data flows back, liftstack analyses the results and produces a verdict for each slot.
Verdict types
There are four possible verdicts:
Winner Found
One variant outperformed the others with high confidence. The verdict card shows:
- Which variant won and how many others it beat
- The conversion rates compared (winner vs control)
- The uplift: additional conversions and estimated additional revenue generated by the winning variant
- Confidence level and probability of being best
- Revenue range (best case to worst case)
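liftstack's exact model is not shown here, but for binomial conversion data the "probability of being best" is typically computed by sampling each variant's conversion rate from its posterior and counting wins. A minimal sketch, assuming uniform Beta(1, 1) priors and Monte Carlo sampling (the function name and draw count are illustrative):

```python
import random

def prob_best(conversions, exposures, draws=20000, seed=42):
    """Monte Carlo estimate of each variant's probability of being best.

    Each variant's conversion rate gets a Beta(1 + conversions,
    1 + non-conversions) posterior; we sample all posteriors together
    and count how often each variant's sampled rate is the highest.
    """
    rng = random.Random(seed)
    wins = [0] * len(conversions)
    for _ in range(draws):
        samples = [
            rng.betavariate(1 + c, 1 + n - c)
            for c, n in zip(conversions, exposures)
        ]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]

# Example: control converts 40/1000, a variant converts 62/1000
probs = prob_best([40, 62], [1000, 1000])
```

With a clear gap like this, the leading variant's probability lands well above the 95% "Winner Found" threshold.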
Equivalent
All variants performed within a negligible range of each other (inside the ROPE, the region of practical equivalence). There is no performance-based reason to choose one over another, so pick whichever fits your brand preference.
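The ROPE check can be sketched the same way: sample both posteriors and measure how much of the difference's probability mass falls inside the equivalence band. The 0.2-percentage-point ROPE width below is illustrative, not liftstack's actual setting:

```python
import random

def rope_equivalent(c_a, n_a, c_b, n_b, rope=0.002, draws=20000, seed=7):
    """Probability that the difference in conversion rates between two
    variants lies inside the ROPE (here +/- `rope` in absolute rate).
    A probability near 1 supports an "Equivalent" verdict."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + c_a, 1 + n_a - c_a)
        rate_b = rng.betavariate(1 + c_b, 1 + n_b - c_b)
        if abs(rate_a - rate_b) < rope:
            inside += 1
    return inside / draws

# Two large variants with near-identical observed rates
p_equiv = rope_equivalent(5000, 100000, 5050, 100000)
```

Note that declaring equivalence needs a lot of data: the posteriors must both be narrow enough to fit inside the band.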
Insufficient Data
No conclusion can be reached yet. One variant is leading but not decisively. The verdict card shows:
- Which variant is currently leading
- The current probability of being best (below the decision threshold)
- An estimate of how many more exposures are needed before a conclusion can be reached
Common reasons for insufficient data:
- The audience is small
- The variants perform very similarly (requiring more data to distinguish them)
- The campaign is still early in its tracking period
- Not enough conversions have been recorded yet (each variant needs at least 3 conversions before a verdict can be computed)
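The minimum-conversions rule from the last bullet is a simple gate before any verdict maths runs. A sketch (the function name is hypothetical):

```python
def verdict_eligible(conversions_per_variant, min_conversions=3):
    """Apply the minimum-data rule: every variant needs at least
    `min_conversions` conversions before a verdict is computed."""
    return all(c >= min_conversions for c in conversions_per_variant)

verdict_eligible([5, 2])   # one variant below the threshold -> False
verdict_eligible([5, 4])   # all variants qualify -> True
```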
Guardrail Violation
A variant triggered a safety guardrail. This happens when a variant causes a meaningful increase in unsubscribe rates, spam complaint rates, or bounce rates compared to the control. Even if the variant has a high probability of being best on the primary metric, it will not be declared a winner because it is damaging your audience or sender reputation.
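A guardrail check of this kind can be sketched as a one-sided posterior comparison on the negative metric: flag the variant when it is very likely to have a worse rate than the control. The 95% flag threshold here is illustrative, not liftstack's actual value:

```python
import random

def guardrail_violated(bad_v, n_v, bad_c, n_c,
                       prob_threshold=0.95, draws=20000, seed=11):
    """Estimate the probability that a variant's negative-event rate
    (unsubscribes, spam complaints, or bounces) exceeds the control's,
    and flag a violation when it clears `prob_threshold`."""
    rng = random.Random(seed)
    worse = 0
    for _ in range(draws):
        rate_v = rng.betavariate(1 + bad_v, 1 + n_v - bad_v)
        rate_c = rng.betavariate(1 + bad_c, 1 + n_c - bad_c)
        if rate_v > rate_c:
            worse += 1
    return worse / draws >= prob_threshold

# Variant unsubscribes 30/1000 vs control 10/1000 -> flagged
flag = guardrail_violated(30, 1000, 10, 1000)
```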
Confidence levels
Confidence levels translate the raw probability into a human-readable label:
| Probability of Being Best | Confidence Level | What it means |
|---|---|---|
| 95% or higher | Very High | Extremely likely this is the true best variant. Declare a winner. |
| 85% to below 95% | High | Very probably the best, but there is a small chance you are wrong. Consider collecting more data if the stakes are high. |
| 70% to below 85% | Moderate | Leading, but with meaningful uncertainty. Likely needs more data. |
| Below 70% | Low | Too early to tell. Keep testing. |
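The table above is a straightforward threshold mapping. As a sketch:

```python
def confidence_label(prob_best):
    """Map a probability-of-being-best to the confidence labels
    in the table above."""
    if prob_best >= 0.95:
        return "Very High"
    if prob_best >= 0.85:
        return "High"
    if prob_best >= 0.70:
        return "Moderate"
    return "Low"

confidence_label(0.93)  # -> "High"
```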
Reading the numbers
The verdict card shows the probability each variant is best (for example, “93% probability Variant B is the winner”), the estimated uplift range, and the revenue impact.
Results update in real time as more data arrives, so you can watch confidence build over hours. There is no statistical penalty for checking results early. Unlike frequentist methods, the Bayesian approach used by liftstack handles continuous monitoring naturally without inflating error rates.
Understanding credible intervals
Throughout the report you will see ranges labelled as credible intervals. These are not the same as traditional confidence intervals. A 95% credible interval means: “there is a 95% probability the true value falls within this range.” This is a direct probability statement about where the true value lies, which makes it more intuitive to interpret. A narrow range means liftstack is quite certain; a wide range means there is still meaningful uncertainty.
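For a conversion rate with a Beta posterior, an equal-tailed credible interval can be read straight off the sorted posterior samples. A minimal sketch, again assuming a uniform Beta(1, 1) prior:

```python
import random

def credible_interval(conversions, exposures, level=0.95,
                      draws=20000, seed=3):
    """Equal-tailed credible interval for a conversion rate, taken
    from the Beta(1 + conversions, 1 + failures) posterior by
    sorting Monte Carlo samples and slicing off the tails."""
    rng = random.Random(seed)
    samples = sorted(
        rng.betavariate(1 + conversions, 1 + exposures - conversions)
        for _ in range(draws)
    )
    lo_idx = int((1 - level) / 2 * draws)
    hi_idx = int((1 + level) / 2 * draws) - 1
    return samples[lo_idx], samples[hi_idx]

# 62 conversions from 1000 exposures
lo, hi = credible_interval(62, 1000)
```

The interval narrows as exposures accumulate, which is exactly the "narrow range means liftstack is quite certain" behaviour described above.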