Uplift measurement
liftstack measures uplift as the difference in performance between the winning variant and the control, translated into revenue terms wherever possible. The system includes several safeguards against overestimation.
How uplift is calculated
liftstack uses the posterior mean (not the raw observed difference) to estimate uplift. The posterior mean naturally shrinks extreme estimates toward realistic values, providing a more reliable number than a simple observed difference.
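The shrinkage effect can be illustrated with a minimal sketch. It assumes a Beta-Binomial model of conversion rates; liftstack's actual model and prior are not documented here. The function name `posterior_mean_rate`, the Beta(5, 95) prior, and the visitor counts are all hypothetical.

```python
def posterior_mean_rate(conversions, visitors, prior_alpha=1.0, prior_beta=1.0):
    """Posterior mean of a conversion rate under a Beta(prior_alpha, prior_beta) prior."""
    return (prior_alpha + conversions) / (prior_alpha + prior_beta + visitors)

# Hypothetical small test: 9/50 conversions for the variant, 5/50 for the control.
# The Beta(5, 95) prior encodes an assumed belief that rates sit near 5%.
prior_alpha, prior_beta = 5.0, 95.0
variant = posterior_mean_rate(9, 50, prior_alpha, prior_beta)
control = posterior_mean_rate(5, 50, prior_alpha, prior_beta)

raw_uplift = 9 / 50 - 5 / 50        # simple observed difference
shrunk_uplift = variant - control   # difference of posterior means

print(f"raw observed uplift:   {raw_uplift:.4f}")
print(f"posterior mean uplift: {shrunk_uplift:.4f}")
```

With these made-up numbers the raw difference is 8 percentage points, while the posterior mean difference is about 2.7 points. The small sample is pulled toward the prior rather than taken at face value.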
Every uplift estimate includes:
- A credible interval (range) so you can see the plausible low and high ends of the estimate. For example: “Estimated additional revenue: £14,200 (range: £8,400 to £21,100).”
- The probability this is a real improvement (e.g., “94% chance of real improvement”), so you know how confident you should be
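As a rough sketch of how the two items above can be produced, the example below draws samples from each arm's posterior and summarises the difference. It assumes independent Beta(1, 1) priors and a 95% interval. The function `uplift_summary`, the counts, and the revenue inputs (`audience_size`, `value_per_conversion`) are illustrative assumptions, not liftstack's implementation.

```python
import random

def uplift_summary(control_conv, control_n, variant_conv, variant_n,
                   draws=20000, seed=0):
    """Summarise the variant-minus-control difference in conversion rate."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(draws):
        # Beta(1, 1) priors are an assumption made for this sketch.
        control_rate = rng.betavariate(1 + control_conv, 1 + control_n - control_conv)
        variant_rate = rng.betavariate(1 + variant_conv, 1 + variant_n - variant_conv)
        diffs.append(variant_rate - control_rate)
    diffs.sort()
    return {
        "posterior_mean": sum(diffs) / draws,
        # 2.5th and 97.5th percentiles of the sampled differences
        "interval_95": (diffs[int(0.025 * draws)], diffs[int(0.975 * draws) - 1]),
        # Share of draws in which the variant beats the control
        "prob_improvement": sum(d > 0 for d in diffs) / draws,
    }

summary = uplift_summary(500, 10_000, 560, 10_000)

# Hypothetical revenue translation: uplift in rate x audience x value per conversion.
audience_size = 100_000
value_per_conversion = 40.0
extra_revenue = summary["posterior_mean"] * audience_size * value_per_conversion

print(summary)
print(f"Estimated additional revenue: {extra_revenue:,.0f}")
```

The same multiplication applied to the two interval endpoints would give the revenue range shown alongside the headline figure.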
The Expected Improvement chart
When a winner has been declared, liftstack displays an Expected Improvement chart (a density plot) that visualises the difference between the winning variant and the control.
- The horizontal axis shows the improvement in percentage points (e.g., +0.5 means the winner’s conversion rate is 0.5 percentage points higher than the control)
- A vertical dashed line marks zero (“no difference”)
- The area to the right of zero (shaded green) represents scenarios where the winner truly is better
- The area to the left of zero (shaded amber) represents scenarios where the winner is actually worse (unlikely, but possible)
- Below the curve, a dot and line show the estimated improvement and its 95% credible interval
If almost all of the curve is to the right of zero, you can be very confident the winner is genuinely better. The annotation below the chart (e.g., “92.4% chance of real improvement”) tells you exactly how much of the curve is on the positive side.
The winner’s curse
When you test many variants and declare the best-performing one the “winner”, its observed performance tends to be slightly inflated by luck. The variant that happened to get favourable randomness in this particular test looks better than it truly is.
liftstack mitigates this automatically through the Bayesian model (which shrinks extreme estimates toward realistic values) and by always reporting credible intervals alongside point estimates. You should interpret the range, not just the headline number.
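The effect is easy to reproduce in a small simulation. In the sketch below every variant has the same true conversion rate, so any "winner" is purely the product of luck. The function `simulate_winners_curse`, the Beta(50, 950) prior, and all parameter values are assumptions chosen for illustration only.

```python
import random

def simulate_winners_curse(true_rate=0.05, variants=8, visitors=1000,
                           trials=200, seed=1):
    """Average raw and shrunk estimates for the best-looking of several identical variants."""
    rng = random.Random(seed)
    raw_total = 0.0
    shrunk_total = 0.0
    for _ in range(trials):
        # Every variant shares the same true rate; differences are noise.
        conversions = [
            sum(rng.random() < true_rate for _ in range(visitors))
            for _ in range(variants)
        ]
        best = max(conversions)
        raw_total += best / visitors
        # Posterior mean under an assumed Beta(50, 950) prior centred on 5%.
        shrunk_total += (50 + best) / (50 + 950 + visitors)
    return raw_total / trials, shrunk_total / trials

raw_winner, shrunk_winner = simulate_winners_curse()
print(f"true rate:                0.0500")
print(f"winner, raw observed:     {raw_winner:.4f}")
print(f"winner, posterior mean:   {shrunk_winner:.4f}")
```

The raw winner's rate comes out above the true 5%, and the shrunk estimate sits between the two. In this sketch shrinkage reduces the inflation but does not remove it, which is one more reason to read the range and not just the headline number.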
Cross-slot uplift
When summing uplift across multiple slots in the same campaign, liftstack widens the credible intervals so that the combined range remains reliable. This accounts for the additional uncertainty introduced by combining independent estimates.
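How liftstack combines slots internally is not specified here. One standard way to combine independent estimates, shown purely as an illustration, is to add the means and add the variances, assuming each slot's uplift is roughly normally distributed. The function `combine_slots` and the figures are hypothetical.

```python
import math

def combine_slots(slots, z=1.96):
    """Combine (mean_uplift, std_dev) pairs, assumed independent and roughly normal.

    Returns the total uplift with an approximate 95% range.
    """
    total_mean = sum(mean for mean, _ in slots)
    # Variances of independent estimates add; standard deviations do not.
    total_sd = math.sqrt(sum(sd ** 2 for _, sd in slots))
    return total_mean, total_mean - z * total_sd, total_mean + z * total_sd

# Two hypothetical slots, with uplift expressed in pounds.
slots = [(3000.0, 900.0), (4000.0, 1200.0)]
total, low, high = combine_slots(slots)
print(f"Combined uplift: £{total:,.0f} (range: £{low:,.0f} to £{high:,.0f})")
```

Here the combined range extends £2,940 either side of the total, which is wider than either slot's own range (£1,764 and £2,352 either side), reflecting the extra uncertainty carried by the sum.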