The comparison chart

The comparison chart (also called a raincloud plot) visualises the posterior distribution of each variant’s true conversion rate. It lets you compare the variants’ estimated rates, and the uncertainty around each estimate, at a glance.

Three visual layers

Each variant gets a horizontal row with three visual layers:

The cloud (top half). A smooth density curve showing the range of likely conversion rates for this variant. Where the curve is tall, that conversion rate is more likely; where it is low, less likely. A tight, narrow cloud means you can be fairly sure of the rate. A wide, spread-out cloud means there is more uncertainty.
The line and dot (middle). A horizontal line showing the 95% credible interval (the range that contains the true rate with 95% probability, given the data and the model), with a dot at the estimated conversion rate. The dot is your best estimate; the line is the uncertainty around it.
The rain (bottom half). A scatter of small dots below the line, each representing one possible conversion rate drawn from the statistical model. Where dots cluster densely, that is where the rate is most likely to be.
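
To make the three layers concrete, here is a minimal Python sketch of the data behind one row. It assumes a Beta-Binomial model with a uniform Beta(1, 1) prior, which this page does not confirm liftstack uses. The function name raincloud_layers, the posterior mean as the dot, the histogram standing in for the smooth curve, and the number of draws are all illustrative choices.

```python
import random
import statistics


def raincloud_layers(conversions, visitors, n_draws=4000, n_bins=30, seed=0):
    """Hypothetical helper: data for the cloud, line-and-dot, and rain of one variant.

    Assumes a Beta(1, 1) prior and a binomial likelihood (an assumption,
    not a documented liftstack detail).
    """
    rng = random.Random(seed)
    alpha = 1 + conversions
    beta = 1 + visitors - conversions
    # Sorted posterior draws of the conversion rate.
    draws = sorted(rng.betavariate(alpha, beta) for _ in range(n_draws))

    # Line and dot: central 95% interval from sample percentiles, dot at the mean.
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    point = statistics.fmean(draws)

    # Cloud: a histogram density as a rough stand-in for a smooth curve.
    lo, hi = draws[0], draws[-1]
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in draws:
        counts[min(int((x - lo) / width), n_bins - 1)] += 1
    density = [c / (n_draws * width) for c in counts]

    # Rain: a random subset of the draws, plotted as individual dots.
    rain = rng.sample(draws, min(200, n_draws))

    return {"interval": (lower, upper), "point": point,
            "density": density, "bin_width": width, "rain": rain}


small = raincloud_layers(120, 1_000)      # 12% observed rate, 1,000 visitors
large = raincloud_layers(1_200, 10_000)   # same rate, ten times the data
```

Under these assumptions, the interval for large is narrower than the interval for small, which is the tight cloud versus wide cloud contrast described above.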

Colour coding

The leading variant is coloured based on confidence: green (very high), emerald (high), amber (moderate), or grey (low)
All other variants are grey
If the verdict is EQUIVALENT, all variants are grey (no leader)
If a guardrail was violated, all variants are red
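
The colour rules above can be sketched as a small Python function. The function and argument names are hypothetical, and so is the precedence: the sketch checks the guardrail rule first and the EQUIVALENT rule second, which this page does not specify. The confidence label is taken as an input because the thresholds liftstack uses to assign it are not described here.

```python
# Confidence label of the leading variant -> colour, as listed above.
CONFIDENCE_COLOURS = {
    "very high": "green",
    "high": "emerald",
    "moderate": "amber",
    "low": "grey",
}


def variant_colours(variants, leader, confidence, verdict, guardrail_violated):
    """Hypothetical helper: map each variant name to a chart colour."""
    # Assumption: a guardrail violation overrides every other rule.
    if guardrail_violated:
        return {name: "red" for name in variants}
    # An EQUIVALENT verdict means there is no leader to highlight.
    if verdict == "EQUIVALENT":
        return {name: "grey" for name in variants}
    # Otherwise only the leader is coloured, according to its confidence.
    colours = {name: "grey" for name in variants}
    colours[leader] = CONFIDENCE_COLOURS[confidence]
    return colours


colours = variant_colours(["A", "B", "C"], leader="B", confidence="high",
                          verdict="WINNER", guardrail_violated=False)
```

In this example colours maps B to emerald and the other two variants to grey. The verdict value "WINNER" is a placeholder; only EQUIVALENT is named on this page.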

What to look for

If the leading variant’s cloud is clearly separated from the others (no overlap), it is a strong winner
If the clouds overlap substantially, it is hard to tell which is best and you may need more data
A narrow cloud means more certainty; a wide cloud means more uncertainty, usually because the sample size is still small

The P(best) bar chart

Alongside the comparison chart, liftstack displays a horizontal bar chart showing each variant’s probability of being the best performer.

Each bar extends from 0% to the variant’s probability of being best
A vertical dashed line marks the decision threshold (default: 90%). A variant needs to cross this line to be declared a winner.
The percentages always add up to 100% across all variants, since exactly one variant is the best; liftstack just does not know which one for certain

If one bar dominates (e.g., 93%) and crosses the threshold line, you have a clear winner. If bars are close (e.g., 45% vs 35% vs 20%), no variant has yet proven itself and more data is needed.
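
One common way to estimate these probabilities is Monte Carlo sampling: draw a conversion rate for every variant from its posterior, note which variant comes out highest, and repeat many times. The sketch below does this in Python under an assumed Beta-Binomial model with a Beta(1, 1) prior; this page does not confirm that liftstack computes P(best) this way. The names probability_best and declare_winner are hypothetical, and treating "cross the line" as "at or above the threshold" is also an assumption.

```python
import random


def probability_best(results, n_draws=20000, seed=0):
    """Estimate each variant's probability of having the highest conversion rate.

    results maps a variant name to (conversions, visitors).
    Assumes a Beta(1, 1) prior per variant (an illustrative assumption).
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in results}
    for _ in range(n_draws):
        # One simulated "true rate" per variant.
        sampled = {name: rng.betavariate(1 + c, 1 + n - c)
                   for name, (c, n) in results.items()}
        # Exactly one variant wins each draw, so the shares sum to 100%.
        wins[max(sampled, key=sampled.get)] += 1
    return {name: count / n_draws for name, count in wins.items()}


def declare_winner(p_best, threshold=0.90):
    """Return the leading variant if it reaches the threshold, else None."""
    leader = max(p_best, key=p_best.get)
    return leader if p_best[leader] >= threshold else None


clear = probability_best({"A": (100, 1000), "B": (150, 1000)})
close = probability_best({"A": (100, 1000), "B": (101, 1000), "C": (99, 1000)})
```

With these inputs, declare_winner(clear) returns "B", while declare_winner(close) returns None because no bar reaches the 90% line.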