
Frequently Asked Questions

Everything you need to know about testing your content with Liftstack, explained in plain language.

Getting Started

What is Liftstack?
Liftstack is a multi-channel A/B testing platform for CRM marketers. It lets you test different versions of content against each other across email, push notifications, SMS, in-app messages, and in-app surfaces (content cards), then uses statistical analysis to tell you which version actually performs best and by how much. It works with Klaviyo, Customer.io, Iterable, and Braze.
What can I test?

You can test any piece of content that you'd swap between recipients across any supported channel. In Liftstack, these are called "snippets." Each snippet belongs to a specific channel (email, push, SMS, in-app message, or in-app surface). The most powerful feature is the ability to test custom HTML code blocks in email and in-app channels, which lets you experiment with virtually any element:

  • Custom HTML blocks. This is where Liftstack really shines. You can test entire sections of email markup: different layouts, content structures, visual treatments, or any HTML that your ESP supports. Examples include:
    • Product recommendation grids (2-column vs 3-column, image sizes, product ordering)
    • Social proof sections (star ratings vs review quotes vs "X people bought this")
    • Header/navigation bar layouts (minimal vs full category links)
    • Footer designs (stacked vs inline, with or without social icons)
    • Countdown timer blocks vs static urgency text
    • Loyalty points callouts and reward tier displays
    • "Why buy from us" trust badge sections
    • Dynamic content cards (editorial style vs product-focused vs testimonial)
    • Shipping and returns policy callout blocks
    • Cross-sell and upsell module formats
  • Subject lines. "Don't miss out!" vs "Your exclusive offer inside"
  • Hero blocks. Different images or headline/subheadline combinations
  • CTAs. "Shop Now" vs "Browse the Collection" vs "Claim Your Discount"
  • Copy blocks. Different tone, length, or messaging strategy
  • Discount framing. "20% off" vs "Save £10" vs no discount

Because Liftstack works at the HTML snippet level, you are not limited to testing simple text swaps. Any section of your email that you can express as an HTML block can become a testable snippet with multiple variants.

What is a "variant"?
A variant is one version of a snippet. If you're testing three different subject lines, each subject line is a variant. You need at least two variants to run a test.
Can I create a variant with blank content?

It depends on the snippet type:

  • Subject lines: content is always required. ESPs reject blank subject lines, and a blank subject line would corrupt your test results.
  • Copy and HTML blocks: content is always required. A blank variant would produce inflated uplift numbers for competing variants (since no one can click or convert on empty content) and poison Thompson Sampling posteriors for future campaigns.
  • Image snippets: text content (alt text) is optional, but you must provide either an uploaded image or an image URL.

These rules are enforced when you create or edit a variant. If you want to test "no content" vs "some content" for a slot, use a minimal placeholder (e.g., a single space or a neutral message) as your control variant instead.

What is a "control" variant?

The control is the version you'd send if you weren't testing. It represents your current standard or "safe" option. Marking a variant as the control lets Liftstack measure uplift: how much better the winning variant performed compared to what you would have done anyway.

You don't have to designate a control, but it's highly recommended. Without one, Liftstack can still find a winner, but the uplift numbers will be less precise.

What is a "slot"?
A slot is a position in your campaign where a snippet is being tested. If you're testing both a subject line and a hero image in the same campaign, that's two slots. Each slot is analysed independently, so you'll get separate results for each.
How does Liftstack assign variants to recipients?

Before your campaign sends, Liftstack randomly assigns each recipient a variant for each slot. These assignments are written to your CRM profiles as a property called lf_assignments. Your email template then uses conditional logic to show each person the content they were assigned.

This is important: the assignment happens before anyone sees anything. This is what makes it a proper experiment, because we know who was shown what before we see the results.
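For a concrete picture of the pre-send step, here is a minimal Python sketch of per-recipient, per-slot random assignment. The assign_variants helper and the data shapes are illustrative, not Liftstack's actual code; the output is the kind of mapping that would be written to each profile as the lf_assignments property.

```python
import random

def assign_variants(recipients, slots, seed=None):
    """Randomly assign one variant per slot to each recipient before send.

    recipients: list of profile IDs
    slots: dict mapping slot name -> list of variant labels
    Returns {profile_id: {slot: variant}}, the shape that would be written
    to the CRM as lf_assignments.
    """
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    return {
        profile_id: {slot: rng.choice(variants) for slot, variants in slots.items()}
        for profile_id in recipients
    }

# Example: two slots tested in the same campaign
slots = {
    "subject_line": ["A", "B"],
    "hero_block": ["A", "B", "C"],
}
assignments = assign_variants(["user_1", "user_2", "user_3"], slots, seed=42)
# assignments["user_1"] -> e.g. {"subject_line": "B", "hero_block": "A"}
```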

Why Liftstack?

My ESP already has A/B testing built in. Why would I pay for this?

Native ESP testing and Liftstack solve different problems. Here's what each does:

What native ESP A/B testing does:

  • Splits your audience into two groups and sends each group a completely different email (or subject line)
  • Picks a winner based on opens or clicks over a short window (typically 1 to 4 hours)
  • Sends the winning version to the remaining audience

What Liftstack does differently:

  • Tests individual content blocks inside a single email, not whole emails against each other. You can test just the hero image, just the CTA, or just the product grid layout while keeping everything else identical. This isolates what actually drives the difference.
  • Runs multiple tests simultaneously in the same campaign. Test a subject line AND a hero block AND a CTA in one send, with independent results for each slot.
  • Uses Bayesian statistics that let you check results at any time without inflating error rates. No more guessing whether 4 hours was long enough.
  • Carries learning across campaigns. Smart Allocation uses historical performance to send more traffic to better-performing variants automatically.
  • Provides revenue attribution, not just click counting. Know which variant actually drives purchases, not just engagement.
  • Detects guardrail violations like unsubscribe spikes, bounce rate increases, and spam complaints that your ESP's A/B test won't flag.
  • Works across ESPs. If you use Klaviyo for lifecycle and Customer.io for transactional, your testing insights live in one place.

In short, native ESP testing tells you which of two emails got more opens in the first few hours. Liftstack tells you which specific content elements drive conversions and revenue over the full attribution window, protects your list health, and accumulates learning over time.

Can Liftstack do things my ESP cannot?

Yes. The core capability gap is in-template content testing. Native ESP tools treat the email as a single unit: you either send Email A or Email B. Liftstack injects conditional logic into your template so that different recipients see different content blocks within the same email. This is how you test a CTA without also changing the subject line, the layout, and the imagery at the same time.

Other things Liftstack does that native tools typically don't:

  • Multi-slot testing in a single send (subject line + hero + CTA, analysed independently)
  • Bayesian analysis with continuous monitoring (no fixed test duration needed)
  • Automatic bot filtering so inflated opens and security-scanner clicks don't corrupt your results
  • Revenue-per-exposure modelling that captures both conversion probability and order value
  • Cross-campaign learning via Thompson Sampling and content insights
  • Safety guardrails (unsubscribe, bounce, complaint) that block winners which damage list health or sender reputation

How Testing Works

How long does a test take?

It depends on your audience size and how different the variants are. As a rough guide:

  • Large audiences (50,000+) with meaningful content differences: often conclusive within a few days
  • Medium audiences (5,000 to 50,000): typically 3 to 7 days
  • Small audiences (under 5,000): may take multiple campaign sends

Liftstack will show you a progress estimate when your test is still collecting data.

Can I check results while the test is running?

Yes. The campaign report updates in real time while your campaign is in tracking mode. You'll see live charts, preliminary numbers, and a confidence progression chart showing how close the test is to reaching a conclusion.

However, during the early data collection period, results will be labelled as preliminary. Liftstack enforces a minimum data threshold before declaring any verdict, which prevents premature conclusions from small, noisy samples.

What's the minimum audience size?

There's no hard minimum, but smaller audiences need larger differences between variants to reach a conclusion. As a planning guide:

Flow/automation campaigns (baseline 1-5%):

| Baseline conversion rate | Min. difference to detect | Audience per variant |
|---|---|---|
| 1% | 0.5 percentage points | ~6,300 |
| 2% | 1.0 percentage point | ~3,100 |
| 3% | 1.0 percentage point | ~4,700 |
| 5% | 2.0 percentage points | ~1,900 |

Broadcast/campaign sends (baseline 0.05-0.2%): conversion rates for one-off campaign sends are typically much lower than for flows. At these rates, Liftstack automatically switches to relative effect sizes rather than fixed percentage point targets:

| Baseline conversion rate | Relative lift to detect | Audience per variant |
|---|---|---|
| 0.05% | 100% (doubling) | ~32,000 |
| 0.10% | 100% (doubling) | ~16,000 |
| 0.20% | 50% | ~12,800 |
| 0.50% | 50% | ~5,100 |

If your audience is too small to detect realistic differences, Liftstack will tell you the test needs more data rather than making a premature call.

When you set up a campaign, Liftstack automatically shows a sample size guidance card after your audience is synced. This tells you whether your audience is large enough for the number of variants you're testing, using your workspace's historical conversion rate (or a 3% default if you have no history). For low conversion rate campaigns (below 0.5%), the guidance automatically uses relative effect sizes and shows an additional warning with advice on alternative metrics or campaign types.
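If you want to sanity-check audience sizes yourself, the classical two-proportion approximation below gives ballpark figures per variant. It is only a planning heuristic: Liftstack's own guidance card uses its Bayesian framework, so the figures in the tables above will not match this formula exactly.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, min_detectable_diff, alpha=0.05, power=0.80):
    """Classical two-proportion sample size (normal approximation).

    baseline: control conversion rate, e.g. 0.02 for 2%
    min_detectable_diff: absolute difference to detect, e.g. 0.01 for 1pp
    Returns the approximate audience needed per variant.
    """
    p1 = baseline
    p2 = baseline + min_detectable_diff
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

print(sample_size_per_variant(0.02, 0.01))    # flow-style baseline: a few thousand per variant
print(sample_size_per_variant(0.001, 0.001))  # low-rate broadcast: tens of thousands per variant
```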

What is a "primary metric"?

The primary metric is the single measure you're optimising for. You choose it when setting up your campaign, and it cannot be changed once the campaign starts sending. This is deliberate: it prevents cherry-picking whichever metric happens to look best after the fact.

Your options are:

  • Conversion rate (default): what percentage of recipients took the desired action (purchase, sign-up, etc.)
  • Click rate: what percentage of recipients clicked a link in the email
  • Open rate: what percentage of recipients opened the email
  • Revenue per exposure: average revenue generated per recipient

All other metrics are still tracked and shown in your report as secondary/diagnostic metrics, but only the primary metric determines the winner.

Why can't I change the primary metric after sending?
This is a critical safeguard called pre-registration. If you could change the metric after seeing results, you might (even unconsciously) switch to whichever metric makes a particular variant look best. This would inflate your false positive rate, causing you to "find" winners that aren't real winners. Pre-registering the metric keeps the test honest.
What is the attribution window?

The attribution window is the time period after each recipient is assigned during which their engagement events (clicks, conversions, purchases) are credited to the test. The default is 7 days (168 hours).

Critically, the window is per-recipient, not per-campaign. Each person's 7-day clock starts from the moment they were assigned a variant. For a broadcast campaign where everyone is assigned at once, this is effectively the same as "7 days after send." For an automation or flow where new recipients enter over time, each person gets their own independent 7-day window starting from their entry date.

A click that happens 3 days after assignment counts. A purchase 10 days after assignment does not (by default). This prevents distant events, which are influenced by many other factors, from muddying your test results.

If a significant number of conversions are arriving after individual attribution windows close, Liftstack will flag this and suggest extending the window for future campaigns.
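The per-recipient window is simple to express in code. A minimal sketch, assuming the 168-hour default:

```python
from datetime import datetime, timedelta

ATTRIBUTION_WINDOW = timedelta(hours=168)  # default: 7 days

def is_attributable(assigned_at: datetime, event_at: datetime,
                    window: timedelta = ATTRIBUTION_WINDOW) -> bool:
    """An event counts only if it falls inside this recipient's own window."""
    return assigned_at <= event_at <= assigned_at + window

# Each recipient's clock starts at their own assignment time
assigned = datetime(2024, 6, 1, 9, 0)
print(is_attributable(assigned, datetime(2024, 6, 4, 9, 0)))   # True  (3 days later)
print(is_attributable(assigned, datetime(2024, 6, 11, 9, 0)))  # False (10 days later)
```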

How does attribution work for automations and flows?

Automations and flows work the same way as broadcast campaigns, with one important difference: recipients enter the test over time rather than all at once.

Liftstack automatically detects whether a campaign is a broadcast (all messages sent within 48 hours) or an automation (messages sent over days or weeks). The per-recipient attribution window means you do not need to worry about when individual messages are sent. Each person's conversions are measured from their own assignment time, ensuring fair comparison regardless of entry date.

One practical consequence: automations take longer to produce results, because recipients are entering gradually rather than all at once. The campaign report updates as new data arrives, and verdicts may shift as the sample grows.

What is the Test Calculator?

The Test Calculator is a pre-test planning tool available under Analytics > Test Calculator. Enter your channel, the number of variants, and your audience size. Liftstack computes the minimum detectable effect, the required sample size per variant, and the estimated number of days to reach a conclusion.

Use it before launching a campaign to avoid underpowered tests that end up stuck at "Insufficient Data." The calculator uses the same Bayesian framework as the analysis engine, so its estimates are consistent with the verdicts you'll see in your campaign reports.

What happens if a recipient is in multiple active campaigns?

When a recipient is assigned to more than one active Liftstack campaign at the same time, conversion events are attributed to all active campaigns the recipient is assigned to. Each campaign gets full credit for the event.

This is the industry-standard approach. It works because each campaign uses independent randomisation: the treatment effect estimate within each campaign remains statistically valid regardless of what other campaigns are running concurrently.

  • Per-campaign reports are accurate. Each campaign's conversion rate, winner, and uplift reflect the true treatment effect.
  • Revenue may appear in multiple campaign reports. This is correct for per-campaign analysis.
  • Dashboard totals are deduplicated. The cumulative uplift figure on your home dashboard applies a deduplication factor to prevent inflation from shared recipients.

Integration & Setup

How does Liftstack connect to my ESP?

Liftstack connects via your ESP's API using credentials that you provide. The setup process is:

  1. Go to Integrations in Liftstack and select your platform (Klaviyo, Customer.io, Iterable, or Braze)
  2. Enter your credentials. What's required depends on the platform:
    • Klaviyo: a private API key
    • Customer.io: a Site ID, a Tracking API key, and an App API key
    • Iterable: a standard API key
    • Braze: a REST API key and your Braze instance (e.g. US-01, EU-01)
  3. Liftstack validates the connection and confirms access

Your credentials are encrypted at rest using Fernet symmetric encryption. Liftstack never stores them in plain text, and they are only decrypted when making API calls on your behalf.

No developer is required. If you can find your API credentials in your ESP's settings, you can complete setup in under five minutes.
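For the technically curious, the encrypt-at-rest, decrypt-on-use pattern described above looks roughly like this with the cryptography library's Fernet implementation. The key handling here is purely illustrative; in a real deployment the key would come from a secrets manager rather than being generated inline.

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # illustrative only; real keys live in a secrets manager
fernet = Fernet(key)

api_key_plaintext = b"pk_live_example_credential"
encrypted = fernet.encrypt(api_key_plaintext)   # this token is what gets stored at rest

# Decrypted only at the moment an API call is made on your behalf
decrypted = fernet.decrypt(encrypted)
assert decrypted == api_key_plaintext
```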

What API permissions does Liftstack need?

Liftstack needs permission to:

  • Read segments/lists (to sync your audience)
  • Read and write profile properties (to write lf_assignments for variant targeting)
  • Create and update templates (to push the conditional template logic)
  • Read engagement events (clicks, opens, conversions) for attribution

For Klaviyo, this means a private API key with full read/write scope. For Customer.io, an App API key with tracking and API access. For Iterable, a standard API key. For Braze, a REST API key with segment, user export, user track, and template permissions. The exact permissions are documented in the integration setup flow.

Does writing assignments burn through my ESP's API limits?

Liftstack uses batch endpoints wherever available and includes built-in rate limiting that respects each platform's published limits. For a 500,000-person audience:

  • Klaviyo: uses bulk profile import endpoints; typically completes in 10 to 20 minutes
  • Customer.io: uses individual profile identify calls (Customer.io does not offer a bulk endpoint); typically completes in 15 to 30 minutes for large audiences
  • Iterable: uses bulk user update endpoints; typically completes in 10 to 20 minutes
  • Braze: uses /users/track with batches of 75 profiles per request; typically completes in 10 to 20 minutes

These API calls count toward your ESP's rate limits, but the built-in throttling means Liftstack won't spike your usage or trigger overage charges. If your ESP plan has very tight API limits, the writeback will simply take longer (it backs off automatically on 429 responses).
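A rough sketch of what batched writeback with rate-limit backoff can look like. The endpoint URL, payload shape, and the 75-profile batch size are placeholders (the batch size mirrors the Braze example above); this is not Liftstack's actual client code.

```python
import time
import requests

def write_assignments_in_batches(profiles, endpoint, api_key, batch_size=75):
    """Write lf_assignments in batches, backing off when the ESP returns 429.

    profiles: list of dicts like {"external_id": ..., "lf_assignments": {...}}
    endpoint: placeholder URL for the ESP's batch update API
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    for start in range(0, len(profiles), batch_size):
        batch = profiles[start:start + batch_size]
        while True:
            resp = requests.post(endpoint, json={"attributes": batch}, headers=headers)
            if resp.status_code == 429:
                # Respect the platform's limits: wait as instructed, then retry the batch
                wait_seconds = int(resp.headers.get("Retry-After", 5))
                time.sleep(wait_seconds)
                continue
            resp.raise_for_status()
            break
```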

How long do I need to wait between assigning and sending?

The campaign wizard handles this in sequence: it syncs the audience, runs assignment, writes properties to profiles, and pushes the template. You'll see a progress indicator for each step. Once all steps show complete, you can send immediately. There is no additional waiting period.

For large audiences (100,000+), the profile writeback step is the longest part and can take 15 to 30 minutes. Plan accordingly, but you don't need to wait overnight.

What happens if the API fails halfway through assigning?

Liftstack writes profile properties in batches with automatic retry. If a batch fails (network timeout, API error), the system retries with exponential backoff. If it hits a 429 (rate limit) response, it reads the Retry-After header and waits before continuing.

If some batches fail despite retries, the campaign still advances (the writeback is best-effort per batch). The progress indicator will report how many profiles succeeded and how many failed. You can re-trigger the writeback step from the campaign wizard, and since the operation is idempotent (writing the same property value twice is harmless), it will safely re-process all profiles from the beginning. It does not resume from a checkpoint.

Does Liftstack slow down my campaign sending?
No. Liftstack's work happens before you send. The variant assignments are written to CRM profiles as a property, and the conditional template is pushed to your ESP. When you actually hit send in your ESP, the email renders using the pre-written profile property. There is zero additional latency at send time.
Can I connect multiple ESPs to the same workspace?
Yes. Each plan tier allows a set number of platform connections (Starter: 1, Growth: 2, Scale: 3). You might connect Klaviyo for your lifecycle campaigns and Customer.io for transactional, and run tests on both from the same workspace with shared snippet libraries.

Understanding Your Results

What does "X% probability of being best" mean?

This is the single most important number in your report. It answers: "What is the probability that this variant truly has the highest conversion rate?"

For example, "93% probability of being best" means: given all the data we've collected, there's a 93% chance this variant genuinely outperforms all the others. There's a 7% chance one of the other variants is actually better and this one just got lucky in this particular test.

Where is the p-value?

Liftstack uses Bayesian statistics instead of the traditional frequentist approach you might be familiar with from other tools. This means you won't see p-values, and that's a good thing. Here's why:

P-values answer a confusing question: "If there were NO real difference between variants, what's the probability of seeing data this extreme?" That's hard to interpret and easy to misuse.

Probability of being best answers a direct question: "Given the data I have, what's the probability this variant is actually the best?" That's what you really want to know.

Think of it this way:

  • A p-value of 0.03 does NOT mean "there's a 97% chance variant A is better." (This is the most common misinterpretation of p-values.)
  • A "probability of being best" of 97% DOES mean "there's a 97% chance variant A is better." It's exactly what it says.
What about confidence intervals? I'm used to seeing those.

Liftstack shows credible intervals (displayed as "range" in the report), which look similar to confidence intervals but are easier to interpret:

  • A traditional 95% confidence interval means: "If we repeated this experiment many times, 95% of the resulting intervals would contain the true value." (Confusing, right?)
  • A 95% credible interval means: "There's a 95% probability the true value falls within this range." (Much more intuitive.)

You'll see these ranges throughout the report: for conversion rates, uplift estimates, and revenue figures. A narrow range means we're quite certain; a wide range means there's still meaningful uncertainty.

What does "expected loss" mean?

Expected loss answers: "If I pick this variant and it turns out not to be the best, how much conversion rate am I leaving on the table?"

For example, an expected loss of 0.05% means: if you go with this variant and it's not actually the winner, you'd lose about 0.05 percentage points of conversion rate on average. That's tiny, well within the "not worth worrying about" range.

Liftstack uses expected loss as part of its decision criteria. A variant isn't declared a winner just because it's probably best. It also needs to have a very low expected loss, ensuring that even in the unlikely scenario it's wrong, the cost is negligible.
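Continuing the earlier Monte Carlo sketch, expected loss can be estimated from the same kind of posterior draws: in every simulated scenario, compare the chosen variant's rate with the best variant's rate and average the shortfall.

```python
import numpy as np

def expected_loss(draws, variant_index):
    """Average conversion rate given up if the chosen variant is not truly best.

    draws: array of shape (n_samples, n_variants) of posterior samples,
           e.g. Beta draws like those in the earlier sketch.
    """
    best_rate = draws.max(axis=1)
    chosen_rate = draws[:, variant_index]
    # In scenarios where the chosen variant IS best, the loss is zero
    return float(np.mean(best_rate - chosen_rate))

rng = np.random.default_rng(0)
draws = np.column_stack([
    rng.beta(1 + 120, 1 + 4880, size=50_000),   # variant A: 120 / 5,000
    rng.beta(1 + 150, 1 + 4850, size=50_000),   # variant B: 150 / 5,000
])
print(expected_loss(draws, variant_index=1))     # tiny: well below a 0.1pp decision threshold
```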

What does "practical equivalence" mean?

Sometimes variants are so close in performance that the difference doesn't matter in practice. If variant A converts at 3.02% and variant B converts at 3.05%, that 0.03 percentage point difference is real but meaningless for your business.

Liftstack checks whether variants fall within a Region of Practical Equivalence (ROPE): a range around zero (default: 0.5 percentage points) where differences are too small to care about. For campaigns with low conversion rates, the ROPE width is automatically narrowed so it remains meaningful relative to the baseline. If all variants fall within this range with high probability, the verdict is EQUIVALENT, and you're told to pick whichever version you prefer. There's no statistical reason to favour one over another.
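A minimal sketch of the equivalence check, assuming posterior draws like those in the earlier examples: it simply asks how often the difference between two variants lands inside the ROPE.

```python
import numpy as np

def within_rope(draws_a, draws_b, rope_pp=0.5):
    """Probability that the A-vs-B difference falls inside the ROPE.

    draws_a, draws_b: posterior samples of each variant's conversion rate.
    rope_pp: half-width of the region of practical equivalence, in percentage points.
    """
    diff = draws_a - draws_b
    return float(np.mean(np.abs(diff) < rope_pp / 100.0))

rng = np.random.default_rng(0)
a = rng.beta(1 + 604, 1 + 19_396, size=50_000)   # variant A: ~3.02% of 20,000
b = rng.beta(1 + 612, 1 + 19_388, size=50_000)   # variant B: ~3.06% of 20,000
print(within_rope(a, b))   # close to 1.0: the difference almost surely sits inside the ROPE
```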

Reading the Campaign Report

What is the verdict card?

The verdict card is the hero element at the top of each slot's results. It gives you the bottom line in plain language. There are four possible verdicts:

  • Winner (green, trophy icon). A clear winner has been identified. The card shows which variant won, the conversion rates compared, the uplift (additional conversions and revenue), confidence level and probability of being best, and revenue range (best case to worst case).
  • Equivalent (grey, equals icon). All variants performed within a negligible range of each other. Pick whichever fits your brand best. There's no performance-based reason to choose one over another.
  • Insufficient Data (amber, hourglass icon). No conclusion yet. One variant is leading but not decisively. Shows which variant is currently leading, how likely it is that the leader is actually the best, and how many more exposures are estimated before a conclusion can be reached.
  • Guardrail Violation (red, warning icon). A variant triggered a safety guardrail, typically because it caused a meaningful increase in unsubscribe rates compared to the control. Even if it has a high probability of being best on the primary metric, it won't be declared a winner because it's damaging your audience.
What are the confidence levels?
| Probability of Being Best | Confidence Level | What It Means |
|---|---|---|
| 95% or higher | Very High | Extremely likely this is the true best variant. Declare a winner. |
| 85% to 95% | High | Very probably the best, but a small chance you're wrong. Consider collecting more data if the stakes are high. |
| 70% to 85% | Moderate | Leading, but there's meaningful uncertainty. Likely needs more data. |
| Below 70% | Low | Too early to tell. Keep testing. |
What is the uplift callout?

The uplift callout is the key value statement of your test. It answers: "How much more did I get by using the winning variant instead of the control?"

It shows two numbers:

  • Additional conversions: how many extra people converted because of the winning content
  • Additional revenue: the estimated revenue those extra conversions generated

These numbers come with a range (e.g., "+£8,200 to +£16,800") so you know the realistic best and worst case, along with the probability that this is a real improvement (not just noise).

What is the metrics table?

Below each slot's charts, there's an expandable metrics table showing the raw numbers for every variant. This includes exposures, opens, open rate, clicks, CTR, conversions, conversion rate, unsubscribes, bounces, complaints, revenue, and revenue per exposure.

This table is collapsed by default because the verdict card, charts, and uplift callout already tell you everything you need to make a decision.

Understanding the Charts

What is the Variant Comparison Chart (Raincloud Plot)?

A visual comparison of all variants' estimated true conversion rates, shown in the campaign report below the verdict card. Each variant gets a horizontal row with three visual layers:

  • The cloud (top half). A smooth density curve showing the range of likely conversion rates. Where the curve is tall, that rate is more likely. A tight, narrow cloud means more certainty.
  • The line and dot (middle). A horizontal line showing the 95% credible interval, with a dot at the estimated conversion rate.
  • The rain (bottom half). A scatter of small dots representing possible conversion rates drawn from the statistical model.

If the leading variant's cloud is clearly separated from the others (no overlap), it's a strong winner. If clouds overlap substantially, you may need more data.

What is the Chance of Winning chart?

A horizontal bar chart showing each variant's probability of being the best performer. A vertical dashed line marks the decision threshold (default: 90%). A variant needs to cross this line to be declared a winner.

The percentages always add up to 100% across all variants. If one bar dominates and crosses the threshold, you have a clear winner. If bars are close, more data is needed.

What is the Expected Improvement chart?

A density plot of the difference between the winning variant and the control, shown only when a winner has been declared. The area to the right of zero (shaded green) represents scenarios where the winner truly is better. The area to the left (shaded amber) represents scenarios where it's actually worse (unlikely, but possible).

The annotation below the chart (e.g., "92.4% chance of real improvement") tells you exactly how much of the curve is on the positive side.

What is the Confidence Progression chart?

A line chart tracking how the leading variant's probability of being best has evolved over time since the campaign was sent. A horizontal dashed line marks the decision threshold (default: 90%).

Watch for the leading variant's line climbing toward the threshold. A line that's climbing steadily suggests the test is heading toward a conclusion. A line that's flat or bouncing suggests the variants are very close. During live tracking, this chart auto-refreshes every 60 seconds.

What is the Cumulative Revenue Uplift chart?

Shown on the analytics dashboard, this is a running total of the additional revenue generated by all your winning variants across all campaigns over time. A shaded band around the line shows the confidence range.

This line should only go up (each new winner adds to the total). This is the single best chart for demonstrating ROI from your testing programme.

What is the Conversion Rate Sparkline?
Found on the snippet performance page, this small line chart shows how a specific variant's conversion rate has changed across every campaign it's appeared in. A flat line means consistent performance. An upward trend might indicate a primacy effect. A downward trend might indicate a novelty effect.

Verdicts & Decisions

How does Liftstack decide on a winner?

A variant is declared the winner when both of these conditions are met:

  1. Probability of being best is at least 90% (configurable). We're highly confident this variant truly has the highest conversion rate.
  2. Expected loss is at most 0.1% (configurable, automatically scaled down for low conversion rate campaigns). Even if we're wrong, the cost of choosing this variant over the true best is negligible.

Both conditions must hold simultaneously. A variant with 92% probability of being best but an expected loss of 0.3% won't be declared a winner yet because the potential downside is still too large.

How does Liftstack decide variants are equivalent?

Variants are declared equivalent when Liftstack is highly confident (90%+ probability) that the difference between all variants falls within the ROPE width (default: 0.5 percentage points, configurable, automatically narrowed for low conversion rate campaigns). At that point, the differences are real but too small to matter for your business.

When there are many variants (4+), Liftstack can detect partial equivalence. For example: "Variant A is the clear winner. Among the remaining variants, B, C, and D are practically equivalent to each other." This helps you understand the full picture, showing not just who won but which of the remaining variants are interchangeable.

What is a guardrail violation?

Guardrail metrics are safety checks that protect your audience. Even if a variant has a great conversion rate, it won't be declared a winner if it's damaging other important metrics. The specific guardrails depend on the channel:

Email guardrails:

  • Unsubscribe rate. If the variant causes unsubscribes to increase by more than 0.1 percentage points vs the control.
  • Spam complaint rate. If complaints increase by more than 0.05 percentage points vs the control.
  • Bounce rate. If bounces increase by more than 0.5 percentage points vs the control.

Push and SMS guardrails:

  • Opt-out rate. If the variant causes opt-outs to increase beyond the threshold vs the control.

In-app guardrails:

  • Dismiss rate. If the variant causes dismissals to increase beyond the threshold vs the control.

When multiple guardrails are checked simultaneously (e.g., all three email guardrails), Liftstack applies a Bonferroni correction to control the overall false alarm rate. A variant that drives clicks but damages your audience is destroying long-term value. The guardrail catches this and warns you.

What does "insufficient data" mean?

This means no conclusion can be reached yet. One variant is probably leading, but there isn't enough data to be confident. Common reasons:

  • The audience is small
  • The variants perform very similarly (requiring more data to distinguish them)
  • The campaign is still early in its tracking period
  • Not enough conversions have been recorded yet (each variant needs at least 3 conversions before a verdict can be computed)

The report will show an estimate of how many more recipients need to be exposed before a conclusion can be reached.

Can I override the verdict?

The verdict is the system's statistical recommendation. You're free to take a different action, such as continuing to test a variant even after it's been declared equivalent, or choosing a variant other than the winner based on brand considerations.

What you can't do is change the primary metric after seeing results, or retroactively adjust the analysis to favour a particular outcome. These safeguards keep the testing process honest.

Metrics & What They Mean

What are the primary metrics?
| Metric | What It Measures | Best For |
|---|---|---|
| Conversion rate | Percentage of recipients who completed the desired action | Most campaigns (the default) |
| Click rate | Percentage of recipients who clicked any link | Quick-signal tests, smaller audiences |
| Open rate | Percentage of recipients who opened the email | Subject line and preview text testing |
| Revenue per exposure | Average revenue generated per recipient | When variants might influence order size |
What are secondary/diagnostic metrics?
All metrics not selected as primary become diagnostics. They're shown in the metrics table for context. For example, you might optimise for conversion rate but still want to see the click rate and revenue per variant. Diagnostic metrics are never used to determine the winner.
Why is open rate marked with a warning?

Open tracking is unreliable because of Apple Mail Privacy Protection (MPP) and email client pre-fetching. These technologies automatically trigger "opens" for every email, whether or not the recipient actually looked at it.

The good news: this noise affects all variants equally (since recipients are randomly assigned), so relative comparisons remain valid. If Variant A has a higher open rate than Variant B, that ranking is trustworthy. The bad news: absolute open rates are inflated, and the true difference between variants appears smaller than it really is. This means tests using open rate as the primary metric need more data to reach a conclusion.

What is "revenue per exposure"?

Revenue per exposure (RPE) measures the average revenue each recipient generates. It captures two effects:

  1. Conversion probability. Does this variant make people more likely to buy?
  2. Order value. When people do buy, do they spend more?

A variant could win on RPE even if it doesn't have the highest conversion rate, because it might encourage larger orders. Liftstack uses a specialised compound model for RPE that analyses these two components separately and then combines them.
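As an illustration of the two-component idea (not Liftstack's actual model), the sketch below combines a Beta posterior for conversion probability with a simple bootstrap of observed order values to produce revenue-per-exposure samples.

```python
import numpy as np

def revenue_per_exposure_samples(conversions, exposures, order_values,
                                 n_samples=50_000, seed=0):
    """Compound-model sketch: RPE = P(convert) x average order value.

    conversions / exposures: counts for one variant.
    order_values: observed order values for that variant.
    Conversion probability gets a Beta posterior; order value is resampled
    from the observed orders (a bootstrap stand-in for a revenue model).
    """
    rng = np.random.default_rng(seed)
    p_convert = rng.beta(1 + conversions, 1 + exposures - conversions, size=n_samples)
    aov = rng.choice(order_values, size=(n_samples, len(order_values)), replace=True).mean(axis=1)
    return p_convert * aov

rpe = revenue_per_exposure_samples(150, 5000, order_values=[42.0, 55.0, 61.5, 38.0, 90.0])
print(rpe.mean())   # expected revenue per recipient for this variant
```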

What are the safety guardrails?

Even if a variant drives conversions, it might be doing so in a way that damages your audience health. Liftstack monitors guardrail metrics automatically, with channel-specific checks:

Email: unsubscribe rate (threshold: 0.1pp), spam complaint rate (0.05pp), bounce rate (0.5pp)

Push/SMS: opt-out rate

In-app: dismiss rate

Each guardrail checks whether the winning variant's rate exceeds the control's rate by more than the threshold. When multiple guardrails are checked for the same channel (e.g., all three email guardrails), a Bonferroni correction raises the per-test probability threshold so that the combined false alarm rate stays at 10%.

When any guardrail fires, Liftstack shows a red warning and prevents the variant from being declared a winner. This protects you from inadvertently adopting content that's eroding your subscriber base, sender reputation, or app engagement.
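The Bonferroni adjustment mentioned above is a one-liner. For email's three guardrails it raises the per-guardrail probability threshold to roughly 96.7% so the combined false alarm rate stays at 10%:

```python
def guardrail_probability_threshold(n_guardrails, family_alpha=0.10):
    """Per-guardrail probability threshold that keeps the combined false alarm rate at 10%."""
    per_test_alpha = family_alpha / n_guardrails
    return 1 - per_test_alpha

print(guardrail_probability_threshold(3))  # ~0.967 for email's three guardrails
print(guardrail_probability_threshold(1))  # 0.90 for single-guardrail channels (push/SMS, in-app)
```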

Smart Allocation (Thompson Sampling)

What is "Smart Allocation"?
When you've tested the same snippet variants across multiple campaigns, Liftstack can use historical performance data to send more traffic to the variants that have been performing well, while still sending some traffic to underperforming variants to make sure we aren't missing something. This is called Thompson Sampling.
How is it different from an equal split?

With a standard A/B test (equal split), each variant gets the same number of recipients, say 33% each for three variants. This is fair but wasteful: you're sending just as much traffic to a clearly underperforming variant as to the front-runner.

With Smart Allocation, Liftstack might split traffic 60/25/15 based on past performance. The likely winner gets more traffic (fewer wasted exposures), while alternatives still get enough to confirm whether they've improved or the leader has slipped.

Does this bias the test?
No. The system still tracks performance for every variant and runs the full statistical analysis. The unequal allocation actually makes the test more efficient. You reach conclusions faster because more recipients are exposed to the likely best variant, so uplift is captured sooner.
Can I override the smart allocation?
Yes. When Liftstack recommends an allocation, you'll see a transparency panel showing the recommended traffic split and why. You have three options: Accept, Adjust Manually (drag sliders), or Use Equal Split.
What is the "Smart Allocation Uplift"?
When a campaign uses Thompson Sampling, the report shows the additional conversions captured by the smart allocation compared to what an equal split would have produced. This isolates the value of the allocation strategy from the value of testing itself.
How does the system handle a brand-new variant with no history?
New variants (those that have never appeared in a completed campaign) receive a guaranteed minimum of 20% of traffic on their first campaign, regardless of what Thompson Sampling would recommend. This prevents established variants from starving newcomers of exposure.
Does historical data expire?
Yes. Liftstack applies a recency decay to historical data: performance from campaigns 60 days ago counts half as much as recent campaigns, and very old data fades away almost entirely. This ensures the allocation reflects current audience preferences, not stale data.
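Putting the pieces of this section together, here is an illustrative Thompson Sampling allocation sketch with a 60-day recency half-life and a 20% floor for brand-new variants. The data shapes and the exact decay formula are assumptions for the example, not Liftstack's implementation.

```python
import numpy as np

def smart_allocation(history, n_samples=50_000, new_variant_floor=0.20,
                     half_life_days=60, seed=0):
    """Thompson-sampling traffic split with recency decay and a newcomer floor.

    history: one dict per variant, e.g.
        {"conversions": 120, "exposures": 5000, "days_ago": 30, "is_new": False}
    Older evidence is down-weighted: data from 60 days ago counts half as much.
    """
    rng = np.random.default_rng(seed)
    draws = []
    for v in history:
        weight = 0.5 ** (v["days_ago"] / half_life_days)   # recency decay
        c, n = v["conversions"] * weight, v["exposures"] * weight
        draws.append(rng.beta(1 + c, 1 + (n - c), size=n_samples))
    draws = np.column_stack(draws)
    share = np.bincount(draws.argmax(axis=1), minlength=len(history)) / n_samples

    # Brand-new variants are guaranteed a minimum slice of traffic
    for i, v in enumerate(history):
        if v.get("is_new"):
            share[i] = max(share[i], new_variant_floor)
    return share / share.sum()   # renormalise so the split sums to 100%

print(smart_allocation([
    {"conversions": 180, "exposures": 6000, "days_ago": 15, "is_new": False},
    {"conversions": 150, "exposures": 6000, "days_ago": 15, "is_new": False},
    {"conversions": 95,  "exposures": 6000, "days_ago": 75, "is_new": False},
]))
# -> heavily favours the historical front-runner, e.g. roughly [0.94, 0.06, 0.00]
```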

Operational Workflow

Can I fix a typo in a variant after the test starts?

It depends on how far the campaign has progressed:

  • Before sending (DRAFT through TEMPLATE_PUSHED): Yes. You can edit variant content in the snippet editor at any time before you confirm the send. If the template has already been pushed, Liftstack will re-push it with the updated content.
  • After sending (SENT, TRACKING, COMPLETED): No. Once the campaign is sent, the content that recipients saw is fixed. Editing the variant in Liftstack would update it for future campaigns, but it won't change what was already delivered.

If you spot a serious error after sending (like a broken link), the right approach is to fix it in your ESP's template directly. The Liftstack test results for that variant will be affected, and the report will reflect that.

Can I add a variant to a test that is already running?

No. Adding a variant mid-test would mean that variant has a different exposure period and audience size, which makes statistical comparison invalid. If you want to test an additional variant, create a new campaign with all the variants you want to compare (including the new one).

This is a deliberate constraint. Mixed-exposure tests produce unreliable results, and Liftstack prioritises correct conclusions over flexibility.

Can I stop or pause a single variant without killing the whole campaign?
Not currently. The campaign operates as a single unit: it's either tracking or completed. If a variant has a serious problem (offensive content, broken rendering), your best option is to fix the issue in the ESP template directly so recipients no longer see the problematic content. The statistical results for that variant will be affected, but the test continues for the remaining variants.
Can I duplicate a campaign setup?
Not yet, but this is a planned feature. For now, when setting up a new campaign you can select the same snippets and variants from your library, which preserves most of the configuration. If you're using Smart Allocation, historical performance from previous campaigns carries over automatically, so the system remembers what worked.
What happens if I delete a snippet that's active in a campaign?
You can't. Snippets that are referenced by campaign slots are protected at the database level. If you attempt to delete one, the operation will fail. You would need to remove the snippet from all campaign slots first. This prevents accidentally orphaning a running test.
Can I re-run the same test on a different audience?
Yes. Create a new campaign, select the same snippets and variants, and point it at a different segment. Liftstack treats each campaign as an independent experiment with fresh assignments. If Smart Allocation is enabled, the new campaign will benefit from the performance data gathered in the original test.
What is cohort tracking?

After a campaign completes, you can activate Monitoring mode (30, 60, or 90 days). While monitoring is active, Liftstack continues tracking conversions and revenue for the campaign's assigned recipients to measure long-term variant impact. Results are reported at 7, 14, 30, 60, and 90 day intervals post-assignment, giving you a clear picture of whether the winning variant's advantage holds, grows, or fades over time.

This is most valuable for high-AOV brands where repeat purchase behaviour matters, subscription businesses, and loyalty campaigns where long-term engagement is the real goal.

How do I enable monitoring?
From the campaign report of a completed campaign, click "Enable Monitoring" and select a duration (30, 60, or 90 days). The campaign status changes to Monitoring. Liftstack will continue collecting conversion and revenue data for the duration you selected. When the monitoring window expires, the campaign automatically returns to Completed status with the extended results available on the report.

Segmentation & Audience

Does Liftstack work with my existing ESP segments?
Yes. When you set up a campaign, you select a segment (or list) from your ESP. Liftstack syncs the audience from that segment via the API. Whatever targeting, filtering, or segmentation logic you've built in your ESP applies as normal. Liftstack doesn't bypass or override your segmentation; it tests content within the audience you've already defined.
Can I see results broken down by segment?

The standard campaign report shows results for the full audience. For deeper breakdowns, see Segment Analysis below.

You can also achieve segment-level insights in two additional ways:

  1. Run separate campaigns per segment. Send the same snippets to your VIP segment and your non-VIP segment as separate campaigns. Each gets its own independent analysis, and you can compare winners across the two.
  2. Stratified Thompson Sampling (Scale plan). When using the stratified assignment strategy, Liftstack maintains separate performance estimates per segment. While the report still shows aggregate results, the allocation engine uses per-segment data, which means variants that work better for specific segments get more traffic within those segments.
What is Segment Analysis?

Available on campaign reports for Growth plan and above, Segment Analysis breaks down variant performance by audience profile properties such as city, country, or region. Liftstack automatically detects properties that have between 2 and 10 distinct values and at least 50 profiles per group, then shows per-segment conversion rates for each variant.

Segment analysis is observational, not causal. Differences between segments may reflect audience composition rather than variant effectiveness. Use it to generate hypotheses for future targeted tests, not to draw firm conclusions.

Is there a way to have a global holdout group?

Yes. When creating a campaign, you can set a holdout percentage (up to 20% of the audience). Holdout recipients are randomly selected and do not receive any HTML snippet content for that campaign. The template conditional for those slots falls through and renders nothing, so the email arrives without the tested HTML blocks.

This is an advanced feature for HTML content snippets only. Subject lines and copy slots are unaffected by holdout (you cannot send a blank subject line or empty button text). The holdout group answers: "Does having this HTML content in the email at all improve outcomes vs not having it?"

Key details:

  • The holdout percentage is set during campaign creation and cannot be changed after assignments are made.
  • Your campaign must have at least one HTML content type slot for holdout to take effect. If all slots are subject lines or copy, the holdout setting is silently ignored.
  • No control variant is required to use holdout.

How it appears in the report: After the campaign completes, the report includes a holdout comparison card for each HTML slot. This shows the holdout group's conversion rate (no snippet content) alongside the optimised group's rate, with the percentage improvement. This tells you the total value of having that HTML content in the email.

Can I run a test targeting only mobile users or only desktop users?
Not directly within Liftstack. However, you can achieve this by creating a segment in your ESP that filters by device type (most ESPs support this), and then running your Liftstack campaign against that segment. The test results will then reflect only that device audience.
Can I see if Variant A won for one demographic but Variant B won for another?

Not as a built-in report split. Liftstack analyses each campaign as a single audience. If you suspect a variant performs differently across demographics, the recommended approach is to run separate campaigns against demographic-specific segments. This gives you statistically rigorous per-segment results, rather than post-hoc slicing which is prone to false positives.

The Content Insights feature (Growth and Scale plans) does detect patterns across campaigns, which can surface observations like "urgency messaging tends to outperform for your promotional segments." These are observational hints, not segment-level A/B test results, but they can guide your testing strategy.

Dashboard & Insights

What do the dashboard stat cards show?

The four cards at the top of the dashboard give you a monthly snapshot:

| Card | What It Shows |
|---|---|
| Campaigns This Month | How many campaigns you've sent with Liftstack |
| Snippets Tested | How many unique content variants were tested |
| Clear Winners | Percentage of tested slots where a clear winner was found |
| Est. Revenue Uplift | Total estimated additional revenue from choosing winning variants |
What are Content Insights?

Content Insights are patterns the system detects across your historical campaigns. For example: "Urgency tone tends to outperform your average by approximately 1.2%." These are surfaced with confidence levels:

  • High confidence. Pattern supported by substantial data (10,000+ exposures across many campaigns).
  • Moderate confidence. Suggestive pattern worth investigating, but based on less data.

Important: Insights are observational, not causal. A pattern like "urgency outperforms" is a correlation. It could be influenced by the specific copy, audience, timing, or other factors that happened to accompany that tone. The insight is a hypothesis to test deliberately, not a guaranteed rule.

Every insight includes hedging language to remind you of this, and a disclaimer at the bottom reads: "These insights are based on historical patterns and may be influenced by factors beyond the content attribute itself. Use them as hypotheses to test, not as rules to follow."

Why don't I see any insights?

Insights require a meaningful history to detect patterns. They won't appear until:

  • You've completed at least 5 campaigns with the same snippet attributes
  • At least 3 variants share the attribute being analysed
  • The pattern passes a statistical threshold (adjusted for the number of attributes being tested simultaneously)

Snippet Performance

What is the Snippet Performance page?
This page aggregates how each variant has performed across all the campaigns it's appeared in. Instead of looking at one campaign at a time, you can see the big picture: which variants consistently win, which are reliable, and which are inconsistent.
What do the performance verdicts mean?
| Verdict | Criteria | What It Means |
|---|---|---|
| Strong performer | Won 60%+ of its campaigns, across 4+ campaigns | Reliably outperforms. Consider making it your default. |
| Consistent | Won 40%+ with low variability | Reliable middle-of-the-road performer |
| Variable | High variability across campaigns | Sensitive to audience or timing. Unpredictable. |
| Needs more data | Fewer than 3 campaigns | Too early to judge. Keep testing. |
What does the sparkline show?
The sparkline chart on each variant's detail page shows its conversion rate across every campaign. A flat line is good (consistent performer). A downward trend suggests novelty effects wore off. An upward trend suggests the audience is warming to it.
What is a temporal trend warning?

If a variant's performance is clearly trending up or down across campaigns (tested in 3 or more campaigns), Liftstack surfaces a warning. This helps you catch two temporal biases:

  • Novelty effects: a new content style gets a temporary engagement boost simply because it is different from what recipients are used to. The boost decays as the novelty wears off, meaning the current estimate may overstate long-term performance.
  • Primacy effects: recipients are habituated to the existing style and initially resist the change. The new variant underperforms at first but improves over time, meaning the current estimate may understate long-term performance.

Liftstack detects these by computing the Spearman rank correlation between campaign send order and conversion rate. When a strong monotonic trend is found, the warning tells you the direction and includes both the most recent conversion rate and the average rate across all campaigns so you can judge the likely long-term performance yourself.
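A minimal sketch of that check using scipy's Spearman correlation. The ±0.7 cutoff for a "strong" trend is an assumption for the example, not Liftstack's published threshold.

```python
from scipy.stats import spearmanr

def temporal_trend(conversion_rates_by_send_order, threshold=0.7):
    """Flag a strong monotonic trend across a variant's campaigns.

    conversion_rates_by_send_order: rates in the order campaigns were sent,
    e.g. [0.034, 0.031, 0.029, 0.027] for a possible novelty effect.
    """
    if len(conversion_rates_by_send_order) < 3:
        return None  # needs 3 or more campaigns
    rho, _ = spearmanr(range(len(conversion_rates_by_send_order)),
                       conversion_rates_by_send_order)
    if rho >= threshold:
        return "upward trend (possible primacy effect)"
    if rho <= -threshold:
        return "downward trend (possible novelty effect)"
    return None

print(temporal_trend([0.034, 0.031, 0.029, 0.027]))  # downward trend (possible novelty effect)
```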

Data Quality & Warnings

What is a Sample Ratio Mismatch (SRM)?

An SRM means the actual traffic split between variants doesn't match what was intended. For example, you set up a 50/50 split but actually got 53/47. This is a serious issue because it suggests something went wrong in the delivery pipeline. If the problem correlates with the variants, all the statistical results become untrustworthy.

Common causes: partial failures when writing assignments to your CRM, recipients unsubscribing between assignment and send, template rendering errors for one variant, or platform-side content filtering.

When SRM is detected, Liftstack blocks the verdict and shows a red warning explaining the mismatch. You should investigate the root cause before trusting any results.
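SRM checks are typically done with a chi-squared goodness-of-fit test against the intended split. A sketch follows; the 0.001 significance cutoff is a common convention, assumed here rather than taken from Liftstack's documentation.

```python
from scipy.stats import chisquare

def check_srm(observed_counts, intended_split, alpha=0.001):
    """Chi-squared test of observed exposures against the intended split.

    A very small p-value means the mismatch is unlikely to be chance,
    so results should not be trusted until the cause is found.
    """
    total = sum(observed_counts)
    expected = [total * share for share in intended_split]
    stat, p_value = chisquare(observed_counts, f_exp=expected)
    return p_value < alpha, p_value

# Intended 50/50, but delivery came out 53/47 across 100,000 recipients
print(check_srm([53_000, 47_000], [0.5, 0.5]))   # (True, tiny p-value) -> SRM flagged
```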

What are data quality checks?

Before running any analysis, Liftstack automatically checks:

  • Assignment completeness. Were all audience members actually assigned a variant?
  • Sample ratio mismatch. Does the actual split match the intended split?
  • Zero-event variants. Does any variant have zero engagement events despite having recipients? (May indicate a tracking issue.)
  • Minimum data threshold. Has each variant accumulated enough data for meaningful analysis?

Issues are flagged directly on the campaign report with severity levels (critical warnings block analysis; minor warnings are informational).

What about bot traffic?

Email engagement metrics are polluted by bots. Liftstack automatically filters these out during event ingestion by detecting:

  • Known bot user agents (Googlebot, link scanners, headless browsers, etc.)
  • Known email security scanners (Barracuda, Proofpoint, Mimecast, etc.)
  • Impossibly fast clicks (within 1 second of delivery)

The campaign report shows what percentage of traffic was classified as bot activity and excluded. Typical campaigns see 5 to 15% bot traffic.
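Conceptually, the filter is a pair of simple checks per event. A sketch, with an illustrative (not exhaustive) marker list:

```python
from datetime import datetime, timedelta

BOT_UA_MARKERS = ("googlebot", "headlesschrome", "barracuda", "proofpoint", "mimecast")

def is_bot_click(user_agent: str, delivered_at: datetime, clicked_at: datetime) -> bool:
    """Flag clicks from known scanners, or clicks within 1 second of delivery."""
    ua = (user_agent or "").lower()
    if any(marker in ua for marker in BOT_UA_MARKERS):
        return True
    return (clicked_at - delivered_at) < timedelta(seconds=1)
```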

What does "interaction detected" mean?

When your campaign tests multiple slots (e.g., subject line AND hero image), Liftstack checks whether the combination matters. An interaction means: Variant A in the subject line slot performs differently when paired with Variant X vs Variant Y in the hero slot.

Interactions are flagged with cautious language: "We detected a possible interaction... This may warrant investigation but could also be coincidental." The per-slot results remain valid. The interaction is additional context, not a change to the verdict.

Multi-Channel Testing

What channels does Liftstack support?
Liftstack supports five messaging channels: email, push notifications, SMS, in-app messages, and in-app surfaces (content cards). Each snippet and campaign is scoped to a single channel.
How do I create a push notification or SMS test?
The workflow is the same as email. When creating a snippet, select the channel (e.g., "Push Notification"). The available placements and content types adjust automatically. For push, you'll see placements like "title", "body", "image", and "deep_link". For SMS, you'll see "body". Then create a campaign, select the same channel, and the slot form will only show snippets matching that channel.
Are there character limits for push and SMS?
Liftstack shows soft character warnings: push titles at 65 characters, push body at 240 characters, and SMS body at 160 characters. These are advisory. Liftstack does not truncate your content, but the receiving platform or device may.
How does attribution work for push and in-app channels?

Attribution varies by channel:

  • Email and SMS use URL-based attribution for clicks: Liftstack embeds a tracking parameter (lf_cid) in links and matches click events back to the specific variant and slot. Other metrics (opens, conversions, revenue) use profile-based matching.
  • Push and in-app channels use profile-based attribution entirely: since Liftstack writes variant assignments to user profiles, any engagement event from that user is matched to their assigned variant via their profile ID.

In all cases, the attribution window is per-recipient (measured from each person's assignment time, not from a single campaign-wide timestamp). The default window is 7 days.

What are campaign groups?
Campaign groups let you organise related campaigns across channels for cross-channel reporting. For example, you might group an email campaign and a push campaign that both test the same product launch messaging. The group detail page shows a per-channel comparison table.
What is shared assignment?
When "shared assignment" is enabled on a campaign group, all campaigns in the group assign the same variant label to each user. If a user is assigned "Variant B" in the email campaign, they also see "Variant B" content in the push campaign. This enables true cross-channel experimentation where you can measure the combined effect of consistent messaging.
Can I mix channels within a single campaign?
No. Each campaign tests one channel. Mixing channels within a campaign would create incomparable metrics (email open rates and push tap rates have different base rates). Use campaign groups for cross-channel coordination instead.

Common Questions About the Statistics

Is Bayesian analysis as rigorous as traditional statistics?

Yes, and arguably more so for this use case. The Bayesian approach used in Liftstack:

  • Produces the same quality of conclusions as frequentist methods (p-values, confidence intervals)
  • Provides answers that are easier to interpret correctly ("93% probability this is the best" vs "p < 0.05")
  • Handles continuous monitoring naturally, so you can check results at any time without inflating error rates
  • Does not require pre-determined sample sizes. It reports the current state of evidence regardless of how much data has arrived.
  • Includes built-in protection against the winner's curse (extreme results are naturally pulled toward realistic values)
Why 50,000 Monte Carlo samples?
Behind the scenes, Liftstack uses a simulation technique called Monte Carlo sampling: it draws 50,000 random scenarios from the statistical model to estimate probabilities. This is more than sufficient for stable, reproducible results. Increasing beyond 50,000 wouldn't meaningfully change any number you see in the report.
What is the "prior" and does it affect my results?

In Bayesian statistics, the prior represents your starting assumption before seeing any data. Liftstack defaults to an uninformative prior, meaning it starts with no assumptions about what the conversion rate should be. This is conservative and lets the data speak for itself.

After you've completed 5+ campaigns, Liftstack can automatically switch to an adaptive prior that encodes your workspace's typical conversion rate range (e.g., "our campaigns usually convert between 1% and 4%"). This helps small tests converge faster without biasing toward any particular variant, because it applies the same prior to all variants equally.

You can also manually set the prior if you have specific domain knowledge, but most users never need to touch this.

Won't the prior bias my results?

No, for two important reasons:

  1. The same prior is applied to every variant in the test. It shifts all estimates equally and doesn't favour one variant over another.
  2. The prior's influence shrinks rapidly as data arrives. After a few hundred exposures per variant, the data overwhelms the prior entirely.

The prior mainly matters in the early stages of a test (under 300 exposures per variant), where it prevents extreme estimates from tiny samples.
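
To see both points in numbers, here is a small Beta-Binomial sketch; the prior parameters, conversion counts, and exposure counts are illustrative assumptions, not Liftstack's actual defaults:

```python
# Posterior mean for a Beta-Binomial model:
#   (alpha + conversions) / (alpha + beta + exposures)
# The prior contributes a fixed number of "pseudo-observations", so its
# influence fades as real exposures accumulate.
uninformative_prior = (1, 1)   # flat: no assumption about the rate
adaptive_prior = (2, 98)       # illustrative: "usually converts in the low single digits"

datasets = {"early (400 exposures)": (6, 400),
            "mature (20,000 exposures)": (300, 20_000)}

for d_label, (conv, expo) in datasets.items():
    for p_label, (a, b) in [("uninformative", uninformative_prior),
                            ("adaptive", adaptive_prior)]:
        mean = (a + conv) / (a + b + expo)
        print(f"{d_label:>26} | {p_label:>13}: posterior mean ≈ {mean:.3%}")
```

At 400 exposures the two priors give noticeably different estimates; at 20,000 exposures they are essentially identical, because the data has overwhelmed the prior.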

What is ROPE and why does it matter?

ROPE (Region of Practical Equivalence) is how Liftstack determines whether a difference is too small to care about. The default ROPE width is 0.5 percentage points, meaning if two variants are within half a percentage point of each other, they're treated as functionally equivalent. For low conversion rate campaigns (below ~2%), the ROPE width is automatically scaled down relative to the observed rate so that it remains a meaningful comparison threshold.

This prevents the system from declaring a "winner" that only beats the control by 0.02 percentage points. Technically better, but practically meaningless.
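
A minimal sketch of how a ROPE check can be computed from posterior draws, using the default 0.5 percentage-point width from above and made-up data:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 50_000
ROPE = 0.005  # 0.5 percentage points, the default width described above

# Hypothetical results: control (2.10%) vs challenger (2.22%) on 10,000 sends each.
control = rng.beta(1 + 210, 1 + 10_000 - 210, size=N)
challenger = rng.beta(1 + 222, 1 + 10_000 - 222, size=N)

diff = challenger - control
print(f"P(|difference| < 0.5pp) ≈ {np.mean(np.abs(diff) < ROPE):.2f}")
print(f"P(challenger > control) ≈ {np.mean(diff > 0):.2f}")
# A high probability of sitting inside the ROPE means the variants are
# practically equivalent, even if the challenger is nominally "better".
```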

How does Liftstack handle multiple comparisons?

When you test many variants across many slots, the chance of finding a false positive increases. Liftstack handles this differently for each metric tier:

  • Primary metric. The Bayesian framework already accounts for all variants simultaneously. Probability of being best is computed jointly, so no additional correction is needed within a slot.
  • Guardrail metrics. Bonferroni correction is applied across guardrails within each channel. For email (3 guardrails: unsubscribe, complaint, bounce), the per-test probability threshold is raised from 90% to ~96.7% so that the combined false alarm rate stays at 10%. For channels with a single guardrail (push/SMS: opt-out, in-app: dismiss), no correction is needed.
  • Diagnostic metrics. No correction. They're explicitly labelled as exploratory context, not decision-drivers.
  • Cross-slot uplift. When summing uplift across multiple slots, Bonferroni-adjusted confidence intervals are computed. Each slot's CI uses a per-slot confidence level so the combined interval maintains 95% coverage.
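
The arithmetic behind those corrections is straightforward. Here is a short sketch using the 10% guardrail false-alarm budget and 95% combined coverage mentioned above; the helper functions are illustrative, not part of Liftstack's API:

```python
def guardrail_threshold(false_alarm_budget: float, n_guardrails: int) -> float:
    """Per-guardrail probability threshold so the combined
    false-alarm rate stays at the budget (Bonferroni)."""
    return 1 - false_alarm_budget / n_guardrails

print(guardrail_threshold(0.10, 3))  # email, 3 guardrails -> ~0.967
print(guardrail_threshold(0.10, 1))  # push/SMS/in-app, 1  -> 0.90

def per_slot_confidence(combined_coverage: float, n_slots: int) -> float:
    """Per-slot CI level so the summed cross-slot interval keeps
    roughly the combined coverage (Bonferroni-adjusted)."""
    return 1 - (1 - combined_coverage) / n_slots

print(per_slot_confidence(0.95, 4))  # 4 slots -> 98.75% per-slot CIs
```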
Is the uplift number real? Can I trust it?

The uplift estimate ("+X additional conversions, +£Y additional revenue") is the system's best estimate based on the data, with several safeguards against overestimation:

  1. It uses the posterior mean (not the raw observed difference), which naturally shrinks extreme estimates toward realistic values
  2. It always includes a credible interval (range) so you can see the best and worst case
  3. It reports the probability this is a real improvement (e.g., "94% chance of real improvement")

That said, all estimates have uncertainty. The true uplift could be at the high end of the range, the low end, or anywhere in between. The headline number is the most likely value, and the range gives you the realistic spread.
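
As a simplified illustration of how a headline uplift figure, its range, and the improvement probability can come out of posterior draws (the audience size, order value, and conversion counts below are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50_000
audience_per_variant = 25_000
revenue_per_conversion = 42.0  # hypothetical average order value

# Hypothetical results: control 300/25,000 vs winner 360/25,000.
control = rng.beta(1 + 300, 1 + 25_000 - 300, size=N)
winner = rng.beta(1 + 360, 1 + 25_000 - 360, size=N)

extra_conversions = (winner - control) * audience_per_variant
extra_revenue = extra_conversions * revenue_per_conversion

lo, hi = np.percentile(extra_revenue, [2.5, 97.5])
print(f"Estimated additional revenue ≈ £{extra_revenue.mean():,.0f}")
print(f"95% credible interval: £{lo:,.0f} to £{hi:,.0f}")
print(f"P(real improvement) ≈ {np.mean(extra_conversions > 0):.0%}")
```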

What is the winner's curse?

When you test many variants and declare the best-performing one the "winner", its observed performance tends to be slightly inflated by luck. The variant that happened to get favourable randomness in this particular test looks better than it truly is.

Liftstack mitigates this automatically through the Bayesian model (which shrinks extreme estimates) and by always reporting credible intervals alongside point estimates. You should interpret the range, not just the headline number.

Commercial, Privacy & Administration

How is Liftstack priced?

Liftstack offers three paid tiers, billed monthly or annually (with ~17% discount for annual billing). Billing is per-organization, with pooled limits across all workspaces:

| Feature | Starter (£249/mo) | Growth (£549/mo) | Scale (£999/mo) |
| --- | --- | --- | --- |
| Workspaces | 1 | 5 | 20 |
| Audience profiles (pooled) | 250,000 | 2,000,000 | 10,000,000 |
| Campaigns per month (pooled) | 15 | 60 | Unlimited |
| Slots per campaign | 2 | 4 | Unlimited |
| Variants per slot | 3 | 5 | 5 |
| Connections per workspace | 1 | 1 | 3 |
| Team members (org-level) | 3 | 15 | Unlimited |
| SSO/SAML | Yes | Yes | Yes |
| Smart Allocation | No | Yes | Yes |
| Revenue modelling | No | Yes | Yes |
| Content Insights | No | Yes | Yes |
| Stratified TS | No | No | Yes |
| Interaction detection | No | No | Yes |
| Adaptive priors | No | No | Yes |

Add-on profile packs are available: +250K (£79/mo), +500K (£149/mo), +1M (£249/mo). Extra workspaces can be purchased on Growth (£79/mo each) and Scale (£59/mo each).

There is also a 14-day free trial with Growth-tier features, 1 workspace, and 2 campaigns, so you can run a real test before committing.

Can I invite my agency or team members to my organization?

Yes. Every plan includes team member seats at the organization level. You invite team members by email, and they get their own login with access to all workspaces in the organization. Liftstack supports three roles:

  • Owner: full access, including billing and organization settings
  • Admin: full access to campaigns, snippets, integrations, and workspace settings
  • Member: can create and manage campaigns and snippets; cannot modify integrations or organization settings

If you need limited access for stakeholders, the Member role is the closest fit. Members can view all reports and dashboards and manage campaigns and snippets, but cannot modify integration credentials or organization settings.

Is Liftstack GDPR compliant?

Liftstack is designed with data minimisation in mind:

  • What Liftstack stores: Platform profile IDs (the identifier your ESP uses), email addresses (for audience sync), and engagement events (clicks, opens, conversions) with their metadata. These are necessary to run the test and attribute results.
  • What Liftstack does NOT store: Payment information (handled entirely by Stripe), email content rendered to recipients (that stays in your ESP), or any personal data beyond what's listed above.
  • Encryption at rest: API credentials, email addresses, audience profile properties, and event payloads are all encrypted at rest using Fernet symmetric encryption with per-workspace derived keys. All data in transit uses TLS.
  • Data location: Liftstack runs on infrastructure hosted in the EU/US (depending on your account region). Contact support for specifics about data residency.
  • Data processing: Liftstack acts as a data processor on your behalf. You remain the data controller for your subscriber data.

If your organisation requires a Data Processing Agreement (DPA), contact support and we will provide one.

Does Liftstack store Personally Identifiable Information (PII)?

Liftstack stores the minimum PII necessary to run tests: platform profile IDs or email addresses from your audience sync. These are used to match assignments to engagement events for attribution. No additional personal data (names, addresses, payment details) is collected or stored.

Email addresses and audience profile properties are encrypted at rest using per-workspace Fernet keys. Engagement event payloads are also encrypted. Platform profile IDs (the opaque identifiers your ESP assigns to each contact) are stored unencrypted because they are required for database lookups and attribution joins.
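
For readers who want to see the general pattern, here is a minimal sketch of deriving a workspace-specific Fernet key from a master secret; the derivation details are assumptions for illustration, not Liftstack's actual scheme:

```python
import base64
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def workspace_fernet(master_secret: bytes, workspace_id: str) -> Fernet:
    """Illustration only: derive a per-workspace Fernet key via HKDF.
    The actual derivation Liftstack uses is not documented here."""
    derived = HKDF(
        algorithm=hashes.SHA256(),
        length=32,                   # Fernet keys are 32 bytes, base64-encoded
        salt=None,
        info=workspace_id.encode(),  # binds the key to one workspace
    ).derive(master_secret)
    return Fernet(base64.urlsafe_b64encode(derived))

f = workspace_fernet(b"master-secret-from-a-vault", "ws_12345")
token = f.encrypt(b"subscriber@example.com")
print(f.decrypt(token))  # b'subscriber@example.com'
```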

Engagement events are stored with their metadata (timestamps, UTM parameters, event type) but do not include the content of the email itself or any personal data beyond the profile identifier.

What happens to my data if I cancel?

When you cancel your subscription:

  • Your workspace and all its data (campaigns, snippets, results, audience snapshots) remain accessible in read-only mode through the end of your current billing period.
  • After the billing period ends, your workspace enters a grace period. You can reactivate your subscription during this time to restore full access.
  • If you want your data deleted, contact support and we will permanently remove your workspace and all associated data.

Historical campaign results (verdicts, uplift numbers, variant performance) are yours. You can export CSV reports from any campaign before your access expires.

Who can see my test results?
Only members of your workspace. Liftstack is multi-tenant with strict workspace isolation. Users in one workspace cannot see campaigns, snippets, integrations, or results belonging to another workspace, even if they're on the same Liftstack account.
Does Liftstack have access to my ESP account?

Liftstack uses the API key you provide to make specific API calls: syncing audiences, writing profile properties, pushing templates, and fetching engagement events. It does not have access to your ESP dashboard, billing, or any data outside the scope of those API calls. The API key permissions determine exactly what Liftstack can and cannot do.

You can revoke access at any time by deleting the API key in your ESP's settings. Liftstack will immediately lose the ability to make any calls.

Troubleshooting

My test has been running for days but still says "Insufficient Data"

This usually means one of:

  • The variants perform very similarly. If the true difference is tiny, you need a very large audience to detect it. Consider whether the content differences are meaningful enough.
  • Small audience. Check whether your audience meets the minimum size guidance for the effect size you're trying to detect.
  • Low conversion rate. Broadcast campaign conversion rates are often 0.05-0.2%, which requires much larger audiences than flow campaigns. Liftstack automatically adjusts its decision thresholds for low rates and will show a data quality warning when this applies. Consider testing a higher-funnel metric like click rate or open rate for faster signal.
  • Not enough conversions yet. Each variant needs at least 3 conversions (configurable) before verdict computation begins. At low conversion rates, this takes more exposures.

The report will show an estimate of how many more exposures are needed. If that number is impractically large, the variants may simply be too similar to distinguish. That is a valid result; consider declaring them equivalent and moving on.

Why does one variant show zero events?

A variant with recipients but zero engagement events may indicate a tracking issue:

  • Check that the template conditional logic is rendering correctly for that variant
  • Verify that the tracking links contain the correct lf_cid parameter
  • Confirm that your webhook or event polling is functioning

Liftstack flags this as a data quality warning on the campaign report.
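
If you want to spot-check a variant's links yourself, here is a quick sketch (the URLs are made up; only the lf_cid parameter name comes from the guidance above):

```python
from urllib.parse import urlparse, parse_qs

# Flag any tracking link that is missing the lf_cid parameter.
links = [
    "https://shop.example.com/sale?utm_source=email&lf_cid=cmp_789",
    "https://shop.example.com/new-arrivals?utm_source=email",
]

for url in links:
    params = parse_qs(urlparse(url).query)
    status = "ok" if "lf_cid" in params else "MISSING lf_cid"
    print(f"{status}: {url}")
```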

Why was my winner blocked by a guardrail?

This means the variant with the best primary-metric performance also crossed a safety threshold. The guardrails that can block a winner depend on the channel:

  • Email: unsubscribe rate, spam complaint rate, or bounce rate exceeded the threshold vs the control
  • Push/SMS: opt-out rate exceeded the threshold vs the control
  • In-app: dismiss rate exceeded the threshold vs the control

The violation message in the report includes the specific metric, the observed rates, and the probability threshold used (which may be Bonferroni-adjusted when multiple guardrails are checked).

Consider:

  • Reviewing the variant's content for overly aggressive messaging
  • Looking at which audience segments are driving the negative metric
  • Whether the increase is acceptable given the conversion gains (you can acknowledge the guardrail and proceed if you've investigated)
  • For bounce rate violations, check whether the variant contains content that might trigger spam filters or whether there are deliverability issues with the variant's formatting
The report shows an SRM warning. What do I do?

An SRM (Sample Ratio Mismatch) means the traffic split doesn't match what was configured. Steps to investigate:

  1. Check for partial failures in the CRM profile write step (look for error logs during the writeback)
  2. Check whether audience members were suppressed or unsubscribed between assignment and send
  3. Verify that the template renders correctly for all variants (a broken conditional could funnel everyone to a default)
  4. Check for platform-side filtering (spam filters catching one variant's content)

Until the root cause is identified, the statistical results for this slot should not be trusted.
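
If you want to reproduce the check yourself, an SRM check is essentially a chi-square goodness-of-fit test of observed recipient counts against the configured split. A minimal sketch with made-up numbers:

```python
from scipy.stats import chisquare

observed = [10_480, 9_520]       # recipients actually assigned per variant
configured_split = [0.5, 0.5]    # the split the campaign was configured with
expected = [sum(observed) * p for p in configured_split]

stat, p_value = chisquare(observed, f_exp=expected)
if p_value < 0.001:
    print(f"Likely SRM: p = {p_value:.2e}")
else:
    print(f"No evidence of SRM: p = {p_value:.3f}")
```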

Can I re-run a test?
Yes. Create a new campaign with the same snippet and variants. Liftstack will use the historical data from previous campaigns to inform the new test (especially with Smart Allocation enabled). Each campaign is a fresh experiment with fresh assignments.
How do I export my data?
Click the "Export CSV" button on any campaign report to download the full metrics table. This includes all variants, all metrics, and the verdict information.
