Interpreting the Results

Interpreting your experiment results correctly helps you make informed decisions. This guide explains what your data is telling you.

Reading Confidence Intervals

A confidence interval shows how precisely the uplift has been measured, typically at a 90% confidence level. A narrow range indicates a more reliable result, whereas a wide range indicates more uncertainty.

For instance:

“3.19% ± 18.83pp” means the real uplift could be anywhere between -15.64% and +22.02%. Because this range includes zero, the result does not yet show whether the variant helps or hurts.
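The arithmetic behind that example can be sketched in a few lines of Python. This is only an illustration: the helper names `interval_bounds` and `includes_zero` are hypothetical and are not part of the product.

```python
def interval_bounds(uplift_pct: float, margin_pp: float) -> tuple[float, float]:
    """Return (lower, upper) for an uplift reported as 'uplift ± margin'."""
    return uplift_pct - margin_pp, uplift_pct + margin_pp


def includes_zero(lower: float, upper: float) -> bool:
    """True when the interval contains zero, so the direction of the effect is uncertain."""
    return lower <= 0.0 <= upper


# Values taken from the example above: 3.19% ± 18.83pp
lower, upper = interval_bounds(3.19, 18.83)
print(f"{lower:.2f}% to {upper:+.2f}%")  # -15.64% to +22.02%
print(includes_zero(lower, upper))       # True
```

An interval for which `includes_zero` returns `True` is one to treat with caution, as described in the sections that follow.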

Cumulative Uplift Chart

When to Trust the Results

Use the following checklist to determine if your results are reliable:

  • ✅ Narrow confidence intervals (indicating precise measurements)
  • ✅ Experiment is completed (no longer running)
  • 🔁 If either of the above is missing, be cautious and consider waiting for more data.

Digging into Variant Performance

Consider these points when evaluating how your variants are performing:

  • Assess uplift significance carefully: a small but statistically significant uplift in a critical metric (e.g., Conversion Rate, RPV) might outweigh large fluctuations in secondary metrics.
  • Consistency and trend over time are more important than daily fluctuations.

Variant Comparison Table

Caveats and Gotchas

Be mindful of these common misinterpretations:

  • Low Data Volume: Small datasets can lead to volatile, unreliable results.
  • External Factors: Promotions, technical issues, or external events might temporarily skew data.
  • Niche Segments: Analysing results from very specific segments (e.g., a single device or region) may not generalise well.

Frequently Asked Questions (FAQs)

What metrics are available?

The metrics you need depend on your goals. The UI supports a wide range of metrics; some noteworthy ones are:

  • Conversion Rate: Completed sessions ÷ started sessions (e.g. for a purchase funnel)
  • Revenue Per Visitor (RPV): Total revenue ÷ total sessions
  • Average Order Value (AOV): Revenue ÷ sessions with orders
  • Click Through Rate (CTR): Sessions with at least one click ÷ sessions eligible to click
  • Time on Page: Average time between viewing a page and leaving it
  • Add to Basket Rate: Sessions with at least one item added to the basket ÷ total sessions
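As a rough illustration of the ratio definitions above, the sketch below computes four of them from a handful of made-up session records. The `Session` fields and the `summarise` helper are hypothetical, and treating "revenue greater than zero" as "session with an order" is an assumption made for this example, not a description of how the product derives its metrics.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical per-session record, for illustration only
    started_funnel: bool
    completed_funnel: bool
    revenue: float
    added_to_basket: bool


def summarise(sessions: list[Session]) -> dict:
    """Compute the ratio metrics listed above; None when a denominator is zero."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    total = len(sessions)
    started = sum(s.started_funnel for s in sessions)
    completed = sum(s.completed_funnel for s in sessions)
    revenue = sum(s.revenue for s in sessions)
    # Assumption: a session "has an order" when its revenue is positive
    with_orders = sum(1 for s in sessions if s.revenue > 0)
    basket = sum(s.added_to_basket for s in sessions)

    return {
        "conversion_rate": ratio(completed, started),  # completed ÷ started sessions
        "rpv": ratio(revenue, total),                  # total revenue ÷ total sessions
        "aov": ratio(revenue, with_orders),            # revenue ÷ sessions with orders
        "add_to_basket_rate": ratio(basket, total),    # basket sessions ÷ total sessions
    }


sessions = [
    Session(True, True, 40.0, True),
    Session(True, False, 0.0, True),
    Session(True, True, 60.0, True),
    Session(False, False, 0.0, False),
]
metrics = summarise(sessions)
print(metrics)
```

Note how RPV and AOV share a numerator but differ in denominator, so a variant can raise one without raising the other.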

Is seasonality a factor?

Yes. We try to avoid running experiments during busy retail periods like Black Friday or Christmas unless the experiment is time-sensitive or seasonal by nature.

Why is my uplift changing dramatically each day?

Small data volumes or day-to-day fluctuations in user behaviour can cause volatility. Wait for the data to stabilise over time.

What if the confidence interval includes zero?

An interval that includes zero means it is uncertain whether the variant has any real effect, positive or negative. Consider waiting for additional data before acting on these results.
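Both of the last two answers come down to sample size. The sketch below uses a textbook normal approximation for the difference between two conversion rates to show how the ± margin narrows as sessions accumulate. It is an assumption-laden illustration: this guide does not state how the product actually computes its intervals, and the function `margin_pp`, the example rates, and the sample sizes are all hypothetical.

```python
import math

# Approximate two-sided 90% critical value of the standard normal distribution
Z_90 = 1.645


def margin_pp(rate_control: float, rate_variant: float,
              n_per_arm: int, z: float = Z_90) -> float:
    """Approximate ± margin, in percentage points, for the absolute
    difference between two conversion rates with equal-sized arms."""
    standard_error = math.sqrt(
        rate_control * (1 - rate_control) / n_per_arm
        + rate_variant * (1 - rate_variant) / n_per_arm
    )
    return 100 * z * standard_error


# Hypothetical rates: 5.0% for the control and 5.2% for the variant
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} sessions per arm: ± {margin_pp(0.050, 0.052, n):.2f}pp")
```

Under this approximation the margin shrinks with the square root of the sample size, so 100 times more sessions give an interval roughly 10 times narrower. That is why early results swing widely and why waiting for more data is often the right call.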

Troubleshooting Results

If you’re seeing unexpected results, double-check:

  • Experiment setup (variants correctly implemented)
  • Metrics selected (are they relevant?)
  • Any recent changes or external events impacting traffic or conversions