October 27, 2021

Ishan & Shubhankar: A Quest for Truth with A/B Testing

Ishan Goel (Data Scientist), Shubhankar Gupta (Product & Growth)

An insightful conversation with A/B testing champions Ishan Goel and Shubhankar Gupta. We discuss all things A/B testing — from various techniques to best practices and strategies. A valuable discussion for anyone looking to optimize website performance!

Ishan Goel is the lead data scientist at Wingify. He handles all processes related to their statistical engine and is involved in the development of new A/B testing products. Shubhankar Gupta is a product management and growth specialist at VWO, with significant experience on the product marketing side as well. The following post consists of excerpts from a fascinating conversation with the pair, both of whom are deeply involved with A/B testing and experimentation.

Moderator: Okay, before diving in, let’s get the basics out of the way: what is A/B testing?

Shubhankar: Simply put, when you create variations of your website, in terms of the elements of the webpages, and you want to check their effect on key metrics that you care about, you run an A/B test. It’s a useful tactic when you make new changes to your existing environment and want to see whether they lead to an increase or a decrease in performance. The same applies when you have a few different versions of a website, product, UI, etc., and want to test which one performs best.


Moderator: Why should businesses A/B test?


Shubhankar: All businesses need to innovate in order to cater best to their customers and remain relevant. Every one of them is trying to figure out how to grow their business, how to get more customers, how to increase revenue, etc — if they are unable to do so, it becomes hard to survive. In this age of information, decision-making needs to be data-driven. You need to test out all your hypotheses to ensure that the execution of your ideas will lead to the results you seek.

"A/B testing allows you to experiment, collect information, and make decisions that are data-backed."


Ishan: A/B testing, as it is prominently practiced, is deep-rooted in statistics. The reason for this is that data inherently has randomness. If customer behavior did not have randomness, we would not have required statistics or A/B testing.


An example that we like to give is this: say there are two points A and B; and there are two routes to get from point A to point B. If someone asks you which route is longer, you can input the relevant details into Google Maps and find out. Distances are not random. However, if someone asks you which route is faster, then there is an element of randomness that comes from, say, varying traffic on the two routes. So — answering which route is faster has many variables that are not as easy to figure out as the distance between A and B.

That is the core intuition on why A/B testing requires statistics: you need to gather consumer data, you’ll need several observations, and you’ll need a way to summarize those observations to determine if there is any meaningful difference or insight.

There are two kinds of A/B tests: Frequentist and Bayesian.

A Frequentist engine gives you results in terms of a p-value, which tells you whether the difference between the options is statistically significant. In other words, you get a quantitative winner. But what does that p-value actually signify? It is not the probability that one variation is better than the other, and questions about the practical meaning of the result are left unanswered.

A Bayesian engine helps solve for this. It actually gives you the probability that, say, page B is better than page A and vice-versa when required. 

Another major difference is that in a Frequentist engine, results are valid only at a predetermined number of visitors (checking results early inflates the chance of a false positive), while for a Bayesian engine, the results are valid at any sample size.
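To make the contrast concrete, here is a minimal Python sketch (not VWO’s actual engine; the conversion numbers are hypothetical) that computes both outputs on the same data: a Frequentist two-proportion z-test p-value, and the Bayesian probability that B beats A, estimated by sampling from Beta posteriors:

```python
import numpy as np
from scipy import stats

# Hypothetical conversion data for the two variations
conversions_a, visitors_a = 120, 3000
conversions_b, visitors_b = 150, 3000

# Frequentist view: two-proportion z-test, two-sided p-value
p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
z = (conversions_b / visitors_b - conversions_a / visitors_a) / se
p_value = 2 * (1 - stats.norm.cdf(abs(z)))

# Bayesian view: sample from Beta(1, 1) posteriors and count how often B > A
rng = np.random.default_rng(0)
post_a = rng.beta(1 + conversions_a, 1 + visitors_a - conversions_a, 100_000)
post_b = rng.beta(1 + conversions_b, 1 + visitors_b - conversions_b, 100_000)
prob_b_beats_a = (post_b > post_a).mean()

print(f"p-value: {p_value:.3f}, P(B > A): {prob_b_beats_a:.3f}")
```

The p-value only tells you whether the difference is significant at your chosen threshold, whereas the Bayesian output answers the question most people actually ask: how likely is it that B is the better page?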

Here are a few more tips on A/B testing that growth marketers should know:

  1. Always A/A test your statistical engine: A good engine should show no difference between the two variations, since they are exactly the same. This is a great sanity check on the validity of your results.
  2. Small changes take more visitors to detect: The impact of a smaller change will be picked up only when the sample size is larger, because accuracy is low at small samples. At 3,000 visitors, a small change (like redesigning a button) is hard to pick up, and the engine can easily give inaccurate results because no significant pattern has been revealed. At 30,000 visitors, however, you might start to see patterns form.
  3. Test bigger changes when you have less traffic: Bigger changes can be picked up even at low levels of traffic, as their impact is easier to detect. If you’re making a big change to the website, like adding a new page or changing the theme of the existing UI, a pattern can be seen even at a lower number of visitors.
  4. Test smaller changes when you have higher traffic: Conversely, when testing the impact of smaller changes, it is better to do so at higher levels of traffic for greater accuracy of results.
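The A/A tip can be simulated directly. The sketch below (a toy check, not any particular engine) splits identical traffic into two groups with the same true conversion rate and counts how often a significance test fires anyway; a sound setup should flag a difference only about 5% of the time at a 0.05 threshold:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_rate, visitors, runs = 0.05, 10_000, 500  # identical rate for both "variations"

false_positives = 0
for _ in range(runs):
    a = rng.binomial(visitors, true_rate)
    b = rng.binomial(visitors, true_rate)
    # Chi-squared test on the 2x2 table of conversions vs. non-conversions
    table = [[a, visitors - a], [b, visitors - b]]
    _, p, _, _ = stats.chi2_contingency(table)
    false_positives += p < 0.05

print(f"false positive rate: {false_positives / runs:.3f}")
```

If an engine declares winners in A/A tests much more often than its stated significance level, its results on real A/B tests cannot be trusted either.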

Traditionally, there are four kinds of A/B tests:

  1. MVTs (multivariate tests): These test multiple changes at once. They will gauge all possible combinations of the elements being tested and tell you which combination works best.
  2. Personalisation tests: These help in finding the best variation across different customer segments. If, for example, your website has an English version and a Spanish version, these tests can identify the users as per preferred languages and give you conversion results accordingly.
  3. Server-side tests: Changes to algorithms and the backend can be tested using server-side tests. For example, if a change in backend code improves page load time, server-side tests can be used to gauge the impact.
  4. MABs (Multi-armed Bandits): These tests are used when you’re trying to optimise conversions during one-time flash sales. A MAB is a variation of the A/B test, except that while an A/B test diverts equal traffic to all variations for the duration of the test, a MAB diverts more traffic to the variations that are performing better. MABs are mostly run during short campaigns, where the changes are not long-term or permanent and the goal is to get maximum conversions out of that campaign.
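As an illustration of how a MAB shifts traffic, here is a toy Thompson-sampling bandit (one common MAB strategy; real engines differ, and the conversion rates below are made up). Each visitor is shown the variation whose sampled conversion rate is highest, so traffic drifts toward the better performer:

```python
import numpy as np

rng = np.random.default_rng(7)
true_rates = [0.04, 0.06]   # hypothetical conversion rates for A and B
successes = [0, 0]
failures = [0, 0]
shown = [0, 0]

for _ in range(20_000):
    # Sample a plausible conversion rate for each arm from its Beta posterior
    samples = [rng.beta(1 + successes[i], 1 + failures[i]) for i in range(2)]
    arm = int(np.argmax(samples))
    shown[arm] += 1
    # Simulate whether this visitor converts on the chosen variation
    if rng.random() < true_rates[arm]:
        successes[arm] += 1
    else:
        failures[arm] += 1

print(f"traffic split A/B: {shown[0]} / {shown[1]}")
```

Unlike a fixed 50/50 split, most visitors end up on variation B, which is exactly the behaviour you want during a short flash-sale campaign where every lost conversion matters.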


Moderator: What is the minimum amount of traffic that one needs on their landing page or website in order to start CRO or A/B testing?


Shubhankar: From an ROI perspective, we’ve seen that for very small traffic numbers, it might not be worth devoting a lot of your time to experimentation. It’s probably better to invest that time and energy into activities that help drive more traffic to your website. That will give you a better jump on your ROI.

But if you have bigger traffic numbers, say you’ve reached 10-20k users at a minimum every month, then it becomes easier for you to run more than one A/B test per week. That is when you can look for significant improvements from the testing efforts that you put in. 


Ishan: From a data perspective, you can theoretically run an A/B test even with a small sample, that is, anything above 30-40 visitors. The restriction is that your changes then need to be really big, like an entire upheaval of your website UI. Minor changes will simply not be picked up by these A/B tests when you have just a small number of visitors, because the statistical power at such sample sizes is very low. So, if the changes that you’re making are incremental, don’t even think about testing before you have at least 10k visitors.


Shubhankar: That being said, you can also use A/B testing as a growth hack. We know that it is great for optimisation problems, but if you’re on the product team, you can also use it to solve prioritisation problems. To illustrate: say you have multiple features in the pipeline and you want to know which ones your team should really focus on. In these cases, small A/B tests can help you gauge user interest in the features you’re building, and that can help you prioritise your feature set. You don’t need big numbers here; even with a thousand visitors, you can understand what kind of features your visitors want or like the most.


Moderator: What are some of the pointers that you can give a beginner to A/B testing?


Ishan: First, it is important that you don’t lose sight of the end goal. This depends on the context of your business and the problems that you’re trying to solve. Say, you want to increase your revenue. Your tests should then be aimed at looking for areas where you lose most of your customers. If it is on the pricing page, then you can come up with solutions like nudging your customers with discounts. 

Second, you do not need to set up any data engineering pipeline to be able to make data-driven decisions — which is the main purpose here. All you need is enough traffic. Once that’s established, you can start tracking your data and sign up for any A/B testing product.


Moderator: How can you A/B test a pop-up?


Shubhankar: A/B testing for a pop-up can be done in many, many ways.

It could be something as simple as when you initiate the pop-up: do you initiate it on exit intent? After the user takes some action? When the user does not take some action? Or when the user is exhibiting behaviour that suggests they are struggling to navigate through your application?

There can be various triggers, which can be pre-programmed, action-based, or even behavioural. Other things to test would be the size of the pop-up, the images inside it, and whether there is a single direct CTA with the pop-up or two CTAs.


Moderator: Suppose someone has large traffic, say more than a million per month, but due to a limited budget they cannot test all visitors. How do you suggest that they go about prioritising their A/B tests and how many visitors they test?


Shubhankar: First, if you’re limited in terms of resources and the number of visitors that you can A/B test on your website, then look to make bigger changes. As mentioned earlier, you can use a smaller sample size for those, so you do not need to test all million visitors to see whether the change is working out.

Second, you don’t need to test all million visitors. Even for small changes, you just need to test the number of visitors that you would need in order to reach statistical significance. 

However, we do suggest that you run the experiment for a minimum of one week to account for day-to-day fluctuations. For example, most B2B websites get a lot of traffic during the week, but over the weekend the traffic dies down entirely. Understanding what numbers to expect over a full week and setting your sampling rate accordingly should solve for this.


Ishan: Adding to that, A/B testing engines generally give you a sample size calculator; VWO has one that you can check out on our website. You can use these to find the approximate number of visitors that you need.
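For intuition, a back-of-the-envelope version of such a calculator can be sketched with the standard two-proportion power formula (commercial calculators may differ in their exact assumptions; the baseline and lifts below are hypothetical):

```python
from scipy import stats

def sample_size_per_variation(base_rate, lift, alpha=0.05, power=0.8):
    """Visitors needed per variation to detect a relative lift,
    using the two-proportion z-test (normal approximation)."""
    p1 = base_rate
    p2 = base_rate * (1 + lift)
    z_alpha = stats.norm.ppf(1 - alpha / 2)   # two-sided significance level
    z_beta = stats.norm.ppf(power)            # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2) + 1

# A 5% baseline conversion rate, at two different effect sizes
print(sample_size_per_variation(0.05, 0.20))  # 20% relative lift
print(sample_size_per_variation(0.05, 0.05))  # 5% relative lift
```

Note how the required sample grows roughly with the inverse square of the effect size: detecting a 5% relative lift on a 5% baseline takes over ten times as many visitors per variation as detecting a 20% lift, which matches the earlier advice about testing small changes only when you have high traffic.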

Another way to prioritise your A/B tests with a high number of visitors is to test further down the funnel. For example, on an e-commerce website, there might be a lot of visitors on the homepage or category pages, but only a certain percentage will add products to their carts and move ahead with the transaction. So instead of testing the top of the funnel, where you get the most traffic, you test further down the funnel, where the traffic is lower but you can still see the impact, given the sample size of the particular set of visitors that you want to test.


Closing remarks

A/B tests were once confined to the sciences, where they were mostly referred to as hypothesis tests. People in the business community weren’t consciously A/B testing, even if they were trying out different techniques to improve their sales. However, as the internet age came about, businesses realised that collecting data from their customers at scale had become far more straightforward. Now, you can sign up for a testing engine, install a tiny piece of smart code on your website, and start tracking your data immediately. Essentially, since collecting data has become so much easier, people and organisations are encouraged to experiment as they grow. A/B testing also became very important for people whose roles require them to focus on innovation and on improving existing processes; being able to back their ideas with data made those ideas more credible.


Moderator: Thank you so much for joining us today! It’s safe to say that all of us found this very informative.


Shubhankar and Ishan: It is always a pleasure! It was lovely engaging with the audience.

