If your mobile game features multiple ads and you want to boost engagement, click-through rates, and retention, this is where the power of A/B testing comes into play. It’s a methodology that provides answers to specific questions, shedding light on how specific ad formats, placements, pricing for in-app purchases, and content varieties perform.
For example, you can test your theory that uninstall rates arise from an overload of ads or dissatisfaction with a particular ad type. Alternatively, you may want to explore how serving a rewarded video at level completion impacts engagement, comparing an extra life versus coins.
It can also be used to determine prices for in-app purchases. Instead of introducing a price point to everyone at once, conducting A/B tests within a select group first mitigates risk and provides you with data to gauge the impact of price adjustments.
Let’s dive into how A/B tests can help enhance your ad monetization strategy, increasing revenue and improving user engagement.
How to A/B test in-app ads like a pro
To begin, decide which question you’d like to answer and form a hypothesis. Let’s say you’re testing a new feature and want to understand how it affects player engagement, and which player motivations underlie your hypothesis.
Create a three-step testing plan:
- Determine your audience: Who will see your new feature? (New players only versus seasoned players)
- Calculations: How big are your test groups? How many test groups do you have? How long will you run the test? What impact do you expect it to have on key metrics?
- Manage expectations: Are all stakeholders on board? Does everyone know the next steps after the test results are in? What changes will you make if your theory bears out?
A/B testing divides a user base into groups. One group receives version A and the other, version B. A timeframe is then set for the testing.
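As a simple illustration, here’s a minimal sketch of how that split might be done in code. It assumes each player has a stable user ID; hashing the ID together with an experiment name keeps each player’s assignment deterministic across sessions. The function and experiment names are hypothetical, not part of any particular SDK.

```python
import hashlib

def assign_group(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically bucket a user into a variant for a given experiment.

    Hashing user_id together with the experiment name keeps the assignment
    stable across sessions and independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Example: a 50/50 split for a hypothetical rewarded-video reward test
print(assign_group("player_12345", "rewarded_video_reward_type"))  # "A" or "B"
```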
Discover the best practices to keep in mind when you’re A/B testing.
Define key metrics
Before you start testing, pinpoint key performance indicators (KPIs) you want to measure.
Kyle Waring, Lead Product Manager, Ad Platform at Zynga, suggests watching three metrics in particular: impressions, yield or cost per mille (CPM), and average revenue per daily active user (ARPDAU).
Here are brief explanations of these and other common KPIs to help you better understand why each matters:
- eCPM: Effective cost per mille, or cost per thousand impressions, measures the revenue generated per thousand ad impressions served. You could compare ad revenue across ad networks, regions, locations, or operating systems (the eCPM and ARPDAU math is sketched in code after this list).
- ARPDAU: Average revenue per daily active user takes the pulse of revenue and growth while helping you understand why sudden spikes or drops in revenue occur. A best practice is to look for a minimum of a 3% change in ARPDAU before implementing any changes based on your A/B test results.
- Engagement rate: Track user behavior and interaction by measuring how much time a player spends with your game.
- Retention rate: Similar to engagement, this is when players return to your game after a specific timeframe. You can test this by days, including day one, day seven, and day 30.
- ROAS: Return on ad spend is an important performance indicator because it tells you how much profit you’re making for every dollar you spend.
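If it helps to see the arithmetic, here’s a minimal sketch of the two revenue metrics above, using their standard definitions: revenue per thousand ad impressions for eCPM, and daily revenue divided by daily active users for ARPDAU. The figures in the example are hypothetical.

```python
def ecpm(ad_revenue: float, impressions: int) -> float:
    """Effective cost per mille: revenue earned per 1,000 ad impressions."""
    return ad_revenue / impressions * 1000 if impressions else 0.0

def arpdau(daily_revenue: float, daily_active_users: int) -> float:
    """Average revenue per daily active user."""
    return daily_revenue / daily_active_users if daily_active_users else 0.0

# Hypothetical day: $420 in ad revenue over 150,000 impressions,
# $900 in total revenue from 30,000 daily active users
print(round(ecpm(420.0, 150_000), 2))   # 2.8  -> $2.80 eCPM
print(round(arpdau(900.0, 30_000), 3))  # 0.03 -> $0.03 ARPDAU
```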
Decide on the variants
Variants are the different versions of an ad or feature that you’ll expose your test groups to.
It’s common practice to A/B test around ad creatives to find out which ones users engage with most. You can test graphics and copy, for example. Experiments might consist of a simple copy change, such as changing a question to a statement or tweaking the CTA (call to action) button from “Download Now” to “Yes, I Want It!”
You may also consider switching the placement of the CTA on the screen. Think about serving one group the CTA at the top and the other group the CTA at the bottom.
Avoid testing nuanced, subtle changes, such as switching the background color of an image from green to blue; they rarely lead to valuable insights. Changing a color isn’t a strong hypothesis. Let’s say your users like a blue background better. Now what? Test a yellow one? How about a teal one? Who doesn’t like teal? What about the hundreds of other colors? This kind of testing doesn’t help you better understand your users, so keep it practical.
Control vs. variation
Test only one variable at a time, whether it’s the CTA copy or a graphical element.
One group is the control – these are the players who won’t see the new change. The other group is the variant group, which you serve the different CTA, design, or copy to.
Use the control group as a benchmark against which to measure the increase or decrease in performance.
When testing different ad formats, change only the format and keep all other variables the same. Remember to use a sample size large enough that any difference in the results is meaningful.
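To put a number on “large enough,” here’s a rough sketch of a standard two-proportion sample-size estimate. It assumes your metric is conversion-like (day-one retention, for example), that you’ve picked a minimum relative lift worth detecting, and conventional significance and power levels; treat the output as a ballpark, not a guarantee.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(baseline: float, relative_lift: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per group to detect a relative lift in a
    conversion-style metric with a two-sided, two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5 +
                 z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Example: 20% baseline day-one retention, detect a 5% relative lift
print(sample_size_per_group(0.20, 0.05))  # roughly 25,600 users per group
```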
Example of an A/B test
Let’s say you want to test the frequency of how many ads are shown within a given period of time. Maybe you test showing an ad every 10 minutes of gameplay (low frequency) versus showing an ad every two minutes of gameplay (high frequency) to see how engagement is impacted.
First, separate Android and iOS users and set your testing parameters, such as how many players you include and where they reside.
Add additional goals you want to see, such as:
- Estimated ad revenue
- Estimated total revenue
- Day-one retention, to see the effects in the immediate term
- In-app purchases
Fill in the variant parameters: an ad every 10 minutes of gameplay for group A and an ad every two minutes for group B.
Then set the threshold a winning variant needs to clear, usually at least a 5% lift on the conversion goal. If the difference is smaller, the result is inconclusive because it’s not clear whether it was just a coincidence. Test it again, but until your results are more definitive, don’t take any action.
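One common way to check whether a difference clears the “not just a coincidence” bar is a two-proportion z-test. The sketch below compares a conversion-style metric, such as day-one retention, between groups A and B; the counts are hypothetical.

```python
from statistics import NormalDist

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical results: group A (low frequency) vs. group B (high frequency)
p = two_proportion_p_value(conv_a=2300, n_a=10_000, conv_b=2150, n_b=10_000)
print(p)          # about 0.011
print(p < 0.05)   # True: the difference is unlikely to be a coincidence
```

Keep in mind that a p-value only speaks to chance; whether the lift is big enough to act on is still a product call.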
Kyle says, “Continue to iterate on your experiments. Learning through experimentation is key. Don’t be afraid of trying new things and finding the breaking point of the game and economy. Focus on building experiments that test for multiple things and don’t be afraid to push the envelope on treatments.”
Analyze your results
This exciting stage is when you see how your hypothesis compares to the actual results. Check your performance metrics to see whether they increased, decreased, or stayed relatively the same. These metrics will identify your best-performing version and provide evidence for or against your hypothesis going forward.
Kyle explains that A/B testing really boils down to “designing high-quality experiments to test a variety of things. For example, instead of testing two amounts for your next rewarded video, test four or five variants. This gives you a deeper understanding of what rewards are enticing for users or not worth giving away.”
He adds, “Knowing that an experiment has ‘failed’ is also helpful — you now know what doesn’t work, which allows you to move forward in your product and feature developments.”
For example, you may look at initial conversion rates versus trial conversion rates for in-app purchases to understand how players’ likelihood to pay has changed between groups A and B.
Run the test long enough to gather the data you need to identify trends and patterns. The exact length is up to you, but it should give you a clear picture of the change your test is driving. For casual games, a one- or two-week testing period usually uncovers relevant results.
A/B testing challenges
A/B tests don’t always yield obvious results. There will be moments when your findings are split.
If this is the case, pick apart the factors that could have impacted the results. For instance, was the content of your video ad causing people to bounce, or was it too long? Consider and check every possible variable if your test results keep coming back inconclusive. Also, don’t be afraid to retest or use a different demographic.
Avoid the “peek” trap
Avoid “peeking” at the results before the test duration is complete. An early glimpse of significant-looking results might lead you to conclude prematurely that you’ve collected enough information to move to the next stage.
The issue with this is that you’re inflating your false positive rate. This is the probability of detecting a significant difference in the data when there is none, putting you at risk of making decisions based on incorrect or false information. The more you peek, the greater the potential for skewed data.
To help avoid peeking, determine in advance a sample size that gives you a reasonable probability of detecting a real change, and set it before you begin the test. Avoid stopping the test or jumping to conclusions when you first start to notice statistical changes.
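If you want to see why peeking inflates false positives, a small simulation makes the point: even when versions A and B are identical, checking for significance after every new batch of users triggers a “significant” result far more often than the nominal 5%. The parameters below are arbitrary, but the pattern holds.

```python
import random
from statistics import NormalDist

def looks_significant(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-proportion z-test: does the observed difference look significant?"""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return False
    z = abs(conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(z)) < alpha

def peeking_false_positive_rate(peeks=10, users_per_peek=500, runs=500, p=0.20):
    """Simulate A/A tests (no real difference) and count how often at least
    one peek declares significance before the test is complete."""
    hits = 0
    for _ in range(runs):
        conv_a = conv_b = n = 0
        for _ in range(peeks):
            n += users_per_peek
            conv_a += sum(random.random() < p for _ in range(users_per_peek))
            conv_b += sum(random.random() < p for _ in range(users_per_peek))
            if looks_significant(conv_a, n, conv_b, n):
                hits += 1
                break
    return hits / runs

print(peeking_false_positive_rate())  # typically well above 0.05
```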
Final thoughts
A/B testing is the tried-and-true method of continuously and incrementally tweaking your game to provide the best user experience for your players. It’s an ongoing process of refinement in which you can more deeply understand how your specific users engage with certain types of ad creatives, in-app promotions, and gameplay.
A/B experimentation helps you identify patterns and trends, then use that data to manage budgets and allocate spend more efficiently based on reasonable predictions for future campaigns.
Compiling results from A/B testing can help your team make smarter decisions to create a data-driven strategy that leads to less churn, higher ARPDAU, and users who stay engaged.
Interested in A/B testing ad performance in your games? Chartboost has the tools you need to perform your next test.