3 Major Problems in Google Play Experiment A/B Testing Solved with StoreMaven
This is a guest post from Incipia’s CEO & Co-Founder, Gabe Kwakyi. Incipia is a growth consultancy that leverages in-house expertise across marketing, development, and data analysis in order to build and scale marketing campaigns.
Google Play’s Store Experiments are easy to use, free, and capable of calculating statistically significant results, making them the most popular method for A/B testing store assets. For most apps, especially those without the resources to explore other methods, Google Play Experiments are also the best option. However, for app marketers with the resources to demand a rigorous A/B testing methodology, there are three major issues with using Google Play Experiments.
Issue #1 – Inability to report on/control user segments
The first issue with Google Play Experiments is that there is no visibility or control for where the testing traffic comes from.
While a 3rd party test cannot observe true organic store traffic, neither can Google Play Experiments isolate it: the experiment draws on all traffic arriving at an app’s store listing page, of which organic is only one subgroup.
One real-world example of this issue is the natural difference in conversion intent between these major groups of traffic for most apps in the Google Play Store:
- Branded searches
- Top chart visitors
- Related apps
In a Google Play Experiment, these inherently different conversion rates and intents are necessarily masked by the strongest contributing segment when testing across all of these segments at once (or others, such as ads or features).
Consider the following explanation as to why there is a difference in conversion intent of each segment:
- Branded searchers have the most specific intent to download your particular app, given that there is a 1:1 match between what they are looking for and what your app offers, and so they are highly unlikely to be influenced by (most) variations in store assets.
- Top chart visitors may be most swayed by superlatives or other broadly-appealing messaging/design, such as messaging that references “the most popular” or “the best” or “over 100 million users.”
- Visitors from related apps (competitors), by contrast, are likely to be the most discerning and require a key differentiator to download your app as opposed to the app these users originated from, such as social proof or a unique feature not found in other apps.
Given that you cannot report on the conversion rate by source, it is impossible to know the relative conversion rates of each segment. Because only the blended conversion rate is reported, it’s entirely possible that the largest segment (in this case, let’s say branded searches) has neutral inertia so strong that a positive or negative result from another segment is drowned out. For example:
- The branded search segment shows a +/- 1% change in conversion between the variants
- The conversion rate of top chart visitors varies by +/- 3%
- That of related apps varies by +/- 10%
Depending on the weighting of each segment, this could produce an insignificant blended conversion rate change of, say, +/- 2.5%, which may not be enough to make a decision, despite a real difference in performance among related-app traffic.
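To make the dilution effect concrete, here is a minimal sketch of the weighted-average arithmetic behind a blended conversion rate. The segment shares and per-segment lifts below are illustrative assumptions, not data from Google Play or StoreMaven:

```python
# Hypothetical illustration: a strong per-segment conversion lift can be
# diluted in the blended rate when one large, near-neutral segment dominates.
# All numbers are assumptions chosen for the example.

# (segment, share of total traffic, conversion-rate lift of variant B vs. A)
segments = [
    ("branded search", 0.70, 0.01),   # large segment, near-neutral response
    ("top charts",     0.20, 0.03),
    ("related apps",   0.10, 0.10),   # strong real effect, small segment
]

# The blended lift is the traffic-weighted average of the per-segment lifts.
blended_lift = sum(share * lift for _, share, lift in segments)

print(f"blended lift (the only number reported): {blended_lift:+.1%}")
for name, share, lift in segments:
    print(f"  {name}: {lift:+.0%} lift on {share:.0%} of traffic")
```

Here the +10% effect on related-app traffic is compressed into a blended lift of roughly +2%, which may fall below the threshold needed to call the test.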
Using a 3rd party A/B testing method like StoreMaven enables you to control where traffic is sourced from, allowing you to assess biases or other influences with more certainty.
Issue #2 – Inability to run comparative tests
The second issue with Google Play Experiments is that there is no way to run a comparative test to see how your app’s conversion rate stacks up against peer apps in the store. Using Google Play Store Experiments alone, it is easy to identify what happens to conversion rates between variants within the experiment parameters, but it is difficult to determine the more valuable “why” insights: why one variant performed better or worse than another.
There is also no guarantee of control in the live environment of the Play Store, meaning that test results could be contaminated by any one of many other reasons, such as:
- Other apps running A/B tests and affecting user conversion rates independently of your own test.
- Traffic sources being mixed in ways that are not fully random, such as higher-converting traffic generally being sent to one variation or another, which can be especially frustrating if this happens at the start of a test.
- Related to the above point, there can even be odd variations within a single categorical source, such as search traffic, where Google’s algorithmic “keyword dance” regularly produces chaotic daily rank changes for keywords, which can pass chaotic outcomes on to variant performance.
Using a 3rd party search page test allows you to assess the click-through rates, conversion rates, and other information of not only your app but also the set of peer apps that vie for user attention in the real Play Store environment. This provides an additional wealth of information from which to draw your “why” conclusions. A 3rd party test also controls the testing environment (of which incoming traffic is a big part), ensuring that such live-environment noise is mitigated.
Issue #3 – Lack of innovative testing support
The third issue relates partly to the rigidity of Google Play’s statistical testing methodology. StoreMaven, by contrast, uses a more contemporary, predictive A/B testing method, which generally allows tests to arrive at results with less total traffic required.
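To see why the amount of traffic required matters, here is a back-of-the-envelope sketch (not StoreMaven’s actual methodology) of the classical fixed-horizon sample size needed to detect a modest conversion lift. The baseline and lift figures are illustrative assumptions:

```python
# Rough sample-size estimate for a two-sided z-test comparing two
# conversion rates. Illustrates why fixed-horizon tests on small lifts
# demand a lot of store-listing traffic; predictive/sequential methods
# aim to reach a decision with less.
from statistics import NormalDist

def visitors_per_variant(p_base, p_variant, alpha=0.05, power=0.80):
    """Visitors needed per variant to detect p_base -> p_variant."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # power requirement
    variance = p_base * (1 - p_base) + p_variant * (1 - p_variant)
    return (z_alpha + z_beta) ** 2 * variance / (p_base - p_variant) ** 2

# Assumed example: detecting a 30% -> 33% conversion lift
# at 95% confidence and 80% power.
n = visitors_per_variant(0.30, 0.33)
print(f"~{n:,.0f} visitors per variant")
```

Even this relatively large 3-point lift requires thousands of visitors per variant before a classical test can call a winner, which is a real constraint for apps without heavy store traffic.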
The StoreMaven team also brings years of experience running A/B tests of all types, and time spent identifying which trends and variations are actually worth testing. Whereas Google Play Experiments are fully self-serve, working with a 3rd party tool like StoreMaven can make your tests more focused rather than forcing you to rely on the scattershot method of throwing spaghetti at the wall to see what sticks.