What happens when an experiment fails? How do you pivot?
Ideally, you design experiments so that they answer a question regardless of the result. For example, "do users not adopt this feature because of discoverability or because of usability?" is a great question to answer with an experiment. If the experiment puts that feature front and center on the main page when a user logs in, and the results are flat, you can be fairly confident it's a usability issue. If it's stat sig positive, you can be confident it was a discoverability issue.
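To make "flat" vs. "stat sig positive" concrete, here is a minimal sketch of that significance check in Python using statsmodels; the adoption counts and the 0.05 threshold are illustrative assumptions, not numbers from a real experiment.

```python
# Minimal sketch: was the "feature front and center" variant flat or stat sig positive?
# All counts below are made up for illustration.
from statsmodels.stats.proportion import proportions_ztest

adopters = [1_180, 1_050]   # users who adopted the feature: [variant, control]
exposed = [24_000, 24_000]  # users who saw each experience

# One-sided test: did the variant drive more adoption than the control?
z_stat, p_value = proportions_ztest(adopters, exposed, alternative="larger")

if p_value < 0.05:
    print("Stat sig positive -> discoverability was the blocker")
else:
    print("Flat -> discoverability wasn't the issue; look at usability")
```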
What matters is what you do after you've identified this finding. The new information can inform your team's entire strategy, or you can pass it on to another team to act on.
Experiments are expected to fail. It is well known in the tech industry that around 85% of experiments fail.
If we knew in advance what would work, we would go straight for it and not run experiments at all.
That's why it's critical to treat this as an experiment rather than a full solution, and to be able to move fast to the next one.
Success usually comes after experimenting and failing many times.
If the experiment results were flat, or the new variation performed worse, I would suggest three steps for deciding how to pivot:
- Confirm it failed.
- Dig into the data
- Get user / qualitative feedback
Confirm it failed.
- Did you run the test long enough to reach statistical significance? (How long does an AB test calculator say you should run it? A rough version of that calculation is sketched after this list.) The most common issue with AB testing is calling the test before there is enough data.
- Does the experience or part of the site you were testing get enough traffic to be a good fit for AB testing?
- Review the user experience you just tested yourself. Are there any issues you can spot when you look at it with fresh eyes?
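As a rough stand-in for an AB test calculator, here is a minimal sketch of the sample-size math using statsmodels; the baseline rate, minimum detectable effect, and daily traffic are assumptions you would replace with your own numbers.

```python
# Minimal sketch of the "did you run it long enough?" check: required sample size
# per variant for a given baseline conversion rate and minimum detectable effect.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.040   # current conversion rate (assumed)
mde = 0.044        # smallest lift worth detecting: +10% relative (assumed)

effect = proportion_effectsize(mde, baseline)
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8, alternative="two-sided"
)

daily_visitors_per_variant = 2_000   # assumed traffic split
print(f"Need ~{n_per_variant:,.0f} users per variant "
      f"(~{n_per_variant / daily_visitors_per_variant:.0f} days at current traffic)")
```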
Dig into the data
- If the primary KPI you were testing is unclear or low-significance, is there a clue elsewhere in the data about why the new variation did poorly? Worse results on mobile, better results for returning users, or more form errors are all examples of findings that might explain why the results were disappointing (a segment breakdown like this is sketched after this list).
- What additional data sources can you look at to understand how users responded? For example, Amplitude, Hotjar, FullStory, Qualtrics, etc. Watching a few sessions of users interacting with the variation can be eye-opening.
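A segment breakdown like the one described above might look something like this in pandas; the file name and column names (variant, device_type, converted) are hypothetical and depend on how your events are logged.

```python
# Minimal sketch of the segment dig: conversion rate by variant and device.
import pandas as pd

events = pd.read_csv("experiment_events.csv")   # hypothetical export: one row per user

segments = (
    events
    .groupby(["variant", "device_type"])["converted"]   # "converted" assumed 0/1
    .agg(users="count", conversion_rate="mean")
    .reset_index()
)
print(segments)   # e.g. a big gap on mobile only points to a mobile-specific problem
```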
Get user / qualitative feedback
- Ask users. You can survey test participants and see how responses differ between variants, or, if you have 'always-on' surveys on the site, split responses based on which variation each respondent saw (a small sketch of this follows the list).
- If the experiment is critical to deciding an important investment, you can also run UserTesting.com-style sessions to better understand how users responded.
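If your survey tool can export responses tagged with the variant each respondent saw, splitting them is a few lines of pandas; the column names here (variant, satisfaction_score, open_feedback) are hypothetical.

```python
# Minimal sketch: compare survey responses by the variant each respondent saw.
import pandas as pd

responses = pd.read_csv("survey_responses.csv")   # hypothetical export

summary = (
    responses
    .groupby("variant")["satisfaction_score"]     # e.g. a 1-5 rating question
    .agg(respondents="count", avg_score="mean")
)
print(summary)

# Free-text answers are often the most useful part; skim them per variant.
for variant, group in responses.groupby("variant"):
    print(f"\n--- {variant} ---")
    print(group["open_feedback"].dropna().head(10).to_string(index=False))
```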