Be a Bayesian

Most ideas aren’t tested at all. When a company starts being run by analytical people, that changes. Now they want to test everything. When this happens, most analytical folk go back to the Statistics 101 taught to them in academia: they create a null hypothesis and figure out how much data they need to disprove it. They run the tests and present the results to upper management. Managers (those who have bought into this analytical world at all) want to know if the results are significant. Some may even ask about the t-test or confidence intervals.
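For concreteness, that Statistics-101 recipe looks roughly like the sketch below: a standard two-proportion sample-size calculation in Python. The baseline and target conversion rates are invented for illustration, not taken from any real test.

    from math import sqrt, ceil

    # Classic frequentist sizing for an A/B test on conversion rate.
    # All inputs here are hypothetical; plug in your own numbers.
    p1 = 0.10         # control conversion rate (assumed)
    p2 = 0.15         # treatment rate you hope to detect (assumed)
    z_alpha = 1.96    # two-sided significance level of 0.05
    z_beta = 0.84     # statistical power of 0.80

    p_bar = (p1 + p2) / 2
    n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p2 - p1) ** 2

    print(ceil(n), "leads per arm")   # about 685 per arm with these numbers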

But while these ‘analytical’ companies pretend to run their businesses that way, deep down that’s not how they actually operate. The analytics is just window dressing for what’s really going on. And what’s worse, when a business actually IS run that way, it leads to bad decisions.

Let me give you an example:

Management has an idea for a new sales process. The process will cost more, but they are pretty sure it will improve conversion. They want to test it. They run an A/B test. They need a +5% conversion-rate (C/R) improvement for the new process to break even with the increased costs. What decision gets made based on each of these results?

  1. New process is 10% better with a 95% confidence interval of 8pp
  2. New process is 5% better with a 95% confidence interval of 8pp
  3. New process is no improvement with a 95% confidence interval of 8pp
  4. New process is 5% WORSE with a 95% confidence interval of 8pp

In theory, here is what should happen based on a null hypothesis of ‘it doesn’t do anything’:

  1. Your test was too small
  2. Your test was too small
  3. Your test was too small
  4. Cancel the project

But we all know that is not what happens. It’s something closer to this:

  1. Awesome! This is great. Expand the program, but keep A/B testing live to make sure we are right
  2. Awesome! This is great. Expand the program, but keep A/B testing live to make sure we are right
  3. Oh. That’s not great. Keep testing until we are sure
  4. Wow. That’s weird. That doesn’t make sense. This new program cannot possibly be worse. Something must have gone wrong with the test. Can we test it again?

If you are a website with ridiculous amounts of traffic and high conversion rates, then you can just power through and run A/B tests on everything until you get to significance (which is a different problem I will talk about later), but for everyone else it doesn’t work. You don’t have enough conversions to effectively A/B test everything. So instead you make decisions based on hypotheses about what you think might work, and strategically test the things you are a little less sure about.

No matter what the hyper-testers tell you, this is not a bad plan.

Even those with practically unlimited traffic still need to decide what to test. It’s just that they may have more testing capacity than ideas, so they start testing random stuff. Most of us do not have that luxury. (And that might be a good thing.)


Another Way

One way to do it is just to deal in confidence intervals. In the case above, #1 has a >50% chance of beating break-even, so let’s keep doing it. #3 and #4 have a less than 50% chance, so let’s spend our effort somewhere else. #2 is a toss-up.
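You can put rough numbers behind that rule with a few lines of Python. The sketch below assumes the 8pp figure is the half-width of the 95% interval and that the estimated lift is roughly normal; the break-even threshold and the four estimates come straight from the made-up example above.

    from math import erf, sqrt

    BREAK_EVEN = 5.0    # pp of lift needed to cover the extra cost
    SD = 8.0 / 1.96     # standard error implied by a +/-8pp 95% interval (assumed)

    def prob_beats_break_even(estimated_lift_pp):
        # Normal CDF (via erf) of how far the estimate sits above break-even.
        z = (estimated_lift_pp - BREAK_EVEN) / SD
        return 0.5 * (1 + erf(z / sqrt(2)))

    for label, lift in [("#1", 10.0), ("#2", 5.0), ("#3", 0.0), ("#4", -5.0)]:
        print(label, round(prob_beats_break_even(lift), 2))
    # Roughly: #1 0.89, #2 0.50, #3 0.11, #4 0.01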

But the issue with that approach is that while you used some numbers (the percentages), you ignored the magnitude. As an extreme example, imagine you put one ‘lead’ through each of the funnels. If one converted and one didn’t, you now know with >50% certainty that one is ‘better’ than the other. But you still don’t know anything you should be making decisions on. Who cares if there is a number behind it? It’s a meaningless number.

So let’s put away the number for a second and think about what we are really doing.

We have an idea.

We think that idea will work. In fact, in the old days, before we knew how to test these things, we would have just decided to roll out the new idea. Now we know the prudent thing to do is to run a test, in case our management judgment is wrong.

But our current hypothesis isn’t that the new sales process is exactly the same as the old one. If we really believed that, we would never waste our time running the test. The giant company that tests random stuff may believe that, but we aren’t testing at random. We are testing something we are pretty confident will perform far better than our break-even point.

The null hypothesis is meaningless for us.

The real question is: “How confident are you?”

It’s hard to put a number on that, right? It starts to feel ‘unscientific’. (This is why many academics hate this method by the way.)

But think about it this way:

Say you are very sure that the new process will double your sales. Then you run a small test and it says it halved your sales. Do you believe it? Nope. You think: There must be something wrong with the test. Let’s look into how we ran the test and see if we did something wrong.

(This happens all the time, by the way. We had a process that we used in four African countries to reduce mobile telecom churn. It worked in three. In one it looked like it was destroying value. We didn’t believe it, so we dug in. It turned out their messaging machine wasn’t working and the offers were never going out, so all we were seeing there was random noise.)

So you were very sure that it would double sales, but the small test said it halved sales. You don’t think the test was right, but how do you feel about your confidence now? If you are at all rational your confidence has been reduced. Even if the test was very small, you are somewhat less sure that it will really double sales. You still think it’s a good program, but maybe not as good as you thought.

So you run a much bigger test. A huge test.

And this test says that the results are still halving sales.

OK. Now maybe you say that your original confidence was wrong. That this thing doesn’t work at all. It may not be as bad as halving sales, but it sure isn’t doubling them any time soon. Back to the drawing board.

My example didn’t really use any math. Just words like ‘very confident’ and ‘big’ and ‘small’. But replace those words with numbers and you have Bayesian statistics.
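To see what ‘replace those words with numbers’ might look like, here is a minimal sketch using a Beta-Binomial update. The prior and the two test results are invented to mirror the story above, not taken from a real program.

    # A sketch of the 'very confident, small test, huge test' story.
    # Suppose the old process converts at 10% and we are quite sure the new one does ~20%.
    # Beta(20, 80) has mean 0.20 and carries about 100 leads' worth of prior belief.
    a, b = 20.0, 80.0

    def update(a, b, conversions, leads):
        # Conjugate Beta-Binomial update: successes go to a, failures to b.
        return a + conversions, b + (leads - conversions)

    def mean(a, b):
        return a / (a + b)

    print(f"Prior belief about the new rate: {mean(a, b):.1%}")   # 20.0%

    # Small test: 20 leads, 1 conversion -- looks like it *halved* sales.
    a, b = update(a, b, conversions=1, leads=20)
    print(f"After the small, ugly test:      {mean(a, b):.1%}")   # ~17.5%: a bit less sure

    # Huge test: 2,000 leads, 100 conversions (5%).
    a, b = update(a, b, conversions=100, leads=2000)
    print(f"After the huge test:             {mean(a, b):.1%}")   # ~5.7%: back to the drawing board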

And the thing about Bayesian statistics: It’s how human brains actually work and make decisions.

In academia you need to do things their way: Null Hypothesis. T-Tests.

But in business you will end up doing it the Bayesian way whether you know what Bayesian is or not. So why not structure your tests around it to start with?
