Personalization, Automation and Authenticity (Part II: Everything but the Tweet)

This is Part II of my look at automating authenticity (and why it’s a bad idea). Part I talked about Twitter specifically (I recommend clicking the link and then coming back to this piece). In Part II I am going to apply those same principles to customer communication channels like email, direct mail and websites. To start, let’s look at Target.


Why the Target story is BS

No discussion of personalization is complete without the Target story. For readers who have not heard it, it goes something like this:

  • A woman receives a flier from Target full of products for pregnant women
  • Her dad sees the ad and is furious because his daughter is still a teenager
  • He complains to Target and the manager apologizes
  • Later the man goes back to Target and apologizes himself. It turns out his daughter WAS pregnant and he didn’t know (in some versions of the story the daughter didn’t know either)
  • The moral of the story is that Target’s personalization was so good they knew who was pregnant before the woman did (or at least before her father did)

I don’t know anyone who works for Target, but I know the story is complete BS. Here’s why:

  1. The Story: A man looks at a flier from a store and then complains to the store about what they are advertising? Does that make any sense? Wouldn’t it be more likely that he would see a flier of pregnancy stuff addressed to his daughter and just ignore it? Who complains to a store about ads for stuff they don’t want? Don’t we all get ads for stuff we don’t want all the time?
  2. The Targeting (sic): Assuming Target even had a program trying to target pregnant women, and even assuming it was superhumanly accurate, it would be wrong more often than it was right.

How could something be extraordinarily accurate and still be wrong more often than right? To understand why, let’s dive into a different kind of pregnancy prediction: screening for chromosome problems.

There is a test that will tell you with 99% accuracy whether the embryo in your belly will have chromosome problems. 99% sounds pretty definitive. If the test tells you your kid is going to have issues, you better be expecting issues, right? Wrong.

More important than a test’s accuracy are the underlying probabilities in the population being tested. In this example, let’s say you are a 25-year-old woman without risk factors. Before you even take the test, doctors know that the likelihood of someone like you having a baby with these specific chromosome problems is about 1 in 5,000. So let’s look at what happens after the test (each cell is a count of people out of the 50,000 being tested):

| Reality | Test says: Bad Chromosome | Test says: Good Chromosome | TOTAL |
|---|---|---|---|
| Bad Chromosome | 9.9 | 0.1 | 10 |
| Good Chromosome | 499 | 49,491 | 49,990 |
| TOTAL | 508.9 | 49,491.1 | 50,000 |


(Obviously the decimals are just estimates – in reality a person either falls into a box or doesn’t – which is why we can simplify and drop the decimals for this example:)

| Reality | Test says: Bad Chromosome | Test says: Good Chromosome | TOTAL |
|---|---|---|---|
| Bad Chromosome | 10 | 0 | 10 |
| Good Chromosome | 499 | 49,491 | 49,990 |
| TOTAL | 509 | 49,491 | 50,000 |


Better?

What does this table tell us?

Look at the column on the far right first. This is just the breakdown of the real population. As we said earlier, about 1 in 5,000 25-year-old women will have a baby with these chromosome problems, so out of 50,000 women we can expect 10 such cases. Easy so far.

Now we run the test. Each test result gets its own column. We don’t know which reality group an individual woman falls into (that’s why we are running the test!), but from our omniscient vantage point we can look at each group separately. That means looking at the rows one at a time.

In the top row are the 10 women who really do have the problem. With 99% accuracy the test should get it right essentially every time (10 out of 10 once we round up, as in the second table). Here 99% accuracy does a really good job: no one who has a bad chromosome gets a result saying everything is okay.

Now let’s look at the second row. Here there are 49,990 women (i.e., almost all of the women). The test is 99% accurate here too, but this time the 1% failure rate leads to a lot of misdiagnoses: 1% of 49,990 is about 500 women (499 in our table). That is a pretty small number compared to 49,990 (1%, in fact), but a small percentage of a really big number can still be significant. Still, if you have a healthy fetus the test will tell you so 99% of the time. Sounds pretty good.

But now let’s flip it. Instead of looking at rows from our omniscient vantage point, let’s look at columns. Columns are what we can actually observe: we don’t know whether a woman has a bad chromosome or not, we just know what the test says. That’s what the columns tell us.

In the right column the test says ‘all clear’ – ‘you are completely healthy.’ And it turns out the test is right. For the 49,491 women it gives an all-clear to, it gets it right basically 100% of the time (after our rounding; if you go back to the earlier table you can see it actually gets it wrong 0.1 times out of 49,491.1 – less than 0.0005% of the time). If the test says you are fine, you can pretty much sleep easy. You are fine.

In the left column we get a different story. 509 women will be told they have a positive test result – that their baby has a bad chromosome. Those women will, at the very least, go through a lot of stress. Some may even choose to abort their baby. But almost all of those women will have perfectly healthy babies. In fact only 10 of the 509 will have babies with chromosome problems. That is nowhere near the vanishingly small odds of the right column, but it is still only about a 2% chance of an issue.

So our test that was ‘99% accurate’ actually only gets it right about 2% of the time when it flags a problem. Those stressed-out women who are told they are 99% likely to have a baby with a chromosome problem (because many doctors explain it that way, and many people interpret ‘99% accurate’ that way) actually have about a 98% chance of being completely fine.

It’s perplexing math.
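If you would rather check that arithmetic in code than in a table, here is a minimal Python sketch of the same calculation (my own illustration – the function name and the rounding are mine, not anything from a screening lab):

```python
def confusion_counts(population, base_rate, accuracy):
    """Split a population into the four cells of the 2x2 table, given how
    common the condition is and how accurate the test is (in both directions)."""
    affected = population * base_rate
    healthy = population - affected
    true_pos = affected * accuracy          # affected and flagged
    false_neg = affected * (1 - accuracy)   # affected but given the all-clear
    false_pos = healthy * (1 - accuracy)    # healthy but flagged anyway
    true_neg = healthy * accuracy           # healthy and given the all-clear
    return true_pos, false_neg, false_pos, true_neg

# Chromosome screening: 1-in-5,000 base rate, 99% accurate test, 50,000 women
tp, fn, fp, tn = confusion_counts(50_000, 1 / 5_000, 0.99)
print(f"Women flagged with a problem: {tp + fp:.0f}")                    # ~510
print(f"Flagged women who really have a problem: {tp / (tp + fp):.0%}")  # ~2%
print(f"All-clears that are correct: {tn / (tn + fn):.4%}")              # ~99.9998%
```

The tiny differences from the tables above (510 versus 509) are just the rounding we already talked about.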

Hopefully I’ve covered it in enough detail that you have internalized it. If not, please ask questions in the comments below. I would be happy to expand further.

(For those who are wondering about the example: my wife and I went through this exact situation while she was pregnant. It was indeed a false positive. And I had to do the math myself to understand the real odds.)


What does this have to do with Target and automated personalization?

Good question.

The first problem with automated personalization is accuracy: developing a model that can actually predict what you are trying to predict using whatever data you have. If you ask people, “Are you pregnant?”, the answer is probably a very good predictor of whether someone is pregnant (or at least believes they are pregnant). Maybe it gets it right 95% of the time (5% of the time people lie). I would go so far as to say it would be the single best predictor of whether someone is pregnant – much better than their buying behavior. (One of the conclusions often drawn from the Target story is that companies can predict your needs better than you can yourself. They knew she was pregnant before she did! I don’t believe that, and I have certainly never seen it in reality – and I have helped dozens of companies develop models like the one Target claimed to have.)

Now let’s say that Target managed to build an amazing model that predicted pregnancy – say 90% accurate, almost as good as just asking people. The issue is the same as what we saw with the chromosome test: only a small number of women are pregnant at any given time. Let’s estimate that number roughly:

  • Only look at women ages 14-64 (about 50 years), and assume the population is spread evenly across that range.
  • Assume some other Target model can predict a customer’s gender and age extremely accurately, so men and women under 14 or over 64 are never flagged as pregnant (which means overall accuracy is actually much better than 90%: 90% for 14-64 year old women and 100% for everyone else).
  • Assume women have an average of 2 children during that period, and that the model can only pick up a pregnancy after about 3 months of observed buying – so roughly 6 detectable months per pregnancy, times two pregnancies, equals 1 full year out of those 50 years, or 2% of the time.
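To make that last bit of arithmetic explicit, here is the same back-of-the-envelope base-rate calculation as a few lines of Python (every number is one of the assumptions above, not measured data):

```python
# All inputs are the assumptions listed above, not real data
fertile_years = 50                     # women aged roughly 14-64
pregnancies_per_woman = 2
detectable_months_per_pregnancy = 6    # the model needs ~3 months of purchases to notice

detectable_years = pregnancies_per_woman * detectable_months_per_pregnancy / 12  # = 1 year
base_rate = detectable_years / fertile_years
print(f"Women aged 14-64 who are 'detectably pregnant' at any moment: {base_rate:.0%}")  # 2%
```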

With those facts let’s build the same table we did for the chromosomes:

| Reality | Model says: Pregnant | Model says: Not Pregnant | TOTAL |
|---|---|---|---|
| Pregnant | 900 | 100 | 1,000 |
| Not Pregnant | 4,900 | 44,100 | 49,000 |
| TOTAL | 5,800 | 44,200 | 50,000 |


These odds are not as wild as in the chromosome table, because a 2% base rate is much larger than a 1-in-5,000 base rate, but hopefully at this point you can see the same problem.

If the awesome 90% model says someone is not pregnant, it is almost always right – 44,100/44,200 of the time (about 99.8%). But a model that just assumed women were NEVER pregnant would still be right 98% of the time (since they are only ‘detectably pregnant’ one year in fifty).

If the model says the targeted woman is likely to be pregnant, it does much better. It is right 900/5,800 of the time – about 16% of the time. That’s much better than the 2% a monkey would get by guessing randomly. But see the problem? This 90% accurate model is still wrong about 84% of the time when it says a woman is pregnant.
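Here is the same kind of quick check for the pregnancy model, using only the hypothetical assumptions above (none of this is real Target data, and the 90% figure is our assumption):

```python
# Hypothetical 90%-accurate pregnancy model over 50,000 customers, 2% base rate
population, base_rate, accuracy = 50_000, 0.02, 0.90

pregnant = population * base_rate          # 1,000 women
not_pregnant = population - pregnant       # 49,000 women

flagged = pregnant * accuracy + not_pregnant * (1 - accuracy)   # 900 + 4,900 = 5,800
precision = (pregnant * accuracy) / flagged

print(f"Women the model flags as pregnant: {flagged:,.0f}")                    # 5,800
print(f"Flagged women who really are pregnant: {precision:.0%}")               # ~16%
print(f"Accuracy of a model that says 'never pregnant': {1 - base_rate:.0%}")  # 98%
```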

If you use information like that to show pregnancy-related ads to women, you will be showing those ads to non-pregnant women about 84% of the time.

But, and here’s the kicker: There is nothing wrong with that.

When a company puts a diaper ad on TV, far fewer than 16% of the people watching might need to buy those diapers. Ads touch the wrong people all the time. The goal of modeling is to get the ads in front of the wrong people a little less often – and that is a great thing when it costs you money to reach someone who is never going to be interested.

THAT is why you do modeling and try and improve targeting.

Unfortunately, marketers and data analysts have started drinking their own Kool-Aid: they seem to believe they can be 100% accurate (they can’t). And when you believe you are 100% accurate, you start using models to do things they have no business doing.

Like automated personalization.

There is nothing wrong with sending an ad for pregnancy stuff to your entire mailing list. No matter what the Target story implies, no one is going to storm your building demanding an apology for being mailed an irrelevant ad.

If that mass mailing is ROI positive, then by all means use some modeling to limit who you send it to (here’s some free modeling for you: don’t send it to single men; don’t send it to customers below the age of 20 or above the age of 40; maybe only send it to married women. Yes, younger women, older women and single women get pregnant, but you aren’t trying for 100% – you are just trying to improve your odds).
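For what it’s worth, that kind of crude rule-of-thumb segmentation is only a few lines of code. Here is a hypothetical sketch, assuming your customer file has gender, age and marital-status fields (the field and function names are made up for illustration):

```python
from dataclasses import dataclass

@dataclass
class Customer:
    # Hypothetical record layout; a real customer file will look different
    name: str
    gender: str          # "F" or "M"
    age: int
    marital_status: str  # e.g. "married", "single"

def maternity_mailing_list(customers):
    """Crude segmentation: not trying to be right 100% of the time,
    just trying to improve the odds that the mailing is relevant."""
    return [
        c for c in customers
        if c.gender == "F" and 20 <= c.age <= 40 and c.marital_status == "married"
    ]
```

Nothing clever about it, and plenty of pregnant customers will be missed – which is fine, because the goal is better odds, not certainty.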

But here is a terrible idea:

Send a mailing to the ‘modeled women’ with a personalized message that says (in so many words), “We know you are pregnant.”

“Experts” will tell you not to do this because it’s creepy. I tell you not to do this because it’s wrong (and not morally wrong – just wrong wrong).

You don’t know she is pregnant. You only have about a 16% shot of being right.


What is true of pregnancy is true of any low-probability event. Not most of them. All of them.

And almost all ‘personalization’ relies on using automation to send unique messages based on models of low-probability events.


Your personalized automated authenticity ends up falling into one of two buckets:

  1. It’s not personalized. It’s generic. This is the bucket for 90% of the ‘personal tweets’ I received on a daily basis.
  2. It’s personalized wrong. This is the bucket most companies fall into when they try to automate the personalization of their websites or communications.

Hopefully I’ve driven home the point hard enough. All this said, there ARE ways you can “personalize” effectively. I will cover that in another post. Stay tuned.