In Part I, I urged readers to get excited about a simple statistical concept: expected value. To take it further, we need to be able to compute something called conditional expectation. This article gets a bit wonky, but if you can work through it, you will gain a truly powerful tool for making decisions in the face of uncertainty.
Like expected value, conditional expectation is a reasonably simple concept. To illustrate: the average income in Canada is $38,600, but the average income in Canada, conditional on holding a university degree, is $44,100. Remember, expected value is just a probability-weighted average. So $38,600 is the expected value of a Canadian's salary, and $44,100 is the conditional expectation of a Canadian's salary, on the condition that the individual holds a university degree. Not rocket science.
The above example is a particularly easy application of conditional expectation, because we only need to crunch a simple average: add up the salaries of everyone in the Canadian population holding a bachelor's degree and divide by the number of such people. But we can compute conditional expectation in far murkier situations, in which our beliefs change as more information is uncovered. This leads us to Bayes' Theorem.
While we won’t dive too deeply into the history of Bayes’ theorem, it’s worth knowing a little bit of background. It's named for Reverend Thomas Bayes, an English Presbyterian minister who died in 1761; the philosopher Richard Price published the theorem posthumously.
Bayes’ theorem is less a simple math equation than a whole new way of looking at the world. It allows us to rationally update our beliefs in hopes of getting ever closer to the truth - even if it concedes we may never know the exact truth. The impact of Bayesian inference has extended beyond mathematics into fields as diverse as science, philosophy, and law. A little later, we’ll take a look at Bayes’ theorem in the context of online marketing.
So just what is this powerful theorem? Practically, Bayes’ Theorem gives us the means to calculate the probability of an event given the occurrence of some other event, also known as conditional probability. The equation, which can be solved with simple arithmetic, is as follows:

P(A|B) = P(B|A) × P(A) / P(B)

where:
- P(A|B) is also known as the conditional probability; this refers to the probability of event A occurring given B is true.
- P(B|A) is known as the likelihood; it refers to the probability of event B occurring, given A is true.
- P(A) is known as the prior probability. The prior is of deep importance to Bayesian inference, as it assigns our initial degree of belief in A, prior to accounting for new information. The prior often reflects a subjective degree of belief in Bayesian analysis, allowing us to directly and transparently state our preconceived hypothesis.
- P(B) is the probability of event B occurring, also known as the evidence. We assume the probability of B is fixed, and instead focus on updating our belief in A.
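Plugging these four terms together, the theorem is a one-liner in code. Here is a minimal sketch in Python (the function and argument names are our own, not from any library):

```python
def bayes(likelihood, prior, evidence):
    """Return the posterior P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# For example, with a likelihood of .9, a prior of .1, and evidence of .3:
posterior = bayes(0.9, 0.1, 0.3)
print(round(posterior, 3))  # 0.3
```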
To better understand, let’s use Bayes’ Theorem to try to guess someone’s college major from their personality type. We go to a college campus and randomly select a student. The student happens to be geeky and introverted. What is the probability the student is studying computer science? Let’s say we know only 3% of the students on the campus are computer science majors, but that 60% of computer science majors are geeky and introverted. The common mistake here is to confuse the conditional probability with the likelihood. Many people will mistakenly assume there’s a 60% chance that a geeky, introverted student is a computer science major. But this ignores the fact that the prior probability is only 3%. Additionally, let’s assume that 20% of the student population is geeky, regardless of major. Taking the prior and the evidence into account, Bayes’ theorem lets us properly calculate the conditional probability:
P(CS | geeky) = P(geeky | CS) × P(CS) / P(geeky) = (.6 × .03) / .2 = .09
So in the end, despite most computer science majors being geeky, the probability that a geeky person is a computer science major is only 9%.
How can Bayes' Theorem help with marketing? Let's walk through a common situation where we need a handle on conditional expectation. Imagine we’re running a lead-generation campaign in hopes of winning paying customers, where each paying customer is worth $100. We want to know how much a freshly acquired lead is worth, but our databases have no CRM data on this particular lead. We do know that 5% of all leads convert to a sale. How much is the lead worth? We can assign a probability of this lead converting and multiply by the value of a customer. Since we have no information to differentiate this lead from all the others, we start by assigning a prior probability of converting, P(conversion), based on the overall conversion rate of 5%:

P(conversion) = .05
With no other information, we can calculate the expected value as follows:
E[value] = P(conversion) × $100 = .05 × $100 = $5
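As a quick sanity check, the same arithmetic in Python (the variable names are ours):

```python
# Expected value of a lead with no extra information:
# conversion probability times customer value.
p_conversion = 0.05   # the overall conversion rate
customer_value = 100  # dollars per paying customer

expected_value = p_conversion * customer_value
print(expected_value)  # 5.0
```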
Now let’s say we can see the lead checking out a product page for five whole minutes. How do we update our prior belief about the value of this lead? We look in our analytics software and see that of all users who convert, a whopping 95% check out the product page for more than five minutes. Before getting too excited, we need to put this 95% figure in proper context. So now we know:

P(5min | conversion) = .95

For one, the prior probability of conversion is only 5%. Second, 30% of all leads check out the product page for more than five minutes, giving us:

P(conversion) = .05
P(5min) = .3

So we can calculate the conditional probability of this lead converting based on what we know:

P(conversion | 5min) = P(5min | conversion) × P(conversion) / P(5min) = (.95 × .05) / .3 ≈ .158
Our conditional expectation should now be that the lead is worth:
.158 × $100 ≈ $15.80
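Putting both steps together, the Bayesian update followed by the expected-value calculation can be sketched as follows (the variable names are ours; the figures come from the example above):

```python
def bayes(likelihood, prior, evidence):
    """Return the posterior P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Figures from the example above.
p_conversion = 0.05        # prior: overall conversion rate
p_page_given_conv = 0.95   # likelihood: converters who view the page 5+ minutes
p_page = 0.30              # evidence: all leads who view the page 5+ minutes
customer_value = 100       # dollars per paying customer

posterior = bayes(p_page_given_conv, p_conversion, p_page)
lead_value = posterior * customer_value
print(round(posterior, 3), round(lead_value, 1))  # ≈ 0.158 and ≈ $15.80
```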
Let's say that our CRM database finally wakes up and we get a rich customer profile: the user is a thirty-year-old French male. Using the same process as above, we can now update our beliefs about this customer’s value. The posterior probability of our last equation (the "answer" of .158) becomes the prior probability in our next round of Bayesian updating. For the sake of brevity, we’ll spare the reader from more math. But the core concept of the process is critical: as more information comes to light, we can keep updating our beliefs, getting closer and closer to predicting the true value of the lead.
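As a sketch of that chaining, each posterior below feeds in as the next prior. Note the simplifying assumption that the signals are independent of one another, and that the two CRM likelihood/evidence pairs are invented purely for illustration; only the page-view figures come from the example above:

```python
def update(prior, likelihood, evidence):
    """One round of Bayesian updating: the posterior becomes the next prior."""
    return likelihood * prior / evidence

# Start from the overall conversion rate, then apply each new signal in turn.
p = 0.05
signals = [
    (0.95, 0.30),  # viewed the product page 5+ minutes (from the article)
    (0.40, 0.35),  # hypothetical: matches the thirty-year-old profile
    (0.10, 0.08),  # hypothetical: matches the French-male segment
]
for likelihood, evidence in signals:
    p = update(p, likelihood, evidence)

print(round(p, 3))  # the belief after all three updates
```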
Performing all of these calculations by hand for each lead in an online campaign would be a very laborious endeavor. A lot of recent work in machine learning focuses on creating algorithms that perform Bayesian updating in a highly scalable, automated fashion. Intervaliq relies heavily on Bayesian methods to continuously update our predictions, meaning that as more customer data flows in, our predictions get ever more precise. And of course, the client is spared from having to do the math.