Great Expectations II

In Part I, I urged readers to get excited about a simple statistical concept: expected value. To take it further, we need to be able to compute something called conditional expectation. This article gets a bit wonky, but if you can work through it, you will gain a truly powerful tool for making decisions in the face of uncertainty.

Like expected value, conditional expectation is a reasonably simple concept. To illustrate: the average income in Canada is $38,600. But the average income in Canada, conditional on holding a university degree, is $44,100. Remember, expected value is just a probability-weighted average. So $38,600 is the expected value of a person’s salary in Canada, and $44,100 is the conditional expectation of a Canadian’s salary - on the condition that the individual holds a university degree. Not rocket science.

The above example is a particularly easy application of conditional expectation, because we only need to compute a simple average over the people in the Canadian population who hold a university degree, i.e. we add up the salaries of all degree holders and divide by the number of degree holders. But we can compute conditional expectation in far murkier situations, in which our beliefs change as more information is uncovered. This leads us to Bayes’ Theorem.

While we won’t dive too deeply into the history of Bayes’ theorem, it’s worth knowing a little bit of background. It's named for Reverend Thomas Bayes, an English Presbyterian minister who died in 1761. The theorem was published posthumously by the philosopher Richard Price.

Bayes’ theorem is less a simple math equation than a whole new way of looking at the world. It allows us to rationally update our beliefs in hopes of getting ever closer to the truth - even if it concedes we may never know the exact truth. The impact of Bayesian inference has extended beyond mathematics into fields as diverse as science, philosophy, and law. A little later, we’ll take a look at Bayes’ theorem in the context of online marketing.

So just what is this powerful theorem? Practically, Bayes’ Theorem gives us the means to calculate the probability of an event given the occurrence of some other event, also known as conditional probability. The equation, which can be solved with simple arithmetic, is as follows:

P(A|B) = P(B|A) * P(A) / P(B)

  • P(A|B) is known as the conditional probability (also called the posterior); this refers to the probability of event A occurring given B is true.
  • P(B|A) is known as the likelihood; it refers to the probability of event B occurring, given A is true.
  • P(A) is known as the prior probability. The prior is of deep importance to Bayesian inference, as it assigns our initial degree of belief in A, prior to accounting for new information. The prior often reflects a subjective degree of belief in Bayesian analysis, allowing us to directly and transparently state our preconceived hypothesis.
  • P(B) is the probability of event B occurring, also known as the evidence. We assume the probability of B is fixed, and instead focus on updating our belief in A.

To better understand, let’s use Bayes’ Theorem to try and guess someone’s college major by their personality type. We go to a college campus and randomly select a student. The student happens to be geeky and introverted. What is the probability the student is studying computer science? Let’s say we know only 3% of the students on the campus are computer science majors, but that 60% of computer science majors are geeky and introverted. The common mistake here is to confuse the conditional probability with the likelihood. Many people will mistakenly assume there’s a 60% chance that a geeky, introverted student is a computer science major. But this ignores the fact that the prior probability is only 3%. Additionally, let’s assume that 20% of the student population is geeky and introverted, regardless of major. Taking the prior and the evidence into account, Bayes’ theorem lets us properly calculate the conditional probability:

P(CS major | geeky) = (.6 * .03)/.2 = .09

So in the end, despite most computer science majors being geeky, the probability that a geeky person is a computer science major is only 9%.
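
For readers who prefer code to arithmetic, here is a minimal Python sketch of the campus example. The function name bayes_posterior and its parameter names are my own illustrative choices, not part of any library; the three inputs are the figures given above.

```python
def bayes_posterior(likelihood, prior, evidence):
    """Return P(A|B) = P(B|A) * P(A) / P(B).

    Hypothetical helper written for this example only.
    """
    if not 0 < evidence <= 1:
        raise ValueError("evidence must be a probability greater than 0")
    return likelihood * prior / evidence


# Figures from the campus example.
p_geeky_given_cs = 0.60  # likelihood: P(geeky | CS major)
p_cs = 0.03              # prior: P(CS major)
p_geeky = 0.20           # evidence: P(geeky)

p_cs_given_geeky = bayes_posterior(p_geeky_given_cs, p_cs, p_geeky)
print(f"P(CS major | geeky) = {p_cs_given_geeky:.2f}")
```

Swapping the likelihood and the result is exactly the mistake described above: the 60% goes in as an input, and the much smaller 9% comes out.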

How can Bayes' Theorem help with marketing? Let's walk through a common situation where we need a handle on conditional expectation. Imagine we’re running a lead-generating campaign in hopes of getting paying customers, where the value of a paying customer is $100. We want to know how much a freshly acquired lead is worth, and our databases have not given us any CRM data on this particular lead. We do know that 5% of all leads convert to a sale. How much is the lead worth? Well, we can assign a probability of this lead converting, and multiply by the value of a customer. Since we have no information to differentiate this lead from all the others, we start by assigning a prior probability of converting, P(conversion), based on the overall conversion rate of 5%:

P(conversion) = .05

With no other information, we can calculate the expected value as follows:

.05*$100 = $5

Now let’s say we can see the lead checking out a product page for more than five minutes. How do we update our prior belief about the value of this lead? We look in our analytics software and see that of all users that convert, a whopping 95% check out the product page for more than five minutes. Before getting too excited, we need to put this 95% figure in proper context. So now we know:

P(5+ min on product page | conversion) = .95

First, the prior probability of conversion is only 5%. Second, 30% of all leads will check out the product page for more than five minutes, giving us:

P(5+ min on product page) = .3

So we can calculate the conditional probability of this lead converting based on what we know:

P(conversion | 5+ min on product page) = (.95*.05)/.3 = .158

Our conditional expectation should now be that the lead is worth:

.158*$100 = $15.80
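
The same lead valuation can be sketched in a few lines of Python. The variable and function names here are illustrative assumptions; the inputs are the figures from the example above.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


CUSTOMER_VALUE = 100.0        # value of a paying customer, in dollars
p_convert = 0.05              # prior: overall conversion rate
p_view_given_convert = 0.95   # likelihood: converters who view 5+ minutes
p_view = 0.30                 # evidence: all leads who view 5+ minutes

# Expected value of the lead before and after observing the page view.
value_before = p_convert * CUSTOMER_VALUE
p_convert_given_view = posterior(p_view_given_convert, p_convert, p_view)
value_after = p_convert_given_view * CUSTOMER_VALUE

print(f"Lead value with no information: ${value_before:.2f}")
print(f"P(conversion | long page view): {p_convert_given_view:.3f}")
print(f"Lead value after the page view: ${value_after:.2f}")
```

Without intermediate rounding the updated value comes out at about $15.83. The $15.80 figure above results from rounding the probability to .158 before multiplying.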

Let's say that our CRM database finally wakes up and we get a rich customer profile: the user is a thirty-year-old French male. Using the same process above, we can now update our beliefs about this customer’s value. The posterior probability of our last equation (the "answer" of .158) becomes the prior probability in our next round of Bayesian updating. For the sake of brevity, we’ll spare the reader from more math. But the core concept of the process is critical: as more information comes to light, we can keep updating our beliefs, getting closer and closer to predicting the true value of the lead.
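
For the curious, here is a rough Python sketch of what chained updating might look like. Several things in it are assumptions made purely for illustration: the CRM profile likelihoods (0.40 and 0.20) are invented, the function and variable names are hypothetical, and the sketch assumes the two signals are independent of each other once we know whether the lead converts. Instead of looking up the evidence term directly, it computes it from the likelihoods under both outcomes, which keeps the result between 0 and 1.

```python
def update(prior, p_signal_given_convert, p_signal_given_no_convert):
    """One round of Bayesian updating for a yes/no hypothesis.

    The evidence P(signal) is computed by total probability:
    P(signal) = P(signal|convert) * prior + P(signal|no convert) * (1 - prior)
    """
    numerator = p_signal_given_convert * prior
    evidence = numerator + p_signal_given_no_convert * (1 - prior)
    return numerator / evidence


CUSTOMER_VALUE = 100.0
prior = 0.05

# Implied by the article's figures: P(view) = 0.30 and P(view|convert) = 0.95,
# so P(view|no convert) = (0.30 - 0.95 * 0.05) / 0.95.
p_view_given_no_convert = (0.30 - 0.95 * 0.05) / (1 - 0.05)

signals = [
    ("long product-page view", 0.95, p_view_given_no_convert),
    ("CRM profile", 0.40, 0.20),  # HYPOTHETICAL numbers, not from the article
]

belief = prior
history = [belief]
for name, p_if_convert, p_if_not in signals:
    # Yesterday's posterior is today's prior.
    belief = update(belief, p_if_convert, p_if_not)
    history.append(belief)
    print(f"After {name}: P(conversion) = {belief:.3f}, "
          f"lead value = ${belief * CUSTOMER_VALUE:.2f}")
```

The first pass through the loop reproduces the .158 computed above; the second pass only shows the mechanics, since its inputs are made up.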

Performing all of these calculations by hand for each lead in an online campaign would be a very laborious endeavor. A lot of recent work in machine learning focuses on creating algorithms that perform Bayesian updating in a highly scalable, automated fashion. Intervaliq relies heavily on Bayesian methods to continuously update our predictions, meaning that as more customer data flows in, our predictions get ever more precise.  And of course, the client is spared from having to do the math.





Great Expectations

Statistics isn’t that sexy. Sure, we’ve heard about data science as the sexiest job of the 21st century. But when it comes to learning and applying specific concepts in statistics, it’s just not that exciting for most people.

There is one concept in statistics that I urge you to get excited about. Because you will make better decisions, and because you will make more money. That concept is expected value.

The practical definition of expected value is simple: take all possible outcomes for a random variable, weight each outcome by the probability of occurring, then sum the result. Voila: expected value is just a probability-weighted average.

Let’s take a classic coin-toss example. If you flip heads, you get $150. If you flip tails, you lose $100. So what’s the Expected Value (EV) of this gamble? Well, there are two possible outcomes: gaining $150 or losing $100. If we assume it’s a fair coin, the probability is 50% for both outcomes. So, EV = .5*$150 + .5*-$100 = $25. So, should you take this gamble? As long as losing $100 isn’t catastrophic, then hell yes! Generally speaking, you should take any gamble with positive expectation (a positive EV), as long as the loss would not seriously affect you.
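
As a minimal sketch, the probability-weighted average can be written in a few lines of Python. The function name expected_value and the list-of-pairs input format are illustrative choices of mine.

```python
def expected_value(outcomes):
    """Probability-weighted average of (probability, payoff) pairs."""
    outcomes = list(outcomes)
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)


# The coin toss from the example: win $150 on heads, lose $100 on tails.
coin_toss = [(0.5, 150.0), (0.5, -100.0)]
ev = expected_value(coin_toss)
print(f"EV of the coin toss: ${ev:.2f}")
```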

This is all pretty easy. So where do people start to get lost? According to Nobel Prize-winning economist Daniel Kahneman, most people will not take the above gamble due to loss aversion. In fact, the possible gain would need to be at least twice the possible loss for most people to be willing to take the gamble. This is, of course, deeply irrational; a whole field of behavioral economics is dedicated to understanding why we veer from making rational decisions in the face of these kinds of gambles.

In the field of performance marketing, I’ve seen some otherwise very smart people struggle to understand the expected value of a marketing campaign. A marketing manager will notice that their conversion rate on leads is low, and therefore conclude that they should cut the campaign. “Our conversion rate sucks, let’s cut the campaign” is a classic reaction. But what if those leads that do convert are worth a lot of money? For example, let’s say you pay $3 a lead. Only 2% of leads convert to a sale. But those that do convert are worth $500. Let’s look at the EV of a lead: .02*$500 - $3 = $7! You have a classic case of an asymmetric payoff: you lose a little money on most leads, and a small number of leads trigger a much bigger gain, offsetting those smaller losses. A lot of marketing managers, however, will want to ditch a campaign after a string of small losses.
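
A quick way to sanity-check a "let's cut the campaign" reaction is to compute the EV per lead alongside the break-even conversion rate, i.e. the rate at which the EV is exactly zero. The sketch below uses the numbers from the example; the function names are hypothetical.

```python
def lead_ev(conversion_rate, customer_value, cost_per_lead):
    """Expected profit per lead: chance of a sale times its value, minus cost."""
    return conversion_rate * customer_value - cost_per_lead


def break_even_rate(customer_value, cost_per_lead):
    """Conversion rate at which the expected profit per lead is zero."""
    return cost_per_lead / customer_value


ev_per_lead = lead_ev(conversion_rate=0.02, customer_value=500.0, cost_per_lead=3.0)
min_rate = break_even_rate(customer_value=500.0, cost_per_lead=3.0)

print(f"EV per lead: ${ev_per_lead:.2f}")
print(f"Break-even conversion rate: {min_rate:.1%}")
```

With these numbers the campaign only needs a 0.6% conversion rate to break even, so a 2% rate that "sucks" is still comfortably profitable.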

Nassim Taleb has written extensively about asymmetric payoffs like the above example. In Fooled by Randomness, Taleb observes a similar misunderstanding of expected value amongst Wall Street traders:


“Jim Rogers, a "legendary" investor, made the following statement:

I don't buy options. Buying options is another way to go to the poorhouse. Someone did a study for the SEC and discovered that 90 percent of all options expire as losses. Well, I figured out that if 90 percent of all long option positions lost money, that meant that 90 percent of all short option positions make money. If I want to use options to be bearish, I sell calls.

Visibly, the statistic that 90% of all option positions lost money is meaningless, (i.e., the frequency) if we do not take into account how much money is made on average during the remaining 10%.”

So, how do we internalize the concept of expected value, so that we can make better decisions and become wealthier? Kahneman advocates repeating simple mantras, such as “you win some, you lose some” to counter our loss aversion when we face a gamble with a positive expected value. The most successful poker players, for example, have trained themselves to compute the expected value in order to correctly play the “pot odds.”

My take is that algorithms can help us overcome our cognitive biases and properly consider the expected value of an uncertain proposition. Now, algorithms aren’t appropriate in all scenarios; using an algorithm at a poker table in Vegas could get you in some nasty trouble.  But in trying to determine the value of your marketing efforts, I see algorithms as the best way to apply the concept at scale.

Stay tuned for part two, where we take a look at the power of conditional expectation.



Mixpanel open sources the deck that got them an $865M valuation

As an industry, analytics is brutally competitive, particularly when it comes to highly scalable analytics products. Mixpanel has managed to carve out a (sizeable) niche for themselves, as this $65 million round clearly demonstrates. 

Interestingly, Mixpanel credits their competitive advantage to the database engine they built in 2010. The other thing that pops out is how sales-centric the deck is; it shows how much manpower is required for scaling a relatively plug-and-play solution like Mixpanel.

In any case, you can check out the deck here.

Mainstream press sounds off on the relationship between humans and AI

Both David Brooks of the New York Times and Kevin Kelly of Wired Magazine opined on the state of artificial intelligence in the past week. While the two authors express different concerns about the future ramifications of machine intelligence,  both reach a consensus: artificial intelligence should primarily serve to augment human intelligence.

David Brooks hopes for a “humanistic” outcome, where AI is liberating for humans: “...machines liberate us from mental drudgery so we can focus on higher and happier things. In this future, differences in innate I.Q. are less important. Everybody has Google on their phones so having a great memory or the ability to calculate with big numbers doesn’t help as much.”

Similarly, Kevin Kelly predicts that AI will serve as specialist intelligence to make us better at what we already do as humans: “Most of the commercial work completed by AI will be done by special-purpose, narrowly focused software brains...In the next 10 years, 99 percent of the artificial intelligence that you will interact with, directly or indirectly, will be nerdily autistic, supersmart specialists.”

By focusing AI on what machines do best, humans can shift their energy to abilities that are uniquely human. The combination of these two types of intelligence results in something greater than either intelligence alone. A clear example of this phenomenon is in competitive chess: the best performing chess player is currently a “centaur,” a hybrid of human and artificial intelligence. So while AI may beat a top player like Kasparov, let Kasparov collaborate with the AI and you've got one hell of a tag team.

Chess is a game of dizzying complexity. Claude Shannon estimated the number of possible positions in chess to be of the general order of 10 to the power of 43. While this creates a considerable challenge for AI,  chess provides for a very stable learning environment. That is, chess has the same rules over time, and the theoretically dominant strategy for a given configuration of pieces does not change. As computers get more powerful, chess AI will come ever closer to approaching a deterministically perfect strategy by simply considering all possible outcomes.  Poker, by comparison, involves imperfect information, so that even a computer with limitless computational power cannot deduce the outcome of a hand with certainty. Add more than two players to a game of Texas Hold 'em and computing an equilibrium becomes daunting.  As a result, poker is a much greater challenge for AI; pure number crunching does not suffice. 

If we turn our attention to a system as complex, uncertain, and dynamic as a business, it becomes clear that human domain experts are a long way from extinction. If AI struggles with poker, what hope does it have for totally automating strategic decisions in a business? That said, there are still many applications where AI can offer its highly specialized abilities to help humans along. In a highly competitive marketplace, it makes strategic sense to use the most powerful combination of intelligences at your disposal. Just as software already automates much of the drudgery involved in number crunching, AI can enable predictions at a scale and granularity that would be impossible for even a large team of humans. But in the end, it is the humans who must place this information in context, and make decisions in an ever-changing, highly uncertain world.