Archive FM

Data Skeptic

[MINI] Bayesian Updating

Duration:
11m
Broadcast on:
27 Jun 2014

In this minisode, we discuss Bayesian updating: the process by which one can calculate how likely each hypothesis is to be true, given one's older (prior) belief and all new evidence.

(upbeat music) - Thanks for joining me again, Linda. - Hello. - Our topic today is Bayesian updating, and Bayesian statistics, Bayesian beliefs, those are all kind of the same thing. Do you know much about what those are? - I've only heard you say the word Bayesian 100 times, but I still don't know. - So I'm very proud of the fact that Bayesian statistics were mentioned in our wedding vows. - I just wanna say he threw them in there, and no one knew what he was talking about. - Oh, they were in yours. - Oh, I said he says Bayesian, but I didn't say I knew what it was. - So it's a he-said-she-said thing. - Yes. - I didn't know you were doing that, you put it in. I was glad you did though. The key to understanding Bayesian belief systems is to know that any time we have a belief in statistics, we wanna describe that as a probability. So I could ask you, where are your keys? - You're asking me now, where? Okay, I thought you said you could ask me. That's different from, I am going to ask you a question, now answer it for me. - Okay. - I think you should rephrase it. - All right, I could ask you where your keys are, and now I will in fact ask you, where are your keys? - My keys are next to the door. - Are you certain? - I mean 90%, yeah? - 90%, cool. What's the other 10%? - In case I forgot, or like moved them, or who knows, they're hanging in the door. - So 90%, they're hanging by the door, but let's explore that 10%. It's not just not hanging by the door. There's some percentage, you said, that they're in the door. - Is there a percent that I put it in a backpack or purse? I could've walked around with it and put it down somewhere. - Is there a probability I hid them? - I mean, there is, but that's very small and weird. - This is what we would call your prior belief. So your prior belief says: my belief right now is 90% that they're by the door, and then the other 10% is spread over all these other options, so that the figures add up to 100.
If we go and we look and your keys are not in fact by the door, how have your beliefs changed? - Then I would just be like, oh, okay, I forgot where I put them. - Each of these ideas about where your keys could be is a hypothesis. Hypothesis number one, that they're hanging by the door, you now have evidence that that is false. So 90% of your belief is now zero. And how does it redistribute in your mind? - I just wanna go back and clarify. When I said hanging in the door, I mean when you unlock the door and you leave the key in the door and then you shut it. - So that's part of the 10%, that's not part of the 90%. 90% is that it's on the bench. - Got it. So we walk over, we observe it's not on the bench. And how do your beliefs change now? - My beliefs didn't change. I just said it was 90% that it was on the bench. That means 10% it may not be there. My beliefs haven't changed. - Well, in fact, they have though because your prior belief is like I'm 90% sure it's on the bench, right? - Well, and I'm wrong, right? - Not wrong, not exactly. It means your most likely hypothesis is shown to be false. Yet you had other hypotheses and now your belief in those should go up. So it isn't just that you erase the hypothesis you showed to be false. You need to re-evaluate the others given that new information in some way. Generally this is done with Bayes' theorem, but we're an audio podcast and it's kind of hard to describe a formula. So I'll give it to you in just sort of a loose way. You have that first belief, which a statistician calls a prior belief. And then you get some new evidence, and you use the new evidence to change that to get what's called a posterior belief, which is your new belief or your belief given that extra information. So I have a fun example, maybe, that would illustrate this pretty well. Let's say we went to the farmer's market. And there's a guy who has a farm and he, on his farm, he makes both pomegranates and lemons.
Now, which of those two do you prefer? - Pomegranates. - How much more? - I don't know, I never thought about it. I guess 60% more at least. I mean, you mean just to eat them or cook with them? I mean, what's the question? - To eat them, you want to have a snack. - I just want to eat them straight out. Nothing. - Yep, right. - And pomegranates, 100%. - Yeah, you don't want those lemons at all, right? - No, I don't like eating lemons. - Yeah, okay, cool. So, now let's say this farmer, you want to do business with him. He has the best prices or you have a history with him or something, but he says, hey, there was a screw up when we packed these and shipped them over here. So we have these mixed bags of fruit that you buy by weight. There are three different types of bags. One of the bags has 100% pomegranates in it. Some of the, that's a third of them. A third of the bags have 50/50 mix and the final third of the bags have all lemons. - Okay. - All lemons, all pomegranates, 50/50. So there's three hypotheses and the bags all look the same and they're all, you know, they're not transparent. You can't really see what's inside. The shape of the bag isn't telling you anything. In fact, they're in a box. So you don't know what's inside each. It's a big mystery. And because he told you they're all equally likely, your prior belief of if you just picked up a random box would be 33% for any one of those theories. Now let's say the man says, tell you what, I'll reach and pick the box you want. I'll reach and I'll pull out one fruit and I'll show it to you. And he pulls out a pomegranate. Now what does that new information tell you? - Well, it's not the bag with just lemons. - Yeah, that's right. Good observation. And what does it tell you about the other two options? Either all pomegranates or 50/50. - The other two? Well, the one that you just reached in is either 50/50 or all pomegranates. - That's right. So you eliminated, you entirely eliminated the all lemon possibility. 
And you're left with two possibilities. So what would you say is the likelihood that it's all pomegranates? - Oh, well, it's just 50/50. - It's 50/50. That's one way to look at it, but there's also a more Bayesian way to look at it. If you had a box of all pomegranates and you pulled out a fruit, what's the chance that it's a pomegranate? - If you had a box full of pomegranates. - All pomegranates. - And you reached in. - And pulled something out. - Well, it should be a pomegranate, otherwise that box wasn't all pomegranates. - So 100% likelihood that a pomegranate box will produce a pomegranate. What if you know you have a 50/50 box? What's the chance you reach in and pull out a pomegranate? - Oh, the one that's 50/50. - So 50% chance of getting a pomegranate, or a 50% chance of lemon. So let's go back to that example with the farmer and let's rethink how we want to update our beliefs. So he pulled out one fruit and it was a pomegranate, so we eliminated the lemon possibility. But of the two possibilities left, all pomegranates and 50/50, if it's in fact an all pomegranate box, then certainly, like you said, 100% chance you'll pull out a pomegranate. If it's the 50/50 box, there's only a 50% chance you'll pull out a pomegranate. So because of probability and chance, you actually would want to say it's more likely that the box that he's holding is the all pomegranate box. Do you see why? - No, I don't understand. - Because when you reach into a box, if it's the split box, there's a chance you'll pull a pomegranate and a chance you'll pull a lemon. For him to have the split box and still pull out a pomegranate is only 50% likely. - Okay. - Whereas if he's holding the all pomegranate box and he pulls out a pomegranate, that's 100% likely. - So you're saying out of the possibilities because he pulled out a pomegranate, it is more likely that that bag is full of pomegranates. - It is. - Because it's more likely to happen in that situation.
- That's exactly right. - When you compare the two. - That's exactly right. - Okay. - It seems a little counterintuitive at first, doesn't it? - So then this is not probability. What are we talking about? - It is probability. It's what they call posterior probability. - Posterior. - Which is the likelihood you assess after the evidence. After you got this new evidence that he pulled out a pomegranate, you eliminated one entire possibility. But yet you don't say, well, it's now 50/50 over the other two. You actually would say it's more likely that I'm holding an all pomegranate box because the observation of a pomegranate is more consistent with an all pomegranate box than it is with the 50/50 box. - So then what is the probability? 'Cause if we're talking about probability, what is the probability? - Right. So it happens to be, in this case, 66% probability that it's all pomegranates and 33% that it's 50/50. And you get that by following Bayes' theorem. Now what if you say, "Oh, I don't know, I'm still a little unsure, I'm only 66% convinced that it's all pomegranates in there." And he says, "Well, let me pull out one more." And he pulls out a second fruit and it happens to also be a pomegranate. What do you think now? - From the same box. - Same box. - Oh, it's even more likely. Let's just take that bag number one. If you like pomegranates like me. So it's too convincing. - Take it in a round. - But is there not a chance that in a 50/50 box, you could pull out two pomegranates in a row? - Yes, but the odds of that are lower than on the 100% one. - That's right. So you now have mounting evidence that your all pomegranate box theory is correct. Yet you have not eliminated the possibility that it's 50/50. It still could be, but it's unlikely, or it's less likely. So the general idea is that to kind of summarize, you have a set of theories and a current belief over them.
You get some new evidence and you ask yourself, what is the likelihood that this theory produced this evidence? And then you multiply these together and then there's some normalization process that goes on. And essentially you come up with a revised belief that is based on your former belief and the likelihood of the evidence given the theory. And that, in a nutshell, is Bayesian statistics. - Well, interesting. Now I can use it if a farmer decides to give me a deal on pomegranates and he wants me to gamble on it. - I'm very glad to hear it. (laughs) (upbeat music)
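The summary above (multiply each prior belief by the likelihood of the evidence, then normalize) can be sketched in a few lines of Python using the farmer's boxes. This is only an illustrative sketch, not something from the episode: the function name `bayes_update` and the dictionary keys are made up here, and the second draw assumes the mixed box still gives a 50% chance of a pomegranate (for instance, a large box, or the fruit being put back).

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Multiply prior by likelihood for each hypothesis, then renormalize."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Prior belief: the three kinds of box are equally likely.
prior = {
    "all_pomegranates": Fraction(1, 3),
    "half_and_half": Fraction(1, 3),
    "all_lemons": Fraction(1, 3),
}

# Likelihood of pulling out a pomegranate under each hypothesis.
# Assumption: this stays the same for the second draw.
p_pomegranate = {
    "all_pomegranates": Fraction(1),
    "half_and_half": Fraction(1, 2),
    "all_lemons": Fraction(0),
}

after_one = bayes_update(prior, p_pomegranate)      # first pomegranate
after_two = bayes_update(after_one, p_pomegranate)  # second pomegranate

for name, belief in (("after one", after_one), ("after two", after_two)):
    print(name, {h: str(p) for h, p in belief.items()})
```

After one pomegranate the beliefs come out as 2/3 for the all pomegranate box, 1/3 for the 50/50 box, and 0 for the all lemon box, which matches the roughly 66% and 33% quoted in the conversation. Under the stated assumption, a second pomegranate moves them to 4/5 and 1/5: the 50/50 box becomes less likely but is not ruled out.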