Archive FM

Data Skeptic

[MINI] type i / type ii errors

Duration: 11m
Broadcast on: 30 May 2014
Audio Format: other

In this first mini-episode of the Data Skeptic Podcast, we define and discuss type I and type II errors (a.k.a. false positives and false negatives).

(upbeat music) - So welcome to the first Data Skeptic mini-episode podcast. I'm here with my co-host Linda. - Hi. - And this one will be a little longer than usual. The typical format for these mini-episodes will be like five, 10, maybe 15 minutes to talk about a little topic. But since this is our first one, I thought maybe we'd do some quick intros. So perhaps, Linda, do you wanna give your background a little bit, professionally and academically? - Well, I don't have any background in data or statistics. I think the most similar class I took in college was econ, economics, and I did not like it at all. So my background is I was an art major. - You started as the valedictorian. - Well, I was a valedictorian in high school and then I went on to college in North Carolina. And then I was an art major and political science major. I did take calculus all the way up to calculus two, which I almost failed. So that is not a good sign. But I did like math up until then. And now my latest job was that I was at an ad agency. So that's just to tell you how far I am away from-- - And your specialty there, though, was? - Well, I worked in the mobile app department as a mobile project manager. So we built mobile applications. - And you've also worked for some dot-coms. So you have some technology and startup e-commerce background. - Yeah, so my background's more technology, customer-facing technology and how people use technology, and a little bit of retail, like online retail, e-commerce. - So some listeners might be wondering, why is it that you're my co-host if you don't really have a data science background? And for me, I think it's kind of a nice way for a person who similarly might want to listen, but doesn't have that background, to enjoy these topics through your experience, because you'll probably ask good questions, as might our Amazon bird there, that other people might be afraid to ask, and these conversations might be of interest. So why are you my co-host?
- Well, we live in the same building. That's not all, we are married. - Right, and now we have a bird that you might hear in the background. So I thought we could jump into our topic for today, which is type one and type two errors. Are you familiar with this at all? - No. - Okay, well, let me first ask you about something called ground truth. Are you familiar with that? - Not at all. - Okay, so ground truth is the idea that there's some information that you might be interested in and there is an actual value of that information, but you might not know it. So an example might be how much money in cash is in the house right now. Down to the penny, we don't really know, 'cause I don't know how much is in your wallet. You don't know how much is in my wallet, and neither of us knows how much change is stuck in the couch cushions. But there is some actual value. If we really got the FBI investigators out here to go through every little nook and cranny of the house, we could get a number down to the penny of how much cash is within the house. But we don't know the answer. Although we can probably safely say it's less than a million dollars, right? Unless you have some secret fund I don't know about. - Actually, I hope it's more, right? (both laughing) - But the actual number, the perfect number, is what we call ground truth. And sometimes we can know ground truth. We can get really, really good measurements, but most of the time you can only get close. And sometimes you can't even get very close. But it always is about taking an observation, or several observations and averaging them, or in other words, doing a measurement. A good example might be the smoke detectors we have in the house. In your mind, what is the purpose of the smoke detector? - To wake us up when there's a fire. - Right, so it's a way of measuring fire or not fire. Two states, kind of. And then the states of the alarm are ringing and not ringing.
So in a perfect world, when your house is on fire, your alarm is ringing, and when your house is not on fire, your alarm is not ringing. But there are two other cases. The case where your house is not on fire, but your alarm is ringing. And the case where your house is on fire, but the alarm is not ringing. And we call those type one and type two errors, which people sometimes call false positives and false negatives. So a false positive, or type one, would be when the alarm is ringing, but the house is not in fact on fire, much like a couple days ago, right? - Yes. - And the second case is the type two error, or false negative, which thankfully we haven't experienced, in which the house is on fire, but the alarm has not notified us of that. So if you kind of think of this as a grid of four boxes, that sometimes helps, where in the rows you have ground truth, and in the columns you have your observation. And it's important to know the rates of false positives and false negatives. So for example, in the fire alarm, what do you think the odds are of a false positive? - You mean where it's going off, and the house is not on fire? - Correct. - I think it's likely. - How likely? - I mean probably like 80% chance, 'cause I mean the only time I ever hear it going off is if I burn something. - Yeah, so this is a good example where it's gone off because it detected something, and it's actually doing its job, but just not quite in the way we wanted it to. And how about the other case? What do you think about the case where the alarm has failed to go off, but the house is in fact on fire? - I mean, I hope it's pretty rare. It really depends on where the alarms are placed. - So it sounds like you might even say you'd put up with some type one errors in order to reduce your type two errors. You'd rather have a few false alarms and no missed alarms. - Yeah, I mean that's the general idea.
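The four boxes described above can be sketched in a few lines of Python. This is only an illustration: the `observations` log and the function name `confusion_counts` are invented for the example, not data or code from the episode. It also separates two numbers that are easy to mix up: the false positive rate (how often the alarm rings when there is no fire) and the share of alarms that turn out to be false, which is closer to what "the only time I hear it going off is if I burn something" describes.

```python
from collections import Counter

# Hypothetical log of (house_on_fire, alarm_ringing) pairs.
# These values are made up purely for illustration.
observations = [
    (False, False), (False, False), (False, True), (False, False),
    (True, True), (False, True), (False, False), (True, False),
]

def confusion_counts(pairs):
    """Count the four combinations of ground truth and observation."""
    counts = Counter()
    for on_fire, ringing in pairs:
        if on_fire and ringing:
            counts["true_positive"] += 1
        elif not on_fire and ringing:
            counts["false_positive"] += 1   # type I error: false alarm
        elif on_fire and not ringing:
            counts["false_negative"] += 1   # type II error: missed fire
        else:
            counts["true_negative"] += 1
    return counts

counts = confusion_counts(observations)

# Of all the times there was no fire, how often did the alarm ring anyway?
false_positive_rate = counts["false_positive"] / (
    counts["false_positive"] + counts["true_negative"]
)
# Of all the times there was a fire, how often did the alarm stay silent?
false_negative_rate = counts["false_negative"] / (
    counts["false_negative"] + counts["true_positive"]
)
# Of all the times the alarm rang, how often was there no fire?
false_alarm_share = counts["false_positive"] / (
    counts["false_positive"] + counts["true_positive"]
)

print(counts)
print(false_positive_rate, false_negative_rate, false_alarm_share)
```

With real data, the rows (ground truth) would have to come from some more trusted source than the alarm itself, which is exactly why ground truth is often hard to get.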
- So let me give you another example, or let's give you a quiz maybe, and you tell me what you think of it in the context of false positives and false negatives. Let's say you have something in the fridge, some milk, and the label has rubbed off so you don't know the expiration date, and you want to know if the milk is still good or not. What do you do? - Well, I sniff it. - And what are you sniffing for? - To see if it smells bad. - And if it smells bad, what do you conclude? - I generally just throw it away. - And what if it does not smell bad? - If it doesn't smell bad, then I move on to the second phase, which is, I try it. (laughing) I tried it out, and the milk did not look bad. - Well, that's good. Well, it's bad, but it's good for the podcast. But I was like, ew. (laughing) So smell can be deceiving. - So in this case, (laughing) you have a measurement tool called your nose, and when you smelled something bad, you got a signal of spoiled. You concluded that the ground truth was that it was spoiled and you threw it away. So we kind of don't know the false positives, where it smells bad but is in fact good. Without knowing more about how accurate your nose is, we can't tell that. But we do have an interesting study we can do for your false negatives, where you did not smell anything bad. So you assumed that ground truth was that the milk had not in fact spoiled, and then you tasted it and you found you were wrong, or you found that your smelling test was unreliable. Is that a fair statement? - Yes, but I mean, sometimes when I try it, I'm still unsure. Like it has a funny taste, but I'm not sure. I mean, there are just not that many times where you're just trying milk. - Which do you think is more reliable, your sniffer or your taster? - Definitely the taste. I mean, and just visually looking at it, if it's getting thick, then that's bad milk.
- But you're gonna start with the smeller because it's of less consequence to you than your taster sensor, even though the taster sensor's better. - Well, I don't want to risk anything, right? - Yeah, that's a great point. Sometimes you have to concern yourself with the fact that you have a weaker form of measurement, like your smelling, and even though you have a more accurate measurement tool, your taste buds, the penalty for tasting something bad versus smelling something bad is much worse. So you start with the less reliable sensor as your first wave of defense. So in the case of smelling then, which do you think you're more likely to see, a false positive or a false negative? - False negative, I guess. - I would say so. Let's do one more. How about the check engine light on your car? - Oh, false positive. (laughs) - Yeah, I would have to agree. - I can say, 'cause I drove my car with the check engine light on for years. - Well, don't tell our insurance company. - Don't worry. (laughs) They're not listening. But maybe you could talk to us more about why this is important. - Ah, this is a great question. So it's important because you might have thrown out milk just because the smell indicated it was bad. But so what, milk is not that expensive and you don't want to risk it. You know, putting that into, let's say, a cake that you spent an hour on and then you ruin the cake. So it's worth it to you to get new milk. But for the false negative, you want to move it forward in testing, because you want to assess whether or not the milk is good with a second, more reliable source, because you know your taste sensor is more reliable than your sniffer sensor. - Okay, I can see that. I mean, definitely in the case of baking. If I'm baking bread and I'm not sure if the yeast is working, I just throw the yeast out. (laughs) - And the cost of new yeast is what? - It's cheap compared to the cost I've spent doing everything else. - Aha!
So therein, I think, lies the critical use in everyday life for false positives and false negatives. If you know the reliability of how you're measuring something and you can assess the consequence of a bad decision made on a bad observation, you can better evaluate the degree to which you want to trust the measurement and whether you want to seek other lines of similar evidence. (upbeat music) (gentle music)
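The closing point, weighing how reliable a measurement is against the consequence of a bad decision, can be sketched as a simple expected-cost comparison. This is a minimal sketch under stated assumptions: the function name `should_replace`, the probabilities, and the dollar amounts are all hypothetical and chosen only to mirror the yeast and milk examples from the conversation.

```python
def should_replace(p_bad: float, cost_if_bad: float, cost_replace: float) -> bool:
    """Return True when replacing the ingredient is the cheaper bet.

    p_bad        -- estimated chance the ingredient is actually bad,
                    given whatever the sniff or look test said (assumed)
    cost_if_bad  -- what is lost if a bad ingredient gets used (assumed)
    cost_replace -- price of simply buying a fresh one (assumed)
    """
    expected_loss_if_kept = p_bad * cost_if_bad
    return expected_loss_if_kept > cost_replace

# Doubtful yeast in a loaf that took hours: a modest chance of failure
# times a large loss outweighs the price of new yeast.
replace_yeast = should_replace(p_bad=0.3, cost_if_bad=20.0, cost_replace=1.5)

# The same loaf, but the yeast is almost certainly fine: the expected
# loss is now smaller than the price of replacing it.
replace_fresh_yeast = should_replace(p_bad=0.02, cost_if_bad=20.0, cost_replace=1.5)

print(replace_yeast, replace_fresh_yeast)
```

The same comparison explains tolerating false alarms from a smoke detector: a missed fire costs so much that even a small chance of one outweighs many cheap false positives.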