Linhda and Kyle talk about Decision Tree Learning in this mini-episode. Decision Tree Learning is the algorithmic process of trying to generate an optimal decision tree that properly classifies or forecasts some future unlabeled element by following each step in the tree.
Data Skeptic
[MINI] Decision Tree Learning
(upbeat music) - Welcome back to another mini episode of the Data Skeptic Podcast. I'm here as always with my co-host Linhda. - Howdy. - How are you today, Linhda? - I'm well. - Our topic today is called decision tree learning. But before we get straight away into it, I thought it might be fun if we opened up with a game. Are you familiar with the game 20 questions? - Yes, so what that means is that someone thinks of a person or an object or a thing, and then the other person gets to ask 20 yes or no questions to figure out who or what it is. - Right, so go ahead. - Okay, is it a person? - Yes. - Are they famous? - Yes. - Is it a guy? - Yes. - Are they alive? - Yes. - Do we know them from TV? - Not really. - Do we know them from the radio? - I don't think so. - Are they internet famous? - That's not like the main reason they're famous, so no, even though you could find them on the internet. - Are they from magazines? - Sort of, but no. - Is it from books? - Yes, for sure. - Are they a writer? - No. - Are they an illustrator? - No, I have to say no. - Is there a book about this person? - About this person, I doubt it. - Is the book fact? - Yes, absolutely. - Does this person live in the US? - Pretty sure, yes. - Are they American? - Yeah. - Are they an editor? - I doubt it. That's at least not what they're mainly famous for. - Are they political? - I'm gonna go ahead and say yeah, but not like, that's not the main thing. - Are they creative? - For sure. - Are they a comic book writer? - No. - Comic book illustrator? - No. - Do we know this person? - We don't necessarily know them personally, but we have met them. - Are they a photographer? - Yes. - Is it the photographer we saw in New Orleans? - Yes. - I forgot his name. - Stephen Wilkes. - Oh, that's right, Stephen Wilkes. - Yeah, so that was fun, right? - I guess. - So I have to confess that this little game, while it was fun, it's not exactly an example of our topic for today, because the goal of the game is to get to the precise person. Whereas decision trees and decision tree learning are more often about getting to a categorical label. So for example, maybe a biologist might look at features of a species to try and classify it. Or, I don't know, a company could look at different things they know about one of their customers to try and predict if they'll be late on their payment. Or you could look at the ingredients of a dish to try and say what cuisine type it is. When you think about decision trees, what does that mean to you? - To me, I think of a chart that is in the shape of a tree in which there are choices. And when you pick a path, there are more choices. It leads you down a very specific path. And what, go ahead. - Paths that typically don't intersect, although it's possible. - And what happens at the end of that? - And you have an answer or a result. - You wind up in a leaf of the tree that has your classification answer, exactly. That's a pretty good description of a decision tree. What do you think decision tree learning might mean? - Is it learning how to make these trees? - Yeah, very much so. Decision tree learning is the process of coming up with the right tree that, given some new set of inputs, can bring you to the classification or the answer you're looking for, based on the training you've done with all the historical data you might have or other examples to provide it.
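To make the structure described here concrete, the following is a minimal sketch in Python of a hand-built decision tree and the walk from root to leaf. The "late payment" feature names and values are hypothetical, invented only to illustrate the idea of nodes, branches, and leaf labels.

```python
# A minimal sketch of a decision tree as nested dictionaries: each internal
# node names a feature to test, each branch is a possible answer, and each
# leaf carries a classification label. The features and values here are
# hypothetical, made up purely for illustration.
tree = {
    "feature": "missed_a_payment_before",
    "branches": {
        "yes": {"label": "late"},
        "no": {
            "feature": "account_age",
            "branches": {
                "under_1_year": {"label": "late"},
                "over_1_year": {"label": "on_time"},
            },
        },
    },
}

def classify(node, example):
    """Follow the branch matching the example's value at each node until
    a leaf is reached, then return the leaf's label."""
    while "label" not in node:
        node = node["branches"][example[node["feature"]]]
    return node["label"]

print(classify(tree, {"missed_a_payment_before": "no", "account_age": "over_1_year"}))
# -> "on_time"
```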
So when we thought about the game we were playing, you were trying to get to some final answer, and you started with questions that would sort of subdivide the possibilities that you could arrive at. Could you tell me a little bit about the strategy and how you tried to prune down to the leaf node you were seeking? - Well, I wouldn't encourage people to follow my strategy. - That's fine. - But, starting broad and then going more narrow. - So why would starting broad be a good thing? - Well, because you don't know the right questions to ask later on, so there's no point asking, does this person have red hair, when they would say no and it may just be a plant. - That's a good point. Similarly, you could have started with, is the person Michael Jackson, and then followed up with, is the person Barack Obama, and you would have not had much success there, because those questions don't have that much information in them to give you. It will come as no surprise, given that we're talking about this on the Data Skeptic podcast, but there are algorithms that do this process. The goal of any algorithm in decision tree learning is to come up with the structure, the right set of questions or comparisons to do, that can get you to some either classification or prediction answer. For the algorithms, there are actually two different schools of thought. There are the CART algorithms, and the other class I'll just call the C4.5 algorithms, because that's a very famous algorithm and it had a couple of predecessors, and also I think there's a C5.0, even though I've never used that. And they're very similar but also different. So in C4.5, what you're trying to do with each question is minimize the entropy, or, another way of saying that, maximize the information gain, so that with every question you ask, you exclude as many of the remaining possibilities as you can at the next step. So like if you say, you know, is it male or female, you've just pretty much cut out half of the possibilities, assuming it's a person. And the other sort of approach is deceptively similar but yet very different. It's an approach of trying to minimize the probability that you'll classify something in error with the new node you're introducing. So in other words, at the time you're saying, should I ask, is it male or female? Should I ask what color the hair is? The way you would divide your data set up then, based on whether they say red hair, blonde hair, brunette, or whatever, there's some chance of error downstream from that. So you want to construct a tree that minimizes the possibility of error. And in the other school of thought, you want to ask the questions that most subdivide the remaining options in a way that's going to get you to the answer in the quickest fashion. So let's say some very clever data scientists were working on a business problem you were somehow related to, and they talked about the Gini coefficients they were using and all these fancy approaches. Do you think you would necessarily follow their algorithmic design? - Do I understand what they're doing? - Yeah. - No. - Okay, but the output, when they come to you and they say, hey, we have this process, where first you ask, is this Q number created before October 2013? And then if yes, do this, if no, do something else. Do you think you could interpret that decision tree that they would calculate for you? - Maybe, that sounds very binary and simple though. - In general, decision trees don't have to be binary, but they can be kind of simple. Sometimes they're binary, like a yes, no.
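The two splitting criteria mentioned here can be written down in a few lines. Below is a rough sketch in Python of entropy and information gain (the C4.5-style measure) and Gini impurity (the CART-style measure); the tiny "famous or not" label lists are made up just to show a candidate split being scored.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels (the C4.5-style impurity)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels (the CART-style impurity)."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent_labels, child_label_groups):
    """How much a candidate question reduces entropy, weighted by group size."""
    n = len(parent_labels)
    weighted = sum(len(g) / n * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - weighted

# Toy example: splitting people into "famous" / "not famous" with one question.
parent = ["famous"] * 4 + ["not"] * 4
split = [["famous", "famous", "famous", "not"], ["famous", "not", "not", "not"]]
print(information_gain(parent, split))  # > 0, so the question is informative
print(gini(parent), gini(split[0]))     # impurity drops within each branch
```

A learner tries every available question, scores each candidate split with one of these measures, and keeps the best one before recursing into the resulting branches.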
Other times, like your hair color example, it could be red, blonde, brunette, black, whatever. And sometimes they work with continuous values, which are always tricky compared to categorical values, where you'd say like, what's someone's income? And you have to divide it up, like below 100,000, go to this node, above 100,000, go to some other node, above 200,000, go to some third node. But I guess the point I was driving at is that if someone calculated one of these and they did a good job, it's sort of easy to follow. You just kind of walk through it with your finger and ask the right questions. And that's one of the really cool features about decision tree learning, is that it produces these structures that generally have a certain intuitive appeal. So if, for example, I gave you a decision tree that was going to try and determine the cuisine that a particular recipe comes from. And maybe the first question is, does the dish contain rice? Because while rice doesn't perfectly say, oh, it must be an Asian dish, it's probably not a Russian dish, for example. Although, it could be like a Latin dish. - I think Russians use rice. - They use rice? - Well, those leaf wrap things. - Then it's probably not a Scottish dish, how 'bout that? - I don't know, what about shepherd's pie? What do they have in there? - I think there's rice in there. - I don't know. - In any event, there's sort of an intuitive appeal to decision trees, which is one of the nice features that people often will choose to use those for. Whereas some other machine learning techniques can produce models where it's hard for people to kind of grasp why it's doing what it's doing. And when you work with these, there are certain challenges, like how to deal with missing values and continuous variables. But overall, they're kind of a nice data structure that can easily describe a process of classification. - Yeah, no, I've seen it in, I believe, college and maybe high school. And I had some diagrams, but what you're saying here from this podcast is that it's really important. So I see that. - I don't know about important, but it's a nice technique that can be applied if you have a good data set that does some sort of classification or regression process, and you wanna come up with a nice way of labeling new elements that you haven't previously been able to identify. - So if I gave you a big Excel file with a bunch of different features, and the last column labeled each row, maybe labeled the person as good dancer or not good dancer, what are some of the columns you would look at? And would you wanna try and explore creating your own decision tree? - Based on whether or not a person's a good dancer? - Yeah, there could be columns that said like what their major was in college, how expensive their shoes are, or who knows what. Well, it could just be, does this person like dancing? Yes or no? - That's a good one. - Does this person use their spare time to go dancing? How frequently? Does this person enjoy watching TV shows with dancing? That's a huge industry. It could be, has this person ever taken a dance class? If so, how many? And if they say yes, you can ask, how many hours have you practiced dance? I mean, in the end, it's really how many hours you spend on something. - That's a good point. And maybe someone who just dances in front of a mirror for 10,000 hours might not have progressed as well as someone who, let's say, took lessons for 5,000 hours.
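For the good-dancer exercise, here is a rough sketch of what running an off-the-shelf decision tree learner might look like, assuming scikit-learn is available in Python; the column names and the tiny dataset are made up for illustration. Note how the continuous hours-of-practice column gets split on thresholds, just like the income example above.

```python
# A sketch of decision tree learning on a tiny, made-up "good dancer" dataset,
# using scikit-learn's DecisionTreeClassifier. criterion="gini" is the
# CART-style impurity discussed earlier; criterion="entropy" would use the
# C4.5-style information gain instead.
from sklearn.tree import DecisionTreeClassifier, export_text

# Columns: likes_dancing (0/1), hours_of_practice (continuous), took_classes (0/1)
X = [
    [1, 5000, 1],
    [1,  200, 0],
    [0,   10, 0],
    [1, 8000, 1],
    [0,    0, 0],
    [1,   50, 1],
]
y = ["good", "not good", "not good", "good", "not good", "not good"]

model = DecisionTreeClassifier(criterion="gini", max_depth=3).fit(X, y)

# Print the learned tree as readable if/else questions, then label a new row.
print(export_text(model, feature_names=["likes_dancing", "hours_of_practice", "took_classes"]))
print(model.predict([[1, 3000, 1]]))  # prediction for a previously unseen person
```

The printed tree reads top to bottom as a series of questions, which is the intuitive appeal being described: you can walk through it with your finger the same way you would the hand-drawn version.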
But what's neat about decision trees is, if you can gather all those possible features, and what you just did was a great process of hypothesizing what features might be useful. If you can collect all those and then go through and label some data set that says, this person is a good dancer, and here's all their data points, and this person is not, and here's their data points, then a decision tree learning algorithm can look at that data set and figure out which features you wanna look at, in what order, and how to interpret them, in order to predict whether or not a new person will happen to be a good dancer as well. - So this algorithm puts it in order for us? - Yeah, a decision tree learning algorithm, I would call it a class, because there are many different versions, like ID3, CART, C4.5, some of the ones I mentioned. Basically, it takes the training data, and it creates a tree where each node is a particular question. It says, look at this feature, this data point. If the value is A, B, or C, then go this way, to the left, let's say; if the value is D, E, or F, go to the right. And to the left and to the right are two other nodes that ask a different question. Maybe on the left it asks what type of shoes you own, on the right it asks how old you are. And then from there on down the line, you keep following the correct branch that matches the value of the new element, the new row you're interested in, until you arrive at the leaf, which will give a prediction for which category this new row falls into. So having heard this, do you think you would ever try and apply any decision tree learning in your career? - Learning? - Yeah. - Like, am I gonna be learning how to apply these algorithms? - No, no, maybe you'd decide that you have some particular problem, some classification problem, let's say, and a decision tree might be a good approach to it. - Well, I think we established earlier on that that's what we're using when we play games like 20 questions. So it's already part of people's natural logic. - I would say so, yeah. That's why the intuitive nature of decision trees makes them a really nice classification algorithm. (upbeat music) (gentle music)