Archive.fm

The Bold Blueprint Podcast

The Bold Blueprint Avideh Zakhor Overcome the Fear of Failure

Failure is an inevitable part of the journey to success, but it’s also one of our greatest teachers.

Broadcast on:
09 Oct 2024
Audio Format:
other

Hey Amazon Prime members, why pay more for groceries when you can save big on thousands of items at Amazon Fresh? Shop Prime exclusive deals and save up to 50% on weekly grocery favorites. Plus save 10% on Amazon brands like our new brand Amazon Saver, 365 by Whole Foods Market, Aplenty and more. Come back for new deals rotating every week. Don't miss out on savings. Shop Prime exclusive deals at Amazon Fresh. Select varieties. We wear our work day by day, stitch by stitch. At Dickies, we believe work is what we're made of. So whether you're gearing up for a new project or looking to add some tried and true work wear to your collection, remember that Dickies has been standing the test of time for a reason. Their work wear isn't just about looking good. It's about performing under pressure and lasting through the toughest jobs. Head over to Dickies.com and use the promo code workwear20 at checkout to save 20% on your purchase. It's the perfect time to experience the quality and reliability that has made Dickies a trusted name for over a century. Okay, let me get started. What happened to the population of our class? I get it. They died. Okay. That's right. So just a couple of announcements before I get started. Homework number 3, on tomography, is due on Friday, and I'll be assigning a new homework on Friday, which is the 17th, which will be due on the 24th. Just to get a quick show of hands, how many people have finished the lab assignment? Oh, you have. Very good. How long did it take you, roughly? Oh, a lot. I probably spent like five hours doing the wrong thing, and then I went to office hours and figured it out. There are a lot of tricks with the FFT and what you're allowed to do with it. Okay, let's see. So what I'm going to talk about today is, well, overall, how many hours did it take you, beyond the five hours that you were on the wrong track?
Okay, the whole thing, including that? Oh, that's nothing. That's excellent. Okay. Okay, go see Cindy; she has office hours Tuesdays and tomorrow, and you can also catch her after class today if you're available. Okay, so what I'm going to talk about today is a few more words about FFT and how we compute it. Then I'll talk a little bit about a recursive method of transposing a matrix, because it relates to the row column technique, and then I'll move on to two-dimensional filter design and say a few words about zero phase filters, linear phase filters, FIR filters. I'll talk about a couple of well-known FIR filter design techniques and their extensions from 1D to 2D, and then possibly talk about the transformation technique by McClellan in Friday's lecture. And that kind of wraps up our discussion of Jae Lim's book. What I plan to do, hopefully by next Wednesday, because we are approaching the end of February, is come up with a handout of possible projects or literature search topics that you could spend time reading about in the field, et cetera. I'll try to get that out, as I had said before, by the end of February. If this Friday is the 17th, next Friday is the 24th; let's say by next Friday I'll try to do that. And then once you receive that sheet, I want to give you two to three weeks to look at the topics I've proposed and then come up with your own proposal. Again, this project/term paper you do should not be all encompassing. It's not supposed to chew up hundreds of hours of your time or anything like that. I will put in some guidelines of roughly how many hours on average I expect you guys to spend.
I'm going to look at the rest of the semester and map out the remaining labs and homework that's to be done. For the weeks that you don't have a homework, where on average you'd be spending eight or nine hours on the homework, it's meant to be that same number of hours a week, no more, no less, okay? So don't think of it as a gigantic project. And of course, I encourage you very much to team up after I propose the topics. If you want to do a project and you think that by teaming up you can accomplish more and show something more real, that's perfectly fine. But meanwhile, in the back of your own mind you should also be thinking about what project you might want to be doing. If you're doing something for your thesis that you think relates to image processing, of course I'm happy to receive that proposal as well. Okay, any questions or comments before I get started? All right, so what we talked about last time was two methods of computing the DFT. One of them was the direct method: as we talked about last time, you just sum up the values in the double summation, and it ends up requiring approximately n to the fourth multiplications and n to the fourth additions for an n by n sequence. Then we talked about row column decomposition, and what we realized was that it requires roughly on the order of n squared log base 2 n operations. So let me write these numbers down. Computing the DFT of an n by n image, if you use the direct method, takes roughly n to the fourth operations. If you use row column decomposition, you get roughly n squared log base 2 n operations. Yeah. Yeah, when we did the direct method, it was very obvious to me how we got that number n to the fourth, but I don't see very obviously how you get n squared log base 2 n for this other one. So if you go back (I didn't bring the lecture notes; I could fire up the computer to look at the notes from last time), basically what we discovered was you had to do n one-dimensional DFTs across the columns, right?
And each DFT is n log n, right? And then once you did the columns, you have to do n more one-dimensional DFTs, each one being n points long. So that's another n squared log n, and you add the two. Okay. Do you follow that? So to do the columns, let me just flash this quickly. Can you bring this thing down? In the first step, you have n columns, each column n points long. So n log n, n log n, n log n, and there are n of them. So it's n squared log n. And then, sorry, initially you do rows, or you can do first columns, then rows, or vice versa. To begin with, you do n rows, each one n points long, so n times n log n. And then once you do the rows, you do n columns, and that's the other n squared log n. So you add them up and you get on the order of n squared log n operations. Is that more clear? Yeah. Okay. So, just so that you have a feeling for what these numbers are all about, suppose that you have n approximately 1,000. So you're dealing with a 1,000 by 1,000 image. The direct method would then require approximately n to the fourth, which is 10 to the 12 operations. And if you're doing row column decomposition, you would require n squared log base 2 n: that's 10 to the 6 times log base 2 of 1,000, which is about 10, so approximately 10 to the 7. Comparing these, you get 10 to the 12 over 10 to the 7, which is 10 to the 5. So for a 1,000 by 1,000 point image, the saving factor is 100,000 times. So if one of them takes one second, the other one takes 100,000 seconds. Divide 100,000 seconds by 60 to get minutes: roughly 1,700 minutes. Divide by 60 again to get hours: roughly 28, call it 30 hours. So what that means is that if the row column decomposition takes one second, the direct method takes about a day and a quarter.
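To make the row column idea concrete, here is a small sketch (my own illustration, not code from the lecture; the function names are made up) that computes the 2D DFT both ways, via the direct matrix form of the double summation and via 1D FFTs on rows then columns, and checks that they agree with NumPy's fft2:

```python
import numpy as np

def dft2_direct(x):
    """Direct 2D DFT of an N-by-N input: evaluates the double summation
    (O(N^4) if done point by point); here written in matrix form W x W."""
    N = x.shape[0]
    n = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)  # N-by-N DFT matrix
    return W @ x @ W  # W is symmetric, so W x W^T = W x W

def dft2_rowcol(x):
    """Row column decomposition: N row FFTs, then N column FFTs,
    roughly N^2 log2 N operations in total."""
    step1 = np.fft.fft(x, axis=1)     # 1D DFT of every row
    return np.fft.fft(step1, axis=0)  # 1D DFT of every column

x = np.random.rand(8, 8)
assert np.allclose(dft2_direct(x), dft2_rowcol(x))
assert np.allclose(dft2_rowcol(x), np.fft.fft2(x))
```

The two methods produce the same answer; only the operation count differs, which is the whole point of the comparison above.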
Which is a big saving. If it's a one-shot thing, you can wait a day and a quarter to compute the answer. But if it's an iterative algorithm where you keep refining something... is part of your tomography homework an iterative algorithm? Are they doing the iterative stuff? Oh, okay. Anyway, if you're doing iterative techniques where you have to recompute the DFT, and each iteration takes a day and a quarter, you're in deep trouble. Furthermore, if you're doing video and you're supposed to do 30 frames per second, if each frame of yours takes a day and a quarter, you'll never get done. You'll never keep pace. Anyway, there's a big difference between direct and row column decomposition.
One thing that I didn't really mention when I was doing row column decomposition, and I will mention now, is this: if you're doing a row column decomposition, and the best is to show this picture, really, there's a matrix transposition step in the middle that we never talked about. So what do I mean by that? Well, let's look at the pictures. You start with your image like this. Remember that when you're dealing with storage on a computer, the computer doesn't store things in a two-dimensional array. I mean, you specify a two-dimensional array, but on the disk it stores things, let's say, row by row, okay? So the way the numbers are physically stored on the disk is this row followed by this row followed by this row followed by this row. So when you want to do the operation, you read a row, you take the one-dimensional DFT of it, and then you write it into a different matrix. You read the second row, you take the one-dimensional DFT of it, and conceptually that's the second row of this F of k1 and n2, so you concatenate it with the first. And after you've stored these things, you've now stored F of k1, n2 in a row by row fashion: this row followed by this row followed by this row followed by that row, right? And the next step is to do column by column DFTs. You want to take this column, this column, this column, this column. Now these points are not contiguous anymore. On a linear disk, this point and this point are in fact n points apart, and this point and this point are n points apart, okay? So in some sense, there's some IO overhead involved in collecting these data points together, making a one-dimensional array out of them, and taking the one-dimensional DFT. And then after you take the one-dimensional DFT, now I store my data in this manner: this column, followed by this column, followed by this column.
So on my disk, it's this, followed by this, followed by this, followed by this. And this is the transpose of what I started off with. In other words, I assumed my data was row by row to begin with, and my final two-dimensional DFT ended up in a column by column fashion. It's all an artifact of the fact that the disk in a computer is a one-dimensional entity, not a two-dimensional matrix. So essentially, the data has been transposed from a row by row manner to a column by column manner. Is everybody clear on that? So at the end of the day, you need to do a matrix transposition. And one way of doing the matrix transposition is Eklundh's algorithm, which I'll talk about; it's a recursive method of doing a matrix transpose. So let me just briefly talk about that. Where's my main sheet? Okay, here we go. So, one concept, or, can you rotate, please? Thank you. The basic thing that I just talked about is that row column decomposition requires a matrix transpose. And the issue is, if you do this matrix transpose the usual way, you might end up having too many IOs. So the question is, how do we do the matrix transpose in an IO-efficient way? That's the key question you want to ask yourself. And the answer is, there's a technique called Eklundh's algorithm for transposition. This is a recursive algorithm, and if you start with an n by n matrix, then you need log base 2 n stages, as you would imagine, for the thing to get completed. So basically, Eklundh's algorithm is a fast, non-IO-intensive way of doing a matrix transpose. The reason it has log base 2 n stages, and it's a recursive algorithm, is that the basic idea behind it is divide and conquer, similar to FFT. And it is based on the following observation, if you can roll up just a little bit.
So the key observation you need to make is the following. I have a matrix A, which I can divide into four submatrices: A11, A12, A21, A22. It doesn't matter how big A is. A could be 1,000 by 1,000, in which case each of these is 500 by 500. So let's say A is n by n. Then A transpose is the transpose of each of these pieces: A11 transpose here, A22 transpose there. But not A12 transpose here. You first switch the places of the two off-diagonal guys before you transpose them. So up here it would be A21 transpose, and down here A12 transpose. And the easiest way of seeing that, in a simple-example kind of way, is this: suppose A is a 2 by 2 matrix, x11, x12, x21, x22. Then the transpose of it, we all know, is x11, x21 on top and x12, x22 on the bottom. So that's a 2 by 2 example. But you can also convince yourself with a 4 by 4 or 8 by 8 or larger size picture. So I'm going to now resort to the nice pictures in Jae Lim's book, so I don't have to draw these pictures myself, because that takes a long time and it will never be as nice as this. So if you can zoom in, great. Suppose that I have a matrix like this and I want to transpose it. Can you focus it please so people can read the letters a little bit better? Okay, great. Thank you. So these are the elements of my matrix, the thing I want to transpose. And it's x11, x12, x13, all the way to x18 here. And x21, x22, then x31, x41, et cetera. And so given the matrix identity that we just wrote down, that the transpose of A is the transpose of each piece except that the off-diagonal blocks are swapped, what does this tell you our approach is? We've started with a problem that was n by n, and we've decomposed it into four problems, each one n over 2 by n over 2, plus a little overhead. And what's that little overhead? Well, in this picture, the little overhead is the fact that this is A21 and this is A12: I had to swap the places of these two things. If the answer had just been the transpose of each block in place, then I would have perfectly divided an n by n problem into four n over 2 by n over 2 problems and I'd be done; I'd just keep doing it and finish. But there's a little bit of overhead, and that overhead comes from the fact that these two guys have to get swapped. And it's the swapping of these two guys that causes the whole algorithm to have a non-zero amount of work involved in it. So in each step, you can imagine, what I'm going to do is start with this, swap these two guys with each other, and now I've decomposed my problem into four n over 2 by n over 2 problems. Then I go to each one of these little guys and I repeat the same thing: I swap the off-diagonal blocks of this guy, of this guy, of this guy, and I keep doing that until I'm done. Is everybody clear what the basic idea is?
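The quadrant-swap recursion can be written down in a few lines. This is my own in-memory sketch of the idea, not the lecture's disk-based, stage-by-stage implementation, and it assumes n is a power of two:

```python
import numpy as np

def eklundh_transpose(A):
    """Recursive block transpose: swap the off-diagonal quadrants,
    then transpose each quadrant in place. Assumes A is n-by-n with
    n a power of two."""
    n = A.shape[0]
    if n == 1:
        return A
    h = n // 2
    A = A.copy()
    # Swap A12 and A21 (the overhead step the lecture talks about).
    A[:h, h:], A[h:, :h] = A[h:, :h].copy(), A[:h, h:].copy()
    # Recurse into each n/2-by-n/2 quadrant.
    A[:h, :h] = eklundh_transpose(A[:h, :h])
    A[:h, h:] = eklundh_transpose(A[:h, h:])
    A[h:, :h] = eklundh_transpose(A[h:, :h])
    A[h:, h:] = eklundh_transpose(A[h:, h:])
    return A

A = np.arange(16).reshape(4, 4)
assert np.array_equal(eklundh_transpose(A), A.T)
```

The point of the real algorithm is that the swaps can be organized so each stage touches every row only once, which is where the IO count below comes from.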
So the question is, that swapping that we do, how much does it cost us? It's kind of like FFT. In FFT you decompose an n-point problem into two n-over-2-point problems plus a little bit of overhead. It's that overhead that causes FFT to have a non-zero number of operations: the multiplication by the twiddle factors, et cetera. And here it's the swapping of A12 with A21 that's the overhead we have to keep track of, and that causes the whole matrix transpose to have a non-zero number of operations. Is that clear? Okay. So coming back here, if you can zoom in one more time, great. So staring at this picture here, this is the first stage. All we have to do is move this block into this block and this block into this block in order to get something that looks more like this. Can you zoom out just a tiny bit? Thank you. So in essence, this square of numbers, this array of numbers here, as a whole, imagine it being moved from upper left to lower right. x81, x82, x83, x84 have been written here. This row is here, this row is here, and this row is here. And furthermore, this square has moved up here. So now you've got x45, x46, x47, x48 up here. So the first step involves that. And then in the next step, we have to swap the off-diagonal 2 by 2 blocks within each quadrant. This picture here is primarily for illustration purposes. This relationship is still very much true; these are the two blocks that get swapped. He's just chosen to index his matrix elements the other way. It's just notation he got slightly wrong in the book. Right, yeah, he shouldn't have done that. I knew that question would come up, but the way he's indexed things is slightly off. The basic idea is the same.
You swap these two things, then you have smaller matrices. You swap those, and then you come down to 2 by 2 matrices, and then you're done. So you can easily see that if you have an n by n image or n by n matrix and you want to transpose it, can we all intuitively see how many stages there are in this transpose? Here we started with 8 by 8, and we have stage number one, stage number two, and stage number three. So you can easily see that log base 2 n stages are needed until we get to matrices that are small enough, just 2 by 2. And furthermore, at each stage, the number of IOs is fixed. Let's just, for the sake of convenience, compute the number of IOs that we need to move this square here and this square here real quickly. How many IOs do I need? Well, I need one IO to read this entire row. I realize I only need half the elements, but whenever you do a disk IO, as I'm sure you know, you always end up reading more numbers. Even if I want to read just four numbers, there's a minimum number of elements that a read returns to me. I don't care about the remaining numbers because I'm not going to touch them, but I get those anyway; that's for free, essentially. But let's not get sidetracked with such details. Anyway, I read this row, and I read this row, and I swap them. Let's say I put this row in memory, and I put this row in memory; then I write the first four elements of this row here, and I write the first four elements of the other row there. Did everybody follow me? So there's a total of two reads and two writes. That's all I'm trying to get at. Read this row, read this entire row, store them somewhere, let's say in cache or something, and then write back the four elements of this row up here, and write back the four elements of that row back here.
So there are four IOs needed per row pair. And how many row pairs do I have to do this over? One, two, three, four; in other words, n over 2. So four IOs per row pair and n over 2 row pairs: four times n over 2 is 2n IOs. And you can convince yourself by looking at this diagram that the number of IOs in every stage continues to be 2n. So having seen that, let me come back here and write some of these things down. So it's an iterative kind of algorithm. As I said, the number of stages of decomposition is log base 2 n, and in each stage we end up doing 2n IO operations. As an example, look at the first stage: I have to do four IOs per row pair, and I have n over 2 row pairs. Multiply these two things together and I get four times n over 2, which is 2n IOs. And I realize I'm writing too small now, you might not be able to read it, so let me make this bigger. So it's 2n IOs per stage. If I have 2n IOs per stage and log base 2 n stages, I can put these together and write down that the total IO count is 2n log base 2 n. So this is how much Eklundh's algorithm requires for matrix transposition. Okay. So how is this used in the row column decomposition? Well, again, you start with your data, you do your row DFTs, okay, and then you do a transpose, and then you do column DFTs, followed by another transpose to get the data back into row order. So you do the transposition essentially twice. Yeah. Yeah, so you were talking about 2n IOs per stage, but it seems like as you go to the second stage, you're still going to have to do the two reads and the two writes, but there are now going to be n over 4 rows instead of n over 2 rows. Let's go back; let me try to bring the book. Actually, you know what? No, because there are going to be more squares, so it works out. Right, right. So, is everybody clear about this whole argument? Okay, so essentially, this transpose is going to take 2n log base 2 n IOs, and this one is going to take 2n log base 2 n IOs. And you might say, well, how about the IOs needed for the row DFTs and the column DFTs? Well, those are minor compared to 2n log n. Let's just go over that. Each time I want to do a one-dimensional DFT of a row, I do one read, and then I do one write, right? And I have n of these guys, so I have to do n reads and n writes. So this stage really requires 2n IOs, and the same thing here after the transposition. So the dominant thing here, the total IO count, if I were to count it, would be 4n plus 4n log base 2 n. And many times this term is going to be much bigger than that one, so it can be approximated as just 4n log base 2 n.
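As a sanity check on the counting argument (my own illustration, not from the lecture): at every stage, each row participates in exactly one swap pair, each pair costs two reads plus two writes, so the per-stage count stays at 2n no matter how small the blocks get, and the total comes out to 2n log base 2 n:

```python
import math

def transpose_io_counts(n):
    """Per-stage IO counts for Eklundh's transpose of an n-by-n matrix
    (n a power of two). At stage s the blocks being swapped are b rows
    apart, b = n / 2^s; every row still sits in exactly one swap pair."""
    stages = int(math.log2(n))
    counts = []
    for s in range(1, stages + 1):
        b = n // 2**s                   # half the block size at stage s
        row_pairs = (n // (2 * b)) * b  # = n/2 at every stage
        counts.append(4 * row_pairs)    # 2 reads + 2 writes per pair
    return counts

counts = transpose_io_counts(1024)
assert all(c == 2 * 1024 for c in counts)  # 2n IOs at every single stage
assert sum(counts) == 2 * 1024 * 10        # total: 2n log2(n) IOs
```

This is exactly the answer to the question above: the blocks shrink, but the number of blocks grows to compensate, so each stage still reads and writes every row once.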
This is the IO count needed for the row column decomposition. So I'm going to just write down: see Figures 3.20 and 3.21 of Jae Lim for the details of Eklundh's algorithm, because I didn't want to write down all the matrices and the numbers and the rest of it. By the way, Eklundh is a Scandinavian name. Do we have a Scandinavian person in our class? We did last time, by the way, and he corrected my pronunciation. That was three years ago; I've already forgotten what they said. Eklundh, I can't remember now. We have a Swiss guy, actually, Sam, but he's not here today; he could have corrected all of our German and French mistakes. Okay, any questions or comments on this? So, in general, in signal processing, there's a lot of this notion of divide and conquer and recursive algorithms: you start with something big, divide a problem of size n into two problems of size n over 2, and keep recursing. It happens all the time. And in our undergraduate curriculum, Alan Goop can probably attest to this better than I can, we have CS170, an algorithms course. Do they talk about recursive algorithms, divide and conquer? And they talk about FFT as well, right? They ran out of time. Oh boy, okay. Huge mistake. Graph algorithms, how about sorting and stuff like that? They don't cover sorting either. Some other class, okay. All right. But we have a great deal of respect in signal processing for FFT, because it's really what revived the field in 1965. Before that, people couldn't compute the discrete Fourier transform fast enough. Again, because of the factor that I just talked about, the 10 to the 5 here: if you do the direct method, you end up chewing up too much time, especially with the computers that existed then. And of course, you could argue, computers are so fast today, why do you care about reducing computation? Well, the answer is, yeah, the computers are fast, but how about your cell phone?
You don't want your hands to melt as your phone is computing FFTs of the images it receives or whatever it's trying to do. So low power, low complexity is always of importance. Next, I'm going to quickly move on to the third method of doing FFTs. So far we've talked about direct and row column. Now I want to talk about the vector radix FFT. As I mentioned last time, the vector radix FFT is kind of the true translation of divide and conquer to 2D. It does result in a lower operation count than row column decomposition, but it also results in a computer program that's a lot harder to write, a lot more complicated, with a lot more intricacies than row column decomposition. So in practice, I bet you, for example, that the 2D FFT algorithm you're using in MATLAB does row column decomposition and not vector radix. But let me explain the basic idea behind it anyway. Before doing that, I want to recall the 1D case just for completeness. You start with the sequence x of n, and you divide it into even points and odd points. Just as a reminder, when you do a one-dimensional FFT, there's decimation in time and there's decimation in frequency; for now, I'm just talking about decimation in time, right? So divide it into even points and odd points. I'll call this sequence x sub e of n, and this sequence x sub o of n. So this is an n over 2 point sequence, this is an n over 2 point sequence, and x of n is an n point sequence. I compute the DFTs of these two things, and then I get the outputs of these guys, and I'll just show the first output of this box and the first output of the other box, and you do a butterfly type operation. First, you multiply the bottom one by a twiddle factor, W N k, where W N k is e to the minus j 2 pi k over N. That's the definition. So this is called the twiddle factor.
You multiply this number by the twiddle factor, and then you shove it into a butterfly. That means this number gets added to this number here to give this output, and then this number gets subtracted from this number to give this other output. This could be the kth output and the k plus n over 2 output. So this guy has n over 2 outputs. I take the kth one, multiply it by W N k, add in the butterfly to get the kth entry of the final answer, and then subtract in the butterfly to get the k plus n over 2 entry. And I keep doing that for k from 0 all the way up to n over 2 minus 1. So is this picture familiar to most of you? Not to you? Watch the EE123 lectures; there's one lecture in EE123 that talks about all the details of FFT, and I think it's a pretty entertaining lecture. If nothing else, it allows you to get a good night's sleep; you can fall asleep in front of it. Just joking. So is everybody, I guess, other than Chris, familiar with this picture? Are there more people who don't know this? Just to give me an idea of how much detail I should provide right now. I need honest feedback. Okay. So let me do the 2D version.
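Before the 2D version, the 1D decimation-in-time recursion just described can be sketched in a few lines (my own illustration, not code from the lecture; NumPy's fft is used only as a reference check):

```python
import numpy as np

def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT (length a power of two)."""
    N = len(x)
    if N == 1:
        return np.asarray(x, dtype=complex)
    E = fft_dit(x[0::2])             # N/2-point DFT of the even samples
    O = fft_dit(x[1::2])             # N/2-point DFT of the odd samples
    k = np.arange(N // 2)
    W = np.exp(-2j * np.pi * k / N)  # twiddle factors W_N^k
    return np.concatenate([E + W * O,    # X[k], the butterfly sum
                           E - W * O])   # X[k + N/2], the butterfly difference

x = np.random.rand(16)
assert np.allclose(fft_dit(x), np.fft.fft(x))
```

The two concatenated halves are exactly the kth and k plus N over 2 outputs of the butterfly described above.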
That will make things a little more clear, because that way I kill two birds with one stone: people who haven't seen the 2D version will understand it, and I think the 2D case itself can explain how the divide and conquer idea works. So in two dimensions, you have a sequence G of N1 and N2. I'll just show it for a very small case. Suppose this is a 4 by 4 sequence. I'm going to divide G into four sub-sequences. G itself is 4 by 4. The sub-sequences are GEE of N1, N2, GEO of N1, N2, GOE of N1, N2, and GOO of N1, N2. E stands for even and O stands for odd. I'm going to show each of these sequences with a different symbol here. E in the first index means that it's even in N1; the second index refers to N2. So the GEE sequence is going to have even indices for N1 and even indices for N2. So if this is 0, 1, 2, 3 and 0, 1, 2, 3, then GEE is this: even index in N1 and in N2. And then GEO is even in N1, odd in N2. I'm going to use this upside down triangle to show that. So that will be this one: even in N1, so 0, this one, this one, and this one, and odd in N2. So GEO is this guy: even in N1, odd in N2. And then GOE, I'm going to put the right side up triangle, which is odd in N1, even in N2. So it'll be this guy, this guy, this guy, and this guy.
And finally g_oo, which I'm going to use this symbol for, which is odd in n1 and odd in n2. So it'll be this guy, this guy, this guy, and this guy. So it's a similar idea: I started off with an n by n problem, and I've now decomposed it into four n/2 by n/2 problems. So the question is, how is the DFT of g related to the DFTs of these other guys? Okay, so if you can roll up just a little bit. So the question is, how is the DFT of g related to the DFTs of g_ee, g_eo, g_oe, and g_oo? Remember, at the end of the day, the FFT is just a method to compute the DFT, right? These become pretty tedious to draw, so again, let me resort to the book, because it's got plenty of good pictures; if I were to attempt to draw it here, it would take me much longer. So let me even write down what I'm going to show: Figures 3.23, 3.24, and 3.25 of Jae Lim's book. Fantastic, okay. Okay, so this is 3.23. So if you can zoom in, okay, great. And are we in focus? Yeah, mostly. So this is, oh, he uses an x instead of g; the name of the sequence doesn't really matter. So it's again a 4 by 4 sequence: x of (0, 0), x of (1, 0), (2, 0), (3, 0), et cetera. And then he subdivides it into these four subsequences: g_ee, g_oe, g_oo, and g_eo. Each one of these is now a 2 by 2 sequence, right? So we started with a 4 by 4 sequence, and we divided it into four 2 by 2 sequences. In general, you start with an n by n sequence, and you divide it into four n/2 by n/2 sequences. In one dimension, you start with an n-point sequence and divide it into two n/2-point sequences; here, it's four n/2 by n/2 sequences. And the idea now is that I take the 2 by 2 DFT of each of these, and I now get this picture: the 2 by 2 DFTs of g_ee, g_eo, g_oe, and g_oo.
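The even/odd subdivision being described here can be written down compactly. A small sketch of my own, splitting an n by n array into the four subsequences with stride-2 slicing:

```python
def split_polyphase(g):
    """Split an n-by-n list of lists into the four even/odd subsequences."""
    g_ee = [row[0::2] for row in g[0::2]]  # even n1, even n2
    g_eo = [row[1::2] for row in g[0::2]]  # even n1, odd  n2
    g_oe = [row[0::2] for row in g[1::2]]  # odd  n1, even n2
    g_oo = [row[1::2] for row in g[1::2]]  # odd  n1, odd  n2
    return g_ee, g_eo, g_oe, g_oo
```

For a 4 by 4 input, each of the four pieces comes out 2 by 2, matching the figure.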
And I combine these four numbers in order to form the output here, here, here, and here. And I combine these four numbers that have a square around them to form this number, this number, this number, this number. And I combine these four numbers here to get this point, this point, this point, this point. And I combine these diamond-shaped four numbers here to get the diamond-shaped points. And the reason I can do that is because, coming back here, the answer to the question that I posed is as follows. Let me have a consistent notation with the notes I have here and call the full sequence x; x gets decomposed into the g subsequences, et cetera. The DFT of x is related to the DFTs of these four subsequences in the following way. X of (k1, k2), the 4 by 4 point DFT of x, is G_ee(k1, k2), which is the 2 by 2 DFT of g_ee, plus G_eo(k1, k2) times e^(-j 2 pi k2 / n), plus G_oe(k1, k2) times e^(-j 2 pi k1 / n), plus G_oo(k1, k2) times e^(-j 2 pi k1 / n) times e^(-j 2 pi k2 / n). Because g_oe is odd in n1, you get k1 in its exponential; because g_eo is odd in n2, you get k2. And g_oo picks up two twiddle factors, e^(-j 2 pi k1 / n) and e^(-j 2 pi k2 / n), okay? Here capital G_ee is the n/2 by n/2 point DFT of little g_ee, G_oe of g_oe, G_eo of g_eo, and capital G_oo is the n/2 by n/2 point DFT of g_oo. And so pictorially, what this looks like is as follows. So let's stare at this and draw a conclusion from it. I wanted to do an n by n DFT of this guy, and I decomposed it into four n/2 by n/2 DFTs plus a little extra factor. Where's the little extra factor? It's these multiplications: this times that, this times that, this exponential times that. It's that extra little bit that adds up and results in a non-zero total number of computations. If I could just do it without these twiddle factors, boy.
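This combination formula is easy to check numerically. Below is a sketch of my own, not the book's code: a direct 2D DFT, and one vector-radix stage that combines the four n/2 by n/2 DFTs with the twiddle factors. The n/2-periodicity of the sub-DFTs handles output indices k1, k2 at or above n/2:

```python
import cmath

def dft2(g):
    """Direct 2D DFT of an n-by-n sequence (O(n^4), fine for tiny n)."""
    n = len(g)
    W = lambda k, m: cmath.exp(-2j * cmath.pi * k * m / n)
    return [[sum(g[n1][n2] * W(k1, n1) * W(k2, n2)
                 for n1 in range(n) for n2 in range(n))
             for k2 in range(n)] for k1 in range(n)]

def vector_radix_step(g):
    """One vector-radix stage: combine the DFTs of the four subsequences."""
    n = len(g)
    sub = lambda r0, c0: [row[c0::2] for row in g[r0::2]]
    GEE, GEO = dft2(sub(0, 0)), dft2(sub(0, 1))   # even/even, even/odd
    GOE, GOO = dft2(sub(1, 0)), dft2(sub(1, 1))   # odd/even, odd/odd
    w = lambda k: cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor
    X = [[0j] * n for _ in range(n)]
    for k1 in range(n):
        for k2 in range(n):
            a, b = k1 % (n // 2), k2 % (n // 2)  # sub-DFTs are n/2-periodic
            X[k1][k2] = (GEE[a][b] + w(k2) * GEO[a][b]
                         + w(k1) * GOE[a][b] + w(k1) * w(k2) * GOO[a][b])
    return X
```

Comparing this against the direct 2D DFT for a small sequence confirms the identity term by term.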
Done. I just keep decomposing until I get down to one by one or two by two, and boom, finished. But in each stage, I have to do a little extra work, which is these twiddle factors, which results in non-zero computation. So let me pictorially show what this summation looks like. This is Figure 3.25 of Lim's. Can you zoom in as much as you can? I want to see the whole picture, though. Oh yeah, a little bit more in. Okay, I guess there's finite granularity in how much we can control the camera. So here we go, fine. That's right, it's quantized. Well, actually, it's not quantized to predefined levels; it's the delta that you can move it by that's coarse, okay. Now we've challenged our operator, okay, that's excellent. Can you guys read these letters? Yeah. Okay, so this is g_ee, this is a four-point sequence. This is g_oe, this is g_eo, and this is g_oo. Each one is a four-point sequence.
I shove them into a 2 by 2 point DFT box, and now these are capital G_ee, capital G_eo, capital G_oe, and capital G_oo; these are the DFTs of the input sequences. And this box here shows that summation that I wrote down with the exponentials in a nice way. So the output of G_ee you don't multiply by anything. The output of G_oe you multiply by a twiddle factor, which is e^(-j 2 pi k1 / n1); for G_eo it's e^(-j 2 pi k2 / n2). And again, Jae Lim is assuming the sequence is n1 by n2 points; we're assuming it's n by n, but that doesn't matter. And the output of G_oo has two twiddle factors, W_n1^k1 and W_n2^k2. And then you go into what's called the butterfly, okay, to compute this output, this point, this point, this point. This is a butterfly with four inputs and four outputs, okay. So in general, every butterfly has four inputs and four outputs. So if I were doing, I don't know, a 32 by 32 sequence, then each one of these g guys would have been 16 by 16. So I'd have 16 butterflies, right, each with four inputs and, oh, that didn't make any sense. If I had a 32 by 32 sequence x, each one of these guys would be 16 by 16, so it would have 256 outputs; okay, that makes sense now. So then I'll have 256 butterflies, each with four inputs and four outputs, and that would span all my output points. Is that clear? What I'm trying to say is that there's a little bit of a problem with giving a 4 by 4 example, because from it you can't really tell whether the butterfly is four by four because the sequence happened to be that size. The answer is no: the butterflies always have four inputs and four outputs, and they always have this kind of structure. Is this point clear? So this is an exact translation of the summation that we wrote here, except that he's using the notation W_n1^k1 instead of the exponential.
So this butterfly gives us four output points, and then another butterfly would combine this point, this point, this point, and this point to give you x(0,0), x(2,0), x(0,2), and x(2,2). Another butterfly would combine the third point of each of these to give the third set of outputs, and finally the fourth butterfly would combine these four points to get these four outputs. So that's kind of the story behind it. So staring at this thing, once again: how many stages are there, and how much work do we do in each stage? So how many stages is it? Well, if you have an n by n image, first you go into four n/2 by n/2s, and then each one of the n/2 by n/2s becomes four n/4 by n/4s, and each one of the n/4s becomes four n/8 by n/8s, and on and on and on. So how many stages does that result in? That's it, exactly log2 n. It doesn't matter that it's n by n; it's the same thing: n becomes n/2, then n/4, then n/8, et cetera. So coming back here, if you can bring the camera, thank you. Okay, so great. So you start with an n by n, then it becomes four n/2 by n/2s, and then each n/2 by n/2 becomes four n/4 by n/4s, and then each n/4 becomes four n/8 by n/8s, so it keeps on going. Therefore, you can easily see that the number of stages for an n by n image is just log2 n. Okay? And in each stage, how many butterflies do I have? Well, in general, there are n squared inputs and n squared outputs in each stage. Each butterfly has four inputs and four outputs, and no two butterflies share inputs and outputs. Therefore, there are n squared over 4 butterflies per stage. Okay?
So if I look here, let me write down some of the things I said. Each butterfly has four inputs and four outputs, and furthermore, they don't share inputs and outputs. That's why you get n squared over 4 butterflies per stage. And you can easily convince yourself of that by staring at this one more time: there are 16 inputs, and I'm showing one butterfly; you could have two, three, four. So 16 over 4, four butterflies. And once you've done that, then you can decompose each of these things further until you're done. So combining these two ideas together, the fact that we have log2 n stages, can you roll up just a little bit? So we have log2 n stages, and we have so many butterflies per stage. So the next question becomes: how much computation per butterfly do we need? Okay. Well, that's a good question. If I look at this decomposition here, it shows me how the n/2 by n/2 DFTs compose into the n by n DFT, and I translate that pictorially into this. This is the structure of a butterfly. It has four inputs. In general, the (k1, k2) points of G_ee, G_oe, G_eo, and G_oo get combined to result in x(k1, k2), x(k1 + n/2, k2), x(k1, k2 + n/2), and x(k1 + n/2, k2 + n/2), right? So this is the structure of my butterfly: it's got four inputs and four outputs. And what's the number of operations that's needed? How many multiplies do I need in this butterfly? [INAUDIBLE] Anybody? Right. Just three: this one, this one, and that one. I don't count multiplication by minus one, because you don't have to multiply by minus one; it just means that there's a subtraction going on.
So we need three mults. And how many adds, meaning adds or subtracts, do we need to compute this butterfly? For each output point, there are four arrows coming in; it's an addition of four numbers. So how many adds do you need to add up four numbers? Three, right? So three for this guy, three for this guy, three for this guy, and three for this guy. Three times four: 12 adds. So looking at this butterfly picture, we need 12 adds and 3 mults. Okay, can you roll up? Can you zoom back, please? Thank you. Okay, so now I'm going to put all of these things together. I have log2 n stages. I have n squared over 4 butterflies per stage. And I have 12 adds and 3 multiplies per butterfly. So multiplying all of these things together, the number of adds is n squared over 4 times log2 n times 12, and the number of mults is n squared over 4 times log2 n times 3. Okay? And in a second, I'm going to show you another trick. So the adds come to 3 n squared log2 n, and the mults come to three-fourths n squared log2 n, okay?
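Putting the stage count, butterflies per stage, and per-butterfly cost together gives the totals just stated. A quick sanity check of my own, for the naive 12-add butterfly:

```python
import math

def vector_radix_ops(n):
    """Adds and multiplies for an n-by-n vector-radix FFT, naive butterfly."""
    stages = int(math.log2(n))        # log2(n) stages
    butterflies = (n * n) // 4        # n^2/4 butterflies per stage
    adds = stages * butterflies * 12  # 12 adds per butterfly
    mults = stages * butterflies * 3  # 3 twiddle multiplies per butterfly
    return adds, mults
```

For n = 1024 this gives 3 n^2 log2 n = 31,457,280 adds and (3/4) n^2 log2 n = 7,864,320 mults, matching the closed forms above.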
And there's one more trick one can play in 2D that you can't play in 1D, and that is that we can simplify the structure of the butterfly. The simple way of doing the butterfly takes 12 adds, but that can be reduced from 12 to 8. And let me explain to you how that works. It's a pretty simple thing, and it takes very little time to explain. So here are my G_ee, G_eo, G_oe, G_oo, my n/2 by n/2 DFTs, okay? And they're coming into this box to result in X(k1, k2), and then X(k1 + n/2, k2), and then X(k1, k2 + n/2), and then X(k1 + n/2, k2 + n/2), okay? And the twiddle factors are here: W_n^k1, W_n^k2, W_n^(k1 + k2); I'm basically repeating Jae Lim's picture. So now I have to draw this, this, this, this, and this. I think we did all of them, right? Oh, there's one missing; this one is missing too. Okay, so that's our four-input butterfly. I finally had to draw what was in Lim's book. But here's the way you reduce 12 to 8. Let me relabel some of these things. I'm going to call the values at these points, after the twiddle multiplications, a, b, c, and d, okay? And if I do that, then I'm going to define new intermediate variables: capital A is a plus b, capital B is a minus b, capital C is c plus d, and capital D is c minus d. If I define these, then I can say that, look, X(k1, k2) is nothing but A plus C. X(k1 + n/2, k2) is nothing but B plus D. X(k1, k2 + n/2) is nothing but A minus C. And X(k1 + n/2, k2 + n/2) is nothing but B minus D, okay? So if I define these variables, then I can write the outputs that I need in this way.
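The 12-to-8 reduction can be sketched directly: four adds to form the intermediates A, B, C, D, then four more for the outputs. This is my own illustration of the trick; in a real implementation W_n^(k1+k2) would come from a precomputed table, so the d branch costs one multiply, for three multiplies total:

```python
def butterfly4(gee, geo, goe, goo, w1, w2):
    """2D vector-radix butterfly: 3 twiddle multiplies, 8 adds instead of 12.

    gee..goo are the (k1, k2) values of the four sub-DFTs; w1 and w2 are the
    twiddle factors W_n^k1 and W_n^k2.
    """
    a = gee              # even/even branch: no twiddle
    b = goe * w1         # multiply 1
    c = geo * w2         # multiply 2
    d = goo * (w1 * w2)  # multiply 3 (W_n^(k1+k2), precomputed in practice)
    A, B = a + b, a - b  # adds 1 and 2
    C, D = c + d, c - d  # adds 3 and 4
    return (A + C,       # X[k1, k2]                add 5
            B + D,       # X[k1 + n/2, k2]          add 6
            A - C,       # X[k1, k2 + n/2]          add 7
            B - D)       # X[k1 + n/2, k2 + n/2]    add 8
```

The sign patterns agree with the combination formula, since shifting k1 or k2 by n/2 flips the sign of the corresponding twiddle factor.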
So I don't need to do it the naive way. Previously, the naive way of counting the number of adds was three, three, three, three: 12 adds. How many adds do I have to do under this new technique? Eight: the four intermediate variables cost four adds, and the four outputs cost four more, so I can get away with eight adds. So coming back up here, can you roll up please? Great, so coming back up here, I can further reduce the number of adds from 12 to 8. That means n squared over 4 times log2 n times 8, which is 2 n squared log2 n adds, and the number of mults remains the same, which is three-fourths n squared log2 n, okay? Okay, now that all is said and done, what's the operation count of all these things? We have the vector radix FFT, which is a true translation of what the 1D FFT was about, and then we had the row-column decomposition. So what was the operation count for row-column decomposition? Let me put it all in some sort of a table: here's the number of mults, the number of adds, and here's row-column, and here is vector radix. We just did vector radix, so let's fill that in: it's 2 n squared log2 n adds and three-fourths n squared log2 n mults. And the row-column decomposition requires n squared log2 n mults and 2 n squared log2 n adds. So the number of adds is a toss-up, and there's not that big a difference in the number of multiplications. There is a factor of 25% difference, I agree with that. But that difference doesn't grow with n squared, or the size of the image, or even the log of the size of the image; there's no factor like that. So if one method, for example, takes one second, the other method takes three-fourths of a second. If one method takes a day, the other method takes three-fourths of a day. One is just 75% of the other. And you could argue about implementation. Actually, how many of you are experienced programmers? Yeah, which one of the two do you think is easier to program in C?
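For comparison, the row-column decomposition referred to here is just this: take an n-point 1D DFT of every row, then of every column of the result. A small sketch of my own, using a direct 1D DFT as a stand-in for a real FFT routine:

```python
import cmath

def dft1(x):
    """Direct n-point 1D DFT (stand-in for a fast FFT routine)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def dft2_rowcol(g):
    """2D DFT via row-column decomposition: transform rows, then columns."""
    rows = [dft1(row) for row in g]      # n row transforms
    cols = list(zip(*rows))              # transpose to get columns
    # transform each column, then transpose back to [k1][k2] order
    return [list(out) for out in zip(*[dft1(list(c)) for c in cols])]
```

With a radix-2 FFT in place of dft1, the 2n length-n transforms give the n^2 log2 n mults and 2 n^2 log2 n adds in the table.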
I totally agree. I think the first one is much simpler to implement. And I bet you that if you dug into the MATLAB implementation, although this bet can never really be verified one way or the other, it would just be using the row-column decomposition. But what you do want to avoid is the direct method, because the direct method is just suicide: it's n to the fourth. So the thing that you really want to remember is to avoid the direct method, because otherwise it becomes a matter of one second versus a day and a half, and you don't want to get into those kinds of trade-offs, those kinds of situations. Okay. Any questions or comments? Oh yeah. So I'm going to talk a little bit about filter design, just preliminary stuff: what the different kinds of filters are in two dimensions, so that the stage is set for when we start talking about the transformation technique on Friday. So, on to FIR filter design. FIR stands for Finite Impulse Response. Again, we're dealing with linear time-invariant, or linear shift-invariant, systems. They all have impulse responses, and finite impulse response means there are finitely many non-zero values in there. Okay. And we've already talked a lot about the fact that IIR filters are hard to guarantee stability for. We've talked about ways of realizing IIR filters using difference equations, and how to pick boundary conditions so that they're recursively computable and correspond to a linear time-invariant system, all of those nice things. But for the most part, when you're dealing with image processing problems, you'll be dealing with FIR filters, simply because they're stable and they also result in zero phase, or linear phase if you will. And linear phase is always important in image processing applications to preserve edges. So, generically speaking, there are three steps to designing FIR filters. First, come up with the filter specs, and that's very much application dependent.
Somebody comes to you and says, look, I want to design a low-pass filter or a high-pass filter, and these are the specs for it. Then there's the design of the filter: you have to decide what filter you want to use, which means determining the coefficients, the impulse response. And then the last step is implementation. And the implementation could be lots of different things. You could implement it in an ASIC; that could be two years to design, tape out, debug, fabricate, all those things. You could be programming a DSP chip, for example a TI or an Analog Devices one or whatever. And programming DSP chips is kind of in between programming in assembly and programming in C: it's not as high level as C, it's not as easy as programming in C, but it's not as tedious as assembly. There are simple instructions like FFT, convolve, you know, simple things like that. How many of you have actually programmed DSP chips? Okay, actually, what application was it and what chip did you program? [INAUDIBLE] Oh, okay. And what application did you... I guess I'm curious; last time I looked at them, they had instructions like convolve, FFT, stuff like that. Do they have any other high-level instructions? [INAUDIBLE] Do they have their own programming environment? Right. But what were the famous signal processing operations that they gave you in that programming environment for free, so you didn't have to write them yourself? Like inner product? [INAUDIBLE] That kind of stuff. So they had their own version of a simple filter? [INAUDIBLE] Oh, I see. Okay. And you programmed a TI chip as well? [INAUDIBLE] Which application? [INAUDIBLE] Oh, okay. Okay, okay.
And did you find it harder to program than C? [INAUDIBLE] The DSP was a bit more difficult. Yeah, okay. Okay, so you can program things in DSP chips. And by the way, there are companies, for example, that come up with VSPs, video signal processors. These are chips that are specialized, that have specialized APIs and instructions to do video operations. Like, you know, you give it frame one and frame two, compute motion estimation, give me the residue; or give it frame one, compute the DCT. And examples of those are, again, TI is very big on that. TI is the largest DSP chip manufacturer in the world, and there are more TI chips in cell phones than any other chips; at one point, TI chips were in like 75% of all the cell phones in the world. And then there's Equator, for example, which makes a video signal processing chip that you program things on, okay. So this is somewhere in between: it's not as tedious as assembly, although one of you says it is, and hopefully there are some instructions that make our life easier for special DSP operations. Then you can also do general purpose computing, for example on a PC or on a workstation or whatever. You have an image, and you just open up Microsoft Visual Studio and program something in C, or if you're on Solaris, you write C code, compile it, give it the input image, and see what comes out. And the beauty of C is that it's high-level programming. You don't have to dig into the assembly level of, okay, this register now gets this value and then transfer this register to the other one. It's just not so messy. So as we move down here, it's easier to implement and takes less time to implement. But what do we give up as we move along this direction, as we move from ASICs to DSP chips to general purpose computing? Yeah, there are more specific instructions here, but think of the two extremes: ASICs versus programming in C.
Speed, exactly. If I'm doing something very special and I build an ASIC for that job, it's no longer general purpose anything. That ASIC can only do the job I designed it for, but it does it damn fast. It's going to be many gigabits per second of throughput, for example, or terabits per second, whereas the general purpose computer is very easy to program. It'll take you five minutes to cook up a C program that does linear filtering on the computer, but even though it's got an X gigahertz CPU, it'll never be as fast as a dedicated ASIC designed to do it. On the other hand, if you change your mind and say, "Ah, the filter I really wanted to implement wasn't this, it was something else," the ASIC takes another two years before you come up with the right filter taps. On the computer, you just change the numbers, and in five minutes you're done. So there are trade-offs as you move around here. There's ease of programmability as you move down: a DSP chip is much more programmable, easier to program, quicker to come up with a working solution on than an ASIC, and a C program is even easier and quicker to get a working solution with than the DSP chip. However, as you move up, you have less flexibility but more speed. In fact, your teaching assistant, Cindy, besides asking her homework questions, you can ask her about her research: she's implementing an ASIC that does lossless layout compression for the maskless lithography project. Her thesis is actually a prime example of a situation where she implements the compression algorithms on the computer, and now she has to translate that into an ASIC and make all those things work. On the ASIC, she's shooting for 500 gigabits per second, something that no computer can process, no matter how fast it is in today's world. And you can ask her how many years it's going to take her to tape it out, and fabricate, and debug, and test.
That's essentially a PhD thesis, right? You can get a PhD thesis out of doing an ASIC, but you can't get a PhD thesis out of writing a C program that does something. Okay, I think I'm going to stop here, and next time we'll continue the discussion on FIR filters.