Archive.fm

The Bold Blueprint Podcast

The Bold Blueprint, Avideh Zakhor. Recognizing your progress keeps you motivated and reminds you that you are moving in the right direction.

Whether it’s overcoming a challenge

Broadcast on:
09 Oct 2024
Audio Format:
other

Okay, let's get started. Let me make a couple of announcements first. The graded homework is ready for you to pick up; that's the last homework of the semester. Also, the presentations are all next week, from 9 to 11 in the Wang Room. I would strongly encourage all of you to attend the entire two hours, to ask questions and interact with your fellow students. Are there any questions? Yeah. I haven't come up with an order yet; I might or might not. Does that matter? Yeah, my worry is that if I announce an order, then people just show up for their own slot and leave, and I don't want that. That's a very good point. Bring your own laptop; if you don't, there's a computer there that you can use to access something you've put on a website you can reach. Is that going to be an issue for any of you, if you neither have a laptop nor a place to put your presentation online? Okay, that's a good point: either put it online and look it up there, or bring a laptop. Is it an issue that it's at 9 a.m.? I just want to make sure we get done, because that room is solidly booked at 11 and we get kicked out no matter what, whereas if we have it here we don't get kicked out and can run past 11. But I think the projector is better there, so we can actually see the images better. Any questions or comments?

Okay, so what I want to talk about today is wrapping up DCT. What we covered last time was bit allocation, the optimum bit allocation between DCT coefficients. Today I want to talk a little bit about DCT quantization: basically go over the various ways people quantize DCT coefficients, then quickly run through JPEG, show you some images that have been JPEG-quantized, and use JPEG as motivation for why we do JPEG 2000. That takes us automatically into multi-resolution subband and wavelet coding, and then we'll come back to the paper and explain some of that material. In the next lecture we'll talk a little more about JPEG 2000, some examples of real-life situations where wavelets are used, and the EZW technique that is used to encode wavelet coefficients, because that is quite different from DCT. So if you can switch back to the computer, please.
Okay, I will make these available to you on the web so you don't have to make copies. So, just to remind you what transform coding is: the basic idea is that you represent an image, or a piece of an image, which is shown here, as a linear combination of a bunch of basis functions, so it's T1 times this plus T2 times this, and so on.

This is again review from last time: how do you pick your transform? We want energy compaction: we want as few coefficients as possible in the transform domain to contain as much of the signal's energy as possible, and we also want the coefficients to be as uncorrelated with each other as possible. We went over the KLT, the Karhunen-Loeve transform, which is the optimal transform for a given statistical class of signals sharing the same covariance matrix. We also talked about the disadvantages of the KLT, and then we moved to the DCT and showed that if the signal is a first-order Markov process, then as rho, the correlation coefficient, approaches 1, the performance of the DCT approaches that of the KLT. We compared the compaction properties of DCT versus KLT versus Walsh-Hadamard, and we talked about block sizes: as the block size increases from 2x2 to 4x4 to 8x8 to 16x16 the compaction property improves, but beyond 16x16 it doesn't get any better, which is one of the reasons people use those sizes.

Just so you can visualize the DCT basis functions: we've already shown the 4x4 basis functions for the 4x4 DCT, and this is the same thing for 8x8, with the DC coefficient here and the highest frequency here. As we move this way the horizontal frequency increases, and as we move this way the vertical frequency increases. This is the forward transform and this is the inverse transform, again review from last time. Can you read the numbers there? No? Let me see if I can enlarge this to a hundred percent. So this is an example of an 8x8 block in the intensity domain to which we apply the type-2 DCT; dct2 is, I believe, a command in MATLAB you can use for this, and this is the set of coefficients you end up getting.
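The slide itself isn't reproduced in the transcript, so here is a minimal sketch of that forward/inverse step in Python rather than MATLAB; the block values are made up, and `scipy.fft.dctn` plays the role of `dct2`:

```python
import numpy as np
from scipy.fft import dctn, idctn

# Hypothetical 8x8 intensity block standing in for the one on the slide.
block = np.array([[52, 55, 61, 66, 70, 61, 64, 73]] * 8, dtype=float)

coeffs = dctn(block - 128, type=2, norm='ortho')        # forward 8x8 DCT-II (level-shifted by 128, as JPEG does)
recovered = idctn(coeffs, type=2, norm='ortho') + 128   # inverse DCT-II recovers the block exactly

assert np.allclose(recovered, block)
```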
Okay, so now let's talk about quantization of DCT coefficients; I'm going to go back to 80 percent. You have the 8x8 block, you've taken the transform, you've got 64 numbers: how do you quantize each of them? We've already talked about the fact that in practice nobody really uses Lloyd-Max; in essence, if you use entropy coding techniques such as Huffman or arithmetic coding on top of uniform quantization, you can match or even exceed Lloyd-Max performance, so that's the preferred method in essentially all coding schemes that currently exist. But in real life you're given a certain bit budget for your image and you have to come up with the quantization step sizes. How do you do that? Fundamentally this is one of the inherent problems of DCT, and there are really two issues. I have a budget for an entire image and I've divided the image into a bunch of 8x8 blocks, so two questions need to be answered. Step one: how do I allocate my budget between the 8x8 blocks? Step two: now that I've decided this 8x8 block has a lot more activity than that one and should get more bits, if I have X bits for that block, how do I divide those X bits between the DCT coefficients? What quantization step size should I pick for each coefficient so that I hit the target bit rate? Because once you've quantized the coefficients, the rest of the game is to Huffman code them, and each quantization step corresponds to so many bits; the Huffman coding, the entropy coding, doesn't let you change the number of final bits that comes out of each block. The only knob you have to change the number of bits coming out of each block is the quantization step size. This has been, and continues to be, one of the biggest problems with the discrete cosine transform, and as a result there are a lot of papers over the years proposing heuristics for solving it. I'm not going to go over the different methods people have proposed; I'm just going to talk about the basic guidelines one uses in choosing the step size for each coefficient.

For quantizing the DCT coefficients there are two factors to take into consideration. One of them we already talked about last time: if you've got N random variables, each with the same class of PDF but with different variances, the formula we showed last time tells you how to do optimal bit allocation among those N random variables. Maybe I should fire up the lecture notes from last time and flash that equation at you. Okay, go to the end of that lecture on the discrete cosine transform; this is how we ended last time. The idea was: you've got N random variables and a total of B bits to distribute among them; each has the same PDF but a different variance, and sigma_i squared is the variance of the i-th random variable. If you solve a Lagrangian optimization problem, which we didn't work through but could, you find that b_i, the number of bits you should give to the i-th random variable, is given by this.
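The equation on the slide isn't legible in the transcript; the standard high-rate result being described here, reconstructed from the wording above (so double-check it against the notes), is

```latex
b_i \;=\; \frac{B}{N} \;+\; \frac{1}{2}\,\log_2\!\left(\frac{\sigma_i^{2}}{\bigl(\prod_{j=1}^{N}\sigma_j^{2}\bigr)^{1/N}}\right)
```

so coefficients whose variance is above the geometric mean of all the variances get more than the average B/N bits, and the rest get fewer.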
So, coming back to our problem: as I said, there are two factors. One of them is that, over the ensemble of all image blocks you encounter in natural imagery, the variances of the different DCT coefficients differ from each other. For example, the coefficients near DC have a lot more energy, and the very high-frequency ones have very small energy and small variances. Therefore, if you strictly applied the optimal formula I just flashed at you, you would spend more bits on the near-DC coefficients, the coefficients with the higher variances. But that's not the entire story, which brings me to the second factor: the human eye is a lot more sensitive to low-frequency coefficients than to high-frequency ones. In general, if you were to draw the frequency response, the frequency characterization, of the human eye, at very small frequencies it acts as a differentiator, it peaks, and then it acts as an integrator; for all intents and purposes you can think of it as having a low-pass kind of characteristic, in the sense that the eye is more sensitive to the low-frequency coefficients. That has to be taken into account, and so people have spent a lot of time, money, and effort on experiments in order to come up with the optimal quantization matrix for JPEG, for the 8x8 DCT, which is really the underlying building block of JPEG compression. They call it the normalization matrix, and it's shown right here; this is the matrix that JPEG uses as the default normalization matrix.

Now, one thing I need to tell all of you is that even though JPEG and MPEG are standards, and you would think they nail down every bit of the specification of how you encode something, in real life they leave a lot of freedom to the people who build the encoders and decoders. For example, you don't have to use this default quantization matrix for JPEG, and companies building JPEG encoders, say Sony or JVC or Pioneer or Panasonic, do a lot of their own proprietary work to come up with optimal quantization tables and optimal Huffman tables. There is some leeway even within the standards to improve the quality of the images you get. In fact, the first time I heard something like "the Sony encoder for JPEG is very good," I said, what the hell is that? It's all JPEG, what difference does it make? But that's not quite true; there's a lot of optimization to be done individually by companies.

Yeah? Say that again? Okay, let me back off. If I use this basic default normalization matrix, this quantization table, it results in one particular bit rate. If I want a lower bit rate, I simply scale the whole matrix up, and if I want a higher bit rate, higher quality, I scale it down, say by a factor of two, so all the numbers get divided by two; the matrix itself is just telling you the relative importance of the different coefficients. So essentially, when you pick the quality factor in your JPEG encoder, you're either multiplying all the elements of this matrix by two, or dividing them by two, or four, or six, or whatever number you're playing with.
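For reference, this is the default JPEG luminance quantization table from Annex K of the standard, together with a minimal sketch of the scaling just described; the simple scalar scaling matches the lecture's description, while real encoders such as libjpeg map a 1-100 quality setting to a scale factor in a similar spirit:

```python
import numpy as np

# Default JPEG luminance "normalization" (quantization) table, Annex K of the
# standard; this should be the matrix on the slide, but verify against the notes.
Q_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]], dtype=float)

def scaled_table(scale):
    """Scale every step size by the same factor, as described in lecture:
    scale > 1 means coarser quantization (fewer bits), scale < 1 means finer
    quantization (more bits).  Steps are kept at least 1."""
    return np.maximum(np.round(Q_LUMA * scale), 1)
```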
And I'm sure you'll all agree this is one of the frustrating things about JPEG encoding: it's a matter of trial and error. Just because you picked this matrix for image one and it resulted in X bits doesn't mean that if you apply it across ten images it will result in X bits for each. In fact, as a matter of practice, I can tell you that if you apply it to ten images you'll probably get a range of bit rates that could be a factor of three apart. So that's one of the main issues, one of the main problems, with the discrete cosine transform. This matrix, then, has been designed both by experimentation, taking human visual properties into consideration, and with that equation I showed you, which is mathematically correct but doesn't take any perceptual effects into account.

So here, again, can you read the numbers or should I make this a hundred percent? We'll make it a hundred. Remember the previous example: we had this intensity block, and these were our DCT coefficients, 31, 51, 1.1, and so on. Come down here: those are the same DCT coefficients, and this is what happens if you quantize them with that normalization matrix. For example, the first quantization step is 16, so if I quantize 31 with a step size of 16, I get 2. I quantize 51 with a step size of 11 and I get 5, which is shown here. And this is already quite small: 1 has been quantized to 0, minus 24 has been quantized to minus 2, and so on. The main thing to notice about this quantized 8x8 DCT matrix is that it has a lot of zeros. Is that good or bad? Does it make us happy or unhappy? Happy, extremely happy. That's why we went to the transform domain in the first place: we wanted that compaction property. You can see that instead of having 64 non-zero coefficients, which is what I would have to deal with if I were working in the space domain doing prediction on pixels and that kind of game, now I have to deal with about twelve numbers.
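A minimal sketch of that threshold-quantization step, assuming the `Q_LUMA` table and the `coeffs` array from the earlier snippets:

```python
import numpy as np

def quantize(dct_coeffs, q_table):
    """Threshold coding: divide each coefficient by its step size and round.
    With the slide's numbers, 31/16 rounds to 2 and 51.7/11 rounds to 5,
    while most high-frequency coefficients round to 0."""
    return np.rint(dct_coeffs / q_table).astype(int)
```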
So there are fundamentally two ways of quantizing the 8x8 coefficients. One of them is what we've just done: you apply a quantization matrix like that, and it's called threshold coding. The coefficients that quantize to zero are simply zero; we forget about them, and now all we have to do is specify the value and the location of the non-zero coefficients. As you can see, this guy became zero and this guy became zero, but there could also be some random non-zero value out here, so how do we deal with that? It comes up in the next viewgraphs, but basically you use run-length coding, which I'll explain in just a second, to specify the location of the non-zero coefficients, that is, how many zeros there are between non-zero coefficients, and then you apply Huffman coding to code the run-length symbols. You all know what Huffman coding is, right? So you use run-length coding to specify the locations of the non-zeros, and Huffman coding both to specify the values of the non-zeros and to code the run-length values. Is that clear to everyone? There are two sets of Huffman tables that get used. Let's move on; I think it gets explained more in a moment.

So now that we've quantized the coefficients to these values, we send them to the receiver, and the receiver has to unquantize them. The receiver doesn't know that the first coefficient was 31; it just sees that its quantized value is 2. So it looks it up in the same matrix that it has: the quantization step size for this position was 16, I received the value 2, therefore my reconstruction value for that coefficient is 32. Same thing here: I received 5, my step size is 11, so I reconstruct it as 55. This is what the receiver ends up calculating for that 8x8 block, and you can see this is the original DCT block without quantization and this is after quantization, so a certain amount of error has been introduced: 31 has become 32, 51.7 has become 55, 1.16 has become 0, minus 24 has become minus 32, and so on.

The next slide shows that because of this quantization error we now also have error in the space domain, in the image domain. These are DCT coefficients, and at the receiver, at the decoder, instead of taking the inverse DCT of the original coefficients we have to take the inverse DCT of the quantized coefficients, and because of that a certain error is introduced. So if this is the original image block we started with, and it looks like this, then after we've taken the DCT, quantized the DCT coefficients, and unquantized, the image looks something like this, which I think is visually a fair approximation. If I want an even better approximation of the original image, what do I do? Just decrease the quantization step sizes; that reduces the amount of error I introduce. So that's a knob you can turn to affect quality.
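And the decoder side, continuing the same sketch (again assuming `Q_LUMA`, `quantize`, and the level shift used earlier):

```python
import numpy as np
from scipy.fft import idctn

def dequantize(q_indices, q_table):
    """The decoder only sees the integer bin indices, so it multiplies back by
    the step size: 2*16 -> 32, 5*11 -> 55; that is where the error comes from."""
    return q_indices * q_table

def reconstruct_block(q_indices, q_table):
    """Inverse DCT of the dequantized coefficients gives the decoded 8x8 block;
    its difference from the original block is the quantization error as seen
    in the image domain."""
    return idctn(dequantize(q_indices, q_table), type=2, norm='ortho') + 128
```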
Now, one of the reasons people don't like DCT: in many applications you're sending your video or images across a channel, and at any point in time the channel has only so much bandwidth available; for example, there's fading if you're sending over a wireless channel, or congestion and all these other things if you're sending over the internet. So people give feedback to the encoder saying the channel has gone down, code this frame at a lower rate. One problem with DCT is that if you have a target bit rate, you have to do multiple trials, trial and error, exhaustive search to some extent, to hit the correct bit rate, and that's one of its main disadvantages. This is what's called rate control, and DCT doesn't give you very granular rate control. Again, people have done a lot of work coming up with models for images: ahead of time, before I do anything, can I look at my block and guess the step sizes so that I hit my target rate in one shot, to within some accuracy? That's still a very hard problem. For the subset of you doing the matching-pursuits kind of decomposition project, that's one huge advantage of that kind of technique, and for those of you who are not doing that project, you'll find out about it next Friday.

Okay, so now what you do is use a zigzag ordering. As I said, one method, the one we've used here, is threshold coding, which means you just quantize and a bunch of the coefficients come out zero; another method, which is used a lot less, is called zonal coding, and I'll get to it in order. So now that we've got those quantized DCT coefficients, we have to come up with some ordering of them, because there's a bunch of non-zero values mixed in with zeros, and we have to specify the locations of the non-zero coefficients and their values. To do that we have to convert this two-dimensional array into a one-dimensional array so we can talk about location, and typically, because the variance and the energy of the coefficients is highest here and becomes lowest as we go across diagonally, people use a zigzag ordering: this is coefficient zero, one, two, three, four, five, six, seven, eight, et cetera. Now, you might ask: why not go down first? You could equally well accept that. There's some reason to believe that this coefficient, the one with low horizontal frequency and high vertical frequency, generally has more energy than that one, but if it does, it's a very minor effect. It basically stems from the following: you take pictures of things, and because of gravity and the way we've laid things out, there are in general a lot more horizontal lines than vertical lines. But that's a hypothesis to be proven, and on that basis people argue that this coefficient probably has a bit more energy than that one.
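A minimal sketch of the zigzag scan just described (the standard JPEG ordering, starting at DC and walking the anti-diagonals):

```python
import numpy as np

def zigzag_indices(n=8):
    """Positions of an n x n block ordered along anti-diagonals, alternating
    direction, so DC comes first and the highest frequencies come last."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],                              # which anti-diagonal
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))   # direction alternates per diagonal

def zigzag_scan(block):
    """Flatten a 2-D block of quantized coefficients into the 1-D zigzag order."""
    return np.array([block[r, c] for r, c in zigzag_indices(block.shape[0])])
```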
In any case, these are all second-order effects, nothing to worry about too much. Once you zigzag-order the DCT coefficients in this manner, you end up with a one-dimensional array, and if you plot the variance of that array as a function of the coefficient index, you get a decreasing function, which is exactly what you wanted. Now you're going to specify the location of the non-zero coefficients, and ideally you want all the non-zero coefficients to sit next to each other; if by some haphazard chance there's one non-zero coefficient way out here, you have to jump all the way out to it. By this rank ordering, by this zigzag scanning, you're able to pack all the non-zero coefficients, for most images, right next to each other. Is that clear?

Okay, having done that, now I'll talk about the run-length coding and the Huffman aspects of this. These were our quantized DCT coefficients, 2, 5, 0, minus 2, and so on, and this is the run-length symbol representation. Can you read it, or shall I make it a bit larger? We can do 120 percent; much better. So we start at the upper-left corner: this is 2, and the next symbol says (0, 5). Remember, we're traversing in the zigzag order, down, up, down, up; that's the zigzag scan. We start at 2 and move to 5, and the run is zero, so the symbol we generate is (0, 5), meaning there are no zero coefficients between 2 and 5, and the 5 is just the value itself. The next symbol, (0, 9), means there are zero zeros between this coefficient and the next one, which is 9. The next one is (0, 14): from here to here there are no zero coefficients, and the next coefficient is 14. The next one, (0, 1), represents this 1. Then comes a zero coefficient, and the next non-zero coefficient after that is minus 2; we're not going to code the zero itself, all we say is that there is one zero coefficient between the 1 and the next non-zero coefficient, which is minus 2, so the symbol is (1, -2). Then you keep coming down this way, this way, this way, and repeat the same thing. Eventually you come to a minus 1 preceded by two zero coefficients, and that's it; this is the symbol representation of the whole block, and at the end you emit an end-of-block symbol. So what you end up coding for your run-length values is the count of zeros, one in this case, two in that case; if there were some oddball non-zero coefficient way out here, you would have to count all the zeros needed to get to it. That's what's meant by run-length coding: essentially you count the number of zeros between non-zero coefficients, and at the end you've converted this sequence, with its zeros and its non-zero values, into a sequence of symbols, 5, 14, and so on, together with their runs, and that sequence of symbols is what you then have to entropy code, losslessly, by applying Huffman coding to it. That's what's stated here.
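A sketch of that symbol generation, applied to the zigzag-ordered coefficients from the previous snippet. Baseline JPEG additionally splits each value into a size category plus amplitude bits and uses a special symbol for zero runs longer than 15; those details are left out here:

```python
def run_length_symbols(zz):
    """Convert zigzag-ordered quantized coefficients into (zero-run, value)
    symbols: the DC term first, then each nonzero AC value preceded by the
    count of zeros skipped to reach it, then an end-of-block marker."""
    symbols = [('DC', int(zz[0]))]
    run = 0
    for v in zz[1:]:
        if v == 0:
            run += 1
        else:
            symbols.append((run, int(v)))
            run = 0
    symbols.append('EOB')   # everything after the last nonzero value is implied zero
    return symbols
```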
Okay, so here are some pictures that hopefully come out nicely on the screen; I might have to reduce this a little. I'll show two at a time, because I want you to see them; that's too much, we'll make it 150 percent. It's a trial-and-error thing, and it happens with PDF as well. I think this is good enough. What you have on the upper left (I have to open my own notes, page 23) is the original flower image. This is what you get if we keep one quarter of the coefficients in each block; this is if we keep one eighth of the coefficients, meaning eight out of 64; and this is if we keep one sixteenth of the coefficients, meaning four out of 64. As you can see, the image becomes coarser and coarser as you keep fewer and fewer coefficients, and you can see what are called blocking artifacts. Does that come through on your screen? No? Because the screen acts as a low-pass filter, it smooths out the blocking artifacts. I'm going to give the originals of these to Rosita to put on the web; I won't have her scan this hard copy, I have the electronic version, so after class you can look at these pictures on the web and see that this one has a lot more blocking artifact than, for example, the top one. Can you see there's a quality difference between these two? Does that come across? Okay, very good.

So that brings me to my last comment here, about threshold coding versus zonal coding. Everything I've talked about so far is the technique called threshold coding, which essentially means we just apply the quantization matrix to the block, the coefficients that quantize to zero are zero, and that's it; we then run-length code and Huffman code. That's what's used in 99 percent of the systems; I haven't heard of too many people using zonal coding, but just so you know the name: zonal coding means you have a predetermined zone in your 8x8 DCT region. For example, coming back here, this is my 8x8 block, and I could say this triangle here is what I always code, regardless of how many zeros there are inside it and how many non-zero coefficients are scattered outside it; I have a prefixed zone in mind and I always code the coefficients in that zone.
That's what's referred to as zonal coding, and hardly anybody uses it. So zonal coding is the alternative to what's called threshold coding, which is used far more frequently and which we've already covered: essentially, apply the quantization matrix. Then, as I said, the other thing to remember is that each of the non-zero coefficients has to get Huffman coded, and in addition the run-length values, which tell you how many zeros there are between non-zero coefficients, also get Huffman coded, so there are a number of Huffman tables hanging around. And Huffman coding, as you know, is merely a bit-allocation scheme: it introduces no loss, no error, it just allocates bits, fewer bits to the symbols that occur more frequently and more bits to the symbols that occur less frequently.
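Just to make that bit-allocation picture concrete, here is a toy Huffman-code construction; JPEG itself carries default or custom tables in the file header rather than rebuilding codes like this, but the principle is the same:

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code in which frequent symbols get short codewords and
    rare symbols get long ones; no information is lost, bits are just
    allocated unevenly."""
    heap = [(freq, i, {sym: ''}) for i, (sym, freq) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)       # two least frequent groups
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: '0' + w for s, w in c1.items()}
        merged.update({s: '1' + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

# Example: the (run, value) symbols from many blocks could be fed straight in.
# huffman_code([(0, 5), (0, 9), (0, 5), (1, -2), 'EOB', 'EOB'])
```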
So, having described all that machinery, what is the potential problem with a communication system that employs it, especially in a video setting where each frame is predicted from the previous one? Well, if one bit in the Huffman-coded stream gets flipped, the entire bitstream downstream gets corrupted, and you make huge mistakes about which coefficient was where and everything else. And if it's video, that error keeps propagating from one frame to the next, which is a bad thing. For that reason there has been a lot of work on what's called error-resilient coding, error concealment, error this and error that, because if you're sending video over wireless channels where there's a lot of noise, then unless you use a reliable protocol like TCP, which keeps retransmitting until the packet arrives correctly in its entirety, you're going to be in hot water. So people have come up with things like reversible variable-length codes, reversible Huffman codes, and you periodically re-synchronize the blocks so that if something goes wrong the error doesn't propagate indefinitely through your images. There's a whole body of literature on this; when I taught the course in the late 90s, '98 or '99, a lot of the students did their projects and paper reviews on this topic of error resilience. It was a very hot topic in the late 90s: error resiliency, error recovery, error concealment, that kind of thing.

Okay, so now that we've explained all the machinery, let's talk about JPEG a little bit. There are multiple flavors of JPEG. First of all, it stands for Joint Photographic Experts Group, and MPEG stands for Moving Picture Experts Group; you have to have a big ego to call yourselves experts, right? These groups meet four times a year, companies send representatives, and each company tries to get its approach and its patents into the standard, because at the end there's a patent pool, and the only way companies recover all the R&D money they put into generating those results is to have patents that make it into the standard. So there are huge political battles in the standards meetings over which techniques get in, and it's not always purely the technical superiority of one technique over another. Some companies don't want to show their hand, so they secretly hire other companies to represent them; Sarnoff Labs is the one that usually does that. They're essentially a consulting house, and they get hired, mostly by Asian companies, to present results and try to get the patents in. It's a big political conglomerate; if any of you are interested I can tell you stories about that some other time. Anyway, JPEG is part of ISO, the International Organization for Standardization, and MPEG is part of ISO as well; JPEG and MPEG are both done jointly by ISO and the ITU, the International Telecommunication Union.

So, there are three flavors of JPEG. There's basic JPEG, which was completed, wrapped up if you will, in the early 90s; it's lossy coding and it's based on DCT, pretty much the techniques we've talked about so far. Then there's lossless JPEG, which gives you lossless encoding of pictures: if you had a picture where every pixel was eight bits deep, then after you encode it with lossless JPEG, the reconstructed image at the other end matches the original bit by bit; nothing gets lost. You might ask what the applications of that are. Medical imaging is one, because of the danger of patient lawsuits if errors are introduced into the images. And big-ego Hollywood directors don't want any loss introduced into their movies, because they think every scene has its own intention, and if somebody compresses it, it distorts the original intention of what they meant. What Alan and Cindy are both working on is also lossless coding, though not based on JPEG: in those applications you're compressing layouts of integrated circuits, and if you introduce losses, a functioning transistor might become dysfunctional, and then you end up wasting billions of dollars building many copies of a chip that doesn't work. So it's very important not to have any loss there. So there are applications where lossless matters. And then there's the third one, JPEG 2000, which, as the name implies, was ratified around 2000 and is based on the wavelet transform; we'll talk about multi-resolution and wavelet coding, some of it possibly in today's lecture, if not in Wednesday's.

So how does JPEG, the regular 1992 JPEG standard, work?
Well, it's what's called baseline JPEG. It's lossy, and each color component is divided into 8x8 blocks, and on each 8x8 block we do three steps: block DCT, perceptually based quantization, which is that normalization matrix I just showed you, and then variable-length coding, run-length plus Huffman, which we've covered. So all the basic components we've talked about, Huffman coding, run-length coding, quantization, and the DCT, come together in one place.

There are some additional tricks and sauces they apply, for example for the DC coefficients. You have 8x8 blocks coming one after another, and each one has a DC coefficient. One trick people use a lot, even in H.264, exploits the fact that there's correlation between the DC coefficient of this block and the DC coefficient of the next block: even though we applied transform coding to reduce the correlation within blocks, there is still correlation across blocks. So you can apply predictive coding, DPCM or delta modulation or whatever it is, across the DC coefficients of successive blocks and code the prediction error rather than the actual DC coefficient. By the way, the other thing you should know is that even though in all of these discussions we've assumed the PDFs of all the coefficients in the 8x8 block are the same, in practice people have shown that the PDF of the DC coefficient has a fundamentally different shape, not just a different variance, from the other coefficients; the thought is that the PDF of the DC coefficient is more Laplacian-like than that of the other coefficients. Anyway, the AC coefficients are then run-length coded, and I think this slide is pretty much a repeat of what we've done before, so I won't go into the details; I'm going to skip a lot of these things.
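A minimal sketch of that DC prediction (differential coding of the DC terms of successive blocks; it is the differences, not the DC values themselves, that get entropy coded):

```python
def dc_differences(dc_values):
    """DPCM across the DC coefficients of successive blocks: code each DC as
    the difference from the previous block's DC (predictor starts at 0),
    which tends to be small because neighboring blocks have similar averages."""
    prev = 0
    diffs = []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs
```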
Okay, this is a good slide to stop on. First the global view: this is the original image at 8 bits per pixel, this is 0.59 bits per pixel, this is 0.37 bits per pixel, and this is 0.22 bits per pixel. As you can see, there's a lot more blockiness here than there is here or here. Now let me zoom in even more so you can see better: that's the original, this is 0.59, keep that in mind, and I'll go down and show you 0.22. Can you see the difference between this and that? Can you see the blocking artifacts in this one?

There's a whole series of published papers on how to remove this blocking artifact, including one of mine: there was a paper by Ruth Rosenholtz and myself in 1992, in the Transactions on Circuits and Systems for Video Technology, which is actually the second most cited paper I ever wrote, on how to remove this blocking artifact using iterative techniques. It's very easy to describe, because you've already done homework on the ingredients. The basic idea: what's the simplest way to de-block a picture? Blur it, low-pass filter it. But when you blur, suppose I blur this image and then go through the encoding process again: I take the DCT and quantize it the same way it was supposed to be quantized. If, as a result of the blurring, a coefficient that was supposed to be in this quantization bin fell out of the bin, I re-quantize it: if it fell out on this side, I pull it back to this boundary of the quantization bin, and if it fell out on the other side, I pull it back to that boundary, and I repeat the process. So I blur just enough that the quantization constraint still holds. Remember, when we unquantize, DCT decoders usually do the dumb thing: if the quantization bin is here, they pick the midpoint and say that's my reconstruction value. The basic idea we came up with was: forget about that single reconstruction value and allow the reconstructed coefficient to float anywhere within its quantization bin. So blur enough that it improves the quality, but don't blur so much that a quantized coefficient falls outside the range it was supposed to be in, and repeat that iterative process, and you get something that's not too blurry but at the same time doesn't have the blocking artifacts. In the past I would bring that paper and show it on the screen, but I didn't this time; it's a pretty old paper.
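A rough sketch of that iterative idea, paraphrasing the description above rather than the algorithm from the paper; it assumes the `Q_LUMA`-style table from earlier, a hypothetical `q_indices_per_block` array holding each block's received bin indices, and a Gaussian blur as the smoothing step:

```python
import numpy as np
from scipy.fft import dctn, idctn
from scipy.ndimage import gaussian_filter

def deblock(decoded_img, q_indices_per_block, q_table, iterations=10, sigma=0.8):
    """Alternate between (1) low-pass filtering the decoded image, which smooths
    away blockiness, and (2) re-taking the 8x8 DCT of each block and clipping
    every coefficient back into the quantization bin the decoder actually
    received, so the result stays consistent with the transmitted data."""
    img = decoded_img.astype(float)
    h, w = img.shape
    for _ in range(iterations):
        img = gaussian_filter(img, sigma)                       # smoothing step
        for by in range(0, h, 8):
            for bx in range(0, w, 8):
                q = q_indices_per_block[by // 8][bx // 8]       # this block's received bin indices (8x8 ints)
                lo, hi = (q - 0.5) * q_table, (q + 0.5) * q_table
                c = dctn(img[by:by+8, bx:bx+8] - 128, type=2, norm='ortho')
                c = np.clip(c, lo, hi)                          # project back onto the quantization constraint
                img[by:by+8, bx:bx+8] = idctn(c, type=2, norm='ortho') + 128
    return img
```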
Okay, so all of this was for black-and-white JPEG pictures; how do you apply it to color images? Well, you can do one of two things. Actually, you can't even read that, can you? It's kind of hard; let's stick to 90 percent, this room really needs upgrades. One way would be to say that each image has R, G, and B components, red, green, and blue. Just so you have an idea what I mean by that, why don't we zoom the camera to this part: in the middle of your Jae Lim books there's a color insert. This is the Vegas picture, and these are its red, green, and blue components. But there's a high degree of correlation between these components: if you were to plot the probability distributions, the histograms, of the red, green, and blue pixels, their peaks and shapes look very similar to each other, which tells you there's a lot of correlation between R, G, and B. And remember, the name of the game in compression is to take advantage of redundancy; you want to make things as uncorrelated as possible. So if I start with a color image, decompose it into RGB, and code R separately from G separately from B, I haven't exploited the correlation between R, G, and B; that's terrible, I've spent more bits than I should have. One way to get around that is to convert RGB to a different coordinate system, YCbCr, where Y is the luminance component and Cb and Cr are the two chrominance components. In this new coordinate system there is supposedly much less correlation between the components; at least, if you do the correlation analysis, you see a lot less correlation between the YCbCr coordinates than between R, G, and B.

The other thing you should know is that if you plot the frequency response of the eye, again characterizing human visual properties, the response for the luminance component is different from the response for chrominance: the peak response for luminance occurs at higher frequencies, around three to ten cycles per degree, whereas for chrominance it occurs around 0.1 to 0.5 cycles per degree. Question? And we have a vision-science person here, right, go ahead. Oh, absolutely, I would bet so; yeah, I'll talk about that. So, as a result, because the eye has a higher bandwidth for luminance and a lower bandwidth for chrominance, you can subsample the chrominance without losing much. In fact, when TV was first launched in this country, even though I wasn't here and none of us were born at the time, it was black and white only, and color was added as an afterthought: there was a little bit of spectrum left and they squeezed the chroma in there, so the bandwidth occupied by luma in a TV channel is a lot higher than that of chroma, and boy, what a difference it makes. And of course you have companies like Technicolor; we had a guy from Technicolor visiting just yesterday. They make most of their money from painting movies: when directors shoot a movie they're usually not happy with the color, so they pass it along to these guys, and frame by frame they have artists who sit down and adjust the color by hand so it looks good when it's released.

Okay, coming back to the point about subsampling. Well, before I get to that, here is the matrix conversion: if you have RGB values you can apply this matrix to get YCbCr, and you can go back from one domain to the other as well; it's an invertible three-by-three operation. Now, about chrominance subsampling: there are many formats for video and color images, and I'll go from left to right. There's 4:4:4, where for every 2x2 group of Y pixels there are also four Cb and four Cr pixels; in this picture the black circles are Y and the red triangles are the Cb and Cr pixels, so in 4:4:4 there are four Y pixels, four Cbs, and four Crs. Then we have 4:2:2, which means that for every 2x2 group of Y pixels there are two Cb and two Cr pixels: for every four of these, there are two of these and two of those. Then you go to 4:1:1, which means that for every four Y pixels there is one Cb and one Cr; and sorry, in this case it's not a 2x2 group but a 4x1 row of Y pixels, so for every four of these in a row there's one Cb and one Cr, then another four, another Cb and Cr, and so on. And then you have 4:2:0, where for every 2x2 group of Y pixels you have one Cb and one Cr pixel.
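Since the conversion matrix on the slide isn't reproduced in the transcript, here is a sketch using the JFIF/BT.601 full-range conversion, which is what baseline JPEG files typically use, plus the simplest possible 4:2:0 chroma subsampling (2x2 averaging; real encoders may filter differently, and even image dimensions are assumed):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range BT.601 conversion as used in JFIF: Y is the luminance (luma)
    component, Cb and Cr are the two chrominance (color-difference) components."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299   * r + 0.587   * g + 0.114   * b
    cb = -0.16874 * r - 0.33126 * g + 0.5     * b + 128.0
    cr =  0.5     * r - 0.41869 * g - 0.08131 * b + 128.0
    return np.stack([y, cb, cr], axis=-1)

def subsample_420(chroma):
    """4:2:0: keep one chroma sample per 2x2 group of pixels (here, their average),
    which the eye tolerates because it is far less sensitive to chroma detail."""
    h, w = chroma.shape       # assumes h and w are even
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```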
So when you hear about a video format that is, you know, 4:2:2 or 4:2:0, that's what it refers to; it's not the most intuitive notation, but at least the numbers roughly represent what's going on. And the most common format, does anybody know? It's 4:2:0, the one that's used the most; in fact, in JPEG coding and in MPEG coding, most of the time the initial video you receive to encode, from cameras, from high-definition cameras, is in this format. So how does JPEG deal with this? It has something called a basic coding unit, which we call the data unit, and it consists of four 8x8 blocks of Y, the luma, one 8x8 block of Cb, and one 8x8 block of Cr. So there's 4-to-1 subsampling: for every four Y pixels there's one Cb and one Cr. You then encode these components separately, and you get much better performance than doing RGB encoding; RGB encoding is almost never done.

And here are the default quantization tables for Y and for Cr and Cb. For Y, I already showed it to you earlier, for the intensity, which is just the black-and-white portion. Let me show you this picture, I think it will be useful, that Vegas picture right here; can you switch back to the camera? For that same color picture I showed you a second ago, these are the RGB components, and this is the Y component, this is I, and this is Q. Let me see, is it the same? Hold on; no, YIQ is yet another coordinate system, different from YCbCr, but it's pretty close. So this is YIQ. The point is that the black-and-white signal you see is essentially the Y; the Y in YIQ is the same as the Y in YCbCr, but the color components, Cr and Cb, are different from I and Q. The take-home point is just that there are a lot of different color systems and you have to do a lot of conversions. But anyway, the default quantization table for luminance, for Y, is the same one I showed you earlier, and for chrominance, for Cr and Cb, it looks like this; you can see there are huge quantization step sizes for all of these coefficients.
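For reference, this is the default chrominance table from the same annex of the standard as the luminance table shown earlier; it should match the slide, but it is worth double-checking against the notes:

```python
import numpy as np

# Default JPEG chrominance quantization table (Annex K); note how quickly the
# step sizes saturate at 99, i.e. chroma high frequencies are quantized very coarsely.
Q_CHROMA = np.array([
    [17, 18, 24, 47, 99, 99, 99, 99],
    [18, 21, 26, 66, 99, 99, 99, 99],
    [24, 26, 56, 99, 99, 99, 99, 99],
    [47, 66, 99, 99, 99, 99, 99, 99],
    [99, 99, 99, 99, 99, 99, 99, 99],
    [99, 99, 99, 99, 99, 99, 99, 99],
    [99, 99, 99, 99, 99, 99, 99, 99],
    [99, 99, 99, 99, 99, 99, 99, 99]])
```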
Okay, moving on to the performance of JPEG, I'm just going to show you a few pictures. On the left is the original; you can see it has vivid colors. On the right is the same image quantized to 1.63 bits per pixel. Looking at the higher-resolution computer screen, I don't see that much difference; compare the apple with the apple, as they say, and the pear with the pear, and the rose. I think this one is a little bit coarser than the other, and this one is a little shinier; I don't know if it shows on your screen, there's a lot of flicker on that screen.

Moving on, we can talk about the pros and cons of JPEG, and its problems, which are numerous, because it's already a ten-year-old standard. On the pro side it's low complexity, memory efficient, and has reasonable coding efficiency; in fact, most pictures on the internet today are JPEG. Other than JPEG, what else do you find on the internet? GIF, and what else? PNG. And how is their efficiency compared to JPEG? Not so good. So JPEG is reasonably good, but it has all these cons, and the reason all these cons are listed is that at some point people said, oh, we need to do JPEG 2000. Here's the thing: the people in companies who go to these standardization meetings want to make sure their job continues forever, so as soon as they finish one standard they say, oh, there are so many things wrong with it, we have to do a new one, because they want to keep going to these meetings and keep their jobs alive, and boom, now we do JPEG 2000. Anyway, these are the cons: it doesn't have spatial scalability, it doesn't have SNR scalability, it has blocking artifacts, it has poor error resilience, and it doesn't have all these bells and whistles they wanted to introduce, like tiling and regions of interest. So, boom, now we come to JPEG 2000. What are its features? Well, it has improved coding efficiency, which it achieves by using wavelets, which we'll talk about today or on Wednesday; it has full quality scalability, and I'll talk about scalability in a few minutes; there's temporal scalability for video, and for images there's spatial scalability and what's called SNR or quality scalability, and I'll talk about all of those in a second; it has improved error resilience, tiling, and regions of interest; but it is more demanding in memory and computation time.

So what is scalability? Basically, scalability means the following: if I encode an image in a scalable way, I'm generating a scalable bitstream with the property that if I receive only a portion of it, I can still decode a somewhat degraded version, where degraded could mean either smaller spatial size or lower visual quality, and as I receive more bits, here's the bitstream going from left to right, the picture improves. Spatial scalability means that if I receive only this portion, I can decode a small image, and as I receive more bits I can make it bigger, more bits, bigger, more bits, bigger. You can also think about what's called quality scalability, or SNR, signal-to-noise-ratio, scalability: in that case, if I receive a little bit of the bitstream, the size of the image I decode is the same but the quality is poor, and as I get more of it the quality keeps improving. But I need to have encoded the image in a scalable way to begin with, so that it decodes in this nice scalable fashion. And what are some reasons you would want to do that; why is scalability a good idea? Real-time communication; what do you mean by that? Okay. Name one application that we're all engaged in that would benefit from something like this. Web browsing, right. So what do you do when you're browsing? You go from one page to the next, and if there's a big image with lots of bits on one page, you have to wait for that whole image. Say you're browsing, trying to find an image of interest, and every image on that page takes two minutes to completely decode; then it takes you two minutes per page to figure out whether that's the page and the image you wanted. But if you can get a quick glance in two seconds and see, oh, that's a picture of a flower, and I'm really only after, I don't know, a car, then you quickly get out of there and move on to the next one.
You can get a coarse idea of what the image is all about without having to wait for the whole thing. Actually, I kind of wish they made TV programs and movies like that; essentially the trailer for a movie is like that: you get a two-minute overview and you either watch it or walk away. Or you could write scalable books: read the introduction of a chapter, and if you don't like it, just dump the book; that's called the preface. Okay, is it clear what scalability is all about? You can also do this in the temporal domain for video. What does that mean? Instead of viewing it at 30 frames per second: if you're sending things over the internet and the bandwidth fluctuates a lot, then rather than controlling the encoder in real time so that it produces fewer or more bits, fewer or more frames, you can encode the video in a scalable way, so that when the channel goes down you send fewer bits, and when the channel goes up you send more bits and the quality improves. That's another paper I did, with Dan Tan in 1999, in the Transactions on Multimedia; that's my third most referenced paper. And if you want to know what my most referenced paper is, it's something that David Taubman did that went into JPEG 2000, and I'll talk about that next time.

Okay, so scalability; let's move on. What do we mean by all of this? Here's an example of three images decoded from a scalable bitstream at 0.125, 0.25, and 0.5 bits per pixel; let me increase that to 150 percent so you can see. You can see there are a lot of artifacts on this image compared to this one, but I don't see too much difference between these two; let me memorize this one and look at that. Do you see any difference? I do. I think the TV really kills it; if we just had an LCD projector you could see exactly what we have on the screen here. Again, the electronic version goes on the web, and I encourage you to take a look at it after we put it up. Let me move on to the next thing: spatial scalability. What you have is a small version of the image; you send more bits and you get a slightly bigger one, and some more bits and it gets bigger still.

So this scalability thing is a great property, but what could be one of its drawbacks? The problem is that by virtue of making your bitstream scalable, by introducing that property, you might lose some coding efficiency, because now you have the additional constraint that, say, the first ten percent of the bitstream must be decodable on its own and give you this small image. At the encoder side, and we'll talk about how to do these kinds of encodings in just a second, you have to encode this first, then encode the difference between this and this, and then the difference between this and this, and these additional steps could inherently degrade the coding efficiency.
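To make the "encode a small version first, then the differences" idea concrete, here is a minimal two-level spatially scalable sketch in the style of a simple resolution pyramid; JPEG 2000 actually gets its scalability from the wavelet transform discussed next, not from this literal construction, and the code assumes image dimensions divisible by 2**levels:

```python
import numpy as np
from scipy.ndimage import zoom

def spatial_scalable_layers(img, levels=2):
    """Split an image into a coarse base layer plus refinement layers: decoding
    only the base gives a small picture; each extra layer adds the detail
    needed to reach the next resolution."""
    pyramid = [img.astype(float)]
    for _ in range(levels):
        pyramid.append(zoom(pyramid[-1], 0.5, order=1))   # downsample by 2 each level
    layers = [pyramid[-1]]                                # base layer: the smallest image
    for small, big in zip(pyramid[::-1], pyramid[::-1][1:]):
        predicted = zoom(small, 2, order=1)               # upsample the coarser level
        layers.append(big - predicted)                    # refinement: what the prediction missed
    return layers                                         # [base, refinement_1, refinement_2, ...]

def decode(layers, upto=None):
    """Decode using only the first `upto` layers (base counts as one):
    the more layers you use, the bigger and better the picture."""
    img = layers[0]
    for residual in layers[1:upto]:
        img = zoom(img, 2, order=1) + residual
    return img
```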
And really, David Taubman's thesis, he was a student at Berkeley, maybe my third PhD student, was the first to lay down the groundwork for how you can do this without losing coding efficiency. In fact, JPEG 2000 is a fully scalable coding technique; it's called fine-granularity scalable. What does that mean? What I just showed you is three-level scalability, I can decode from this to this to this, so it's not very fine. There's also SNR scalability that is fine-grained, meaning that every additional two bits or five bits improves the quality, so I have many, many decoding points: thousands of rates at which it's encoded and thousands of rates at which it can be decoded. David Taubman's thesis mattered because it was the first time people showed that you can have your cake and eat it too: you can have fine-grained scalability without hurting the reconstruction quality and coding efficiency. His thesis was really the building block for what later became JPEG 2000. After Berkeley he went to HP and did what I just told you about: he went to all the standards meetings for four years and got the technique into the standard, and then he took an academic job in Australia, where he still is, and he has continued to perfect the technique. But really, the groundwork, I'd like to think at least, was laid at Berkeley, and that is my most referenced paper.

Now that I've told you all these things, let's just look at some images to convince you that JPEG 2000 is better than JPEG, that it has better coding efficiency while at the same time being fully scalable. This is an original 512x512 picture, and this is it encoded using JPEG; do you see any difference between them? You do? I don't. All right. Now if I slide to the right a little more, this is the JPEG 2000 version; so this is JPEG and this is JPEG 2000, and you can see there are vast differences between them. Does everybody agree with that? Next picture: this is JPEG 2000, and if I move to the left, that's JPEG; you can see all this contouring and blocking. Does that show? Okay, good, something shows.

So how does JPEG 2000 achieve this scalability? The core of it is really the wavelet transform, and I think this is a good time to start: I'll talk about multi-resolution coding, pyramid coding, and the wavelet transform, wavelet decomposition, next time.