
The Bold Blueprint Podcast

The Bold Blueprint with Avideh Zakhor

By showing up and taking action every day, you build habits that bring you closer to your goals.

Broadcast on:
09 Oct 2024
Audio Format:
other

I also need to make some announcements about the presentations and the schedule for the rest of the course, because some of you have sent email. I want to remain consistent with what I announced earlier, so help me remember what I said and what I put in the handouts. The reports, I remember, are due on the 15th of May, as I always do; that's written down in the handout I gave you. The question is about the class presentations: what did I announce? I recall having said the last day of class, which I announced a few times, but that might not quite happen. Is that what most people remember? Other than the last day of class, I didn't announce any other variant, did I? Right. The issue is that the department has a retreat that all faculty are required to attend on Thursday May 4th and Friday May 5th, and Friday May 5th is our last day of class. So I have to make a decision, and there are a few choices for Friday May 5th. Either the whole class gets canceled because I'm not available, or maybe Cindy or somebody else gives a substitute lecture, in which case we would have the presentations on May 3rd. Another option is for me to say, let's just skip the department retreat and have a regular class, sticking to the May 5th schedule of you presenting. So those are the two options. If I have a substitute teach the May 5th class, or if it's canceled, then the presentations move to May 3rd, which is two days earlier. Is that going to cause trouble? Or do we use both May 3rd and May 5th? You know, my tendency is to stick to what we announced, stick to May 5th, and just not attend the retreat; despite how bad it looks to the rest of the department, I think the class is more important, and this is what we announced and agreed upon. Another thing that goes through my head is that if the reports are due on May 15th, I bet a lot of you would get a lot done between May 5th and the 15th. So one other possibility is to have no presentations at all, so that you really have until May 15th to do your best on the project and get more results, and you just attach a PowerPoint, or say ten pages laid out as if it were a presentation, to your report. That would be another option. As of now, other than the retreat constraint on my end, there's a group of you that's not available on May 3rd but is available on May 5th, is that correct? How many is that? Just you? Okay. You know what, it's Wednesday now; let me do some more thinking, and I'll make an announcement on Friday. Does the class have a preference one way or the other, at least the ones who are here?
You like the attachment idea? Okay, how many people like the attachment idea? Oh my god, okay, raise your hands: one, two, three, four, five. How many people like the presentation idea? You're in the minority, but you have a very good point, and I also like to listen to people. There are some people who never raised their hand, like Allen and Miggs: what's the difference to you guys? You get more feedback. See, one advantage of doing the presentation before the report is due is that through the interactive process of me and others listening to your presentation and asking questions, you can actually improve and have more results by the time the report is due. And the other thing is, how much progress are you making? Is everything ready by May 5th, or would the additional ten days buy you a lot more in terms of getting the last-minute coding or results done? That week I'm in Washington DC and Ohio for two government reviews, so that would have been a good choice, but it's shot. Okay, let me consider all of the input and I'll make an announcement on Friday about what we'll do.

Okay, so, any questions or comments about Wei's lecture last Friday? Okay. So just to recap what we're doing: this is the last module, if you will, of the course, and basically what we're talking about is image and video compression. I've thought a lot about how to do this last module; I think about it every time I teach the course, and I try not to repeat exactly what I did last time, simply because things are changing all the time. For example, when I taught the course in 1997 there was no such standard as JPEG 2000, and now there is; the standards keep evolving, and new compression techniques come about. There are two approaches one can take. One is to just pick a standard, like MPEG-4 or H.264, and explain every detail about it; the problem is you spend all your time on that one standard and you don't really learn about anything else. The other approach, which I think is much better and is what we're pursuing, is to teach you the underlying building blocks that go into the different standards. For example, the discrete cosine transform is in many, many standards: it's in JPEG, it's in MPEG-1, MPEG-2, MPEG-4. As of JPEG 2000, people have switched to wavelets, so wavelets, or subband coding, is another building block we should concentrate on. So rather than telling you every detail of MPEG-4, at an extreme level of detail, it's best to talk about the concepts and the building blocks that are present in these different compression techniques, and that includes audio as well as video, by the way: the discrete cosine transform and subband coding are the basis of MP3. So knowing the basics is the best way to go about it.

Generally speaking, there are many issues to worry about if you're dealing with compression and picking a good compression algorithm. First of all, why do we need compression to begin with? Well, images and video take up a huge amount of space. If you don't compress, you're going to need a huge amount of storage if you're storing locally, and you're going to need a huge amount of bandwidth to get it from point A to point B.
And by the way, point A and point B could both be places within the same computer system, for example from the disk to the display. In the old days, before computers were so powerful, you would keep things compressed and then, at the last minute before displaying, uncompress, so you didn't chew up the I/O buses inside your computer system or whatever device you were dealing with. And really the roots of image compression go back to the 1960s, when JPL sent missions like Cassini and various others into space. They would take pictures and have to transmit them, and there you are really bandwidth-limited and power-limited: if you didn't compress, your entire spacecraft power budget would go into sending the bits out. So it was really important to do compression, and that's where the authors of the classic image processing books came from: JPL, and that background of having to compress.

There's also a bit of a difference between images and video, but let me backtrack. One of the key factors in deciding what compression technique to use is what's called compression efficiency. You start with a file of size A and you compress it to a file of size B. There are basically two ways of doing that. One is what's called lossless: file B is an exact representation of file A, and if you compare the reconstructed output after decompression, the two are identical; no bits were lost. For example, I could have a digital movie that Spielberg made, and he absolutely refuses to let any of his movies be compressed in a lossy way; he insists on lossless compression. So you pass it through a box that compresses it in a lossless fashion, you get a new file, you pass that through a decompressor, and you get exactly what you started with; no fidelity is lost. That's called lossless compression. For example, two of the students in your class, Cindy and Allen, are both working on lossless compression, but for a different application: maskless lithography, and lithography in general, for representing layout files for VLSI design. Another application of lossless compression, besides movies and cinema, where directors don't want to introduce any artifacts, is medical imaging. Doctors for generations have been trained to detect tumors and abnormalities on analog images. Even though sometimes they agree that a compressed image has been enhanced relative to the analog one, and sometimes they don't even see any visual difference between the original digital image and the lossy compressed one, because of lawsuits, and because the training of the medical establishment is so entrenched in the system, they don't want to have anything to do with lossy compression. So that's another field for lossless compression. Now, if you stick to lossless compression, the compression ratios you get are very low; depending on the data, it could be three, five, ten, or thereabouts.
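To make the lossless round trip concrete, here is a minimal sketch in Python using the standard-library zlib module, which is a dictionary coder of the Ziv-Lempel family that comes up later in the lecture; the data is a synthetic stand-in, not anything from the course:

```python
# Minimal sketch of a lossless round trip: compress, decompress,
# verify that not a single bit was lost.
import zlib

original = bytes(range(256)) * 1000            # stand-in for "file A"
compressed = zlib.compress(original, level=9)  # "file B"
restored = zlib.decompress(compressed)

assert restored == original                    # identical: no bits lost
print(f"ratio = {len(original) / len(compressed):.1f}:1")
```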
On the other hand, you could shoot for what's called lossy compression. That means you start with a file of data, and if it's audiovisual data, you take advantage of the characteristics of the ear and the eye, and you say: we can compress this, and there will be a slight degradation in the output, but it's almost invisible to the eye. That's called lossy compression, and what happens is that you can trade off rate against distortion; there's a thing called the rate-distortion trade-off. Distortion means the difference between the reconstructed image and the original, before you did any compression. If the rate is very low, that distortion is large; as the rate you're encoding at goes up, the distortion becomes smaller and smaller. So the rate-distortion curve is a curve that dies down as the rate increases. Where does the loss in lossy compression come from? Well, I start with an image, for example, and I take a transform of the pixels; I get discrete cosine transform coefficients, or discrete Fourier transform coefficients. The loss comes from exactly what we talked about last time: quantization. You have a transform coefficient, a number with n digits in it, and you quantize it to one of M levels; it's that quantization process that introduces error, both in the coefficient and in the final image.
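As a rough illustration of that rate-distortion behavior, the following sketch, my own and assuming a synthetic Gaussian source, uniformly quantizes a signal at increasing bit depths and shows the distortion dying down as the rate grows:

```python
# Sketch: distortion (MSE) falls as the quantizer rate rises.
import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(0.0, 1.0, 100_000)         # assumed test source

for bits in range(1, 9):
    levels = 2 ** bits
    lo, hi = signal.min(), signal.max()
    step = (hi - lo) / levels                  # uniform step size
    # map each sample to the midpoint of its quantization bin
    idx = np.clip(((signal - lo) / step).astype(int), 0, levels - 1)
    reconstructed = lo + (idx + 0.5) * step
    mse = np.mean((signal - reconstructed) ** 2)
    print(f"{bits} bits/sample -> MSE {mse:.5f}")
```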
Okay, so having said all of these things, there are three basic questions we try to address when we deal with compression. The first question is what to code, and let me explain what I mean by that. Say I have an image. I could code the pixels themselves; I could code the differences between pixels; I could come up with a prediction for each pixel, subtract that prediction from the actual value to generate an error, and code that error instead. Or I could walk away from pixels altogether and transform the pixels into a different domain, Hadamard transform, DFT, DCT, wavelets, some transformation, and choose to code those coefficients. Or I could come up with a parametric model for my image: say I'm only coding head-and-shoulders views of faces, and I have nine kinds of eyes, ten kinds of lips, twelve kinds of noses, and I just send the parameters of how each person looks, with glasses or without, straight hair or curly, black hair or blonde; that's a parametric model, and I could code those parameters. Or, in fractal coding, you code similarity transformations; that's the stuff Barnsley came out with. So one of the key questions is what aspect of the image we end up coding, and coding means quantizing. That's the main thing I'm going to try to address today.

The second aspect, which I think Wei talked about a little in last time's lecture, is that once you've decided what to code, you want to figure out how to quantize it. If I've decided to code, say, DFT or DCT coefficients, do I use a uniform quantizer, a non-uniform quantizer, adaptive quantization? Quantization is a process in which you start with a random variable and discretize it into a finite number of bins, in order to tell the decoder which bin the value fell into; then at the other end, when the decoder has the information about which bin it fell into, it has to choose a reconstruction level in that bin to represent all the values that lie in that bin. For example, suppose I'm coding pixel intensities, the values are between 0 and 256, and I have a two-bit quantizer doing uniform quantization. The boundaries are at 64, 128 and 192, so my quantization bins are bin one, bin two, bin three, bin four. At the encoder I specify which bin each pixel fell into; this is the simplest kind of compression you can imagine, though it's not very good. You send that information to the decoder; the decoder receives, okay, this pixel is in this bin, and then it has to figure out what reconstruction value to choose for that bin. Most of the time you pick the midpoint, and that's the reconstruction level. But if the probability distribution function of the original pixels within that bin is skewed, you don't want to pick the midpoint; you want to pick something slightly to one side, because you want to minimize the mean square error, in a statistical sense, between the reconstruction level and the original pixel value. Or you might decide to do non-uniform quantization: say, look, I have a probability distribution function for my pixels and I want five reconstruction levels; tell me where the decision boundaries, my quantization bins, should be in order to minimize the mean square error between the reconstructed value and the original. That's called non-uniform quantization, and I think we talked about Lloyd-Max quantizers, which are the most famous special case of non-uniform quantization.
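Here is a small sketch of that two-bit uniform quantizer example, including the point about moving the reconstruction level off the midpoint when the pdf inside a bin is skewed; the centroid, i.e. the conditional mean, is the MSE-optimal choice. The skewed pixel distribution is an assumed stand-in:

```python
# Two-bit uniform quantizer over [0, 256): boundaries at 64, 128, 192.
import numpy as np

rng = np.random.default_rng(1)
# a skewed pixel distribution, clipped to [0, 256)
pixels = np.clip(rng.exponential(60, 100_000), 0, 255.999)

step = 256 / 4                          # four uniform bins
bins = (pixels // step).astype(int)     # which bin each pixel fell into

midpoint = (bins + 0.5) * step          # naive reconstruction level
centroid = np.array([pixels[bins == b].mean() for b in range(4)])[bins]

for name, recon in [("midpoint", midpoint), ("centroid", centroid)]:
    print(name, "MSE:", np.mean((pixels - recon) ** 2))
# the centroid reconstruction never has higher MSE than the midpoint
```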
So besides deciding what to code, the next most important question is how to quantize. Once you quantize: you started with real numbers, infinite precision on the real line, and after quantizing you have a finite number of values; there's no infinite precision anymore. Now you have a discrete source with a discrete, finite set of outputs, what's called an alphabet. Whether I'm in this bin, this bin, this bin or this bin, in this case with four quantization bins my alphabet size is four: I have a source with an alphabet of size four, and now I have to code it. What does that mean? I have to allocate bits to each of the four letters in my alphabet. The simplest way is to say: I've got four letters, so two bits per letter, 00, 01, 10, 11. That's a simple way of doing it, but it's not necessarily the most optimal way. Why? Because some letters of the alphabet are much more likely to occur than others; if you're coding English-language text, some letters occur more frequently than others. How many of you have ever played with ham radio? No one? But you know what it is, okay. When you're doing ham radio, you assign shorter code words to the alphabet letters that are more frequent, and longer code words to the letters that are less frequent, and by doing so you maximize what's called compression efficiency. So for every source with a finite alphabet that emits a symbol every unit of time, there is an optimal, lowest number of bits that can be used to represent that source, and that lowest number of bits is called the entropy. From an information-theoretic point of view, this thing we talked about last time, entropy is the lower bound on how many bits you need to represent a source with so many letters. It has the following expression, which I'll write down just to remind you:

H = −Σ p_i log2 p_i, summing over i from 1 to N.

So if I have a source with N letters, and each one has a probability of occurring, one half, one fourth, and so on, I can compute this, and from an information-theoretic point of view it can be shown, I didn't prove it to you, that this is the fewest bits you can use. Sorry? Yes, that's right: log p_i is negative, so we put the minus sign in front. You all remember that from Wei's lecture last time.
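The entropy expression on the board, as a few lines of Python; the function name and example probabilities are mine:

```python
# H = -sum_i p_i log2 p_i : the lower bound, in bits per symbol,
# for coding a source with these letter probabilities.
import math

def entropy(probs):
    """Entropy in bits per symbol of a discrete source."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# four-letter alphabet with skewed probabilities
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits, below the naive 2
```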
Now, this is kind of classic for information theory: information theory is full of results that give lower bounds, but those guys never tell you how to achieve them, or even how to get close. The classic one is Shannon's theorem, which tells you, at a given signal-to-noise ratio, what bit rate you can get out of a channel; but the proof of it is a randomized argument, and it doesn't tell you how to achieve it. It's still a useful result, because when you go to bed at night you know how far from the optimum you are. If you're, say, within half a dB, you can rest easy and not worry too much; but if you're 3 dB off, you say, oh god, somebody is going to discover a technique that gets closer to the bound. So these bounds are not entirely useless; they're useful in letting us know how far off we are from the optimum.

In any event, there are multiple methods. I have a source with so many letters in it, and now you know where these letters come from: they're the quantized values of our pixels, or our DCT coefficients, or our wavelet coefficients, whatever we decided to code; they become our letters, and now we've got to do bit allocation. As I said, entropy is the lower bound on how many bits it takes to code a source like that, and the techniques people have come up with are Huffman coding for bit allocation and arithmetic coding, and we talked about those as well. It turns out that arithmetic coding can approach the entropy a little better than Huffman coding, because it's more sophisticated: there's more computation, more calculation, involved in it. Arithmetic coding, by the way, was invented at IBM in the mid-to-late 70s, and because IBM had all the patents on it, for years and years it wouldn't be considered for standards, until the patents expired, in the early 90s or somewhere along the way; then it started getting into the standards, and it's now in lots of different standards. It's a prime example of what a patent does to a technique: even though it's better, people just don't use it, because they'd have to pay royalties. Actually, MP3 is another prime example of how licensing and patents affect the use of a technique. Why do you think MP3 is so popular? It's definitely not because it's compression-efficient, I can swear to that; in fact it's terrible in terms of compression efficiency, a very old, like 30-year-old, standard. The reason it's so popular, especially with your generation of kids, is that it's free. And Napster and those other guys made it even more free: they converted the songs into that format and allowed everyone to download them. Actually, I never downloaded anything; I don't even have a device to listen on. I have a Sony Walkman, and I buy thirteen-dollar CDs to put in it, and it's quite big, it doesn't fit in my pocket; I should convert to an iPod. Anyway, MP3 was successful because it was free, and arithmetic coding was not successful for a long time; now it's finding its way into the standards, it's in H.264.

So in this scheme of source coding, you pick a letter from your alphabet and you assign a variable number of bits to each letter: the letters that are more frequent get fewer bits, and the ones that occur less frequently get longer code words.
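A textbook sketch of Huffman's construction for the variable-length scheme just described, repeatedly merging the two least probable nodes; this is a generic illustration, not code from the course:

```python
# Huffman coding: frequent letters get short code words, rare ones long.
import heapq
from itertools import count

def huffman_code(probs):
    """probs: {letter: probability} -> {letter: bit string}."""
    tie = count()  # tie-breaker so heapq never compares the dicts
    heap = [(p, next(tie), {letter: ""}) for letter, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # two least probable nodes
        p2, _, c2 = heapq.heappop(heap)
        merged = {k: "0" + v for k, v in c1.items()}
        merged.update({k: "1" + v for k, v in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
print(code)  # e.g. a -> '0', b -> '10', c -> '110', d -> '111'
print(sum(p * len(code[s]) for s, p in probs.items()))
# 1.75 bits/letter: for this source, Huffman meets the entropy bound
```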
The opposite of this scheme is what's called dictionary-based techniques, and the prime example, which I think we talked about, is Ziv-Lempel. That's another classic case, in the other direction: it was not patented, and it's used everywhere; the zip function you have on your PC is Ziv-Lempel. It was invented in 1977, and what it does, instead of assigning a variable number of bits to each letter in our alphabet, is look at a stream of letters and assign a fixed number of bits to a variable number of letters: it groups a variable number of letters together and assigns a fixed number of bits to the group. Essentially the technique works by looking backward into the stream, figuring out where repetitions occurred, and pointing to them. It's very good for text compression, for example: if a string of characters occurred previously, instead of coding it again I just send a pointer that says, go back so many spaces and copy that to this location. That's the gist of it, and it's used all the time in many lossless coding techniques.
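A toy version of that backward-pointing idea, far simpler than a real zip implementation; the window size and the (offset, length, next-character) triple format are illustrative assumptions:

```python
# LZ77-style sketch: emit triples that copy previously seen text.
def lz77_encode(data, window=4096):
    out, i = [], 0
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):   # search the back window
            k = 0
            while i + k < len(data) - 1 and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_off, best_len = i - j, k
        nxt = data[i + best_len]                 # literal that follows
        out.append((best_off, best_len, nxt))
        i += best_len + 1
    return out

print(lz77_encode("abcabcabcx"))
# [(0, 0, 'a'), (0, 0, 'b'), (0, 0, 'c'), (3, 6, 'x')]
# note the pointer (3, 6): the copy overlaps itself, a classic LZ77 trick
```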
And again, the circuits that Cindy is building, for example, use a variation of this: Vito Dai, a student in my group, came up with an algorithm called C4 that combines some features of this with predictive coding, which I'll talk about today; it's good for layout compression for VLSI designs, and now Cindy is building those in hardware. So that's an overview of what we've talked about.

But what I'd like to mention is: what are the metrics we care about when we think about compression? We certainly care about compression ratio and compression efficiency. If I start with a one-megabyte file and I have a fixed fidelity criterion, say a picture whose distortion can't be any more than some value x, then ideally I'd like to minimize the rate; I want to compress that file as much as possible. That's obvious. But in some situations other criteria come into play, for example the encoding complexity or the decoding complexity. Why is that important? Well, if I'm taking a video with my cell phone and I want to send it to my dad or my husband or my son or my mom, I have to encode it on the phone, and if the encoding complexity is so high that after encoding a ten-second clip I can't make any phone calls for the rest of the day, until I go home and recharge my battery, then it's useless. Why is decoding complexity important? The reverse situation. Suppose I've subscribed, I'm on Cingular, and I want to watch, what shows do they have on their cell phones nowadays? I don't know what the one I subscribe to has; whatever, say Desperate Housewives, I want to watch some videos, music videos. If I stream the thing and it chews up too many cycles to decode, again my battery will be dead.

And in many applications, we as a nation, and perhaps we as humankind, though I can't really speak for the rest of the world, just for the United States, are more consumers of content than generators of content. Think how many hours a week you watch TV or movies or listen to things, versus how many hours a week you create content. Actually, you do create content, the homework you do is content, but it's not audio-and-video-type content. Because we mostly consume content and don't generate as much, many compression algorithms have been designed to be asymmetric, with the decode complexity a lot smaller than the encode complexity. That's true for MPEG-4, it's true for H.263; H.264 is so complex I can't really speak for it, but it is true for most video compression algorithms: they're highly asymmetric, with encoding much more compute-intensive than decoding. Nevertheless, there are situations where the reverse is true. For video conferencing, for example on your cell phone, the encode complexity also needs to be low, so that the frames get encoded in real time and you can have an actual interactive communication.

That brings me to my next point, which is some differences between images and video. But before doing that, let me make one more comment. The reason the encoding complexity can be large is that if you're serving consumers of content, the encoding can be done offline; it doesn't have to be real time. Somebody can encode Desperate Housewives, or encode an entire DVD; that's a big business, and DVDs are essentially MPEG-2 versions of the movies. You can spend a huge amount of time coming up with a nicely encoded version of a particular two-hour movie, put it on DVD, and send it to the video stores all across the nation for the masses to watch. It's worth it because you encode once, but the content is viewed by millions of people, millions of times; so it's worthwhile to spend a lot of cycles on the encoding, even though it's more complex, because the content is consumed so many times.

So those are one-way applications of video: it's encoded once, and then you stream it, or you download it and watch it. The other kind of video application that's also very important is interactive, two-way video. Say I'm having a video conference, the same way you have an interactive phone conversation. There are FCC regulations on how much delay there can be in order for that interaction to remain interactive; it shouldn't be like I give you a lecture, there's a two-minute delay before you receive it, and your answer to my question arrives here two minutes later. You want to maintain interactivity, so you have to have low delay. Does anybody know what that delay requirement is for telephones? Is it a second?
That's right, it's about 150 milliseconds. That's the FCC regulation: when they came up with the wireline telephone system in this country, a call from the east coast to the west coast, say Los Angeles to New York, should not have more than 150 milliseconds of delay. I don't know how many of you remember; when I was a kid, and we talked over satellite telephone links, for example on calls to the United States, the delay was very substantial, seconds. You couldn't really have a conversation: I would say something, then wait 10 or 20 seconds, staying silent to give the other party a chance to answer, and then they would answer and stay silent for a few seconds to make sure I had heard. It really is no longer interactive. How many of you have made satellite calls and noticed the delay? Well, this just shows the advances in telecommunications: I've called Asia many times now and haven't noticed any delay, so they've improved that quite a bit.

Anyway, when you're doing interactive video, two things. The encoding can't be done offline; it has to be done in real time. If encoding a two-hour DVD movie takes you five days, it doesn't matter, because once you're done you can mass-distribute it. But when you're doing video conferencing, and actually the video conferencing system you see in use every day is on television, when one reporter is talking to another reporter they're essentially doing live video communication, the encoding complexity has to be low, or we have to use simple compression techniques, so that we can encode in real time and still preserve the 150-millisecond delay we're interested in. In that case the encoder and decoder have to be symmetric: the encoder cannot be massively more complex than the decoder.

So these are some general issues about compression. We've talked a little about the problem of how to quantize and a little about how to code, and what I'm concentrating on today is mostly the problem of what to code. In particular, I'm going to start off with using just pixel-domain information as the information we encode. Actually, before I get on with the rest of the lecture, I just remembered an announcement I forgot to make. Today at four o'clock in 405 Cory, I'm sorry, 405 Soda, at 4 p.m., there is a talk on blind deconvolution. I forgot the name of the speaker; do you remember? It's Professor Aaron Hertzmann, from which school? University of Toronto, yes. He's actually asked the audience to email him their motion-blurred pictures, so that during his talk he can process them and show you the results. I think it'll be a pretty interesting talk, and it's very relevant to the course, because we covered blind deconvolution just a week ago, so I encourage you to attend.

Okay, so what we're going to talk about right now is really the problem of image compression, and more specifically the problem of what to code.
The simplest scheme you can think of is pulse code modulation, or PCM. The first class of techniques we'll deal with is waveform coding, and within waveform coding we're going to consider PCM, pulse code modulation, and DPCM, differential pulse code modulation; that's mostly it. Then we move on to transform coding, where we'll really talk about the discrete cosine transform and the Karhunen-Loeve transform. Then we'll talk about subband/wavelet coding, all kinds of what I call multi-resolution coding. Actually, can you zoom out so the entire page is seen in one shot? Okay, great, thank you. And then, if there's time, we're going to talk about fractal coding, and about vector quantization. And then, right after we talk about all this, the next topic is video coding, which uses a lot of the material from image compression, but where we also need to talk about the problem of motion compensation and motion estimation, in other words how we take advantage of the redundancy across frames. Then, time permitting, I'm going to concentrate on one standard, pick one standard and show you how we can mix all these ingredients, these building blocks we came up with, in order to build an end-to-end compression system. It could be JPEG, it could be JPEG 2000, it could be MP3, and actually there's one thing I didn't put in, and that's audio coding; it could be H.264 or H.263, or just regular MPEG, MPEG-1, 2 and 4, any one of these standards.

So let me begin with the concept of pulse code modulation. But before I start, I'd like to talk about the metric we're going to use in order to evaluate the different techniques, so in particular, how to evaluate or compare various compression algorithms. What happens is you start with an image f(n1, n2), that's your original, you pass it through some compression engine, and out comes f̂(n1, n2). You can talk about the mean square error between these two signals,
or you can talk about what's called peak signal-to-noise ratio, and I'm going to explain what these are; they're not that different from the expressions we had when we dealt with restoration, if you can roll up a little. I'm going to define the normalized mean squared error, NMSE, in percent, as

NMSE (%) = 100 · var( f̂(n1, n2) − f(n1, n2) ) / var( f(n1, n2) ),

so NMSE is 10 percent, 20 percent, and so on. And then the signal-to-noise ratio, in decibels, is

SNR (dB) = 10 · log10( var(f) / var(f̂ − f) ),

which is 10 log10 of 100 over the NMSE in percent; you multiply by a hundred to get rid of the percentage, so it's essentially 10 log base 10 of the signal variance over the error variance. Just as a rule of thumb: after you work for many years in compression, you have some numbers in the back of your head that tell you which signal-to-noise ratios are good and which are bad. Signal-to-noise ratios in the teens are terrible, twenty is bad, thirty is good, and the high thirties are very, very good. So if you get 30 dB you're doing okay, and at 35 or 37 dB you're doing really well. You do have to be careful, when you're dealing with compression, that your signal-to-noise ratio is not so high that the difference is no longer discernible to the eye, because at that point you're wasting bits: your eye probably can't tell the difference between an SNR of forty and forty-one or forty-two or forty-three; anything up there is such high fidelity that you can't see the difference.
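Here are the two board definitions transcribed directly into code; the function names are mine, and the fake codec noise at the end is just to exercise them:

```python
# NMSE and SNR exactly as defined above, using variances.
import numpy as np

def nmse_percent(f, f_hat):
    """Normalized mean squared error, in percent."""
    return 100.0 * np.var(f_hat - f) / np.var(f)

def snr_db(f, f_hat):
    """Signal-to-noise ratio: 10 log10( var(f) / var(f_hat - f) )."""
    return 10.0 * np.log10(np.var(f) / np.var(f_hat - f))

f = np.random.default_rng(2).normal(128, 40, (512, 512))     # "original"
f_hat = f + np.random.default_rng(3).normal(0, 4, f.shape)   # fake codec
print(f"NMSE = {nmse_percent(f, f_hat):.2f}%  SNR = {snr_db(f, f_hat):.1f} dB")
```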
Now, there's been a lot of work in image and video processing bashing signal-to-noise ratio as a metric; in fact there have been studies going on for years to try to replace it, saying that it doesn't correlate well with what the human eye perceives as a good image or a good video. And, my god, the Tektronix folks; there was a whole group of people within MPEG-4 that met every quarter, for years and years, trying to come up with other metrics that better reflect subjective quality, and I'm sad to say that none of them worked. It's still SNR: if you look at the ICASSP or ICIP papers this year, ICASSP being in France and ICIP in Atlanta, every compression paper still quotes signal-to-noise ratio. That's something where I'd anticipate that ten years from now, if I'm still teaching this course, there might be some advances and we might not be using it quite as much. By the way, what are you guys using in speech? Do we have any speech people in the room? Right, they have another thing called MOS, mean opinion score, which is between one and five, and they use that all the time in their experiments. See, the auditory system in human beings is much better modeled and characterized than the visual system. The ear pretty much, just as Howard said, acts as a spectrum analyzer: there are different frequency bands, and your cochlea acts as a spectrum analyzer, like a filter bank; it produces outputs at different frequencies. Essentially your ear is as if you went and bought a Tektronix spectrum analyzer, except it's smaller and it fits here. The eye, on the other hand, is absolutely not a spectrum analyzer, and very little is understood about it. Okay, enough of the metrics.

I'm hoping that today I can get through PCM and DPCM; then on Friday I'll cover DCT and KLT, next Wednesday the subband and wavelet, the multi-resolution, stuff, and next Friday the remaining portion. So let's talk about pulse code modulation, the simplest way of encoding anything. The basic idea is this. This is the encoder: you have f(n1, n2), you pass it through a uniform quantizer, and out comes f̂. It's very, very simple; it can't get any simpler than that. However, it's terribly inefficient: it requires five or six bits per pixel for the signal to look good. If you were to use, say, two bits per pixel with this technique, you'd get a terrible contouring effect. Let me show that to you; if you can zoom in on these pictures. This is the original Lena image, 512 by 512, and this is the PCM-quantized version of it when you spend two bits per pixel. Initially you might think two bits per pixel is low, but let me warn you: with JPEG 2000 you can get pictures that your eye can't distinguish from the original at 0.125 bits per pixel, so two bits per pixel is in fact terrible; it's awful. I'm hoping to show you some of those techniques; JPEG 2000, as I said, uses wavelets, so we'll hope to get to that. But here's the thing: at two bits per pixel you can see all the contouring, even through our monitor system, which leaves something to be desired. You can see how terrible the images are.

So what do we do to fix that? I showed you figure 10.22, (a) and (b), in Jae Lim's book. One way to fix it is what's called Roberts' pseudonoise technique, and this is a technique that's routinely used in engineering; it's not only used for image compression, it's also used in A-to-D converter design. How many people here do A-to-D conversion, or have
experience doing that? You basically add dither at one end in order to randomize things, in order to get rid of the contouring effect you just saw. And the way it works is this; this is the encoder. At the transmitter side, or the encoder side, let me move on to a new sheet, you have f(n1, n2), and what you do is add noise w(n1, n2) to the signal and then pass it through a uniform quantizer. You make sure this pseudorandom noise, which is predetermined ahead of time, is also known by the receiver. It's not signal-dependent; it's some sequence of random numbers I've cooked up ahead of time, and I want to make sure it's available on both sides. This is the boundary line between my transmitter and my receiver: this side is the transmitter and this side is the receiver, and RX doesn't stand for prescription, it means receiver. And how do we do it? You send this guy over; you have w(n1, n2) here, and because you added it at the transmitter, you subtract it at the receiver, and out comes f̂(n1, n2). So inherent to this scheme is that w(n1, n2) is known both at the transmitter and at the receiver, ahead of time. You can use different noises, but typically you choose w(n1, n2) to be a white noise sequence with uniform pdf; in other words,

p_w(w0) = 1/Δ for w0 between −Δ/2 and Δ/2, and zero otherwise,

so you pick a random number within an interval of width Δ, where Δ is the quantization step size. You add noise within one quantization step into the system, and at the other end you subtract it.

I'd like to show you a figure in Lim's book that explains this process with an example; figure 10.21 in Jae Lim's book pretty much covers it, if you can zoom in, please. The amazing thing is that he applied this technique to a speech sample, even though this is an image and video processing book; but the concepts are easier to understand for a one-dimensional signal, so we'll concentrate on that. This is an original speech signal, and this is what happens if you quantize it at two bits per sample: you take the maximum and the minimum, and two bits per sample gives four levels of quantization, one, two, three, four, as a function of time. And this is what happens if you instead add white noise with the characteristics I just talked about to the original signal, quantize at two bits per sample, and then subtract the white noise at the other end; again, that white noise is known to both transmitter and receiver. You can see this is a much more faithful representation of the original than the plain quantized version is. And finally, there's a way to further improve the fidelity, or the visual appearance, that's the phrase, of the final signal: you can do some low-pass filtering, what's called post-processing, on the reconstructed output, and get something that looks even closer to the original, if you can zoom out just a tiny bit. The reason is that in the process of adding the noise, quantizing, and removing the noise, you unintentionally add a little bit of high frequency; so if you do a little low-pass, noise-removal-type filtering, you can do better.
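Here is a sketch of the dither idea on a one-dimensional signal, in the spirit of Lim's speech example; the test signal and the shared-seed bookkeeping are my assumptions:

```python
# Roberts' pseudonoise: the same uniform noise, known at both ends,
# is added before a coarse uniform quantizer and subtracted after it.
import numpy as np

rng = np.random.default_rng(4)          # seed agreed on by both ends
n = np.arange(2000)
f = 128 + 100 * np.sin(2 * np.pi * n / 500)      # slowly varying signal

levels = 4                                        # two bits per sample
step = 256 / levels
def quantize(x):
    return (np.clip(x // step, 0, levels - 1) + 0.5) * step

w = rng.uniform(-step / 2, step / 2, f.shape)    # known pseudonoise
plain = quantize(f)                               # staircase contouring
dithered = quantize(f + w) - w                    # add at TX, subtract at RX

for name, f_hat in [("plain", plain), ("dithered", dithered)]:
    print(name, "SNR dB:", 10 * np.log10(np.var(f) / np.var(f_hat - f)))
# the dithered error behaves like white noise rather than contouring,
# which looks better to the eye even when the two SNR numbers are close
```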
Just so you can see what kind of performance you get out of this, look here: this is the original Lena; this is two-bit quantization of the image without the Roberts noise technique; this is with the Roberts noise technique but without any post-processing; and this is with Roberts noise and with post-processing. And actually, looking at the pictures here, I see a big difference between these last two, so the post-processing really does something: there are a lot of high-frequency dots on the face here that have been blurred out over there. So, coming back here, that was 10.21 and 10.22 of Jae Lim's book: you can improve visual quality by doing what's called post-processing to reduce noise. And by the way, this post-processing to reduce noise is an integral part of a lot of video compression standards and techniques: every time you apply the discrete cosine transform you get blocking artifacts, and you need to do post-processing to smooth them out so the eye doesn't see the sharp boundaries of the blocks.

All right, so PCM is trivial; it's primitive, it's old, it's unsophisticated. You can become a little more sophisticated by using what's called delta modulation, another waveform coding technique. And even though we're going to apply it in the pixel domain, when we get to video and talk about motion compensation, delta modulation and DPCM, differential pulse code modulation, are more or less the same ideas we'll use to take advantage of the redundancy across the temporal dimension, across frames.

So how does delta modulation work? Here's the transmitter. You start with f(n), and you do one-bit quantization: what comes out is ê(n), which is either Δ/2 or −Δ/2, depending on whether the error signal was above a threshold or below it; this is a one-bit quantizer. And then you look at the previous sample f̂(n−1), as reconstructed at the receiver, and you subtract it; that's the way you generate this signal. Sorry, I should have drawn this first
before going there. So: you assume you have access to the decoded-and-encoded version of the (n−1)st sample. You subtract that from the current, nth, sample, you generate an error signal e(n), you one-bit quantize it, and that's the quantity you send to the receiver. In order for the transmitter to have access to f̂(n−1) at every step, it has to build f̂(1), f̂(2), ..., f̂(n−1), f̂(n). And how does it build f̂(n)? The same way the receiver does, which I'll show in just a second: basically, you add ê(n) to f̂(n−1) in order to come up with f̂(n). And this step, if you can roll up a little, shows you exactly what the receiver has to do: the receiver receives ê(n), adds f̂(n−1) to it, and out comes f̂(n); and it gets from f̂(n) back to f̂(n−1) through a delay element, z^(−1).

So those are the block diagrams; let me explain intuitively what's going on. The transmitter and the receiver both have access to the previous reconstructed sample, f̂(n−1). Of course, the transmitter also has access to the actual f(n−1), without any quantization or compression errors, but we don't want to use that, because otherwise the two loops at the transmitter and receiver would get out of sync and drift apart from each other. So the transmitter and receiver both work with this f̂ quantity; they both know the reconstructed value of the (n−1)st pixel. The transmitter takes the actual nth pixel, uses the (n−1)st reconstructed value as a prediction for it, subtracts to get an error, quantizes that, and that's what it sends to the receiver. The receiver now has ê(n), but it also has f̂(n−1), so it adds these two things to get f̂(n): because the transmitter subtracted to get ê, the receiver does the opposite and adds. So you have a loop in the transmitter and a loop in the receiver that are in perfect synchrony with each other; they're both tracking f̂(1), f̂(2), f̂(3), and so on. You can imagine why things would go out of whack if, instead of f̂(n−1), the transmitter used f(n−1): the two loops would drift apart and the errors would accumulate.

So what are the simple equations that govern this thing? Well, e(n) is f(n) minus f̂(n−1), so let's just write that down:

e(n) = f(n) − f̂(n−1),
ê(n) = Δ/2 if e(n) > 0, and −Δ/2 if e(n) < 0,

ê(n) being just the quantized value of e(n). Actually, let me not use Q for the quantizer notation here, because I want to use q for something else: the last thing to talk about is the quantization noise, defined as q(n) = f̂(n) − f(n), how much error these delta modulations have introduced, and that's basically ê(n) − e(n). So, what are the challenges in this system? How many parameters do we have in designing it? Essentially one: Δ.
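Here are the two matched delta modulation loops in code, a sketch under the same assumptions as the board diagram, a scalar signal and an agreed initial state:

```python
# Delta modulation: a one-bit quantizer inside matched TX/RX loops
# that both track f_hat, so they never drift apart.
import numpy as np

def delta_modulate(f, delta):
    f_hat = np.empty(len(f))
    prev = 0.0                       # both ends agree on the start state
    bits = []
    for n, sample in enumerate(f):
        e = sample - prev            # predict with previous reconstruction
        e_hat = delta / 2 if e > 0 else -delta / 2   # one-bit quantizer
        bits.append(e > 0)           # the only thing actually transmitted
        prev = prev + e_hat          # f_hat(n) = f_hat(n-1) + e_hat(n)
        f_hat[n] = prev
    return bits, f_hat

def delta_demodulate(bits, delta):
    prev, out = 0.0, []
    for b in bits:                   # receiver runs the identical loop
        prev += delta / 2 if b else -delta / 2
        out.append(prev)
    return np.array(out)

f = 128 + 100 * np.sin(2 * np.pi * np.arange(500) / 250)
bits, f_hat_tx = delta_modulate(f, delta=8.0)
assert np.allclose(delta_demodulate(bits, 8.0), f_hat_tx)  # perfect sync
```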
So the big question that comes up is: how do we pick delta optimally? And there is really no good answer to that, for the following reason. I'm going to show Figure 10.26 of J. Lim's book: if you pick delta too small you get one problem, and if you pick delta too large you get another problem, so there's a compromise you have to come up with.

Let me explain what I mean by that; let's zoom into this picture. The x-axis is space: I'm moving from left to right across my image, for example. The y-axis is the intensity of the pixels, and the dark black line is my signal value. In this region the signal is slowly varying, and then here it hits an edge: the intensity changes very rapidly. Delta is essentially the height of each step, the change from here to here. With this particular value of delta, in the flat region what I end up getting is what's called granular noise: the signal is constant, and yet my representation keeps stepping up and down around it. And here, where the signal is varying too fast, I have to go up, up, up, but I can't catch up with the rate at which it went up; this is called slope overload.

You can see both of those phenomena in Figure 10.27 of J. Lim's book. On the left-hand side you have a delta-modulated image where delta is chosen to be 8 percent of the overall dynamic range; if the pixel intensities run from 0 to 255, 8 percent of that is about 20, so delta is about 20 there. And here delta is chosen to be 1.5 percent of the dynamic range. In that case delta is too small, and you can see the edges are extremely blurred: when there's a big change in intensity, the slope-overload effect we were just talking about kicks in, and the edges become very blurred. With the larger delta the edges are less blurred, but in some constant-intensity regions you get a lot of granular noise that wasn't in the original signal. Just so that you know, the signal-to-noise ratio is 8 dB in one case and 10 dB in the other. Actually, I'm going to skip the next image altogether.

So delta modulation, Figures 10.26 and 10.27, has this problem of how to pick the value of delta in an optimal fashion.

Okay, so next we move on to DPCM, which is differential pulse code modulation. Let me draw the block diagram of this system and intuitively explain what it does; in Friday's lecture I'll derive the equations for it, or maybe we can do the equations today too. At the transmitter, once again, I have some prediction for f, and I call it f_prime(n1, n2). I subtract the prediction f_prime(n1, n2) from the actual pixel value f(n1, n2) and get an error signal e(n1, n2), and then I PCM-quantize that, so you can see this is built on top of PCM. What comes out of it is e_hat(n1, n2), and I send that to the receiver.
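Before going further with DPCM, here is a toy way to reproduce the two delta-modulation failure modes just described, reusing the delta_modulate and delta_demodulate sketch from above; the flat-then-edge test signal and both step sizes are made-up values, purely for illustration.

```python
import numpy as np

# A "scan line" with a flat region followed by a sharp edge.
f = np.concatenate([np.full(100, 50.0), np.full(100, 200.0)])

rec_small = delta_demodulate(delta_modulate(f, delta=4.0), delta=4.0)
rec_large = delta_demodulate(delta_modulate(f, delta=40.0), delta=40.0)

# With delta = 4, the reconstruction climbs only 2 units per sample, so the
# 150-unit edge takes about 75 samples to track: slope overload (blurred edge).
# With delta = 40, the edge is caught within a few samples, but the flat
# region oscillates up and down by delta/2: granular noise.
```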
The receiver (maybe I should do the receiver on a different sheet, because I need the space) gets e_hat(n1, n2) from this end, and it adds to it the same prediction f_prime(n1, n2). So we subtracted the prediction at the transmitter, and after quantization the receiver adds it back; you can imagine that if this quantizer box didn't exist, this would be a perfect system. And out comes f_hat(n1, n2).

So the question is: how do we come up with a good prediction? The answer is, you can use any technique you like, as long as it makes a good prediction. And how do we decide what a good prediction is? A good prediction is one that, for a given number of bits, gives the highest fidelity. I mean, gee, we're doing compression; that's the most obvious criterion. You can use linear predictors, nonlinear predictors, ARMA models, autoregressive models, all kinds of things. So we leave this box as just "prediction."

The only constraint we need to impose is that the receiver and the transmitter both need access to this prediction, so draw the prediction box at the receiver as well. The thing to remember is that both of these predictions must be based on values that are available at both the transmitter and the receiver. The prediction cannot be based on f values, the uncompressed raw pixel values; it has to be based on f_hat values. So all I say here is that the prediction is based on previously coded pixels, for example f_hat(n1 - 1, n2). If you're encoding your picture in raster-scan order, left to right and top to bottom, you've already coded that pixel, so the transmitter and the receiver both have access to it. You can use f_hat(n1, n2 - 1), f_hat(n1 - 1, n2), and so on. And the same thing at the receiver: that prediction is also based on f_hat(n1 - 1, n2), f_hat(n1, n2 - 1), f_hat(n1 - 1, n2 - 1), etc. (sorry, this last one is n2 - 1; I already had that pixel).

Again, the thing to remember for both of these schemes is that whatever prediction algorithm you use at the transmitter has to be the same algorithm you use at the receiver: the transmitter and the receiver have agreed ahead of time on how they do the prediction based on previously coded pixels.

And where do these previously coded pixels come from? How does the transmitter figure out what these f_hat values are? Well, it has access to the e_hat values it generates, and if it adds each e_hat to the corresponding f_prime, just like the receiver does, it gets f_hat. So it adds e_hat to f_prime here to get f_hat(n1, n2), and then it can use various delay elements to produce all these other quantities; you just have to be scanning your image in such a way that they're available when you need them. And how does the receiver know what these values are? They're previously decoded quantities, so essentially through a delay box you can recover all of them.
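To pin down the loop structure, here is a minimal 2-D DPCM sketch, assuming raster-scan order, a uniform quantizer standing in for the PCM quantizer, and the common left/above/diagonal causal predictor f_prime = f_hat(n1, n2-1) + f_hat(n1-1, n2) - f_hat(n1-1, n2-1); the quantizer step and the zero-padding at the image border are assumptions for illustration.

```python
import numpy as np

def predict(f_hat, n1, n2):
    """Predict f(n1, n2) from previously reconstructed pixels only."""
    left  = f_hat[n1, n2 - 1] if n2 > 0 else 0.0
    above = f_hat[n1 - 1, n2] if n1 > 0 else 0.0
    diag  = f_hat[n1 - 1, n2 - 1] if (n1 > 0 and n2 > 0) else 0.0
    return left + above - diag                          # one common causal predictor

def dpcm_encode(f, q_step):
    """Transmitter: runs the receiver's loop on f_hat so the two stay in sync."""
    f_hat = np.zeros(f.shape)
    e_hat = np.zeros(f.shape)
    for n1 in range(f.shape[0]):
        for n2 in range(f.shape[1]):
            p = predict(f_hat, n1, n2)
            e = f[n1, n2] - p                           # e = f - f_prime
            e_hat[n1, n2] = q_step * np.round(e / q_step)   # stand-in quantizer
            f_hat[n1, n2] = p + e_hat[n1, n2]           # mirror the receiver's state
    return e_hat

def dpcm_decode(e_hat):
    """Receiver: identical prediction loop, so it reproduces f_hat exactly."""
    f_hat = np.zeros(e_hat.shape)
    for n1 in range(e_hat.shape[0]):
        for n2 in range(e_hat.shape[1]):
            f_hat[n1, n2] = predict(f_hat, n1, n2) + e_hat[n1, n2]
    return f_hat
```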
So one of the questions you might ask is: okay, you just told me to use these pixels to predict; how do I actually do that? You can come up with any prediction function you like. You could even build a table lookup: if this pixel takes on this value and the other one takes that value, then the prediction is such-and-such. You can do anything as complicated as you want. But something simple that people have come up with is a linear combination. You can just say

f_prime(n1, n2) = sum over (k1, k2) of a(k1, k2) * f_hat(n1 - k1, n2 - k2)

So all I'm saying is that the predictor is a linear combination of the previously coded pixels. You might say: over what values of (k1, k2)? We generally say the coefficients have a region of support R_a.

So the question is how to choose the a coefficients. Well, we had better pick a criterion and minimize it, or maximize it, optimize it somehow. And what is it that we want to optimize? We want to make sure this prediction is as close as possible to the f value. Coming back to this picture here (sorry, where is my transmitter? I lost it; oh, here), I want to make sure that the expected value, the variance or the energy of this error signal, is as small as possible. Why is that a reasonable thing to do? Because if the variance of e is small, or the energy in it is small, it takes fewer bits to quantize it for a given fidelity level. That's something that, when I first learned compression, was never explicitly stated. So we want to minimize the expected value of e squared:

E[ e^2(n1, n2) ] = E[ ( f(n1, n2) - sum over (k1, k2) in R_a of a(k1, k2) * f_hat(n1 - k1, n2 - k2) )^2 ]

I think I'm going to stop here, because there's a lot more, and there are a lot of examples I want to show next time; on Friday I'll pick this up. You can easily imagine that with some change of variables, just by solving a linear system of equations, we can figure out what the best values of the a's are, and that's something people use all the time. So we'll see you all on Friday.
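As a sketch of the linear system just mentioned: one way to get the a(k1, k2) is an ordinary least-squares fit over the image, which is the sample version of minimizing E[e^2]. The three-pixel support R_a below and the use of the original pixels f in place of f_hat during the fit are simplifying assumptions (fitting on f_hat directly is circular, since f_hat depends on the a's themselves).

```python
import numpy as np

def fit_predictor(f):
    """Least-squares estimate of a(k1, k2) over an assumed support R_a."""
    support = [(0, 1), (1, 0), (1, 1)]      # (k1, k2): left, above, diagonal
    rows, targets = [], []
    for n1 in range(1, f.shape[0]):
        for n2 in range(1, f.shape[1]):
            rows.append([f[n1 - k1, n2 - k2] for k1, k2 in support])
            targets.append(f[n1, n2])
    # Minimize the sum of (f - sum_k a_k * shifted f)^2, the sample
    # analogue of E[e^2(n1, n2)] from the lecture.
    a, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(targets), rcond=None)
    return dict(zip(support, a))
```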