1 00:00:00,000 --> 00:00:00,050 2 00:00:00,050 --> 00:00:01,770 The following content is provided 3 00:00:01,770 --> 00:00:04,010 under a Creative Commons license. 4 00:00:04,010 --> 00:00:06,860 Your support will help MIT OpenCourseWare continue 5 00:00:06,860 --> 00:00:10,720 to offer high quality educational resources for free. 6 00:00:10,720 --> 00:00:13,330 To make a donation or view additional materials 7 00:00:13,330 --> 00:00:17,207 from hundreds of MIT courses, visit MIT OpenCourseWare 8 00:00:17,207 --> 00:00:17,832 at ocw.mit.edu. 9 00:00:17,832 --> 00:00:22,730 10 00:00:22,730 --> 00:00:23,390 PROFESSOR: Hi. 11 00:00:23,390 --> 00:00:24,750 I'm Srini Devadas. 12 00:00:24,750 --> 00:00:27,040 I'm a professor of electrical engineering and computer 13 00:00:27,040 --> 00:00:27,650 science. 14 00:00:27,650 --> 00:00:30,970 I'm going to be co-lecturing 6.006-- Introduction 15 00:00:30,970 --> 00:00:34,950 to Algorithms-- this term with professor Erik Demaine. 16 00:00:34,950 --> 00:00:36,001 Erik, say hi. 17 00:00:36,001 --> 00:00:36,883 ERIK DEMAINE: Hi. 18 00:00:36,883 --> 00:00:38,650 [LAUGHTER] 19 00:00:38,650 --> 00:00:40,210 PROFESSOR: And we hope you're going 20 00:00:40,210 --> 00:00:43,710 to have a fun time in 6.006 learning 21 00:00:43,710 --> 00:00:45,760 a variety of algorithms. 22 00:00:45,760 --> 00:00:50,760 What I want to do today is spend literally a minute or so 23 00:00:50,760 --> 00:00:55,004 on administrative details, maybe even less. 24 00:00:55,004 --> 00:00:56,420 What I'd like to do is to tell you 25 00:00:56,420 --> 00:01:00,670 to go to the website that's listed up there and read it. 26 00:01:00,670 --> 00:01:02,250 And you'll get all the information you 27 00:01:02,250 --> 00:01:06,430 need about what this class is about from a standpoint 28 00:01:06,430 --> 00:01:11,590 of syllabus; what's expected of you; the problem set 29 00:01:11,590 --> 00:01:15,660 schedule; the quiz schedule; and so on and so forth.
30 00:01:15,660 --> 00:01:19,460 I want to dive right in and tell you about interesting things, 31 00:01:19,460 --> 00:01:24,550 like algorithms and complexity of algorithms. 32 00:01:24,550 --> 00:01:26,490 I want to spend some time giving you 33 00:01:26,490 --> 00:01:29,380 an overview of the course content. 34 00:01:29,380 --> 00:01:31,640 And then we're going to dive right 35 00:01:31,640 --> 00:01:35,230 in and look at a particular problem of peak 36 00:01:35,230 --> 00:01:38,360 finding-- both the one dimensional version and a two 37 00:01:38,360 --> 00:01:41,900 dimensional version-- and talk about algorithms to solve 38 00:01:41,900 --> 00:01:46,670 this peak finding problem-- both varieties of it. 39 00:01:46,670 --> 00:01:50,000 And you'll find that there's really 40 00:01:50,000 --> 00:01:53,090 a difference between these various algorithms 41 00:01:53,090 --> 00:01:56,480 that we'll look at in terms of their complexity. 42 00:01:56,480 --> 00:01:59,070 And what I mean by that is you're 43 00:01:59,070 --> 00:02:02,750 going to have different run times of these algorithms 44 00:02:02,750 --> 00:02:06,210 depending on input size, based on how 45 00:02:06,210 --> 00:02:08,600 efficient these algorithms are. 46 00:02:08,600 --> 00:02:14,370 And a prerequisite for this class is 6.042. 47 00:02:14,370 --> 00:02:18,620 And in 6.042 you learned about asymptotic complexity. 48 00:02:18,620 --> 00:02:21,240 And you'll see that in this lecture 49 00:02:21,240 --> 00:02:25,430 we'll analyze relatively simple algorithms today 50 00:02:25,430 --> 00:02:28,070 in terms of their asymptotic complexity. 51 00:02:28,070 --> 00:02:30,340 And you'll be able to compare and say 52 00:02:30,340 --> 00:02:33,940 that this algorithm is faster than this other one-- assuming 53 00:02:33,940 --> 00:02:37,320 that you have large inputs-- because it's 54 00:02:37,320 --> 00:02:40,840 asymptotically less complex.
55 00:02:40,840 --> 00:02:43,185 So let's dive right in and talk about the class. 56 00:02:43,185 --> 00:02:52,420 57 00:02:52,420 --> 00:02:54,550 So the one sentence summary of this class 58 00:02:54,550 --> 00:02:58,910 is that this is about efficient procedures 59 00:02:58,910 --> 00:03:04,850 for solving problems on large inputs. 60 00:03:04,850 --> 00:03:06,800 And when I say large inputs, I mean things 61 00:03:06,800 --> 00:03:10,720 like the US highway system, a map 62 00:03:10,720 --> 00:03:14,110 of all of the highways in the United States; 63 00:03:14,110 --> 00:03:17,850 the human genome, which has a billion letters 64 00:03:17,850 --> 00:03:23,170 in its alphabet; a social network corresponding to Facebook, 65 00:03:23,170 --> 00:03:26,840 that I guess has 500 million nodes or so. 66 00:03:26,840 --> 00:03:28,280 So these are large inputs. 67 00:03:28,280 --> 00:03:31,470 Now our definition of large has really changed with the times. 68 00:03:31,470 --> 00:03:35,440 And so really the 21st century definition of large 69 00:03:35,440 --> 00:03:36,971 is, I guess, a trillion. 70 00:03:36,971 --> 00:03:37,470 Right? 71 00:03:37,470 --> 00:03:40,680 Back when I was your age large was like 1,000. 72 00:03:40,680 --> 00:03:42,400 [LAUGHTER] 73 00:03:42,400 --> 00:03:44,844 I guess I'm dating myself here. 74 00:03:44,844 --> 00:03:46,760 Back when Erik was your age, it was a million. 75 00:03:46,760 --> 00:03:47,260 Right? 76 00:03:47,260 --> 00:03:48,650 [LAUGHTER] 77 00:03:48,650 --> 00:03:55,000 But what's happening really is the world is moving faster, 78 00:03:55,000 --> 00:03:56,420 things are getting bigger. 79 00:03:56,420 --> 00:04:00,880 We have the capability of computing on large inputs, 80 00:04:00,880 --> 00:04:03,220 but that doesn't mean that efficiency 81 00:04:03,220 --> 00:04:05,760 isn't of paramount concern.
82 00:04:05,760 --> 00:04:08,690 The fact of the matter is that you can, maybe, 83 00:04:08,690 --> 00:04:13,550 scan a billion elements in a matter of seconds. 84 00:04:13,550 --> 00:04:17,750 But if you had an algorithm that required cubic complexity, 85 00:04:17,750 --> 00:04:19,899 suddenly you're not talking about 10 raised to 9, 86 00:04:19,899 --> 00:04:22,079 you're talking about 10 raised to 27. 87 00:04:22,079 --> 00:04:24,510 And even current computers can't really 88 00:04:24,510 --> 00:04:30,890 handle those kinds of numbers, so efficiency is a concern. 89 00:04:30,890 --> 00:04:34,820 And as inputs get larger, it becomes more of a concern. 90 00:04:34,820 --> 00:04:35,320 All right? 91 00:04:35,320 --> 00:04:39,398 So we're concerned about-- 92 00:04:39,398 --> 00:04:43,760 93 00:04:43,760 --> 00:04:51,310 --efficient procedures-- for solving large scale problems 94 00:04:51,310 --> 00:04:51,940 in this class. 95 00:04:51,940 --> 00:04:58,140 96 00:04:58,140 --> 00:05:01,640 And we're concerned about scalability, 97 00:05:01,640 --> 00:05:07,030 because-- just as, you know, 1,000 98 00:05:07,030 --> 00:05:09,600 was a big number a couple of decades ago, 99 00:05:09,600 --> 00:05:12,140 and now it's kind of a small number-- it's 100 00:05:12,140 --> 00:05:16,430 quite possible that by the time you guys are professors 101 00:05:16,430 --> 00:05:18,220 teaching this class in some university 102 00:05:18,220 --> 00:05:20,690 that a trillion is going to be a small number. 103 00:05:20,690 --> 00:05:24,430 And we're going to be talking about-- I don't know-- 104 00:05:24,430 --> 00:05:27,520 10 raised to 18 as being something 105 00:05:27,520 --> 00:05:32,620 that we're concerned with from a standpoint of a common case 106 00:05:32,620 --> 00:05:34,510 input for an algorithm. 107 00:05:34,510 --> 00:05:38,120 So scalability is important.
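The gap between 10 raised to 9 and 10 raised to 27 can be made concrete with a little arithmetic. The sketch below is only illustrative: the rate of one billion basic operations per second is an assumed round figure, not a number from the lecture, and the variable names are hypothetical.

```python
# Back-of-the-envelope comparison of linear vs. cubic work on a billion elements.
# ASSUMPTION: the machine performs 10**9 basic operations per second.
n = 10**9
ops_per_second = 10**9          # assumed rate, for illustration only
seconds_per_year = 365 * 24 * 3600

linear_ops = n                  # 10**9 operations
cubic_ops = n**3                # 10**27 operations

linear_seconds = linear_ops // ops_per_second
cubic_years = cubic_ops // ops_per_second // seconds_per_year

print("linear scan:", linear_seconds, "second(s)")
print("cubic algorithm:", cubic_years, "years")
```

Under that assumption the linear scan takes about a second, while the cubic algorithm would need tens of billions of years.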
108 00:05:38,120 --> 00:05:41,480 And we want to be able to track how our algorithms are going 109 00:05:41,480 --> 00:05:44,000 to do as inputs get larger and larger. 110 00:05:44,000 --> 00:05:47,210 111 00:05:47,210 --> 00:05:52,180 You're going to learn a bunch of different data structures. 112 00:05:52,180 --> 00:05:56,650 We'll call them classic data structures, 113 00:05:56,650 --> 00:06:01,450 like binary search trees, hash tables-- that 114 00:06:01,450 --> 00:06:06,020 are called dictionaries in Python-- and data 115 00:06:06,020 --> 00:06:09,470 structures-- such as balanced binary search trees-- that 116 00:06:09,470 --> 00:06:12,975 are more efficient than just the regular binary search trees. 117 00:06:12,975 --> 00:06:14,350 And these are all data structures 118 00:06:14,350 --> 00:06:18,540 that were invented many decades ago. 119 00:06:18,540 --> 00:06:20,850 But they've stood the test of time, 120 00:06:20,850 --> 00:06:23,530 and they continue to be useful. 121 00:06:23,530 --> 00:06:26,210 We're going to augment these data structures in various ways 122 00:06:26,210 --> 00:06:30,330 to make them more efficient for certain kinds of problems. 123 00:06:30,330 --> 00:06:33,980 And while you're not going to be doing a whole lot of algorithm 124 00:06:33,980 --> 00:06:36,180 design in this class, you will be 125 00:06:36,180 --> 00:06:38,335 doing some design and a whole lot of analysis. 126 00:06:38,335 --> 00:06:40,880 127 00:06:40,880 --> 00:06:46,060 The class following this one, 6.046 Design and Analysis 128 00:06:46,060 --> 00:06:48,530 of Algorithms, is a class that you 129 00:06:48,530 --> 00:06:52,080 should take if you like this one. 130 00:06:52,080 --> 00:06:57,180 And you can do a whole lot more design of algorithms in 6.046.
131 00:06:57,180 --> 00:06:59,880 But you will look at classic data structures 132 00:06:59,880 --> 00:07:06,260 and classical algorithms for these data structures, 133 00:07:06,260 --> 00:07:12,470 including things like sorting and matching, and so on. 134 00:07:12,470 --> 00:07:17,200 And one of the nice things about this class 135 00:07:17,200 --> 00:07:21,800 is that you'll be doing real implementations of these data 136 00:07:21,800 --> 00:07:25,130 structures and algorithms in Python. 137 00:07:25,130 --> 00:07:28,220 138 00:07:28,220 --> 00:07:30,880 And in particular, each of the problem 139 00:07:30,880 --> 00:07:38,680 sets in this class is going to have both a theory 140 00:07:38,680 --> 00:07:41,930 part to it, and a programming part to it. 141 00:07:41,930 --> 00:07:43,430 So hopefully it'll all tie together. 142 00:07:43,430 --> 00:07:46,060 The kinds of things we're going to be talking about in lectures 143 00:07:46,060 --> 00:07:51,200 and recitations are going to be directly connected 144 00:07:51,200 --> 00:07:53,260 to the theory parts of the problem sets. 145 00:07:53,260 --> 00:07:55,800 And you'll be programming the algorithms that we talk about 146 00:07:55,800 --> 00:07:58,680 in lecture, or augmenting them, running them. 147 00:07:58,680 --> 00:08:03,180 Figuring out whether they work well on large inputs or not. 148 00:08:03,180 --> 00:08:06,510 149 00:08:06,510 --> 00:08:09,530 So let me talk a little bit about the modules 150 00:08:09,530 --> 00:08:11,462 in this class and the problem sets. 151 00:08:11,462 --> 00:08:12,920 And we hope that these problem sets 152 00:08:12,920 --> 00:08:15,470 are going to be fun for you. 153 00:08:15,470 --> 00:08:19,430 And by fun I don't mean easy. 154 00:08:19,430 --> 00:08:22,656 I mean challenging and worthwhile, so at the end of it 155 00:08:22,656 --> 00:08:24,280 you feel like you've learned something, 156 00:08:24,280 --> 00:08:26,870 and you had some fun along the way.
157 00:08:26,870 --> 00:08:28,580 All right? 158 00:08:28,580 --> 00:08:30,550 So content wise-- 159 00:08:30,550 --> 00:08:37,350 160 00:08:37,350 --> 00:08:41,830 --we have eight modules in the class. 161 00:08:41,830 --> 00:08:44,490 Each of which, roughly speaking, has 162 00:08:44,490 --> 00:08:47,020 a problem set associated with it. 163 00:08:47,020 --> 00:08:51,950 The first of these is what we call algorithmic thinking. 164 00:08:51,950 --> 00:08:55,710 165 00:08:55,710 --> 00:08:59,130 And we'll kick start that one today. 166 00:08:59,130 --> 00:09:01,480 We'll look at a particular problem, as I mentioned, 167 00:09:01,480 --> 00:09:02,790 of peak finding. 168 00:09:02,790 --> 00:09:04,350 And as part of this, you're going 169 00:09:04,350 --> 00:09:07,960 to have a problem set that's going to go out today as well. 170 00:09:07,960 --> 00:09:12,320 And you'll find that in this problem set 171 00:09:12,320 --> 00:09:14,420 some of these algorithms I talk about today will 172 00:09:14,420 --> 00:09:17,090 be coded in Python and given to you. 173 00:09:17,090 --> 00:09:20,190 A couple of them are going to have bugs in them. 174 00:09:20,190 --> 00:09:24,340 You'll have to analyze the complexity of these algorithms; 175 00:09:24,340 --> 00:09:27,380 figure out which ones are correct and efficient; 176 00:09:27,380 --> 00:09:29,760 and write a proof for one of them. 177 00:09:29,760 --> 00:09:30,260 All right? 178 00:09:30,260 --> 00:09:33,320 So that's sort of an example problem set. 179 00:09:33,320 --> 00:09:37,600 And you can expect that most of the problem sets 180 00:09:37,600 --> 00:09:40,036 are going to follow that sort of template. 181 00:09:40,036 --> 00:09:40,750 All right. 182 00:09:40,750 --> 00:09:44,810 So you'll get a better sense of this 183 00:09:44,810 --> 00:09:46,690 by the end of the day today for sure.
184 00:09:46,690 --> 00:09:48,930 Or a concrete sense of this, because we'll 185 00:09:48,930 --> 00:09:52,850 be done with lecture and you'll see your first problem set. 186 00:09:52,850 --> 00:09:57,540 We're going to be doing a module on sorting and trees. 187 00:09:57,540 --> 00:10:00,619 Sorting you know about, sorting a bunch of numbers. 188 00:10:00,619 --> 00:10:02,160 Imagine if you had a trillion numbers 189 00:10:02,160 --> 00:10:04,250 and you wanted to sort them. 190 00:10:04,250 --> 00:10:07,610 What kind of algorithm could you use for that? 191 00:10:07,610 --> 00:10:10,280 Trees are a wonderful data structure. 192 00:10:10,280 --> 00:10:14,760 There's different varieties, the most common being binary trees. 193 00:10:14,760 --> 00:10:17,580 And there's ways of doing all sorts of things, 194 00:10:17,580 --> 00:10:22,560 like scheduling, and sorting, using various kinds of trees, 195 00:10:22,560 --> 00:10:24,200 including binary trees. 196 00:10:24,200 --> 00:10:31,330 And we have a problem set on simulating a logic network 197 00:10:31,330 --> 00:10:36,660 using a particular kind of sorting algorithm in a data 198 00:10:36,660 --> 00:10:38,340 structure. 199 00:10:38,340 --> 00:10:41,150 That is going to be your second problem set. 200 00:10:41,150 --> 00:10:47,190 And more quickly, we're going to have modules on hashing, 201 00:10:47,190 --> 00:10:51,240 where we do things like genome comparison. 202 00:10:51,240 --> 00:10:56,330 In past terms we compared a human genome to a rat genome, 203 00:10:56,330 --> 00:10:59,350 and discovered they were pretty similar. 204 00:10:59,350 --> 00:11:01,860 99% similar, which is kind of amazing. 205 00:11:01,860 --> 00:11:04,960 But again, these things are so large that you 206 00:11:04,960 --> 00:11:07,590 have to have efficiency in the comparison methods 207 00:11:07,590 --> 00:11:08,460 that you use.
208 00:11:08,460 --> 00:11:11,690 And you'll find that if you don't get the complexity low 209 00:11:11,690 --> 00:11:15,300 enough, you just won't be able to complete-- 210 00:11:15,300 --> 00:11:19,950 your program won't be able to finish running within the time 211 00:11:19,950 --> 00:11:21,260 that your problem set is due. 212 00:11:21,260 --> 00:11:21,760 Right? 213 00:11:21,760 --> 00:11:24,660 Which is a bit of a problem. 214 00:11:24,660 --> 00:11:28,860 So that's something to keep in mind as you test your code. 215 00:11:28,860 --> 00:11:32,140 The fact is that you will get large inputs to run your code. 216 00:11:32,140 --> 00:11:34,960 And you want to keep complexity in mind 217 00:11:34,960 --> 00:11:40,070 as you're coding and thinking about the pseudocode, 218 00:11:40,070 --> 00:11:43,624 if you will, of your algorithm itself. 219 00:11:43,624 --> 00:11:44,790 We will talk about numerics. 220 00:11:44,790 --> 00:11:47,420 221 00:11:47,420 --> 00:11:50,840 A lot of the time we talk about such large numbers 222 00:11:50,840 --> 00:11:54,290 that 32 bits isn't enough. 223 00:11:54,290 --> 00:11:57,130 Or 64 bits isn't enough to represent these numbers. 224 00:11:57,130 --> 00:11:58,910 These numbers have thousands of bits. 225 00:11:58,910 --> 00:12:01,110 A good example is RSA encryption, 226 00:12:01,110 --> 00:12:05,140 that is used in SSL, for example. 227 00:12:05,140 --> 00:12:09,720 And when you go-- use https on websites, 228 00:12:09,720 --> 00:12:12,710 RSA is used at the back end. 229 00:12:12,710 --> 00:12:15,360 And typically you work with prime numbers 230 00:12:15,360 --> 00:12:18,510 that are thousands of bits long in RSA. 231 00:12:18,510 --> 00:12:19,930 So how do you handle that? 232 00:12:19,930 --> 00:12:21,270 How does Python handle that? 233 00:12:21,270 --> 00:12:22,950 How do you write algorithms that can 234 00:12:22,950 --> 00:12:26,270 deal with what are called infinite precision numbers?
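As a small illustration of the question "How does Python handle that?": Python's built-in `int` type is arbitrary precision, so numbers with thousands of bits work with the ordinary operators. The sketch below uses an arbitrary 2049-bit number chosen only for its size; it is not claimed to be prime and is not a real RSA key, and the variable names are hypothetical.

```python
# Python integers grow as needed, so values far beyond 64 bits just work.
# ASSUMPTION: `big` is an arbitrary large number for illustration, not an RSA prime.
big = 2**2048 + 1
print(big.bit_length())          # number of bits needed to represent `big`

# Multiplying two such numbers produces an exact result of roughly double the size.
product = big * big
print(product.bit_length())

# Modular exponentiation, the core arithmetic step in RSA-style schemes,
# is available through the built-in three-argument pow.
result = pow(3, 65537, big)
print(0 <= result < big)
```

How the multiplication and exponentiation are made efficient for numbers this long is the kind of question the numerics module addresses.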
235 00:12:26,270 --> 00:12:30,500 So we have a module on numerics in the middle of the term that 236 00:12:30,500 --> 00:12:31,850 talks about that. 237 00:12:31,850 --> 00:12:35,480 Graphs, really a fundamental data structure 238 00:12:35,480 --> 00:12:37,970 in all of computer science. 239 00:12:37,970 --> 00:12:42,610 You might have heard of the famous Rubik's cube assignment 240 00:12:42,610 --> 00:12:43,110 from 6.006, 241 00:12:43,110 --> 00:12:46,850 a 2 by 2 by 2 Rubik's cube. 242 00:12:46,850 --> 00:12:48,690 What's the minimum number of moves 243 00:12:48,690 --> 00:12:53,240 necessary to go from a given starting configuration 244 00:12:53,240 --> 00:12:56,640 to the final end configuration, where all of the faces-- each 245 00:12:56,640 --> 00:12:58,940 of the faces has uniform color? 246 00:12:58,940 --> 00:13:01,830 And that can be posed as a graph problem. 247 00:13:01,830 --> 00:13:04,052 We'll probably do that one this term. 248 00:13:04,052 --> 00:13:05,760 In previous terms we've done other things 249 00:13:05,760 --> 00:13:07,310 like the 15 puzzle. 250 00:13:07,310 --> 00:13:10,170 And so some of these are tentative. 251 00:13:10,170 --> 00:13:12,420 We definitely know what the first problem set is like, 252 00:13:12,420 --> 00:13:16,420 but the rest of them are, at this moment, tentative. 253 00:13:16,420 --> 00:13:20,340 And to finish up shortest paths. 254 00:13:20,340 --> 00:13:24,660 Again in terms past we've asked you 255 00:13:24,660 --> 00:13:27,380 to write code using a particular algorithm that 256 00:13:27,380 --> 00:13:30,984 finds the shortest path from Caltech to MIT. 257 00:13:30,984 --> 00:13:33,150 This time we may do things a little bit differently.
258 00:13:33,150 --> 00:13:37,150 We were thinking maybe we'll give you a street map of Boston 259 00:13:37,150 --> 00:13:41,360 and go figure out if Paul Revere used 260 00:13:41,360 --> 00:13:44,140 the shortest path to get to where he was going, 261 00:13:44,140 --> 00:13:45,025 or things like that. 262 00:13:45,025 --> 00:13:47,540 We'll try and make it fun. 263 00:13:47,540 --> 00:13:54,420 Dynamic programming is an important algorithm design 264 00:13:54,420 --> 00:14:00,690 technique that's used in many, many problems. 265 00:14:00,690 --> 00:14:04,510 And it can be used to do a variety of things, including 266 00:14:04,510 --> 00:14:06,600 image compression. 267 00:14:06,600 --> 00:14:10,060 How do you compress an image so the number of pixels 268 00:14:10,060 --> 00:14:12,960 reduces, but it still looks like the image 269 00:14:12,960 --> 00:14:15,761 that you started out with, that had many more pixels? 270 00:14:15,761 --> 00:14:16,260 All right? 271 00:14:16,260 --> 00:14:18,970 So you could use dynamic programming for that. 272 00:14:18,970 --> 00:14:23,370 And finally, advanced topics, complexity theory, research 273 00:14:23,370 --> 00:14:25,760 and algorithms. 274 00:14:25,760 --> 00:14:28,590 Hopefully by now-- by this time in the course, 275 00:14:28,590 --> 00:14:30,330 you have been sold on algorithms. 276 00:14:30,330 --> 00:14:32,605 And most, if not all of you, would 277 00:14:32,605 --> 00:14:34,550 want to pursue a career in algorithms. 278 00:14:34,550 --> 00:14:37,680 And we'll give you a sense of what else is there. 279 00:14:37,680 --> 00:14:40,364 We're just scratching the surface in this class, 280 00:14:40,364 --> 00:14:42,530 and there's many, many classes that you can possibly 281 00:14:42,530 --> 00:14:47,650 take if you want to continue in-- to learn about algorithms, 282 00:14:47,650 --> 00:14:49,790 or to pursue a career in algorithms. 283 00:14:49,790 --> 00:14:51,580 All right?
284 00:14:51,580 --> 00:14:53,990 So that's the story of the class, 285 00:14:53,990 --> 00:14:55,840 or the synopsis of the class. 286 00:14:55,840 --> 00:15:01,950 And I encourage you to go spend a few minutes on the website. 287 00:15:01,950 --> 00:15:05,850 In particular please read the collaboration policy, and get 288 00:15:05,850 --> 00:15:08,440 a sense of what is expected of you. 289 00:15:08,440 --> 00:15:13,580 What the rules are in terms of doing the problem sets. 290 00:15:13,580 --> 00:15:17,100 And the course grading break down, 291 00:15:17,100 --> 00:15:20,860 the grading policies are all listed on the website as well. 292 00:15:20,860 --> 00:15:23,000 All right. 293 00:15:23,000 --> 00:15:23,870 OK. 294 00:15:23,870 --> 00:15:26,210 So let's get started. 295 00:15:26,210 --> 00:15:28,930 I want to talk about a specific problem. 296 00:15:28,930 --> 00:15:32,000 And talk about algorithms for a specific problem. 297 00:15:32,000 --> 00:15:35,560 We picked this problem, because it's so easy to understand. 298 00:15:35,560 --> 00:15:38,790 And there are fairly straightforward algorithms 299 00:15:38,790 --> 00:15:41,280 that are not particularly efficient to solve 300 00:15:41,280 --> 00:15:42,530 this problem. 301 00:15:42,530 --> 00:15:45,060 And so this is a, kind of, a toy problem. 302 00:15:45,060 --> 00:15:49,660 But like a lot of toy problems, it's 303 00:15:49,660 --> 00:15:55,230 very evocative in that it points out the issues involved 304 00:15:55,230 --> 00:15:57,739 in designing efficient algorithms. 305 00:15:57,739 --> 00:15:59,280 So we'll start with a one dimensional 306 00:15:59,280 --> 00:16:02,395 version of what we call peak finding. 307 00:16:02,395 --> 00:16:05,810 308 00:16:05,810 --> 00:16:10,635 And a peak finder is something in the one dimensional case. 309 00:16:10,635 --> 00:16:14,180 310 00:16:14,180 --> 00:16:18,240 Runs on an array of numbers.
311 00:16:18,240 --> 00:16:22,770 And I'm just putting-- 312 00:16:22,770 --> 00:16:27,020 --symbols for each of these numbers here. 313 00:16:27,020 --> 00:16:31,546 And the numbers are positive, negative. 314 00:16:31,546 --> 00:16:33,170 We'll just assume they're all positive, 315 00:16:33,170 --> 00:16:34,480 it doesn't really matter. 316 00:16:34,480 --> 00:16:38,460 The algorithms we describe will work. 317 00:16:38,460 --> 00:16:41,330 And so we have this one dimensional array 318 00:16:41,330 --> 00:16:43,450 that has nine different positions. 319 00:16:43,450 --> 00:16:47,405 And a through i are numbers. 320 00:16:47,405 --> 00:16:49,910 321 00:16:49,910 --> 00:16:53,030 And we want to find a peak. 322 00:16:53,030 --> 00:16:56,180 And so we have to define what we mean by a peak. 323 00:16:56,180 --> 00:17:00,320 And so, in particular, as an example, 324 00:17:00,320 --> 00:17:07,369 position 2 is a peak if, and only 325 00:17:07,369 --> 00:17:16,520 if, b greater than or equal to a, and b greater than or equal 326 00:17:16,520 --> 00:17:18,020 to c. 327 00:17:18,020 --> 00:17:21,359 So it's really a very local property corresponding 328 00:17:21,359 --> 00:17:22,270 to a peak. 329 00:17:22,270 --> 00:17:25,020 In the one dimensional case, it's trivial. 330 00:17:25,020 --> 00:17:26,220 Look to your left. 331 00:17:26,220 --> 00:17:27,990 Look to your right. 332 00:17:27,990 --> 00:17:31,990 If you are equal or greater than both of the elements 333 00:17:31,990 --> 00:17:35,120 that you see on the left and the right, you're a peak. 334 00:17:35,120 --> 00:17:35,760 OK? 335 00:17:35,760 --> 00:17:38,690 And in the case of the edges, you only 336 00:17:38,690 --> 00:17:40,700 have to look to one side. 337 00:17:40,700 --> 00:17:53,567 So position 9 is a peak if i greater than or equal to h. 338 00:17:53,567 --> 00:17:55,400 So you just have to look to your left there, 339 00:17:55,400 --> 00:17:57,483 because you're all the way on the right hand side. 
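The peak definition above is a purely local check, which a short function can capture. This is a minimal sketch, not code from the course: the name `is_peak` is hypothetical, and it uses Python's 0-based indexing, so the lecture's "position 2" corresponds to index 1 and "position 9" to index 8.

```python
def is_peak(values, i):
    """Return True if values[i] is a peak under the >= definition.

    An interior element must be >= both neighbors; an element at either
    edge only has to be >= its single neighbor.
    """
    left_ok = i == 0 or values[i] >= values[i - 1]
    right_ok = i == len(values) - 1 or values[i] >= values[i + 1]
    return left_ok and right_ok


# Nine positions, like the a..i array on the board (the numbers are made up).
board = [1, 4, 2, 7, 7, 3, 5, 6, 9]
print(is_peak(board, 1))   # index 1 is "position 2": is b >= a and b >= c?
print(is_peak(board, 8))   # index 8 is "position 9": is i >= h?
```

Because of the greater-than-or-equal-to comparison, two equal neighbors such as the pair of 7s can both count as peaks.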
340 00:17:57,483 --> 00:17:58,270 All right? 341 00:17:58,270 --> 00:18:00,480 So that's it. 342 00:18:00,480 --> 00:18:03,920 And the statement of the problem, the one dimensional 343 00:18:03,920 --> 00:18:13,820 version, is find the peak if it exists. 344 00:18:13,820 --> 00:18:19,490 345 00:18:19,490 --> 00:18:22,070 All right? 346 00:18:22,070 --> 00:18:24,510 That's all there is to it. 347 00:18:24,510 --> 00:18:27,890 I'm going to give you a straightforward algorithm. 348 00:18:27,890 --> 00:18:30,630 And then we'll see if we can improve it. 349 00:18:30,630 --> 00:18:31,270 All right? 350 00:18:31,270 --> 00:18:34,110 You can imagine that the straightforward algorithm is 351 00:18:34,110 --> 00:18:39,440 something that just, you know, walks across the array. 352 00:18:39,440 --> 00:18:43,629 But we need that as a starting point for building something 353 00:18:43,629 --> 00:18:44,420 more sophisticated. 354 00:18:44,420 --> 00:18:49,680 355 00:18:49,680 --> 00:18:57,340 So let's say we start from left and all 356 00:18:57,340 --> 00:19:01,500 we have is one traversal, really. 357 00:19:01,500 --> 00:19:05,360 358 00:19:05,360 --> 00:19:07,930 So let's say we have 1, 2, and then we 359 00:19:07,930 --> 00:19:10,810 have n over 2 over here corresponding 360 00:19:10,810 --> 00:19:14,620 to the middle of this n element array. 361 00:19:14,620 --> 00:19:18,970 And then we have n minus 1, and n. 362 00:19:18,970 --> 00:19:21,090 What I'm interested in doing is, not only 363 00:19:21,090 --> 00:19:24,880 coming up with a straightforward algorithm, 364 00:19:24,880 --> 00:19:29,300 but also precisely characterizing 365 00:19:29,300 --> 00:19:32,030 what its complexity is in relation 366 00:19:32,030 --> 00:19:35,260 to n, which is the number of inputs. 367 00:19:35,260 --> 00:19:35,760 Yeah? 368 00:19:35,760 --> 00:19:36,915 Question? 
369 00:19:36,915 --> 00:19:38,456 AUDIENCE: Why do you say if it exists 370 00:19:38,456 --> 00:19:40,348 when the criteria in the [INAUDIBLE] 371 00:19:40,348 --> 00:19:41,397 guarantees [INAUDIBLE]? 372 00:19:41,397 --> 00:19:42,730 PROFESSOR: That's exactly right. 373 00:19:42,730 --> 00:19:44,540 I was going to get to that. 374 00:19:44,540 --> 00:19:50,530 So if you look at the definition of the peak, 375 00:19:50,530 --> 00:19:55,210 then what I have here is greater than or equal to. 376 00:19:55,210 --> 00:19:56,010 OK? 377 00:19:56,010 --> 00:19:59,660 And so this-- That's a great question that was asked. 378 00:19:59,660 --> 00:20:04,470 Why is there "if it exists" in this problem? 379 00:20:04,470 --> 00:20:08,440 Now in the case where I have greater than or equal to, 380 00:20:08,440 --> 00:20:12,310 then-- this is a homework question for you, 381 00:20:12,310 --> 00:20:18,240 and for the rest of you-- argue that any array will always 382 00:20:18,240 --> 00:20:19,610 have a peak. 383 00:20:19,610 --> 00:20:20,790 OK? 384 00:20:20,790 --> 00:20:24,300 Now if you didn't have the greater than or equal to, 385 00:20:24,300 --> 00:20:29,070 and you had a greater than, then can you make that argument? 386 00:20:29,070 --> 00:20:30,120 No, you can't. 387 00:20:30,120 --> 00:20:30,820 Right? 388 00:20:30,820 --> 00:20:33,230 So great question. 389 00:20:33,230 --> 00:20:35,859 In this case it's just a question-- 390 00:20:35,859 --> 00:20:37,400 You would want to modify this problem 391 00:20:37,400 --> 00:20:38,950 statement to find the peak. 392 00:20:38,950 --> 00:20:43,710 But if I had a different definition of a peak-- and this 393 00:20:43,710 --> 00:20:45,850 is part of algorithmic thinking. 
394 00:20:45,850 --> 00:20:49,580 You want to be able to create algorithms that are general, 395 00:20:49,580 --> 00:20:52,130 so if the problem definition changes on you, 396 00:20:52,130 --> 00:20:54,300 you still have a starting point to go attack 397 00:20:54,300 --> 00:20:56,500 the second version of the problem. 398 00:20:56,500 --> 00:20:57,310 OK? 399 00:20:57,310 --> 00:21:01,479 So you could eliminate this in the case 400 00:21:01,479 --> 00:21:03,270 of the greater than or equal to definition. 401 00:21:03,270 --> 00:21:05,664 The "if it exists", because a peak will always exist. 402 00:21:05,664 --> 00:21:07,330 But you probably want to argue that when 403 00:21:07,330 --> 00:21:09,950 you want to show the correctness of your algorithm. 404 00:21:09,950 --> 00:21:13,210 And if in fact you had a different definition, 405 00:21:13,210 --> 00:21:19,130 well you would have to create an algorithm that tells you 406 00:21:19,130 --> 00:21:22,310 for sure that a peak doesn't exist, or find 407 00:21:22,310 --> 00:21:23,900 a peak if it exists. 408 00:21:23,900 --> 00:21:24,400 All right? 409 00:21:24,400 --> 00:21:26,300 So that's really the general case. 410 00:21:26,300 --> 00:21:29,830 Many a time it's possible that you're asked to do something, 411 00:21:29,830 --> 00:21:34,990 and you can't actually give an answer to the question, 412 00:21:34,990 --> 00:21:39,335 or find something that satisfies all the constraints required. 413 00:21:39,335 --> 00:21:41,710 And in that case, you want to be able to put up your hand 414 00:21:41,710 --> 00:21:43,470 and say, you know what? 415 00:21:43,470 --> 00:21:44,870 I searched long and hard. 416 00:21:44,870 --> 00:21:46,730 I searched exhaustively. 417 00:21:46,730 --> 00:21:49,930 Here's my argument that I searched exhaustively, 418 00:21:49,930 --> 00:21:51,101 and I couldn't find it. 419 00:21:51,101 --> 00:21:51,600 Right? 420 00:21:51,600 --> 00:21:53,490 If you do that, you get to keep your job. 
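To illustrate the general case just described, here is a minimal sketch under a hypothetical stricter definition in which a peak must be strictly greater than its neighbors. With that definition a peak may not exist, so the function searches exhaustively and reports failure explicitly. The name `find_strict_peak`, the 0-based indexing, and the use of `None` for "no peak" are illustrative assumptions, not part of the lecture.

```python
def find_strict_peak(values):
    """Return the index of some strict peak, or None if no strict peak exists.

    ASSUMPTION: a strict peak is strictly greater than each neighbor it has.
    Every position is checked, so a None result means the whole array was searched.
    """
    n = len(values)
    for i in range(n):
        left_ok = i == 0 or values[i] > values[i - 1]
        right_ok = i == n - 1 or values[i] > values[i + 1]
        if left_ok and right_ok:
            return i
    return None  # searched every position and found nothing


print(find_strict_peak([1, 3, 2]))   # a strict peak exists
print(find_strict_peak([2, 2, 2]))   # no element beats its neighbors
```

Returning `None` only after the loop has visited every index is the code-level version of the "I searched exhaustively" argument.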
421 00:21:53,490 --> 00:21:54,580 Right? 422 00:21:54,580 --> 00:21:57,390 Otherwise there's always the case 423 00:21:57,390 --> 00:21:59,060 that you didn't search hard enough. 424 00:21:59,060 --> 00:22:02,310 So it's nice to have that argument. 425 00:22:02,310 --> 00:22:02,810 All right? 426 00:22:02,810 --> 00:22:03,080 Great. 427 00:22:03,080 --> 00:22:04,190 Thanks for the question. 428 00:22:04,190 --> 00:22:05,170 Feel free to interrupt. 429 00:22:05,170 --> 00:22:07,840 Raise your hand, and I'm watching you guys, 430 00:22:07,840 --> 00:22:11,550 and I'm happy to answer questions at any time. 431 00:22:11,550 --> 00:22:14,540 So let's talk about the straightforward algorithm. 432 00:22:14,540 --> 00:22:16,510 The straightforward algorithm is something 433 00:22:16,510 --> 00:22:20,940 that starts from the left and just walks across. 434 00:22:20,940 --> 00:22:24,285 And you might have something that looks like that. 435 00:22:24,285 --> 00:22:24,960 All right? 436 00:22:24,960 --> 00:22:27,830 By that-- By this I mean the numbers are increasing 437 00:22:27,830 --> 00:22:30,730 as you start from the left, the peak is somewhere 438 00:22:30,730 --> 00:22:33,620 in the middle, and then things start decreasing. 439 00:22:33,620 --> 00:22:34,120 Right? 440 00:22:34,120 --> 00:22:39,570 So in this case, you know, this might be the peak. 441 00:22:39,570 --> 00:22:46,950 442 00:22:46,950 --> 00:22:49,550 You also may have a situation where 443 00:22:49,550 --> 00:22:51,240 the peak is all the way on the right, 444 00:22:51,240 --> 00:22:52,780 you started from the left. 445 00:22:52,780 --> 00:22:55,060 And it's 1, 2, 3, 4, 5, 6, literally 446 00:22:55,060 --> 00:22:56,390 in terms of the numbers. 447 00:22:56,390 --> 00:23:01,000 And you're going to look at n elements going all the way 448 00:23:01,000 --> 00:23:04,800 to the right in order to find the peak. 
449 00:23:04,800 --> 00:23:07,310 So in the case of the middle you'd 450 00:23:07,310 --> 00:23:10,940 look at n over 2 elements. 451 00:23:10,940 --> 00:23:13,770 452 00:23:13,770 --> 00:23:15,020 If it was right in the middle. 453 00:23:15,020 --> 00:23:18,340 454 00:23:18,340 --> 00:23:26,340 And the complexity, worst case complexity-- 455 00:23:26,340 --> 00:23:29,830 --is what we call theta n. 456 00:23:29,830 --> 00:23:33,580 And it's theta n, because in the worst case, 457 00:23:33,580 --> 00:23:36,294 you may have to look at all n elements. 458 00:23:36,294 --> 00:23:38,710 And that would be the case where you started from the left 459 00:23:38,710 --> 00:23:40,860 and you had to go all the way to the right. 460 00:23:40,860 --> 00:23:43,850 Now remember theta n is essentially something 461 00:23:43,850 --> 00:23:45,830 that says of the order of n. 462 00:23:45,830 --> 00:23:49,400 So it gives you both a lower bound and an upper bound. 463 00:23:49,400 --> 00:23:52,470 Big O of n is just an upper bound. 464 00:23:52,470 --> 00:23:53,970 And what we're saying here is, we're 465 00:23:53,970 --> 00:23:58,110 saying this algorithm that starts from the left 466 00:23:58,110 --> 00:24:03,470 is going to, essentially, require in the worst case 467 00:24:03,470 --> 00:24:06,740 something that's a constant times n. 468 00:24:06,740 --> 00:24:07,880 OK? 469 00:24:07,880 --> 00:24:11,210 And you know that constant could be 1. 470 00:24:11,210 --> 00:24:13,400 You could certainly set things up that way. 471 00:24:13,400 --> 00:24:15,860 Or if you had a different kind of algorithm, 472 00:24:15,860 --> 00:24:18,460 maybe you could work on the constant. 473 00:24:18,460 --> 00:24:22,360 But bottom line, we're only concerned, at this moment, 474 00:24:22,360 --> 00:24:24,760 about asymptotic complexity. 475 00:24:24,760 --> 00:24:29,030 And the asymptotic complexity of this algorithm is linear. 476 00:24:29,030 --> 00:24:29,710 All right?
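The straightforward left-to-right walk described above can be sketched in Python as follows. This is a minimal illustration, not the code used in the lecture: the function name `find_peak_linear` and the 0-based indexing are assumptions made for the example.

```python
def find_peak_linear(a):
    """Scan from the left and return the index of the first peak.

    A position is a peak if its value is >= each neighbor that exists.
    The name and 0-based indexing are illustrative assumptions.
    """
    n = len(a)
    for i in range(n):
        left_ok = i == 0 or a[i] >= a[i - 1]
        right_ok = i == n - 1 or a[i] >= a[i + 1]
        if left_ok and right_ok:
            return i
    return None  # reached only when the input is empty


# Worst case from the lecture: strictly increasing numbers, so the
# scan has to look at all n elements before it reaches the peak.
print(find_peak_linear([1, 2, 3, 4, 5, 6]))
```

With the greater-than-or-equal definition, any non-empty input has a peak, so `None` is returned only for an empty list.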
477 00:24:29,710 --> 00:24:32,150 That make sense? 478 00:24:32,150 --> 00:24:32,930 OK. 479 00:24:32,930 --> 00:24:38,950 So someone help me do better. 480 00:24:38,950 --> 00:24:39,890 How can we do better? 481 00:24:39,890 --> 00:24:43,040 How can we lower the asymptotic complexity 482 00:24:43,040 --> 00:24:46,700 of a one dimensional peak finder? 483 00:24:46,700 --> 00:24:48,450 Anybody want to take a stab at that? 484 00:24:48,450 --> 00:24:48,950 Yeah? 485 00:24:48,950 --> 00:24:50,086 Back there. 486 00:24:50,086 --> 00:24:52,078 AUDIENCE: Do a binary search subset. 487 00:24:52,078 --> 00:24:54,236 You look at the middle, and whatever 488 00:24:54,236 --> 00:24:58,552 is higher-- whichever side is higher, then cut that in half, 489 00:24:58,552 --> 00:25:00,290 because you know there's a peak. 490 00:25:00,290 --> 00:25:00,410 PROFESSOR: On-- 491 00:25:00,410 --> 00:25:01,578 AUDIENCE: For example if you're in the middle 492 00:25:01,578 --> 00:25:03,492 on the right side-- there's a higher number 493 00:25:03,492 --> 00:25:05,116 on the right side-- then you would just 494 00:25:05,116 --> 00:25:06,946 look at that, because you know that your peak's somewhere 495 00:25:06,946 --> 00:25:07,446 in there. 496 00:25:07,446 --> 00:25:08,900 And you continue cutting in half. 497 00:25:08,900 --> 00:25:09,360 PROFESSOR: Excellent! 498 00:25:09,360 --> 00:25:09,859 Excellent! 499 00:25:09,859 --> 00:25:11,200 That's exactly right. 500 00:25:11,200 --> 00:25:14,850 So you can-- You can do something different, which 501 00:25:14,850 --> 00:25:19,240 is essentially try and break up this problem. 502 00:25:19,240 --> 00:25:22,650 Use a divide and conquer strategy, and recursively break 503 00:25:22,650 --> 00:25:26,550 up this one dimensional array into smaller arrays. 504 00:25:26,550 --> 00:25:29,940 And try and get this complexity down. 505 00:25:29,940 --> 00:25:30,440 Yeah? 506 00:25:30,440 --> 00:25:33,239 AUDIENCE: Are we assuming that there's only one peak? 
507 00:25:33,239 --> 00:25:34,280 PROFESSOR: No, we're not. 508 00:25:34,280 --> 00:25:34,980 AUDIENCE: OK. 509 00:25:34,980 --> 00:25:39,219 PROFESSOR: It's find a peak if it exists. 510 00:25:39,219 --> 00:25:40,760 And in this case it's, "find a peak", 511 00:25:40,760 --> 00:25:42,610 because of the definition. 512 00:25:42,610 --> 00:25:45,910 We don't really need this as it was discussed. 513 00:25:45,910 --> 00:25:46,660 All right? 514 00:25:46,660 --> 00:25:47,180 OK. 515 00:25:47,180 --> 00:25:49,080 So-- 516 00:25:49,080 --> 00:25:53,392 So that was a great answer, and-- You know this class 517 00:25:53,392 --> 00:25:54,850 after while is going to get boring. 518 00:25:54,850 --> 00:25:55,770 Right? 519 00:25:55,770 --> 00:25:57,650 Every class gets boring. 520 00:25:57,650 --> 00:26:00,777 So we, you know, try and break the monotony here a bit. 521 00:26:00,777 --> 00:26:02,860 And so-- And then the other thing that we realized 522 00:26:02,860 --> 00:26:04,790 was that these seats you're sitting on-- this 523 00:26:04,790 --> 00:26:06,998 is a nice classroom-- but the seats you're sitting on 524 00:26:06,998 --> 00:26:07,750 are kind of hard. 525 00:26:07,750 --> 00:26:08,250 Right? 526 00:26:08,250 --> 00:26:10,787 So what Eric and I did was we decided 527 00:26:10,787 --> 00:26:12,620 we'll help you guys out, especially the ones 528 00:26:12,620 --> 00:26:15,870 who are-- who are interacting with us. 529 00:26:15,870 --> 00:26:17,580 And we have these-- 530 00:26:17,580 --> 00:26:18,610 [LAUGHTER] 531 00:26:18,610 --> 00:26:22,145 --cushions that are 6.006 cushions. 532 00:26:22,145 --> 00:26:25,170 And, you know, that's a 2 by 2 by 2 Rubik's cube here. 533 00:26:25,170 --> 00:26:28,410 And since you answered the first question, you get a cushion. 534 00:26:28,410 --> 00:26:31,510 This is kind of like a Frisbee, but not really. 
535 00:26:31,510 --> 00:26:32,010 So-- 536 00:26:32,010 --> 00:26:32,510 [LAUGHTER] 537 00:26:32,510 --> 00:26:35,190 I'm not sure-- I'm not sure I'm going to get it to you. 538 00:26:35,190 --> 00:26:36,565 But the other thing I want to say 539 00:26:36,565 --> 00:26:37,970 is this is not a baseball game. 540 00:26:37,970 --> 00:26:38,469 Right? 541 00:26:38,469 --> 00:26:40,560 Where you just grab the ball as it comes by. 542 00:26:40,560 --> 00:26:43,670 This is meant for him, my friend in the red shirt. 543 00:26:43,670 --> 00:26:45,920 So here you go. 544 00:26:45,920 --> 00:26:46,820 Ah, too bad. 545 00:26:46,820 --> 00:26:47,620 All right. 546 00:26:47,620 --> 00:26:48,580 It is soft. 547 00:26:48,580 --> 00:26:51,255 So, you know, it won't-- it won't hurt you if hits you. 548 00:26:51,255 --> 00:26:51,910 [LAUGHTER] 549 00:26:51,910 --> 00:26:52,540 All right. 550 00:26:52,540 --> 00:26:54,216 So we got a bunch of these. 551 00:26:54,216 --> 00:26:57,300 And raise your hands, you know, going 552 00:26:57,300 --> 00:27:01,025 to ask-- There's going to be-- I think-- There's 553 00:27:01,025 --> 00:27:03,150 some trivial questions that we're going to ask just 554 00:27:03,150 --> 00:27:05,180 to make sure you're awake. 555 00:27:05,180 --> 00:27:07,750 So an answer to that doesn't get you a cushion. 556 00:27:07,750 --> 00:27:10,514 But an answer like-- What's your name? 557 00:27:10,514 --> 00:27:11,180 AUDIENCE: Chase. 558 00:27:11,180 --> 00:27:11,890 PROFESSOR: Chase. 559 00:27:11,890 --> 00:27:15,134 An answer like Chase just gave is-- 560 00:27:15,134 --> 00:27:17,050 that's a good answer to a nontrivial question. 561 00:27:17,050 --> 00:27:18,500 That gets you a cushion. 562 00:27:18,500 --> 00:27:19,290 OK? 563 00:27:19,290 --> 00:27:20,300 All right, great. 564 00:27:20,300 --> 00:27:24,230 So let's put up by Chase's algorithm up here. 565 00:27:24,230 --> 00:27:26,510 I'm going to write it out for the 1D version. 
566 00:27:26,510 --> 00:27:41,390 567 00:27:41,390 --> 00:27:45,205 So what we have here is a recursive algorithm. 568 00:27:45,205 --> 00:28:02,967 569 00:28:02,967 --> 00:28:04,800 So the picture you want to keep in your head 570 00:28:04,800 --> 00:28:06,860 is this picture that I put up there. 571 00:28:06,860 --> 00:28:11,010 And this is a divide and conquer algorithm. 572 00:28:11,010 --> 00:28:14,140 You're going to see this over and over-- this paradigm-- 573 00:28:14,140 --> 00:28:17,360 over and over in 6.006. 574 00:28:17,360 --> 00:28:22,745 We're going to look at the n over 2 position. 575 00:28:22,745 --> 00:28:25,990 576 00:28:25,990 --> 00:28:28,700 And we're going to look to the left, 577 00:28:28,700 --> 00:28:31,010 and we're going to look to the right. 578 00:28:31,010 --> 00:28:33,420 And we're going to do that in sequence. 579 00:28:33,420 --> 00:28:33,920 So-- 580 00:28:33,920 --> 00:28:36,680 581 00:28:36,680 --> 00:28:50,950 --if a n over 2 is less than a n over 2 minus 1, then-- 582 00:28:50,950 --> 00:28:54,380 --only look at the left half. 583 00:28:54,380 --> 00:28:57,680 584 00:28:57,680 --> 00:29:04,410 1 through n over 2 minus 1 to look for peak-- for a peak. 585 00:29:04,410 --> 00:29:08,381 586 00:29:08,381 --> 00:29:08,880 All right? 587 00:29:08,880 --> 00:29:10,295 So that's step one. 588 00:29:10,295 --> 00:29:12,170 And you know I could put it on the right hand 589 00:29:12,170 --> 00:29:15,990 side or the left hand side, doesn't really matter. 590 00:29:15,990 --> 00:29:20,311 I chose to do the left hand side first, the left half. 591 00:29:20,311 --> 00:29:24,570 And so what I've done is, through that one step, 592 00:29:24,570 --> 00:29:30,010 if in fact you have that condition-- a n over 2 593 00:29:30,010 --> 00:29:33,630 is less than a n over 2 minus 1-- then you move to your left 594 00:29:33,630 --> 00:29:37,490 and you work on one half of the problem. 
595 00:29:37,490 --> 00:29:43,120 But if that's not the case, then if a n over 2 596 00:29:43,120 --> 00:29:48,170 is less than a n over 2 plus 1, 597 00:29:48,170 --> 00:29:57,520 then only look at n over 2 plus 1 through n for a peak. 598 00:29:57,520 --> 00:29:59,960 So I haven't bothered writing out all the words. 599 00:29:59,960 --> 00:30:03,480 They're exactly the same as the left hand side. 600 00:30:03,480 --> 00:30:06,160 You just look to the right hand side. 601 00:30:06,160 --> 00:30:10,430 Otherwise if both of these conditions don't fire, 602 00:30:10,430 --> 00:30:12,160 you're actually done. 603 00:30:12,160 --> 00:30:12,660 OK? 604 00:30:12,660 --> 00:30:16,130 That's actually the best case in terms of finishing early, 605 00:30:16,130 --> 00:30:18,340 at least in this recursive step. 606 00:30:18,340 --> 00:30:22,580 Because now the n over 2 position is a peak. 607 00:30:22,580 --> 00:30:27,210 608 00:30:27,210 --> 00:30:30,500 Because what you found is that the n over 2 position 609 00:30:30,500 --> 00:30:34,740 is greater than or equal to both of its adjacent positions, 610 00:30:34,740 --> 00:30:36,850 and that's exactly the definition of a peak. 611 00:30:36,850 --> 00:30:38,430 So you're done. 612 00:30:38,430 --> 00:30:39,350 OK? 613 00:30:39,350 --> 00:30:44,500 So all of this is good. 614 00:30:44,500 --> 00:30:53,307 You want to write an argument that this algorithm is correct. 615 00:30:53,307 --> 00:30:54,890 And I'm not going to bother with that. 616 00:30:54,890 --> 00:30:59,530 I just wave my hands a bit, and you all nodded, 617 00:30:59,530 --> 00:31:01,230 so we're done with that. 618 00:31:01,230 --> 00:31:07,310 But the point being you will see in your problem set 619 00:31:07,310 --> 00:31:11,560 a precise argument for a more complicated algorithm, the 2D 620 00:31:11,560 --> 00:31:12,720 version of this.
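The three cases just described (go left, go right, or stop at the middle) can be sketched as a recursive Python function. This is an illustrative sketch rather than the lecture's own code: the name `find_peak_1d`, the `lo`/`hi` parameters, and the 0-based indexing are assumptions, whereas the board uses positions 1 through n.

```python
def find_peak_1d(a, lo=0, hi=None):
    """Divide-and-conquer 1D peak finder on a[lo..hi] (inclusive).

    Returns an index whose value is >= both neighbors in the full array.
    Names and 0-based indexing are assumptions for this sketch.
    """
    if hi is None:
        hi = len(a) - 1
    if lo > hi:
        raise ValueError("cannot find a peak in an empty range")
    mid = (lo + hi) // 2
    # Left neighbor is larger: a peak must exist in the left half.
    if mid > lo and a[mid] < a[mid - 1]:
        return find_peak_1d(a, lo, mid - 1)
    # Right neighbor is larger: a peak must exist in the right half.
    if mid < hi and a[mid] < a[mid + 1]:
        return find_peak_1d(a, mid + 1, hi)
    # Neither condition fired, so the middle position is a peak.
    return mid


print(find_peak_1d([1, 2, 3, 4, 5, 6]))
```

Each call does a constant amount of work and recurses on at most half of the range, which is what the recurrence discussed next captures.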
621 00:31:12,720 --> 00:31:16,900 And that should be a template for you to go write a proof, 622 00:31:16,900 --> 00:31:19,200 or an argument, a formal argument, 623 00:31:19,200 --> 00:31:21,620 that a particular algorithm is correct. 624 00:31:21,620 --> 00:31:23,550 That it does what it claims to do. 625 00:31:23,550 --> 00:31:30,370 And in this case it's two, three lines of careful reasoning 626 00:31:30,370 --> 00:31:34,520 that essentially say, given the definition of the peak, 627 00:31:34,520 --> 00:31:38,600 that this is going to find a peak in the array 628 00:31:38,600 --> 00:31:39,860 that you're given. 629 00:31:39,860 --> 00:31:40,900 All right? 630 00:31:40,900 --> 00:31:44,910 So we all believe that this algorithm is correct. 631 00:31:44,910 --> 00:31:48,650 Let's talk now about the complexity of this algorithm. 632 00:31:48,650 --> 00:31:50,630 Because the whole point of this algorithm 633 00:31:50,630 --> 00:31:52,700 was because we didn't like this theta 634 00:31:52,700 --> 00:31:56,350 n complexity corresponding to the straightforward algorithm. 635 00:31:56,350 --> 00:31:57,470 So we'd like to do better. 636 00:31:57,470 --> 00:32:08,350 637 00:32:08,350 --> 00:32:10,830 So what I'd like to do is ask one of you 638 00:32:10,830 --> 00:32:14,890 to give me a recurrence relation of the kind, you know, T of n 639 00:32:14,890 --> 00:32:18,040 equals blah, blah, blah. 640 00:32:18,040 --> 00:32:22,310 That would correspond to this recursive algorithm, 641 00:32:22,310 --> 00:32:24,020 this divide and conquer algorithm. 642 00:32:24,020 --> 00:32:29,050 And then using that, I'd like to get to the actual complexity 643 00:32:29,050 --> 00:32:33,280 in terms of what the theta of complexity corresponds to. 644 00:32:33,280 --> 00:32:33,780 Yeah? 645 00:32:33,780 --> 00:32:34,752 Back there?
646 00:32:34,752 --> 00:32:39,680 AUDIENCE: So the worst case scenario if T of n 647 00:32:39,680 --> 00:32:42,795 is going to be some constant amount of time-- 648 00:32:42,795 --> 00:32:43,420 PROFESSOR: Yep. 649 00:32:43,420 --> 00:32:47,116 AUDIENCE: --it takes to investigate whether a certain 650 00:32:47,116 --> 00:32:49,851 element is [INAUDIBLE], plus-- 651 00:32:49,851 --> 00:32:50,713 [COUGH] 652 00:32:50,713 --> 00:32:52,022 --T of n over 2? 653 00:32:52,022 --> 00:32:52,730 PROFESSOR: Great. 654 00:32:52,730 --> 00:32:53,550 Exactly right. 655 00:32:53,550 --> 00:32:54,460 That's exactly right. 656 00:32:54,460 --> 00:32:58,370 So if you look at this algorithm and you say, 657 00:32:58,370 --> 00:33:01,290 from a computation standpoint, can I 658 00:33:01,290 --> 00:33:05,510 write an equation corresponding to the execution 659 00:33:05,510 --> 00:33:06,570 of this algorithm? 660 00:33:06,570 --> 00:33:11,350 And you say, T of n is the work that this algorithm does on-- 661 00:33:11,350 --> 00:33:13,630 as input of size n. 662 00:33:13,630 --> 00:33:14,130 OK? 663 00:33:14,130 --> 00:33:25,390 664 00:33:25,390 --> 00:33:28,550 Then I can write this equation. 665 00:33:28,550 --> 00:33:31,310 666 00:33:31,310 --> 00:33:34,530 And this theta 1 corresponds to the two comparisons 667 00:33:34,530 --> 00:33:37,697 that you do looking at-- potentially the two comparisons 668 00:33:37,697 --> 00:33:39,280 that you do-- looking at the left hand 669 00:33:39,280 --> 00:33:41,440 side and the right hand side. 670 00:33:41,440 --> 00:33:44,580 So that's-- 2 is a constant, so that's why we put theta 1. 671 00:33:44,580 --> 00:33:45,200 All right? 672 00:33:45,200 --> 00:33:47,060 So you get a cushion, too. 673 00:33:47,060 --> 00:33:49,630 Watch out guys. 674 00:33:49,630 --> 00:33:50,780 Whoa! 675 00:33:50,780 --> 00:33:52,192 Oh actually that wasn't so bad. 676 00:33:52,192 --> 00:33:54,000 Good. 677 00:33:54,000 --> 00:33:55,620 Veers left, Eric. 
678 00:33:55,620 --> 00:33:57,420 Veers left. 679 00:33:57,420 --> 00:34:03,360 So if you take this and you start expanding it, 680 00:34:03,360 --> 00:34:05,180 eventually you're going to get to the base 681 00:34:05,180 --> 00:34:12,091 case, which is T of 1 is theta 1. 682 00:34:12,091 --> 00:34:12,590 Right? 683 00:34:12,590 --> 00:34:16,580 Because if you have a one element array, 684 00:34:16,580 --> 00:34:19,650 it's just going to return that element as a peak. 685 00:34:19,650 --> 00:34:23,130 And so if you do that, and you expand it all the way out, 686 00:34:23,130 --> 00:34:31,080 then you can write T of n equals theta 1 plus theta 1. 687 00:34:31,080 --> 00:34:39,300 And you're going to do this log to the base 2 of n times. 688 00:34:39,300 --> 00:34:43,660 And adding these all up, gives you 689 00:34:43,660 --> 00:34:46,360 a complexity theta log 2 of n. 690 00:34:46,360 --> 00:34:48,330 Right? 691 00:34:48,330 --> 00:34:53,089 So now you compare this with that. 692 00:34:53,089 --> 00:34:54,630 And there's really a huge difference. 693 00:34:54,630 --> 00:34:57,440 There's an exponential difference. 694 00:34:57,440 --> 00:35:01,860 If you coded up this algorithm in Python-- 695 00:35:01,860 --> 00:35:06,170 and I did-- both these algorithms for the 1D version-- 696 00:35:06,170 --> 00:35:14,160 and if you run it on n being 10 million or so, 697 00:35:14,160 --> 00:35:17,820 then this algorithm takes 13 seconds. 698 00:35:17,820 --> 00:35:18,320 OK? 699 00:35:18,320 --> 00:35:21,880 The theta n algorithm takes 13 seconds. 700 00:35:21,880 --> 00:35:26,070 And this one takes 0.001 seconds. 701 00:35:26,070 --> 00:35:26,570 OK? 702 00:35:26,570 --> 00:35:27,929 Huge difference. 703 00:35:27,929 --> 00:35:30,345 So there is a big difference between theta n and theta log 704 00:35:30,345 --> 00:35:31,970 n. 705 00:35:31,970 --> 00:35:35,840 It's literally the difference between 2 raised to n, and n.
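The expansion described above can be written out as a short derivation. This is a sketch that assumes n is a power of 2 so that the halving is exact; the symbols follow the lecture's T of n and theta notation.

```latex
\begin{align*}
T(n) &= T\!\left(\tfrac{n}{2}\right) + \Theta(1), \qquad T(1) = \Theta(1) \\
     &= T\!\left(\tfrac{n}{4}\right) + \Theta(1) + \Theta(1) \\
     &\;\;\vdots \\
     &= \underbrace{\Theta(1) + \Theta(1) + \cdots + \Theta(1)}_{\log_2 n \text{ times}}
      = \Theta(\log_2 n)
\end{align*}
```

For the n of about 10 million mentioned in the lecture, log to the base 2 of n is between 23 and 24, so the divide and conquer algorithm makes only a couple of dozen halving steps where the linear scan may touch all 10 million elements.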
706 00:35:35,840 --> 00:35:40,120 It makes sense to try and reduce complexity 707 00:35:40,120 --> 00:35:43,000 as you can see, especially if you're 708 00:35:43,000 --> 00:35:44,450 talking about large inputs. 709 00:35:44,450 --> 00:35:45,390 All right? 710 00:35:45,390 --> 00:35:48,860 And you'll see that more clearly as we 711 00:35:48,860 --> 00:35:51,300 go to a 2D version of this problem. 712 00:35:51,300 --> 00:35:52,202 All right? 713 00:35:52,202 --> 00:35:53,910 So you can't really do better for the 1D. 714 00:35:53,910 --> 00:35:56,750 The 1D is a straightforward problem. 715 00:35:56,750 --> 00:35:58,500 It gets a little more interesting-- 716 00:35:58,500 --> 00:36:01,080 the problems get a little-- excuse me, 717 00:36:01,080 --> 00:36:03,600 the algorithms get a little more sophisticated 718 00:36:03,600 --> 00:36:08,340 when we look at a 2D version of peak finding. 719 00:36:08,340 --> 00:36:10,535 So let's talk about the 2D version. 720 00:36:10,535 --> 00:36:15,810 721 00:36:15,810 --> 00:36:18,250 So as you can imagine in the 2D version 722 00:36:18,250 --> 00:36:20,715 you have a matrix, or a two dimensional array. 723 00:36:20,715 --> 00:36:23,490 724 00:36:23,490 --> 00:36:29,575 And we'll say this thing has n rows and m columns. 725 00:36:29,575 --> 00:36:34,700 726 00:36:34,700 --> 00:36:37,190 And now we have to define what a peak is. 727 00:36:37,190 --> 00:36:38,350 And it's a hill. 728 00:36:38,350 --> 00:36:41,540 It's the obvious definition of a peak. 729 00:36:41,540 --> 00:36:50,490 So if you had a in here, c, b, d, e. 730 00:36:50,490 --> 00:37:02,250 Then as you can guess, a is a 2D peak if, and only if, 731 00:37:02,250 --> 00:37:08,830 a greater than or equal to b; a greater than or equal to d, c 732 00:37:08,830 --> 00:37:10,061 and e. 733 00:37:10,061 --> 00:37:10,560 All right? 734 00:37:10,560 --> 00:37:12,230 So it's a little hill up there. 735 00:37:12,230 --> 00:37:12,730 All right? 
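The 2D peak definition above can be checked directly in a few lines of Python. This is a sketch under assumptions: the name `is_peak_2d`, the list-of-lists representation, and the handling of edge cells (neighbors that do not exist are simply skipped) are choices made for the example.

```python
def is_peak_2d(grid, i, j):
    """Return True if grid[i][j] is >= each of its up/down/left/right neighbors.

    Neighbors outside the matrix are ignored, mirroring the 1D treatment
    of the array ends. Name and representation are illustrative assumptions.
    """
    n, m = len(grid), len(grid[0])
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = i + di, j + dj
        if 0 <= r < n and 0 <= c < m and grid[r][c] > grid[i][j]:
            return False
    return True


hill = [
    [1, 2, 1],
    [2, 5, 2],
    [1, 2, 1],
]
print(is_peak_2d(hill, 1, 1))  # the center is the "little hill"
```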
736 00:37:12,730 --> 00:37:15,120 And again I've used the greater than or equal to here, 737 00:37:15,120 --> 00:37:18,490 so that's similar to the 1D in the case 738 00:37:18,490 --> 00:37:21,345 that you'll always find a peak in any 2D matrix. 739 00:37:21,345 --> 00:37:23,960 740 00:37:23,960 --> 00:37:29,210 Now again I'll give you the straightforward algorithm, 741 00:37:29,210 --> 00:37:31,640 and we'll call it the Greedy Ascent algorithm. 742 00:37:31,640 --> 00:37:41,660 743 00:37:41,660 --> 00:37:45,820 And the Greedy Ascent algorithm essentially picks a direction 744 00:37:45,820 --> 00:37:50,560 and, you know, tries to follow that direction in order 745 00:37:50,560 --> 00:37:52,770 to find a peak. 746 00:37:52,770 --> 00:38:01,840 So for example, if I had this particular-- 747 00:38:01,840 --> 00:38:10,790 --matrix; 14, 13, 12, 15, 9, 11, 17-- 748 00:38:10,790 --> 00:38:17,010 749 00:38:17,010 --> 00:38:20,850 Then what might happen is if I started at some arbitrary 750 00:38:20,850 --> 00:38:23,360 midpoint-- So the Greedy Ascent algorithm 751 00:38:23,360 --> 00:38:26,210 has to make choices as to where to start. 752 00:38:26,210 --> 00:38:29,142 Just like we had different cases here, 753 00:38:29,142 --> 00:38:31,100 you have to make a choice as to where to start. 754 00:38:31,100 --> 00:38:32,770 You might want to start in the middle, 755 00:38:32,770 --> 00:38:35,560 and you might want to work your way left first. 756 00:38:35,560 --> 00:38:38,380 Or you're going to all-- You just keep going left, 757 00:38:38,380 --> 00:38:39,720 or keep going right. 758 00:38:39,720 --> 00:38:42,340 And if you hit an edge, you go down. 759 00:38:42,340 --> 00:38:46,450 So you make some choices as to what the default traversal 760 00:38:46,450 --> 00:38:47,810 directions are. 761 00:38:47,810 --> 00:38:50,820 And so if you say you want to start with 12, 762 00:38:50,820 --> 00:38:54,050 you are going to go look for something to the left.
763 00:38:54,050 --> 00:38:58,470 And if it's greater than, you're going to follow that direction. 764 00:38:58,470 --> 00:39:00,950 If it's not, if it's less, then you're 765 00:39:00,950 --> 00:39:04,200 going to go in the other direction, in this case, 766 00:39:04,200 --> 00:39:05,160 for example. 767 00:39:05,160 --> 00:39:13,120 So in this case you'll go to 12, 13, 14, 15, 16, 17, 19, 768 00:39:13,120 --> 00:39:14,230 and 20. 769 00:39:14,230 --> 00:39:17,765 And you'd find-- You'd find this peak. 770 00:39:17,765 --> 00:39:21,680 Now I haven't given you the specific details 771 00:39:21,680 --> 00:39:23,750 of a Greedy Ascent algorithm. 772 00:39:23,750 --> 00:39:33,400 But I think if you look at the worst case possibilities 773 00:39:33,400 --> 00:39:36,370 here, with respect to a given matrix, 774 00:39:36,370 --> 00:39:38,920 and for any given starting point, 775 00:39:38,920 --> 00:39:43,270 and for any given strategy-- in terms of choosing left first, 776 00:39:43,270 --> 00:39:48,630 versus right first, or down first versus up first-- 777 00:39:48,630 --> 00:39:51,370 you will have a situation where-- just 778 00:39:51,370 --> 00:39:55,450 like we had in the 1D case-- you may end up 779 00:39:55,450 --> 00:40:02,015 touching a large fraction of the elements in this 2D array. 780 00:40:02,015 --> 00:40:02,750 OK? 781 00:40:02,750 --> 00:40:05,190 So in this case, we ended up, you know, 782 00:40:05,190 --> 00:40:06,890 touching a bunch of different elements. 783 00:40:06,890 --> 00:40:10,529 And it's quite possible that you could end up touching-- 784 00:40:10,529 --> 00:40:12,820 starting from the midpoint-- you could end up touching half 785 00:40:12,820 --> 00:40:16,990 the elements, and in some cases, touching all the elements.
786 00:40:16,990 --> 00:40:23,000 So if you do a worst case analysis of this algorithm-- 787 00:40:23,000 --> 00:40:25,410 a particular algorithm with particular choices in terms 788 00:40:25,410 --> 00:40:30,370 of the starting point and the direction of search-- 789 00:40:30,370 --> 00:40:33,750 a Greedy Ascent algorithm would have theta n m complexity. 790 00:40:33,750 --> 00:40:34,320 All right? 791 00:40:34,320 --> 00:40:42,480 And in the case where n equals m, or m equals n, 792 00:40:42,480 --> 00:40:44,840 you'd have theta n squared complexity. 793 00:40:44,840 --> 00:40:46,290 OK? 794 00:40:46,290 --> 00:40:48,440 I won't spend very much time on this, 795 00:40:48,440 --> 00:40:52,150 because I want to talk to you about the divide 796 00:40:52,150 --> 00:40:58,020 and conquer versions of this algorithm for the 2D peak. 797 00:40:58,020 --> 00:41:00,860 But hopefully you're all with me with respect 798 00:41:00,860 --> 00:41:03,530 to what the worst case complexity is. 799 00:41:03,530 --> 00:41:04,990 All right? 800 00:41:04,990 --> 00:41:06,070 People buy that? 801 00:41:06,070 --> 00:41:06,570 Yeah. 802 00:41:06,570 --> 00:41:07,390 Question back there. 803 00:41:07,390 --> 00:41:09,264 AUDIENCE: Can you-- Is that an approximation? 804 00:41:09,264 --> 00:41:14,630 Or can you actually get to n times m traversals? 805 00:41:14,630 --> 00:41:18,780 PROFESSOR: So there are specific Greedy Ascent algorithms, 806 00:41:18,780 --> 00:41:21,680 and specific matrices where, if I give you 807 00:41:21,680 --> 00:41:24,680 the code for the algorithm, and I give you a specific matrix, 808 00:41:24,680 --> 00:41:28,200 that I could make you touch all of these elements. 809 00:41:28,200 --> 00:41:28,870 That's correct. 810 00:41:28,870 --> 00:41:30,600 So we're talking about worst case. 811 00:41:30,600 --> 00:41:32,260 You're being very paranoid when you 812 00:41:32,260 --> 00:41:34,540 talk about worst case complexity. 
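Since the lecture deliberately leaves the details of Greedy Ascent open, the following Python sketch fixes one possible set of choices to make the worst case concrete. Everything specific here is an assumption for illustration: the name `greedy_ascent`, the default start at the top-left corner, the left, right, up, down order of checks, and the "snake" matrix.

```python
def greedy_ascent(grid, i=0, j=0):
    """Walk uphill from (i, j) until no neighbor is strictly larger.

    Returns the peak position and the number of cells visited. The start
    cell and the left/right/up/down order are arbitrary assumptions; the
    lecture does not fix them.
    """
    n, m = len(grid), len(grid[0])
    visited = 1
    while True:
        for di, dj in ((0, -1), (0, 1), (-1, 0), (1, 0)):
            r, c = i + di, j + dj
            if 0 <= r < n and 0 <= c < m and grid[r][c] > grid[i][j]:
                i, j = r, c          # move to the first larger neighbor
                visited += 1
                break
        else:
            # No larger neighbor in any direction: (i, j) is a 2D peak.
            return (i, j), visited


# Hypothetical worst-case input: values increase along a snake-shaped
# path, so this variant touches every one of the n * m cells.
snake = [
    [1, 2, 3],
    [6, 5, 4],
    [7, 8, 9],
]
print(greedy_ascent(snake))
```

Because every move goes to a strictly larger value, the walk always terminates, but on an input like `snake` the number of visited cells equals n times m, matching the theta n m worst case.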
813 00:41:34,540 --> 00:41:38,800 And so I'm-- hand waving a bit here, 814 00:41:38,800 --> 00:41:41,150 simply because I haven't given you the specifics 815 00:41:41,150 --> 00:41:42,150 of the algorithm yet. 816 00:41:42,150 --> 00:41:42,650 Right? 817 00:41:42,650 --> 00:41:44,669 This is really a set of algorithms, 818 00:41:44,669 --> 00:41:46,210 because I haven't given you the code, 819 00:41:46,210 --> 00:41:47,668 I haven't told you where it starts, 820 00:41:47,668 --> 00:41:49,050 and which direction it goes. 821 00:41:49,050 --> 00:41:52,380 But you go, do that, fix it, and I 822 00:41:52,380 --> 00:41:55,380 would be the person who tries to find the worst case complexity. 823 00:41:55,380 --> 00:41:58,250 Suddenly it's very easy to get to theta n 824 00:41:58,250 --> 00:42:03,140 m in terms of having some constant multiplying n times m. 825 00:42:03,140 --> 00:42:05,810 But you can definitely get to that constant 826 00:42:05,810 --> 00:42:08,520 being very close to 1. 827 00:42:08,520 --> 00:42:09,910 OK? 828 00:42:09,910 --> 00:42:11,350 If not 1. 829 00:42:11,350 --> 00:42:12,190 All right. 830 00:42:12,190 --> 00:42:14,480 So let's talk about divide and conquer. 831 00:42:14,480 --> 00:42:18,720 And let's say that I did something 832 00:42:18,720 --> 00:42:22,770 like this, where I just tried to jam the binary search 833 00:42:22,770 --> 00:42:26,340 algorithm into the 2D version. 834 00:42:26,340 --> 00:42:26,840 All right? 835 00:42:26,840 --> 00:42:37,780 836 00:42:37,780 --> 00:42:43,830 So what I'm going to do is-- 837 00:42:43,830 --> 00:42:55,430 --I'm going to pick the middle column, j equals m over 2. 838 00:42:55,430 --> 00:43:00,710 And I'm going to find a 1D peak using 839 00:43:00,710 --> 00:43:01,810 whatever algorithm I want. 
840 00:43:01,810 --> 00:43:04,820 And I'll probably end up using the more efficient algorithm, 841 00:43:04,820 --> 00:43:07,850 the binary search version that's gone 842 00:43:07,850 --> 00:43:10,530 all the way to the left of the board there. 843 00:43:10,530 --> 00:43:14,000 And let's say I find a binary peak at (i, j). 844 00:43:14,000 --> 00:43:17,060 Because I've picked a column, and I'm just finding a 1D peak. 845 00:43:17,060 --> 00:43:20,320 846 00:43:20,320 --> 00:43:23,550 So this is j equals m over 2. 847 00:43:23,550 --> 00:43:25,690 That's i. 848 00:43:25,690 --> 00:43:29,850 Now I use (i,j). 849 00:43:29,850 --> 00:43:38,730 In particular row i as a start-- 850 00:43:38,730 --> 00:43:42,310 --to find a 1D peak on row i. 851 00:43:42,310 --> 00:43:47,470 852 00:43:47,470 --> 00:43:50,041 And I stand up here, I'm really happy. 853 00:43:50,041 --> 00:43:50,540 OK? 854 00:43:50,540 --> 00:43:53,440 Because I say, wow. 855 00:43:53,440 --> 00:43:56,850 I picked a middle column, I found a 1D peak, 856 00:43:56,850 --> 00:44:01,350 that is theta m complexity to find a 1D peak as we argued. 857 00:44:01,350 --> 00:44:06,665 And one side-- the theta m-- 858 00:44:06,665 --> 00:44:07,659 AUDIENCE: Log n. 859 00:44:07,659 --> 00:44:08,700 PROFESSOR: Oh, I'm sorry. 860 00:44:08,700 --> 00:44:09,730 You're right. 861 00:44:09,730 --> 00:44:13,490 The log n complexity, that's what this was. 862 00:44:13,490 --> 00:44:15,031 So I do have that here. 863 00:44:15,031 --> 00:44:15,530 Yeah. 864 00:44:15,530 --> 00:44:16,470 Log n complexity. 865 00:44:16,470 --> 00:44:18,920 Thanks, Eric. 866 00:44:18,920 --> 00:44:26,130 And then once I do that, I can find a 1D peak on row i. 867 00:44:26,130 --> 00:44:28,690 In this case row i would be m wide, 868 00:44:28,690 --> 00:44:30,630 so it would be log m complexity. 869 00:44:30,630 --> 00:44:33,840 If n equals m, then I have a couple of steps of log n, 870 00:44:33,840 --> 00:44:35,050 and I'm done. 
871 00:44:35,050 --> 00:44:36,030 All right? 872 00:44:36,030 --> 00:44:38,320 Am I done? 873 00:44:38,320 --> 00:44:39,640 No. 874 00:44:39,640 --> 00:44:42,770 Can someone tell me why I'm not done? 875 00:44:42,770 --> 00:44:43,270 Precisely? 876 00:44:43,270 --> 00:44:43,947 Yep. 877 00:44:43,947 --> 00:44:46,841 AUDIENCE: Because when you do the second part 878 00:44:46,841 --> 00:44:50,155 to find the peak in row i, you might not 879 00:44:50,155 --> 00:44:52,987 have that column peak-- There might not 880 00:44:52,987 --> 00:44:54,320 be a peak on the column anymore. 881 00:44:54,320 --> 00:44:56,240 PROFESSOR: That's exactly correct. 882 00:44:56,240 --> 00:44:59,280 So this algorithm is incorrect. 883 00:44:59,280 --> 00:44:59,780 OK? 884 00:44:59,780 --> 00:45:01,460 It doesn't work. 885 00:45:01,460 --> 00:45:04,380 It's efficient, but incorrect. 886 00:45:04,380 --> 00:45:05,390 OK? 887 00:45:05,390 --> 00:45:07,215 It's-- You want to be correct. 888 00:45:07,215 --> 00:45:09,640 You know being correct and inefficient 889 00:45:09,640 --> 00:45:13,580 is definitely better than being inefficient-- I'm sorry. 890 00:45:13,580 --> 00:45:15,790 Being incorrect and efficient. 891 00:45:15,790 --> 00:45:17,870 So this is an efficient algorithm, 892 00:45:17,870 --> 00:45:22,077 in the sense that it will only take log n time, 893 00:45:22,077 --> 00:45:22,910 but it doesn't work. 894 00:45:22,910 --> 00:45:25,620 And I'll give you a simple example 895 00:45:25,620 --> 00:45:27,650 here where it doesn't work. 896 00:45:27,650 --> 00:45:32,490 897 00:45:32,490 --> 00:45:35,680 The problem is-- 898 00:45:35,680 --> 00:45:39,960 --a 2D peak-- 899 00:45:39,960 --> 00:45:44,150 --may not exist-- 900 00:45:44,150 --> 00:45:46,090 --on row i. 901 00:45:46,090 --> 00:45:47,700 And here's an example of that. 902 00:45:47,700 --> 00:45:53,640 903 00:45:53,640 --> 00:45:58,360 Actually this is-- This is exactly the example of that.
904 00:45:58,360 --> 00:46:02,690 Let's say that I started with this row. 905 00:46:02,690 --> 00:46:05,057 Since it's-- I'm starting with the middle row, 906 00:46:05,057 --> 00:46:06,890 and I could start with this one or that one. 907 00:46:06,890 --> 00:46:10,640 Let's say I started with that one. 908 00:46:10,640 --> 00:46:16,350 I end up finding a peak. 909 00:46:16,350 --> 00:46:22,330 And if this were 10 up here, I'd choose 12 as a peak. 910 00:46:22,330 --> 00:46:25,856 And it's quite possible that I return 12 as a peak. 911 00:46:25,856 --> 00:46:27,900 Even though 19 is bigger, because 12 912 00:46:27,900 --> 00:46:30,370 is a peak given 10 and 11 up here. 913 00:46:30,370 --> 00:46:33,060 And then when I choose this particular row, 914 00:46:33,060 --> 00:46:36,720 and I find a peak on this row, it would be 14. 915 00:46:36,720 --> 00:46:38,870 That is a 1D peak on this row. 916 00:46:38,870 --> 00:46:41,840 But 14 is not a 2D peak. 917 00:46:41,840 --> 00:46:42,790 OK? 918 00:46:42,790 --> 00:46:47,402 So in this particular example, the algorithm would return 14. 919 00:46:47,402 --> 00:46:50,740 And 14 is not a 2D peak. 920 00:46:50,740 --> 00:46:53,730 All right? 921 00:46:53,730 --> 00:46:57,460 You can collect your cushion after the class. 922 00:46:57,460 --> 00:47:01,880 So not so good. 923 00:47:01,880 --> 00:47:05,430 Looks like an efficient algorithm, but doesn't work. 924 00:47:05,430 --> 00:47:06,180 All right? 925 00:47:06,180 --> 00:47:09,290 So how can we get to something that actually works? 926 00:47:09,290 --> 00:47:14,300 So the last algorithm that I'm going to show you-- 927 00:47:14,300 --> 00:47:16,920 And you'll see four different algorithms in your problem 928 00:47:16,920 --> 00:47:21,260 set-- 929 00:47:21,260 --> 00:47:24,340 --that you'll have to analyze the complexity for and decide 930 00:47:24,340 --> 00:47:28,180 if they're efficient, and if they're correct.
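The failure of the "1D peak on the middle column, then 1D peak on that row" idea can be reproduced with a small Python sketch. The matrix below is a hypothetical reconstruction built around the numbers mentioned in the lecture (10, 12, 11 in the middle column, 14, 13, 12 in the row, 15 next to 14); the zero entries, the names `peak_1d`, `flawed_peak_2d`, and `is_peak_2d`, and the 0-based indexing are all assumptions for the example.

```python
def peak_1d(a):
    """Iterative divide-and-conquer 1D peak finder; returns an index."""
    lo, hi = 0, len(a) - 1
    while True:
        mid = (lo + hi) // 2
        if mid > lo and a[mid] < a[mid - 1]:
            hi = mid - 1
        elif mid < hi and a[mid] < a[mid + 1]:
            lo = mid + 1
        else:
            return mid


def flawed_peak_2d(grid):
    """The efficient but incorrect attempt described in the lecture."""
    j = len(grid[0]) // 2                # pick the middle column
    column = [row[j] for row in grid]
    i = peak_1d(column)                  # 1D peak on that column
    return i, peak_1d(grid[i])           # then a 1D peak on row i


def is_peak_2d(grid, i, j):
    """Check the 2D peak definition, ignoring neighbors off the edge."""
    n, m = len(grid), len(grid[0])
    return all(
        grid[r][c] <= grid[i][j]
        for r, c in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
        if 0 <= r < n and 0 <= c < m
    )


# Hypothetical matrix; the zeros are filler, not values from the board.
grid = [
    [0, 0, 10, 0],
    [14, 13, 12, 0],
    [15, 9, 11, 0],
    [16, 17, 19, 20],
]
i, j = flawed_peak_2d(grid)
print(grid[i][j], is_peak_2d(grid, i, j))
```

On this input the column step settles on 12, the row step then walks to 14, and 14 fails the 2D check because 15 sits directly below it.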
931 00:47:28,180 --> 00:47:33,440 But here's a-- a recursive version 932 00:47:33,440 --> 00:47:37,650 that is better, in terms of complexity, 933 00:47:37,650 --> 00:47:40,120 than the Greedy Ascent algorithm. 934 00:47:40,120 --> 00:47:43,410 And this one works. 935 00:47:43,410 --> 00:47:46,470 So what I'm going to do is pick a middle column. 936 00:47:46,470 --> 00:47:49,750 937 00:47:49,750 --> 00:47:51,435 j equals m over 2 as before. 938 00:47:51,435 --> 00:47:54,050 939 00:47:54,050 --> 00:48:02,480 I'm going to find the global maximum on column j. 940 00:48:02,480 --> 00:48:05,316 941 00:48:05,316 --> 00:48:06,690 And that's going to be at (i, j). 942 00:48:06,690 --> 00:48:09,580 943 00:48:09,580 --> 00:48:18,230 I'm going to compare (i comma j minus 1), (i comma j), 944 00:48:18,230 --> 00:48:20,440 and (i,j plus 1). 945 00:48:20,440 --> 00:48:23,620 Which means that once I've found the maximum in this column, 946 00:48:23,620 --> 00:48:25,890 all I'm going to do is look to the left and the right, 947 00:48:25,890 --> 00:48:27,920 and compare. 948 00:48:27,920 --> 00:48:30,825 I'm going to pick the left columns. 949 00:48:30,825 --> 00:48:33,520 950 00:48:33,520 --> 00:48:40,890 If (i comma j minus 1) is greater than (i comma j)-- 951 00:48:40,890 --> 00:48:42,420 and similarly for the right. 952 00:48:42,420 --> 00:48:49,490 953 00:48:49,490 --> 00:48:55,720 And if in fact I-- neither of these two conditions 954 00:48:55,720 --> 00:49:00,210 fires, and what I have is (i comma j) 955 00:49:00,210 --> 00:49:04,280 is greater than or equal to (i comma j minus 1) 956 00:49:04,280 --> 00:49:07,630 and (i comma j plus 1), then I'm done. 957 00:49:07,630 --> 00:49:12,760 Just like I had for the 1D version. 958 00:49:12,760 --> 00:49:17,500 If (i comma j) is greater than or equal to (i comma 959 00:49:17,500 --> 00:49:26,350 j minus 1), and (i comma j plus 1), that implies (i, j) 960 00:49:26,350 --> 00:49:28,591 is a 2D peak.
961 00:49:28,591 --> 00:49:29,212 OK? 962 00:49:29,212 --> 00:49:30,670 And the reason that is the case, is 963 00:49:30,670 --> 00:49:35,902 because (i comma j) was the maximum element in that column. 964 00:49:35,902 --> 00:49:37,360 So you know that you've compared it 965 00:49:37,360 --> 00:49:41,520 to all of the adjacent elements, looking up and looking down, 966 00:49:41,520 --> 00:49:43,000 that's the maximum element. 967 00:49:43,000 --> 00:49:45,150 Now you've looked at the left and the right, 968 00:49:45,150 --> 00:49:47,750 and in fact it's greater than or equal to the elements 969 00:49:47,750 --> 00:49:49,110 on the left and the right. 970 00:49:49,110 --> 00:49:51,290 And so therefore it's a 2D peak. 971 00:49:51,290 --> 00:49:52,270 OK? 972 00:49:52,270 --> 00:49:57,710 So in this case, when you pick the left or the right columns-- 973 00:49:57,710 --> 00:49:59,570 you'll pick one of them-- you're going 974 00:49:59,570 --> 00:50:08,025 to solve the new problem with half the number of columns. 975 00:50:08,025 --> 00:50:16,540 976 00:50:16,540 --> 00:50:17,580 All right? 977 00:50:17,580 --> 00:50:20,965 And again, you have to go through an analysis, 978 00:50:20,965 --> 00:50:24,950 or an argument, to make sure that this algorithm is correct. 979 00:50:24,950 --> 00:50:29,740 But it's intuitively correct, simply because it matches 980 00:50:29,740 --> 00:50:33,190 the 1D version much more closely. 981 00:50:33,190 --> 00:50:37,870 And you also have your condition where you break away right 982 00:50:37,870 --> 00:50:41,160 here, where you have a 2D peak, just like the 1D version. 983 00:50:41,160 --> 00:50:43,930 And what you've done is break this matrix up 984 00:50:43,930 --> 00:50:46,190 into half the size. 985 00:50:46,190 --> 00:50:51,090 And that's essentially why this algorithm works.
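The recursive column-halving procedure described above might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the problem-set code: the function name `find_peak_2d` and the `lo`/`hi` column bounds are hypothetical, the grid is assumed to be a non-empty list of equal-length rows, and the sample values are illustrative. With a single column, neither comparison applies, so the function simply returns that column's global maximum.

```python
def find_peak_2d(grid, lo=0, hi=None):
    """Return (i, j) of a 2D peak, searching only columns lo..hi."""
    if hi is None:
        hi = len(grid[0]) - 1
    j = (lo + hi) // 2  # middle column of the current window
    # Theta(n) scan: row index of the global maximum on column j.
    i = max(range(len(grid)), key=lambda r: grid[r][j])
    if j > lo and grid[i][j - 1] > grid[i][j]:
        return find_peak_2d(grid, lo, j - 1)  # recurse on the left columns
    if j < hi and grid[i][j + 1] > grid[i][j]:
        return find_peak_2d(grid, j + 1, hi)  # recurse on the right columns
    return i, j  # column maximum that also beats left and right


# Assumed sample grid (illustrative values only).
grid = [
    [0, 0, 10, 0],
    [14, 13, 12, 0],
    [15, 9, 11, 17],
    [16, 17, 19, 20],
]
print(find_peak_2d(grid))  # (3, 3)
```

On this grid the search looks at column 1 (maximum 17), moves right because 19 is larger, looks at column 2 (maximum 19), moves right again because 20 is larger, and stops at the 20 in the last column.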
986 00:50:51,090 --> 00:50:55,806 When you have a single column-- 987 00:50:55,806 --> 00:51:01,070 988 00:51:01,070 --> 00:51:09,610 --find the global maximum and you're done. 989 00:51:09,610 --> 00:51:10,110 All right? 990 00:51:10,110 --> 00:51:12,570 So that's the base case. 991 00:51:12,570 --> 00:51:14,670 So let me end with just writing out 992 00:51:14,670 --> 00:51:17,870 what the recurrence relation for the complexity of this 993 00:51:17,870 --> 00:51:22,481 is, and argue what the overall complexity of this algorithm 994 00:51:22,481 --> 00:51:22,980 is. 995 00:51:22,980 --> 00:51:25,221 996 00:51:25,221 --> 00:51:26,720 And then I'll give you the bad news. 997 00:51:26,720 --> 00:51:30,781 998 00:51:30,781 --> 00:51:31,280 All right. 999 00:51:31,280 --> 00:51:36,260 So overall what you have is, you have something like T of (n, m) 1000 00:51:36,260 --> 00:51:42,570 equals T of (n, m over 2) plus theta n. 1001 00:51:42,570 --> 00:51:43,640 Why is that? 1002 00:51:43,640 --> 00:51:47,830 Well n is the number of rows, m is the number of columns. 1003 00:51:47,830 --> 00:51:51,430 In one case you'll be breaking things down 1004 00:51:51,430 --> 00:51:54,630 into half the number of columns, which is m over 2. 1005 00:51:54,630 --> 00:51:57,520 And in order to find the global maximum, 1006 00:51:57,520 --> 00:52:00,220 you'll be doing theta n work, because you're 1007 00:52:00,220 --> 00:52:01,495 finding the global maximum. 1008 00:52:01,495 --> 00:52:01,995 Right? 1009 00:52:01,995 --> 00:52:05,270 You just have to scan it-- this-- 1010 00:52:05,270 --> 00:52:08,840 That's the way-- That's what it's going to take. 1011 00:52:08,840 --> 00:52:11,960 And so if you do that, and you go run it through-- 1012 00:52:11,960 --> 00:52:16,210 and you know that T of (n, 1) is theta n-- which 1013 00:52:16,210 --> 00:52:20,880 is this last part over here-- that's your base case. 
1014 00:52:20,880 --> 00:52:28,560 You get T of (n, m) is theta of n added to theta of n, 1015 00:52:28,560 --> 00:52:34,820 log of m times-- log 2 of m times. 1016 00:52:34,820 --> 00:52:42,250 Which is theta of n log 2 of m. 1017 00:52:42,250 --> 00:52:43,640 All right? 1018 00:52:43,640 --> 00:52:48,120 So you're not done with peak finding. 1019 00:52:48,120 --> 00:52:53,082 What you'll see is four algorithms coded in Python-- 1020 00:52:53,082 --> 00:52:55,290 I'm not going to give away what those algorithms are, 1021 00:52:55,290 --> 00:52:57,090 but you'll have to recognize them. 1022 00:52:57,090 --> 00:53:00,180 You will have seen versions of those algorithms 1023 00:53:00,180 --> 00:53:01,850 already in lecture. 1024 00:53:01,850 --> 00:53:06,210 And your job is going to be to analyze the algorithms, as I 1025 00:53:06,210 --> 00:53:09,690 said before, prove that one of them is correct, 1026 00:53:09,690 --> 00:53:12,784 and find counter-examples for the ones that aren't correct. 1027 00:53:12,784 --> 00:53:14,200 The course staff will stick around 1028 00:53:14,200 --> 00:53:17,110 here to answer questions-- logistical questions-- 1029 00:53:17,110 --> 00:53:18,990 or questions about lecture. 1030 00:53:18,990 --> 00:53:21,650 And I owe that gentleman a cushion.
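For reference, the recurrence stated in the lecture can be unrolled step by step. This is a sketch of the standard argument, assuming for simplicity that m is a power of 2 and m is at least 2; n is the number of rows and m the number of columns, as in the lecture.

```latex
\begin{align*}
T(n, m) &= T(n, m/2) + \Theta(n) \\
        &= T(n, m/4) + \Theta(n) + \Theta(n) \\
        &\;\;\vdots \\
        &= T(n, 1) + \underbrace{\Theta(n) + \cdots + \Theta(n)}_{\log_2 m \text{ times}} \\
        &= \Theta(n) + \Theta(n \log_2 m) \\
        &= \Theta(n \log_2 m).
\end{align*}
```

Each halving of the columns costs one Theta(n) column scan, and the columns can be halved log 2 of m times before the single-column base case is reached.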