1 00:00:00,000 --> 00:00:00,050 2 00:00:00,050 --> 00:00:01,770 The following content is provided 3 00:00:01,770 --> 00:00:04,010 under a Creative Commons license. 4 00:00:04,010 --> 00:00:06,860 Your support will help MIT OpenCourseWare continue 5 00:00:06,860 --> 00:00:10,720 to offer high quality educational resources for free. 6 00:00:10,720 --> 00:00:13,340 To make a donation or view additional materials 7 00:00:13,340 --> 00:00:17,207 from hundreds of MIT courses, visit MIT OpenCourseWare 8 00:00:17,207 --> 00:00:17,832 at ocw.mit.edu. 9 00:00:17,832 --> 00:00:21,155 10 00:00:21,155 --> 00:00:22,030 PROFESSOR: All right. 11 00:00:22,030 --> 00:00:23,220 So I brought a few problems. 12 00:00:23,220 --> 00:00:26,950 They're obviously not the quiz problems, though some of them 13 00:00:26,950 --> 00:00:29,280 are supposed to be similar. 14 00:00:29,280 --> 00:00:30,880 What I have here might not be what 15 00:00:30,880 --> 00:00:33,260 you have on the quiz because we might drop quiz problems 16 00:00:33,260 --> 00:00:35,280 or because some of them are just meant to make 17 00:00:35,280 --> 00:00:39,310 you think and not to give away the solutions to the quiz. 18 00:00:39,310 --> 00:00:41,490 Now, before we get started on this, 19 00:00:41,490 --> 00:00:43,814 do you guys have any burning questions or any concepts 20 00:00:43,814 --> 00:00:44,730 that you want covered? 21 00:00:44,730 --> 00:00:46,800 Based on that, I'll select which problems we do. 22 00:00:46,800 --> 00:00:47,727 Yes? 23 00:00:47,727 --> 00:00:50,288 AUDIENCE: This actually relates not too much to the Pset. 24 00:00:50,288 --> 00:00:53,040 25 00:00:53,040 --> 00:00:55,725 If you're looking at the time complexity 26 00:00:55,725 --> 00:01:02,360 to maybe transfer something from one table to another, 27 00:01:02,360 --> 00:01:04,164 it takes a lot more time, I would 28 00:01:04,164 --> 00:01:06,980 assume, to move the actual item to the new table 29 00:01:06,980 --> 00:01:09,250 than it does just to look at your point and be like, 30 00:01:09,250 --> 00:01:10,680 oh, there's nothing there. 31 00:01:10,680 --> 00:01:12,187 So if you were just going to look 32 00:01:12,187 --> 00:01:14,330 through an empty table of size m, 33 00:01:14,330 --> 00:01:17,325 the time to look through that empty table, I'm assuming, 34 00:01:17,325 --> 00:01:21,210 is much less than the time to actually move an item. 35 00:01:21,210 --> 00:01:28,650 PROFESSOR: So you're saying we have m things here. 36 00:01:28,650 --> 00:01:29,350 AUDIENCE: Yes. 37 00:01:29,350 --> 00:01:33,050 PROFESSOR: Some might be nil, and some 38 00:01:33,050 --> 00:01:36,260 might have stuff in them, and you're 39 00:01:36,260 --> 00:01:41,166 going to resize that to presumably 2 times m, 40 00:01:41,166 --> 00:01:42,540 and the way you do that is you're 41 00:01:42,540 --> 00:01:45,730 going to move the elements, presumably by rehashing them, 42 00:01:45,730 --> 00:01:46,411 right? 43 00:01:46,411 --> 00:01:46,994 AUDIENCE: Yes. 44 00:01:46,994 --> 00:01:50,170 45 00:01:50,170 --> 00:01:53,560 PROFESSOR: So these elements, at least when we use Python, 46 00:01:53,560 --> 00:01:57,060 we don't really store big elements anywhere. 47 00:01:57,060 --> 00:01:59,000 If you have a big object, we always 48 00:01:59,000 --> 00:02:01,270 work with references to that object. 49 00:02:01,270 --> 00:02:03,280 So you remember the address where 50 00:02:03,280 --> 00:02:06,510 that object lies in memory, and since the memory is 51 00:02:06,510 --> 00:02:10,110 finite and small, addresses are all from 0 to a small number, 52 00:02:10,110 --> 00:02:12,650 so they're constant. 53 00:02:12,650 --> 00:02:15,990 So what you have here is not a big object. 54 00:02:15,990 --> 00:02:17,800 It's the address of the big object, 55 00:02:17,800 --> 00:02:21,470 so moving is always constant time. 56 00:02:21,470 --> 00:02:23,750 AUDIENCE: What I'm saying is let's 57 00:02:23,750 --> 00:02:26,230 say that the table is completely full 58 00:02:26,230 --> 00:02:28,280 versus completely empty table. 59 00:02:28,280 --> 00:02:32,040 It would take more time to move everything out 60 00:02:32,040 --> 00:02:34,510 of the full table than it does just 61 00:02:34,510 --> 00:02:36,515 to look the empty table, right? 62 00:02:36,515 --> 00:02:37,390 PROFESSOR: Let's see. 63 00:02:37,390 --> 00:02:42,300 So writing something here is order 1 time, right? 64 00:02:42,300 --> 00:02:44,350 So moving is order 1 time. 65 00:02:44,350 --> 00:02:47,130 Moving one element is order 1 time. 66 00:02:47,130 --> 00:02:50,242 What's accessing an element in a table in a list? 67 00:02:50,242 --> 00:02:51,200 You have a Python list. 68 00:02:51,200 --> 00:02:53,282 What's the cost of doing an index access? 69 00:02:53,282 --> 00:02:54,740 AUDIENCE: It's also order 1, right? 70 00:02:54,740 --> 00:02:55,790 PROFESSOR: OK. 71 00:02:55,790 --> 00:02:59,890 So order 1, index. 72 00:02:59,890 --> 00:03:02,727 Order 1, move. 73 00:03:02,727 --> 00:03:04,060 Suppose you have an empty table. 74 00:03:04,060 --> 00:03:05,680 How many indices do you do? 75 00:03:05,680 --> 00:03:07,915 How many times the index? 76 00:03:07,915 --> 00:03:09,540 AUDIENCE: You look at each one, so it's 77 00:03:09,540 --> 00:03:12,425 order 1 times the length of the table. 78 00:03:12,425 --> 00:03:14,890 PROFESSOR: m. 79 00:03:14,890 --> 00:03:17,620 So if the table is empty, you have order m indices 80 00:03:17,620 --> 00:03:19,740 and 0 moves. 81 00:03:19,740 --> 00:03:22,970 Total running time, order m. 82 00:03:22,970 --> 00:03:26,070 If you have a full table, how many times 83 00:03:26,070 --> 00:03:27,412 do you index in the table? 84 00:03:27,412 --> 00:03:28,437 AUDIENCE: Still order m. 85 00:03:28,437 --> 00:03:29,020 PROFESSOR: OK. 86 00:03:29,020 --> 00:03:31,330 How many times do you move stuff? 87 00:03:31,330 --> 00:03:34,365 AUDIENCE: Order m. 88 00:03:34,365 --> 00:03:35,615 PROFESSOR: Total running time? 89 00:03:35,615 --> 00:03:38,835 AUDIENCE: It's order 2m, which is order m. 90 00:03:38,835 --> 00:03:40,960 PROFESSOR: So it doesn't matter whether the table's 91 00:03:40,960 --> 00:03:42,198 full or empty. 92 00:03:42,198 --> 00:03:42,739 AUDIENCE: OK. 93 00:03:42,739 --> 00:03:44,160 Just wanted to confirm that. 94 00:03:44,160 --> 00:03:45,784 PROFESSOR: And this is how you do that. 95 00:03:45,784 --> 00:03:47,940 96 00:03:47,940 --> 00:03:48,460 Cool. 97 00:03:48,460 --> 00:03:49,022 Thanks. 98 00:03:49,022 --> 00:03:49,855 Any other questions? 99 00:03:49,855 --> 00:04:00,410 100 00:04:00,410 --> 00:04:04,480 Then we will go over problems in the order in which I like them, 101 00:04:04,480 --> 00:04:06,110 which is easiest to hardest so that I 102 00:04:06,110 --> 00:04:07,690 don't have to explain the hard ones. 103 00:04:07,690 --> 00:04:14,710 104 00:04:14,710 --> 00:04:16,570 Warm up problem one. 105 00:04:16,570 --> 00:04:24,800 106 00:04:24,800 --> 00:04:27,720 So you have this recursion and you have to solve it, 107 00:04:27,720 --> 00:04:40,350 and you get a hint that n to the power of 1 over log n 108 00:04:40,350 --> 00:04:45,540 is 2, which is theta 1. 109 00:04:45,540 --> 00:04:47,130 So based on the hint, you can see 110 00:04:47,130 --> 00:04:50,826 that it's going to involve some math. 111 00:04:50,826 --> 00:04:53,700 It's going to get a bit ugly. 112 00:04:53,700 --> 00:04:55,260 So how do we solve recursions? 113 00:04:55,260 --> 00:04:56,110 Two methods. 114 00:04:56,110 --> 00:04:57,514 What are they? 115 00:04:57,514 --> 00:04:59,927 AUDIENCE: Expand. 116 00:04:59,927 --> 00:05:00,510 PROFESSOR: OK. 117 00:05:00,510 --> 00:05:02,951 Substitution formally, but basically, we 118 00:05:02,951 --> 00:05:04,450 expand this guy over and over again. 119 00:05:04,450 --> 00:05:06,084 And? 120 00:05:06,084 --> 00:05:07,340 AUDIENCE: Trees. 121 00:05:07,340 --> 00:05:09,970 PROFESSOR: Recursion trees. 122 00:05:09,970 --> 00:05:11,610 Which one do guys want to do first? 123 00:05:11,610 --> 00:05:15,080 124 00:05:15,080 --> 00:05:17,780 If you only have one t here, anything 125 00:05:17,780 --> 00:05:22,340 works because you can keep expanding it and that works, 126 00:05:22,340 --> 00:05:24,317 so we can do either method. 127 00:05:24,317 --> 00:05:25,900 Which one do you guys want to go over? 128 00:05:25,900 --> 00:05:28,580 129 00:05:28,580 --> 00:05:29,740 Trees. 130 00:05:29,740 --> 00:05:31,100 OK. 131 00:05:31,100 --> 00:05:36,020 So we start with the first node. 132 00:05:36,020 --> 00:05:39,292 The size of the problem is n. 133 00:05:39,292 --> 00:05:41,732 What's the cost inside here? 134 00:05:41,732 --> 00:05:45,148 135 00:05:45,148 --> 00:05:46,130 AUDIENCE: 1. 136 00:05:46,130 --> 00:05:47,550 PROFESSOR: OK. 137 00:05:47,550 --> 00:05:50,620 So this creates one sub-problem. 138 00:05:50,620 --> 00:05:51,483 What's the size? 139 00:05:51,483 --> 00:05:54,198 140 00:05:54,198 --> 00:05:56,170 AUDIENCE: n to the 1/2. 141 00:05:56,170 --> 00:05:58,130 PROFESSOR: OK. 142 00:05:58,130 --> 00:06:00,530 Square root of n equals n to the 1/2. 143 00:06:00,530 --> 00:06:01,774 You solved it already. 144 00:06:01,774 --> 00:06:02,440 What's the cost? 145 00:06:02,440 --> 00:06:05,279 146 00:06:05,279 --> 00:06:05,779 AUDIENCE: 1. 147 00:06:05,779 --> 00:06:08,641 148 00:06:08,641 --> 00:06:10,660 PROFESSOR: Do people remember this? 149 00:06:10,660 --> 00:06:14,980 Is anyone confused about what's going on here? 150 00:06:14,980 --> 00:06:15,760 OK. 151 00:06:15,760 --> 00:06:19,710 So two terms, something involving t and something 152 00:06:19,710 --> 00:06:21,250 not involving t. 153 00:06:21,250 --> 00:06:24,110 The thing involving t is what we want to get rid of. 154 00:06:24,110 --> 00:06:29,870 When we do our recursion tree, whatever is in here 155 00:06:29,870 --> 00:06:34,030 goes inside here, and this tells me 156 00:06:34,030 --> 00:06:37,920 how this number relates to this number. 157 00:06:37,920 --> 00:06:41,250 So when I go from one level to the next level, 158 00:06:41,250 --> 00:06:42,550 this is the transformation. 159 00:06:42,550 --> 00:06:48,406 n and becomes square root of n, so the transformation here 160 00:06:48,406 --> 00:06:50,030 is the same as the transformation here. 161 00:06:50,030 --> 00:06:57,880 162 00:06:57,880 --> 00:07:00,133 What's next? 163 00:07:00,133 --> 00:07:02,152 AUDIENCE: n to the 1/4. 164 00:07:02,152 --> 00:07:02,735 PROFESSOR: OK. 165 00:07:02,735 --> 00:07:05,500 166 00:07:05,500 --> 00:07:07,828 Cost? 167 00:07:07,828 --> 00:07:09,320 AUDIENCE: 1. 168 00:07:09,320 --> 00:07:10,000 PROFESSOR: OK. 169 00:07:10,000 --> 00:07:12,350 Do we need to do one more or do people see the pattern? 170 00:07:12,350 --> 00:07:16,090 171 00:07:16,090 --> 00:07:17,340 Silence means one more. 172 00:07:17,340 --> 00:07:21,240 If you guys don't speak, we're going to go slow. 173 00:07:21,240 --> 00:07:21,790 What's here? 174 00:07:21,790 --> 00:07:25,213 175 00:07:25,213 --> 00:07:29,125 AUDIENCE: n to the 1/8. 176 00:07:29,125 --> 00:07:31,559 PROFESSOR: What's here? 177 00:07:31,559 --> 00:07:32,059 AUDIENCE: 1. 178 00:07:32,059 --> 00:07:35,200 179 00:07:35,200 --> 00:07:37,170 PROFESSOR: Let's hope everyone saw the pattern, 180 00:07:37,170 --> 00:07:40,590 and suppose we've done this for l levels, 181 00:07:40,590 --> 00:07:43,330 so we're at the bottom. 182 00:07:43,330 --> 00:07:44,950 What should the cost be at the bottom? 183 00:07:44,950 --> 00:07:47,436 184 00:07:47,436 --> 00:07:48,352 AUDIENCE: [INAUDIBLE]. 185 00:07:48,352 --> 00:07:51,412 186 00:07:51,412 --> 00:07:52,120 PROFESSOR: Sorry. 187 00:07:52,120 --> 00:07:53,230 We don't start with the cost. 188 00:07:53,230 --> 00:07:55,438 What should the size of the problem be at the bottom? 189 00:07:55,438 --> 00:08:00,978 190 00:08:00,978 --> 00:08:03,448 AUDIENCE: n to the 1 over 2 to the i. 191 00:08:03,448 --> 00:08:11,015 192 00:08:11,015 --> 00:08:12,390 PROFESSOR: Let's say that this is 193 00:08:12,390 --> 00:08:15,080 level h, where h is the height of the tree. 194 00:08:15,080 --> 00:08:17,871 195 00:08:17,871 --> 00:08:20,204 AUDIENCE: Don't you want to do something like n to the 1 196 00:08:20,204 --> 00:08:20,936 over n? 197 00:08:20,936 --> 00:08:25,120 198 00:08:25,120 --> 00:08:26,420 PROFESSOR: Yeah. 199 00:08:26,420 --> 00:08:30,876 OK, so we want something that looks like that? 200 00:08:30,876 --> 00:08:33,300 AUDIENCE: If the recursion tree is height i, 201 00:08:33,300 --> 00:08:35,299 it is n to the 1 over 2 to the i, but 2 to the i 202 00:08:35,299 --> 00:08:37,267 should equal n, or approximately n. 203 00:08:37,267 --> 00:08:40,835 204 00:08:40,835 --> 00:08:41,459 PROFESSOR: Why? 205 00:08:41,459 --> 00:08:46,960 206 00:08:46,960 --> 00:08:48,520 I like that, but why? 207 00:08:48,520 --> 00:08:54,550 AUDIENCE: Because you need to go down 208 00:08:54,550 --> 00:08:57,690 until you're only looking at one element, 209 00:08:57,690 --> 00:09:02,067 and that would be one nth of the problem. 210 00:09:02,067 --> 00:09:02,650 PROFESSOR: OK. 211 00:09:02,650 --> 00:09:05,968 So we want this guy to look like what? 212 00:09:05,968 --> 00:09:15,450 213 00:09:15,450 --> 00:09:17,990 In fact, it doesn't exactly have to look like 1, but what's 214 00:09:17,990 --> 00:09:22,417 the advantage if we manage to get this guy to look like 1? 215 00:09:22,417 --> 00:09:23,250 We have a recursion. 216 00:09:23,250 --> 00:09:25,940 We don't have a base case here, right? 217 00:09:25,940 --> 00:09:34,780 A reasonable base case is T of 1 is theta 1. 218 00:09:34,780 --> 00:09:37,800 Whatever function that is, if you evaluate it at 1, 219 00:09:37,800 --> 00:09:41,640 you're going to get a constant, so you can say that. 220 00:09:41,640 --> 00:09:45,376 Now, at the same time I can say that for any constant, 221 00:09:45,376 --> 00:09:50,660 c, T of c is theta 1. 222 00:09:50,660 --> 00:09:53,180 223 00:09:53,180 --> 00:09:59,410 So if I take this constant here, which happens to be 2, 224 00:09:59,410 --> 00:10:01,100 but that's not to worry about that. 225 00:10:01,100 --> 00:10:04,320 If I take this guy here, I can put it in here. 226 00:10:04,320 --> 00:10:08,350 227 00:10:08,350 --> 00:10:12,560 And I know that this guy here equals this guy here. 228 00:10:12,560 --> 00:10:17,210 So if I can make this guy here look like this guy here, 229 00:10:17,210 --> 00:10:20,500 then I'm done. 230 00:10:20,500 --> 00:10:21,015 Make sense? 231 00:10:21,015 --> 00:10:23,780 232 00:10:23,780 --> 00:10:25,740 If it makes sense, everyone should nod so 233 00:10:25,740 --> 00:10:30,720 that I know and I can go forward, or smile or something. 234 00:10:30,720 --> 00:10:32,250 So this should look like 1. 235 00:10:32,250 --> 00:10:35,240 This is order 1 if not 1. 236 00:10:35,240 --> 00:10:39,250 Let's make it order 1, because it's 2 in this case. 237 00:10:39,250 --> 00:10:42,495 What's the cost here? 238 00:10:42,495 --> 00:10:43,410 AUDIENCE: 1. 239 00:10:43,410 --> 00:10:44,230 PROFESSOR: 1. 240 00:10:44,230 --> 00:10:47,130 Everything inside the bubbles is order of already, 241 00:10:47,130 --> 00:10:48,913 so I don't need to write an order of. 242 00:10:48,913 --> 00:10:52,380 243 00:10:52,380 --> 00:10:55,120 What do we do next? 244 00:10:55,120 --> 00:10:58,030 AUDIENCE: Solve the [INAUDIBLE] equation. 245 00:10:58,030 --> 00:10:59,485 2 to the [INAUDIBLE]. 246 00:10:59,485 --> 00:11:04,330 247 00:11:04,330 --> 00:11:06,825 PROFESSOR: You're skipping one step. 248 00:11:06,825 --> 00:11:09,200 That's exactly what you do when you have the substitution 249 00:11:09,200 --> 00:11:09,580 method. 250 00:11:09,580 --> 00:11:10,710 You're going to get to something and you 251 00:11:10,710 --> 00:11:11,834 need to solve the equation. 252 00:11:11,834 --> 00:11:14,160 But for the tree, there are two steps. 253 00:11:14,160 --> 00:11:15,990 So we need to add up all the costs here 254 00:11:15,990 --> 00:11:18,220 and that's the total cost here. 255 00:11:18,220 --> 00:11:24,681 In order to do that, first, we sum up over each level. 256 00:11:24,681 --> 00:11:26,180 And in this case, it's really simple 257 00:11:26,180 --> 00:11:27,846 because there's only one node per level, 258 00:11:27,846 --> 00:11:29,690 but if you have multiple nodes per level, 259 00:11:29,690 --> 00:11:31,600 you have to sum up for each level, 260 00:11:31,600 --> 00:11:33,570 and then you have to do a big sum. 261 00:11:33,570 --> 00:11:37,441 What's the sum for this level? 262 00:11:37,441 --> 00:11:37,940 1. 263 00:11:37,940 --> 00:11:40,920 264 00:11:40,920 --> 00:11:41,610 Come on, guys. 265 00:11:41,610 --> 00:11:43,834 You're scaring me. 266 00:11:43,834 --> 00:11:45,325 AUDIENCE: 1. 267 00:11:45,325 --> 00:11:46,319 AUDIENCE: 1. 268 00:11:46,319 --> 00:11:47,320 AUDIENCE: It's all 1. 269 00:11:47,320 --> 00:11:48,564 PROFESSOR: Excellent. 270 00:11:48,564 --> 00:11:49,980 So the only thing that I'm missing 271 00:11:49,980 --> 00:11:53,070 is to know how many levels I have, because the sum is going 272 00:11:53,070 --> 00:11:58,050 to be order h, whatever h is. 273 00:11:58,050 --> 00:11:59,480 How do we do that? 274 00:11:59,480 --> 00:12:02,990 n to the 1 over 2 to the power of h 275 00:12:02,990 --> 00:12:07,030 has to equal this guy, right? 276 00:12:07,030 --> 00:12:08,852 AUDIENCE: Why would it equal that guy? 277 00:12:08,852 --> 00:12:11,122 We know it's less than that guy, but we 278 00:12:11,122 --> 00:12:13,150 don't know it's equal to that guy. 279 00:12:13,150 --> 00:12:15,600 PROFESSOR: We have to make it equal because we can only 280 00:12:15,600 --> 00:12:18,300 stop when we get to the base case. 281 00:12:18,300 --> 00:12:20,210 So we have to expand the recursion tree 282 00:12:20,210 --> 00:12:22,570 until we get to a base case, and then we stop, 283 00:12:22,570 --> 00:12:25,270 and this is our base case because this 284 00:12:25,270 --> 00:12:27,670 is what the problem says should be our base case. 285 00:12:27,670 --> 00:12:29,774 AUDIENCE: Right, but n to the 1 over 2 to the h 286 00:12:29,774 --> 00:12:39,172 is not equal to n to the 1 over log n. 287 00:12:39,172 --> 00:12:41,380 PROFESSOR: Well, we can set h to be whatever we want. 288 00:12:41,380 --> 00:12:44,540 h is the height of the tree, so we don't know what it is. 289 00:12:44,540 --> 00:12:46,166 We have to find out what it is. 290 00:12:46,166 --> 00:12:47,540 AUDIENCE: So let's say 2 to the h 291 00:12:47,540 --> 00:12:52,992 is equal to log n if you want to make it look like that. 292 00:12:52,992 --> 00:12:54,700 PROFESSOR: Let me write down the equation 293 00:12:54,700 --> 00:12:57,030 to make sure you're right. 294 00:12:57,030 --> 00:13:00,010 You're probably right because you're thinking faster than me, 295 00:13:00,010 --> 00:13:04,136 but let me not embarrass myself and do this the right way. 296 00:13:04,136 --> 00:13:09,970 297 00:13:09,970 --> 00:13:14,890 So you said 2 to the h is log n, right? 298 00:13:14,890 --> 00:13:15,640 Looks about right. 299 00:13:15,640 --> 00:13:19,215 So what's h? 300 00:13:19,215 --> 00:13:21,500 AUDIENCE: Log base 2. 301 00:13:21,500 --> 00:13:23,770 PROFESSOR: All right. 302 00:13:23,770 --> 00:13:25,410 Log log n. 303 00:13:25,410 --> 00:13:28,110 304 00:13:28,110 --> 00:13:30,600 So T of n is order h. 305 00:13:30,600 --> 00:13:33,950 We got this from here. 306 00:13:33,950 --> 00:13:43,510 T of n is order h is order of log log n. 307 00:13:43,510 --> 00:13:44,820 Math people drowning, right? 308 00:13:44,820 --> 00:13:48,090 309 00:13:48,090 --> 00:13:49,205 Any questions about this? 310 00:13:49,205 --> 00:13:52,260 311 00:13:52,260 --> 00:13:53,493 Yes? 312 00:13:53,493 --> 00:13:55,710 AUDIENCE: The first line on the right-- 313 00:13:55,710 --> 00:13:56,565 PROFESSOR: This? 314 00:13:56,565 --> 00:13:57,190 AUDIENCE: Yeah. 315 00:13:57,190 --> 00:13:58,148 Is that your base case? 316 00:13:58,148 --> 00:13:59,214 What is that? 317 00:13:59,214 --> 00:14:01,380 PROFESSOR: We got a hint with the problem that said, 318 00:14:01,380 --> 00:14:06,660 n to the power of 1 over log n is 2, which is order 1. 319 00:14:06,660 --> 00:14:09,640 320 00:14:09,640 --> 00:14:12,507 So for the base case, we always want them to look like this. 321 00:14:12,507 --> 00:14:14,840 If we don't get a base case, we write our own base case, 322 00:14:14,840 --> 00:14:16,700 which is if you plug in a constant, 323 00:14:16,700 --> 00:14:18,950 you're going to get a constant. 324 00:14:18,950 --> 00:14:21,580 And since we're told that this guy is a constant, 325 00:14:21,580 --> 00:14:24,405 that's a pretty good hint that we want to get to it. 326 00:14:24,405 --> 00:14:29,994 327 00:14:29,994 --> 00:14:31,410 Let's see how we're doing on time. 328 00:14:31,410 --> 00:14:34,080 Good. 329 00:14:34,080 --> 00:14:37,740 Ready to move on to the next problem? 330 00:14:37,740 --> 00:14:38,832 Let's do a fun one. 331 00:14:38,832 --> 00:14:41,040 Some people might remember it from elementary school, 332 00:14:41,040 --> 00:14:44,450 but this time, we're going to look at it with our 6.006 eyes. 333 00:14:44,450 --> 00:14:50,850 So suppose you have m coins, gold coins. 334 00:14:50,850 --> 00:14:51,840 One of them is face. 335 00:14:51,840 --> 00:14:54,340 The fake one is super light because it's not real gold. 336 00:14:54,340 --> 00:14:57,380 It's something that looks like gold. 337 00:14:57,380 --> 00:15:03,210 And we have a scale, and the scale is super accurate. 338 00:15:03,210 --> 00:15:05,950 It can weigh any coins on either side 339 00:15:05,950 --> 00:15:08,640 and tell us which side is heavier. 340 00:15:08,640 --> 00:15:10,830 Perfect accuracy, no need to worry about errors. 341 00:15:10,830 --> 00:15:13,690 342 00:15:13,690 --> 00:15:16,550 I want to find out which coin is the bad coin. 343 00:15:16,550 --> 00:15:19,410 What is the minimum number of experiments I have to do? 344 00:15:19,410 --> 00:15:22,440 345 00:15:22,440 --> 00:15:24,880 So there is a strategy, and we can worry about that later, 346 00:15:24,880 --> 00:15:28,790 but using 6.006, what is the minimum number of experiments 347 00:15:28,790 --> 00:15:30,284 I have to do? 348 00:15:30,284 --> 00:15:31,280 AUDIENCE: Log N times. 349 00:15:31,280 --> 00:15:35,770 350 00:15:35,770 --> 00:15:36,710 PROFESSOR: Not quite. 351 00:15:36,710 --> 00:15:38,280 So this is what you think is, and you 352 00:15:38,280 --> 00:15:41,620 can do log n with binary search, right? 353 00:15:41,620 --> 00:15:45,270 The problem with binary search is if I put half of my coins 354 00:15:45,270 --> 00:15:47,550 on the left, half of my coins on the right, 355 00:15:47,550 --> 00:15:49,960 one side is going to be heavier, right? 356 00:15:49,960 --> 00:15:53,430 So the answers are going to be this or this, 357 00:15:53,430 --> 00:15:55,610 but I never get this. 358 00:15:55,610 --> 00:15:58,750 I only get one bit of information 359 00:15:58,750 --> 00:16:00,760 instead of getting one trit. 360 00:16:00,760 --> 00:16:03,250 A trit is a base three digit. 361 00:16:03,250 --> 00:16:07,952 How many bits of information in a trit? 362 00:16:07,952 --> 00:16:10,362 AUDIENCE: One and a half. 363 00:16:10,362 --> 00:16:12,290 PROFESSOR: Roughly. 364 00:16:12,290 --> 00:16:13,712 AUDIENCE: Log 3. 365 00:16:13,712 --> 00:16:14,420 PROFESSOR: Log 3. 366 00:16:14,420 --> 00:16:16,211 And we know that it's base 2 because that's 367 00:16:16,211 --> 00:16:17,720 what we use in CS. 368 00:16:17,720 --> 00:16:21,790 So we're discarding a fractional bit of information 369 00:16:21,790 --> 00:16:25,046 if we're not allowing for this to happen. 370 00:16:25,046 --> 00:16:31,550 371 00:16:31,550 --> 00:16:33,254 Anyone want to try something else? 372 00:16:33,254 --> 00:16:34,670 We have to prove this, by the way. 373 00:16:34,670 --> 00:16:36,753 We have to prove the minimum that we come up with. 374 00:16:36,753 --> 00:16:40,176 375 00:16:40,176 --> 00:16:42,064 AUDIENCE: You could just do it coin by coin, 376 00:16:42,064 --> 00:16:43,490 but that would take forever. 377 00:16:43,490 --> 00:16:45,010 PROFESSOR: That's N. That's worse. 378 00:16:45,010 --> 00:16:50,620 AUDIENCE: How about log base 4 N or something like that? 379 00:16:50,620 --> 00:16:53,020 AUDIENCE: Can you explain to me why we can't just 380 00:16:53,020 --> 00:16:54,190 do binary search? 381 00:16:54,190 --> 00:16:55,140 PROFESSOR: We can. 382 00:16:55,140 --> 00:16:58,030 It's definitely going to give us the correct answer, 383 00:16:58,030 --> 00:17:00,190 but it's not the minimum number of weighings 384 00:17:00,190 --> 00:17:03,570 because we're discarding a possible answer. 385 00:17:03,570 --> 00:17:05,319 So if you do binary search, you will never 386 00:17:05,319 --> 00:17:07,470 get that the two sides are equal. 387 00:17:07,470 --> 00:17:08,700 AUDIENCE: Log base 3. 388 00:17:08,700 --> 00:17:10,560 PROFESSOR: Log base 3 would be better 389 00:17:10,560 --> 00:17:12,829 because we have three choices all the time. 390 00:17:12,829 --> 00:17:13,980 Let's prove that. 391 00:17:13,980 --> 00:17:17,190 So the right answer happens to be log base 3 of N. Let's see 392 00:17:17,190 --> 00:17:19,880 how we would get it aside from guessing. 393 00:17:19,880 --> 00:17:22,850 AUDIENCE: So you divide it into thirds 394 00:17:22,850 --> 00:17:26,810 and compare one third and one third, and if they're equal, 395 00:17:26,810 --> 00:17:29,285 then the light one is in the other third. 396 00:17:29,285 --> 00:17:31,265 And if they're not, [INAUDIBLE] light one. 397 00:17:31,265 --> 00:17:34,937 Then you just keep dividing by 3. 398 00:17:34,937 --> 00:17:35,520 PROFESSOR: OK. 399 00:17:35,520 --> 00:17:36,760 So that's the strategy. 400 00:17:36,760 --> 00:17:38,250 What if I don't know the strategy? 401 00:17:38,250 --> 00:17:40,166 How do I do this without knowing the strategy? 402 00:17:40,166 --> 00:17:42,668 403 00:17:42,668 --> 00:17:46,014 AUDIENCE: What if the number of coins isn't divisible by 3? 404 00:17:46,014 --> 00:17:49,360 405 00:17:49,360 --> 00:17:52,186 PROFESSOR: Math people. 406 00:17:52,186 --> 00:17:56,520 AUDIENCE: Yeah, but then how do you-- OK, never mind. 407 00:17:56,520 --> 00:17:59,134 AUDIENCE: Just take the two extra coins and toss them out. 408 00:17:59,134 --> 00:18:02,220 409 00:18:02,220 --> 00:18:03,810 PROFESSOR: If it's not divisible by 3, 410 00:18:03,810 --> 00:18:07,540 you add fake coins that are good. 411 00:18:07,540 --> 00:18:11,050 I mean, you use good coins. 412 00:18:11,050 --> 00:18:13,500 But we're not worried about the strategy. 413 00:18:13,500 --> 00:18:15,240 I want us to think of a lower bound. 414 00:18:15,240 --> 00:18:17,470 This is a lower bound for an algorithm, right? 415 00:18:17,470 --> 00:18:23,280 You cannot do better than log 3 N experiments. 416 00:18:23,280 --> 00:18:27,247 Does the word "lower bound" ring any bells? 417 00:18:27,247 --> 00:18:29,580 Is there any lecture where we talked about lower bounds? 418 00:18:29,580 --> 00:18:37,070 419 00:18:37,070 --> 00:18:39,830 So if you sort and you're using a comparison model, 420 00:18:39,830 --> 00:18:40,970 what's the best you can do? 421 00:18:40,970 --> 00:18:44,260 422 00:18:44,260 --> 00:18:46,140 AUDIENCE: N log N. 423 00:18:46,140 --> 00:18:48,700 PROFESSOR: N log N. Good. 424 00:18:48,700 --> 00:18:58,010 So sorting using CMP, the comparison model, is N log N. 425 00:18:58,010 --> 00:18:59,110 How did we prove that? 426 00:18:59,110 --> 00:19:06,360 427 00:19:06,360 --> 00:19:07,300 One word. 428 00:19:07,300 --> 00:19:08,080 Well, two words. 429 00:19:08,080 --> 00:19:12,160 430 00:19:12,160 --> 00:19:13,280 Decision trees. 431 00:19:13,280 --> 00:19:16,140 Does anyone remember what decision trees are? 432 00:19:16,140 --> 00:19:17,086 One person. 433 00:19:17,086 --> 00:19:19,107 AUDIENCE: It's just a comparison thing, right? 434 00:19:19,107 --> 00:19:20,940 You're like, is it greater, is it less than, 435 00:19:20,940 --> 00:19:22,315 or is there some sort of question 436 00:19:22,315 --> 00:19:24,384 you're asking about each key. 437 00:19:24,384 --> 00:19:25,050 PROFESSOR: Cool. 438 00:19:25,050 --> 00:19:27,590 Let's go over that a little bit. 439 00:19:27,590 --> 00:19:31,790 No matter what your algorithm is, 440 00:19:31,790 --> 00:19:34,000 it's going to weigh some coins and it's 441 00:19:34,000 --> 00:19:36,310 going to get an answer from the scale. 442 00:19:36,310 --> 00:19:39,350 And then based on that, it's going to weigh some other coins 443 00:19:39,350 --> 00:19:42,120 and get some answer from the scale. 444 00:19:42,120 --> 00:19:44,510 And it will do some experiments and then 445 00:19:44,510 --> 00:19:46,540 it will give you an answer. 446 00:19:46,540 --> 00:19:50,300 So if you draw a decision tree, it would look like this. 447 00:19:50,300 --> 00:19:52,670 First, we start with 0 information. 448 00:19:52,670 --> 00:19:54,590 We weigh some coins. 449 00:19:54,590 --> 00:19:59,420 Based on that, we have three possible answers-- 450 00:19:59,420 --> 00:20:03,910 smaller, equal, greater. 451 00:20:03,910 --> 00:20:07,260 Now, if we're here, we're going to do another experiment. 452 00:20:07,260 --> 00:20:10,910 Three possible answers. 453 00:20:10,910 --> 00:20:13,350 If we're here, another experiment, three possible 454 00:20:13,350 --> 00:20:13,850 answers. 455 00:20:13,850 --> 00:20:16,770 If we're here, another experiment, three possible 456 00:20:16,770 --> 00:20:19,420 answers. 457 00:20:19,420 --> 00:20:21,830 Say we do a third experiment. 458 00:20:21,830 --> 00:20:24,641 One, two, three, one, two, three, one, two, three, one, 459 00:20:24,641 --> 00:20:27,863 two, three, one, two, three, one, two, three, one, two, 460 00:20:27,863 --> 00:20:30,760 three, one, two, three, one, two, three. 461 00:20:30,760 --> 00:20:33,160 And then suppose we stop. 462 00:20:33,160 --> 00:20:35,620 If we stop, we have to give an answer. 463 00:20:35,620 --> 00:20:37,530 So this is an answer, this is an answer, 464 00:20:37,530 --> 00:20:44,880 this is an answer, answer, answer, answer, answer, answer. 465 00:20:44,880 --> 00:20:47,100 So how many answers do I have at the bottom 466 00:20:47,100 --> 00:20:50,684 if I have three levels? 467 00:20:50,684 --> 00:20:52,600 Here I have three experiments, so three levels 468 00:20:52,600 --> 00:20:53,474 in the decision tree. 469 00:20:53,474 --> 00:20:56,160 How many answers? 470 00:20:56,160 --> 00:20:58,080 AUDIENCE: [INAUDIBLE]. 471 00:20:58,080 --> 00:21:00,260 PROFESSOR: 3 to the third because I start three 472 00:21:00,260 --> 00:21:03,580 at the first level, nine at the second, 27 at the third. 473 00:21:03,580 --> 00:21:05,050 Each time, I multiply by 3. 474 00:21:05,050 --> 00:21:08,490 475 00:21:08,490 --> 00:21:15,030 So if I do three weighings, I can give at most 27 answers. 476 00:21:15,030 --> 00:21:18,790 If I have more than 27 coins, I can't possibly 477 00:21:18,790 --> 00:21:24,410 decide which one is bad because say if I have 30 coins, 478 00:21:24,410 --> 00:21:27,400 then I need to be able to give out 30 answers. 479 00:21:27,400 --> 00:21:29,810 My algorithm has to have a place where 480 00:21:29,810 --> 00:21:33,020 it says the bad coin is coin one, coin two, coin three, all 481 00:21:33,020 --> 00:21:34,460 the way to coin 30. 482 00:21:34,460 --> 00:21:36,960 Here I only have 27 possible answers, 483 00:21:36,960 --> 00:21:39,297 so this isn't going to cut it for 30 coins. 484 00:21:39,297 --> 00:21:41,880 I need to do one more comparison so that I have a deeper tree. 485 00:21:41,880 --> 00:21:44,560 486 00:21:44,560 --> 00:21:49,230 So suppose I have h comparisons instead. 487 00:21:49,230 --> 00:21:51,500 How many leaves? 488 00:21:51,500 --> 00:21:53,464 How many possible answers? 489 00:21:53,464 --> 00:21:55,900 AUDIENCE: h to the third. 490 00:21:55,900 --> 00:21:56,663 PROFESSOR: Almost. 491 00:21:56,663 --> 00:21:57,538 AUDIENCE: 3 to the h. 492 00:21:57,538 --> 00:22:00,380 PROFESSOR: 3 to the h. 493 00:22:00,380 --> 00:22:07,370 So 3 multiplied by 3 multiplied by 3 multiplied by 3 h times, 494 00:22:07,370 --> 00:22:09,210 so 3 to the h. 495 00:22:09,210 --> 00:22:12,210 It's no longer equal to 27. 496 00:22:12,210 --> 00:22:14,425 3 to the h is the number of possible answers. 497 00:22:14,425 --> 00:22:17,700 This has to be bigger or equal to N. Otherwise, 498 00:22:17,700 --> 00:22:18,885 the algorithm is incorrect. 499 00:22:18,885 --> 00:22:21,830 500 00:22:21,830 --> 00:22:23,210 So what can we say about h? 501 00:22:23,210 --> 00:22:28,261 502 00:22:28,261 --> 00:22:29,177 AUDIENCE: [INAUDIBLE]. 503 00:22:29,177 --> 00:22:43,370 504 00:22:43,370 --> 00:22:45,740 PROFESSOR: We did all this without even thinking 505 00:22:45,740 --> 00:22:47,940 of what an algorithm would look like. 506 00:22:47,940 --> 00:22:49,330 This works for any algorithm. 507 00:22:49,330 --> 00:22:52,820 No matter how smart you are, no matter how much math you know, 508 00:22:52,820 --> 00:22:55,940 your algorithm is going to be bound by this. 509 00:22:55,940 --> 00:22:59,071 So the fact that the answer looks like this 510 00:22:59,071 --> 00:23:01,320 gives you some intuition for how to solve the problem. 511 00:23:01,320 --> 00:23:03,445 If you want to solve the problem now and figure out 512 00:23:03,445 --> 00:23:06,600 the strategy, you know that you have a 3 here. 513 00:23:06,600 --> 00:23:08,430 So if you divide into 2 every time, 514 00:23:08,430 --> 00:23:11,820 you're not going to get to the right limit. 515 00:23:11,820 --> 00:23:14,660 So first you do this, you get a lower bound, 516 00:23:14,660 --> 00:23:18,190 and then you use your intuition to figure out 517 00:23:18,190 --> 00:23:20,110 what the lower bound means. 518 00:23:20,110 --> 00:23:24,170 In this case, it would mean the strategy that we heard earlier. 519 00:23:24,170 --> 00:23:25,750 You have to divide into 3 every time 520 00:23:25,750 --> 00:23:28,810 and then figure out what you do based on the comparison. 521 00:23:28,810 --> 00:23:32,790 So your answer works perfectly once we have this. 522 00:23:32,790 --> 00:23:34,389 And also, once we have this, you know 523 00:23:34,389 --> 00:23:36,430 that your answer is correct because it's optimal. 524 00:23:36,430 --> 00:23:37,680 You can't do better than that. 525 00:23:37,680 --> 00:23:40,007 526 00:23:40,007 --> 00:23:41,340 Any questions on decision trees? 527 00:23:41,340 --> 00:23:44,590 528 00:23:44,590 --> 00:23:47,420 So lower bounds are a boring topic in general. 529 00:23:47,420 --> 00:23:48,770 They tell you what you can't do. 530 00:23:48,770 --> 00:23:51,600 They don't tell you anything useful about what you can do. 531 00:23:51,600 --> 00:23:53,900 In some cases, being able to reason about a lower bound 532 00:23:53,900 --> 00:23:55,274 gives you a hint of the solution. 533 00:23:55,274 --> 00:24:01,730 534 00:24:01,730 --> 00:24:02,230 New problem. 535 00:24:02,230 --> 00:24:23,340 536 00:24:23,340 --> 00:24:26,380 Suppose we have a 2D map. 537 00:24:26,380 --> 00:24:29,080 There's a hill, and you take a satellite picture of it 538 00:24:29,080 --> 00:24:33,930 at night, and you get a picture with bright pixels and not 539 00:24:33,930 --> 00:24:35,070 bright pixels. 540 00:24:35,070 --> 00:24:39,540 There are numbers showing how bright your pixels are. 541 00:24:39,540 --> 00:24:46,530 1, 2, 1, 2, 3, 0, 0. 542 00:24:46,530 --> 00:24:51,645 I'm going to draw out an example so we can use our intuition. 543 00:24:51,645 --> 00:24:55,338 0, 0, 1. 544 00:24:55,338 --> 00:25:13,090 545 00:25:13,090 --> 00:25:14,680 So suppose this is our map. 546 00:25:14,680 --> 00:25:21,660 It's W times H-- W of what these are, 547 00:25:21,660 --> 00:25:25,860 I think they're columns, and H of the other ones. 548 00:25:25,860 --> 00:25:28,080 And you want to find a certain picture inside it. 549 00:25:28,080 --> 00:25:31,210 You want to see how many times does a certain pattern show up. 550 00:25:31,210 --> 00:25:35,470 Say the pattern is small w times small h, 551 00:25:35,470 --> 00:25:37,740 and it looks like this. 552 00:25:37,740 --> 00:25:40,980 553 00:25:40,980 --> 00:25:43,374 But this will be the input to your problem, 554 00:25:43,374 --> 00:25:44,790 so the pattern might be different. 555 00:25:44,790 --> 00:25:46,660 You can't hard code this in. 556 00:25:46,660 --> 00:25:48,290 And this is useful. 557 00:25:48,290 --> 00:25:50,250 This problem is called a bunker hill problem. 558 00:25:50,250 --> 00:25:53,429 This is a hill, and this is a bunker. 559 00:25:53,429 --> 00:25:54,720 You take a picture of the hill. 560 00:25:54,720 --> 00:25:55,960 You want to know where the bunkers are 561 00:25:55,960 --> 00:25:58,670 so you can bomb them at night so then you can attack the place. 562 00:25:58,670 --> 00:25:59,660 AUDIENCE: That's awful. 563 00:25:59,660 --> 00:26:00,645 PROFESSOR: Thank you. 564 00:26:00,645 --> 00:26:05,670 I'll take that as a compliment So a nice way of solving this? 565 00:26:05,670 --> 00:26:09,963 566 00:26:09,963 --> 00:26:13,090 AUDIENCE: You could just go through each row, 567 00:26:13,090 --> 00:26:15,950 and then look for a match for the first row, and then-- 568 00:26:15,950 --> 00:26:17,212 PROFESSOR: Yep. 569 00:26:17,212 --> 00:26:18,559 Is this a match? 570 00:26:18,559 --> 00:26:19,642 That's what you're saying. 571 00:26:19,642 --> 00:26:20,128 AUDIENCE: Yeah. 572 00:26:20,128 --> 00:26:21,025 We can see that's a match. 573 00:26:21,025 --> 00:26:21,320 PROFESSOR: Is this a match? 574 00:26:21,320 --> 00:26:22,360 Is this a match? 575 00:26:22,360 --> 00:26:24,920 By the way, this is a match. 576 00:26:24,920 --> 00:26:29,960 This is not a match, this is not a match, this is not a match, 577 00:26:29,960 --> 00:26:30,970 this is not a match. 578 00:26:30,970 --> 00:26:32,200 Now we go down here. 579 00:26:32,200 --> 00:26:34,560 This is not a match, this is not a match, 580 00:26:34,560 --> 00:26:37,242 this is not a match, so on and so forth. 581 00:26:37,242 --> 00:26:39,075 AUDIENCE: That wasn't what I was suggesting, 582 00:26:39,075 --> 00:26:41,422 but that's a good idea. 583 00:26:41,422 --> 00:26:43,630 PROFESSOR: Maybe you're suggesting something smarter, 584 00:26:43,630 --> 00:26:45,296 and I don't want to let you do something 585 00:26:45,296 --> 00:26:47,932 smarter so that we look at the brute force approach first. 586 00:26:47,932 --> 00:26:49,390 AUDIENCE: I mean, I was just saying 587 00:26:49,390 --> 00:26:51,200 take the first row of your bunker, 588 00:26:51,200 --> 00:26:53,150 and then compare it to other rows, 589 00:26:53,150 --> 00:26:56,787 and once you hit that, then check and see if the rest of-- 590 00:26:56,787 --> 00:26:58,370 PROFESSOR: Yeah, that's a bit smarter, 591 00:26:58,370 --> 00:27:00,300 so that's harder to reason about. 592 00:27:00,300 --> 00:27:04,623 Let's take this one and figure out the running time of it. 593 00:27:04,623 --> 00:27:07,654 AUDIENCE: Does that mean even if you know it's not a match, 594 00:27:07,654 --> 00:27:10,114 you keep checking all nine of them? 595 00:27:10,114 --> 00:27:10,780 PROFESSOR: Yeah. 596 00:27:10,780 --> 00:27:12,500 Say at the worst case, you only find out 597 00:27:12,500 --> 00:27:14,232 it's not a match all the way at the end. 598 00:27:14,232 --> 00:27:16,895 599 00:27:16,895 --> 00:27:19,436 AUDIENCE: Are you trying to look for all matches or just one? 600 00:27:19,436 --> 00:27:22,560 PROFESSOR: All matches. 601 00:27:22,560 --> 00:27:25,155 AUDIENCE: You're limited to n squared time almost no matter 602 00:27:25,155 --> 00:27:26,755 what, right? 603 00:27:26,755 --> 00:27:29,270 If you have a small bunker in a large field, 604 00:27:29,270 --> 00:27:32,210 you have to hit the small bunker every time. 605 00:27:32,210 --> 00:27:34,720 PROFESSOR: Are you going to solve the problem for me? 606 00:27:34,720 --> 00:27:37,337 AUDIENCE: Are we trying to find the [INAUDIBLE]? 607 00:27:37,337 --> 00:27:37,920 PROFESSOR: No. 608 00:27:37,920 --> 00:27:39,697 We're trying to find out the running 609 00:27:39,697 --> 00:27:42,420 time for the dumb algorithm first. 610 00:27:42,420 --> 00:27:44,150 Humor me and let's solve this first, 611 00:27:44,150 --> 00:27:46,870 and then let's get to the efficient algorithm, OK? 612 00:27:46,870 --> 00:27:50,580 AUDIENCE: Big W minus small w plus 1 times big H 613 00:27:50,580 --> 00:27:52,080 minus small h plus 1. 614 00:27:52,080 --> 00:27:55,080 615 00:27:55,080 --> 00:27:57,080 AUDIENCE: Where'd you get plus 1 from? 616 00:27:57,080 --> 00:27:59,582 617 00:27:59,582 --> 00:28:01,064 AUDIENCE: So it's WH? 618 00:28:01,064 --> 00:28:02,101 AUDIENCE: Yeah. 619 00:28:02,101 --> 00:28:04,100 PROFESSOR: Well, there's something missing here. 620 00:28:04,100 --> 00:28:08,250 This is how many positions I have that I have to look at. 621 00:28:08,250 --> 00:28:11,014 How much time does it take to compare the small images? 622 00:28:11,014 --> 00:28:11,930 AUDIENCE: [INAUDIBLE]. 623 00:28:11,930 --> 00:28:15,860 624 00:28:15,860 --> 00:28:18,800 PROFESSOR: This is smaller than wh, which is the input size, 625 00:28:18,800 --> 00:28:21,120 so it's scary if you have an algorithm that runs faster 626 00:28:21,120 --> 00:28:24,010 than the input size because it means 627 00:28:24,010 --> 00:28:25,926 you're not looking at all the input. 628 00:28:25,926 --> 00:28:34,641 629 00:28:34,641 --> 00:28:36,640 So this is definitely bigger than the input size 630 00:28:36,640 --> 00:28:38,530 once we add the w times h here. 631 00:28:38,530 --> 00:28:39,555 Don't forget this guy. 632 00:28:39,555 --> 00:28:42,590 633 00:28:42,590 --> 00:28:48,700 This is the naive algorithm, and if we discard the small order 634 00:28:48,700 --> 00:28:51,012 factors, we get that this is order of WHwh. 635 00:28:51,012 --> 00:28:53,950 636 00:28:53,950 --> 00:28:56,420 How can you do better? 637 00:28:56,420 --> 00:28:59,240 You have the answer, right? 638 00:28:59,240 --> 00:29:01,850 Let's let everyone else think for a minute, 639 00:29:01,850 --> 00:29:04,300 and then you can give me the answer if you want, 640 00:29:04,300 --> 00:29:06,620 or someone else can give me the answer. 641 00:29:06,620 --> 00:29:08,955 642 00:29:08,955 --> 00:29:11,079 I guess you should because you thought of it first. 643 00:29:11,079 --> 00:29:17,070 644 00:29:17,070 --> 00:29:17,780 Any ideas? 645 00:29:17,780 --> 00:29:23,620 646 00:29:23,620 --> 00:29:28,010 So you're thinking about the input size, right? 647 00:29:28,010 --> 00:29:32,160 Someone was thinking about the input size. 648 00:29:32,160 --> 00:29:34,680 So the input size is W times H, right? 649 00:29:34,680 --> 00:29:36,519 So if I have an algorithm that's W times H, 650 00:29:36,519 --> 00:29:38,810 that's optimal because it has to look at all the input. 651 00:29:38,810 --> 00:29:42,016 652 00:29:42,016 --> 00:29:43,890 Well, we're going to have an algorithm that's 653 00:29:43,890 --> 00:29:47,990 W times H, so with that out of the way, 654 00:29:47,990 --> 00:29:51,000 does that inspire anyone as to what the solution is? 655 00:29:51,000 --> 00:29:58,485 656 00:29:58,485 --> 00:30:00,510 AUDIENCE: Do that thing that I was saying, 657 00:30:00,510 --> 00:30:03,704 just take the first row, but then you still have a W term. 658 00:30:03,704 --> 00:30:04,370 PROFESSOR: Yeah. 659 00:30:04,370 --> 00:30:07,470 So let's make it better. 660 00:30:07,470 --> 00:30:09,250 It is the correct intuition. 661 00:30:09,250 --> 00:30:11,920 Now try to use a trick we learned in lecture to make that 662 00:30:11,920 --> 00:30:12,420 faster. 663 00:30:12,420 --> 00:30:17,233 664 00:30:17,233 --> 00:30:18,816 AUDIENCE: Just use the top left corner 665 00:30:18,816 --> 00:30:21,527 instead of the whole row. 666 00:30:21,527 --> 00:30:22,110 PROFESSOR: OK. 667 00:30:22,110 --> 00:30:24,110 So we could use the top left corner, 668 00:30:24,110 --> 00:30:26,590 and if the top left corner doesn't match, 669 00:30:26,590 --> 00:30:30,630 then we don't have to check for matches. 670 00:30:30,630 --> 00:30:33,300 So this works for reasonably random data. 671 00:30:33,300 --> 00:30:35,630 As long as we don't have a lot of false positives, 672 00:30:35,630 --> 00:30:38,040 we're going to run fast. 673 00:30:38,040 --> 00:30:43,450 Now, the top corner of this one, if the map 674 00:30:43,450 --> 00:30:47,820 has a lot of 1's and then some 2's sprinkled all over it, 675 00:30:47,820 --> 00:30:51,370 most of the time, we'll have to go through the whole image 676 00:30:51,370 --> 00:30:54,190 so we're going to have a lot of false positives. 677 00:30:54,190 --> 00:30:59,180 How do we make our false positive rate go down? 678 00:30:59,180 --> 00:31:01,680 AUDIENCE: Looks kind of like a rolling hash problem. 679 00:31:01,680 --> 00:31:04,580 PROFESSOR: Looks like a rolling hash problem, exactly. 680 00:31:04,580 --> 00:31:07,147 Let's see if we can use rolling hashes. 681 00:31:07,147 --> 00:31:09,230 AUDIENCE: But then you still have that lowercase w 682 00:31:09,230 --> 00:31:11,022 term, though. 683 00:31:11,022 --> 00:31:12,480 PROFESSOR: How do we get rid of it? 684 00:31:12,480 --> 00:31:15,204 685 00:31:15,204 --> 00:31:16,182 AUDIENCE: [INAUDIBLE]. 686 00:31:16,182 --> 00:31:16,890 PROFESSOR: Sorry? 687 00:31:16,890 --> 00:31:21,633 688 00:31:21,633 --> 00:31:22,758 AUDIENCE: Wouldn't it be w? 689 00:31:22,758 --> 00:31:25,936 I mean the running time if we were just going through one row 690 00:31:25,936 --> 00:31:29,115 would be big W minus little w, times-- 691 00:31:29,115 --> 00:31:31,534 692 00:31:31,534 --> 00:31:33,200 PROFESSOR: So where's your rolling hash? 693 00:31:33,200 --> 00:31:37,768 694 00:31:37,768 --> 00:31:40,143 AUDIENCE: I guess you can use the entire thing as a hash, 695 00:31:40,143 --> 00:31:40,643 too. 696 00:31:40,643 --> 00:31:42,970 That would kind of work. 697 00:31:42,970 --> 00:31:45,330 PROFESSOR: So we want a hash for the whole thing. 698 00:31:45,330 --> 00:31:50,928 Instead of using this as the hash, we want a smarter hash. 699 00:31:50,928 --> 00:31:53,880 AUDIENCE: It's the entire thing, and then 700 00:31:53,880 --> 00:31:58,308 as you move to the right, you can add those and subtract, 701 00:31:58,308 --> 00:32:01,487 and compare that with the hash. 702 00:32:01,487 --> 00:32:02,070 PROFESSOR: OK. 703 00:32:02,070 --> 00:32:05,510 So we'd have a rolling hash that has everything in here, 704 00:32:05,510 --> 00:32:08,640 and then as I move to the right, I add these guys 705 00:32:08,640 --> 00:32:09,710 and I remove these guys. 706 00:32:09,710 --> 00:32:12,730 707 00:32:12,730 --> 00:32:19,400 This is big W times big H, roughly, times small h 708 00:32:19,400 --> 00:32:21,470 because every time I move to the right, 709 00:32:21,470 --> 00:32:25,090 I have to do order h work. 710 00:32:25,090 --> 00:32:32,260 So I'm down from this thing to order of WHh, 711 00:32:32,260 --> 00:32:33,590 So it's better. 712 00:32:33,590 --> 00:32:34,590 It's one step forward. 713 00:32:34,590 --> 00:32:35,990 Now, let's make this even faster. 714 00:32:35,990 --> 00:32:47,420 715 00:32:47,420 --> 00:32:53,810 What if I could do this in order 1 instead of order h? 716 00:32:53,810 --> 00:32:57,790 717 00:32:57,790 --> 00:33:00,114 How would I do this in order 1? 718 00:33:00,114 --> 00:33:03,450 AUDIENCE: You'd have to compress all the rows, 719 00:33:03,450 --> 00:33:05,690 and then take the hash of each column. 720 00:33:05,690 --> 00:33:08,662 PROFESSOR: How would we compress them? 721 00:33:08,662 --> 00:33:12,050 AUDIENCE: Take the hash of the column. 722 00:33:12,050 --> 00:33:13,830 PROFESSOR: You want to compress the rows? 723 00:33:13,830 --> 00:33:14,413 AUDIENCE: Yes. 724 00:33:14,413 --> 00:33:16,705 You divide it-- 725 00:33:16,705 --> 00:33:18,330 PROFESSOR: Let's not compress the rows. 726 00:33:18,330 --> 00:33:23,970 727 00:33:23,970 --> 00:33:26,050 AUDIENCE: You could take just your bunker, 728 00:33:26,050 --> 00:33:29,980 and then figure out the hashes of the three columns, 729 00:33:29,980 --> 00:33:33,290 and just run through like that. 730 00:33:33,290 --> 00:33:36,146 You'd still have to access each of those items. 731 00:33:36,146 --> 00:33:39,800 I don't really see how it's faster. 732 00:33:39,800 --> 00:33:41,380 I guess it's less, though. 733 00:33:41,380 --> 00:33:42,340 It's less. 734 00:33:42,340 --> 00:33:43,300 Maybe it's only 1. 735 00:33:43,300 --> 00:33:47,630 736 00:33:47,630 --> 00:33:53,970 AUDIENCE: So do you want to hash each little column [INAUDIBLE]? 737 00:33:53,970 --> 00:34:01,200 738 00:34:01,200 --> 00:34:03,950 PROFESSOR: So we're going to hash all these guys, 739 00:34:03,950 --> 00:34:06,340 and then we're going to have hashes for them, 740 00:34:06,340 --> 00:34:10,590 and we're going to do the regular Rabin-Karp 741 00:34:10,590 --> 00:34:13,420 for the hashes. 742 00:34:13,420 --> 00:34:16,478 Now, what happens when I go down? 743 00:34:16,478 --> 00:34:18,840 PROFESSOR: You have to recompute everything. 744 00:34:18,840 --> 00:34:20,024 PROFESSOR: Let's do better than recompute everything. 745 00:34:20,024 --> 00:34:21,241 AUDIENCE: Do you want to [INAUDIBLE] downward 746 00:34:21,241 --> 00:34:21,540 on each column? 747 00:34:21,540 --> 00:34:22,165 PROFESSOR: Yep. 748 00:34:22,165 --> 00:34:23,489 Rolling hash. 749 00:34:23,489 --> 00:34:28,199 I want to make this faster, so I have big W hashes. 750 00:34:28,199 --> 00:34:30,610 They're all little h inside. 751 00:34:30,610 --> 00:34:32,360 Here, I have to compute them brute force. 752 00:34:32,360 --> 00:34:33,485 I can't do anything better. 753 00:34:33,485 --> 00:34:36,250 But when I go from here to here, there's 754 00:34:36,250 --> 00:34:39,394 only one element going out and one element going in. 755 00:34:39,394 --> 00:34:42,070 756 00:34:42,070 --> 00:34:43,310 Same for all these guys. 757 00:34:43,310 --> 00:34:46,080 Let's not make the picture uglier than it needs to be. 758 00:34:46,080 --> 00:34:48,960 So I have big W rolling hashes. 759 00:34:48,960 --> 00:34:51,340 They're vertical rolling hashes. 760 00:34:51,340 --> 00:34:59,310 And then the rolling hashes hash columns, so my the sliding 761 00:34:59,310 --> 00:35:03,230 window that I have is little w rolling hashes. 762 00:35:03,230 --> 00:35:10,320 Each rolling hash is little h in size, so it's a hash of hashes. 763 00:35:10,320 --> 00:35:12,000 It's nested hashes. 764 00:35:12,000 --> 00:35:14,410 And then when I go down, I only have 765 00:35:14,410 --> 00:35:17,950 to roll down each of the rolling hashes by 1, 766 00:35:17,950 --> 00:35:19,300 so that's constant time. 767 00:35:19,300 --> 00:35:24,630 So to go from here to here, to the slide the window one down, 768 00:35:24,630 --> 00:35:27,330 I have to roll this hash down, roll this one, 769 00:35:27,330 --> 00:35:29,640 this one, this one, this one, this one, and all of them 770 00:35:29,640 --> 00:35:31,640 roll down in constant time. 771 00:35:31,640 --> 00:35:36,640 So when I'm adding a column to the hash, say I'm here 772 00:35:36,640 --> 00:35:40,380 and I want to go here, I roll down this hash 773 00:35:40,380 --> 00:35:42,560 and I have the answer. 774 00:35:42,560 --> 00:35:43,210 It's order 1. 775 00:35:43,210 --> 00:35:45,360 I'm adding it in order 1. 776 00:35:45,360 --> 00:35:47,726 777 00:35:47,726 --> 00:35:48,600 Does this make sense? 778 00:35:48,600 --> 00:35:55,087 779 00:35:55,087 --> 00:35:58,572 AUDIENCE: It's tricky. 780 00:35:58,572 --> 00:36:00,196 PROFESSOR: But it's not too bad, right? 781 00:36:00,196 --> 00:36:02,770 AUDIENCE: You just need to do the vertical roll first, right? 782 00:36:02,770 --> 00:36:03,395 PROFESSOR: Yep. 783 00:36:03,395 --> 00:36:05,880 784 00:36:05,880 --> 00:36:07,890 To have the simplest possible code, 785 00:36:07,890 --> 00:36:12,520 you start with big W rolling hashes, do 1D Rabin-Karp, 786 00:36:12,520 --> 00:36:17,267 you roll everything down, 1D Rabin-Karp, 787 00:36:17,267 --> 00:36:18,100 and keep doing that. 788 00:36:18,100 --> 00:36:20,810 789 00:36:20,810 --> 00:36:23,372 OK What's the running time for this? 790 00:36:23,372 --> 00:36:24,410 AUDIENCE: WH. 791 00:36:24,410 --> 00:36:25,012 PROFESSOR: WH. 792 00:36:25,012 --> 00:36:27,595 AUDIENCE: Does this one have a space complexity about W, then, 793 00:36:27,595 --> 00:36:28,780 because [INAUDIBLE]? 794 00:36:28,780 --> 00:36:29,840 PROFESSOR: Yeah. 795 00:36:29,840 --> 00:36:32,500 So my memory requirement went up to 4W. 796 00:36:32,500 --> 00:36:39,850 797 00:36:39,850 --> 00:36:41,670 Is everyone happy with this? 798 00:36:41,670 --> 00:36:46,340 It's one of the few cases where an approach for solving a 1D 799 00:36:46,340 --> 00:36:48,670 problem generalizes to 2D. 800 00:36:48,670 --> 00:36:51,630 In most problems, you have to rethink the whole situation. 801 00:36:51,630 --> 00:37:02,980 802 00:37:02,980 --> 00:37:04,497 Let's do a hard problem. 803 00:37:04,497 --> 00:37:05,580 Enough with the easy ones. 804 00:37:05,580 --> 00:37:10,580 805 00:37:10,580 --> 00:37:19,240 Two lists, roughly size N, and they're both sorted. 806 00:37:19,240 --> 00:37:21,090 Let me fill them out with random numbers. 807 00:37:21,090 --> 00:37:25,010 808 00:37:25,010 --> 00:37:50,710 5, 13, 22, 43, 56, 62, 81, 86, 87, 2, 3, 7, 9, 15, 19, 809 00:37:50,710 --> 00:37:53,965 24, 28, 32. 810 00:37:53,965 --> 00:37:56,780 811 00:37:56,780 --> 00:37:58,700 So I have these lists. 812 00:37:58,700 --> 00:38:01,810 Let's be generous and say that all the numbers are different. 813 00:38:01,810 --> 00:38:03,110 They're sorted. 814 00:38:03,110 --> 00:38:09,110 I want to find the nth number, so the number with rank n, 815 00:38:09,110 --> 00:38:19,274 out of both lists as fast as possible. 816 00:38:19,274 --> 00:38:22,680 AUDIENCE: Wouldn't it just be index? 817 00:38:22,680 --> 00:38:26,470 PROFESSOR: So the thing is if, for example, n is 1, 818 00:38:26,470 --> 00:38:27,990 then it's this. 819 00:38:27,990 --> 00:38:29,140 This is the second number. 820 00:38:29,140 --> 00:38:31,884 This is the third. 821 00:38:31,884 --> 00:38:33,800 So if you take the lists and you combine them, 822 00:38:33,800 --> 00:38:36,720 then I want the result out of that. 823 00:38:36,720 --> 00:38:39,120 AUDIENCE: Let's do merge sort and the combined 824 00:38:39,120 --> 00:38:41,950 [INAUDIBLE] index, right? 825 00:38:41,950 --> 00:38:44,980 PROFESSOR: Full merge sort, N log N? 826 00:38:44,980 --> 00:38:45,480 No. 827 00:38:45,480 --> 00:38:46,870 AUDIENCE: It's already sorted, though. 828 00:38:46,870 --> 00:38:47,160 PROFESSOR: OK. 829 00:38:47,160 --> 00:38:47,610 So what do we do? 830 00:38:47,610 --> 00:38:48,260 AUDIENCE: We just do merge. 831 00:38:48,260 --> 00:38:49,560 That's just order N. 832 00:38:49,560 --> 00:38:51,182 PROFESSOR: So merge. 833 00:38:51,182 --> 00:38:53,050 AUDIENCE: Merge [INAUDIBLE]. 834 00:38:53,050 --> 00:38:54,760 PROFESSOR: OK. 835 00:38:54,760 --> 00:38:58,230 So merge and then index is the first approach, 836 00:38:58,230 --> 00:39:01,840 which is order N. Then you said run the merge algorithm, 837 00:39:01,840 --> 00:39:07,052 but stop when you get to the little nth element, 838 00:39:07,052 --> 00:39:09,010 so that's a little bit better and we don't have 839 00:39:09,010 --> 00:39:12,490 to produce an array so the space complexity is down to order 1, 840 00:39:12,490 --> 00:39:14,600 right? 841 00:39:14,600 --> 00:39:15,615 Now let's do-- 842 00:39:15,615 --> 00:39:17,170 AUDIENCE: Logarighmic times n. 843 00:39:17,170 --> 00:39:18,575 PROFESSOR: Yeah, exactly. 844 00:39:18,575 --> 00:39:19,200 This is linear. 845 00:39:19,200 --> 00:39:20,810 We have to get to logarithms. 846 00:39:20,810 --> 00:39:23,228 How do we get to logarithms? 847 00:39:23,228 --> 00:39:24,860 AUDIENCE: Do a modified binary search. 848 00:39:24,860 --> 00:39:32,070 849 00:39:32,070 --> 00:39:38,190 AUDIENCE: What if you first looked at-- actually, 850 00:39:38,190 --> 00:39:39,830 I don't know what I'm going to say. 851 00:39:39,830 --> 00:39:42,132 852 00:39:42,132 --> 00:39:43,090 PROFESSOR: Anyone else? 853 00:39:43,090 --> 00:39:46,037 854 00:39:46,037 --> 00:39:47,120 So modified binary search. 855 00:39:47,120 --> 00:39:49,380 Do you know the full answer, or do you 856 00:39:49,380 --> 00:39:51,350 want to start looking at the solution? 857 00:39:51,350 --> 00:39:53,665 AUDIENCE: I have an idea [INAUDIBLE]. 858 00:39:53,665 --> 00:39:55,290 PROFESSOR: Let's see how it would work. 859 00:39:55,290 --> 00:39:58,530 860 00:39:58,530 --> 00:40:11,530 AUDIENCE: So if we take the n over second element 861 00:40:11,530 --> 00:40:19,628 on each row, the one that is lower, 862 00:40:19,628 --> 00:40:22,600 that's at least the n over second element, 863 00:40:22,600 --> 00:40:23,725 and the one that's higher-- 864 00:40:23,725 --> 00:40:35,230 865 00:40:35,230 --> 00:40:38,415 PROFESSOR: So let's say this is our N1 and N2, 866 00:40:38,415 --> 00:40:41,330 and they're both order N initially. 867 00:40:41,330 --> 00:40:45,010 So this is N2 over 2 if this is smaller. 868 00:40:45,010 --> 00:40:48,328 869 00:40:48,328 --> 00:40:49,786 AUDIENCE: The lower one is at least 870 00:40:49,786 --> 00:40:51,790 the N over second element since everything 871 00:40:51,790 --> 00:41:03,844 before it is less than N. Does that make sense? 872 00:41:03,844 --> 00:41:04,510 PROFESSOR: Yeah. 873 00:41:04,510 --> 00:41:07,140 So this is N over 2 are greater, right? 874 00:41:07,140 --> 00:41:07,840 This guy. 875 00:41:07,840 --> 00:41:16,030 876 00:41:16,030 --> 00:41:28,859 AUDIENCE: We also know that the element above it 877 00:41:28,859 --> 00:41:35,316 is at most the nth element because it's greater than-- 878 00:41:35,316 --> 00:41:37,620 PROFESSOR: So it's at most N1 plus N2 879 00:41:37,620 --> 00:41:39,370 because that's how many you have in total, 880 00:41:39,370 --> 00:41:44,150 and the one on the top, you know that these ones are 881 00:41:44,150 --> 00:41:45,520 bigger than h, right? 882 00:41:45,520 --> 00:41:47,850 But you don't know anything about these ones, 883 00:41:47,850 --> 00:41:52,360 so it's minus N1 over 2. 884 00:41:52,360 --> 00:41:56,700 So it's at most N1 over 2 plus N2, the top element. 885 00:41:56,700 --> 00:41:59,590 886 00:41:59,590 --> 00:42:06,860 AUDIENCE: So then we can take the element three quarters 887 00:42:06,860 --> 00:42:09,992 of the way through N2 and one quarter of the way 888 00:42:09,992 --> 00:42:13,934 through N1 to do more [INAUDIBLE]. 889 00:42:13,934 --> 00:42:14,850 AUDIENCE: [INAUDIBLE]? 890 00:42:14,850 --> 00:42:17,722 891 00:42:17,722 --> 00:42:18,305 AUDIENCE: Yes. 892 00:42:18,305 --> 00:42:22,962 893 00:42:22,962 --> 00:42:24,920 PROFESSOR: Let's see what happens in each case. 894 00:42:24,920 --> 00:42:34,270 So if little n is here, then you divide. 895 00:42:34,270 --> 00:42:41,490 So if n is smaller than this, then you chop them up here 896 00:42:41,490 --> 00:42:43,860 and you've divided the problem into half. 897 00:42:43,860 --> 00:42:45,190 You're good. 898 00:42:45,190 --> 00:42:49,680 If it's bigger than this other number here, 899 00:42:49,680 --> 00:42:52,610 you've chopped the problem up and you're here. 900 00:42:52,610 --> 00:42:53,500 You're good. 901 00:42:53,500 --> 00:42:56,100 Now, the hard case seems to be when it's in between. 902 00:42:56,100 --> 00:42:57,690 So what do we do then? 903 00:42:57,690 --> 00:42:59,854 AUDIENCE: Aren't those two numbers the same? 904 00:42:59,854 --> 00:43:01,770 AUDIENCE: If it's between, take the upper half 905 00:43:01,770 --> 00:43:07,220 of the bottom one and the lower half of the upper one, right? 906 00:43:07,220 --> 00:43:09,695 If it's between 15 and 43, then you take everything 907 00:43:09,695 --> 00:43:15,375 in the upper half of N2 and you take the lower half of N1. 908 00:43:15,375 --> 00:43:16,750 AUDIENCE: Yeah, you should always 909 00:43:16,750 --> 00:43:18,662 be taking the upper half of one and the lower 910 00:43:18,662 --> 00:43:21,240 half of the other in this case. 911 00:43:21,240 --> 00:43:21,990 PROFESSOR: Really? 912 00:43:21,990 --> 00:43:26,270 913 00:43:26,270 --> 00:43:35,230 AUDIENCE: [INAUDIBLE] N2, 15 is at least 914 00:43:35,230 --> 00:43:37,330 the nth over 2 element. 915 00:43:37,330 --> 00:43:39,664 I think we're using three n's at the same time. 916 00:43:39,664 --> 00:43:42,110 AUDIENCE: Are you using the little n or the big N? 917 00:43:42,110 --> 00:43:42,776 AUDIENCE: Sorry. 918 00:43:42,776 --> 00:43:45,100 919 00:43:45,100 --> 00:43:47,224 [INAUDIBLE] is element and the lists are called N 920 00:43:47,224 --> 00:43:48,140 and this is confusing. 921 00:43:48,140 --> 00:43:50,376 Can we rename the nth element to something 922 00:43:50,376 --> 00:43:53,130 like m or some other useful number? 923 00:43:53,130 --> 00:43:54,200 The kth element, OK. 924 00:43:54,200 --> 00:43:57,680 So the 15 at least the kth over 2 element, 925 00:43:57,680 --> 00:44:00,227 so it can't be anything on the left half of the-- 926 00:44:00,227 --> 00:44:01,060 PROFESSOR: k over 2? 927 00:44:01,060 --> 00:44:02,350 Why k over 2? 928 00:44:02,350 --> 00:44:03,570 This list is size N. 929 00:44:03,570 --> 00:44:05,220 AUDIENCE: Sorry. 930 00:44:05,220 --> 00:44:07,470 I didn't pick the elements at N over 2. 931 00:44:07,470 --> 00:44:11,369 I picked the elements at k over 2. 932 00:44:11,369 --> 00:44:12,660 PROFESSOR: Why would I do that? 933 00:44:12,660 --> 00:44:18,790 934 00:44:18,790 --> 00:44:24,030 If N1 is greater than k, then I chop off the end of the list, 935 00:44:24,030 --> 00:44:24,966 right? 936 00:44:24,966 --> 00:44:29,325 If N2 is bigger than k, then I chop off the end of the list 937 00:44:29,325 --> 00:44:29,825 completely. 938 00:44:29,825 --> 00:44:33,375 939 00:44:33,375 --> 00:44:37,100 If this list is sorted and I want the third element, 940 00:44:37,100 --> 00:44:39,200 I know that these are not the answer. 941 00:44:39,200 --> 00:44:42,140 No matter what's down here, these are not the answer. 942 00:44:42,140 --> 00:44:49,350 So I know for sure that k is going to be bigger than N1, N2. 943 00:44:49,350 --> 00:44:55,640 So instead of going there, let's go at k over 2. 944 00:44:55,640 --> 00:45:01,290 And here, let's go for k over 2. 945 00:45:01,290 --> 00:45:03,255 Now this one looks a bit nastier. 946 00:45:03,255 --> 00:45:07,720 N1 plus N2 stays what it was before. 947 00:45:07,720 --> 00:45:14,970 948 00:45:14,970 --> 00:45:18,560 AUDIENCE: On the list where you got the element that was lower, 949 00:45:18,560 --> 00:45:24,690 you know that everything to the left of it is less than k 950 00:45:24,690 --> 00:45:25,240 over 2. 951 00:45:25,240 --> 00:45:31,470 952 00:45:31,470 --> 00:45:34,330 The element number is lower than k over 2, 953 00:45:34,330 --> 00:45:36,650 so we're not using anything to the left of the 15. 954 00:45:36,650 --> 00:45:38,540 You can kill that section for us. 955 00:45:38,540 --> 00:45:42,007 PROFESSOR: OK, so we can kill it, but then what's the rank? 956 00:45:42,007 --> 00:45:43,590 When I recurse, how am I going to know 957 00:45:43,590 --> 00:45:45,520 the rank that I'm looking for? 958 00:45:45,520 --> 00:45:48,817 959 00:45:48,817 --> 00:45:50,701 AUDIENCE: k minus what you killed. 960 00:45:50,701 --> 00:45:55,700 961 00:45:55,700 --> 00:45:57,520 AUDIENCE: You can save the branches. 962 00:45:57,520 --> 00:45:59,560 PROFESSOR: So you want to kill this guy, right? 963 00:45:59,560 --> 00:46:01,970 So you want to kill these numbers. 964 00:46:01,970 --> 00:46:04,330 But here I have k over 2 numbers, 965 00:46:04,330 --> 00:46:06,130 and here I have k over 2 numbers. 966 00:46:06,130 --> 00:46:09,401 How do I know that it's not somewhere here? 967 00:46:09,401 --> 00:46:13,562 AUDIENCE: How do you know that what's not somewhere there? 968 00:46:13,562 --> 00:46:16,220 AUDIENCE: You compare 15 and 43, right? 969 00:46:16,220 --> 00:46:19,660 And then you see that 43 is bigger, 970 00:46:19,660 --> 00:46:26,300 and you see 15 is smaller, so then you would go to, I guess, 971 00:46:26,300 --> 00:46:36,195 k over 4 index in N1, and 3k over 4 in N2. 972 00:46:36,195 --> 00:46:37,695 AUDIENCE: When you recurse, you said 973 00:46:37,695 --> 00:46:41,430 that you've killed k over 2 elements. 974 00:46:41,430 --> 00:46:46,947 AUDIENCE: But you can also kill everything to the right of 43. 975 00:46:46,947 --> 00:46:47,530 AUDIENCE: Yes. 976 00:46:47,530 --> 00:46:50,015 You can kill everything to the right of 43 977 00:46:50,015 --> 00:46:51,754 since it can't be any of those elements, 978 00:46:51,754 --> 00:46:53,670 and you can kill everything to the left of 15. 979 00:46:53,670 --> 00:46:57,236 And then you repeat the algorithm again with the lists 980 00:46:57,236 --> 00:47:00,210 you didn't kill, except you also put in a term of we've 981 00:47:00,210 --> 00:47:03,950 already covered k over 2 elements. 982 00:47:03,950 --> 00:47:07,570 PROFESSOR: So we want the element with the rank k over 4 983 00:47:07,570 --> 00:47:08,715 over these lists. 984 00:47:08,715 --> 00:47:13,646 985 00:47:13,646 --> 00:47:16,330 So I know for sure that what I have is either k over 2 986 00:47:16,330 --> 00:47:21,240 or less than k over 2, right? 987 00:47:21,240 --> 00:47:23,710 So this is less than k over 2, and then I'm 988 00:47:23,710 --> 00:47:25,532 looking for a rank of k over 4. 989 00:47:25,532 --> 00:47:28,184 990 00:47:28,184 --> 00:47:30,250 That seems to work. 991 00:47:30,250 --> 00:47:33,540 How does the running time look? 992 00:47:33,540 --> 00:47:38,030 AUDIENCE: It should be O of log k. 993 00:47:38,030 --> 00:47:40,530 AUDIENCE: I think it's log [INAUDIBLE]. 994 00:47:40,530 --> 00:47:45,030 995 00:47:45,030 --> 00:47:52,609 PROFESSOR: So log k, log N1 plus N2. 996 00:47:52,609 --> 00:47:57,440 997 00:47:57,440 --> 00:47:58,620 Are these different? 998 00:47:58,620 --> 00:48:00,830 AUDIENCE: Yes. 999 00:48:00,830 --> 00:48:06,406 AUDIENCE: If k is 1, the algorithm 1000 00:48:06,406 --> 00:48:10,084 should only recurse once, even if N is 20 million. 1001 00:48:10,084 --> 00:48:16,220 1002 00:48:16,220 --> 00:48:18,364 PROFESSOR: OK. 1003 00:48:18,364 --> 00:48:21,902 AUDIENCE: But if k is 20 million and the list lengths are 1004 00:48:21,902 --> 00:48:24,784 two million long, it'll take approximately those lengths 1005 00:48:24,784 --> 00:48:26,740 to run. 1006 00:48:26,740 --> 00:48:28,690 PROFESSOR: OK. 1007 00:48:28,690 --> 00:48:36,420 So what gets reduced, aside from the list size, k gets reduced. 1008 00:48:36,420 --> 00:48:39,870 k seems to define the input size for the next iteration 1009 00:48:39,870 --> 00:48:42,700 because I'll have at least k over 2 elements in one 1010 00:48:42,700 --> 00:48:45,640 of these buckets. 1011 00:48:45,640 --> 00:48:48,250 So it sounds like it should be log k. 1012 00:48:48,250 --> 00:48:56,120 1013 00:48:56,120 --> 00:49:00,390 k is bigger than N1, N2, but it should hopefully be smaller 1014 00:49:00,390 --> 00:49:03,740 than the sum because otherwise, why am I doing the problem? 1015 00:49:03,740 --> 00:49:08,360 So this is definitely order of N1 plus N2. 1016 00:49:08,360 --> 00:49:09,390 So this is a bound. 1017 00:49:09,390 --> 00:49:11,820 This is a slightly tighter bound. 1018 00:49:11,820 --> 00:49:13,665 We have a different solution to the problem. 1019 00:49:13,665 --> 00:49:16,945 1020 00:49:16,945 --> 00:49:18,820 All the possible solutions are hard to argue. 1021 00:49:18,820 --> 00:49:21,540 They all come down to something like this. 1022 00:49:21,540 --> 00:49:26,070 The one that we have requires you to use-- you 1023 00:49:26,070 --> 00:49:29,873 have two indices, and you know that the sum of the indices 1024 00:49:29,873 --> 00:49:33,580 is k, and you do binary search on the top 1025 00:49:33,580 --> 00:49:36,090 and adjust the index on the bottom 1026 00:49:36,090 --> 00:49:40,040 to keep the constraint that the sum of the two indices in N. 1027 00:49:40,040 --> 00:49:44,620 And you can look at that in the notes that we're going to post. 1028 00:49:44,620 --> 00:49:45,780 Are you guys tired? 1029 00:49:45,780 --> 00:49:49,509 Do you want to look at one more thing, or are we done? 1030 00:49:49,509 --> 00:49:50,425 AUDIENCE: [INAUDIBLE]. 1031 00:49:50,425 --> 00:49:57,370 1032 00:49:57,370 --> 00:50:01,510 PROFESSOR: Let's look at something reasonably easy. 1033 00:50:01,510 --> 00:50:03,350 You guys can read this on your own. 1034 00:50:03,350 --> 00:50:04,910 I'm not going to bore you with that. 1035 00:50:04,910 --> 00:50:08,650 1036 00:50:08,650 --> 00:50:12,030 So suppose we have some functions, 1037 00:50:12,030 --> 00:50:13,990 and we want to order them according 1038 00:50:13,990 --> 00:50:16,070 to their asymptotic growth rate. 1039 00:50:16,070 --> 00:50:19,480 Do people remember how to do this from Pset one? 1040 00:50:19,480 --> 00:50:22,400 So the idea is that you take each function, you simplify it, 1041 00:50:22,400 --> 00:50:23,970 and then you sort them. 1042 00:50:23,970 --> 00:50:28,234 So let's have a couple of simple ones and then some hard ones, 1043 00:50:28,234 --> 00:50:29,900 and we're going to stop in five minutes. 1044 00:50:29,900 --> 00:50:39,110 1045 00:50:39,110 --> 00:50:39,650 What's this? 1046 00:50:39,650 --> 00:50:42,752 1047 00:50:42,752 --> 00:50:44,153 AUDIENCE: n to the fourth. 1048 00:50:44,153 --> 00:50:46,652 1049 00:50:46,652 --> 00:50:47,235 PROFESSOR: OK. 1050 00:50:47,235 --> 00:50:49,810 1051 00:50:49,810 --> 00:50:51,090 Let's see. 1052 00:50:51,090 --> 00:50:51,850 n choose 3. 1053 00:50:51,850 --> 00:50:54,444 What is this? 1054 00:50:54,444 --> 00:50:55,539 AUDIENCE: [INAUDIBLE]. 1055 00:50:55,539 --> 00:50:56,580 PROFESSOR: OK, very good. 1056 00:50:56,580 --> 00:50:57,796 Why? 1057 00:50:57,796 --> 00:51:01,177 AUDIENCE: Something about choose is n times n minus 1 times 1058 00:51:01,177 --> 00:51:03,592 n minus 2. 1059 00:51:03,592 --> 00:51:07,456 AUDIENCE: It's n factorial over n minus 2 factorial. 1060 00:51:07,456 --> 00:51:11,220 1061 00:51:11,220 --> 00:51:13,325 PROFESSOR: This comes out to be roughly n cubed. 1062 00:51:13,325 --> 00:51:16,265 1063 00:51:16,265 --> 00:51:16,765 Cool. 1064 00:51:16,765 --> 00:51:19,570 1065 00:51:19,570 --> 00:51:22,808 How about n plus log to the fourth of n? 1066 00:51:22,808 --> 00:51:25,616 1067 00:51:25,616 --> 00:51:27,020 AUDIENCE: N. 1068 00:51:27,020 --> 00:51:28,960 PROFESSOR: Yep. 1069 00:51:28,960 --> 00:51:32,290 So even if I have a polynomial in a logarithm, 1070 00:51:32,290 --> 00:51:35,780 it's still dominated by pure n. 1071 00:51:35,780 --> 00:51:39,000 Now, suppose we want to order these guys together 1072 00:51:39,000 --> 00:51:43,730 with-- which one doesn't look boring at all? 1073 00:51:43,730 --> 00:51:46,940 n to the log n and 2 to the n. 1074 00:51:46,940 --> 00:51:49,790 1075 00:51:49,790 --> 00:51:50,739 Let's sort them. 1076 00:51:50,739 --> 00:51:51,780 Which one's the smallest? 1077 00:51:51,780 --> 00:51:53,394 Which one's the biggest? 1078 00:51:53,394 --> 00:51:59,250 1079 00:51:59,250 --> 00:52:02,504 AUDIENCE: 2 to the n is bigger. 1080 00:52:02,504 --> 00:52:04,420 PROFESSOR: Let's start with the smallest ones, 1081 00:52:04,420 --> 00:52:07,340 because I think that will be easy. 1082 00:52:07,340 --> 00:52:10,480 So which one's the absolute smallest out of all these guys? 1083 00:52:10,480 --> 00:52:11,340 AUDIENCE: n. 1084 00:52:11,340 --> 00:52:12,590 PROFESSOR: OK. 1085 00:52:12,590 --> 00:52:14,120 Then? 1086 00:52:14,120 --> 00:52:15,960 AUDIENCE: n to the third. 1087 00:52:15,960 --> 00:52:17,276 PROFESSOR: Cubed and fourth. 1088 00:52:17,276 --> 00:52:18,650 So we have to compare these guys. 1089 00:52:18,650 --> 00:52:19,608 How do we compare them? 1090 00:52:19,608 --> 00:52:23,010 1091 00:52:23,010 --> 00:52:26,900 n to the power of log n and 2 to the n something. 1092 00:52:26,900 --> 00:52:29,984 1093 00:52:29,984 --> 00:52:30,900 AUDIENCE: [INAUDIBLE]. 1094 00:52:30,900 --> 00:52:34,819 1095 00:52:34,819 --> 00:52:35,860 PROFESSOR: Take the logs. 1096 00:52:35,860 --> 00:52:38,550 So when we have something confusing with exponentials, 1097 00:52:38,550 --> 00:52:40,500 take the logs and see what we get. 1098 00:52:40,500 --> 00:52:42,640 Logs are monotonic, so if you take the logs, 1099 00:52:42,640 --> 00:52:45,450 you'll have the same relationship afterwards. 1100 00:52:45,450 --> 00:52:53,090 So log of this is log of n to the power of log n, 1101 00:52:53,090 --> 00:53:05,020 so it's log n times log n, so it's log 2 n. 1102 00:53:05,020 --> 00:53:08,940 Log of 2 to the n is n. 1103 00:53:08,940 --> 00:53:11,404 Which one's bigger? 1104 00:53:11,404 --> 00:53:12,205 AUDIENCE: n. 1105 00:53:12,205 --> 00:53:13,080 PROFESSOR: All right. 1106 00:53:13,080 --> 00:53:21,040 1107 00:53:21,040 --> 00:53:22,470 And we're not going to solve this, 1108 00:53:22,470 --> 00:53:25,150 but how would you go about solving this guy? 1109 00:53:25,150 --> 00:53:28,840 1110 00:53:28,840 --> 00:53:30,060 What do you do to it? 1111 00:53:30,060 --> 00:53:30,860 AUDIENCE: Sterling. 1112 00:53:30,860 --> 00:53:33,170 PROFESSOR: Sterling, yep. 1113 00:53:33,170 --> 00:53:35,270 You do Sterling, you go through the numbers, 1114 00:53:35,270 --> 00:53:37,210 and you figure out the answer. 1115 00:53:37,210 --> 00:53:39,135 And then if you have to do logarithms, 1116 00:53:39,135 --> 00:53:41,390 you use logarithms to figure out where 1117 00:53:41,390 --> 00:53:43,510 it belongs among these guys. 1118 00:53:43,510 --> 00:53:44,800 AUDIENCE: What's Sterling? 1119 00:53:44,800 --> 00:53:45,540 AUDIENCE: Sterling's formula. 1120 00:53:45,540 --> 00:53:47,506 It's that gross thing that was on the board 1121 00:53:47,506 --> 00:53:48,464 before we came in here. 1122 00:53:48,464 --> 00:53:54,390 PROFESSOR: So Sterling says that n factorial is ugly. 1123 00:53:54,390 --> 00:54:03,030 1124 00:54:03,030 --> 00:54:09,440 2 pi n here times n over e to the power of n. 1125 00:54:09,440 --> 00:54:13,670 1126 00:54:13,670 --> 00:54:14,827 So what's this binomial? 1127 00:54:14,827 --> 00:54:15,910 What's the formula for it? 1128 00:54:15,910 --> 00:54:21,670 1129 00:54:21,670 --> 00:54:23,210 OK, formula for n choose k. 1130 00:54:23,210 --> 00:54:24,512 Anyone? 1131 00:54:24,512 --> 00:54:28,363 AUDIENCE: It's n times 1 over 2 factorial times n 1132 00:54:28,363 --> 00:54:29,611 minus k factorial. 1133 00:54:29,611 --> 00:54:32,750 1134 00:54:32,750 --> 00:54:35,410 PROFESSOR: In this case, it's n factorial 1135 00:54:35,410 --> 00:54:40,576 over n over 2 factorial raised to the power of 2, right? 1136 00:54:40,576 --> 00:54:42,950 And then we chug through the math and get to some answer. 1137 00:54:42,950 --> 00:54:46,390 1138 00:54:46,390 --> 00:54:47,956 All right? 1139 00:54:47,956 --> 00:54:48,455