1 00:00:00,000 --> 00:00:00,090 2 00:00:00,090 --> 00:00:01,770 The following content is provided 3 00:00:01,770 --> 00:00:04,000 under a Creative Commons license. 4 00:00:04,000 --> 00:00:06,860 Your support will help MIT OpenCourseWare continue 5 00:00:06,860 --> 00:00:10,720 to offer high quality educational resources for free. 6 00:00:10,720 --> 00:00:13,320 To make a donation or view additional materials 7 00:00:13,320 --> 00:00:17,207 from hundreds of MIT courses, visit MIT OpenCourseWare 8 00:00:17,207 --> 00:00:17,832 at ocw.mit.edu. 9 00:00:17,832 --> 00:00:21,742 10 00:00:21,742 --> 00:00:23,450 PROFESSOR: Are people understanding AVLs? 11 00:00:23,450 --> 00:00:27,600 12 00:00:27,600 --> 00:00:29,960 That's good, because if everyone raised their hands, 13 00:00:29,960 --> 00:00:31,690 we'd be done and out of here. 14 00:00:31,690 --> 00:00:34,270 So we're not covering new material in this recitation. 15 00:00:34,270 --> 00:00:37,320 We're talking about AVLs again, except this time 16 00:00:37,320 --> 00:00:39,210 you'll tell me how we'll do them, 17 00:00:39,210 --> 00:00:42,140 and we'll look at the code and see 18 00:00:42,140 --> 00:00:44,780 how the theory matches the code. 19 00:00:44,780 --> 00:00:47,150 And I need one of these. 20 00:00:47,150 --> 00:00:47,800 Thank you. 21 00:00:47,800 --> 00:00:52,344 22 00:00:52,344 --> 00:00:53,260 So let's start simple. 23 00:00:53,260 --> 00:00:55,610 What's a BST? 24 00:00:55,610 --> 00:00:57,110 AUDIENCE: Binary search tree. 25 00:00:57,110 --> 00:01:00,110 26 00:01:00,110 --> 00:01:02,930 PROFESSOR: OK, binary search tree. 27 00:01:02,930 --> 00:01:06,080 It's binary because every node has at most two children. 28 00:01:06,080 --> 00:01:07,686 Why is it a search tree? 29 00:01:07,686 --> 00:01:09,627 AUDIENCE: It's easy to search. 30 00:01:09,627 --> 00:01:10,210 PROFESSOR: OK. 31 00:01:10,210 --> 00:01:11,279 Why is it easy to search? 32 00:01:11,279 --> 00:01:12,945 AUDIENCE: Because you start [INAUDIBLE], 33 00:01:12,945 --> 00:01:15,195 is my number greater than this key or is it less than, 34 00:01:15,195 --> 00:01:16,410 and then you go from there. 35 00:01:16,410 --> 00:01:16,993 PROFESSOR: OK. 36 00:01:16,993 --> 00:01:20,600 So if I would have to state this as a formal rep invariant 37 00:01:20,600 --> 00:01:24,016 thing, what would I say it is so that I can do that operation? 38 00:01:24,016 --> 00:01:26,268 AUDIENCE: Node at right is greater than 39 00:01:26,268 --> 00:01:30,910 node at key, which is greater than node at left. 40 00:01:30,910 --> 00:01:32,140 PROFESSOR: OK, excellent. 41 00:01:32,140 --> 00:01:36,190 So it turns out I can use this argument recursively 42 00:01:36,190 --> 00:01:40,830 to say that for a given node, everything that 43 00:01:40,830 --> 00:01:44,740 is to the right of that node is greater, 44 00:01:44,740 --> 00:01:48,710 and everything that is to the left is smaller. 45 00:01:48,710 --> 00:01:52,450 46 00:01:52,450 --> 00:01:54,480 And this allows us to do search quickly 47 00:01:54,480 --> 00:01:57,180 because if you're looking for a key, 48 00:01:57,180 --> 00:02:00,480 say you have numbers in your tree and you're looking for 5. 49 00:02:00,480 --> 00:02:03,770 If you arrive at a node whose value is 4, you go right. 50 00:02:03,770 --> 00:02:07,820 If you arrive at a node whose value is 7, you go left. 51 00:02:07,820 --> 00:02:11,420 OK So what do we know how to do on BSTs? 52 00:02:11,420 --> 00:02:14,325 What are the operations that we know? 53 00:02:14,325 --> 00:02:17,580 AUDIENCE: Minimum, maximum. 54 00:02:17,580 --> 00:02:18,965 PROFESSOR: Min, max. 55 00:02:18,965 --> 00:02:22,100 56 00:02:22,100 --> 00:02:24,210 AUDIENCE: Left, right, parent. 57 00:02:24,210 --> 00:02:26,640 PROFESSOR: So those are for the node. 58 00:02:26,640 --> 00:02:30,300 I want the queries and the updates for the BST type. 59 00:02:30,300 --> 00:02:31,717 AUDIENCE: Insert and delete. 60 00:02:31,717 --> 00:02:32,300 PROFESSOR: OK. 61 00:02:32,300 --> 00:02:35,030 62 00:02:35,030 --> 00:02:39,052 Insert, Delete. 63 00:02:39,052 --> 00:02:40,036 AUDIENCE: Next larger. 64 00:02:40,036 --> 00:02:42,897 65 00:02:42,897 --> 00:02:43,480 PROFESSOR: OK. 66 00:02:43,480 --> 00:02:52,340 67 00:02:52,340 --> 00:02:55,026 And then in here, the S in BST. 68 00:02:55,026 --> 00:02:57,626 Someone give me the S so we can move on. 69 00:02:57,626 --> 00:02:58,800 AUDIENCE: Find. 70 00:02:58,800 --> 00:03:03,254 PROFESSOR: Find, search, depending 71 00:03:03,254 --> 00:03:04,420 on what code you're reading. 72 00:03:04,420 --> 00:03:07,070 73 00:03:07,070 --> 00:03:11,260 What's the running time for these guys? 74 00:03:11,260 --> 00:03:12,550 AUDIENCE: Order h. 75 00:03:12,550 --> 00:03:14,220 PROFESSOR: Excellent. 76 00:03:14,220 --> 00:03:16,350 So everything has the same running time. 77 00:03:16,350 --> 00:03:19,030 Nice and easy to remember. 78 00:03:19,030 --> 00:03:19,900 Order h. 79 00:03:19,900 --> 00:03:21,636 What's h? 80 00:03:21,636 --> 00:03:23,930 AUDIENCE: [INAUDIBLE]. 81 00:03:23,930 --> 00:03:25,190 PROFESSOR: OK. 82 00:03:25,190 --> 00:03:27,120 How do we define this height? 83 00:03:27,120 --> 00:03:31,269 What's the height for this tree? 84 00:03:31,269 --> 00:03:32,185 AUDIENCE: [INAUDIBLE]. 85 00:03:32,185 --> 00:03:35,460 86 00:03:35,460 --> 00:03:36,720 PROFESSOR: Good. 87 00:03:36,720 --> 00:03:38,880 What's the height for this tree? 88 00:03:38,880 --> 00:03:43,460 89 00:03:43,460 --> 00:03:48,883 What's the height for this tree? 90 00:03:48,883 --> 00:03:49,466 AUDIENCE: Two. 91 00:03:49,466 --> 00:03:52,600 92 00:03:52,600 --> 00:03:55,020 PROFESSOR: So now for a more general case, 93 00:03:55,020 --> 00:04:00,320 where this is the height, and the height of my left subtree 94 00:04:00,320 --> 00:04:00,880 is hl. 95 00:04:00,880 --> 00:04:04,940 The height of my right subtree is hr. 96 00:04:04,940 --> 00:04:06,672 What's h? 97 00:04:06,672 --> 00:04:11,001 AUDIENCE: It's the maximum of hr times hl plus 1. 98 00:04:11,001 --> 00:04:16,785 99 00:04:16,785 --> 00:04:17,660 PROFESSOR: Nice save. 100 00:04:17,660 --> 00:04:19,118 I heard a "plus 1" somewhere there. 101 00:04:19,118 --> 00:04:22,290 102 00:04:22,290 --> 00:04:24,480 Very good. 103 00:04:24,480 --> 00:04:25,070 This is r. 104 00:04:25,070 --> 00:04:28,870 105 00:04:28,870 --> 00:04:31,060 So if we look at the first part of the code, 106 00:04:31,060 --> 00:04:35,840 lines one through eight, lines seven and eight 107 00:04:35,840 --> 00:04:39,280 implement the definition that we talked about here. 108 00:04:39,280 --> 00:04:41,440 So in our Python implementation, each node 109 00:04:41,440 --> 00:04:44,550 knows the height of the tree that he's the root of. 110 00:04:44,550 --> 00:04:47,640 111 00:04:47,640 --> 00:04:49,900 And since we're storing that, we need 112 00:04:49,900 --> 00:04:52,150 to update it every once in awhile when we make changes 113 00:04:52,150 --> 00:04:54,870 to the tree, like when we insert nodes. 114 00:04:54,870 --> 00:04:56,810 And the way we do that is update height, which 115 00:04:56,810 --> 00:04:59,970 uses the formula that we came up with here. 116 00:04:59,970 --> 00:05:04,120 Now, there is a hack on lines two, three, four, five. 117 00:05:04,120 --> 00:05:06,750 Can anyone tell me what the hack is? 118 00:05:06,750 --> 00:05:20,989 119 00:05:20,989 --> 00:05:22,470 AUDIENCE: The negative 1. 120 00:05:22,470 --> 00:05:23,240 PROFESSOR: OK. 121 00:05:23,240 --> 00:05:24,140 How does that work? 122 00:05:24,140 --> 00:05:24,980 Why do I need it? 123 00:05:24,980 --> 00:05:27,780 124 00:05:27,780 --> 00:05:30,170 AUDIENCE: That's so if you're at the root node, 125 00:05:30,170 --> 00:05:33,510 you can still calculate the height. 126 00:05:33,510 --> 00:05:37,820 PROFESSOR: Depends on how the root node looks like. 127 00:05:37,820 --> 00:05:40,039 AUDIENCE: If it has no children. 128 00:05:40,039 --> 00:05:41,080 PROFESSOR: That's a leaf. 129 00:05:41,080 --> 00:05:41,880 AUDIENCE: Yes. 130 00:05:41,880 --> 00:05:43,670 PROFESSOR: OK. 131 00:05:43,670 --> 00:05:47,390 So if I'm in this case or in this case, what's hl, 132 00:05:47,390 --> 00:05:49,140 what's hr? 133 00:05:49,140 --> 00:05:52,130 In this case, I have this node of height zero, 134 00:05:52,130 --> 00:05:55,400 so I can make a small mistake here and it'll save me, 135 00:05:55,400 --> 00:05:59,990 but here, I have no children, so hl and hr have 136 00:05:59,990 --> 00:06:04,560 to be set in such a way that this formula evaluates to 0. 137 00:06:04,560 --> 00:06:07,890 If I set them to minus 1, I'll have minus 1, minus 1. 138 00:06:07,890 --> 00:06:11,756 The maximum is minus 1 plus 1 equals 0. 139 00:06:11,756 --> 00:06:15,650 AUDIENCE: It's just to check the nodes for AVLs. 140 00:06:15,650 --> 00:06:18,090 PROFESSOR: We use that to update the height. 141 00:06:18,090 --> 00:06:19,950 For AVLs, we need to know the height 142 00:06:19,950 --> 00:06:21,650 of a node instantaneously. 143 00:06:21,650 --> 00:06:23,200 We can't afford to go down the tree 144 00:06:23,200 --> 00:06:25,590 and compute the height every time we need it, 145 00:06:25,590 --> 00:06:29,750 so every node gets to store its height. 146 00:06:29,750 --> 00:06:32,885 There's a small field in each node that has the height. 147 00:06:32,885 --> 00:06:35,010 So we need to update that every once in awhile when 148 00:06:35,010 --> 00:06:36,770 we do insertions and deletions. 149 00:06:36,770 --> 00:06:38,590 This is how we update it, and in order 150 00:06:38,590 --> 00:06:42,320 to update it for this case, where we're at a leaf, 151 00:06:42,320 --> 00:06:45,230 we have to say that the height of a non-existing tree 152 00:06:45,230 --> 00:06:47,516 is minus 1. 153 00:06:47,516 --> 00:06:49,810 Of course, in theory and in real life, 154 00:06:49,810 --> 00:06:52,540 there are no non-existing trees, so this is a clever hack 155 00:06:52,540 --> 00:06:56,470 to reduce code size. 156 00:06:56,470 --> 00:07:00,220 So we said binary search trees would look something like this. 157 00:07:00,220 --> 00:07:16,408 158 00:07:16,408 --> 00:07:17,430 Is this guy an AVL? 159 00:07:17,430 --> 00:07:24,050 160 00:07:24,050 --> 00:07:26,696 Does anyone think it's an a AVL? 161 00:07:26,696 --> 00:07:31,660 Does anyone think it's not an AVL? 162 00:07:31,660 --> 00:07:35,205 Can one of you tell me why? 163 00:07:35,205 --> 00:07:39,300 AUDIENCE: The tree with 5 as its root is not balanced. 164 00:07:39,300 --> 00:07:43,862 165 00:07:43,862 --> 00:07:45,695 PROFESSOR: So this guy here is not balanced. 166 00:07:45,695 --> 00:07:48,670 Why is it not balanced? 167 00:07:48,670 --> 00:07:53,790 AUDIENCE: Because it has two on its right and zero on its left, 168 00:07:53,790 --> 00:07:55,290 so the difference is greater than 1. 169 00:07:55,290 --> 00:08:00,214 170 00:08:00,214 --> 00:08:02,380 PROFESSOR: You're saying that there's something here 171 00:08:02,380 --> 00:08:08,100 that's two, and something here that's-- OK, so height. 172 00:08:08,100 --> 00:08:11,890 So then it's almost like that. 173 00:08:11,890 --> 00:08:17,770 It's 1 here and minus 1 here. 174 00:08:17,770 --> 00:08:19,470 So the reason I asked you to clarify 175 00:08:19,470 --> 00:08:21,300 is that first you said two and zero, 176 00:08:21,300 --> 00:08:23,940 and that's the node count, and AVL 177 00:08:23,940 --> 00:08:25,660 doesn't care about node count. 178 00:08:25,660 --> 00:08:27,650 AVL cares about height. 179 00:08:27,650 --> 00:08:35,949 So for example, if I have something like this, 180 00:08:35,949 --> 00:08:37,619 this is a happy AVL. 181 00:08:37,619 --> 00:08:39,809 Three nodes, here one node here. 182 00:08:39,809 --> 00:08:42,460 The difference in node count is greater than 1, 183 00:08:42,460 --> 00:08:44,200 but the difference in height is 1. 184 00:08:44,200 --> 00:08:47,360 Therefore, this is a good AVL. 185 00:08:47,360 --> 00:08:49,360 So what's a rep invariant for an AVL? 186 00:08:49,360 --> 00:08:56,024 187 00:08:56,024 --> 00:08:58,973 AUDIENCE: The height of the left subtree for every node 188 00:08:58,973 --> 00:09:02,950 is within 1 of the height of the right subtree. 189 00:09:02,950 --> 00:09:03,825 PROFESSOR: Excellent. 190 00:09:03,825 --> 00:09:06,420 191 00:09:06,420 --> 00:09:07,030 AVL. 192 00:09:07,030 --> 00:09:15,050 The core property is that for every node, 193 00:09:15,050 --> 00:09:20,850 the left height and the right height differ by at most 1. 194 00:09:20,850 --> 00:09:22,300 What else? 195 00:09:22,300 --> 00:09:24,560 If we want to be completely rigorous, 196 00:09:24,560 --> 00:09:26,140 what else do we have to say? 197 00:09:26,140 --> 00:09:28,790 198 00:09:28,790 --> 00:09:29,530 It's a BST. 199 00:09:29,530 --> 00:09:34,170 200 00:09:34,170 --> 00:09:36,145 So an AVL is a special kind of BST, 201 00:09:36,145 --> 00:09:38,470 and that's why when we write the AVL code, 202 00:09:38,470 --> 00:09:43,030 we inherit from the BST and we use some of its methods 203 00:09:43,030 --> 00:09:43,530 heavily. 204 00:09:43,530 --> 00:09:47,660 205 00:09:47,660 --> 00:09:48,760 Why do we like AVLs? 206 00:09:48,760 --> 00:09:52,140 What's so good about them? 207 00:09:52,140 --> 00:09:56,140 AUDIENCE: It's faster because it ensures the minimal height 208 00:09:56,140 --> 00:10:00,640 of the entire tree because most of our complexities are o of h, 209 00:10:00,640 --> 00:10:02,670 so it would have to be smaller. 210 00:10:02,670 --> 00:10:05,110 PROFESSOR: So we care about this. 211 00:10:05,110 --> 00:10:07,660 In a regular binary tree, the worst case that you have 212 00:10:07,660 --> 00:10:12,560 is this. 213 00:10:12,560 --> 00:10:15,740 Ignoring this part, this is a worst case binary search tree 214 00:10:15,740 --> 00:10:19,880 where it's basically a list, so height is order n. 215 00:10:19,880 --> 00:10:24,510 In an AVL, you're saying it's better. 216 00:10:24,510 --> 00:10:29,480 So the reason why we care about AVLs is 217 00:10:29,480 --> 00:10:36,570 that height is order of log n. 218 00:10:36,570 --> 00:10:39,265 Now, did people understand from lecture why that's the case? 219 00:10:39,265 --> 00:10:42,840 220 00:10:42,840 --> 00:10:47,660 Can anyone tell me why that's the case? 221 00:10:47,660 --> 00:10:49,588 AUDIENCE: Well, it's just like every level 222 00:10:49,588 --> 00:10:55,715 you go down, if you split off that many times, of course 223 00:10:55,715 --> 00:10:58,260 it's log n. 224 00:10:58,260 --> 00:11:03,690 If you have n nodes and they're filled up to the edge, 225 00:11:03,690 --> 00:11:06,140 there's going to be log n of them. 226 00:11:06,140 --> 00:11:09,230 AUDIENCE: It's close to a full binary tree, right? 227 00:11:09,230 --> 00:11:12,260 PROFESSOR: For some definition of "close." 228 00:11:12,260 --> 00:11:14,010 So here's what I use to remember, 229 00:11:14,010 --> 00:11:18,810 and I think I can persuade you that the height is log n using 230 00:11:18,810 --> 00:11:21,949 the argument that I'll show you here. 231 00:11:21,949 --> 00:11:23,490 Let's start building a tree this way. 232 00:11:23,490 --> 00:11:25,750 Let's say I have a fixed height, and I 233 00:11:25,750 --> 00:11:28,620 want to have as few nodes as possible. 234 00:11:28,620 --> 00:11:33,280 If I have a tree with a big height and very few nodes, 235 00:11:33,280 --> 00:11:38,740 h is going to be bad when you write it as a function of n. 236 00:11:38,740 --> 00:11:40,670 So those trees are unbalanced. 237 00:11:40,670 --> 00:11:43,000 Big height, small number of nodes. 238 00:11:43,000 --> 00:11:46,000 So say we're trying to build an AVL with the smallest 239 00:11:46,000 --> 00:11:48,720 number of nodes and a fixed height. 240 00:11:48,720 --> 00:11:49,860 What if the height is 0? 241 00:11:49,860 --> 00:11:52,660 What does that tree look like? 242 00:11:52,660 --> 00:11:56,930 It's not too complicated, right? 243 00:11:56,930 --> 00:11:58,610 This is an AVL of height 0. 244 00:11:58,610 --> 00:12:01,740 It's the only possible AVL of height 0. 245 00:12:01,740 --> 00:12:04,260 Now, what if we're trying to build an AVL of height 1 246 00:12:04,260 --> 00:12:05,855 that has as few nodes as possible? 247 00:12:05,855 --> 00:12:10,550 248 00:12:10,550 --> 00:12:12,890 This is what it looks like, right? 249 00:12:12,890 --> 00:12:16,510 Height 0, height 1. 250 00:12:16,510 --> 00:12:18,080 I could add another node here, but I 251 00:12:18,080 --> 00:12:21,370 don't want to because I want as few nodes as possible. 252 00:12:21,370 --> 00:12:23,720 Now, what if I tried to build an AVL of height 2 253 00:12:23,720 --> 00:12:25,295 that has as few nodes as possible? 254 00:12:25,295 --> 00:12:28,340 255 00:12:28,340 --> 00:12:29,010 Can I do this? 256 00:12:29,010 --> 00:12:31,735 257 00:12:31,735 --> 00:12:34,353 AUDIENCE: So at the worst case, you have h minus 1 258 00:12:34,353 --> 00:12:36,160 and then h minus 2 there. 259 00:12:36,160 --> 00:12:38,890 PROFESSOR: OK. 260 00:12:38,890 --> 00:12:39,840 You're moving ahead. 261 00:12:39,840 --> 00:12:42,280 You're forcing me to move faster, but you're right. 262 00:12:42,280 --> 00:12:45,320 And the reason is this is unbalanced. 263 00:12:45,320 --> 00:12:48,590 The height at the left has to be the height of the right 264 00:12:48,590 --> 00:12:49,810 plus or minus 1. 265 00:12:49,810 --> 00:12:51,680 Can't be anything else. 266 00:12:51,680 --> 00:12:54,980 So at the very least, I have to build a tree of height 0 here, 267 00:12:54,980 --> 00:12:57,240 and I know that the best tree of height 268 00:12:57,240 --> 00:12:59,000 0 that I have is this guy. 269 00:12:59,000 --> 00:13:03,770 270 00:13:03,770 --> 00:13:04,540 Cool. 271 00:13:04,540 --> 00:13:07,940 So for height 3, I would have to do this. 272 00:13:07,940 --> 00:13:11,168 And then you're saying, what would I use at the left? 273 00:13:11,168 --> 00:13:16,148 AUDIENCE: You'd use whatever height's on the right minus 2. 274 00:13:16,148 --> 00:13:18,444 The height on the right minus 1 per side. 275 00:13:18,444 --> 00:13:20,360 PROFESSOR: So on the right, I have a height 2. 276 00:13:20,360 --> 00:13:23,000 On the left, I have a height 1. 277 00:13:23,000 --> 00:13:26,360 278 00:13:26,360 --> 00:13:29,940 If I want to go up to height 4, I do the same thing. 279 00:13:29,940 --> 00:13:33,470 So if I want to build an AVL tree 280 00:13:33,470 --> 00:13:36,890 with as few nodes as possible and height h, 281 00:13:36,890 --> 00:13:38,960 I start with the root, then at the right, 282 00:13:38,960 --> 00:13:43,350 I build an AVL tree of height h minus 1, and at the left, 283 00:13:43,350 --> 00:13:45,970 an AVL tree of height h minus 2. 284 00:13:45,970 --> 00:13:48,420 And if these have the minimum number of nodes, 285 00:13:48,420 --> 00:13:51,040 then it turns out that the whole thing 286 00:13:51,040 --> 00:13:52,487 has the minimum number of nodes. 287 00:13:52,487 --> 00:13:54,570 I don't want to build a tree where the heights are 288 00:13:54,570 --> 00:13:57,500 equal because that would mean more nodes here, 289 00:13:57,500 --> 00:13:59,040 so this is the best I can do. 290 00:13:59,040 --> 00:14:02,095 This is the best way I can build a tall tree 291 00:14:02,095 --> 00:14:03,345 with as few nodes as possible. 292 00:14:03,345 --> 00:14:10,450 293 00:14:10,450 --> 00:14:13,150 Suppose I want to write the number of nodes 294 00:14:13,150 --> 00:14:14,270 as a function of height. 295 00:14:14,270 --> 00:14:16,970 You're telling me what it is. 296 00:14:16,970 --> 00:14:21,420 When I was here, you were giving me the answer for this. 297 00:14:21,420 --> 00:14:22,940 So suppose I have a height, and I 298 00:14:22,940 --> 00:14:25,090 want to know how many nodes I have in my tree that 299 00:14:25,090 --> 00:14:26,370 has a minimum number of nodes. 300 00:14:26,370 --> 00:14:28,657 What is it? 301 00:14:28,657 --> 00:14:39,635 AUDIENCE: It's N h minus 1, and then plus N h minus 2. 302 00:14:39,635 --> 00:14:44,126 303 00:14:44,126 --> 00:14:45,649 There might be a constant. 304 00:14:45,649 --> 00:14:47,190 PROFESSOR: There might be a constant. 305 00:14:47,190 --> 00:14:50,950 306 00:14:50,950 --> 00:14:54,840 N h minus 1 is this tree, N h minus 2 is this tree, 307 00:14:54,840 --> 00:14:56,399 so what's the constant? 308 00:14:56,399 --> 00:14:57,315 AUDIENCE: [INAUDIBLE]. 309 00:14:57,315 --> 00:14:57,940 PROFESSOR: Yep. 310 00:14:57,940 --> 00:15:01,195 This guy. 311 00:15:01,195 --> 00:15:04,355 AUDIENCE: Doesn't he have height 0, though? 312 00:15:04,355 --> 00:15:06,119 AUDIENCE: We're talking number of nodes. 313 00:15:06,119 --> 00:15:06,660 AUDIENCE: Oh. 314 00:15:06,660 --> 00:15:07,888 Number of nodes, yeah. 315 00:15:07,888 --> 00:15:10,919 316 00:15:10,919 --> 00:15:12,710 PROFESSOR: I'm not going to solve this now. 317 00:15:12,710 --> 00:15:14,876 We learned how to solve recurrences a long time ago, 318 00:15:14,876 --> 00:15:16,879 so I'll pretend I still remember how to do that, 319 00:15:16,879 --> 00:15:18,670 and I will tell you that the solution looks 320 00:15:18,670 --> 00:15:20,480 something like this. 321 00:15:20,480 --> 00:15:25,080 N of h is roughly 5, which is the magic number that we talked 322 00:15:25,080 --> 00:15:27,760 about in lecture-- roughly, which 323 00:15:27,760 --> 00:15:34,880 means there might be some things there that I forgot-- to the h. 324 00:15:34,880 --> 00:15:38,280 What really matters is that it's an exponential in h. 325 00:15:38,280 --> 00:15:40,830 If you look at these guys, this is close to the Fibonacci 326 00:15:40,830 --> 00:15:44,720 number formula except there's a plus 1 here. 327 00:15:44,720 --> 00:15:48,350 So these guys are bigger than the Fibonacci numbers. 328 00:15:48,350 --> 00:15:50,670 This is definitely bigger than whatever 329 00:15:50,670 --> 00:15:52,045 the formula for Fibonacci numbers 330 00:15:52,045 --> 00:15:58,160 is, which is 5 to the h minus something over something. 331 00:15:58,160 --> 00:16:01,700 What really matters is the minimum number of nodes 332 00:16:01,700 --> 00:16:06,240 in an AVL of height h is an exponential as a function of h. 333 00:16:06,240 --> 00:16:10,110 If you invert that, you get that the maximum height 334 00:16:10,110 --> 00:16:20,160 for a tree with N nodes is log N. This is a way 335 00:16:20,160 --> 00:16:24,270 to construct an AVL that shows you that the height has 336 00:16:24,270 --> 00:16:27,400 to be logarithmic as long as we can keep this rep 337 00:16:27,400 --> 00:16:29,657 invariant true. 338 00:16:29,657 --> 00:16:31,094 AUDIENCE: I have a question. 339 00:16:31,094 --> 00:16:35,884 I didn't quite follow how you got h minus 2 and h minus 1 340 00:16:35,884 --> 00:16:37,330 there. 341 00:16:37,330 --> 00:16:39,020 PROFESSOR: Here, here, here? 342 00:16:39,020 --> 00:16:40,836 AUDIENCE: There. 343 00:16:40,836 --> 00:16:43,110 The tree down. 344 00:16:43,110 --> 00:16:43,930 This one? 345 00:16:43,930 --> 00:16:44,630 This one? 346 00:16:44,630 --> 00:16:45,780 AUDIENCE: This one, yeah. 347 00:16:45,780 --> 00:16:46,363 PROFESSOR: OK. 348 00:16:46,363 --> 00:16:49,030 So suppose I'm at height 4 here. 349 00:16:49,030 --> 00:16:53,780 350 00:16:53,780 --> 00:16:56,790 What's the best way to construct an AVL that 351 00:16:56,790 --> 00:16:59,720 has as few nodes as possible but height 4? 352 00:16:59,720 --> 00:17:01,990 If this guy has to be at height 4, 353 00:17:01,990 --> 00:17:06,180 then it has to have a child at height 3, at least one child. 354 00:17:06,180 --> 00:17:08,880 Otherwise, it's not going to be height 4. 355 00:17:08,880 --> 00:17:12,609 Now, I need to build something here that has height 3. 356 00:17:12,609 --> 00:17:14,000 What should I build? 357 00:17:14,000 --> 00:17:17,430 The best AVL tree that I know that has height 3, right? 358 00:17:17,430 --> 00:17:20,940 I have to get to height 3 using as few nodes as possible, 359 00:17:20,940 --> 00:17:23,490 so I'm going to use this guy because it has as few nodes as 360 00:17:23,490 --> 00:17:26,230 possible and it has height 3. 361 00:17:26,230 --> 00:17:29,990 So this covers my right side. 362 00:17:29,990 --> 00:17:35,430 Now, for my left side, what am I going to use here? 363 00:17:35,430 --> 00:17:38,990 Another AVL tree that has as few nodes as possible, right? 364 00:17:38,990 --> 00:17:41,810 So I'm definitely keeping that. 365 00:17:41,810 --> 00:17:45,628 But what's going to be the height of that? 366 00:17:45,628 --> 00:17:47,753 AUDIENCE: So we want the difference to be less than 367 00:17:47,753 --> 00:17:48,384 or equal to 1? 368 00:17:48,384 --> 00:17:49,050 PROFESSOR: Yeah. 369 00:17:49,050 --> 00:17:50,820 Otherwise, it's not an AVL. 370 00:17:50,820 --> 00:17:55,330 AUDIENCE: So it would be of height 4. 371 00:17:55,330 --> 00:17:58,690 PROFESSOR: So I know for sure that this guy has height 4. 372 00:17:58,690 --> 00:18:00,446 The difference has to be plus, minus 1. 373 00:18:00,446 --> 00:18:01,570 AUDIENCE: So it would be 2. 374 00:18:01,570 --> 00:18:03,590 PROFESSOR: 2, 3, 4. 375 00:18:03,590 --> 00:18:06,362 I know for sure I don't want it to be 4, 376 00:18:06,362 --> 00:18:08,070 and now I have to choose between 2 and 3. 377 00:18:08,070 --> 00:18:10,354 A tree of height 2 will have fewer nodes 378 00:18:10,354 --> 00:18:11,770 than a tree of height 3, so that's 379 00:18:11,770 --> 00:18:14,140 why we're doing it this way. 380 00:18:14,140 --> 00:18:17,870 So to build a tree of height 4, build a tree of height 3, 381 00:18:17,870 --> 00:18:20,280 build a tree of height 2, connect them together. 382 00:18:20,280 --> 00:18:24,000 That's how we get to this, and then this. 383 00:18:24,000 --> 00:18:24,500 Cool. 384 00:18:24,500 --> 00:18:27,220 385 00:18:27,220 --> 00:18:29,660 Do people remember how to do insertions and deletions 386 00:18:29,660 --> 00:18:30,920 in a regular binary tree? 387 00:18:30,920 --> 00:18:35,080 388 00:18:35,080 --> 00:18:35,580 Yes? 389 00:18:35,580 --> 00:18:40,420 390 00:18:40,420 --> 00:18:43,540 How do I insert 6.5 here in this one? 391 00:18:43,540 --> 00:18:47,360 392 00:18:47,360 --> 00:18:49,910 AUDIENCE: You take 6.5, you're like, oh, it's greater than 4, 393 00:18:49,910 --> 00:18:51,410 then you move to 5. 394 00:18:51,410 --> 00:18:53,410 Then you're like, oh, it's greater than 5. 395 00:18:53,410 --> 00:18:53,920 Go to 6. 396 00:18:53,920 --> 00:18:58,230 Oh, it's less than 6, and then it goes to the left of 6. 397 00:18:58,230 --> 00:18:59,586 I mean-- sorry. 398 00:18:59,586 --> 00:19:02,460 I meant left of 7. 399 00:19:02,460 --> 00:19:03,110 PROFESSOR: OK. 400 00:19:03,110 --> 00:19:04,788 So it's bigger than-- 401 00:19:04,788 --> 00:19:06,500 AUDIENCE: Testing you all, guys. 402 00:19:06,500 --> 00:19:08,920 Oh yeah. 403 00:19:08,920 --> 00:19:12,168 6.5 is bigger than 6, so then it goes to the left of 7 404 00:19:12,168 --> 00:19:13,581 because it's less than 7. 405 00:19:13,581 --> 00:19:18,000 406 00:19:18,000 --> 00:19:21,899 PROFESSOR: I thought I had my example wrong for a second. 407 00:19:21,899 --> 00:19:22,940 AUDIENCE: I'm just tired. 408 00:19:22,940 --> 00:19:27,060 PROFESSOR: Me too, so don't scare me. 409 00:19:27,060 --> 00:19:29,390 All right. 410 00:19:29,390 --> 00:19:32,740 So suppose we have heights stored in the nodes here, 411 00:19:32,740 --> 00:19:35,480 because we want to do that for AVLs. 412 00:19:35,480 --> 00:19:39,960 We'll figure out why in a bit. 413 00:19:39,960 --> 00:19:45,520 The height of this guy used to be 0, 1, 2, 3, right? 414 00:19:45,520 --> 00:19:50,649 0, 1, 2, 3. 415 00:19:50,649 --> 00:19:52,190 What happened when I added this node? 416 00:19:52,190 --> 00:19:54,820 417 00:19:54,820 --> 00:19:57,924 AUDIENCE: You added 1 to everything [INAUDIBLE]. 418 00:19:57,924 --> 00:19:58,590 PROFESSOR: Yeah. 419 00:19:58,590 --> 00:20:01,340 So I went down on my insertion path 420 00:20:01,340 --> 00:20:04,960 to find out where to insert a node, and then I added it. 421 00:20:04,960 --> 00:20:07,480 422 00:20:07,480 --> 00:20:09,080 I just chose my case conveniently, 423 00:20:09,080 --> 00:20:11,810 but in some cases, all the heights 424 00:20:11,810 --> 00:20:14,990 of the nodes on that path have changed. 425 00:20:14,990 --> 00:20:18,090 So in an AVL, after we insert, we 426 00:20:18,090 --> 00:20:19,720 have to make sure that the height 427 00:20:19,720 --> 00:20:23,000 of every node on the path is updated. 428 00:20:23,000 --> 00:20:24,600 Does this make sense? 429 00:20:24,600 --> 00:20:30,500 So the heights will be 1, 2, 3, 4. 430 00:20:30,500 --> 00:20:33,540 431 00:20:33,540 --> 00:20:35,440 So the way we implemented the AVLs is 432 00:20:35,440 --> 00:20:38,690 that we do regular insertions and deletions, 433 00:20:38,690 --> 00:20:42,360 and then at the end, we say, well it used to be an AVL. 434 00:20:42,360 --> 00:20:43,860 Now we added or removed the node, 435 00:20:43,860 --> 00:20:46,360 so it might be a slightly unbalanced AVL, 436 00:20:46,360 --> 00:20:49,340 which means it's not an AVL. 437 00:20:49,340 --> 00:20:51,320 And we have the rebalance procedure. 438 00:20:51,320 --> 00:20:56,042 So if you look at on the second page of the code, 439 00:20:56,042 --> 00:20:59,620 insert and delete are really tiny, lines 20 to 22 440 00:20:59,620 --> 00:21:01,380 and 24 to 28. 441 00:21:01,380 --> 00:21:03,980 And they're really tiny because they call the old code 442 00:21:03,980 --> 00:21:06,500 and then they call Rebalance. 443 00:21:06,500 --> 00:21:11,200 So all the magic in an AVL is in Rebalance. 444 00:21:11,200 --> 00:21:13,650 The first thing that Rebalance does, if you see, 445 00:21:13,650 --> 00:21:16,410 it has a while loop there, and the first thing that it does 446 00:21:16,410 --> 00:21:20,150 in the while loop is it updates the height, and this is why. 447 00:21:20,150 --> 00:21:22,540 The height might have changed after the insert, 448 00:21:22,540 --> 00:21:25,210 so any decision based on the old height is bad. 449 00:21:25,210 --> 00:21:28,391 450 00:21:28,391 --> 00:21:29,640 That's why we have that there. 451 00:21:29,640 --> 00:21:32,900 452 00:21:32,900 --> 00:21:33,780 Make sense so far? 453 00:21:33,780 --> 00:21:38,610 454 00:21:38,610 --> 00:21:42,440 So if you look at rebalance, don't try to understand it 455 00:21:42,440 --> 00:21:44,530 quite yet, but what it does is it 456 00:21:44,530 --> 00:21:47,307 calls Rotate Left and Rotate Right. 457 00:21:47,307 --> 00:21:48,640 Well, Left Rotate, Right Rotate. 458 00:21:48,640 --> 00:21:51,190 459 00:21:51,190 --> 00:21:53,980 All the magic is in Rebalance, and the tools that it uses 460 00:21:53,980 --> 00:21:56,930 are Left Rotate and Right Rotate. 461 00:21:56,930 --> 00:21:59,800 Now, I'm going to show you what a rotation is supposed 462 00:21:59,800 --> 00:22:16,950 to do in-- and the children of these nodes are a, b, c, d. 463 00:22:16,950 --> 00:22:23,860 Also, this node is hanging off of something here, e. 464 00:22:23,860 --> 00:22:26,630 If I want to do a Right Rotate here, 465 00:22:26,630 --> 00:22:33,360 so if I want to rotate the tree like this, then after rotating, 466 00:22:33,360 --> 00:22:34,806 it's going to look like this. 467 00:22:34,806 --> 00:22:51,860 468 00:22:51,860 --> 00:22:57,240 So notice that c got moved from B to A, but it got 469 00:22:57,240 --> 00:23:00,320 moved in such a way that the whole thing is still a BST. 470 00:23:00,320 --> 00:23:03,050 These guys show up in the same order as children, 471 00:23:03,050 --> 00:23:05,700 and these guys show up in the same order, 472 00:23:05,700 --> 00:23:08,404 so search will still work. 473 00:23:08,404 --> 00:23:10,570 This is how it's supposed to look like as a picture. 474 00:23:10,570 --> 00:23:13,830 Let's try to write the pseudocode for achieving this. 475 00:23:13,830 --> 00:23:17,450 476 00:23:17,450 --> 00:23:21,160 AUDIENCE: First, you identify the parent node. 477 00:23:21,160 --> 00:23:23,935 Step one. 478 00:23:23,935 --> 00:23:26,320 PROFESSOR: Well, let's say we have it. 479 00:23:26,320 --> 00:23:29,790 AUDIENCE: I mean from B. You say, A is my parent. 480 00:23:29,790 --> 00:23:32,477 That's now going to become my right child. 481 00:23:32,477 --> 00:23:33,060 PROFESSOR: OK. 482 00:23:33,060 --> 00:23:38,060 483 00:23:38,060 --> 00:23:40,740 So what do you want to change? 484 00:23:40,740 --> 00:23:44,400 By the way, this whole thing has to happen in constant time, 485 00:23:44,400 --> 00:23:47,150 so we're not allowed to go anywhere inside here, here, 486 00:23:47,150 --> 00:23:48,370 here, or here. 487 00:23:48,370 --> 00:23:49,870 We're allowed to change these links, 488 00:23:49,870 --> 00:23:51,510 but if you go inside and try to do 489 00:23:51,510 --> 00:23:53,500 more complicated restructuring, that's 490 00:23:53,500 --> 00:23:55,297 going to block the running time. 491 00:23:55,297 --> 00:23:56,880 We're only allowed to change the links 492 00:23:56,880 --> 00:23:58,570 that you see on the board. 493 00:23:58,570 --> 00:24:04,150 AUDIENCE: I know in the BST, not BST node but BST, the delete, 494 00:24:04,150 --> 00:24:08,050 if you're deleting the root, they make up the pseudoroot. 495 00:24:08,050 --> 00:24:10,440 PROFESSOR: Let's not worry about it. 496 00:24:10,440 --> 00:24:12,970 So Delete already did that magic for you, right? 497 00:24:12,970 --> 00:24:16,221 So the pseudoroot would be here, so this node has a root. 498 00:24:16,221 --> 00:24:16,720 We're happy. 499 00:24:16,720 --> 00:24:18,553 AUDIENCE: That's not what I'm talking about. 500 00:24:18,553 --> 00:24:21,720 I'm saying that you could do the same with B because you're 501 00:24:21,720 --> 00:24:25,100 going to have to break a link with lowercase c onto B 502 00:24:25,100 --> 00:24:28,520 in order to flip it. 503 00:24:28,520 --> 00:24:32,510 So I'm saying you could have a placer, some sort of place 504 00:24:32,510 --> 00:24:35,607 to put it so you don't lose it. 505 00:24:35,607 --> 00:24:36,190 PROFESSOR: OK. 506 00:24:36,190 --> 00:24:40,330 Let's see if we do that. 507 00:24:40,330 --> 00:24:44,650 AUDIENCE: The new right child of B will be A, 508 00:24:44,650 --> 00:24:50,840 and the new left child of A will be C, and the new parent of c 509 00:24:50,840 --> 00:24:51,750 is A. 510 00:24:51,750 --> 00:24:54,145 PROFESSOR: OK, let's move slower. 511 00:24:54,145 --> 00:24:55,660 I have to write them down. 512 00:24:55,660 --> 00:24:59,131 So the new right child of B is? 513 00:24:59,131 --> 00:25:03,690 AUDIENCE: Is A. Also, you probably 514 00:25:03,690 --> 00:25:05,601 should do temp variables to store them. 515 00:25:05,601 --> 00:25:07,100 AUDIENCE: I don't think you have to. 516 00:25:07,100 --> 00:25:12,432 You can just swap the one connection and swap the other. 517 00:25:12,432 --> 00:25:15,064 518 00:25:15,064 --> 00:25:17,230 PROFESSOR: Let's do it without worrying about temps, 519 00:25:17,230 --> 00:25:19,150 and then we can figure out temps later. 520 00:25:19,150 --> 00:25:26,000 So B's right becomes A. So it's going to be like this. 521 00:25:26,000 --> 00:25:30,674 522 00:25:30,674 --> 00:25:32,090 Let me erase this confusing error. 523 00:25:32,090 --> 00:25:36,560 524 00:25:36,560 --> 00:25:38,785 AUDIENCE: And A's left child is c. 525 00:25:38,785 --> 00:25:45,420 526 00:25:45,420 --> 00:25:52,250 PROFESSOR: A's left child is c. 527 00:25:52,250 --> 00:25:54,150 OK. 528 00:25:54,150 --> 00:25:56,790 AUDIENCE: C's parent is A. 529 00:25:56,790 --> 00:26:02,980 PROFESSOR: C's parent is A. OK, very good. 530 00:26:02,980 --> 00:26:05,104 So I changed the child, then I changed the parent 531 00:26:05,104 --> 00:26:06,270 so that they would match up. 532 00:26:06,270 --> 00:26:10,160 533 00:26:10,160 --> 00:26:13,285 So if B's right is A, then what should I do with-- 534 00:26:13,285 --> 00:26:19,264 AUDIENCE: A's parent is B. 535 00:26:19,264 --> 00:26:20,680 PROFESSOR: Always do them in pairs 536 00:26:20,680 --> 00:26:23,790 so you don't lose track of them. 537 00:26:23,790 --> 00:26:24,516 And? 538 00:26:24,516 --> 00:26:29,592 AUDIENCE: B's parent is e and e's subchild is B. 539 00:26:29,592 --> 00:26:30,300 PROFESSOR: Sorry. 540 00:26:30,300 --> 00:26:32,606 B's parent is e, right? 541 00:26:32,606 --> 00:26:33,230 AUDIENCE: Yeah. 542 00:26:33,230 --> 00:26:36,492 543 00:26:36,492 --> 00:26:40,170 PROFESSOR: So B's parent is e, and-- 544 00:26:40,170 --> 00:26:42,090 AUDIENCE: e's left. 545 00:26:42,090 --> 00:26:43,470 PROFESSOR: Child is-- 546 00:26:43,470 --> 00:26:44,390 AUDIENCE: B. 547 00:26:44,390 --> 00:26:46,796 PROFESSOR: Because I drew it like this, right? 548 00:26:46,796 --> 00:26:49,940 But if I drew it like this? 549 00:26:49,940 --> 00:26:52,150 AUDIENCE: e's-- 550 00:26:52,150 --> 00:26:54,330 PROFESSOR: It would have to be the right child, 551 00:26:54,330 --> 00:26:57,931 so we have to look at both cases. 552 00:26:57,931 --> 00:26:59,879 AUDIENCE: If B is greater than e, 553 00:26:59,879 --> 00:27:04,749 then it would be right child, and if B is less than e, 554 00:27:04,749 --> 00:27:05,977 it would be left child. 555 00:27:05,977 --> 00:27:06,560 PROFESSOR: OK. 556 00:27:06,560 --> 00:27:18,540 So if B is greater than e, then e's right is B. 557 00:27:18,540 --> 00:27:23,680 Otherwise, e's left is B, right? 558 00:27:23,680 --> 00:27:25,540 Now, suppose comparisons are expensive 559 00:27:25,540 --> 00:27:27,874 and they don't want to do a comparison to find this out. 560 00:27:27,874 --> 00:27:29,415 I want to play with the tree instead. 561 00:27:29,415 --> 00:27:29,990 What do I do? 562 00:27:29,990 --> 00:27:37,372 563 00:27:37,372 --> 00:27:41,880 AUDIENCE: Can you see which one A was before that? 564 00:27:41,880 --> 00:27:43,670 PROFESSOR: Yep. 565 00:27:43,670 --> 00:27:45,885 AUDIENCE: If A used to be the right child, 566 00:27:45,885 --> 00:27:47,100 now B is the right child. 567 00:27:47,100 --> 00:27:47,766 PROFESSOR: Yeah. 568 00:27:47,766 --> 00:27:51,150 I haven't changed the child yet, so I can do that. 569 00:27:51,150 --> 00:27:54,355 AUDIENCE: That's still a comparison though, right? 570 00:27:54,355 --> 00:27:56,730 PROFESSOR: But now I'm doing a pointer comparison and not 571 00:27:56,730 --> 00:27:59,510 a key comparison. 572 00:27:59,510 --> 00:28:02,270 AUDIENCE: If e.r is A, then-- 573 00:28:02,270 --> 00:28:08,270 PROFESSOR: If e.right is A, then it becomes B. Otherwise. 574 00:28:08,270 --> 00:28:11,060 575 00:28:11,060 --> 00:28:13,707 OK, this looks good. 576 00:28:13,707 --> 00:28:15,290 So there's the issue of temp variables 577 00:28:15,290 --> 00:28:16,969 and assigning these in the right order 578 00:28:16,969 --> 00:28:19,010 so you don't have too many temp variables and too 579 00:28:19,010 --> 00:28:22,310 many lines of code, and the Python code in the handout 580 00:28:22,310 --> 00:28:23,650 takes care of that. 581 00:28:23,650 --> 00:28:25,380 But this is the logic that you want. 582 00:28:25,380 --> 00:28:28,890 So if you have to write it from scratch, 583 00:28:28,890 --> 00:28:30,270 you don't have to memorize that. 584 00:28:30,270 --> 00:28:32,760 Remember that you want to get from here to here, 585 00:28:32,760 --> 00:28:37,210 and do exactly the thought process that we have here. 586 00:28:37,210 --> 00:28:42,060 What if I want to do a left rotation instead 587 00:28:42,060 --> 00:28:45,450 of a right rotation? 588 00:28:45,450 --> 00:28:49,136 AUDIENCE: You just have to change the r's to l's. 589 00:28:49,136 --> 00:28:50,010 PROFESSOR: Very good. 590 00:28:50,010 --> 00:28:51,090 Copy, paste. 591 00:28:51,090 --> 00:28:52,770 Swap l's and r's and we're done. 592 00:28:52,770 --> 00:28:57,060 593 00:28:57,060 --> 00:28:58,120 Why do we need rotations? 594 00:28:58,120 --> 00:29:02,902 595 00:29:02,902 --> 00:29:04,110 AUDIENCE: To rebalance stuff. 596 00:29:04,110 --> 00:29:05,200 PROFESSOR: To rebalance stuff. 597 00:29:05,200 --> 00:29:05,700 OK. 598 00:29:05,700 --> 00:29:08,069 Why do we rebalance stuff? 599 00:29:08,069 --> 00:29:10,001 AUDIENCE: Because you don't want your code 600 00:29:10,001 --> 00:29:12,635 to crash when you add nodes that are sequentially larger, 601 00:29:12,635 --> 00:29:16,042 and then you try to find something, and then it crashes. 602 00:29:16,042 --> 00:29:16,625 PROFESSOR: OK. 603 00:29:16,625 --> 00:29:17,742 Why would it crash? 604 00:29:17,742 --> 00:29:19,283 AUDIENCE: Because the recursion depth 605 00:29:19,283 --> 00:29:21,602 is exceeded because you're going down 606 00:29:21,602 --> 00:29:23,040 the line trying to find something, 607 00:29:23,040 --> 00:29:24,517 and you go down too far. 608 00:29:24,517 --> 00:29:25,100 PROFESSOR: OK. 609 00:29:25,100 --> 00:29:27,650 So pretending we don't have a recursion depth issue, 610 00:29:27,650 --> 00:29:30,670 then it's going to be slow. 611 00:29:30,670 --> 00:29:33,990 So you start from a nice AVL, and if you don't rebalance, 612 00:29:33,990 --> 00:29:37,430 you get to a BST that's slow, slow, slow, 613 00:29:37,430 --> 00:29:40,240 and then you'll fail our tests and still get a 0. 614 00:29:40,240 --> 00:29:40,740 Yes? 615 00:29:40,740 --> 00:29:47,560 AUDIENCE: But if you just had a carrot-like tree, 616 00:29:47,560 --> 00:29:50,470 or if you added, for instance, 4, and then you added in 5, 617 00:29:50,470 --> 00:29:52,960 and then you added in 3, and then you added in 6, 618 00:29:52,960 --> 00:29:57,300 and then you added in 2, you'd just get a carrot. 619 00:29:57,300 --> 00:30:02,930 So then I feel like AVL wouldn't cover that case. 620 00:30:02,930 --> 00:30:04,970 PROFESSOR: Let's do them in sequence. 621 00:30:04,970 --> 00:30:09,500 622 00:30:09,500 --> 00:30:12,047 So what are we inserting? 623 00:30:12,047 --> 00:30:12,630 So you said 4. 624 00:30:12,630 --> 00:30:15,450 AUDIENCE: So you start with 4, and then you insert 5, 625 00:30:15,450 --> 00:30:18,760 then you insert 3. 626 00:30:18,760 --> 00:30:19,700 PROFESSOR: Let's see. 627 00:30:19,700 --> 00:30:21,402 4, 3. 628 00:30:21,402 --> 00:30:28,530 AUDIENCE: Then you insert 6, then you insert 2, 629 00:30:28,530 --> 00:30:34,150 then you insert 7, and then 1. 630 00:30:34,150 --> 00:30:35,650 AUDIENCE: You've got to rotate that. 631 00:30:35,650 --> 00:30:36,970 AUDIENCE: Well, the thing is-- 632 00:30:36,970 --> 00:30:38,515 PROFESSOR: Well, is this an AVL? 633 00:30:38,515 --> 00:30:40,032 PROFESSOR: Well, it's balanced. 634 00:30:40,032 --> 00:30:40,740 PROFESSOR: Is it? 635 00:30:40,740 --> 00:30:42,454 AUDIENCE: No, it's not. 636 00:30:42,454 --> 00:30:43,390 5 is unhappy. 637 00:30:43,390 --> 00:30:46,200 638 00:30:46,200 --> 00:30:49,140 AUDIENCE: That height of the tree is only one 639 00:30:49,140 --> 00:30:51,270 greater than the other height. 640 00:30:51,270 --> 00:30:53,015 PROFESSOR: So the height here is 1. 641 00:30:53,015 --> 00:30:56,442 What's the height here? 642 00:30:56,442 --> 00:30:58,400 There's nothing here, so the height is minus 1. 643 00:30:58,400 --> 00:31:02,200 644 00:31:02,200 --> 00:31:06,060 AUDIENCE: I mean, but the other side of the tree. 645 00:31:06,060 --> 00:31:08,820 PROFESSOR: So if you're looking at this guy, 646 00:31:08,820 --> 00:31:11,510 things look balanced, but in an AVL, 647 00:31:11,510 --> 00:31:14,572 this has to hold for every node. 648 00:31:14,572 --> 00:31:16,780 If there's one node where the heights are unbalanced, 649 00:31:16,780 --> 00:31:18,434 the whole thing is unbalanced. 650 00:31:18,434 --> 00:31:20,350 Otherwise, the construction that we did before 651 00:31:20,350 --> 00:31:22,070 wouldn't make sense. 652 00:31:22,070 --> 00:31:24,320 AUDIENCE: It's all [INAUDIBLE] within the [INAUDIBLE]. 653 00:31:24,320 --> 00:31:25,220 PROFESSOR: Yep. 654 00:31:25,220 --> 00:31:27,820 So now that we're going to look at rebalancing, which 655 00:31:27,820 --> 00:31:29,350 is the magic behind AVLs, we have 656 00:31:29,350 --> 00:31:32,920 to make sure that after we start with something that's 657 00:31:32,920 --> 00:31:35,329 slightly imbalanced, when we rotate things around, 658 00:31:35,329 --> 00:31:37,412 we have to make sure that everything gets balanced 659 00:31:37,412 --> 00:31:42,320 at the end and happy, so that's a good thing to keep in mind. 660 00:31:42,320 --> 00:31:42,820 Good. 661 00:31:42,820 --> 00:31:43,730 Any other questions? 662 00:31:43,730 --> 00:31:47,880 663 00:31:47,880 --> 00:31:50,254 AUDIENCE: Is it obvious that you can 664 00:31:50,254 --> 00:31:53,040 put any number of nodes in an AVL tree? 665 00:31:53,040 --> 00:31:54,700 PROFESSOR: Point any number of nodes? 666 00:31:54,700 --> 00:31:56,360 AUDIENCE: Yeah. 667 00:31:56,360 --> 00:31:59,513 It's obvious that you can do one node, or two nodes or three 668 00:31:59,513 --> 00:32:05,020 nodes, but is it true that for any number of nodes, 669 00:32:05,020 --> 00:32:07,030 you can arrange into an AVL tree? 670 00:32:07,030 --> 00:32:08,644 Does that make sense? 671 00:32:08,644 --> 00:32:09,310 PROFESSOR: Yeah. 672 00:32:09,310 --> 00:32:13,030 So if you want to have any number of nodes, 673 00:32:13,030 --> 00:32:16,142 you call Insert, AVL Insert, and then you'll get an AVL. 674 00:32:16,142 --> 00:32:16,808 AUDIENCE: Right. 675 00:32:16,808 --> 00:32:19,100 But what I'm trying to say is, is it 676 00:32:19,100 --> 00:32:23,230 possible to construct an AVL tree out of 13 nodes? 677 00:32:23,230 --> 00:32:24,230 PROFESSOR: Sure. 678 00:32:24,230 --> 00:32:26,158 AUDIENCE: Or 17 nodes? 679 00:32:26,158 --> 00:32:28,616 AUDIENCE: Is there a limit for where that property will not 680 00:32:28,616 --> 00:32:29,115 fit? 681 00:32:29,115 --> 00:32:29,830 AUDIENCE: Right. 682 00:32:29,830 --> 00:32:31,587 Is there some set of nodes-- 683 00:32:31,587 --> 00:32:32,920 PROFESSOR: I like this question. 684 00:32:32,920 --> 00:32:34,160 I like this question. 685 00:32:34,160 --> 00:32:39,720 So what would be the perfect binary search tree? 686 00:32:39,720 --> 00:32:42,684 687 00:32:42,684 --> 00:32:44,999 AUDIENCE: An element of log N height. 688 00:32:44,999 --> 00:32:46,540 PROFESSOR: An element of log N height 689 00:32:46,540 --> 00:32:54,200 and the complete tree, so something that looks like this. 690 00:32:54,200 --> 00:33:00,640 691 00:33:00,640 --> 00:33:04,665 Where did we see this thing before? 692 00:33:04,665 --> 00:33:05,595 AUDIENCE: In the heap. 693 00:33:05,595 --> 00:33:06,470 PROFESSOR: All right. 694 00:33:06,470 --> 00:33:08,650 So a heap looks exactly like this, 695 00:33:08,650 --> 00:33:13,620 except the values inside don't fulfill the BST requirement. 696 00:33:13,620 --> 00:33:15,675 Otherwise, we'd know how to build perfect BSTs. 697 00:33:15,675 --> 00:33:17,780 It so happens that we don't. 698 00:33:17,780 --> 00:33:19,350 But is this an AVL? 699 00:33:19,350 --> 00:33:25,895 700 00:33:25,895 --> 00:33:26,478 AUDIENCE: Yes. 701 00:33:26,478 --> 00:33:31,835 702 00:33:31,835 --> 00:33:34,082 AUDIENCE: That example or in general? 703 00:33:34,082 --> 00:33:35,790 PROFESSOR: Let's start with that example. 704 00:33:35,790 --> 00:33:36,415 Is this an AVL? 705 00:33:36,415 --> 00:33:41,290 706 00:33:41,290 --> 00:33:42,360 AUDIENCE: No. 707 00:33:42,360 --> 00:33:45,600 The node just to the left of the root-- 708 00:33:45,600 --> 00:33:47,250 PROFESSOR: This guy? 709 00:33:47,250 --> 00:33:49,130 AUDIENCE: No. 710 00:33:49,130 --> 00:33:52,032 That one's of height two, so it has height one. 711 00:33:52,032 --> 00:33:53,240 PROFESSOR: This is beautiful. 712 00:33:53,240 --> 00:33:54,890 This is as good as it could get. 713 00:33:54,890 --> 00:33:56,740 This is an optimal binary search tree. 714 00:33:56,740 --> 00:33:59,190 This is perfect. 715 00:33:59,190 --> 00:34:01,100 It better match the definition of AVL, 716 00:34:01,100 --> 00:34:04,270 because otherwise, it would mean AVL doesn't like perfect trees. 717 00:34:04,270 --> 00:34:06,830 So the good news is that any complete tree 718 00:34:06,830 --> 00:34:10,159 is going to be an AVL because everything 719 00:34:10,159 --> 00:34:14,989 is perfectly balanced or almost perfectly balanced. 720 00:34:14,989 --> 00:34:18,690 The only thing that's not complete is the last level. 721 00:34:18,690 --> 00:34:21,790 So all the paths from the roots to the last level 722 00:34:21,790 --> 00:34:25,810 are either height log N or height log N minus 1. 723 00:34:25,810 --> 00:34:28,810 So wherever you do the height comparison, 724 00:34:28,810 --> 00:34:31,219 you're going to get a difference of 1, 725 00:34:31,219 --> 00:34:35,739 and we can keep adding nodes to this. 726 00:34:35,739 --> 00:34:39,179 This is how I'd build a BST of as many nodes as we want. 727 00:34:39,179 --> 00:34:41,489 AUDIENCE: I have a question. 728 00:34:41,489 --> 00:34:46,600 For the fixed heights, why we try to minimize the nodes? 729 00:34:46,600 --> 00:34:51,079 Wouldn't a better way to build a more efficient tree 730 00:34:51,079 --> 00:34:55,120 is to try to minimize the height? 731 00:34:55,120 --> 00:35:00,205 So instead of adding nodes up, just fill in the tree? 732 00:35:00,205 --> 00:35:01,080 PROFESSOR: Very good. 733 00:35:01,080 --> 00:35:04,080 So you're thinking of how to build an efficient tree, how 734 00:35:04,080 --> 00:35:05,870 to build a good tree. 735 00:35:05,870 --> 00:35:08,760 Here, I'm trying to prove the property 736 00:35:08,760 --> 00:35:11,480 about the maximum height of a tree. 737 00:35:11,480 --> 00:35:13,520 So here I'm playing devil's advocate. 738 00:35:13,520 --> 00:35:15,980 I'm thinking, what is the worst case for AVLs? 739 00:35:15,980 --> 00:35:18,720 How do I make AVLs look as bad as possible? 740 00:35:18,720 --> 00:35:20,840 That's why I started this way. 741 00:35:20,840 --> 00:35:22,641 So usually, we're the good guy coding, 742 00:35:22,641 --> 00:35:24,140 but after you're done coding and you 743 00:35:24,140 --> 00:35:27,050 know how your algorithm runs, if you want to have peace of mind 744 00:35:27,050 --> 00:35:30,200 and go to sleep well and get full points afterwards, 745 00:35:30,200 --> 00:35:32,300 it sometimes helps to think as devil's advocate. 746 00:35:32,300 --> 00:35:34,200 How would I break this algorithm? 747 00:35:34,200 --> 00:35:36,410 What's the worst thing that I could do to it? 748 00:35:36,410 --> 00:35:38,680 And if you can find something that breaks it, 749 00:35:38,680 --> 00:35:41,560 well, you probably know how to fix it. 750 00:35:41,560 --> 00:35:43,890 If not, you can sleep well at night 751 00:35:43,890 --> 00:35:45,776 and know that your algorithm is sound. 752 00:35:45,776 --> 00:35:49,350 753 00:35:49,350 --> 00:35:55,216 AUDIENCE: In that example on the board, where we said 754 00:35:55,216 --> 00:36:00,250 the height of the tree on the left side was h minus 1, 755 00:36:00,250 --> 00:36:03,220 and then the other side had the h minus 2. 756 00:36:03,220 --> 00:36:05,990 757 00:36:05,990 --> 00:36:11,920 For that h minus 2 tree, every node in that 758 00:36:11,920 --> 00:36:16,107 has to be h minus 2 height, every leaf. 759 00:36:16,107 --> 00:36:17,190 Is that what we're saying? 760 00:36:17,190 --> 00:36:19,810 Otherwise, it falls apart because there would be one 761 00:36:19,810 --> 00:36:20,310 that's-- 762 00:36:20,310 --> 00:36:22,518 PROFESSOR: So we're saying that the root of this tree 763 00:36:22,518 --> 00:36:24,900 has to be h minus 2. 764 00:36:24,900 --> 00:36:27,970 And the way we do this is we copy this guy over, 765 00:36:27,970 --> 00:36:30,740 so this guy's lopsided, too. 766 00:36:30,740 --> 00:36:34,105 So it's not every path to the bottom, just one path. 767 00:36:34,105 --> 00:36:34,646 AUDIENCE: OK. 768 00:36:34,646 --> 00:36:37,001 So it could be an incomplete. 769 00:36:37,001 --> 00:36:37,960 PROFESSOR: Yep. 770 00:36:37,960 --> 00:36:43,020 In that case, is that balanced? 771 00:36:43,020 --> 00:36:48,310 PROFESSOR: So this was an AVL of height 2, right? 772 00:36:48,310 --> 00:36:51,530 If I stick it here, it's still going to be an AVL of height 2. 773 00:36:51,530 --> 00:36:53,432 AUDIENCE: Right. 774 00:36:53,432 --> 00:36:55,140 PROFESSOR: This thing will have height 2. 775 00:36:55,140 --> 00:36:58,518 This thing will have height 3. 776 00:36:58,518 --> 00:37:00,446 AUDIENCE: OK. 777 00:37:00,446 --> 00:37:02,743 So it's not everything on the same level, 778 00:37:02,743 --> 00:37:04,784 necessarily, that needs to be of the same height. 779 00:37:04,784 --> 00:37:06,230 It's only the children. 780 00:37:06,230 --> 00:37:07,210 PROFESSOR: Yep. 781 00:37:07,210 --> 00:37:10,340 So the two children, so two nodes on the same level, 782 00:37:10,340 --> 00:37:14,137 might have their heights differ by one, but not more than one. 783 00:37:14,137 --> 00:37:15,720 If they could differ by more than one, 784 00:37:15,720 --> 00:37:17,480 you could have a link list. 785 00:37:17,480 --> 00:37:20,740 Because they differ by one, then it's sort of sane. 786 00:37:20,740 --> 00:37:24,190 It's almost balanced. 787 00:37:24,190 --> 00:37:25,110 Good. 788 00:37:25,110 --> 00:37:26,130 I like the questions. 789 00:37:26,130 --> 00:37:27,820 It means you guys are thinking. 790 00:37:27,820 --> 00:37:28,903 I really like it. 791 00:37:28,903 --> 00:37:30,045 Yeah? 792 00:37:30,045 --> 00:37:31,550 AUDIENCE: If they differ by one. 793 00:37:31,550 --> 00:37:34,280 So you have an ideal tree up there, 794 00:37:34,280 --> 00:37:36,200 and then you have the worst case possible. 795 00:37:36,200 --> 00:37:39,080 That wouldn't affect performance at all, right, 796 00:37:39,080 --> 00:37:40,520 if you have the worst case? 797 00:37:40,520 --> 00:37:42,600 PROFESSOR: So what we know what performance is we 798 00:37:42,600 --> 00:37:45,090 have this guarantee. 799 00:37:45,090 --> 00:37:48,640 It's at worst a constant times log n. 800 00:37:48,640 --> 00:37:50,240 This is constant times log n. 801 00:37:50,240 --> 00:37:52,530 The constant happens to be 1. 802 00:37:52,530 --> 00:37:54,890 This is a constant times log n. 803 00:37:54,890 --> 00:37:58,515 The constant happens to be something bigger than 1. 804 00:37:58,515 --> 00:38:00,692 I think it's somewhere between 2 and 3. 805 00:38:00,692 --> 00:38:02,952 AUDIENCE: That just varies by the constant. 806 00:38:02,952 --> 00:38:03,856 PROFESSOR: Yeah. 807 00:38:03,856 --> 00:38:06,525 And since we only care about asymptotics in this class, 808 00:38:06,525 --> 00:38:08,020 we're happy. 809 00:38:08,020 --> 00:38:10,760 But we don't know how to build this. 810 00:38:10,760 --> 00:38:13,210 This is just something pretty that we draw on a board, 811 00:38:13,210 --> 00:38:15,940 but we don't have an algorithm that efficiency builds 812 00:38:15,940 --> 00:38:21,380 this out of a random series of insertions and deletions. 813 00:38:21,380 --> 00:38:23,740 I hope we don't. 814 00:38:23,740 --> 00:38:26,090 Otherwise, I look bad. 815 00:38:26,090 --> 00:38:27,360 What do I want to delete? 816 00:38:27,360 --> 00:38:32,220 817 00:38:32,220 --> 00:38:34,164 AUDIENCE: Instead of a binary search 818 00:38:34,164 --> 00:38:35,870 tree as the base of the AVL, if you 819 00:38:35,870 --> 00:38:39,785 wanted to have four children per node, 820 00:38:39,785 --> 00:38:42,470 would this change that much except you'd 821 00:38:42,470 --> 00:38:46,606 just have twice as many variables? 822 00:38:46,606 --> 00:38:48,230 PROFESSOR: That's a good exam question. 823 00:38:48,230 --> 00:38:53,510 824 00:38:53,510 --> 00:38:56,270 So it turns out that there's this tree called 825 00:38:56,270 --> 00:39:01,490 a B tree, which has 1,000 or more nodes. 826 00:39:01,490 --> 00:39:02,486 AUDIENCE: B tree? 827 00:39:02,486 --> 00:39:03,897 Doesn't that stand for binary? 828 00:39:03,897 --> 00:39:04,480 PROFESSOR: No. 829 00:39:04,480 --> 00:39:10,972 It's B. Just B. That's used for databases on disk. 830 00:39:10,972 --> 00:39:13,430 So there, you want to make the height as small as possible, 831 00:39:13,430 --> 00:39:15,763 because every time you get a node, you do a disk access, 832 00:39:15,763 --> 00:39:17,140 and that's expensive. 833 00:39:17,140 --> 00:39:18,970 But once you do a disk access, the nodes 834 00:39:18,970 --> 00:39:22,430 can be as big as you want. 835 00:39:22,430 --> 00:39:25,730 You can read a few bytes off the disk at roughly the same cost 836 00:39:25,730 --> 00:39:27,870 as you can read a megabyte, so you're better off 837 00:39:27,870 --> 00:39:28,859 reading a megabyte. 838 00:39:28,859 --> 00:39:31,150 I might be exaggerating a bit, but for a few kilobytes, 839 00:39:31,150 --> 00:39:32,090 that's true. 840 00:39:32,090 --> 00:39:34,560 So B trees have thousands of children, 841 00:39:34,560 --> 00:39:35,880 and they keep them sorted. 842 00:39:35,880 --> 00:39:39,630 It's sort of like that, but the fan out is huge. 843 00:39:39,630 --> 00:39:41,570 And it turns out everything is still log n 844 00:39:41,570 --> 00:39:44,256 as long as the number of children you have is constant. 845 00:39:44,256 --> 00:39:46,434 AUDIENCE: But for this rotation thing, though. 846 00:39:46,434 --> 00:39:47,350 PROFESSOR: Oh god, no. 847 00:39:47,350 --> 00:39:48,820 We don't have to rotate. 848 00:39:48,820 --> 00:39:51,640 It gets a lot more complicated. 849 00:39:51,640 --> 00:39:54,475 We haven't gotten to rebalancing yet, right? 850 00:39:54,475 --> 00:39:56,350 Wait until you see how rebalancing looks like 851 00:39:56,350 --> 00:39:57,920 with two children, and then imagine 852 00:39:57,920 --> 00:40:01,920 trying to figure out the cases for 1,000 children. 853 00:40:01,920 --> 00:40:03,560 That's not sane. 854 00:40:03,560 --> 00:40:05,450 In B trees, they use something else 855 00:40:05,450 --> 00:40:07,800 to make them balance the right way. 856 00:40:07,800 --> 00:40:11,710 And of course, they're harder than this. 857 00:40:11,710 --> 00:40:15,910 So let me try to get through rebalancing somewhat quickly. 858 00:40:15,910 --> 00:40:25,610 859 00:40:25,610 --> 00:40:29,080 This is where I forget what I need to do. 860 00:40:29,080 --> 00:40:29,580 Rebalancing. 861 00:40:29,580 --> 00:41:12,980 862 00:41:12,980 --> 00:41:27,450 So suppose we call Rebalance on this guy, 863 00:41:27,450 --> 00:41:30,930 and we know that the nodes here have heights k minus 1, 864 00:41:30,930 --> 00:41:33,600 this is k minus 1 or k, and this is k, 865 00:41:33,600 --> 00:41:36,730 and I want to call Rebalance here. 866 00:41:36,730 --> 00:41:39,040 Let's first figure out if it's an AVL tree 867 00:41:39,040 --> 00:41:41,030 or if there's something wrong with it. 868 00:41:41,030 --> 00:41:44,675 What's the height here? 869 00:41:44,675 --> 00:41:47,221 AUDIENCE: k plus 1. 870 00:41:47,221 --> 00:41:49,220 PROFESSOR: So no matter what the height is here, 871 00:41:49,220 --> 00:41:51,400 the height here has to be k plus 1. 872 00:41:51,400 --> 00:41:53,736 What's the height here? 873 00:41:53,736 --> 00:41:56,570 AUDIENCE: k plus 2. 874 00:41:56,570 --> 00:41:57,610 PROFESSOR: OK. 875 00:41:57,610 --> 00:41:59,780 What else can you tell me about this node? 876 00:41:59,780 --> 00:42:05,160 877 00:42:05,160 --> 00:42:06,980 If I call Check RI on it, it's going 878 00:42:06,980 --> 00:42:10,440 to crash because the rep invariant for AVLs 879 00:42:10,440 --> 00:42:11,970 does not hold here. 880 00:42:11,970 --> 00:42:13,900 This child at the top of this tree 881 00:42:13,900 --> 00:42:15,430 will have height k minus 1. 882 00:42:15,430 --> 00:42:17,250 This will have height k plus 1. 883 00:42:17,250 --> 00:42:18,980 The difference is more than 1. 884 00:42:18,980 --> 00:42:21,650 Not good. 885 00:42:21,650 --> 00:42:22,544 So this is unhappy. 886 00:42:22,544 --> 00:42:23,710 How do we fix the situation? 887 00:42:23,710 --> 00:42:26,380 888 00:42:26,380 --> 00:42:29,200 AUDIENCE: Rotate left around k plus 1. 889 00:42:29,200 --> 00:42:30,255 PROFESSOR: All right. 890 00:42:30,255 --> 00:42:33,520 So it better be a rotation, because we spent so much 891 00:42:33,520 --> 00:42:36,016 figuring out how to rotate. 892 00:42:36,016 --> 00:42:37,890 Let's see what happens if we rotate this way, 893 00:42:37,890 --> 00:42:39,370 and the way to keep track of this 894 00:42:39,370 --> 00:42:41,556 is I'm going to label my nodes. 895 00:42:41,556 --> 00:42:45,010 896 00:42:45,010 --> 00:42:53,090 B, A, left child of A has height k minus 1. 897 00:42:53,090 --> 00:42:55,316 This guy is k minus 1. 898 00:42:55,316 --> 00:43:00,760 899 00:43:00,760 --> 00:43:04,480 So what's the height of A now? 900 00:43:04,480 --> 00:43:06,432 AUDIENCE: k. 901 00:43:06,432 --> 00:43:07,896 Oh wait, that's not a node. 902 00:43:07,896 --> 00:43:10,840 Just k minus 1 or k. 903 00:43:10,840 --> 00:43:13,420 PROFESSOR: So this is a tree of height k minus 1. 904 00:43:13,420 --> 00:43:15,310 This is a height of tree k minus 1 or k. 905 00:43:15,310 --> 00:43:18,640 The height of this guy is? 906 00:43:18,640 --> 00:43:19,140 AUDIENCE: k. 907 00:43:19,140 --> 00:43:22,486 908 00:43:22,486 --> 00:43:23,920 AUDIENCE: k plus 1. 909 00:43:23,920 --> 00:43:26,520 PROFESSOR: Or k plus 1. 910 00:43:26,520 --> 00:43:28,300 The height of this guy is definitely k. 911 00:43:28,300 --> 00:43:30,665 The height of this guys is k or k plus 1. 912 00:43:30,665 --> 00:43:31,290 Is that an AVL? 913 00:43:31,290 --> 00:43:32,510 Is it happy now? 914 00:43:32,510 --> 00:43:37,100 915 00:43:37,100 --> 00:43:40,430 So this is the easy kind of rebalancing, one rotation 916 00:43:40,430 --> 00:43:41,418 and you're done. 917 00:43:41,418 --> 00:43:43,334 AUDIENCE: [INAUDIBLE] the difference between k 918 00:43:43,334 --> 00:43:44,749 minus 1 and k minus 1 over k? 919 00:43:44,749 --> 00:43:46,290 PROFESSOR: It's not k minus 1 over k. 920 00:43:46,290 --> 00:43:47,820 It's either k minus 1 or k. 921 00:43:47,820 --> 00:43:50,610 922 00:43:50,610 --> 00:43:51,186 Bad notation. 923 00:43:51,186 --> 00:43:54,050 924 00:43:54,050 --> 00:43:55,310 Thank you for the question. 925 00:43:55,310 --> 00:43:57,220 You saved everyone from confusion. 926 00:43:57,220 --> 00:44:00,028 927 00:44:00,028 --> 00:44:02,370 No, there's no fractional heights. 928 00:44:02,370 --> 00:44:03,390 Anything else? 929 00:44:03,390 --> 00:44:03,890 Thank you. 930 00:44:03,890 --> 00:44:04,490 That was good. 931 00:44:04,490 --> 00:44:09,461 932 00:44:09,461 --> 00:44:09,960 All right. 933 00:44:09,960 --> 00:44:11,030 So this is easy, right? 934 00:44:11,030 --> 00:44:12,270 This is the easy rotation. 935 00:44:12,270 --> 00:44:13,560 Let's do the hard one now. 936 00:44:13,560 --> 00:44:37,670 937 00:44:37,670 --> 00:44:40,664 What's the height here? 938 00:44:40,664 --> 00:44:42,152 AUDIENCE: k plus 1. 939 00:44:42,152 --> 00:44:47,112 940 00:44:47,112 --> 00:44:50,584 PROFESSOR: What's the height here? 941 00:44:50,584 --> 00:44:53,580 AUDIENCE: Plus 2. 942 00:44:53,580 --> 00:44:56,550 PROFESSOR: AVL, not AVL? 943 00:44:56,550 --> 00:44:58,310 AUDIENCE: No. 944 00:44:58,310 --> 00:45:00,810 PROFESSOR: k minus 1 on the left child, 945 00:45:00,810 --> 00:45:02,650 k plus 1 on the right child. 946 00:45:02,650 --> 00:45:04,350 The difference is more than one. 947 00:45:04,350 --> 00:45:06,120 Not an AVL. 948 00:45:06,120 --> 00:45:07,160 How do we fix this? 949 00:45:07,160 --> 00:45:14,132 950 00:45:14,132 --> 00:45:17,620 AUDIENCE: Make a right rotation [INAUDIBLE]. 951 00:45:17,620 --> 00:45:18,560 PROFESSOR: Sorry? 952 00:45:18,560 --> 00:45:21,560 AUDIENCE: Make a right rotation on k first. 953 00:45:21,560 --> 00:45:23,378 PROFESSOR: So make a right rotation where? 954 00:45:23,378 --> 00:45:25,242 AUDIENCE: On k [INAUDIBLE] plus 1. 955 00:45:25,242 --> 00:45:30,290 956 00:45:30,290 --> 00:45:32,665 PROFESSOR: Let me break this up so we 957 00:45:32,665 --> 00:45:34,870 could see how that would work out. 958 00:45:34,870 --> 00:45:39,280 959 00:45:39,280 --> 00:45:42,720 So these two are k minus 1 and k minus 1, 960 00:45:42,720 --> 00:45:48,140 and these nodes are A, B, and C. So you're 961 00:45:48,140 --> 00:45:52,450 saying do a right rotation here, right? 962 00:45:52,450 --> 00:45:54,940 Let's see what that gets us to. 963 00:45:54,940 --> 00:45:58,360 So A is the same, and then here I'm 964 00:45:58,360 --> 00:46:04,140 going to have C instead of B, and B. The right child of B 965 00:46:04,140 --> 00:46:06,700 is the same as it was before. 966 00:46:06,700 --> 00:46:14,920 The left child of B is k minus 1. 967 00:46:14,920 --> 00:46:19,170 Then the left child of C is another k minus 1, 968 00:46:19,170 --> 00:46:21,510 and this guy is k minus 1. 969 00:46:21,510 --> 00:46:23,992 OK, what's the height at B now? 970 00:46:23,992 --> 00:46:25,920 AUDIENCE: k. 971 00:46:25,920 --> 00:46:27,320 PROFESSOR: OK. 972 00:46:27,320 --> 00:46:29,759 What's the height here? 973 00:46:29,759 --> 00:46:33,140 AUDIENCE: k plus 1. 974 00:46:33,140 --> 00:46:35,022 PROFESSOR: And the height here? 975 00:46:35,022 --> 00:46:37,760 AUDIENCE: k plus 2. 976 00:46:37,760 --> 00:46:40,110 PROFESSOR: Is this an AVL? 977 00:46:40,110 --> 00:46:43,936 So we're not done, but what's the good news about this? 978 00:46:43,936 --> 00:46:47,664 AUDIENCE: Now you can rotate it left around C. 979 00:46:47,664 --> 00:46:48,596 PROFESSOR: Yep. 980 00:46:48,596 --> 00:46:49,530 Exactly. 981 00:46:49,530 --> 00:46:55,980 That is this, and we know how to fix this in one step. 982 00:46:55,980 --> 00:46:57,630 You said exactly the right thing. 983 00:46:57,630 --> 00:46:59,910 This is exactly what we do. 984 00:46:59,910 --> 00:47:03,640 First, rotate here so that we get to that, 985 00:47:03,640 --> 00:47:06,390 and then we're in the happy case. 986 00:47:06,390 --> 00:47:08,990 So intuitively, the difference between the happy case 987 00:47:08,990 --> 00:47:13,330 and the harder case that's going to be happy eventually 988 00:47:13,330 --> 00:47:16,860 is where the heavy path is, so where 989 00:47:16,860 --> 00:47:19,370 you have the bigger height. 990 00:47:19,370 --> 00:47:24,500 In this case, k plus 2, k plus 1, k. 991 00:47:24,500 --> 00:47:29,230 So the heavy path is this thing here. 992 00:47:29,230 --> 00:47:30,160 It's a straight line. 993 00:47:30,160 --> 00:47:33,400 Because of that, we can rotate here and then 994 00:47:33,400 --> 00:47:36,290 redistribute the weights uniformly. 995 00:47:36,290 --> 00:47:39,410 In this case, the heavy path is this. 996 00:47:39,410 --> 00:47:44,920 It's a zigzag, so one rotation won't fix it. 997 00:47:44,920 --> 00:47:46,570 Instead, we do one rotation here so 998 00:47:46,570 --> 00:47:50,570 that we make the weights be distributed 999 00:47:50,570 --> 00:47:52,610 in a straight line, and then one rotation here, 1000 00:47:52,610 --> 00:47:55,230 which makes everything look good. 1001 00:47:55,230 --> 00:47:55,979 Yes? 1002 00:47:55,979 --> 00:47:57,770 AUDIENCE: So that's just a general pattern. 1003 00:47:57,770 --> 00:47:59,694 And you redistribute [INAUDIBLE]? 1004 00:47:59,694 --> 00:48:02,412 1005 00:48:02,412 --> 00:48:04,870 PROFESSOR: So the first thing we do is say you're at a node 1006 00:48:04,870 --> 00:48:05,964 and you need to rebalance. 1007 00:48:05,964 --> 00:48:07,380 First thing, you check the heights 1008 00:48:07,380 --> 00:48:09,980 and see if it's already balanced. 1009 00:48:09,980 --> 00:48:13,120 If it's not balanced, suppose your heavier node 1010 00:48:13,120 --> 00:48:14,700 is on the right. 1011 00:48:14,700 --> 00:48:19,260 Then you go and see if the right child is heavier 1012 00:48:19,260 --> 00:48:20,810 or the left child is heavier. 1013 00:48:20,810 --> 00:48:24,660 So you see if the height distribution is a straight line 1014 00:48:24,660 --> 00:48:26,680 or is exact, and then you decide whether you're 1015 00:48:26,680 --> 00:48:28,429 going to do one rotation or two rotations. 1016 00:48:28,429 --> 00:48:31,970 1017 00:48:31,970 --> 00:48:35,050 So this is all nice and good except for one thing. 1018 00:48:35,050 --> 00:48:39,105 The heights keep changing here, and we're 1019 00:48:39,105 --> 00:48:42,250 storing the heights in each node. 1020 00:48:42,250 --> 00:48:44,150 So if we're doing all this rotating 1021 00:48:44,150 --> 00:48:47,830 without updating the heights, then we're 1022 00:48:47,830 --> 00:48:49,924 going to have something that looks like an AVL 1023 00:48:49,924 --> 00:48:51,840 but doesn't keep track of the heights anymore, 1024 00:48:51,840 --> 00:48:54,980 so eventually, we're going to make bad decisions. 1025 00:48:54,980 --> 00:48:57,440 So we need to update the heights. 1026 00:48:57,440 --> 00:49:00,254 Where do we update the heights? 1027 00:49:00,254 --> 00:49:01,670 You can look at the code and cheat 1028 00:49:01,670 --> 00:49:05,457 or you can try to think and tell me the answer, either way. 1029 00:49:05,457 --> 00:49:13,570 1030 00:49:13,570 --> 00:49:17,630 So this only does rotates, right? 1031 00:49:17,630 --> 00:49:20,649 If rotates would be able to update the heights, 1032 00:49:20,649 --> 00:49:21,690 everything would be good. 1033 00:49:21,690 --> 00:49:23,520 I wouldn't have to worry about this. 1034 00:49:23,520 --> 00:49:26,350 1035 00:49:26,350 --> 00:49:29,520 Pretend I hadn't deleted the original case here, 1036 00:49:29,520 --> 00:49:38,050 which I think was A, B, C, and then some children. 1037 00:49:38,050 --> 00:49:39,720 What heights changed here? 1038 00:49:39,720 --> 00:49:45,950 1039 00:49:45,950 --> 00:49:50,640 When I did this rotation, what heights changed? 1040 00:49:50,640 --> 00:49:53,040 AUDIENCE: B's height changed. 1041 00:49:53,040 --> 00:49:58,320 AUDIENCE: B, A, and C is greater than B. 1042 00:49:58,320 --> 00:50:02,240 PROFESSOR: B and potentially A. If I'm not sure, 1043 00:50:02,240 --> 00:50:04,300 then I'm going to try to update the height anyway 1044 00:50:04,300 --> 00:50:06,630 to make sure that I have the right height. 1045 00:50:06,630 --> 00:50:10,790 After I do this, I have to update the height on B 1046 00:50:10,790 --> 00:50:15,225 and update the heights on A. Is this correct? 1047 00:50:15,225 --> 00:50:21,472 1048 00:50:21,472 --> 00:50:22,960 AUDIENCE: And B. 1049 00:50:22,960 --> 00:50:24,210 PROFESSOR: And all the way up. 1050 00:50:24,210 --> 00:50:26,240 Let's assume that happens. 1051 00:50:26,240 --> 00:50:28,242 Does this look right? 1052 00:50:28,242 --> 00:50:31,220 AUDIENCE: Don't you also have to update for C? 1053 00:50:31,220 --> 00:50:31,940 PROFESSOR: For c. 1054 00:50:31,940 --> 00:50:34,886 1055 00:50:34,886 --> 00:50:37,840 Did I change anything under c? 1056 00:50:37,840 --> 00:50:40,290 AUDIENCE: I think he's talking about capital C. 1057 00:50:40,290 --> 00:50:42,980 PROFESSOR: Oh, capital C. Yeah, OK. 1058 00:50:42,980 --> 00:50:48,800 1059 00:50:48,800 --> 00:50:50,185 So the children are the same. 1060 00:50:50,185 --> 00:50:53,900 They were a and b before, they're a and b afterwards, 1061 00:50:53,900 --> 00:50:55,586 so I don't have to worry about this guy. 1062 00:50:55,586 --> 00:50:56,960 The heights here haven't changed, 1063 00:50:56,960 --> 00:50:59,705 so the height here hasn't changed. 1064 00:50:59,705 --> 00:51:01,080 But you're thinking of a problem, 1065 00:51:01,080 --> 00:51:03,121 and that is good because there is a problem here. 1066 00:51:03,121 --> 00:51:06,251 1067 00:51:06,251 --> 00:51:09,632 AUDIENCE: What if there's a subtree instead of small c? 1068 00:51:09,632 --> 00:51:13,220 PROFESSOR: A subtree instead of small c? 1069 00:51:13,220 --> 00:51:14,610 OK, so yeah, this is a subtree. 1070 00:51:14,610 --> 00:51:17,140 1071 00:51:17,140 --> 00:51:19,960 After rotating, this subtree moves over here. 1072 00:51:19,960 --> 00:51:22,612 1073 00:51:22,612 --> 00:51:23,236 AUDIENCE: Wait. 1074 00:51:23,236 --> 00:51:25,580 You have to update A first, and then update B. 1075 00:51:25,580 --> 00:51:27,300 PROFESSOR: OK. 1076 00:51:27,300 --> 00:51:29,510 So the height starts from the bottom. 1077 00:51:29,510 --> 00:51:31,260 The bottom height zero, the next one 1078 00:51:31,260 --> 00:51:33,400 up, the next one up, the next one up. 1079 00:51:33,400 --> 00:51:34,920 Keep that in mind. 1080 00:51:34,920 --> 00:51:37,640 Nothing changed below this guy, nothing changed below this guy, 1081 00:51:37,640 --> 00:51:42,090 nothing changed here, so I don't have to update these. 1082 00:51:42,090 --> 00:51:44,540 But when I compute the height, update height 1083 00:51:44,540 --> 00:51:47,490 at the beginning of the listing assumes 1084 00:51:47,490 --> 00:51:49,450 that the height of the children is correct. 1085 00:51:49,450 --> 00:51:51,116 So when I compute the height of my mode, 1086 00:51:51,116 --> 00:51:53,000 the height of the children has to be updated. 1087 00:51:53,000 --> 00:51:55,117 So if I call Update Height of B first, 1088 00:51:55,117 --> 00:51:56,950 I already know that this might have changed, 1089 00:51:56,950 --> 00:51:59,420 so whatever answer I get is bogus. 1090 00:51:59,420 --> 00:52:03,575 So this doesn't work and I have to do this. 1091 00:52:03,575 --> 00:52:13,290 1092 00:52:13,290 --> 00:52:14,480 This is rotation. 1093 00:52:14,480 --> 00:52:16,380 That's rebalancing. 1094 00:52:16,380 --> 00:52:18,240 One more trick to rebalancing. 1095 00:52:18,240 --> 00:52:23,200 So we talked about rebalancing the subtree at one node. 1096 00:52:23,200 --> 00:52:24,660 What's missing from this picture? 1097 00:52:24,660 --> 00:52:29,560 1098 00:52:29,560 --> 00:52:32,010 AUDIENCE: So you check it on all levels? 1099 00:52:32,010 --> 00:52:32,920 PROFESSOR: Yeah. 1100 00:52:32,920 --> 00:52:34,560 So I made some changes here. 1101 00:52:34,560 --> 00:52:38,830 My tree might look happy starting from here on, 1102 00:52:38,830 --> 00:52:41,090 but if I did an insertion or a deletion, 1103 00:52:41,090 --> 00:52:43,570 heights changed all the way up, so I 1104 00:52:43,570 --> 00:52:46,635 have to go all the way up and do more rebalancings potentially. 1105 00:52:46,635 --> 00:52:50,460 1106 00:52:50,460 --> 00:52:51,600 And that's it. 1107 00:52:51,600 --> 00:52:52,787 These are AVLs. 1108 00:52:52,787 --> 00:52:53,620 Any other questions? 1109 00:52:53,620 --> 00:52:58,574 1110 00:52:58,574 --> 00:52:59,490 Did you guys get them? 1111 00:52:59,490 --> 00:53:00,993 Do they make more sense now? 1112 00:53:00,993 --> 00:53:01,820 AUDIENCE: Yes. 1113 00:53:01,820 --> 00:53:03,030 PROFESSOR: OK. 1114 00:53:03,030 --> 00:53:04,300 Good. 1115 00:53:04,300 --> 00:53:08,134 You'll have to play with them on the next Pset. 1116 00:53:08,134 --> 00:53:10,004 AUDIENCE: By "play," what do you mean? 1117 00:53:10,004 --> 00:53:12,170 PROFESSOR: Well, you're not writing it from scratch. 1118 00:53:12,170 --> 00:53:13,914 You have to modify existing code. 1119 00:53:13,914 --> 00:53:17,614 AUDIENCE: What are we going to make it do? 1120 00:53:17,614 --> 00:53:20,030 PROFESSOR: So you'll have to keep track of something other 1121 00:53:20,030 --> 00:53:20,850 than heights. 1122 00:53:20,850 --> 00:53:22,670 You'll have to keep track of a new value. 1123 00:53:22,670 --> 00:53:24,503 AUDIENCE: Oh, like the minimum or something? 1124 00:53:24,503 --> 00:53:26,020 PROFESSOR: Or something, yeah. 1125 00:53:26,020 --> 00:53:27,371