**Math 176 - Data Structures
Fall 2000 - Lecture Topics
Instructor: Sam Buss**

I will try to keep the following list of lecture topics up to date as much as possible. If you miss class, it is highly recommended that you get notes from other students.

**Day 1. Friday, September 22.**

- General introduction, grading policies, office hours, etc. See the course web page for most of this.
- Abstract data types (ADT's). Separate interface specification from implementation.
- The **Stack** ADT. Generally supports *push*, *pop*, *peek*, *size*, and *empty* methods. Possible implementations include (i) using linked lists, (ii) using dynamically reallocated arrays, or (iii) using fixed-size arrays.
- The **Queue** ADT. Generally supports *enqueue*, *dequeue*, *peek*, *size* and *empty* methods. Possible implementations include (i) using linked lists, (ii) using dynamically reallocated circular arrays, or (iii) using fixed-size circular arrays.
- The **Deque** ADT. A double-ended queue (whence the name "d-e-que", abbreviating "double-ended queue") that allows pushes and pops at both ends of the queue. Generally supports *pushLeft*, *pushRight*, *popLeft*, *popRight*, *peekLeft*, *peekRight*, *size*, *empty* operations. Possible implementations are the same as for queues.
- Discussion of constant time or O(1) time operations. Most of the operations discussed above are constant time; however, when using a dynamically reallocated array, you only get *amortized constant time* because occasionally extra time is needed to reallocate the array.
- Advantages/disadvantages of linked list implementations versus (circular) arrays. Disadvantages of linked lists include extra space required for pointer(s) and the potential for memory fragmentation and memory management overhead. The disadvantages of array-based implementations are the fact that you get only *amortized* O(1) run times, and that you must write code to perform the dynamic reallocation.
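
As an illustration of implementation option (ii) for queues, here is a minimal Java sketch of a circular-array queue that doubles its array when full. The class name `ArrayQueue`, the initial capacity of 4, and the restriction to `int` elements are assumptions made for this example; they are not taken from the textbook or the assignments.

```java
import java.util.NoSuchElementException;

/** Queue of ints backed by a circular array that doubles when it fills up. */
class ArrayQueue {
    private int[] data = new int[4]; // initial capacity is an arbitrary choice
    private int front = 0;           // index of the oldest element
    private int size = 0;            // number of elements currently stored

    public void enqueue(int x) {
        if (size == data.length) grow();        // occasional O(N) step
        data[(front + size) % data.length] = x; // wrap around the array end
        size++;
    }

    public int dequeue() {
        if (size == 0) throw new NoSuchElementException("queue is empty");
        int x = data[front];
        front = (front + 1) % data.length;
        size--;
        return x;
    }

    public int peek() {
        if (size == 0) throw new NoSuchElementException("queue is empty");
        return data[front];
    }

    public int size() { return size; }

    public boolean empty() { return size == 0; }

    /** Copies the elements, in queue order, into an array twice as large. */
    private void grow() {
        int[] bigger = new int[2 * data.length];
        for (int i = 0; i < size; i++) {
            bigger[i] = data[(front + i) % data.length];
        }
        data = bigger;
        front = 0;
    }
}
```

Every `enqueue` is O(1) except the ones that trigger `grow()`, which is why the bound is only amortized constant time.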

**Day 2. Monday, September 25.**

- Definitions of Big-O, Little-o, Big-Omega, and Theta. See beginning of Chapter 2.
- Linked list implementation of the Set ADT needs run time of O(N^2) for N operations.
- Comparison of time N^2 microseconds and N log N microseconds. See the online Table of Runtimes.
- Definitions of height, depth, root, leaf for trees.

**Day 3. Wednesday, September 27.**

- Trees and binary trees.
- Implementation of trees with children of a node in a linked list. Implementation of binary trees with children pointers.
- Binary search trees. The key ordering property of binary search trees.
- Algorithms for contains(), add() and remove() (a.k.a. find(), insert() and delete() methods) for binary search trees. These were covered in class only informally. You must read the textbook and understand the details of Weiss's implementations. In particular, understand: (i) Why he returns TreeNode's or null. (ii) How the use of recursive calls means that a TreeNode does not need to have a pointer (reference) to its parent. (iii) Why the binary search tree property is preserved by the operations. (iv) Why keys have type `Comparable`.
- Performance analysis: each operation takes time O(height of the tree). You should be sure to understand why this is true!
- Guidelines for academic integrity for programming assignments.
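
To make the points about recursive binary search tree methods concrete, here is a minimal Java sketch in which the private `add` and `remove` methods return the root of the modified subtree. It is written in the spirit of the textbook's approach but is not Weiss's code; the names `SimpleBst` and `Node` are assumptions for this example, and it uses generics, which may differ from the style of the textbook edition used in class.

```java
/** Unbalanced binary search tree; recursive methods return subtree roots. */
class SimpleBst<T extends Comparable<T>> {
    private static class Node<T> {
        T key;
        Node<T> left, right; // note: no parent pointer is needed
        Node(T key) { this.key = key; }
    }

    private Node<T> root;

    public boolean contains(T x) {
        Node<T> t = root;
        while (t != null) {
            int c = x.compareTo(t.key);
            if (c == 0) return true;
            t = (c < 0) ? t.left : t.right; // one step down per iteration
        }
        return false;
    }

    public void add(T x) { root = add(x, root); }

    public void remove(T x) { root = remove(x, root); }

    /** Returns the root of the subtree t after x has been added to it. */
    private Node<T> add(T x, Node<T> t) {
        if (t == null) return new Node<>(x);
        int c = x.compareTo(t.key);
        if (c < 0) t.left = add(x, t.left);
        else if (c > 0) t.right = add(x, t.right);
        return t; // duplicates are ignored
    }

    /** Returns the root of the subtree t after x has been removed from it. */
    private Node<T> remove(T x, Node<T> t) {
        if (t == null) return null; // x was not present
        int c = x.compareTo(t.key);
        if (c < 0) {
            t.left = remove(x, t.left);
        } else if (c > 0) {
            t.right = remove(x, t.right);
        } else if (t.left != null && t.right != null) {
            // Two children: copy up the smallest key of the right subtree.
            Node<T> min = t.right;
            while (min.left != null) min = min.left;
            t.key = min.key;
            t.right = remove(min.key, t.right);
        } else {
            // Zero or one child: splice the node out.
            t = (t.left != null) ? t.left : t.right;
        }
        return t;
    }
}
```

Because each recursive call hands its (possibly new) subtree root back to the caller, the caller can relink the child itself. Every method follows a single root-to-leaf path, so its cost is O(height of the tree).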

**Day 4. Friday, September 29.**

- In-order traversal of binary trees.
- Bound on number of nodes in a binary tree in terms of the height h. So height = Omega(log(number of nodes)). Goal is h = O(log N) so as to get a runtime of O(log N) for all individual operations on a binary search tree.
- AVL trees. The AVL property.
- Theorem: h = O(log N) for AVL trees. Proof is postponed.
- Single rotations. Double rotations.
- How to rebalance an AVL tree after adding a new leaf to the tree. You should read the corresponding parts of the textbook and understand how the recursive implementation of Weiss's **insert** works. In particular, why does he not take advantage of the fact that only one rotation is needed? Why does his insert(x, t) method return an AvlNode object? And, verify that his code properly updates the height values for the nodes after a rotation.
- We did not finish the last case of rebalancing the AVL trees after an add(), so we will cover this Monday.
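
For reference while reading the textbook, here is a minimal Java sketch of AVL insertion with single and double rotations. It follows the general shape of a recursive insert that returns a node, but it is not Weiss's code: the class name `AvlSketch`, the helper names `update` and `rebalance`, and the use of `int` keys are assumptions for this example.

```java
class AvlSketch {
    static class AvlNode {
        int key;
        int height; // a leaf has height 0
        AvlNode left, right;
        AvlNode(int key) { this.key = key; }
    }

    static int height(AvlNode t) { return t == null ? -1 : t.height; }

    static void update(AvlNode t) {
        t.height = 1 + Math.max(height(t.left), height(t.right));
    }

    /** Single rotation: the left child k1 becomes the subtree root. */
    static AvlNode rotateWithLeftChild(AvlNode k2) {
        AvlNode k1 = k2.left;
        k2.left = k1.right;
        k1.right = k2;
        update(k2); // k2 is now below k1, so its height is fixed first
        update(k1);
        return k1;
    }

    /** Single rotation: the right child k2 becomes the subtree root. */
    static AvlNode rotateWithRightChild(AvlNode k1) {
        AvlNode k2 = k1.right;
        k1.right = k2.left;
        k2.left = k1;
        update(k1);
        update(k2);
        return k2;
    }

    /** Returns the root of the subtree t after inserting x. */
    static AvlNode insert(int x, AvlNode t) {
        if (t == null) return new AvlNode(x);
        if (x < t.key) t.left = insert(x, t.left);
        else if (x > t.key) t.right = insert(x, t.right);
        else return t; // duplicate key: nothing changes
        return rebalance(t);
    }

    /** Restores the AVL property at t, assuming its subtrees are AVL. */
    static AvlNode rebalance(AvlNode t) {
        update(t);
        if (height(t.left) - height(t.right) == 2) {
            if (height(t.left.left) < height(t.left.right)) {
                t.left = rotateWithRightChild(t.left); // double rotation
            }
            t = rotateWithLeftChild(t);
        } else if (height(t.right) - height(t.left) == 2) {
            if (height(t.right.right) < height(t.right.left)) {
                t.right = rotateWithLeftChild(t.right); // double rotation
            }
            t = rotateWithRightChild(t);
        }
        return t;
    }
}
```

Note that this sketch checks the balance condition at every node on the way back up the recursion, even though after an insertion at most one node actually needs a (single or double) rotation.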

**Day 5. Monday, October 2.**

- The rest of the rebalancing for add() in AVL trees.
- Iterators. Discussion of how they work in general, and how they work in Java. You should look at the online Java documentation for more information on how to use iterators.
- Threaded AVL trees.
- How to rebalance an AVL tree after a remove() operation. I have made some figures showing how to rebalance after deletion on the web.

**Day 6. Wednesday, October 4.**

- Tree traversal orders: In-order, pre-order, post-order, level-order.
- Proof of the theorem that h=O(log N) for AVL trees.
- Advice on how to write your ThreadedAvlTree program. See also the page about Software Aids and Suggestions for Programming Threaded AVL Trees.
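
The four traversal orders can be sketched in Java as follows. The class name `Traversals` and the bare `Node` class are assumptions for this example and are unrelated to the ThreadedAvlTree assignment; level-order uses a queue, while the other three are direct recursions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class Traversals {
    static class Node {
        int key;
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    /** Left subtree, then the node, then the right subtree. */
    static void inOrder(Node t, List<Integer> out) {
        if (t == null) return;
        inOrder(t.left, out);
        out.add(t.key);
        inOrder(t.right, out);
    }

    /** The node first, then both subtrees. */
    static void preOrder(Node t, List<Integer> out) {
        if (t == null) return;
        out.add(t.key);
        preOrder(t.left, out);
        preOrder(t.right, out);
    }

    /** Both subtrees first, then the node. */
    static void postOrder(Node t, List<Integer> out) {
        if (t == null) return;
        postOrder(t.left, out);
        postOrder(t.right, out);
        out.add(t.key);
    }

    /** Nodes by increasing depth, left to right within a level. */
    static List<Integer> levelOrder(Node root) {
        List<Integer> out = new ArrayList<>();
        Queue<Node> queue = new ArrayDeque<>();
        if (root != null) queue.add(root);
        while (!queue.isEmpty()) {
            Node t = queue.remove();
            out.add(t.key);
            if (t.left != null) queue.add(t.left);
            if (t.right != null) queue.add(t.right);
        }
        return out;
    }
}
```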

**Day 7. Friday, October 6.**

- Discussion of virtual memory and paging. Disk accesses are much more expensive than main memory accesses. Tree search methods will behave poorly once main memory is exhausted.
- B-trees. Purposes: (a) disk-bound data structures, and (b) substitute for binary search trees.
- B-tree data structure. Algorithm for find() or contains().
- B-tree insertion algorithm.
- B-tree deletion algorithm.

**Day 8. Monday, October 9.**

- Updates on the AVL tree programming assignment.
- Splay trees. Purposes and algorithms.
- Beginning of hashing.
- Collisions.
- Separate chaining.
- The duplicate birthday phenomenon.
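
The duplicate birthday phenomenon can be checked numerically. The sketch below computes the probability that n keys, each hashed independently and uniformly at random into m slots, produce at least one collision; the class and method names are assumptions for this example, and uniform hashing is an idealization.

```java
class Birthday {
    /**
     * Probability of at least one collision when n keys are placed
     * independently and uniformly at random into m slots.
     */
    static double collisionProbability(int n, int m) {
        double noCollision = 1.0;
        for (int i = 0; i < n; i++) {
            // The (i+1)-st key must avoid the i slots already occupied.
            noCollision *= (double) (m - i) / m;
        }
        return 1.0 - noCollision;
    }
}
```

With m = 365, the probability first exceeds 1/2 at n = 23, so collisions should be expected long before a hash table is anywhere near full.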

**Day 9. Wednesday, October 11.**

- Examples of good and bad hash functions.
- Java's hash function for strings.
- Cryptographic applications: Hashing for message authentication. (This topic is not required knowledge for the rest of Math 176.)
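
Java documents `String.hashCode()` as the polynomial s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed in `int` arithmetic. The sketch below evaluates that polynomial with Horner's rule; the class name `StringHash` is an assumption for this example.

```java
class StringHash {
    /** Recomputes the value documented for String.hashCode(). */
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int overflow silently wraps around
        }
        return h;
    }
}
```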

**Day 10. Friday, October 13.**

- Practical aspects of hash functions. Using Java's hashCode. The difference between "mod" and "%".
- Open addressing.
- Linear probing. Find, add algorithms. Primary clustering.
- Lazy deletion algorithm for open addressing.
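
The following is a minimal Java sketch of linear probing with lazy deletion, under the assumptions that the table has a fixed size and never needs rehashing. The class name `LinearProbingSet` and the `DELETED` marker object are inventions for this example. It also shows the "mod" versus "%" issue: `hashCode()` can be negative, and Java's `%` then gives a negative result, so the sketch uses `Math.floorMod` to get an index in range.

```java
class LinearProbingSet {
    private static final Object DELETED = new Object(); // lazy-deletion marker
    private final Object[] table;

    LinearProbingSet(int capacity) { table = new Object[capacity]; }

    /** Home slot of a key; floorMod is never negative, unlike %. */
    private int home(Object key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    public boolean contains(Object key) {
        int i = home(key);
        for (int probes = 0; probes < table.length; probes++) {
            if (table[i] == null) return false; // never-used slot ends the search
            if (table[i] != DELETED && table[i].equals(key)) return true;
            i = (i + 1) % table.length;         // DELETED slots are skipped over
        }
        return false;
    }

    public boolean add(Object key) {
        if (contains(key)) return false;
        int i = home(key);
        for (int probes = 0; probes < table.length; probes++) {
            if (table[i] == null || table[i] == DELETED) {
                table[i] = key; // DELETED slots may be reused
                return true;
            }
            i = (i + 1) % table.length;
        }
        throw new IllegalStateException("table is full");
    }

    public boolean remove(Object key) {
        int i = home(key);
        for (int probes = 0; probes < table.length; probes++) {
            if (table[i] == null) return false;
            if (table[i] != DELETED && table[i].equals(key)) {
                table[i] = DELETED; // mark the slot instead of emptying it
                return true;
            }
            i = (i + 1) % table.length;
        }
        return false;
    }
}
```

Marking a slot as deleted rather than setting it back to `null` keeps later keys in the same cluster reachable.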

**Day 11. Monday, October 16.**

- Rehashing.
- Load factors.
- Non-lazy deletion for linear probing. Algorithm available on-line too.
- Next programming assignment.

**Day 12. Wednesday, October 18.**

- More on the programming assignment.
- How to "break" Java's hashCode() method for strings and generate strings with the same hash code values.
- Fair comparison of load factors for open addressing versus separate chaining.
- Quadratic probing. Works if the load factor is < 0.5.
- A prime table size and a load factor < 0.5 mean that quadratic probing always succeeds in finding an empty slot.
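
One well-known way to generate strings with equal hash codes uses the fact that "Aa" and "BB" both have hash code 2112 (65*31 + 97 = 66*31 + 66). Since Java's string hash is a polynomial in 31, concatenating k such two-character blocks gives 2^k distinct strings that all collide. The sketch below is an illustration of that idea, not necessarily the construction shown in lecture; the class name `HashCollisions` is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class HashCollisions {
    /** Returns 2^k distinct strings of length 2k with the same hashCode(). */
    static List<String> colliding(int k) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < k; i++) {
            List<String> next = new ArrayList<>();
            for (String s : result) {
                // Both blocks contribute the same amount to the hash.
                next.add(s + "Aa");
                next.add(s + "BB");
            }
            result = next;
        }
        return result;
    }
}
```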

**Day 13. Friday, October 20.**

- Double hashing. Algorithms.
- Proof of correctness of double hashing.
- Priority queues and applications.
- Balanced BST's can implement priority queues in O(log n) time per operation.
- Binary heaps can be more efficient as priority queues.
- Complete binary tree property of binary heaps.
- Heap property: A key must be less than or equal to the keys of its children.
- Insert, findMin, deleteMin.
- PercolateUp.

**Day 14. Monday, October 23.**

- More on binary heaps. Review of Friday's material.
- Using bit operations to compute indices of children and parents.
- PercolateDown.
- DeleteMin.
- Buildheap.
- Proof of O(N) runtime for build heap.
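
The binary heap operations from the last two lectures can be sketched in Java as follows. This is a minimal version under the assumptions that keys are `int` values and that the root is stored at index 1 (so the children of i are 2i and 2i+1 and the parent is i/2, all computable with shifts); the class name `MinHeap` is an assumption and the code is not taken from the textbook.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

class MinHeap {
    private int[] a;  // keys live in a[1..size]; a[0] is unused
    private int size;

    /** buildHeap: percolate down every non-leaf, bottom up; O(N) total. */
    MinHeap(int[] keys) {
        a = new int[Math.max(2, 2 * keys.length + 1)];
        System.arraycopy(keys, 0, a, 1, keys.length);
        size = keys.length;
        for (int i = size >> 1; i >= 1; i--) percolateDown(i);
    }

    void insert(int x) {
        if (size + 1 == a.length) a = Arrays.copyOf(a, 2 * a.length);
        a[++size] = x;   // new leaf keeps the tree complete
        percolateUp(size);
    }

    int findMin() {
        if (size == 0) throw new NoSuchElementException("heap is empty");
        return a[1];
    }

    int deleteMin() {
        int min = findMin();
        a[1] = a[size--]; // move the last leaf to the root
        if (size > 0) percolateDown(1);
        return min;
    }

    private void percolateUp(int i) {
        int x = a[i];
        while (i > 1 && x < a[i >> 1]) { // i >> 1 is the parent of i
            a[i] = a[i >> 1];
            i >>= 1;
        }
        a[i] = x;
    }

    private void percolateDown(int i) {
        int x = a[i];
        while ((i << 1) <= size) {
            int child = i << 1;          // left child 2i
            if (child < size && a[child | 1] < a[child]) {
                child |= 1;              // right child 2i+1 is smaller
            }
            if (a[child] >= x) break;
            a[i] = a[child];
            i = child;
        }
        a[i] = x;
    }
}
```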

**Day 15. Wednesday, October 25.**

- Leftist heaps. The merge operation.
- Null path lengths. The leftist property.
- Proof that the right path of a leftist heap has length O(log N).
- The merge algorithm.
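
A minimal Java sketch of the recursive leftist heap merge is given below. It assumes `int` keys and a min-heap, and the class and method names are inventions for this example rather than the textbook's class structure. The `insert` and `deleteMin` methods are included only to show that they reduce to `merge`.

```java
import java.util.NoSuchElementException;

class LeftistHeap {
    private static class Node {
        int key;
        int npl;          // null path length of this node (0 for a leaf)
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    private static int npl(Node t) { return t == null ? -1 : t.npl; }

    /** Merges two leftist heaps by recursing down their right paths. */
    private static Node merge(Node h1, Node h2) {
        if (h1 == null) return h2;
        if (h2 == null) return h1;
        if (h2.key < h1.key) { Node tmp = h1; h1 = h2; h2 = tmp; }
        // h1 now has the smaller root; merge h2 into its right subtree.
        h1.right = merge(h1.right, h2);
        if (npl(h1.left) < npl(h1.right)) {
            // Restore the leftist property by swapping the children.
            Node tmp = h1.left; h1.left = h1.right; h1.right = tmp;
        }
        h1.npl = npl(h1.right) + 1;
        return h1;
    }

    /** Absorbs all elements of other, which becomes empty. */
    void merge(LeftistHeap other) {
        if (other == this) return;
        root = merge(root, other.root);
        other.root = null;
    }

    void insert(int x) { root = merge(root, new Node(x)); }

    boolean isEmpty() { return root == null; }

    int findMin() {
        if (root == null) throw new NoSuchElementException("heap is empty");
        return root.key;
    }

    int deleteMin() {
        int min = findMin();
        root = merge(root.left, root.right);
        return min;
    }
}
```

The recursion only ever follows right children, so its depth is bounded by the sum of the two right path lengths, which is O(log N).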

**Day 16. Friday, October 27.**

- Review of leftist heaps.
- The class structures for leftist heaps.
- Implementing deleteMin, findMin and insert in terms of merge.
- Skew heaps.
- Theorem on amortized cost of skew heap operations: O(log N) per operation. Proof postponed.
- Disjoint Set ADT started.
- Equivalence relations.
- Example of connected components in a graph.
- Functional behavior of union and find in the Disjoint Set ADT.

**Day 17. Monday, October 30.**

- More on Disjoint Set ADT.
- Algorithm for find(), union().
- Weiss's union() works only on roots. The algorithm in lecture works on arbitrary vertices.
- How the data for the Disjoint Set ADT is stored in an integer array. Integer values as pointers.
- Union-by-size.
- Union-by-height (also known as "union-by-rank").
- Theorem: log n bounds on height for union-by-size and union-by-rank.
- Path compression.
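
The integer-array representation can be sketched in Java as follows. This version combines union-by-size with path compression, and its `union` accepts arbitrary elements by calling `find` first, as in the lecture version. The encoding (a negative entry marks a root and stores minus the set size) and the class name `DisjointSets` are assumptions for this example.

```java
import java.util.Arrays;

class DisjointSets {
    // parent[i] >= 0: the parent of element i.
    // parent[i] <  0: i is a root, and -parent[i] is the size of its set.
    private final int[] parent;

    DisjointSets(int n) {
        parent = new int[n];
        Arrays.fill(parent, -1); // n singleton sets
    }

    /** Returns the root of x's set, compressing the path along the way. */
    int find(int x) {
        if (parent[x] < 0) return x;
        parent[x] = find(parent[x]); // point x directly at the root
        return parent[x];
    }

    /** Merges the sets containing x and y (union-by-size). */
    void union(int x, int y) {
        int rx = find(x);
        int ry = find(y);
        if (rx == ry) return; // already in the same set
        if (parent[rx] > parent[ry]) {
            // rx's set is the smaller one; swap so rx is the larger.
            int tmp = rx; rx = ry; ry = tmp;
        }
        parent[rx] += parent[ry]; // sizes add (both entries are negative)
        parent[ry] = rx;          // smaller tree hangs below the larger
    }

    int size(int x) { return -parent[find(x)]; }
}
```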

**Day 18. Wednesday, November 1.**

- More on Disjoint Set ADT.
- Modification to return the minimum numbered element of the set. Use a second M[] array.
- Proof of log n upper bound on height for union-by-size. Proof of the corresponding result for union-by-height is similar.
- Ackermann function, "tower of twos" superexponential function.
- log* function. Inverse Ackermann function (alpha). Very slow-growing functions.
- Theorem: log* n and alpha(n) upper bounds on the amortized cost per operation for union-find with path compression and either union-by-size or union-by-height.

**Day 19. Friday, November 3.**

- MIDTERM EXAM. Covers thru binary heaps. Any topics from the assignments or discussed in lecture may appear on the midterm. Definitely will have asymptotic bounds questions. No proofs.

**Day 20. Monday, November 6.**

- Expected number of comparisons for hashing with separate chaining and with open addressing. Idealized calculations.
- Amortized cost analysis. General framework. Cumulative time. Potential function.
- Handout on amortized cost analysis available in PDF format and postscript format.

**Day 21. Wednesday, November 8.**

- Amortized cost analysis for dynamically resizable arrays.
- An outline of the proof of the amortized cost analysis for skew heaps. The rest of the proof can be found in the textbook in Chapter 11.
- The next programming assignment will involve writing the heart of a search engine (ranking documents with regard to how well they match a two-word search criterion). This will be posted to the web in the next few days and discussed in class on Monday.
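
The amortized bound for dynamically resizable arrays can be checked by counting element copies. The sketch below simulates n appends to an array that starts with capacity 1 and doubles when full; it stores no data and only counts the copying work. The class and method names, and the starting capacity, are assumptions for this example.

```java
class DoublingCost {
    /** Total number of element copies caused by n appends with doubling. */
    static long copiesForAppends(int n) {
        int capacity = 1;
        int size = 0;
        long copies = 0;
        for (int i = 0; i < n; i++) {
            if (size == capacity) {
                copies += size; // reallocation copies every stored element
                capacity *= 2;
            }
            size++;
        }
        return copies;
    }
}
```

The copy counts form the geometric series 1 + 2 + 4 + ..., summed over the powers of two below n, so the total is less than 2n. That is O(1) extra work per append on average, even though a single append can cost O(n).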

**Day 22. Monday, November 13.**

- The details of the proof of the amortized run time bound for skew heaps.
- Programming assignment #4.
- Inverted lists for finding occurrences of words in large document sets.

**Day 23. Wednesday, November 15.**

- More details on the implementation of programming assignment #4. HashMap, ArrayList, etc.
- Walking through inverted lists. An online handout is available.

**Day 24. Friday, November 17.**

- Skip lists: see Weiss, pp. 399-402 and 468-473.
- Introduction to Skip Lists, motivated by scanning Inverted Lists.
- Level of an element in a skip list. Level of an edge in a skip list.
- Finding a key in a Skip List, or the position after which the key would be inserted.

**Day 25. Monday, November 20.**

- Discussion of how search engines work.
- Discussion of what "close occurrences" are.
- Insertion into skip lists.
- Randomized skip lists.
- Deterministic skip lists, via a reduction to B-trees.
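
A minimal Java sketch of a randomized skip list with search and insertion follows. It assumes `int` keys, a fixed maximum level of 16, and a fair coin flip to decide each additional level; these choices and all names are assumptions for this example and may differ from the textbook's presentation.

```java
import java.util.Random;

class SkipList {
    private static final int MAX_LEVEL = 16; // arbitrary cap for this sketch

    private static class Node {
        final int key;
        final Node[] next; // next[i] is the successor along level-i edges
        Node(int key, int level) {
            this.key = key;
            this.next = new Node[level];
        }
    }

    // The head is a sentinel; its key is never compared.
    private final Node head = new Node(Integer.MIN_VALUE, MAX_LEVEL);
    private final Random rng = new Random();

    /** Level 1 always; each further level with probability 1/2. */
    private int randomLevel() {
        int level = 1;
        while (level < MAX_LEVEL && rng.nextBoolean()) level++;
        return level;
    }

    public boolean contains(int key) {
        Node t = head;
        for (int i = MAX_LEVEL - 1; i >= 0; i--) {
            // Move right along level i while the next key is too small.
            while (t.next[i] != null && t.next[i].key < key) t = t.next[i];
        }
        Node candidate = t.next[0];
        return candidate != null && candidate.key == key;
    }

    public boolean insert(int key) {
        Node[] update = new Node[MAX_LEVEL]; // last node visited per level
        Node t = head;
        for (int i = MAX_LEVEL - 1; i >= 0; i--) {
            while (t.next[i] != null && t.next[i].key < key) t = t.next[i];
            update[i] = t;
        }
        if (t.next[0] != null && t.next[0].key == key) return false;
        Node node = new Node(key, randomLevel());
        for (int i = 0; i < node.next.length; i++) {
            // Splice the new node into the level-i list.
            node.next[i] = update[i].next[i];
            update[i].next[i] = node;
        }
        return true;
    }
}
```

The search loop ends at the position after which the key would be inserted, and `insert` reuses exactly that search before splicing in the new node.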

**Day 26. Wednesday, November 22.**

- Red-black trees. See Weiss, pp. 460-462.
- Proof of O(log n) height bound for red-black trees.
- Insertion algorithm.

**Day 27. Monday, November 27.**

- Treaps. See Weiss, pp. 481-483.
- Treaps are BST's with an extra priority field and are heaps with respect to priority.
- Insertion algorithm: Ordinary BST tree insertion followed by single rotations.
- Random BST's. Treaps on arbitrary input and BST's with randomly ordered insertions yield the same probability distribution on the shape of the tree.
- Random trees as above are chosen by choosing the root uniformly at random and then recursively constructing the two subtrees.
- Theorem: expected depth of random BST is O(log N).
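
The treap insertion algorithm can be sketched in Java as follows. The sketch assumes `int` keys, random `int` priorities, and the convention that the smallest priority is at the root; the class name `Treap` and the checker `isTreap` are inventions for this example.

```java
import java.util.Random;

class Treap {
    static class Node {
        final int key;
        final int priority; // chosen at random when the node is created
        Node left, right;
        Node(int key, int priority) {
            this.key = key;
            this.priority = priority;
        }
    }

    static Node rotateWithLeftChild(Node t) {
        Node l = t.left;
        t.left = l.right;
        l.right = t;
        return l;
    }

    static Node rotateWithRightChild(Node t) {
        Node r = t.right;
        t.right = r.left;
        r.left = t;
        return r;
    }

    /** Ordinary BST insertion, then single rotations on the way back up. */
    static Node insert(int x, Node t, Random rng) {
        if (t == null) return new Node(x, rng.nextInt());
        if (x < t.key) {
            t.left = insert(x, t.left, rng);
            if (t.left.priority < t.priority) t = rotateWithLeftChild(t);
        } else if (x > t.key) {
            t.right = insert(x, t.right, rng);
            if (t.right.priority < t.priority) t = rotateWithRightChild(t);
        }
        return t; // duplicates are ignored
    }

    /** Checks BST order on keys (lo < key < hi) and heap order on priorities. */
    static boolean isTreap(Node t, long lo, long hi) {
        if (t == null) return true;
        if (t.key <= lo || t.key >= hi) return false;
        if (t.left != null && t.left.priority < t.priority) return false;
        if (t.right != null && t.right.priority < t.priority) return false;
        return isTreap(t.left, lo, t.key) && isTreap(t.right, t.key, hi);
    }

    static int size(Node t) {
        return t == null ? 0 : 1 + size(t.left) + size(t.right);
    }
}
```

Even when keys are inserted in sorted order, the random priorities determine the shape of the tree, which is why the expected-depth result for random BST's applies.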

**Day 28. Wednesday, November 29.**

- Proof of the Theorem from Day 27.
- Huffman codes. See Weiss, pp. 357-362.
- Code words. Binary strings encoding symbols.
- Prefix codes. Unique decoding.
- Trie representation of a prefix code.
- There is a one-to-one correspondence between binary tries and prefix codes.

**Day 29. Friday, December 1.** Last day of class!

- Forthcoming. (Huffman's algorithm for finding the optimal prefix code given symbol frequency counts.)
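
As a preview, here is a minimal Java sketch of Huffman's algorithm: repeatedly merge the two lowest-weight tries until one remains, then read each code word off the root-to-leaf path. It uses `java.util.PriorityQueue` for the repeated minimum extraction. The class name `Huffman`, the method `codes`, and the convention of giving a lone symbol the code "0" are assumptions for this example, and the details may differ from the version presented in lecture or in Weiss.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class Huffman {
    private static class Node {
        final char symbol;      // meaningful only at leaves
        final long weight;      // total frequency of the leaves below
        final Node left, right;

        Node(char symbol, long weight) {
            this.symbol = symbol;
            this.weight = weight;
            this.left = null;
            this.right = null;
        }

        Node(Node left, Node right) {
            this.symbol = '\0';
            this.weight = left.weight + right.weight;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null; }
    }

    /** Returns a prefix code (symbol to bit string) for the given counts. */
    static Map<Character, String> codes(Map<Character, Integer> freq) {
        PriorityQueue<Node> pq =
            new PriorityQueue<>((x, y) -> Long.compare(x.weight, y.weight));
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            char symbol = e.getKey();
            long weight = e.getValue();
            pq.add(new Node(symbol, weight));
        }
        while (pq.size() > 1) {
            Node first = pq.remove();  // the two lightest tries
            Node second = pq.remove();
            pq.add(new Node(first, second));
        }
        Map<Character, String> out = new HashMap<>();
        if (!pq.isEmpty()) assign(pq.remove(), "", out);
        return out;
    }

    /** Walks the trie: left edges append 0, right edges append 1. */
    private static void assign(Node t, String prefix, Map<Character, String> out) {
        if (t.isLeaf()) {
            out.put(t.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        assign(t.left, prefix + "0", out);
        assign(t.right, prefix + "1", out);
    }
}
```

Because symbols appear only at the leaves of the trie, no code word is a prefix of another.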