Weighted Quick-Union with Path Compression
A Deep Dive into Weighted Quick-Union with Path Compression
In the world of computer science, some algorithms are so efficient and elegant that they feel like magic. The Weighted Quick-Union with Path Compression algorithm is one of them. It's the gold standard for solving a class of problems known as "dynamic connectivity" problems.
Let's break it down.
The Problem: Dynamic Connectivity
Imagine you have a set of objects. Over time, you're told that certain pairs of these objects are now connected. The fundamental question you want to answer, at any point, is: "Are object A and object B connected?"
This "connection" could mean anything:
- Social Networks: Are two people in the same network of friends?
- Computer Networks: Can two computers on a network send messages to each other?
- Image Processing: Are two pixels part of the same contiguous region of color?
- Maze Solving: Is there a path from the start to the end?
The data structure that handles this is often called a Union-Find or Disjoint-Set Union (DSU). It has two primary operations:
union(p, q): Connect objectpand objectq.find(p): Find the "identifier" of the group thatpbelongs to. Iffind(p)equalsfind(q), thenpandqare connected.
The Journey to Optimization
Let's build up to the final, optimized algorithm by looking at simpler versions first. We'll represent our objects as nodes in a forest (a collection of trees). Each tree represents a connected component. The root of the tree is the unique identifier for that component.
Attempt 1: Quick-Union
In this approach, each node has a pointer to its parent. To find the root of a node, you just follow the parent pointers until you reach a node that points to itself.
find(p): Followparent[p]until you reach the root.union(p, q): Find the root ofp(let's call itrootP) and the root ofq(rootQ). Then, simply set the parent ofrootPto berootQ.
The Problem: This can lead to very tall, skinny trees. Imagine connecting items in a line: union(0,1), union(1,2), union(2,3), ... The tree becomes a long chain. A find operation on the deepest node would have to traverse the entire chain, making it slow (O(N) in the worst case).
union(0,1), union(1,2), union(2,3) 3 | 2 | 1 | 0Attempt 2: Weighted Quick-Union (Union by Size/Rank)
To avoid creating long chains, we can be smarter about our union operation. Instead of arbitrarily connecting one root to another, let's keep track of the "size" (number of nodes) of each tree.
When we perform union(p, q), we find their roots (rootP and rootQ). We then connect the smaller tree to the root of the larger tree.
This simple change has a profound impact. It ensures our trees stay relatively short and bushy, preventing the worst-case scenario of a long chain. The maximum height of any tree is now guaranteed to be at most log(N), which makes our find operation much faster (O(log N)).
Example:
- Tree A has 5 nodes.
- Tree B has 2 nodes.
- To union them, we make the root of Tree B a child of the root of Tree A. The new combined tree has a size of 7.
The Final Touch: Path Compression
We can do even better. This optimization is applied during the find operation and is incredibly clever.
When we call find(p), we traverse a path of nodes from p up to the root. After we find the root, we can go back along that same path and make every node we visited point directly to the root.
Before Path Compression: find(0) requires traversing 0 -> 1 -> 2 -> 3 -> 4 (root)
4 | 3 / \ 2 5 | 1 | 0After Path Compression on find(0): Now, nodes 0, 1, 2, and 3 all point directly to the root, 4.
4 / | | \ / | | \ 3 2 1 0 | 5The next time we call find on any of those nodes (0, 1, 2, or 3), we'll get to the root in a single step! Over many operations, this keeps the trees incredibly flat.
The Result: Nearly Constant Time
When you combine Weighted Quick-Union with Path Compression, the performance becomes astonishingly good. The amortized time complexity for both union and find is nearly constant, often written as O(α(N)), where α(N) is the Inverse Ackermann function.
This function grows so slowly that for any input size you could possibly encounter in the real world (even larger than the number of atoms in the universe), α(N) is never greater than 5. For all practical purposes, the algorithm runs in constant time per operation.
Golang Implementation
Here is a full implementation in Go that combines both optimizations.
package main import "fmt" // WeightedQuickUnionPathCompression implements the union-find data structure with // both weighting and path compression optimizations. type WeightedQuickUnionPathCompression struct { // parent[i] = parent of i parent []int // size[i] = number of nodes in the subtree rooted at i size []int // count is the number of disjoint sets count int } // New initializes a new union-find data structure with n elements. // Each element initially is in its own set. func New(n int) *WeightedQuickUnionPathCompression { parent := make([]int, n) size := make([]int, n) for i := 0; i < n; i++ { parent[i] = i size[i] = 1 } return &WeightedQuickUnionPathCompression{ parent: parent, size: size, count: n, } } // Find returns the root of the component/set containing element p. // It uses path compression to flatten the tree structure. func (uf *WeightedQuickUnionPathCompression) Find(p int) int { // Find the root root := p for root != uf.parent[root] { root = uf.parent[root] } // Path compression: make every node on the path point to the root for p != root { newp := uf.parent[p] uf.parent[p] = root p = newp } return root } // Connected returns true if elements p and q are in the same set. func (uf *WeightedQuickUnionPathCompression) Connected(p, q int) bool { return uf.Find(p) == uf.Find(q) } // Union merges the set containing element p with the set containing element q. // It uses weighting (union by size) to keep the trees flat. func (uf *WeightedQuickUnionPathCompression) Union(p, q int) { rootP := uf.Find(p) rootQ := uf.Find(q) if rootP == rootQ { return } // Weighted union: attach the smaller tree to the root of the larger tree. if uf.size[rootP] < uf.size[rootQ] { uf.parent[rootP] = rootQ uf.size[rootQ] += uf.size[rootP] } else { uf.parent[rootQ] = rootP uf.size[rootP] += uf.size[rootQ] } uf.count-- } // Count returns the number of disjoint sets. func (uf *WeightedQuickUnionPathCompression) Count() int { return uf.count } func main() { // Example Usage: // Consider 10 elements, 0 through 9. uf := New(10) fmt.Printf("Initial components: %d\n", uf.Count()) // 10 uf.Union(4, 3) uf.Union(3, 8) uf.Union(6, 5) uf.Union(9, 4) uf.Union(2, 1) fmt.Printf("Are 8 and 9 connected? %t\n", uf.Connected(8, 9)) // true fmt.Printf("Are 5 and 4 connected? %t\n", uf.Connected(5, 4)) // false uf.Union(5, 0) uf.Union(7, 2) uf.Union(6, 1) uf.Union(1, 8) fmt.Printf("Are 5 and 4 connected now? %t\n", uf.Connected(5, 4)) // true fmt.Printf("Final components: %d\n", uf.Count()) // 1 }Conclusion
The Weighted Quick-Union with Path Compression algorithm is a testament to how clever optimizations can turn a slow, impractical solution into one that is breathtakingly fast. It's a fundamental tool in a programmer's arsenal, perfect for any problem that can be modeled as a set of objects with evolving connections. Its elegance and efficiency make it a classic and beautiful piece of computer science.