Posted on

Table of Contents

Union find also called a Disjoint Set Datastructure. It stores 1 or more disjoint sets of elements.

The simplest way is to make an array which looks like,

F(x) = parent(x)

In this array, The index i is the item/node and the value at parent[i] is it's parent. Initially, Each node points to itself.

To find parent of a node, we can do

find (node)
    if node != parent[node]
        return find(parent[node])
    return node

To check if two nodes are in the same set, we can do

find(node1) == find(node2)

This'll recursively check if the two nodes are in the same set since they'll have the same parent.

If we want to join 2 nodes, we can do

pa = find(a)
pb = find(b)

if pa != pb
	parent[b] = a;

Optimizations

  1. Path Compression. By default, We look up the full path and trace the parent. This can take a lot of time and we can reduce this by compressing the path. In the find function, We have already done the work of looking up a node's parent. We can update the node's parent to point to it's main parent directly. In future, Lookups will only take 1 hop rather than jumping more times so the look up can go from 1->2->3->4->5 to just 1->5

  2. We can keep depth of each node when we join two sets. Currently, We simply join the second tree into the first one. If the second tree was much larger than the first tree than look ups into the nodes in second tree will not take 1 more hop.

To avoid this, We can keep depths of each node and then join second into first OR first into second depending on the one that has less depth.

References

  1. https://www.scaler.com/topics/data-structures/disjoint-set/