Connected Sets Labeling or Connected Components Labeling is the process of assigning unique labels to elements in a matrix or image, in such a way that adjacent elements are assigned the same label. The elements within a connected set will be similar to each other in terms of a given criteria. Alternate terms for connected-sets-labeling include connected-component-analysis, blob-extraction, blob-discovery, region-finding, region-labeling and region-extraction. Connected-sets-labeling is an important problem that has many applications in graph theory and computer vision. In computer vision, connected-sets-labeling is used in image analysis to find groups of similar pixels. Identifying and labeling of various disjoint or connected regions in an image is useful in many automated image analysis applications (example: image segmentation).
This article explains a sequential algorithm (with C# source-code) for finding disjoint connected sets in a given matrix. Using the Silverlight demo provided, you can dynamically create an input matrix and mark/unmark the cells (by clicking) to see the generated connected-sets (or regions) visually. Full source-code of the demo is available for download.
Connected Sets Demo in Silverlight
Click on the matrix to mark/unmark the cells.
Connected Sets Labeling Process
Finding the connected sets using a sequential procedure (without recursion) consists of two phases.
- Phase1: Find & mark regions with temporary labels and record the equivalences.
- Phase2: Merge the regions (i.e. assign all equivalent regions the same region value).
Data Structures for this Article
The following data-structure classes are created. The Matrix class represents a 2-dimensional matrix or an image. The Cell class represents a cell or element in a matrix or a pixel in an image. The Set class represents a list of items (in our case numbers). The SetList class represents a list of sets.
public class Matrix
{
public int Width, Height;
public int[,] Values;
}
public class Cell
{
public int X, Y, Value;
}
public class Set : List<int>
{
}
public class SetList : Dictionary<int, Set>
{
}
Phase1: Finding and Marking the Regions
The sequential algorithm for marking the regions, is as follows:
- Create a region-counter and initialize it to 1.
- For every cell in the matrix, get the neighbor cells that match a criteria. For binary image/matrix, cell-value of 1 is the criteria. Note that there will be 8 neighbors for a given cell of a matrix, namely north, south, east, west, north-east, north-west, south-east and south-west. For 4-connectivity, consider north and west cells, and for 8-connectivity, consider north, west, north-east and north-west cells. For more information on connectivity, see http://en.wikipedia.org/wiki/Connectedness.
- If zero neighbors match the criteria, then assign the cell to current region-counter value, and increment the region-counter.
- If only one neighbor match the criteria, then assign the cell to region of that neighboring cell.
- If multiple neighbors match the criteria and all of them are of same region, then assign the cell to that region.
- If multiple neighbors match the criteria and they belong to different regions, then assign the cell to any one of those regions, and record that these regions are equivalent.
The following is the complete source-code for this algorithm. The comments inside the source-code identify the step-number in the above algorithm.
public static Matrix MarkRegions(Matrix matrix, out SetList equivalentRegions)
{
Matrix regionMatrix = new Matrix(matrix.Width, matrix.Height);
equivalentRegions = new SetList();
int currentRegion = 1; //step1
for (int y = 0; y < matrix.Height; y++)
{
for (int x = 0; x < matrix.Width; x++)
{
if (matrix.Values[x, y] == 1)
{
List<Cell> neighbors = matrix.GetNeighbors(x, y, 8); //step2
int matchCount = neighbors.Count(cell => cell.Value > 0);
if (matchCount == 0) //step3
{
regionMatrix.Values[x, y] = currentRegion;
equivalentRegions.Add(currentRegion, new Set() { currentRegion });
currentRegion += 1;
}
else if (matchCount == 1) //step4
{
Cell oneCell = neighbors.First(cell => cell.Value == 1);
regionMatrix.Values[x, y] = regionMatrix.Values[oneCell.X, oneCell.Y];
}
else if (matchCount > 1)
{
List<int> distinctRegions = neighbors.Select(cell => regionMatrix.Values[cell.X, cell.Y]).Distinct().ToList();
while (distinctRegions.Remove(0)) ;
if (distinctRegions.Count == 1) //step5
{
regionMatrix.Values[x, y] = distinctRegions[0];
}
else if (distinctRegions.Count > 1) //step6
{
int firstRegion = distinctRegions[0];
regionMatrix.Values[x, y] = firstRegion;
//save the relations of equivalent regions
foreach (int region in distinctRegions)
{
if (!equivalentRegions[firstRegion].Contains(region))
{
equivalentRegions[firstRegion].Add(region);
}
}
}
}
}
}
}
return regionMatrix;
}
For illustration, consider the following input matrix:
0 0 0 0 1 1 1 0
0 0 0 0 0 0 0 1
0 1 0 0 1 1 1 1
0 1 0 0 0 1 0 1
1 1 0 0 0 0 1 0
0 1 0 0 0 0 1 0
0 0 0 0 1 1 0 0
0 0 0 0 1 0 0 1
By applying the phase1 algorithm, the following region-matrix will be created.
0 0 0 0 1 1 1 0
0 0 0 0 0 0 0 1
0 2 0 0 3 3 3 1
0 2 0 0 0 3 0 1
2 2 0 0 0 0 3 0
0 2 0 0 0 0 3 0
0 0 0 0 4 4 0 0
0 0 0 0 4 0 0 5
From the region-matrix, we understand that 5 regions are marked. For each region, a equivalence-set will be created. Thus, the equivalence-sets created for 5 regions in this region-matrix will be: {1, 3}, {2}, {3, 1}, {4, 3} and {5}, respectively.
Phase2: Merging the Regions (Union-Find Algorithm)
Merging the regions is the process of assigning all equivalent regions, the same region value. This can be achieved using Union-Find algorithm.
The Union-Find algorithm, in general, is the process of finding 'disjoint' sets (with 'connected' elements) from a list of given sets. The following are the steps in Union-Find algorithm:
- Create a dictionary for storing root of all elements. Initially set the root of each element to itself.
- If two elements are 'connected', then unite the sets of those two elements. To unite sets containining the elements p and q, change the root of all elements (in the dictionary) with root(p) to root(q).
According to Union-Find algorithm, elements p and q are connected, if they have same root.
Please note that Union-Find algorithm needs the information of how the elements are 'connected' to each other. For the problem of region-merging, the equivalence-sets found in phase1 can be used in Step2 of the Union-Find algorithm.
The following is the source-code of Union-Find algorithm:
public class Vector : Dictionary<int, int>
{
}
public class UnionFind
{
public SetList Sets { get; set; } //equivalence-sets
public Vector Roots { get; set; }
public UnionFind(SetList sets)
{
this.Sets = sets;
this.Roots = new Vector();
this.Initialize();
this.Start();
}
public void Initialize()
{
//initialize Roots (add all items to Roots)
Roots.Clear();
foreach (int index in Sets.Keys)
foreach (int item in Sets[index])
if (!Roots.ContainsKey(item))
Roots.Add(item, -1);
//assign root of each item
foreach (int index in Sets.Keys)
foreach (int item in Sets[index])
Roots[item] = Sets[index][0];
}
public void Unite(int item1, int item2)
{
int item1Root = Roots[item1];
for (int index = 0; index < Roots.Count; index++)
{
int item = Roots.Keys.ElementAt(index);
if (Roots[item] == item1Root)
Roots[item] = Roots[item2];
}
}
public void Start()
{
foreach (int index in Sets.Keys)
{
var set = Sets[index];
for (int i = 0; i < set.Count; i++)
for (int j = i + 1; j < set.Count; j++)
Unite(set[i], set[j]);
}
}
}
After applying the Union-Find algorithm to the sample region-matrix illustrated above, the Roots dictionary will be: [1:4, 2:2, 3:4, 4:4, 5:5]. From this we understand that only 3 disjoint sets (namely 2, 4 and 5) are available in the region-matrix.
0 0 0 0 4 4 4 0
0 0 0 0 0 0 0 4
0 2 0 0 4 4 4 4
0 2 0 0 0 4 0 4
2 2 0 0 0 0 4 0
0 2 0 0 0 0 4 0
0 0 0 0 4 4 0 0
0 0 0 0 4 0 0 5
We have to then normalize the region-numbers i.e. assign sequential region-numbers starting from 1. Thus, the region numbers after normalizing will be 1, 2 and 3. The final region-matrix will look like this:
0 0 0 0 2 2 2 0
0 0 0 0 0 0 0 2
0 1 0 0 2 2 2 2
0 1 0 0 0 2 0 2
1 1 0 0 0 0 2 0
0 1 0 0 0 0 2 0
0 0 0 0 2 2 0 0
0 0 0 0 2 0 0 3
The techniques explained in this article can be extended to any dimensions, however with increased computational complexities. For more information on connected component analysis, see http://en.wikipedia.org/wiki/Connected-component_labeling.
Downloads for this article
File |
Language |
Tools |
Connected-Sets-Labeling-Source 518.78 kb (158 downloads) |
C#, Silverlight 4 |
Visual Studio 2010 |