67

I am looking for an example of how to use OpenCV's connectedComponentsWithStats() function in Python. Note this is only available with OpenCV 3 or newer. The official documentation only shows the API for C++, even though the function exists when compiled for Python. I could not find it anywhere online.

Milan
Zack Knopp

3 Answers

135

The function works as follows:

# Import the cv2 library
import cv2
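# Read the image you want connected components of, as single-channel grayscale
# (Otsu thresholding needs an 8-bit single-channel input)
src = cv2.imread('/directorypath/image.bmp', cv2.IMREAD_GRAYSCALE)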
# Threshold it so it becomes binary
ret, thresh = cv2.threshold(src,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
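# You need to choose 4 or 8 for connectivity type
connectivity = 4
# Perform the operation. connectivity and ltype are passed by keyword because,
# in the Python binding, the optional labels/stats/centroids outputs come first
output = cv2.connectedComponentsWithStats(thresh, connectivity=connectivity, ltype=cv2.CV_32S)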
# Get the results
# The first cell is the number of labels
num_labels = output[0]
# The second cell is the label matrix
labels = output[1]
# The third cell is the stat matrix
stats = output[2]
# The fourth cell is the centroid matrix
centroids = output[3]

Labels is a matrix the size of the input image where each element has a value equal to its label.

Stats is a matrix of the statistics that the function calculates. It has one row per label and one column per statistic. The OpenCV documentation describes it as follows:

Statistics output for each label, including the background label, see below for available statistics. Statistics are accessed via stats[label, COLUMN] where available columns are defined below.

  • cv2.CC_STAT_LEFT The leftmost (x) coordinate which is the inclusive start of the bounding box in the horizontal direction.
  • cv2.CC_STAT_TOP The topmost (y) coordinate which is the inclusive start of the bounding box in the vertical direction.
  • cv2.CC_STAT_WIDTH The horizontal size of the bounding box
  • cv2.CC_STAT_HEIGHT The vertical size of the bounding box
  • cv2.CC_STAT_AREA The total area (in pixels) of the connected component

Centroids is a matrix with the x and y locations of each centroid. The row in this matrix corresponds to the label number.

Zack Knopp
  • I must say that for some reason, I had to use cv2.THRESH_BINARY instead of cv2.THRESH_BINARY+cv2.THRESH_OTSU, then I had to cast src to integer and thresh to float in order for it to work. I don't know why, but it didn't work otherwise. – Бојан Матовски Jun 27 '16 at 07:54
  • I don't understand why you create the labels matrix when it is then part of the output anyway? – ypnos Jul 01 '16 at 14:14
  • 1
    @ypnos You don't need to for connected components with stats, but do for connected components without stats. I think that part was just left over from me doing it the other way. I fixed it now. Cheers! – Zack Knopp Jul 04 '16 at 17:13
  • Thanks so much for this! This is a much better description of how this works than the C++ docs have. – Haldean Brown Oct 11 '16 at 16:34
  • 1
    can some one explain how to use the labels? How to check if a centroid is what label? – recurf Dec 07 '16 at 22:28
  • 3
    Each component in the image gets a number (label). The background is label 0, and the additional objects are numbered from 1 to `num_labels-1`. The centroids are indexed by the same numbers as the labels. `centroids[0]` isn't particularly useful--it's just the background. `centroids[1:num_labels]` is what you want. – krs013 Feb 25 '17 at 21:43
  • @ZackKnopp Do you also know how I can order the labels by area, width or height? – matchifang Jul 24 '17 at 19:05
  • @ZackKnopp That's incorrect, you can use the function without stats like this as well: `_, labels = cv2.connectedComponents(segmentation)` :) – smcs Sep 01 '17 at 09:02
  • 3
    @matchifang You could create an array with the component areas: `areas=output[2][:,4]` Then an array with the numbers of components: `nr=np.arange(output[0])` Then sort them according to area size: `ranked=sorted(zip(areas,nr))` With help from here: https://stackoverflow.com/questions/6618515/sorting-list-based-on-values-from-another-list – smcs Sep 01 '17 at 12:43
  • `cv2.connectedComponentsWithStats` does not take connectivity as an input argument in OpenCV 3 or 4, and I don't think the function was present in 2. Is this simply a mixup between `connectedComponentsWithStats` and `connectedComponentsWithStatsWithAlgorithm`? `output = cv2.connectedComponentsWithStats(thresh)` gives the exact same result for me. – Atnas Feb 24 '21 at 19:44
18

I have come back here a few times to remember how it works, and each time I reduce the above code to:

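_, thresh = cv2.threshold(src, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
connectivity = 4  # You need to choose 4 or 8 for connectivity type
# Keywords make sure the values reach connectivity and ltype rather than the optional output arguments
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(thresh, connectivity=connectivity, ltype=cv2.CV_32S)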

Hopefully, it's useful for everyone :)

Dan Erez
10

Adding to Zack Knopp's answer: if you are using a grayscale image, you can simply use:

import cv2
import numpy as np

src = cv2.imread("path\\to\\image.png", 0)
binary_map = (src > 0).astype(np.uint8)
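connectivity = 4  # or whatever you prefer

# Keywords make sure the values reach connectivity and ltype rather than the optional output arguments
output = cv2.connectedComponentsWithStats(binary_map, connectivity=connectivity, ltype=cv2.CV_32S)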

When I tried Zack Knopp's answer on a grayscale image it didn't work, and this was my solution.

Barel Levy