I am looking for an example of how to use OpenCV's connectedComponentsWithStats() function in Python. Note this is only available with OpenCV 3 or newer. The official documentation only shows the API for C++, even though the function exists when compiled for Python. I could not find it anywhere online.
3 Answers
The function works as follows:
# Import the cv2 library
import cv2
# Read the image as grayscale; Otsu thresholding needs a single-channel 8-bit image
src = cv2.imread('/directorypath/image.bmp', cv2.IMREAD_GRAYSCALE)
# Threshold it so it becomes binary
ret, thresh = cv2.threshold(src,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
# You need to choose 4 or 8 for connectivity type
connectivity = 4
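# Perform the operation; connectivity and ltype are passed by keyword because the
# Python binding lists the optional labels, stats and centroids arguments before them
output = cv2.connectedComponentsWithStats(thresh, connectivity=connectivity, ltype=cv2.CV_32S)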
# Get the results
# The first cell is the number of labels
num_labels = output[0]
# The second cell is the label matrix
labels = output[1]
# The third cell is the stat matrix
stats = output[2]
# The fourth cell is the centroid matrix
centroids = output[3]
Labels is a matrix the same size as the input image, where each element holds the label of the component that pixel belongs to.
Stats is a matrix of the statistics that the function calculates. It has one row per label and one column per statistic. The OpenCV documentation describes the columns:
Statistics output for each label, including the background label, see below for available statistics. Statistics are accessed via stats[label, COLUMN] where available columns are defined below.
- cv2.CC_STAT_LEFT The leftmost (x) coordinate which is the inclusive start of the bounding box in the horizontal direction.
- cv2.CC_STAT_TOP The topmost (y) coordinate which is the inclusive start of the bounding box in the vertical direction.
- cv2.CC_STAT_WIDTH The horizontal size of the bounding box.
- cv2.CC_STAT_HEIGHT The vertical size of the bounding box.
- cv2.CC_STAT_AREA The total area (in pixels) of the connected component.
Centroids is a matrix with the x and y locations of each centroid. The row in this matrix corresponds to the label number.
-
I must say that for some reason, I had to use cv2.THRESH_BINARY instead of cv2.THRESH_BINARY+cv2.THRESH_OTSU, then I had to cast src to integer and thresh to float in order for it to work. I don't know why, but it didn't work otherwise. – Бојан Матовски Jun 27 '16 at 07:54
-
I don't understand why you create the labels matrix when it is then part of the output anyway? – ypnos Jul 01 '16 at 14:14
-
@ypnos You don't need to for connected components with stats, but do for connected components without stats. I think that part was just left over from me doing it the other way. I fixed it now. Cheers! – Zack Knopp Jul 04 '16 at 17:13
-
Thanks so much for this! This is a much better description of how this works than the C++ docs have. – Haldean Brown Oct 11 '16 at 16:34
-
Can someone explain how to use the labels? How do I check which label a centroid belongs to? – recurf Dec 07 '16 at 22:28
-
Each component in the image gets a number (label). The background is label 0, and the additional objects are numbered from 1 to `num_labels-1`. The centroids are indexed by the same numbers as the labels. `centroids[0]` isn't particularly useful--it's just the background. `centroids[1:num_labels]` is what you want. – krs013 Feb 25 '17 at 21:43
-
@ZackKnopp Do you also know how I can order the labels by area, width or height? – matchifang Jul 24 '17 at 19:05
-
@ZackKnopp That's incorrect, you can use the function without stats like this as well: `_, labels = cv2.connectedComponents(segmentation)` :) – smcs Sep 01 '17 at 09:02
-
@matchifang You could create an array with the component areas: `areas=output[2][:,4]`. Then an array with the numbers of components: `nr=np.arange(output[0])`. Then sort them according to area size: `ranked=sorted(zip(areas,nr))`. With help from here: https://stackoverflow.com/questions/6618515/sorting-list-based-on-values-from-another-list – smcs Sep 01 '17 at 12:43
-
`cv2.connectedComponentsWithStats` does not take connectivity as an input argument in OpenCV 3 or 4, and I don't think the function was present in 2. Is this simply a mixup between `connectedComponentsWithStats` and `connectedComponentsWithStatsWithAlgorithm`? `output = cv2.connectedComponentsWithStats(thresh)` gives the exact same result for me. – Atnas Feb 24 '21 at 19:44
I have come here a few times to remember how it works, and each time I have to reduce the above code to:
_, thresh = cv2.threshold(src,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
connectivity = 4 # You need to choose 4 or 8 for connectivity type
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(thresh, connectivity=connectivity, ltype=cv2.CV_32S)
Hopefully, it's useful for everyone :)
Adding to Zack Knopp's answer:
If you are using a grayscale image you can simply use:
import cv2
import numpy as np
src = cv2.imread("path\\to\\image.png", 0)
binary_map = (src > 0).astype(np.uint8)
connectivity = 4 # or whatever you prefer
output = cv2.connectedComponentsWithStats(binary_map, connectivity=connectivity, ltype=cv2.CV_32S)
When I tried using Zack Knopp's answer on a grayscale image it didn't work, and this was my solution.