How to classify Gaussian from bimodal mixture of 2 Gaussians?

Question

I have two types of samples:

some look like bimodal Gaussian mixture (Type A),
and some look like a Gaussian (Type B).

How can I programmatically classify/label them?

Language can be R, Python, Matlab, or anything else appropriate.

enter image description here

The data presented in the charts are Red values from jpeg images.

In R code:

# I read an image
I = readJPEG(file, native = FALSE)
# so I is now a matrix of 3 dimensions (rows, columns, 3 for Red/Green/Blue)

# I extract a vertical line from the image, and only the Red part
image_extract <- I[150:260, 194, 1]

# After reading several images, I plot the 3 images image_extract for each type (A,B)
plot(image_extract_1)
lines(image_extract_2)
lines(image_extract_3)

For Type A I plotted, 3 image extracts on the same chart. Same for Type B.

I hope it clarifies.

Are these curves, or are you showing estimated density functions from data? — Aniko, Jan 16 '14 at 13:44
@Aniko Those graphs are from R using the commans plot and lines with the data. The x-Axis is the row, and the y-Axis is the value analyzed. — Timothée HENRY, Jan 16 '14 at 14:40
If the y-axis is not the density, I completely misunderstood your question. If the y-Axis is the dependent variable, you are rather interested in categorizing the plots according to some regression model. — Horst Grünbusch, Jan 16 '14 at 16:07
@Aniko I added some explanation in my question. Hopefully that clarifies what the plots represent. — Timothée HENRY, Jan 17 '14 at 16:01

score 1 · Answer 1 · edited Apr 13 '17 at 12:44

1

The best I could do so far was to try to fit a gaussian (see here for fitting a single/unimodal gaussian: How to fit data that looks like a gaussian?).

Then calculate the "difference" between the fit and the actual data. High "difference"would mean that it would not be a single gaussian, therefore it could be a double or something else.

edited Apr 13 '17 at 12:44

Community

1

answered Jan 31 '14 at 15:48

Timothée HENRY

1,011

score 0 · Answer 2 · answered Jul 15 '14 at 18:42

Usually the term for this is Gaussian Mixture Modeling, which along with kMeans is one of the most popular and widely implemented clustering algorithms out there.

Each Gaussian has a mean and variance that are learned from the data, often using the Expectation-Maximization algorithm to compute maximum likelihood.

One implementation is the R package EMCluster.

http://cran.r-project.org/web/packages/EMCluster/EMCluster.pdf

Another is mclust. http://cran.r-project.org/web/packages/mclust/vignettes/mclust.pdf

For python, sklearn has GMM: http://scikit-learn.org/stable/modules/mixture.html

For matlab, the Statistics toolbox has gmdistribution: http://www.mathworks.com/help/stats/gaussian-mixture-models.html

How to classify Gaussian from bimodal mixture of 2 Gaussians?

2 Answers2