I am working on a Python project to detect and classify bird song, and I need to convert a WAV file into frequency vs. time data. The conversion itself hasn't been much of a problem, but to classify the different syllables into groups, I need to write something that detects when the data clusters into a particular shape. To give you an idea of what the data looks like, here is an image of how it looks when plotted:

I need some way to get each individual syllable (each shape with a separation on either side) and save them either to a variable or to their own files so that I can run Pearson correlation between them using SciPy.
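
To make the goal concrete, here is a minimal sketch of the kind of thing I have in mind. It assumes the data is a pair of arrays (time, frequency) that only contains points where there is sound, so syllables are separated by gaps in time. The names `split_syllables`, `resample`, and the `min_gap` threshold are made up for illustration, the test data is synthetic, and resampling to a fixed length is just one way to make two syllables comparable for `scipy.stats.pearsonr`.

```python
import numpy as np
from scipy.stats import pearsonr


def split_syllables(times, freqs, min_gap=0.02):
    """Split (time, frequency) points into syllables at silent gaps.

    min_gap is a hypothetical threshold in seconds: any jump in time
    larger than this is treated as the boundary between two syllables.
    """
    times = np.asarray(times, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    # Make sure the points are in time order before looking for gaps.
    order = np.argsort(times, kind="stable")
    times, freqs = times[order], freqs[order]
    # Index of the first point after each gap longer than min_gap.
    breaks = np.where(np.diff(times) > min_gap)[0] + 1
    return list(zip(np.split(times, breaks), np.split(freqs, breaks)))


def resample(t, f, n=50):
    """Interpolate one syllable onto n evenly spaced time points so that
    syllables of different lengths can be correlated point by point."""
    grid = np.linspace(t[0], t[-1], n)
    return np.interp(grid, t, f)


# Synthetic stand-in for real data: two upward sweeps separated by silence.
t1 = np.linspace(0.00, 0.10, 40)
t2 = np.linspace(0.20, 0.35, 60)
times = np.concatenate([t1, t2])
freqs = np.concatenate([2000 + 10000 * t1, 3000 + 8000 * (t2 - 0.20)])

syllables = split_syllables(times, freqs)
r, p = pearsonr(resample(*syllables[0]), resample(*syllables[1]))
print(len(syllables), r)
```

Each entry of `syllables` is a `(times, freqs)` pair held in a variable; writing each one to its own file with something like `np.save` would also work for me. I am not sure whether a fixed time gap is a robust way to find the boundaries in real recordings.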
Also, I prefer Python, but I am open to coding in other languages if you have another way to do it.
Thanks!