15

What is the best way to get a frequency value in Hz from an audio stream (music) on iOS? What are the best and easiest frameworks provided by Apple for doing that? Thanks in advance.

Olga Dalton
  • You need to be more specific: what sort of input are you looking at? Speech? Music? A single instrument playing a single note? A pure tone? – Paul R Jul 27 '12 at 13:54
  • OK, so what kind of frequency information do you hope to extract? Just a short-term power spectrum, or something more sophisticated than that? – Paul R Jul 27 '12 at 17:56
  • I just need the average Hz value of each short music segment. Each segment is shorter than 0.2 s. – Olga Dalton Jul 27 '12 at 18:04
  • There is no single "Hz value": a complex sound like music contains energy at many different frequencies, and this distribution of energy versus frequency changes continuously. – Paul R Jul 27 '12 at 18:55

3 Answers

20

Here is some code I use to perform an FFT on iOS using the Accelerate framework, which makes it quite fast.

#include <Accelerate/Accelerate.h>
#include <stdlib.h>
#include <string.h>

// Keep all internal state inside this struct.
typedef struct FFTHelperRef {
    FFTSetup fftSetup;          // Accelerate opaque type that contains setup information for a given FFT size.
    COMPLEX_SPLIT complexA;     // Accelerate split-complex buffer (separate real and imaginary arrays).
    Float32 *outFFTData;        // Your FFT output data: one squared magnitude per frequency bin.
    Float32 *invertedCheckData; // Inverse-FFT result, used to verify correctness of the output. Compare it with the input.
} FFTHelperRef;

// First, initialize your FFTHelperRef with this function.
// numberOfSamples must be a power of two (the setup uses FFT_RADIX2).

FFTHelperRef * FFTHelperCreate(long numberOfSamples) {

    FFTHelperRef *helperRef = (FFTHelperRef*) malloc(sizeof(FFTHelperRef));
    // Round before converting so a float error in log2f cannot truncate to the wrong exponent.
    vDSP_Length log2n = (vDSP_Length) lroundf(log2f((float)numberOfSamples));
    helperRef->fftSetup = vDSP_create_fftsetup(log2n, FFT_RADIX2);
    long nOver2 = numberOfSamples/2;
    // Split-complex buffers hold numberOfSamples/2 real and imaginary parts.
    helperRef->complexA.realp = (Float32*) malloc(nOver2*sizeof(Float32) );
    helperRef->complexA.imagp = (Float32*) malloc(nOver2*sizeof(Float32) );

    helperRef->outFFTData = (Float32 *) malloc(nOver2*sizeof(Float32) );
    memset(helperRef->outFFTData, 0, nOver2*sizeof(Float32) );

    helperRef->invertedCheckData = (Float32*) malloc(numberOfSamples*sizeof(Float32) );

    return  helperRef;
}

// Pass an initialized FFTHelperRef, the data and the data size here.
// Returns FFT data (squared magnitudes) with numSamples/2 elements.

Float32 * computeFFT(FFTHelperRef *fftHelperRef, Float32 *timeDomainData, long numSamples) {
    vDSP_Length log2n = (vDSP_Length) lroundf(log2f((float)numSamples));
    Float32 mFFTNormFactor = 1.0f/(2*numSamples);

    // Convert the float array of real samples to the COMPLEX_SPLIT array A
    vDSP_ctoz((COMPLEX*)timeDomainData, 2, &(fftHelperRef->complexA), 1, numSamples/2);

    // Perform the FFT in place using fftSetup and A.
    // Results are returned in A.
    vDSP_fft_zrip(fftHelperRef->fftSetup, &(fftHelperRef->complexA), 1, log2n, FFT_FORWARD);

    // Scale the FFT output
    vDSP_vsmul(fftHelperRef->complexA.realp, 1, &mFFTNormFactor, fftHelperRef->complexA.realp, 1, numSamples/2);
    vDSP_vsmul(fftHelperRef->complexA.imagp, 1, &mFFTNormFactor, fftHelperRef->complexA.imagp, 1, numSamples/2);

    // Squared magnitude of each bin. In the packed zrip format realp[0] is the DC term
    // and imagp[0] is the Nyquist term, so output bin 0 mixes the two.
    vDSP_zvmags(&(fftHelperRef->complexA), 1, fftHelperRef->outFFTData, 1, numSamples/2);

    // To check everything: the inverse FFT should reproduce the input =====
    vDSP_fft_zrip(fftHelperRef->fftSetup, &(fftHelperRef->complexA), 1, log2n, FFT_INVERSE);
    vDSP_ztoc( &(fftHelperRef->complexA), 1, (COMPLEX *) fftHelperRef->invertedCheckData , 2, numSamples/2);
    //=================================================

    return fftHelperRef->outFFTData;
}

Use it like this:

  1. Initialize it: FFTHelperRef *fftHelper = FFTHelperCreate(TimeDomainDataLenght);

  2. Pass Float32 time-domain data and get the frequency-domain data back: Float32 *fftData = computeFFT(fftHelper, buffer, frameSize);

Now you have an array where the indexes correspond to frequencies and the values are squared magnitudes (vDSP_zvmags returns the squared magnitude of each bin). According to the Nyquist theorem, the maximum possible frequency in that array is half of your sample rate. That is, if your sample rate is 44100 Hz, the maximum frequency you can encode is 22050 Hz.

So find the Nyquist maximum frequency for your sample rate: const Float32 NyquistMaxFreq = SAMPLE_RATE/2.0;

Finding Hz is easy: Float32 hz = ((Float32)someIndex / (Float32)fftDataSize) * NyquistMaxFreq; (fftDataSize = frameSize/2.0)

This works for me. If I generate a specific frequency in Audacity and play it, this code detects the right one (the strongest one; you also need to find the maximum in fftData to do this).
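To make the "find the maximum in fftData" step concrete, here is a minimal sketch in plain C. The name peakFrequencyHz and its parameters are made up for illustration and are not part of the code above. It assumes fftData holds frameSize/2 values as returned by computeFFT, and it skips bin 0, which in the packed vDSP output mixes the DC and Nyquist terms.

```c
#include <stddef.h>

/* Hypothetical helper: returns the frequency (in Hz) of the strongest bin.
 * fftData     - magnitudes (or squared magnitudes), one per bin
 * fftDataSize - number of bins, i.e. frameSize / 2
 * sampleRate  - sample rate of the time-domain data in Hz
 * Bin 0 is skipped, so a constant (DC) offset is never reported as the peak. */
static float peakFrequencyHz(const float *fftData, size_t fftDataSize, float sampleRate) {
    if (fftData == NULL || fftDataSize < 2) {
        return 0.0f;
    }
    size_t peakIndex = 1;
    for (size_t i = 2; i < fftDataSize; i++) {
        if (fftData[i] > fftData[peakIndex]) {
            peakIndex = i;
        }
    }
    const float nyquistMaxFreq = sampleRate / 2.0f;
    /* Same formula as in the answer: index / fftDataSize * Nyquist. */
    return ((float)peakIndex / (float)fftDataSize) * nyquistMaxFreq;
}
```

Because squaring preserves ordering, the strongest bin is the same whether the array holds magnitudes or squared magnitudes.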

(There's still a small mismatch of about 1-2%. I'm not sure why this happens. If someone can explain why, that would be much appreciated.)

EDIT:

That mismatch happens because the chunks I pass to the FFT are too small. Using larger chunks of time-domain data (16384 frames) solves the problem. This question explains it: Unable to get correct frequency value on iphone
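The arithmetic behind this: each FFT bin is sampleRate / frameSize wide, so a plain peak search can only locate a tone to within about one bin, while the buffer itself covers frameSize / sampleRate seconds of audio. A small sketch of both quantities (the helper names are made up for illustration):

```c
/* Width of one FFT bin in Hz for a frame of frameSize samples. */
static double binWidthHz(double sampleRate, long frameSize) {
    return sampleRate / (double)frameSize;
}

/* Seconds of audio needed to fill one frame of frameSize samples. */
static double frameDurationSeconds(double sampleRate, long frameSize) {
    return (double)frameSize / sampleRate;
}
```

At 44100 Hz, a 1024-sample frame gives bins about 43 Hz wide and covers roughly 23 ms, while 16384 samples gives bins about 2.7 Hz wide but needs roughly 0.37 s of audio, which is longer than the 0.2 s segments mentioned in the question.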

EDIT: Here is the example project: https://github.com/krafter/DetectingAudioFrequency

krafter
  • Can you post an example project ? – MathieuF Feb 21 '14 at 08:31
  • Amazing... On my iPhone 5 it peaks at 19K Hz on this site: http://www.audionotch.com/app/tune/. – Morkrom Jan 06 '15 at 22:51
  • Has anyone used this with Novocaine? – Morkrom Jan 06 '15 at 23:25
  • Nothing stops you from using it with Novocaine. It supports low level audio access. Just set it up and find proper callback function in there (like: void AudioCallback( Float32 * buffer, UInt32 frameSize, void * userData) ) – krafter Jan 13 '15 at 18:10
  • Hi, any updates on this for iOS10? Or, an alternative? Thanks – ICL1901 Dec 19 '16 at 16:25
  • @DavidDelMonte, please see the github repo recent update. Everything seems to be working well. Please let me know if you find any problems. – krafter Dec 21 '16 at 01:38
  • hey , could you help me out with my guitar tuner? @krafter – tryKuldeepTanwar Feb 17 '17 at 10:35
  • @dreamBegin, sure, what is your question? – krafter Feb 19 '17 at 12:29
  • Since I'm a newbie, I'm struggling a bit to find the right information and code for Objective-C, and I don't think the sample projects on GitHub work very well. I need the best possible way to get the correct frequency, and the procedure to identify which string is played and tune that particular string. All I'm asking for is direction, not code. I'll be glad if you can help in any way. I appreciate your time, and thanks in advance. – tryKuldeepTanwar Feb 20 '17 at 07:13
  • That example project above does most of the work by finding the right freq. of the sound. And a guitar string vibrations basically are just some vibration at a certain freq. There are 2 methods to make a guitar tuner. 1. The simple one. Find the frequency for each string. (wiki Guitar_tunings). Then use the example to check if frequencies match. (use a guitar or find an online one). 2. Advanced. Strings contain one fundamental tone and also overtones. This means that more than 1 freq. is present. And sometimes overtones sound louder than the fundamental tone. – krafter Feb 20 '17 at 18:36
  • Those overtones may mess with your maximum, so you probably need to find more than one maximum (peak) and consider them, not the fundamental, as the frequency to match to. – krafter Feb 20 '17 at 18:41
  • Why didn't you tag me? I didn't know you had replied. I'm glad to know, and thanks, you're the only person who replied on this topic. One thing to ask: I checked your project and it works pretty well, but how can I decrease the time it takes to capture the new samples, that is, make it faster? Thanks once again. – tryKuldeepTanwar Feb 21 '17 at 07:32
  • you can also look at my question :-http://stackoverflow.com/questions/42359344/guitar-tuner-frequency – tryKuldeepTanwar Feb 21 '17 at 07:33
  • @dreamBegin. The FFT needs a chunk of time-domain data to work on, which is why I accumulate frames in a buffer. When the accumulator is full, I compute the FFT and display the result. You can change the number of frames to buffer in _accumulatorDataLenght_. Depending on your sample rate (how many frames per second), you can work out how many seconds of audio the buffer contains (accumulatorDataLenght / fps). Just use some other values in there. Note: use only values like 8, 16, 32, 64, 128, 256, 512, that is, 2 to the power of an integer. – krafter Feb 21 '17 at 11:05
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/136235/discussion-between-dreambegin-and-krafter). – tryKuldeepTanwar Feb 21 '17 at 11:07
  • But the smaller the buffer you use, the lower the precision of detection. – krafter Feb 21 '17 at 11:09
  • It's taking 2-3 seconds to update the label. How can I get the result faster? – Kishore Suthar May 08 '18 at 11:27
  • @suthar You can use smaller values in accumulatorDataLenght. Keep in mind that the smaller the value is, the less accurate the frequency is. – krafter May 08 '18 at 12:11
  • @suthar I don't have a Swift version of this code. It's pretty swift as it is, because the core code is in C. – krafter May 10 '18 at 06:56
  • Thank you for this fine effort! – elight Jan 15 '19 at 17:27
  • @krafter: is there any library available for this in React Native? – Kishore Suthar Feb 28 '19 at 16:46
  • Well, if it allows you to write native code you can use this – krafter Mar 01 '19 at 14:49
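
As a rough sketch of the "simple" tuner approach described in the comments, the helper below maps a detected frequency to the closest open string in standard tuning. The struct, table and function names are hypothetical. The string frequencies are the usual equal-temperament values for A4 = 440 Hz. "Closest" is measured in plain Hz rather than in musical cents, which is a simplification, and the sketch does nothing about the overtone problem mentioned above.

```c
#include <stddef.h>

typedef struct {
    const char *name;  /* string name, e.g. "A2" */
    float hz;          /* target frequency in Hz */
} GuitarString;

/* Standard tuning, low to high (equal temperament, A4 = 440 Hz). */
static const GuitarString kStandardTuning[6] = {
    {"E2", 82.41f}, {"A2", 110.00f}, {"D3", 146.83f},
    {"G3", 196.00f}, {"B3", 246.94f}, {"E4", 329.63f},
};

/* Returns the index (0..5) of the string whose target is closest to detectedHz.
 * If errorHz is not NULL, stores the signed error in Hz (positive = sharp). */
static size_t nearestString(float detectedHz, float *errorHz) {
    size_t best = 0;
    float bestDistance = -1.0f;  /* negative means "no candidate yet" */
    for (size_t i = 0; i < 6; i++) {
        float diff = detectedHz - kStandardTuning[i].hz;
        float distance = diff < 0.0f ? -diff : diff;
        if (bestDistance < 0.0f || distance < bestDistance) {
            bestDistance = distance;
            best = i;
        }
    }
    if (errorHz != NULL) {
        *errorHz = detectedHz - kStandardTuning[best].hz;
    }
    return best;
}
```

For example, a detected 112 Hz maps to the A2 string with an error of about +2 Hz, meaning the string is slightly sharp.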
15

Questions like this are asked a lot here on SO (I've answered a similar one here), so I wrote a little tutorial with code that you can use even in commercial and closed-source apps. This is not necessarily the BEST way, but it's a way that many people understand. You will have to modify it based on what you mean by "Hz average value of every short music segment": do you mean the fundamental pitch or the frequency centroid, for example?
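
To illustrate the second reading, the frequency centroid is the magnitude-weighted mean of the bin frequencies. The following is only a sketch under assumptions, not code from the tutorial: spectralCentroidHz is a made-up name, and it expects an array of binCount magnitudes covering 0 Hz up to (but not including) half the sample rate.

```c
#include <stddef.h>

/* Hypothetical helper: magnitude-weighted mean frequency of a spectrum.
 * magnitudes[i] is the magnitude of bin i, and bin i is centred on
 * i * sampleRate / (2 * binCount) Hz. Returns 0 for an all-zero spectrum. */
static double spectralCentroidHz(const float *magnitudes, size_t binCount, double sampleRate) {
    double weightedSum = 0.0;
    double total = 0.0;
    for (size_t i = 0; i < binCount; i++) {
        double binHz = (double)i * sampleRate / (2.0 * (double)binCount);
        weightedSum += binHz * (double)magnitudes[i];
        total += (double)magnitudes[i];
    }
    return total > 0.0 ? weightedSum / total : 0.0;
}
```

Unlike the fundamental pitch, the centroid moves with the overall brightness of the sound, so the two can differ widely for the same note.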

You might want to use Apple's FFT in the Accelerate framework, as suggested by another answer.

Hope it helps.

http://blog.bjornroche.com/2012/07/frequency-detection-using-fft-aka-pitch.html

Bjorn Roche
5

Apple does not provide a framework for frequency or pitch estimation. However, the iOS Accelerate framework does include routines for FFT and autocorrelation which can be used as components of more sophisticated frequency and pitch recognition or estimation algorithms.
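
As an illustration of the autocorrelation idea, here is a brute-force sketch in plain C rather than with Accelerate routines. The function name and parameters are assumptions made for this example. It picks the lag with the largest autocorrelation inside a caller-supplied pitch range, which works for clean periodic signals but is far from a robust pitch detector.

```c
#include <stddef.h>

/* Hypothetical helper: estimates the fundamental frequency of samples[0..count-1]
 * by finding the lag with the largest autocorrelation between
 * sampleRate / maxHz and sampleRate / minHz.
 * Returns 0 if the arguments are invalid or the buffer is too short. */
static double autocorrelationPitchHz(const float *samples, size_t count,
                                     double sampleRate, double minHz, double maxHz) {
    if (samples == NULL || sampleRate <= 0.0 || minHz <= 0.0 || maxHz <= minHz) {
        return 0.0;
    }
    size_t minLag = (size_t)(sampleRate / maxHz);
    size_t maxLag = (size_t)(sampleRate / minHz);
    if (minLag < 1) {
        minLag = 1;
    }
    if (maxLag >= count) {
        return 0.0;
    }
    size_t bestLag = 0;
    double bestValue = 0.0;
    for (size_t lag = minLag; lag <= maxLag; lag++) {
        /* Correlate the signal with a copy of itself shifted by lag samples. */
        double sum = 0.0;
        for (size_t i = 0; i + lag < count; i++) {
            sum += (double)samples[i] * (double)samples[i + lag];
        }
        if (sum > bestValue) {
            bestValue = sum;
            bestLag = lag;
        }
    }
    return bestLag > 0 ? sampleRate / (double)bestLag : 0.0;
}
```

The result is quantized to whole-sample lags, so the error grows at higher pitches where the period spans fewer samples.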

There is no way that is both easy and best, except possibly for a single long continuous constant frequency pure sinusoidal tone in almost zero noise, where an interpolated magnitude peak of a long windowed FFT might be suitable. For voice and music, that simple method will very often not work at all. But a search for pitch detection or estimation methods will turn up lots of research papers on more suitable algorithms.
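
For the pure-tone case, one common way to interpolate a magnitude peak is to fit a parabola through the peak bin and its two neighbours. The sketch below assumes that technique, which may not be exactly what is meant above. It omits the windowing step, and the name interpolatedPeakHz and its parameters are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: refines an FFT peak at bin peakIndex by fitting a
 * parabola through the peak and its two neighbours, then converts the
 * fractional bin position to Hz using binWidthHz = sampleRate / frameSize.
 * Falls back to the uninterpolated bin centre at the array edges or when
 * the three points do not form a local maximum. */
static double interpolatedPeakHz(const float *magnitudes, size_t binCount,
                                 size_t peakIndex, double binWidthHz) {
    if (magnitudes == NULL || peakIndex >= binCount) {
        return 0.0;
    }
    double position = (double)peakIndex;
    if (peakIndex > 0 && peakIndex + 1 < binCount) {
        double left = magnitudes[peakIndex - 1];
        double mid = magnitudes[peakIndex];
        double right = magnitudes[peakIndex + 1];
        double denominator = left - 2.0 * mid + right;
        if (denominator < 0.0) {  /* negative curvature: a genuine local maximum */
            position += 0.5 * (left - right) / denominator;
        }
    }
    return position * binWidthHz;
}
```

When peakIndex is the largest bin, the correction stays within half a bin of it, so the estimate moves between bin centres instead of snapping to them.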

hotpaw2