I have an audio clip, and I copy a 256-point segment starting at the 10th second. Then I slide a 256-point frame over the original clip. At every iteration, I calculate the DFT of the frame, multiply it by the DFT of the interval I copied, take the mean of the magnitude of the result, append it to a list, and shift the frame one point to the right. When I graph the means, I expect the maximum value of the convolution to occur where the copied interval's DFT is multiplied by itself, which is supposed to show that convolution can be used as a similarity metric. However, when I apply this procedure, I can't get a peak that is distinguishable from the other values. What I have done is as follows:
import numpy as np
import scipy.io.wavfile
import matplotlib.pyplot as plt
rate1, data1 = scipy.io.wavfile.read('Africa.wav')
data1 = np.array(data1, dtype=np.float64)
interval_1 = data1[rate1 * 10: rate1 * 10 + 256]
dft_1 = np.fft.fft(interval_1)
cv = []
for i in range(data1.size - 256):
    dft = np.fft.fft(data1[i: i + 256])
    Y = np.multiply(dft, dft_1)
    Y = np.abs(Y)
    Y = np.mean(Y)
    cv.append(Y)
cv = np.array(cv)
plt.figure()
plt.title("Convolution with 256 point sample at 10th second")
plt.xlabel("Samples")
plt.ylabel("Amplitude")
plt.plot(cv)
plt.show()
I get the following graph:

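The same procedure can be reproduced without the audio file. The following is a minimal sketch, assuming white noise as a stand-in for the clip. The function name `sliding_dft_score`, the seed, the signal length of 4096, and the offset `start = 1000` are all made up for illustration and are not part of my actual setup.

```python
import numpy as np

FRAME = 256  # frame length used in the question


def sliding_dft_score(signal, start, frame=FRAME):
    """Mean magnitude of (frame DFT * reference DFT) at every frame offset."""
    # DFT of the copied reference interval
    ref_dft = np.fft.fft(signal[start:start + frame])
    scores = np.empty(signal.size - frame)
    for i in range(signal.size - frame):
        # DFT of the frame currently under the sliding window
        frame_dft = np.fft.fft(signal[i:i + frame])
        scores[i] = np.mean(np.abs(frame_dft * ref_dft))
    return scores


rng = np.random.default_rng(0)       # arbitrary fixed seed (assumption)
signal = rng.standard_normal(4096)   # synthetic stand-in for the audio samples
start = 1000                         # hypothetical position of the copied interval
scores = sliding_dft_score(signal, start)
print(scores.size, scores[start], scores.max())
```

At offset `start` the frame and the reference are identical, so the score there reduces to the mean of the squared DFT magnitudes, which by Parseval's theorem equals the sum of the squared samples in that frame. Whether that value stands out from its neighbours is exactly what I am unsure about.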