Rectangle 27 1

python Analyze audio using Fast Fourier Transform?


Hi, sorry for the initial (wrong) answer... didn't get the math right. This should be correct now.

If you want to have 32 bars, you should as far as I understand take the average of four successive amplitudes, getting 256 / 4 = 32 bars as you want.

Please note that, if c is a complex number, sqrt(c.real2 + c.imag2) == abs(c)

You need to convert the complex numbers into amplitude by calculating the sqrt(i2 + j2) where i and j are the real and imaginary parts, resp.

what you have is a sample whose length in time is 256/44100 = 0.00580499 seconds. This means that your frequency resolution is 1 / 0.00580499 = 172 Hz. The 256 values you get out from Python correspond to the frequencies, basically, from 86 Hz to 255*172+86 Hz = 43946 Hz. The numbers you get out are complex numbers (hence the "j" at the end of every second number).

Note
Rectangle 27 1

python Analyze audio using Fast Fourier Transform?


+1 - Although I know about FFTs already - one of the best plain English explanations on the web.

+1 Awesome, this helped me understand what I was doing wrong.

+1 for the great answer and making me learn a new idiom, as I'm not a native english speaker. ;)

Finally, you should be averaging the two channels together if you want to show the frequency content of the entire audio signal as a whole. You are mixing the stereo audio into mono audio and showing the combined frequencies. If you want two separate displays for right and left frequencies, then you will need to perform the Fourier Transform on each channel separately.

I can mostly only find overly complicated explanations of FFT online, this was a great, and simple explanation of how the number of sampled points affects the results of the FFT. Thank you for this!

Once you have calculated the magnitude of each FFT coefficient, you need to figure out which audio frequency each FFT coefficient belongs to. An N point FFT will give you the frequency content of your signal at N equally spaced frequencies, starting at 0. Because your sampling frequency is 44100 samples / sec. and the number of points in your FFT is 256, your frequency spacing is 44100 / 256 = 172 Hz (approximately)

So that is what those numbers represent. Converting to a percentage of height could be done by scaling each frequency component magnitude by the sum of all component magnitudes. Although, that would only give you a representation of the relative frequency distribution, and not the actual power for each frequency. You could try scaling by the maximum magnitude possible for a frequency component, but I'm not sure that that would display very well. The quickest way to find a workable scaling factor would be to experiment on loud and soft audio signals to find the right setting.

That should be enough information to get you started. If you would like a much more approachable introduction to FFTs than is given on Wikipedia, you could try Understanding Digital Signal Processing: 2nd Ed.. It was very helpful for me.

The array you are showing is the Fourier Transform coefficients of the audio signal. These coefficients can be used to get the frequency content of the audio. The FFT is defined for complex valued input functions, so the coefficients you get out will be imaginary numbers even though your input is all real values. In order to get the amount of power in each frequency, you need to calculate the magnitude of the FFT coefficient for each frequency. This is not just the real component of the coefficient, you need to calculate the square root of the sum of the square of its real and imaginary components. That is, if your coefficient is a + b*j, then its magnitude is sqrt(a^2 + b^2).

The first coefficient in your array will be the 0 frequency coefficient. That is basically the average power level for all frequencies. The rest of your coefficients will count up from 0 in multiples of 172 Hz until you get to 128. In an FFT, you only can measure frequencies up to half your sample points. Read these links on the Nyquist Frequency and Nyquist-Shannon Sampling Theorem if you are a glutton for punishment and need to know why, but the basic result is that your lower frequencies are going to be replicated or aliased in the higher frequency buckets. So the frequencies will start from 0, increase by 172 Hz for each coefficient up to the N/2 coefficient, then decrease by 172 Hz until the N - 1 coefficient.

Note
Rectangle 27 1

python Analyze audio using Fast Fourier Transform?


Upper limit = 1000 * 2 ^ ( 1 / ( 2 * 3 ) ) = 1122.5
Lower limit = 1000 / 2 ^ ( 1 / ( 2 * 3 ) ) =  890.9
next lower =  A / 2 ^ ( 1 / X )
next higher = A * 2 ^ ( 1 / X )
upper limit = A * 2 ^ ( 1 / 2X )
lower limit = A / 2 ^ ( 1 / 2X )

(1/3 octave bars would get you around 30 bars between ~40hz and ~20khz). As you can figure out by now, as we go higher we will average a larger range of numbers. Low bars typically only include 1 or a small number of data points. While the higher bars can be the average of hundreds of points. The reason being that 86hz is an octave above 43hz... while 10086hz sounds almost the same as 10043hz.

As for the division into bars this should not be done as antti suggest, by dividing the data equally based on the number of bars. The most useful would be to divide the data into octave parts, each octave being double the frequency of the previous. (ie. 100hz is one octave above 50hz, which is one octave above 25hz).

Depending on how many bars you want, you divide the whole range into 1/X octave ranges. Based on a given center frequency of A on the bar, you get the upper and lower limits of the bar from:

For example: We want to divide into 1/3 octaves ranges and we start with a center frequency of 1khz.

Given 44100hz and 1024 samples (43hz between each data point) we should average out values 21 through 26. ( 890.9 / 43 = 20.72 ~ 21 and 1122.5 / 43 = 26.10 ~ 26 )

To calculate the next adjoining center frequency you use a similar calculation:

You then average the data that fits into these ranges to get the amplitude for each bar.

Note
Rectangle 27 0

python Analyze audio using Fast Fourier Transform?


Hi, sorry for the initial (wrong) answer... didn't get the math right. This should be correct now.

If you want to have 32 bars, you should as far as I understand take the average of four successive amplitudes, getting 256 / 4 = 32 bars as you want.

Please note that, if c is a complex number, sqrt(c.real2 + c.imag2) == abs(c)

You need to convert the complex numbers into amplitude by calculating the sqrt(i**2 + j**2) where i and j are the real and imaginary parts, resp.

what you have is a sample whose length in time is 256/44100 = 0.00580499 seconds. This means that your frequency resolution is 1 / 0.00580499 = 172 Hz. The 256 values you get out from Python correspond to the frequencies, basically, from 86 Hz to 255*172+86 Hz = 43946 Hz. The numbers you get out are complex numbers (hence the "j" at the end of every second number).

Note