AForge.NET is an open-source library with Fast Fourier Transform support. Exocortex.DSP is another option.

```
Exocortex.DSP.ComplexF[] complexData = new Exocortex.DSP.ComplexF[512];
for (int i = 0; i < 512; ++i)
{
    // Fill the complex data
}

// FFT the time domain data to get frequency domain data
Exocortex.DSP.Fourier.FFT(complexData, Exocortex.DSP.FourierDirection.Forward);

float[] mag_data_buffer = new float[complexData.Length];
// Loop through FFT'ed data and do something with it
for (int i = 0; i < complexData.Length; ++i)
{
    // Calculate magnitude or do something with the new complex data
    mag_data_buffer[i] = ImaginaryNumberMagnitude(complexData[i].Im, complexData[i].Re);
}
```

Thank you. As in the original question, I am after examples of pulling said data from the mic or line-in. Would I need a second library for that?

Correct. There is no built-in functionality for sound. You would need something like this: codeproject.com/KB/audio-video/cswavrec.aspx

Does this "512 length array" mean 512 consecutive audio samples? If yes, I've got it.

## audio - Fast Fourier Transform in C# - Stack Overflow

c# audio signal-processing fft

The array you are showing contains the Fourier transform coefficients of the audio signal. These coefficients can be used to get the frequency content of the audio. The FFT is defined for complex-valued input functions, so the coefficients you get out will be complex numbers even though your input is all real values. In order to get the amount of power in each frequency, you need to calculate the magnitude of the FFT coefficient for each frequency. This is not just the real component of the coefficient; you need to calculate the square root of the sum of the squares of its real and imaginary components. That is, if your coefficient is a + b*j, then its magnitude is sqrt(a^2 + b^2).
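In code, that magnitude calculation is a one-liner (sketched here in Python for brevity; the helper name is illustrative):

```python
import math

def coefficient_magnitude(re, im):
    # Magnitude of the complex coefficient a + b*j is sqrt(a^2 + b^2)
    return math.sqrt(re * re + im * im)

# Example: the coefficient 3 + 4j has magnitude 5.
print(coefficient_magnitude(3.0, 4.0))
```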

Once you have calculated the magnitude of each FFT coefficient, you need to figure out which audio frequency each FFT coefficient belongs to. An N point FFT will give you the frequency content of your signal at N equally spaced frequencies, starting at 0. Because your sampling frequency is 44100 samples/sec and the number of points in your FFT is 256, your frequency spacing is 44100 / 256 = 172 Hz (approximately).

The first coefficient in your array will be the 0 frequency coefficient. That is basically the average power level for all frequencies. The rest of your coefficients will count up from 0 in multiples of 172 Hz until you get to coefficient 128. In an FFT, you can only measure frequencies up to half your sampling rate. Read these links on the Nyquist Frequency and Nyquist-Shannon Sampling Theorem if you are a glutton for punishment and need to know why, but the basic result is that your lower frequencies are going to be replicated or aliased in the higher frequency buckets. So the frequencies will start from 0, increase by 172 Hz for each coefficient up to the N/2 coefficient, then decrease by 172 Hz until the N - 1 coefficient.
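That bin-to-frequency mapping, including the mirroring above N/2, can be sketched like this (Python; the function name is illustrative):

```python
def bin_frequencies(n_points, sample_rate):
    # Frequency spacing between adjacent FFT bins: sample_rate / n_points
    spacing = sample_rate / n_points
    freqs = []
    for k in range(n_points):
        if k <= n_points // 2:
            freqs.append(k * spacing)               # counts up to the Nyquist bin
        else:
            freqs.append((n_points - k) * spacing)  # mirrored above N/2
    return freqs

freqs = bin_frequencies(256, 44100)
# bin 1 is ~172 Hz, bin 128 (N/2) is 22050 Hz, bin 255 mirrors bin 1
```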

That should be enough information to get you started. If you would like a much more approachable introduction to FFTs than is given on Wikipedia, you could try Understanding Digital Signal Processing: 2nd Ed.. It was very helpful for me.

So that is what those numbers represent. Converting to a percentage of height could be done by scaling each frequency component magnitude by the sum of all component magnitudes. Although, that would only give you a representation of the relative frequency distribution, and not the actual power for each frequency. You could try scaling by the maximum magnitude possible for a frequency component, but I'm not sure that that would display very well. The quickest way to find a workable scaling factor would be to experiment on loud and soft audio signals to find the right setting.

Finally, you should be averaging the two channels together if you want to show the frequency content of the entire audio signal as a whole. You are mixing the stereo audio into mono audio and showing the combined frequencies. If you want two separate displays for right and left frequencies, then you will need to perform the Fourier Transform on each channel separately.
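The mono mix-down is just a per-sample average of the two channels before the FFT; a minimal NumPy sketch (the sample values are illustrative):

```python
import numpy as np

def mix_to_mono(left, right):
    # Average the two channels sample-by-sample to mix stereo down to mono
    return (np.asarray(left, dtype=float) + np.asarray(right, dtype=float)) / 2.0

left = [1.0, 0.0, -1.0, 0.0]
right = [0.0, 1.0, 0.0, -1.0]
mono = mix_to_mono(left, right)
spectrum = np.fft.fft(mono)  # frequency content of the combined signal
```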

+1 for the great answer and for making me learn a new idiom, as I'm not a native English speaker. ;)

+1 Awesome, this helped me understand what I was doing wrong.

+1 - Although I know about FFTs already - one of the best plain English explanations on the web.

I can mostly only find overly complicated explanations of FFT online, this was a great, and simple explanation of how the number of sampled points affects the results of the FFT. Thank you for this!

## python - Analyze audio using Fast Fourier Transform - Stack Overflow

python audio signal-processing fft spectrum

However, the math of the Fourier transform assumes that the signal being Fourier transformed is periodic over the time span in question.

This mismatch between the Fourier assumption of periodicity, and the real world fact that audio signals are generally non-periodic, leads to errors in the transform.

These errors are called "spectral leakage", and generally manifest as a wrongful distribution of energy across the power spectrum of the signal.

Notice the distribution of energy above the -60 dB line, and the three distinct peaks at roughly 440 Hz, 880 Hz, and 1320 Hz. This particular distribution of energy contains "spectral leakage" errors.

To somewhat mitigate the "spectral leakage" errors, you can pre-multiply the signal by a window function designed specifically for that purpose, such as the Hann window function.

The plot below shows the Hann window function in the time-domain. Notice how the tails of the function go smoothly to zero, while the center portion of the function tends smoothly towards the value 1.

Now let's apply the Hann window to the guitar's audio data, and then FFT the resulting signal.
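Under the hood that step is just an element-wise multiply before the transform. A minimal NumPy sketch (the synthetic 440 Hz sine stands in for the guitar's A4 recording; names are illustrative):

```python
import numpy as np

fs = 44100
t = np.arange(4096) / fs
signal = np.sin(2 * np.pi * 440.0 * t)  # stand-in for the guitar's A4 note

window = np.hanning(len(signal))  # Hann window: tails go smoothly to zero
windowed = signal * window        # pre-multiply the signal before the FFT
spectrum = np.fft.rfft(windowed)
power_db = 20 * np.log10(np.abs(spectrum) + 1e-12)
```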

The plot below shows a closeup of the power spectrum of the same signal (an acoustic guitar playing the A4 note), but this time the signal was pre-multiplied by the Hann window function prior to the FFT.

Notice how the distribution of energy above the -60 dB line has changed significantly, and how the three distinct peaks have changed shape and height. This particular distribution of spectral energy contains fewer "spectral leakage" errors.

The acoustic guitar's A4 note used for this analysis was sampled at 44.1 kHz with a high quality microphone under studio conditions; it contains essentially zero background noise, no other instruments or voices, and no post processing.

Real audio signal data, Hann window function, plots, FFT, and spectral analysis were done here:

## Why do I need to apply a window function to samples when building a po...

audio signal-processing fft spectrum window-functions
• For every image in the database, generate a fingerprint using a Fourier Transform
• Load the source image, make a fingerprint of the image
• Calculate the Euclidean Distance between the source and all the images in the database

## c# - Finding matches between high quality and low quality, pixelated i...

c# image algorithm image-processing pattern-matching

To detect frequency you should check out the fast Fourier transform (FFT) algorithm.

## audio - How to detect sound frequency / pitch on an iPhone? - Stack Ov...

iphone audio frequency pitch

That approach goes by the name Short-time Fourier transform. You can find the answers to your questions on Wikipedia: https://en.wikipedia.org/wiki/Short-time_Fourier_transform

It works great in practice, and you can even get better frequency resolution than you would expect from a rolling window by using the phase difference between successive FFTs.

Here is one article that does pitch shifting of audio signals; the way to get higher frequency resolution is well explained: http://www.dspdimension.com/admin/pitch-shifting-using-the-ft/
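A minimal sketch of such a sliding-window (short-time) FFT, assuming NumPy and a Hann window per frame (function and variable names are illustrative):

```python
import numpy as np

def stft(signal, frame_size, hop):
    """Short-time Fourier transform: windowed FFTs of overlapping frames."""
    window = np.hanning(frame_size)
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size] * window
        frames.append(np.fft.rfft(frame))
    return np.array(frames)

# 1 second of a 1 kHz tone at an 8 kHz sample rate
fs = 8000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 1000.0 * t)
spectrogram = stft(sig, frame_size=256, hop=128)  # one row per time frame
```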

## signal processing - Is a "rolling" FFT possible and could it be of use...

signal-processing fft processing

Accelerate provides hundreds of mathematical functions optimized for iPhone and iPod touch, including signal-processing routines, fast Fourier transforms, basic vector and matrix operations, and industry-standard functions for factoring matrices and solving systems of linear equations.

## iphone - Do you know a good and efficient FFT? - Stack Overflow

iphone fft

Perform a Fourier transform, and find peaks in the power spectrum. You're looking for peaks below the 20 Hz cutoff for human hearing. I'd guess typically in the 0.1-5ish Hz range to be generous.
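As a toy illustration of that idea: FFT an amplitude envelope and read off the dominant sub-20 Hz peak. The synthetic 2 Hz (120 BPM) envelope below is a stand-in for a real volume/onset envelope:

```python
import numpy as np

# Synthetic volume envelope: a 2 Hz pulse (i.e. 120 BPM), sampled at 100 Hz.
fs = 100
t = np.arange(10 * fs) / fs
envelope = 1.0 + np.cos(2 * np.pi * 2.0 * t)

# Remove the DC offset, then look for the strongest low-frequency peak.
spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)

peak_hz = freqs[np.argmax(spectrum)]  # strongest beat frequency, well below 20 Hz
bpm = peak_hz * 60
```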

Also, here is one of several "peak finding" questions on SO: Peak detection of measured signal

Edit: Not that I do audio processing. It's just a guess based on the fact that you're looking for a frequency domain property of the file...

another edit: It is worth noting that lossy compression formats like mp3 store Fourier-domain data rather than time-domain data in the first place. With a little cleverness, you can save yourself some heavy computation... but see the thoughtful comment by cobbal.

however, mp3 achieves its compression by chopping off the frequencies outside of human hearing. Fourier may not be the right tool here.

AH. Good point, that.

MP3 doesn't 'chop off' frequencies outside of human hearing and it performs cosine transformations (related to Fourier's) individually to enveloped windows about 1 ms wide each. I'd try dmckee's first suggestion on 10 s long windows and see what comes out.

## algorithm - How to detect the BPM of a song in php - Stack Overflow

algorithm audio signal-processing beat-detection

I cannot open your first image. I implemented the Fourier transform on your second one, and you can see frequency responses at specific points:

You can further process the image by extracting the local maxima of the magnitude; they share the same distance to the center (zero frequency). This may be considered a sign of repetitive patterns.

Regarding the case where patterns share major similarity instead of a repetitive feature, it is hard to tell whether the frequency magnitude still has such an evident response. It depends on what the pattern looks like.

Another possible approach is the auto-correlation on your image.

umm, thanks for the good info! I've updated the first image. One thing I'm not that familiar with is the role of the FFT in this context; how does transforming into the frequency domain help?

@user3204706, spatial info corresponds to frequency info. For an image with a highly periodic pattern in the spatial domain, the frequency-domain distribution will also be highly periodic, and it is conjugate-symmetric. Conjugate means, if you have the frequency u+iv in the 2D domain, you must also have u-iv, -u+iv, and -u-iv.
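That conjugate symmetry is easy to verify numerically for any real-valued image (NumPy sketch; the random image is purely illustrative):

```python
import numpy as np

# For a real-valued image, the 2D FFT is conjugate-symmetric:
# F[-u, -v] == conj(F[u, v]), with indices taken modulo the image size.
image = np.random.rand(8, 8)
F = np.fft.fft2(image)

u, v = 3, 5
assert np.allclose(F[-u % 8, -v % 8], np.conj(F[u, v]))
```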

## How to detect if an image is a texture or a pattern-based image? - Sta...

image-processing textures computer-vision image-recognition

First, you have to split the signal into small frames of 10 to 30 ms, apply a windowing function (Hamming is recommended for sound applications), and compute the Fourier transform of the signal. With the DFT, to compute Mel Frequency Cepstral Coefficients you have to follow these steps:

```
import math

import numpy
from scipy.fftpack import dct
from scipy.io import wavfile

numCoefficients = 13  # choose the size of the MFCC array
minHz = 0
maxHz = 22000

def freqToMel(freq):
    return 1127.01048 * math.log(1 + freq / 700.0)

def melToFreq(mel):
    return 700 * (math.exp(mel / 1127.01048) - 1)

def melFilterBank(blockSize):
    numBands = int(numCoefficients)
    maxMel = int(freqToMel(maxHz))
    minMel = int(freqToMel(minHz))

    # Create a matrix for triangular filters, one row per filter
    filterMatrix = numpy.zeros((numBands, blockSize))

    melRange = numpy.array(range(numBands + 2))

    melCenterFilters = melRange * (maxMel - minMel) / (numBands + 1) + minMel

    # each array index represents the center of each triangular filter
    aux = numpy.log(1 + 1000.0 / 700.0) / 1000.0
    aux = (numpy.exp(melCenterFilters * aux) - 1) / 22050
    aux = 0.5 + 700 * blockSize * aux
    aux = numpy.floor(aux)  # round down
    centerIndex = numpy.array(aux, int)  # get int values

    for i in range(numBands):
        start, centre, end = centerIndex[i:i + 3]
        k1 = numpy.float32(centre - start)
        k2 = numpy.float32(end - centre)
        up = (numpy.array(range(start, centre)) - start) / k1
        down = (end - numpy.array(range(centre, end))) / k2

        filterMatrix[i][start:centre] = up
        filterMatrix[i][centre:end] = down

    return filterMatrix.transpose()

# "file.wav" holds a single 10-30 ms frame of audio
sampleRate, signal = wavfile.read('file.wav')

complexSpectrum = numpy.fft.fft(signal)
powerSpectrum = abs(complexSpectrum) ** 2
filteredSpectrum = numpy.dot(powerSpectrum, melFilterBank(len(signal)))
logSpectrum = numpy.log(filteredSpectrum)
dctSpectrum = dct(logSpectrum, type=2)  # MFCC :)
```

Hi, do you mean for "file.wav" to be a frame (10 ms to 30 ms)? If not, you need to split the signal into small frames and then apply the operations you did to each frame. For each frame, you should get out 13 coefficients.

... I was confused by that too. I assumed he was talking about the size of the window, i.e. where we grab the values and then compute the FFT on them. Please confirm.

But what happens once I have the coefficients? What do I do with them? I'm assuming I get the coefficients of sound one and then the coefficients of sound two... then what?

Sorry! I extracted it from my research code and forgot to make it clear. Consider "file.wav" a sound frame of 10 ms to 30 ms! I think just having the coefficients isn't enough; you need to pass the MFCCs to an algorithm that classifies them. I'm using a back-propagation neural network here to classify percussive sounds. An interesting project that uses MFCC to classify drum and other percussive sounds is william brandt's timbreID.

there's a bug in the melToFreq() function, the -1 should be outside the inner parentheses

## logging - HOW to get MFCC from an FFT on a signal? - Stack Overflow

logging signal-processing fft

This can be solved in O(k*log k) (where k is the maximal difference) if you use the Fourier transform for multiplication of polynomials.

Consider the following problem: having two sets A = a_1, ..., a_n and B = b_1, ..., b_m, for each X find the number of pairs (i, j) such that a_i + b_j = X. It can be solved as follows.

Let Pa = x**a_1 + ... + x**a_n, Pb = x**b_1 + ... + x**b_m. If you look at Pa * Pb, you may find that the coefficient of x**R is the answer to the problem where X = R. So, multiply these polynomials using the Fourier transform, and you will find the answer for every X in O(n*log n).
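A sketch of that polynomial-multiplication trick with NumPy's FFT (the function name and the small example are illustrative):

```python
import numpy as np

def pair_sum_counts(a_vals, b_vals, max_val):
    """Count pairs (i, j) with a_i + b_j = X for every X, by multiplying
    the polynomials Pa and Pb. Coefficient k of Pa is the count of value
    k in A; coefficient X of the product is the number of pairs summing to X."""
    pa = np.zeros(max_val + 1)
    pb = np.zeros(max_val + 1)
    for a in a_vals:
        pa[a] += 1
    for b in b_vals:
        pb[b] += 1
    # Multiply the polynomials via the FFT (pointwise product of spectra).
    n = 2 * max_val + 1
    prod = np.fft.irfft(np.fft.rfft(pa, n) * np.fft.rfft(pb, n), n)
    return np.rint(prod).astype(int)

counts = pair_sum_counts([0, 1, 2], [1, 2], max_val=2)
# pairs summing to 3: (1, 2) and (2, 1), so counts[3] is 2
```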

Afterwards, your problem may be reduced to this one by setting A = arr_1, ..., arr_n, B = -arr_1, ..., -arr_n and shifting (adding a constant to) every value of A and B to make them lie between 0 and k.

Sometimes I don't understand SO, you have the best answer by far but I think most people don't take the time to understand it so it goes ignored .... (Basically you are using a generating function to do the counting for you, a great solution.)

@ldog The post claims this is both O(n log n) and O(k log k) where k is the maximal distance. I believe the latter is true, and that makes this worse than the brute force O(n^2) approach unless k is sufficiently small. In those cases it may well be worth the added complexity, but it doesn't seem to blow the brute force approach out of the water. Still, a nice result.

@ldog Obviously an answer that's very easy to understand is expected to be upvoted more than an answer that could require significant time to understand, regardless of whether or not the latter is better. Not to mention that this answer is 8 hours older, which tends to make a big difference. And while I do like great answers, I can't spend the whole day reading up on things to understand them.

## algorithm - Find the number of couples with the same difference in a s...

arrays algorithm

PCM audio is not stored as a series of pitches. To figure that out, you need a Fast Fourier Transform, or FFT. See https://stackoverflow.com/search?q=pitch+detection; there are dozens of posts about this already.

Think of an audio waveform. PCM encoding is simply sampling that wave a certain number of times per second, and using a specific number of bits per sample.

16-bit Mono PCM at 44.1kHz means that 44,100 times per second, a 16-bit value (2 bytes) will be stored that represents the waveform at the specific time the sample was taken. 44.1kHz is fast enough to store frequencies that approach 22kHz (see Nyquist Frequency).

The FFT turns those samples from the time domain to the frequency domain. That is, you can find what the levels of all the frequencies are for a particular period of time. The more bands you look at, the more computationally intensive it is.

So I was able to implement code I found here, but I get inaccurate results. For instance, when playing middle C on a piano (both virtually and in the real world), it reports the frequency as approximately 484 Hz, when in reality a middle C is closer to 261 Hz. Any idea why this would be the case?

@Drew: pitch detection is quite tricky - most musical instruments have complex power spectra and often the fundamental is not the strongest component - the ear "fills in" the fundamental pitch based on information from the harmonics. Search SO for "pitch detection" as there are already a lot of questions with good answers on this whole subject.

@Drew, I recommend spending some time looking at a spectrum analyzer to get a better idea of what you are looking at. Unless you are looking at sine waves, what you hear is actually made up of many pitches that give the sound its sonic characteristics, such as timbre. Psychoacoustics comes into play.

## c# - Getting audio information from PCM data - Stack Overflow

c# audio frequency pcm

You could study the Fast Fourier Transform methods for multiplication, which run in O(N log N).

You might be able to do something similar with your problem.

## c++ - Efficient Longest arithmetic progression for a set of linear Poi...

c++ python algorithm math geometry

Audio analysis is a difficult thing requiring a lot of complex math (think Fourier Transforms). The question you have to ask is "what is silence". If the audio that you are trying to edit is captured from an analog source, the chances are that there isn't any silence... they will only be areas of soft noise (line hum, ambient background noise, etc).

All that said, an algorithm that should work would be to determine a minimum volume (amplitude) threshold and duration (say, <10 dBA for more than 2 seconds) and then simply do a volume analysis of the waveform, looking for areas that meet these criteria (with perhaps some filters for millisecond spikes). I've never written this in C#, but this CodeProject article looks interesting; it describes C# code to draw a waveform... that is the same kind of code that could be used to do other amplitude analysis.
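A bare-bones sketch of that threshold-plus-duration scan (in Python rather than C#, with illustrative names, a linear amplitude threshold instead of dBA, and toy numbers):

```python
def find_silence(samples, sample_rate, threshold, min_duration):
    """Return (start, end) sample ranges where |amplitude| stays below
    `threshold` for at least `min_duration` seconds."""
    min_samples = int(min_duration * sample_rate)
    regions, start = [], None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if start is None:
                start = i  # a quiet run begins
        else:
            if start is not None and i - start >= min_samples:
                regions.append((start, i))  # quiet run was long enough
            start = None
    if start is not None and len(samples) - start >= min_samples:
        regions.append((start, len(samples)))
    return regions

# 1 s of tone, 3 s of near-silence, 1 s of tone (at a toy 10 Hz sample rate)
samples = [0.5] * 10 + [0.001] * 30 + [0.5] * 10
print(find_silence(samples, sample_rate=10, threshold=0.01, min_duration=2))
```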

It's not, it's alive and kicking! Not sure if the project is working well though. This is 8 years later.

## .net - Detecting audio silence in WAV files using C# - Stack Overflow

c# .net audio

Since you mention Fourier transforms as an application, you might also consider computing your sines/cosines using the angle-addition equations

sin(n*x) = sin((n-1)*x) * cos(x) + cos((n-1)*x) * sin(x)
cos(n*x) = cos((n-1)*x) * cos(x) - sin((n-1)*x) * sin(x)

I.e. you can compute sin(n * x), cos(n * x) for n = 0, 1, 2, ... iteratively from sin((n-1) * x), cos((n-1) * x) and the constants sin(x), cos(x) with 4 multiplications. Of course that only works if you have to evaluate sin(x), cos(x) on an arithmetic sequence.
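A minimal sketch of that iterative scheme (Python; the function name is illustrative):

```python
import math

def sin_cos_sequence(x, count):
    """Iteratively compute (sin(n*x), cos(n*x)) for n = 0..count-1 using the
    angle-addition recurrences: 4 multiplications per step, no trig calls
    inside the loop."""
    sx, cx = math.sin(x), math.cos(x)  # the only trig evaluations needed
    s, c = 0.0, 1.0                     # sin(0), cos(0)
    out = []
    for _ in range(count):
        out.append((s, c))
        s, c = s * cx + c * sx, c * cx - s * sx
    return out

vals = sin_cos_sequence(0.1, 100)  # vals[n] approximates (sin(0.1*n), cos(0.1*n))
```

In double precision the rounding error grows slowly with n, which is why table-based approaches can still win when only low accuracy is needed.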

Comparing the approaches without the actual implementation is difficult. It depends a lot on how well your tables fit into the caches.

I've see this approach used in oscillators. It's a good one.

I tried this once for an FFT implementation, which is one application the OP mentions. I still used tables in the end because the result needed not to be precise and hence a small table was enough.

Nice. Thanks a lot.

## c# - Calculating vs. lookup tables for sine value performance? - Stack...

c# performance math signal-processing

If the arrays contain integers of limited size (i.e. in the range -u to u) then you can solve this in O(n + u log u) time by using the fast Fourier transform to convolve the histograms of each collection together.

For example, the set a=[-1,2,2,2,2,3] would be represented by a histogram with values:

```
ha[-1] = 1
ha[2] = 4
ha[3] = 1
```

After convolving all the histograms together with the FFT, the resulting histogram will contain entries where the value for each bin tells you the number of ways of combining the numbers to get each possible total. To find the answer to your question with a total of 0, all you need to do is read the value of the histogram for bin 0.
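A sketch of the histogram-convolution idea (the function name is illustrative, and np.convolve is used for clarity; an FFT-based convolution is what gives the stated O(u log u) bound):

```python
import numpy as np

def zero_sum_count(values, u, k=5):
    """Count ordered k-tuples drawn from `values` (each in -u..u) whose sum
    is 0, by convolving the shifted histogram with itself k times."""
    hist = np.zeros(2 * u + 1)
    for v in values:
        hist[v + u] += 1  # shift so bin 0 corresponds to value -u
    result = hist
    for _ in range(k - 1):
        result = np.convolve(result, hist)
    # After k convolutions, the bin for a total of 0 sits at index k*u.
    return int(round(result[k * u]))

# Pairs from {-1, 0, 1} summing to 0: (-1,1), (0,0), (1,-1)
print(zero_sum_count([-1, 0, 1], u=1, k=2))
```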

## algorithm - 5 numbers such that their sum equals 0 - Stack Overflow

algorithm search

For the scale factor a, you can estimate it by computing the ratio of the amplitude spectra of the two signals, since the magnitude of the Fourier transform is invariant to shift.

Similarly, you can estimate the shift factor b by using the Mellin transform, which is scale invariant.

The Fourier transform trick was already tried, but the results are not accurate enough, not even on simulated data, let alone on real-life problems. I was not very familiar with the Mellin transform, but now I am :). The problem here is the same as with the Fourier transform - inaccuracy. Note that if I had guessed the scale successfully, simple correlation would give me the shift.

## algorithm - finding the best/ scale/shift between two vectors - Stack ...

algorithm matlab signal-processing octave model-fitting