After your FFT and filter, you need to do an inverse FFT to get the data back to the time domain. Then you want to add that set of samples to your .WAV file.
As far as producing the file itself goes, the format is widely documented (Googling for ".WAV format" should turn up more results than you have any use for), and pretty simple. It starts with a short header (called a "chunk") that says it's a "RIFF" file whose contents are of type "WAVE". Then there's an "fmt " chunk that describes the format of the samples (bits per sample, samples per second, number of channels, etc.). Then there's a "data" chunk that contains the samples themselves.
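To make the chunk layout concrete, here is a minimal sketch in Python that builds a 16-bit PCM file by hand. The function name `build_wav` and the defaults (mono, 44100 samples per second) are my own choices for illustration, not anything from the question; in practice the standard-library `wave` module will do the same job for you.

```python
import struct

def build_wav(samples, sample_rate=44100, channels=1):
    """Return the bytes of a 16-bit PCM .WAV file.

    `samples` is a sequence of ints in the range -32768..32767
    (interleaved if there is more than one channel).
    """
    bits_per_sample = 16
    block_align = channels * bits_per_sample // 8   # bytes per sample frame
    byte_rate = sample_rate * block_align           # bytes per second

    # All multi-byte fields in a RIFF file are little-endian ("<").
    data = struct.pack("<%dh" % len(samples), *samples)

    # "fmt " chunk: 16-byte body, format tag 1 means uncompressed PCM.
    fmt_chunk = b"fmt " + struct.pack(
        "<IHHIIHH",
        16, 1, channels, sample_rate, byte_rate, block_align, bits_per_sample,
    )

    # "data" chunk: size in bytes, then the samples themselves.
    data_chunk = b"data" + struct.pack("<I", len(data)) + data

    # RIFF size counts everything after the size field itself.
    riff_size = 4 + len(fmt_chunk) + len(data_chunk)
    return b"RIFF" + struct.pack("<I", riff_size) + b"WAVE" + fmt_chunk + data_chunk
```

Writing the result with something like `open("out.wav", "wb").write(build_wav(samples))` (note the binary mode) should give a file an ordinary player will accept. The header comes to 44 bytes, followed by two bytes per sample.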
Since it sounds like you're going to be doing this in real time, my advice would be to forget about doing your FFT, filter, and iFFT. An FIR filter will give essentially the same results, but generally a lot faster. The basic idea of the FIR filter is that instead of converting your data to frequency domain, filtering it, then converting back to time domain, you convert your filter coefficients to time domain, and apply them (fairly) directly to your input data. This is where DSPs earn their keep: nearly all of them have multiply-accumulate instructions, which can implement most of an FIR filter in one instruction. Even without that, however, getting an FIR filter to run in real time on a modern processor doesn't take any real trick unless you're doing really fast sampling. In any case, it's a lot easier than getting an FFT/filter/iFFT to operate at the same speed.
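As a rough sketch of what "apply them directly" means: each output sample is the sum of the most recent input samples, each multiplied by one time-domain coefficient (a "tap"). The function name `fir_filter` and the 4-tap moving-average coefficients below are made up purely for illustration; a real filter would use taps designed for the frequency response you want.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k].

    Input before the first sample is treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k < 0:
                break                    # ran off the start of the input
            acc += h * samples[n - k]    # one multiply-accumulate per tap
        out.append(acc)
    return out

# Hypothetical 4-tap moving average (a crude low-pass), for illustration only.
taps = [0.25, 0.25, 0.25, 0.25]
print(fir_filter([4.0, 4.0, 4.0, 4.0, 0.0, 0.0], taps))
# prints [1.0, 2.0, 3.0, 4.0, 3.0, 2.0]
```

The inner loop is exactly the multiply-accumulate that a DSP does in one instruction, and it reads both the samples and the taps in a straight line through memory.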
@Jerry Coffin - I have to disagree on the speed of FIR versus FFT/multiply/IFFT. For a 64 tap FIR filter, each output sample requires 64 multiply-accumulates. Via an FFT using an N=128 transform and overlap-save processing (en.wikipedia.org/wiki/Overlap-save_method), you do a transform, a complex buffer multiply, and an inverse transform = 2*128*log2(128) + 6*128 = 2560 operations, which would calculate 64 samples for an ops/sample count of 40, saving you 24 operations per sample. There's some handwaving on memory access etc. here, but as your filter gets longer, the FFT method shines.
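The arithmetic in that comment can be checked with a few lines of Python. This only reproduces the comment's own rough cost model (N*log2(N) per transform, 6*N for the complex multiply, N minus the tap count new samples per block); the function names are mine, and strictly speaking overlap-save yields N - taps + 1 valid outputs per block, which the comment rounds to 64.

```python
import math

def fir_ops_per_sample(taps):
    """Direct FIR: one multiply-accumulate per tap for each output sample."""
    return taps

def fft_ops_per_sample(taps, n):
    """Overlap-save estimate using the cost model from the comment above."""
    block_ops = 2 * n * math.log2(n) + 6 * n   # forward + inverse FFT, plus multiply
    new_samples = n - taps                     # outputs produced per block (rounded)
    return block_ops / new_samples

print(fir_ops_per_sample(64), fft_ops_per_sample(64, 128))    # prints 64 40.0
print(fir_ops_per_sample(256), fft_ops_per_sample(256, 512))  # prints 256 48.0
```

With this model the FFT cost per sample grows only slowly with filter length while the direct FIR cost grows linearly. It counts arithmetic only and says nothing about memory access.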
There is some point at which an FFT will be better, that's true -- IME, that's pretty rare in practice though. In particular, an FIR is extremely cache friendly (linear read through the data and coefficients). By contrast, an FFT practically defines "cache hostile". A single cache miss is virtually guaranteed to cost at least 50 cycles on a modern processor, so you can often treat CPU cycles as free; the limiting factor is memory bandwidth.
I put the example from here: ccrma.stanford.edu/courses/422/projects/WaveFormat into a .txt file, then changed the extension to .wav, and it didn't work. Wasn't it supposed to work just like that?
Glancing at that, it shows the bytes in hex -- did you enter them as hexadecimal text? If so, it shouldn't work. Rather, those are supposed to be entered as the binary values of individual bytes. It also looks like the sample they show is incomplete -- it shows the headers and the first few samples, but its header says it'll have a lot more samples than show up there.
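To illustrate the difference, here is a small sketch that turns hex text into the actual byte values and writes them in binary mode. The helper names and the output path are hypothetical; the only hex shown is the four bytes "52 49 46 46", which are the ASCII codes for "RIFF".

```python
def hex_text_to_bytes(hex_text):
    """Convert text like '52 49 46 46' into the four bytes it describes."""
    return bytes.fromhex(hex_text)

def write_binary(path, hex_text):
    """Write the decoded bytes to `path`; "wb" means no text translation."""
    with open(path, "wb") as f:
        f.write(hex_text_to_bytes(hex_text))

print(hex_text_to_bytes("52 49 46 46"))   # prints b'RIFF'
print(len("52 49 46 46"))                 # prints 11
```

Typing the hex into a text editor stores the 11 characters "52 49 46 46" rather than the 4 bytes "RIFF", which is why a player rejects the renamed .txt file. Even after converting, the size fields in the header would have to match the amount of sample data actually present.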