## Audio Signal Processing Basics

Jarno Seppänen, 27.5.1999
Tampere University of Technology, Signal Processing Laboratory

There is a signal processing glossary on a page of its own.

For a more exhaustive list of English-Finnish translations, see the
*Audiosignaalinkäsittelyn sanasto* by Vesa Välimäki.

- The sine wave is more or less the building block of all signals, musical or not.
- There is exactly one frequency present in a signal with one steady sine wave.
- Three parameters, the *frequency*, the *amplitude* and the *initial phase*, characterize every steady sine wave completely.
- The Fourier transform can be used to inspect what kinds of sine waves there are in a signal.

fs = 44100;
t = 0:1/fs:0.001;
s = sin(2 * pi * 1700 * t);
subplot(211), stem(abs(fft(s))), title('abs(fft(s))')
subplot(212), stem(s), title('s')
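As a hedged illustration of the points above, the following sketch builds a sine wave from its three parameters and reads them back from the FFT. The names `f0`, `A` and `phi`, the 8000 Hz sample rate and the one-second length are assumptions made for this example only; they are chosen so that the sine falls exactly on one FFT bin.

```matlab
fs  = 8000;            % sample rate in Hz (assumed for this sketch)
N   = 8000;            % one second of signal, so the bin spacing is 1 Hz
f0  = 440;             % frequency in Hz
A   = 0.5;             % amplitude
phi = pi / 4;          % initial phase in radians

n = 0:N-1;
s = A * sin(2 * pi * f0 * n / fs + phi);

S = fft(s);
[peak, k] = max(abs(S(1:N/2)));   % search the positive frequencies only

f_est   = (k - 1) * fs / N;       % bin index -> frequency in Hz
A_est   = 2 * peak / N;           % bin magnitude -> sine amplitude
phi_est = angle(S(k)) + pi / 2;   % FFT phase is relative to a cosine, hence the shift
```

When the frequency does not coincide with a bin, the estimates are only approximate.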

- white noise has, on average, an equal amount of energy at every frequency
- in music, there is often band-limited noise present

fs = 44100;
n = randn(fs, 1);
n = n / max(abs(n));
subplot(211)
plot(n), axis tight
subplot(212)
specgram(n)

- The speech sample shown is the Finnish word "seitsemän" (seven).
- You can listen to the speech sample.

[s, fs] = wavread('seiska.wav'); plot(s),axis tight,grid on

The following is the *spectrogram* of the above speech sound.

specgram(s, 512, fs); colorbar

- The piano sample shown is the middle C, whose *fundamental frequency* is 261 Hz.
- The piano sample is an example of a *harmonic* sound; this means that the sound consists of sine waves whose frequencies are integer multiples of the fundamental frequency. (Actually the piano is not perfectly harmonic.)
- You can listen to the piano sample.

[s, fs] = wavread('pia60.wav'); plot(s),axis tight,grid on

The following is the *spectrogram* of the above piano sound, resampled
to 16000 Hz sample rate. Here the overtones can be seen clearly.

s2 = resample(s, 1, 3); specgram(s2, 512, fs / 3); colorbar
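To make the idea of a harmonic sound concrete, here is a minimal sketch that synthesizes a tone from a fundamental and its integer multiples and then locates the spectral peaks. The amplitudes in `amps` are made up, and the sample rate and length are assumptions chosen so that every harmonic lands exactly on an FFT bin; a real piano tone would deviate slightly from this ideal.

```matlab
fs = 16000;                 % sample rate in Hz (assumed)
N  = 16000;                 % one second, so FFT bins are 1 Hz apart
f0 = 261;                   % fundamental frequency in Hz
t  = (0:N-1) / fs;

% Sum of the first five harmonics with decreasing, made-up amplitudes
amps = [1 0.5 0.33 0.25 0.2];
s = zeros(1, N);
for h = 1:length(amps)
    s = s + amps(h) * sin(2 * pi * h * f0 * t);
end

% The five largest spectral peaks sit at integer multiples of f0
mag = abs(fft(s));
mag = mag(1:N/2);
[~, idx] = sort(mag, 'descend');
peaks_hz = sort(idx(1:5) - 1) * fs / N;
```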

- The snare drum sample has neither a fundamental frequency nor overtones.
- You can listen to the snare drum sample.

[s, fs] = wavread('snareHit.wav'); plot(s),axis tight,grid on

And here is the spectrogram of the snare drum hit, without resampling. Notice the lack of harmonic content.

specgram(s, 512, fs); colorbar

- There are broadly two kinds of digital linear filters: finite impulse response (FIR) and infinite impulse response (IIR) filters.
- Comparison of FIR vs. IIR filters:

| | **FIR** | **IIR** |
|---|---|---|
| linear phase response possible | yes | no |
| overall frequency response control | good | bad |
| nearly-"brickwall" response possible | no | yes |
| efficiency (multiplications required) | bad | good |

- Conclusion: in audio applications, when the phase response isn't critical, it is often profitable to use IIR filters because of their efficiency.

fir_b = remez(30, [0 0.2 0.3 1], [1 1 0 0]);
[iir_b, iir_a] = butter(10, 0.2);
subplot(211)
impz(fir_b, 1, 41)
title('An FIR filter'), axis([0 40 -0.1 0.25]), grid on
subplot(212)
impz(iir_b, iir_a, 41)
title('An IIR filter'), axis([0 40 -0.1 0.25]), grid on
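The difference between the two filter types can also be seen with two toy filters that need no design routine. The sketch below is only an illustration: the 4-point moving average and the one-pole lowpass are assumptions chosen for simplicity, not the filters designed above.

```matlab
x = [1 zeros(1, 49)];            % unit impulse, 50 samples

% FIR: 4-point moving average, y[n] = (x[n] + x[n-1] + x[n-2] + x[n-3]) / 4
fir_b = [1 1 1 1] / 4;
h_fir = filter(fir_b, 1, x);

% IIR: one-pole lowpass, y[n] = 0.1 * x[n] + 0.9 * y[n-1]
iir_b = 0.1;
iir_a = [1 -0.9];
h_iir = filter(iir_b, iir_a, x);

fir_len  = find(h_fir ~= 0, 1, 'last');   % index of the last nonzero FIR sample
iir_tail = h_iir(end);                    % 0.1 * 0.9^49: small, but never exactly zero
```

The FIR response ends after as many samples as there are coefficients, while the feedback term keeps the IIR response decaying indefinitely.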

- The Fourier transform can be used to find out the frequency domain representation of a time domain signal. The inverse Fourier transform converts a frequency domain representation into time domain.
- When the MATLAB `fft` function is used to compute the Fourier transform, the resulting vector contains amplitude and phase information on both positive and negative frequencies. The negative-frequency values are the complex conjugates of the positive-frequency values (equal magnitudes, opposite phases) if and only if the time-domain signal is real.
- The Fourier transform decomposes a signal into a sum of stationary sinusoids. Therefore, when an entire sound signal is transformed at once, the changes in frequency content over time cannot be observed. For this reason a short-time windowed FFT is usually used to observe the instantaneous frequency content.

[s, fs] = wavread('snareHit.wav');
subplot(211), plot(abs(fft(s))), title('abs(fft(s))')
subplot(212), plot(s), title('s')
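The relation between the positive and negative frequency bins of a real signal can be checked numerically. This is a small sketch with an arbitrary, made-up test vector `x`; the index arithmetic assumes MATLAB's 1-based indexing, where bin `k + 1` holds frequency `k`.

```matlab
N = 8;
x = [1 -2 3 0.5 -1 4 2 -3];   % arbitrary real test signal (made up)
X = fft(x);

% Bin k+1 holds frequency k; bin N-k+1 holds the matching negative frequency
k = 1:N/2-1;
pos = X(k + 1);
neg = X(N - k + 1);

mag_diff  = max(abs(abs(pos) - abs(neg)));   % magnitudes agree
conj_diff = max(abs(pos - conj(neg)));       % values are complex conjugates
```

Both differences come out at the level of rounding error, which is why only the first half of the spectrum is usually plotted for real signals.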

- The following spectrogram is as computed above, using 11.6 ms windows which overlap by 50%
- the spectrum displayed below the spectrogram is taken at time 0.2 s

u = s(0.2 * fs:0.2 * fs + 511) .* hanning(512);
U = fft(u);
f = (0:256) / 256 * fs / 2;
plot(f, 20 * log10(abs(U(1:257))))
axis tight, grid on
xlabel('frequency [Hz]')
ylabel('amplitude [dB]')
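A spectrogram of this kind can be thought of as many such spectra stacked side by side. The following sketch shows one way the framing with 512-sample windows and 50% overlap might be written out by hand; it is not the implementation of `specgram`. The test tone standing in for a recording, the names `hop`, `num_frames` and `S`, and the hand-written Hann window formula are all assumptions of this example.

```matlab
fs  = 44100;                                   % sample rate in Hz (assumed)
s   = sin(2 * pi * 1000 * (0:fs-1)' / fs);     % 1 s test tone standing in for a recording
N   = 512;                                     % frame length: 512 / 44100 s = 11.6 ms
hop = N / 2;                                   % 50 % overlap between frames
w   = 0.5 - 0.5 * cos(2 * pi * (1:N)' / (N + 1));   % Hann window written out by hand

num_frames = floor((length(s) - N) / hop) + 1;
S = zeros(N / 2 + 1, num_frames);              % one magnitude spectrum per column

for m = 1:num_frames
    first = (m - 1) * hop + 1;                 % first sample of frame m
    frame = s(first:first + N - 1) .* w;       % cut out the frame and window it
    X = fft(frame);
    S(:, m) = abs(X(1:N / 2 + 1));             % keep the non-negative frequencies
end

frame_ms = 1000 * N / fs;                      % frame duration in milliseconds
```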

- short-time signal processing is practically always done using *windowing*
- in short-time signal processing, signals are cut into small pieces called *frames*, which are processed one at a time
- frames are *windowed* with a *window function* in order to improve the frequency-domain representation
- what windowing essentially means is multiplying the signal frame with the window function point-by-point

[s, fs] = wavread('pia60.wav', [5000 6000]);
subplot(131)
plot(s(1:512))
title('s(1:512)'), axis tight, grid on
subplot(132)
plot(hanning(512), 'r')
title('hanning(512)'), axis tight, grid on
subplot(133)
plot(s(1:512) .* hanning(512))
title('s(1:512) .* hanning(512)'), axis tight, grid on

S1 = fft(s(1:512));
S2 = fft(s(1:512) .* hanning(512));
f = (0:256) / 256 * fs / 2;
plot(f, 20 * log10(abs(S1(1:257))))
hold on
plot(f, 20 * log10(abs(S2(1:257))), 'r')
axis tight, grid on
xlabel('frequency [Hz]')
ylabel('amplitude [dB]')
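What "improving the frequency-domain representation" means can be shown with a single sine wave that does not fit a whole number of periods into the frame. The sketch below is a hedged example: the 10.5 cycles per frame, the comparison bin 100 and the hand-written Hann window are assumptions picked for illustration.

```matlab
N = 512;
n = (0:N-1)';
s = sin(2 * pi * 10.5 * n / N);        % 10.5 cycles per frame: falls between two FFT bins

w = 0.5 - 0.5 * cos(2 * pi * (1:N)' / (N + 1));   % Hann window written out by hand

S1 = abs(fft(s));                      % no window (rectangular)
S2 = abs(fft(s .* w));                 % Hann-windowed

% Level far away from the sine (bin 100), relative to each spectrum's own peak
leak_rect_db = 20 * log10(S1(101) / max(S1));
leak_hann_db = 20 * log10(S2(101) / max(S2));
```

Without a window, energy from the sine leaks across the whole spectrum; with the window the leakage far from the peak is much lower, at the cost of a somewhat wider peak.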

- the *cross-correlation* between two signals tells how similar the signals are
- in other words, if there is correlation between the signals, then the signals are more or less dependent on each other
- for example, the correlation between two sine waves with different frequencies is zero when it is computed over a whole number of periods of both

t = 0:1/fs:0.2;
a = 2 * sin(2 * pi * 20 * t);
b = 2 * sin(2 * pi * 30 * t);
ep(a, b)
sum(a .* b)

ans = -4.7622e-12

- normally the correlation value is computed with different alignments, called lags, between the signals
- the correlation value is computed between a[n] and b[n - l], a[n] and b[n - l + 1], a[n] and b[n - l + 2], ..., a[n] and b[n - 1], a[n] and b[n], a[n] and b[n + 1], ..., a[n] and b[n + l - 1] and finally between a[n] and b[n + l]
- i.e. one signal is held static and the other signal is shifted one sample at a time, and the correlation value is computed for every shift
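The lag loop described above can be sketched directly. This is an illustrative, unoptimized version with made-up signals `a` and `b`; it follows the convention used in the text (a[n] against b[n + l]), and library routines such as `xcorr` may define the sign of the lag differently.

```matlab
a = [1 2 3 4 0 0];
b = [0 0 1 2 3 4];          % same shape as a, delayed by two samples
max_lag = 3;
lags = -max_lag:max_lag;
c = zeros(size(lags));

for i = 1:length(lags)
    l = lags(i);
    total = 0;
    for n = 1:length(a)
        m = n + l;                         % index into the shifted signal b
        if m >= 1 && m <= length(b)        % samples outside the signal count as zero
            total = total + a(n) * b(m);
        end
    end
    c(i) = total;                          % correlation value at lag l
end

[~, best] = max(c);
best_lag = lags(best);      % lag at which the two signals line up
```

The largest correlation value appears at the lag that undoes the delay between the two signals.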

- *autocorrelation* means the cross-correlation of a signal with itself
- the autocorrelation value at lag 0 is equal to the energy of the signal

subplot(211)
n = randn(4000, 1);
[ac, l] = xcorr(n, n, 1000);
plot(l, ac), axis tight, grid on
title('gaussian noise autocorrelation')
subplot(212)
[s, fs] = wavread('pia60.wav');
[ac, l] = xcorr(s, s, 1000);
plot(l, ac), axis tight, grid on
title('piano autocorrelation')
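The statement about lag 0 can be verified on a short signal without any toolbox function. In this sketch the autocorrelation of a real signal is computed as a convolution with the time-reversed signal, which is one common way to obtain it; the signal values are made up.

```matlab
s = [0.5 -1 2 0.25 -0.75];              % short made-up real signal
ac = conv(s, fliplr(s));                % autocorrelation for lags -(N-1) ... (N-1)
lags = -(length(s) - 1):(length(s) - 1);

r0 = ac(lags == 0);                     % autocorrelation at lag 0
energy = sum(s .^ 2);                   % signal energy
```

The lag-0 value is also the maximum of the autocorrelation, which is why the plots above peak in the middle.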

- here are 6 sine waves: sine1, sine2, sine3, sine4, sine5, sine6
- what frequency belongs to which sine: (a) 61 Hz, (b) 688 Hz, (c) 1364 Hz, (d) 4539 Hz, (e) 8200 Hz, (f) 13954 Hz?

fs = 44100;
f = 2 .^ (10 * rand(1, 6) + 4.2);
t = 0:1/fs:1;
for i = 1:length(f)
    s = 0.2 * sin(2 * pi * f(i) * t);
    wavwrite(s, fs, 16, ['sine' num2str(i) '.wav']);
end

- here are 6 bandpass filtered versions of a piece of music: filter1, filter2, filter3, filter4, filter5, filter6
- what bandpass frequency belongs to which version: (a) 212 Hz, (b) 552 Hz, (c) 1300 Hz, (d) 1428 Hz, (e) 2676 Hz, (f) 6480 Hz?

[s, fs] = wavread('helmi.wav');
f = 2 .^ (7 * rand(1, 6) + 6);
for i = 1:length(f)
    ff = [0.6 0.9 1.1 1.4] * f(i) * 2 / fs;
    ff = [0 ff 1];
    b = remez(300, ff, [0 0 1 1 0 0], [10 1 10]);
    freqz(b, 1, 512, fs);
    drawnow
    ss = 4 * filter(b, 1, s);
    wavwrite(ss, fs, 16, ['filter' num2str(i) '.wav']);
end
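The band-edge vector `ff` in the script above is expressed as a fraction of half the sample rate. As a worked example, the sketch below evaluates the same expressions for one centre frequency; the 44100 Hz sample rate is an assumption (the script reads `fs` from the file), and `fc`, `edges_hz`, `passband_hz` and `bandwidth_hz` are names introduced only for this illustration.

```matlab
fs = 44100;                 % sample rate in Hz (assumed; the script reads it from the file)
fc = 1300;                  % one of the quiz centre frequencies

edges_hz = [0.6 0.9 1.1 1.4] * fc;      % stopband and passband edges in Hz
ff = [0 edges_hz * 2 / fs 1];           % normalised so that 1 corresponds to fs / 2

passband_hz  = edges_hz(2:3);           % passband from 0.9 * fc to 1.1 * fc
bandwidth_hz = diff(passband_hz);       % 0.2 * fc, i.e. 260 Hz here
```

Each consecutive pair in `ff` marks one band, matching the desired amplitudes `[0 0 1 1 0 0]`: stop, pass, stop.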

- correct answers to the sine frequencies: 1-c (1364 Hz), 2-d (4539 Hz), 3-f (13954 Hz), 4-b (688 Hz), 5-e (8200 Hz) and 6-a (61 Hz)
- correct answers to the filter bandpass frequencies: 1-e (2676 Hz), 2-b (552 Hz), 3-c (1300 Hz), 4-f (6480 Hz), 5-d (1428 Hz) and 6-a (212 Hz)


*http://www.cs.tut.fi/sgn/arg/intro/*
*Last modified: Tue Jun 1 11:58:55 1999*