# Audio Signal Processing Basics

Jarno Seppänen
27.5.1999

Tampere University of Technology
Signal Processing Laboratory

## Glossary

There is a signal processing glossary on a page of its own.

For a more exhaustive list of English-Finnish translations, see the Audiosignaalinkäsittelyn sanasto by Vesa Välimäki.

## Primitive Signals

### Sine wave

• The sine wave is more or less the building block of all signals, musical or not.
• There is exactly one frequency present in a signal with one steady sine wave.
• Three parameters, the frequency, the amplitude and the initial phase, characterize every steady sine wave completely.
• The Fourier transform can be used to inspect what kinds of sine waves there are in a signal.
```matlab
fs = 44100;                     % sample rate [Hz]
t = 0:1/fs:0.001;               % 1 ms time axis [s]
s = sin(2 * pi * 1700 * t);     % 1700 Hz sine wave
subplot(211), stem(abs(fft(s))), title('abs(fft(s))')
subplot(212), stem(s), title('s')
```
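
The three parameters can be made concrete with a small sketch. The variable names (`A`, `f0`, `phi`, `N`) and the chosen values are assumptions made for illustration. The frequency is picked so that a whole number of periods fits into the signal, which lets the strongest FFT bin return the parameters almost exactly.

```matlab
fs  = 44100;                       % sample rate [Hz]
N   = 4410;                        % number of samples (0.1 s), assumed
A   = 0.5;                         % amplitude
f0  = 1700;                        % frequency [Hz]; 170 whole periods in 0.1 s
phi = pi / 4;                      % initial phase [rad]
n   = (0:N-1)';
s   = A * sin(2 * pi * f0 * n / fs + phi);

S = fft(s);
[~, k]  = max(abs(S(1:N/2)));      % strongest bin among the positive frequencies
f_est   = (k - 1) * fs / N;        % bin index -> frequency [Hz]
A_est   = 2 * abs(S(k)) / N;       % bin magnitude -> amplitude
phi_est = angle(S(k)) + pi / 2;    % a sine lags a cosine by pi/2
```

If the frequency does not fall exactly on an FFT bin, the estimates are only approximate, because the energy of the sine wave spreads over neighbouring bins.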

### Noise

• White noise has, on average, an equal amount of energy at every frequency.
• In music, band-limited noise is often present.
```matlab
fs = 44100;                 % sample rate [Hz]
n = randn(fs, 1);           % one second of white Gaussian noise
n = n / max(abs(n));        % normalize the peak amplitude to 1
subplot(211)
plot(n), axis tight
subplot(212)
specgram(n)
```
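
Band-limited noise can be sketched by zeroing the FFT bins of white noise outside a chosen band. This is a crude illustration, not a recommended filter design; the band edges `f_lo` and `f_hi` are assumed values.

```matlab
fs = 44100;                        % sample rate [Hz]
N  = fs;                           % one second of noise
n  = randn(N, 1);                  % white Gaussian noise
f_lo = 1000;  f_hi = 4000;         % band edges [Hz], assumed for illustration

X = fft(n);
f = (0:N-1)' * fs / N;             % frequency of each FFT bin [Hz]
f = min(f, fs - f);                % fold negative-frequency bins onto positive ones
keep = (f >= f_lo) & (f <= f_hi);  % bins inside the band
X(~keep) = 0;                      % remove everything outside the band
nb = real(ifft(X));                % band-limited noise
```

Because the mask treats positive and negative frequencies alike, the inverse transform is real up to rounding error, and `real` only discards that rounding residue.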

### Speech

• The speech sample shown is the Finnish word "seitsemän".
• You can listen to the speech sample.
```matlab
[s, fs] = wavread('seiska.wav');   % samples and sample rate; newer MATLAB releases use audioread
plot(s), axis tight, grid on
```

The following is the spectrogram of the above speech sound.

```matlab
specgram(s, 512, fs);
colorbar
```

### Piano

• The piano sample shown is the middle C, whose fundamental frequency is about 261.6 Hz.
• The piano sample is an example of a harmonic sound; this means that the sound consists of sine waves whose frequencies are integer multiples of the fundamental frequency. (Actually the piano is not perfectly harmonic.)
• You can listen to the piano sample.
```matlab
[s, fs] = wavread('pia60.wav');
plot(s), axis tight, grid on
```

The following is the spectrogram of the above piano sound, resampled to 16000 Hz sample rate. Here the overtones can be seen clearly.

```matlab
s2 = resample(s, 1, 3);        % reduce the sample rate to one third
specgram(s2, 512, fs / 3);
colorbar
```
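
The idea of a harmonic sound can be sketched by summing sine waves at integer multiples of a fundamental. This is only an idealized tone, not a piano model; the number of partials `K` and the 1/k amplitude roll-off are assumptions made for illustration.

```matlab
fs = 16000;                        % sample rate [Hz]
f0 = 261.63;                       % approximate fundamental of middle C [Hz]
K  = 8;                            % number of partials, assumed
t  = (0:fs-1)' / fs;               % one second of time [s]
x  = zeros(size(t));
for k = 1:K
    x = x + (1 / k) * sin(2 * pi * k * f0 * t);   % k-th harmonic, amplitude 1/k
end

X = abs(fft(x));
[~, i] = max(X(1:fs/2));           % strongest bin among the positive frequencies
f_peak = i - 1;                    % bins are 1 Hz apart because the signal lasts 1 s
```

Plotting `X(1:fs/2)` shows evenly spaced peaks at multiples of `f0`, which is the pattern the overtones form in a spectrogram.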

### Snare drum

• The snare drum sample has neither a fundamental frequency nor overtones.
• You can listen to the snare drum sample.
```matlab
[s, fs] = wavread('snareHit.wav');
plot(s), axis tight, grid on
```

And here is the spectrogram of the snare drum hit, without resampling. Notice the lack of harmonic content.

```matlab
specgram(s, 512, fs);
colorbar
```

## Linear Filters

• There are broadly two kinds of digital linear filters: finite impulse response (FIR) and infinite impulse response (IIR) filters.
• Comparison of FIR vs. IIR filters:

| | FIR | IIR |
|---|---|---|
| linear phase response possible | yes | no |
| overall frequency response control | good | bad |
| nearly-"brickwall" response possible | no | yes |
| efficiency (multiplications required) | bad | good |

• Conclusion: in audio applications, when the phase response isn't critical, it is often profitable to use IIR filters because of their efficiency.
```matlab
fir_b = remez(30, [0 0.2 0.3 1], [1 1 0 0]);   % order-30 equiripple lowpass FIR
[iir_b, iir_a] = butter(10, 0.2);               % order-10 Butterworth lowpass IIR
subplot(211)
impz(fir_b, 1, 41)                              % first 41 samples of the impulse response
title('An FIR filter'), axis([0 40 -0.1 0.25]), grid on
subplot(212)
impz(iir_b, iir_a, 41)
title('An IIR filter'), axis([0 40 -0.1 0.25]), grid on
```
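
The difference between the two filter types can also be seen without any design functions. The following sketch feeds a unit impulse through a 5-tap moving average (FIR) and through a one-pole recursive filter (IIR); both filters and their coefficients are assumed examples, not the filters designed above.

```matlab
x = [1; zeros(19, 1)];             % unit impulse, 20 samples

fir_b = ones(1, 5) / 5;            % 5-tap moving average (FIR)
h_fir = filter(fir_b, 1, x);       % nonzero for exactly 5 samples

iir_b = 0.2;  iir_a = [1 -0.8];    % y[n] = 0.2 x[n] + 0.8 y[n-1] (IIR)
h_iir = filter(iir_b, iir_a, x);   % 0.2 * 0.8^n, decays but never ends
```

The FIR filter here needs five multiplications per output sample and the IIR filter two, which hints at the efficiency difference mentioned in the comparison.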

## Time and Frequency Domains

• The Fourier transform can be used to find out the frequency domain representation of a time domain signal. The inverse Fourier transform converts a frequency domain representation into time domain.
• When the MATLAB FFT function is used to compute the Fourier transform, the resulting vector contains amplitude and phase information on positive and negative frequencies. The values at positive and negative frequencies are complex conjugates of each other (equal amplitude, opposite phase) if and only if the time-domain signal is real.
• The Fourier transform decomposes a signal into a sum of stationary sinusoids. Therefore, when a whole sound signal is transformed at once, the changes in its frequency content over time cannot be observed. For this reason, a short-time windowed FFT is usually used to observe the instantaneous frequency content.
```matlab
[s, fs] = wavread('snareHit.wav');
subplot(211), plot(abs(fft(s))), title('abs(fft(s))')
subplot(212), plot(s), title('s')
```
• The following spectrogram is computed as above, using 11.6 ms windows which overlap by 50%.
• The spectrum displayed below the spectrogram is taken at time 0.2 seconds.
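```matlab
first = round(0.2 * fs);                     % sample index at 0.2 s
u = s(first:first + 511) .* hanning(512);    % 512-sample windowed frame
U = fft(u);
f = (0:256) / 256 * fs / 2;                  % bin frequencies from 0 to fs/2 [Hz]
plot(f, 20 * log10(abs(U(1:257))))           % magnitude in dB, positive frequencies only
axis tight, grid on
xlabel('frequency [Hz]')
ylabel('amplitude [dB]')
```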
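
The relation between the positive and negative frequencies in the FFT of a real signal can be checked numerically. In this sketch the signal length `N` and the random test signal are assumptions; any real vector should behave the same way.

```matlab
N = 8;
x = randn(N, 1);                   % any real signal
X = fft(x);
k = 1:N-1;                         % X(k + 1) holds frequency k, X(N - k + 1) holds frequency -k
err = max(abs(X(k + 1) - conj(X(N - k + 1))));   % zero up to rounding error
```

This symmetry is why plotting only the first half of the FFT output, as in `U(1:257)` above, loses no information for a real signal.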

## Windowing

• short-time signal processing is practically always done using windowing
• in short-time signal processing, signals are cut into small pieces called frames, which are processed one at a time
• frames are windowed with a window function in order to reduce the spectral leakage caused by the abrupt frame edges, which improves the frequency-domain representation
• what windowing essentially means is multiplying the signal frame with the window function point-by-point
```matlab
[s, fs] = wavread('pia60.wav', [5000 6000]);   % read samples 5000..6000 only
subplot(131)
plot(s(1:512))                                  % unwindowed frame
title('s(1:512)'), axis tight, grid on
subplot(132)
plot(hanning(512), 'r')                         % window function
title('hanning(512)'), axis tight, grid on
subplot(133)
plot(s(1:512) .* hanning(512))                  % windowed frame
title('s(1:512) .* hanning(512)'), axis tight, grid on
```
```matlab
S1 = fft(s(1:512));                       % spectrum without windowing
S2 = fft(s(1:512) .* hanning(512));       % spectrum with a Hann window
f = (0:256) / 256 * fs / 2;               % bin frequencies from 0 to fs/2 [Hz]
plot(f, 20 * log10(abs(S1(1:257))))
hold on
plot(f, 20 * log10(abs(S2(1:257))), 'r')
axis tight, grid on
xlabel('frequency [Hz]')
ylabel('amplitude [dB]')
```
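
The frame-by-frame processing described above can be sketched as a loop. The stand-in noise signal, the frame length `N` and the 50% overlap are assumptions, and the Hann window is written out explicitly so that the sketch needs no toolbox.

```matlab
fs  = 44100;                       % sample rate [Hz]
x   = randn(fs, 1);                % stand-in signal; any column vector works
N   = 512;                         % frame length (about 11.6 ms at 44.1 kHz)
hop = N / 2;                       % 50 % overlap between consecutive frames
w   = 0.5 - 0.5 * cos(2 * pi * (1:N)' / (N + 1));   % Hann window

nframes = floor((length(x) - N) / hop) + 1;
S = zeros(N / 2 + 1, nframes);     % one magnitude spectrum per column
for m = 1:nframes
    first = (m - 1) * hop + 1;                  % first sample of frame m
    frame = x(first:first + N - 1) .* w;        % point-by-point windowing
    F = fft(frame);
    S(:, m) = abs(F(1:N / 2 + 1));              % keep the positive frequencies
end
```

Displaying `20 * log10(S)` as an image gives a basic spectrogram, with one column per frame.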

## Correlation

• the cross-correlation between two signals tells how similar the signals are
• in other words, if there is correlation between the signals, then the signals are more or less dependent on each other
• for example, the correlation between two sine waves with different periods is zero when it is computed over a whole number of periods of both waves
```matlab
t = 0:1/fs:0.2;                  % 0.2 s: 4 periods of 20 Hz, 6 periods of 30 Hz
a = 2 * sin(2 * pi * 20 * t);
b = 2 * sin(2 * pi * 30 * t);
ep(a, b)
sum(a .* b)                      % correlation value at lag 0

ans =

-4.7622e-12
```
• normally the correlation value is computed with different alignments, called lags, between the signals
• with a maximum lag of L, the correlation value is computed between a[n] and b[n - L], a[n] and b[n - L + 1], ..., a[n] and b[n - 1], a[n] and b[n], a[n] and b[n + 1], ..., and finally between a[n] and b[n + L]
• i.e. one signal is held fixed while the other signal is shifted one sample at a time, and the correlation value is computed after every shift
• autocorrelation means the cross-correlation of a signal with itself
• the autocorrelation value at lag 0 is equal to the energy of the signal
```matlab
subplot(211)
n = randn(4000, 1);                  % gaussian noise
[ac, l] = xcorr(n, n, 1000);         % autocorrelation for lags -1000..1000
plot(l, ac), axis tight, grid on
title('gaussian noise autocorrelation')
subplot(212)
[ac, l] = xcorr(s, s, 1000);         % s is the piano excerpt loaded in the Windowing section
plot(l, ac), axis tight, grid on
title('piano autocorrelation')
```
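
The lag computation can be written out by hand. The sketch below correlates a noise signal with a delayed copy of itself, pairing a[n] with b[n + l] as in the list above, and finds the lag with the largest correlation value. The signal length, the delay `d` and the maximum lag `L` are assumed values, and the sign convention of library functions such as `xcorr` may differ from this one.

```matlab
a = randn(200, 1);                       % test signal
d = 7;                                   % delay in samples, assumed
b = [zeros(d, 1); a(1:end - d)];         % b[n] = a[n - d]
L = 20;                                  % maximum lag
lags = -L:L;
r = zeros(size(lags));
for i = 1:length(lags)
    l = lags(i);
    n = max(1, 1 - l):min(200, 200 - l); % indices where a[n] and b[n + l] both exist
    r(i) = sum(a(n) .* b(n + l));        % correlation value at lag l
end
[~, imax] = max(r);
best_lag = lags(imax);                   % lag with the strongest correlation
e0 = sum(a .* a);                        % autocorrelation of a at lag 0, i.e. its energy
```

For this construction the peak is expected at a lag equal to the delay `d`, because that shift lines the two copies up again.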

## Frequency quiz

• here are 6 sine waves: sine1, sine2, sine3, sine4, sine5, sine6
• what frequency belongs to which sine: (a) 61 Hz, (b) 688 Hz, (c) 1364 Hz, (d) 4539 Hz, (e) 8200 Hz, (f) 13954 Hz?
```matlab
fs = 44100;
f = 2 .^ (10 * rand(1, 6) + 4.2);     % six random frequencies, uniform on a log scale
t = 0:1/fs:1;                         % one second
for i = 1:length(f)
    s = 0.2 * sin(2 * pi * f(i) * t);
    wavwrite(s, fs, 16, ['sine' num2str(i) '.wav']);   % 16-bit WAV file
end
```
```matlab
[s, fs] = wavread('helmi.wav');
f = 2 .^ (7 * rand(1, 6) + 6);        % six random center frequencies, 64 Hz to 8192 Hz
for i = 1:length(f)
    ff = [0.6 0.9 1.1 1.4] * f(i) * 2 / fs;   % band edges, normalized to fs/2
    ff = [0 ff 1];
    b = remez(300, ff, [0 0 1 1 0 0], [10 1 10]);   % order-300 bandpass FIR
    freqz(b, 1, 512, fs); drawnow
    ss = 4 * filter(b, 1, s);
    wavwrite(ss, fs, 16, ['filter' num2str(i) '.wav']);
end
```
• correct answers to the sine frequencies: 1-c (1364 Hz), 2-d (4539 Hz), 3-f (13954 Hz), 4-b (688 Hz), 5-e (8200 Hz) and 6-a (61 Hz)
• correct answers to the filter bandpass frequencies: 1-e (2676 Hz), 2-b (552 Hz), 3-c (1300 Hz), 4-f (6480 Hz), 5-d (1428 Hz) and 6-a (212 Hz)

## References

• Pohlmann, Ken, "Principles of Digital Audio." 3rd Edition, ISBN 0-07-050468-7, McGraw-Hill, Inc., 1995
• Oppenheim, Alan V. and Schafer, Ronald W., "Discrete-Time Signal Processing."
• Ifeachor and Jervis, "Digital Signal Processing."

http://www.cs.tut.fi/sgn/arg/intro/