Implementation of Speech Coder Using Sub-Bands:
Subband processing is based on splitting the frequency range into M segments (subbands) that together cover the entire range. Each subband is processed independently, as required by the specific application. The subbands are recombined after processing to form an output signal whose bandwidth occupies the entire frequency range.
In subband coding, the speech signal is first split into frequency bands using a bank of band-pass filters. The individual band-pass signals are then decimated and encoded for transmission. A filter bank is a collection of band-pass filters that all process the same input signal. The important parameters of a subband coder are the number of frequency bands, the frequency coverage of the system, and the way the individual subband signals are coded.
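As a concrete sketch of the analysis stage (in Python with numpy/scipy, standing in for the Matlab implementation described below; the test signal and filter lengths are illustrative assumptions), a two-band filter bank filters the input and decimates each band:

```python
import numpy as np
from scipy import signal

fs = 8000                                     # sampling rate in Hz (assumed)
t = np.arange(0, 0.1, 1 / fs)
x = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 3000 * t)  # stand-in for speech

# Two-band filter bank: cutoffs at half the Nyquist frequency (2 kHz here)
lp = signal.firwin(64, 0.5)                   # low-pass prototype
hp = signal.firwin(65, 0.5, pass_zero=False)  # high-pass (odd tap count required)

low = signal.lfilter(lp, 1.0, x)[::2]         # filter, then decimate by 2
high = signal.lfilter(hp, 1.0, x)[::2]

print(len(x), len(low), len(high))            # each band carries half the samples
```

After decimation the two bands together contain the same number of samples as the input, which is what makes the split attractive for coding.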
There are two kinds of subband structures: uniform band structures, in which all bands have equal width, and octave band structures, in which each band is half as wide as the adjacent higher band and twice as wide as the adjacent lower band.
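The octave band edges can be computed by successive halving; a small sketch, assuming a 0–4000 Hz range and four bands:

```python
# Octave-band edges for a four-band split of the 0-4000 Hz range:
# each band is half as wide as the band above it (the lowest two are equal).
nyq = 4000.0
edges = [nyq]
for _ in range(3):                 # successive halving: 2000, 1000, 500
    edges.append(edges[-1] / 2)
edges = sorted(edges)
bands = [(0.0, edges[0])] + [(edges[i], edges[i + 1]) for i in range(len(edges) - 1)]
print(bands)   # [(0.0, 500.0), (500.0, 1000.0), (1000.0, 2000.0), (2000.0, 4000.0)]
```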
This project digitizes the speech signal, represents it with a digital bit stream, and aims to produce the highest possible speech quality at the lowest possible bit rate. A speech coder generally consists of three components: speech analysis, parameter quantization, and parameter coding. After analysis, the samples must be quantized to reduce the number of bits required. The output of the quantizer is passed to the coder, which assigns a unique binary code to each possible quantized value. These binary codes are packed together for efficient transmission.
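The quantization and coding steps can be sketched as follows (Python; the mid-rise quantizer and the 3-bit word length are illustrative assumptions, not the project's actual choices):

```python
import numpy as np

def uniform_quantize(x, n_bits):
    """Uniform mid-rise quantizer over [-1, 1): returns level indices and values."""
    levels = 2 ** n_bits
    step = 2.0 / levels
    idx = np.clip(np.floor((x + 1.0) / step), 0, levels - 1).astype(int)
    return idx, (idx + 0.5) * step - 1.0

x = np.array([-0.9, -0.1, 0.0, 0.45, 0.8])
idx, xq = uniform_quantize(x, 3)            # 8 levels -> 3-bit codes
codes = [format(i, "03b") for i in idx]     # unique binary code per level
print(codes)                                # ['000', '011', '100', '101', '111']
```

The coder here is trivial (the binary representation of the level index); in a real coder these words are packed into the transmitted bit stream.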
Quantizers generally fall into two classes: uniform or nonuniform quantizers, and adaptive quantizers. A nonuniform quantizer or an adaptive quantizer, followed by an encoder that assigns a code to each quantization level, is called companding pulse code modulation (companding PCM) or adaptive pulse code modulation (APCM), respectively.
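Companding PCM can be sketched with the mu-law characteristic (mu = 255, as in G.711); the 8-bit uniform quantizer applied to the compressed value is an illustrative assumption:

```python
import numpy as np

def mu_law_compress(x, mu=255.0):
    """mu-law compressor (the nonuniform step of companding PCM)."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=255.0):
    """Inverse of the compressor, applied at the decoder."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

x = np.array([-0.5, -0.01, 0.01, 0.5])
y = mu_law_compress(x)
q = np.round((y + 1.0) * 127.5)        # 8-bit uniform quantization of the compressed value
x_hat = mu_law_expand(q / 127.5 - 1.0)
print(np.max(np.abs(x - x_hat)))       # small reconstruction error, even for small inputs
```

Compressing before a uniform quantizer makes the effective quantization steps finer for small amplitudes, which is where speech spends most of its time.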
First, we implemented the subband coder by creating analysis and synthesis sections. This processing does not necessarily add distortion to the voice, and the output is equal to the source except for some loss of the higher frequencies. We recorded our voice using a sound recorder at 8 kHz, so the highest frequency component is 4 kHz, which corresponds to a normalized frequency of 1 in Matlab. It is computationally very intensive to take the FFT of the entire signal; a better choice is to take a finite number of samples, say 36000, for the purpose of computing the FFT. We implemented the code by two methods:
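The chunked-FFT idea can be sketched as follows (Python with numpy; the 1 kHz test tone is an illustrative stand-in for the recorded speech):

```python
import numpy as np

fs = 8000                              # recording rate, as in the text
N = 36000                              # finite chunk of samples used for the FFT
t = np.arange(N) / fs
x = np.sin(2 * np.pi * 1000 * t)       # 1 kHz test tone standing in for speech

X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(N, 1 / fs)
peak = freqs[np.argmax(np.abs(X))]
print(peak, peak / (fs / 2))           # 1000.0 Hz, i.e. normalized frequency 0.25
```

Dividing a frequency in Hz by the 4 kHz Nyquist limit gives the normalized frequency scale (0 to 1) that Matlab's filter-design functions use.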
a) Without noble identities: In this method we passed the input speech signal through a low-pass and a high-pass filter in the first stage and then decimated the two resulting signals (lower band and upper band). This has the disadvantage of computing samples that are ultimately thrown away; however, the end result was good, with no aliasing in the final signal.
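Method (a) can be sketched with the simplest possible filter pair, the two-tap Haar QMF filters (an illustrative choice, not the filters used in the project). Note that half of each filtered sequence is computed and then discarded by the decimator:

```python
import numpy as np

# Haar QMF pair: the simplest two-band analysis/synthesis filters
h0 = np.array([1.0, 1.0]) / np.sqrt(2)    # low-pass
h1 = np.array([1.0, -1.0]) / np.sqrt(2)   # high-pass

x = np.random.default_rng(0).standard_normal(64)

# Analysis (method a): filter first, THEN decimate by 2
low = np.convolve(x, h0)[1::2]
high = np.convolve(x, h1)[1::2]

def upsample(v):
    u = np.zeros(2 * len(v))
    u[::2] = v
    return u

# Synthesis: upsample, filter with the time-reversed filters, and sum
y = np.convolve(upsample(low), h0[::-1]) + np.convolve(upsample(high), h1[::-1])
print(np.max(np.abs(y[:len(x)] - x)))     # reconstruction error at floating-point precision
```

With this filter-then-decimate ordering the two bands cancel each other's aliasing in the synthesis stage, which matches the alias-free result reported above.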
b) Using noble identities: In this method we decimated the signals first and then passed them through the low-pass and high-pass filters in the first stage. We continued with this approach until we obtained the four bands of the analysis section. However, the end result showed aliasing in the final signal.
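The aliasing is expected if the original filters are reused after decimation: the noble identity states that decimating by 2 and then filtering with H(z) is equivalent to filtering with H(z²) (zeros inserted between the taps) and then decimating, not to filtering with H(z) itself. A small demonstration of the identity (Python with scipy, an illustrative sketch):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
x = rng.standard_normal(128)
h = signal.firwin(16, 0.5)                 # prototype low-pass H(z)

a = signal.lfilter(h, 1.0, x[::2])         # decimate by 2, then filter with H(z)

h_up = np.zeros(2 * len(h) - 1)            # H(z^2): zeros between the taps
h_up[::2] = h
b = signal.lfilter(h_up, 1.0, x)[::2]      # filter with H(z^2), then decimate

print(np.max(np.abs(a - b)))               # identical up to floating point
```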