Thursday, December 5, 2019

Speech Enhancement Techniques and Their Comparison

The implementation of the code is done using a Graphical User Interface (GUI) in MATLAB.

Keywords: Speech enhancement, FFT, Spectral subtraction, Kalman filter, Wiener filter, Performance parameters

I. INTRODUCTION

Speech is the fundamental and most common medium of communication, and hence important to us. In general, voice-based communication, human-machine and machine-machine interfaces, and automatic speech recognition systems need to be more reliable in noisy environments. In many cases, these systems work well in nearly noise-free conditions, but their performance deteriorates rapidly in noisy conditions. Therefore, improving existing pre-processing algorithms, or introducing an entirely new class of algorithms for speech enhancement, is a constant objective of the research community.

In speech enhancement, the goal is to improve the quality of degraded speech. Speech enhancement algorithms are noise suppression techniques that use knowledge from the field of hearing science to mitigate the effect of corrupting background noise, and hence improve the perceived speech quality and intelligibility. Enhancement of speech degraded by noise is used in many applications such as mobile phones, VoIP, teleconferencing systems, speech recognition, and hearing aids. Improving the performance of speech communication systems in noisy environments has been a challenging research area for more than three decades. Efforts to achieve higher quality and/or intelligibility of noisy speech may also improve the performance of other speech applications such as speech coding/compression and speech recognition, as discussed in [1][2][3][4][5].

Speech enhancement has three major goals:
1. To improve the quality and intelligibility of speech corrupted by background noise, and to reduce perceptual fatigue.
2. To make speech coders robust to input noise.
3. To make speech recognition systems more robust to noise.
This project presents an overview of different speech enhancement methods and reviews some of the major aspects and approaches in this category.

II. BASIC BLOCK DIAGRAM

The basic block diagram for speech enhancement is shown in Fig. 1.

Fig. 1 Basic Block Diagram

The noisy input signal is sent through the analysis window. Here, a few samples of the signal are selected at a time, because the signal is continuous and too long to be processed in one pass. The Fast Fourier Transform is applied to convert the signal from the time domain to the frequency domain. The magnitudes of the noise and the noisy speech are compared, and the noise is subtracted from the affected speech. The enhanced speech obtained is in the frequency domain and hence has to be converted back to the time domain. This is done by taking the inverse Fourier transform. The overlap-and-add method is applied to the recovered enhanced signal to compensate for the windowing applied at the beginning. In our project, since the signal applied at the input has few samples, windowing is not implemented.

III. DESCRIPTION

A. SPECTRAL SUBTRACTION METHOD

The spectral subtraction method is the most widely used because it is simple to implement and has a low computational load. As studied in [5][6], spectral subtraction restores the power spectrum or the magnitude spectrum of a signal observed in additive noise by subtracting an estimate of the average noise spectrum from the noisy signal spectrum. It reduces the effect of background noise based on the short-time spectral amplitude (STSA) estimation technique. The basic power spectral subtraction technique is popular due to its simple underlying concept and its effectiveness in enhancing speech degraded by additive noise. The basic principle of the spectral subtraction method is to subtract the magnitude spectrum of the noise from that of the noisy speech.
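The basic principle stated above can be illustrated with a minimal sketch for a single frame. It is not the project's MATLAB implementation: the function names `dft`, `idft` and `spectral_subtract` are hypothetical, a naive DFT stands in for the FFT so that the example needs only the Python standard library, and clamping negative magnitudes to zero and reusing the noisy phase are assumptions commonly made for this method.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each time sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def spectral_subtract(noisy_frame, noise_mag):
    """Subtract an estimated noise magnitude spectrum from one frame.

    noise_mag is assumed to hold one noise magnitude estimate per DFT bin,
    for example averaged over frames that contain no speech.
    """
    modified = []
    for y, n_est in zip(dft(noisy_frame), noise_mag):
        # Assumption: negative results are clamped to zero.
        magnitude = max(abs(y) - n_est, 0.0)
        # Assumption: the phase of the noisy frame is reused unchanged.
        modified.append(cmath.rect(magnitude, cmath.phase(y)))
    return idft(modified)


# Toy frame: one cosine occupying DFT bins 2 and 6 of an 8-point frame.
frame = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
noise_mag = [1.0] * 8  # hypothetical flat noise estimate
enhanced = spectral_subtract(frame, noise_mag)
```

In this toy case the cosine has magnitude 4 in its two bins, so subtracting a noise magnitude of 1 scales every sample by 0.75 while the empty bins stay at zero.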
The noise spectrum can be estimated, and updated, during periods when the signal is absent and only noise is present. The noise is assumed to be additive and uncorrelated with the speech signal. An estimate of the noise is measured during silence or non-speech activity in the signal. The phase of the noisy speech is kept unchanged, since it is assumed that phase distortion is not perceived by the human ear.

However, subtraction-type algorithms have a serious drawback: the enhanced speech is accompanied by an unpleasant musical noise artifact, characterized by tones at random frequencies. The simplicity of the subtraction comes at a price, and the subtraction has to be done carefully to avoid speech distortion. If too much is subtracted, some speech information may be removed as well; if too little is subtracted, much of the interfering noise remains. The block diagram given in [7] is shown in Fig. 2.

Fig. 2 Spectral Subtraction

Noisy speech is given as the input to this filter. Windowing is done in order to take a fixed number of samples of the signal, which is continuous in nature. The Fourier transform is applied to convert the signal from the time domain to the frequency domain, which gives the magnitude and the phase as separate values. Only the magnitude is modified in this method; the phase is left unprocessed, since processing it would increase the complexity and the amount of computation. The magnitude of the estimated noise is subtracted from the magnitude of the noisy signal, and the enhanced speech spectrum is obtained at the output of the spectral modification block. The phase of the signal, in its original form, is recombined with the modified magnitude, and the inverse Fourier transform is taken to obtain the signal in its time-domain form. Thus we obtain an enhanced version of the noisy speech signal at the output.

B. WIENER FILTER

Speech processing has been a growing and dynamic field for more than two decades, and there is every indication that this growth will continue and even accelerate. A useful approach to the filter optimization problem is to minimize the mean squared value of the error signal, defined as the difference between some desired response and the actual filter output. Works such as [13][14] describe Wiener filters as a class of optimum linear filters that involve linear estimation of a desired signal sequence from another, related sequence. This technique is widely used in the field of signal processing. The Wiener filter is a common filtering technique and is the optimum linear solution for stationary input signals. By contrast, the Kalman filter has its origin in Kalman's paper (1960), where it is described as a recursive solution to the linear filtering problem for discrete data. That research was set in the wider context of state-space models, where the central point is estimation through recursive least squares.

The goal of the Wiener filter is to filter out noise that has corrupted a signal. It is based on a statistical approach. Typical filters are designed for a desired frequency response, but the design of the Wiener filter takes a different approach. One is assumed to have knowledge of the spectral properties of the original signal and the noise, and one seeks the linear time-invariant filter whose output comes as close to the original signal as possible.

Fig. 3 Wiener Filter

Fig. 3 shows the block diagram of the Wiener filter. In this process, the mean of all the samples is calculated. The deviation of each sample from the mean is found, and the summation is represented as Pd(w). The mean noise power is represented as Py(w). Py(w) is subtracted from Pd(w), and the transfer function is calculated as shown. Thus we get the enhanced speech signal at the output of the filter.

C. MINIMUM MEAN SQUARE ERROR

Given that some a priori knowledge of the radar SNR is available, a minimum mean-squared error (MMSE) estimator can be implemented. This estimator is the discrete implementation of a Wiener filter and minimizes the estimation error due to both noise and clutter. In other words, if the matched filter maximizes signal to noise, and the maximum likelihood (ML) estimator maximizes signal to clutter, the MMSE estimator can be said to maximize signal to interference, where interference is defined as the sum of clutter and noise energy. Accordingly, this estimator provides SAR images superior to both correlation and ML processing for all SNR.

As presented in [9], the STSA estimation problem formulated here is that of estimating the modulus of each complex Fourier expansion coefficient of the speech signal in a given analysis frame from the noisy speech in that frame. This formulation is motivated by the fact that the Fourier expansion coefficients of a given signal segment are samples of its Fourier transform, and by the close relation between the Fourier series expansion and the discrete Fourier transform given in [10][11]. The latter relation enables an efficient implementation of the resulting algorithm using the FFT. The basic formula for the power spectral density of the MMSE filter, as given in [8], is:

$$E\{|X_k|^2 \mid Y_k\} = \frac{\xi_k}{1+\xi_k}\,\lambda_d(k) + \left(\frac{\xi_k}{1+\xi_k}\,Y_k\right)^2$$

where $\xi_k$ is the a priori SNR. The basic block diagram of the MMSE filter is shown in Fig. 4.

Fig. 4 MMSE Filter

To derive the MMSE STSA estimator, the a priori probability distributions of the speech and noise Fourier expansion coefficients should be known. Since in practice they are unknown, one can think of measuring each probability distribution or, alternatively, of assuming a reasonable statistical model. In the discussed problem, the speech and possibly also the noise are neither stationary nor ergodic processes.
This fact excludes the convenient possibility of obtaining the above probability distributions by examining the long-time behavior of each process. Hence, the only option is to examine independent sample functions belonging to the ensemble of each process; for the speech process, for example, these sample functions can be obtained from different speakers. However, since the probability distributions involved are time-varying (due to the non-stationary nature of the processes), measuring and characterizing them in this way is complicated, and the entire procedure seems impracticable.

The main disadvantage of the MMSE processor, as explained in [12], is the large additional complexity of determining the linear estimator. Additionally, for large problems, the matrix inverse operation required to implement the MMSE estimator is very problematic. Especially in radar signal processing, computing the inverse of large matrices can considerably slow down processing. An iterative implementation of the MMSE algorithm can be developed in which the data vector is split into smaller segments to reduce processing time.

D. KALMAN FILTER

In [16][17], the Kalman filter (KF) algorithm, an iterative implementation of the MMSE estimator, is proposed, developed, analysed and optimized. It has been shown that the processing time can be decreased by breaking the data vector into an optimal number of segments. Kalman filtering is known as an effective speech enhancement technique, in which the speech signal is usually modelled as an autoregressive (AR) process and represented in the state-space domain. In this context, all the Kalman filter-based approaches proposed in the past operate in two steps: they first estimate the noise and driving variances and the parameters of the signal model, and then estimate the speech signal.

The Kalman filter uses a system's dynamic model (i.e., physical laws of motion), known control inputs to that system, and measurements (such as from sensors) to form an estimate of the system's varying quantities that is better than the estimate obtained from any one measurement alone. As such, it is a common sensor fusion algorithm. The Kalman filter combines a prediction of a system's state with a new measurement using a weighted average. The purpose of the weights is that values with better (smaller) estimated uncertainty are trusted more. The weights are calculated from the covariance, a measure of the estimated uncertainty of the prediction of the system's state. The result of the weighted average is a new state estimate that lies between the predicted and the measured state and has a better estimated uncertainty than either alone. This process is repeated at every time step, with the new estimate and its covariance informing the prediction used in the following iteration. This means that the Kalman filter works recursively and requires only the last best guess, not the entire history of a system's state, to calculate a new state.

When performing the actual calculations for the filter, the state estimate and covariance are coded into matrices to handle the multiple dimensions involved in a single set of calculations. This allows linear relationships between different state variables, such as position, velocity and acceleration, to be represented in any of the transition models or covariances. The use of the Kalman filter for speech enhancement was first introduced by Paliwal (1987). This method, however, is best suited to the reduction of white noise, in line with the Kalman assumptions. In deriving the Kalman equations it is normally assumed that the process noise is uncorrelated and has a normal distribution, which leads to the whiteness of this noise. It is also assumed that the speech signal is stationary during each frame, that is, the AR model of the speech remains the same across the segment.
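The predict-and-update cycle described above can be sketched for the simplest possible case. This is only an illustration, not the speech-specific filter used in the project: it assumes a scalar state, and the function name `kalman_1d`, the transition coefficient `a`, the process noise variance `q` and the measurement noise variance `r` are hypothetical values chosen for the example.

```python
def kalman_1d(measurements, a=1.0, q=1e-4, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter for the assumed model
        x[k] = a * x[k-1] + w,   w ~ N(0, q)
        z[k] = x[k] + v,         v ~ N(0, r)
    Returns a list of (state estimate, estimate variance) pairs.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: propagate the last estimate and its uncertainty.
        x_pred = a * x
        p_pred = a * p * a + q
        # Update: the gain is the weight given to the new measurement;
        # it is large when the prediction is uncertain relative to r.
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (z - x_pred)
        p = (1.0 - gain) * p_pred
        estimates.append((x, p))
    return estimates


# Hypothetical usage: a constant level of 1.0 observed 50 times.
track = kalman_1d([1.0] * 50)
final_estimate, final_variance = track[-1]
```

Each new estimate lies between the prediction and the measurement, and only the previous estimate and its variance are carried from one step to the next, which is the recursive property noted above. For speech, the scalar state would be replaced by a state vector built from the AR model, which this sketch does not attempt.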
The Kalman filter is an adaptive least-square-error filter that provides an efficient recursive computational solution for estimating a signal in the presence of Gaussian noise. Kalman filter theory is based on a state-space approach in which a state equation models the dynamics of the signal generation process and an observation equation models the noisy and distorted observation signal.

The advantages of the Kalman filtering technique [18] are as follows. It avoids the influence of possible structural changes on the result. The recursive estimation starts from an initial sample and updates the estimates by adding a new observation at a time until the end of the data. This implies that the most recent coefficient estimate is affected by the distant history; in the presence of structural changes the data series can be cut. This cut can be corrected through sequential estimation, but at the cost of a larger standard error. The Kalman filter, like other recursive methods, uses the whole history of the series, but with one advantage: it tries to estimate a stochastic path of the coefficients instead of a deterministic one. In this way it solves the possible estimation cut when structural changes happen. The filter is consistent with the Gauss-Markov theorem, and this gives the Kalman filter its enormous power to solve a wide range of problems in statistical inference. The filter is distinguished by its ability to predict the state of a model in the past, present and future, even when the exact nature of the modelled system is unknown. The dynamic modelling of a system is one of the key features that distinguish the Kalman method.

The disadvantages of the Kalman filtering technique are that the initial conditions of the mean and variance of the state vector must be known to start the recursive algorithm, and that there is no general agreement on how to determine these initial conditions.

Fig. 5

As shown in Fig. 5, the input speech signal is taken and the noise distortion in the signal is found.
The current output is based on the past output and the current input, which is solved using the Yule-Walker equations. All the parameters are represented in the form of state-space matrices because this makes the calculations easier. Next, the filter gain is calculated, and the noise is then removed from the noisy speech input to get the enhanced speech signal.

IV. MEASURES OF PERFORMANCE PARAMETERS

A. SIGNAL-TO-NOISE RATIO (SNR)

Signal-to-noise ratio (often abbreviated as SNR or S/N) is a measure used in science and engineering that compares the level of a desired signal to the level of background noise. The term is sometimes used informally to refer to the ratio of useful information to false or irrelevant data in a conversation or exchange. Signal-to-noise ratio is defined as the power ratio between a signal (meaningful information) and the background noise (unwanted signal). It is measured in dB:

$$\mathrm{SNR} = 10 \log_{10} \frac{\mathrm{mean}(\mathrm{Input}^2)}{\mathrm{mean}(\mathrm{Input}^2 - \mathrm{Enhanced}^2)}$$

B. MEAN SQUARE ERROR (MSE)

In statistics, the Mean Squared Error (MSE) of an estimator is one of many ways to quantify the difference between the values implied by an estimator and the true values of the quantity being estimated. MSE is a risk function, corresponding to the expected value of the squared error loss or quadratic loss. MSE measures the average of the squares of the errors. The error is the amount by which the value implied by the estimator differs from the quantity to be estimated. The difference occurs because of randomness or because the estimator does not account for information that could produce a more accurate estimate.

$$\mathrm{MSE} = \frac{1}{\mathrm{length}(\mathrm{Input})} \sum (\mathrm{Enhanced} - \mathrm{Input})^2$$

C. NORMALIZED ROOT MEAN SQUARE ERROR (NRMSE)

The Root Mean Square Error (RMSE), also known as the Root-Mean-Square Deviation (RMSD), is a frequently used measure of the differences between the values predicted by a model or an estimator and the values actually observed.
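As a brief aside, the SNR and MSE measures defined above can be sketched in a few lines of Python. The function names `mse` and `snr_db` are hypothetical, and `reference` plays the role of Input in the formulas. One assumption is worth flagging: the SNR here uses the common error-power form, mean((Input - Enhanced)^2), in the denominator, which differs from the denominator as printed above.

```python
import math


def mse(reference, enhanced):
    """Mean of the squared sample-wise errors, as in the MSE formula."""
    return sum((e - r) ** 2 for r, e in zip(reference, enhanced)) / len(reference)


def snr_db(reference, enhanced):
    """SNR in dB: reference power over error power.

    Assumption: the error power is mean((reference - enhanced)^2).
    The result is undefined when the two signals are identical,
    because the error power is then zero.
    """
    signal_power = sum(r * r for r in reference) / len(reference)
    return 10.0 * math.log10(signal_power / mse(reference, enhanced))


# Hypothetical signals: the enhanced version is the reference scaled by 0.9.
reference = [1.0, -1.0, 1.0, -1.0]
enhanced = [0.9, -0.9, 0.9, -0.9]
error = mse(reference, enhanced)
snr = snr_db(reference, enhanced)
```

For this toy pair the error is 0.1 at every sample, so the MSE is about 0.01 and the SNR is about 20 dB.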
These individual differences are called residuals when the calculations are performed over the data sample that was used for estimation, and prediction errors when computed out-of-sample. The RMSE serves to aggregate the magnitudes of the errors in predictions for various times into a single measure of predictive power. RMSE is a good measure of accuracy, but only for comparing the forecasting errors of different models for a particular variable and not between variables, as it is scale-dependent.

$$\mathrm{NRMSE} = \sqrt{\frac{\mathrm{mean}\left[(\mathrm{Input} - \mathrm{Enhanced})^2\right]}{\mathrm{mean}\left\{[\mathrm{Input} - \mathrm{mean}(\mathrm{Input})]^2\right\}}}$$

D. PEAK SIGNAL-TO-NOISE RATIO (PSNR)

Peak Signal-to-Noise Ratio, often abbreviated PSNR, is an engineering term for the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the fidelity of its representation. Because many signals have a very wide dynamic range, PSNR is usually expressed on the logarithmic decibel scale. PSNR is most commonly used to measure the quality of reconstruction of lossy compression codecs (e.g., for image compression). The signal in this case is the original data, and the noise is the error introduced by compression. When comparing compression codecs, PSNR is an approximation to human perception of reconstruction quality. Although a higher PSNR generally indicates that the reconstruction is of higher quality, in some cases the reverse may be true.

$$\mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{length} \cdot \max[\mathrm{Input}^2]}{\mathrm{Input}^2 - \mathrm{Enhanced}^2}$$

E. DISTORTION (AAD)

Distortion (or warping) is the alteration of the original shape (or other characteristic) of something, such as an object, image, sound or waveform. Distortion is usually unwanted, and efforts are often made to lessen it. The addition of noise or other outside signals (hum, interference) is not deemed distortion, though the effects of quantization distortion are sometimes deemed noise. In this project we use the parameter AAD to measure the distortion in the given speech signal.

$$\mathrm{AAD} = \frac{1}{\mathrm{length}(\mathrm{Input})} \sum (\mathrm{Enhanced} - \mathrm{Input})$$

V. APPLICATIONS AND FUTURE SCOPE

* Cell phone speech enhancement
* Pay phones in a noisy environment
* Air-ground communication systems
* Teleconferencing systems
* Hearing aids

VI. EXPERIMENTAL RESULTS

The above-mentioned speech enhancement techniques were applied to the noisy speech input, and the performance parameters were evaluated as below.

Fig. 6 Clean Speech
Fig. 7 Noisy Speech
Fig. 8 Spectral Subtraction Output
Fig. 9 Wiener Filter Output
Fig. 10 MMSE Filter Output
Fig. 11 Kalman Filter Output

VII. CONCLUSION

The technique most suitable for speech enhancement is the one that is robust to both environmental factors and acoustical inputs.

Table 1. Parameters

In this project, we have reviewed the methodologies and principles of various techniques and presented the analysis in a MATLAB GUI. Based on the performance parameters, the following points have been concluded:
(a) The Wiener filter follows a statistical approach and can be tuned to provide optimal performance.
(b) The Kalman filter is able to estimate accurately by using autoregressive (AR) modelling and is suitable for real-time applications.
(c) Spectral subtraction is a real-time filter that is relatively easy to implement for stationary noise.
(d) MMSE provides the best values for most of the parameters under the given conditions and hence is the most suitable technique for speech enhancement.

A graphical comparison of the above-mentioned techniques is given below.

A. SNR

Fig. 12 SNR Comparison

The above graph compares the input SNR for each technique with its respective output SNR. The signal-to-noise ratio for MMSE is higher than that of all the other filters for any value of input SNR, whereas that of spectral subtraction is the lowest for all input SNRs.

B. PSNR

Fig. 13 PSNR Comparison

The graph shows the value of the peak signal-to-noise ratio for all the speech enhancement techniques. The PSNR is greatest for the Wiener filter for all input SNRs.

C. MSE

Fig. 14 MSE Comparison

The graph compares all four speech enhancement methods in terms of mean square error. At an input SNR of 2 dB, the error reduction is greatest for MMSE. The Kalman filter works most efficiently at an input SNR of 5 dB. Noise suppression is least in the Wiener filter for the given conditions.

VIII. REFERENCES

[1] Speech Signal Processing by School of Electronic Information, Wuhan University.
[2] Recent Advancements in Speech Enhancement by Yariv Ephraim and Israel Cohen, March 9, 2004.
[3] Speech Enhancement using Adaptive Filters by T. Lalith Kumar and Soudara Rajan.
[4] http://en.wikipedia.org/wiki
[5] Overview of Speech Enhancement Techniques for Automatic Speaker Recognition by Javier Ortega-García and Joaquín González-Rodríguez.
[6] Advanced Digital Signal Processing and Noise Reduction, Second Edition by Saeed V. Vaseghi.
[7] Transform Based Speech Enhancement Techniques by Soon Ing Yann.
[8] Speech Enhancement using a Laplacian based MMSE estimator of the magnitude spectrum by Dr. Bin Chen.
[9] Linear Prediction Algorithms by Mohit Garg, IIT-B.
[10] Speech Enhancement Using a Minimum Mean Square Error Short Time Spectral Amplitude Estimator by Yariv Ephraim and David Malah.
[11] A Laplacian based MMSE estimator for Speech Enhancement by Bin Chen, Philipos C. Loizou.
[12] Minimum Mean Square Error Filtering: Autocorrelation/Covariance, General Delays and Multirate Systems by Peter Kabal.
[13] Improve Speech Enhancement using Weiner Filtering by S. China Venkateswarlu, Dr. K. Satya Prasad, Dr. A. SubbaRami Reddy.
[14] Performance Analysis of Multichannel Wiener Filter-Based Noise Reduction in Hearing Aids under Second Order Statistics Estimation Errors by Bram Cornelis, Marc Moonen and Jan Wouters.
[15] Wiener Filtering by Alan V. Oppenheim and George C. Verghese, 2010.
[16] Dual channel Speech Enhancement using Hadamard LMS algorithm with DCT preprocessing technique by D. Deepa.
[17] An improved SNR estimator for Speech Enhancement by Yao Ren and Michael T. Johnson.
[18] http://www.mathworks.com/support
