Nuit Blanche: Sparse Representation-Based/Exemplar-Based methods for Noise Robust Automatic Speech recognition (ASR)

Thursday, June 21, 2012

Sparse Representation-Based/Exemplar-Based methods for Noise Robust Automatic Speech recognition (ASR) - implementation -

Jort Gemmeke just sent me the following:

Hi Igor,
I finally found (made) the time to construct a Matlab demo (Matlab only, no external dependencies beyond what's in the zip file) of the sparse representation-based/exemplar-based methods for noise robust Automatic Speech Recognition (ASR) I have been advocating over the past few years. It may be interesting for the readers of Nuit Blanche - would you be so kind as to mention it on your blog? The demo can be obtained from www.amadana.nl/software - the first paragraph may also make a good description for the blog - let me know. Thanks!!

Sure thing Jort. Here is the beginning of that page:

I have created a stand-alone Matlab demo of the noise-robust Automatic Speech Recognition (ASR) techniques I worked on over the past few years. All these techniques rely on finding a sparse, linear combination of noise-free speech exemplars, which is then used either to make an estimate of the clean speech or to perform exemplar-based ASR. For an overview of the methods and the relevant background, my thesis [1] is a good starting point. The demo works on a noise-robust digit recognition task, AURORA-2 [2].
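To make the idea of a "sparse, linear combination of exemplars" concrete, here is a minimal sketch in plain Python. It is not taken from the Matlab demo: the function names, the `sparsity` parameter and the choice of solver (multiplicative updates for the generalised Kullback-Leibler divergence with an L1 penalty, a common choice for non-negative decompositions) are assumptions made for illustration only.

```python
def sparse_activations(y, exemplars, sparsity=0.1, iters=200, eps=1e-12):
    """Return non-negative weights x with y ~ sum_k x[k] * exemplars[k].

    y         : one non-negative feature vector (e.g. Mel magnitudes)
    exemplars : list of non-negative vectors, same length as y
    sparsity  : L1 penalty weight; larger values push weights towards zero
    """
    dims = range(len(y))
    x = [1.0] * len(exemplars)            # start from uniform weights
    sums = [sum(e) for e in exemplars]    # per-exemplar normaliser
    for _ in range(iters):
        # current approximation of y (eps avoids division by zero)
        approx = [sum(w * e[d] for w, e in zip(x, exemplars)) + eps
                  for d in dims]
        ratio = [y[d] / approx[d] for d in dims]
        # multiplicative update: weights stay non-negative by construction
        x = [w * sum(e[d] * ratio[d] for d in dims) / (s + sparsity + eps)
             for w, e, s in zip(x, exemplars, sums)]
    return x


def reconstruct(x, exemplars):
    """Weighted sum of exemplars, i.e. the estimate implied by the weights."""
    return [sum(w * e[d] for w, e in zip(x, exemplars))
            for d in range(len(exemplars[0]))]


# Toy example: the observation is built from the first exemplar only.
toy_exemplars = [[1.0, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 0.5]]
toy_weights = sparse_activations([2.0, 1.0, 0.0, 0.0], toy_exemplars)
toy_estimate = reconstruct(toy_weights, toy_exemplars)
```

If the dictionary held clean speech exemplars only, `reconstruct` would give a rough clean-speech estimate, while the weights themselves could serve as the input to exemplar-based recognition.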

Download

You can grab the full archive here (81 MB, includes exemplar dictionary and example noisy speech files), or an archive with just the Matlab code here. To get started quickly, simply execute the top-level Matlab script sparse_ASR.m.

The techniques implemented in the demo

Sparse Imputation (SI) [3,4]. Uses a missing data mask to find a linear combination of clean speech exemplars, using only the reliable (noise-free) features of the noisy speech.
Feature Enhancement (FE) [5]. Takes a source separation approach by decomposing noisy speech into a linear combination of speech and noise exemplars.
Sparse Classification (SC) [5]. As FE, but associates each speech exemplar with HMM-state labels, and uses the exemplar activations directly as evidence for the underlying states.
Hybrid SC/FE (SCFE) [6]. Combination of SC and FE at the state posterior level using the product rule, with a conventional GMM generating the FE posteriors.
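The last two items can be illustrated with a small sketch in plain Python. It is a simplification and not the demo's code: it assumes each speech exemplar carries a single HMM-state label (the function names and the toy numbers are hypothetical), sums the activations per state to obtain SC-style state scores, and then combines them with a second set of posteriors using the product rule.

```python
def normalise(scores, eps=1e-12):
    """Scale non-negative scores so that they sum to (almost exactly) one."""
    total = sum(scores) + eps
    return [s / total for s in scores]


def sc_posteriors(activations, exemplar_states, n_states):
    """Sparse-classification style scores from exemplar activations.

    activations     : non-negative weight of each speech exemplar
    exemplar_states : HMM-state index attached to each exemplar
                      (assumption: one label per exemplar)
    """
    scores = [0.0] * n_states
    for weight, state in zip(activations, exemplar_states):
        scores[state] += weight          # activation counts as evidence
    return normalise(scores)


def product_rule(p_a, p_b):
    """Combine two posterior vectors by element-wise product, renormalised."""
    return normalise([a * b for a, b in zip(p_a, p_b)])


# Toy example with three states and four speech exemplars.
sc = sc_posteriors([2.0, 0.0, 1.0, 1.0], [0, 1, 1, 2], n_states=3)
gmm = [0.2, 0.6, 0.2]    # stand-in for GMM posteriors on enhanced features
hybrid = product_rule(sc, gmm)
```

The product rule rewards states that both sources agree on: a state needs support from the exemplar activations and from the GMM to keep a high combined score.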

What is included in the demo

A few AURORA-2 example files (only the extracted Mel features, not the original data)
A speech and noise exemplar dictionary, created using the multi-condition AURORA-2 training set.
A simple Matlab implementation of a conventional GMM-HMM speech recognizer for use with SI/FE/SCFE. It is a straightforward implementation of a word-model based, 16-state-per-word HMM and a GMM with 32 mixtures per state operating on per-file mean/variance normalized MFCC features. Two acoustic models are included: one trained on the clean speech training set of AURORA-2 and one trained on the multi-condition training set of AURORA-2.
Visualisations of clean, noisy, noise and enhanced spectrograms, with optional mean (or mean and variance) normalization to better gauge the effect on a speech recognizer employing these normalizations. Additionally, visualisations of the state posteriorgrams obtained with SC/GMM.
FE, SC and SCFE are GPU-accelerated using GPUmat as described in [7], provided that (1) a suitable GPU is available, (2) GPUmat is installed, and (3) the corresponding flag nmf.usegpu in the demo code is set to one.
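The per-file mean/variance normalization mentioned in the list above, for both the recognizer features and the spectrogram plots, is simple enough to sketch. The plain Python below is an illustration rather than the demo's implementation; the function name is hypothetical, and it assumes the population variance (division by the number of frames) is used.

```python
import math


def mean_variance_normalise(frames, eps=1e-12):
    """Normalise the feature vectors of ONE file.

    frames : list of equally long feature vectors (one per time frame)
    Returns a new list in which every coefficient has zero mean and
    (up to eps) unit variance over the frames of this file.
    """
    n = len(frames)
    dims = range(len(frames[0]))
    means = [sum(f[d] for f in frames) / n for d in dims]
    # population standard deviation per coefficient; eps guards against
    # division by zero for constant coefficients
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n) + eps
            for d in dims]
    return [[(f[d] - means[d]) / stds[d] for d in dims] for f in frames]


# Toy file with two frames and two coefficients per frame.
normalised = mean_variance_normalise([[1.0, 10.0], [3.0, 30.0]])
```

Because the statistics are computed per file, a constant offset or gain affecting a whole recording is removed before the features reach the recognizer.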