Nuit Blanche: Deep Transform: Time-Domain Audio Error Correction via Probabilistic Re-Synthesis, Cocktail Party Source Separation via Probabilistic Re-Synthesis

Friday, March 27, 2015

From the same author, two preprints that use deep neural networks for tasks traditionally handled in other fields: error correction, usually the province of communication/information theory, and blind source separation (BSS), usually tackled with advanced matrix factorization. Welcome to the great convergence:

Deep Transform: Time-Domain Audio Error Correction via Probabilistic Re-Synthesis by Andrew J.R. Simpson

In the process of recording, storage and transmission of time-domain audio signals, errors may be introduced that are difficult to correct in an unsupervised way. Here, we train a convolutional deep neural network to re-synthesize input time-domain speech signals at its output layer. We then use this abstract transformation, which we call a deep transform (DT), to perform probabilistic re-synthesis on further speech (of the same speaker) which has been degraded. Using the convolutive DT, we demonstrate the recovery of speech audio that has been subject to extreme degradation. This approach may be useful for correction of errors in communications devices.
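To make the train-on-clean, re-synthesize-degraded idea concrete, here is a minimal sketch. It is not the author's method: the convolutional deep network is swapped for a closed-form linear autoencoder (equivalent to PCA), a two-tone synthetic signal stands in for speech, and averaging overlapping frames is only a loose stand-in for probabilistic re-synthesis. All function names and parameters (`frame_len`, `hop`, `n_hidden`, `toy_voice`) are hypothetical choices made for illustration.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (one frame per row)."""
    starts = range(0, len(x) - frame_len + 1, hop)
    return np.stack([x[s:s + frame_len] for s in starts])


def overlap_add_mean(frames, hop, length):
    """Rebuild a signal by averaging overlapping frames sample by sample."""
    frame_len = frames.shape[1]
    total = np.zeros(length)
    count = np.zeros(length)
    for i, frame in enumerate(frames):
        s = i * hop
        total[s:s + frame_len] += frame
        count[s:s + frame_len] += 1
    return total / np.maximum(count, 1)


def fit_linear_autoencoder(clean_frames, n_hidden):
    """Closed-form linear autoencoder: top principal directions of clean frames."""
    mean = clean_frames.mean(axis=0)
    _, _, vt = np.linalg.svd(clean_frames - mean, full_matrices=False)
    return mean, vt[:n_hidden]


def resynthesize(frames, mean, basis):
    """Encode each frame into the learned subspace, then decode it back."""
    codes = (frames - mean) @ basis.T
    return codes @ basis + mean


def toy_voice(t):
    """Assumption: a two-tone signal standing in for one speaker's voice."""
    return 0.6 * np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 440 * t)


rng = np.random.default_rng(0)
sr, frame_len, hop = 8000, 64, 16
t = np.arange(2 * sr) / sr

clean_train = toy_voice(t)          # clean material used for training
clean_test = toy_voice(t + 0.123)   # further material from the same "speaker"
degraded = clean_test + 0.5 * rng.standard_normal(len(t))  # heavy additive noise

# Train on clean frames only, then re-synthesize the degraded signal.
mean, basis = fit_linear_autoencoder(
    frame_signal(clean_train, frame_len, hop), n_hidden=4
)
restored_frames = resynthesize(frame_signal(degraded, frame_len, hop), mean, basis)
restored = overlap_add_mean(restored_frames, hop, len(t))

mse_before = np.mean((degraded - clean_test) ** 2)
mse_after = np.mean((restored - clean_test) ** 2)
print(f"MSE before: {mse_before:.4f}, after: {mse_after:.4f}")
```

The toy works because the clean frames occupy a small subspace that the model learns, so most of the noise is discarded at the bottleneck. Real speech is far less tidy, which is presumably where a deep, nonlinear network earns its keep.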

Deep Transform: Cocktail Party Source Separation via Probabilistic Re-Synthesis by Andrew J.R. Simpson

In cocktail party listening scenarios, the human brain is able to separate competing speech signals. However, the signal processing implemented by the brain to perform cocktail party listening is not well understood. Here, we trained two separate convolutive autoencoder deep neural networks (DNN) to separate monaural and binaural mixtures of two concurrent speech streams. We then used these DNNs as convolutive deep transform (CDT) devices to perform probabilistic re-synthesis. The CDTs operated directly in the time-domain. Our simulations demonstrate that very simple neural networks are capable of exploiting monaural and binaural information available in a cocktail party listening scenario.
