Winkler, Thomas: From Acoustic Mismatch Towards Blind Acoustic Model Selection in Automatic Speech Recognition. - Bonn, 2013. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online-Ausgabe in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5n-32133
@phdthesis{handle:20.500.11811/5685,
urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5n-32133},
author = {Winkler, Thomas},
title = {From Acoustic Mismatch Towards Blind Acoustic Model Selection in Automatic Speech Recognition},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2013,
month = may,

note = {Acoustic distortion and acoustic mismatch are two of the most critical aspects influencing automatic speech recognition. A speech signal recorded from a speaker in one acoustic environment can have very different characteristics than a signal of the same utterance recorded under different acoustic conditions. The acoustic features used for automatic speech recognition are not ideal and incorporate such acoustic influences in addition to the information relevant for speech recognition. While distortion of the signal caused by difficult acoustic conditions already reduces the recognition accuracy, additional acoustic mismatch, which arises when a system trained under one particular acoustic condition is used under different acoustic conditions, decreases the performance further.
In this work we offer a detailed analysis of the influences of various sources of acoustic distortion and acoustic mismatch, ranging from additive noise and microphone characteristics to coding and transmission channel effects. We evaluate their influence on the speech signal, the extracted features, and the speech recognition performance in matched and mismatched conditions. For this purpose we introduce several speech and noise corpora suitable for evaluating these aspects, two of which were designed and recorded specifically for the presented evaluations. In particular, the MoveOn Corpus provides realistic noisy speech that is useful for research on robust automatic speech recognition beyond the scope of this thesis; its design decisions and corpus development are therefore described in detail.
Based on the presented speech corpora we analyse various acoustic conditions and show that even small changes in the speech signal can have a significant influence on the extracted speech features and the recognition performance. The resulting changes in the features are manifold and depend on the particular distortion and on other parameters, as we discuss in detail. Such changes are therefore usually difficult to simulate or to compensate with a universal approach.
Since the features and acoustic models commonly used for automatic speech recognition inevitably inherit part of the information about the acoustic conditions, we propose and evaluate a new multi-model approach that selects the best matching set out of several sets of well-adapted acoustic models based solely on the extracted features and the acoustic models themselves. We call this approach blind acoustic model selection because it operates completely blindly, incorporating neither additional knowledge nor any particular assumption about the type of acoustic distortion. For improved processing speed we further suggest using a compact representation of each set of acoustic models instead of the full set. The results indicate that the theoretically achievable performance clearly exceeds that of commonly used multi-conditional acoustic models. With an appropriate selection of the sets of acoustic models, the proposed approach also achieves comparable or even improved results in practice compared to multi-conditional acoustic models.},

url = {https://hdl.handle.net/20.500.11811/5685}
}
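The approach summarized in the note above selects the best matching set of acoustic models using only the extracted features and a compact representation of each model set; the entry itself does not specify the selection mechanism. The following is a minimal, hypothetical Python sketch, not the thesis implementation: it assumes each condition-specific model set can be summarized by a Gaussian mixture fitted on that condition's training features, and it selects the set whose compact model assigns the highest average log-likelihood to an utterance's features.

# Hedged sketch of blind acoustic model selection (assumption: a GMM per
# acoustic condition serves as the compact representation of the full model set).
import numpy as np
from sklearn.mixture import GaussianMixture

def train_compact_models(training_features_per_condition, n_components=8, seed=0):
    """Fit one GMM per acoustic condition as a compact stand-in for the full model set."""
    compact = {}
    for condition, features in training_features_per_condition.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", random_state=seed)
        gmm.fit(features)                      # features: (frames, feature_dim)
        compact[condition] = gmm
    return compact

def select_model_set(utterance_features, compact_models):
    """Return the condition whose compact model best explains the observed features."""
    scores = {cond: gmm.score(utterance_features)   # mean log-likelihood per frame
              for cond, gmm in compact_models.items()}
    return max(scores, key=scores.get)

# Hypothetical usage with random stand-in data (real input would be e.g. MFCC frames):
rng = np.random.default_rng(0)
train = {"clean": rng.normal(0.0, 1.0, (500, 13)),
         "car_noise": rng.normal(2.0, 1.5, (500, 13))}
models = train_compact_models(train)
test_utterance = rng.normal(2.0, 1.5, (200, 13))
print(select_model_set(test_utterance, models))  # expected to select "car_noise"

The condition names, feature dimensions, and the choice of a GMM are illustrative assumptions; the sketch only shows that selection can be driven by the features and the models alone, without knowledge of the distortion type.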

The following terms of use are associated with this resource:

InCopyright