Application Test

Introduction

The E-set utterances (B, C, D, E, G, P, T, V and Z) form a set of highly confusable sounds. Accurate recognition of these sounds is the most significant step toward improving recognition performance on the connected 'alphadigit' task, which includes sequences of spoken digits and letters (Bitar, 1995, 22).

We extended our earlier event-based recognition system (EBS) and applied it to this difficult recognition task. In particular, we focused on the ability of EBS to distinguish among the stop consonants in the utterances B, D, P, and T.

EBS

In EBS, the speech signal is first segmented into five broad classes: vowel, sonorant consonant, strong fricative, weak fricative and stop. This segmentation is based on acoustic events (or landmarks) obtained in the extraction of acoustic parameters (APs) associated with the manner phonetic features sonorant, syllabic, continuant and strident, in addition to silence. (Results on the performance of the broad class recognizer are given in [1,13].)
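
As an illustration only, the relation between the manner features and the broad classes can be sketched as a small decision tree. This is a crisp, binary simplification under the usual phonetic-feature conventions, not the EBS implementation: the feature-to-class mapping, the names `MannerFeatures` and `broad_class`, and the class labels are assumptions introduced for this sketch.

```python
from typing import NamedTuple


class MannerFeatures(NamedTuple):
    """Binary manner decisions for one segment (hypothetical container)."""
    silence: bool
    sonorant: bool
    syllabic: bool
    continuant: bool
    strident: bool


def broad_class(f: MannerFeatures) -> str:
    """Map manner features to a broad class label.

    Assumed mapping (not taken from the source):
      +sonorant +syllabic              -> vowel
      +sonorant -syllabic              -> sonorant consonant
      -sonorant +continuant +strident  -> strong fricative
      -sonorant +continuant -strident  -> weak fricative
      -sonorant -continuant            -> stop
    """
    if f.silence:
        return "SILENCE"
    if f.sonorant:
        return "VOWEL" if f.syllabic else "SONORANT_CONSONANT"
    if f.continuant:
        return "STRONG_FRICATIVE" if f.strident else "WEAK_FRICATIVE"
    return "STOP"
```

A real system would replace each boolean with a graded decision derived from the APs; the tree only shows how the four manner features plus silence are sufficient to separate the five classes.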

The manner acoustic events are then used to extract APs relevant for the voicing and place phonetic features. In particular, APs for the place features labial and alveolar are extracted for stops, and APs for the place feature anterior are extracted for strident fricatives. This phonetic feature hierarchy is shown in Figure 1 for the special case of the E-set. Note that this strategy is very different from the hidden Markov model (HMM) framework, where all parameters are computed in every frame. Instead, EBS uses the manner landmarks to determine (1) which APs are computed for place and voicing and (2) the time region used to extract this information.
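
The contrast with frame-by-frame processing can be made concrete with a short sketch. It assumes the segmenter yields `(label, start, end)` tuples and that the strong-fricative class corresponds to the strident fricatives; the function name `plan_extraction`, the label strings, and the use of the segment boundaries as the analysis region are all hypothetical simplifications, since the actual regions are set relative to the landmarks.

```python
# Place features examined per broad class, as described in the text.
PLACE_FEATURES = {
    "STOP": ("labial", "alveolar"),
    "STRONG_FRICATIVE": ("anterior",),  # assumed to be the strident fricatives
}
# Classes for which voicing is examined (stops and strident fricatives).
VOICED_CLASSES = {"STOP", "STRONG_FRICATIVE"}


def plan_extraction(segments):
    """Return which features to analyse, and where, for each segment.

    segments: iterable of (label, start_s, end_s) from the manner stage.
    Segments with nothing to analyse (e.g. vowels) produce no entry,
    so no voicing or place AP is computed there.
    """
    plan = []
    for label, start_s, end_s in segments:
        features = []
        if label in VOICED_CLASSES:
            features.append("voiced")
        features.extend(PLACE_FEATURES.get(label, ()))
        if features:
            plan.append(
                {"label": label, "features": features, "region": (start_s, end_s)}
            )
    return plan
```

For a stop followed by a vowel, only the stop segment receives an entry, listing voiced, labial and alveolar; the vowel triggers no place or voicing analysis at all.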

Database

The E-set utterances (B, C, D, E, G, P, T, V, Z) from the TI46 database were used for this project. EBS was developed on the TI46 training set, which consists of these utterances spoken by 16 speakers, 8 males and 8 females. The TI46 test set, which consists of a different set of repetitions of the E-set utterances from the same speakers, was used for testing.

Acoustic Parameters

Table 1 shows the APs designed to extract the acoustic correlates for the phonetic features needed to recognize voicing and place for the strident fricatives and stops. Note that Ahi-A23 and Av-Ahi are similar measures: Ahi-A23 measures the spectral tilt, and Av-Ahi measures the spectral prominence of F1 (first formant) relative to the high frequency peak of the consonant. We have modified these parameters with respect to the average frequency of the third formant (F3) over the utterance to achieve vocal tract normalization.
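
A rough sketch of how two such measures could be computed is given below. Everything numeric in it is an assumption made for illustration: the band edges, the reference F3 value, the use of a band mean for A23, and the choice to normalize by scaling band edges with the utterance-average F3 are not taken from the source, and the helper names are hypothetical. Spectra are assumed to be lists of `(frequency_hz, amplitude_db)` points.

```python
REFERENCE_F3_HZ = 2500.0  # hypothetical reference used only for scaling


def _band(spectrum, lo_hz, hi_hz):
    """Amplitudes (dB) of the spectral points lying in [lo_hz, hi_hz]."""
    values = [amp for freq, amp in spectrum if lo_hz <= freq <= hi_hz]
    if not values:
        raise ValueError("no spectral points in band")
    return values


def consonant_measures(burst_spectrum, vowel_spectrum, avg_f3_hz):
    """Return Ahi-A23 and Av-Ahi in dB for one consonant-vowel pair.

    Band edges are scaled by avg_f3_hz / REFERENCE_F3_HZ, one possible
    reading of normalizing with respect to the average F3 (assumption).
    """
    scale = avg_f3_hz / REFERENCE_F3_HZ
    # Ahi: strongest high-frequency component of the consonant (assumed band).
    ahi = max(_band(burst_spectrum, 3500.0 * scale, 8000.0 * scale))
    # A23: mean consonant level in the F2-F3 region (assumed band).
    mid = _band(burst_spectrum, 1250.0 * scale, 3000.0 * scale)
    a23 = sum(mid) / len(mid)
    # Av: strongest component of the vowel in the F1 region (assumed band).
    av = max(_band(vowel_spectrum, 150.0 * scale, 1000.0 * scale))
    return {"Ahi-A23": ahi - a23, "Av-Ahi": av - ahi}
```

Both measures are differences of levels in dB, so they are insensitive to overall recording gain; the F3-based scaling is meant only to show where a speaker-dependent adjustment would enter.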

Acoustic Parameters and EBS

EBS uses a fuzzy logic-based approach to explicitly segment the speech into broad classes. To find the optimal linear weights for combining the APs, we generated an automatic transcription of the E-set utterances in the TI46 training data. To do so, phoneme labels were formed for the E-set utterances and force-aligned by comparison with the broad class labels generated by EBS. For example, if EBS produces the sequence of labels (STOP, VOWEL) for the utterance ...