Length-Normalized Representation Learning for Speech Signals
- Authors
- Kyungguen Byun; Seyun Um; Hong-Goo Kang
- Issue Date
- Jun-2022
- Publisher
- Institute of Electrical and Electronics Engineers
- Keywords
- Self-supervised learning; representation learning; speech and text analysis
- Citation
- IEEE Access, vol. 10, pp. 60,362-60,372
- Journal Title
- IEEE Access
- Volume
- 10
- Start Page
- 60,362
- End Page
- 60,372
- URI
- https://yscholarhub.yonsei.ac.kr/handle/2021.sw.yonsei/6694
- DOI
- 10.1109/ACCESS.2022.3181298
- ISSN
- 2169-3536
- Abstract
- In this study, we proposed a length-normalized representation learning method for speech and
text to address the inherent problem of sequence-to-sequence models when the input and output sequences
have different lengths. To this end, the representations were constrained to a fixed-length shape by
including length normalization and de-normalization processes in the pre- and post-network architecture of
the transformer-based self-supervised learning framework. Consequently, the relationships between sequences
of different lengths could be modeled directly, without attention or recurrent networks between
representation domains. Besides this regularized-length effect, the method
also produced a data augmentation effect that effectively handled input features with different time scales.
The performance of the proposed length-normalized representations on downstream speaker
and phoneme recognition tasks was investigated to verify the effectiveness of this method over conventional
representation methods. In addition, to demonstrate the applicability of the proposed representation method
to sequence-to-sequence modeling, a unified speech recognition and text-to-speech (TTS) system was
developed. The unified system achieved high accuracy in frame-wise phoneme prediction and showed
promising potential for generating high-quality synthesized speech in TTS.
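A minimal sketch of the fixed-length idea follows. It assumes plain linear interpolation along the time axis as the length normalization and de-normalization step. The abstract places these steps in the pre- and post-networks of a transformer-based framework, so the paper's actual mechanism may differ. The function names (`resample_length`, `length_normalize`, `length_denormalize`) are hypothetical. The demo shows why a fixed-length target acts like time-scale augmentation: the same content at two speaking rates maps to nearly the same fixed-length representation.

```python
from typing import List

Frame = List[float]  # one feature vector (D values) per time step


def resample_length(seq: List[Frame], target_len: int) -> List[Frame]:
    """Linearly interpolate a (T, D) frame sequence to (target_len, D).

    Assumption for illustration only: the paper's pre-/post-networks may
    realize length (de-)normalization differently.
    """
    if not seq or target_len < 1:
        raise ValueError("need a non-empty sequence and target_len >= 1")
    src_len, dim = len(seq), len(seq[0])
    out: List[Frame] = []
    for i in range(target_len):
        # Map output index i onto a fractional source position in [0, src_len - 1].
        pos = 0.0 if target_len == 1 else i * (src_len - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, src_len - 1)
        w = pos - lo
        out.append([(1.0 - w) * seq[lo][d] + w * seq[hi][d] for d in range(dim)])
    return out


def length_normalize(seq: List[Frame], fixed_len: int) -> List[Frame]:
    """Hypothetical pre-network step: squeeze any length T to fixed_len frames."""
    return resample_length(seq, fixed_len)


def length_denormalize(rep: List[Frame], out_len: int) -> List[Frame]:
    """Hypothetical post-network step: stretch fixed_len frames back to out_len."""
    return resample_length(rep, out_len)


if __name__ == "__main__":
    # The same ramp at two speaking rates (100 vs. 150 frames) ...
    fast = [[t / 99] for t in range(100)]
    slow = [[t / 149] for t in range(150)]
    # ... maps to (almost) the same 32-frame representation.
    a, b = length_normalize(fast, 32), length_normalize(slow, 32)
    print(max(abs(x[0] - y[0]) for x, y in zip(a, b)))  # close to 0.0
    print(len(length_denormalize(a, 75)))  # 75
```

Fixed linear interpolation is used here only because it needs no training. A learned pre-/post-network could instead decide how content is distributed across the fixed-length frames.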