Length-Normalized Representation Learning for Speech Signals
DC Field | Value | Language |
---|---|---|
dc.contributor.author | KYUNGGUEN BYUN | - |
dc.contributor.author | SEYUN UM | - |
dc.contributor.author | Hong-Goo Kang | - |
dc.date.accessioned | 2023-10-10T01:40:15Z | - |
dc.date.available | 2023-10-10T01:40:15Z | - |
dc.date.issued | 2022-06 | - |
dc.identifier.issn | 2169-3536 | - |
dc.identifier.uri | https://yscholarhub.yonsei.ac.kr/handle/2021.sw.yonsei/6694 | - |
dc.description.abstract | ABSTRACT In this study, we propose a length-normalized representation learning method for speech and text that addresses the inherent problem of sequence-to-sequence models when the input and output sequences have different lengths. To this end, the representations are constrained to a fixed-length shape by including length normalization and de-normalization processes in the pre- and post-network architecture of a transformer-based self-supervised learning framework. Consequently, the relationships between sequences of different lengths can be modeled directly, without an attention or recurrent network between representation domains. This method not only achieves the aforementioned length-regularization effect but also provides a data augmentation effect that effectively handles input features at different time scales. The performance of the proposed length-normalized representations on downstream speaker and phoneme recognition tasks was investigated to verify the effectiveness of this method over conventional representation methods. In addition, to demonstrate the applicability of the proposed representation method to sequence-to-sequence modeling, a unified speech recognition and text-to-speech (TTS) system was developed. The unified system achieved high accuracy on frame-wise phoneme prediction and exhibited promising potential for generating high-quality synthesized speech signals in TTS. | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | Institute of Electrical and Electronics Engineers | - |
dc.title | Length-Normalized Representation Learning for Speech Signals | - |
dc.type | Article | - |
dc.publisher.location | United States | - |
dc.identifier.doi | 10.1109/ACCESS.2022.3181298 | - |
dc.identifier.scopusid | 2-s2.0-85131738874 | - |
dc.identifier.wosid | 000811541900001 | - |
dc.identifier.bibliographicCitation | IEEE Access, v.10, pp 60362 - 60372 | - |
dc.citation.title | IEEE Access | - |
dc.citation.volume | 10 | - |
dc.citation.startPage | 60362 | - |
dc.citation.endPage | 60372 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.subject.keywordAuthor | Self-supervised learning | - |
dc.subject.keywordAuthor | representation learning | - |
dc.subject.keywordAuthor | speech and text analysis | - |
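The abstract describes constraining variable-length speech representations to a fixed-length shape via paired normalization and de-normalization steps. The record does not specify the normalization operator, so the following is only a minimal sketch of the general idea, assuming simple linear interpolation along the time axis; the function names, the target length of 128 frames, and the interpolation choice are all illustrative assumptions, not the paper's actual pre-/post-network.

```python
import numpy as np

def length_normalize(features: np.ndarray, target_len: int = 128) -> np.ndarray:
    """Resample a (T, D) feature sequence to a fixed (target_len, D) shape
    by linear interpolation along the time axis (illustrative assumption)."""
    T, D = features.shape
    src = np.linspace(0.0, 1.0, T)          # original frame positions in [0, 1]
    dst = np.linspace(0.0, 1.0, target_len)  # fixed grid of target positions
    # Interpolate each feature dimension independently onto the fixed grid.
    return np.stack([np.interp(dst, src, features[:, d]) for d in range(D)], axis=1)

def length_denormalize(features: np.ndarray, original_len: int) -> np.ndarray:
    """Inverse step: stretch the fixed-length sequence back to original_len frames."""
    return length_normalize(features, target_len=original_len)

# Two utterances of different lengths map to the same fixed shape,
# so they can be related directly without cross-attention over time.
a = length_normalize(np.random.randn(73, 80))
b = length_normalize(np.random.randn(241, 80))
assert a.shape == b.shape == (128, 80)
```

Because every sequence lands on the same time grid, frame-wise correspondences between, e.g., a speech representation and a text representation can be modeled position by position, which is the property the abstract attributes to the fixed-length constraint.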