
Submitted to ICASSP 2024

INTELLI-Z: TOWARDS INTELLIGIBLE ZERO-SHOT TTS

Sunghee Jung, Won Jang, Jaesam Yoon, Bongwan Kim

Kakao Brain, Seongnam, Republic of Korea


Although numerous recent studies have proposed new frameworks for zero-shot TTS using large-scale, real-world data, studies that focus on the intelligibility of zero-shot TTS are relatively scarce. Zero-shot TTS demands additional effort to ensure clear pronunciation and speech quality because it inherently requires replacing a core parameter (the speaker embedding or acoustic prompt) with a new one at inference time. In this study, we propose a zero-shot TTS model focused on intelligibility, which we refer to as Intelli-Z. Intelli-Z learns speaker embeddings using multi-speaker TTS as its teacher and is trained with a cycle-consistency loss so that mismatched text-speech pairs can be included in training. Additionally, it selectively aggregates speaker embeddings along the temporal dimension to minimize interference from the text content of the reference speech at inference time. We substantiate the effectiveness of the proposed methods with an ablation study. The Mean Opinion Score (MOS) increases by 9% for unseen speakers when the first two methods are applied, and improves by a further 16% when selective temporal aggregation is applied.
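
The selective temporal aggregation described above can be pictured as attention pooling over the frames of the reference utterance: instead of averaging frame-level embeddings uniformly, the model learns per-frame weights so that frames dominated by phonetic content can be suppressed. The following minimal PyTorch sketch illustrates that idea only; the module name SelectiveTemporalAggregation, the single linear scoring layer, and the softmax weighting are our assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class SelectiveTemporalAggregation(nn.Module):
    """Hypothetical sketch: attention-pool frame-level reference embeddings
    into a single speaker vector, letting the model down-weight frames
    whose features carry text content rather than speaker identity."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # per-frame relevance score (assumed form)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, dim) embeddings extracted from the reference speech
        weights = torch.softmax(self.score(frames), dim=1)  # (batch, time, 1)
        return (weights * frames).sum(dim=1)                # (batch, dim)

# Usage: pool a 120-frame, 256-dim reference into one speaker embedding.
pool = SelectiveTemporalAggregation(dim=256)
speaker_emb = pool(torch.randn(4, 120, 256))  # -> shape (4, 256)

Compared with uniform mean pooling, this kind of weighted aggregation matches the intuition in the abstract: the reference's text content contributes less to the pooled speaker embedding at inference time.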


Audio samples (players omitted): two panels, Seen speakers and Unseen speakers, each comparing Reference, Meta-StyleSpeech, Intelli-Z w/o ap., and Intelli-Z.