
TTS

Predicting emotion from text for TTS

Emotion label specified during synthesis

 

[Audio samples: utterances No. 1 to 4, each synthesized as Neutral, Happy, Sad, and Angry]
Emotion predicted by a language model (no human-provided emotion label at synthesis time)

 

[Audio samples: utterances No. 1 to 5, in columns Neutral, Happy, Sad, and Angry]