Prosody Modification of Speech and Singing For Tutoring Applications

In this work, we discuss prosodic transformations of syllable durations and pitch in the context of speech and music tutoring applications. We address specific issues that arise when TD-PSOLA based time- and pitch-scaling is used to transform singing and speech to a pre-defined target prosody. Time alignment is performed by matching automatically detected syllable onsets of the source and target. This is followed by time-scaling and pitch-shifting using TD-PSOLA, with attention to the choice of pitch marks and analysis-synthesis windows. Experiments demonstrate that TD-PSOLA can provide artifact-free perceived quality without explicit pitch mark detection by using longer analysis-synthesis windows.
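
As an illustration of the onset-based alignment step, the sketch below derives one time-scale factor per syllable segment from matched source and target onsets. It is a minimal sketch, not the implementation used in this work. It assumes that onsets are given in seconds, that the two lists are already matched one-to-one, and that the factor is defined as target duration divided by source duration. The function name `segment_scale_factors` is hypothetical.

```python
from typing import List


def segment_scale_factors(source_onsets: List[float],
                          target_onsets: List[float]) -> List[float]:
    """Return one time-scale factor per segment between consecutive onsets.

    Assumption: onsets are in seconds, strictly increasing, and already
    matched one-to-one between source and target. A factor above 1 means
    the source segment must be stretched; below 1, compressed.
    """
    if len(source_onsets) != len(target_onsets):
        raise ValueError("source and target must have the same number of onsets")
    if len(source_onsets) < 2:
        raise ValueError("at least two onsets are needed to form a segment")

    factors = []
    for i in range(len(source_onsets) - 1):
        src_dur = source_onsets[i + 1] - source_onsets[i]
        tgt_dur = target_onsets[i + 1] - target_onsets[i]
        if src_dur <= 0 or tgt_dur <= 0:
            raise ValueError("onsets must be strictly increasing")
        factors.append(tgt_dur / src_dur)
    return factors


if __name__ == "__main__":
    # Hypothetical onsets: the first source syllable lasts 0.5 s but the
    # target lasts 1.0 s, so that segment is stretched by a factor of 2.
    print(segment_scale_factors([0.0, 0.5, 1.0], [0.0, 1.0, 1.5]))
```

Each factor would then drive the time-scaling of the corresponding segment. The pitch-shifting ratio could be derived analogously from source and target pitch values.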