Thursday, March 1, 2012

Vocaloid Technology: Singer Library

Each Vocaloid licensee develops its own Singer Library, a database of vocal fragments sampled from a real person's voice. The database must contain every possible combination of phonemes in the target language. That includes diphones (a sequence of two different phonemes) and sustained vowels, and, where necessary, polyphones of more than two phonemes. For example, the word "sing" ([sIN]) can be synthesized by concatenating the diphone sequence "#-s, s-I, I-N, N-#" (where # indicates a voiceless phoneme) with the sustained vowel [I]. The Vocaloid system then shifts the pitch of these fragments to fit the melody. For a more natural sound, the library must store samples at three or four different pitch ranges.

Japanese requires 500 diphones per pitch, whereas English requires 2,500. Japanese needs fewer diphones because it has fewer phonemes and most of its syllables are open, ending in a vowel. In Japanese, diphones containing a consonant follow basically three patterns: voiceless-consonant, vowel-consonant, and consonant-vowel. English, on the other hand, has many closed syllables ending in a consonant, so it also needs consonant-consonant and consonant-voiceless diphones. As a result, an English library requires more diphone recordings than a Japanese one. Because of this linguistic difference, a Japanese library is not suitable for singing in English.
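The following Python code sketches the decomposition and the library-size figures described above. It is a minimal illustration under stated assumptions, not how Vocaloid itself is implemented. Every function name is hypothetical. So are the following choices: placing the sustained vowel between the diphone that enters it and the diphone that leaves it, identifying the stored pitch ranges by MIDI note number, and using an equal-tempered ratio for the pitch shift.

```python
"""Hypothetical sketch of diphone-based unit selection for a singer library."""

SILENCE = "#"  # boundary symbol, as in the "#-s, s-I, I-N, N-#" example


def _pairs(phonemes: list[str]) -> list[tuple[str, str]]:
    """Pad the phoneme sequence with silence and return each adjacent pair."""
    padded = [SILENCE, *phonemes, SILENCE]
    return list(zip(padded, padded[1:]))


def to_diphones(phonemes: list[str]) -> list[str]:
    """Label every adjacent phoneme pair as a diphone, e.g. 's-I'."""
    return [f"{a}-{b}" for a, b in _pairs(phonemes)]


def unit_sequence(phonemes: list[str], vowels: set[str]) -> list[str]:
    """Diphones plus a sustained-vowel unit after each diphone ending in a vowel.

    Assumption (not from the post): the sustained vowel sits between the
    diphone entering the vowel and the one leaving it, so it can be
    stretched to fill the note length.
    """
    units = []
    for left, right in _pairs(phonemes):
        units.append(f"{left}-{right}")
        if right in vowels:
            units.append(right)  # sustained vowel unit
    return units


def nearest_recorded_pitch(target: int, recorded: list[int]) -> int:
    """Pick the stored pitch range (as a MIDI note number) closest to the target note."""
    return min(recorded, key=lambda note: abs(note - target))


def pitch_shift_ratio(target: int, source: int) -> float:
    """Frequency ratio moving a sample from `source` to `target` in 12-tone equal temperament."""
    return 2 ** ((target - source) / 12)


def library_size(diphones_per_pitch: int, pitch_ranges: int) -> int:
    """Total diphone recordings if every diphone is stored once per pitch range."""
    return diphones_per_pitch * pitch_ranges


if __name__ == "__main__":
    print(to_diphones(["s", "I", "N"]))           # ['#-s', 's-I', 'I-N', 'N-#']
    print(unit_sequence(["s", "I", "N"], {"I"}))  # ['#-s', 's-I', 'I', 'I-N', 'N-#']
    recorded = [48, 60, 72]  # hypothetical: three stored pitch ranges (60 is middle C)
    source = nearest_recorded_pitch(62, recorded)
    print(source, round(pitch_shift_ratio(62, source), 4))  # 60 1.1225
    print(library_size(500, 3), library_size(2500, 3))      # 1500 7500
```

The `library_size` helper multiplies the per-pitch figures quoted in the post by the number of stored pitch ranges. With three ranges, that gives 1,500 diphone recordings for Japanese and 7,500 for English, which makes the gap between the two languages concrete.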
