Open Access Senior Honors Thesis
With the advance of modern computer hardware, computer animation has advanced by leaps and bounds. What formerly took weeks of processing can now be generated on the fly. However, the actors in games often stand mute with unmoving faces, or speak only in canned phrases, because the technology for calculating their lip positions from an arbitrary sound segment has lagged behind the technology that allows the movement of those lips to be rendered in real time. Traditional speech recognition techniques require the entire utterance to be present, or at least a wide window around the text to be matched, so that higher-level structure can be used to determine what words are being spoken. This approach, while highly appropriate for recognizing the sounds present in an audio stream and mapping them to speech, is less applicable to the problem of "lip-syncing" in real time. This paper looks at an alternative approach that applies multivariate statistical techniques to lip-sync a cartoon or model with an audio stream in real time, and that requires orders of magnitude less processing power than traditional methods.
Kmett, Edward A., "Real-Time Viseme Extraction" (2005). Senior Honors Theses. 96.