via naturalnews:
Microsoft Research Asia is forging on with a new transhumanist program called VASA that creates “lifelike talking faces of virtual characters with appealing visual affective skills (VAS), given a single static image and a speech audio clip.”
The artificial intelligence (AI) division of Microsoft in Asia has been building the program from single real images of people, real audio and, in many cases, control signals such as the movements of people’s faces as they talk. Using all this data, Microsoft Research Asia is generating moving images of fake people that could someday replace actual newscasters and podcasters – at least those with so little personality and soul that robots could basically do their job.
“Our premiere model, VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio but also capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness,” the research team wrote in a paper about these latest developments.
“The core innovations include a holistic facial dynamics and head movement generation model that works in a face latent space, and the development of such an expressive and disentangled face latent space using videos. Through extensive experiments including evaluation on a set of new metrics, we show that our method significantly outperforms previous methods along various dimensions comprehensively.”
High-quality deepfakes
The methods used by Microsoft Research Asia to develop these somewhat human-like deepfakes produce high-quality video coupled with realistic facial and head dynamics. Such video can be generated online at a resolution of 512×512 pixels, at up to 40 frames per second (FPS) and with negligible starting latency.
In layman’s terms, the technology is so believable that many people would probably fall for it and think these are real people on their screens. Only the most discerning will be able to tell that something is not quite right with what they are seeing.
“It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors,” Microsoft Research Asia proudly claims.
If you are interested in seeing a few examples of these creepy AI moving and speaking images, you can do so at Microsoft.com.
“Our method is capable of not only producing precise lip-audio synchronization, but also generating a large spectrum of expressive facial nuances and natural head motions,” the company says.
“It can handle arbitrary-length [sic] audio and stably output seamless talking face videos.”
The purpose of the research is to unleash an entire society or army of virtual AI avatars, Microsoft says, but don’t worry: it’s all “aiming for positive applications,” the company insists.
“It is not intended to create content that is used to mislead or deceive,” reads a disclaimer on the site. “However, like other related content generation techniques, it could still potentially be misused for impersonating humans.”
“We are opposed to any behavior to create misleading or harmful contents of real persons, and are interested in applying our technique for advancing forgery detection. Currently, the videos generated by this method still contain identifiable artifacts, and the numerical analysis shows that there’s still a gap to achieve the authenticity of real videos.”
The alleged positive use cases for such technology read like a parody, with Microsoft claiming that it can create “educational equity” while “improving accessibility for individuals with communication challenges, offering companionship or therapeutic support to those in need …”