Published: Jan 11, 2023

Face Rigging Technical Exploration

Exploring the use of ML-enhanced facial rigging

Over a two-day exploration I dove into realtime ML-enhanced face rigging for human emotion using Ziva Dynamics. Here is my first test, using ARKit to capture a performance and retarget it onto a Ziva Dynamics base mesh.
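For context on what "targeting a capture onto a mesh" means here: ARKit reports a set of per-frame blendshape coefficients in [0, 1] (e.g. `jawOpen`, `eyeBlink_L`), and a common retarget is a weighted sum of delta shapes added to a neutral mesh. This is a minimal sketch of that idea with toy data, not Ziva's or Apple's actual API:

```python
# Sketch: retargeting ARKit-style blendshape coefficients onto a mesh.
# A simple retarget adds each delta shape to the neutral mesh, scaled
# by its captured coefficient. All mesh data below is toy data.
import numpy as np

def apply_blendshapes(neutral, deltas, weights):
    """neutral: (V, 3) vertices; deltas: dict name -> (V, 3) offsets;
    weights: dict name -> coefficient clamped to [0, 1]."""
    mesh = neutral.copy()
    for name, w in weights.items():
        if name in deltas:
            mesh += np.clip(w, 0.0, 1.0) * deltas[name]
    return mesh

# Toy example: a 2-vertex "mesh" with one jawOpen delta shape.
neutral = np.zeros((2, 3))
deltas = {"jawOpen": np.array([[0.0, -1.0, 0.0], [0.0, 0.0, 0.0]])}
frame = apply_blendshapes(neutral, deltas, {"jawOpen": 0.5})
print(frame)  # vertex 0 has moved halfway along its delta
```

In a real pipeline the same per-frame weights would drive the ML rig's shapes instead of hand-made deltas, but the linear-combination step is the same.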

The realtime playback of the mesh is impressive.

The iPhone ARKit capture could be greatly improved with brighter lighting and a full-frame face. For a first pass it is roughly passable.

The video below shows our rig running in realtime in Cinema4D's viewport. This is a 6GB Alembic cache, a heavy file size for such a short clip, but it plays back nice and smooth.

The ML-based rig comes with a set of controls that are a joy to work with. They allow adding to and tweaking an actor's performance; for example, an animator can change the character's emotion during a performance. ML-based rigging also runs faster and lighter than computational rigging.
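The "tweak a performance" idea above can be sketched as a simple additive layer: an emotion offset applied on top of the captured per-frame weights, then clamped back into range. The function and shape names here are illustrative assumptions, not Ziva's control API:

```python
# Sketch: layering an animator's emotion adjustment over a captured
# performance. Names are hypothetical; the technique is just an
# additive offset on blendshape weights, clamped to [0, 1].

def adjust_emotion(captured, offsets, strength=1.0):
    """captured/offsets: dicts of blendshape name -> weight."""
    out = dict(captured)
    for name, delta in offsets.items():
        out[name] = min(1.0, max(0.0, out.get(name, 0.0) + strength * delta))
    return out

# One captured frame, nudged toward a happier expression.
frame = {"mouthSmile_L": 0.2, "mouthSmile_R": 0.2, "browDown_L": 0.7}
happier = {"mouthSmile_L": 0.5, "mouthSmile_R": 0.5, "browDown_L": -0.6}
print(adjust_emotion(frame, happier))
```

The `strength` parameter lets the animator dial the adjustment in or out over time without touching the underlying capture.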

Exporting Ziva’s mesh to Alembic allows compatibility with other 3D apps such as Blender or Cinema4D. Playback is realtime but comes with a heavy file size (6GB). Only Ziva’s proprietary plugins use a read/write compression algorithm that allows realtime playback of their ML-trained rigs without the heavy cache files. (Live realtime face morphing will be my next exploration.)

The upside of ML based rigging

Ziva Dynamics ML Expressions are a great way to add a layer of flexible, lightweight, realtime realism to ARKit face captures.

The downside of ML based rigging

• $2000+ to train a new face rig.

• About an hour to train (though far faster than building a custom joint-based computational rig).

• Technically complex setup process for a new 3D mesh.

• Proprietary technology

An ML-based rig's reliance on a proprietary ML algorithm limits flexibility for enterprise projects, and generating ML-trained avatars requires long, expensive processing.

Ultimately, ML rigs provide realistic but generic detail, as the ML training does not perfectly reflect the user's unique human traits, tics and expressions.

Next steps and further thoughts

• Learn how best to capture a performance with correct lighting and camera, and understand how this could be achieved at the consumer level.

• Next I would like to set up a Ziva Dynamics rig working in both Unity3D and Unreal, running ARKit for realtime expression.
• I would like to work through the process of preparing a custom mesh with texture maps in Wrap3 and submitting it through Ziva's ML training.

Long term product development idea

To integrate this tech into a social Metaverse, development would focus on a camera app with which a user can easily build a new avatar by:

1. Scanning the user's face.

2. Building a 3D model with texture maps.

3. Quickly ML-training expressions for the new scan.

This 'one-click' streamlined process would lower the barrier for new users to join the Metaverse with a personal photo-real avatar.

A few artistic ideas:

What if you plugged two avatars into a speech-generative AI (such as InWorldAI) and had them hold a dialogue with each other?