An image is worth a thousand words, but the face says them all.
CGSociety explores Image Metrics.
Image Metrics has developed a superior facial capture and animation system that creates facial animation with far greater accuracy, as much as seven times faster than keyframing, and it needs only video files, a 3D model and a facial rig to get started. One of the latest production companies to take advantage of the new technology was Rockstar Games, which used it to produce the faces for scenes in ‘Grand Theft Auto IV’.
The eyes and even the tongue are returned fully animated, captured from the actor without any markers or setup time, at a location of the client’s choice, with essentially no limitation on movement. The performance can be captured at the same time as body motion capture or separately during voice-over recording. Whether a studio works in Maya, Softimage|XSI, Houdini, 3ds Max, LightWave 3D or MotionBuilder, finished facial animations are returned in the same file format in which they were received, ready to be dropped right back into the pipeline. And with a client list including EA, Digital Domain, Rhythm & Hues, Rockstar, Konami, Capcom, Epic, Sony and others, it seems Image Metrics is on to something big.
Nick Perrett, Image Metrics’ VP of Business Development, recalled the first project that convinced him this was what the industry needed. “Grand Theft Auto San Andreas was the first title Image Metrics worked on with Rockstar Games way back in 2004,” he said. “Rockstar has a history of being at the forefront of innovation and took a bold step to use such extensive facial animation on San Andreas. The title was one of the fastest selling games in history and contained an enormous amount of content for titles on the PlayStation 2 console generation. Since then, we've seen a widespread adoption of the Image Metrics facial animation process by Rockstar and many other top game developers who are searching for efficient ways to create more realistic characters for their titles.”
Image Metrics has since provided facial capture for productions such as the latest installment of Grand Theft Auto. Known for creating some of the most cutting-edge gameplay today, Rockstar Games turned to Image Metrics to take facial animation to the next level in the eleventh installment of its blockbuster gaming franchise.
With two ways to bring video into the pipeline, the director can choose whichever capture method best suits his or her needs. For full-body MoCap, a headcam is used for facial capture, giving the actor full mobility and ensuring that, wherever the actor moves their head, the face stays fixed in the image. The second option is a static camera in a VO booth, where the video is shot while the dialogue is being recorded. This video, taken with a basic video camera, together with a properly constructed facial model and rig, provides all the data required for the next step.
At the Image Metrics facility in Santa Monica, CA, the recorded facial movements are tracked in a step called “Performance Analysis”. The contours and shape of the mouth, the eyes, the size of the pupils and the position of the brows are tracked frame by frame. Crinkling of the skin and the position of the tongue are readable in the video, providing additional texture information that adds detail usually lost in traditional motion capture. Once tracking is complete, the results are reviewed and passed on to the animation stage, where the Performance Analysis data is applied to the facial rig.
The facial rig is either provided by the client to suit their needs, with blend shapes, joints and clusters, or designed by Image Metrics using a combination of joints and blend shapes to get the best results, an option that game studios might find especially useful. An Image Metrics animator applies the Performance Analysis data to the 3D model, analyzing the actor’s movements and selecting the extreme poses to build a library of calibrations. The extreme poses are taken from the actual shot, so each library is specific to that shot and built on a shot-by-shot basis.
Once the library is built, the software calculates the in-betweens. Where keyframing in a program like Maya might call for a key every two or three frames, a 100-frame shot here might require only a fraction of that number of calibrations, a difference that can easily amount to days of work in a traditional pipeline. Although this step resembles keyframing, the animation starts out much closer to the final result, so the animator is mostly tweaking subtleties of the 3D performance, and the result is staggeringly realistic. The animator then reviews the shot to determine whether it is ready, tweaking and adding calibrations as needed.
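The article does not say how Image Metrics’ software turns a handful of calibrations into animation for every frame. Purely as an illustration of the idea, the Python sketch below assumes a single tracked measurement per frame (a hypothetical mouth opening in pixels) and a single rig control (a hypothetical jaw-open value from 0 to 1). Each calibration pairs a measurement from an extreme pose with the rig value the animator chose for it, and every other frame is filled in by piecewise-linear interpolation between the nearest calibrations. The names `build_mapping` and `solve_shot` are invented for this example; a production system would map many tracked features to many rig controls at once.

```python
from bisect import bisect_right


def build_mapping(calibrations):
    """Return a function mapping a tracked measurement to a rig control value.

    calibrations: hypothetical (measurement, control_value) pairs, one per
    extreme pose the animator selected from the shot.
    """
    pairs = sorted(calibrations)
    xs = [m for m, _ in pairs]
    ys = [c for _, c in pairs]

    def apply(measurement):
        # Outside the calibrated range, hold the nearest extreme pose.
        if measurement <= xs[0]:
            return ys[0]
        if measurement >= xs[-1]:
            return ys[-1]
        # Find the two calibrations that bracket this measurement.
        i = bisect_right(xs, measurement)
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        # Linearly interpolate the rig value between them.
        t = (measurement - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

    return apply


def solve_shot(tracked, calibrations):
    """Compute a rig control value for every frame of a tracked shot."""
    mapping = build_mapping(calibrations)
    return [mapping(m) for m in tracked]


# Hypothetical per-frame mouth opening (pixels) from performance analysis.
tracked = [2.0, 6.0, 10.0, 14.0, 18.0, 22.0]
calibrations = [(2.0, 0.0), (18.0, 1.0)]  # closed and wide-open extremes
print(solve_shot(tracked, calibrations))  # [0.0, 0.25, 0.5, 0.75, 1.0, 1.0]
```

Adding a calibration, say `(10.0, 0.7)` for a half-open pose that the straight-line guess gets wrong, changes only the frames whose measurements fall between its neighboring calibrations, loosely mirroring the review step in which the animator adds calibrations as needed.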
Once approved, the rig is returned to the client fully animated and in the original format, ready to be dropped back into the pipeline, so integration couldn’t be easier. Although it’s often beneficial to capture the face and the body motion at the same time because the movements are so closely linked, sometimes there is a preference to capture the actor on set, unencumbered by the facial capture rig, and make adjustments after the fact. When the two are captured separately, Image Metrics matches the facial performance to the tone of the body capture, returning a file that still needs only small adjustments, perhaps to eyelines or a tweak to anticipation or breathing; these are expected and minor.
Most people are familiar with traditional facial motion capture: a daily morning routine that starts with the meticulous placement of 130 markers on the actor’s face, which still misses areas that can’t be marked, such as the corners of the eyes, the pupils and the tongue. The process can take hours to set up and days to capture, requires a trained MoCap team, and still produces limited results that need cleanup and keyframe tweaking, with an end result that is too often disappointing.
The rigorous cleanup followed by corrective animation can account for as much as 50% of the animation work, and only after it is complete can the director see whether the output has the desired result. If it doesn’t, either a second capture session is required, or the studio has to convert the MoCap data into animation space so the animation team can edit in the required changes.
By comparison, Image Metrics uses a simple video camera, either placed in front of the actor in a VO booth or attached to the head during body motion capture. Setup takes a matter of minutes, and the tracking is accurate to within a pixel. It puts back the soul and life that MoCap beats out of a performance. The process lets directors realize their vision from the moment it’s captured: they can select the take that appeals to them, and that performance is what they will see in the animated character. Additionally, the video provides texture information, something that is not available with motion capture. It’s more than tracking motion; it shows how the skin texture moves.
Image Metrics data can be edited and recalculated through the retargeter, allowing for requested adjustments such as making the actor warmer here, pushing the performance there, or taking out a distracting behavior. Humans make hundreds of tiny movements all the time, and too often those twitches and subtleties are lost when a computer interpolates between keyframes. With retargeting, all that movement remains in the animation, limited only by the detail of the rig. Image Metrics can shape the performance according to the requirements of the scene, the director or the VFX supervisor. You could do all this with keyframing, but it would take forever to achieve that level of subtlety.
"The performance can be retargeted onto any rigged character," continues Nick Perrett, "be it human or a talking horse, by finding the extremities of the motion of the actor and matching the pose on the 3D model. The result is more detailed and richer, but much faster than the keyframe process and consequently less costly because [production time] is reduced."
An actor is selected for the specific talents and personas they can bring to a character. The technology raises the prospect that a character of any age can have all sorts of abilities the captured actor never had. It’s like super makeup, allowing the actor to crawl into a different persona. Animators have enjoyed that ability for years; this technology opens it up to actors as well.
Everybody understands that CG is an approximation of what goes on in the real world, but it’s never quite the same. MoCap tries to capture everything precisely and transfer it to the data, but as with any transfer, something is lost, and full believability is not maintained. An animator, by contrast, might emulate an equivalent performance, pushing gestures to get the intended message across. A good animator imbues the character with life, making it believable but not necessarily accurate. Image Metrics technology appears to offer the best of both worlds.
Video games often have armies of similar characters, and building a rig for each of them is a time-consuming process. Image Metrics has a rig transfer technology that takes the master rig from one head and transfers it to any subsequent heads. Image Metrics has also designed an intelligent package to accommodate the different physiological attributes of each character. The time to tweak the rig is reduced from two weeks to perhaps a day or two. The same technology applies to film characters, though more time is required to maintain their fine details.