Performance Captured
Nov 1, 2004 12:00 PM, By Michael Goldman
Zemeckis' Polar Express takes motion capture beyond its traditional application.
From a technical point of view, Robert Zemeckis' The Polar Express seems destined to do for motion capture what Zemeckis' Forrest Gump did for compositing: to permanently elevate the technique's capabilities and importance in the filmmaking equation. Similar to Gump's approach to compositing, Polar Express takes elements of a long-established technique and uses them in unique, more robust ways to make mo cap the central production method for a major studio feature.
In this case, filmmakers are calling it “performance capture,” essentially a mo-cap-on-steroids process developed at Sony Pictures Imageworks, and they are citing Polar Express as the first animated studio feature ever made this way. As was the case with Gump, filmmakers say they developed the process to meet Zemeckis' strict creative demands for the Warner Bros. release. Those demands: to replicate as closely as possible the look and impressionistic artistry of the original short book, drawn by Chris Van Allsburg, while relying entirely on live performances from the actors, including Tom Hanks, who plays five roles, among them a child version of himself.
To permit director Robert Zemeckis to direct live performances, the production combined four Vicon MCam2 systems, using 64 cameras total. Shown here: an early version of one of the mo-cap volume stages.
Zemeckis asked Ken Ralston, his longtime visual effects collaborator, and his colleagues at SPI to help determine what production approach would best fit his vision for the film. Zemeckis, Ralston, and his partner, Jerome Chen, who ended up as co-senior visual effects supervisor with Ralston, discussed a host of options. They considered an effects-oriented live-action film, a standard CG approach, laborious combinations of keyframing and traditional mo-cap techniques, and everything in between, but none of those ideas fit the project's creative, financial, and deadline requirements.
Finding a Process
“Bob's desire to maintain the look and feel of the book dictated that it could not be live action, but if it was not live action, he wanted to raise CG animation to a much higher level of actor performance,” Ralston explains. “He wanted a different level that pure animation would not be able to capture — the nuances of a real actor's performance. The great thing about actors of the caliber of a Tom Hanks is their spontaneous movement — that serendipitous, accidental, sudden thought of an actor, the subtlety of what we do when we speak and move. Therefore, we had to design a system to allow us to capture that and put it into animated characters. We tested several processes. We experimented in Inferno with filters and looks to maintain the look of an art piece, but the more 2D it looked, I did not get the spatial sense of believing this was a real room and real characters. We did tests using more traditional motion-capture techniques, in which we captured motion of the body and head separately — the standard approach. But because the motion was captured separately, the technique did not capture the actors' true performances. That's when we realized we had to develop a way to capture the entire face and body in one pass.”
At that point, Ralston and Chen began investigating the various mo-cap systems available and ways they could use those systems, in various combinations, to meet the elusive goal of capturing detailed facial and body motion data simultaneously. When the project got underway two years ago, Chen was shocked to see how little the technology, in his perception, had evolved. “Traditionally, you do a body-capture session,” he says, “and then you sit the actor down facing the camera, so he can't move much at all, and have him act out his facial performance while looking straight at the camera. Then, later, the animator has to stick the two pieces together and adjust it by hand so that it doesn't look stilted. We tried this in our tests, but it was too difficult to get the performance to be seamless. That's when we realized we had to get it in a single pass. This had never been done before to the fidelity that we wanted for this project. That is where the moniker of ‘performance capture’ began.”
Of course, figuring out what to do is hardly the same thing as figuring out how to do it. Filmmakers brought mo-cap expert Alberto Menache onto the project as a digital effects supervisor in charge of mo cap, and Matrix veteran Demian Gordon as the mo-cap supervisor. Ralston, Chen, Menache, and Gordon searched high and low for technology they could fit together to accomplish their goal.
According to Chen, what he calls a “technical strike team” eventually settled on a combination of four full Vicon MCam2 camera systems cobbled together to form an unheard-of 64-camera mo-cap monolith. Among the calculations the group had to consider was how many cameras would be required to get 360 degrees of face and body coverage simultaneously while capturing enough clean data to satisfy Zemeckis' edict of exclusively using real actor performances for all human characters. Another challenge involved figuring out how large a volume capture area would work for getting close-ups, wide shots, effects shots, and stunt shots.
Chen emphasizes that what filmmakers concluded they needed and the previous uses of Vicon's technology were two different things. “We had to get full 360 degrees of coverage, and that meant we needed a ridiculous number of cameras,” he says. “Originally, we thought it would take almost 100 cameras. So we called up Vicon and learned they had never hooked up more than 48 cameras to their system, even in tests. No one had tried it, mainly because so much data would be generated with that level of coverage that the software would have to be extensively modified to handle it all. As it turned out, 100 cameras weren't practical, or affordable, so [Gordon] finally came up with the number 64 to get the coverage we needed. That led us to design a volume stage of 10'×10', which was big enough to capture three actors at once, for the close shots. We then realized we needed bigger volume for the wider scenes, so we designed a 30'×60' stage for those shots and a third capture volume stage of 30'×30' for stunts and physical effects. [Because the 30'×30' stage was used only for stunts, which required only body motion, the production used a lower-resolution, eight-camera Giant Studios system for that application.] This led to a workflow in which we could keep a first unit shooting in a live-action approach, while other units captured other material simultaneously.”
Mo-Cap Particulars
Vicon engineers worked closely with the production to upgrade Vicon's iQ data processing software, while Menache updated a proprietary SPI tool called Performance Facial Systems (PFS) software. Those tools, used in combination, were crucial to the production's success by uniting to form a pipeline for processing reams of body and facial movement data streaming from the stage.
“The regular Vicon [iQ] software processes video captured by the cameras,” Menache explains. “It combines all the different cameras' views of each marker to obtain the 3D coordinates of the marker in space. Operators called ‘trackers’ help this process by telling iQ which marker is which, frame after frame. The end result is a file with marker names and XYZ data for each marker. For facial movement, PFS then takes this data and uses it to compute the muscle contributions for each expression per frame. Animators have access to muscles and higher-level controls [called “poses,” groups of muscles] to tweak and modify the motion.
“Basically, to oversimplify it, the software uses the spaces between the markers to calculate where each muscle would be at a particular point in time, and it then feeds that model into the muscle software and delivers facial expressions that are extremely close to [Tom Hanks'] actual expressions. We did a lot of work figuring out what particular points on Tom's face required markers. During that process, I took 100 pictures of him doing different expressions and then traced where I thought every muscle in his face would be moving to. That's basically how I programmed his facial anatomy into the software.”
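Menache's description of combining each camera's view of a marker into one set of 3D coordinates is, in general terms, a triangulation problem. The following is a minimal Python sketch of that idea only, not Vicon's iQ implementation. It assumes each camera's view of a marker has already been converted into a ray (a camera position plus a direction toward the marker), and it finds the single point closest to all of those rays in a least-squares sense. The function name triangulate_marker and the camera positions used to exercise it are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Sequence[float]) -> Vec3:
    """Scale a 3-vector to unit length."""
    length = (v[0] ** 2 + v[1] ** 2 + v[2] ** 2) ** 0.5
    if length == 0.0:
        raise ValueError("ray direction must be non-zero")
    return (v[0] / length, v[1] / length, v[2] / length)


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def triangulate_marker(
    origins: Sequence[Sequence[float]],
    directions: Sequence[Sequence[float]],
) -> Vec3:
    """Return the point closest (least squares) to every camera ray.

    Each ray starts at a camera position (origin) and points toward
    where that camera saw the marker (direction). This is an
    illustrative stand-in, not the method used by any real product.
    """
    if len(origins) != len(directions) or len(origins) < 2:
        raise ValueError("need at least two rays, one direction per origin")

    # Accumulate A = sum(I - d d^T) and b = sum((I - d d^T) o).
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for origin, direction in zip(origins, directions):
        d = _normalize(direction)
        for i in range(3):
            for j in range(3):
                p_ij = (1.0 if i == j else 0.0) - d[i] * d[j]
                a[i][j] += p_ij
                b[i] += p_ij * origin[j]

    det = _det3(a)
    if abs(det) < 1e-12:
        raise ValueError("rays are (nearly) parallel; no unique point")

    # Solve A p = b with Cramer's rule (fine for a 3x3 system).
    point = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        point.append(_det3(m) / det)
    return (point[0], point[1], point[2])
```

With more cameras seeing the same marker, more rays feed the same sum, which is one way to picture why wider camera coverage yields cleaner data: errors in individual views tend to average out.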
Hanks wore an amazing 152 markers on his face each day during production (Menache estimates more than 50,000 Styrofoam markers were used during the 45-day mo-cap shoot), having to undergo a grueling “makeup” session with up to 28 people applying them to his face each morning. To determine where and how to place the markers during preproduction, Menache made casts of every actor's face and drilled small holes into each mask where he wanted markers placed. Each morning, the mask was placed on the actor's face, and every marker spot was drawn on the skin with a makeup pencil through the holes in the mask.
Filmmakers had to take yet another departure from a traditional mo-cap setup to accommodate Zemeckis' desire to direct actors as he would on a traditional live-action set — they had to remove hardware clutter entirely from the mo-cap stage. According to Chen, a separate room, dubbed “the Widowmaker” because of its tube-like submarine shape, was set up away from the stage to house all hardware operations, with data traveling from the nearby stage to the control center via a Gigabit Ethernet network. At this command center, data was carefully recorded and checked while Zemeckis directed actors several yards away.
During his performance-capture sessions, Tom Hanks wore 152 markers on his face each day of production, requiring a staff of 28 people to apply them each morning.
“There were 30 PCs in there, with a team of 10 technicians [for facial capture, plus five more technicians handling body-movement data] examining the data as it came in,” Menache says. “We communicated with the set by radio, with only one person on-set [Demian Gordon] dealing with us. He was the only person on set besides Bob Zemeckis, the actors, and the camera people.”
For the most part, this workflow permitted Zemeckis and the actors to enjoy a traditional live-action collaboration. There were limitations, however, that the director and his team had to work around. Actors had to remember to visualize and act out interaction with costume props like glasses, beards, and hats that could not be used on the mo-cap stage. The biggest restraint, though, was a limit on data processing power that allowed the mo-cap team to capture only about three minutes of data at a time before having to stop temporarily while that data was processed.
“For the most part, it was three minutes shooting and then waiting another three minutes between shots while the system merged the data and saved it before clearing the memory and getting ready to shoot again,” adds Menache. “And then, three times a day, we had to calibrate the system. Normally, you would calibrate the system by putting a lone object on the stage and having all the cameras see the same thing at the same time. But in this case, that was not possible, since we had so many cameras and were seeking a 360-degree view. So we developed a trial-and-error system for making sure the cameras were focused in the right place three times a day by, essentially, having a ‘dancer’ perform in a way designed to give all cameras an equal view of the markers, while monitoring our software to make sure the data was properly digested by the iQ software. It slowed us down some [delaying the production about 20 minutes three times a day], but in between, everything ran smoothly, and we finished the entire mo-cap shoot in just 45 days.”
The Wheels Deal
That, however, wasn't the end of the performance construction process. Another key requirement that Zemeckis insisted on was the ability to choose “real” camera coverage and lighting for sequences in the virtual world after the motion data was captured. To this end, Ralston's team designed a so-called “wheels” system for the production — a computer device that essentially simulated the feel and function of a traditional pan-and-tilt camera gearhead, operated by an actual cinematographer. The hardware consisted of a gearhead console connected by serial cable to a computer workstation running Kaydara Motionbuilder software.
“Think of it this way — in traditional animation, the wheels process would be known as layout,” Chen explains. “We wanted to develop a way to give Bob, with his talent for visual storytelling, a way to physically manipulate the camera. When he walks onto a normal movie set, he can see the set and figure out what he wants in the physical world. But this was different — this was virtual.
“Still, Bob insisted on keeping a real cinematic feel. He felt that the problem with typical CG movies is that all cameras are obviously keyframed, and so everything is too perfect. We wanted a way to have a human camera operator doing realistic camera work on CG characters. So rather than training a computer operator to learn how to move a camera through keyframing, we created a sort of remote head replica that a typical camera operator would be used to operating.”
The console was designed to operate almost identically to something found when working a remote head on a Technocrane. “The reason we used a gearhead console was to provide the operator with a familiar environment,” Chen says, “so his operation of the ‘camera’ on the performance-captured characters looks realistic, since his controls behave like those of a real camera on a remote head.”
Assisting the DP during the process was a technical animator who functioned, in Chen's words, as “a digital grip.” The cameraman controlled pan-and-tilt functions, while the digital grip inputted into the software the start and end points of the camera move in both 3D space and time. In other words, the DP did not operate a real camera. Instead, he operated a computer-simulated replica of a Super 35mm camera with a spherical 35mm lens.
“We created a set of virtual lenses that mimic the field of view of prime 35mm optical lenses,” Chen explains. “During a layout, or wheels session, a selected action of performance capture was loaded into the software. The operator saw the digital characters in the virtual set. The camera was placed in 3D space inside the virtual set, and then, the scene was triggered. While the performance capture animated the characters, the operator manipulated the gearhead console to follow the characters just as he would if shooting live action with a real camera. As the characters moved around the virtual set, the operator rotated the pan-and-tilt wheels to follow them around, using the image depicted in the monitor as the equivalent of his eyepiece. As the operator manipulated the virtual camera, the software recorded the movements as a series of curves, defining the XYZ translation, and rotation. When the camera-move take was finished, the rendered sequences of images were uploaded to production editorial, properly slated and labeled with frame numbers and other pertinent information so that each take could be properly catalogued and loaded into an Avid for the editor's use.”
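Chen's point about virtual lenses that mimic the field of view of prime 35mm lenses can be illustrated with the standard pinhole relation between focal length and angle of view. The Python sketch below is only an illustration of that relation, not the production's tool. The Super 35 aperture width of 24.89 mm and the list of focal lengths are assumptions chosen for the example, and the function names are hypothetical.

```python
import math
from typing import Dict, Iterable

# Assumption: full-aperture Super 35 width in millimeters.
SUPER35_APERTURE_WIDTH_MM = 24.89

# Hypothetical set of prime focal lengths, in millimeters.
PRIME_SET_MM = (18, 25, 35, 50, 85)


def horizontal_fov_degrees(
    focal_length_mm: float,
    aperture_width_mm: float = SUPER35_APERTURE_WIDTH_MM,
) -> float:
    """Horizontal angle of view for an ideal (pinhole) lens.

    Uses fov = 2 * atan(aperture_width / (2 * focal_length)).
    """
    if focal_length_mm <= 0 or aperture_width_mm <= 0:
        raise ValueError("focal length and aperture width must be positive")
    half_angle = math.atan(aperture_width_mm / (2.0 * focal_length_mm))
    return math.degrees(2.0 * half_angle)


def lens_table(focal_lengths: Iterable[float] = PRIME_SET_MM) -> Dict[float, float]:
    """Map each focal length to the field of view a virtual camera would use."""
    return {f: horizontal_fov_degrees(f) for f in focal_lengths}


if __name__ == "__main__":
    for focal, fov in lens_table().items():
        print(f"{focal} mm lens -> {fov:.1f} degrees horizontal")
```

A virtual camera set to the angle computed for a given focal length frames the scene the way an ideal lens of that length would on the assumed film format, which is the sense in which a virtual lens can stand in for a physical prime.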
Zemeckis hired longtime collaborator Don Burgess to shoot reference footage on the mo-cap stage to help set up the mo-cap cameras, and also to R&D the wheels approach. During actual production, DP Robert Presley operated the wheels system to compose each shot's camera moves to Zemeckis' satisfaction. Both Burgess and Presley, therefore, are officially credited as DPs on what is technically an animated film.
The Polar Express required just less than two years for completion because the crew captured live performances on-set while working with modelers and integrators to apply the data to characters, and to build the animated backgrounds.
Chen calls the wheels approach liberating. “As you cut it together,” he says, “if you think up a better angle at some point, you can call in the wheels department to create the coverage later. This way, you never have a missed shot of the action. You have the performance preserved forever, and you can then manufacture the coverage you want. It leaves the film with an organic feel, giving the audience the experience that a real human was filming the action.”
Animation
While this process continued, so did the animation process for Polar Express. Ralston says that the animation approach was among the most efficient he has ever been involved with.
“For one thing, your typical animated feature takes up to four years to develop and produce,” he says. “We completed Polar Express in just under two years from the time we got the green light. We accomplished this by simultaneously capturing the live performances on-set while working with modelers and integrators to apply the data to the characters. We had multiple teams working in parallel, and to meet the production timeframe, we also had animators start work on the characters we would not be performance-capturing, such as dogs and reindeer.”
Chen emphasizes that one of the key character animation challenges was the fact that many characters had to closely resemble the actors providing the performance. Hanks, for instance, plays five characters in the film, including one that looks almost exactly like him. For Hanks and several children, the production once again relied on real-world elements to serve as key building blocks.
“Even for the group of child characters in this movie, Bob cast real kids just to base the look of the characters on them,” Chen says. “Bob didn't want their performance, just their look. [Performance for the lead child and other children in the film came from adult actors, including Hanks, but the look of the children was based on young actors.] So, after we cast the kids, we scanned the likeness of the actors and their costumes into the computer using a 3D Laser Scanner system at Gentle Giant Studios [Santa Monica]. We scanned about two-dozen children, plus Tom Hanks and other adult actors. We also took high-resolution digital photographs of their faces with a rig that basically has cameras positioned so images can be orthographic and strobe-synchronized with the cameras. That way, when the camera flashes, it gives us a good orthographic representation of their faces for textures. A texture painter then matches that scanned texture onto the digital scan of the face, giving us a solid CG representation of the person's face. We did that for several kids and for Tom Hanks. It's funny — if you watch a lot of TV, you will recognize a lot of the animated kids as resembling young actors you've seen in commercials and TV shows.”
Filmmakers used Maya for animation and modeling and Maya's Studio Paint to create 3D textures, rendering everything in RenderMan. They also used Kaydara's Motionbuilder for realtime playback of capture data for the wheels sessions, as well as to edit and integrate mo-cap data, along with Vicon's iQ software to process raw data from the capture cameras. In addition, an SPI proprietary compositing tool, Bonzai, was used for much of the compositing, and effects animation was created in Houdini in combination with SPI's own effects renderer, Splat.
Chen emphasizes that Splat played an important role, but not the one it was originally designed for. “Splat was designed to give the show a more impressionistic look,” he says. “That was our original purpose. We developed the entire movie to have this impressionistic, pastel-type feel, but when we tested that on our imagery, it softened it too much and blurred eyes and made the images less intriguing. We tried this approach for months, but it became distracting. We ratcheted it back and ended up keeping the detail of the original images that we were initially trying to get rid of. So Splat ended up playing less of a role than we thought in that sense. But it still performed as a very fast renderer of smoke and snow effects, which was crucial on this show. What is always difficult about rendering out smoke and snow and things like that is the intense render times it takes to calculate the look of the elements, since they are just a bunch of particles originally. In this case, using Splat, we could render them out quickly, and therefore, do more iterations and get everything the way we wanted it on time.”
Even with performance capture's ability to provide filmmakers with seamless facial movement, and a long list of impressive tools, Ralston adds that SPI's animators still faced major challenges getting realistic eyes and lips onto the characters. “Eyes and lips could not be perfectly or practically captured, since getting sensors onto them is virtually impossible,” he says. “We had sensors around the lips and eyes, but not on them, obviously, so those had to be animated. The challenge of making that part of the character as real as the performances we captured was huge.”
In that sense, as Chen says, “every shot was like a hand-built Rolls Royce. We had to write new tools for the detail that went into the eyes — those little glints of light that are difficult to replicate, the parts of the eyeball that are too bright to animate easily. The shading controls for the eyes were extensive, giving our artists the ability to alter the color and brightness of the iris, the whites of the eye, and the meniscus. We even developed controls to define the shape of the highlight in the eye, and even how much specular breakup there would be on the surface of the eye. We also introduced unique pastel effects into the skin and cloth shading of the characters to help stylize the look.”
Finishing Up
SPI colorist Paul McGhee, supervised by Chen, performed a digital intermediate on the finished film, using a combination of in-house proprietary editorial and conforming tools and Discreet's Lustre color grading system. SPI filmed the movie out to Kodak ESTAR-base 2242 stock using Arri Laser recorders, and the negative was then developed and printed at Technicolor, North Hollywood, where final color tweaks were performed.
Traditionally, filmmakers and their collaborators are hesitant to proclaim their technical achievements ground-breaking, revolutionary, or industry-changing, preferring to leave those conclusions to critics, pundits, awards organizations, and ultimately history. Still, in the case of Polar Express, Ralston readily declares that the project's breakthroughs have “changed the rules.” Menache adds that the movie developed “a totally new pipeline for motion capture, never seen before, with great implications.” And Chen adds that he believes the film “could start a whole new genre, a new type of film.”
That remains to be seen, but performance capture is just getting started. Chen and his colleagues at SPI are already hard at work on the film Monster House, slated for a 2005 release, directed by Gil Kenan and produced by Zemeckis.
Sidebar
Zemeckis On Performance Capture
Director Robert Zemeckis (right) with DP Don Burgess, who helped provide human camera coverage.
Robert Zemeckis concedes that he had no idea if The Polar Express could be made the way he envisioned. Going in, the director figured the movie would probably get made using “some version of the Star Wars or Sky Captain approach — actors in front of a bluescreen.” That, of course, was not exactly what Zemeckis had in mind. He wanted the movie to honor the artwork from the original Polar Express book while featuring live actor performances.
“[Co-senior visual effects supervisor] Ken Ralston told me I'd hate doing this movie bluescreen,” says Zemeckis. “He said [Sony Pictures Imageworks] had another way to do it where I could get the performance I wanted. He walked me through this whole performance-capture approach they were proposing. I was convinced that, in theory, they were correct about how to do it, but I needed to see an elaborate test before committing.”
That test consisted of a more traditional mo-cap approach, where body and facial capture were done separately, and then animators blended the two performances. “That was cumbersome, though,” Zemeckis recalls. “It seemed like an unnecessary way to do it. I told them, come up with a way to do face and body together, and we'll do it.”
The resulting technique, performance capture, is, in Zemeckis' opinion, a really big deal. “It's a new and versatile and important tool for making moving images. I'm very proud of it.”
The result was hardly seamless. Zemeckis had to halt shooting every three minutes or so on the mo-cap stage to allow computers to ingest captured data, and he had to make other adjustments to his typical directing approach. But to Zemeckis, that is all beside the point because this was an animated movie.
“I had to change a few things, but it was no compromise compared to what you would do in normal 2D production,” he says. “Even though you can put a 10-minute magazine onto a film camera, most scenes in movies don't run longer than three minutes anyway. The ability to do this in continuity, in realtime, without worrying about light, since that was done in the computer, and cameras, marks, focus, hair, makeup, and all that, was beyond liberating for me. That allowed me to approach this like I would a live-action movie. Animated movies normally feature click-and-drag camera movement, but I wanted to have human feeling to the camera moves. That's where the wheels system came from: The idea was to take a remote head, pan-and-tilt wheels system, and lock it to the virtual camera, so that the two live elements — the actor's movement and the movement of the cameras — could be combined, both done by real humans. The only real difference was the fact that the actor's performance and the camera's live movement were done at different times in the process.”
But is the process cost-effective enough to become a standard production technique for animated and hybrid movies? Zemeckis notes that he is now producing Gil Kenan's Monster House for a 2005 release, using performance capture, and that project will be produced for considerably less than The Polar Express.
“[Polar Express] cost about a million dollars a minute [of running time],” he explains. “That is about average for a high-end animated movie. The biggest concern for me, going in, was that, at a million a minute, would we end up having it swell to $3 million a minute? But the system delivered, and this film came in right on the penny. And we are already doing it cheaper right now for Monster House. We're already more versatile because we solved a host of problems while developing Polar Express.
“To be honest, the only real limit this process put on me as a filmmaker was that I really had to discipline myself. With a normal production, the clock and budget pose limits to what you can do. In this situation, being able to add camera moves after shooting the actors, we could, if compelled, tinker with things endlessly. Eventually, I had to impose a certain discipline on myself to accept that it was perfect enough.”
— M. G.
Sidebar
The IMAX 3D Version
The Polar Express was slated to debut in November simultaneously in traditional theaters and in IMAX 3D theaters, making it the first major studio feature to debut in an IMAX format. Warner Bros. and Robert Zemeckis decided to take advantage of a new IMAX process called IMAX 3D DMR to digitally re-master and convert the entire film into the IMAX 15-perf, 70mm format to be projected as a 3D movie onto a gigantic screen.
The process takes the 3D modeling data from the film's original production, and then calculates the appropriate separation from the 2D POV to create a 3D viewing experience. IMAX DMR technology re-masters the material and records it out onto two, separate prints of 15/70 film for projection in IMAX 3D.
Sony Pictures Imageworks formed a large production unit to evolve the finished 2D version of the movie into a stereoscopic IMAX 3D film, under the watchful eye of SPI CG supervisor Rob Engle. According to Jerome Chen, co-senior visual effects supervisor on the movie, the challenge handled by Engle's team involved creating the second POV for each shot to evolve from a 2D presentation to a 3D presentation, re-rendering all those elements, and then compositing the shots all over again, twice, once for each eye in the 3D format. “Any flaws that revealed themselves in stereo required fixing,” he says.
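The idea of deriving a second point of view from an existing 2D camera can be sketched with basic stereo geometry. The Python example below is a simplified illustration under stated assumptions, not the IMAX 3D DMR process or SPI's pipeline. It assumes a parallel stereo rig whose images are shifted to converge at a chosen distance; the function names, the interaxial separation, and the convergence distance are all hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def stereo_eye_positions(
    center: Sequence[float],
    right_axis: Sequence[float],
    interaxial: float,
) -> Tuple[Vec3, Vec3]:
    """Offset the original camera position into left and right eyes.

    Each eye sits half the interaxial distance from the original
    camera, along the camera's (normalized) right-pointing axis.
    """
    length = sum(c * c for c in right_axis) ** 0.5
    if length == 0.0:
        raise ValueError("right_axis must be non-zero")
    unit = [c / length for c in right_axis]
    half = interaxial / 2.0
    left = (
        center[0] - half * unit[0],
        center[1] - half * unit[1],
        center[2] - half * unit[2],
    )
    right = (
        center[0] + half * unit[0],
        center[1] + half * unit[1],
        center[2] + half * unit[2],
    )
    return left, right


def screen_parallax(
    depth: float, interaxial: float, convergence_distance: float
) -> float:
    """Horizontal offset between the two eyes' views of one point.

    Measured on the convergence plane, in scene units. Zero means the
    point appears at the screen; positive means behind it; negative
    means in front of it.
    """
    if depth <= 0:
        raise ValueError("depth must be positive (in front of the camera)")
    return interaxial * (1.0 - convergence_distance / depth)
```

Because the sign and size of the parallax change with depth, anything in a shot whose depth is inconsistent becomes visible once both eyes are rendered, which fits Chen's remark that flaws revealed in stereo had to be fixed.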










