Is that what happened to musicians after music production gear became cheaper and democratized?
Arguably this is the golden age for musicians. Lowest barriers to entry ever, if you have talent. You don't need anyone's permission to be a star ... just talent.
Why would it be different for actors?
I feel like you are not thinking about the full implications of this kind of democratizing tech.
As a self-admitted lousy musician and marginal-at-best amateur filmmaker, the last two decades of technological progress have unleashed a veritable Disneyland of capability I can access anytime 24/7. As amazing as the tools are, the ability to collaborate with other amateurs around the world and to instantly publish content to a global audience with no gatekeepers is equally transformative.
Perhaps I appreciate just how much we live in "the golden age" of personal creativity because I'm >50 yrs old and lived through saving from each paycheck to afford renting a high-quality camera for a day or an edit suite for a few hours (only between 12a-6a for the reduced rate). I'll just come out and say it, "Kids these days have no idea how amazing their world is." Now, excuse me while I go outside to yell at some clouds...
Yes I think there is definitely a positive angle. Hollywood's influence has been protected in large part by union influences controlling access to bankable actors, among other reasons. The possibilities for a surge of truly independent decentralized productions is promising.
That's not how the Hollywood talent unions work...
They don't control access; any production can hire any actor they want. The unions just negotiate a minimum wage for actors working on productions at major studios.
Your suggestion is horrific: you're suggesting that indie films should be allowed to profit off of someone's image without paying for them.
I don't read it that way at all. You would no more be able to steal someone's image and pretend it was them than you could use a famous band's music in your soundtrack without paying for it.
As I see it, a budding filmmaker could use two or three actors to fill 20 different roles in a film without needing expensive prop and makeup artists to place them in detailed fantasy/sci-fi scenes.
My eldest is 10. In the past few weeks I've gone from "I'm pretty sure all my kids are gonna be OK" to "... well, better count on about two of the three living with us until they're in their 30s".
I was down on the future in general, but fairly optimistic about my kids' mid-term prospects. Not so much, now. Feels like being on the Titanic just as the deck's starting to noticeably tilt.
I'm telling my kids to get their tractor-trailer license. It'll be 30 years before trucks will be allowed to travel completely autonomously along the road, and in the meantime they'll have autopilot that removes most of the manual burden and allows them to get paid while they sit in the cab to write books, compose music or program applications.
I say it half in jest, but it'll be a long time before politicians allow 80,000lb trucks to wander our highways without a human responsible for it.
I heard it more as, can I as an actress just get to the part where I act, and skip the just-stand-there-for-an-hour part getting my face captured.
Drawing a parallel: I don't know if many developers complain when compilation times go down, although technically they're being paid to wait. The thing that's scary to developers is GitHub Copilot. But this actress here, she's still doing all the acting.
The goal for the game industry is to not need actresses or actors.
It's been increasingly common in indie game circles to buy libraries of mocap animations and lately facial expressions. Then things like Blender and Unity have been working on amazing solutions to merge various mocaps together.
Several multi-million-dollar Unity games were already using that process successfully 5 years ago, and you can fill in the gaps with tech like the Unity asset "puppet face".
Which works surprisingly well with just a webcam. Unity's face capture on iOS is also fantastic.
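For anyone curious what that kind of merging boils down to, here's a minimal sketch in plain Python, assuming a mocap clip is nothing more than per-blendshape weight curves sampled at a fixed frame rate. The clip data and blendshape names are made up; real tools (Blender's NLA, Unity's animation layers, the assets mentioned above) do far more than this:

```python
import numpy as np

def crossfade_clips(clip_a, clip_b, blend_frames=12):
    """Concatenate two facial mocap clips, crossfading the overlap.

    Each clip is a dict mapping a blendshape name to a 1-D array of
    per-frame weights in [0, 1]. Only the overlapping blend window is
    mixed; everything else is copied through unchanged.
    """
    names = set(clip_a) | set(clip_b)
    fade = np.linspace(0.0, 1.0, blend_frames)   # 0 -> 1 ramp across the overlap
    merged = {}
    for name in names:
        a = clip_a.get(name, np.zeros(blend_frames))
        b = clip_b.get(name, np.zeros(blend_frames))
        tail = a[-blend_frames:] * (1.0 - fade) + b[:blend_frames] * fade
        merged[name] = np.concatenate([a[:-blend_frames], tail, b[blend_frames:]])
    return merged

# Hypothetical example data: a smile easing out, then a jaw-open easing in.
smile = {"mouthSmile": np.linspace(1.0, 0.4, 30), "jawOpen": np.zeros(30)}
talk  = {"mouthSmile": np.zeros(30), "jawOpen": np.linspace(0.1, 0.8, 30)}
blended = crossfade_clips(smile, talk)
print({name: curve.shape for name, curve in blended.items()})
```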
I don’t understand this comment. If my model is training for 2 days I expect to be doing other work while I wait for the results. Are/were there engineers/researchers that expect to get paid to do nothing while they wait for paint to dry?
Imagine you are an ML consultant with an activity tracker installed. The company is only going to pay you for the short fragments of time you are typing the source code and launching subtasks, but not for the duration of those tasks if your activity tracker shows no activity on your side. I had a client who wanted to negotiate two rates - one for the active part and another one for the training time part (barely covering AWS costs). If you think this is rare then you might be well-insulated from the market.
> I had a client who wanted to negotiate two rates - one for the active part and another one for the training time part (barely covering AWS costs).
This doesn’t make any sense. Training time is billed as a passed-through cost (at cost, or cost plus a markup), not at an hourly rate for the person who started the task.
I don’t know anyone who would have gotten away with billing AWS training time at their own hourly rate as if they were working. That’s ridiculous. Would be hilarious to see someone submit 24-hour days for long training runs though.
I can understand that for tasks that run outside working hours. But not during working hours, where many companies really do push for activity-tracker-based payment, as if the consultant didn't already have time allocated to working on their tasks.
I think stuff like this further reinforces the value of actors over the animators. Sure they only have to scan an actor once to get a metahuman model, but they need the actor to give the actual performance.
Instead of one over the other, what might happen is the role of mo-cap actor and animator become a single role. Animators and actors both create expressions and poses of the character. Animators already film themselves in poses for inspiration.
In the future, a single person might end up taking their own mo-cap video as well as polishing up the animations for several characters.
Reminds me of how people just shoot their own podcasts now. It's just become that easy. At some point you might add to the team, but if you want to just try it, you can.
I think it's amazing as well, but we need to understand the caveats.
The videogame industry is notorious for over-promising and under-delivering.
Game Engines included.
Unreal's Nanite and Lumen seemed amazing and had great pitches/tech demos, but in practice they're... not great. Frame rates drop constantly, and the complexity they introduce into the game-making workflow is heavy.
Several medium-sized companies have even dropped Nanite and Lumen and gone back to the old way of doing things.
Game Engine companies are notorious for leaving out important details about new features.
Beyond social media (Reddit, Twitter, IGDA forums), not really.
A lot of this borders on "competitive intelligence", and game development is only getting more competitive, so you typically need a friend to get you an invite into various private Discord servers.
I'm learning game dev in my spare time, and these private Discords are critical for learning how not to waste time on various "over-promised" techs and to focus on results.
Agreed it's incredible, and merits tons of positive feedback. Full stop.
Also -- and this really is just a question not a nitpicking criticism -- when it comes to crossing the uncanny valley, am I right that mouths and teeth, specifically, seem particularly less realistic? Lips and subtle facial expressions make sense to me as challenging areas, but it surprises me a bit that teeth aren't typically rendered better. This applies to full-studio efforts not just Epic's new tech.
20 years ago Peter Jackson had a whole team to do it, across months, with Andy Serkis in a specialized suit, on a specialized camera rig, with amazing lighting and green screens. All to map to a single animated model, Gollum.
This was done on an iPhone, in subpar conditions, by 2 people, with regular clothes and background, for any/all humanoid models, in under 90 seconds!!!
Amazing!!! It’s the only word for it and it doesn’t do the accomplishment justice.
Not to dispute this, but just to suggest that maybe the improvement is "only" four orders of magnitude instead of six, or something: didn't the Gollum animation do full-body motion capture and transfer, and isn't this demo only for the face?
>20 years ago Peter Jackson had a whole team to do it, across months, with Andy Serkis in a specialized suit, on a specialized camera rig, with amazing lighting and green screens. All to map to a single animated model, Gollum.
... and to do it outside while doing full-body motion capture while splashing around in streams next to other live actors on film.
It's like comparing music festival hardware with home streamer equipment. Home streamer tech can sound significantly better, for a thousandth of the cost... because you can put up acoustic tiles.
It's a big achievement, but wild comparisons don't really do it any favors.
With a mobile phone camera without a suit or markers on the face. This makes motion capture available to you and me without renting or building million dollar studios.
I think people are noticing the uncanny valley of these animation demos, but I was really struck by the fact that one good character actor could generate a lot of great animations that could be used across tons of models. I feel like this sort of tech mixed with recent breakthrough in AI is going to open up really crazy RPG open world games in the next few years.
Pretty cool. Maybe someone more knowledgeable can explain why this is different than existing facial capture techniques? Is it just the same stuff repackaged into a quick and user friendly package?
You can do it on an iPhone with no additional equipment and can apply it to a MetaHuman model, which is a relatively high quality human model that epic has tools to generate.
It’s worth noting that both the FaceCap and Moves By Maxon apps can also capture facial animation using only an iPhone with no additional equipment.
It’s hard to tell from the video, but I wouldn’t be surprised if they used their face capture animation to drive wrinkle maps[1] on their MetaHuman models. This is something that other apps don’t offer out of the box, but which can significantly increase realism.
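For context on what driving a wrinkle map even means: the usual trick is to fade in extra normal-map detail per facial region when the related expression weights are high. A toy sketch, with the region groupings and blendshape names invented purely for illustration (a real setup would expose these as material parameters in the engine):

```python
# Toy sketch of driving wrinkle-map intensity from expression weights.
# Region-to-blendshape groupings and names are invented for illustration.

WRINKLE_REGIONS = {
    "brow":       ["browInnerUp", "browOuterUpLeft", "browOuterUpRight"],
    "nasolabial": ["mouthSmileLeft", "mouthSmileRight", "cheekSquintLeft"],
    "crow_feet":  ["eyeSquintLeft", "eyeSquintRight"],
}

def wrinkle_intensities(expression_weights):
    """Map per-frame expression weights (0..1) to per-region wrinkle strengths.

    The strongest contributing blendshape drives each region, so a single
    raised brow is enough to fade in the brow wrinkle normal map.
    """
    return {
        region: max(expression_weights.get(shape, 0.0) for shape in shapes)
        for region, shapes in WRINKLE_REGIONS.items()
    }

frame = {"browInnerUp": 0.8, "eyeSquintLeft": 0.3, "mouthSmileLeft": 0.1}
print(wrinkle_intensities(frame))  # {'brow': 0.8, 'nasolabial': 0.1, 'crow_feet': 0.3}
```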
What do you do with those app outputs though? I’m guessing it still feels hard because it won’t automatically map to a mesh that is good to go in your game engine?
The MetaHumans drop in models are the thing that makes this magic imo. It’s not the motion capture tech so much as the complete pipeline to produce game assets.
FaceCap gives you an FBX animated with about 50 different morph targets. If you wanted to use that in a game, you’d probably need to load all of the morph targets into a shader yourself.
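In case it helps, "loading the morph targets" is conceptually just a weighted sum of vertex deltas; a game engine does this on the GPU, but the math is the same. A minimal numpy sketch, with an invented four-vertex mesh and made-up target names:

```python
import numpy as np

def apply_morph_targets(base_verts, targets, weights):
    """Blend morph targets (a.k.a. blendshapes) onto a base mesh.

    base_verts: (N, 3) rest-pose vertex positions
    targets:    dict name -> (N, 3) vertex deltas relative to the rest pose
    weights:    dict name -> float weight, typically in [0, 1]
    """
    out = base_verts.copy()
    for name, delta in targets.items():
        out += weights.get(name, 0.0) * delta
    return out

# Tiny invented mesh: 4 vertices, two morph targets.
base = np.zeros((4, 3))
targets = {
    "jawOpen":    np.array([[0.0, -1.0, 0.0]] * 4),
    "mouthSmile": np.array([[1.0,  0.0, 0.0]] * 4),
}
posed = apply_morph_targets(base, targets, {"jawOpen": 0.5, "mouthSmile": 0.25})
print(posed[0])  # [ 0.25 -0.5   0.  ]
```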
I agree that the automatic integration with MetaHuman is the main benefit.
Actually you’ve already been able to do that for a few years now with Epic’s LiveLinkFace app (which was basically just a wrapper around Apple’s ARKit). This is just their own, much higher quality, version of it.
I keep seeing people say this without explaining what the huge thing strapped to the actor's face is; it clearly isn't a phone. Not explained in the article either.
I’m not sure what you mean. There’s one shot in the video where they show a full motion rig and clearly explain that this system works with both a phone and/or a professional stereo system.
I’m guessing that is for when you want to move your body around and capture the facial expressions associated with doing so. You need some sort of rig to keep the camera in your face if you’re moving your face.
The detail on the new MetaHuman face capture is visibly better, and a smoother capture pipeline has been a valuable goal for some time. Noisy mocap, and re-rigging animations to share them between models, take time.
[Quote from article] The algorithm uses a "semantic space solution" that Mastilovic said guarantees the resulting animation "will always work the same in any face logic... it just doesn't break when you move it onto something else."
A few differences compared to other commercial iPhone-based capture techniques:
- Whole-clip solve instead of noisy frame-by-frame streaming
- Automatically generating a face model to calibrate the rig
- Solving from raw sensor data instead of Apple ARKit pose data
VFX studios typically develop proprietary solutions a few years in advance, so it can be hard to say what is truly "new and different."
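To make the first difference (whole-clip solve vs. noisy frame-by-frame streaming) concrete, here's a toy contrast assuming the capture is just a noisy blendshape curve. The fake data, the noise level, and the moving-average "solver" are all stand-ins; the real system's semantic-space solve is presumably far more sophisticated:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented ground-truth curve for one blendshape (e.g. jawOpen) over a clip.
t = np.linspace(0, 2 * np.pi, 120)
true_curve = 0.5 + 0.4 * np.sin(t)

# Per-frame streaming: each frame is solved independently, so sensor noise
# goes straight into the animation as jitter.
streamed = np.clip(true_curve + rng.normal(0, 0.08, t.size), 0, 1)

# Whole-clip solve (toy version): with the entire take available, you can
# fit a smooth curve through the noisy estimates instead of trusting each
# frame in isolation. Here: a simple moving-average filter.
kernel = np.ones(9) / 9
whole_clip = np.convolve(streamed, kernel, mode="same")

print("per-frame RMS error :", np.sqrt(np.mean((streamed - true_curve) ** 2)))
print("whole-clip RMS error:", np.sqrt(np.mean((whole_clip - true_curve) ** 2)))
```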
To me it's weird that UE5 was originally revealed almost 3 years ago and, as far as I know, there still aren't any games using it (besides Fortnite), despite one of the promises of UE5 being ease of development. I realize that the cycle times for games are long, but this still feels pretty extreme.
No, that’s exactly it. Game development cycles are long. UE5 was officially released just 1 year ago. Projects take on risk if they are one of the very first adopters of a new engine, since they will end up dealing with all the teething issues. Dozens of UE5 games are in active development.
There has been a lot of underutilized tech in the entertainment space. The summer blockbusters and AAA games later this decade might truly be a hair-raising experience.
From what I understand, the real value add of this is getting motion capture close enough with a solid facial model, and being able to tweak facial animations from a few seconds of acting, with parameters automatically mapped. It's not meant to be an instant coffee animation.
Very impressive demo and use of the iPhone's depth camera. I suppose as long as FaceID exists there's no reason to try to use only RGB over RGB-D, though I am curious if they'll bring it to Android phones that have the required hardware like the Pixel.
Photorealism in CG is a milestone that is essentially within reach, and can be strived for. It may be a necessary step to get through before interest in more interpretive styles truly takes off.
Analogous to how modern art in painting didn't really take off until photography had made photorealism attainable and no longer interesting for painters.
I wonder if this sort of thing will become trivial with generative models like GANs/LLMs and such. To paraphrase Arnold... if you can parameterize it (and facial expressions are already not too hard to parameterize with motion capture software), you can generate it?
This could really help low budget games as well. One person could do all the animations for low-poly models with text based dialog or generated voices.
On Blender, the cheap answer used to be the Rhubarb addon, but there's probably something better now. It will also probably depend on the rig and other accumulata.
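For the curious, Rhubarb-style tools emit a timed list of mouth shapes, and wiring that into a rig is roughly the following. The shape-letter-to-blendshape mapping below is invented for illustration and would depend entirely on your rig:

```python
# Toy lip-sync pass: turn a timed mouth-shape track (the kind of output a
# tool like Rhubarb produces) into blendshape keyframes.

MOUTH_SHAPES = {
    "A": {"jawOpen": 0.1, "mouthClose": 0.9},    # closed, as in M/B/P
    "B": {"jawOpen": 0.3, "mouthStretch": 0.4},  # slightly open, as in K/S/T
    "D": {"jawOpen": 0.7, "mouthStretch": 0.2},  # open, as in "ah"
    "F": {"jawOpen": 0.2, "mouthPucker": 0.8},   # puckered, as in "oo"
}

def track_to_keyframes(track, fps=30):
    """track: list of (start_seconds, shape_letter) -> list of (frame, weights)."""
    return [(round(start * fps), MOUTH_SHAPES[shape]) for start, shape in track]

dialogue_track = [(0.00, "A"), (0.12, "D"), (0.31, "B"), (0.45, "F"), (0.62, "A")]
for frame, weights in track_to_keyframes(dialogue_track):
    print(frame, weights)
```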
You're being downvoted but I feel the same exact way. I've been playing video games since Pong, and through the years the 3D effects have really been used to bring the story to life. These days, the story is used to drive the 3D effects, and I'm just not here for it. I could already start to feel it in the Final Fantasy 7 days - like, yeah, it's really neat that the characters are 3D now for the first time, but why am I sitting through this 10 minute "Knights of the Round" casting animation every single battle? Let me get back to the things I found fun about the Final Fantasy series, which was fine in 2D pixel art.
And yet, for some reason, the industry ran with those cutscenes. They got worse in FF8, and today we see the end result. I mean, they really gave it away when they made Final Fantasy: The Spirits Within and made such a big deal about how they did the hair. The whole movie was just to showcase their hair model.
So now here we are in 2023, and we are playing the same games over and over again, but with better graphics every year. I'm sure this face capture tech or similar will make its way into all the next AAA games including the next Final Fantasy, which will have amazing hair and clothes and eyes and facial movements, but it will be the same game that's been made for 30 years.
Thankfully, not all is lost.
The best game released in decades is hands down in my opinion Dwarf Fortress, which took the graphics budget and put it toward innovative gameplay. Then there's Factorio which similarly isn't using AAA graphics, and could be called "retro". Of course Minecraft was a huge hit, and it had FFF graphics, only proving that really the graphics are next to meaningless if the gameplay is fun enough.
Today there's a whole retro-gaming movement dedicated to breathing new life into consoles like the NES, embracing the graphical limitations of the platform to focus on innovative gameplay and rich storytelling. The indie game scene is pretty strong, to the point of being oversaturated really. I doubt it will cool down though with all these new AI technologies; now more than ever it's possible to take a game idea all the way to fruition with an AI helper generating assets for you.
If it's slow, costly, and hard to animate characters, only cutscenes—where the animated characters are the focus—will have well-animated characters. But if it's fast, cheap, and easy, then even the incidental “scenery” NPCs can have nice animations, making the game world seem more alive.
You're right, all games need precise motion capture and photorealistic models and the concept of "NPCs", otherwise how could you even call them games? Like this ugly mess for example, ugh: https://i.kym-cdn.com/photos/images/original/001/574/413/900...
I'd be interested to know, for the people who find this convincingly human-like, for those who are impressed, if you're on the autism/Asperger's spectrum.
I understand people on the spectrum can have difficulty reading facial expressions, so I'm curious if this is close enough for them that it appears real, or sufficiently human.
As someone not on the spectrum, I can tell you that it's immediately obviously digital, fake, and gives me an uneasy feeling. It's that "uncanny valley" thing. So, I just wonder if the presenters, creators are on the spectrum, as they felt confident enough to showcase it and pass it off as sufficiently novel or convincing.
Well, I'm not diagnosed as any of that and most people would describe me as popular and friendly in real life so it seems unlikely I'm undiagnosed but ill.
And I thought it was bloody amazing. In such a short time, highly precise motion capture. This is tremendous technology. Looks fantastic!
> pass it off as sufficiently novel or convincing.
IMHO, that's not the central point. It's notable not because it's super convincing but because it's convincing enough to be creatively useful while being incredibly fast, reasonably inexpensive and relatively easy.
AAA game studios have already been able to reach similar (and even better) results for years. It's just super expensive and labor intensive. This demo is exciting because of how it can empower small game studios and even ambitious indies.
This is very impressive, don't get me wrong. But I still can't shake the video-gamey feel the entire thing has. I watched the demo on YouTube and it looks great and all, but the moment the character began to speak, something about the way her mouth moves felt off. Am I the only one or am I just crazy?
Maybe I'm in the minority here, but I have zero interest in realistic character models in video games. There are limitless possibilities when it comes to art styles in gaming, and the one AAA picks over and over is realism. It's so boring.
A good recent example of this is graphics mods for emulated Breath of the Wild: it's probably the most well-known recent 3D game that went with very noticeably stylised & simplified character models, likely in part due to the limited hardware. It's also a game with excellent art direction. Now people have modded it to make it more like other AAA titles & every single mod I've seen looks absolutely terrible in comparison to the original, because they're completely forgoing art direction for something approaching realism. The former should always be the priority.
A previous title in the same franchise (Wind Waker) was also lambasted on release for its alternative art style & today has aged considerably better than most games of that era.
> people have modded it to make it more like other AAA titles
If anyone is interested in an example, I found this: https://m.youtube.com/watch?v=IP0B45hUPgo
In fairness that's a particularly egregious example. This one doesn't look so bad but still gets a lot of the lighting/shadows & colour balance very wrong, flipping between oversaturated and too dark: https://www.youtube.com/watch?v=Fscwp3RkEvM
Thanks, I hate it
Expanding the capabilities of the tools seems important even if literally photorealistic results aren’t the goal of the creative team. It’s hard to go wrong by having the ability to render light in physically realistic ways, have realistic character animations, etc. It’s a bit like complaining about cameras with higher resolution and dynamic range because “I have zero interest in movies that simply portray how reality looks as accurately as possible.”
> It’s a bit like complaining about cameras with higher resolution and dynamic range because “I have zero interest in movies that simply portray how reality looks as accurately as possible.”
That's exactly what happened. People complain about the way high-frame-rate cameras look cheap.
The language of cinema is always unreal. Even the most "natural" performances are always highly stylized. Camera motion, lighting, makeup, etc are all very different from real life. You can see it even in behind-the-scenes footage; everything looks so fake. It's designed to feel right in the captured image, and that's the only goal.
They'll use whatever tools they have for it. This might be a good tool, but it's hard to predict. They never really worked out a good use for 3D in film, even though it's also more "natural" than a moving 2D image. It's a different kind of uncanny valley: it didn't resemble real life enough, but you were content with the really unrealistic language of 2D cinema.
> People complain about the way high-frame-rate cameras look cheap.
Everyone is entitled to their opinion, especially regarding aesthetics. But this is pretty much the same sort of complaint that I was criticizing. You can have a genuine steadfast aesthetic preference for black and white films, or silent films, or SD digital video, but I disagree with extending that to anything of the form “technology X is undesirable because it makes films more realistic and films are not supposed to be realistic.”
I'd say it's not so much that "films are not supposed to be realistic" as that films simply aren't realistic. Any additional change to make them more "realistic" is really an aesthetic choice, which may or may not work for people.
People liked some things, like talkies. And they rejected other things, like 3D. I could come up with an after-the-fact rationalization for why some things worked and some didn't, but it's pretty much impossible to tell beforehand.
As to the 24 FPS feel of cinema, it's entirely what one is accustomed to. Much like classical music gets associated with grandeur and emotion, yet only in some cultures. I for one love the 320x200 graphics of my youth, and have still come to appreciate higher resolutions -- even in games with pixel art assets.
Newer generations growing up with 30+ FPS and better animation will develop different tastes.
> As to the 24 FPS feel of cinema, it's entirely what one is accustomed to.
Yep, and it’s also mistakenly assumed to be a necessary ingredient of all the great films you remember, rather than a nearly unavoidable ingredient. After all, those great films were all 24 FPS! And they used that medium in really creative and technically proficient ways!
And that’s true and great. But all the bad films were also 24 FPS, and it would be weird to say “imagine how much worse that terrible low budget film from 1978 would have been if they had modern professional digital cameras and shot in 48 FPS.”
Right. I still think these kinds of tools are valuable. You can do motion capture and convert it into a style that is not photorealistic. It's arguably a more compelling demo to take live action and instantly turn it into something highly stylized. Then no one is going to complain about how the skin on the character's face doesn't stretch correctly or whatever.
It also is one of the art styles that turns dated / bad quickly. Any stylized art with a decent to strong design will still look good for years and years.
I don't care about photorealistic stuff, but I do care about the mouth and eyes moving realistically (unless you are specifically going for some effect) and proper tracking.
I want game characters that feel alive in all the ways my brain expects creatures to act alive, which I have yet to see done with most NPCs. Even key NPCs usually fall very short.
Cartoons have done this for decades, but it's (relatively) easy when everything can be preplanned.
This makes that easy whether you want to go realistic or not.
Yes, I thought the best part of the demo was the more "Pixar-ish" boy shown last. For most kinds of storytelling increased realism isn't that big of a barrier since we can just point a camera at a real human. To me, the ability to bring non-human characters to life is far more creatively enabling.
Don't fret. Realism is just the least opinionated art director. You can apply this tech to more fantastical character models.
I think I have to agree. I much prefer a good stylistic approach over hyper-realism in most of my games. I always point to World of Warcraft, imo the style and 'cartoony' color palette not only looked visually appealing but helped patch over the low poly models and relatively low res textures as the game aged. I understand that recent expansions have given it an HD facelift, yet the style remains strong.
I wish they would spend just as much time and energy on gameplay, and less on trying to emulate movies or create interactive stories where you watch and click at the right time for elaborate scripted sequences that make it appear as though you aren't playing a choose your own adventure book disguised with tech. Gameplay needs to come first.
I think it's harder to have a cohesive stylized art style when you are trying to parallelize your art across 10s to 100s of laborers. It's a natural result of gamedev via org chart.
AAA isn't a value marker. It's a shame every time I see discussions of gamedev it's always "but can you do AAA with this library?" Who cares!
Realism is the strictest and so most useful test.
We also happen to be very good judges of what looks real.
And at some point I want to play realistic open world VR D&D, with multi-model intelligent NPC’s, enemies and beasts! Oh, how I have been waiting…
Word! Although perhaps not for the same reason. I tend to look at games with a very mechanistic viewpoint, it's a puzzle to be solved so get rid of everything that gets in the way of that.
AAA is basically meant to indicate that realism though. If you do something non-realistic you do not need an AAA team for it.
I agree, also everything made with Unreal has a very samey look to it.
Exceptions exist of course; Japanese developers are able to get very cartoony visuals, but that means they won't be able to use this MetaHuman 3D model and animation technology.
Yes, and also: the actual actress felt a tiny bit uncomfortable, maybe because of the performance in front of a live audience, maybe because she could have had a slip of the tongue or whatever. This made her performance feel more "human". The rendered animation lacked this aspect; the character didn't seem nervous in the slightest.
Still, this is impressive tech, don't get me wrong. But we are still in the Uncanny Valley. Getting better though!
I immediately wondered if her facial expressions were deliberately chosen to fit what the algorithm is good at - her angry face looked rather silly, but the generated version matched pretty well. The generated “gaze to the side”, however, lacked most of the emotion I read off the “original”.
I also got that sense, specifically around how much she showed her teeth during her video recording. I suspect the algorithm tends to do that with teeth anyway, so she was told "show teeth the entire time" to make it seem like the algorithm was more correct.
That seems like a reasonable assumption, especially when the actor's previous work in the same franchise wasn't that overly dramatized, and this was a one-time, on-stage sort of deal. I could see them telling her to go big, since the capture will work with it, versus something realistic which might come across in the demo as "not working".
Not to fit what the algorithm is good at; rather, all actors that have done face capture regularly end up overdoing their facial movements, because the tech didn't capture it well if they didn't. Add to that the fact that she just has a very expressive face (she is the face capture actor for Senua in Hellblade: Senua's Sacrifice, and that character also sometimes has facial animations that feel uncanny, but it's really the actress doing that much).
My understanding of the uncanny valley is that it doesn't get better, it only gets worse!
It's not a given that it will happen, but you can get out of the valley. It's just that the effect is more disturbing as you get close (but not outside) the upper limit of the valley.
It's a valley, not a cliff.
Maybe because she isn't a professional actress
I'm not sure I follow. I don't particularly care why Juergens looks human, I care why the rendered animation looks less human than her.
I'm only referring to your first sentence
>Yes, and also: the actual actress felt a tiny bit uncomfortable, maybe because of the performance in front of a live audience, maybe because she could have had a slip of the tongue or whatever
> something about the way her mouth moves felt off
To be honest, the way her mouth moves in real life felt very off. The "(over-)acting" during the capture was very unnatural, likely down to the pressure of being on stage, as well as that just being a common problem for untrained actors doing anything scripted.
Would be interested to see how it handles a capture of someone just engaging in natural conversation or something else less dramatic & exaggerated.
I don't have a source for this; it's something I remember from looking into this technology years ago. I believe actors are instructed to over-act on purpose, because it's better in the end to ham it up a bit to help the tracking system capture fine detail. If the system loses the fine detail in your expression, the end result can look a lot worse and reduce the number of usable takes.
You get similar advice pretty much any time you get near a stage, because the audience is far away and can't see subtle movements. Also they're generally talking louder than normal.
I'm not sure that explains it all, but there are expected differences compared to how someone would move while talking normally.
It would be wonderful to allow overacting, to give the actor highly dynamic control over nuance "production", and then be able to dial back the overacting, automatically & manually, for generation.
> video-gamey feel
I agree. It's very impressive though, a clear leap forward.
Hellblade 2's demo is even worse though. It looks like the lips are completely disconnected from the gums and flap around loosely as they move.
https://www.youtube.com/watch?v=NCYMNmkjRS4
I think the problem is that in computer animation skin is allowed to stretch and compress in ways that are unnatural. Real skin is quite inelastic. It wrinkles and slides, which creates the appearance of elasticity. Also, when real materials are stretched or compressed, they tend to form wrinkles in particular ways. I think the human eye can tell these very small discrepancies and guess that this material is not human skin (or any physical material).
This exact problem was even called out in Kotaku's reporting on this. The exaggerated mouth movement really leans into the weaknesses of the engine. I'm surprised that they didn't notice this during production of the clip and adjust how the scene was acted accordingly.
There's definitely something uncanny about the lips. I feel like the edges of the face are also more static than I expect (especially when she makes dramatic facial expressions like the "angry" expression, which ought to pull the skin a bit).
Very video-gamey, not super convincing. But then again this is running a model in just a few seconds, with input from consumer hardware! Makes me wonder what you could do with more capable hardware (or if the iPhone hardware is really THAT good).
It does make me think of one excellent application for this software: scanning your own face to make an in-game character who looks exactly like you. I assume I'm not the only one who's labored over character generators for hours to produce a decent-looking character. I think there's a neat opportunity here to get video game players to scan their own face as input instead.
This demo doesn't demonstrate scanning your own face to make an in-game character though, just using your face to create the animation mesh. You then have to use an existing high-quality model to generate the final result.
Generating a high-quality model from a camera phone is a whole other can of worms.
Epic released the tooling for doing an iPhone face scan to create a mesh that can then be used to create a rigged Metahuman model about 6 months ago! https://www.unrealengine.com/en-US/blog/how-to-use-realityca...
My significant other works for Epic on the cinematics team so we've played around with Metahuman a bit. Scans we've done weren't immediately perfect (like the video on that linked page mentions, hair and ears don't scan well) but cleaning it up was basically just deleting those portions of the mesh, letting RealityCapture fill in the rest of the head for the model, and then adding similar looking hair from Metahuman's database.
So there's probably a little bit of work needed before we get "scan your head with your phone to create a video game character automatically", but if you're cool with requiring hair selection as opposed to auto-generating it from the scan, we're pretty much there.
I mean... It's a game engine tech demo by a video game company.
So yeah.
Maybe what you're getting at is "uncanny valley" - which I did get a little bit. But it's impressive how little I felt the valley for such a low-effort demo. I am sure with good actors and professional grade capture hardware this would be significantly less.
What's really exciting to me is that this tech lowers the bar. Indie game developers are soon going to have this sort of photorealistic quality to their characters.
There are definitely a few issues with it, especially in the skin stretch and squeeze around the lips, nose, and cheek areas. There is a lot of muscle and skin movement that’s supposed to happen, and rendering that needs time and good attention to detail with the materials.
But outside of that, there are the limits of rotoscoping coming into play. Human motion has a lot going on and our 3d captures aren’t anywhere near that kind of fidelity yet. Therefore we still need to rely on what the old masters of animation discovered when they were getting past the rotoscoped era. To make a performance more believable there has to be a little more acting and exaggeration brought in. The eyes can’t just get wider. They need to have anticipation, movement, secondary movement through other pieces of the face, etc. Same with the mouth. It’s not that the mouth can’t stretch in an abnormal way. It’s that it requires a little bit extra like a neck movement creating anticipation of an angry snarl.
As far as we have gotten along with mocap tech, when it comes to film, there’s a loooot of hard work that goes into fine tuning done by professional animators to make the performance of each actor look good when it’s translated to the 3d character.
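To make that "little bit extra" concrete, here's a toy version of the kind of polish pass an animator might apply to a captured curve: exaggerate it, and add a small anticipation dip before the main move. The curve data and parameters are invented; real cleanup is done by hand in an animation tool:

```python
import numpy as np

def exaggerate_with_anticipation(curve, gain=1.3, antic_frames=6, antic_amount=0.15):
    """Toy cleanup pass on a captured expression curve (weights in [0, 1]).

    - Exaggerate: scale the curve away from its resting value.
    - Anticipation: just before the biggest rise, dip slightly the other
      way so the move "winds up" the way hand-keyed animation does.
    """
    rest = curve[0]
    out = rest + (curve - rest) * gain                 # exaggeration
    peak_start = int(np.argmax(np.diff(out)))          # frame where the fastest rise begins
    lo = max(0, peak_start - antic_frames)
    dip = antic_amount * np.sin(np.linspace(0, np.pi, peak_start - lo))
    out[lo:peak_start] -= dip                          # anticipation dip
    return np.clip(out, 0.0, 1.0)

# Invented capture: an eyebrow raise ramping up over half a second at 30 fps.
raw = np.concatenate([np.full(15, 0.3), np.linspace(0.3, 0.8, 15), np.full(30, 0.8)])
polished = exaggerate_with_anticipation(raw)
print(raw[10:25].round(2))
print(polished[10:25].round(2))
```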
I agree. This is impressive, but the article calls it "almost indistinguishable from the original video" which is definitely not true.
Yes, but maybe if we want to smooth over those uncanny valley imperfections, we spend more than 5 minutes on our production and get a better result.
Seems promising if we allow for iterative improvements.
Mouth movements in particular tend to set off my uncanny valley senses too. I was recently playing God of War (2018), a very pretty game but when I paid close attention to the mouth movements as the characters speak it was noticeably low-fidelity (the movements are very coarse, and also had slight but perceptible rushing/dragging relative to the spoken audio). Compared to how good everything else looked, I found it slightly jarring.
In my case I just try not to focus too hard on any one thing, and that helps me get immersed.
I noticed the weird mouth movements right away. You would think this is an area where a GAN would help a lot. Just reject movement patterns of the mouth that don't seem to match videos of real humans, even if the AI thinks it is capturing that motion of the mouth.
From the article, it sounds like they are using a GAN. The system was trained on a large database of facial expressions/movements.
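If you want to picture what that kind of filter could look like, here's a minimal, hypothetical sketch of the idea from the comment above: score each solved mouth pose with a plausibility model and hold the last accepted pose when a frame looks implausible. The plausibility() stub, the blend-shape name, and the threshold are all invented for illustration; a real version would use a discriminator trained on footage of real speakers, and nothing here is claimed to be what Epic actually ships.

    # Hypothetical sketch of "reject implausible mouth motion":
    # score each solved pose and reuse the last plausible one otherwise.
    def plausibility(pose):
        # Stand-in for a discriminator trained on real speakers; here it just
        # penalizes an unrealistically wide jaw opening.
        return 1.0 - max(0.0, pose["jaw_open"] - 0.8) * 5.0

    def filter_poses(poses, threshold=0.5):
        filtered, last_good = [], poses[0]
        for pose in poses:
            if plausibility(pose) >= threshold:
                last_good = pose           # accept this frame's pose
            filtered.append(last_good)     # otherwise hold the last plausible pose
        return filtered

    frames = [{"jaw_open": 0.2}, {"jaw_open": 0.99}, {"jaw_open": 0.3}]
    print(filter_poses(frames))  # the 0.99 frame is replaced by the 0.2 pose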
> ...something about the way her mouth moves felt off. Am I the only one or am I just crazy?
No, you're not the only one. You may still be crazy.
No, you're completely right.
I was expecting something cutesy like Apple's animated animal avatar emoji things, but instead it's near-live generation of more of the creepy frozen-faced fizzogs I've seen in various videogame demos -- and in that rather strange Final Fantasy movie, 20 years or so back.
I think the uncanny valley sensation would go away if they tweaked the material settings in Unreal Engine. Like her lips don't look shiny enough to me and maybe there's some light bounce issues overall. But it's super close to being convincingly real.
I felt it too - but this seems like the kind of solution that will get you 90% of the way there, but still requires an animator to make it feel 'right'. This is insane though.
Her first smiling mouth gestures in the playback definitely looked like the mouth of Wallace (from the clay animation in Wallace and Gromit), not entirely human.
I wonder if we had the same feeling. I thought that the character's upper lip moved up too much during some of the more dramatic words.
I agree - the resulting look and feel just isn't quite right for a human, and it falls in the uncanny valley.
You can really see it when she closes the eyes, that particular part makes it look super video-gamey.
I think she was told to do it on purpose to exaggerate the effect on screen.
I had the same thing with the mouth... it used to be the eyes, now it's the mouth. The shifting uncanny valley.
The moment the actor asks "you only need to do this once for each actor?" seems to carry a subtext of questions about actors' future job security, and an odd combination of marketing the latest tech and a "wtf am I going to do" realization, packed into a single presentation.
I have no dog in this fight, but perhaps I can offer a contrarian position. I hope people are able to look past how this technology will disrupt the status quo into the future of how such technology will enable much better cinematic story-telling by those without large studio budgets. Artists of the future will have opportunities to use lower-cost talent to enable their visions that will open up the floodgates of creativity, no different than what was seen when YouTube took off.
Both actors and producers can benefit when access to talent is not held behind the gates of a few powerful entities. We've seen the damage such gatekeepers can wield with the Weinstein case and numerous subsequent claims, where vulnerable actors have their careers held hostage to avoid upsetting predators who wield all the power. If an actor can pay the bills by doing many short-term mocap jobs for independent producers armed with sophisticated software, everyone can benefit.
In addition, motion capture opens up opportunities for very talented but otherwise less conventionally attractive actors who traditionally are relegated to type-cast bit roles.
I consider the advent of this technology as a whole, given the current state of Hollywood, to be concerning. It will ultimately lead to a world where actors have their autonomy stripped away from them, even more so than they do now. Consider the status quo, whereby an actor's autonomy is somewhat determined by their star power. If you're trying to break into the industry, you have to put up and shut up with whatever you're given.
With the advent of motion capture like this, actors will be reduced to marionettes of flesh to be puppeted around as the producers see fit. Perhaps Hollywood is heading toward a future where actors no longer act in movies; instead their likeness is simply licensed to a studio for a certain number of films. Anyway, I'm a cynic when it comes to this; the industry itself is already plenty abusive and exploitative, and this could further that.
Perhaps like music there will be a split between money from 'recorded' performances and money from 'live'.
Theatre wasn't killed by films and TV, live music is still going despite recording.
Is that what happened to musicians after music production gear became cheaper and democratized?
Arguably this is the golden age for musicians. Lowest barriers to entry ever, if you have talent. You don't need anyone's permission to be a star ... just talent.
Why would it be different for actors?
I feel like you are not thinking about the full implications of this kind of democratizing tech.
As a self-admitted lousy musician and marginal-at-best amateur filmmaker, the last two decades of technological progress have unleashed a veritable Disneyland of capability I can access anytime 24/7. As amazing as the tools are, the ability to collaborate with other amateurs around the world and to instantly publish content to a global audience with no gatekeepers is equally transformative.
Perhaps I appreciate just how much we live in "the golden age" of personal creativity because I'm >50 yrs old and lived through saving from each paycheck to afford renting a high-quality camera for a day or an edit suite for a few hours (only between 12a-6a for the reduced rate). I'll just come out and say it, "Kids these days have no idea how amazing their world is." Now, excuse me while I go outside to yell at some clouds...
Yes I think there is definitely a positive angle. Hollywood's influence has been protected in large part by union influences controlling access to bankable actors, among other reasons. The possibilities for a surge of truly independent decentralized productions is promising.
That's not how the Hollywood talent unions work...
They don't control access; any production can hire any actor they want. The unions just negotiate a minimum wage for actors working on productions at major studios.
Your suggestion is horrific: you're suggesting that indie films should be allowed to profit off of someone's image without paying them for it.
I don't read it that way at all. You would no more be able to steal someone's image and pretend it was them than you could use a famous band's music in your soundtrack without paying for it.
As I see it, a budding filmmaker could use two or three actors to fill 20 different roles in a film without needing expensive prop and makeup artists to place them in detailed fantasy/sci-fi scenes.
Hopefully breaking away from the continuously lazy loop of people in capes that now dominate Hollywood productions.
Artists... don't you mean AI?
The next 10 years are going to be interesting. I wouldn't want to be a 16 year old trying to decide upon a career path.
My eldest is 10. In the past few weeks I've gone from "I'm pretty sure all my kids are gonna be OK" to "... well, better count on about two of the three living with us until they're in their 30s".
I was down on the future in general, but fairly optimistic about my kids' mid-term prospects. Not so much, now. Feels like being on the Titanic just as the deck's starting to noticeably tilt.
I'm telling my kids to get their tractor-trailer license. It'll be 30 years before trucks will be allowed to travel completely autonomously along the road, and in the meantime they'll have autopilot that removes most of the manual burden and allows them to get paid while they sit in the cab to write books, compose music or program applications.
I say it half in jest, but it'll be a long time before politicians allow 80,000lb trucks to wander our highways without a human responsible for it.
I heard it more as, can I as an actress just get to the part where I act, and skip the just-stand-there-for-an-hour part getting my face captured.
To draw a parallel: I don't know of many developers complaining when compilation times go down, although technically they're being paid to wait. The thing that's scary to developers is GitHub Copilot. But this actress here, she's still doing all the acting.
The goal for the game industry is to not need actresses or actors.
It's been increasingly common in indie game circles to buy libraries of mocap animations and lately facial expressions. Then things like Blender and Unity have been working on amazing solutions to merge various mocaps together.
Several multi-million-dollar Unity games used that process successfully as far back as 5 years ago, filling in the gaps with tech like the Unity asset "puppet face", which works surprisingly well with just a webcam. Unity face capture on iOS is also fantastic.
Many companies don't pay ML engineers/researchers for the time spent training an ML model anymore.
I don’t understand this comment. If my model is training for 2 days I expect to be doing other work while I wait for the results. Are/were there engineers/researchers that expect to get paid to do nothing while they wait for paint to dry?
Yes, many of them used the training time for quiet deep thinking without worrying about getting paid.
If they’re doing “deep thinking” about work then it’s literally work. The clock doesn’t stop just because you’re not typing.
Also, many of these positions are salaried so they aren’t actively tracking hours.
Imagine you are an ML consultant with an activity tracker installed. The company is only going to pay you for the short fragments of time you are typing the source code and launching subtasks, but not for the duration of those tasks if your activity tracker shows no activity on your side. I had a client who wanted to negotiate two rates - one for the active part and another one for the training time part (barely covering AWS costs). If you think this is rare then you might be well-insulated from the market.
> I had a client who wanted to negotiate two rates - one for the active part and another one for the training time part (barely covering AWS costs).
This doesn’t make any sense. Training time is billed as a pass-through cost (or cost-plus), not at an hourly rate for the person who started the task.
I don’t know anyone who would have gotten away with billing AWS training time at their own hourly rate as if they were working. That’s ridiculous. Would be hilarious to see someone submit 24-hour days for long training runs though.
I can understand that for tasks outside working hours, but not during working hours, where many companies try to do activity-tracker-based payments as if the consultant didn't already have time allocated to working on their tasks.
I think stuff like this further reinforces the value of actors over the animators. Sure they only have to scan an actor once to get a metahuman model, but they need the actor to give the actual performance.
Instead of one over the other, what might happen is the role of mo-cap actor and animator become a single role. Animators and actors both create expressions and poses of the character. Animators already film themselves in poses for inspiration.
In the future, a single person might end up taking their own mo-cap video as well as polishing up the animations for several characters.
Reminds me of how people just shoot their own podcasts now. It's just become that easy. At some point you might add to the team, but if you want to just try it, you can.
Animators are still needed, because not all motions can be captured.
It was probably a similar feeling to how stage actors felt when they heard that someone created a new-fangled invention called the 'video camera'
"You mean I only have to perform the play one time, and people can watch me perform it over and over....for free?"
Actors already can license their digital likeness for royalties. They would have to set this up in the same way. https://www.indiewire.com/2022/10/bruce-willis-sells-likenes...
Classic slacker news with all the negativity.
This is incredible, wow! Stuff like this is why I got into tech in the first place, indistinguishable from magic.
1. Wow a thing, people seem to be excited about it.
2. Are people excited about my thing? Am I excited about it?
3. Have I made the wrong life choices?
4. This makes me feel uncomfortable.
5. If I can defend my life choices I can feel good again.
6. I will downplay this thing, that will resolve my uncomfortable feelings.
I think it's amazing as well, but we need to understand the caveats.
The videogame industry is notorious for over-promising and under-delivering, game engines included. Unreal's Nanite and Lumen seemed amazing and had great pitches/tech demos, but in practice they're . . . not great. Frame rates drop constantly, and the complexity they introduced into the game-making workflow is heavy.
Several medium-sized companies have even dropped Nanite and Lumen to go back to the old way of doing things.
Game Engine companies are notorious for leaving out important details about new features.
>not great
Is there any recommended reading/ranting on this?
Beyond social media (Reddit, Twitter, IGDA forums), not really.
A lot of this borders on "competitive intelligence", and game development is only getting more competitive, so you typically need a friend to get you an invite into various private Discord servers.
I'm learning game dev in my spare time, and these private Discords are critical for learning how not to waste time on various "over-promised" techs and to focus on results.
Unreal (and Unity) are no longer just for video games ;)
Agreed it's incredible, and merits tons of positive feedback. Full stop.
Also -- and this really is just a question, not a nitpicking criticism -- when it comes to crossing the uncanny valley, am I right that mouths and teeth, specifically, seem particularly hard to get right? Lips and subtle facial expressions make sense to me as challenging areas, but it surprises me a bit that teeth aren't typically rendered better. This applies to full-studio efforts, not just Epic's new tech.
Yeah, I noticed it isn't perfect by any means, but leaps and bounds better than most stuff I've seen.
And any studio worth its salt would spend time fixing up the tiny inconsistencies; my understanding is this was meant to show it as raw as possible.
No kidding! What a bunch of joyless grumps. This tech looks amazing and like it will bring mocap to the game development masses.
> This is incredible
It is video tracking of 30 points on the face, with those points then extrapolated onto a model. Seems like 20-year-old tech, tbh.
20 years ago Peter Jackson had a whole team to do it, across months, with Andy Serkis in a specialized suit, on a specialized camera rig, with amazing lighting and green screens. All to map to a single animation model, Gollum.
This was done on an iPhone, in subpar conditions, by 2 people, with regular clothes and background, for any/all humanoid models, in under 90 seconds!!!
Amazing!!! It’s the only word for it and it doesn’t do the accomplishment justice.
Not to dispute this, but just to suggest that maybe the improvement is "only" four orders of magnitude instead of six, or something: didn't the Gollum animation do full-body motion capture and transfer, and isn't this demo only for the face?
> 20 years ago Peter Jackson had a whole team to do it, across months, with Andy Serkis in a specialized suit, on a specialized camera rig, with amazing lighting and green screens. All to map to a single animation model, Gollum.
... and to do it outside while doing full-body motion capture while splashing around in streams next to other live actors on film.
It's like comparing music festival hardware with home streamer equipment. Home streamer tech can sound significantly better, for a thousandth of the cost... because you can put up acoustic tiles.
It's a big achievement, but wild comparisons don't really do it any favors.
With a mobile phone camera without a suit or markers on the face. This makes motion capture available to you and me without renting or building million dollar studios.
I think people are noticing the uncanny valley of these animation demos, but I was really struck by the fact that one good character actor could generate a lot of great animations that could be used across tons of models. I feel like this sort of tech mixed with recent breakthrough in AI is going to open up really crazy RPG open world games in the next few years.
Pretty cool. Maybe someone more knowledgeable can explain why this is different than existing facial capture techniques? Is it just the same stuff repackaged into a quick and user friendly package?
You can do it on an iPhone with no additional equipment and can apply it to a MetaHuman model, which is a relatively high quality human model that epic has tools to generate.
It’s worth noting that both the FaceCap and Moves By Maxon apps can also capture facial animation using only an iPhone with no additional equipment.
It’s hard to tell from the video, but I wouldn’t be surprised if they used their face capture animation to drive wrinkle maps[1] on their MetaHuman models. This is something that other apps don’t offer out of the box, but which can significantly increase realism.
[1] https://m.youtube.com/watch?v=nydjtjIncSk
What do you do with those app outputs though? I’m guessing it still feels hard because it won’t automatically map to a mesh that is good to go in your game engine?
The MetaHumans drop in models are the thing that makes this magic imo. It’s not the motion capture tech so much as the complete pipeline to produce game assets.
FaceCap gives you an FBX animated with about 50 different morph targets. If you wanted to use that in a game, you’d probably need to load all the morph targets into a shader yourself.
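For anyone wondering what "loading the morph targets into a shader yourself" involves, the core math is just weighted blending of vertex offsets. Here's a minimal CPU-side sketch; the array shapes, the function name, and the ~50-target count are assumptions for illustration, and a real engine would do this on the GPU with the app's actual export format.

    import numpy as np

    # Minimal blend-shape (morph target) mixing.
    # base:    (V, 3) rest-pose vertex positions
    # targets: (T, V, 3) morph-target vertex positions
    # weights: (T,) per-frame weights from the capture data, typically in [0, 1]
    def blend_shapes(base, targets, weights):
        offsets = targets - base                 # each target's offset from the rest pose
        return base + np.tensordot(weights, offsets, axes=1)

    # Toy example: one vertex, two targets (e.g. "jaw open", "smile")
    base = np.array([[0.0, 0.0, 0.0]])
    targets = np.array([[[1.0, 0.0, 0.0]],
                        [[0.0, 1.0, 0.0]]])
    print(blend_shapes(base, targets, np.array([0.5, 0.25])))  # -> [[0.5  0.25 0.  ]]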
I agree that the automatic integration with MetaHuman is the main benefit.
Actually, you’ve already been able to do that for a few years now with Epic’s LiveLinkFace app (which was basically just a wrapper around Apple’s ARKit). This is just their own, much higher quality, version of it.
I’ve played around with it (the initial version). I didn’t think LiveLink was good enough personally.
I keep seeing people say this without explaining what the huge thing strapped to the actor's face is, that clearly isn't a phone. Not explained in the article either.
I’m not sure what you mean. There’s one shot in the video where they show a full motion rig and clearly explain that this system works with both a phone and/or a professional stereo system.
I’m guessing that is for when you want to move your body around and capture the facial expressions associated with doing so. You need some sort of rig to keep the camera in your face if you’re moving your face.
This Bebylon presentation at SIGGRAPH 2018 is an example of the (now much older) way of doing the same iPhone face capture.
https://www.youtube.com/watch?v=lXZhgkNFGfM
The detail on the new Metahuman face capture is visibly better, and a smoother capture pipeline has been a valuable goal for some time. Noisy mocap, and re-rigging animations to share them between models, take time.
[Quote from article] The algorithm uses a "semantic space solution" that Mastilovic said guarantees the resulting animation "will always work the same in any face logic... it just doesn't break when you move it onto something else."
A few differences compared to other commercial iPhone-based capture techniques:
- Whole-clip solve instead of noisy frame-by-frame streaming
- Automatically generating a face model to calibrate the rig
- Solving from raw sensor data instead of Apple ARKit pose data
VFX studios typically develop proprietary solutions a few years in advance, so it can be hard to say what is truly "new and different."
>why this is different than existing facial capture techniques
In the past, it was common to use video capture with reference markers on the actor so you could figure out how they were moving in three dimensions.
From this demo, it looks like they are directly leveraging the 3D data from the iPhone's Lidar sensor in addition to the video camera data.
Right but that's been available for a while through various apps.
This is really impressive.
I think part of the issue with building the 'metaverse' is that content generation is just so incredibly labour-intensive.
I think visual AI generation and this kind of tech will really help a ton.
To me it's weird that UE5 was originally revealed almost 3 years ago and there still aren't, afaik, any games using it (besides Fortnite), despite ease of development being one of UE5's promises. I realize that the cycle times for games are long, but this still feels pretty extreme.
No, that’s exactly it. Game development cycles are long. UE5 was officially released just 1 year ago. Projects take on risk if they are among the very first adopters of a new engine, since they end up dealing with all the teething issues. Dozens of UE5 games are in active development.
It officially launched last April...
There has been a lot of underutilized tech in the entertainment space. The summer blockbusters and AAA games later this decade might truly be a hair-raising experience.
From what I understand, the real value add of this is getting motion capture close enough with a solid facial model, and being able to tweak facial animations from a few seconds of acting, with parameters automatically mapped. It's not meant to be an instant coffee animation.
Isn't instant coffee the next logical progression in this tech roadmap?
Yes, but I'm noticing a lot of people complaining that it doesn't immediately look like a real person.
Very impressive demo and use of the iPhone's depth camera. I suppose as long as FaceID exists there's no reason to try to use only RGB over RGB-D, though I am curious if they'll bring it to Android phones that have the required hardware like the Pixel.
Acting will become remote work soon the way voice acting is. This’ll be a big deal for small creators and create a lot of acting opportunities.
Uncanny valley effect is still there.
The effort of replicating real life is unbelievable, but... I prefer stylized, artistically enhanced virtual reality.
What's amazing is that given another 10 years, with vastly superior hardware, this can happen in basically realtime.
Photorealism in CG is a milestone which is there, and can be strived for. It may be a necessary step to get through before interest in more interpretive styles truly takes off.
Analogous to how modern art in painting didn't really take off until photography had made photorealism attainable and no longer interesting for painters.
That's amazing. But there's still something discernibly ... off about it. But so so close.
I wonder if this sort of thing will become trivial w/ generative models like gans/llms and such. To paraphrase Arnold...If you can parameterize (facial expressions - already not too hard w/ motion capture software), you can generate it?
I can't wait to see what Corridor Crew does with this
This could really help low budget games as well. One person could do all the animations for low-poly models with text based dialog or generated voices.
This type of tech should be what the billions spent on the 'Metaverse' should be focusing on, not the creepy Wiimoji torso things.
Neat. A few years away from using Fiverr to outsource the acting to a developing country while face animation and voice get synthesized.
I hope this kind of thing will come to unity too. It's just a lot more accessible to beginners. And easier to work with for VR.
The demo on that link. State of Unreal. There’s something unnatural about the facial movement… it seems overly exaggerated and forced.
Slightly off topic, but does anyone know of any tools that convert realtime audio to lip animation for a 3d rig?
On Blender, the cheap answer used to be the Rhubarb addon, but there's probably something better now. It will also probably depend on the rig and other accumulated bits.
Nvidia Omniverse Audio2Face perhaps?
This would have big implications for AR/Google Glasses if the processing can be packaged like that.
Now there's no excuse to not make an L.A. Noire sequel.
Why not capture the hair motion as well? Let her have hair in front and move it aside by quickly turning her head and blowing it.
Because hair is a whole other complicated subject.
Is this already released for people to try out?
Yet another great technology that does nothing for gameplay and only furthers moviefication.
You're being downvoted but I feel the same exact way. I've been playing video games since pong, and through the years the 3D effects have really been used to bring the story to life. These days, the story is used to drive the 3D effects, and I'm just not here for it. I could already start to feel it in the Final Fantasy 7 days - like, yeah, it's really neat that the characters are 3D now for the first time, but why am I sitting through this 10 minute "Knights of the Round" casting animation every single battle? Let me get back to the things I found fun about the Final Fantasy series, which was fine in 2D pixel art.
And yet, for some reason, the industry ran with those cutscenes. They got worse in FF8, and today we see the end result. I mean, they really gave it away when they made Final Fantasy: The Spirits Within and made such a big deal about how they did the hair. The whole movie was just to showcase their hair model.
So now here we are in 2023, and we are playing the same games over and over again, but with better graphics every year. I'm sure this face capture tech or similar will make its way into all the next AAA games including the next Final Fantasy, which will have amazing hair and clothes and eyes and facial movements, but it will be the same game that's been made for 30 years.
Thankfully, not all is lost.
The best game released in decades is, hands down in my opinion, Dwarf Fortress, which took the graphics budget and put it toward innovative gameplay. Then there's Factorio, which similarly isn't using AAA graphics and could be called "retro". Of course Minecraft was a huge hit, and it had FFF graphics, only proving that the graphics are next to meaningless if the gameplay is fun enough.
Today there's a whole retro-gaming movement dedicated to breathing new life into consoles like the NES, embracing the graphical limitations of the platform to focus on innovative gameplay and rich storytelling. The indie game scene is pretty strong, to the point of being oversaturated really. I doubt it will cool down, though, with all these new AI technologies; now more than ever it's possible to take a game idea all the way to fruition with an AI helper generating assets for you.
If it's slow, costly, and hard to animate characters, only cutscenes—where the animated characters are the focus—will have well-animated characters. But if it's fast, cheap, and easy, then even the incidental “scenery” NPCs can have nice animations, making the game world seem more alive.
Why are there characters that need animating? I want to shoot things in hyperbolic space. I want to simulate an entire landscape. I want to be a tree.
I disagree. I'm completely tired of mannequin NPCs. Anything that will make it easier for NPCs to behave like living things is a win in my book.
You're right, all games need precise motion capture and photorealistic models and the concept of "NPCs", otherwise how could you even call them games? Like this ugly mess for example, ugh: https://i.kym-cdn.com/photos/images/original/001/574/413/900...
I wanna see a skyrim mod of this now!
Also, I don't appreciate you belittling my opinion. I like NPCs that can look at you. If they have mouths and eyes that can move, then I want them to.
I'm also not a fan of the interactive movie genre. It's a weird niche, and I don't really know who it serves.
Two decades ago, I would have generalized it to "console gamers", but I'm not so sure after the God of War / Elden Ring GOTY vote.
We’re getting there. Pair ChatGPT with this kind of technology, and you have a very impressive AI assistant.
I'd be interested to know, for the people who find this convincingly human-like, who are impressed, whether you're on the autism/Asperger's spectrum.
I understand people on the spectrum can have difficulty reading facial expressions, so I'm curious if this is close enough for them that it appears real, or sufficiently human.
As someone not on the spectrum, I can tell you that it's immediately and obviously digital and fake, and it gives me an uneasy feeling. It's that "uncanny valley" thing. So I just wonder whether the presenters/creators are on the spectrum, as they felt confident enough to showcase it and pass it off as sufficiently novel or convincing.
Well, I'm not diagnosed with any of that, and most people would describe me as popular and friendly in real life, so it seems unlikely that I'm undiagnosed but ill.
And I thought it was bloody amazing. In such a short time, highly precise motion capture. This is tremendous technology. Looks fantastic!
> pass it off as sufficiently novel or convincing.
IMHO, that's not the central point. It's notable not because it's super convincing but because it's convincing enough to be creatively useful while being incredibly fast, reasonably inexpensive and relatively easy.
AAA game studios have already been able to reach similar (and even better) results for years. It's just super expensive and labor intensive. This demo is exciting because of how it can empower small game studios and even ambitious indies.