Hacker News Comments on
Face2Face: Real-time Face Capture and Reenactment of RGB Videos (CVPR 2016 Oral)
Matthias Niessner · YouTube · 172 HN points · 27 HN comments
Hacker News Stories and Comments
All the comments and stories posted to Hacker News that reference this video.
Founder here. AMA :)
To answer a few recurring questions in the thread:
---> Use case.
Video is a far more effective way to communicate than text. Not for the HN crowd, but if you're a blue-collar worker, a 2-minute video in your native language is much preferred to a 5-page PDF for training.
Anyone who has tried to record a simple corporate video knows the pain of cameras, film crews, 25 takes to get one that works, and post-production. Cumbersome, slow, and multidisciplinary. By the time the video is done, the content is out of date.
Synthetic video is not yet at the quality of real video. Eventually it will be. But the mistake many are making here is comparing it to real video; it should be compared with text.
In X years we'll be able to make Hollywood films on a laptop without needing anything but time and imagination. Just like we can digitally compose music in Ableton, create images in Photoshop and type novels on keyboards rather than with pen and paper.
My (obviously biased ;)) belief is that synthetic media will eventually become a foundational technology that moves media production from cameras/microphones to APIs. We'll be able to do all kinds of things we couldn't do before.
E.g. personalized and interactive rich media, video-driven chatbots, and eventually Hollywood blockbusters made by your favourite YouTuber from his or her bedroom.
---> Uncanny valley
Simulating real video is incredibly hard. We're constantly improving and launching more expressive synthesis soon.
From our tests with some of our largest clients, 8 out of 10 people don't realise it's a synthetic video (unless they are asked to look for it).
---> Tech
Has been developed over the last 3 yrs. Origins/team from Stanford/UCL/TUM.
Learning: going from research to a working, scalable product is hard and takes time. But very rewarding when it works.
[1] https://www.youtube.com/watch?v=ohmajJTcpNk [2] https://www.youtube.com/watch?v=qc5P2bvfl44
---> Bad uses
Bad actors will do bad things with synthetic media. Like with any other technology from smartphones to cars. We're moderating all content and building safeguards and verification + working with FAANG and others on detection and provenance technology.
Recommended read - deepfakes perfectly follow the story arc of any new, powerful technology: https://journals.sagepub.com/doi/full/10.1177/17456916209193...
---> Actors
Real actors get a revenue share plus an upfront fee for every video generated with their likeness. Like being a stock-photo actor.
⬐ devinplatt: The Snoop Dogg advertisement rebranding case study was pretty impressive to me, since there were obvious savings from reuse. Neat to see how this technology could be integrated in a subtle way with other editing techniques.
It seems to me that this technology could have immediate application to dubbing over curse words in movies (since that's already done in a not-so-subtle way today).
The next step I see in that progression is full dubbing for translation, which already exists in a very conspicuous form. The old meme about out-of-sync karate-movie dubs comes to mind.
How close do you think this technology is to being used for syncing lips in Hollywood-tier movie dubs with real voice actors? What are the main obstacles left to achieving that?
Founder here. Maybe – one of my co-founders is Prof. Matthias Niessner, who's been behind a large chunk of the seminal and widespread research in this space.
[1] https://www.youtube.com/watch?v=ohmajJTcpNk [2] https://www.youtube.com/watch?v=qc5P2bvfl44
https://www.youtube.com/watch?v=ohmajJTcpNk
Face2Face would defeat this: anyone with enough images of the victim (and a copy of their ID) could impersonate them in this scheme.
Photo, audio, and video evidence should already be dismissed until one is able to verify the integrity and source. All of these can already be believably faked - it's just a matter of educating people that a layperson can easily create fake things using tools developed by research teams.
Text is the easiest to fake if you can identify the font used - any image editor will work. HN uses 9pt Verdana; even without using dev tools I could fake your post to say anything I wanted, since it would just be 9pt Verdana on a solid background set to wrap every 1050px.
See: https://www.youtube.com/watch?v=ohmajJTcpNk & https://www.youtube.com/watch?v=AmUC4m6w1wo
⬐ marcus_holmes: Not even that much effort - just open the browser's dev tools and change the text in the post to say whatever you like.
⬐ Nadya: I'm aware - but I specifically excluded dev tools because faking text in scenarios where dev tools may not exist (e.g. chat programs that aren't taking place within the browser) is still trivial.
Here's another scary GAN proof of concept [0]. In this case, researchers mapped the facial expressions and mouth movements of one person onto public figures in real time. Combined with DeepMind's new tech that seems able to produce human speech with believable candor and inflection [1], you could make some very convincing fake footage.
[0] https://www.youtube.com/watch?v=ohmajJTcpNk
[1] https://research.googleblog.com/2018/03/expressive-speech-sy...
We're already at the stage where one can't trust video evidence unless it's backed by the camera's signature/encryption.
Creating a very realistic fake is now trivial:
https://www.youtube.com/watch?v=ohmajJTcpNk
Google is very close to synthesizing realistic voice.
It's game over, as far as I can see.
It's only a matter of time before someone creates a fake video of someone famous saying something outrageous, like Nazi propaganda, and it results in the destruction of that person's career and life.
We really need something like Secure Enclave in every camera.
EDIT: Another related video:
https://www.youtube.com/watch?v=nsuAQcvafCs
⬐ Retr0spectrum: I'm not sure in-camera signatures are the way to go.
In the very simplest case, someone could just point a camera at a very high-quality screen and record that, generating a signed video.
A more complex attack would be to effectively emulate the image sensor and pipe image data straight into the camera.
If you want to prove that a video was filmed on or before a certain time, one way would be to hash it and put that hash on a blockchain, but that doesn't really solve the problem of authenticity.
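The hash-then-timestamp idea the comment describes is easy to sketch. Below is a minimal, hedged illustration in Python: only the digest computation is shown, the actual submission of that digest to a blockchain or other timestamping service is omitted, and `clip.mp4` is a hypothetical file name.

```python
import hashlib

def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a (possibly large) video file,
    reading it in chunks so the whole file never sits in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Publishing file_digest("clip.mp4") somewhere append-only would prove the
# file existed at that time; as noted above, it proves nothing about how
# the footage was produced.
```

As the comment points out, this establishes existence at a point in time, not authenticity.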
⬐ drdeca: What if the camera also included GPS and timestamp information? That might make it harder to fake, because you would have to be roughly in the location where you claim the footage occurred. Are the signals from the GPS satellites cryptographically signed?
Of course, the cameras would need very good physical security, so that a person can't extract the private key or replace the camera sensor with something that just feeds in the data they want and still gets it signed with the key.
I think the design that went into the ORWL PC (which would quickly delete the private key if it detected tampering) might be good for this.
In order for one of these to be trusted though, there would have to be a trusted source demonstrating that it was constructed and configured correctly (rather than in a way that would allow faking). Maybe by having the construction and setup be recorded with other cameras of the same type which are already trusted, in a web of trust sort of thing? If one had enough of these cameras I don't think that the bootstrapping of the chain of trust would be too difficult.
⬐ slig
⬐ devdoomari: It's pretty easy to fake a GPS signal with the proper equipment. I remember seeing a video of a guy faking the GPS signal in order to cheat at Pokemon Go.
⬐ drdeca: Aw dang, ok. But if the GPS satellites cryptographically signed their messages, would that help much? And would it be all that much of a cost for future GPS satellites to sign their messages?
Maybe 3D cameras can help? Anyway, this forgery stuff is getting scary...
⬐ olegkikin: Try it. Record something off the screen (including the audio) and see what happens. I have yet to see one example of that which looks like reality.
Emulating a sensor sounds like it would immediately reduce the number of perpetrators by orders of magnitude.
I agree, though, that what I'm suggesting is not 100% bulletproof. But it uses proven technology, and it's relatively simple, assuming hardware manufacturers are willing to add one small chip to their cameras.
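The "one small chip" idea amounts to signing data as it leaves the sensor. Here is a rough sketch, with the heavy caveat that it uses a stdlib HMAC and a hard-coded key as a stand-in: a real camera would hold an asymmetric private key in tamper-resistant hardware so that anyone could verify the signature without knowing any secret.

```python
import hashlib
import hmac

# Hypothetical per-device secret; in a real design this would be an
# asymmetric private key living in a secure element, never extractable.
DEVICE_KEY = b"example-device-key"

def sign_capture(sensor_bytes: bytes) -> bytes:
    """Camera side: tag the raw bytes at the moment of capture."""
    return hmac.new(DEVICE_KEY, sensor_bytes, hashlib.sha256).digest()

def verify_capture(sensor_bytes: bytes, tag: bytes) -> bool:
    """Verifier side: constant-time check that the bytes are unmodified."""
    expected = hmac.new(DEVICE_KEY, sensor_bytes, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```

Note that this only proves which device produced the bytes, not what was in front of the lens, so the point-a-camera-at-a-screen attack discussed above still applies.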
⬐ asciimo: I've seen a number of YouTube and LiveLeak videos that were obviously made by recording a security monitor with a phone. It seems to be an acceptable practice for certain genres, and is usually credible.
⬐ olegkikin
⬐ CM30: That's not his point. He is claiming he can record a fake video that plays on a screen, and the resulting recording will look like a recording of reality, not of the screen (and the camera will sign it as such).
⬐ gregmac: Which only makes things worse, because if you are trying to create fake security-camera footage, you just need to make it real enough to play back on a "security monitor" and record that with a phone camera.
If you're pretending it's footage of a film, TV show, or video game, then recording from the screen will fool a lot of people. And while in a lot of cases that would make the 'metadata' aspect useless anyway, there's always the possibility of a hoaxer saying:
1. This was recorded from a TV broadcast
2. Or a CCTV camera
Etc.
> Seems from right out of the gate, they are breaking their own ethical guidelines as a cheap promotional tactic. If they care that little about themselves and a former president of the United States, what do they care about your likeness.
We state in our blog post that we make an exception for Obama/Trump in order to raise public awareness. Both of them are regularly used in machine-learning benchmarks (for example [0] [1]). Note that we don't allow users to generate from Trump's or Obama's voice.
Once again, we care a lot about these issues, and that's why we only allow users to copy their own voice.
[0] http://www.washington.edu/news/2017/07/11/lip-syncing-obama-... [1] https://www.youtube.com/watch?v=ohmajJTcpNk
These issues are challenging and suggestions about how you think the technology should be introduced/regulated are very welcome.
⬐ slackstation: It's still hypocritical and insulting to the reader's intelligence.
You could make Obama say anything. He could say something humorous, something he's never said before. You would have had just as impressive a demo if you had Obama say "I'm a little teapot, short and stout..." and then used overlay text to promote yourself. You chose instead to make a video where he promotes your startup.
That is both hypocritical and immoral, using not only his personal likeness but also the seat of the Presidency of the United States.
This fast-and-loose way that Lyrebird treats its technology only makes me think that they don't really consider the massive negative potential of the technology and just want to reach scale/profitability as fast as possible.
Face2Face: Real-time Face Capture and Reenactment of RGB Videos (CVPR 2016 Oral)
https://www.youtube.com/watch?v=ohmajJTcpNk
And I agree, I find this much more believable.
This article's title is a bit off. The audio is not generated, the video is not convincing, and the original conversation is not fake.
This is an art project which highlights the implications of research like Face2Face: https://youtu.be/ohmajJTcpNk
⬐ gpvos: Someone has changed the title now, which is a good thing, because the original clickbaity title wasn't very descriptive at all, and the new one is actually much more interesting.
⬐ olivermarks: Glad you posted this, I was about to look for this same link to share. Between this and Adobe's voice-emulation software https://thenextweb.com/apps/2016/11/04/adobes-upcoming-audio... it appears possible to create very realistic 'footage' of constructed scenes of real people doing and saying things. The film industry already creates computer-generated characters of deceased actors (Star Wars, for example). This makes the Economist piece seem very out of touch with reality, and presumably there are far more sophisticated technologies we don't know about too...
Check out Real-time Face Capture and Reenactment:
Combined with Face2Face[1] live video impersonation, it is truly time to be very careful verifying videos or even live streams.
⬐ knowaveragejoe: To my knowledge, both of these particular projects are still a ways away from being used in any practical sense, let alone succeeding at deceiving anyone.
You are right that we'll have to worry about this soon, though. Likewise, verifying the identity of people we think we're talking to over video calls, for example.
⬐ anigbrowl
⬐ andy_ppp: I would use this (the audio tech from the OP) for some edge cases in film production right now. It would also be easy to combine this with Twilio and a chatbot to scam people over the phone.
Woah, reminds me of Total Recall for some reason... it looks like a special effect from the '80s when actual speaking occurs, but it's very close!
⬐ NTripleOne
⬐ fokinsean: Okay, so on an ever-so-slightly related note, I've always wondered this ever since I saw that movie as a kid... Is it normal to feel bad for the Johnnycab "driver" when Arnie destroys it?
Woah, that's kinda scary. What could we do to determine if a video is legitimate or not?
⬐ volkk
⬐ netcraft: Mainly, practice critical thinking. Don't take anything at face value until it has been reconfirmed from many sources. At least that's what I do.
⬐ loader: The problem isn't with the ones who already think critically.
⬐ mikeleeorg: This is going to be increasingly key. And even then, it will be very difficult.
Books like "Trust Me, I'm Lying" reveal the lengths to which deception can go. Though the book discusses deception that starts at the textual level (e.g. blogs), it is inevitable that these tactics will be translated to the video level once the technology catches up.
Also, "at face value" - ha! ;)
⬐ tdeck: I love critical thinking as much as the next person, but I always find statements like this to be smug, self-congratulatory cliches. Of course you take things at face value; we all do. Every waking hour we're getting new information and having to make sense of it while still living our lives. It's not practical for anyone to pretend that every interaction can be rigorously confirmed and independently verified, which means the proliferation of convenient, effective mechanisms for lying and deception should be of real concern to all of us. No one is such a great critical thinker that they're immune, and it's particularly dangerous when our few reliable avenues of verifying identity and provenance are about to be cut off.
Just wait till _The Daily Show_ and _Last Week Tonight_ get a hold of this!
⬐ JustinAiken
⬐ ericfrederich: ...then they'll finally be able to play audio of Republicans contradicting themselves! :p
⬐ calimac: Yes!!! And video of Democrats like Maxine Waters et al. speaking in coherent, logical sentences.
⬐ sidarape: No need for that, they're already doing fine.
⬐ divanvisagie: That's the joke.
Awesome... Not sure if the voice thing can be done in real time yet, but you're right... the combination of these two would be awesome.
⬐ red023
⬐ nihonde: Holy shit, this is crazy!
⬐ red023: Yeah, vote me down, coward. Because I used an "evil" forbidden word. I even used it in a positive context to show how amazed I am by this facial manipulation (much, much more than by the voice thing). Flag me, ban me, I do not fucking care. I can make another account.
⬐ Mz: If you are referring to the word "shit," it is not forbidden here and is not likely the reason you were downvoted. I have a terrible potty mouth. I try to keep it PG-13ish online, but if I am tired or something, the way I actually talk tends to come out.
My tendency to use the F word like other people use "very" does not appear to be in any way problematic per se. I suggest you rethink your assessment of what is happening here.
⬐ grzm: You were likely downvoted more because your first comment doesn't add anything substantive to the discussion, rather than for the language you used. As the guidelines ask, please don't comment on being downvoted, as it makes for boring reading. And doing so in the manner you did is definitely uncalled for.
Without a doubt, our concept of personal identity will be completely unreliable within a few generations. Forget about privacy - we will soon have literally no way to verify who we're talking to.
⬐ forgotpwtomain: Pelevin's novel 'Generation П' is a very interesting read on this kind of theme.
⬐ mkay581: No different than now, right? Technically there is no way (practically, anyway) to identify someone you're talking to over the phone, for instance.
⬐ d33: If you have privacy, faces or sounds might not matter as much as content does - if you have common secrets, you have a way to identify a person.
⬐ proaralyst: Crypto would still work, and this tech isn't going to work face-to-face.
⬐ anigbrowl: Neither will insulate you from a deception you wish to perpetrate upon yourself, and inducing the latter is a trick that con artists specialize in.
https://www.youtube.com/watch?v=ohmajJTcpNk
hf
⬐ i336_: Hi. This is your first and only post, you have no submissions or favorites, and your account is 193 days old.
I'm very curious (and perplexed) as to why you have linked a video from elsewhere in this thread with no supporting context regarding its relevance other than "hf".
Already done: https://www.youtube.com/watch?v=ohmajJTcpNk
⬐ froindt: Can you imagine this being used for the next generation of cyberbullying? It could get super messy in high schools.
Alice broke up with Bob. Bob grabs YouTube videos of Alice and builds video and voice profiles. Bob then posts a video of Alice saying how breaking up was her biggest mistake and how she misses <list of every sexual thing you can think of> because Bob does it all best.
That could end badly really easily.
⬐ brango: Wow, that's amazing. Combine the two and soon Hollywood stars will be redundant. Faceless session actors could just manipulate models of real people who've signed release forms, with the vocals performed by similarly faceless voice artists, or maybe even AI-generated voices. Actors would lose their uniqueness and end up being paid a pittance instead of being able to command the vast sums they can today. Another set of jobs soon to be made redundant by the rise of technology.
⬐ throwaway29292: I wouldn't be comfortable watching a movie scene if I knew I was looking at computer-generated faces and voices.
⬐ TuringNYC
⬐ MayeulC: Did you feel that way when seeing Grand Moff Tarkin in "Rogue One"? https://www.wired.com/video/2017/02/how-rogue-one-recreated-...
⬐ ygjb
⬐ techdragon: Yep, fell right into the uncanny valley.
⬐ ClassyJacket: Yes? I missed literally all his dialogue because he was so poorly animated I couldn't take my mind off it. So out of place and jarring.
Are you comfortable with Auto-Tune in music - not the exaggerated T-Pain style, but the nearly universal application of Auto-Tune to recordings and live performances to ensure a "consistent product" and "save on expensive studio time"? Because the market appears to have spoken on that one, and it said "meh, I don't care" with a solid shrug of indifference.
By the same logic, one can see artificially produced vocal performances combined with artificial overlays of photorealistic 3D reproduction as a way to cost-effectively minimise performer and crew expenses and ensure the consistency of a performance. The results may even be better than with the real performer, in the case of some attractive actors who are not very good at the acting part of being an actor/modern celebrity.
That said, I'll definitely miss the days when people had to actually be able to act, but then again I also miss the days when people had to be able to play an instrument or sing well to become a famous musician.
⬐ kastnerkyle: Japan, as usual, is ahead of the game here! [0]
Have you seen the movie S1m0ne (2002)? I felt that it only scratched the surface of the topic (and the tech wasn't exactly at today's level), but it's otherwise pretty good.
⬐ patrics123: I think it's good if a tech like this is publicly available. It will be used by comedians and satire outlets and will over time raise awareness of possible fakes - pics/video or it didn't happen... well, no problem anymore ;-)
While the technology is amazing, I am a bit bothered by all these picture- and video-modifying algorithms.
The issue is that we can't know what's real any more. It used to be that if you saw a video or a photo depicting an event, you could be pretty sure that what you were looking at actually happened.
Now, if you see a video of a prominent politician saying something awful in your Twitter timeline (or wherever), they may have actually never said anything remotely close. It could be a completely fictional video that looks perfectly realistic [0], made by some teen in Macedonia. [1]
I realize photography and video have always been used to trick people into thinking things that aren't true, but this technology enables nuclear-grade deception.
I am wondering: is there a use-case for such an algorithm that is practical and good for the world?
PS: I know an eye-rolling algo is quite innocuous but I've had this thought on my mind about these in general and needed to air it out.
[0] https://www.youtube.com/watch?v=ohmajJTcpNk [1] https://www.wired.com/2017/02/veles-macedonia-fake-news/
⬐ noobiemcfoob: > The issue is that we can't know what's real any more.
Ultimately, you have to rely on word of mouth and trust your communities. Same story as ever; technology just sped it up.
⬐ cooper12: > It used to be if you saw a video or a photo depicting an event you could be pretty sure that what you're looking at actually happened.
This isn't really true. Image manipulation was long possible in the darkroom, [0] and you only need to look to Hollywood to see how extraordinary deception can be. The only things that have changed are the cost and ease of doing such manipulations.
[0]: https://en.wikipedia.org/wiki/Censorship_of_images_in_the_So...
⬐ bendykstra: Also, e.g., the Moon landing conspiracy theories involving Hollywood and director Stanley Kubrick.
⬐ dave_sullivan: > I am wondering: is there a use-case for such an algorithm that is practical and good for the world?
I mean, the most general goal would be to eventually create arbitrary amounts of entertainment at near-zero cost. A constant stream of novel music tailored not just to model your tastes, but to expand your tastes. Ditto for movies/television/dramatic entertainment. Or food.
More near term: lots of areas of entertainment-production studio workflows - mesh generation in games, color grading in movies, the ability to do way cooler stuff in post-production. It opens up tons of new techniques that producers and artists can use.
No doubt it enables a new level of deception, but we just need to develop better filters for it. Video and audio evidence should probably not be admissible in court anymore (or the burden of proof should be higher), but that's probably a good thing, net net.
⬐ splitrocket: The scenario you speak of was true for the entire history of the human race prior to the invention of photography.
This is why we have institutions such as newspapers, academia, and government: institutions that have the authority to inquire into matters and in turn establish some semblance of truth.
Maybe now some might understand why attacking the integrity of media, academia, and government is such a pernicious, short term and ultimately destructive tactic.
⬐ Razengan: Technology will eventually give us another way to verify fact from fake. Just as photographs did for hearsay, and videos did for photographs, now we'll have to wait for some memory-reading tech. :)
(Yes, that too will in turn be offset by memory-altering tech. After that, we wait for time-travel tech.)
⬐ astrodust
⬐ captainmuon: Memories are perhaps the most easily manipulated of all those things.
⬐ Razengan: Maybe, but I wonder if it's not just our interpretation of memories that becomes flawed.
Maybe the "raw" record of our experiences is stored in a separate layer. I mean, I can close my eyes and flip through a series of images from my last 12 or so waking hours. They seem more like snapshots than a description of what I saw or where I was. Some people may have an actual photographic memory [1].
Who knows - at some point in the future, people may be able to opt in to implants that record everything around them at greater accuracy than the brain, but use the brain for storage.
⬐ darkblackcorner: Reminds me of a Black Mirror episode... http://www.imdb.com/title/tt2089050/?ref_=ttep_ep3
⬐ astrodust: Memories are highly subjective, since you can only remember things you've perceived, and our perception is highly distorted.
Some people have a better memory than others, and most people prioritize what they remember in significantly different ways. They may be good with names, bad with faces, or vice versa. They might not remember particular dates, but will remember the weather.
Even people with a very good photographic memory might be blind to things. What song was playing there? What did they say to you? Did the place smell like anything in particular? What were you feeling at the time? What was the temperature like? Was there a draft? It's rare to find someone who's paying attention to everything, all the time, and taking notes mentally.
> It used to be if you saw a video or a photo depicting an event you could be pretty sure that what you're looking at actually happened.
That is only true because we are in a weird transition period. Before photography, there was no way to create an accurate snapshot of an event. Since then, it has been easy to take a photo but hard to fake one. For the first time in history we were able to make visual proof of events. Of course, people have been trying to fake photographs since the beginning. That has mostly been detectable, but the fakes are getting better all the time.
(This also gives us a certain fetishism for unedited images in journalism. I understand where it comes from, but I think it is objectively weird that certain manipulations are allowed while others are considered scandalous.)
I believe we are moving into a period where we can no longer consider photos proof of facts. People will resist for quite some time (as they are resisting the fact that the internet allows us to copy information for free). There will be legal push-back, image editing will be scandalized... but eventually the technology will become so good and ubiquitous that nobody can rely on pictures anymore.
I also believe it is a good thing! We are becoming more and more nervous about our image on the internet, about private photos leaking out there, ruining our employability, etc. What happens if everybody can say "OK Google, make a photo of Jason where he is drunk and riding a donkey" and get a convincing fake? Eventually we must adapt, and these worries will go away.
We will have to learn to judge pieces of information not by their origin, but only by their content.
For a current example: you got a picture of President Trump with prostitutes in Moscow? I don't care if that really happened or not - I'm not a prude, and it has no immediate relevance. Let him have fun. What I do care about: does the story fit my image of him? Do I think he is capable of doing that? Does it have explanatory power? Note it might as well be a painting or a blog post instead of a photo.
In my old age, it seems, I am going full-on postmodern...
⬐ pegasus: Using trusted computing and related technologies, it should be entirely feasible to build tamper-proof cameras that produce provably real photos, including a trusted timestamp. It should be a small step, for example, for Apple to add such a feature to future iPhones, given the layers of security they already have in their hardware.
⬐ loa_in_
⬐ amelius: You are talking about secure infrastructure, but the image itself can still be faked. And no such infrastructure is safe, in the end, when given entirely to the end user (modify the camera sensor, or transfer a signed image to an intact device).
Yes. And the problem exists only when a select few can manipulate photos. If everybody and their mother can do it, then indeed pictures will lose their credibility, and there is no problem (but there may be other problems, such as where we now get real evidence from).
⬐ mirimir: > Faked pictures are more convincing than real pictures because you can set them up to look real. Understand this: All pictures are faked. As soon as you have the concept of a picture there is no limit to falsification.
- The Place of Dead Roads, William S. Burroughs (1983)
> "Fake News" is about to get a lot more compelling when you can make anyone say anything as long as you have some previous recordings of their voice.
Adobe has already developed that technology:
https://arstechnica.co.uk/information-technology/2016/11/ado...
Now imagine combining it with this:
Face2Face: Real-time Face Capture and Reenactment of RGB Videos https://www.youtube.com/watch?v=ohmajJTcpNk
Perhaps using the intonation from the face-actor's voice to guide the speech synthesis.
⬐ stevenh: I agree and I've upvoted you, but I feel it's worth pointing out that Adobe's claim about their own progress in this field was fake news.
https://www.youtube.com/watch?v=I3l4XLZ59iw&t=2m34s
"Wife" sounds exactly the same in both places. All they did was copy the exact waveform from one point to another. Nothing is being synthesized.
https://www.youtube.com/watch?v=I3l4XLZ59iw&t=3m54s
The word "Jordan" is not being synthesized. The speaker was recorded saying "Jordan" beforehand for this insertion demo and they're trying to play it off as though it was synthesized on the fly. This is a scripted performance and Jordan is feigning surprise.
https://www.youtube.com/watch?v=I3l4XLZ59iw&t=4m40s
The phrase "three times" here was prerecorded.
This was a phony demonstration of a nonexistent product. Reporters parroted the claims and none questioned what they witnessed. Adobe falsely took credit and received endless free publicity for a breakthrough they had no hand in by staging this fake demo right on the heels of the genuine interest generated by Google WaveNet. I suppose they're hoping they'll have a real product ready by whatever deadline they've set for themselves.
To be clear, I like Adobe and I think it's a cunning move on their part.
⬐ mbrookesThanks for the detailed breakdown. The irony is not lost!
I still think that it is incredibly important that we make tools for validating primary source material easy to use and friendly for non-technical people.
Tech for faking video is getting more powerful day by day.
http://graphics.stanford.edu/~niessner/thies2015realtime.htm...
Maybe CG will be good enough to keep casting dead actors? The Face2Face [1] tech looks pretty convincing to me, it already works in real time (which isn't even necessary for films), and there's plenty of old footage of them.
⬐ macintux: Neither of my parents realized Peter Cushing was rendered via CG while watching Rogue One, so while it was terribly distracting to me, it's apparently pretty successful.
⬐ tootie
⬐ MatmaRex: I personally didn't realize he was fake until afterwards. Possibly because Tarkin was so stoic his face barely moves. Leia looked a bit unnatural, but it didn't ruin it for me.
⬐ paulddraper
⬐ danielweber: Agreed. Though I've long been a Star Wars fan, i.e. not a CGI snob ;)
The weirdest thing about Leia was her smile... I mean that she smiled at all. The real Leia would've told the guy to shut up and do his job.
I noticed it during his first scene, but either my suspension of disbelief won out or they got better tech as the movie progressed, because it looked totally natural later. Of course, I was expecting to notice it. If I hadn't known about it, maybe it would have gone right by me.
⬐ thrillgore
In the first appearance it felt more like the scene was abruptly adjusted in terms of lighting and exposure to make up for the rendering of Cushing's face on the actor. It definitely got harder to notice in subsequent scenes. But once you figure out something is out of place, you can't break that thought process.
⬐ bobwaycott
I wasn't expecting it, as I tried to remain ignorant of details before seeing it, but I spotted it quickly. I struggled with it momentarily, though, thinking to myself, "Wait. He's dead, right? This looks a bit off." My boys, 13 and 17, leaned over simultaneously and said, "Totally fake." It still looked digital later, but I think we cared less because we'd settled it in our minds. Young Leia felt more obviously digital than Tarkin, though.
There was a CGI young Arnold Schwarzenegger in Terminator Salvation, in 2009. Admittedly, that scene did not require many facial expressions: https://www.youtube.com/watch?v=L7YYfgx_cHo
⬐ manachar
It wasn't really good enough in Rogue One, but of course it will be getting better.
Personally, I hope that never really becomes popular. Culture is already far too backwards-looking and nostalgia-filled for my taste. I'd hate for us to get stuck using dead actors providing something mimicking a performance.
Far more interesting will be the possibility of wholly CGI actors with artificial vocaloid voices. That'll be a ways off since a voice actor would be cheaper.
⬐ rkuykendall-com
> It wasn't really good enough in Rogue One
I went to Rogue One with 8 people. 3 could tell me which prominent character was CG when asked. And for those 3, it was because they knew the actor was deceased.
⬐ WalterSear
> It wasn't really good enough in Rogue One, but of course it will still be getting more use.
⬐ Freak_NL
> That'll be a ways off since a voice actor would be cheaper.
Fox will probably beg to differ.
⬐ rangibaby
> I'd hate for us to get stuck using dead actors providing something mimicking a performance.
It's funny to think about it now, but Peter Cushing and Alec Guinness were the two "big name" actors in Star Wars, and provided some of the best acting in the entire series. The look on Alec Guinness' face when he mentioned Luke's father being his good friend said more than the accumulated total of prequel movies and spinoffs could ever hope to.
As experienced professionals (with experience working in low-budget action films, in Peter Cushing's case), they added a lot of nuance to their roles and took an ownership of them that I doubt face-swapped actors will be able to match for a long time, if ever. To be honest, it feels kind of weird to me that they are resurrecting an actor who has been dead for such a long time.
Having said that, I think the actual technology is good enough for de-aging when the actual actor can give a performance; realistic CGI replacements for actual people have been improving since 2009 when a cameo by THE Terminator was believable enough (somewhat because it conveniently got its face blown off immediately). Jeff Bridges in Tron Legacy was not bad: http://www.danplatt.com/?cat=91, and you won't notice the de-aged actors in the latest Marvel movies unless you are actively looking for them.
Yup, and in combination with this... https://youtu.be/ohmajJTcpNk ...the world will be easier to fool.
It gets even better when combined with Face2Face, a live-editing software that can transfer facial expressions including lip movements from actors to e.g. politicians on TV: https://www.youtube.com/watch?v=ohmajJTcpNk
OK, throw this into the mix: many of you reading this have smartphones which are voice-controlled, and for which voice control is activated at all times. In the case of Google, that processing must take place on Google's centralised servers. Siri may or may not do centralised processing (and can operate in standalone modes). Microsoft's Cortana, Facebook's "M" (IIRC) and Amazon's Echo are all various stylings of "Stasi in a Glade form factor", as Maciej Cegłowski so memorably put it.[1]
Voice stores distressingly cheaply in terms of space, and with the Internet of (broken) Things (that spy on you), the odds are good that you'll find yourself surrounded by microphones in the most unexpected locations,[2] controlled by a wide variety of quite probably competing interests.[3] And if they cannot find what they're looking for in the surveillance tape itself, they'll simply manufacture their own evidence using your own phonemes[4] and video.[5]
________________________________
Notes:
1. https://twitter.com/pinboard/status/732985370204233728
2. http://www.inquisitr.com/3097029/government-surveillance-in-...
3. http://www.locusmag.com/Perspectives/2016/09/cory-doctorowth...
4. http://www.theatlantic.com/technology/archive/2016/09/hackin...
⬐ AlexCoventry
> In the case of Google, that processing must take place on Google's centralised servers
Doesn't recognition of the initialization phrase "OK, Google" take place on the phone? Sending a continuous stream of audio back to Google's servers sounds expensive.
⬐ dredmorbius
AFAIU (which is little), "OK, Google" is processed locally. Whatever follows is processed remotely. I should have mentioned voice-activated televisions as a whole 'nother class of attack.
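The local-trigger / remote-processing split discussed above is worth making concrete. Production hotword detectors are small on-device neural networks, but the underlying idea can be sketched as matching a short acoustic template against a sliding window of features; everything below (function names, thresholds) is an invented toy illustration, not anyone's actual implementation:

```python
import numpy as np

def frame_energy(signal, frame_len=400, hop=160):
    """Split a 1-D audio signal into overlapping frames and return
    per-frame log energy (a crude acoustic feature sequence)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    return np.log(np.sum(frames ** 2, axis=1) + 1e-8)

def spot_keyword(features, template, threshold=0.8):
    """Slide the keyword template over the feature sequence and report
    start indices where the normalized cross-correlation (Pearson)
    exceeds the threshold -- only then would audio be sent upstream."""
    hits = []
    t = (template - template.mean()) / (template.std() + 1e-8)
    for start in range(len(features) - len(template) + 1):
        w = features[start : start + len(template)]
        w = (w - w.mean()) / (w.std() + 1e-8)
        score = float(np.dot(w, t)) / len(template)
        if score > threshold:
            hits.append(start)
    return hits
```

The point of the sketch is the privacy architecture: only a tiny, cheap matcher runs continuously, and the expensive (remote) recognizer sees audio only after a local hit.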
Sorry! Look at this video and reconsider your statement about "video": https://www.youtube.com/watch?v=ohmajJTcpNk
⬐ byebyetech
Jaw dropped. Thanks for sharing.
⬐ karma_vaccum123
Holy shit.
⬐ BatFastard
Wow!! That is amazing!
⬐ AJRF
They used this technique in Mr. Robot to emulate Obama talking about the E Corp hack.
⬐ kkhire
WOW. I figured, knowing Obama's interest in good TV shows, he might have done the cameo! This is really neat stuff.
you mean this?
⬐ mattnewton
Wow. Alright, can't trust anything anymore. It makes sense that they solved the problem more directly: how do I move someone else's face in a video?
The consumer use cases are interesting but the propagandist use cases are terrifying. Along with this (Real-time Face Capture and Reenactment): https://www.youtube.com/watch?v=ohmajJTcpNk
⬐ olewhalehunterWhich country (or organization) do you think will be (or has already been) the first to implement a program of creating physical mimics of humans for military and intelligence purposes? With technologies like this and CRISPR, video, testimony, or genetic evidence could be thrown out the window in criminal or military cases.
I find it amusing that they're touting it for an application where similar NN algorithms seem quite likely to excel at generating speech in someone else's voice. (c.f. "neural style transfer" in images, but applied to speech.) We're already getting pretty decent at this for video -- see, for example, https://www.youtube.com/watch?v=ohmajJTcpNk (Face2Face: Real-time Face Capture and Reenactment)
⬐ dharma1
Was just trying to find papers for that yesterday - neural style for voice/audio. Conceptually it sounds like it should be doable, but looking at the actual implementation, I'm not sure it's doable at all with a CNN for audio.
⬐ zopf
Unsure if it's CNN-based, but check out http://www.wowtune.net/
⬐ dharma1
Very impressive, even if a bit out of tune on the Autumn Leaves demo! They have a very good team with actual audio industry experience. Look forward to seeing more demos.
It sounds like a phoneme-based speech/singing synthesizer, similar to Yamaha Vocaloid. I wonder how much training data is required to extract the phonemes to create a "voice".
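For what it's worth, the "neural style for audio" idea in the thread above is usually attempted by treating a spectrogram as an image and matching Gram-matrix channel statistics, exactly as in image style transfer. A minimal numpy sketch of that style statistic (function names are my own; this is not how any of the linked projects necessarily work):

```python
import numpy as np

def spectrogram(signal, n_fft=256, hop=128):
    """Magnitude spectrogram via a sliding Hann-windowed FFT.
    Returns an array of shape (freq_bins, time_frames)."""
    window = np.hanning(n_fft)
    frames = [signal[i : i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

def style_gram(spec):
    """Gram matrix of frequency-channel correlations: a time-invariant
    'timbre fingerprint', borrowed from image style transfer where the
    Gram of feature channels captures texture."""
    s = spec - spec.mean(axis=1, keepdims=True)
    return (s @ s.T) / spec.shape[1]

def style_loss(spec_a, spec_b):
    """Squared Frobenius distance between two style Grams; a transfer
    method would minimize this while keeping the 'content' signal."""
    ga, gb = style_gram(spec_a), style_gram(spec_b)
    return float(np.sum((ga - gb) ** 2))
```

Two recordings of the same voice should score a much lower style loss against each other than against a different voice, which is the property a transfer optimizer would exploit.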
Yea, perhaps. Like in the recent "Face2Face: real-time face capture and re-enactment" video[1]. And your reference to "your avatar" reminds me of the scene from The Matrix where Morpheus describes "residual self-image"[2].
⬐ bloaf
See also Ghost in the Shell:
⬐ vmp
This is insanely awesome. Something that comes to my mind is the use of this for dubbing movies and TV series; I'm very sensitive about correctly syncing what's being said to what we see, to the point where I only watch movies in their native tongue - even if I don't know the language and need subtitles. This could be a game-changer.
⬐ ghayes
⬐ drawkbox
It's even cooler since you could do this retroactively if you had the original footage of the dubbing voice actor.
⬐ Clever321
I'm curious, how do you both read subtitles and watch that the sound is properly synced to an actor's lips? I can't read that fast, so I spend 80% of my time "watching a movie" simply reading text on the bottom of the screen.
⬐ peteretep
⬐ JoshTriplett
> I can't read that fast
I can't speak for the op, but I can read a great deal faster than most people speak.
⬐ 0x4a42
Subtitles aren't synced to the actor's lips. He was talking about dubbed voices, not subtitles. :)
⬐ None
> Something that comes to my mind is the use of this for dubbing movies and TV series
One of the references towards the end of the video mentions using this for translation, so that's definitely one of the intended applications.
Very well done. The best part is how they make the re-enacted mouth/teeth look so real: they sample the mouth interior from earlier parts of the video and composite it onto the otherwise still or looped footage. I was blown away by how real Trump's teeth looked; then they explained this process and why. This could be huge (yuuuge) in games and virtual spaces. At GDC, Unreal 4 had a demo recently, and it seems we are approaching that era[1]
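The mouth-sampling trick praised above is, at its core, a nearest-neighbour lookup: for each synthesized expression, find the source-video frame whose mouth best matches and reuse its real pixels. A toy sketch of that lookup (the names and descriptor format are invented for illustration; the paper itself retrieves mouth frames via an appearance graph over expression space):

```python
import numpy as np

def build_mouth_database(descriptors, frame_ids):
    """Pair each source-video frame id with a descriptor of its mouth
    expression (e.g. lower-face blendshape coefficients)."""
    return np.asarray(descriptors, dtype=float), list(frame_ids)

def retrieve_mouth_frame(target_descriptor, database):
    """Return the id of the stored frame whose mouth descriptor is
    closest (Euclidean) to the target expression; that frame's real
    mouth pixels would then be composited into the synthesized frame."""
    descriptors, ids = database
    dists = np.linalg.norm(descriptors - np.asarray(target_descriptor), axis=1)
    return ids[int(np.argmin(dists))]
```

Because the retrieved teeth and mouth interior are real footage rather than rendered geometry, they keep the fine detail that makes the result convincing.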
⬐ kristiandupont
One of the barriers in webcam meetings is the inability to make eye contact. It seems subtle, but I think it is more important than one might intuitively think. I've thought a lot about how to overcome this and came up with nothing but cameras beneath the screen (which Apple seems to have worked on, but we have yet to see it: http://appleinsider.com/articles/09/01/08/apple_files_patent...). This technology could possibly provide a competing solution.
⬐ mchahn
Discerning fake video from real just got a lot harder. Video is one of the last bastions of honest evidence.
⬐ yeukhon
⬐ NeonVice
I heard the light source has been a way to determine whether something is likely to be fake or not.
⬐ benevol
The data-vacuuming companies (FB, Google, ad networks, etc.) collect the information about us required to know how to manipulate us, and technologies such as this one represent the tools to actually get it done.
Conan could use this for his fake celebrity interviews instead of just cutting the mouth out of an image. :)
⬐ SergeyHack
Imagine a live edit of your video conversations that attaches joyful emotions to any mention of an advertised brand.
⬐ tibbon
The paper is great, but I wanna see some source code!
⬐ Hydraulix989
⬐ spriggan3
I don't think they want the world to see the source code, should it get into the wrong hands, though the repercussions of this work are bound to hit us sometime. [1]
⬐ albertzeyer
Where can I find the paper?
The video made me feel very uncomfortable, and it takes a lot to make me feel that way.
⬐ zaro
Faking news got an order of magnitude easier :)
⬐ sageinventor
It would be cool to use this to fix movie footage in post-production. You could just copy a face over if the actor screwed up.
⬐ rawnlq
⬐ listic
Or bringing back dead actors using past footage!
What is Target Actor and what is Reenactment Result? The former looks better to me.
⬐ izym
⬐ xchip
Target actor is the source material that they're changing, and the latter is the result of that.
Does anyone have the link to the paper?
⬐ mccappy
⬐ oliyoung
http://www.graphics.stanford.edu/~niessner/thies2016face.htm...
⬐ mccappy
http://www.ieee.org/conferences_events/conferences/conferenc...
Terrifying, amazing but terrifying.
⬐ namelezz
This is impressive.
⬐ diskcat
The title is really underwhelming compared to how cool the demo is.
⬐ imaginenore
Hilarious and scary at the same time.
The admissibility of videos in courts is becoming more and more questionable.
⬐ deelowe
Wow.
⬐ mortenjorck
This demo is doubly amazing: first, the obviously impressive (and slightly unsettling in its implications) manipulation of the target face, and second, the fact that this is all being done with a single RGB camera. Consider the massive rig required to perform the at-the-time groundbreaking performance capture for the game L.A. Noire: http://i.kinja-img.com/gawker-media/image/upload/s--y6fmsAIU... This is how far computer vision has come in five years.
⬐ rawnlq
In terms of gaming applications, this could be huge for virtual reality avatars. You can be anonymous but still convey facial expressions with a webcam!
⬐ scoot
I'm curious how you think a webcam will be able to read your facial expression when you have a VR headset on your face.
⬐ toisanji
Smaller VR rigs should just cover the eyes. Or maybe there is a small camera hanging under the VR rig to capture face elements.
⬐ greeneggs
This research uses a camera to capture the mouth area and strain sensors for the upper face (that part can obviously be improved): http://www.hao-li.com/Hao_Li/Hao_Li_-_publications_%5BFacial...
⬐ billconan
This is amazing! Can't wait to see the paper. How do they create a 3D model out of video?