HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Face2Face: Real-time Face Capture and Reenactment of RGB Videos (CVPR 2016 Oral)

Matthias Niessner · Youtube · 172 HN points · 27 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Matthias Niessner's video "Face2Face: Real-time Face Capture and Reenactment of RGB Videos (CVPR 2016 Oral)".
Youtube Summary
CVPR 2016 Paper Video (Oral)
Project Page: http://niessnerlab.org/projects/thies2016face.html

IMPORTANT NOTE:
This demo video is purely research-focused and we would like to clarify the goals and intent of our work. Our aim is to demonstrate the capabilities of modern computer vision and graphics technology, and to convey it in an approachable and fun way. We want to emphasize that computer-generated videos have been part of feature films for over 30 years. Virtually every high-end movie production contains a significant percentage of synthetically-generated content (from Lord of the Rings to Benjamin Button). These results are hard to distinguish from reality, and it often goes unnoticed that the content is not real. The novelty and contribution of our work is that we can edit pre-recorded videos in real-time on a commodity PC. Please also note that our efforts include the detection of edits in video footage in order to verify a clip’s authenticity. For additional information, we refer to our project website (see above). Hopefully, you enjoyed watching our video, and we hope to provide a positive takeaway :)

Paper Abstract
We present a novel approach for real-time facial reenactment of a monocular target video sequence (e.g., Youtube video). The source sequence is also a monocular video stream, captured live with a commodity webcam. Our goal is to animate the facial expressions of the target video by a source actor and re-render the manipulated output video in a photo-realistic fashion. To this end, we first address the under-constrained problem of facial identity recovery from monocular video by non-rigid model-based bundling. At run time, we track facial expressions of both source and target video using a dense photometric consistency measure. Reenactment is then achieved by fast and efficient deformation transfer between source and target. The mouth interior that best matches the re-targeted expression is retrieved from the target sequence and warped to produce an accurate fit. Finally, we convincingly re-render the synthesized target face on top of the corresponding video stream such that it seamlessly blends with the real-world illumination. We demonstrate our method in a live setup, where Youtube videos are reenacted in real time.
Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Founder here. AMA :)

To answer a few recurring questions in the thread

---> Use case.

Video is a far more effective way to communicate than text. Not for the HN crowd, but if you're a blue-collar worker, a 2-minute video in your native language is much preferred to a 5-page PDF for training.

Anyone who has tried to record a simple corporate video knows the pain: cameras, film crews, 25 takes to get one that works, and post-production. Cumbersome, slow, and multidisciplinary. By the time the video is done, the content is out of date.

Synthetic video is not yet at the quality of real video. Eventually it will be. But the mistake many are making here is comparing it to real video; it should be compared with text.

In X years we'll be able to make Hollywood films on a laptop without needing anything but time and imagination. Just like we can digitally compose music in Ableton, create images in Photoshop and type novels on keyboards rather than with pen and paper.

My (obviously biased ;)) belief is that synthetic media will eventually become a foundational technology that will move media production from cameras/microphones to APIs. We'll be able to do all kinds of things we couldn't do before.

E.g. personalized and interactive rich media, video-driven chatbots, and eventually Hollywood blockbusters made by your favourite YouTuber from his or her bedroom.

---> Uncanny valley

Simulating real video is incredibly hard. We're constantly improving and launching more expressive synthesis soon.

From our tests with some of our largest clients 8/10 people don't realise it's a synthetic video (unless they are asked to look for it).

---> Tech

Has been developed over the last 3 yrs. Origins/team from Stanford/UCL/TUM.

Learning: Going from research to a working, scalable product is hard and takes time. But very rewarding when it works.

[1] https://www.youtube.com/watch?v=ohmajJTcpNk [2] https://www.youtube.com/watch?v=qc5P2bvfl44

---> Bad uses

Bad actors will do bad things with synthetic media. Like with any other technology from smartphones to cars. We're moderating all content and building safeguards and verification + working with FAANG and others on detection and provenance technology.

Recommended read - deepfakes perfectly follow the story arc of any new, powerful technology: https://journals.sagepub.com/doi/full/10.1177/17456916209193...

---> Actors

Real actors get rev share + an upfront fee from every video generated with their likeness. Like being a stock photo actor.

devinplatt
The Snoop Dogg advertisement rebranding case study was pretty impressive to me, since there were obvious savings from reuse. Neat to see how this technology could be integrated in a subtle way with other editing techniques.

It seems to me that this technology could have immediate application to dubbing over curse words in movies (since that's already done in a not so subtle way today).

The next step I see in that progression is full dubbing for translation, which already exists in a very conspicuous form. The old meme about out-of-sync karate movie dubs comes to mind.

How close do you think this technology is to use for syncing lips in Hollywood tier movie dubs using real voice actors? What are the main obstacles left to achieving that?

Founder here

Maybe – one of my co-founders is Prof Matthias Niessner who's been behind a large chunk of the seminal and widespread research in this space.

[1] https://www.youtube.com/watch?v=ohmajJTcpNk [2] https://www.youtube.com/watch?v=qc5P2bvfl44

https://www.youtube.com/watch?v=ohmajJTcpNk

Face2Face would defeat this: anyone with enough images of the victim (and a copy of their ID) could impersonate them in this scheme.

Jun 20, 2018 · Nadya on Keybase Exploding Messages
Photo, audio, and video evidence should already be dismissed until one is able to verify the integrity and source. All of these can already be believably faked - it's just a matter of educating people that a layperson can easily create fake things by using tools developed by research teams.

Text is the easiest thing to fake if you can identify the font used - any image editor will work. HN uses 9pt Verdana; even without using dev tools I could fake your post to say anything I wanted, since it would just be 9pt Verdana on a solid background set to wrap every 1050px.

See: https://www.youtube.com/watch?v=ohmajJTcpNk & https://www.youtube.com/watch?v=AmUC4m6w1wo

marcus_holmes
Not even that much effort, just open the browser's dev tools and change the text in the post to say whatever you like.
Nadya
I'm aware - but specifically excluded dev tools as faking text in scenarios where dev tools may not exist (eg: chat programs that aren't taking place within the browser) is still trivial.
May 18, 2018 · 2 points, 0 comments · submitted by swyx
Here's another scary GAN proof of concept [0]. In this case, researchers transferred someone's facial expressions and mouth movements onto public figures in real time. Combined with DeepMind's new tech that seems to be able to produce human voice with believable candor and inflection [1], you could make some very convincing fake footage.

[0]https://www.youtube.com/watch?v=ohmajJTcpNk

[1]https://research.googleblog.com/2018/03/expressive-speech-sy...

Oct 07, 2017 · 1 point, 0 comments · submitted by handpickednames
We're already at the stage where one can't trust the video evidence, unless it's backed by the camera's signature/encryption.

Creating a very realistic fake is now trivial:

https://www.youtube.com/watch?v=ohmajJTcpNk

https://www.youtube.com/watch?v=nsuAQcvafCs

Google is very close to synthesizing realistic voice.

It's game over, as far as I can see.

It's only a matter of time before someone creates a fake video of someone famous saying something outrageous, like Nazi propaganda, and it results in the destruction of that person's career and life.

We really need something like Secure Enclave in every camera.

EDIT:

Another related video:

https://www.youtube.com/watch?v=hPksv1gJet4
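
The signed-camera idea can be sketched in a few lines. This is a minimal illustration, not a real design: the key name and function names are hypothetical, and a symmetric HMAC stands in for what would really be an asymmetric signature (e.g. Ed25519) held inside a secure element, so that verifiers never need the secret key.

```python
import hmac
import hashlib

# Hypothetical per-device key; in a real camera this would live inside
# tamper-resistant hardware and never leave it.
DEVICE_KEY = b"demo-device-key"

def sign_frame(frame_bytes: bytes) -> str:
    """Tag raw sensor data so later edits are detectable."""
    return hmac.new(DEVICE_KEY, frame_bytes, hashlib.sha256).hexdigest()

def verify_frame(frame_bytes: bytes, tag: str) -> bool:
    """Check footage against the tag the camera emitted."""
    return hmac.compare_digest(sign_frame(frame_bytes), tag)

frame = b"raw video bytes"  # stand-in for real sensor output
tag = sign_frame(frame)
print(verify_frame(frame, tag))            # True: untouched footage
print(verify_frame(frame + b"edit", tag))  # False: any edit breaks the tag
```

Note the limitation: a tag only proves the bytes were signed by a holder of the key, not that the scene in front of the lens was real.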

Retr0spectrum
I'm not sure in-camera signatures are the way to go.

In the very simplest case, someone could just point a camera at a very high quality screen and record that, generating a signed video.

A more complex attack would be to effectively emulate the image sensor and pipe image data straight into the camera.

If you want to prove that a video was filmed on or before a certain time, one way would be to hash it and put that hash on a blockchain, but that doesn't really solve the problem of authenticity.
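
The hash-and-timestamp scheme above is easy to sketch: publish a digest of the file somewhere append-only, and anyone can later confirm the file existed by that date and hasn't changed since. A minimal sketch (the function name is illustrative), which also shows why this proves dating but not authenticity:

```python
import hashlib

def video_fingerprint(data: bytes) -> str:
    # SHA-256 digest of the raw file. Publishing this digest (on a
    # blockchain, in a newspaper, etc.) proves the file existed at
    # that time, but says nothing about how the footage was produced.
    return hashlib.sha256(data).hexdigest()

clip = b"raw video bytes"
digest = video_fingerprint(clip)
print(digest[:16])  # first characters of the 64-char hex digest

assert video_fingerprint(clip) == digest         # deterministic
assert video_fingerprint(clip + b"x") != digest  # any edit changes it
```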

drdeca
What if the camera also included GPS and timestamp information? That might make it harder to fake, because you would have to be roughly in the location where you claimed the footage occurred.

Are the signals from the gps satellites cryptographically signed?

Of course, the cameras would need to have very good physical security so that a person can't either extract the private key from it, or do things like replacing the camera part with something that just feeds in the data you want, and still getting it signed with the key.

I think the design that went into the ORWL pc might be good for this (which would quickly delete the private key if it detected tampering).

In order for one of these to be trusted though, there would have to be a trusted source demonstrating that it was constructed and configured correctly (rather than in a way that would allow faking). Maybe by having the construction and setup be recorded with other cameras of the same type which are already trusted, in a web of trust sort of thing? If one had enough of these cameras I don't think that the bootstrapping of the chain of trust would be too difficult.

slig
It's pretty easy to fake a GPS signal with the proper equipment. I remember seeing a video of a guy faking his GPS signal in order to cheat at Pokemon Go.
drdeca
Aw dang, ok.

But if the GPS satellites cryptographically signed their messages, would that help much? And would it be all that much of a cost for future GPS satellites to sign their messages?

devdoomari
maybe 3d-cameras can help? anyway, this forgery stuff is getting scary...
olegkikin
Try it. Record something off the screen (including the audio), and see what happens. I have yet to see one example of that which looks like reality.

Emulating a sensor sounds like it would immediately reduce the number of potential perpetrators by orders of magnitude.

I agree though, what I'm suggesting is not 100% bulletproof, but it's using a proven technology, and it's relatively simple, assuming hardware manufacturers are willing to add one small chip to their cameras.

asciimo
I've seen a number of YouTube and liveleak videos that were obviously made by recording a security monitor with a phone. Seems to be an acceptable practice for certain genres, and is usually credible.
olegkikin
That's not his point. He is claiming he can record a fake video that plays on a screen, and the resulting recording will look like a recording of reality, not of the screen (and the camera will sign it as such).
gregmac
Which only makes things worse, because if you are trying to create fake security camera footage, you just need to make it real enough to play back on a "security monitor" and record that with a phone camera.
CM30
If you're pretending it's footage of a film, TV show or video game, then recording from the screen will fool a lot of people. And while in a lot of cases that would make the 'metadata' aspect useless anyway, there's always the possibility of a hoaxer either saying:

1. This was recorded from a TV broadcast

2. Or a CCTV camera

Etc.

> Seems from right out of the gate, they are breaking their own ethical guidelines as a cheap promotional tactic. If they care that little about themselves and a former president of the United States, what do they care about your likeness.

We state in our blogpost that we make an exception for Obama/Trump in order to raise public awareness. Both of them are regularly used in Machine Learning benchmarks (for example [0] [1]). Note that we don't allow users to generate from Trump/Obama's voice.

Once again, we care a lot about these issues and that's why we only allow users to copy their own voice.

[0] http://www.washington.edu/news/2017/07/11/lip-syncing-obama-... [1] https://www.youtube.com/watch?v=ohmajJTcpNk

These issues are challenging and suggestions about how you think the technology should be introduced/regulated are very welcome.

slackstation
It's still hypocritical and insulting to the reader's intelligence.

You could make Obama say anything. He could say something humorous, something that he's never said before. You would have had just as impressive a demo if you had Obama say "I'm a little teapot short and stout..." and then used overlay text to promote yourself. You chose instead to make a video where he promotes your startup.

That is both hypocritical and immoral and not only using his personal likeness but, also the seat of the Presidency of the United States.

This fast and loose way that Lyrebird treats their technology only makes me think that they don't really think about the massive negative potential of the technology and just want to get scale / profitability as fast as possible.

Face2Face: Real-time Face Capture and Reenactment of RGB Videos (CVPR 2016 Oral)

https://www.youtube.com/watch?v=ohmajJTcpNk

And I agree, I find this much more believable.

This article title is a bit off. The audio is not generated, the video is not convincing, and the original conversation is not fake.

This is an art project which highlights the implications of research like Face2Face: https://youtu.be/ohmajJTcpNk

gpvos
Someone has changed the title now, which is a good thing, because the original clickbaity title wasn't very descriptive at all, and the new one is actually much more interesting.
olivermarks
Glad you posted this, I was about to look for this same link to share. Between this and Adobe's voice emulation software https://thenextweb.com/apps/2016/11/04/adobes-upcoming-audio... it appears possible to create very realistic 'footage' of real people doing and saying things they never did. The film industry already creates computer-generated characters of deceased actors (Star Wars, for example). This makes the Economist piece seem very out of touch with reality, and presumably there are far more sophisticated technologies we don't know about too...
Check out Real-time Face Capture and Reenactment:

https://youtu.be/ohmajJTcpNk

Combined with Face2Face[1] live video impersonation, it is truly time to be very careful verifying videos or even live streams.

https://www.youtube.com/watch?v=ohmajJTcpNk

knowaveragejoe
To my knowledge, both of these particular projects are still a ways away from being used in any practical sense, let alone succeeding at deceiving anyone.

You are right that we'll have to worry about this soon, though. Likewise, verifying the identity of the people we think we're talking to over video calls, for example.

anigbrowl
I would use this (the audio tech from the OP) for some edge cases in film production right now. It would also be easy to combine this with Twilio and a chatbot to scam people over the phone.
andy_ppp
Woah, reminds me of Total Recall for some reason... looks like a special effect from the 80s when actual speaking occurs, but it's very close!
NTripleOne
Okay, so on an ever so slightly related note, I've always wondered this ever since I saw that movie as a kid.

...Is it normal to feel bad for the Johnnycab "driver" when Arnie destroys it?

fokinsean
Woah that's kinda scary. What could we do to determine if a video is legitimate or not?
volkk
Mainly, practice critical thinking. Don't take anything at face value until it has been reconfirmed from many sources. At least that's what I do.
loader
The problem isn't the people who already think critically.
mikeleeorg
This is going to be increasingly key. And even then, it will be very difficult.

Books like "Trust Me, I'm Lying" reveal the lengths to which deception can go. Though this book discusses deception that starts at the textual level (e.g. blogs), it is inevitable that these tactics will be translated to video once the technology catches up.

Also, "at face value" - Ha! ;)

tdeck
I love critical thinking as much as the next person, but I always find statements like this to be smug and self-congratulatory cliches. Of course you take things at face value, we all do. Every waking hour we're getting new information and having to make sense of it, while still living our lives. It's not practical for anyone to pretend that every interaction can be rigorously confirmed and independently verified, which means proliferation of convenient, effective mechanisms for lying and deception should be of real concern to all of us. No one is such a great critical thinker that they're immune, and it's particularly dangerous when our few reliable avenues of verifying identity and provenance are about to be cut off.
netcraft
just wait till _the daily show_ and _last week tonight_ get a hold of this!
JustinAiken
..then they'll finally be able to play audio of republicans contradicting themselves! :p
calimac
Yes!!! And video of democrats like maxine waters et al. speaking in coherent logical sentences.
sidarape
No need for that, they're already doing fine.
divanvisagie
That's the joke
ericfrederich
Awesome... Not sure if the voice thing can be done in realtime yet, but you're right... the combination of these two would be awesome
red023
Holy shit this is crazy!
red023
Yeah, vote me down, coward. Because I used an "evil" forbidden word. I even used it in a positive context, to show how amazed I am by this facial manipulation (much, much more than by the voice thing). Flag me, ban me, I do not fucking care. I can make another account.
Mz
If you are referring to the word "shit," it is not forbidden here and is not likely the reason you were downvoted. I have a terrible potty mouth. I try to keep it PG-13ish online, but if I am tired or something, the way I actually talk tends to come out. My tendency to use the F word like other people use "very" does not appear to be in any way problematic per se.

I suggest you rethink your assessment of what is happening here.

grzm
You were likely down-voted more because your first comment doesn't add anything substantive to the discussion rather than for the language you used. As the guidelines ask, please don't comment on being downvoted, as it makes for boring reading. And doing so in the manner you did is definitely uncalled for.

https://news.ycombinator.com/newsguidelines.html

nihonde
Without a doubt, our concept of personal identity will be completely unreliable within a few generations. Forget about privacy--we will soon have literally no way to verify who we're talking to.
forgotpwtomain
Pelevin's novel 'Generation П' is a very interesting read on this kind of theme.

[0] https://en.wikipedia.org/wiki/Generation_%22%D0%9F%22

mkay581
No different than now, right? There is no practical way to identify someone you're talking to over the phone, for instance.
d33
If you have privacy, faces or sounds might not matter as much as content does - if you have common secrets, you have a way to identify a person.
proaralyst
Crypto would still work, and this tech isn't going to work face-to-face.
anigbrowl
Neither will insulate you from a deception which you wish to perpetrate upon yourself, and identifying the latter is a trick that con artists specialize in.
https://www.youtube.com/watch?v=ohmajJTcpNk

hf

i336_
Hi.

This is your first and only post, you have no submissions or favorites, and your account is 193 days old.

I'm very curious (and perplexed) as to why you have linked a video from elsewhere in this thread with no supporting context regarding its relevance other than "hf".

Already done: https://www.youtube.com/watch?v=ohmajJTcpNk
froindt
Can you imagine this for the next generation of cyber bullying? It could get super messy in high schools.

Alice broke up with Bob. Bob grabs the YouTube videos from Alice and makes video and voice profiles. Bob then posts a video of Alice saying how breaking up was her biggest mistake and how she misses <list of every sexual thing you can think of> because Bob does it all best.

That could end badly really easily.

brango
Wow, that's amazing. Combine the two and soon Hollywood stars will be redundant. Faceless session actors could just manipulate models of real people who've signed release forms, with the vocals performed by similarly faceless voice artists, or maybe even AI-generated voices. Actors would lose their uniqueness and end up being paid a pittance instead of being able to command the vast sums they can today. Another set of jobs soon to be made redundant by the rise of technology.
throwaway29292
I wouldn't be comfortable watching a movie scene if I knew I was looking at computer-generated faces and voices.
TuringNYC
Did you feel that way when seeing Grand Moff Tarkin on "Rogue One"?

https://www.wired.com/video/2017/02/how-rogue-one-recreated-...

ygjb
Yep, fell right into the uncanny valley.
ClassyJacket
Yes? I missed literally all his dialogue because he was so poorly animated I couldn't take my mind off it. So out of place and jarring.
techdragon
Are you comfortable with Auto-Tune in music, not the t-pain / etc exaggerated style... the nearly universal application of Auto-Tune to recording and live performance to ensure a "consistent product", and "save on expensive studio time"?

Because the market appears to have spoken on that one, and it said "meh, I don't care" with a solid shrug of indifference.

By the same logic, one can see artificially produced vocal performances, combined with artificial overlays of photorealistic 3D reproduction, as a way to cost-effectively minimise performer and crew expenses and ensure the consistency of a performance. The results may even be better than what they could have done with the real performer, in the case of some attractive actors who are not very good at the acting part of being an actor/modern celebrity.

That said, I'll definitely miss the days people had to actually be able to act, but then again I also miss the days people used to actually be able to play an instrument well and or sing well if they wanted to be a famous musician.

kastnerkyle
Japan, as usual, is ahead of the game here! [0]

[0] https://www.youtube.com/watch?v=pEaBqiLeCu0

MayeulC
Have you seen the movie S1m0ne (2002)? I felt that it only scratched the surface of the topic (and the tech wasn't exactly at today's level), but it's otherwise pretty good.
patrics123
I think it's good if a tech like this is publicly available. It will be used by comedians and satire outlets and will over time raise awareness about possible fakes - pics/video or it didn't happen... Well, no problem anymore ;-)
While the technology is amazing, I am a bit bothered by all these picture and video modifying algorithms.

The issue is that we can't know what's real any more. It used to be if you saw a video or a photo depicting an event you could be pretty sure that what you're looking at actually happened.

Now, if you see a video of a prominent politician saying something awful in your twitter timeline (or whatever), they may have actually never said anything remotely close. It could be a completely fictional video that looks perfectly realistic[0], made by some teen in Macedonia.[1]

I realize photography and video have always been used to trick people into thinking things that aren't true, but this technology enables nuclear-grade deception.

I am wondering: is there a use-case for such an algorithm that is practical and good for the world?

PS: I know an eye-rolling algo is quite innocuous but I've had this thought on my mind about these in general and needed to air it out.

[0] https://www.youtube.com/watch?v=ohmajJTcpNk [1] https://www.wired.com/2017/02/veles-macedonia-fake-news/

noobiemcfoob
> The issue is that we can't know what's real any more.

Ultimately, you have to rely on word of mouth and trust your communities. Same story as ever, technology just sped it up.

cooper12
> It used to be if you saw a video or a photo depicting an event you could be pretty sure that what you're looking at actually happened.

This isn't really true. Image manipulation was long possible in the darkroom,[0] and you only need to look to Hollywood to see how extraordinary deception can be achieved. The only thing that's changed these days is the cost and ease of doing such manipulations.

[0]: https://en.wikipedia.org/wiki/Censorship_of_images_in_the_So...

bendykstra
Also, e.g., the Moon landing conspiracy theories involving Hollywood and director Stanley Kubrick.
dave_sullivan
> I am wondering: is there a use-case for such an algorithm that is practical and good for the world?

I mean, the most general goal would be to eventually create arbitrary amounts of entertainment for near-zero cost. A constant stream of novel music tailored not just to model your tastes, but to expand your tastes. Ditto for movies/television/dramatic entertainment. Or food.

More near term, lots of areas of entertainment production studio workflows--mesh generation in games, color grading in movies, ability to do way cooler stuff in post production. It opens up tons of new techniques that producers and artists can use.

No doubt it enables a new level of deception, but we just need to develop better filters for it. Video and audio evidence should probably not be admissible in court anymore (or the burden of proof should be higher), but that's probably a good thing net net.

splitrocket
The scenario you speak of was true for the history of the human race prior to the invention of photography.

This is why we have institutions such as newspapers, academia, and government: institutions that have the authority to inquire into and in turn establish some semblance of truth.

Maybe now some might understand why attacking the integrity of media, academia, and government is such a pernicious, short term and ultimately destructive tactic.

Razengan
Technology will eventually give us another way for verifying fact from fake. Just as photographs did for hearsay, and videos did for photographs, now we'll have to wait for some memory-reading tech. :)

(Yes, that too will in turn be offset by memory-altering tech. After that we wait for time-travel tech.)

astrodust
Memories are perhaps the most easily manipulated of all those things.
Razengan
Maybe, but I wonder if it's not just our interpretation of memories that is flawed.

Maybe the "raw" record of our experiences is stored on a separate layer. I mean, I can close my eyes and flip through a series of images from my last 12 or so waking hours. They seem more like snapshots than a description of what I saw/where I was. Some people may have actual photographic memory [1].

Who knows, at some point in the future people may be able to opt in to implants that record everything around them, at greater accuracy than the brain, but use the brain for storage.

[1] https://en.wikipedia.org/wiki/Eidetic_memory

darkblackcorner
Reminds me of a Black Mirror episode... http://www.imdb.com/title/tt2089050/?ref_=ttep_ep3
astrodust
Memories made are highly subjective as you can only remember things you've perceived, and our sense of perception is highly distorted.

Some people have a better memory than others, and most people prioritize what they remember in significantly different ways. They may be good with names, bad with faces, or vice-versa. They might not remember particular dates, but will remember the weather.

Even people with a very good photographic memory might be blind to things. What song was playing there? What did they say to you? Did the place smell like anything in particular? What were you feeling at the time? What was the temperature like? Was there a draft? It's rare to find someone who's paying attention to everything, all the time, and taking notes mentally.

captainmuon
> It used to be if you saw a video or a photo depicting an event you could be pretty sure that what you're looking at actually happened.

That is only true because we are in a weird transition period. Before photography, there was no way to create an accurate snapshot of an event. Since then, it has been easy to take a photo, but hard to fake one. For the first time in history we were able to make visual proof of events. Of course, since the beginning, people have been trying to fake photographs. That has been mostly detectable, but the fakes are getting better and better all the time.

(This also gives us a certain fetishism for unedited images in journalism. I understand where it comes from, but I think it is objectively weird that certain manipulations are allowed, and others considered scandalous.)

I believe we are moving to a period where we can no longer consider photos to prove facts. People will resist for quite some time (as they are resisting the fact that the internet allows us to copy information for free). There will be legal push-back, image editing will be scandalized... but eventually, the technology will become so good and ubiquitous that nobody can rely on pictures anymore.

I also believe it is a good thing! We are becoming more and more nervous about our image on the internet, about private photos leaking out there, ruining our employability, etc. What happens if everybody can say: "OK Google, make a photo of Jason where he is drunk and riding a donkey" and get a convincing fake? Eventually, we must adapt, and these worries will go away.

We will have to learn to treat pieces of information not due their origin, but only due to their content.

For a current example: You got a picture of President Trump with prostitutes in Moscow? I don't care if that really happened or not - I'm not a prude, and it doesn't have immediate relevance. Let him have fun. What I do care about: Does the story fit my image of him? Do I think he is capable of doing that? Does this have explaining power? Note it might as well be a painting or a blog post instead of a photo.

In my old days it seems I am going full-on postmodern...

pegasus
Using trusted computing and related technologies, it should be entirely feasible to build tamper-proof cameras that produce provably-real photos, including a trusted timestamp. It should be a small step, for example, for Apple to add such a feature to future iPhones, given the layers of security they already have in their hardware.
loa_in_
You are talking about secure infrastructure, but the image itself can still be faked. And no such infrastructure is safe, in the end, when given entirely to the end user (modify the camera sensor, or transfer a signed image to an intact device).
amelius
Yes. And the problem exists only when a select few can manipulate photos. If everybody and their mother can do it, then indeed pictures will lose their credibility, and there is no problem (but there may be other problems, such as where do we get real evidence from now).
mirimir
> Faked pictures are more convincing than real pictures because you can set them up to look real. Understand this: All pictures are faked. As soon as you have the concept of a picture there is no limit to falsification.

The Place of Dead Roads by William S. Burroughs (1983)

https://books.google.de/books?id=VZLqAQAAQBAJ

> "Fake News" is about to get a lot more compelling hen you can make anyone say anything as long as you have some previous recordings of their voice.

Adobe has already developed that technology:

https://arstechnica.co.uk/information-technology/2016/11/ado...

Now imagine combining it with this:

Face2Face: Real-time Face Capture and Reenactment of RGB Videos https://www.youtube.com/watch?v=ohmajJTcpNk

Perhaps using the intonation from the face-actor's voice to guide the speech synthesis.

stevenh
I agree and I've upvoted you, but I feel it's worth pointing out that Adobe's claim about their own progress in this field was fake news.

https://www.youtube.com/watch?v=I3l4XLZ59iw&t=2m34s

"Wife" sounds exactly the same in both places. All they did was copy the exact waveform from one point to another. Nothing is being synthesized.

https://www.youtube.com/watch?v=I3l4XLZ59iw&t=3m54s

The word "Jordan" is not being synthesized. The speaker was recorded saying "Jordan" beforehand for this insertion demo and they're trying to play it off as though it was synthesized on the fly. This is a scripted performance and Jordan is feigning surprise.

https://www.youtube.com/watch?v=I3l4XLZ59iw&t=4m40s

The phrase "three times" here was prerecorded.

This was a phony demonstration of a nonexistent product. Reporters parroted the claims and none questioned what they witnessed. Adobe falsely took credit and received endless free publicity for a breakthrough they had no hand in by staging this fake demo right on the heels of the genuine interest generated by Google WaveNet. I suppose they're hoping they'll have a real product ready by whatever deadline they've set for themselves.

To be clear, I like Adobe and I think it's a cunning move on their part.

mbrookes
Thanks for the detailed breakdown. The irony is not lost!
I still think that it is incredibly important that we make tools for validation of primary source material easy to use and friendly for non-technical people.

Tech for faking video is getting more powerful day by day.

http://graphics.stanford.edu/~niessner/thies2015realtime.htm...

https://www.youtube.com/watch?v=ohmajJTcpNk

Jan 10, 2017 · eternalban on Julian Assange AMA
> this is not possible yet.

Take a good look:

https://youtu.be/ohmajJTcpNk

Dec 27, 2016 · rawnlq on Carrie Fisher has died
Maybe CG will be good enough to keep casting dead actors?

The face2face[1] tech looks pretty convincing to me and it already works in real-time (which isn't necessary for films) and you have plenty of old footage of them.

[1] https://www.youtube.com/watch?v=ohmajJTcpNk

macintux
Neither of my parents realized Peter Cushing was played via CG while watching Rogue One, so while it was terribly distracting to me, it's apparently pretty successful.
tootie
I personally didn't realize he was fake until after. Possibly because Tarkin was so stoic his face barely moves. Leia looked a bit unnatural, but it didn't ruin it for me.
paulddraper
Agreed. Though I've long been a star wars fan, i.e. not a CGI snob ;)

The weirdest thing about Leia was her smile...I mean that she smiled at all. Real Leia would've told the guy to shut up and do his job.

danielweber
I noticed it during his first scene, but either my suspension of disbelief won out or they got better tech as the movie progressed, because it looked totally natural later.

Of course, I was expecting to notice it. If I hadn't known about it, maybe it would have gone right by me.

thrillgore
In the first appearance it felt more like the scene was abruptly adjusted in terms of lighting and exposure to make up for the rendering of Cushing's face on the actor. It definitely got harder to notice in subsequent scenes.

But once you figure out something is out of place, you can't break that thought process.

bobwaycott
I wasn't expecting it, as I tried to remain ignorant of details before seeing it, but I spotted it quickly. I struggled with it momentarily, though, thinking to myself, "Wait. He's dead, right? This looks a bit off." My boys, 13 and 17, leaned over simultaneously and said, "Totally fake." It still looked digital later, but I think we cared less because we'd settled it in our minds. Young Leia felt more obviously digital than Tarkin, though.
MatmaRex
There was a CGI young Arnold Schwarzenegger in Terminator Salvation, in 2009. Admittedly, that scene did not require many facial expressions: https://www.youtube.com/watch?v=L7YYfgx_cHo
manachar
It wasn't really good enough in Rogue One, but of course will be getting better.

Personally, I hope that never really becomes popular. Culture is already far too backwards looking and nostalgia filled for my taste. I'd hate for us to get stuck using dead actors providing something mimicking a performance.

Far more interesting will be the possibility of wholly CGI actors with artificial vocaloid voices. That'll be a ways off since a voice actor would be cheaper.

rkuykendall-com
> It wasn't really good enough in Rogue One

I went to Rogue One with 8 people. 3 could tell me which prominent character was CG when asked, and for those 3 it was because they knew the actor was deceased.

WalterSear
>It wasn't really good enough in Rogue One, but of course it will still be getting more use.
Freak_NL
> That'll be a ways off since a voice actor would be cheaper.

Fox will probably beg to differ.

rangibaby
> I'd hate for us to get stuck using dead actors providing something mimicking a performance.

It's funny to think about it now, but Peter Cushing and Alec Guinness were the two "big name" actors in Star Wars, and provided some of the best acting in the entire series. The look on Alec Guinness' face when he mentioned Luke's father being his good friend said more than the accumulated total of prequel movies and spinoffs could ever hope to.

As experienced professionals (with experience working in low-budget action films, in Peter Cushing's case), they added a lot of nuance to their roles and took ownership of them in a way I doubt face-swapped actors will have the authority to for a long time, if ever. To be honest, it feels kind of weird to me that they are resurrecting an actor who has been dead for such a long time.

Having said that, I think the actual technology is good enough for de-aging when the actual actor can give a performance; realistic CGI replacements for actual people have been improving since 2009 when a cameo by THE Terminator was believable enough (somewhat because it conveniently got its face blown off immediately). Jeff Bridges in Tron Legacy was not bad: http://www.danplatt.com/?cat=91, and you won't notice the de-aged actors in the latest Marvel movies unless you are actively looking for them.

Nov 11, 2016 · 2 points, 0 comments · submitted by FuNe
Yup and in combination with this... https://youtu.be/ohmajJTcpNk the world will be easier to fool.
It gets even better when combined with Face2Face, a live-editing software that can transfer facial expressions including lip movements from actors to e.g. politicians on TV: https://www.youtube.com/watch?v=ohmajJTcpNk
OK, throw this into the mix: many of you reading this have smartphones which are voice-controlled, and for which voice control is activated at all times. In the case of Google, that processing must take place on Google's centralised servers. Siri may or may not do centralised processing (and can operate in standalone modes). Microsoft's Cortana, Facebook's "M" (IIRC) and Amazon's Echo are all various stylings of "Stasi in a Glade form factor", as Maciej Cegłowski so memorably put it.[1]

Voice stores distressingly cheaply in terms of space, and with the Internet of (broken) Things (that spy on you), the odds are good that you'll find yourself surrounded by microphones in the most unexpected locations,[2] controlled by a wide variety of quite probably competing interests.[3] And if they cannot find what they're looking for in the surveillance tape itself, they'll simply manufacture their own evidence using your own phonemes[4] and video.[5]

________________________________

Notes:

1. https://twitter.com/pinboard/status/732985370204233728

2. http://www.inquisitr.com/3097029/government-surveillance-in-...

3. http://www.locusmag.com/Perspectives/2016/09/cory-doctorowth...

4. http://www.theatlantic.com/technology/archive/2016/09/hackin...

5. https://www.youtube.com/watch?v=ohmajJTcpNk

AlexCoventry

  > In the case of Google, that processing must take place on
  > Google's centralised servers
Doesn't recognition of the initialization phrase "OK, Google" take place on the phone? Sending a continuous stream of audio back to Google's servers sounds expensive.
dredmorbius
AFAIU (which is little), "OK, Google" is processed locally. Whatever follows is processed remotely.

I should have mentioned voice-activated televisions as a whole 'nother class of attack.
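The split described above (local wake-word gate, remote processing of whatever follows) can be sketched roughly as below; `detect_hotword` is a stand-in for whatever on-device classifier the phone runs, not a real API:

```python
def stream_after_hotword(frames, detect_hotword):
    """Drop audio until the local wake-word detector fires; only frames
    after that point would be forwarded to the server."""
    armed = False
    forwarded = []
    for frame in frames:
        if not armed:
            # Local-only check: nothing leaves the device yet.
            armed = detect_hotword(frame)
            continue
        forwarded.append(frame)  # in a real system: uploaded for recognition
    return forwarded
```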

Sorry! Look at this video and reconsider your statement about "video". https://www.youtube.com/watch?v=ohmajJTcpNk
byebyetech
jaw dropped. Thanks for sharing.
karma_vaccum123
holy shit
BatFastard
Wow!! That is amazing!
AJRF
They used this technique in Mr.Robot to emulate Obama talking about the Ecorp hack
kkhire
WOW

I figured, knowing Obama's interest in good TV shows, he might have done the cameo! This is really neat stuff.

you mean this?

https://www.youtube.com/watch?v=ohmajJTcpNk

mattnewton
Wow. Alright, can't trust anything anymore. It makes sense that they solved the problem more directly: how do I move someone else's face in a video.
Aug 04, 2016 · jefe_ on Interactive Dynamic Video
The consumer use cases are interesting but the propagandist use cases are terrifying. Along with this (Real-time Face Capture and Reenactment): https://www.youtube.com/watch?v=ohmajJTcpNk
olewhalehunter
Which country (or organization) do you think will be (or has already been) the first to implement a program of creating physical mimics of humans for military and intelligence purposes? With technologies like this and CRISPR, video, testimony, or genetic evidence could be thrown out the window in criminal or military cases.
I find it amusing that they're touting it for an application where I find it quite likely that similar NN algorithms will excel at generating speech in someone else's voice. (c.f., "neural style transfer" in images, but applied to speech.) We're already getting pretty decent at this for video -- see, for example, https://www.youtube.com/watch?v=ohmajJTcpNk (Face2Face: Real-time Face Capture and Reenactment)
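A hedged sketch of what "style transfer for speech" might mean in practice: the Gram-matrix style loss from image style transfer, applied to feature maps computed over a spectrogram. The feature extractor itself is assumed, not shown; this is only the loss term, not a working voice cloner.

```python
import numpy as np

def gram_matrix(features):
    # features: (channels, time) activations, e.g. from a conv layer run
    # over a spectrogram. The Gram matrix captures channel co-activation
    # statistics ("style"/timbre) while discarding timing ("content").
    f = features.reshape(features.shape[0], -1)
    return f @ f.T / f.shape[1]

def style_loss(generated_feats, reference_voice_feats):
    # Mean squared difference between Gram matrices: zero when the
    # generated audio's texture statistics match the target voice's.
    g = gram_matrix(generated_feats)
    r = gram_matrix(reference_voice_feats)
    return float(np.mean((g - r) ** 2))
```

As the reply below suggests, whether this image-derived recipe carries over to audio at all is an open question.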
dharma1
I was just trying to find papers for that yesterday - neural style for voice/audio. Conceptually it sounds like it should be doable, but looking at the actual implementation, I'm not sure it's doable at all with a CNN for audio.
zopf
Unsure if it's CNN-based, but check out http://www.wowtune.net/
dharma1
very impressive, even if a bit out of tune on the Autumn Leaves demo!

They have a very good team with actual audio industry experience. Look forward to seeing more demos.

It sounds like a phoneme based speech/singing synthesizer, similar to Yamaha Vocaloid. I wonder how much training data is required to extract the phonemes to create a "voice"

Yea, perhaps. Like in the recent "Face2Face: Real-time Face Capture and Reenactment" video[1]. And your reference to "your avatar" reminds me of the scene from The Matrix where Morpheus describes "residual self-image"[2].

[1] https://www.youtube.com/watch?v=ohmajJTcpNk

[2] https://youtu.be/AGZiLMGdCE0?t=26

bloaf
See also Ghost in the Shell:

https://www.youtube.com/watch?v=poKi7YyuamI

Mar 22, 2016 · 2 points, 0 comments · submitted by officialjunk
Mar 20, 2016 · 3 points, 0 comments · submitted by officialjunk
Mar 18, 2016 · 160 points, 35 comments · submitted by benevol
vmp
This is insanely awesome. Something that comes to my mind is the use of this for dubbing movies and TV series; I'm very sensitive about correctly syncing what's being said to what we see, to the point where I only watch movies in their native tongue - even if I don't know the language and need subtitles. This could be a game-changer.
ghayes
It's even cooler since you could even do this retroactively if you had the original footage of the dubbing voice actor.
Clever321
I'm curious, how do you both read subtitles and watch that the sound is properly synced to an actor's lips? I can't read that fast, so I spend 80% of my time "watching a movie" simply reading text on the bottom of the screen.
peteretep

    > I can't read that fast
I can't speak for the op, but I can read a great deal faster than most people speak.
0x4a42
Subtitles aren't synched to actor's lips. He talked about dubbed voices, not subtitles. :)
None
None
JoshTriplett
> Something that comes to my mind is the use of this for dubbing movies and TV series

One of the references towards the end of the video mentions using this for translation, so that's definitely one of the intended applications.

drawkbox
Very well done. The best part is how they re-enact the mouth/teeth to look so real, by sampling earlier parts of the video and using that on the non-expression still or loop. I was blown away by how real Trump's teeth looked; then they explained this process and why it works.

This could be huge (yuuuge) in games and virtual spaces. Unreal 4 had a demo at GDC recently, and it seems we are approaching that era[1]

[1] https://youtu.be/JbQSpfWUs4I?t=6m

kristiandupont
One of the barriers in webcam meetings is the inability to make eye contact. It seems subtle, but I think it is more important than one might intuitively think.

I've thought a lot about how to overcome this and came up with nothing but cameras beneath the screen (which Apple seems to have worked on, but which we have yet to see: http://appleinsider.com/articles/09/01/08/apple_files_patent...). This technology could possibly provide a competing solution.

mchahn
Discerning fake video from real just got a lot harder. Video is one of the last bastions of honest evidence.
yeukhon
I've heard light-source analysis has been a way to determine whether something is likely to be fake or not.
benevol
The data-vacuuming companies (FB, Google, ad networks, etc.) collect the information about us required to know how to manipulate us, and technologies such as this one represent the tools to actually get it done.
NeonVice
Conan could use this for his fake celebrity interviews instead of just cutting the mouth out of an image. :)
SergeyHack
Imagine live edit of your video conversations, that attaches joyful emotions to any mention of an advertised brand.
tibbon
The paper is great, but I wanna see some source code!
Hydraulix989
I don't think they want the world to see the source code, should it get into the wrong hands, though the repercussions of this work are bound to hit us sometime. [1]

[1] https://www.youtube.com/watch?v=GBkT19uH2RQ

albertzeyer
Where can I find the paper?
spriggan3
The video made me feel very uncomfortable, and it takes a lot to make me feel that way.
zaro
Faking news got an order of magnitude easier :)
sageinventor
It would be cool to use this to fix movie footage in post production. You could just copy a face over if the actor screwed up
rawnlq
Or bringing back dead actors using past footage!
listic
What is Target Actor and what is Reenactment Result? The former looks better to me.
izym
Target actor is the source material that they're changing, and the latter is the result of that.
xchip
Does anyone have the link to the paper?
mccappy
http://www.graphics.stanford.edu/~niessner/thies2016face.htm...
mccappy
http://www.ieee.org/conferences_events/conferences/conferenc...
oliyoung
Terrifying, amazing but terrifying.
namelezz
This is impressive.
diskcat
The title is really underwhelming compared to how cool the demo is.
imaginenore
Hilarious and scary at the same time. The admissibility of videos in courts is becoming more and more questionable.
deelowe
Wow.
mortenjorck
This demo is doubly amazing: First, the obviously impressive (and slightly unsettling in its implications) manipulation of the target face, but second, the fact that this is all being done with a single RGB camera.

Consider the massive rig required to perform the at-the-time groundbreaking performance capture for the game L.A. Noire: http://i.kinja-img.com/gawker-media/image/upload/s--y6fmsAIU... This is how far computer vision has come in five years.

rawnlq
In terms of gaming applications this could be huge for virtual reality avatars. You can still be anonymous but still convey facial expressions with a webcam!
scoot
I'm curious how you think a webcam will be able to read your facial expression when you have a VR headset on your face.
toisanji
Smaller VR rigs should just cover the eyes. Or maybe there is a small camera hanging under the VR rig to capture face elements.
greeneggs
This research uses a camera to capture the mouth area and strain sensors for the upper face (that part can obviously be improved).

http://www.hao-li.com/Hao_Li/Hao_Li_-_publications_%5BFacial...

Mar 18, 2016 · 2 points, 1 comments · submitted by ratneshmadaan
billconan
This is amazing! Can't wait to see the paper.

How do they create a 3D model out of video?

HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.