What are deepfakes? How and why they work

Deepfakes swap celebrities' faces into porn videos and put words in politicians' mouths, but they could do a lot worse.

Deepfakes are fake videos or audio recordings that look and sound just like the real thing. Producing them was once the bailiwick of Hollywood special effects studios and of intelligence agencies producing propaganda, like the CIA or GCHQ's JTRIG directorate. Today anyone can download deepfake software and create convincing fake videos in their spare time.

So far, deepfakes have been limited to amateur hobbyists putting celebrities' faces on porn stars' bodies and making politicians say funny things. However, it would be just as easy to create a deepfake of an emergency alert warning an attack was imminent, or destroy someone's marriage with a fake sex video, or disrupt a close election by dropping a fake video or audio recording of one of the candidates days before voting starts.

This makes a lot of people nervous, so much so that Marco Rubio, the Republican senator from Florida and 2016 presidential candidate, called them the modern equivalent of nuclear weapons.

"In the old days," he told an audience in Washington a couple weeks ago, "if you wanted to threaten the United States, you needed 10 aircraft carriers, and nuclear weapons, and long-range missiles. Today, you just need access to our internet system, to our banking system, to our electrical grid and infrastructure, and increasingly, all you need is the ability to produce a very realistic fake video that could undermine our elections, that could throw our country into tremendous crisis internally and weaken us deeply."

Political hyperbole skewed by frustrated ambition, or are deepfakes really a bigger threat than nuclear weapons? To hear Rubio tell it, we're headed for Armageddon. Not everyone agrees, however.

"As dangerous as nuclear bombs? I don't think so," says Tim Hwang, director of the Ethics and Governance of AI Initiative at the Berkman-Klein Center and MIT Media Lab. "I think that certainly the demonstrations that we've seen are disturbing. I think they're concerning and they raise a lot of questions, but I'm skeptical they change the game in a way that a lot of people are suggesting."

How deepfakes work

Seeing is believing, the old saw has it, but the truth is that believing is seeing: Human beings seek out information that supports what they want to believe and ignore the rest.

Hacking that human tendency gives malicious actors a lot of power. We see this already with disinformation (so-called "fake news") that creates deliberate falsehoods that then spread under the guise of truth. By the time fact checkers start howling in protest, it's too late, and #PizzaGate is a thing.

Deepfakes exploit this human tendency using generative adversarial networks (GANs), in which two machine learning (ML) models duke it out. One ML model trains on a data set and then creates video forgeries, while the other attempts to detect the forgeries. The forger creates fakes until the other ML model can't detect the forgery. The larger the set of training data, the easier it is for the forger to create a believable deepfake. This is why videos of former presidents and Hollywood celebrities have been frequently used in this early, first generation of deepfakes — there's a ton of publicly available video footage to train the forger.
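
As a rough illustration of the forger-versus-detector loop described above, here is a minimal Python sketch. It is a toy, not deepfake software: the "real data" are just numbers drawn around 4.0, the forger has a single parameter (an offset `b`), and the detector is a simple closed-form Gaussian classifier rather than the neural networks a real GAN uses. All names and settings (`train_forger`, `fit_detector`, `lr`, the sample sizes) are assumptions made for illustration.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_detector(real, fake):
    """Fit a 1-D Gaussian classifier (unit variance assumed for both classes).

    Returns (w, c) such that sigmoid(w * x + c) estimates P(x is real).
    """
    m_real = sum(real) / len(real)
    m_fake = sum(fake) / len(fake)
    w = m_real - m_fake
    c = -(m_real ** 2 - m_fake ** 2) / 2.0
    return w, c


def detector_accuracy(w, c, real, fake):
    """Fraction of samples the detector labels correctly."""
    hits = sum(1 for x in real if w * x + c > 0)
    hits += sum(1 for x in fake if w * x + c <= 0)
    return hits / (len(real) + len(fake))


def train_forger(real_mean=4.0, n=200, rounds=200, lr=0.2, seed=0):
    """Alternate between refitting the detector and nudging the forger."""
    rng = random.Random(seed)
    real = [rng.gauss(real_mean, 1.0) for _ in range(n)]
    b = 0.0  # the forger's only parameter: an offset added to random noise
    history = []
    for _ in range(rounds):
        # Forger produces a fresh batch of fakes.
        fake = [rng.gauss(0.0, 1.0) + b for _ in range(n)]
        # Detector is refit to tell this batch apart from the real data.
        w, c = fit_detector(real, fake)
        history.append(detector_accuracy(w, c, real, fake))
        # Forger takes one gradient step on the loss -mean(log D(fake)),
        # moving its output toward what the detector calls "real".
        push = sum(1.0 - sigmoid(w * x + c) for x in fake) / n
        b += lr * w * push
    return b, history


if __name__ == "__main__":
    offset, accuracy = train_forger()
    print("forger offset:", round(offset, 2))
    print("detector accuracy, first round:", accuracy[0])
    print("detector accuracy, last round:", accuracy[-1])
```

Calling `train_forger()` returns the forger's final offset and the detector's accuracy in each round. In this toy setup the accuracy should start high and drift toward chance as the forger's output comes to resemble the real data, which mirrors the "until the other model can't detect the forgery" stopping point in miniature.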

Who's wagging whom?

Wag the Dog, the wickedly funny 1997 film co-written by David Mamet, satirised a president running for re-election who fakes a war using special effects to cover up a sex scandal. The film was prophetic: the ability to "fake TV news" has been around for a while and is now in the hands of pretty much every laptop owner on the planet.

GANs, of course, have many uses other than making fake sex videos and putting words in politicians' mouths. GANs are a big leap forward in what's known as "unsupervised learning," in which ML models teach themselves. This holds great promise for improving self-driving vehicles' ability to recognise pedestrians and bicyclists, and for making voice-activated digital assistants like Alexa and Siri more conversational. Some herald GANs as the rise of "AI imagination."

Ordinary users can download FakeApp and get started creating their own deepfakes right away. Using the app isn't super-easy, but a moderately geeky user should have no trouble, as Kevin Roose demonstrated for the New York Times earlier this year.

That said, there are so many other forms of effective disinformation that focusing on playing "Whack-a-Mole" with deepfakes is the wrong strategy, Hwang tells CSO. "I think that even in the present it turns out there are lots of cheap ways that don't require deep learning or machine learning to deceive and shape public opinion."

For instance, taking a video of people beating someone up in the street, and then creating a false narrative around that video — perhaps claiming that the attackers are immigrants to the U.S., for example — doesn't require a fancy ML algorithm, just a believable false narrative and a video that fits.

How to detect deepfakes

Detecting deepfakes is a hard problem. Amateurish deepfakes can, of course, be detected by the naked eye. Other signs that machines can spot include a lack of eye blinking or shadows that look wrong. GANs that generate deepfakes are getting better all the time, and soon we will have to rely on digital forensics to detect deepfakes — if we can, in fact, detect them at all.
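
To make the blinking cue concrete, here is a minimal Python sketch of how a machine might flag a clip in which the subject almost never blinks. It assumes some upstream face tracker has already produced one "eye openness" number per frame; that tracker, the function names, and every threshold (`closed_below`, `min_blinks_per_minute`, `fps`) are hypothetical values chosen for illustration, not settings taken from any real detector.

```python
def count_blinks(openness, closed_below=0.2):
    """Count runs of consecutive frames in which the eye is treated as closed."""
    blinks = 0
    closed = False
    for value in openness:
        if value < closed_below and not closed:
            blinks += 1      # a new closed run starts: count one blink
            closed = True
        elif value >= closed_below:
            closed = False   # eye reopened; the next closed run is a new blink
    return blinks


def looks_suspicious(openness, fps=30.0, min_blinks_per_minute=5.0,
                     closed_below=0.2):
    """Return True if the blink rate falls below the assumed minimum."""
    if not openness:
        raise ValueError("need at least one frame")
    minutes = len(openness) / fps / 60.0
    rate = count_blinks(openness, closed_below) / minutes
    return rate < min_blinks_per_minute


if __name__ == "__main__":
    # One minute of synthetic footage at 30 frames per second.
    natural = [0.1 if i % 120 < 3 else 0.35 for i in range(1800)]
    staring = [0.35] * 1800
    print(looks_suspicious(natural), looks_suspicious(staring))
```

A heuristic this simple is easy to fool, which is the point the paragraph above makes: once forgers learn to add blinking, detection has to move on to subtler forensic signals.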

This is such a hard problem that DARPA is throwing money at researchers to find better ways to authenticate video. However, because GANs can themselves be trained to learn how to evade such forensics, it's not clear that this is a battle we can win.

"Theoretically, if you gave a GAN all the techniques we know to detect it, it could pass all of those techniques," David Gunning, the DARPA program manager in charge of the project, told MIT Technology Review. "We don't know if there's a limit. It's unclear."

If we are unable to detect fake videos, we may soon be forced to distrust everything we see and hear, critics warn. The internet now mediates every aspect of our lives, and an inability to trust anything we see could lead to an "end of truth." This threatens not only faith in our political system but, over the longer term, our faith in a shared objective reality. If we can't agree on what is real and what is not, alarmists ask, how can we possibly debate policy issues?

Hwang thinks this is exaggeration, however. "This is one of my biggest critiques," he says. "I don't see us crossing some mystical threshold after which we're not going to know what's real and what's not."

At the end of the day, the hype around deepfakes may be the greatest protection we have. We are on alert that video can be forged in this way, and that takes the sting out of deepfakes. (CSO)