In under six years, the development of artificial intelligence has made it possible for almost anyone to create fake images that are indistinguishable from reality. From the porn business to a coup d'état in Gabon, the Internet is disseminating a new phantom threat: that we will never know what is true again.
During the last legislative elections in New Delhi, candidate Manoj Tiwari surprised his voters with one video in which he spoke Hindi, another in English, and a third in Haryanvi. Before becoming the leading figure of the Bharatiya Janata Party (BJP) in the country's capital, Tiwari was an actor, popular singer, and reality show star. Yet no one suspected that he could speak English (a valuable asset among the urban classes), let alone the dialect of the Haryana area.
The truth came out days later: an advertising agency had proposed to the BJP, the party of Prime Minister Narendra Modi, that it broaden its electoral reach by using artificial intelligence to create deepfakes of Tiwari. Using earlier recordings and cutting-edge software, the agency put words in his mouth in languages he did not speak and spread his message through WhatsApp to voters outside the party's core support.
It is not the first time that a candidate has put on a voice to reach new constituents. Nor is it the first time that artificial intelligence has been used in politics. But, as far as we know, it is the first time that a candidate has altered his own body and voice with deep learning to improve his performance.
Deepfakes first appeared in 2017, the year of the fake news boom. A Reddit user known as "deepfakes" began publishing pornographic creations made with freely available algorithms and image libraries, with strikingly realistic results.
In step with the emergence of TikTok and of apps that age or rejuvenate faces, this anonymous user's technique became popular, and soon afterwards the first app that could insert any face into an existing video came out. From Bolsonaro as The Red Grasshopper to Cristina Kirchner as a drag queen in RuPaul's Drag Race, the Internet filled with videos made for (mostly) humorous purposes, although the overwhelming majority were still pornographic.
What is most remarkable about deepfakes is how quickly their quality has improved. In August, a fan published his own version of a young Robert De Niro in The Irishman. A comparison between Netflix's CGI work and this YouTube user's deepfake (and the millions of dollars of difference between them) gives an indication of how accessible and potentially effective this tool is.
For these creations, an autoencoder compresses an image into a latent representation with only a few variables (smile, frown, and so on), from which a different final image is then reconstructed (the same gestures with another face, or the same face with another speech, for example).
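The encode-and-redecode idea described above can be sketched in a few lines of Python. This is only a toy illustration of the data flow, not how any real deepfake tool is written: the four-number "face" layout, the names `encode`, `make_decoder`, `decode_actor` and `decode_target`, and all the values are assumptions made up for the example, and the hand-written functions stand in for networks that would normally be learned from thousands of frames.

```python
# Toy illustration of a shared encoder with one decoder per face.
# Assumption: a "face" is just four numbers,
#   (skin_tone, face_width, smile, frown).
# Real systems learn the encoder and decoders from video frames;
# here they are hand-written stand-ins so the data flow is visible.


def encode(face):
    """Shared encoder: keep only the expression variables (the latent code)."""
    _skin_tone, _face_width, smile, frown = face
    return (smile, frown)


def make_decoder(skin_tone, face_width):
    """Build a decoder that always draws one specific person's face."""
    def decode(latent):
        smile, frown = latent
        return (skin_tone, face_width, smile, frown)
    return decode


# Two hypothetical people, each with their own decoder.
decode_actor = make_decoder(skin_tone=0.8, face_width=0.5)
decode_target = make_decoder(skin_tone=0.3, face_width=0.7)

actor_frame = (0.8, 0.5, 0.9, 0.1)         # the actor, smiling broadly
latent = encode(actor_frame)               # only the expression survives
reconstruction = decode_actor(latent)      # same person back again
swapped = decode_target(latent)            # same smile, another face

print("latent code:   ", latent)
print("reconstruction:", reconstruction)
print("face swap:     ", swapped)
```

The swap happens in the last step: the latent code taken from one person's frame is handed to the other person's decoder, so the expression is kept while the identity changes.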
We are not only talking about still or moving images, however, but also about sound. The false news story based on a viral audio clip about Lionel Messi's alleged move to Manchester City would not have needed a talented impersonator. The audio could just as well have been created with software such as that used by Boston Children's Hospital to recreate the voices of people who have lost their speech.
In September, the first major deepfake scam became public: according to the Wall Street Journal, the CEO of an English company transferred 220,000 euros on the orders of software that impersonated the voice of his German boss.
The mere existence of this technology not only makes it possible to create fakes, with unprecedented political and social consequences, but also undermines the status of reality itself: if what exists can be doctored or invented outright, everyone has the right to be suspicious. The most emblematic example of this problem, reports Rob Toews in Forbes magazine, happened in Gabon.
For several months in 2018, Gabon's president, Ali Bongo, did not appear in public. Rumours about his health, and even his possible death, forced the government to reveal that Bongo had suffered a stroke but was recovering and was scheduled to give a New Year's speech. The stiffness and apparent artificiality of the leader's movements in the recorded message quickly set off the opposition's paranoia: the video is fake, they declared.
A week later, seizing on the apparent power vacuum, a faction of the army attempted a coup d'état in Gabon, which was put down... by Bongo himself, who is still in charge of the government. The video had not been altered.
Nothing but the truth
The pandemic took our relationship with virtual images to unforeseen levels: job interviews, classes, baptisms, doctor's appointments, court hearings, legislative sessions, and even sex. Being "present" is now an increasingly dispensable requirement in the rituals and institutions that constitute our society.
Conversely, virtual identity, or "fingerprint", is becoming increasingly relevant, in practical as well as legal terms. Authentication is vital where everyday life proceeds only through its digital projection.
Children all over the world know that, as the Argentine senator Esteban Bullrich did in Congress, they can make fun of their teachers by playing an image of themselves on a loop in a virtual classroom.
Deepfakes present more complicated problems. Artificial intelligence (AI) is already used to mass-produce comments that promote a product or service on e-commerce platforms, and also for political purposes, as was seen during Argentina's presidential campaign in 2019.
Why not imagine invented visual records of mass protests or mobilisations, summary executions, repression, street crime, and more? If smear campaigns are already an established tool, both for those who run them and for those who use them as an excuse, what possibilities do deepfakes open up? What depths of political misery can be reached once any visual record may be false?
According to an analysis published in the journal Crime Science, deepfakes made with criminal intent are the most damaging (or lucrative) artificial intelligence-based crimes and the most difficult to defeat. Their forms include fake kidnappings staged for extortion through voice impersonation, video impersonation to gain access to secure systems, and a wide range of other extortion schemes.
These concerns have already triggered some reactions. China banned the dissemination of deepfakes that carry no warning, and the State of California prohibited their use for political purposes during election periods. In October, Facebook set up a $10 million fund to develop tools to detect fake images quickly.
Microsoft, for its part, has just presented its "Video Authenticator", a tool for detecting deepfakes. Even Sensity, the "first visual threat intelligence company", has emerged; it combines monitoring with algorithmic detection of deepfakes.
According to Sensity, as of July 2019 there were fewer than 15,000 deepfakes circulating on the web. A year later, the figure had grown to almost 50,000. Of these, 96% are pornographic and, so far in 2020, more than a thousand deepfakes a month have been uploaded to pornography sites alone, where the so-called "forbidden videos" of celebrities and influencers are appearing more and more frequently.
"The companies behind the porn site do not consider this to be a problem," Sensity’s CEO, Giorgio Patrini, told Wired. It is quite the contrary. A deepfake of Emma Watson has 23 million views on Xvideos, Xnxx and xHamster, three of the world's largest porn sites, whose monetization logic consists of directing massive traffic to paid content.
Among the most twisted speculations is the cross between deepfakes and virtual reality, where real people (celebrities or not) can come to life as virtual sex slaves. This should not be the main concern for societies like those in Latin America, where access to the Internet is not even guaranteed. But the last few years have shown that the future is never too far away.
No one can deny it
A deepfake is not just any kind of video editing, but the application of a specific technology for a specific purpose: deep learning applied to a falsified record. And deep learning, in turn, is not just any kind of artificial intelligence.
According to the definition in Ian Goodfellow's book (2014), Deep Learning seeks to solve "the tasks that are easy for people to perform but hard for people to describe formally".
Recognizing an image, for example. The development of computer science went in the opposite direction: as early as 1997, IBM's Deep Blue computer managed to beat the best living chess player in the world. Far more recent is the computer's ability to interpret a mood, distinguish a dog from a cat, or simply "speak": tasks that any human being can perform without specific training.
The irony is captured in any captcha: "Show that you are a human by identifying this traffic light". What a great skill, Mr. Human. Congratulations.
Ian Goodfellow had already caused a stir among his colleagues with his book when, that same year, he devised the invention that placed him in the global pantheon of the fundamental minds of artificial intelligence: the Generative Adversarial Network (GAN), an algorithmic model that, among other things, made the emergence of deepfakes possible.
The current director of Machine Learning at Apple and former chief researcher at Google Brain (who is not yet 35) was drinking beer in a Montreal bar while discussing with friends the ability of artificial intelligence to generate realistic photos. Alcohol fueled an idea he would have dismissed under the influence of sobriety.
For a neural network to learn how to create an image, it must not only observe millions of images but also be told whether what it has created is right or wrong. To solve this problem, Goodfellow proposed pitting two networks against each other: a "generating" network, trained to create the images, and a "discriminating" network, trained specifically to detect the differences between a real image and an artificially created one.
Through successive rounds, each network automatically adjusts the parameters with which it performs its task. Eventually, the discriminating network is no longer able to tell what is real from what is false. Goodfellow's theory was tested in practice and, among other less publicized uses, deepfakes emerged on the fringes of the Internet.
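The two-network contest can be sketched with a deliberately tiny example. What follows is a toy under stated assumptions, not Goodfellow's implementation: instead of images, the "real" data are single numbers drawn around 4.0; the generator has one parameter, `theta`; the discriminator is a one-line logistic model; and the function name `train_toy_gan`, the learning rate, the batch size and the number of steps are all invented for illustration. Training of this kind is known to be unstable in general, so the behaviour of this sketch should not be read as a guarantee for real systems.

```python
import math
import random


def sigmoid(u: float) -> float:
    """Logistic function, written to avoid overflow for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def train_toy_gan(steps: int = 3000, batch: int = 32, lr: float = 0.05,
                  real_mean: float = 4.0, seed: int = 0):
    """Alternate discriminator and generator updates on 1-D toy data."""
    rng = random.Random(seed)
    theta = 0.0        # generator: a fake sample is theta + noise
    w, c = 0.0, 0.0    # discriminator: D(x) = sigmoid(w * x + c)

    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        fake = [theta + rng.gauss(0.0, 1.0) for _ in range(batch)]

        # Discriminator step: raise D on real samples, lower it on fakes
        # (gradient ascent on log D(real) + log(1 - D(fake))).
        grad_w = grad_c = 0.0
        for x in real:
            miss = 1.0 - sigmoid(w * x + c)
            grad_w += miss * x
            grad_c += miss
        for x in fake:
            fooled = sigmoid(w * x + c)
            grad_w -= fooled * x
            grad_c -= fooled
        w += lr * grad_w / batch
        c += lr * grad_c / batch

        # Generator step: move theta so that fakes score higher with the
        # discriminator (gradient ascent on log D(fake)).
        grad_theta = sum((1.0 - sigmoid(w * x + c)) * w for x in fake) / batch
        theta += lr * grad_theta

    return theta, w, c


theta, w, c = train_toy_gan()
print(f"generator centre after training: {theta:.2f} (real data centre: 4.0)")
print(f"discriminator score at x = 4.0:  {sigmoid(w * 4.0 + c):.2f}")
```

With these settings the generator's output should drift from 0 toward the real data, and the discriminator's score near the real data should hover around one half, which is the toy counterpart of no longer being able to tell real from fake.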
Goodfellow's invention involves a Faustian logic: you will be able to create the real thing, but you will no longer know what the real thing is. In an interview with the MIT Technology Review, he admits that there will be no technical solution to the problem of authentication, and that it will instead fall to society to educate people and raise awareness of the dangers of this technology and of the possibility that the images we see may not be real.
"How would you prove that you are a human and not a robot?", Lex Fridman asked him on his podcast. "According to my own research methodology there is no way of knowing at this point," answered Goodfellow, who, from his surname to his monotone delivery and precise speech, could pass for an android. "To prove that something is real because of its own content is very difficult. We are capable of simulating almost anything, so you would have to use something beyond the content to prove that it is real," Goodfellow continued.
The bad reputation of simulation should not, nevertheless, overshadow its potential: the testing of simulated drugs on simulated organs, affected by simulated diseases; subatomic experimentation for the development of alternative energies; the algorithmic projection of space travel; industrial, agro-food and even artistic applications.
Most of these disciplines require immense computational capacity (and in that field the biggest bet is quantum computing), but what is interesting is the underlying premise. Goodfellow seeks to ensure that networks "understand the world in terms of a hierarchy of concepts, each defined by simpler concepts", coming from experience.
If the neural networks of artificial intelligence continue at this rate of acceleration, humanity will have at its disposal tools capable of dislocating its experience of the world. Forever. Unlike other technologies, "democratisation" will not solve the dilemmas presented by deepfakes. From whom will we demand the truth? Perhaps we will have to get used to living without it.
This article is published within the framework of the editorial alliance between democraciaAbierta and Nueva Sociedad. Read the original in Spanish here.