The TED AI Show
Is AI destroying our sense of reality? with Sam Gregory
May 21, 2024
[00:00:00] Bilawal Sidhu:
Okay, picture this. We're in Miami. There's sun, sand, palm trees in a giant open-air mall right on the water. Pretty nice, right? But then, one Monday night in January of this year, things get weird. Cop cars swarm the mall, like dozens of them. I'm talking six city blocks shut down, lights flashing, people everywhere, and no one knows what's going on.
The news footage hits the internet, and naturally the hive mind goes into overdrive. Speculation and conspiracy theories are flying, and one idea takes hold: aliens. Folks are dissecting grainy helicopter footage in the comments, zooming in, analyzing it frame by frame to find any evidence of aliens. So I thought: I'm a TikToker.
What if I brought this online fever dream to life and shared it with the masses? Using the latest AI tools, I created a video: towering, shadowy figures silently materializing amidst the flashing lights of the police cars, an alien invasion in the middle of Miami's Bayside Marketplace. "Just a bit of fun," I thought.
Some got the joke, a Miami twist on Stranger Things, but I watched as other people flocked to my comment section to declare this as bona fide evidence that aliens do in fact exist. Now you might be wondering what actually happened. A bunch of teenagers got into a fight at the mall. The police showed up to break it up, and that's all it took to trigger this mass hysteria.
It was too easy, too easy to make people believe something happened that never actually happened, and that's kind of terrifying. I am Bilawal Sidhu, and this is The TED AI Show where we figure out how to live and thrive in a world where AI has changed everything.
So I've been making visual effects, blending realities on my computer, since I was a kid. I remember watching a show called Mega Movie Magic, which revealed the secrets behind movies' special effects. I learned about CGI and practical effects in movies like Star Wars, Godzilla, and Independence Day. I was already into computer graphics, but seeing how they could create visuals indistinguishable from reality was a game changer.
It sparked a lifelong passion to blend the physical and digital worlds. Several years ago, I started my TikTok channel. I'd upload my own creations and share them with hundreds, then thousands, and now millions of viewers. I mean, just five years ago, if I wanted to make a video of giant aliens invading a mall in Miami, it would've taken me a week and at least five pieces of software.
But this aliens video? It took me just a day to make, using tools like Midjourney, RunwayML, and Adobe Premiere. Tools that anyone with a laptop can access. Since ChatGPT came on the scene in late 2022, there's been a lot of talk about the Turing test, where a human evaluator tries to figure out if the person at the other end of a text chat is a machine or another human. But what about the visual Turing test, where machines can create images that are indistinguishable from reality?
And now OpenAI has come out with Sora, a video generation tool that will create impressively lifelike video from a single text prompt. It's basically like ChatGPT or DALL-E, but instead of text or images, it generates video.
And don't get me wrong, there are other video generation tools out there, but when I first saw Sora, the realism blew my socks off. I mean, with these other programs, you can make short videos, like a couple seconds long, but with Sora, we're talking minute-long videos. The 3D consistency with those long, dynamic camera moves definitely impressed me.
There's so much high-frequency detail, and the scene is just brimming with life. And if we can just punch a single text prompt into Sora and it'll give us full-on video that's visually convincing to the point that some people could mistake it for something real, well, you can imagine some of the problems that might stem from that.
So we're at a turning point. Not only have we shattered the visual Turing test, we're re-shattering it every day. Images, audio, video, 3D, the list goes on. I mean, you've probably seen the headlines. AI-generated nude photographs of Taylor Swift circulating on Twitter. A generated video of Volodymyr Zelenskyy surrendering to the Russian Army.
A fraudster successfully impersonating a CFO on a video call to scam a Hong Kong company out of tens of millions of dollars. And as bad as the hoaxes and the fakes and the scams are, there's a more insidious danger. What if we stop believing anything we see? Think about that. Think about a future where you don't believe the news, you don't trust the video evidence you see in court.
You're not even sure that the person on the other end of the zoom call is real. This isn't some far-flung future. In fact, I'd argue we're living in it now. So given that we're in this new world where we're constantly shattering and re-shattering the visual Turing test, how do we protect our own sense of reality?
I reached out to Sam Gregory to talk me through what we're up against. Sam is an expert on generative AI and misinformation, and is the executive director of the human rights network Witness. His organization has been working with journalists, human rights advocates, and technologists to come up with solutions that help us separate the real from the fake.
Sam, thank you for joining us. I have to ask you, as we're seeing these AI tools proliferate just over the last two years, are we correspondingly seeing a massive uptick of these visual hoaxes?
[00:06:04] Sam Gregory:
The vast majority are still these shallow fakes, because anyone can make a shallow fake. It's trivially easy, right?
Just take an image, grab it out of a Google search, and claim it's from another place. What we're seeing, though, is this uptick that's happening in a whole range of ways people are using this generative media for deception. So you see images sometimes deliberately shared to deceive people, right?
Someone will share an image claiming it's, you know, of an event that never happened. And then we're seeing a lot of audio, 'cause it's so trivially easy to make, right? A few seconds of your voice and you can churn out endless cloned voice. We are not seeing so much video, right?
And that's, you know, a reflection that really doing complex video recreation is still not quite there, right?
[00:06:51] Bilawal Sidhu:
Yeah. Video is significantly harder, at least for the moment, and I personally hope that it stays pretty hard for a while, though some of these generations are getting absolutely wild.
I had a bit of an existential moment looking at this one video from Sora. It's the underwater diver video. For anyone who hasn't seen this, there's a diver swimming underwater, you know, investigating this historic, almost archeological spaceship that's crashed into the seabed. And it looked absolutely real.
And I was thinking through what that would've taken for me to do the old-fashioned way. And I was just gasping at the fact that a simple prompt produced this immaculate one-minute video. I'm kind of curious, have you had such a moment yourself?
[00:07:40] Sam Gregory:
It's funny, because I was literally showing that video to my colleagues and I didn't cue them up that it was made with Sora,
'cause I wanted to see whether they clicked that it was an AI-generated video, 'cause I think it's a fascinating one. It's kind of on the edge of possibility. There's definitely a kind of a moment that's happening now for me, and it's really interesting, 'cause you know, we first started working on this like five or six years ago, and we were just doing what we described as prepare, don't panic, and really trying to puncture the hype, particularly around video deepfakes, because people kept implying that they were really easy to do and that we were surrounded by them.
And the reality was it wasn't easy to fake convincing video, and to do that at scale. So certainly for me, Sora has been a click moment in terms of the possibility here, even though it feels like a black box, and I'm not quite sure how they've done it, how accessible this is actually gonna be, and how quickly.
[00:08:32] Bilawal Sidhu:
So related to this, a lot of these visual hoaxes tend to be whimsical, even innocuous, right?
In other words, they don't cause serious harm in the real world and are almost akin to pranks. But some of these visual hoaxes can be a lot more serious. Can you tell me a little bit about what you're seeing out there?
[00:08:51] Sam Gregory:
The most interesting examples right now are happening in election contexts globally, and they're typically people having words put in their mouths.
In the recent elections in Pakistan and in Bangladesh, you had candidates saying, boycott the vote, or vote for the other party, right? And that's quite compelling at first glance, particularly if you're not very familiar with how AI can be used. And they're often deployed right before an election.
So those are clearly, in most cases, malicious; they're designed to deceive. And then you're also seeing ones that are kind of these leaked-conversation ones, so they're not visual hoaxes. And so you've got really quite deceptive use happening there, either directly just with audio, or at the intersection of audio with animated faces, or audio with the ability to make a lip sync with a video.
[00:09:39] Bilawal Sidhu:
If I wanted to ask you to zoom in on one single example that's disturbed you the most, something that exemplifies what you are the most worried about, what would it be?
[00:09:50] Sam Gregory:
I'm gonna pick one that is, well, it's actually a whole genre, and I'm gonna describe this genre 'cause I think it's the one that people are familiar with.
But once you start to think about it, you realize how easy it is to do this. And that is: pretty much everyone has seen Elon Musk selling a crypto scam, right? Often paired up with a newscaster, your favorite newscaster, or your favorite political figure. In every country in which I work, people have experienced that.
They've seen that video where it's like the newscaster says, "Hey, Elon, come on and explain how you follow this new crypto scam," or, "Come on, political candidate, and explain why you're investing in this crypto scam."
[00:10:25] Bilawal Sidhu:
For anyone who hasn't seen it, these are just videos with a deepfake Elon Musk trying to guilt you into buying crypto as part of their Bitcoin giveaway program.
[00:10:35] Sam Gregory:
And so the reason I point to that is not 'cause it has massive human rights impacts or massive news impacts, but because it's just so commodified. And we have this sort of bigger question of how it plays into our overarching understanding of what we trust, right? Does this undermine people's confidence in almost any way in which they experience audio or video or photos that they encounter online?
Does it just reinforce what they want to believe? And for other people, does it just let them believe that nothing can be trusted?
[00:11:06] Bilawal Sidhu:
We're gonna take a quick break. When we come back, we're gonna talk with Sam about how we can train ourselves to better distinguish the real from the unreal, using a little system he uses called SIFT. More on that in just a minute.
We're back with Sam Gregory of Witness. Before the break, we were talking about how these fake videos are starting to erode our trust in everything we see. And yeah, maybe you can find flaws in a lot of these videos, but some of them are really, really good and nobody's zooming in at 300% looking for those minor imperfections, especially when they're scrolling through a feed, right, like before their morning commute or something?
[00:11:52] Sam Gregory:
Yeah, and you're hitting on the thing that I think, you know, the news media has often done a disservice to people about: how to think about spotting AI, right? We put such an emphasis on kind of like, you should have spotted that the Pope had his ring finger on the wrong hand in that puffer jacket image, right?
Or didn't you see that his hair didn't look quite right at the hairline, or didn't you see he didn't blink at the regular rate? And it's just so cruel, almost, to us as consumers to expect us to spot those things. We don't do it. I don't look at every TikTok video in my For You page and go, let me just look at this really carefully and check whether someone's trying to deceive me.
And so we've done a disservice, often, because people point out these glitches and then they expect people to spot them. And it creates this whole culture where we distrust everything we look at. And we try and apply this sort of personal forensic skepticism, and it doesn't lead us to great places.
[00:12:45] Bilawal Sidhu:
I wanna talk about mitigation. How do we prepare and what can we do right now?
[00:12:50] Sam Gregory:
When we first started saying prepare, don't panic, it was five or six years ago, and it was in the first deepfakes hype cycle, which was like the 2018 elections when everyone was like, deepfakes are gonna destroy the elections.
And I don't think there was a single deepfake in the 2018 US elections of any note. Now, let's fast-forward to now, right? 2024. When we look around the world, the threat is clear and present now, and it's escalating. So prepare is about acting, listening to the right voices, and thinking about how we balance out creativity, expression, and human rights, and do that from a global perspective.
'Cause so much of this conversation is often also very US- or Europe-centric. So what can we do now? You know, the first part of it is: who are we listening to about this? And I often get frustrated that AI conversations can get into this very abstract discussion around AI harms and AI safety. And it feels very different from the conversation I'm having with journalists and human rights defenders on the ground who are saying, "I got targeted with a non-consensual sexual deepfake," or, "I got my footage dismissed as faked by a politician, 'cause he said it could have been made by AI."
So as we prepare, the first thing is: who do we listen to, right? And we should listen to the people who are actually experiencing this. And then we need to think about what we need to help people understand about how AI is being used.
It's this question of the recipe. And I use the recipe analogy because I think we're not in a world where it's AI or not; even in the photos we take on our iPhones, we're already combining AI and human, right? The human input, then the AI modifications that make our photos look better. So we need to think: how do we communicate that AI was used in the media we make?
We need to show people how AI and human were involved in the creation of a piece of media, how it was edited and how it's distributed. The second part of it is around access to detection, and the thing that we've seen is there's a huge gap in access to the detection tools for the people who need it most, like journalists and election officials and human rights defenders globally.
And so they're kind of stuck. They get this piece of video or an image, and they are doing the same things that we're encouraging ordinary people to do: look for the glitches, take a guess, drop it in an online detector. And all of those things are as likely to give a false positive or a false negative as they are to give a reliable result that you can explain.
So you've got those two things. You've got an absence of transparency explaining the recipe. You've got gaps in access to detection. And neither of those will work well unless the whole of the AI pipeline plays its part in making sure the signals of that authenticity, and the ability to detect, are retained all the way through.
So those are the three key things that we point to: transparency done right, detection available to those who need it most, and the importance of having an AI pipeline where the responsibility is shared across the whole AI industry.
[00:15:37] Bilawal Sidhu:
I think you covered like three questions beautifully right here. A key challenge is telling what content is generated by humans versus synthetically generated by machines.
And one of the efforts you're involved in is the appropriately named Content Authenticity Initiative. Could you talk a bit about how that plays into a world where we will have fake content purporting to be real?
[00:15:59] Sam Gregory:
Yes. So about five years ago, there were a couple of initiatives founded by a mix of companies and media entities, and Witness joined those early on to see how we could bring a human rights voice to them.
And one of them was something called the Content Authenticity Initiative that Adobe kicked off. And another was something called the Coalition for Content Provenance and Authenticity. The shorthand for that is C2PA. So let me explain a little more about what C2PA is. It's basically a technical standard for showing what we might describe as the provenance of an image or a video or another piece of media.
And provenance is basically the trail of how it was created, right? This is a standard that's being increasingly adopted by platforms. In the last couple of months, you've seen Google and Meta adopt it as a way they're gonna show people how the media they encounter online, particularly AI-generated or edited media, was made.
It's also a direction that governments are moving in. Some key things that we point to around standards like the C2PA: the first thing is that they are not a foolproof way of showing whether something was made with AI or made by a human. What I mean by that is they tell you information, but we know that people can remove that metadata; for example, they can strip out the metadata.
And we also know that some people may not add this in, for a range of reasons. So we're creating a system that allows additional signals of trust, additional pieces of information, but not one confirmation of authenticity or reality.
I think it's really important that we be clear that this is, in some sense, a harm-reduction approach. It's a way to give people more information, but it's not gonna be conclusive in a silver-bullet kind of way. And then the second thing that we need to think about with these is that we need to really make sure that this is about the how of how media was made, not the who of who made it.
Otherwise we open a back door to surveillance. We open a back door to the ways this will be used to target and criminalize journalists and people who speak out against governments globally.
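To make the idea of a provenance "recipe" concrete, here is a minimal, hypothetical sketch in Python. It is not the real C2PA specification or any actual library API; the manifest fields, function names, and action labels are illustrative assumptions, showing only the general shape of a record that binds the "how it was made" trail to a specific file.

```python
import hashlib
import json

# Illustrative stand-in for a C2PA-style provenance manifest.
# The real standard embeds cryptographically signed manifests in the file;
# this sketch only shows the idea of a tamper-evident "recipe" of how media was made.

def media_hash(media_bytes: bytes) -> str:
    """Fingerprint the media so the manifest is bound to this exact file."""
    return hashlib.sha256(media_bytes).hexdigest()

def build_manifest(media_bytes: bytes, actions: list) -> dict:
    """Record the 'how' of the media (tools and edits), not the 'who'."""
    return {
        "content_hash": media_hash(media_bytes),
        "actions": actions,  # e.g. capture, AI generation, edits
    }

def verify_manifest(media_bytes: bytes, manifest: dict) -> bool:
    """Check the media still matches the manifest it claims to carry.
    If the file was altered or the manifest was swapped, this fails."""
    return manifest.get("content_hash") == media_hash(media_bytes)

if __name__ == "__main__":
    video_bytes = b"...stand-in for the actual video file bytes..."
    manifest = build_manifest(video_bytes, actions=[
        {"step": "generated", "tool": "text-to-video model"},
        {"step": "edited", "tool": "desktop video editor"},
    ])
    print(json.dumps(manifest, indent=2))
    print("intact:", verify_manifest(video_bytes, manifest))          # True
    print("tampered:", verify_manifest(b"altered bytes", manifest))   # False
```

Note that, as Sam says, this kind of signal is additive: if the metadata is stripped or was never added, its absence proves nothing on its own, which is why he frames it as harm reduction rather than a silver bullet.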
[00:17:55] Bilawal Sidhu:
Beautifully said, especially that last point. I noticed Tim Sweeney had some interesting remarks about all of the content authenticity initiatives happening; he kind of described it as sort of surveillance DRM, where you cannot upload a piece of content, right?
Like, if people like you aren't pushing on this direction, we may well end up in a world where you cannot upload imagery onto the internet without having your identity tied to it. And I think that would be a scary world indeed.
[00:18:20] Sam Gregory:
The thing that we have consistently pushed back on in systems like C2PA is the idea that identity should be the center of how you're trusted online.
It's helpful, right? And many times I want people to know who I am. But if we start to premise trust online on individual identity as the center, and require people to do that, that brings all kinds of risks that we already have a history of understanding from social media, right? That's not to say we shouldn't think about things like proof of personhood, right?
Like, understanding that someone who created media was a human may be important, right, as we enter an AI-generated world. But that's not the same as knowing that it was Sam who made it, not a generic human, right? So I think that's really important.
[00:19:02] Bilawal Sidhu:
It's a slippery slope indeed. And really good point on the distinction between validating you're a human being versus validating you are Sam Gregory. It's a very subtle but crucial distinction.
Let's move over to fears and hopes. You know, back in 2017 you felt the fear around deepfakes was overblown. Clearly now it is far more of a clear and present danger. Where do you stand now? What are your hopes and fears at the moment?
[00:19:30] Sam Gregory:
So we've gone from a scenario in 2017 where the primary harm was the one that people didn't discuss, which was gender-based violence,
and the harm everyone discussed, political usage, was non-existent, to a scenario now where the gender-based violence has got far worse, right? And it targets everyone from public figures to teenagers in schools all around the world. And the political usage is now very real. And the third thing is you have people realizing there's this incredibly good excuse for a piece of compromising media, which is just to say, "Hey, that was faked," or, "I can plausibly deny that piece of media by saying that it was faked."
And so those three are the core fears that I experience now, fears that have translated into reality. Now, in terms of hopes, I don't think we've acted sufficiently yet on those three core problems, right? We need to address those. We need to make sure that we criminalize the ways in which people target primarily women with non-consensual sexual deepfakes, which are escalating. In the second area of fears, which is the fear around their misuse in politics and to undermine news footage and human rights content,
I think that's where we need to lean into a lot of the approaches like the authenticity and provenance infrastructures like the C2PA, the access to detection tools for the journalists who need them most, and then smart laws that can help us rule out some usages, right,
and make sure that it is clear that some uses are unacceptable. And then the third area, that's the hardest one, 'cause we just don't have the research yet about what is the impact of this constant drip, drip, drip of "you can't believe what you see and hear," or "we can only reach an 84% probability that it's real or false," which is not great for public confidence.
But we also don't know how this plays into this broader societal trust crisis we have, where people already wanna lean into kind of almost-plausible believability on stuff they care about, or just plausibly ignore anything that challenges those beliefs.
[00:21:35] Bilawal Sidhu:
I think you brought up a really good point. It's almost like the world is fracturing into the multiverse of madness, as I like to call it, where people are looking for whatever validation will confirm their beliefs. At the same time, it can result in people being jaded, right?
Where they're just gonna be detached: well, I don't trust anything. And so I'm curious, how do you see consumer behavior changing in this world where the visual Turing test gets shattered over and over again across all sorts of different, more complex domains?
Are people gonna get savvier? What do you think is going to happen to society in such a world?
[00:22:11] Sam Gregory:
So we have to hope that we walk a fine line. We're gonna need to be more skeptical of audio and images and video that we encounter online. But we're gonna have to do that with a skepticism that's supported by signals that help us.
What I mean by that is, if we enter a world where we're just like, hey everyone, everything could be faked, it's getting better every day, hey, look out for the glitch, then we enter a world where people's skepticism, quite rightly, will accelerate, 'cause all of us will experience, on a daily basis, being deceived, right? And I think it's very legitimate for us to then feel like we can't trust anything.
[00:22:45] Bilawal Sidhu:
Right? In the ideal world, everyone's labeling what's real or fake, but when that's not happening, what do people do?
[00:22:54] Sam Gregory:
I always go back to, you know, basic media literacy. I use an acronym called SIFT that was invented by an academic called Mike Caulfield.
And SIFT is S-I-F-T. S stands for stop, right? Because it's basically: stop before you're emotionally triggered, right? Whenever you see something that's too good to be true. I stands for investigate the source, which is: who shared this? Is it someone I should trust? The F stands for find alternative coverage, right?
Did someone already write about this and say, wait, that's not the Pope in a puffer jacket; in reality, that's an AI image? And then the fourth part of that, which is getting complicated, is T for trace the original. That used to always be a great way of doing it in the shallow-fake era, 'cause you'd find that an image had been recycled, but it's getting harder now.
So when I look at the knife edge we've gotta walk, it's to help people do SIFT in an environment that is structured to give them better signals that AI was used, where the law has set parameters about what is definitely not acceptable, and where all the companies, all the players in that AI pipeline, are playing their part to make sure that we can see how the recipe of AI and human was used, and that it's as easy as possible to detect when AI was used to manipulate or create a piece of imagery, audio, or video.
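For quick reference, the SIFT habit Sam walks through can be kept at hand as a four-item checklist. The snippet below is just one illustrative way to jot it down in Python; it is not a tool published by Caulfield or Witness, and the prompt wording is paraphrased from this conversation.

```python
# Illustrative only: Mike Caulfield's SIFT media-literacy steps as a simple checklist.
SIFT_STEPS = [
    ("S", "Stop", "Pause before reacting, especially if it feels too good (or too bad) to be true."),
    ("I", "Investigate the source", "Who shared this? Is it someone worth trusting?"),
    ("F", "Find alternative coverage", "Has anyone else reported, confirmed, or debunked it?"),
    ("T", "Trace the original", "Look for the earliest version; recycled media is a red flag."),
]

def print_sift_checklist() -> None:
    """Print one prompt per SIFT step."""
    for letter, name, prompt in SIFT_STEPS:
        print(f"{letter} ({name}): {prompt}")

if __name__ == "__main__":
    print_sift_checklist()
```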
[00:24:10] Bilawal Sidhu:
I really like SIFT. I think that's also very good advice for people when they come across something that is indeed too good to be true. Very often we'll just be like, oh well, that's interesting, and go about our day. The devices we use every day aren't foolproof, right? They've got vulnerabilities. There's this game of whack-a-mole that happens with patching those vulnerabilities.
And now we've got these cognitive vulnerabilities almost. And you know, on the detection side, the tools are gonna need to keep improving because people are gonna find ways to use the detectors to create new generators that evade them, right? And so that game of whack-a-mole will continue. But that isn't to say that all hope is lost.
We can adapt and we can still have an information landscape where we can all thrive together.
[00:24:54] Sam Gregory:
That's the future I want. The way we describe it at Witness, we talk about fortifying the truth, which is that we need to find ways to defend the idea that there is a reality out there.
[00:25:04] Bilawal Sidhu:
Thank you so much, Sam. I will certainly sleep easier at night knowing there are people like you out there making sure we can tell the difference between the real and the unreal.
Thank you so much for joining us.
Sam Gregory and I had this conversation in mid-March, and a few days later there was another development. YouTube came out with a new rule: if you have AI-generated content in your video and it's not obvious, you have to disclose that it's AI. This move from YouTube is an important one, the kind Sam and his colleagues at Witness have been advocating for.
It shifts the onus onto creators and platforms and away from everyday viewers because ultimately it's unfair to make all of us become AI detectives scrutinizing every video for that missing shadow or impossible physics, especially in a world where the visual Turing test is continually being shattered. And look, I'm not gonna sugarcoat this.
This is a huge problem and it's gonna be difficult for everyone. Folks like Sam Gregory have their work cut out for them. And massive organizations like TikTok, Google, and Meta do too. But listen, I'm gonna be back here this week and the week after that, and the week after that, helping you figure out how to navigate this new world order, how to live with AI, and yes, thrive with it too.
We'll be talking to researchers, artists, journalists, and academics who can help us demystify the technology as it evolves. Together, we're gonna figure out how to navigate AI before it navigates us. This is The TED AI Show. I hope you'll join us.
The TED AI Show is a part of the TED Audio Collective and is produced by TED with Cosmic Standard. Our producers are Elah Feder and Sarah McCrea. Our editors are Banban Cheng and Alejandra Salazar. Our showrunner is Ivana Tucker, and our associate producer is Ben Montoya. Our engineer is Aja Pilar Simpson.
Our technical director is Jacob Winik, and our executive producer is Eliza Smith. Our fact checker is Krystian Aparta. And I'm your host, Bilawal Sidhu. See y'all on the next one.