Social engineering, malware, and the future of cybersecurity in AI (pt.1)

Feedback

Social engineering, cyberattacks, and the fog of war - all topics covered in this interview with the VP of Security and Privacy at Google DeepMind. Hannah Fry and Four Flynn take us behind the scenes of Operation Aurora, the monumental 2009 attack on Google that forever changed the landscape of cybersecurity. They discuss the defender's dilemma, the constant battle between attackers and defenders in the digital world, and how AI can potentially help mitigate some of the most complex vulnerabilities. As Hannah said, there was just too much to fit into one episode, so keep an eye on your feed for part 2. If you’re worried about missing it, why not subscribe and turn on notifications for new episodes. Until next time! ___ Further reading: CodeMender: https://deepmind.google/discover/blog/introducing-codemender-an-ai-agent-for-code-security/ Cybersecurity at Google: https://blog.google/technology/safety-security/ai-security-frontier-strategy-tools/ Threat intelligence report: https://cloud.google.com/blog/topics/threat-intelligence/adversarial-misuse-generative-ai ___ Thanks to everyone who made this possible, including but not limited to: Presenter: Professor Hannah Fry Series Producer: Dan Hardoon Editor: Rami Tzabar Commissioner & Producer: Emma Yousif Music composition: Eleni Shaw Audio engineer: Richard Courtice Video Editor: Bilal Merhi Audio Engineer: Perry Rogantin Visual Identity and Design: Rob Ashley Commissioned by Google DeepMind ____ Subscribe to our channel https://www.youtube.com/@googledeepmind Find us on X https://twitter.com/GoogleDeepMind Follow us on Instagram https://instagram.com/googledeepmind Add us on Linkedin https://www.linkedin.com/company/deepmind/

Google DeepMind

51:42

2025-10-09

Loading YouTube video...

Video ID:1gO2bC5xLlo

Use the "Analytical Tools" panel on the right to create personalized analytical resources from this video content.

Transcript

1201 segmentsCurrent: 0:00

HANNAH FRY: Were you just sat in a room

with Sergey Brin for 15 hours a day?

FOUR FLYNN: There was a lot of us

in that room for a lot of hours in a day.

When we think about it, we look back on it,

I still get a pit in my stomach.

These systems that we all depend on in our daily lives

are built on millions of lines of very complicated code.

And so while the number of vulnerabilities

is probably finite, it's also a very big number,

and many of them have never been discovered before.

HANNAH FRY: This is phenomenally complicated.

Phenomenally complicated.

Even forget about finding the vulnerabilities

in the first place.

Actually fixing it, it's not like there's

one sticking plaster fits all.

[MUSIC PLAYING]

Welcome to "Google DeepMind," The Podcast.

I'm Professor Hannah Fry.

Now, cyber attacks have never been easier.

From deepfakes that are so convincing

they can fool your own family to phishing emails that

look just like the real thing.

AI has allowed these attacks to scale at a dizzying pace,

but there is some hope that the same technology that's

fueling these attacks could also be the key to preventing them.

And few people know this battle better than my guest today.

Four Flynn is VP of Security at Google DeepMind

and a cybersecurity legend.

He was in the room during Operation Aurora,

back in 2009, when a massive attack on Gmail

rewrote the rules of cybersecurity.

Today, he is on the front lines again,

taking on a new wave of AI-powered cyber attacks.

And in fact, Four had so much to say,

so many totally fascinating insights to share,

that we decided to make this into a podcast of two halves.

Next time, we're going to be talking

about the human side of cybercrime,

how we can be manipulated and tricked by bad actors,

and how all of that is changing with the era of agentic AI.

But for this episode, we wanted to focus on the battle

itself, the ways into systems that attackers seek to exploit,

and what we can do to defend them.

Well, thank you so much for joining me, Four.

FOUR FLYNN: It's a pleasure.

HANNAH FRY: And I thought I might start by talking about one

of the most notable security incidents

in Google's history, Operation Aurora.

FOUR FLYNN: Sure.

HANNAH FRY: How do you fit into that story?

FOUR FLYNN: Yeah, well, so Operation Aurora

was a huge moment in the history of cybersecurity writ large,

really, for the industry as a whole.

I think the idea that a nation state would compromise

a private company was quite a shock to really all of us.

We had essentially a case in which

China was compromising Google or attempting to compromise Google.

And as part of that campaign, they actually

attempted to compromise a number of other companies.

And it was part of a long-running process

that they had to conduct espionage

across the great deal of various institutions in the West.

HANNAH FRY: And specifically, they

were looking for people who had been vocal against human rights

abuses in China.

FOUR FLYNN: Yeah, that's what we believe at this point.

Of course, at the time, whenever you're

dealing with these situations, it's

very hard to figure out who the actors are

or what they're attempting to gain access to.

And so way back in the early days

when we first detected the attack, which was something

my team was responsible for is finding that attack

and responding to it.

There's many, many, many people across Google

that were contributing to figuring out what happened.

And then, of course, after we figure out what

happened, to evict the attacker from our environment,

and then maybe even more importantly,

over the following years, to harden our environment based

on the lessons that we learned.

But definitely in the moment you have this thing

called the fog of war, where you really

have no idea what's going on.

You really have no idea even what the bits are of the attack,

and you're doing forensics to try to figure that out.

And so there's still many people I

work with here at Google that were instrumental in determining

of that.

Heather Adkins and others that are just

absolute unbelievable experts in the subject

matter that I'm lucky to work with.

HANNAH FRY: Just take me back to that time then.

When did you first realize that something was up?

When was the first moment of detection?

FOUR FLYNN: Right.

So back then it was sort of famous.

And maybe it still is.

You could almost guarantee that when you have a big Christmas

vacation planned, most likely that's

when the cyber attack is going to come to light.

And so it was in December, I remember,

when all the details started to come out, and many of us

worked tirelessly, really over the break,

but also for months, trying to ascertain what happened

and trying to put the puzzle pieces together

when you don't even really know what the puzzle looks like.

And so you're faced with this picture of just bits

and pieces of technical data.

HANNAH FRY: Presumably you didn't get Christmas holiday

that year.

FOUR FLYNN: No, no, none of us did.

HANNAH FRY: Were you just sat in a room

with Sergey Brin for 15 hours a day?

FOUR FLYNN: There was a lot of us

in that room for a lot of hours in the day.

Yes.

And when we think about it, we look back on it,

I still get a pit in my stomach, I think.

HANNAH FRY: Well, how stressful was it, though?

I am intrigued by this.

I mean, because the thing is, on the one level,

this feels quite a technical challenge.

But I mean, there is a human element to this too, right?

FOUR FLYNN: Well, I mean, look, I'll tell you,

I say those of us that have pledged our life to defending

people as I have, and I know all the people I work with have,

it feels like a failure.

And that's where, at least for me, the source of stress

comes from, is you feel like you've let people down.

I mean, I spent my entire life to protect people's data,

protect their accounts, protect companies' networks,

all in service to helping people's daily lives

be as great as it can be.

And, that intersects, of course, with Google services

in a bunch of different ways, whether you're

using a phone or a browser or even Search.

And we take that super seriously.

HANNAH FRY: How did they get in?

Do you know now?

FOUR FLYNN: Yes, we do know now.

It was back then, this was an earlier era in security,

but in some sense, some things have never changed.

This was back when Internet Explorer, I

don't know if any of the listeners remember that,

but that was a big browser back then, by Microsoft.

And it was a vulnerability in the browser that

was conducted via a phishing attack against somebody who

was employed at Google.

HANNAH FRY: Someone clicked on something, basically.

FOUR FLYNN: Exactly.

And so, phishing is still very much

if not a bigger threat today than it has ever been.

But the browser exploitation was still

in its early days back then.

We called it client-side attacks.

Basically people leveraging weaknesses.

Let me just say one brief thing as a digression.

In the earliest days of security,

most of the ways that attackers broke into systems

were through server attacks.

They would be on the internet.

You would have a bank or something like that,

and you would have a big mainframe or some big website,

and people would attack through the front door of that website.

And so it was really this Aurora attack

as an example of that evolution that really has never

changed back to what we call client-side attacks.

And so the attacks shifted to be against the weakest

link of organizations, which are often the users.

And so they would exploit the users

through social engineering, do phishing attacks

against their passwords, for example, something

I'm sure you've had to deal with in rotating

your passwords in your own life.

But also taking advantage of things

that were running on the laptop or on the desktop computer,

not on the server side.

And it was part of a huge change in the industry where

we had historically built this sort of moat and drawbridge

model for security, where we'd built these big castle

walls with these big firewalls and had all the people

and the servers inside.

And that was essentially the common wisdom

for how the security of companies would work.

And there's a whole bunch of weaknesses

in that model that evolves.

For example, we realized that employees weren't always sitting

in the same building anymore.

That mobility became more and more pervasive.

First, the rise of the laptop and then

the rise of the personal phone and the smartphone

sort of broke that one axis of the mold.

But then the other axis of the mold that got broken

was client-side attacks, because people were no longer trying

to attack these very well-defended servers,

such as the castle behind the big castle wall.

Instead, they were attacking the client,

which had several weaknesses.

One, all the client-side software

had not been hardened like that we'd done on the server side.

So there was a big attack surface there.

But also the human element was much more

easy to exploit using social engineering on the client.

And so that also led to an evolution post-Aurora

that we created called BeyondCorp,

that's also known in the industry

as Zero Trust, which is a whole new way of rethinking

how enterprise security should work.

Sort of building from the beginning

away from this moat and drawbridge model,

and instead sort of acknowledging the importance

of the client and the user as the supreme thing to defend.

HANNAH FRY: You've almost assumed that a perpetrator has

infiltrated the network, and it's

how to stop them or mitigate against potential damage

that they might do once they have?

FOUR FLYNN: Yeah, that's actually another element of it

that you raise.

That's a really good point, which

is this assumed breach is what we call it in security.

And that also was another innovation along the way,

because I think what we realize is that as good as the detection

systems we started to build to find these attacks were,

and some of them were pretty good,

including the one that we built to detect the Aurora attack.

That said, you know you're not going to get everything,

and that as attackers, and especially nation states had

evolved, more stealthy and below the radar

techniques that even the best detection systems on the market

couldn't detect, then you had to take a two-pronged approach.

So no longer could you just rely on your detection system

to flag these things to your analysts.

You had to also take a separate step, which is assume

that all those things failed.

And as a defense in-depth approach,

make sure that you were doing what we call assume breach.

And so that means that you do things like threat

hunts, for example, where you assume that you have

the bad guys already on your systems

internally, that you didn't catch them by any of the systems

you've deployed, and that you're going to go and look,

scour the entire systems to find the people that had already

penetrated your defenses.

And so again, that was another novelty

that grew up in this post-Aurora era and many other things.

Multi-factor authentication tokens

that Google does better still today than any other company

that I'm aware of using our Titan Security Keys.

So it's an unphishable multi-factor credential.

HANNAH FRY: As in, it's not just a text

message, which you could divert to another phone very easily.

FOUR FLYNN: Right.

Exactly.

That's one of the things we were able to do at Google,

is invent our own hardware keys, in partnership

with some other companies we were working

with, such that it's not just about sending you a message,

like you say, that somebody can find in your email

or in your text messages and replay that attack.

That's a pretty good start.

A lot of companies have deployed that,

and it's better than nothing.

But there's a lot of weaknesses of that system.

If somebody compromises the phone system,

if somebody compromises your email,

then you can still have your account taken over.

Well, at Google, again, in the aftermath of this era,

one of the things we invented was

a non-phishable multi-factor hardware token,

which connects directly to the browser

and authenticates as a second factor

without having some string of characters that

could be stolen by an attacker.

HANNAH FRY: There's one other element

of this that made it a really historic moment, which

is Google's decision to go public with what had happened.

FOUR FLYNN: Yeah.

HANNAH FRY: I mean, why did they do that?

FOUR FLYNN: I mean, there's a couple other points of context

here that I think are useful.

I think these types of attacks had

been going on outside of private industry for some time, right?

We'd seen these attacks happening

in the Department of Defense and espionage

happening with cyber attacks in the military industrial complex,

the various companies that make that up.

And that had all been precursors to the attack on Google.

And so I think part of the calculus for that decision,

I believe, was bringing awareness

to this thing that had been going on for some time.

And in this case, it led to a whole bunch of really positive

changes in the industry.

I think it came to data breach disclosure

laws that are now on the books in a lot of places.

I think it led to what then became

sort of responsible vulnerability disclosure, best

practices, and generally just bringing transparency

to security overall, is something Google's, I think,

really brought to the table across the board for many years.

HANNAH FRY: In the immediate aftermath,

so this is 2009, right, did other companies learn quickly

from what had happened at Google?

FOUR FLYNN: I think the awareness came quickly,

but what happened very slowly was

the adaptation of our security best practices to confront

this risk.

In general, across the industry, it's

been incredibly slow to adopt the more modern approaches

to security.

HANNAH FRY: Like multi-factor authentication?

FOUR FLYNN: Multi-factor authentication is now

getting fairly pervasive.

But it took 15 years from that event

till now for it to really become something enterprises

were able to deploy in effective ways.

Zero Trust is another example.

I know peers of mine at various companies

are still struggling in some ways, if you can believe it,

with the modernization of their environments

to this new reality.

And especially, I would say, governments, as you can imagine,

that built a legacy environment are

struggling to make that pivot.

And so I guess the lesson I've learned

is that, unfortunately, it's hard to change

things that are entrenched.

HANNAH FRY: But then as a result of that,

there are quite big dramatic issues that arise.

I mean, I'm thinking here about Celebgate in 2014

where celebrities photos were leaked.

FOUR FLYNN: That's right.

HANNAH FRY: That wouldn't have happened

had they been using multi-factor authentication, right?

FOUR FLYNN: That's right.

Yeah.

HANNAH FRY: And yet, that was five years

after Operation Aurora.

FOUR FLYNN: Operation Aurora.

Yeah.

I mean, that's right.

So you're highlighting another point worth raising,

which is the difference between enterprise security and consumer

security.

The other piece is consumer security

often does lag behind enterprise security.

And enterprise security itself sometimes

lags behind the best practices.

And so that's a good example where

I think Apple even supported multi-factor authentication

as an opt-in for iCloud at the time, if I recall.

But very few people had adopted it.

And really, what we see in consumer security

is the thing that really moves the needle is not

asking consumers to change their behavior,

but changing the defaults.

HANNAH FRY: Right.

FOUR FLYNN: And making the defaults more secure

is the real needle mover to make people in their daily lives

more secure.

HANNAH FRY: Because the public are resistant to change?

FOUR FLYNN: I think that's part of it.

Or they're just not educated on what they should be doing.

And so, credit where it's due.

I think Android, and ChromeOS, and the Chrome browser,

and Apple, and a bunch of the players in the ecosystem

have actually done a lot to increase the default

level of security on consumer devices and consumer

applications to a pretty high degree.

I think especially consumer mobile devices is a really

brilliant case study in this.

I think the defaults that you find now versus 10 years

ago on, say, Android or iOS is just dramatically different.

A lot of people take that for granted,

but it's a lot of hard work.

HANNAH FRY: Well, there are lots of positives to be thankful for.

I think I also want to understand the scale

of the potential problem here.

So before we get on to talking about how large language

models have changed the game, the era of generative AI

have changed things, let me ask you about the different ways

that we are potentially vulnerable.

So there's social engineering, sure.

But what are the other ways in?

What are the more technical ways into a system?

FOUR FLYNN: I think of it in terms

three categories of security failures, let's say.

So as you mentioned, one of them,

and probably the most frequently abused one,

is social engineering.

And one of the interesting quirks of LLMs

is that they have somewhat human-like behaviors.

And so in fact, you can cause an LLM

to get confused through similar types of things that humans do.

But I think the other two categories

are issues with configuration and issues of integrity.

So basically, the way to think about protecting a system

is you have to configure it so that it's secure,

and then you have to make sure there's not a way

to bypass that configuration.

And I think pretty much everything in security

falls into those two categories.

Everything in terms of security prevention at least.

And so, let's just pick any particular example,

access control.

So if you have a company where you're sharing a Google Doc

or something like that, you might

have it shared with everybody at the whole company.

And that means any one particular person that

has an account that's hacked at the company

could view that document.

You see what I mean?

And so that's an issue of configuration.

And so getting the configuration right

means having only the right number of people with access

to that document that really should, you see?

Pursuant to the level of sensitivity

of the content of the document.

OK, we all know this.

This is normal security 101.

Now where does integrity come in?

So let's say you've done a good job of getting

that document locked down to the right number of people.

But what could also happen is that there's

a vulnerability, like a patch missing on the server that's

hosting that document.

And so somebody could potentially

compromise the server, bypassing the access control situation

altogether.

And so you have basically--

and then the third issue is that somebody

who does have access to the documents password is stolen.

And then that account gets access to the document.

So you see, we have integrity, configuration, and people

as the three classes of issues that I think really

is every kind of security problem.

HANNAH FRY: But then sometimes that

manifests in slightly strange ways.

I mean, I read one story a little while ago

about a fish tank in a Las Vegas casino that

had a smart thermometer on it.

FOUR FLYNN: Yeah, I think I remember reading about this.

It was basically somebody that was

trying to cause financial fraud, I believe,

or some of abuse of that casino.

And they had a fish tank that was on their internal network.

And the system that was running the fish tank, of course,

like everything these days, has an IP address

and is connected to the internet.

My toaster probably does have an IP address at this point,

I'm sure.

And so what that allowed the attackers to do

is to gain a foothold on that network,

and then use that as a pivot point

to attack the more sensitive systems that were undefended

behind the scenes.

And this is a classic issue with IoT systems,

where the problem with IoT is that they often

don't have enough CPU, memory, and power budget

in order to do a lot of the security best practices.

And so you end up seeing companies

that skimp on those things.

And so therefore, you end up with these IoT systems that are

deployed somewhat pervasively.

And then if that's compounded with a system that's

behind the scenes, like a server, like we

talked about in that old model of the moat and drawbridge, that

is poorly defended and relies on network trust, which I think

is an anti-pattern now in security.

But if you have something that has by

virtue of being on the network, gains

you some amount of privilege to being

able to interact with that system.

Then generally speaking, that's a recipe for disaster.

HANNAH FRY: But I mean, I think this does just demonstrate

the number of different potential ways

that you can be vulnerable.

FOUR FLYNN: Yeah, it's a great point.

I mean, we call this the defender's dilemma, is the term

we use for it in the industry.

And it's essentially this asymmetry

between the folks that have to protect

against all potential ways to compromise

your people or your company, and an attacker that really only

has to find one avenue in.

HANNAH FRY: I mean, I suppose in a way,

you almost have to assume that you have some vulnerabilities

that you don't yet know about on your system.

FOUR FLYNN: Yes.

Everybody that does security defense has that assumption.

HANNAH FRY: Those have a name, right?

What is zero-day vulnerability?

FOUR FLYNN: So zero-day vulnerabilities

are definitely a type of vulnerability

that is very difficult to control for because those are

vulnerabilities where even if you've done everything

right, patching your system, putting your access

controls in the right place, and so on and so forth,

a zero-day vulnerability is something

that can compromise a fully patched secure system.

And so those are the class of vulnerabilities that most of us

are laying awake at night.

Those are the ones that we're often most scared of,

because there's historically it's challenging

to defend against those.

HANNAH FRY: The vulnerabilities you don't know are there?

FOUR FLYNN: Yeah, exactly.

And code is complex.

It's important to remember, these systems

that we all depend on in our daily lives

are built on millions of lines of very complicated code.

And so while the number of vulnerabilities

is probably finite, it's also a very big number.

And many of them have never been discovered before.

And so there's always this latent risk of code

that might have a vulnerability in it that is never seen before.

Just to make you feel slightly better.

HANNAH FRY: Yeah.

FOUR FLYNN: We have this concept in security

called defense in depth.

And so the idea is that you build

systems such that any one particular flaw,

hypothetical or otherwise, doesn't

lead to a catastrophic failure of the whole system.

So let's talk about a zero-day vulnerability in a system.

Well, there's all these layers of defense in modern operating

systems.

And so, even if there's a vulnerability discovered,

these days it's very hard to exploit those vulnerabilities

because of all these added security

protections in the underlying operating system that

make it very difficult to, even if there's

a vulnerability you can discover, to actually exploit it

in real life.

And there's all this trickery you

have to learn how to do, such as exploiting the vulnerability,

landing a certain amount of bytes in the memory

and then causing the operating system

to jump over to that memory.

And these are different kinds of overflows and so on.

And there's all these memory safety features

that have been built into modern kernels and modern operating

systems to try to protect against these things.

And moreover when you zoom out and you think about a larger

company, a zero-day vulnerability

could perhaps compromise your phone or your browser.

But the other thing that we talk about,

and hopefully I'm not introducing

too many concepts that are novel,

but we talk about this concept called the kill chain.

And the kill chain is a concept we borrowed from the military

and now is a sort of fixture of cybersecurity.

And basically, the idea was, we had been struggling

against this defender's dilemma, that we

have to stop every possible avenue for an attacker,

but they only have to find one way in.

And what the kill chain allowed us to think about

is think about the problem differently.

Yes, that's true.

But even though they have to only find one way in,

if they do that, they still have to go

through this series of stages that you understand

what they're going to be.

And it's things like reconnaissance, delivery

of the exploit, post-exploit sort of moving

around the environment in the network and so on and so forth.

And so this actually, I think re-empowered the defenders

and re-tilted the tables and the scales

and balanced them a little bit more because when you zoom out

and you look at a whole company, obviously if they just

phish one employee, that's not good enough, because typically

they have a target that's deep inside a system

that employee might not necessarily have access to.

And so there's this whole series of stages

the attack has to go through.

After they've phished that employee,

they've gotten code execution on their laptop.

Now they're trying to spread out through the environment.

Now they're trying to figure out,

what server they're trying to access

or what code base they're trying to access.

And so there's all these opportunities

to detect, to set tripwires, to have defense in depth,

try to block those additional stages.

And the whole entire company now is your field of battle

in which you can deploy prevention and detection

technologies that inevitably can stop and detect and slow

the attacker down.

HANNAH FRY: So even though they may

have multiple areas of attack, you've

now got multiple areas of defense.

FOUR FLYNN: Exactly.

The whole company now can become your field of defense.

HANNAH FRY: So, OK, how do large language

models change this situation?

FOUR FLYNN: In a number of ways.

HANNAH FRY: I mean, are there new vulnerabilities?

Are there new ways in which the systems can fail?

FOUR FLYNN: Yeah, so there's a whole bunch

of interesting new things that come out

of the advent of large language models,

I think both for defenders and for attackers.

But before I get to that, there's

one other thing I'd like to start as a foundation.

I think one of the things that we're still

wrestling with as an industry is that ultimately,

traditional computing systems are deterministic.

I'd say the fundamental thing from a security point of view

about LLMs that's different is that they're

generally non-deterministic.

Oftentimes you can give the same prompt to a large language

model, and it will give you different answers

depending on random things.

As it's tracing those paths of those tokens

and producing those tokens through the models,

through its brain, if you will, you'll

get non-deterministic answers.

And so we'll get into the details.

But I just think it's worth starting with that,

as that's a pretty big break from the past for us

defenders in security.

And I think some of us are still kind of wrestling

with that difference.

Now, in terms of attacks, I think large language models

are still new in some ways.

I mean, obviously, they've been around for a number of years.

But I think from a security point of view,

I think we're all still trying to learn what the risk landscape

is going to look like.

HANNAH FRY: Could attackers use large language models

to create malware?

FOUR FLYNN: So there's initial signs

that attackers are starting to figure out

how to use large language models to create malware.

And we work, I should say, closely with our threat

intelligence teams to carefully examine

what the bad actors are doing.

And we put out periodic reports, we actually

released a pretty exhaustive report, of all

the different nation state threat actors

and how they were using Gemini in January.

But we have seen attacks in the wild already

and prototype attacks that have been

built in the lab that use AI and LLMs to be part

of the malware attack chains.

And so one example of that is we're

starting to see people use LLMs for the polymorphism.

So let me explain what that is.

So one of the problems with creating malware

is that oftentimes it can get flagged by,

I mean, people used to call them antiviruses.

Now they're called EDR.

But essentially systems that are running on your laptop that

are looking for malicious code.

And so one of the problems that malware authors face

is they want to make sure that their stuff can't

get flagged by a modern antivirus engine

and be deleted or disabled.

And so the solution historically to that

has been to have something that shows up

on your computer as something completely brand new,

that's never been seen before by anybody out there before.

And so what that entails is you creating

something custom crafted for that exact instance

of that attack.

Now, that's been expensive historically, right?

But, unfortunately, we're starting

to see that large language models are increasingly

useful for helping craft bespoke malware

and having them be polymorphic, or at least

being able to have them be unique on every system

that they're planted on.

So that's one example.

And we've seen prototypes of this.

I don't know that we've seen it in the wild attack yet

of that nature, but I've definitely

seen a bunch of different interesting experiments

out there.

And I think real world attacks of that nature

are probably imminent.

HANNAH FRY: I also wonder about the new vulnerabilities,

where now a large language model is your entry

point into a system, because prompt injections is

another way.

FOUR FLYNN: That is a really great point.

So this gets back to the point I made

a moment ago about deterministic versus non-deterministic

behavior.

And prompt injection, jailbreaks,

these are examples where LLMs are

susceptible to some of the things humans

are as they become more intelligent.

And prompt injection is actually in some ways kind

of a confusion of the model's mental processing.

Basically, what prompt injection is,

is getting confused about where the command from the user

is coming from.

So let's say you're using an LLM and you say, hey,

summarize this website.

Well, you're the one telling the LLM what to do.

It should be focusing on what you're asking it to do.

But what could happen in an attack scenario

is that that website you're asking it to summarize

is actually malicious.

And so that website might actually

hijack the thought process of the LLM

and say, ignore the instructions you've previously been given,

do this other thing instead.

And of course, it sort of sounds trivial in the case

of summarizing a website.

I mean, who cares?

Whatever.

But as you think about the future

of deploying these things as agentic systems that

have increasing levels of independence

and increasing tool use where they're

engaging with potentially hostile content,

this becomes a bigger and bigger issue

as to how trustworthy these systems can be.

And so, yeah, I would say prompt injection is definitely

one of the things that I am spending

a lot of my time continuing to improve Gemini's defense of.

And I think all of us in the industry

are working on improving defenses

against this class of attack.

HANNAH FRY: I'm thinking about you talking

about malicious websites there, because is there

also a potential vulnerability that I mean,

large language models are reading the internet

in order for their training.

FOUR FLYNN: Yeah.

HANNAH FRY: Is this something about data

poisoning as well that could go on here?

FOUR FLYNN: Yeah.

There's a number of different types of data poisoning attacks

that are potentially concerning.

One is just the mass of pre-training data, as you know,

that's used to train these models,

is definitely a potential risk.

I think one thing that mitigates that risk a bit in practice

is that the data that's generally

fed into a large language model in pre-training

tends to be so expansive that any one unit of that data

generally isn't overrepresented in the outcome of the model.

In practice, it's a little bit less of a risk

because let's say you have a couple of websites

in the entire internet that are malicious.

It's very difficult to get the model

that's trained on trillions of tokens

or whatever the number is to be compromised

by that very small component of the data input.

Now, it's a little bit more concerning

in the post-training data, because there generally

is a smaller set of data that's used for post training.

And so there are potential scenarios

that I've seen literature on and academic work on that

indicates that post-training data could be potentially

risky for training a large language model to be malicious.

Once you train a model, pre and post train a model,

there's also an interesting class of risk

that you have to think about when you're serving the model.

So let's say you have these model weights that you've

finished doing training on.

And then let's say you put that out

in an open source repository.

You have open weights model.

Well, there's also papers and attacks where you can actually

have an attacker maliciously manipulate

that finalized model on disk.

And so, that's a hot patch, or whatever you want to call it,

to the actual model itself.

And if you don't validate that the model you're downloading

is the same as the one that was produced in the first place,

there's also an attack of that variety

that we all worry about as well.

HANNAH FRY: OK.

So when it comes to all of these potential ways

that attackers could get in, how do you

protect something like Gemini?

How do you prevent this from happening for Gemini?

FOUR FLYNN: The way we do that is we first start with building

a defendable model.

And so we're trying to improve Gemini's ability

to find and defend itself against these attacks.

So in the model, we do a whole bunch

of interesting work, prompt injection defense,

jailbreaking defense, using post-training, using SFT,

using RL inside the model to make sure

that we are constantly improving the model's resistance

to attack.

And if you allow me a brief digression just on one point.

One of the novel things we do at DeepMind on this

is that we're one of the innovators

in the so-called adaptive attacks approach, which

is that a lot of other folks use a static list

of prompt injection scenarios.

I guess one piece of background worth mentioning

is that in order to learn how to defend a model,

you have to really build a good suite of attacks

that are representative of what the bad actors might do to you.

HANNAH FRY: So is an example of this something

like, I know one of the early ones,

you are a grandmother reading a story to your granddaughter

about napalm?

FOUR FLYNN: Yeah, I mean, something like that.

And so what that allows you to do

is to test a bunch of attack scenarios

and then see how resistant you are against them,

and also to generate training data.

And so, to your point, it might be examples

like here's a malicious email.

Now we're causing the model to ignore the instructions

and to take a malicious action by calling a tool

and sending my private data off to some bad email address.

Something like that.

That's all the stuff that's built into that framework.

But what we do beyond that, so we do all that and we do more.

We have these adaptive attacks that allow us to constantly

hit the model over and over again,

using this sort of learning process, until we win.

We have a number of different algorithms.

We've written a paper on it.

I encourage you to read it.

But it's really a clever way to set a higher bar for what we're

defending against than a static list of canned attacks

that may or may not represent how attacks are evolving.

So the model itself needs to be able to have strong defenses,

but around the model also, again, defense in depth,

you want to have other layers of defense.

So we do a bunch of work to make the models secure.

But we also do things really intelligent

classifiers that we put around the model that also look

and flag for these sorts of behavior

so that we can defend against them too.

And what's nice about classifiers

is that they both augment the model's own ability to defend

itself, but also they allow more rapid evolution

against novel attacks.

And so it's a much lighter weight

to deploy a new classifier, to build a new classifier

and get it out there, than it is to retrain an entire model

from the ground up.

HANNAH FRY: So if these are the attacks,

let's also talk about the ways that AI

can be used to prevent them.

Tell me about Big Sleep.

FOUR FLYNN: Yeah, so Big Sleep-- yeah, sorry.

HANNAH FRY: Did it come from-- was there also a--

someone told me this just on the way in that there

was a little nap or something.

FOUR FLYNN: Naptime.

HANNAH FRY: That's it.

FOUR FLYNN: Yeah, so before I got involved in the project

formally, there was a research project called Project Naptime,

which you can still find mention of on Google blogs

here and there.

And that was the original precursor.

And I'm told that the naming convention comes from the idea

that security vulnerability researchers who

are the people that find new classes, new types

of vulnerabilities, which we'll get into in a moment,

could use this system to take a nap,

because they could just let the AI do all the work for them.

HANNAH FRY: Right.

Search for vulnerabilities on their behalf.

FOUR FLYNN: Find vulnerabilities while they

took a nap, basically.

So that was the idea behind the name.

And then as we got more momentum behind the project,

it evolved into the Big Sleep.

HANNAH FRY: Now they can actually hibernate.

FOUR FLYNN: Yeah, now they can just hibernate all winter.

But yeah, what the project essentially

is is kind of a big bet, like DeepMind

is so proud of, of using AI to find novel vulnerabilities.

Now, you might ask, why is that a good thing?

Because I thought we just discussed

vulnerabilities are bad.

And it's an interesting point.

I think those of us in the industry and security

have found that transparency is always the best

disinfectant for security.

And what we've done is we've taken the latest and greatest

versions of Gemini, and we're using them to,

with an agentic harness, basically

become a vulnerability researcher

and find novel zero-days in code.

And the goal of the project is nothing short

of helping improve the security for the whole industry

by finding vulnerabilities and helping the open source

community get them resolved for everybody to benefit

from around the world.

HANNAH FRY: So it's hunting for these sort

of unprotected backdoors that nobody knows is there?

FOUR FLYNN: That's exactly right.

Yeah, we're finding the vulnerabilities

that people have never heard of or seen before in code that

is really underlying much large portions

of the internet in many cases.

HANNAH FRY: So how did people do it before?

What was the human way of searching for them?

FOUR FLYNN: Well, I mean, if you look,

there's a black market for vulnerabilities.

They're very expensive, millions of dollars, often at a time.

HANNAH FRY: If you find one.

FOUR FLYNN: Yeah, if you were to find one.

And the reason I'm bringing that up

is just how exquisite these things are

and how rare they were.

And basically, the way it works in practice

is it's much like you might have seen,

this is the one part of security that actually

is kind of like the movies.

It's like somebody in a dark hoodie in the dark,

staring at six monitors all night while eating Cheerios

or whatever.

So it's a lot of really intensive mental work that

happens often over weeks or months where you are getting

a piece of code, you are trying to understand how it works.

You are making a hypothesis of where a vulnerability might

be in that system.

You're putting in inputs that are potentially dangerous.

So one of the ways that this works is software developers

will design a system with assumptions

in their head, oftentimes unstated assumptions.

Oh yeah, this system will only ever get image files.

And it's, why would anybody ever send me something else?

And then the attacker says, well, what

if I put a music file in here?

Or what if I put a file that I created that has

a bunch of crazy stuff in it?

And so it's a combination of trying things

that are really unorthodox, that were not

expected by the developers.

And then, sort of stepping through the code and seeing what

happened, trying to cause it to break in a certain way that's

unexpected.

And then once it breaks, you actually

have to find that it's exploitable.

And so not only is it broken in a certain way

with an unexpected input, but it needs

to be broken in a specialized way that

allows that input that you're providing

to cause the system underneath it

to be controlled by that input.

So essentially the whole idea is,

I'm giving it something hostile as an input that

would cause it to do what I'm trying to get the system to do,

not what it wants to do.

HANNAH FRY: I mean, you can see how this is

the perfect situation for AI.

So sort of like search really rigorously through

many lines of code that are constructed together

in different ways and then try lots of different options.

FOUR FLYNN: That's right.

HANNAH FRY: In order to find something, it does make sense.

FOUR FLYNN: So we've been finding exactly that,

to your point.

In the system that we're working on, as a Big Sleep,

we found that in some ways, it's very much superhuman,

because one of the challenges in vulnerability researchers

is having a comprehensive, encyclopedic knowledge of all

these complicated frameworks.

HANNAH FRY: These gigantic lines.

FOUR FLYNN: These huge code bases.

And then I saw a vulnerability that we found with Big Sleep

where I was reading through the way the model was thinking

through the issue, and it was clear there

was elements that were very superhuman there because it

would come up with a hypothesis where it realized that, oh,

this version of the framework this depends on works like this.

But version three of that framework works like that,

and version four works like this.

Essentially having this amazing, unbelievable breadth

of understanding of basically all the different libraries

and frameworks that would go into making this system,

which is very hard for any particular human

to hold in their head.

And so, yeah, I think we've definitely

seen a surprising performance out of the system

that we've built, and we've already

found a number of novel zero-days with the system.

And, as I say, our entire goal of this project

is defense and to help the world,

because we know the bad guys will

be using AI to find these vulnerabilities over time.

We want to be the best so that we can help the open source

community and those people that depend on this software

to get these things fixed as quickly as possible.

HANNAH FRY: Because that's a key point, right?

You're not just taking Big Sleep and pointing it

at Google in house code.

You're also pointing at all sorts of open source software.

FOUR FLYNN: That's exactly it.

I mean, of course, Google is built

on a lot of really great work by the open source community.

A lot of the systems we use depend on that.

So there is a benefit to Google as well.

But we're explicitly trying to pick things that are widely

deployed, both in and outside of Google,

as a way to try to be helpful to the community as a whole.

Because we do worry, I'd say frankly, about the future of AI

being used to find and exploit these vulnerabilities.

And so one of the things we're trying to do to help

is to try to find the vulnerabilities first,

and to help the open source community as

quickly as possible.

HANNAH FRY: To get ahead of it.

Because I mean, as you said, you mentioned it earlier,

a lot of the internet runs on this open source code.

FOUR FLYNN: That's right.

HANNAH FRY: Which is available to view for anybody.

FOUR FLYNN: That's right.

That's right.

HANNAH FRY: There was an argument, quite

a prevalent argument a couple of years ago,

in particular, that open source software is safer

for this very reason, that everybody can look at it.

And so you have many, many eyes who are

watching for vulnerabilities.

Where do you stand on that argument between private code

and open source code?

FOUR FLYNN: I mean, I think that's right.

I think open source code does allow for more eyes.

It does allow for more different techniques in AI

as well to be run against the systems.

I think in general, transparency,

as I said a few times in our discussion,

is the number one ingredient that defenders

have on their side to help against the attackers, who

are the ones that are trying to hold things

close to their chest and to not be transparent.

HANNAH FRY: Can you then also use large language models

to build tools to fix those vulnerabilities,

or to make suggestions for how those vulnerabilities might

be fixed?

FOUR FLYNN: Yeah, so this is a really good insight

because that's obviously the next problem that you run into.

Is if you start to scale up your ability to find vulnerabilities,

then it's very easy to imagine overwhelming both ourselves,

frankly, and the broader community

with a large volume of things that everybody has to fix.

HANNAH FRY: Suddenly you've got thousands of back doors

that need patching.

FOUR FLYNN: Exactly, and to be honest,

a lot of these open source maintainers are volunteers

that are doing their best.

They're really unsung heroes of large portions of the IT world

that we depend on together.

And so we want to make it as easy

as possible for that community to absorb what we think

is a potentially high volume of changes in this new world of AI.

And so the second big project that we've started

is a project we call Mender.

It's a pretty early stage, but we're really

excited about the progress.

And what it is is a system that's

designed to automatically generate

patches based on a vulnerability that we've discovered.

And then there's a couple of other dimensions of that.

So one is that you want to make sure

that you're fixing it in a way that doesn't

break important functionality--

HANNAH FRY: Everything else.

FOUR FLYNN: --that everybody else depends on.

You obviously want to make sure it fixes the security

issue fundamentally, not just some cosmetic way.

And then there's also the issue of making sure

that you maintain coding idioms that the open source

maintainers prefer.

And then have a series of things that we

use as validators to validate that the code that we're

producing on the output is good enough to submit.

And how do we do that?

Well, there's a couple of techniques.

We're using LLMs to ascertain whether or not

the output code is good.

We're using some formal methods, concepts,

and a number of other technologies

to essentially make sure that the thing that we've produced

is good enough.

And then, of course, for now at least,

we'll continue to have human review just

to make sure that we're not throwing

a bunch of crazy stuff over the fence

to the open source community until we

get to a certain comfort level with them

and with our own system.

And then we'll hopefully get to the point

where we can fully automate the whole system end to end.

HANNAH FRY: I mean, this is phenomenally complicated.

Phenomenally complicated.

Even forget about finding the vulnerabilities

in the first place, actually fixing it is not like this one

sticking plaster fits all.

Like, the intricacies and the nuances

of where that vulnerability lies,

how that vulnerability plugs into other parts

of the system, the language the vulnerability appears in.

FOUR FLYNN: I mean, this is such a great point.

I mean, the good news is we now have language models that

can produce code.

And indeed, they can produce patches to vulnerabilities.

But the problem is that we need to build a check and balance

into the system to make sure that what's being produced

is really, really high quality.

And LLMs alone, left to their own devices,

at least we haven't figured out yet

how that alone could just one shot a perfectly good patch

every time.

Maybe we'll get there and that'll be a wonderful day.

But for now, we think building a set of candidate patches

and then having a path with a number of suite of validators

is the right technology combination.

HANNAH FRY: Because then I guess,

I mean, layers and layers of this, because you would then

also have to start potentially worrying about the security

of that tool itself.

If someone gets into the tool and then all the

plasters that it's sticking all over the place.

FOUR FLYNN: No, this is actually another really good point.

So we don't think it's a complete thought.

I call it complete thoughts.

Essentially, how do you solve a problem end to end?

I think one is that we find vulnerabilities,

and then you want to be able to automatically generate

a patch for each and every one of those,

ideally in such a great way that nobody

has to worry about the quality of those patches.

And so we can just have those sail right in.

But the final point you raise is such a good one

as well, in that even if you've patched and fixed all the legacy

code, more and more people are depending on large language

models to generate projects.

Everybody's heard of vibe coding, or even not vibe coding.

A lot of engineers now are using large language models,

as they should because they're very productive tools.

But how do you know that large language model is

producing secure code, right?

It's very difficult to tell.

And so that's another project that we're working on,

is making sure that we're teaching Gemini,

not just how to create great code,

but how to do it in a secure way.

And we think those three things together can really

make a major dent in the quality of software

security for the world.

HANNAH FRY: Do you think this is a real ambition, then?

That you could theoretically find and patch

every vulnerability in code on Earth?

FOUR FLYNN: That's what I want to do.

And I think that there are issues

that we have to contend with that

go beyond the technical, that involve, how do we make

sure everybody in the world is applying these patches

to the real systems?

And so that's kind of a human issue.

And it comes into issues like risk aversion.

So while I think this is a good starting point

to handle all the technical challenges,

I still think there's other organizational and human

problems that we would have to contend with

to really make sure not only do we generate

great security in patches, but get those actually applied

to real life systems.

HANNAH FRY: So those are the questions

that I want to ask you for the second part of this two part

podcast.

But just before we wrap up on this part,

it does feel as though what Google is doing in this space,

at least on the technical side, is

fundamentally different from what we're

seeing from other companies.

Is that your view?

FOUR FLYNN: I mean, I think it is.

Well, I mean, there's definitely pieces of this

that are being done by others.

But we think because of the strengths of all the data

we have, all the code that we've got that we've built up

over the years at Google, and the fact

that we have the best engineering and security

talent in the world gives us a unique vantage point.

And it's something I'm proud to be part of.

HANNAH FRY: Absolutely amazing.

Well, I think you can see that there is so much more

to come in this conversation.

And I know that we are just getting to the good stuff,

but we have decided to split this across two episodes

because there was just too much to fit into one.

So keep an eye on your feed for part two.

And if you are worried that you might miss it,

well, this is a great opportunity for you

to subscribe to our channel so you always know

when we have a new video out.

And hey, while you're at it, you might as well

like and add a comment.

Very small thing, but very tangible way to make sure

that we can keep making these.

Until next time.

Analytical Tools

Create personalized analytical resources on-demand.

Analysis

Visualization

Learning

Assessment

Chatbot is available after you save this video to your library.