Social engineering, malware, and the future of cybersecurity in AI (pt.1)
Loading YouTube video...
Video ID:1gO2bC5xLlo
Use the "Analytical Tools" panel on the right to create personalized analytical resources from this video content.
Transcript
HANNAH FRY: Were you just sat in a room
with Sergey Brin for 15 hours a day?
FOUR FLYNN: There was a lot of us
in that room for a lot of hours in a day.
When we think about it, we look back on it,
I still get a pit in my stomach.
These systems that we all depend on in our daily lives
are built on millions of lines of very complicated code.
And so while the number of vulnerabilities
is probably finite, it's also a very big number,
and many of them have never been discovered before.
HANNAH FRY: This is phenomenally complicated.
Phenomenally complicated.
Even forget about finding the vulnerabilities
in the first place.
Actually fixing it, it's not like there's
one sticking plaster fits all.
[MUSIC PLAYING]
Welcome to "Google DeepMind," The Podcast.
I'm Professor Hannah Fry.
Now, cyber attacks have never been easier.
From deepfakes that are so convincing
they can fool your own family to phishing emails that
look just like the real thing.
AI has allowed these attacks to scale at a dizzying pace,
but there is some hope that the same technology that's
fueling these attacks could also be the key to preventing them.
And few people know this battle better than my guest today.
Four Flynn is VP of Security at Google DeepMind
and a cybersecurity legend.
He was in the room during Operation Aurora,
back in 2009, when a massive attack on Gmail
rewrote the rules of cybersecurity.
Today, he is on the front lines again,
taking on a new wave of AI-powered cyber attacks.
And in fact, Four had so much to say,
so many totally fascinating insights to share,
that we decided to make this into a podcast of two halves.
Next time, we're going to be talking
about the human side of cybercrime,
how we can be manipulated and tricked by bad actors,
and how all of that is changing with the era of agentic AI.
But for this episode, we wanted to focus on the battle
itself, the ways into systems that attackers seek to exploit,
and what we can do to defend them.
Well, thank you so much for joining me, Four.
FOUR FLYNN: It's a pleasure.
HANNAH FRY: And I thought I might start by talking about one
of the most notable security incidents
in Google's history, Operation Aurora.
FOUR FLYNN: Sure.
HANNAH FRY: How do you fit into that story?
FOUR FLYNN: Yeah, well, so Operation Aurora
was a huge moment in the history of cybersecurity writ large,
really, for the industry as a whole.
I think the idea that a nation state would compromise
a private company was quite a shock to really all of us.
We had essentially a case in which
China was compromising Google or attempting to compromise Google.
And as part of that campaign, they actually
attempted to compromise a number of other companies.
And it was part of a long-running process
that they had to conduct espionage
across the great deal of various institutions in the West.
HANNAH FRY: And specifically, they
were looking for people who had been vocal against human rights
abuses in China.
FOUR FLYNN: Yeah, that's what we believe at this point.
Of course, at the time, whenever you're
dealing with these situations, it's
very hard to figure out who the actors are
or what they're attempting to gain access to.
And so way back in the early days
when we first detected the attack, which was something
my team was responsible for is finding that attack
and responding to it.
There's many, many, many people across Google
that were contributing to figuring out what happened.
And then, of course, after we figure out what
happened, to evict the attacker from our environment,
and then maybe even more importantly,
over the following years, to harden our environment based
on the lessons that we learned.
But definitely in the moment you have this thing
called the fog of war, where you really
have no idea what's going on.
You really have no idea even what the bits are of the attack,
and you're doing forensics to try to figure that out.
And so there's still many people I
work with here at Google that were instrumental in determining
of that.
Heather Adkins and others that are just
absolute unbelievable experts in the subject
matter that I'm lucky to work with.
HANNAH FRY: Just take me back to that time then.
When did you first realize that something was up?
When was the first moment of detection?
FOUR FLYNN: Right.
So back then it was sort of famous.
And maybe it still is.
You could almost guarantee that when you have a big Christmas
vacation planned, most likely that's
when the cyber attack is going to come to light.
And so it was in December, I remember,
when all the details started to come out, and many of us
worked tirelessly, really over the break,
but also for months, trying to ascertain what happened
and trying to put the puzzle pieces together
when you don't even really know what the puzzle looks like.
And so you're faced with this picture of just bits
and pieces of technical data.
HANNAH FRY: Presumably you didn't get Christmas holiday
that year.
FOUR FLYNN: No, no, none of us did.
HANNAH FRY: Were you just sat in a room
with Sergey Brin for 15 hours a day?
FOUR FLYNN: There was a lot of us
in that room for a lot of hours in the day.
Yes.
And when we think about it, we look back on it,
I still get a pit in my stomach, I think.
HANNAH FRY: Well, how stressful was it, though?
I am intrigued by this.
I mean, because the thing is, on the one level,
this feels quite a technical challenge.
But I mean, there is a human element to this too, right?
FOUR FLYNN: Well, I mean, look, I'll tell you,
I say those of us that have pledged our life to defending
people as I have, and I know all the people I work with have,
it feels like a failure.
And that's where, at least for me, the source of stress
comes from, is you feel like you've let people down.
I mean, I spent my entire life to protect people's data,
protect their accounts, protect companies' networks,
all in service to helping people's daily lives
be as great as it can be.
And, that intersects, of course, with Google services
in a bunch of different ways, whether you're
using a phone or a browser or even Search.
And we take that super seriously.
HANNAH FRY: How did they get in?
Do you know now?
FOUR FLYNN: Yes, we do know now.
It was back then, this was an earlier era in security,
but in some sense, some things have never changed.
This was back when Internet Explorer, I
don't know if any of the listeners remember that,
but that was a big browser back then, by Microsoft.
And it was a vulnerability in the browser that
was conducted via a phishing attack against somebody who
was employed at Google.
HANNAH FRY: Someone clicked on something, basically.
FOUR FLYNN: Exactly.
And so, phishing is still very much
if not a bigger threat today than it has ever been.
But the browser exploitation was still
in its early days back then.
We called it client-side attacks.
Basically people leveraging weaknesses.
Let me just say one brief thing as a digression.
In the earliest days of security,
most of the ways that attackers broke into systems
were through server attacks.
They would be on the internet.
You would have a bank or something like that,
and you would have a big mainframe or some big website,
and people would attack through the front door of that website.
And so it was really this Aurora attack
as an example of that evolution that really has never
changed back to what we call client-side attacks.
And so the attacks shifted to be against the weakest
link of organizations, which are often the users.
And so they would exploit the users
through social engineering, do phishing attacks
against their passwords, for example, something
I'm sure you've had to deal with in rotating
your passwords in your own life.
But also taking advantage of things
that were running on the laptop or on the desktop computer,
not on the server side.
And it was part of a huge change in the industry where
we had historically built this sort of moat and drawbridge
model for security, where we'd built these big castle
walls with these big firewalls and had all the people
and the servers inside.
And that was essentially the common wisdom
for how the security of companies would work.
And there's a whole bunch of weaknesses
in that model that evolves.
For example, we realized that employees weren't always sitting
in the same building anymore.
That mobility became more and more pervasive.
First, the rise of the laptop and then
the rise of the personal phone and the smartphone
sort of broke that one axis of the mold.
But then the other axis of the mold that got broken
was client-side attacks, because people were no longer trying
to attack these very well-defended servers,
such as the castle behind the big castle wall.
Instead, they were attacking the client,
which had several weaknesses.
One, all the client-side software
had not been hardened like that we'd done on the server side.
So there was a big attack surface there.
But also the human element was much more
easy to exploit using social engineering on the client.
And so that also led to an evolution post-Aurora
that we created called BeyondCorp,
that's also known in the industry
as Zero Trust, which is a whole new way of rethinking
how enterprise security should work.
Sort of building from the beginning
away from this moat and drawbridge model,
and instead sort of acknowledging the importance
of the client and the user as the supreme thing to defend.
HANNAH FRY: You've almost assumed that a perpetrator has
infiltrated the network, and it's
how to stop them or mitigate against potential damage
that they might do once they have?
FOUR FLYNN: Yeah, that's actually another element of it
that you raise.
That's a really good point, which
is this assumed breach is what we call it in security.
And that also was another innovation along the way,
because I think what we realize is that as good as the detection
systems we started to build to find these attacks were,
and some of them were pretty good,
including the one that we built to detect the Aurora attack.
That said, you know you're not going to get everything,
and that as attackers, and especially nation states had
evolved, more stealthy and below the radar
techniques that even the best detection systems on the market
couldn't detect, then you had to take a two-pronged approach.
So no longer could you just rely on your detection system
to flag these things to your analysts.
You had to also take a separate step, which is assume
that all those things failed.
And as a defense in-depth approach,
make sure that you were doing what we call assume breach.
And so that means that you do things like threat
hunts, for example, where you assume that you have
the bad guys already on your systems
internally, that you didn't catch them by any of the systems
you've deployed, and that you're going to go and look,
scour the entire systems to find the people that had already
penetrated your defenses.
And so again, that was another novelty
that grew up in this post-Aurora era and many other things.
Multi-factor authentication tokens
that Google does better still today than any other company
that I'm aware of using our Titan Security Keys.
So it's an unphishable multi-factor credential.
HANNAH FRY: As in, it's not just a text
message, which you could divert to another phone very easily.
FOUR FLYNN: Right.
Exactly.
That's one of the things we were able to do at Google,
is invent our own hardware keys, in partnership
with some other companies we were working
with, such that it's not just about sending you a message,
like you say, that somebody can find in your email
or in your text messages and replay that attack.
That's a pretty good start.
A lot of companies have deployed that,
and it's better than nothing.
But there's a lot of weaknesses of that system.
If somebody compromises the phone system,
if somebody compromises your email,
then you can still have your account taken over.
Well, at Google, again, in the aftermath of this era,
one of the things we invented was
a non-phishable multi-factor hardware token,
which connects directly to the browser
and authenticates as a second factor
without having some string of characters that
could be stolen by an attacker.
HANNAH FRY: There's one other element
of this that made it a really historic moment, which
is Google's decision to go public with what had happened.
FOUR FLYNN: Yeah.
HANNAH FRY: I mean, why did they do that?
FOUR FLYNN: I mean, there's a couple other points of context
here that I think are useful.
I think these types of attacks had
been going on outside of private industry for some time, right?
We'd seen these attacks happening
in the Department of Defense and espionage
happening with cyber attacks in the military industrial complex,
the various companies that make that up.
And that had all been precursors to the attack on Google.
And so I think part of the calculus for that decision,
I believe, was bringing awareness
to this thing that had been going on for some time.
And in this case, it led to a whole bunch of really positive
changes in the industry.
I think it came to data breach disclosure
laws that are now on the books in a lot of places.
I think it led to what then became
sort of responsible vulnerability disclosure, best
practices, and generally just bringing transparency
to security overall, is something Google's, I think,
really brought to the table across the board for many years.
HANNAH FRY: In the immediate aftermath,
so this is 2009, right, did other companies learn quickly
from what had happened at Google?
FOUR FLYNN: I think the awareness came quickly,
but what happened very slowly was
the adaptation of our security best practices to confront
this risk.
In general, across the industry, it's
been incredibly slow to adopt the more modern approaches
to security.
HANNAH FRY: Like multi-factor authentication?
FOUR FLYNN: Multi-factor authentication is now
getting fairly pervasive.
But it took 15 years from that event
till now for it to really become something enterprises
were able to deploy in effective ways.
Zero Trust is another example.
I know peers of mine at various companies
are still struggling in some ways, if you can believe it,
with the modernization of their environments
to this new reality.
And especially, I would say, governments, as you can imagine,
that built a legacy environment are
struggling to make that pivot.
And so I guess the lesson I've learned
is that, unfortunately, it's hard to change
things that are entrenched.
HANNAH FRY: But then as a result of that,
there are quite big dramatic issues that arise.
I mean, I'm thinking here about Celebgate in 2014
where celebrities photos were leaked.
FOUR FLYNN: That's right.
HANNAH FRY: That wouldn't have happened
had they been using multi-factor authentication, right?
FOUR FLYNN: That's right.
Yeah.
HANNAH FRY: And yet, that was five years
after Operation Aurora.
FOUR FLYNN: Operation Aurora.
Yeah.
I mean, that's right.
So you're highlighting another point worth raising,
which is the difference between enterprise security and consumer
security.
The other piece is consumer security
often does lag behind enterprise security.
And enterprise security itself sometimes
lags behind the best practices.
And so that's a good example where
I think Apple even supported multi-factor authentication
as an opt-in for iCloud at the time, if I recall.
But very few people had adopted it.
And really, what we see in consumer security
is the thing that really moves the needle is not
asking consumers to change their behavior,
but changing the defaults.
HANNAH FRY: Right.
FOUR FLYNN: And making the defaults more secure
is the real needle mover to make people in their daily lives
more secure.
HANNAH FRY: Because the public are resistant to change?
FOUR FLYNN: I think that's part of it.
Or they're just not educated on what they should be doing.
And so, credit where it's due.
I think Android, and ChromeOS, and the Chrome browser,
and Apple, and a bunch of the players in the ecosystem
have actually done a lot to increase the default
level of security on consumer devices and consumer
applications to a pretty high degree.
I think especially consumer mobile devices is a really
brilliant case study in this.
I think the defaults that you find now versus 10 years
ago on, say, Android or iOS is just dramatically different.
A lot of people take that for granted,
but it's a lot of hard work.
HANNAH FRY: Well, there are lots of positives to be thankful for.
I think I also want to understand the scale
of the potential problem here.
So before we get on to talking about how large language
models have changed the game, the era of generative AI
have changed things, let me ask you about the different ways
that we are potentially vulnerable.
So there's social engineering, sure.
But what are the other ways in?
What are the more technical ways into a system?
FOUR FLYNN: I think of it in terms
three categories of security failures, let's say.
So as you mentioned, one of them,
and probably the most frequently abused one,
is social engineering.
And one of the interesting quirks of LLMs
is that they have somewhat human-like behaviors.
And so in fact, you can cause an LLM
to get confused through similar types of things that humans do.
But I think the other two categories
are issues with configuration and issues of integrity.
So basically, the way to think about protecting a system
is you have to configure it so that it's secure,
and then you have to make sure there's not a way
to bypass that configuration.
And I think pretty much everything in security
falls into those two categories.
Everything in terms of security prevention at least.
And so, let's just pick any particular example,
access control.
So if you have a company where you're sharing a Google Doc
or something like that, you might
have it shared with everybody at the whole company.
And that means any one particular person that
has an account that's hacked at the company
could view that document.
You see what I mean?
And so that's an issue of configuration.
And so getting the configuration right
means having only the right number of people with access
to that document that really should, you see?
Pursuant to the level of sensitivity
of the content of the document.
OK, we all know this.
This is normal security 101.
Now where does integrity come in?
So let's say you've done a good job of getting
that document locked down to the right number of people.
But what could also happen is that there's
a vulnerability, like a patch missing on the server that's
hosting that document.
And so somebody could potentially
compromise the server, bypassing the access control situation
altogether.
And so you have basically--
and then the third issue is that somebody
who does have access to the documents password is stolen.
And then that account gets access to the document.
So you see, we have integrity, configuration, and people
as the three classes of issues that I think really
is every kind of security problem.
HANNAH FRY: But then sometimes that
manifests in slightly strange ways.
I mean, I read one story a little while ago
about a fish tank in a Las Vegas casino that
had a smart thermometer on it.
FOUR FLYNN: Yeah, I think I remember reading about this.
It was basically somebody that was
trying to cause financial fraud, I believe,
or some of abuse of that casino.
And they had a fish tank that was on their internal network.
And the system that was running the fish tank, of course,
like everything these days, has an IP address
and is connected to the internet.
My toaster probably does have an IP address at this point,
I'm sure.
And so what that allowed the attackers to do
is to gain a foothold on that network,
and then use that as a pivot point
to attack the more sensitive systems that were undefended
behind the scenes.
And this is a classic issue with IoT systems,
where the problem with IoT is that they often
don't have enough CPU, memory, and power budget
in order to do a lot of the security best practices.
And so you end up seeing companies
that skimp on those things.
And so therefore, you end up with these IoT systems that are
deployed somewhat pervasively.
And then if that's compounded with a system that's
behind the scenes, like a server, like we
talked about in that old model of the moat and drawbridge, that
is poorly defended and relies on network trust, which I think
is an anti-pattern now in security.
But if you have something that has by
virtue of being on the network, gains
you some amount of privilege to being
able to interact with that system.
Then generally speaking, that's a recipe for disaster.
HANNAH FRY: But I mean, I think this does just demonstrate
the number of different potential ways
that you can be vulnerable.
FOUR FLYNN: Yeah, it's a great point.
I mean, we call this the defender's dilemma, is the term
we use for it in the industry.
And it's essentially this asymmetry
between the folks that have to protect
against all potential ways to compromise
your people or your company, and an attacker that really only
has to find one avenue in.
HANNAH FRY: I mean, I suppose in a way,
you almost have to assume that you have some vulnerabilities
that you don't yet know about on your system.
FOUR FLYNN: Yes.
Everybody that does security defense has that assumption.
HANNAH FRY: Those have a name, right?
What is zero-day vulnerability?
FOUR FLYNN: So zero-day vulnerabilities
are definitely a type of vulnerability
that is very difficult to control for because those are
vulnerabilities where even if you've done everything
right, patching your system, putting your access
controls in the right place, and so on and so forth,
a zero-day vulnerability is something
that can compromise a fully patched secure system.
And so those are the class of vulnerabilities that most of us
are laying awake at night.
Those are the ones that we're often most scared of,
because there's historically it's challenging
to defend against those.
HANNAH FRY: The vulnerabilities you don't know are there?
FOUR FLYNN: Yeah, exactly.
And code is complex.
It's important to remember, these systems
that we all depend on in our daily lives
are built on millions of lines of very complicated code.
And so while the number of vulnerabilities
is probably finite, it's also a very big number.
And many of them have never been discovered before.
And so there's always this latent risk of code
that might have a vulnerability in it that is never seen before.
Just to make you feel slightly better.
HANNAH FRY: Yeah.
FOUR FLYNN: We have this concept in security
called defense in depth.
And so the idea is that you build
systems such that any one particular flaw,
hypothetical or otherwise, doesn't
lead to a catastrophic failure of the whole system.
So let's talk about a zero-day vulnerability in a system.
Well, there's all these layers of defense in modern operating
systems.
And so, even if there's a vulnerability discovered,
these days it's very hard to exploit those vulnerabilities
because of all these added security
protections in the underlying operating system that
make it very difficult to, even if there's
a vulnerability you can discover, to actually exploit it
in real life.
And there's all this trickery you
have to learn how to do, such as exploiting the vulnerability,
landing a certain amount of bytes in the memory
and then causing the operating system
to jump over to that memory.
And these are different kinds of overflows and so on.
And there's all these memory safety features
that have been built into modern kernels and modern operating
systems to try to protect against these things.
And moreover when you zoom out and you think about a larger
company, a zero-day vulnerability
could perhaps compromise your phone or your browser.
But the other thing that we talk about,
and hopefully I'm not introducing
too many concepts that are novel,
but we talk about this concept called the kill chain.
And the kill chain is a concept we borrowed from the military
and now is a sort of fixture of cybersecurity.
And basically, the idea was, we had been struggling
against this defender's dilemma, that we
have to stop every possible avenue for an attacker,
but they only have to find one way in.
And what the kill chain allowed us to think about
is think about the problem differently.
Yes, that's true.
But even though they have to only find one way in,
if they do that, they still have to go
through this series of stages that you understand
what they're going to be.
And it's things like reconnaissance, delivery
of the exploit, post-exploit sort of moving
around the environment in the network and so on and so forth.
And so this actually, I think re-empowered the defenders
and re-tilted the tables and the scales
and balanced them a little bit more because when you zoom out
and you look at a whole company, obviously if they just
phish one employee, that's not good enough, because typically
they have a target that's deep inside a system
that employee might not necessarily have access to.
And so there's this whole series of stages
the attack has to go through.
After they've phished that employee,
they've gotten code execution on their laptop.
Now they're trying to spread out through the environment.
Now they're trying to figure out,
what server they're trying to access
or what code base they're trying to access.
And so there's all these opportunities
to detect, to set tripwires, to have defense in depth,
try to block those additional stages.
And the whole entire company now is your field of battle
in which you can deploy prevention and detection
technologies that inevitably can stop and detect and slow
the attacker down.
HANNAH FRY: So even though they may
have multiple areas of attack, you've
now got multiple areas of defense.
FOUR FLYNN: Exactly.
The whole company now can become your field of defense.
HANNAH FRY: So, OK, how do large language
models change this situation?
FOUR FLYNN: In a number of ways.
HANNAH FRY: I mean, are there new vulnerabilities?
Are there new ways in which the systems can fail?
FOUR FLYNN: Yeah, so there's a whole bunch
of interesting new things that come out
of the advent of large language models,
I think both for defenders and for attackers.
But before I get to that, there's
one other thing I'd like to start as a foundation.
I think one of the things that we're still
wrestling with as an industry is that ultimately,
traditional computing systems are deterministic.
I'd say the fundamental thing from a security point of view
about LLMs that's different is that they're
generally non-deterministic.
Oftentimes you can give the same prompt to a large language
model, and it will give you different answers
depending on random things.
As it's tracing those paths of those tokens
and producing those tokens through the models,
through its brain, if you will, you'll
get non-deterministic answers.
And so we'll get into the details.
But I just think it's worth starting with that,
as that's a pretty big break from the past for us
defenders in security.
And I think some of us are still kind of wrestling
with that difference.
Now, in terms of attacks, I think large language models
are still new in some ways.
I mean, obviously, they've been around for a number of years.
But I think from a security point of view,
I think we're all still trying to learn what the risk landscape
is going to look like.
HANNAH FRY: Could attackers use large language models
to create malware?
FOUR FLYNN: So there's initial signs
that attackers are starting to figure out
how to use large language models to create malware.
And we work, I should say, closely with our threat
intelligence teams to carefully examine
what the bad actors are doing.
And we put out periodic reports, we actually
released a pretty exhaustive report, of all
the different nation state threat actors
and how they were using Gemini in January.
But we have seen attacks in the wild already
and prototype attacks that have been
built in the lab that use AI and LLMs to be part
of the malware attack chains.
And so one example of that is we're
starting to see people use LLMs for the polymorphism.
So let me explain what that is.
So one of the problems with creating malware
is that oftentimes it can get flagged by,
I mean, people used to call them antiviruses.
Now they're called EDR.
But essentially systems that are running on your laptop that
are looking for malicious code.
And so one of the problems that malware authors face
is they want to make sure that their stuff can't
get flagged by a modern antivirus engine
and be deleted or disabled.
And so the solution historically to that
has been to have something that shows up
on your computer as something completely brand new,
that's never been seen before by anybody out there before.
And so what that entails is you creating
something custom crafted for that exact instance
of that attack.
Now, that's been expensive historically, right?
But, unfortunately, we're starting
to see that large language models are increasingly
useful for helping craft bespoke malware
and having them be polymorphic, or at least
being able to have them be unique on every system
that they're planted on.
So that's one example.
And we've seen prototypes of this.
I don't know that we've seen it in the wild attack yet
of that nature, but I've definitely
seen a bunch of different interesting experiments
out there.
And I think real world attacks of that nature
are probably imminent.
HANNAH FRY: I also wonder about the new vulnerabilities,
where now a large language model is your entry
point into a system, because prompt injections is
another way.
FOUR FLYNN: That is a really great point.
So this gets back to the point I made
a moment ago about deterministic versus non-deterministic
behavior.
And prompt injection, jailbreaks,
these are examples where LLMs are
susceptible to some of the things humans
are as they become more intelligent.
And prompt injection is actually in some ways kind
of a confusion of the model's mental processing.
Basically, what prompt injection is,
is getting confused about where the command from the user
is coming from.
So let's say you're using an LLM and you say, hey,
summarize this website.
Well, you're the one telling the LLM what to do.
It should be focusing on what you're asking it to do.
But what could happen in an attack scenario
is that that website you're asking it to summarize
is actually malicious.
And so that website might actually
hijack the thought process of the LLM
and say, ignore the instructions you've previously been given,
do this other thing instead.
And of course, it sort of sounds trivial in the case
of summarizing a website.
I mean, who cares?
Whatever.
But as you think about the future
of deploying these things as agentic systems that
have increasing levels of independence
and increasing tool use where they're
engaging with potentially hostile content,
this becomes a bigger and bigger issue
as to how trustworthy these systems can be.
And so, yeah, I would say prompt injection is definitely
one of the things that I am spending
a lot of my time continuing to improve Gemini's defense of.
And I think all of us in the industry
are working on improving defenses
against this class of attack.
HANNAH FRY: I'm thinking about you talking
about malicious websites there, because is there
also a potential vulnerability that I mean,
large language models are reading the internet
in order for their training.
FOUR FLYNN: Yeah.
HANNAH FRY: Is this something about data
poisoning as well that could go on here?
FOUR FLYNN: Yeah.
There's a number of different types of data poisoning attacks
that are potentially concerning.
One is just the mass of pre-training data, as you know,
that's used to train these models,
is definitely a potential risk.
I think one thing that mitigates that risk a bit in practice
is that the data that's generally
fed into a large language model in pre-training
tends to be so expansive that any one unit of that data
generally isn't overrepresented in the outcome of the model.
In practice, it's a little bit less of a risk
because let's say you have a couple of websites
in the entire internet that are malicious.
It's very difficult to get the model
that's trained on trillions of tokens
or whatever the number is to be compromised
by that very small component of the data input.
Now, it's a little bit more concerning
in the post-training data, because there generally
is a smaller set of data that's used for post training.
And so there are potential scenarios
that I've seen literature on and academic work on that
indicates that post-training data could be potentially
risky for training a large language model to be malicious.
Once you train a model, pre and post train a model,
there's also an interesting class of risk
that you have to think about when you're serving the model.
So let's say you have these model weights that you've
finished doing training on.
And then let's say you put that out
in an open source repository.
You have open weights model.
Well, there's also papers and attacks where you can actually
have an attacker maliciously manipulate
that finalized model on disk.
And so, that's a hot patch, or whatever you want to call it,
to the actual model itself.
And if you don't validate that the model you're downloading
is the same as the one that was produced in the first place,
there's also an attack of that variety
that we all worry about as well.
HANNAH FRY: OK.
So when it comes to all of these potential ways
that attackers could get in, how do you
protect something like Gemini?
How do you prevent this from happening for Gemini?
FOUR FLYNN: The way we do that is we first start with building
a defendable model.
And so we're trying to improve Gemini's ability
to find and defend itself against these attacks.
So in the model, we do a whole bunch
of interesting work, prompt injection defense,
jailbreaking defense, using post-training, using SFT,
using RL inside the model to make sure
that we are constantly improving the model's resistance
to attack.
And if you allow me a brief digression just on one point.
One of the novel things we do at DeepMind on this
is that we're one of the innovators
in the so-called adaptive attacks approach, which
is that a lot of other folks use a static list
of prompt injection scenarios.
I guess one piece of background worth mentioning
is that in order to learn how to defend a model,
you have to really build a good suite of attacks
that are representative of what the bad actors might do to you.
HANNAH FRY: So is an example of this something
like, I know one of the early ones,
you are a grandmother reading a story to your granddaughter
about napalm?
FOUR FLYNN: Yeah, I mean, something like that.
And so what that allows you to do
is to test a bunch of attack scenarios
and then see how resistant you are against them,
and also to generate training data.
And so, to your point, it might be examples
like here's a malicious email.
Now we're causing the model to ignore the instructions
and to take a malicious action by calling a tool
and sending my private data off to some bad email address.
Something like that.
That's all the stuff that's built into that framework.
But what we do beyond that, so we do all that and we do more.
We have these adaptive attacks that allow us to constantly
hit the model over and over again,
using this sort of learning process, until we win.
We have a number of different algorithms.
We've written a paper on it.
I encourage you to read it.
But it's really a clever way to set a higher bar for what we're
defending against than a static list of canned attacks
that may or may not represent how attacks are evolving.
So the model itself needs to be able to have strong defenses,
but around the model also, again, defense in depth,
you want to have other layers of defense.
So we do a bunch of work to make the models secure.
But we also do things really intelligent
classifiers that we put around the model that also look
and flag for these sorts of behavior
so that we can defend against them too.
And what's nice about classifiers
is that they both augment the model's own ability to defend
itself, but also they allow more rapid evolution
against novel attacks.
And so it's a much lighter weight
to deploy a new classifier, to build a new classifier
and get it out there, than it is to retrain an entire model
from the ground up.
HANNAH FRY: So if these are the attacks,
let's also talk about the ways that AI
can be used to prevent them.
Tell me about Big Sleep.
FOUR FLYNN: Yeah, so Big Sleep-- yeah, sorry.
HANNAH FRY: Did it come from-- was there also a--
someone told me this just on the way in that there
was a little nap or something.
FOUR FLYNN: Naptime.
HANNAH FRY: That's it.
FOUR FLYNN: Yeah, so before I got involved in the project
formally, there was a research project called Project Naptime,
which you can still find mention of on Google blogs
here and there.
And that was the original precursor.
And I'm told that the naming convention comes from the idea
that security vulnerability researchers who
are the people that find new classes, new types
of vulnerabilities, which we'll get into in a moment,
could use this system to take a nap,
because they could just let the AI do all the work for them.
HANNAH FRY: Right.
Search for vulnerabilities on their behalf.
FOUR FLYNN: Find vulnerabilities while they
took a nap, basically.
So that was the idea behind the name.
And then as we got more momentum behind the project,
it evolved into the Big Sleep.
HANNAH FRY: Now they can actually hibernate.
FOUR FLYNN: Yeah, now they can just hibernate all winter.
But yeah, what the project essentially
is is kind of a big bet, like DeepMind
is so proud of, of using AI to find novel vulnerabilities.
Now, you might ask, why is that a good thing?
Because I thought we just discussed
vulnerabilities are bad.
And it's an interesting point.
I think those of us in the industry and security
have found that transparency is always the best
disinfectant for security.
And what we've done is we've taken the latest and greatest
versions of Gemini, and we're using them to,
with an agentic harness, basically
become a vulnerability researcher
and find novel zero-days in code.
And the goal of the project is nothing short
of helping improve the security for the whole industry
by finding vulnerabilities and helping the open source
community get them resolved for everybody to benefit
from around the world.
HANNAH FRY: So it's hunting for these sort
of unprotected backdoors that nobody knows is there?
FOUR FLYNN: That's exactly right.
Yeah, we're finding the vulnerabilities
that people have never heard of or seen before in code that
is really underlying much large portions
of the internet in many cases.
HANNAH FRY: So how did people do it before?
What was the human way of searching for them?
FOUR FLYNN: Well, I mean, if you look,
there's a black market for vulnerabilities.
They're very expensive, millions of dollars, often at a time.
HANNAH FRY: If you find one.
FOUR FLYNN: Yeah, if you were to find one.
And the reason I'm bringing that up
is just how exquisite these things are
and how rare they were.
And basically, the way it works in practice
is it's much like you might have seen,
this is the one part of security that actually
is kind of like the movies.
It's like somebody in a dark hoodie in the dark,
staring at six monitors all night while eating Cheerios
or whatever.
So it's a lot of really intensive mental work that
happens often over weeks or months where you are getting
a piece of code, you are trying to understand how it works.
You are making a hypothesis of where a vulnerability might
be in that system.
You're putting in inputs that are potentially dangerous.
So one of the ways that this works is software developers
will design a system with assumptions
in their head, oftentimes unstated assumptions.
Oh yeah, this system will only ever get image files.
And it's, why would anybody ever send me something else?
And then the attacker says, well, what
if I put a music file in here?
Or what if I put a file that I created that has
a bunch of crazy stuff in it?
And so it's a combination of trying things
that are really unorthodox, that were not
expected by the developers.
And then, sort of stepping through the code and seeing what
happened, trying to cause it to break in a certain way that's
unexpected.
And then once it breaks, you actually
have to find that it's exploitable.
And so not only is it broken in a certain way
with an unexpected input, but it needs
to be broken in a specialized way that
allows that input that you're providing
to cause the system underneath it
to be controlled by that input.
So essentially the whole idea is,
I'm giving it something hostile as an input that
would cause it to do what I'm trying to get the system to do,
not what it wants to do.
HANNAH FRY: I mean, you can see how this is
the perfect situation for AI.
So sort of like search really rigorously through
many lines of code that are constructed together
in different ways and then try lots of different options.
FOUR FLYNN: That's right.
HANNAH FRY: In order to find something, it does make sense.
FOUR FLYNN: So we've been finding exactly that,
to your point.
In the system that we're working on, as a Big Sleep,
we found that in some ways, it's very much superhuman,
because one of the challenges in vulnerability researchers
is having a comprehensive, encyclopedic knowledge of all
these complicated frameworks.
HANNAH FRY: These gigantic lines.
FOUR FLYNN: These huge code bases.
And then I saw a vulnerability that we found with Big Sleep
where I was reading through the way the model was thinking
through the issue, and it was clear there
was elements that were very superhuman there because it
would come up with a hypothesis where it realized that, oh,
this version of the framework this depends on works like this.
But version three of that framework works like that,
and version four works like this.
Essentially having this amazing, unbelievable breadth
of understanding of basically all the different libraries
and frameworks that would go into making this system,
which is very hard for any particular human
to hold in their head.
And so, yeah, I think we've definitely
seen a surprising performance out of the system
that we've built, and we've already
found a number of novel zero-days with the system.
And, as I say, our entire goal of this project
is defense and to help the world,
because we know the bad guys will
be using AI to find these vulnerabilities over time.
We want to be the best so that we can help the open source
community and those people that depend on this software
to get these things fixed as quickly as possible.
HANNAH FRY: Because that's a key point, right?
You're not just taking Big Sleep and pointing it
at Google in house code.
You're also pointing at all sorts of open source software.
FOUR FLYNN: That's exactly it.
I mean, of course, Google is built
on a lot of really great work by the open source community.
A lot of the systems we use depend on that.
So there is a benefit to Google as well.
But we're explicitly trying to pick things that are widely
deployed, both in and outside of Google,
as a way to try to be helpful to the community as a whole.
Because we do worry, I'd say frankly, about the future of AI
being used to find and exploit these vulnerabilities.
And so one of the things we're trying to do to help
is to try to find the vulnerabilities first,
and to help the open source community as
quickly as possible.
HANNAH FRY: To get ahead of it.
Because I mean, as you said, you mentioned it earlier,
a lot of the internet runs on this open source code.
FOUR FLYNN: That's right.
HANNAH FRY: Which is available to view for anybody.
FOUR FLYNN: That's right.
That's right.
HANNAH FRY: There was an argument, quite
a prevalent argument a couple of years ago,
in particular, that open source software is safer
for this very reason, that everybody can look at it.
And so you have many, many eyes who are
watching for vulnerabilities.
Where do you stand on that argument between private code
and open source code?
FOUR FLYNN: I mean, I think that's right.
I think open source code does allow for more eyes.
It does allow for more different techniques in AI
as well to be run against the systems.
I think in general, transparency,
as I said a few times in our discussion,
is the number one ingredient that defenders
have on their side to help against the attackers, who
are the ones that are trying to hold things
close to their chest and to not be transparent.
HANNAH FRY: Can you then also use large language models
to build tools to fix those vulnerabilities,
or to make suggestions for how those vulnerabilities might
be fixed?
FOUR FLYNN: Yeah, so this is a really good insight
because that's obviously the next problem that you run into.
Is if you start to scale up your ability to find vulnerabilities,
then it's very easy to imagine overwhelming both ourselves,
frankly, and the broader community
with a large volume of things that everybody has to fix.
HANNAH FRY: Suddenly you've got thousands of back doors
that need patching.
FOUR FLYNN: Exactly, and to be honest,
a lot of these open source maintainers are volunteers
that are doing their best.
They're really unsung heroes of large portions of the IT world
that we depend on together.
And so we want to make it as easy
as possible for that community to absorb what we think
is a potentially high volume of changes in this new world of AI.
And so the second big project that we've started
is a project we call Mender.
It's a pretty early stage, but we're really
excited about the progress.
And what it is is a system that's
designed to automatically generate
patches based on a vulnerability that we've discovered.
And then there's a couple of other dimensions of that.
So one is that you want to make sure
that you're fixing it in a way that doesn't
break important functionality--
HANNAH FRY: Everything else.
FOUR FLYNN: --that everybody else depends on.
You obviously want to make sure it fixes the security
issue fundamentally, not just some cosmetic way.
And then there's also the issue of making sure
that you maintain coding idioms that the open source
maintainers prefer.
And then have a series of things that we
use as validators to validate that the code that we're
producing on the output is good enough to submit.
And how do we do that?
Well, there's a couple of techniques.
We're using LLMs to ascertain whether or not
the output code is good.
We're using some formal methods, concepts,
and a number of other technologies
to essentially make sure that the thing that we've produced
is good enough.
And then, of course, for now at least,
we'll continue to have human review just
to make sure that we're not throwing
a bunch of crazy stuff over the fence
to the open source community until we
get to a certain comfort level with them
and with our own system.
And then we'll hopefully get to the point
where we can fully automate the whole system end to end.
HANNAH FRY: I mean, this is phenomenally complicated.
Phenomenally complicated.
Even forget about finding the vulnerabilities
in the first place, actually fixing it is not like this one
sticking plaster fits all.
Like, the intricacies and the nuances
of where that vulnerability lies,
how that vulnerability plugs into other parts
of the system, the language the vulnerability appears in.
FOUR FLYNN: I mean, this is such a great point.
I mean, the good news is we now have language models that
can produce code.
And indeed, they can produce patches to vulnerabilities.
But the problem is that we need to build a check and balance
into the system to make sure that what's being produced
is really, really high quality.
And LLMs alone, left to their own devices,
at least we haven't figured out yet
how that alone could just one shot a perfectly good patch
every time.
Maybe we'll get there and that'll be a wonderful day.
But for now, we think building a set of candidate patches
and then having a path with a number of suite of validators
is the right technology combination.
HANNAH FRY: Because then I guess,
I mean, layers and layers of this, because you would then
also have to start potentially worrying about the security
of that tool itself.
If someone gets into the tool and then all the
plasters that it's sticking all over the place.
FOUR FLYNN: No, this is actually another really good point.
So we don't think it's a complete thought.
I call it complete thoughts.
Essentially, how do you solve a problem end to end?
I think one is that we find vulnerabilities,
and then you want to be able to automatically generate
a patch for each and every one of those,
ideally in such a great way that nobody
has to worry about the quality of those patches.
And so we can just have those sail right in.
But the final point you raise is such a good one
as well, in that even if you've patched and fixed all the legacy
code, more and more people are depending on large language
models to generate projects.
Everybody's heard of vibe coding, or even not vibe coding.
A lot of engineers now are using large language models,
as they should because they're very productive tools.
But how do you know that large language model is
producing secure code, right?
It's very difficult to tell.
And so that's another project that we're working on,
is making sure that we're teaching Gemini,
not just how to create great code,
but how to do it in a secure way.
And we think those three things together can really
make a major dent in the quality of software
security for the world.
HANNAH FRY: Do you think this is a real ambition, then?
That you could theoretically find and patch
every vulnerability in code on Earth?
FOUR FLYNN: That's what I want to do.
And I think that there are issues
that we have to contend with that
go beyond the technical, that involve, how do we make
sure everybody in the world is applying these patches
to the real systems?
And so that's kind of a human issue.
And it comes into issues like risk aversion.
So while I think this is a good starting point
to handle all the technical challenges,
I still think there's other organizational and human
problems that we would have to contend with
to really make sure not only do we generate
great security in patches, but get those actually applied
to real life systems.
HANNAH FRY: So those are the questions
that I want to ask you for the second part of this two part
podcast.
But just before we wrap up on this part,
it does feel as though what Google is doing in this space,
at least on the technical side, is
fundamentally different from what we're
seeing from other companies.
Is that your view?
FOUR FLYNN: I mean, I think it is.
Well, I mean, there's definitely pieces of this
that are being done by others.
But we think because of the strengths of all the data
we have, all the code that we've got that we've built up
over the years at Google, and the fact
that we have the best engineering and security
talent in the world gives us a unique vantage point.
And it's something I'm proud to be part of.
HANNAH FRY: Absolutely amazing.
Well, I think you can see that there is so much more
to come in this conversation.
And I know that we are just getting to the good stuff,
but we have decided to split this across two episodes
because there was just too much to fit into one.
So keep an eye on your feed for part two.
And if you are worried that you might miss it,
well, this is a great opportunity for you
to subscribe to our channel so you always know
when we have a new video out.
And hey, while you're at it, you might as well
like and add a comment.
Very small thing, but very tangible way to make sure
that we can keep making these.
Until next time.
Analytical Tools
Create personalized analytical resources on-demand.