I Built a YouTube Deep Research Agent (Automates Research)
Video ID: 86Lu4EcP6zc
Transcript
This week I set myself a challenge: build a deep research agent with one goal, to beat ChatGPT and Claude when it comes to researching YouTube topics for content ideation. Instead of just building a basic agent, I'll be using a multi-agent architecture, the same system described in Anthropic's research paper on multi-agent collaboration. I'm going over the ins and outs of a multi-agent system and how I would approach this project, hopefully teaching you some new and important concepts along the way. Even if you're new to agents, my goal is to spark curiosity and show you this stuff isn't just for the big AI research labs. So, first, before we get started, I'm going to go over what a multi-agent system is.
So, this comes from Anthropic's research; they've published some very good papers about multi-agent systems. A multi-agent system is a team of AI agents working together to achieve one main goal. Instead of using one complex AI agent, the system breaks a complex task down into smaller parts. As you can see in this graph, each sub-agent is given a specific job and the tools it needs to complete that job. Some agents might research, others might write code, and some might analyze data or fact-check results. At the top of the system is the orchestrator agent. Think of it as the project manager of the task. The lead agent plans the overall approach and decides when it makes sense to bring in the sub-agents. If a sub-agent can help it move closer to the main goal, the lead agent will assign it a task.
This approach lets AI solve harder problems, reduces mistakes, and works more efficiently by dividing and conquering. It works just like a real team. So, here's the plan. You type in a topic like "AI agents" and the system goes to work. It pulls transcripts and thumbnails from about 20 of the top YouTube videos in that niche, then compiles everything into a detailed research report. From there, my agent analyzes what those videos cover and, more importantly, what they don't. It then suggests fresh content angles and script ideas, and even flags knowledge gaps where you could create some unique content. Essentially, it's like having a personal research team that studies YouTube trends, competitors, and audience gaps, all automatically. One thing to consider when building multi-agent systems is: do you really need to build a multi-agent system? It's important to keep your software as simple as possible while still achieving the final goal.
One thing you'll notice looking at mine is that this could have been a traditional automation flow. But because I plan to build on it in the future, adding LinkedIn, X, and TikTok, the scope will quickly grow. So when a user asks, "hey, I want to research this on TikTok", the agent will automatically assign the task to my TikTok agents as well. Let's go over a high-level view of the agent's design and the tech stack I used and why I used it. Before we get into the agent design itself, I want to go over the framework I used. I ended up going with LangGraph, mainly because I can keep a persistent state across the agents: once one agent updates the state, every other agent sees the same state. It acts like a shared memory across all the agents.
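The shared-state idea can be sketched in plain TypeScript. This is a hypothetical illustration, not the real LangGraph API: each agent node returns a partial update, and the graph merges it into one state object that every later agent sees. All names here (`ResearchState`, `runPipeline`, the state fields) are made up for the sketch.

```typescript
// Hypothetical sketch of LangGraph-style shared state (not the real API).

interface ResearchState {
  topic: string;
  todoList: string[];          // tasks created by the orchestrator
  fileRefs: string[];          // lightweight references instead of raw content
  youtubeSearchCount: number;  // extra state to guard against runaway searches
}

// Each agent node reads the shared state and returns a partial update.
type AgentNode = (state: ResearchState) => Partial<ResearchState>;

// The "graph" merges each node's update into the shared state, so every
// agent sees what the agents before it produced.
function runPipeline(initial: ResearchState, nodes: AgentNode[]): ResearchState {
  return nodes.reduce(
    (state, node) => ({ ...state, ...node(state) }),
    initial,
  );
}
```

In the real framework the state schema and merge behaviour are declared up front; the point of the sketch is just that agents communicate through one shared, persistent object rather than passing raw text to each other.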
One reason I chose to do this is that I hit a problem in testing. When the YouTube extraction bot ran, it started running multiple times. So instead of doing three videos, it did six, nine, twelve, and it wouldn't stop. I had to add some extra state that tracked the YouTube search count, and then hard-code a limit to stop the agent from hallucinating and ending up in a continuous cycle of just calling YouTube over and over. Think of LangGraph as a tool that lets you design how different AI agents talk to each other, almost like drawing a flowchart for the AI's brain. Each node in the graph represents a different agent or step, like researcher, summarizer, or planner, and LangGraph handles the information that flows between them. It's fast to set up, super modular, and perfect for experimenting with complex agent workflows. So, in my system, I've gone for an orchestrator agent that controls the whole system. The orchestrator agent has access to the create-to-do tool. This creates a list of to-dos from the incoming task and then passes it to the delegation agent. The delegation agent takes the whole to-do list and is responsible for assigning each task to the sub-agent that specializes in that area.
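The orchestrator-then-delegation split can be sketched like this. Everything here is hypothetical (the `Todo` shape, the hard-coded to-dos, the specialty keys); in the real system the orchestrator is an LLM producing the to-do list, not a fixed function.

```typescript
// Hypothetical sketch of the orchestrator/delegation split.

interface Todo {
  task: string;
  specialty: string; // which kind of sub-agent should handle it
}

// Orchestrator: turn the incoming request into a to-do list.
// (In the real system an LLM generates this; here it is hard-coded.)
function createTodos(request: string): Todo[] {
  return [
    { task: `find top videos about "${request}"`, specialty: "youtube" },
    { task: "compile final research report", specialty: "report" },
  ];
}

// Delegation agent: route each to-do to the sub-agent owning that specialty.
function delegate(
  todos: Todo[],
  subAgents: Record<string, (task: string) => string>,
): string[] {
  return todos.map((todo) => subAgents[todo.specialty](todo.task));
}
```

The key design point survives the simplification: the orchestrator never calls the worker tools itself, it only produces tasks, and routing happens in one dedicated place.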
As you can see, this keeps it simple, and this is where the power of a multi-agent system lies. There isn't one agent with access to all of these tools, which would cause hallucinations and context overload. Just remember, every tool added to a sub-agent or a main agent eats into the context window. That's why these multi-agent systems are so powerful: you can offload context by passing tools down to the sub-agents. In my system, I currently have five sub-agents, each responsible for a different task. First, there's the research report agent. This is one of the final agents to get called, as it's responsible for creating the structure of the final report.
Next is my YouTube research agent, which is responsible for discovering videos and getting all the metadata: transcriptions, views, who made it. The embedding one I'll talk about in a minute. Then there's the missing topic research agent. Its main role is that if the final report starts surfacing topics that haven't been discussed by YouTubers yet, it'll do some web searches to try and find information and give you some talking points. It's an extra angle that people haven't talked about yet. You might have noticed I've not talked about a few tools in this list yet. That's because I want to cover an important topic in multi-agent systems: context engineering. If you just pass huge
blocks of context between agents, things get messy fast. Agents start wasting tokens, losing accuracy, and even hallucinating because they're drowning in information overload. It also adds up on your bill. That's why I use something called context engineering: making sure each agent sees only what it needs to see, nothing more. So I needed a way to control context without overloading prompts. My solution was to store context externally using Cloudflare. Instead of passing raw content around, each agent stores the files as markdown in Cloudflare, and only the file URLs and metadata are stored in my LangGraph state. That means agents pass around lightweight references instead of the massive payloads I was handling. Bear in mind, some of the YouTube videos I was pulling were two hours long; that's thousands and thousands of tokens. Sub-agents now fetch whatever context they need on demand, which keeps token usage low and accuracy high. Each sub-agent has the tool and the ability to fetch the files from Cloudflare whenever it needs them. But I came across an issue when testing the agent that was offloading summaries to Cloudflare in markdown format: I was still hitting context windows fast, as some of the files contained transcriptions of 200,000 characters, which was way too much to pull back down. So for the transcriptions I opted for a different method. I split the transcriptions into chunks and store them as embeddings in a Postgres database table. This way, when a sub-agent asks a retrieval question, it pulls only the most relevant pieces by meaning, not just keywords, keeping the context small and the answers accurate.
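The chunk-and-retrieve approach can be sketched as follows. This is a toy version under stated assumptions: an in-memory array stands in for the Postgres embeddings table, and `embed` is a trivial character-frequency vector standing in for a real embedding model, so only the mechanics (chunk, embed, rank by cosine similarity, return top-k) match the description.

```typescript
// Toy sketch of chunking transcripts and retrieving by similarity.
// embed() is a stand-in for a real embedding model; the array of chunks
// stands in for the Postgres table.

function chunkText(text: string, size: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// Toy embedding: 26-dim letter-frequency vector.
function embed(text: string): number[] {
  const vec = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const idx = ch.charCodeAt(0) - 97;
    if (idx >= 0 && idx < 26) vec[idx] += 1;
  }
  return vec;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Retrieval: score every chunk against the query, return the top k.
function retrieve(chunks: string[], query: string, k: number): string[] {
  const q = embed(query);
  return chunks
    .map((c) => ({ c, score: cosine(embed(c), q) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((x) => x.c);
}
```

With a real model and pgvector, the ranking step becomes a single SQL query over the embeddings column, but the flow is the same: the sub-agent only ever sees the top-k relevant chunks, never the whole 200,000-character transcript.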
A few really important things to know when building multi-agent systems: each agent will have a name, a prompt, and a description.
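The three pieces can be captured in a small shape like this. The interface and the two example agents are hypothetical illustrations; the point is that the description is what the lead agent reads when routing, so no two descriptions should cover the same ground.

```typescript
// Hypothetical shape for a sub-agent definition.

interface SubAgentDef {
  name: string;
  prompt: string;       // the sub-agent's own system prompt
  description: string;  // what the LEAD agent reads when deciding whom to call
}

// Example definitions with deliberately non-overlapping descriptions.
const youtubeResearcher: SubAgentDef = {
  name: "youtube_researcher",
  prompt: "You discover YouTube videos and collect their metadata and transcripts.",
  description:
    "Use ONLY for discovering YouTube videos and fetching their metadata, " +
    "transcripts, views, and channel info.",
};

const missingTopicResearcher: SubAgentDef = {
  name: "missing_topic_researcher",
  prompt: "You run web searches to find talking points no video has covered.",
  description:
    "Use ONLY for web-searching topics that the analysed videos have NOT covered.",
};
```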
What I found is that the description is probably one of the most important parts of the whole system. You can't have overlapping descriptions; otherwise, your lead and task agents will get extremely confused and start assigning tasks to the wrong agents, calling the wrong ones, or calling both at the same time. Another thing to consider is adding recursion limits. This will prevent a bunch of LLM nodes spiraling out of control, emptying your bank account and causing endless loops.
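A recursion limit is conceptually just a hard cap on node invocations, something like this sketch (the function and its names are hypothetical; frameworks like LangGraph expose this as a configuration option rather than a hand-rolled loop):

```typescript
// Sketch of a hard recursion limit: stop after a fixed number of node
// invocations no matter what the model wants to do next.

function runWithLimit(
  step: (iteration: number) => "continue" | "done",
  maxIterations: number,
): number {
  for (let i = 0; i < maxIterations; i++) {
    if (step(i) === "done") return i + 1; // finished normally
  }
  return maxIterations; // hard stop: prevents endless, costly loops
}
```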
When creating an agent like this, it helps to create a solid boilerplate base, make it dynamic, and have only a few places where you add sub-agents and tools. This way, you can expand on it and scale it really easily. Giving agents a clean way to access tools, and developers a clean way to add new ones, will help you build multi-agent systems bigger and faster. So, to make all this work, I built a simple tool system in my project inside a tools module.ts. There I registered all the tools my agents can use, like the file manager, the YouTube tool, the reporting tool, and more. There's also a main controller called the tool registry service, which acts like a tool dictionary. It tells the agent which tools are available and connects it to the right one when needed. This gave me a reusable way to create tools and sub-agents dynamically on the fly, giving me massive room for expansion. So in the future, all I need to do is add a new LinkedIn tool or a new TikTok tool, and then I can create a sub-agent instantly.
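A registry like the one described can be sketched as below. This is my own minimal version, not the actual tool registry service from the project: a dictionary keyed by tool name, with duplicate names rejected (since overlapping names confuse the routing) and a listing method that exposes only names and descriptions to the agent's context.

```typescript
// Hypothetical sketch of a tool-registry pattern.

interface Tool {
  name: string;
  description: string;
  run: (input: string) => string;
}

class ToolRegistry {
  private tools = new Map<string, Tool>();

  register(tool: Tool): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`duplicate tool name: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  get(name: string): Tool {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`unknown tool: ${name}`);
    return tool;
  }

  // Only names and descriptions go into the agent's context window,
  // never the implementations themselves.
  listForAgent(): { name: string; description: string }[] {
    return [...this.tools.values()].map(({ name, description }) => ({
      name,
      description,
    }));
  }
}
```

Adding a LinkedIn or TikTok tool then becomes a single `register` call, which is exactly the expansion property the boilerplate is meant to buy.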
That lets me expand my system easily. And because tools pass around file URLs instead of full context, agents can stay lightweight and only load extra information when they actually need it.
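The reference-passing pattern can be sketched as below. An in-memory map stands in for the real Cloudflare storage, and the function names and URL format are made up for the sketch; the point is that only a small `FileRef` travels through agent state, while the content is fetched lazily.

```typescript
// Sketch of passing lightweight file references instead of raw content.
// An in-memory map stands in for Cloudflare storage.

const fileStore = new Map<string, string>();

interface FileRef {
  url: string;
  title: string;
  sizeChars: number; // metadata travels in the agent state; the content does not
}

// Stand-in for uploading a markdown file to external storage.
function uploadMarkdown(title: string, content: string): FileRef {
  const url = `https://files.example.com/${encodeURIComponent(title)}.md`;
  fileStore.set(url, content);
  return { url, title, sizeChars: content.length };
}

// A sub-agent fetches the full content only when it actually needs it.
function fetchMarkdown(ref: FileRef): string {
  const content = fileStore.get(ref.url);
  if (content === undefined) throw new Error(`missing file: ${ref.url}`);
  return content;
}
```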
That keeps the whole system fast,
efficient, and easy to scale. So, let's do a side-by-side test. First, we'll test Claude. I've made a little bit of a prompt. It's not a massive one, but because it's the research agent, it's got some in-built prompting enhancements. So, I've basically asked it to create topics around AI agents. It should create a full research report with script angles, quotes, common talking points, missing talking points, citations, thumbnail analysis, common styles, recommendations, and examples. So, let's see how it does. I reckon it will take about 11 minutes, so we'll probably jump back when it's done. So, it's just finished now. They seem to have sped it up since they started using 4.5; it must have taken only about 3 or 4 minutes. So, we've got the final report here. Looking through, everything looks a little bit generic, because what I think they've done is basically just a web scrape of blogs. That's okay, but there are quite a lot of problems with blogs these days: they're all AI generated. So, it's kind of like some AI-generation inception going on, where we're feeding it more of itself. This top section is just generic AI slop that's come from those articles. I think this section is quite good, though: the notable quotes. It's pulled out some good quotes from some actual real research; I did double check they exist. The common talking points: LangChain, which we've actually been talking about, CrewAI, LangGraph, which we've been talking about, but everything's generic. It is structured in the way I asked, but it's just a lot. It's not that useful; it's just a load of bullet points. You're still going to have to go on and do quite a lot of research, and it's absolutely massive.
So hopefully I can beat this. I hope so.
Let's give it a try. I went a bit deep with mine and got a bit carried away: I made a landing page and an accounts page, just because I thought if people watch this video they might want to try it, and maybe it will send people to my actual website for the agency. So let's now compare mine versus Claude and run the same research. We'll do the exact same search term. It will bring us to this page, and we'll then press research. So, as you can see, the research has been initiated. As I mentioned, in my system this is the initial lead agent booting up and creating the to-do list tasks.
So now an agent's been spawned up called the YouTube video researcher. It's going to fetch 20 to 30 videos (depending on how I've set it up that day) for that search term. For each one of those videos, it will then fetch the transcripts and slowly bring them back. As you can see, it's got the thumbnail already.
It's loading the video, and the agent's about to fetch the transcripts, which should show shortly. So, the first video's transcripts have been dropped in. You can now copy them, and the content will have been offloaded as context to the embedding table, which I'll put up a shot of now being added. The next one's already being worked on: it's fetching the transcript, it will show up here, and it will then create a Supabase embedding. Then it will get the third video, because I've only done three for the demo, for time's sake. And then it will create the final research report, which I'll show you. So now it's at the step where one of the research report sub-agents has been called, and it's analyzing the research data from all of the thumbnails and the transcriptions. Then it will start generating the structure that we asked for, the same sort of research we prompted Claude for. While it's generating the report, I just want to show you how the embeddings are working. As you can see, one of the agents here is pulling out relevant content for the report it's about to generate. All of these are the previous embeddings from the transcriptions we fetched from the videos above. So, now the research agent's done. It's fetched three transcripts from that topic, AI agents. It's also got the three thumbnails, but just to make it clear, it can fetch way more: it could fetch 100 if I wanted to. For demonstration purposes I kept it to three for this demo. I still think I can beat Claude with three.
So, as we remember, Claude's was quite generic. There were loads of bullet points, it had random information, it was massive, and it wasn't that much use. So, I decided to condense mine down into a report that's actually useful, and it creates a script from those three videos. As you can see, we've got the AI agent research report. It's given us six different script angles we can talk about, drawn from extracts it's found in those videos. Some of the ideas it's come up with: build a simple AI assistant in three steps, and a dive into the DNA of AI agents: reasoning, action, and memory. So, it's come up with a little overview of that video, a hook you could make, and some key points you'd mention, and it even references some of the videos it's seen. So, it's given you quite a lot of videos. I asked it to generate me a full script. It's gone through the extracts of all the transcripts and pulled out all these different sections of the script. It even gives a conclusion and a call to action that you might need. It extracts all the key quotes from the different videos and even cites which timestamps they came at and which video they came from. And then, finally, at the bottom, it's got the three videos that it found and took reference from. As you can see, it's got Base44 in this video, which was mentioned quite a bit in here.
So there you have it. That was my attempt to beat Claude at researching a specific topic. I think I did pretty well. I've gone a little bit of a different angle to them, extracting YouTube data and trying to really craft a specific research agent. Although I've created multiple AI agents before, I did learn some things from this project. I need to practice what I preach: what I first set out for, and the system I created, was way over-complicated. I created a multi-agent system that was spawning different agents and creating them dynamically itself; it wasn't actually solving the issue I needed to solve in its simplest form. I also started off with a stock agent, doing deep stock-agent research, without a clearly defined problem. This is why 95% of AI agent projects fail: people don't set out with a clear problem. So, to make AI agents succeed in a business, you really need to define that problem, have it well defined across everyone involved, and know exactly what the outcome of the agent needs to be. You don't want to fall into that 95% failure bracket, because it will cost you money; running these agents and experimenting with them is not cheap sometimes. Prompting is so important. People talk about prompt engineering, and people just ignore it because they use ChatGPT and Claude, which handle it for them. But when designing agents and creating the descriptions and the tool names, it is so important that they don't overlap and that they're really descriptive. The tool descriptions have to be really descriptive; otherwise, the agent will not call the tool or will not know exactly what it does. And you have to think of it from the agent's point of view, not your own. It shouldn't be "fetch this website"; it needs to be "this tool is used for fetching websites to get X data from Y". Finally, businesses can really harness the power of deep agents. Imagine having one of these in your business, where with one agent you can query your whole business and generate reports from its different sections. Basically, it will know your full business context and you'll be able to put any question to it. Because it took a while to build, I thought, let's let people try it. I'll put the link in the description below. It's one free generation, and it will grab you three videos max, unless you want to pay to try it out more. I will think of a better pricing plan, but the reason it's not free is because these are expensive to run; it goes through multiple OpenAI calls, and I don't want to go bankrupt from running this agent. If people like it, I will do a follow-up video where I'll add LinkedIn, TikTok, or other social media platforms and expand it. But thanks for watching. I hope you enjoyed.