I Built a YouTube Deep Research Agent (Automates Research)
Video ID: 86Lu4EcP6zc
Transcript
This week I set myself a challenge: build a deep research agent with one goal, to beat ChatGPT and Claude when it comes to researching YouTube topics for content ideation. Instead of just building a basic agent, I'll be using a multi-agent architecture, the same system described in Anthropic's research paper on multi-agent collaboration. I'm going over the ins and outs of a multi-agent system and how I would approach this project, hopefully teaching you some new and important concepts along the way. Even if you're new to agents, my goal is to spark curiosity and show you this stuff isn't just for the big AI research labs. So, first, before we get started, I'm going to go over what a multi-agent system is.
So, this comes from Anthropic's research; they've published some very good papers about multi-agent systems. A multi-agent system is a team of AI agents working together to achieve one main goal. Instead of using one complex AI agent, the system breaks a complex task down into smaller parts. As you can see in this graph, each sub-agent is given a specific job and the tools it needs to complete that job. Some agents might research, others might write code, and some might analyze data or fact-check results. At the top of the system is the orchestrator agent. Think of it as the project manager of the task. The lead agent plans the overall approach and decides when it makes sense to bring in the sub-agents. If a sub-agent can help it move closer to the main goal, the lead agent will assign it a task.
This approach lets AI solve harder problems, reduces mistakes, and works more efficiently by dividing and conquering. It works just like a real team. So, here's the plan. You type in a topic like "AI agents" and the system goes to work. It pulls transcripts and thumbnails from about 20 of the top YouTube videos in that niche, then compiles everything into a detailed research report. From there, my agent analyzes what those videos cover and, more importantly, what they don't. It then suggests fresh content angles and script ideas, and even flags knowledge gaps where you could create some unique content. Essentially, it's like having a personal research team that studies YouTube trends, competitors, and audience gaps, all automatically. One thing to consider when building multi-agent systems is: do you really need to build a multi-agent system? It's important to keep your software as simple as possible while still achieving the final goal.
One thing you'll notice looking at mine is that this could have been a traditional automation flow. But because I plan to build on it in the future, adding LinkedIn, X, and TikTok, the scope will quickly grow. So when a user asks, "hey, I want to research this on TikTok", the agent will automatically assign the task to my TikTok agents as well. Let's go over a high-level view of the agent's design and the tech stack I used and why I used it. Before we get into the agent design itself, I want to go over the framework I used. I ended up going with LangGraph, mainly because I can keep a persistent state across the agents: once one agent updates the state, every other agent sees the same state. It acts like a shared memory across all the agents.
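The shared-state idea can be sketched in plain TypeScript. This is a hypothetical illustration, not the real LangGraph API: each agent node returns a partial update, and the graph merges it into one state object that every later agent sees. All names here (`ResearchState`, `runPipeline`, the state fields) are made up for the sketch.

```typescript
// Hypothetical sketch of LangGraph-style shared state (not the real API).

interface ResearchState {
  topic: string;
  todoList: string[];          // tasks created by the orchestrator
  fileRefs: string[];          // lightweight references instead of raw content
  youtubeSearchCount: number;  // extra state to guard against runaway searches
}

// Each agent node reads the shared state and returns a partial update.
type AgentNode = (state: ResearchState) => Partial<ResearchState>;

// The "graph" merges each node's update into the shared state, so every
// agent sees what the agents before it produced.
function runPipeline(initial: ResearchState, nodes: AgentNode[]): ResearchState {
  return nodes.reduce(
    (state, node) => ({ ...state, ...node(state) }),
    initial,
  );
}
```

In the real framework the state schema and merge behaviour are declared up front; the point of the sketch is just that agents communicate through one shared, persistent object rather than passing raw text to each other.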
One reason I chose to do this is that I hit a problem in testing. When the YouTube extraction bot ran, it started running multiple times. So instead of doing three videos, it did six, nine, twelve, and it wouldn't stop. I had to add some extra state that tracked the YouTube search count, and then hard-code a limit to stop the agent from hallucinating and ending up in a continuous cycle of just calling YouTube over and over. Think of LangGraph as a tool that lets you design how different AI agents talk to each other, almost like drawing a flowchart for the AI's brain. Each node in the graph represents a different agent or step, like researcher, summarizer, or planner, and LangGraph handles the information that flows between them. It's fast to set up, super modular, and perfect for experimenting with complex agent workflows. So, in my system, I've gone for an orchestrator agent that controls the whole system. The orchestrator agent has access to the create-to-do tool. This creates a list of to-dos from the incoming task and then passes it to the delegation agent. The delegation agent takes the whole to-do list and is responsible for assigning each task to the sub-agent that specializes in that area.
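The orchestrator-then-delegation split can be sketched like this. Everything here is hypothetical (the `Todo` shape, the hard-coded to-dos, the specialty keys); in the real system the orchestrator is an LLM producing the to-do list, not a fixed function.

```typescript
// Hypothetical sketch of the orchestrator/delegation split.

interface Todo {
  task: string;
  specialty: string; // which kind of sub-agent should handle it
}

// Orchestrator: turn the incoming request into a to-do list.
// (In the real system an LLM generates this; here it is hard-coded.)
function createTodos(request: string): Todo[] {
  return [
    { task: `find top videos about "${request}"`, specialty: "youtube" },
    { task: "compile final research report", specialty: "report" },
  ];
}

// Delegation agent: route each to-do to the sub-agent owning that specialty.
function delegate(
  todos: Todo[],
  subAgents: Record<string, (task: string) => string>,
): string[] {
  return todos.map((todo) => subAgents[todo.specialty](todo.task));
}
```

The key design point survives the simplification: the orchestrator never calls the worker tools itself, it only produces tasks, and routing happens in one dedicated place.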
As you can see, this keeps it simple, and this is where the power of a multi-agent system lies. There isn't one agent with access to all of these tools, which would cause hallucinations and context overload. Just remember, every tool added to a sub-agent or a main agent eats into the context window. That's why these multi-agent systems are so powerful: you can offload context by passing tools down to the sub-agents. In my system, I currently have five sub-agents, each responsible for a different task. First, there's the research report agent. This is one of the final agents to get called, as it's responsible for creating the structure of the final report.
Next is my YouTube research agent, which is responsible for discovering videos and getting all the metadata: transcriptions, views, who made it. The embedding one I'll talk about in a minute. Then there's the missing topic research agent. Its main role is that if the final report starts surfacing topics that haven't been discussed by YouTubers yet, it'll do some web searches to try and find information and give you some talking points. It's an extra angle that people haven't talked about yet. You might have noticed I've not talked about a few tools in this list yet. That's because I want to cover an important topic in multi-agent systems: context engineering. If you just pass huge
blocks of context between agents, things get messy fast. Agents start wasting tokens, losing accuracy, and even hallucinating because they're drowning in information overload. It also adds up on your bill. That's why I use something called context engineering: making sure each agent sees only what it needs to see, nothing more. So I needed a way to control context without overloading prompts. My solution was to store context externally using Cloudflare. Instead of passing raw content around, each agent stores the files as markdown in Cloudflare, and only the file URLs and metadata are stored in my LangGraph state. That means agents pass around lightweight references instead of the massive payloads I was handling. Bear in mind, some of the YouTube videos I was pulling were two hours long; that's thousands and thousands of tokens. Sub-agents now fetch whatever context they need on demand, which keeps token usage low and accuracy high. Each sub-agent has the tool and the ability to fetch the files from Cloudflare whenever it needs them. But I came across an issue when testing the agent that was offloading summaries to Cloudflare in markdown format: I was still hitting context windows fast, as some of the files contained transcriptions of 200,000 characters, which was way too much to pull back down. So for the transcriptions I opted for a different method. I split the transcriptions into chunks and store them as embeddings in a Postgres database table. This way, when a sub-agent asks a retrieval question, it pulls only the most relevant pieces by meaning, not just keywords, keeping the context small and the answers accurate.
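The chunk-and-retrieve approach can be sketched as follows. This is a toy version under stated assumptions: an in-memory array stands in for the Postgres embeddings table, and `embed` is a trivial character-frequency vector standing in for a real embedding model, so only the mechanics (chunk, embed, rank by cosine similarity, return top-k) match the description.

```typescript
// Toy sketch of chunking transcripts and retrieving by similarity.
// embed() is a stand-in for a real embedding model; the array of chunks
// stands in for the Postgres table.

function chunkText(text: string, size: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// Toy embedding: 26-dim letter-frequency vector.
function embed(text: string): number[] {
  const vec = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const idx = ch.charCodeAt(0) - 97;
    if (idx >= 0 && idx < 26) vec[idx] += 1;
  }
  return vec;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Retrieval: score every chunk against the query, return the top k.
function retrieve(chunks: string[], query: string, k: number): string[] {
  const q = embed(query);
  return chunks
    .map((c) => ({ c, score: cosine(embed(c), q) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((x) => x.c);
}
```

With a real model and pgvector, the ranking step becomes a single SQL query over the embeddings column, but the flow is the same: the sub-agent only ever sees the top-k relevant chunks, never the whole 200,000-character transcript.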
A few really important things to know when building multi-agent systems: each agent will have a name, a prompt, and a description.
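The three pieces can be captured in a small shape like this. The interface and the two example agents are hypothetical illustrations; the point is that the description is what the lead agent reads when routing, so no two descriptions should cover the same ground.

```typescript
// Hypothetical shape for a sub-agent definition.

interface SubAgentDef {
  name: string;
  prompt: string;       // the sub-agent's own system prompt
  description: string;  // what the LEAD agent reads when deciding whom to call
}

// Example definitions with deliberately non-overlapping descriptions.
const youtubeResearcher: SubAgentDef = {
  name: "youtube_researcher",
  prompt: "You discover YouTube videos and collect their metadata and transcripts.",
  description:
    "Use ONLY for discovering YouTube videos and fetching their metadata, " +
    "transcripts, views, and channel info.",
};

const missingTopicResearcher: SubAgentDef = {
  name: "missing_topic_researcher",
  prompt: "You run web searches to find talking points no video has covered.",
  description:
    "Use ONLY for web-searching topics that the analysed videos have NOT covered.",
};
```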
What I found is that the description is probably one of the most important parts of the whole system. You can't have overlapping descriptions; otherwise, your lead and task agents will get extremely confused and start assigning tasks to the wrong agents, calling the wrong ones, or calling both at the same time. Another thing to consider is adding recursion limits. This will prevent a bunch of LLM nodes spiraling out of control, emptying your bank account and causing endless loops.
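A recursion limit is conceptually just a hard cap on node invocations, something like this sketch (the function and its names are hypothetical; frameworks like LangGraph expose this as a configuration option rather than a hand-rolled loop):

```typescript
// Sketch of a hard recursion limit: stop after a fixed number of node
// invocations no matter what the model wants to do next.

function runWithLimit(
  step: (iteration: number) => "continue" | "done",
  maxIterations: number,
): number {
  for (let i = 0; i < maxIterations; i++) {
    if (step(i) === "done") return i + 1; // finished normally
  }
  return maxIterations; // hard stop: prevents endless, costly loops
}
```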
When creating an agent like this, it helps to create a solid boilerplate base, make it dynamic, and have only a few places where you add sub-agents and tools. This way, you can expand on it and scale it really easily. Giving agents a clean way to access tools, and developers a clean way to add new ones, will help you build multi-agent systems bigger and faster. So, to make all this work, I built a simple tool system in my project inside a tools module.ts. There I registered all the tools my agents can use, like the file manager, the YouTube tool, the reporting tool, and more. There's also a main controller called the tool registry service, which acts like a tool dictionary. It tells the agent which tools are available and connects it to the right one when needed. This gave me a reusable way to create tools and sub-agents dynamically on the fly, giving me massive room for expansion. So in the future, all I need to do is add a new LinkedIn tool or a new TikTok tool, and then I can create a sub-agent instantly.
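A registry like the one described can be sketched as below. This is my own minimal version, not the actual tool registry service from the project: a dictionary keyed by tool name, with duplicate names rejected (since overlapping names confuse the routing) and a listing method that exposes only names and descriptions to the agent's context.

```typescript
// Hypothetical sketch of a tool-registry pattern.

interface Tool {
  name: string;
  description: string;
  run: (input: string) => string;
}

class ToolRegistry {
  private tools = new Map<string, Tool>();

  register(tool: Tool): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`duplicate tool name: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  get(name: string): Tool {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`unknown tool: ${name}`);
    return tool;
  }

  // Only names and descriptions go into the agent's context window,
  // never the implementations themselves.
  listForAgent(): { name: string; description: string }[] {
    return [...this.tools.values()].map(({ name, description }) => ({
      name,
      description,
    }));
  }
}
```

Adding a LinkedIn or TikTok tool then becomes a single `register` call, which is exactly the expansion property the boilerplate is meant to buy.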
That lets me expand my system easily. And because tools pass around file URLs instead of full context, agents can stay lightweight and only load extra information when they actually need it.
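The reference-passing pattern can be sketched as below. An in-memory map stands in for the real Cloudflare storage, and the function names and URL format are made up for the sketch; the point is that only a small `FileRef` travels through agent state, while the content is fetched lazily.

```typescript
// Sketch of passing lightweight file references instead of raw content.
// An in-memory map stands in for Cloudflare storage.

const fileStore = new Map<string, string>();

interface FileRef {
  url: string;
  title: string;
  sizeChars: number; // metadata travels in the agent state; the content does not
}

// Stand-in for uploading a markdown file to external storage.
function uploadMarkdown(title: string, content: string): FileRef {
  const url = `https://files.example.com/${encodeURIComponent(title)}.md`;
  fileStore.set(url, content);
  return { url, title, sizeChars: content.length };
}

// A sub-agent fetches the full content only when it actually needs it.
function fetchMarkdown(ref: FileRef): string {
  const content = fileStore.get(ref.url);
  if (content === undefined) throw new Error(`missing file: ${ref.url}`);
  return content;
}
```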
That keeps the whole system fast,
efficient, and easy to scale. So, let's do a side-by-side test. First, we'll test Claude. I've made a little bit of a prompt. It's not a massive one, but because it's the research agent, it's got some in-built prompting enhancements. So, I've basically asked it to create topics around AI agents. It should create a full research report with script angles, quotes, common talking points, missing talking points, citations, thumbnail analysis, common styles, recommendations, and examples. So, let's see how it does. I reckon it will take about 11 minutes, so we'll probably jump back when it's done. So, it's just finished now. They seem to have sped it up since they started using 4.5; it must have taken only about 3 or 4 minutes. So, we've got the final report here. Looking through, everything looks a little bit generic, because what I think they've done is basically just a web scrape of blogs. That's okay, but there are quite a lot of problems with blogs these days: they're all AI generated. So, it's kind of like some AI-generation inception going on, where we're feeding it more of itself. This top section is just generic AI slop that's come from those articles. I think this section is quite good, though: the notable quotes. It's pulled out some good quotes from some actual real research; I did double check they exist. The common talking points: LangChain, which we've actually been talking about, CrewAI, LangGraph, which we've been talking about, but everything's generic. It is structured in the way I asked, but it's just a lot. It's not that useful; it's just a load of bullet points. You're still going to have to go on and do quite a lot of research, and it's absolutely massive.
So hopefully I can beat this. I hope so.
Let's give it a try. I went a bit deep with mine and got a bit carried away: I made a landing page and an accounts page, just because I thought if people watch this video they might want to try it, and maybe it will send people to my actual website for the agency. So let's now compare mine versus Claude and run the same research. We'll do the exact same search term. It will bring us to this page, and we'll then press research. So, as you can see, the research has been initiated. As I mentioned, in my system this is the initial lead agent booting up and creating the to-do list tasks.
So now an agent's been spawned up called the YouTube video researcher. It's going to fetch 20 to 30 videos (depending on how I've set it up that day) for that search term. For each one of those videos, it will then fetch the transcripts and slowly bring them back. As you can see, it's got the thumbnail already.
It's loading the video, and the agent's about to fetch the transcripts, which should show shortly. So, the first video's transcripts have been dropped in. You can now copy them, and the content will have been offloaded as context to the embedding table, which I'll put up a shot of now being added. The next one's already being worked on: it's fetching the transcript, it will show up here, and it will then create a Supabase embedding. Then it will get the third video, because I've only done three for the demo, for time's sake. And then it will create the final research report, which I'll show you. So now it's at the step where one of the research report sub-agents has been called, and it's analyzing the research data from all of the thumbnails and the transcriptions. Then it will start generating the structure that we asked for, the same sort of research we prompted Claude for. While it's generating the report, I just want to show you how the embeddings are working. As you can see, one of the agents here is pulling out relevant content for the report it's about to generate. All of these are the previous embeddings from the transcriptions we fetched from the videos above. So, now the research agent's done. It's fetched three transcripts from that topic, AI agents. It's also got the three thumbnails, but just to make it clear, it can fetch way more: it could fetch 100 if I wanted to. For demonstration purposes I kept it to three for this demo. I still think I can beat Claude with three.
So, as we remember, Claude's was quite generic. There were loads of bullet points, it had random information, it was massive, and it wasn't that much use. So, I decided to condense mine down into a report that's actually useful, and it creates a script from those three videos. As you can see, we've got the AI agent research report. It's given us six different script angles we can talk about, drawn from extracts it's found in those videos. Some of the ideas it's come up with: build a simple AI assistant in three steps, and a dive into the DNA of AI agents: reasoning, action, and memory. So, it's come up with a little overview of that video, a hook you could make, and some key points you'd mention, and it even references some of the videos it's seen. So, it's given you quite a lot of videos. I asked it to generate me a full script. It's gone through the extracts of all the transcripts and pulled out all these different sections of the script. It even gives a conclusion and a call to action that you might need. It extracts all the key quotes from the different videos and even cites which timestamps they came at and which video they came from. And then, finally, at the bottom, it's got the three videos that it found and took reference from. As you can see, it's got Base44 in this video, which was mentioned quite a bit in here.
So there you have it. That was my attempt to beat Claude at researching a specific topic. I think I did pretty well. I've gone a little bit of a different angle to them, extracting YouTube data and trying to really craft a specific research agent. Although I've created multiple AI agents before, I did learn some things from this project. I need to practice what I preach: what I first set out for, and the system I created, was way over-complicated. I created a multi-agent system that was spawning different agents and creating them dynamically itself; it wasn't actually solving the issue I needed to solve in its simplest form. I also started off with a stock agent, doing deep stock-agent research, without a clearly defined problem. This is why 95% of AI agent projects fail: people don't set out with a clear problem. So, to make AI agents succeed in a business, you really need to define that problem, have it well defined across everyone involved, and know exactly what the outcome of the agent needs to be. You don't want to fall into that 95% failure bracket, because it will cost you money; running these agents and experimenting with them is not cheap sometimes. Prompting is so important. People talk about prompt engineering, and people just ignore it because they use ChatGPT and Claude, which handle it for them. But when designing agents and creating the descriptions and the tool names, it is so important that they don't overlap and that they're really descriptive. The tool descriptions have to be really descriptive; otherwise, the agent will not call the tool or will not know exactly what it does. And you have to think of it from the agent's point of view, not your own. It shouldn't be "fetch this website"; it needs to be "this tool is used for fetching websites to get X data from Y". Finally, businesses can really harness the power of deep agents. Imagine having one of these in your business, where with one agent you can query your whole business and generate reports from its different sections. Basically, it will know your full business context and you'll be able to put any question to it. Because it took a while to build, I thought, let's let people try it. I'll put the link in the description below. It's one free generation, and it will grab you three videos max, unless you want to pay to try it out more. I will think of a better pricing plan, but the reason it's not free is because these are expensive to run; it goes through multiple OpenAI calls, and I don't want to go bankrupt from running this agent. If people like it, I will do a follow-up video where I'll add LinkedIn, TikTok, or other social media platforms and expand it. But thanks for watching. I hope you enjoyed.