One Agent
can't finish anything worth shipping.
It's not that the Agent isn't strong enough — single-agent as a structure has a ceiling.
Skills and Multi-agent aren't two new technologies. They're two solutions to the old problem of division of labor.
- The 4 ceilings of a single Agent — when you must divide work, or things break
- Skills is the "lightweight solution" — one Agent can act like a team
- The 4 Multi-agent collaboration patterns — which one fits which task
- 6 real workflows + a selection decision tree — usable today
Agent / Skill / Multi-agent — draw the boundaries first
Everything in this episode rides on these three words. The market mixes them, swaps them, packages them as the same thing — but engineering gives them clear boundaries. Get the definitions straight first; only then does the rest make sense.
Agent = LLM + Tools + Memory + Loop:
- LLM: the brain, doing reasoning and decisions
- Tools: interfaces to external capabilities (API / database / browser / filesystem…)
- Memory: short-term context + long-term memory (vector store / files)
- Loop: iterate until the task succeeds / fails / hits a stop condition
Can autonomously decide the next step = agent
Answers one question, no next action = chatbot (not an agent, out of scope here)
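A minimal sketch of that four-part definition, assuming a stubbed `fake_llm` in place of a real model call. Every name here (`fake_llm`, `lookup_order`, `run_agent`) is hypothetical and exists only to show where LLM, Tools, Memory, and Loop sit.

```python
from typing import Callable

# Hypothetical stand-in for a model call: it reads the memory so far and
# returns ("call", tool_name, argument) or ("stop", answer, "").
def fake_llm(memory: list[str]) -> tuple[str, str, str]:
    if not any(line.startswith("tool:") for line in memory):
        return ("call", "lookup_order", "A-1001")
    return ("stop", "Order A-1001 has shipped.", "")

# Tools: interfaces to external capabilities (stubbed here).
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_order": lambda order_id: f"{order_id} status=shipped",
}

def run_agent(goal: str, max_steps: int = 10) -> str:
    memory = [f"goal: {goal}"]                 # Memory: short-term context
    for _ in range(max_steps):                 # Loop: run until a stop condition
        action, name, arg = fake_llm(memory)   # LLM: decide the next step
        if action == "stop":
            return name                        # second slot carries the answer
        result = TOOLS[name](arg)              # Tools: act on the outside world
        memory.append(f"tool:{name} -> {result}")
    return "stopped: step limit reached"

print(run_agent("Where is order A-1001?"))
```

Remove the loop and the tool call and what remains answers once and stops, which is the chatbot case.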
"AI assistant / Copilot / 智能体" (the last is the Chinese term for "agent"): covered in Ep 01 as mostly marketing language, not discussed further.
Skill = trigger description + I/O contract + prompt + tool subset:
- Trigger description: "when should I be called" (so the agent knows when to use it)
- I/O contract: what input it needs, what shape of output it produces
- Prompt: the core instruction for this segment (can include few-shot examples)
- Tool subset: the tools this skill is allowed to call (not the agent's full tool set)
Still runs inside the same agent's context — a Skill is the agent's "internal module," not a separate LLM instance.
Calling a prompt file a "Skill" is the same kind of naming as calling a reusable function a "module" — the point isn't the word, it's the structure.
"$29 for 1000 prompts" — that's not a Skill. That's a static prompt library: no trigger logic, no I/O contract, not programmatically callable.
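To make the difference from a static prompt library concrete, here is a rough sketch of a Skill as a data structure. The field names and the `email_triage` example are assumptions for illustration, not the schema of any particular product.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Skill:
    name: str
    trigger: str                      # "when should I be called"
    input_fields: tuple[str, ...]     # I/O contract: required inputs
    output_fields: tuple[str, ...]    # I/O contract: shape of the output
    prompt: str                       # core instruction for this segment
    allowed_tools: frozenset[str] = field(default_factory=frozenset)

    def missing_inputs(self, payload: dict) -> list[str]:
        """Return the contract fields that the payload does not provide."""
        return [name for name in self.input_fields if name not in payload]

    def can_call(self, tool: str) -> bool:
        """A skill may only use its own tool subset."""
        return tool in self.allowed_tools

# Hypothetical example skill.
triage = Skill(
    name="email_triage",
    trigger="Use when the user asks to sort or prioritize emails.",
    input_fields=("subject", "body"),
    output_fields=("label", "reason"),
    prompt="Label the email as urgent, reply-later, or archive.",
    allowed_tools=frozenset({"gmail_read"}),
)
```

A prompt in a paid bundle has only the `prompt` field. The other fields are what make the unit callable and checkable by a program.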
Multi-agent = N agents + protocol + coordination + failure handling:
- N agents: each with its own LLM + Tools + Memory + Loop (i.e., N complete agents)
- Protocol: how messages/state are exchanged (message queue / shared files / API calls)
- Coordination: who splits the task, who aggregates, who stops the whole flow
- Failure handling: what to do when one agent crashes or goes off the rails
The agents' contexts are genuinely independent = multi-agent
Writing "you are A, then you are B" in a single prompt is NOT multi-agent — that's role-play, and A and B share the same context.
"My Agent has 100 Skills" — that's one agent with many Skills, not multi-agent. This is the most common naming confusion in the market.
One Agent doing everything breaks in 4 ways
Why do you need multi-agent or skills? Because past a certain complexity, a single Agent must break. The 4 failure modes below have nothing to do with "the agent isn't smart enough." They're the limits of the single-context structure itself.
CEILING · 01 · CONTEXT DRIFT
Past 50 steps, the original instructions get diluted by new content
You give the agent a system prompt + a goal and let it run 80 steps. By step 50 the original system prompt is a tiny fraction of the context, buried under the agent's own recent outputs and tool results. Symptom: opens strong, then drifts, forgets rules, answers the wrong question. A stronger model pushes the drift point later but doesn't remove it: attention is finite, and every step adds more text competing for it.
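A back-of-the-envelope way to see the dilution, using made-up numbers: an 800-token system prompt and roughly 1,500 tokens of output and tool results added per step. Token share is only a rough proxy for how much weight the model gives the instructions, so treat this as an illustration rather than a measurement.

```python
def system_prompt_share(prompt_tokens: int, tokens_per_step: int, steps: int) -> float:
    """Fraction of the context occupied by the system prompt after `steps` steps."""
    total = prompt_tokens + tokens_per_step * steps
    return prompt_tokens / total

# Illustrative numbers only (assumed, not measured).
for steps in (1, 10, 50, 80):
    share = system_prompt_share(800, 1500, steps)
    print(f"step {steps:>2}: system prompt is {share:.1%} of the context")
```

Under these assumptions the system prompt falls to roughly 1% of the context by step 50.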
CEILING · 02 · TOOL OVERLOAD
One Agent with 20+ tools — the routing decision falls apart
As you add more tools to an agent (Gmail, Calendar, Notion, SQL, Web search, custom APIs…), every decision becomes "pick one of N tools" — pick wrong and it drifts. Measured in production: above 15 tools, tool-selection accuracy drops linearly; above 30, it's basically unusable. More tools, less competence — this is a structural problem with Function Calling, not something prompt-tuning can fix.
CEILING · 03 · ERROR COMPOUNDING
Single-thread tasks: one wrong step poisons everything downstream
The agent misreads a customer name at step 12. Every action from step 13 on (lookup, draft reply, send email) is based on the wrong name, but the agent can't see this — there's no second pair of eyes on its intermediate results. By step 30 you notice, dig through the log, trace it back to step 12 — but it already happened in production. This is the most common incident pattern in single-agent setups.
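The arithmetic behind compounding can be sketched under two simplifying assumptions: each step succeeds independently with the same probability, and any single error ruins the run because nothing downstream catches it. The 98% figure is illustrative, not a measured accuracy.

```python
def chain_success(per_step_accuracy: float, steps: int) -> float:
    """Probability that every one of `steps` sequential steps is correct."""
    return per_step_accuracy ** steps

# Illustrative: a step that is right 98% of the time, repeated.
for steps in (5, 12, 30):
    print(f"{steps:>2} steps: {chain_success(0.98, steps):.1%} chance of a clean run")
```

At 98% per step, a 30-step run comes out clean only a little more than half the time. An independent check partway through resets the chain instead of letting the error ride to the end.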
CEILING · 04 · ROLE COLLAPSE
Asking it to "do + check + report" at once — none of them done well
You write a giant prompt: research the market, draft copy, critically review it, output a markdown report. The agent skims every role — it can't actually switch reasoning modes within one inference pass. Especially "Drafter + Critic" — without separation it always rationalizes its own work: the writer and the critic are the same reasoning stream.
Skills — the "lightweight" division of labor
Most people think Skills is a way to add new capabilities to an Agent. More accurate: Skill is how you divide labor INSIDE one Agent — extract a workflow segment into a callable, standalone unit. The Agent calls it when relevant, and doesn't pollute the main context when not.
What a Skill contains
A standard Skill file (e.g. Claude Skills' SKILL.md) has four parts: a trigger description, an I/O contract, the prompt, and a tool subset.
When Skills fit
- One Agent doing many kinds of work: instead of cramming 10 if-else branches into a giant prompt, make one Skill per branch and let the agent route to it
- A workflow segment is reused across places: generic actions like email triage or CSV exploration — write them as Skills and multiple higher-level agents can call them
- You need versioning / audit trail: Skills are files, git-tracked; prompt changes are diff-able, errors are revertable
- Team collaborating on the same agent: each person edits the Skill file they own, instead of stomping over each other in one giant system prompt
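A rough sketch of "one Skill per branch" in place of a giant if-else prompt. Keyword triggers stand in for the agent's own routing decision here. In a real agent the model would read each trigger description and choose, so the registry contents and the matching rule are assumptions.

```python
from typing import Optional

# Hypothetical skill registry: name -> trigger keywords and prompt.
SKILLS = {
    "email_triage": {
        "trigger": ("email", "inbox"),
        "prompt": "Label each email as urgent, reply-later, or archive.",
    },
    "csv_explore": {
        "trigger": ("csv", "spreadsheet"),
        "prompt": "Summarize columns, types, and obvious anomalies.",
    },
}

def route(task: str) -> Optional[str]:
    """Return the first skill whose trigger matches the task, or None."""
    text = task.lower()
    for name, skill in SKILLS.items():
        if any(keyword in text for keyword in skill["trigger"]):
            return name
    return None

def build_context(system_prompt: str, task: str) -> list[str]:
    # Only the selected skill's prompt is loaded; the rest stay out of context.
    name = route(task)
    extra = [SKILLS[name]["prompt"]] if name else []
    return [system_prompt, *extra, task]
```

Because each skill is a separate entry (or file), adding a branch means adding one unit rather than editing a shared prompt.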
3 Skills that actually run
Multi-agent — the "heavyweight" division of labor
When Skills isn't enough — task too large, single context can't hold it, or you must have an independent reviewer — go multi-agent. "Multi" here isn't a count, it's independent contexts: each agent sees a different slice of the world, exchanging state through a protocol.
One test for "is this real multi-agent": are the two agents' context windows genuinely independent? If it's just "you are agent A… you are agent B" inside one system prompt, that's role-play — not multi-agent.
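The independence test can be sketched in a few lines. The `Agent` class and `hand_off` function are hypothetical. The point is only that each agent holds its own message history and that nothing crosses between them except the explicit payload.

```python
class Agent:
    def __init__(self, name: str, system_prompt: str) -> None:
        self.name = name
        # Each agent owns its message history; no other agent reads it.
        self.context: list[tuple[str, str]] = [("system", system_prompt)]

    def add(self, role: str, text: str) -> None:
        self.context.append((role, text))

def hand_off(sender: Agent, receiver: Agent, payload: str) -> None:
    # Protocol: only the explicit payload crosses the boundary.
    receiver.add("user", f"from {sender.name}: {payload}")

def sees(agent: Agent, needle: str) -> bool:
    return any(needle in text for _, text in agent.context)

drafter = Agent("drafter", "You write refund emails.")
critic = Agent("critic", "You look for problems in refund emails.")

drafter.add("assistant", "scratch: customer seems angry, keep it short")
drafter.add("assistant", "Draft v1: We have refunded your order.")
hand_off(drafter, critic, "Draft v1: We have refunded your order.")

print(sees(critic, "Draft v1"), sees(critic, "scratch"))
```

The critic sees the draft but not the drafter's scratch reasoning. In the role-play version both would sit in one list, and the "critic" would read everything the "drafter" thought.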
Real multi-agent has 4 archetypal collaboration patterns. They solve different problems, at different complexity levels. Most production systems combine them.
PATTERN · 01 · MANAGER–WORKER
One Manager splits the task, N Workers each execute a slice
The most common, easiest to grasp. The Manager agent receives a big task, splits it into N subtasks, dispatches them to specialized Worker agents, each worker runs its slice in an independent context, reports back, and the manager aggregates and produces the final output.
- MANAGER: split into 8 chapters → assign to 8 Researchers
- WORKER ×8: each independently researches and writes their chapter · no interference
- MANAGER: collect 8 chapters → unify voice → write intro + conclusion
WHEN TO USE: task splits statically · subtasks have no hard dependencies
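A minimal sketch of the Manager–Worker shape, with a stubbed `worker` where a real Researcher agent (own model call, own context) would go. The function names and chapter titles are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def worker(chapter_title: str) -> str:
    # Hypothetical stand-in for a Researcher agent with its own context.
    return f"[{chapter_title}] draft text"

def manager(task: str, chapter_titles: list[str]) -> str:
    # Split: one subtask per chapter, dispatched in parallel.
    with ThreadPoolExecutor(max_workers=len(chapter_titles)) as pool:
        chapters = list(pool.map(worker, chapter_titles))  # input order is kept
    # Aggregate: the manager alone writes the framing around the chapters.
    body = "\n".join(chapters)
    return f"Intro: {task}\n{body}\nConclusion"

print(manager("State of agents", ["History", "Tooling"]))
```

Workers never see each other's output. That is why the pattern only fits subtasks with no hard dependencies between them.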
PATTERN · 02 · PIPELINE
Agent A → Agent B → Agent C — each finishes, hands off downstream
Assembly-line style. Each agent owns one stage of transformation, the previous agent's output = the next agent's input. Difference from Manager–Worker: a pipeline is serial and single-track, with no central dispatcher.
- 01 RESEARCHER: mine 10 viral pieces from the niche → extract common patterns
- 02 IDEATOR: based on research → produce 7 topic ideas + outlines
- 03 WRITER: write the full draft
- 04 EDITOR: copyedit, adjust to voice corpus
- 05 PUBLISHER: format + multi-platform adapt + schedule
WHEN TO USE: task is a linear chain · each stage needs a different specialty
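A pipeline can be sketched as plain function composition: each stage takes the previous stage's output and returns its own. The stage bodies below are stubs with invented data. In a real system each would be a separate agent call.

```python
from functools import reduce
from typing import Callable

Stage = Callable[[dict], dict]

def researcher(state: dict) -> dict:
    return {**state, "patterns": ["hook in first line", "numbered list"]}

def ideator(state: dict) -> dict:
    return {**state, "topics": [f"Topic using {p}" for p in state["patterns"]]}

def writer(state: dict) -> dict:
    return {**state, "draft": f"Draft on: {state['topics'][0]}"}

def editor(state: dict) -> dict:
    return {**state, "draft": state["draft"].strip() + " (edited)"}

def run_pipeline(stages: list[Stage], state: dict) -> dict:
    # Serial and single-track: no dispatcher, each output feeds the next stage.
    return reduce(lambda current, stage: stage(current), stages, state)

result = run_pipeline([researcher, ideator, writer, editor], {"niche": "AI agents"})
print(result["draft"])
```

Reordering or dropping a stage is a one-line change to the list, which is most of the appeal of this shape.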
PATTERN · 03 · SPECIALIST ROUNDTABLE
Multiple expert Agents look at the same problem in parallel, one Synthesizer combines
"Three doctors in consultation" mode. The same input goes simultaneously to multiple expert agents, each looks through their own professional lens and produces an independent verdict; a final synthesizer merges the perspectives. Key point: parallel — the experts don't talk to each other, to avoid cross-contamination.
- QUANT: technicals (MA crossover, RSI, volume/price) · independent score
- FUNDAMENTAL: earnings, industry, valuation · independent score
- SENTIMENT: news, social, institutional moves · independent score
- RISK: position size, volatility, correlation · independent score
- SYNTHESIZER: aggregate 4 independent verdicts → produce a recommendation
WHEN TO USE: you need multiple independent perspectives · afraid of single-track thinking
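A sketch of the roundtable shape under heavy assumptions: each expert is a stub returning a fixed 0-100 score, and the synthesizer is a plain average with arbitrary thresholds. A real synthesizer would be its own agent weighing written verdicts. Only the structure (parallel, no cross-talk, one merge step) is the point.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical experts: each sees the same input and nothing else.
EXPERTS = {
    "quant": lambda ticker: 60,
    "fundamental": lambda ticker: 70,
    "sentiment": lambda ticker: 40,
    "risk": lambda ticker: 30,
}

def roundtable(ticker: str) -> dict[str, int]:
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, ticker) for name, fn in EXPERTS.items()}
        return {name: future.result() for name, future in futures.items()}

def synthesize(scores: dict[str, int]) -> dict:
    average = mean(list(scores.values()))
    # Thresholds are arbitrary placeholders for illustration.
    if average >= 60:
        verdict = "buy"
    elif average <= 40:
        verdict = "sell"
    else:
        verdict = "hold"
    return {"average": average, "recommendation": verdict, "scores": scores}

print(synthesize(roundtable("ACME")))
```

The synthesizer receives all four scores separately, so a strong disagreement between experts stays visible instead of being averaged away silently.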
PATTERN · 04 · ADVERSARIAL
Drafter writes, Critic tears it apart, iterate until convergence
Adversarial. One agent produces (Drafter), another agent exists to find problems (Critic). They go back and forth until the Critic has nothing left or the iteration limit hits. The core problem it solves: a single agent reviewing its own work always rationalizes.
- 01 DRAFTER: read issue → write implementation + tests → submit v1
- 02 CRITIC: tear it apart (edge cases, security, perf, readability) → list fixes
- 03 DRAFTER: apply feedback → submit v2
- 04 CRITIC: re-review → until approved or 3 rounds done
WHEN TO USE: output quality is non-negotiable · wrong = expensive (code / contracts / legal opinions)
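The control flow of the adversarial loop, sketched with stubs: a draft is modelled as the list of properties it has, and the critic reports whichever required properties are missing. `REQUIRED`, `drafter`, and `critic` are hypothetical. Real versions would be two agents with separate contexts.

```python
REQUIRED = ("handles empty input", "has tests")

def drafter(issue: str, fixes: list[str]) -> list[str]:
    # Stub: the draft is the base implementation plus every fix applied so far.
    return [f"implements {issue}"] + fixes

def critic(draft: list[str]) -> list[str]:
    # Stub: list every required property the draft is missing.
    return [item for item in REQUIRED if item not in draft]

def adversarial(issue: str, max_rounds: int = 3) -> tuple[list[str], int, bool]:
    applied: list[str] = []
    draft: list[str] = []
    for round_no in range(1, max_rounds + 1):
        draft = drafter(issue, applied)
        problems = critic(draft)
        if not problems:                 # critic has nothing left: converged
            return draft, round_no, True
        applied = applied + problems     # feed the critique into the next draft
    return draft, max_rounds, False      # hit the iteration limit unapproved

draft, rounds, approved = adversarial("parse_date")
print(rounds, approved)
```

The round limit matters as much as the critic: without it, two agents that never agree will loop until the budget runs out.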
Each workflow is one of the structures from §03/§04
6 systems below, covering the 4 multi-agent patterns + Skills combos. Pick the one closest to your work — copy the structure, swap the prompts and tools — in most cases it runs. Each is tagged with the pattern it uses, so you can cross-reference §03/§04.
CASE · 01 · Customer Support · MANAGER-WORKER · Per-ticket 8min → 1min · support team 5 → 2
200 tickets, 5 people can't clear them — manager classifies, workers handle
Contrarian take: support throughput isn't bottlenecked by typing speed. It's bottlenecked by classification. One manager reading + classifying a ticket in 30 seconds beats 5 support reps each spending 10 minutes on "what is this even". Then each ticket type goes to a dedicated worker. Each worker's context holds only the instructions and examples for its own category, so its handling quality beats a generalist's.
- 01 INTAKE: Zendesk webhook · triggered on new ticket
- 02 MANAGER: read ticket · classify into 5 buckets (refund / technical / complaint / inquiry / escalation)
- 03 WORKER A: refund skill · pull order + draft refund flow
- 04 WORKER B: technical skill · pull logs + draft fix steps
- 05 WORKER C: complaint skill · draft apology + escalate to human
- 06 SUPERVISOR: random 5% reviewed by human · feedback tunes the manager
SETUP: 2 weeks · ROI: 8min → 1min · support team 5 → 2
CASE · 02 · Sales Outreach · PIPELINE · 3× outreach · reply rate +40%
200 prospects, a 4-stage pipeline does what an SDR needs 4 days for
Contrarian take: personalized email isn't slow because writing is slow. It's slow because "research + write" thrashes inside one human brain. Split those two steps into two agents, add a sender + tracker, and one person running the pipeline does the work of four.
- 01 RESEARCHER: company list in · pull filings / news / decision chain / pain signals
- 02 PERSONALIZER: based on research · write custom opener + value hook
- 03 SENDER: send at optimal time · track opens / clicks
- 04 TRACKER: no reply in 3 days → trigger differentiated follow-up (A/B)
- 05 FEEDBACK: reply content → loop back to Personalizer to tune voice
SETUP: 1 week · ROI: 200 prospects/week · reply rate 5% → 7%
CASE · 03 · Code Review · ADVERSARIAL · Production bugs -50% · PR pass rate +30%
Drafter + 3 Critics tearing it apart cuts bugs in half
Contrarian take: a single agent that writes code and reviews it always concludes its own work is fine. You need critic agents with independent contexts, and each critic must own one class of risk. The security critic isn't distracted by perf, and the perf critic isn't distracted by readability. Three critics piling on is what forces the drafter to actually fix things.
- 01 DRAFTER: read issue + impact area · write implementation + tests · submit v1
- 02 CRITIC.SEC: SQL injection / XSS / authz boundaries / secret leakage
- 03 CRITIC.PERF: N+1 / infinite loops / O(n²) / missing cache
- 04 CRITIC.READ: naming / abstraction layers / comments / dead code
- 05 DRAFTER: merge 3 sets of feedback → submit v2 · max 3 rounds
- 06 HUMAN: you only see the final version + each critic's final verdict
SETUP: 3 days · ROI: PR review time 30min → 5min · production bugs -50%
CASE · 04 · Competitive / Market Intel · ROUNDTABLE · Per-target research 1 week → 1 hour
4 expert agents look at the same company in parallel; synthesizer outputs a threat rating
Contrarian take: ask one generalist agent for a "complete competitive report" and you get something balanced but devoid of insight. Use 4 specialist agents in parallel on the same company, each looking only at their domain, then a synthesizer merges — the report is 3-5× stronger than a single agent's, because the perspectives are genuinely independent.
- TECH: stack / product matrix / engineering capability / hiring signals
- FINANCIAL: funding cadence / filings / valuation / burn rate
- GTM: channels / pricing / marketing moves / customer base
- PEOPLE: exec changes / team size / key people's posts on X
- ★ SYNTHESIZER: aggregate 4 independent verdicts · output "threat level + what you should do"
SETUP: 3 days · ROI: competitive research 1 week → 1 hour · monthly refresh of 5 targets
CASE · 05 · Content Production · HYBRID (Pipeline + Adversarial) · 1 piece/person/day → 5 pieces/day
Pipeline of 4 stations + an Adversarial loop in the middle — higher quality than single-agent + 5× volume
Contrarian take: 99% of "AI writing tools" run a single agent across the whole flow — the result reeks of AI and reads from one angle. A real content factory is hybrid: Pipeline gives you assembly-line throughput, and the critical "write + revise" station uses Adversarial for quality.
- 01 RESEARCHER: scan niche + Perplexity Deep Research · output topic + source material
- 02a WRITER: write first draft from research · tune voice from persona corpus
- 02b EDITOR ⟷: tear apart (pacing / hook / contrarian strength / AI tone)
- 02c WRITER ⟷: revise · re-review · max 3 rounds until EDITOR passes
- 03 ADAPTER: one draft → multi-platform (Twitter / Xiaohongshu / WeChat / YouTube desc)
- 04 SCHEDULER: Buffer / Hypefury scheduling + analytics feedback
SETUP: 2 weeks (building voice corpus) · ROI: 1 piece/day → 5/day per person · AI-tone detection drops noticeably
CASE · 06 · Cross-database Data Exploration · MANAGER-WORKER · Multi-table join analysis 4h → 15min
50-table cross-join — one Agent can't hold it all; 10 Workers query in parallel
Contrarian take: the data analyst's worst problem isn't writing SQL. It's that 50 tables' schemas combined don't fit in one agent's context. Push "understand the schema" down to the workers (each worker loads only the 1-2 tables its subquery touches), let the manager only split the question and stitch the answer, and a task that used to "not fit in context" now ships in 15 minutes.
- 01 MANAGER: read user question · split into N independent subqueries (each touches 1-2 tables)
- 02 WORKER ×N: each worker runs in its own context · only loads the schema it owns
- 03 WORKER ×N: run SQL · validate results · emit structured data
- 04 MANAGER: collect N subresults · cross-result JOIN / compute · chart
- 05 MANAGER: write insight + anomaly flags
SETUP: 1 week · ROI: previously context-blocked tasks · now runnable · 4h → 15min
Single Agent · Skill · Multi-agent — when to pick which
Don't pick based on "what looks more sophisticated" — pick based on whether your task triggers a specific signal. Ask the 3 questions below in order, stop at the first YES.
LEVEL · 01 · SINGLE AGENT
Can your task finish within 10 steps, and is it a one-off?
If yes: don't reach for Skill, definitely not Multi-agent. All you need is a ChatGPT / Claude window + a prompt. Skills and Multi-agent both carry maintenance cost — don't build infrastructure for a one-off.
Trigger signal: ad-hoc tasks that happen 0-1×/week · single function · no parallelism needed
Real example: writing a one-off email / researching an ad-hoc question / looking up a reference
LEVEL · 02 · SINGLE AGENT + SKILLS
Does the same Agent need to handle 5+ task types, or is one workflow segment reused everywhere?
If yes: add Skills, but you don't need Multi-agent yet. Skills lets one Agent operate like a team — same context, callable modular capabilities. Moderate maintenance cost: you write SOPs, but only one agent runs.
Trigger signal: same agent handles 5+ task types · one workflow segment reused across places · team needs to collaborate on prompts
Real example: personal assistant (email + calendar + Notion + memos) / all-in-one support (multi-class tickets + knowledge base + escalation)
LEVEL · 03 · MULTI-AGENT
Any of: context overload / need independent reviewer / need real parallelism?
If yes: go Multi-agent. But be ready to spend 5-10× more on protocol design, debugging the chain, monitoring state. Multi-agent isn't "the more advanced version" — it's the choice when you have no option but to divide labor.
Trigger signal: single agent drifts past 50 steps · must have independent eyes critiquing · 4+ subtasks parallelizable · context can't hold the full schema
- MANAGER-WORKER: task splits statically · subtasks independent
- PIPELINE: task is sequential transformation · each stage needs a specialty
- ROUNDTABLE: need multiple independent perspectives · afraid of groupthink
- ADVERSARIAL: quality is non-negotiable · wrong = expensive (code / contracts)
REMEMBER: most production systems are 2-3 patterns nested together
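The three questions can be written down as a function that asks them in order and stops at the first YES. The `Task` fields are a hypothetical encoding of the trigger signals above. The final fallback is an assumption, since the text does not say what to pick when no signal fires.

```python
from dataclasses import dataclass

@dataclass
class Task:
    steps: int
    recurring: bool
    task_types: int = 1
    reused_segment: bool = False
    team_edits_prompts: bool = False
    context_overload: bool = False
    needs_independent_review: bool = False
    parallel_subtasks: int = 0

def choose_level(task: Task) -> str:
    # Level 01: short and one-off, so no infrastructure at all.
    if task.steps <= 10 and not task.recurring:
        return "single agent"
    # Level 02: many task types, a reused segment, or shared prompt ownership.
    if task.task_types >= 5 or task.reused_segment or task.team_edits_prompts:
        return "single agent + skills"
    # Level 03: any one structural ceiling is enough.
    if (task.context_overload or task.needs_independent_review
            or task.parallel_subtasks >= 4):
        return "multi-agent"
    # Assumption: with no signal triggered, stay at the cheapest option.
    return "single agent"

print(choose_level(Task(steps=80, recurring=True, needs_independent_review=True)))
```

Because the questions are asked in order, a task that triggers both Level 02 and Level 03 signals stops at Skills first, matching the "cheapest structure that works" stance of this section.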
5 things the market says about Multi-agent and Skills that aren't quite right
Ep 01 covered the "AI peripheral market" — selling anxiety, selling certainty. This section narrows to Ep 02's two words: what the market has done to multi-agent and skills. Once you see it, you can tell faster whether a product / article / course has anything real behind it.
OBSERVATION · 01 · LANGGRAPH WRAPPER
"Multi-agent Platform / OS"
2024-2026 produced a flood of "Multi-agent OS / platform" startups. Most are LangGraph + a UI layer, with no genuinely independent contexts between agents. The test is simple: ask "what does each of your two agents see in their system prompt?" — if they can't name specific boundaries, it's a wrapper.
OBSERVATION · 02 · MORE TOOLS > BETTER AGENT
"Our Agent has 100+ Skills"
"Number of Skills" gets pitched as a selling point, but the Tool Overload ceiling from §02-02 applies to skills too: the more you load in, the worse the agent's skill selection gets. Production measurement: past 15-20 skills, skill-selection accuracy falls off a cliff. 10 sharp Skills > 50 noisy ones.
OBSERVATION · 03 · ACADEMIC vs PRACTICAL
"Multi-agent is the path to AGI"
This is an academic debate that is decades old (agent-based modeling, swarm intelligence, multi-agent reinforcement learning…), and it has nothing to do with you building a system that actually handles support tickets today. The grand narrative gets used to dress up simple LangGraph demos. When you hear it, remember: however grand, it still has to handle a ticket.
OBSERVATION · 04 · PROPRIETARY FRAMING
"Skill is an Anthropic-proprietary feature"
Wrong. Skill isn't any vendor's feature — it's a design pattern. OpenAI Custom GPT Actions, LangChain Tools, your own modular prompt files, even Unix tools + a README — they're all the same thing at the core: package reusable capability into a unit that's discoverable and callable. Who packages it cleanest is a separate question, but the concept doesn't belong to anyone.
OBSERVATION · 05 · WRONG ORDER
"Complex tasks require Multi-agent"
Usually said backwards. The first thing to do with a complex task is turn it into an SOP — break it into repeatable steps with explicit input/output per step. After the SOP, most tasks' "complexity" drops to where single agent + skills handle it; only what truly remains — the parts needing parallelism or independent review — goes multi-agent. Reverse the order — stack multi-agent first — and you're using expensive coordination cost to mask the fact you haven't thought through the workflow.
12 weeks from single Agent to Multi-agent
This extends Ep 01's 90-day path — assuming you can already handle daily tasks with a single agent + 1 Project (the Week 9-12 state from Ep 01). The next 12 weeks take you from "single agent + 1 Skill" to "3-agent coordinated system." Each week has a failure signal — when it lights up, roll back.
WEEK 1 – 2 · FIND CANDIDATES
List the 3 highest-frequency subtasks in your workflow
Not "things I want to replace with AI" but things you do "5+ times a week, almost the same steps each time." These are the natural Skill candidates. Write them down: name / frequency / input / output / current time spent.
Failure signal: you can't list 3. Your workflow isn't regular enough yet; go back to Ep 01.
What to do: spend another week doing it purely manually, jotting down every task as you do it.
WEEK 3 – 4 · WRITE THE FIRST SKILL
Pick the most painful candidate — write it as SKILL.md and run it
Write a standalone file with the 4 parts from §03 (DESCRIPTION / CONTRACT / PROMPT / TOOLS). Hook it into Claude Code or your own agent and run it for a week: every time the task comes up, just call it, never hand-write the prompt.
Failure signal: by end of Week 4 you're still frequently editing the prompt inside SKILL.md.
What to do: it's not Skills failing; your task isn't SOP-ready. Go back and decompose it further.
WEEK 5 – 8 · FILL UP ONE AGENT
Write 3-5 Skills and load them into the same Agent
Turn the remaining candidates from Week 1-2, plus any you have spotted since, into Skills too. Your agent should now handle 3-5 different task types in a single context. Experientially, it goes from "generalist assistant" to "colleague with real expertise".
Failure signal: agent routinely picks the wrong Skill / mixes logic from two Skills.
What to do: you loaded too many. Trim back to the 3 highest-frequency and save the rest for the Multi-agent phase.
WEEK 9 – 12 · FIRST DIVISION OF LABOR
Pick a complex task that won't run, design a 2-3 agent collaboration
Find a task that actually hits a §02 ceiling: single agent breaks past 50 steps, or needs an independent critic, or needs parallel processing. Pick one of the 4 patterns in §04 and build a minimal viable multi-agent system. Don't go big, don't aim for complete. Just make the simplest version that runs end-to-end.
Failure signal: by Week 12 you're still tweaking prompts / protocols, and end-to-end doesn't run.
What to do: usually the system you designed is too complex. Strip it down to two agents, then ask which one really needs to be independent.
After 12 weeks, you should be able to answer:
"Which part of my daily work is one Agent calling a set of Skills? Which part is multiple Agents collaborating?"
If you can point to both, with outputs and logs — you've crossed the Ep 01 → Ep 02 boundary.
If you can't — you're still working out of "one ChatGPT window," not "building a system."
From single Agent to Multi-agent — what's missing isn't a new concept,
it's you picking one workflow and sitting down to build it
After this episode, you can tell where single Agent / Skills / Multi-agent each belong. But "telling them apart" ≠ "building one". What the next episode covers depends on whether you actually land this one in a workflow that runs.
One complete, copyable Agent case study every week — prompts, tools, steps, all public.
This newsletter doesn't "cheer you on." Whether it runs is on you.