Multi-Agent Systems: Architecture and Pitfalls
A single AI agent works until it does not. You give it a customer support task, then add CRM updates, then lead scoring, then email drafting, then analytics queries. By the fifth capability, the agent is confused, slow, and expensive. The context window is stuffed, the model struggles to choose among 15 tools, and response quality drops with every addition.
Multi-agent systems solve this by splitting complex workflows across specialized agents. Each agent does one thing well. An orchestrator routes tasks to the right specialist. The result is faster, cheaper, and more reliable than a single monolithic agent.
But multi-agent systems introduce their own failure modes. Here is how to architect them correctly and avoid the pitfalls.
When Single Agents Break Down
Three signals indicate you need multi-agent architecture:
Tool Overload
Models perform worse as the number of available tools increases. Our benchmarks show:
| Tools Available | Correct Tool Selection | Response Quality |
|---|---|---|
| 3-5 tools | 95%+ | Baseline |
| 6-10 tools | 88-92% | -5% |
| 11-20 tools | 75-85% | -15% |
| 20+ tools | 60-70% | -25% |
If your agent needs more than 8-10 tools, split it into specialized agents with 3-5 tools each.
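As a rough sketch, that split might look like the following tool grouping, where the tool names are hypothetical placeholders for a support workflow:

// Hypothetical tool grouping: each specialist sees only 3-5 tools.
const agentTools: Record<string, string[]> = {
  support_agent:   ["search_kb", "get_ticket", "update_ticket"],
  crm_agent:       ["lookup_contact", "update_crm", "log_activity"],
  analytics_agent: ["run_query", "summarize_report"],
};

Each specialist's tool-selection problem then stays inside the 95%+ accuracy band from the table above.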
Context Window Pressure
A single agent handling a complex workflow accumulates context: system prompt, tool definitions, conversation history, retrieved documents, tool results. A customer support agent with RAG, CRM access, and order management can easily consume 50,000+ tokens per request. That costs $0.15 per interaction on Claude Sonnet and degrades response quality as the model processes more irrelevant context.
Specialization Requirements
Different parts of a workflow need different expertise. A sales pipeline agent needs to:
- Qualify the lead (language understanding)
- Look up the company (data retrieval)
- Score the opportunity (numerical reasoning)
- Draft a response (creative writing)
Each sub-task benefits from a different system prompt, different temperature setting, and potentially a different model. A multi-agent system gives each sub-task its optimal configuration.
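As a minimal sketch, per-agent configuration might look like this; the model names, temperatures, and prompts are illustrative assumptions, not recommendations:

// Illustrative per-agent configs; all values are assumptions for this sketch.
interface AgentConfig {
  model: string;        // each sub-task can run on a different model
  temperature: number;  // low for retrieval and scoring, higher for drafting
  systemPrompt: string;
}

const salesPipeline: Record<string, AgentConfig> = {
  qualify: { model: "small-fast-model", temperature: 0.0, systemPrompt: "Qualify the lead..." },
  lookup:  { model: "small-fast-model", temperature: 0.0, systemPrompt: "Retrieve company data..." },
  score:   { model: "reasoning-model",  temperature: 0.0, systemPrompt: "Score the opportunity..." },
  draft:   { model: "writing-model",    temperature: 0.7, systemPrompt: "Draft a warm, concise reply..." },
};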
Orchestration Patterns
Sequential Pipeline
Agents execute in order. The output of one becomes the input of the next.
Input → Agent A (Extract) → Agent B (Enrich) → Agent C (Score) → Agent D (Draft) → Output
Best for: Workflows with clear sequential stages. Document processing pipelines, content creation workflows, data transformation chains.
Advantages: Simple to implement, easy to debug (check each stage's output independently), natural checkpoints for human review.
Disadvantage: Total latency is the sum of all agent latencies. A 5-agent pipeline with 2-second agents takes 10 seconds minimum.
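A minimal sketch of this pattern, assuming each agent is an async function from text to text:

// Sequential pipeline: each agent's output becomes the next agent's input.
type Agent = (input: string) => Promise<string>;

async function runPipeline(agents: Agent[], input: string): Promise<string> {
  let current = input;
  for (const agent of agents) {
    current = await agent(current); // total latency = sum of all stages
  }
  return current;
}

// Usage: const output = await runPipeline([extract, enrich, score, draft], rawInput);

The loop body is also the natural place to log each stage's output for the per-stage debugging mentioned above.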
Parallel Fan-Out
Multiple agents process the same input simultaneously. Results are aggregated.
          ┌→ Agent A (Sentiment) ──┐
Input ────┼→ Agent B (Category) ───┼→ Aggregator → Output
          └→ Agent C (Priority) ───┘
Best for: Tasks where multiple independent analyses are needed. Support ticket processing (classify, prioritize, route simultaneously), content analysis (SEO score, readability, compliance check in parallel).
Advantages: Total latency equals the slowest agent, not the sum. A 3-agent fan-out with 2-second agents takes 2 seconds, not 6.
Disadvantage: All agents must work from the same input. If later agents need earlier agents' outputs, fan-out does not work.
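A sketch of fan-out using the same Agent type as above; Promise.all gives the slowest-agent latency profile:

// Parallel fan-out: every agent sees the same input; results are keyed by name.
type Agent = (input: string) => Promise<string>;

async function fanOut(agents: Record<string, Agent>, input: string): Promise<Record<string, string>> {
  const entries = Object.entries(agents);
  const results = await Promise.all(entries.map(([, agent]) => agent(input))); // latency = slowest agent
  return Object.fromEntries(entries.map(([name], i) => [name, results[i]]));
}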
Hierarchical Delegation
A manager agent decides which specialist agents to invoke and in what order.
Input → Manager Agent (decides route)
            ├→ Specialist A (if billing question)
            ├→ Specialist B (if technical issue)
            ├→ Specialist C (if feature request)
            └→ Specialist D (if escalation needed)
Best for: Open-ended workflows where the path depends on the input. Customer support, general-purpose assistants, complex decision-making.
Advantages: Flexible — new specialists can be added without changing the orchestration logic. The manager agent adapts routing based on context.
Disadvantage: The manager agent is a single point of failure. If it misroutes, the specialist produces a wrong answer confidently. Manager routing accuracy must be 95%+ for the system to work.
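A sketch of the delegation step; the classifier is passed in because it might be a cheap LLM call or a fine-tuned classifier, and both the route names and the helper are assumptions:

// Manager delegation: one routing decision, then exactly one specialist runs.
type Route = "billing" | "technical" | "feature_request" | "escalation";
type Agent = (input: string) => Promise<string>;

async function handleQuery(
  query: string,
  classifyRoute: (q: string) => Promise<Route>, // the manager: LLM or classifier
  specialists: Record<Route, Agent>,
): Promise<string> {
  const route = await classifyRoute(query); // log this decision: it is the single point of failure
  return specialists[route](query);
}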
Iterative Refinement
Multiple agents refine the same output in rounds.
Input → Generator Agent → Critic Agent → Generator (revised) → Critic → ... → Output
Best for: Content quality, code generation, analysis tasks where initial outputs need improvement. The critic agent catches errors, missing context, or quality issues that the generator missed.
Advantage: Output quality improves with each round.
Disadvantage: Each round costs tokens and adds latency. Diminishing returns after 2-3 rounds. Set a maximum iteration count.
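A sketch of the loop with a hard cap, assuming a critic that returns a verdict plus feedback:

// Generator/critic loop with a maximum iteration count to bound cost and latency.
type Agent = (input: string) => Promise<string>;
type Critic = (draft: string) => Promise<{ ok: boolean; feedback: string }>;

async function refine(generator: Agent, critic: Critic, input: string, maxRounds = 3): Promise<string> {
  let draft = await generator(input);
  for (let round = 0; round < maxRounds; round++) {
    const review = await critic(draft);
    if (review.ok) break; // critic is satisfied; stop early
    draft = await generator(`${input}\n\nRevise this draft:\n${draft}\n\nFeedback:\n${review.feedback}`);
  }
  return draft;
}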
Communication Between Agents
Information loss between agents is the most common multi-agent failure. Agent A knows something critical. Agent B does not receive it. The final output is wrong.
Structured Handoffs
Define an explicit data contract between agents:
interface AgentHandoff {
  task_id: string;                              // correlates all steps of one request
  source_agent: string;                         // who produced this handoff
  target_agent: string;                         // who should act on it
  context: {
    original_query: string;                     // verbatim, never a summary
    extracted_entities: Record<string, string>; // entities found so far
    decisions_made: string[];                   // decisions already taken upstream
    confidence: number;                         // source agent's confidence, 0-1
  };
  instructions: string;                         // what the target agent should do next
}
Every handoff includes the original query (not a summary), the entities extracted, decisions already made, and the confidence level. The receiving agent has full context without needing to re-process the original input.
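For example, a handoff from a qualification agent to a scoring agent might look like this (all values illustrative):

const handoff: AgentHandoff = {
  task_id: "task-0042",
  source_agent: "qualify",
  target_agent: "score",
  context: {
    original_query: "We need 50 seats of your enterprise plan by Q3.", // verbatim
    extracted_entities: { company: "Acme Corp", seats: "50", timeline: "Q3" },
    decisions_made: ["lead_is_qualified"],
    confidence: 0.92,
  },
  instructions: "Score this opportunity. Do not re-qualify the lead.",
};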
Shared Memory
For complex workflows where multiple agents need access to evolving state:
Agent A writes → Shared State Store (Redis/DB) ← Agent B reads
Agent C writes →                                ← Agent D reads
The shared store contains the conversation state, intermediate results, and any context that multiple agents need. Each agent reads the latest state before processing and writes its results back.
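A sketch of that read-modify-write cycle; the StateStore interface is an assumption standing in for a Redis or database client:

// Hypothetical shared-state interface standing in for Redis or a database.
interface StateStore {
  get(taskId: string): Promise<Record<string, unknown>>;
  set(taskId: string, state: Record<string, unknown>): Promise<void>;
}

async function runWithSharedState(
  store: StateStore,
  taskId: string,
  agentName: string,
  agent: (state: Record<string, unknown>) => Promise<unknown>,
): Promise<void> {
  const state = await store.get(taskId);                      // read the latest state first
  const result = await agent(state);                          // agent sees the full shared context
  await store.set(taskId, { ...state, [agentName]: result }); // write results back under the agent's key
}

Note the write is not atomic; concurrent writers would need a lock or compare-and-set, which Redis and most databases provide.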
Message Passing
For loosely coupled agents that communicate through a message queue:
- Agent A publishes "lead_qualified" event with lead data
- Agent B subscribes to "lead_qualified" and starts CRM enrichment
- Agent C subscribes to "lead_qualified" and starts email drafting
This decouples agents — they do not need to know about each other. New agents can subscribe to existing events without changing existing code.
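A minimal in-process sketch of the pattern; a production system would use a real queue (Kafka, SQS, Redis streams), so this toy bus is a stand-in:

// Toy in-process event bus standing in for a real message queue.
type Handler = (payload: unknown) => void;
const subscribers = new Map<string, Handler[]>();

function subscribe(event: string, handler: Handler): void {
  subscribers.set(event, [...(subscribers.get(event) ?? []), handler]);
}

function publish(event: string, payload: unknown): void {
  for (const handler of subscribers.get(event) ?? []) handler(payload);
}

// Agents B and C react independently; neither knows Agent A exists.
subscribe("lead_qualified", (lead) => console.log("start CRM enrichment for", lead));
subscribe("lead_qualified", (lead) => console.log("start email draft for", lead));
publish("lead_qualified", { company: "Acme Corp", score: 82 });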
The Pitfalls
Cascading Errors
Agent A makes a small error. Agent B amplifies it. Agent C acts on the amplified error. By the end of the pipeline, the output is completely wrong and no single agent is obviously at fault.
Fix: Validate outputs at each stage. If Agent A's output does not meet quality thresholds, stop the pipeline and escalate rather than passing garbage downstream. Implement "circuit breakers" that halt execution when error rates spike.
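A sketch of a pipeline with per-stage validation; the quality checks themselves are whatever thresholds fit your domain:

// Each stage's output must pass validation before it can reach the next stage.
type Agent = (input: string) => Promise<string>;
type Validator = (output: string) => boolean;

async function runValidatedPipeline(
  stages: { name: string; agent: Agent; validate: Validator }[],
  input: string,
): Promise<string> {
  let current = input;
  for (const stage of stages) {
    current = await stage.agent(current);
    if (!stage.validate(current)) {
      // Circuit breaker: halt and escalate instead of passing garbage downstream.
      throw new Error(`Stage "${stage.name}" failed validation; escalating to human review`);
    }
  }
  return current;
}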
Cost Multiplication
A single-agent request costs $0.01. A 5-agent pipeline costs $0.05-$0.10. At 10,000 requests per day, that is the difference between $3,000/month and $15,000-$30,000/month. Multi-agent costs scale roughly linearly with the number of agents invoked per request.
Fix: Not every request needs every agent. Use the manager agent to route simple requests to a single specialist and only invoke the full pipeline for complex requests. In practice, 60-70% of requests can be handled by a single agent.
Debugging Complexity
When the final output is wrong, which agent caused the error? In a 5-agent pipeline, you need to inspect every intermediate result to find the failure point.
Fix: Log every agent's input, output, and reasoning for every request. Build a trace viewer that shows the full execution path. Without this, debugging multi-agent systems is impossible at scale.
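A sketch of a tracing wrapper; where the entries go (console, database, observability platform) is up to your stack:

// Wrap every agent call so its input, output, and latency are recorded.
type Agent = (input: string) => Promise<string>;

interface TraceEntry {
  taskId: string;
  agent: string;
  input: string;
  output: string;
  latencyMs: number;
  timestamp: string;
}

async function traced(taskId: string, name: string, agent: Agent, input: string): Promise<string> {
  const start = Date.now();
  const output = await agent(input);
  const entry: TraceEntry = {
    taskId,
    agent: name,
    input,
    output,
    latencyMs: Date.now() - start,
    timestamp: new Date().toISOString(),
  };
  console.log(JSON.stringify(entry)); // in production, write to your trace store instead
  return output;
}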
Latency Accumulation
Sequential pipelines accumulate latency. Each agent adds 1-3 seconds. A 5-agent pipeline can take 10-15 seconds — unacceptable for interactive use cases.
Fix: Parallelize where possible. Use model routing to assign faster, smaller models to simpler agents. Cache intermediate results for recurring patterns. Set latency budgets per agent and alert when exceeded.
Orchestration Overhead
The manager/orchestrator agent consumes tokens and adds latency without producing user-visible output. In complex systems, orchestration can account for 20-30% of total cost.
Fix: For predictable workflows, use deterministic routing (code) instead of LLM-based routing. Only use an LLM orchestrator when the routing decision genuinely requires language understanding.
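A sketch of deterministic routing; the keyword rules are illustrative and would be tuned per domain:

// Deterministic routing: plain code, zero tokens, microseconds instead of seconds.
type Route = "billing" | "technical" | "feature_request";

function routeDeterministically(query: string): Route | null {
  const q = query.toLowerCase();
  if (/invoice|refund|charge|billing/.test(q)) return "billing";
  if (/error|crash|bug|not working/.test(q)) return "technical";
  if (/feature request|would be great|wish it could/.test(q)) return "feature_request";
  return null; // ambiguous: only now fall back to the LLM orchestrator
}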
FAQ
Which multi-agent framework should I use? LangGraph is the most mature for complex stateful workflows. CrewAI is simpler but less flexible. AutoGen is research-oriented and not production-ready. For most business applications, we recommend LangGraph for complex orchestration and no framework (just direct API calls) for simple pipelines. Frameworks add abstraction overhead — only use one if the orchestration logic is genuinely complex.
How do I test multi-agent systems? Test each agent independently first with its own evaluation suite. Then test the full pipeline with end-to-end test cases. The most critical tests are handoff tests — verify that information passes correctly between agents and that error conditions are handled at each boundary.
How do I predict costs? Count the number of agents in your typical flow. Multiply single-agent cost by that number plus 20% for orchestration overhead. For fan-out patterns, the cost is the sum of all parallel agents. For conditional flows, weight by the probability of each path.
When is multi-agent overkill? If your workflow has fewer than 8 tools and fits within a single context window comfortably, a single well-designed agent is simpler, cheaper, and easier to debug. Multi-agent architecture is justified when single-agent quality degrades measurably.
Multi-agent systems are powerful but add complexity. Start simple and add agents only when measurement shows the need. If you are designing a multi-agent system, our team can help with architecture.