Building AI agents has become democratized, but running them reliably in production remains a specialized challenge that requires purpose-built platforms. While large language models can scaffold fully functional agents in minutes, the infrastructure needed to operate those systems at scale, control costs, and ensure reliability demands a different set of tools entirely. Two major platforms are now addressing this gap by providing the operational backbone that transforms prototype agents into production workloads.

Why Is Building Agents Easy but Running Them Hard?

The AI landscape has shifted dramatically. Claude can generate a complete agent before your morning coffee gets cold, and any developer can now build sophisticated internal tools that previously required months of engineering effort. But this democratization of building has exposed a critical gap: the infrastructure required to run these agents reliably, securely, and cost-effectively remains complex and specialized.

The problem manifests in unexpected ways. An AI model might architect a DevOps setup that costs $5,000 per month when the same system could run efficiently at $500 per month. Agents can generate and execute untested code, face prompt injection attacks, or consume tokens unpredictably as workloads scale. Without proper guardrails, a single runaway operation could compromise core systems. These operational challenges have created what some call "shadow IT on steroids," where companies deploy dozens of custom agents without the infrastructure expertise to manage them safely.

How Are Platforms Solving the Agent Operations Problem?

Vercel and Microsoft have taken different but complementary approaches to building agent orchestration platforms. Both recognize that the real competitive advantage no longer comes from whether you can build an agent, but from the ability to iterate rapidly on agents that solve real business problems while operating them reliably at scale.
- Sandboxed Execution: Agents can generate and run untested code or face prompt injection attacks, but sandboxes contain these operations within isolated Linux virtual machines, preventing damage to core systems and enabling secure file system access for information discovery.
- Intelligent Compute Scaling: Fluid compute automatically scales up and down based on demand, handling the unpredictable resource consumption patterns that agents create, especially when processing data-heavy workloads like files, images, and video.
- Multi-Model Routing: AI Gateway provides unified access to hundreds of models with built-in budget control, usage monitoring, and load balancing, allowing simple requests to route to fast, inexpensive models while complex analysis goes to more capable ones.
- Durable Orchestration: Workflows enable agents to perform complex, multi-step operations reliably, with automatic retry logic and error handling so that interruptions don't require manual intervention or restarting entire operations.
- Production Observability: Deep visibility into what agents are actually doing, beyond basic system metrics, reveals the exact prompts, model responses, and token consumption patterns needed for debugging and optimization.

Vercel's approach centers on providing infrastructure primitives purpose-built for agent workloads. The company built its own internal data agent, d0, a text-to-SQL engine that democratized data access previously limited to professional analysts. One engineer built d0 in a few weeks using just 20 percent of their time, a feat possible only because Vercel's built-in primitives and deployment infrastructure automatically handled the operational complexity that would normally require months of engineering effort. Today, d0 handles natural language questions from engineers, marketers, and executives. When a user asks "What was our Enterprise ARR last quarter?" in Slack, the agent determines the appropriate data access level based on user permissions, explores a semantic layer of YAML-based configurations describing the data warehouse, uses the AI SDK for streaming responses and tool use, and orchestrates agent steps durably so that failures like Snowflake timeouts trigger automatic retries. The answer arrives back in Slack, often with charts or Google Sheet links.

Microsoft's Agent Framework takes a different angle, focusing on multi-agent orchestration for complex incident response. The company built an On-Call Copilot that ingests raw incident signals and produces structured triage, Slack updates, stakeholder briefs, and draft post-incident reports in under 10 seconds. Rather than asking a single model to process 800 lines of logs, 10 alert lines, and dense metrics while producing four distinct output formats, the system decomposes the task into four specialized agents running in parallel.

Eric Dodds, Content Engineering Lead at Vercel, explained the fundamental shift: "The build vs. buy equation has fundamentally changed. Competitive advantage no longer comes from whether you can build. It comes from rapid iteration on AI that solves real problems for your business and, more importantly, reliably operating those systems at scale."

What Makes Multi-Agent Orchestration Practical for Production?

Microsoft's On-Call Copilot demonstrates why decomposing complex tasks into specialized agents produces better results than single-prompt approaches. The system runs four specialist agents concurrently: one for root cause analysis and immediate actions, another for incident narrative and status, a third for audience-appropriate communications, and a fourth for post-incident reports. Each agent receives a tightly scoped system prompt that defines its output schema and guardrails, preventing hallucinations and ensuring structured JSON output.
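The fan-out pattern described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not Microsoft's actual Agent Framework code: the specialist names, the prompt strings, and the stubbed `call_model` function are all invented here; a real system would call a model API and validate each agent's JSON output against its schema.

```python
import asyncio

# Tightly scoped system prompts: each specialist sees only its own concern.
# (Prompt text is illustrative, not the real On-Call Copilot prompts.)
SPECIALISTS = {
    "triage": "Identify root cause with confidence score and evidence; output JSON.",
    "summary": "Write a concise incident narrative and current status; output JSON.",
    "comms": "Draft a Slack update and a non-technical stakeholder brief; output JSON.",
    "report": "Produce a chronological post-incident timeline; output JSON.",
}

async def call_model(system_prompt: str, envelope: dict) -> dict:
    """Stub for an LLM call. A real implementation would hit a model API and
    validate the returned JSON against the agent's output schema."""
    await asyncio.sleep(0)  # stand-in for network I/O
    return {"agent_instructions": system_prompt, "incident": envelope["incident_id"]}

async def run_copilot(envelope: dict) -> dict:
    # Fan out: all four specialists process the same JSON envelope concurrently,
    # then results are collected under each specialist's name.
    names = list(SPECIALISTS)
    results = await asyncio.gather(
        *(call_model(SPECIALISTS[n], envelope) for n in names)
    )
    return dict(zip(names, results))

envelope = {"incident_id": "INC-1234", "alerts": [], "logs": [], "metrics": {}}
outputs = asyncio.run(run_copilot(envelope))
print(sorted(outputs))  # → ['comms', 'report', 'summary', 'triage']
```

Because each specialist's behavior lives entirely in its prompt string, refining an agent means editing text in the `SPECIALISTS` mapping rather than changing orchestration code.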
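The durable-orchestration behavior mentioned earlier, where failures like Snowflake timeouts trigger automatic retries, can be approximated with a small retry wrapper. This is a hedged sketch rather than any platform's real API: `TransientError`, the backoff parameters, and the `flaky_query` step are invented for illustration, and a production workflow engine would also checkpoint state so a restart resumes rather than reruns completed steps.

```python
import time

class TransientError(Exception):
    """Stand-in for recoverable failures such as warehouse timeouts or rate limits."""

def with_retries(fn, attempts=4, base_delay=0.01):
    """Retry a workflow step with exponential backoff on transient failures."""
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure
            time.sleep(base_delay * (2 ** attempt))

# Hypothetical step that times out twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("query timed out")
    return "42 rows"

result = with_retries(flaky_query)
print(result, "after", calls["n"], "attempts")  # → 42 rows after 3 attempts
```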
The architecture reveals a key insight: agent behavior is defined entirely by instruction text strings, not code. A non-developer can refine agent behavior by editing the prompt and redeploying, with no Python changes needed. This separation of concerns makes agent systems more maintainable and allows domain experts to iterate on behavior without requiring engineering involvement.

The On-Call Copilot accepts a single JSON envelope containing incident data, alerts, logs, metrics, runbook excerpts, and operational constraints. The system then routes this through the four specialist agents, which run in parallel via asyncio.gather(). The triage agent analyzes root causes with confidence scores and evidence, the summary agent produces concise incident narratives, the communications agent drafts Slack updates with emoji conventions and non-technical stakeholder briefs, and the post-incident report agent generates chronological timelines with quantified customer impact and prevention actions.

This approach addresses two fundamental problems with single-prompt triage. First, context overload: a real incident may contain hundreds of lines of logs and dense metrics that push token limits and degrade quality when processed by a single model. Second, conflicting concerns: triage reasoning and communication drafting are cognitively different tasks, and a model optimized for structured JSON analysis often produces stilted Slack messages. Specialization solves both problems by giving each agent a narrow instruction set.

How to Build Production-Ready AI Agents: Key Implementation Patterns

- Use Semantic Layers: Define your data warehouse, metrics, products, and operations as a file system of configuration files that agents can explore, rather than embedding knowledge directly in prompts, making behavior more maintainable and updatable.
- Implement Durable Workflows: Use orchestration frameworks that handle retries and state recovery automatically, so that transient failures like model timeouts or API rate limits don't require manual intervention or restarting entire operations.
- Route Requests Intelligently: Use AI Gateway or similar tools to send simple requests to fast, inexpensive models and complex analysis to more capable ones, balancing cost and accuracy without requiring code changes.
- Decompose Complex Tasks: Break multi-step workflows into specialized agents with narrow instruction sets rather than asking a single model to handle conflicting concerns, which improves output quality and reduces token consumption.
- Isolate Execution Environments: Run agent code generation and execution in sandboxed containers that prevent runaway operations from escaping and compromising core systems, even when agents face prompt injection attacks.
- Monitor Agent Behavior Deeply: Implement observability that reveals exact prompts, model responses, and token consumption patterns, not just system metrics, so you can debug unexpected behavior and optimize performance.

Vercel's internal agent ecosystem demonstrates the scale possible with proper infrastructure. Beyond d0, the company now runs a lead qualification agent that helps one sales development representative do the work of 10, a customer support agent that handles 87 percent of initial questions, an abuse detection agent that flags risky content, a content agent that turns Slack threads into draft blog posts, v0 for code generation, and Vercel Agent for pull request review and incident analysis. All of these run on the same infrastructure primitives.

The economics have fundamentally shifted. For decades, custom internal tools only made sense at large companies where the upfront engineering investment could be justified by long-term operation with high service level agreements and measurable return on investment.
For everyone else, buying off-the-shelf software was the practical option. AI has changed this equation entirely. Companies of any size can now create agents quickly, and customization delivers immediate return on investment for specialized workflows. The question is no longer build versus buy, but rather build and run, requiring a single platform that handles the unique demands of agent workloads. As AI agents move from prototype to production, the platforms that provide robust operational infrastructure will become as critical as the models themselves. The agents that win won't be the ones built fastest, but the ones operated most reliably, cost-effectively, and securely at scale.
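As a closing illustration, the route-requests-intelligently pattern recommended above can be sketched as a thin dispatch layer. The model names, the word-count threshold, and the `estimate_complexity` heuristic below are assumptions invented for this sketch; a real gateway such as Vercel's AI Gateway would add budget tracking, usage monitoring, load balancing, and fallbacks on top of the routing decision.

```python
# Route cheap, simple requests to a fast model and complex analysis to a
# capable one. Model identifiers and thresholds are illustrative only.
FAST_MODEL = "small-fast-model"
CAPABLE_MODEL = "large-capable-model"

def estimate_complexity(prompt: str) -> int:
    """Crude heuristic: longer prompts and analysis keywords imply harder work."""
    score = len(prompt.split())
    if any(k in prompt.lower() for k in ("analyze", "compare", "explain why")):
        score += 50
    return score

def route(prompt: str, threshold: int = 40) -> str:
    """Pick a model tier without any change to calling code."""
    return CAPABLE_MODEL if estimate_complexity(prompt) > threshold else FAST_MODEL

print(route("What was our Enterprise ARR last quarter?"))  # → small-fast-model
print(route("Analyze the regression in p99 latency and explain why it spiked."))  # → large-capable-model
```

The point of the pattern is that cost/accuracy trade-offs live in one routing function (or gateway config), so they can be tuned without touching any agent's logic.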