Financial Horror
🔴 Real Incident

$47,000 Burned While Everyone Slept

Two AI agents in a recursive loop ran up a five-figure bill in eleven days

2025-10-16 · 6 min read · By Supervaize Team

In October 2025, engineer Teja Kusireddy published a post on Medium that went viral in AI circles. The headline: "We Spent $47,000 Running AI Agents in Production." The story behind it was simple, brutal, and entirely preventable: two AI agents got stuck talking to each other in an infinite loop. For eleven days. Nobody noticed.

Week 1 of the deployment: $127 in API costs. Week 2: $891. Week 3: $6,240. Week 4: $18,400. By the time someone pulled the plug, the total bill was $47,000, for a system that was doing nothing useful.

What Happened

Kusireddy's team had deployed four LangChain agents coordinating via Agent-to-Agent (A2A) communication to help users research market data. The architecture was standard: modular agents, each with a defined task, passing messages to each other through a shared workflow. On paper, it was the kind of multi-agent system that gets praised in conference demos and architecture blog posts.

In practice, two of the agents entered a recursive loop. Agent A asked Agent B a question. Agent B's response triggered Agent A to ask a follow-up. That follow-up triggered another response. And another. And another. Endlessly.

Each message in the loop consumed API tokens. Each token cost money. The agents weren't producing useful output; they were generating an ever-growing bill, one completion at a time.
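
To make the mechanics concrete, here is a minimal sketch of the failure shape in Python. It is not the team's actual code: the agent stubs, the prompts, and the per-call price are all invented for illustration.

```python
# Minimal sketch of the failure mode, not the incident system's code.
COST_PER_CALL_USD = 0.02  # hypothetical price of one completion

def agent_a(message: str) -> str:
    # Stands in for a billable LLM call that always asks a follow-up.
    return f"Follow-up question about: {message}"

def agent_b(message: str) -> str:
    # Stands in for a billable LLM call whose answer invites another question.
    return f"Partial answer to: {message}"

bill = 0.0
message = "Summarize today's market data."
for _ in range(1000):  # the production loop had no bound at all
    message = agent_b(agent_a(message))
    bill += 2 * COST_PER_CALL_USD  # two completions per round trip
    # Nothing here checks progress, message count, or spend.

print(f"1,000 round trips, zero deliverables, ${bill:.2f} simulated spend")
```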

For eleven days, the system appeared to be running normally. There were no errors. No crashes. No alerts. The agents were doing exactly what agents do: processing inputs, generating outputs, calling APIs. The fact that the inputs and outputs were circular and meaningless was invisible to anyone who wasn't monitoring the actual content of the inter-agent communication.

The Architecture That Made It Possible

The failure wasn't a bug in LangChain or A2A or any specific component. It was a gap in the architecture, the same gap that exists in almost every multi-agent deployment today.

No shared state. Each agent operated with its own context. There was no global view of what the system as a whole was doing. No orchestrator was tracking whether the agents were making progress toward an actual goal or just generating tokens.

No loop detection. In any system where autonomous components communicate, cycles are a known risk. Network protocols have TTL (time-to-live) fields. Distributed systems have circuit breakers. Multi-agent AI systems have... nothing. There was no mechanism to detect that two agents had been exchanging messages for hours with no resolution.
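
A hop-count guard is simple to retrofit. The sketch below borrows the TTL idea from IP networking; it is an assumed design, not something the incident system had, and the names are illustrative.

```python
from dataclasses import dataclass

class LoopSuspected(RuntimeError):
    """Raised when a message has hopped between agents too many times."""

@dataclass
class AgentMessage:
    content: str
    ttl: int = 16  # hops this message may make before being refused

def forward(msg: AgentMessage) -> AgentMessage:
    # Every agent-to-agent hop goes through this function.
    if msg.ttl <= 0:
        raise LoopSuspected("hop budget exhausted: probable cycle, escalate to a human")
    return AgentMessage(content=msg.content, ttl=msg.ttl - 1)
```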

No cost controls. The API billing was on autopilot. There was no spending threshold that would pause execution and require human approval to continue. No daily budget cap. No anomaly detection on token consumption. The billing API happily processed every request, and the credit card happily paid.
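
Even a crude daily cap would have bounded the damage to one day of spend. A minimal sketch, assuming each completion's cost can be estimated at call time; the class name and threshold are invented.

```python
import time

class BudgetExceeded(RuntimeError):
    """Raised when cumulative spend crosses the daily cap."""

class DailyBudget:
    def __init__(self, cap_usd: float = 50.0):
        self.cap_usd = cap_usd
        self.day = time.strftime("%Y-%m-%d")
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        # Call once per completion with that call's estimated cost.
        today = time.strftime("%Y-%m-%d")
        if today != self.day:  # new day, fresh budget
            self.day, self.spent_usd = today, 0.0
        self.spent_usd += cost_usd
        if self.spent_usd > self.cap_usd:
            raise BudgetExceeded(
                f"${self.spent_usd:.2f} spent today against a ${self.cap_usd:.2f} cap; "
                "pausing agents until a human approves more spend"
            )
```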

No output validation. Nobody was checking whether the agents' outputs were actually useful. The system measured activity, not outcomes. As long as the agents were generating responses, the system reported healthy. The distinction between productive work and expensive noise was invisible.

No human checkpoint. For eleven days, no human looked at what the agents were actually doing. The assumption was "it's running smoothly." This assumption was based on the absence of errors, which is not the same as the presence of results.

The Broader Pattern

Kusireddy's $47,000 lesson isn't an outlier. It's the predictable consequence of deploying autonomous systems without operational monitoring.

Multi-agent architectures are powerful precisely because they enable agents to work independently and communicate without human intervention. But that independence is also the risk. When agents operate autonomously, the failure modes are autonomous too. They don't crash loudly. They don't throw exceptions. They just... keep going. Quietly. Expensively.

The industry has a word for this: "silent failure." But in multi-agent systems, it's worse than silence; it's active mimicry. The system looks healthy because it's active. It's generating API calls, consuming compute, producing outputs. Everything that monitoring typically checks (uptime, response codes, throughput) looks fine. The failure is semantic, not syntactic. The agents are doing things. They're just not doing anything useful.

The Cost of Missing Infrastructure

Kusireddy's post made an argument that resonated widely: A2A communication and Anthropic's Model Context Protocol (MCP) are revolutionary protocols for agent interoperability. But the infrastructure layer that makes them safe to use in production doesn't exist yet.

He's right. Today's agent ecosystem has solved the communication problem. Agents can talk to each other. They can access tools and data through standardized protocols. They can coordinate on complex tasks. What the ecosystem hasn't solved is the supervision problem: who watches the agents while they work?

In traditional software, operations teams monitor dashboards. SREs set up alerts. Deployment pipelines have health checks. Cost monitoring triggers automatic shutdowns. These aren't glamorous features. They don't make for exciting demos. But they're the difference between a system that runs in production and a system that runs up a bill in production.

For AI agents, the equivalent infrastructure barely exists. There are no standard tools for monitoring inter-agent communication patterns. No off-the-shelf solutions for detecting semantic loops. No commonly adopted frameworks for cost governance in multi-agent deployments. Every team is building these guardrails from scratch, if they're building them at all.

What $47,000 Buys You in Lessons

Kusireddy's team learned several things the expensive way:

Agent-to-agent communication needs circuit breakers. If two agents have exchanged more than N messages without producing a deliverable output, something is wrong. The system should pause and escalate, not continue burning tokens.
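
A sketch of such a breaker, assuming the orchestrator can tell whether a message pair produced a deliverable; the class name and limit are illustrative.

```python
class CircuitOpen(RuntimeError):
    """Raised when a conversation runs too long without a deliverable."""

class ConversationBreaker:
    def __init__(self, max_fruitless_exchanges: int = 20):
        self.limit = max_fruitless_exchanges
        self.fruitless = 0

    def record_exchange(self, produced_deliverable: bool) -> None:
        # The orchestrator calls this after every A2A message pair.
        if produced_deliverable:
            self.fruitless = 0  # real progress resets the counter
            return
        self.fruitless += 1
        if self.fruitless >= self.limit:
            raise CircuitOpen(
                f"{self.limit} exchanges with no deliverable: pausing agents and escalating"
            )
```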

Cost monitoring must be real-time and agent-aware. Knowing your total API spend is not enough. You need to know spend per agent, per task, per conversation. Anomaly detection should flag when an agent's token consumption deviates from its historical baseline.
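
One lightweight version is a z-score test against the agent's own history. The sketch below is illustrative; the baseline size and cutoff are assumptions, not recommendations.

```python
import statistics

def is_anomalous(window_tokens: float, history: list[float], z_cutoff: float = 3.0) -> bool:
    """Flag a window whose token use is a z-score outlier vs. this agent's baseline."""
    if len(history) < 10:  # too little baseline to judge; stay quiet
        return False
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history) or 1.0  # avoid divide-by-zero on a flat history
    return (window_tokens - mean) / stdev > z_cutoff
```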

Activity is not progress. Agent monitoring must include semantic health checks: not just "is the agent running?" but "is the agent producing meaningful results?" This requires output validation that goes beyond HTTP status codes.
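
What counts as "meaningful" depends on the task. For a market-research agent, a minimal check (with an invented output schema) might validate structure and sourcing rather than transport success:

```python
import json

REQUIRED_FIELDS = {"query", "summary", "sources"}  # invented schema for illustration

def output_is_meaningful(raw: str) -> bool:
    # A deliverable must parse, match the expected shape, and cite sources.
    # An HTTP 200 wrapping circular chatter fails this check.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (isinstance(data, dict)
            and REQUIRED_FIELDS <= data.keys()
            and len(data["sources"]) > 0)
```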

Autonomous systems need autonomy budgets. Every agent session should have a defined resource envelope: maximum tokens, maximum duration, maximum cost. When the budget is exhausted, the agent stops. Not "suggests stopping." Stops.
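
A sketch of one such envelope; the limits are placeholders, and the point is that exhaustion raises rather than warns.

```python
import time
from dataclasses import dataclass, field

@dataclass
class AutonomyBudget:
    max_tokens: int = 200_000   # illustrative limits, tune per workload
    max_seconds: float = 3_600.0
    max_usd: float = 10.0
    tokens_used: int = 0
    usd_spent: float = 0.0
    started: float = field(default_factory=time.monotonic)

    def debit(self, tokens: int, usd: float) -> None:
        # Called on every completion; exhaustion stops the session outright.
        self.tokens_used += tokens
        self.usd_spent += usd
        if (self.tokens_used > self.max_tokens
                or time.monotonic() - self.started > self.max_seconds
                or self.usd_spent > self.max_usd):
            raise RuntimeError("autonomy budget exhausted: the agent stops here")
```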

Human checkpoints aren't optional. For long-running agent workflows, periodic human review isn't a bottleneck; it's a safety mechanism. A five-minute human review on day 2 would have caught the loop and saved $46,000.
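
Operationally, this can be a gate in the workflow runner. A hypothetical sketch: the job halts unless a human signs off at each review interval.

```python
import time
from typing import Callable

REVIEW_INTERVAL_S = 8 * 3600  # illustrative: a human look three times a day

def run_with_checkpoints(step: Callable[[], bool], approved: Callable[[], bool]) -> None:
    """Run `step` until it reports completion, pausing for human sign-off."""
    last_review = time.monotonic()
    done = False
    while not done:
        if time.monotonic() - last_review > REVIEW_INTERVAL_S:
            # The reviewer reads actual transcripts, not just uptime graphs.
            if not approved():
                raise RuntimeError("checkpoint declined: workflow halted")
            last_review = time.monotonic()
        done = step()
```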

The Pattern Repeats

Summer Yue's OpenClaw deleted her emails because compaction erased her safety instructions. Replit's agent destroyed a production database during a code freeze because there was no hard permission boundary. Kusireddy's agents burned $47,000 because there was no loop detection or cost governance.

Three different systems. Three different failure modes. One common root cause: AI agents in production without an operational governance layer.

The agents are getting more capable every month. The protocols for agent communication are maturing. The infrastructure for agent supervision is not keeping pace. That gap is where the $47,000 invoices, and far worse, will keep coming from.


Sources

  • Teja Kusireddy, "We Spent $47,000 Running AI Agents in Production. Here's What Nobody Tells You About A2A and MCP," Towards AI / Medium, October 16, 2025
  • "AI Agents Horror Stories: How a $47,000 AI Agent Failure Exposed the Hype and Hidden Risks of Multi-Agent Systems," TechStartups, November 14, 2025