Why Agentic AI Hasn't Taken Over the Dispatch Desk (Yet)

In May 2026, when Google shipped Antigravity 2.0 at I/O, people started talking more about applying the advances in agentic coding to transport as well. After all, it should be easy: if agents can build software, then surely they can automate much of a dispatcher’s work? It sounds like a logical consequence, but it is wrong, for reasons that have nothing to do with how smart the models have become.

Last summer I wrote about everything agentic AI could do for a fleet: the autonomous dispatcher, the personal driver coach, the agent hunting freight marketplaces for extra margin in the background. I promised a follow-up on the risks. This is part of that follow-up, except the interesting risk turned out not to be the sci-fi one (the agent that hallucinates and sends your fleet to the wrong depot). It is structural, and it explains why the breakthrough keeps not happening.

I built and ran fleet management platforms long enough to understand the temptation. A dispatch screen today is a cockpit: a dense board, a map, a stack of lists, all of it tuned so a human can plan, assign and intervene as fast as possible. The agentic world promises to flip that cockpit around. Instead of the human operating the tool, the human supervises agents that do the work. Antigravity 2.0 is the purest expression of this idea yet. It is no longer an IDE at all, but a standalone command center for running agents: you spawn dynamic subagents in parallel, you schedule them to run in the background as a cron job (or use loop engineering, if you want to ride the latest wave), and they hand back artifacts (plans, screenshots, verification results) that you inspect. Google’s own pitch is that exactly these artifacts and verification results are what let the user “gain trust”, with a form factor designed to expose the agent’s autonomy as smoothly as possible. On top of all that sits an autonomy dial. And the whole product is being pushed deliberately out of coding and into general operations.

So of course people look at it and think: dispatch is next. It is tempting to port this model straight onto the dispatch desk. You only have to ask why it worked in coding, and then check whether those reasons hold in logistics. Because unfortunately they don’t.

Why Coding Agents Won (And Why It Has Nothing to Do With the Model)

Coding agents took over because two properties came together, and neither of them has anything to do with model quality.

First: errors are cheap. An agent can work in a sandbox, and if it produces nonsense, a git revert puts the world back the way it was. The blast radius is contained, reality is untouched, and the failed attempt cost you a few minutes of compute.

Second: verification is cheap. A green test, a short walk-through of a new workflow, a linter with no complaints. The proof that something works is orders of magnitude cheaper than the work of building it. This is the quiet engine under Google’s “gain trust through artifacts” framing: you trust the agent because you can confirm its output for almost nothing. That is also why “verify after the run” is a viable strategy at all. You let the agent loose and check afterwards, because checking costs next to nothing.

I have written at length about how I use coding agents day to day, and the pattern that makes the whole thing work is precisely this: I decide, the agent executes, and I confirm the result cheaply afterwards. The elegance of the agentic IDE stands or falls with those two properties. The autonomy dial is the right governing primitive only because, in the worst case, a mistake is a bug you roll back.

Dispatch Has Neither of Those Properties

A dispatch decision commits a physical truck, a driver, energy, tolls and a customer promise into the real world. There is no git revert for 180 kilometres in the wrong direction. There is none for a day’s driving time once it has been burned. And there is none for a missed delivery window at your most important retail customer, the kind that earns you an unpleasant email on Monday and a tougher contract negotiation at quarter-end. Mistakes in dispatch are expensive and physically irreversible.

The second property is worse. Verifying a route plan is almost as expensive as creating it. In software development these days, testing, whether automated or manual, is a lot cheaper than the development work itself. In dispatch there is no such cheap verification shortcut. If I want to judge whether a proposed tour plan is any good, I have to re-run most of the optimization in my own head. And that is where the entire value of the agent evaporates: if the human has to recompute the result anyway in order to trust it, he might as well have planned it himself.

So here is the point. Many are trying to copy the autonomy dial from the coding tools and treating it as the main thing to get right. For an agentic dispatcher, it simply isn’t. What actually matters is that before anything happens in the real world, the agent shows you a plan you can check fast, and proves up front that it broke none of your hard rules. So stop asking “how much can the agent decide for itself?” and start asking “how do I see what a decision will cost me in five seconds, before a truck actually moves?” Check the plan before you let it run.

What a 1% Error Rate Looks Like at Fleet Scale

Let me make this concrete, because it is the part the vendors are quietest about.

In my Webfleet years I watched customers become utterly dependent on their fleet management system. Dispatchers stopped keeping a paper fallback. Whole operations were planned, tracked and billed through the platform, and that dependency was a sign of how good the product had become. But it had a flip side. The moment something broke (a sync delay, a connectivity issue with the mobile carrier, a feature behaving differently after a release), the service lines lit up within minutes. At times like that, being a customer support agent was no fun at all. And remember: that was the deterministic world. The system did exactly what it was told. The error was ours to find, ours to reproduce, and ours to fix.

Now put an agentic dispatcher in that same position, except this time the system generates its own errors as a normal operating condition. Today’s agents are not deterministic and they are not error-free. Worse, their errors compound. The research is unkind here: even 99% reliability per step decays to about 37% over a hundred steps, and one production analysis found a 1% per-token error rate compounding to 87% cumulative failure on a long task. So a 1% error rate on a finished dispatch decision is, if anything, a generous assumption.
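The compounding behind the 37% figure is easy to reproduce. What follows is a minimal sketch, assuming every step succeeds or fails independently of the others, which is a simplification of how real agent runs behave. The function names are illustrative and not taken from any tool.

```python
def chain_success(per_step_reliability: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumption: steps are independent, so the probabilities multiply.
    """
    return per_step_reliability ** steps


def chain_failure(per_step_reliability: float, steps: int) -> float:
    """Probability that at least one step in the chain goes wrong."""
    return 1.0 - chain_success(per_step_reliability, steps)


# How end-to-end reliability erodes as the task gets longer.
for steps in (1, 10, 50, 100):
    print(f"{steps:>3} steps at 99% each: "
          f"{chain_success(0.99, steps):.1%} end-to-end success")
```

At a hundred steps this lands at roughly 37%, the figure quoted above. The point of the sketch is the shape of the curve: per-step reliability that sounds excellent still erodes quickly once steps are chained.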

Take that charitable 1% and do the arithmetic at the scale these systems actually run at. The European market leaders carry well over a million vehicles each. Geotab passed five million connected vehicles in 2025. Say each vehicle generates a modest five dispatch decisions a day. For a million-vehicle fleet that is five million decisions daily, and a 1% error rate is fifty thousand wrong dispatches. Every single day. Roughly two thousand an hour. Now picture the life of a customer service agent in that world.
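If you want to plug in your own fleet, the same back-of-the-envelope calculation fits in a few lines. This is only a sketch: the function name is made up for illustration, and the five decisions per vehicle per day and the flat 1% error rate are the assumptions from the paragraph above, not measured values.

```python
def wrong_dispatches(vehicles: int,
                     decisions_per_vehicle_per_day: int,
                     error_rate: float) -> tuple[int, float, float]:
    """Return (decisions per day, wrong dispatches per day, wrong dispatches per hour)."""
    decisions_per_day = vehicles * decisions_per_vehicle_per_day
    wrong_per_day = decisions_per_day * error_rate
    wrong_per_hour = wrong_per_day / 24  # spread evenly over the day, a simplification
    return decisions_per_day, wrong_per_day, wrong_per_hour


# The million-vehicle example from the text.
decisions, per_day, per_hour = wrong_dispatches(1_000_000, 5, 0.01)
print(f"{decisions:,} decisions/day, "
      f"{per_day:,.0f} wrong/day, about {per_hour:,.0f} wrong/hour")
```

Even a tenfold better error rate of 0.1% would still leave five thousand wrong dispatches a day at this scale.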

In coding, a wrong action is a red test you retry silently, and nobody ever knows. In dispatch, each of those fifty thousand is a physical truck that went somewhere wrong, a driver whose legal hours are now burned, a customer who needs a phone call. There is no silent retry. And there is no support organization on earth staffed to absorb fifty thousand of those a day. “Verify after the run” is not a strategy here. It is a denial-of-service attack on your own service desk.

First: The Artifact Has to Dissolve the Verification Asymmetry

If checking is almost as expensive as planning, then designing the thing you check is the real product work. And it only works if you stop handing the human a raw assignment to recompute, and instead give him two cleanly separated layers.

The first layer is the hard constraints, and the agent presents them as a guarantee rather than as something for you to re-check. This route respects driving and rest times. It hits the customer time windows. It fits the vehicle and the load. It stays inside the toll budget, or for an electric fleet, inside the energy and charging-window budget. These are deterministic, machine-checkable statements, and the human has to be able to trust them without recomputing them, because the check happens in the machine and not in the dispatcher’s head.
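To make "machine-checkable" concrete, here is a minimal sketch of what such a guarantee layer could look like. Everything in it is a hypothetical illustration: the `Stop`, `RoutePlan` and `Limits` types, the field names and the limit values are assumptions, and real driving and rest time rules (breaks, weekly limits, extensions) are far richer than a single number of minutes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stop:
    customer: str
    eta_min: int            # planned arrival, minutes after shift start
    window_open_min: int    # customer time window, same time base
    window_close_min: int


@dataclass(frozen=True)
class RoutePlan:
    driving_min: int        # total planned driving time
    load_kg: float
    toll_eur: float
    stops: tuple[Stop, ...]


@dataclass(frozen=True)
class Limits:
    max_driving_min: int    # simplified stand-in for driving and rest time rules
    vehicle_capacity_kg: float
    toll_budget_eur: float


def hard_constraint_violations(plan: RoutePlan, limits: Limits) -> list[str]:
    """Deterministic checks; an empty list is the guarantee shown to the dispatcher."""
    violations = []
    if plan.driving_min > limits.max_driving_min:
        violations.append(f"driving time {plan.driving_min} min exceeds "
                          f"{limits.max_driving_min} min")
    if plan.load_kg > limits.vehicle_capacity_kg:
        violations.append(f"load {plan.load_kg} kg exceeds vehicle capacity")
    if plan.toll_eur > limits.toll_budget_eur:
        violations.append(f"tolls {plan.toll_eur} EUR exceed budget")
    for stop in plan.stops:
        if not stop.window_open_min <= stop.eta_min <= stop.window_close_min:
            violations.append(f"{stop.customer}: ETA outside time window")
    return violations


# Illustrative values only.
limits = Limits(max_driving_min=540, vehicle_capacity_kg=24_000, toll_budget_eur=120)
plan = RoutePlan(driving_min=480, load_kg=18_000, toll_eur=95,
                 stops=(Stop("customer X", 130, 120, 180),))
print(hard_constraint_violations(plan, limits) or "all hard constraints satisfied")
```

The design choice that matters is that the checks are plain deterministic code running outside the model. The agent may propose the plan, but it does not get to grade it.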

The second layer is the trade-off, and only that belongs in human judgment. This variant is forty euros cheaper but raises the risk of a late delivery at customer X. That variant spares your scarce driver but costs you an empty-kilometre loop. The human verifies the judgment about the trade-off, not the correctness of the assignment. That separation, deterministic guarantee underneath, judgeable trade-off on top, is the difference between an agent you trust and one you check against every single time.
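One way to keep the second layer small enough to judge quickly is to show only variants that represent a real trade-off. The sketch below uses a Pareto filter for that, which is my illustration and not something any dispatch product is known to do. The `Variant` type, its three criteria and the numbers are all hypothetical, and every variant is assumed to have passed the hard constraints already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    cost_eur: float
    late_risk: float   # estimated probability of a late delivery, 0..1
    empty_km: float


def _criteria(v: Variant) -> tuple[float, float, float]:
    # Lower is better on every criterion.
    return (v.cost_eur, v.late_risk, v.empty_km)


def dominates(a: Variant, b: Variant) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    pairs = list(zip(_criteria(a), _criteria(b)))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def tradeoffs_for_review(variants: list[Variant]) -> list[Variant]:
    """Drop variants that another variant beats outright; keep the real choices."""
    return [v for v in variants
            if not any(dominates(other, v) for other in variants)]


variants = [
    Variant("A: on time, standard cost", cost_eur=400, late_risk=0.05, empty_km=0),
    Variant("B: forty euros cheaper, riskier", cost_eur=360, late_risk=0.15, empty_km=0),
    Variant("C: worse than A on everything", cost_eur=410, late_risk=0.20, empty_km=10),
]
for v in tradeoffs_for_review(variants):
    print(v.name)
```

Variant C never reaches the dispatcher, because A beats it on every criterion. What remains is the judgment call between A and B, which is the part that belongs to the human.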

Second: The Autonomy Slider Is Liability Allocation, Not a UX Gimmick

In the IDE, the autonomy slider is a comfort setting. You turn it up when you trust the agent in your own repository, and down when you don’t. Worst case, you pay for that trust with a bug. Antigravity 2.0 even lets you schedule agents to run on a cron, unsupervised, in the background. For software that is a productivity feature. For dispatch it is something else entirely. A cron job that commits real-world decisions while you sleep is a liability instrument, whatever the marketing calls it.

Because in logistics, the sentence “the agent decided autonomously” is one that can end up in an accident report or in a proceeding at the BALM, Germany’s federal logistics and mobility authority. Under present legislation, the operator is responsible for organizing driving and rest times, and that responsibility cannot be handed off to a piece of software. This changes the meaning of the dial completely. It no longer stands for “how much autonomy feels good to me”, it stands for “for which classes of decision am I willing to carry the liability for an automated commit”.

So autonomy is earned not by trust in the model, but by the reversibility and liability exposure of the specific decision. Re-routing inside the running shift is low exposure and can run on autopilot. Re-tasking a driver who is close to his limit is high exposure, and a human commits that. The dial survives. But where the coding version measures how much you trust the agent, the logistics version measures what a wrong decision would actually cost.
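Expressed as code, the logistics version of the dial is a policy over decision classes and not a single slider value. The following is a minimal sketch under stated assumptions: the `Decision` fields, the two-way split into automatic and human commits, and the 60-minute buffer are hypothetical placeholders that an operator would have to define against their own liability exposure.

```python
from dataclasses import dataclass
from enum import Enum


class CommitMode(Enum):
    AUTO = "agent commits"
    HUMAN = "human commits"


@dataclass(frozen=True)
class Decision:
    kind: str
    reversible_in_shift: bool     # can it be undone before the shift ends?
    driver_minutes_left: int      # legal driving time remaining after the decision


def commit_mode(decision: Decision, min_buffer_min: int = 60) -> CommitMode:
    """Gate on what a wrong decision would cost, not on trust in the model.

    min_buffer_min is an illustrative threshold, not a legal value.
    """
    low_exposure = (decision.reversible_in_shift
                    and decision.driver_minutes_left >= min_buffer_min)
    return CommitMode.AUTO if low_exposure else CommitMode.HUMAN


reroute = Decision("re-route inside running shift",
                   reversible_in_shift=True, driver_minutes_left=240)
retask = Decision("re-task driver close to limit",
                  reversible_in_shift=False, driver_minutes_left=25)

for d in (reroute, retask):
    print(f"{d.kind}: {commit_mode(d).value}")
```

Note that nothing in the gate refers to model confidence. The inputs are properties of the decision itself, which is what makes the policy something an operator can document and defend.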

What Would Prove Me Wrong

A thesis that explains everything and rules out nothing is worthless. So here is the honest part: two developments would tip my argument over, and both are worth watching.

The first would be cheap, sufficiently realistic simulation. If you could test a dispatch plan against the real world as cheaply and reliably as you run a unit test today, then verification would be cheap again, the coding analogy would hold, and the autonomy dial really would be the right primitive after all. Digital twins point in this direction, but they are a long way from that reliability, and in dispatch the devil sits in the factors no twin knows about: the jam forming right now, the customer who reshuffles on a whim, the driver who is just slower today.

The second would be a change in liability. The way the law is slowly opening up to autonomous driving, it might one day accept a certified autonomous dispatch where the operator no longer answers for every automated commit. Then the whole calculation moves. Until something shifts on one of those two fronts, verification before the commit stays the binding design constraint.

The Real Design Question

So the interesting question for the next generation of dispatch interfaces is not how autonomous we can make the agent. It is this: how do I compress the consequence of a decision so tightly that a human can check it, and own it, in five seconds? Whoever solves the artifact, not the autonomy, builds the interface that wins.

None of this means agents are useless in a fleet today, by the way. The version that works right now is the one that commits nothing to the real world: it watches the telematics stream and flags the exception a human should look at, no autonomy required. That is a piece of its own, and it is the one I plan to write next.

If you are running a fleet, or building the software for one, and you are trying to separate the agentic signal from the vendor noise, this is exactly the kind of question I help teams work through. Feel free to reach out.