Can “Vibe Coding” Beat Grad CS Students? Not Yet.

Researchers staged a coding tournament in a realistic logistics game: companies bid in auctions for delivery tasks, then plan pickup-and-delivery routes under vehicle capacity limits. Agents must both bid strategically under uncertainty and optimize routes to maximize profit.
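
To make the two sub-problems concrete, here is a minimal sketch of the agent shape such a tournament implies. All names here (Task, Agent, bid, plan_routes) are illustrative assumptions, not the paper's actual API:

    from dataclasses import dataclass

    @dataclass
    class Task:
        pickup: tuple[float, float]    # (x, y) pickup location
        delivery: tuple[float, float]  # (x, y) drop-off location
        size: int                      # load the task adds while carried

    class Agent:
        """The two halves every competitor must implement."""

        def bid(self, task: Task) -> float:
            """Price a task before knowing which others you will win (strategic half)."""
            raise NotImplementedError

        def plan_routes(self, won_tasks: list[Task], capacity: int) -> list[list[Task]]:
            """Order won tasks into capacity-feasible vehicle routes (optimization half)."""
            raise NotImplementedError

Neither half is a toy: bids interact with other agents' bids, and routing is a constrained optimization problem, so unit tests on either piece alone say little about tournament performance.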

What happened

  • Human-coded agents (written by graduate students) dominated, taking the top 5 spots across repeated tournaments.
  • 33 of 40 LLM-coded agents lost to very simple baselines (see the sketch after this list).
  • Even when given the best human solution to “improve,” the strongest LLM made it worse.
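
The paper's baselines aren't reproduced here, but for scale, a "very simple baseline" in this setting might look like the hypothetical sketch below (reusing the Task/Agent shapes above): bid a fixed markup over the direct trip distance, then serve won tasks one at a time in nearest-pickup order.

    import math

    def dist(a: tuple[float, float], b: tuple[float, float]) -> float:
        return math.hypot(a[0] - b[0], a[1] - b[1])

    class GreedyBaseline(Agent):
        """Hypothetical simple baseline: fixed-markup bids, nearest-pickup routing."""

        def __init__(self, markup: float = 1.2):
            self.markup = markup

        def bid(self, task: Task) -> float:
            # Bid the direct pickup-to-delivery distance times a fixed markup.
            return self.markup * dist(task.pickup, task.delivery)

        def plan_routes(self, won_tasks: list[Task], capacity: int) -> list[list[Task]]:
            # One vehicle serves tasks one at a time in nearest-pickup order,
            # so the load never exceeds one task's size (assumed <= capacity).
            pos, route, remaining = (0.0, 0.0), [], list(won_tasks)
            while remaining:
                nxt = min(remaining, key=lambda t: dist(pos, t.pickup))
                remaining.remove(nxt)
                route.append(nxt)
                pos = nxt.delivery
            return [route]

Losing to something this naive is the striking part: it ignores competition, bundling, and route synergies entirely, yet most LLM-coded agents still finished behind strategies of this kind.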

“Vibe coding” (asking an LLM to write a complex system from loose, high-level prompts rather than precise specifications) didn’t rescue performance.

Why it matters

Passing unit tests isn’t enough. Real-world coding often requires multi-agent reasoning, planning, and strategy. This study urges new benchmarks that test those skills, not just syntax and small functions.

Paper: https://arxiv.org/abs/2511.20613v1

Register: https://www.AiFeta.com

#AI #LLMs #Coding #Benchmark #MultiAgent #Logistics #Optimization #SoftwareEngineering #Research
