Can “Vibe Coding” Beat Grad CS Students? Not Yet.
Researchers staged a coding tournament in a realistic logistics game: companies bid in auctions, then plan pickup-and-delivery routes under capacity limits. Agents must both bid strategically under uncertainty and optimize routes to maximize profit.
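For a sense of what these agents have to get right, here is a minimal Python sketch of marginal-cost bidding, a standard heuristic for auction-based task allocation: bid roughly what a new task would add to the cost of your cheapest feasible route. This is an illustration, not the paper's setup; it assumes Euclidean distances, a single uncapacitated vehicle, and brute-force routing, and the names `Task`, `route_cost`, and `marginal_cost_bid` are hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Task:
    pickup: tuple    # (x, y) pickup location
    delivery: tuple  # (x, y) delivery location


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def route_cost(tasks, depot=(0.0, 0.0)):
    """Cheapest route from the depot that visits every pickup before its
    delivery, by brute force over stop orderings (fine for a few tasks)."""
    if not tasks:
        return 0.0
    stops = [(kind, t) for t in tasks for kind in ("p", "d")]
    best = float("inf")
    for order in permutations(stops):
        # Reject orderings that deliver a task before picking it up.
        picked, valid = set(), True
        for kind, t in order:
            if kind == "p":
                picked.add(t)
            elif t not in picked:
                valid = False
                break
        if not valid:
            continue
        cost, pos = 0.0, depot
        for kind, t in order:
            nxt = t.pickup if kind == "p" else t.delivery
            cost += dist(pos, nxt)
            pos = nxt
        best = min(best, cost)
    return best


def marginal_cost_bid(won_tasks, new_task, margin=1.1):
    """Bid the extra routing cost the new task would add, times a margin."""
    extra = route_cost(list(won_tasks) + [new_task]) - route_cost(won_tasks)
    return extra * margin


# Example: one task already won, bidding on a second offer.
won = [Task((1.0, 0.0), (2.0, 0.0))]
offer = Task((2.0, 1.0), (3.0, 1.0))
print(f"bid: {marginal_cost_bid(won, offer):.2f}")
```

The actual game layers capacity limits and uncertainty about future tasks on top of this, which is what makes the bidding genuinely strategic.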
What happened
- Human-coded agents (written by graduate students) dominated, taking the top 5 spots across repeated tournaments.
- 33 of 40 LLM-coded agents lost to very simple baselines.
- Even when given the best human solution to “improve,” the strongest LLM made it worse.
“Vibe coding” (asking an LLM to write a complex system from loose, high-level prompts) didn’t rescue performance.
Why it matters
Passing unit tests isn’t enough. Real-world coding often requires multi-agent reasoning, planning, and strategy. The study argues for new benchmarks that test those skills, not just syntax and small functions.
Paper: https://arxiv.org/abs/2511.20613v1
Register: https://www.AiFeta.com
#AI #LLMs #Coding #Benchmark #MultiAgent #Logistics #Optimization #SoftwareEngineering #Research