MineNPC-Task: Teaching AI to Remember in Minecraft
How do we know if game-playing AIs can plan, act, and remember like good teammates? Meet MineNPC-Task—a new open benchmark for memory-aware AI agents inside Minecraft’s open world.
Instead of toy prompts, tasks come from real co-play sessions with expert players, then get turned into templates with clear preconditions and dependencies. Machine checks verify progress under a “no out-of-world shortcuts” policy, while the harness logs key events: plan previews, clarifying questions, memory reads/writes, checks, and repairs.
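To make "templates with clear preconditions and dependencies" and "machine checks" concrete, here is a minimal Python sketch. It is not the MineNPC-Task harness: `SubtaskTemplate`, `Harness`, the inventory-dictionary world state, and the two example subtasks are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# Assumption: world state is reduced to an inventory map (item name -> count).
WorldState = Dict[str, int]


@dataclass
class SubtaskTemplate:
    """Hypothetical template: a name, dependencies, a precondition, a check."""
    name: str
    depends_on: List[str]
    precondition: Callable[[WorldState], bool]
    check: Callable[[WorldState], bool]


@dataclass
class Harness:
    """Hypothetical harness that logs events and tracks completed subtasks."""
    events: List[Tuple[str, str]] = field(default_factory=list)
    completed: Set[str] = field(default_factory=set)

    def ready(self, task: SubtaskTemplate, state: WorldState) -> bool:
        # A subtask may start only if its dependencies are done
        # and its precondition holds in the current in-world state.
        deps_ok = all(d in self.completed for d in task.depends_on)
        ok = deps_ok and task.precondition(state)
        self.events.append(("ready", f"{task.name}: {'yes' if ok else 'no'}"))
        return ok

    def validate(self, task: SubtaskTemplate, state: WorldState) -> bool:
        # The machine check looks only at in-world evidence (the state).
        passed = task.check(state)
        self.events.append(("check", f"{task.name}: {'pass' if passed else 'fail'}"))
        if passed:
            self.completed.add(task.name)
        return passed


gather_logs = SubtaskTemplate(
    name="gather_logs",
    depends_on=[],
    precondition=lambda s: True,
    check=lambda s: s.get("oak_log", 0) >= 3,
)

craft_planks = SubtaskTemplate(
    name="craft_planks",
    depends_on=["gather_logs"],
    precondition=lambda s: s.get("oak_log", 0) >= 1,
    check=lambda s: s.get("oak_planks", 0) >= 4,
)
```

In this sketch, `craft_planks` stays blocked until `gather_logs` has passed its check, and every readiness test and check result lands in `events`, loosely mirroring the logged checks described above.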
- Initial snapshot: GPT-4o evaluated on 216 subtasks across 8 experienced players.
- Common breakdowns: code execution, inventory/tool handling, referencing, and navigation.
- Bright spots: mixed-initiative clarifications and lightweight memory often enabled recovery.
- Player feedback: positive UX, but stronger long-term memory is needed.
The team is releasing the full suite—tasks, validators, logs, and harness—for transparent, reproducible evaluation of future embodied agents.
Paper: https://arxiv.org/abs/2601.05215v1
AI Minecraft LLM Agents Benchmark Memory HumanAI OpenSource EmbodiedAI