MineNPC-Task: Teaching AI to Remember in Minecraft

How do we know if game-playing AIs can plan, act, and remember like good teammates? Meet MineNPC-Task—a new open benchmark for memory-aware AI agents inside Minecraft’s open world.

Instead of toy prompts, tasks come from real co-play sessions with expert players, then get turned into templates with clear preconditions and dependencies. Machine checks verify progress under a “no out-of-world shortcuts” policy, while the harness logs key events: plan previews, clarifying questions, memory reads/writes, checks, and repairs.
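To make the template idea concrete, here is a minimal sketch of how a subtask with preconditions, dependencies, and a machine check might be represented, with each check recorded as a logged event. Every name in it (`Subtask`, `Harness`, `can_start`, `check`, the inventory-dictionary world state, the item names) is a hypothetical illustration, not the actual MineNPC-Task API or schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Assumption: world state is reduced to an inventory map (item name -> count).
WorldState = Dict[str, int]


@dataclass
class Subtask:
    """Hypothetical task template: what must hold before and after."""
    name: str
    depends_on: List[str]                       # subtasks that must pass first
    precondition: Callable[[WorldState], bool]  # must hold before starting
    validator: Callable[[WorldState], bool]     # in-world check for success


@dataclass
class Harness:
    """Hypothetical harness: tracks passed subtasks and logs check events."""
    done: Set[str] = field(default_factory=set)
    events: List[dict] = field(default_factory=list)

    def can_start(self, task: Subtask, state: WorldState) -> bool:
        # A subtask may start only if its dependencies passed
        # and its precondition holds in the current world state.
        deps_met = all(dep in self.done for dep in task.depends_on)
        return deps_met and task.precondition(state)

    def check(self, task: Subtask, state: WorldState) -> bool:
        # Success is judged only from in-world state, then logged.
        passed = task.validator(state)
        self.events.append({"event": "check", "task": task.name, "passed": passed})
        if passed:
            self.done.add(task.name)
        return passed


gather = Subtask(
    name="gather_wood",
    depends_on=[],
    precondition=lambda s: True,
    validator=lambda s: s.get("oak_log", 0) >= 4,
)
craft = Subtask(
    name="craft_planks",
    depends_on=["gather_wood"],
    precondition=lambda s: s.get("oak_log", 0) >= 1,
    validator=lambda s: s.get("oak_planks", 0) >= 4,
)
```

In this sketch, `craft_planks` cannot start until `gather_wood` has passed its check, which mirrors the dependency ordering described above. A fuller harness would also log the other event types the post mentions, such as plan previews, clarifying questions, and memory reads and writes.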

  • Initial snapshot: GPT-4o tested on 216 subtasks with 8 experienced players.
  • Common breakdowns: code execution, inventory/tool handling, referencing, and navigation.
  • Bright spots: mixed-initiative clarifications and lightweight memory often enabled recovery.
  • Player feedback: positive UX, but stronger long-term memory is needed.

The team is releasing the full suite—tasks, validators, logs, and harness—for transparent, reproducible evaluation of future embodied agents.

Paper: https://arxiv.org/abs/2601.05215v1

Register: https://www.AiFeta.com

AI Minecraft LLM Agents Benchmark Memory HumanAI OpenSource EmbodiedAI
