LLM
Smarter LLM Pretraining: Beyond URLs
What’s new: Adding the right metadata can make LLM pretraining faster and more effective, and it’s not just about URLs.
* Fine‑grained signals work: Prepending detailed quality indicators to training documents helps models learn more quickly.
* Append-and-predict: Appending metadata and training the model to predict it as an auxiliary task boosts efficiency (a rough sketch of both formats follows the list).
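
The two recipes differ only in where the metadata goes and whether the model is asked to predict it. Below is a minimal sketch of how such training examples might be assembled, assuming made-up quality tags, a toy whitespace tokenizer, and loss masking on the prepended metadata; none of these specifics come from the report.

```python
# A minimal sketch of the two data-formatting ideas above. The tag names,
# the whitespace "tokenizer", and the choice to mask loss on prepended
# metadata are illustrative assumptions, not details from the report.

def tokenize(text: str) -> list[str]:
    """Toy stand-in for a real subword tokenizer."""
    return text.split()


def build_prepend_example(metadata: str, document: str) -> dict:
    """Prepend fine-grained quality tags so they condition the model.
    Here the metadata positions get no label (an assumed choice), so loss
    is computed only on the document tokens."""
    meta, doc = tokenize(metadata), tokenize(document)
    return {
        "tokens": meta + doc,
        # None marks positions excluded from the loss (the usual
        # ignore-index trick in cross-entropy implementations).
        "labels": [None] * len(meta) + doc,
    }


def build_append_example(metadata: str, document: str) -> dict:
    """Append metadata and keep it in the loss, so predicting it becomes
    an auxiliary task on top of ordinary next-token prediction."""
    meta, doc = tokenize(metadata), tokenize(document)
    return {
        "tokens": doc + meta,
        "labels": doc + meta,  # loss on document and metadata alike
    }


if __name__ == "__main__":
    doc = "The mitochondria is the powerhouse of the cell."
    meta = "<quality=high> <domain=science> <readability=0.9>"
    print(build_prepend_example(meta, doc))
    print(build_append_example(meta, doc))
```

In a real pipeline the None labels would map to the loss function’s ignore index, and the quality tags would come from classifiers or document-level statistics rather than hand-written strings.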