SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus
Paper: http://arxiv.org/abs/2510.03160v1 Register: https://www.AiFeta.com
Paper: http://arxiv.org/abs/2510.02978v1 Register: https://www.AiFeta.com
Paper: http://arxiv.org/abs/2510.03161v1 Register: https://www.AiFeta.com
From pages to precise answers. 🏥🔎📚✅ This system queries UK NICE clinical guidelines with a hybrid retrieval stack over 10,195 chunks (from 300 guidelines). On 7,901 queries it reports an MRR of 0.814, with Recall of 81% @1 and 99.1% @10. In generation on 70 QA pairs, RAG-enhanced models achieved perfect context precision.
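For readers unfamiliar with the retrieval metrics quoted above, here is a minimal sketch of how MRR and Recall@k are typically computed. This is not the paper's code; the function names and toy data are illustrative only.

```python
def mrr(ranked_lists, relevant):
    """Mean Reciprocal Rank: average over queries of 1/rank of the
    first relevant document in each ranked result list (0 if none)."""
    total = 0.0
    for ranking, rel in zip(ranked_lists, relevant):
        for rank, doc in enumerate(ranking, start=1):
            if doc in rel:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

def recall_at_k(ranked_lists, relevant, k):
    """Fraction of queries with at least one relevant document
    among the top-k retrieved results."""
    hits = sum(
        1
        for ranking, rel in zip(ranked_lists, relevant)
        if any(doc in rel for doc in ranking[:k])
    )
    return hits / len(ranked_lists)

# Toy example: two queries over hypothetical guideline chunks.
rankings = [["c12", "c7", "c3"], ["c5", "c9", "c1"]]
relevant = [{"c7"}, {"c5"}]
print(mrr(rankings, relevant))            # (1/2 + 1/1) / 2 = 0.75
print(recall_at_k(rankings, relevant, 1)) # only query 2 hits at rank 1 -> 0.5
```

A Recall@1 of 81% therefore means the top-ranked chunk was relevant for 81% of queries, while 99.1% @10 means a relevant chunk almost always appeared somewhere in the top ten.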
Not aiming for a script—just unsafe output. 🚨🧨🔍🛡️ This paper introduces a gradient-based untargeted jailbreak attack (UJA) that seeks any unsafe response rather than a specific target. It maximizes a judged “unsafety” signal by decomposing a non-differentiable objective into differentiable parts for optimizing both harmful responses and adversarial prompts. Reported
Words carry facts; voices carry feelings. 🎙️🧩😊📊 This study separates descriptive semantics (the content) from expressive semantics (the emotion). After viewing emotional movie segments, participants described their experiences. Findings: descriptive semantics align with intended emotions, while expressive semantics correlate with evoked emotions (valence/arousal). Why it matters: Speech Emotion Recognition can
Two tribes, one problem. 🎭📏🧠🔧 This position paper argues that reward models (used in post-training) and evaluation metrics tackle the same task: judging output quality. Yet they’ve grown apart—duplicating terms and mistakes. The authors survey both areas, show cases where metrics beat reward models, and chart shared challenges: spurious
Make agents talk, think, and team up better. 🤖🤝🗣️🎮 This work enhances a framework for Collaborative Embodied Agents powered by LLMs. Through prompt-engineering studies and model selection, the best combo improved efficiency by 22% (with Gemma3) versus the original system. They also add speech capabilities for smoother voice-based collaboration in shared
“No real victim” is a dangerous myth. 🚫🧒⚠️🔒 This paper reviews how AI-generated child sexual abuse material (AI CSAM) can still cause harm: creating synthetic depictions, revictimizing known survivors, facilitating grooming and extortion, normalizing exploitation, and lowering barriers that may lead some users toward offending. Like a slippery slope disguised as
10,497 examples, 13 tasks: a holistic yardstick for voice-first multimodal assistants. Voice assistants are rapidly evolving into multimodal agents that must hear, speak, and see. Yet evaluation has lagged behind capability. VoiceAssistant-Eval fills this gap with a comprehensive benchmark of 10,497 curated examples across 13 task categories, spanning
News
This is AI Feta, news about scientific AI research, a brand-new site by Kari Jaaskelainen that's just getting started. Things will be up and running here shortly, but you can subscribe in the meantime if you'd like to stay up to date and receive