SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus
Paper: http://arxiv.org/abs/2510.03160v1 Register: https://www.AiFeta.com
Paper: http://arxiv.org/abs/2510.02978v1 Register: https://www.AiFeta.com
Paper: http://arxiv.org/abs/2510.03161v1 Register: https://www.AiFeta.com
From pages to precise answers. 🏥🔎📚✅ This system queries UK NICE clinical guidelines with a hybrid retrieval stack over 10,195 chunks (from 300 guidelines). On 7,901 queries it reports an MRR of 0.814, with Recall of 81% @1 and 99.1% @10. In generation on 70 QA pairs, RAG-enhanced models achieved perfect context precision.
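For readers unfamiliar with the retrieval metrics quoted above, here is a minimal sketch of how MRR and Recall@k are typically computed. This is not the paper's code; the function names and toy data are illustrative only.

```python
def mrr(ranked_lists, relevant):
    """Mean Reciprocal Rank: average over queries of 1/rank of the
    first relevant document in each ranked result list (0 if none)."""
    total = 0.0
    for ranking, rel in zip(ranked_lists, relevant):
        for rank, doc in enumerate(ranking, start=1):
            if doc in rel:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

def recall_at_k(ranked_lists, relevant, k):
    """Fraction of queries with at least one relevant document
    among the top-k retrieved results."""
    hits = sum(
        1
        for ranking, rel in zip(ranked_lists, relevant)
        if any(doc in rel for doc in ranking[:k])
    )
    return hits / len(ranked_lists)

# Toy example: two queries over hypothetical guideline chunks.
rankings = [["c12", "c7", "c3"], ["c5", "c9", "c1"]]
relevant = [{"c7"}, {"c5"}]
print(mrr(rankings, relevant))            # (1/2 + 1/1) / 2 = 0.75
print(recall_at_k(rankings, relevant, 1)) # only query 2 hits at rank 1 -> 0.5
```

A Recall@1 of 81% therefore means the top-ranked chunk was relevant for 81% of queries, while 99.1% @10 means a relevant chunk almost always appeared somewhere in the top ten.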
Not aiming for a script—just unsafe output. 🚨🧨🔍🛡️ This paper introduces a gradient-based untargeted jailbreak attack (UJA) that seeks any unsafe response rather than a specific target. It maximizes a judged “unsafety” signal by decomposing a non-differentiable objective into differentiable parts for optimizing both harmful responses and adversarial prompts. Reported
Words carry facts; voices carry feelings. 🎙️🧩😊📊 This study separates descriptive semantics (the content) from expressive semantics (the emotion). After viewing emotional movie segments, participants described their experiences. Findings: descriptive semantics align with intended emotions, while expressive semantics correlate with evoked emotions (valence/arousal). Why it matters: Speech Emotion Recognition can
Two tribes, one problem. 🎭📏🧠🔧 This position paper argues that reward models (used in post-training) and evaluation metrics tackle the same task: judging output quality. Yet they’ve grown apart—duplicating terms and mistakes. The authors survey both areas, show cases where metrics beat reward models, and chart shared challenges: spurious
Make agents talk, think, and team up better. 🤖🤝🗣️🎮 This work enhances a framework for Collaborative Embodied Agents powered by LLMs. Through prompt-engineering studies and model selection, the best combo improved efficiency by 22% (with Gemma3) versus the original system. They also add speech capabilities for smoother voice-based collaboration in shared
“No real victim” is a dangerous myth. 🚫🧒⚠️🔒 This paper reviews how AI-generated child sexual abuse material (AI CSAM) can still cause harm: creating synthetic depictions, revictimizing known survivors, facilitating grooming and extortion, normalizing exploitation, and lowering barriers that may lead some users toward offending. Like a slippery slope disguised as
10,497 examples, 13 tasks: a holistic yardstick for voice-first multimodal assistants. Voice assistants are rapidly evolving into multimodal agents that must hear, speak, and see. Yet evaluation has lagged behind capability. VoiceAssistant-Eval fills this gap with a comprehensive benchmark of 10,497 curated examples across 13 task categories, spanning
News
This is AI Feta, news about scientific AI research, a brand-new site by Kari Jaaskelainen that's just getting started. Things will be up and running here shortly, but you can subscribe in the meantime if you'd like to stay up to date and receive