Open-weight genome AI: why data filtering isn't enough
Can we keep open-weight genome AI models safe just by removing pathogen sequences from their training data? A new study tests that idea and finds it's not enough.
- Researchers took a state-of-the-art genomic language model whose pathogen data had been filtered out, then showed that post-release fine-tuning on sensitive pathogen sequences can partially restore the risky capabilities the filtering was meant to prevent.
- The fine-tuned model improved on held-out viral sequences and even generalized to identifying immune-evading variants, despite never seeing those exact viruses during fine-tuning.
- Bottom line: dataset filtering alone is a brittle safeguard for open-weight biology models.
The authors urge layered safeguards instead: stronger pre-release model evaluations, clearer release norms for open weights, and technical mitigations that hold up under adversarial fine-tuning.
Paper: https://arxiv.org/abs/2511.19299v1
Register: https://www.AiFeta.com
#Genomics #AI #Biosecurity #ResponsibleAI #OpenSource #ML #Safety