Open-weight genome AI: why data filtering isn't enough

Can we keep open-weight genome AI models safe just by removing pathogen sequences from their training data? A new study tests that idea and finds it's not enough.

  • Researchers assessed a state-of-the-art genomic language model and showed that fine-tuning it on sensitive pathogen data after release can partially restore the risky capabilities that data filtering was meant to remove.
  • After fine-tuning, the model performed better on unseen viral sequences and even generalized to spotting immune-evading variants, although those exact viruses were not in its fine-tuning data.
  • Bottom line: dataset filtering alone is brittle for safeguarding open-weight biology models.

The authors urge layered safeguards: stronger model evaluations, clearer release norms for open weights, and technical mitigations that hold up under adversarial fine-tuning.

Paper: https://arxiv.org/abs/2511.19299v1

#Genomics #AI #Biosecurity #ResponsibleAI #OpenSource #ML #Safety

Read more