Learning the Neighborhood: Contrast-Free Multimodal Self-Supervised Molecular Graph Pretraining

C-FREE fuses 2D graphs with 3D conformers using ego-nets—no negatives required

High-quality molecular representations often require large labeled datasets or fragile contrastive schemes. C-FREE offers a simpler, more powerful path: contrast-free, multimodal pretraining on both 2D topology and ensembles of 3D conformers. The core idea is to predict subgraph (ego-net) embeddings from their complementary neighborhoods in latent space, encouraging models to capture the mutual information between local structure and its broader molecular context—without negatives, positional encodings, or heavy preprocessing.

Technically, C-FREE uses fixed-radius ego-nets as consistent modeling units across conformers and integrates geometric and topological cues with a hybrid GNN–Transformer backbone. Pretrained on GEOM, a dataset rich in conformational diversity, it achieves state-of-the-art results on MoleculeNet, outperforming contrastive, generative, and other multimodal self-supervised methods.

  • Contrast-free objective: avoids negative sampling pitfalls.
  • Multimodal fusion: unifies 2D graphs and 3D conformers seamlessly.
  • Ego-net prediction: learns from complementary neighborhoods to encode context.
  • Simple and efficient: no positional encodings or expensive preprocessing.
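The ego-net prediction objective above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: mean-pooled feature averages stand in for the hybrid GNN–Transformer encoders, and the function names (`ego_net`, `contrast_free_loss`) are hypothetical. The key point it shows is that the loss compares a subgraph embedding only to its complementary-context embedding, so no negative samples appear anywhere.

```python
import numpy as np

def ego_net(adj, center, radius):
    """Return the nodes within `radius` hops of `center` (BFS on an adjacency matrix)."""
    frontier, seen = {center}, {center}
    for _ in range(radius):
        nxt = set()
        for u in frontier:
            nxt |= {v for v in range(len(adj)) if adj[u][v]}
        frontier = nxt - seen
        seen |= nxt
    return sorted(seen)

def contrast_free_loss(node_feats, adj, center, radius=1):
    """Predict the ego-net embedding from its complementary neighborhood.

    Mean pooling is a placeholder for learned encoders; the loss is a
    cosine-similarity regression between the two views -- contrast-free,
    since no negatives are ever sampled.
    """
    n = len(node_feats)
    sub = ego_net(adj, center, radius)
    comp = [v for v in range(n) if v not in sub]
    z_sub = node_feats[sub].mean(axis=0)    # subgraph (ego-net) embedding
    z_ctx = node_feats[comp].mean(axis=0)   # complementary-context embedding
    cos = z_sub @ z_ctx / (np.linalg.norm(z_sub) * np.linalg.norm(z_ctx))
    return 1.0 - cos
```

For example, on a 4-node path graph with `center=1` and `radius=1`, the ego-net is `[0, 1, 2]` and the loss is computed against the single remaining context node. In the actual method, both views would be encoded from 2D topology plus 3D conformer geometry before this comparison.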

Beyond benchmarks, fine-tuning on diverse datasets shows strong transfer to new chemical domains and molecule sizes. The message is clear: when 3D information is plentiful, pretraining strategies that learn the neighborhood—rather than contrast it—can unlock robust, generalizable molecular representations for property prediction, design, and discovery.

Paper: http://arxiv.org/abs/2509.22468v1
Register: https://www.AiFeta.com

#GraphML #GNN #ChemInformatics #SelfSupervised #MoleculeNet #3DConformers #RepresentationLearning

Read more