Learning the Neighborhood: Contrast-Free Multimodal Self-Supervised Molecular Graph Pretraining
C-FREE fuses 2D graphs with 3D conformers using ego-nets—no negatives required
High-quality molecular representations often require large labeled datasets or fragile contrastive schemes. C-FREE offers a simpler, more powerful path: contrast-free, multimodal pretraining on both 2D topology and ensembles of 3D conformers. The core idea is to predict subgraph (ego-net) embeddings from their complementary neighborhoods in latent space, encouraging models to capture the mutual information between local structure and its broader molecular context—without negatives, positional encodings, or heavy preprocessing.
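For intuition, here is a minimal PyTorch sketch of such a predictive, negative-free objective. The predictor, loss, and the SimSiam-style stop-gradient on the target are illustrative assumptions, not the paper's exact formulation:

```python
# Sketch of a contrast-free latent-prediction loss (hypothetical names;
# the paper's exact objective and stop-gradient scheme may differ).
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeighborhoodPredictor(nn.Module):
    """Predicts an ego-net embedding from its complementary-context embedding."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, context_emb: torch.Tensor) -> torch.Tensor:
        return self.head(context_emb)

def contrast_free_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Regress the prediction onto the (detached) target embedding.
    # No negative pairs are involved: the loss is purely predictive.
    pred = F.normalize(pred, dim=-1)
    target = F.normalize(target.detach(), dim=-1)  # stop-gradient (assumption)
    return -(pred * target).sum(dim=-1).mean()

# Usage: z_ego / z_ctx would come from encoding the ego-net and its
# complementary neighborhood, respectively.
predictor = NeighborhoodPredictor(dim=256)
z_ego, z_ctx = torch.randn(32, 256), torch.randn(32, 256)
loss = contrast_free_loss(predictor(z_ctx), z_ego)
loss.backward()
```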
Technically, C-FREE uses fixed-radius ego-nets as consistent modeling units across conformers and integrates geometric and topological cues with a hybrid GNN–Transformer backbone. Pretrained on GEOM, a dataset rich in conformational diversity, it achieves state-of-the-art results on MoleculeNet, outperforming contrastive, generative, and other multimodal self-supervised methods.
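To make the modeling unit concrete, the sketch below extracts a fixed-radius ego-net and its complementary neighborhood using PyTorch Geometric's `k_hop_subgraph`. The radius and the complement definition are illustrative assumptions:

```python
# Sketch: splitting a molecular graph into an r-hop ego-net and its complement.
import torch
from torch_geometric.utils import k_hop_subgraph

def split_ego_net(edge_index: torch.Tensor, num_nodes: int,
                  center: int, radius: int = 2):
    """Return node indices of the r-hop ego-net around `center` and of its
    complementary neighborhood (all remaining nodes)."""
    ego_nodes, _, _, _ = k_hop_subgraph(
        center, radius, edge_index, relabel_nodes=False, num_nodes=num_nodes
    )
    mask = torch.ones(num_nodes, dtype=torch.bool)
    mask[ego_nodes] = False
    complement_nodes = mask.nonzero(as_tuple=False).view(-1)
    return ego_nodes, complement_nodes

# Example: a 6-node path graph, ego-net of radius 2 around node 0.
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3, 3, 4, 4, 5],
                           [1, 0, 2, 1, 3, 2, 4, 3, 5, 4]])
ego, ctx = split_ego_net(edge_index, num_nodes=6, center=0, radius=2)
print(ego.tolist(), ctx.tolist())  # [0, 1, 2] and [3, 4, 5]
```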
- Contrast-free objective: avoids negative sampling pitfalls.
- Multimodal fusion: unifies 2D graphs and 3D conformers in one hybrid backbone (see the sketch after this list).
- Ego-net prediction: learns from complementary neighborhoods to encode context.
- Simple and efficient: no positional encodings or expensive preprocessing.
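The following sketch shows one way a hybrid GNN–Transformer could fuse 2D topology with 3D conformer coordinates. Layer sizes, the single message-passing step, and the fusion order are assumptions for illustration, not the paper's exact architecture:

```python
# Sketch of hybrid GNN–Transformer fusion of 2D topology and 3D geometry.
import torch
import torch.nn as nn

class HybridEncoder(nn.Module):
    def __init__(self, node_dim: int = 64, hidden: int = 128, heads: int = 4):
        super().__init__()
        self.node_proj = nn.Linear(node_dim, hidden)
        self.geom_proj = nn.Linear(3, hidden)  # raw 3D coordinates (assumption)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=heads, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x, adj, pos):
        # x: (B, N, node_dim) atom features; adj: (B, N, N) adjacency;
        # pos: (B, N, 3) conformer coordinates.
        h = self.node_proj(x)
        h = h + torch.bmm(adj, h)    # one simple GNN message-passing step
        h = h + self.geom_proj(pos)  # inject 3D geometric cues
        return self.transformer(h)   # global mixing, no positional encodings

enc = HybridEncoder()
out = enc(torch.randn(2, 10, 64), torch.rand(2, 10, 10), torch.randn(2, 10, 3))
print(out.shape)  # torch.Size([2, 10, 128])
```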
Beyond benchmarks, fine-tuning on diverse datasets shows strong transfer to new chemical domains and molecule sizes. The message is clear: when 3D information is plentiful, pretraining strategies that learn the neighborhood—rather than contrast it—can unlock robust, generalizable molecular representations for property prediction, design, and discovery.
Paper: http://arxiv.org/abs/2509.22468v1
Register: https://www.AiFeta.com
#GraphML #GNN #ChemInformatics #SelfSupervised #MoleculeNet #3DConformers #RepresentationLearning