Learning the Neighborhood: Contrast-Free Multimodal Self-Supervised Molecular Graph Pretraining
C-FREE unifies 2D topology and 3D conformers—no negatives, no positional encodings, no heavy preprocessing.
Molecular representation learning often relies on contrastive schemes, hand-crafted augmentations, or complex generative objectives—frequently ignoring the rich 3D geometry that governs chemistry. C-FREE (Contrast-Free Representation learning on Ego-nets) offers a simpler, stronger path: learn from fixed-radius ego-nets across ensembles of 3D conformers and predict each subgraph’s embedding from its complementary neighborhood in latent space.
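The fixed-radius ego-net idea is easy to picture in code. The sketch below is a minimal, hypothetical illustration (the names `ego_net`, `adj`, and the complement construction are mine, not from the paper): a breadth-first search collects every atom within `radius` hops of a center atom, and the rest of the molecule forms the complementary neighborhood the model predicts from.

```python
from collections import deque

def ego_net(adj, center, radius):
    """Return the set of nodes within `radius` hops of `center`.

    `adj` is an adjacency dict: node -> list of neighbor nodes.
    A plain BFS that stops expanding once a node sits at the
    radius boundary. (Illustrative sketch, not the paper's code.)
    """
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # boundary node: keep it, but do not expand further
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

# Toy molecule as a path graph 0-1-2-3-4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
ego = ego_net(adj, center=2, radius=1)          # {1, 2, 3}
complement = set(adj) - ego                     # {0, 4}
```

Splitting a graph this way gives, for every center, a (subgraph, complementary context) pair that can be embedded and matched in latent space.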
This contrast-free objective integrates topological and geometric signals through a hybrid GNN–Transformer backbone, sidestepping negatives, positional encodings, and costly preprocessing. Training on the GEOM dataset leverages conformational diversity to align representations with chemically meaningful variability.
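A contrast-free objective of this kind can be sketched as latent-space regression, in the spirit of BYOL/JEPA-style predictive losses: the predicted subgraph embedding is pulled toward the embedding computed from its complementary neighborhood, with no negative pairs anywhere. This is an assumption-laden toy (the loss form and function names are illustrative, not taken from the paper):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrast_free_loss(predicted, target):
    """Regression loss in latent space: 2 - 2*cos(pred, target).

    Minimized when the prediction aligns with the target embedding;
    no negatives are sampled. (Hypothetical sketch of the objective.)
    """
    return 2.0 - 2.0 * cosine(predicted, target)

# Perfectly aligned embeddings give a loss near zero;
# misaligned ones are penalized.
aligned = contrast_free_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
opposed = contrast_free_loss([1.0, 0.0], [-1.0, 0.0])  # maximal: 4.0
```

Because the target comes from the complementary neighborhood of the same conformer rather than from augmented or negative views, the objective avoids the augmentation design and negative sampling that contrastive pipelines require.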
The payoff is state-of-the-art performance on MoleculeNet, outperforming contrastive, generative, and other multimodal self-supervised methods. Fine-tuning across varied dataset sizes and molecule types shows robust transfer, highlighting the importance of 3D-aware embeddings for downstream property prediction and molecular design.
Why it matters: better pretraining translates directly into improved hit-finding, ADMET prediction, and lead optimization—areas where labeled data is scarce and expensive. C-FREE’s simplicity also lowers engineering overhead, making high-quality molecular pretraining more accessible to both industry and academia.
Looking ahead, combining ego-net predictions with task-aware adapters, uncertainty estimation for conformer coverage, and protein–ligand co-representations could extend the framework to structure-based design.
Paper: http://arxiv.org/abs/2509.22468v1
Register: https://www.AiFeta.com
#AI #DrugDiscovery #GraphLearning #SelfSupervised #GNN #3D