LigUnity: Hierarchical affinity landscape navigation through learning a shared pocket-ligand space

Bin Feng¹, Zijing Liu¹, Hao Li¹, Mingjun Yang², Junjie Zou², He Cao¹, Yu Li¹, Lei Zhang¹, Sheng Wang³
¹International Digital Economy Academy (IDEA), ²XtalPi Co., Ltd., ³University of Washington

Overview

  • LigUnity is a unified foundation model for virtual screening and hit-to-lead optimization.
  • Outperforms 24 methods with >50% improvement in virtual screening.
  • Approaches FEP+ accuracy at far lower cost in hit-to-lead optimization.
  • Achieves a 10⁶-fold speedup compared to traditional docking methods like Glide-SP.
LigUnity Cover Image

We are excited to announce that our paper has been accepted by Patterns and is featured as the cover article for the October 2025 issue!

The ocean in the cover image symbolizes the human proteome, the complete set of proteins that carry out essential functions in our bodies. For a medicine to work, it often needs to interact with a specific protein. However, an estimated 90% of these proteins lack known small-molecule ligands with high activity. In the image, these proteins are represented as sailboats drifting in the dark.

At the center stands a lighthouse symbolizing the AI method LigUnity. Its beam illuminates several sailboats, guiding them toward glowing buoys, which symbolize the high-activity ligands found by LigUnity. The work by Feng et al. highlights the power of AI-driven computational methods to efficiently find active ligands and optimize their activity, opening up new therapeutic avenues for various diseases.

Abstract

Structure-based drug discovery involves two critical, sequential tasks: virtual screening to identify active compounds and hit-to-lead optimization to refine their potency. Existing computational methods often treat these tasks separately due to their conflicting speed-accuracy requirements. This separation prevents the synergy that could arise from a unified approach. We introduce LigUnity, a protein-ligand affinity foundation model that jointly addresses both tasks. LigUnity learns a shared embedding space for protein pockets and ligands by capturing both coarse-grained active/inactive distinctions (scaffold discrimination) and fine-grained affinity rankings (pharmacophore ranking). To enable this, we curated PocketAffDB, the largest structure-aware affinity database to date, containing 0.8 million data points. Our evaluations show that LigUnity sets a new state of the art, outperforming 24 methods in virtual screening and approaching the accuracy of costly physics-based methods like FEP+ in hit-to-lead optimization, all while being 10⁶ times faster than traditional docking.


How LigUnity Works


LigUnity's core innovation is its hierarchical approach to learning a shared pocket-ligand embedding space. This allows it to understand both global structure-activity relationships and subtle, affinity-determining chemical features. The entire pipeline is illustrated below.

LigUnity Overview
  • 1. Data Curation (PocketAffDB): We created PocketAffDB, a structure-aware binding database, by integrating large-scale affinity data from ChEMBL and BindingDB with 3D structures from the PDB. It contains 0.8 million affinity data points, 0.5 million unique ligands, and over 53,000 protein pockets.
  • 2. Hierarchical Pre-training: LigUnity is pre-trained to learn a shared embedding space where the cosine similarity between a pocket vector and a ligand vector correlates with binding affinity. This is achieved via two complementary objectives:
    • Scaffold Discrimination (Coarse-Grained): Using contrastive learning, the model learns to distinguish active from inactive compounds by pulling embeddings of known binding pairs closer together.
    • Pharmacophore Ranking (Fine-Grained): Using a listwise ranking loss, the model learns to order a series of active ligands according to their measured binding affinity for a given pocket.
  • 3. Task-Specific Inference: For downstream tasks, the model is adapted:
    • Virtual Screening: A Heterogeneous Graph Neural Network (H-GNN) refines the query pocket's embedding by aggregating information from similar pockets and their known binders, enabling rapid and accurate screening.
    • Hit-to-Lead Optimization: The model directly ranks candidate molecules based on embedding similarity and can be fine-tuned with a few experimental data points to achieve accuracy comparable to costly physics-based methods.
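
To make the two pre-training objectives above more concrete, here is a minimal, illustrative sketch in plain Python. It is not the released LigUnity code: the function names, the temperature value, the InfoNCE-style form of the contrastive loss, and the Plackett-Luce (ListMLE-style) form of the listwise loss are all assumptions chosen for illustration, and the toy vectors stand in for encoder outputs.

```python
import math

def cosine(u, v):
    """Cosine similarity between a pocket vector and a ligand vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def contrastive_loss(pocket, ligands, positive_index, temperature=0.1):
    """Coarse-grained objective (assumed InfoNCE form): the known binder
    should score higher than every other candidate for this pocket."""
    scores = [cosine(pocket, lig) / temperature for lig in ligands]
    return -log_softmax(scores)[positive_index]

def listwise_ranking_loss(pocket, ligands, affinities, temperature=0.1):
    """Fine-grained objective (assumed Plackett-Luce form): negative
    log-likelihood of the order given by measured affinity.
    Higher affinity value means a stronger binder here."""
    scores = [cosine(pocket, lig) / temperature for lig in ligands]
    order = sorted(range(len(ligands)), key=lambda i: affinities[i], reverse=True)
    loss = 0.0
    for k in range(len(order)):
        # Probability that the k-th best ligand wins among those not yet placed.
        remaining = [scores[i] for i in order[k:]]
        loss -= log_softmax(remaining)[0]
    return loss

# Toy 2-D embeddings (hypothetical values, not model outputs).
pocket = [1.0, 0.0]
ligands = [[0.9, 0.1], [0.5, 0.5], [-1.0, 0.2]]
affinities = [9.1, 7.4, 5.0]  # hypothetical pKd-like values

print(contrastive_loss(pocket, ligands, positive_index=0))
print(listwise_ranking_loss(pocket, ligands, affinities))
```

In this toy setup both losses are small because the cosine similarities already follow the affinity order; reversing the affinity list makes the listwise loss grow, which is the signal that would drive the embeddings toward an affinity-consistent ordering.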

Results


• Virtual Screening

On the DUD-E, DEKOIS 2.0, and LIT-PCBA benchmarks, LigUnity consistently and significantly outperforms 24 competing methods, including docking programs and other ML models. It achieves over a 50% improvement in Enrichment Factor (EF 1%) compared to the next-best structure-based methods and shows strong generalization to novel protein targets.
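
For readers unfamiliar with the metric, the sketch below shows how an enrichment factor at 1% is commonly computed: the hit rate among the top-scoring 1% of compounds divided by the hit rate in the whole library. The function name and the rounding rule for the size of the top fraction are assumptions made for this example; benchmark scripts may differ in such details.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """EF at a given fraction: hit rate in the top-ranked subset divided by
    the hit rate of the full library. labels are 1 (active) or 0 (inactive)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    # Assumed convention: round to the nearest integer, keep at least one compound.
    n_top = max(1, round(fraction * n_total))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(label for _, label in ranked[:n_top])
    return (hits_in_top / n_top) / (n_actives / n_total)

# Toy library: 200 compounds, 4 actives, two of them ranked first and second.
scores = [200 - i for i in range(200)]
labels = [0] * 200
for idx in (0, 1, 100, 150):
    labels[idx] = 1

print(enrichment_factor(scores, labels))  # top 1% = 2 compounds, both active
```

Here the top 1% contains 2 compounds, both active, against a background rate of 4/200, so the enrichment factor is 50. A random ranking would give a value near 1.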

Virtual Screening Results

• Hit-to-Lead Optimization

On two FEP benchmarks (Merck and JACS), LigUnity shows state-of-the-art performance in predicting binding free energies. When fine-tuned on just a few data points, LigUnity's accuracy becomes comparable to FEP+, a computationally intensive industry standard, positioning it as a powerful, cost-effective alternative. Importance scores calculated by the model also correctly identify key atoms and residues responsible for binding.
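
Ranking quality in hit-to-lead settings is often summarized with a rank correlation between predicted scores and measured affinities. The sketch below computes Spearman's correlation in plain Python as one such measure; it is an illustration only, the choice of metric and the toy numbers are assumptions, and it does not reproduce the evaluation protocol of the paper.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def spearman(predicted, measured):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(measured))

# Hypothetical similarity scores and pIC50-like measurements (higher = stronger).
predicted = [0.9, 0.7, 0.2, 0.5]
measured = [9.0, 8.1, 5.5, 6.0]
print(spearman(predicted, measured))
```

Note the sign convention: if the measured quantity is a binding free energy, where more negative means stronger binding, a good model yields a strongly negative correlation with similarity scores unless one of the two is negated first.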

Hit-to-Lead Optimization Results

BibTeX

@article{Feng2025LigUnity,
  title        = {Hierarchical affinity landscape navigation through learning a shared pocket-ligand space},
  author       = {Feng, Bin and Liu, Zijing and Li, Hao and Yang, Mingjun and Zou, Junjie and Cao, He and Li, Yu and Zhang, Lei and Wang, Sheng},
  journal      = {Patterns},
  volume       = {6},
  pages        = {101371},
  year         = {2025},
  publisher    = {Elsevier},
  doi          = {10.1016/j.patter.2025.101371}
}