03 PUBLICATIONS
Our Research
Our team publishes cutting-edge research in artificial intelligence, computational biology and drug discovery. Below are our latest peer-reviewed publications.
SurfDock is a surface-informed diffusion generative model for reliable and accurate protein–ligand complex prediction
Nature Methods 22, 310–322 (2025)
DOI: https://doi.org/10.1038/s41592-024-02516-y
Abstract
Accurately predicting protein–ligand interactions is crucial for understanding cellular processes. We introduce SurfDock, a deep-learning method that addresses this challenge by integrating protein sequence, three-dimensional structural graphs and surface-level features into an equivariant architecture. SurfDock employs a generative diffusion model on a non-Euclidean manifold, optimizing molecular translations, rotations and torsions to generate reliable binding poses. Our extensive evaluations across various benchmarks demonstrate SurfDock’s superiority over existing methods in docking success rates and adherence to physical constraints. It also exhibits remarkable generalizability to unseen proteins and predicted apo structures, while achieving state-of-the-art performance in virtual screening tasks. In a real-world application, SurfDock identified seven novel hit molecules in a virtual screening project targeting aldehyde dehydrogenase 1B1, a key enzyme in cellular metabolism. This showcases SurfDock’s ability to elucidate molecular mechanisms underlying cellular processes. These results highlight SurfDock’s potential as a transformative tool in structural biology, offering enhanced accuracy, physical plausibility and practical applicability in understanding protein–ligand interactions.
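The abstract describes diffusion over ligand translations, rotations and torsions rather than over raw atomic coordinates. As a rough illustration of what one such update does to a ligand, here is a minimal sketch. It assumes a DiffDock-style pose parameterization, with rigid rotation about the ligand centroid and each torsion rotating the atoms on one side of a rotatable bond. It is not SurfDock's code: there is no score network or noise schedule, and the function names (`update_pose`, `apply_torsion`) are hypothetical.

```python
import numpy as np


def axis_angle_to_matrix(rotvec):
    """Rodrigues' formula: rotation vector (unit axis * angle in radians) -> 3x3 matrix."""
    angle = np.linalg.norm(rotvec)
    if angle < 1e-12:
        return np.eye(3)
    k = rotvec / angle
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def apply_torsion(coords, bond, moving_atoms, delta):
    """Rotate the atoms in `moving_atoms` by `delta` radians about the bond axis i -> j."""
    i, j = bond
    axis = coords[j] - coords[i]
    R = axis_angle_to_matrix(axis / np.linalg.norm(axis) * delta)
    out = coords.copy()
    # Rotate about a point on the bond axis, so bond lengths and angles are preserved.
    out[moving_atoms] = (coords[moving_atoms] - coords[j]) @ R.T + coords[j]
    return out


def update_pose(coords, tr_update, rot_update, torsion_updates, torsion_bonds, torsion_masks):
    """Hypothetical single pose update: torsions, then rotation about the centroid, then translation."""
    out = coords.copy()
    for bond, mask, delta in zip(torsion_bonds, torsion_masks, torsion_updates):
        out = apply_torsion(out, bond, mask, delta)
    centroid = out.mean(axis=0)
    R = axis_angle_to_matrix(rot_update)
    return (out - centroid) @ R.T + centroid + tr_update
```

Because every update is a rigid motion or a rotation about a bond, this parameterization changes only the degrees of freedom that matter for docking, while bond lengths stay fixed by construction.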
AI-Driven Protein Design
Nature Reviews Bioengineering (2025)
DOI: https://doi.org/10.1038/s44222-025-00349-8
Abstract
Protein design is undergoing a revolution driven by artificial intelligence (AI), transforming how we engineer proteins for applications in drug discovery, biotechnology and synthetic biology. By navigating the immense complexity of protein sequence space and overcoming the limitations of structural and functional data, AI enables unprecedented precision and speed in designing novel proteins with tailored functions. Central to this Review is a comprehensive and actionable roadmap for designers, providing step-by-step guidance on how to integrate state-of-the-art AI tools into protein design workflows, including tools for structural and functional prediction as well as generative models for de novo design. To illustrate this roadmap in practice, we present case studies showcasing AI-driven protein design, from engineering therapeutic proteins to designing novel proteins that unlock enzyme functions and reprogramme biomolecular systems. Looking ahead, we outline future directions highlighting the vast potential of AI to revolutionize synthetic biology, expedite drug development and drive sustainable biotechnology, positioning it as a transformative force at the forefront of protein design.
Large language models for scientific discovery in molecular property prediction
Nature Machine Intelligence 7, 437–447 (2025)
DOI: https://doi.org/10.1038/s42256-025-00994-z
Abstract
Recent advances in large language models have demonstrated remarkable capabilities in various domains, including natural language processing and code generation. In this work, we extend these capabilities to molecular property prediction, a critical task in drug discovery. We propose a novel approach that leverages the knowledge embedded within large language models to enhance molecular property prediction, yielding more accurate and interpretable predictions than traditional methods.
Generic protein–ligand interaction scoring by integrating physical prior knowledge and data augmentation modelling
Nature Machine Intelligence 6, 261–270 (2024)
DOI: https://doi.org/10.1038/s42256-024-00849-z
Abstract
Developing robust methods for evaluating protein–ligand interactions has been a long-standing problem. Data-driven methods may memorize ligand and protein training data rather than learning protein–ligand interactions. Here we show a scoring approach called EquiScore, which utilizes a heterogeneous graph neural network to integrate physical prior knowledge and characterize protein–ligand interactions in equivariant geometric space. EquiScore is trained based on a new dataset constructed with multiple data augmentation strategies and a stringent redundancy-removal scheme. On two large external test sets, EquiScore consistently achieved top-ranking performance compared to 21 other methods. When EquiScore is used alongside different docking methods, it can effectively enhance the screening ability of these docking methods. EquiScore also showed good performance on the activity-ranking task of a series of structural analogues, indicating its potential to guide lead compound optimization. Finally, we investigated different levels of interpretability of EquiScore, which may provide more insights into structure-based drug design.
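The abstract describes characterizing protein–ligand interactions in equivariant geometric space. One common ingredient of such models is geometric edge features that do not change when the whole complex is rotated or translated. The sketch below shows this property on a hypothetical contact graph. It is not EquiScore's featurization: the function names and the 5 Å cutoff are assumptions chosen for illustration. It links ligand and protein atoms within the cutoff, uses the distance as the edge feature, and provides a helper for sampling a random rigid motion to check invariance.

```python
import numpy as np


def interaction_edges(protein_xyz, ligand_xyz, cutoff=5.0):
    """Hypothetical contact edges: (ligand_idx, protein_idx, distance) for pairs closer than `cutoff` (Å)."""
    diff = ligand_xyz[:, None, :] - protein_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    lig, prot = np.nonzero(dist < cutoff)
    return lig, prot, dist[lig, prot]


def random_rigid_motion(rng):
    """Random proper rotation (det = +1) and translation, for invariance checks."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one axis to turn a reflection into a rotation
    return q, rng.normal(scale=10.0, size=3)
```

Applying the same rotation `R` and translation `t` to both partners (`xyz @ R.T + t`) leaves the edge list and distances unchanged. A scoring network built on such features therefore does not depend on how the complex happens to be oriented in the input file.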
Physicochemical graph neural network for learning protein–ligand interaction fingerprints from sequence data
Nature Machine Intelligence 6, 673–687 (2024)
DOI: https://doi.org/10.1038/s42256-024-00847-1
Abstract
Understanding protein–ligand interactions is fundamental to drug discovery and development. We present a novel physicochemical graph neural network architecture designed to learn protein–ligand interaction fingerprints directly from sequence data. Our approach integrates physicochemical properties of amino acids and small molecules to create more informative representations, leading to improved prediction accuracy and interpretability of binding interactions.
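The abstract describes learning from sequence data enriched with physicochemical properties of amino acids. As a hedged illustration only, and not the published architecture, the sketch below turns a protein sequence into a simple residue graph. Each node carries one physicochemical descriptor, the Kyte–Doolittle hydropathy scale, chosen here as an assumed example feature. Edges link residues within a hypothetical `window` of sequence positions. A real model would use many more descriptors and learned message passing on top of such a graph.

```python
import numpy as np

# Kyte–Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_graph(seq, window=1):
    """Hypothetical residue graph from a sequence.

    Returns node features of shape (len(seq), 1) and undirected edges (i, j), i < j,
    connecting residues at most `window` positions apart. Non-standard letters raise KeyError.
    """
    x = np.array([[KD[aa]] for aa in seq], dtype=float)
    edges = [(i, j)
             for i in range(len(seq))
             for j in range(i + 1, min(i + window + 1, len(seq)))]
    return x, edges
```

For example, `sequence_graph("MKV")` yields hydropathy features 1.9, -3.9 and 4.2 with edges `(0, 1)` and `(1, 2)`. Raising `window` adds longer-range sequence contacts.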