Protein function annotation and drug discovery often involve finding small-molecule binders. In the early stages of drug discovery, virtual ligand screening (VLS) is frequently applied to identify possible hits before experimental testing. Our recent ligand homology modeling (LHM)-machine learning VLS method, FRAGSITE, outperformed approaches that combine traditional docking to generate protein-ligand poses with deep learning scoring functions to rank ligands; nevertheless, a more robust approach that can identify a more diverse set of binding ligands is needed. Here, we describe FRAGSITE2, which shows significant improvement for protein targets that lack known small-molecule binders and have no confident LHM-identified template ligands, when benchmarked on two commonly used VLS datasets. For the 102-target DUD-E set, considering ligands with a Tanimoto coefficient (TC) <0.7 to the template ligands, the 1% enrichment factor (EF1%) of FRAGSITE2 is 33.4, versus 21.6 for FRAGSITE and 8.9 for FINDSITEcomb2.0, an earlier LHM algorithm. For the 81-target DEKOIS2.0 set with a TC<0.7 to template ligands, FRAGSITE2 has an EF1% of 23.6, FRAGSITE has an EF1% of 11.7, and FINDSITEcomb2.0 has an EF1% of only 3.1. For the DUD-E set, the deep learning DenseFS scoring function has an area under the precision-recall curve (AUPR) of 0.443, while FRAGSITE2 has an AUPR of 0.465. In a comparison with RF-score-VS on the 76-target subset of DEKOIS2.0, with a TC<0.99 to the training DUD-E ligands, FRAGSITE2 has an EF1% of 20.2 versus 9.8 for RF-score-VS. Thus, FRAGSITE2 is a promising approach that can discover novel hits for protein targets.
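For readers unfamiliar with the two quantities used in the benchmarks above, the following is a minimal sketch of how an enrichment factor and a Tanimoto coefficient are commonly computed. It is not the FRAGSITE2 implementation. The function names, the rounding of the top-1% cutoff, the handling of score ties, and the representation of a fingerprint as a set of "on" bit indices are all assumptions made for illustration; the abstract does not specify these conventions.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Hit rate in the top-ranked fraction divided by the overall hit rate.

    scores: predicted scores, higher meaning more likely to bind.
    labels: 1 for a true binder, 0 for a decoy (same order as scores).
    The cutoff size is rounded to the nearest integer (an assumed convention).
    """
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and equal in length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active compound is required")
    n_top = max(1, round(fraction * n_total))
    # Rank compounds from best to worst score; ties keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(label for _, label in ranked[:n_top])
    return (hits_in_top / n_top) / (n_actives / n_total)


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return len(bits_a & bits_b) / union
```

Under this definition, a screen of 1000 compounds containing 20 binders, with 5 of them ranked in the top 10, gives an EF1% of about 25, and a random ranking gives about 1. A filter such as TC<0.7 would then keep only ligands whose similarity to the template ligands falls below that threshold.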
NOTE:
- This web service is freely available to all academic users and not-for-profit institutions.
- Commercial users wishing an evaluation copy should contact skolnick@gatech.edu.