From evolutionary inference and function annotation to structural prediction, protein sequence comparison has provided crucial biological insights. While many sequence alignment algorithms have been developed, existing approaches often fail to detect hidden structural relationships in the “twilight zone” of low sequence identity. To address this critical problem, we introduce a computational algorithm that performs protein Sequence Alignments from deep-Learning of Structural Alignments (SAdLSA, silent “d”). The key idea is to implicitly learn the protein folding code from many thousands of structural alignments. We demonstrate that our models trained on α-helical domains can be successfully transferred to recognize sequences encoding β-sheet domains. Training and benchmarking on a larger, highly challenging data set show significant improvement over established approaches.
Notice: This server is freely available to all academic and non-commercial users.
Commercial users: to use this server or to request an evaluation copy, please email Dr. Jeffrey Skolnick (skolnick@gatech.edu).
If you find this service useful, please cite the following papers:
Gao, M, Skolnick, J. 2021. A novel sequence alignment algorithm based on deep learning of the protein folding code. Bioinformatics. 37(4): 490-496. doi: 10.1093/bioinformatics/btaa810.
Gao, M, Skolnick, J. 2021. A general framework to learn tertiary structure for protein sequence annotation. Front. Bioinform. 1: 689960. doi: 10.3389/fbinf.2021.689960.
Download Software
- Benchmark training and test sets
- Deep-learning models and scripts for benchmarking (85 MB)
- Sequence profiles for benchmarking (80 MB)
- PDB coordinates of SCOP domains used for training and testing (311 MB)
Please send questions and comments to Dr. Mu Gao (mu.gao@gatech.edu).