Research Focus
- Systems Biology, Computational Biology, and Bioinformatics
- Cancer Metabolomics
- Prediction of protein tertiary and quaternary structure and folding pathways in proteomes
- Prediction of membrane protein tertiary structure
- Prediction of DNA-binding proteins
- Protein Evolution
- Protein Function Prediction
- Prediction of small molecule ligands for drug discovery
- Prediction of druggable protein targets
- Drug Design
- Automatic assignment of enzymes to metabolic pathways
- Simulation of Virtual Cells
One of the goals of genome sequencing projects is to develop tools for comparing and interpreting the resulting genomic information. In particular, one would like to be able to identify protein function from sequence. Our group is developing a series of tools to achieve this objective based on the sequence-structure-function paradigm. To this end, we are developing algorithms that predict protein structure from sequence, including both ab initio folding tools and threading methods. Our ab initio folding approaches appear capable of predicting low-resolution structures for a substantial fraction of small, single-domain proteins. If a limited amount of experimental data is provided, substantially larger systems can be handled. Furthermore, we have shown that such low-resolution models can be used to identify active sites in proteins, so that structural information can be employed to predict protein function. This suggests a means for the large-scale functional screening of genomic sequence databases: first predict structure from sequence, then identify functional active sites in the predicted structure. This opens up the possibility of screening entire genomes to identify proteins having a specified biochemical activity. Finally, such proteins are analyzed in the context of their role in various metabolic pathways.
Major Contributions to the Understanding of Physical and Biochemical Properties of Proteins
- Developed the first coarse-grained lattice models for protein structure prediction. Elucidated many of the general principles of protein folding, including the presence of flickering native-like secondary structure in the denatured state. He then applied coarse-grained protein models to simulate the folding pathway of plastocyanin, incorporating a very early application of statistical potentials to protein structure prediction. This was subsequently extended to the first successful structure predictions of protein A, ROP and crambin.
- Developed the first multiscale modeling approach to protein structure prediction, which was then successfully applied to predict the quaternary structure of coiled coils. For GCN4, the predicted models were very close to the crystal structure.
- Developed the TASSER protein structure prediction algorithm, which is among the world’s best structure prediction algorithms as demonstrated by its performance in CASP6-CASP8. This algorithm can yield biologically useful predictions for ~75% of the proteins in a proteome and was the first algorithm that generated better models than those provided by the input template structures. Extension of TASSER to the prediction of protein-protein interactions and quaternary structures yielded a powerful approach that is highly competitive with high-throughput experiments. He applied TASSER to the first comprehensive protein structure prediction of all human GPCRs, with an estimated 90% success rate.
- Developed a successful effective medium model for predicting membrane peptide conformation and orientation with respect to the membrane that combined both knowledge-based and physics-based approaches.
- Demonstrated that for single domain proteins, the library of solved protein structures is likely complete and arises from packing compact, hydrogen bonded secondary structural elements. Furthermore, he recently showed that protein structure space is highly connected and continuous. These results do not depend on evolution, rather just on the physics of protein structures. Thus, this highly insightful work shows that evolutionary divergence need not be invoked to explain the continuous nature of protein structure space; rather, it is an intrinsic feature of protein structures that evolution likely exploited.
- Developed Fuzzy Functional Forms for enzyme functional inference, a very early approach to the use of predicted structure to infer function. This was followed by the development of the very powerful EFICAz enzyme functional inference approach, which is widely used for the automated annotation of proteins; indeed, it is part of the annotation pipeline at the Broad Institute.
- Developed the FINDSITE and FINDSITELHM protein-structure-based ligand homology modeling algorithms for the prediction of protein molecular function, protein binding sites, and rapid ligand screening. This approach can be successfully applied to low-to-moderate resolution predicted protein structures with accuracy comparable to that obtained when high-resolution experimental structures are used. He further showed that across a distantly related set of proteins, the ligands contain a common anchor region whose chemical identity and binding pose are strongly conserved and a variable region that likely imparts specificity. These findings, when combined with proteome-scale protein structure prediction, will result in improved approaches to drug discovery: better lead molecules as well as a systematic approach for minimizing cross-reactivity and thereby minimizing side effects.
- Developed the widely used TM-align structural alignment algorithm, which is one of the most sensitive protein structure comparison tools. TM-align uses the TM-score, which is a length-independent measure of the statistical significance of the structural alignment of a pair of proteins. This was the first approach that addressed in a robust manner the statistical significance of a structural alignment.
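To make the length independence of the TM-score mentioned above concrete, the following is a minimal Python sketch of the score for one fixed superposition. It assumes the commonly quoted form, TM-score = (1/L) * sum over aligned residue pairs of 1 / (1 + (d_i/d0)^2), with d0 = 1.24 * (L - 15)^(1/3) - 1.8 angstroms, where L is the target length and d_i is the distance between the i-th pair of aligned residues. The function names and the 0.5 angstrom floor on d0 for short chains are assumptions of this sketch, and the search over superpositions that TM-align performs to maximize the score is not shown.

```python
def tm_score_d0(target_length):
    """Length-dependent distance scale d0, in angstroms.

    Assumes d0 = 1.24 * (L - 15)^(1/3) - 1.8. The 0.5 angstrom floor for
    short chains is an assumption of this sketch; it also avoids taking
    the cube root of a negative number.
    """
    if target_length <= 15:
        return 0.5
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)


def tm_score(aligned_distances, target_length):
    """TM-score of one fixed superposition.

    aligned_distances: distances (angstroms) between aligned residue pairs.
    target_length: number of residues in the target protein, used for
    normalization so that scores are comparable across protein sizes.
    """
    d0 = tm_score_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances)
    return total / target_length


if __name__ == "__main__":
    # Hypothetical distances for a 100-residue target with 80 aligned pairs.
    distances = [1.0] * 60 + [4.0] * 20
    print(round(tm_score(distances, 100), 3))
```

Because each pair contributes at most 1 and the sum is divided by the target length, the score lies between 0 and 1, and a pair separated by exactly d0 contributes one half.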
Contributions to Polymer Physics:
- Developed an analytical theory of the calculation of the electrostatic persistence length in polyelectrolytes, now known as the Odijk-Skolnick-Fixman electrostatic persistence length.
- Explored the kinetics of local conformational transitions in polymers and demonstrated that crankshaft motions are not required for local main chain dynamics. He subsequently developed a damped orientational diffusion model of local polymer motion that successfully described local polymer dynamic behavior.
- Provided a molecular explanation of the apparently contradictory experimental data suggesting that chains in polymer melts obey ideal chain statistics (as conjectured by Flory) and yet give rise to depolarized light scattering spectra that suggest local chain ordering.
- Performed simulations of dense polymer systems that suggested that the glass transition emerges when the diffusion of the local free volume pockets becomes localized, i.e., when they are below the percolation threshold.
- Elucidated the molecular nature of the ring flip mechanism in polycarbonate that is responsible for its robust dynamic response to stress.
- Performed some of the very first polymer melt simulations that explored whether reptation exists.
- Extended helix-coil theory to treat the equilibrium and dynamic properties of two-chain, coiled coils. This theory successfully predicts the thermal denaturation behavior of two-chain, coiled coils such as tropomyosin and its fragments.
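As an illustration of the Odijk-Skolnick-Fixman electrostatic persistence length named in the first item of this list, the sketch below evaluates the commonly quoted result l_e = l_B / (4 * kappa^2 * A^2), where l_B is the Bjerrum length, 1/kappa is the Debye screening length, and A is the contour spacing between charges. The numerical constants (l_B of about 0.714 nm and a Debye length of about 0.304 / sqrt(I) nm for a 1:1 salt in water at 25 °C), the function names, and the optional counterion-condensation step that replaces A by l_B when A < l_B are all assumptions of this sketch rather than details taken from the text above.

```python
import math

# Approximate Bjerrum length of water at 25 C, in nm (assumed value).
BJERRUM_LENGTH_NM = 0.714


def debye_length_nm(ionic_strength_molar):
    """Debye screening length for a 1:1 electrolyte in water at 25 C.

    Uses the common approximation 0.304 / sqrt(I) nm, with I in mol/L.
    """
    return 0.304 / math.sqrt(ionic_strength_molar)


def osf_persistence_length_nm(charge_spacing_nm, ionic_strength_molar,
                              bjerrum_nm=BJERRUM_LENGTH_NM,
                              condensation=True):
    """Electrostatic persistence length l_e = l_B / (4 kappa^2 A^2), in nm.

    If condensation is True and the charge spacing A is smaller than the
    Bjerrum length, A is replaced by l_B (an assumed treatment of
    counterion condensation), giving l_e = 1 / (4 kappa^2 l_B).
    """
    kappa = 1.0 / debye_length_nm(ionic_strength_molar)
    spacing = charge_spacing_nm
    if condensation and spacing < bjerrum_nm:
        spacing = bjerrum_nm
    return bjerrum_nm / (4.0 * kappa ** 2 * spacing ** 2)


if __name__ == "__main__":
    # Hypothetical chain with one charge every 1.0 nm at several salt levels.
    for salt in (0.001, 0.01, 0.1):
        print(salt, osf_persistence_length_nm(1.0, salt))
```

In this form the electrostatic contribution scales with the square of the Debye length, so it falls in inverse proportion to the ionic strength as salt is added.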