One important problem the organizers of CASP (Critical Assessment of protein Structure Prediction) have been struggling with is to define whether, and in which types of applications, protein structure prediction will be useful or, more generally, to describe the biological and functional relevance of CASP predictions. Although this question is very general, we signed up to answer some related questions as part of the CASP14 evaluation team. In collaboration with the Kozakov lab (Stony Brook University), we plan to focus on how well the binding properties of the CASP target proteins are conserved in the best models. We intend to perform three types of analysis, as follows.
1. Determining the binding hot spots, i.e., regions of the protein that contribute the most to the free energy of binding any ligand. We map the protein surface using our FTMap program, which places small organic molecules as probes on the protein surface. The number of probes interacting with each protein residue provides a “binding strength fingerprint”. We map both the native structure and the model, and calculate the correlation coefficient between the two fingerprints to measure the conservation of small-molecule binding properties. The advantage of this analysis is that it can be performed even for unannotated proteins.
2. If the protein has known ligand binding sites, the correlation of the fingerprint vectors will also be calculated over the binding site residues alone. If the protein has a known ligand, the ligand will be docked to both the X-ray structure and the model.
3. If the target protein is known to interact with one or more other proteins, protein-protein docking will be performed using both the X-ray structure and the model to determine whether the partners can still be placed correctly when starting from the model. We will also attempt to predict the complex using template-based docking. The aim is to determine whether, when the goal is exploring protein-protein interactions, simultaneous docking of the partner proteins is preferable to modeling the proteins separately and then docking the models.
The CASP14 models will be evaluated in the fall of 2020. To explore our methodology and its potential outcomes, we performed similar calculations for a subset of the models submitted to CASP12, and we plan to publish the results in a short paper.
Preliminary studies of the CASP12 refinement targets reveal that models with a high Global Distance Test Total Score (GDT_TS) are more likely to produce FTMap results similar to those of the experimentally determined structure. FTMap similarity is assessed by calculating the Pearson correlation between the number of probes in contact with each residue in the model and in the experimentally determined structure. In the figure below, each point represents the average score for the top 5 ranked models of a refinement target, and the error bars show the standard deviation.
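The fingerprint comparison described above can be illustrated with a minimal Python sketch. It is not the FTMap pipeline itself: the function names, the dictionary layout (residue identifier mapped to probe count), the treatment of missing residues as zero contacts, and the use of the sample standard deviation are all assumptions made for illustration. The optional `residues` argument mimics restricting the comparison to binding site residues.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one fingerprint is constant
    return sxy / sqrt(sxx * syy)


def fingerprint_similarity(native, model, residues=None):
    """Correlate per-residue probe counts of a native structure and a model.

    native, model: dict mapping residue id -> number of probes in contact
                   (hypothetical layout, not an FTMap output format).
    residues:      optional subset of residue ids, e.g. a known binding site.
    Residues absent from a dict are assumed to have zero probe contacts.
    """
    keys = sorted(residues) if residues is not None else sorted(set(native) | set(model))
    x = [native.get(k, 0) for k in keys]
    y = [model.get(k, 0) for k in keys]
    return pearson(x, y)


def summarize_target(native, models):
    """Mean and sample standard deviation of similarity over a target's models."""
    scores = [fingerprint_similarity(native, m) for m in models]
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


if __name__ == "__main__":
    # Toy probe counts, invented purely to exercise the functions.
    native = {10: 4, 11: 0, 12: 7, 13: 1}
    model_a = {10: 8, 11: 0, 12: 14, 13: 2}
    model_b = {10: 1, 11: 5, 12: 2, 13: 6}
    print(fingerprint_similarity(native, model_a))
    print(fingerprint_similarity(native, model_b, residues=[10, 12, 13]))
    print(summarize_target(native, [model_a, model_b]))
```

Because the Pearson coefficient is invariant to linear rescaling, a model that attracts twice as many probes at every residue still scores as perfectly conserved; only the relative distribution of probes over residues matters.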