# PROTEIN-PROTEIN DOCKING

Protein-protein interactions play a central role in various aspects of the structural and functional organization of the cell, and their elucidation is crucial for a better understanding of processes such as metabolic control, signal transduction, and gene regulation. Genome-wide proteomics studies, primarily yeast two-hybrid assays, will provide an increasing list of interacting proteins, but only a small fraction of the potential complexes will be amenable to direct experimental analysis. Thus, it is important to develop docking methods that can elucidate the details of specific interactions at the atomic level.

## OUR MULTISTAGE APPROACH TO PROTEIN-PROTEIN DOCKING

Our procedure starts with a rigid-body global search based on the Fast Fourier Transform (FFT) correlation approach, which evaluates the energies of billions of docked conformations on a grid. In ClusPro 1.0 we use the docking programs DOT and ZDOCK, but in ClusPro 2.0 we have changed to our new program PIPER[1]. PIPER is also FFT-based, but the method is extended to work with pairwise interaction potentials. With DOT and ZDOCK we retain 20,000 and 2,000 conformations, respectively, and then reduce the number of structures with rigid-body filters based on empirical potentials and electrostatics calculations. Because PIPER uses a more accurate pairwise potential, it is enough to retain 1000 structures, and the filtering step is not needed. The retained structures are clustered using the pairwise RMSD as the distance measure and a fixed or variable clustering radius[2]. We have shown that the 30 largest clusters contain at least one near-native structure (defined as having less than 10 Å RMSD from the ligand in the x-ray structure, calculated for ligand atoms that are within 10 Å of the fixed receptor) for 93% of the complexes[3] in the protein docking benchmark set. The structures in these clusters are refined by a novel medium-range optimization method called SDU (Semi-Definite programming based Underestimation)[4], which was developed to locate the global energy minima within the regions of the conformational space defined by the separate clusters. The procedure was used in the last rounds of CAPRI with very good results.
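The near-native criterion quoted above can be written down as a short NumPy sketch. It is an illustration under stated assumptions, not the code used in ClusPro: the function names are hypothetical, all coordinates are assumed to be in the frame of the fixed receptor, and the interface atoms are assumed to be selected from the ligand position in the x-ray structure.

```python
import numpy as np

def interface_ligand_rmsd(receptor, ligand_native, ligand_model, cutoff=10.0):
    """RMSD over the ligand atoms that lie within `cutoff` Å of the receptor.

    All inputs are (N, 3) coordinate arrays in the receptor frame.
    ligand_native and ligand_model must list the same atoms in the same order.
    Assumption: interface atoms are chosen from the native ligand position.
    """
    # Distance from every native ligand atom to every receptor atom.
    d = np.linalg.norm(ligand_native[:, None, :] - receptor[None, :, :], axis=-1)
    interface = d.min(axis=1) < cutoff
    if not interface.any():
        raise ValueError("no ligand atoms within the cutoff of the receptor")
    diff = ligand_model[interface] - ligand_native[interface]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def is_near_native(receptor, ligand_native, ligand_model,
                   cutoff=10.0, threshold=10.0):
    """True if the interface ligand RMSD is below `threshold` Å."""
    rmsd = interface_ligand_rmsd(receptor, ligand_native, ligand_model, cutoff)
    return rmsd < threshold
```

The all-pairs distance matrix is convenient for a sketch; for full-size proteins a spatial index such as a k-d tree would avoid the quadratic memory cost.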

## PIPER: FFT-BASED DOCKING WITH PAIRWISE POTENTIALS

PIPER performs exhaustive evaluation of an energy function in the discretized 6D space of mutual orientations of two proteins. We sample 70,000 rotations, which approximately corresponds to sampling every 5 degrees in the space of Euler angles. In translational space the sampling is defined by the 1.2 Å grid cell size. The energy-like scoring function describing the receptor-ligand interactions is defined on this grid and is efficiently calculated using Fast Fourier Transforms. Results are clustered with a 10 Å cube size, and one or several lowest energy translations for the given rotation are retained. Finally, results from different rotations are collected and sorted. The novelty of the PIPER algorithm is that the scoring function includes an energy term of the form $E_{\mathrm{pair}} = \sum_i \sum_j \varepsilon_{ij}$, where $\varepsilon_{ij}$ is a pairwise interaction potential between atoms $i$ and $j$. The key to the efficient use of this potential within the FFT framework is the eigenvalue-eigenvector decomposition of the interaction matrix. The complete scoring function is the sum of terms representing shape complementarity, electrostatic, and desolvation contributions, the latter described by the pairwise potential. We have shown that PIPER increases the number of near-native conformations in the top 1000 or 2000 structures relative to other FFT-based docking programs[1].
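The role of the eigenvalue-eigenvector decomposition can be illustrated with a small NumPy sketch. It is not PIPER code, and it rests on assumptions not stated in the text above: atoms are grouped into K types, each type is represented by its own 3D grid, the interaction matrix is a symmetric K×K matrix over those types, and the grids are treated as periodic. All function names are hypothetical. A direct evaluation needs one FFT correlation per pair of atom types (K×K in total), whereas writing the matrix as a sum of eigenvalue-weighted outer products needs only one correlation per retained eigenvector.

```python
import numpy as np

def translation_scores(rec, lig):
    """score[t] = sum_x rec[x] * lig[x - t] (periodic) for every grid shift t."""
    return np.fft.ifftn(np.fft.fftn(rec) * np.conj(np.fft.fftn(lig))).real

def pairwise_energy_direct(rec_types, lig_types, eps):
    """Pairwise energy from K*K correlations, one per pair of atom types.

    rec_types, lig_types: arrays of shape (K, n, n, n), one grid per atom type.
    eps: (K, K) interaction matrix.
    """
    K = eps.shape[0]
    total = np.zeros(rec_types.shape[1:])
    for a in range(K):
        for b in range(K):
            total += eps[a, b] * translation_scores(rec_types[a], lig_types[b])
    return total

def pairwise_energy_eigen(rec_types, lig_types, eps, n_terms):
    """Same energy from n_terms correlations, using eps = sum_p lam_p u_p u_p^T."""
    lam, U = np.linalg.eigh(eps)                 # requires a symmetric eps
    order = np.argsort(-np.abs(lam))[:n_terms]   # keep the dominant eigenvalues
    total = np.zeros(rec_types.shape[1:])
    for p in order:
        # Combine the per-type grids into one grid per eigenvector.
        rec_p = np.tensordot(U[:, p], rec_types, axes=1)
        lig_p = np.tensordot(U[:, p], lig_types, axes=1)
        total += lam[p] * translation_scores(rec_p, lig_p)
    return total
```

With `n_terms` equal to K the two functions agree to rounding error; choosing a smaller `n_terms` trades accuracy for fewer FFTs, which is only worthwhile when a few eigenvalues dominate.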

PIPER can be easily tested using our new server ClusPro 2.0, and it is freely available for noncommercial applications.

## CLUSTERING OF LOW ENERGY DOCKED CONFORMATIONS

We cluster the retained 1000 conformations using pairwise ligand RMSD as the distance measure. Our goal is to find large clusters of structures below a certain energy level, which indicate minima that are both deep and surrounded by a broad region of attraction. We use a simple greedy algorithm to find the structures with the largest number of neighbors within a clustering radius $r_c$. The value of $r_c$ depends on a clustering parameter $0 \le \Delta \le 1$, which is based on the histogram of pairwise RMSD values and measures the depth of the separation between clusters[2]. We generally retain 30 clusters, each indicating a region of attraction around a local energy minimum.
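A minimal sketch of one such greedy scheme is given below. The text above does not spell out the exact procedure, so this is one plausible reading: repeatedly pick the structure with the most neighbors within the clustering radius, make it a cluster center, remove the cluster, and continue. The function name is hypothetical, the pairwise RMSD matrix is assumed to be precomputed, and the radius is passed in directly rather than derived from the histogram-based parameter.

```python
def greedy_cluster(dist, radius, max_clusters=30):
    """Greedy clustering on a symmetric pairwise-distance matrix.

    dist: square matrix (list of lists) of pairwise RMSD values, zero diagonal.
    Returns a list of (center_index, member_indices) in the order found,
    so the largest cluster comes first.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining and len(clusters) < max_clusters:
        best_center, best_members = None, []
        for i in sorted(remaining):
            # Neighbors of i among the structures not yet assigned.
            members = [j for j in sorted(remaining) if dist[i][j] <= radius]
            if len(members) > len(best_members):
                best_center, best_members = i, members
        clusters.append((best_center, best_members))
        remaining -= set(best_members)
    return clusters
```

For example, structures at one-dimensional positions 0, 2, 4, 10, 11, and 50 with a radius of 2.5 yield three clusters centered on the structures at 2, 10, and 50. Ties are broken in favor of the lowest index, which is an arbitrary choice made for this sketch.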

We have recently reduced the number of retained clusters by testing the stability of local minima[5]. Since structures at narrow minima lose more entropy, some of the non-native states can be detected by determining whether or not a local minimum is surrounded by a broad region of attraction on the energy surface. The analysis is based on starting Monte Carlo Minimization (MCM) runs from random points around each minimum and observing whether a certain fraction of the trajectories converge to a small region within the cluster. The cluster is considered stable if such a strong attractor exists, has at least 10 convergent trajectories, is relatively close to the original cluster center, and contains a low energy structure. We studied the stability of clusters for enzyme-inhibitor and antibody-antigen complexes. All clusters that are close to the native structure are stable. Restricting consideration to stable clusters eliminates around half of the false positives, i.e., solutions that are low in energy but far from the native structure of the complex.
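The stability rule described above can be expressed as a simple decision function. The sketch below is a toy illustration, not the published analysis: trajectory end states are reduced to coordinate tuples compared by Euclidean distance (standing in for RMSD between docked conformations), and `attractor_radius`, `max_center_offset`, and `energy_cutoff` are hypothetical parameters whose values are not given in the text. Only the minimum of 10 convergent trajectories comes from the description.

```python
import math

def is_stable_cluster(endpoints, energies, center, attractor_radius,
                      max_center_offset, energy_cutoff, min_convergent=10):
    """Decide whether MCM trajectory end states indicate a stable cluster.

    endpoints: coordinate tuples, one per trajectory end state.
    energies:  energy of each end state, in the same order.
    center:    coordinates of the original cluster center.
    """
    # Find the end state with the most other end states nearby (the attractor).
    best_seed, best_group = None, []
    for seed in endpoints:
        group = [k for k, p in enumerate(endpoints)
                 if math.dist(seed, p) <= attractor_radius]
        if len(group) > len(best_group):
            best_seed, best_group = seed, group
    if len(best_group) < min_convergent:
        return False   # too few trajectories converge to one region
    if math.dist(best_seed, center) > max_center_offset:
        return False   # attractor lies far from the original cluster center
    # The attractor must contain at least one low-energy structure.
    return min(energies[k] for k in best_group) <= energy_cutoff
```

Keeping the three thresholds as explicit arguments makes it clear that they are tunable choices rather than values taken from the method itself.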

## IMPROVING ANTIBODY-ANTIGEN COMPLEX PREDICTION ACCURACY

Understanding antibodies and their interactions with their respective epitopes in atomic-level detail is essential in the search for new or improved vaccines and in the development of antibody-based drugs for infectious and non-infectious diseases. In recognition of this, our lab developed an antibody-specific pairwise potential in 2012 that captured the asymmetric prevalence of hydrophobic residues on the antibody CDR. Despite its intuitive methodology, which considered only 4 of the 18 atom types (as categorized by Zhang et al., 1997), the addition of this potential (aADARS) significantly improved PIPER's antibody-antigen (Ab-Ag) complex prediction capabilities (Brenke et al., 2012). Since 2012, many more Ab crystal structures have been determined. We have therefore developed new pairwise statistics based on the latest set of Ab-Ag complexes. Furthermore, we conducted a systematic selection of the atom types on both the CDR and the epitope that best capture the data. Results from the newly developed potential are promising, although the improvement is not yet significant. Alongside this work, we are studying ways to add end-to-end deep learning (DL) of Ab-Ag interactions. The DL method will learn features directly from atomic density data and use a two-layer neural network to combine them (instead of the linear combination currently implemented in PIPER and ClusPro).

Patches of maximum hydrophobicity in an antibody–antigen complex. The structure is the Jel42 Fab fragment complexed with HPr (PDB code 2jel). The antibody fragment is shown as the white solid model, with magenta patches representing the regions of maximum hydrophobicity. The HPr antigen is shown as a gray cartoon, with dark red patches marking its regions of maximum hydrophobicity. In the figure, the antibody CDR is oriented upward, showing that the CDR region includes strongly hydrophobic patches, but these do not interact with regions of maximum hydrophobicity on the HPr antigen (Brenke et al., 2012).