Computational Protein Design
Designing Novel Antibody Binding Pockets
Antibodies are proteins with important medicinal and experimental applications. In this project, we are developing methods for the de novo design of antibodies. The Optimal Complementarity Determining Regions method (OptCDR) is a four-step procedure for designing the binding pockets of antibodies. First, canonical structures that are likely to bind the specified antigen favorably are selected for the six complementarity determining regions (CDRs) of the antibody. Second, the amino acid sequences of the CDRs are initialized using a rotamer library and an energy-minimizing mixed-integer linear programming (MILP) formulation. Third, a modified version of our Iterative Protein Redesign and Optimization (IPRO) procedure is used to simultaneously modify the backbones and amino acids of the CDRs. Finally, the rotamer library and MILP formulation are used again to generate a library of antibody binding pockets.
Development of a Computational QM/MM Protocol for Enzyme Redesign
In this project, we aim to develop a new computational workflow that uses highly accurate hybrid quantum mechanics/molecular mechanics (QM/MM) techniques for the design of enzymes. Specifically, mutants will first be screened with the modified Iterative Protein Redesign and Optimization (IPRO) framework at the ground and transition states for improved binding energy of an alternative substrate (e.g., ethane) to a desired enzyme (e.g., cytochrome P450 BM3). Next, mutants will be screened with potential energy surface scans along the reaction coordinate using QM/MM through the Gaussian 03 framework. Designs that pass the QM/MM screen will then be experimentally constructed and tested, leading to an improved enzyme design for an alternative substrate. The P450 BM3 system was chosen to address the timely issue of controlled hydroxylation of small gaseous hydrocarbons (methane, ethane, and propane).
Altering Enzyme Cofactor Specificity
In this project, we computationally designed Candida boidinii xylose reductase (CbXR) to alter its cofactor specificity from NADPH to NADH. After compiling and comparing sequence information from previous studies involving cofactor-switching mutations, we determined that their effect could not be explained as straightforward changes in volume, hydrophobicity, charge, or BLOSUM62 scores of the residues populating the cofactor binding site. Instead, we found that a detailed cofactor binding energy function was needed to adequately capture the relative affinity towards different cofactors. The implicit solvation models Generalized Born with molecular volume integration and Generalized Born with simple switching were integrated into the Iterative Protein Redesign and Optimization (IPRO) framework to drive the redesign of CbXR to function using the non-native cofactor NADH. We identified ten variants that improve the calculated interaction energy with NADH by introducing mutations in the CbXR binding pocket. These protein variants were experimentally tested, and seven out of ten possessed xylose reductase activity utilizing NADH while essentially abolishing NADPH-dependent activity. Given the higher stability of NADH relative to NADPH, and the higher cost of NADPH regeneration compared to NADH generation, an NADH-utilizing CbXR variant may prove industrially useful. More importantly, this method can be extended to design other enzyme-cofactor systems to utilize more stable and more abundant cofactors.
A Computational Procedure for Transferring a Binding Site Onto an Existing Protein Scaffold
In this research, we use atomistic potential energy functions to design small numbers of protein structures with novel or modified functions. One of the many challenging tasks of protein design is the introduction of a completely new function into an existing protein scaffold. We have developed the OptGraft procedure for placing a novel binding pocket onto a protein structure so that its geometry is minimally perturbed. This is accomplished through a two-level procedure. First, we identify the most appropriate locations to graft the new binding pocket into the protein fold by minimizing the departure from a set of geometric restraints using mixed-integer linear optimization. Once suitable locations that can accommodate the new binding pocket have been identified, CHARMM energy calculations are employed to identify which mutations in the neighboring residues, if any, are needed to ensure that the minimum energy conformation of the binding pocket conserves the desired geometry. This computational framework was benchmarked against the results available in the literature for engineering a copper binding site into thioredoxin. OptGraft has been used to guide the transfer of a calcium-binding pocket from thermitase (PDB: 1thm) into the first domain of CD2 (PDB: 1hng). Experimental characterization of three de novo redesigned proteins with grafted calcium-binding centers demonstrated that they all exhibit high affinities for terbium and can selectively bind calcium over magnesium.
We have also worked with researchers from the Department of Biological Sciences at the Korea Advanced Institute of Science and Technology to rationally design the substrate specificity of D-hydantoinase, because enzymes that exhibit superior catalytic activity, stability, and substrate specificity are highly desirable for industrial applications. Specifically, the substrate specificity of Bacillus stearothermophilus D-hydantoinase was redesigned toward the target substrate hydroxyphenylhydantoin. Positions crucial to substrate specificity were selected using structural and mechanistic information on the structural loops at the active site. The size and hydrophobicity of the involved amino acids were rationally changed, and the substrate specificities of the designed D-Hyd mutants were investigated. The M63I/F159S mutant exhibited about 200-fold higher specificity than the wild-type enzyme. Systematic mutational analysis and computational modeling also supported the rationale used in the design.
[Fazelinia et al (2009); Lee et al (2009)]
An Iterative Computational Protein Library Redesign and Optimization Procedure
In this research, we developed the computational procedure IPRO (Iterative Protein Redesign and Optimization) for the redesign of an entire combinatorial protein library in one step using energy-based scoring functions. IPRO relies on identifying mutations in the parental sequences that, when propagated downstream in the combinatorial library, improve the average quality of the library (e.g., stability, binding affinity, or specific activity). Residue and rotamer design choices are driven by a globally convergent Mixed-Integer Linear Programming (MILP) formulation. Unlike many of the available computational approaches, the procedure allows for backbone movement as well as re-docking of the associated ligands after a pre-specified number of design iterations. IPRO can also be used, as a limiting case, for the redesign of a single sequence or a handful of individual sequences. The application of IPRO was highlighted through the redesign of a sixteen-member library of E. coli/B. subtilis dihydrofolate reductase hybrids, both individually and through upstream parental sequence redesign, for improving the average binding energy. Computational results demonstrate that it is indeed feasible to improve the overall library quality, as exemplified by binding energy scores, through targeted mutations in the parental sequences.
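The residue and rotamer selection step described above can be illustrated with a toy example. The Python sketch below is only a stand-in under stated assumptions: it picks one rotamer per position by exhaustive enumeration rather than by solving a MILP, and the function name `select_rotamers` and all energy values are hypothetical. It shows the objective being minimized (self energies plus pairwise rotamer-rotamer energies), not the IPRO implementation itself.

```python
from itertools import product

def select_rotamers(self_energy, pair_energy):
    """Pick one rotamer per position to minimize total energy.

    self_energy[i][r]          : energy of rotamer r at position i
    pair_energy[(i, j)][r][s]  : interaction energy of rotamer r at i with s at j
    Exhaustive enumeration is used here as a stand-in for a MILP solver.
    """
    best_choice, best_energy = None, float("inf")
    options = [range(len(energies)) for energies in self_energy]
    for choice in product(*options):
        total = sum(self_energy[i][r] for i, r in enumerate(choice))
        total += sum(table[choice[i]][choice[j]]
                     for (i, j), table in pair_energy.items())
        if total < best_energy:
            best_choice, best_energy = choice, total
    return best_choice, best_energy

# Hypothetical energies for two positions with two rotamers each.
self_energy = [[-1.0, -2.0], [-0.5, -1.5]]
pair_energy = {(0, 1): [[0.0, 0.5], [0.2, 3.0]]}

best_choice, best_energy = select_rotamers(self_energy, pair_energy)
print(best_choice, round(best_energy, 3))
```

In this toy case the individually best rotamers (index 1 at both positions) clash with each other, so the minimum-energy combination is a compromise. That coupling between positions is why the selection is posed as a joint optimization rather than position by position.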
We have extended the IPRO framework for the design of protein libraries with a targeted ligand specificity. Mutations that minimize the binding energy with the desired ligand are identified. At the same time, explicit constraints are introduced that maintain the binding energy for all decoy ligands above a threshold necessary for successful binding. The modified framework was demonstrated by computationally altering the effector binding specificity of the bacterial transcriptional regulatory protein AraC, which belongs to the AraC/XylS family of transcriptional regulators, for different unnatural ligands. The obtained results demonstrate the importance of systematically suppressing the binding energy for competing ligands. By pinpointing a small set of mutations within the binding pocket, the difference in binding energies between targeted and decoy ligands, even when very similar, is maximized.
[Saraf et al (2006); Fazelinia et al (2007)]
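The specificity constraint described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the extended IPRO formulation: it assumes the candidate designs have already been scored, and the names `pick_specific_design`, `variant_1` to `variant_3`, and every energy value are hypothetical. A design is kept only if every decoy binding energy stays above the threshold, and the remaining design with the lowest target binding energy is selected.

```python
def pick_specific_design(designs, threshold):
    """Return the feasible design with the lowest target binding energy.

    A design is feasible only if every decoy binding energy lies above
    `threshold`, i.e. no decoy binds well enough to compete.
    Returns None when no design satisfies the decoy constraints.
    """
    feasible = [d for d in designs
                if all(energy > threshold for energy in d["decoy"])]
    if not feasible:
        return None
    return min(feasible, key=lambda d: d["target"])

# Hypothetical binding energies (more negative means tighter binding).
designs = [
    {"name": "variant_1", "target": -9.5, "decoy": [-7.0, -3.0]},
    {"name": "variant_2", "target": -8.0, "decoy": [-3.5, -2.0]},
    {"name": "variant_3", "target": -6.0, "decoy": [-1.0, -2.5]},
]

chosen = pick_specific_design(designs, threshold=-4.0)
print(chosen["name"])
```

Here `variant_1` binds the target most tightly but is rejected because one decoy still binds below the threshold, which is the situation the explicit constraints are meant to exclude.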
Protein Library Design Using Scoring Functions or Clash Maps
In this research, we have introduced a computational procedure, OPTCOMB (Optimal Pattern of Tiling for COMBinatorial library design), for designing protein hybrid libraries that optimally balance library size with quality. The proposed procedure is directly applicable to oligonucleotide ligation-based protocols such as GeneReassembly, DHR, SISDC, and others. Given a set of parental sequences and the size ranges of the parental sequence fragments, OPTCOMB determines the optimal junction points (i.e., crossover positions) and the parental sequences that contribute a fragment at each of the junction points. By rationally selecting the junction points and the contributing parental sequences, the number of clashes (i.e., unfavorable interactions) or any other scoring metric in the library is systematically optimized with the aim of improving the overall library quality.
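To make the junction-point selection concrete, the Python sketch below works through a deliberately simplified version of the problem. It is an assumption-laden stand-in, not OPTCOMB: it fixes the number of fragments, ignores the choice of contributing parents, uses brute-force enumeration instead of an optimization formulation, and scores a tiling by how many potentially clashing residue pairs end up in different fragments (and so can come from different parents). The function name `best_junctions` and the example data are hypothetical.

```python
from itertools import combinations

def best_junctions(length, n_fragments, min_size, max_size, clash_pairs):
    """Choose junction points that separate the fewest clashing pairs.

    A junction at position c splits residues [0, c) from residues [c, length).
    clash_pairs holds (i, j) residue index pairs with i < j that would clash
    if the two residues came from different parental sequences.
    """
    best = None
    for cuts in combinations(range(1, length), n_fragments - 1):
        bounds = (0, *cuts, length)
        sizes = [b - a for a, b in zip(bounds, bounds[1:])]
        if any(size < min_size or size > max_size for size in sizes):
            continue  # fragment size outside the allowed range
        separated = sum(1 for i, j in clash_pairs
                        if any(i < c <= j for c in cuts))
        if best is None or separated < best[1]:
            best = (cuts, separated)
    return best

# Hypothetical 12-residue alignment, three fragments of 3 to 5 residues.
clash_pairs = [(2, 3), (3, 4), (6, 7), (8, 9)]
result = best_junctions(12, 3, 3, 5, clash_pairs)
print(result)
```

For this toy input, the only tiling that keeps every clashing pair inside a single fragment places the junctions at positions 5 and 8.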
Clashes are identified by characterizing all contacting residue pairs present in protein hybrids for inconsistency with protein family structural features. This approach is based on examining contacting residue pairs with different parental origins for different types of potentially unfavorable interactions (i.e., electrostatic repulsion, steric hindrance, cavity formation and hydrogen bond disruption). The identified clashing residue pairs between members of a protein family are then contrasted against functionally characterized hybrid libraries (FamClash Procedure).
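As a minimal illustration of this kind of check, the Python sketch below flags one clash type only, electrostatic repulsion, in a single-crossover hybrid of two aligned parents. It is a simplified assumption-based example rather than the FamClash procedure: the charge table covers only D, E, K, and R, the contact list is supplied by hand, and the function name `charge_clashes` and the sequences are hypothetical.

```python
# Simplified formal charges at neutral pH (histidine deliberately omitted).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}

def charge_clashes(parent_a, parent_b, crossover, contacts):
    """Flag like-charged contacting pairs whose residues have different origins.

    The hybrid takes residues [0, crossover) from parent_a and the rest
    from parent_b; contacts is a list of (i, j) residue index pairs.
    """
    hybrid = parent_a[:crossover] + parent_b[crossover:]
    clashes = []
    for i, j in contacts:
        different_origin = (i < crossover) != (j < crossover)
        product_of_charges = CHARGE.get(hybrid[i], 0) * CHARGE.get(hybrid[j], 0)
        if different_origin and product_of_charges > 0:
            clashes.append((i, j, hybrid[i], hybrid[j]))
    return hybrid, clashes

# Hypothetical aligned parents: each has oppositely charged pairs at (1, 3)
# and (1, 5), but with the charges swapped between the two parents.
hybrid, clashes = charge_clashes("MKTDLE", "MDTKLR", crossover=3,
                                 contacts=[(1, 3), (1, 5), (0, 4), (3, 5)])
print(hybrid, clashes)
```

Both parents carry favorable K/D-type pairs at these contacts, yet the hybrid pairs two positively charged residues. This is the pattern of inconsistency with family structural features that the approach looks for.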
We have also developed the S2 scoring system, which uses amino acid property bins to identify clashes in protein hybrids and to quantify the degree of their mismatched interactions. A cytochrome P450 library (Otey et al (2006)) was used for benchmarking, and the S2 scoring system was found to significantly enrich the library in functional members compared with other scoring systems. Given this scoring base, we implemented S2 in OPTCOMB and also developed OPTOLIGO, a formulation for optimally designing protein combinatorial libraries involving mutations. Computational benchmarking results demonstrate the efficacy of OPTCOMB and OPTOLIGO in generating high-scoring libraries of a pre-specified size.
[Pantazes et al (2007); Saraf et al (2005); Saraf et al (2004); Moore and Maranas (2004); Saraf and Maranas (2003); Moore and Maranas (2003); Saraf et al (2003)]
Modeling and Optimization of Directed Evolution Protocols
Work in our group examined for the first time how fragmentation length, annealing temperature, sequence identity, and number of shuffled parental sequences affect the number, type, and distribution of crossovers along the length of full-length reassembled sequences. In the eShuffle framework, annealing events during reassembly are modeled as a network of reactions, and equilibrium thermodynamics along with complete nucleotide sequence information is employed to quantify their conversions and selectivities. Comparisons of eShuffle predictions against experimental data revealed good agreement, particularly in light of the fact that there were no adjustable parameters. Specifically, we found that crossover numbers were boosted by reducing fragmentation length and annealing temperature and that crossovers tend to aggregate in regions of near-perfect sequence identity.
The customization of eShuffle for the SCRATCHY protocol led to the eSCRATCHY framework. Using eSCRATCHY, we found that in SCRATCHY libraries (i) the fragmentation length used for reassembly does not influence the number or location of crossovers generated in full-length sequences, (ii) the crossover distribution is shaped by the crossover statistics of the ITCHY library, and (iii) crossovers are spread evenly throughout the crossover region. The need to safeguard against the formation of reassembled sequences with either truncated or duplicated domains motivated us to further extend the eShuffle framework to consider out-of-sequence annealing events. Instead of “locking” fragments into their alignment positions, the annealing free energy change was used to determine the likelihood of duplex formation, allowing the prediction of the relative frequency with which fragments from different sequence regions anneal during reassembly.
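One simple way to turn annealing free energies into relative annealing frequencies is a Boltzmann weighting, sketched below in Python. This is an illustrative assumption, not the eShuffle reaction-network model: the function name `annealing_probabilities`, the site labels, the free energy values, and the temperature are all hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def annealing_probabilities(delta_g, temperature_k):
    """Convert annealing free energies (kcal/mol) into relative frequencies.

    Each candidate site gets a weight exp(-dG / RT); the weights are then
    normalized so that the frequencies sum to one.
    """
    rt = R_KCAL * temperature_k
    weights = {site: math.exp(-dg / rt) for site, dg in delta_g.items()}
    total = sum(weights.values())
    return {site: weight / total for site, weight in weights.items()}

# Hypothetical free energies for one fragment annealing at three sites.
delta_g = {
    "in_register": -12.0,         # correct alignment position
    "shifted_duplication": -9.0,  # out-of-sequence site, would duplicate a domain
    "shifted_truncation": -7.5,   # out-of-sequence site, would truncate a domain
}
probs = annealing_probabilities(delta_g, temperature_k=328.15)
for site, p in probs.items():
    print(f"{site}: {p:.4f}")
```

Under this weighting, out-of-sequence sites are never strictly forbidden. Their frequency simply falls off exponentially as their annealing free energy becomes less favorable than that of the in-register site.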
We have also explored the possibility of boosting or even specifically redirecting the formation of crossovers in DNA shuffling by exploiting the inherent redundancy in the codon representation (e.g., isoleucine has the following three synonymous codon representations: ATA, ATC and ATT) while complying with host preferences for specific patterns of codon usage.
[Moore and Maranas (2003b); Moore and Maranas (2002b); Moore and Maranas (2002a); Lutz et al (2001); Moore et al (2001); Moore and Maranas (2000); Moore et al (2000)]
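The codon redundancy idea can be illustrated with a small Python sketch. It is a toy example under stated assumptions: only four amino acids from the standard genetic code are tabulated, a single aligned position is considered, and the function names `identity` and `best_codon_pair` as well as the `host_preferred` table are hypothetical. For two parental residues at an aligned position, it picks the pair of synonymous codons sharing the most nucleotides, which raises local sequence identity between the parents.

```python
from itertools import product

# Partial codon table (standard genetic code) for a few amino acids.
SYNONYMOUS = {
    "I": ["ATA", "ATC", "ATT"],
    "L": ["CTA", "CTC", "CTG", "CTT", "TTA", "TTG"],
    "V": ["GTA", "GTC", "GTG", "GTT"],
    "M": ["ATG"],
}

def identity(codon_a, codon_b):
    """Number of matching nucleotides between two codons."""
    return sum(a == b for a, b in zip(codon_a, codon_b))

def best_codon_pair(aa_a, aa_b, allowed=SYNONYMOUS):
    """Pick the synonymous codon pair with the highest nucleotide identity.

    `allowed` can be restricted to host-preferred codons.
    """
    return max(product(allowed[aa_a], allowed[aa_b]),
               key=lambda pair: identity(*pair))

pair = best_codon_pair("I", "L")
print(pair, identity(*pair))

# Restricting to a hypothetical set of host-preferred codons lowers the
# achievable identity at this position.
host_preferred = {"I": ["ATC"], "L": ["CTG", "TTA"]}
restricted = best_codon_pair("I", "L", allowed=host_preferred)
print(restricted, identity(*restricted))
```

The second call shows the trade-off mentioned above: complying with host codon preferences shrinks the set of usable codons and can reduce the identity that is achievable at a given position.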