This chapter summarizes research by the C. Maranas group on modeling the statistics of combinatorial DNA libraries generated through directed evolution methods. Directed evolution methods harness the process of natural selection to combinatorially evolve enzymes, proteins or even entire metabolic pathways with improved properties. These methods typically begin with the infusion of diversity into a small set of parental nucleotide sequences through DNA recombination and/or mutagenesis. The resulting combinatorial DNA library is then subjected to a high-throughput selection or screening procedure, and the most improved variants are isolated for another round of recombination or mutagenesis. The cycles of recombination/mutagenesis, screening and isolation continue until a protein or enzyme with the desired level of improvement is found. In the last few years, remarkable success stories of directed evolution have been reported, ranging from manyfold improvements in enzyme activity and thermostability and enhanced bioremediation to the design of vaccines and viral vectors for gene delivery.
A key challenge in directed evolution is that only an infinitesimally small fraction of the diversity afforded by DNA sequences can be characterized, regardless of the efficiency of the screening procedure employed. For example, a 500-bp gene implies 4^500 ≈ 10^301 alternatives, but even the most efficient screening methods are restricted to 10^7 to 10^8 alternatives. It is therefore important to know how diversity is generated and allocated in the combinatorial DNA library and which regions are the most promising. This chapter addresses the first question in the context of the DNA shuffling and SCRATCHY protocols and examines how fragmentation length, annealing temperature, sequence identity and the number of shuffled parental sequences affect the number, type and distribution of crossovers along the length of reassembled sequences. The predictive frameworks presented here (eShuffle, eSCRATCHY) provide a step towards optimizing directed evolution protocols in response to an enzyme or protein design challenge. In these frameworks, annealing events during reassembly are modeled as a network of reactions, and equilibrium thermodynamics is employed to quantify their conversions and selectivities. Development of the modeling frameworks was assisted by the experimental and practical expertise of Professors Stephen Benkovic and Stefan Lutz.
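The scale argument and the equilibrium idea above can be illustrated with a short Python sketch. The first two functions reproduce the arithmetic for the 500-bp example on a log10 scale. The third is only an assumed simplification and is not the eShuffle or eSCRATCHY implementation: it treats competing annealing events as reactions sharing the same reactants at equal concentrations, so that each selectivity is the event's equilibrium constant K = exp(-ΔG/RT) divided by the sum over all competitors. All function names and the free-energy values in the usage note are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def log10_sequence_space(length_bp: int) -> float:
    """log10 of the number of distinct DNA sequences of a given length (4**length)."""
    return length_bp * math.log10(4)


def log10_screened_fraction(length_bp: int, screened: float) -> float:
    """log10 of the fraction of sequence space covered by `screened` variants."""
    return math.log10(screened) - log10_sequence_space(length_bp)


def annealing_selectivities(delta_g_kcal, temp_k):
    """Assumed toy model: relative equilibrium weights of competing annealing events.

    delta_g_kcal: hypothetical annealing free energies (kcal/mol), one per event.
    temp_k: annealing temperature in kelvin.
    Returns the fractions K_i / sum(K), with K_i = exp(-dG_i / (R*T)).
    """
    weights = [math.exp(-dg / (R_KCAL * temp_k)) for dg in delta_g_kcal]
    total = sum(weights)
    return [w / total for w in weights]
```

For instance, `log10_sequence_space(500)` evaluates to about 301, and `log10_screened_fraction(500, 1e8)` to about -293, which is the "infinitesimally small fraction" referred to above. Calling `annealing_selectivities([-10.0, -8.0], 328.15)` with made-up free energies shows the qualitative behavior only: the event with the more negative free energy receives the larger share.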
The Pennsylvania State University ©2004
This page was last updated on Tuesday, November 30th, 2004.