Flux balance analysis to improve butanol productivity in Synechocystis PCC 6803 Master of Science Thesis Kiyan Shabestary

(1)

Flux balance analysis to improve butanol productivity in Synechocystis PCC 6803

Master of Science Thesis Kiyan Shabestary

Supervisor:

Josefine Anfelt, PhD student, KTH

Examiner:

Paul Hudson, Assistant Professor, KTH

Degree Project in Biotechnology BB202X

(2)

Abstract

Engineering microorganisms at the systems level is recognized to be the future of metabolic engineering. Thanks to the development of genome annotation, microorganisms can be understood, as never before, and be reconstructed in the form of computational models. Flux balance analysis provides a deep insight into cellular metabolism and can guide metabolic engineering strategies. In particular, algorithms can assess the cellular complexity of the metabolism and hint at genetic interventions to improve product productivity. In this work, Synechocystis PCC 6803 metabolism was investigated in silico. Genetic interventions could be suggested to couple butanol synthesis to growth as a way to improve current productivities. Cofactor recycling and, in particular, buffering mechanisms were shown to be important targets. Creating a cofactor imbalance and removing these buffering mechanisms is an important driving force. This forces a carbon flux through butanol synthesis to maintain cofactor balance and sustain growth.

Objective

The objective of the present work is to identify gene targets in silico at the systems level to improve n-‐butanol and isobutanol productivities in Synechocystis PCC 6803.

An emphasis is put on coupling product to growth and understanding the driving forces behind it.

(3)

Introduction

Industrial biotechnology is an emerging and promising discipline for the production of a wide range of compounds including fine chemicals, biopolymers, biofuels or biopharmaceuticals. Using living organisms as ‘cell factories’, mainly micro-‐

refineries that perform a specific task, represents the cornerstone of this field and lay the foundations towards a future bio-‐based society. The apparition of high-‐

throughput sequencing to read cellular genetic information has appeared as a revolution leading biotechnology into a new era. Understanding the cellular genetic code enable an unprecedented insight into microorganisms physiology and functions. As a result, tools to genetically engineer cell factories have evolved to gain in accuracy.

Previous methods to seek for high-‐producing strains were mainly based on chance where random mutagenesis created strains diversity and good screening methods enabled selection of the best producers. However, these methods were time-‐

consuming and did not find the optimal strains. Recently, genomics paved the way for other –omics techniques such as transcriptomics, proteomics and finally metabolomics. This high amount of data assessable with newly available high-‐

throughput analytical techniques can be translated into gene-‐protein-‐reaction (GPR) associations. New methods, mainly rational design, accurately target genes to modulate cellular functions through this relationship for production purposes.

Metabolic engineering is the field aiming at understanding how genes influence cellular metabolism and further exploit them to shape microorganisms metabolism to meet engineering objectives. Metabolic engineer may count on tools including gene deletions, over-‐expression, down-‐regulation or even heterologous gene insertion from other organisms as way to improve strains.

Initial metabolic engineering strategies aimed at redirecting carbon flux to product synthesis regardless of the metabolic burden created in the cell. In particular, cofactors, widely connected metabolites helping reactions to go forward, were not taken into account and their imbalance resulted in such bottlenecks. It is now commonly thought that understanding the cell as a whole is an important key to metabolic engineering success. However, biological organisms are complex systems involving hundreds if not thousands of biochemical reactions. For metabolic engineers, this means a high number of combinatorial gene targets to reach an optimal overproduction phenotype. Therefore, to efficiently assess the high number of possibilities a cell can offer, computational methods are very welcoming tools to support experiments. Genome-‐scale metabolic models aim to represent cellular metabolism as a whole using GPR associations and can be used for simulations. Flux balance analysis (FBA) is a computational method aiming at solving flux distributions (phenotype) in cellular metabolism for a given genotype. Algorithms can further be used to provide strategies (mainly reaction deletions or modulations) in order to reach a desired overproduction phenotype.

(5)

An important issue met in engineering strategies is cellular behavior. Very often, engineering strategies are in opposition to the cellular ones. Most microorganisms optimize their growth as a mean of survival. The ones growing the fastest outcompete slower organisms. This forms the basis for natural selection. Organisms have evolved for millions of year to perfectly fit their environment. Coupling product synthesis to growth is a way to bypass this issue. A successful strategy is to modify the stoichiometric system in a way that the cell needs to produce a product in order to achieve growth. Therefore at maximal growth rate, the product is synthesized as a growth-‐related by-‐product.

The aim of this project is to perform FBA on cyanobacteria Synechocystis PCC 6803 genome scale model to identify metabolic engineering strategies for enhanced biofuel production. n-‐butanol and isobutanol are target molecules in this project.

Cyanobacteria are prokaryotic algae and promising cell factories for the production of a wide variety of compounds including biofuels. There are two types of process using algea for biofuel production. One is to use high-‐biomass content eukaryotic algae such as chlamydomonas species for biomass production. The biomass is then decomposed into sugars and fermented into biofuel by microorganisms such as yeast. The other is the direct use of algae as a chassis for biofuel production.

Cyanobacteria species are preferred cell factories in this case due to easier genetic modifications. Nevertheless, algae only require nitrogen, carbon dioxide and sunlight to grow. This low requirement makes them promising cell factories.

Moreover they do not compete with food production as they do not require arable land.

(6)

Theory

Genome-‐scale metabolic models

Genome-‐scale metabolic models (GSMMs) have been the catalysts for the development of flux balance analysis. It is a bottom-‐up approach where the model is created from its smallest constituents, starting from genes to establish the metabolic map (Thiele and Palsson, 2010; Fig. 1). Based on extensive research from literature and databases (KEGG, Brenda), a draft reconstruction can be first established. It should be noted that this might include previous GSMMs to be upgraded. Software such as the RAVEN toolbox (Agren et al., 2013) can perform reconstructions in a semi-‐automated way. This reconstruction often results in gaps i.e. some missing reactions in a pathway (a missing serine pathway for Synechocystis PCC 6803;

Knoop et al., 2013). For further readability, confidence scores are assigned for each reaction. Reactions without any physiological evidence, introduced in the model for functional purposes solely, take the lowest confidence value. Finally, a model can be generated into a mathematical form, the stoichiometric matrix (S matrix). All stoichiometric coefficients for each reaction and metabolite are stored in the S matrix. One important part of the model is the addition of artificial reactions for modeling purposes. This includes addition of maintenance and a biomass reaction.

The biomass reaction aims to link precursors to biomass constituent such as lipids, proteins and other macromolecules to model growth. The flux going through this reaction is scaled in order to represent the growth rate (μ).

The reconstructed model can then be saved in the Systems Biology Markup Language (SBML) format (Hucka et al., 2003). This provides a standardized way for model usage and improvement.

Genome-‐scale models can be used for two purposes mainly.

First, they can be used as a way to provide a way to contextualize high trough-‐put omics data. Secondly, they can be used alongside algorithms in order to find optimal engineering strategies. The latter is the aim of the present work.

Synechocystis PCC 6803, often labeled as the “green Escherichia coli” (E. coli), is a

Fig. 1. Genome-‐scale model reconstruction procedure. Figure from Feist et al. (2008).

(7)

model organism for phototrophs. However, lack of genome annotation and extensive in vivo gene essentiability have slowed down the development of accurate cyanobacteria genome-‐scale models in comparison to well-‐studied organisms like E.

coli or Saccharomyces cerevisiae (S. cerevisiae). The apparition of a gene database (CyanoBase; http://genome.microbedb.jp/cyanobase/) for all known Synechocystis PCC 6803 and Anabaena PCC 7120 ORFs (Nakamura et al., 1999; Nakao et al., 2010) paved the way for numerous genome-‐scale reconstructions (Shastri and Morgan, 2005; Montagud et al., 2011; Saha et al., 2012; Nogales et al., 2012; Knoop et al., 2013).

Synechocystis sp. PCC 6803 model iJN678 (Nogales et al., 2012) was used to perform flux balance analysis. The reconstructed network incorporates 678 genes, 863 reactions and 795 metabolites. The model supports heterotrophic, autotrophic and mixotrophic conditions simulation by constraining uptakes. Autotrophic conditions were simulated as “light limited” where photon flux was constrained to -‐18.7 mmol/gDW/h (both bounds) and carbon uptake (HCO3) constrained to a maximal value of 3.8 mmol/gDW/h. In heterotrophic conditions, glucose uptake was constrained to 0.8 mmol/gDCW/h in order to match the photoautotrophic maximum growth rate. Photon flux was set to 0.

In this project, iJN678 was used because it offers a standardized metabolite, reaction and transporter annotation comparable to E. coli genome-‐scale models. The biomass objective function is refined in comparison to other models. Respiration and photosynthesis are also well modeled, in particular cyclic electron flows which are important targets in this study.

Results have been shown to be sensitive to the model used, cofactor preference and reaction promiscuity. Since cofactor preference is not known for a large number of reactions, models can show high differences in cofactor assignment. This has been shown to be a real issue when investigating suitable strategies for target overproduction. For this reason, the model from Knoop et al. (2013) is used to contrast and discuss the validity of obtained results.

Flux balance analysis

Flux balance analysis (FBA) is a mathematical method assuming steady-‐state growth to simulate flux distribution for a given genome-‐scale model (Orth et al., 2010). Mainly, internal metabolites such as the ones found in the central metabolism are assumed not to accumulate. For each internal metabolite, production and consumption rates equalize. This then leads to a set of equations that can be solved using linear programming (LP). Since the system is underdetermined, multiple distributions are possible. It should be noted that this is a fundamental difference with metabolic flux analysis (MFA) where exchange reactions are measured such as the system is determined. A single flux distribution can then be obtained for MFA. Using an objective function to be maximized or minimized (a reaction in the model needs to be chosen), a certain distribution is

(8)

picked up. For the flux distribution to be realistic, the objective function to be optimized should fit the organism objective. Natural selection has compelled microorganisms to aim for increased fitness. Therefore, maximization of growth (biomass equation) is often chosen as objective function, especially for prokaryotes.

Alternatives to FBA include minimization of metabolic adjustment (MOMA; Segrè et al., 2002) and regulatory on/off minimization (ROOM; Shlomi et al., 2005). Both are used to find distribution in mutants cells based on WT distributions. They both minimize flux distribution differences between the WT and the mutant. MOMA is used to find immediate flux distribution whereas FBA and ROOM give the final one (Segrè et al., 2002). MOMA and ROOM are relevant tools when mutants are tested experimentally.

FBA is often referred to as a constraint-‐based tool. Constraints present in current GSMMs can be classified into two types: (1) Balance constraints due to the stoichiometry of the system. (2) Flux constraints where each flux is limited by an upper and lower bound (i.e irreversible reaction cannot carry a negative flux).

Additional constraints on the system can greatly influence the solution space, mainly, the set of all mathematical possible flux distributions that satisfy the constraints (Fig. 2). The more constraints one adds, the more restricted the solution space. This results in predicting an optimal flux distribution with greater accuracy.

Fig. 2. The conceptual basis of constraint-‐based modeling. In this representation, a three reactions model is

assumed for simplicity. Without any constraints, all mathematical distributions are possible solutions of the system. When constraints are added, the solution space is restricted to a constrained area (in blue). An objective function can be used to get an optimal flux distribution for (v1,v2,v3). In this example v3 is the objective function to be optimized. Adapted from Orth et al. (2013).

The main constraint in FBA is the stoichiometry of the system. Net formation of a given metabolite is calculated as the sums of all flux leading to this metabolite minus the sums of all flux consuming this metabolite, over the whole system. Depending on the stoichiometry, some reactions can weigh more than others. For a given flux, a reaction whose stoichiometry is 2 moles of metabolites formed per mole of substrate leads to higher formation rate than if only 1 mole was formed. Thus, net

(9)

metabolite formation rate ^!!_!"^! can be simplified to the sum of reaction rates multiplied by the reaction-‐specific stoichiometry coefficient of the metabolite :

𝑑𝑥_!

𝑑𝑡 = 𝑆_!" 𝑣_!

!

!!!

N is the total number of reactions, 𝑆_!" and vj are stoichiometric coefficient and reaction rate for the i^th metabolite and j^th reaction. This equation can be further generalized for all metabolites M of the system. A vector of net formation rates dx/dt can be obtained as the S matrix multiplied by a vector v containing the flux distribution. Under steady-‐state, the net rate for each metabolites is zero (no accumulation).

𝑑𝑥

𝑑𝑡 = 𝑆𝑣 = 0

Other types of constraints not integrated (for now) in current genome-‐scale models are regulatory constraints or advanced kinetic constraints such as enzyme activity or kinetic parameter values.

Similarly, all possible mathematical distributions can be further plot in two or three dimensions. One reaction flux can be plotted against another one giving a particular solution space. This is useful to determine possible phenotypes for a given genotype.

Ibarra et al. (2002) successfully demonstrated FBA accuracy for long-‐term phenotypes. After 40 days, E. coli evolved from a sub-‐optimal phenotype to reach optimal growth as predicted by FBA.

The most widely used platform to perform FBA is the COBRA toolbox (Schellenberger et al., 2011). It can be used on MATLAB or python platforms and its high modularity and documentation makes COBRA toolbox the best choice in the scientific community. Most simulations were performed using the COBRA toolbox 2.0 on MATLAB.

Optimization algorithms

As discussed before, FBA offers a genuine insight into the flux distribution for a given network. As an extension, perturbations can be applied to the system in the form of reaction deletions or modulations. Flux distributions can be computed and differences can be analyzed to understand the cellular metabolism. Additionally, algorithms can be employed to go the other way round. For a particular phenotype – let’s say an overproduction phenotype-‐, algorithms predict associated reaction deletions or modulations (genotype alteration).

(10)

OptKnock

The most common and widely used algorithm is the bi-‐level optimization framework OptKnock (Burgard et al., 2003). It identifies reaction deletion strategies that couple the biomass objective function with a biochemical overproduction target. This bi-‐level formulation can be simplified into a single-‐level mixed integer linear programming (MILP), which consists of problems with both discrete and continuous variables.

maximize 𝑣!"#$%&' (OptKnock)

𝑦_!

subject to maximize 𝑣!"#$%&& (Primal)

𝑣_!

subject to ^!_!!!𝑆_!" 𝑣_! = 0

𝑣!!!"!#_!"#$%& = 𝑣!!!"!#_!!_!"#$%&%"#'

𝑣_!"#_!^! ≤ 𝑣_!"#_{! !"#}^! 𝑣!"#$!"" ≥ 𝑣!"#$%&&!"#$%!

𝑣_!^!"#∙ 𝑦_! ≤ 𝑣_! ≤ 𝑣_!^!"#∙ 𝑦_!, ∀𝑗 ∈ 𝑁

1 − 𝑦_! ≤ 𝐾 𝑤𝑖𝑡ℎ 𝑦_!

!∈! = 0,1 , ∀𝑗 ∈ 𝑁 𝑎𝑛𝑑 𝐾, 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑘𝑛𝑜𝑐𝑘𝑜𝑢𝑡𝑠 𝑎𝑙𝑙𝑜𝑤𝑒𝑑

However, due to the complexity of the algorithm, high computational time has often been an issue. OptKnock screen for all reactions, and the more knockout allowed, the higher number of combinations. An alternative algorithm is OptGene, a genetic algorithm (GA) to find knockout strategies quicker (Patil et al., 2005). The idea behind this algorithm is to find individuals with high fitness values (mainly high productivity). Individuals are then selected for mating. Offspring are screened and the iterative process continues. A mutation rate is also applied each round in order to create diversity.

OptForce

Developped as an extension of the OptKnock framework, OptForce finds reaction modulations in addition to deletions (Ranganathan et al., 2010). The algorithm compares the flux variability between an overproducing (OP) phenotype and a wild type (WT) phenotype (Fig. 3). Reactions whose value ranges significantly differ between OP and WT phenotypes are listed in MustU (over-‐production) and MustL (down-‐regulation). This can be further extended to reactions pair (MustUU, MustLL, MustUL). As a first output of the algorithm, these reaction lists can be used to guide strategies. OptForce can further compute combinations of these reaction modulations from these lists alongside reaction knockouts. The obtained strategy is referred as the FORCE set. By increasing the number of allowed modifications (knock-‐out/-‐up/-‐down), OptForce can iteratively give different strategies and solutions can pictures in the form of a boolean diagram.

(11)

OptSwap

OptSwap (King and Feist, 2013) is a bi-‐level MILP problem based on RobustKnock formulation (Tepper and Shlomi, 2010). In addition to RobustKnock and OptKnock, it incorporates possible swaps in cofactor specificity for oxidoreductase reactions.

Since cofactor usage is an essential feature of coupling, this algorithm provides a genius tool to achieve high productivity.

Fig. 3. Comparing flux ranges between normal and overproducing phenotype in OptForce (Ranganathan et al., 2010).

There are other similar algorithms developed for specific purposes but not extended to the COBRA toolbox community (OptORF, Kim & Reed, 2010; CASOP, Hädicke &

Klamt, 2010). Nevertheless, results using different algorithms are useful tools to compare with results from OptForce, OptKnock and OptSwap.

n-‐butanol and isobutanol pathways

n-‐butanol (hereafter butanol) is a fermentative product produced from the precursor acetyl-‐CoA through the ABE fermentation process. Traditionally, production of fermentative products has been produced under anaerobic conditions in facultative or obligate fermentative organisms. Earlier efforts to produce butanol have been achieved in the native butanol producer and obligate anaerobic bacteria Clostridia, where butanol production was first reported (Pasteur, 1862). Clostridium species are not great hosts due to their slow growth and thus low volumetric productivity. Therefore, Clostridia butanol pathway has further been adapted to other organisms (Nielsen et al., 2009). Expressing a pathway from a fermentative organism into an autotrophic organism such as cyanobacteria raise few concerns.

Among them, oxygen sensitivity of pathway enzymes (Atsumi et al., 2008; Boynton et al., 1996) and reversibility of trans-‐enoyl-‐CoA reductase (Bond-‐Watts et al., 2011) have been targets for improvement. To solve these problems, chimeric pathway using enzymes from other organisms have been employed (Lan & Liao, 2012; Bond-‐

Watts et al., 2011; Anfelt et al., 2015). Alternative pathways to produce butanol include modification of the clostridia pathway at one or several steps. For instance, Lan and Liao (2012) replaced the first condensation step by an ATP-‐driven two-‐step process using beta-‐oxidation reversal in Synechococcus elongatus PCC 7942. One

(12)

malonyl-‐CoA and one acetyl-‐CoA ultimately condense to one acetoacetyl-‐CoA.

Pasztor et al. (2014) also used beta-‐oxidation reversal and extended it to butyryl-‐

ACP further transformed to butanol using the TPC pathway in E. coli (Fig. 4).

Although isobutanol is similar to butanol, it is less soluble and has an energy density close to gasoline making it a better biofuel. Isobutanol synthesis takes place along the 2-‐keto-‐acid (or Ehrlich) pathway. Amino acid precurssors in the form of keto acids are decarboxylated into aldehydes and reduced to the corresponding alcohols.

Isobutanol can be obtained from valine whereas propanol from isoleucine.

Isobutanol has been successfully produced in E. coli (Yan and Liao, 2009) and Synechococcus elongatus PCC 7942 (Atsumi et al., 2009). All pathways are studied in this project (Fig. 4). They are hereafter referred to as LL pathway (Lan and Liao, 2012), TPC pathway (Pasztor et al., 2014), isobutanol pathway, NADH and NADPH pathway.

FBA was used to determine the maximal theoretical butanol (or isobutanol) yields for all these pathways (Table 1). Yields were calculated as production rate over carbon uptake rate. Under light-‐limited conditions, the carbon uptake is the sole limiting-‐factor. It is calculated as bicarbonate input minus carbon dioxide output.

Even though most pathways have the same maximal rate towards the product, yields slightly differ because of different uptake rates.

Table 1. Maximum rates and yields for the different pathways in light-‐limited conditions.

Maximum production rate

Bicarbonate uptake rate

Carbon dioxide export rate

Maximum theoretical yield YHCO3-‐,P Pathway mmol Prod. / gDW h mmol HCO_3-‐/ gDW h mmol CO₂ / gDW h mol Prod. / mol HCO_3-‐

Std NADH 0.39 1.55 0 0.252

Std NADPH 0.39 1.55 0 0.252

TPC 0.375 3.7 2.19 0.248

LL 0.39 3.7 2.14 0.250

Isobutanol 0.39 1.55 0 0.252

It is interesting to see that for TPC and LL pathways, HCO3-‐ is taken at maximal rate even though one third is required as represented by a massive CO2-‐ export. These two pathways differ from the others in using ATP. One explanation could be that this extra demand requires higher ATP production. The main ATP source is ATP synthase, whose activity is dependent on LEF, fixed in this case (light-‐limiting).

Based on these calculations, each pathway can achieve similar maximal theoretical yield due to similar stoichiometry. However TPC and LL pathway requires more carbon dioxide. It was mentioned that carbon sinks such as decarboxylation steps increased carbon uptake (Oliver and Atsumi, 2015). This can be a disadvantage for the cost of the process with a high amount of carbon not efficiently used. Production of isobutanol is suggested here since it a better fuel and has a similar maximal yield than butanol.

(13)

Fig. 4. Pathways leading to butanol and isobutanol synthesis from pyruvate precursor as implemented in the model. Plain and dashed arrows indicate native and non-‐native reactions respectively. In blue, the 2-‐keto-‐

acid route for isobutanol. In red, standard Clostridium butanol pathway. In this project, last three steps were studied with NADH or NADPH. In green, ATP-‐driven pathway reported by Lan and Liao (2012). The first two steps and conversion of crotonyl-‐CoA to butyryl-‐CoA useing NADH are specific for this pathway. In yellow, TPC pathway using β-‐oxidation reversal reported by Pazstor et al. (2014). Metabolite names next to arrows indicate reaction requirements.

(14)

Metabolic engineering in cyanobacteria

Generic approaches / Introduction

Regardless of the organism used, metabolic engineering can be divided into three levels with increased complexity and predictive power (Fig. 5). Initial strategies have mainly focused on gene level. Rerouting carbon flux to a certain pathway by simple amplification using high gene copy number, optimized codon usage or increased promoter strength was common. Quickly it was realized that productivities did not meet expectations due to cellular balancing omission.

Stepping back to the pathway level granted a better global understanding of the process. The “push and pull” concept aiming at increasing pathway precursors and minimizing competing pathways is an example. Balancing the pathway using cofactor usage and the notion of driving force enabled to achieve higher productivity. The current state of metabolic engineering focuses on these two levels.

Generic strategies to increase productivities include most notably: By-‐product production inhibition, removal of competing reactions, target pathway overexpression, cofactor usage, product degradation/uptake removal, feedback inhibition removal or toxicity tolerance. Built on the previous levels, systems level is at its infancy due to its high level of complexity. It requires high computational power and high understanding of the organism.

Fig. 5. Systemic approach of strain development (Lan and Liao, 2013). From right to left, increased in

productivity often means increased in complexity.

In cyanobacteria, early strategies on individual gene level have been performed extensively. Expression of different native and foreign promoters has been achieved.

Native promoters used are usually found in housekeeping genes (light-‐inducible PpsbA2 from PSII; Prbc in CO2 fixation). Strong foreign IPTG-‐inducible promoter Ptrc is widely used in pathway engineering (Atsumi et al., 2009; Huang et al., 2010; Lan and Liao, 2011). Codon usage for heterologous gene could also be optimized. IspS gene encoding isoprene synthase from a plant specie was successfully codon-‐optimized, resulting in a 10-‐fold increase in gene expression (Lindberg et al, 2010).

Recent advances have mainly focus on the pathway level with the notion of a driving force pushing towards product synthesis. Decarboxylation and ATP usage could provide such a driving force (Lan and Liao, 2012). As discussed earlier, reversibility

(15)

pathway reactions has also been investigated as a way to push forward the carbon flux (Bond-‐Watts et al., 2011). Enzyme kinetic data can also be integrated to have an overview of the rate-‐controlling step in a pathway using metabolic control analysis (Angermayr et al., 2013). Engineering tolerance to product is also an important part to achieve high productivity (Kaczmarzyk et al., 2014).

Future studies aim at building on these driving forces to improve current productivity. Genetic engineering of non-‐obvious targets aims at creating such driving forces over the whole organism rather than the pathway level. Since genome-‐scale reconstructions involve hundreds of reactions, powerful computational techniques are therefore essential. In E. coli, it was possible to use OptKnock to couple lactate production to growth (Fong et al., 2005). After 60 days, experimental results were in good agreement with simulations and approximately 90% of maximal theoretical lactate production rate was achieved. Through this project, analysis of driving forces that lead to growth-‐coupled butanol synthesis is a recurring theme.

Cofactor recycling

One important issue in metabolic engineering is cofactor balance. It has been many times the case that adding and removing pathways results in a cofactor imbalance, equivalent to a bottleneck. Due to interconnections between photosynthesis and respiration, cyanobacteria have a highly regulated cofactor balance making it difficult to engineer.

Cofactor recycling is important in order to couple biomass to product synthesis.

Since cell factories internal objectives are to optimize their growth, linking it to engineering objective represents a genuine driving force towards overproduction phenotype. A phenotypic phase plane (PPP or production envelope) can be plot to illustrate that. Biomass is iteratively forced at different flux values. For each of them, flux towards product synthesis is maximized and minimized. For a given genotype, the PPP represents all mathematically possible phenotypes. We assume that the cell will optimize its growth. Thus, the cellular flux distribution will be pinpointed as the point with the highest growth rate (far right; Fig. 6). Now if product synthesis is coupled to growth, at maximal growth rate, product synthesis will occur as a necessary by-‐product to growth. This notion is perhaps the central aspect of this work and provides a powerful strategy for product synthesis.

Coupling biomass and fermentative products requires few gene deletions in E. coli or S. cerevisiae. Lactate production in E. coli is a good example. Biomass formation results in an excess of reducing equivalents through the central metabolism in the form of NADH. Under anaerobic conditions, the TCA cycle is not active.

Fermentation pathways leading to lactate, ethanol or succinate (all recycling NADH to NAD⁺) are thus the main remaining mechanisms to re-‐oxidize excess NADH. This recycling is essential for the cell to sustain growth. Without any NAD⁺ available, flux through the central metabolism cannot go forward. As a result, lactate, ethanol, and

(16)

succinate are all potential growth-‐coupled product through cofactor recycling.

Strong coupling between one of these products and growth can be achieved by knocking-‐out other fermentative pathways.

Fig. 6. Simple trade-‐off versus growth-‐coupled phenotypic phase plane. A. In the simple trade-‐off design, no product is synthesized at maximal growth rate. B. In the growth-‐coupled design, product is synthesized at maximal growth rate.

Another key concept related to cofactor recycling as driving force is the ATP/NADPH ratio, a central theme in cyanobacteria. For instance, knocking out ethanol dehydrogenase and acetate kinase lead to growth-‐coupled synthesis of lactate in silico (Burgard et al., 2003) and in vivo (Fong et al., 2005). This suggests that single NADH recycling does not achieve coupling in this case. ATP generation is also an important factor.

Coupling product with biomass has been shown to be difficult in cyanobacteria (Nogales et al., 2013). Under phototrophic conditions, presence of mechanisms to balance cofactor redox state hinders coupling (Fig. 7). Cyclic electron flows (or alternate electron flows; CEF) support the cell in ATP/NADPH ratio modulation depending on environmental conditions. Because carbon dioxide fixation requires an ATP/NADPH ratio of 1.5, higher than the 1.28 ratio provided by the linear electron flow, CEF decreases the ATP/NADPH ratio by indirectly increasing ATP production at the expense of direct NADH/NADPH oxidation. It is believed that biomass requires a ratio larger than 1.51 or 2 (Erdrich et al., 2014; Knoop and Steuer, 2015). For this reason, a successful strategy would aim at decreasing the ATP/NADPH ratio. With a higher ratio requirement needed to produce biomass, an excess NADPH would then be available for our product. NADPH would then be recycled back to NADP⁺ using product synthesis pathway. This would provide a way to couple product with growth.

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45

0 0.01 0.02 0.03 0.04

Product mmol/gDW/h

Growth rate 1/h

A. Simple trade-‐off

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45

0 0.01 0.02 0.03 0.04

Product mmol/gDW/h

Growth rate 1/h

B. Growth-‐coupled

(17)

Fig. 7. A. Overview of photosynthetic and respiratory electron transport chains. In most cyanobacteria, photosynthesis only occurs in thylakoids, and respiration takes place in both cellular and thylakoid membrane.

As a result photosynthesis and respiration closely interact even sharing some membrane components (plastoquinone PQ pool, cytochrome b6f). Linear electron flow includes photosystem II (PSII), cytochrome b6f (CYTBF), photosystem I (PSI), ferredoxin NADP+ oxidoreductase (FNR), connected trough plastoquinone (PQ), cytochrome c (cytC), and ferredoxin, respectively. Cyclic electron flows include the ferredoxin PQ reductase (FQR), the NAD(P)H dehydrogenase complexes, the aa3-‐type terminal oxidase (CYO), the PQ oxidase (CydBD), the MEHLER reaction, and the hydrogenase (H2ase). B. Simple scheme showing how ATP/NADPH requirements can be engineered to provide a driving force for product synthesis.

ATP NADPH

1.3 LEF

≥ 2 Biomass

Fixed ATP/NADPH ratios

Extra NADPH for product synthesis

B A

(18)

Results

OptKnock and OptForce were applied to iJN678. OptKnock was performed using the COBRA toolbox 2.0 on MATLAB (Schellenberger et al., 2011). OptForce was performed using the GAMS modeling environment (General Algebraic Modeling System, www.gams.com). CPLEX solver was used for both algorithms. Transport reactions were removed from consideration using OptKnock. For OptForce, transport and peripherical reactions associated with biomass formation (carotenoid, riboflavin, sterol, nucleic acid components...) were removed from the pool as much as possible (reduced set). Two approaches were used, one iteratively removing these reactions from the solution and another removing these reactions before running the algorithm. Transporters or other peripherical reactions were allowed to be knockout mainly for informative reasons. Understanding the logic behind these knockouts is an important aspect to understand driving forces behind growth-‐coupled butanol production. In many cases, strategies could not be found for the reduced set. In both cases, reaction deletions found using OptKnock were used to guide the algorithm and decrease the computation time. Simulations were limited to few hours.

Identifying the driving force

Identifying the reason behind coupling is an important part of understanding to create high-‐producing strain. For each strategy, sensitivity to cofactor recycling is tested to identify which cofactor balance requirements push the flux though product synthesis. Implementing a reaction that re-‐oxidize a specific cofactor and test its influence on the coupling design is informative. If the coupling is not affected, the strategy is not sensitive to this cofactor. Conversely, no coupling means that the reaction is used instead of the pathway because it is less a burden for the cell to simply recycle cofactors using a single reaction than producing butanol. For instance, a reaction converting NADH to NAD+ can be added for a given strategy to test sensitivity to NADH recycling. If the PPP shows uncoupling between growth and product, then the strategy is said to be sensitive to NADH consumption. Strategies were sensitive to NADH consumption, NADPH consumption, and/or ATP production.

OptKnock to predict reaction deletions

OptKnock provides the first step towards identifying suitable gene targets. High computational time or even existence of a possible solution makes it difficult to find targets. OptKnock hinted strategies for both heterotrophic and autotrophic conditions. For simplicity and also because autotrophic conditions are target conditions for cyanobacteria growth, autotrophic strategies are considered here.

Strategies were found for the NADH pathway and for the LL pathway.

(19)

Strategy for NADH pathway

The initial list proposed by OptKnock involved more than 10 reaction deletions to achieve coupling. Refining the list involved successive removal of each target to get a minimal set required to reach coupling. The reduced list involves 7 reaction deletions (Table 2). To compare the effect of each reaction deletion on the flux distribution, simulation with the restored reaction was computed for each target (mutant+). A visualization tool was used to easily see global changes in the metabolism (Maarleveld et al., 2014). A phenotypic phase plane permits to visualize growth-‐coupled production of butanol (Fig. 8). This strategy was shown to be sensitive to NADH consumption.

Fig. 8. Phenotypic phase plane for the standard NADH pathway. In the modified strain, approximately 0.13

mmol/gDW/h butanol production is predicted at maximal growth.

Among the targets, hindering the circular electron flow was shown to be essential for coupling. NADH dehydrogenases (type 1 and type 2) were required to be knocked out. This is consistent with other studies from Erdrich et al. (2014) and Reed et al. (2013), where they used other algorithms such as CASOP and OptORF, respectively. The most obvious reason behind these knockouts is the removal of competing reactions for NADH re-‐oxidation. Since biomass is optimized, NADH re-‐

oxidation will preferentially occurs in the ETC, where ATP can be produced as an end product rather than being ‘wasted’ for a product without interest for the cell (butanol is excreted and cannot be re-‐used after all).

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45

0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04

Butanol mmol/gDW/h

Growth rate 1/h

PPP for NADH pathway

initial strain moditied strain

Flux balance analysis to improve butanol productivity in Synechocystis PCC 6803 Master of Science Thesis Kiyan Shabestary