CORDA tutorial

Reactions, Genes and confidences

In order to start a reconstruction CORDA requires you to assign a confidence score to each reaction in your base model. This can be done by a variety of methods, and even by hand, but the most common way is to assign confidence based on proteome or gene expression data.

CORDA manages a total of 5 confidence levels:

  • -1 for reactions that are not present and should not be included in the model
  • 0 for reactions with unknown confidence which may be included in the model if necessary
  • 1 for low confidence reactions that should be included if necessary
  • 2 for medium confidence reactions that should be included if necessary
  • 3 for high confidence reactions that must be included if possible in any way

The most tedious step here is usaully mapping the confidence for genes or proteins to the distinct reactions. Many of the larger models come with gene-reaction rules in the form

gene1 and gene2 or (gene3 and gene4)

and the individual confidence values have to be mapped from the gene confidence levels. Here "and" is evaluated by the minimum confidence and "or" by the maximum confidence. The Python package includes a handy function to do this for you automatically in a safe manner. For that you will require the gene-reaction rule (Recon 1 and 2 include them in their model for instance) and a dictionary mapping genes/proteins to their confidence values. For examples:

In [1]:
from corda import reaction_confidence

gene_conf = {"gene1": 1, "gene2": 3, "gene4": -1} # missing entries are automatically assigned zeroes
rule = "gene1 and gene2 or (gene3 and gene4)"

reaction_confidence(rule, gene_conf)
Loading symengine... This feature is in beta testing. Please report any issues you encounter on http://github.com/biosustain/optlang/issues
Out[1]:
1

A small example

Now we will perform a small reconstruction. The package includes a small model in SBML format describing the general central carbon metabolism.

In [2]:
from corda import test_model

mod = test_model()
len(mod.reactions)
Out[2]:
60

Here the last reaction is a biomass reaction.

In [3]:
mod.reactions[59].reaction
Out[3]:
'0.39253 3pg + 20.7045 atp + 0.15446 cit + 0.38587 glu + 0.35261 oaa + 0.053446 prpp + 0.50563 pyr --> 20.6508 adp + 20.6508 pi'

We can now reconstruct the smallest model that still allows for the production of biomass. Let's start by setting the biomass reaction as high confidence and all other reactions as "not include".

In [4]:
conf = {}
for r in mod.reactions: conf[r.id] = -1
conf["r60"] = 3

Now we reconstruct the model.

In [5]:
from corda import CORDA

opt = CORDA(mod, conf)
opt.build()
print(opt)
build status: reconstruction complete
Inc. reactions: 33/60
 - unclear: 0/0
 - exclude: 32/59
 - low and medium: 0/0
 - high: 1/1

In order to see the list of reactions in the reconstructed model without specifically creating the new model you can use the following snippet.

In [6]:
print([opt.model.reactions.get_by_id(k).reaction for k, used in opt.included.items() if used])
['fdp <=> dhap + g3p', 'g3p + nad + pi <=> 13dpg + nadh', 'atp + f6p --> adp + fdp', 'g6p <=> f6p', '3pg + atp <=> 13dpg + adp', 'g1p <=> g6p', 'dhap <=> g3p', 'lac_L + nad <=> nadh + pyr', 'atp + r5p <=> amp + prpp', 'ru5p_D <=> xu5p_D', 'r5p <=> ru5p_D', 'g3p + s7p <=> e4p + f6p', 'r5p + xu5p_D <=> g3p + s7p', 'e4p + xu5p_D <=> f6p + g3p', 'cit <=> icit', 'icit + nad <=> akg + nadh', 'icit + nadp <=> akg + nadph', 'mal_L + nad <=> nadh + oaa', 'mal_L + nad <=> nadh + pyr', 'mal_L + nadp <=> nadph + pyr', 'atp + pyr --> adp + oaa + pi', 'gln <=> glu', 'glu + nad <=> akg + nadh', 'glu + nadp <=> akg + nadph', 'adp + pi --> atp', 'nadh --> nad', ' --> g1p', ' <=> gln', ' --> adp', ' <=> amp', ' <=> pi', 'lac_L <=> ', '0.39253 3pg + 20.7045 atp + 0.15446 cit + 0.38587 glu + 0.35261 oaa + 0.053446 prpp + 0.50563 pyr --> 20.6508 adp + 20.6508 pi']

This gives a reconstruction that looks like this (orange denotes included reactions): reconstruction

We can also define additional metabolic functions that should be possible in the reconstruction. Let's assume we want to be able to produce pep.

In [7]:
opt = CORDA(mod, conf, met_prod="pep")
opt.build()
print(opt)
build status: reconstruction complete
Inc. reactions: 39/61
 - unclear: 0/0
 - exclude: 37/59
 - low and medium: 0/0
 - high: 2/2

This time we included the production of pep:

In [8]:
rec = opt.cobra_model("plus_pep")
use = rec.metabolites.pep.reactions
print("# of redundant pathway for pep =", opt.redundancies["EX_CORDA_0"])
for r in use: print(r.reaction)
# of redundant pathway for pep = 2
gtp + oaa <=> gdp + pep
2pg <=> pep

By default CORDA uses redundancy. This means, in case there are several minimal pathways to reach your objective, CORDA will include all of those (which is good since it gives your model some robustness). In this case this did happen and yields two different forms to produce pep. If we do not want that feature we can modify the parameter n in the CORDA initializer which denotes the maximum number of redundant pathways to include.

In [9]:
opt = CORDA(mod, conf, met_prod="pep", n=1)
opt.build()

rec_min = opt.cobra_model("plus_pep_nored")
print("used", len(rec_min.reactions), "reactions")
print("# of redundant pathway for pep =", opt.redundancies["EX_CORDA_0"])
use = rec_min.metabolites.pep.reactions
for r in use: print(r.reaction)
used 33 reactions
# of redundant pathway for pep = 1
gtp + oaa <=> gdp + pep

Which now includes only a single pathway to activate each high confidence flux.