Workflows

Micom was designed to create and analyze personalized metabolic models for microbial communities, so many of its analyses have to be run for many samples. Because all methods currently implemented in micom can be run independently for each sample, this workload is easy to parallelize. To make this simple, micom provides a workflow module that runs an analysis for many samples in parallel. It also integrates with the micom logger and includes workarounds for some memory leaks in optlang, which reduces memory usage. As a rule of thumb, you will need one CPU core and about 1 GB of RAM per sample, so a server with 16 cores and at least 16 GB of available RAM can run up to 16 samples in parallel.
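The rule of thumb above can be turned into a quick estimate of how many samples to run at once. This is a minimal sketch: max_parallel_samples and ram_per_sample_gb are hypothetical names, not part of micom, and the 1 GB default is only the rough figure quoted above (larger models may need more).

```python
import os


def max_parallel_samples(n_cores, ram_gb, ram_per_sample_gb=1.0):
    """Estimate how many samples can run at once.

    Hypothetical helper based on the rule of thumb of one CPU core and
    roughly 1 GB of RAM per sample; not part of micom.
    """
    # How many samples fit into the available memory.
    by_memory = int(ram_gb // ram_per_sample_gb)
    # Limited by whichever resource runs out first, but never below one.
    return max(1, min(n_cores, by_memory))


# A 16-core server with 16 GB of RAM.
print(max_parallel_samples(16, 16))  # 16
# Cores on this machine; the 8 GB of RAM is an assumption you supply yourself.
print(max_parallel_samples(os.cpu_count() or 1, 8))
```

The result could then serve as the n_jobs value when running a workflow.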

For a workflow you will need two things:

  1. A function that takes arguments for a single sample and performs your analysis
  2. A list with one set of arguments per sample
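Conceptually, combining these two things is an ordered parallel map: apply the function to each sample's arguments in separate worker processes and collect the results in input order. The sketch below shows that idea with Python's standard concurrent.futures module. It is an illustration only, not micom's implementation, and run_per_sample and square are hypothetical names.

```python
from concurrent.futures import ProcessPoolExecutor


def run_per_sample(func, args, n_jobs=2, executor_cls=ProcessPoolExecutor):
    """Apply func to each sample's arguments in parallel.

    Hypothetical helper illustrating the idea behind a workflow; it is not
    micom's implementation. Results come back in the same order as args.
    """
    with executor_cls(max_workers=n_jobs) as pool:
        # Executor.map yields results in input order, no matter which
        # sample finishes first.
        return list(pool.map(func, args))


def square(x):
    # Stand-in for a per-sample analysis function.
    return x * x


if __name__ == "__main__":
    # The __main__ guard matters where worker processes are spawned
    # (e.g. Windows and macOS), because they re-import this module.
    print(run_per_sample(square, [1, 2, 3, 4]))  # [1, 4, 9, 16]
```

With worker processes, the per-sample function has to be defined at the top level of a module so it can be pickled and sent to the workers.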

Let us illustrate this with a short example. Suppose we want to run the cooperative tradeoff method on our E. coli example with a varying number of E. coli strains.

In [1]:
from micom.data import test_taxonomy

taxonomies = [test_taxonomy(n=n) for n in range(2, 12)]
taxonomies[2]
Out[1]:
   id                  genus        species          reactions  metabolites  file
0  Escherichia_coli_1  Escherichia  Eschericia coli  95         72           /home/cdiener/code/micom/micom/data/e_coli_cor...
1  Escherichia_coli_2  Escherichia  Eschericia coli  95         72           /home/cdiener/code/micom/micom/data/e_coli_cor...
2  Escherichia_coli_3  Escherichia  Eschericia coli  95         72           /home/cdiener/code/micom/micom/data/e_coli_cor...
3  Escherichia_coli_4  Escherichia  Eschericia coli  95         72           /home/cdiener/code/micom/micom/data/e_coli_cor...

These will be our arguments. Each entry in taxonomies defines a single sample (with 2 to 11 strains), so we have 10 samples in total. Now we need a function that takes the arguments for a single sample (here, one taxonomy table) and runs the cooperative tradeoff method. Let us implement that:

In [2]:
from micom import Community

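def run_tradeoff(tax):
    # Build a community model from one sample's taxonomy table.
    # solver="gurobi" needs a working Gurobi installation; use another supported solver otherwise.
    com = Community(tax, progress=False, solver="gurobi")
    sol = com.cooperative_tradeoff()
    # Keep only the per-member results (abundances and growth rates).
    return sol.members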

This is all we need to run the analysis in parallel.

In [3]:
from micom.workflows import workflow

# Run run_tradeoff on every taxonomy, processing two samples at a time.
results = workflow(run_tradeoff, taxonomies, n_jobs=2)
100%|██████████| 10/10 [00:06<00:00,  1.57sample(s)/s]

results is a list with one entry per sample, in the same order as the input arguments.
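Because the order of results matches the order of the arguments, the per-sample tables can be stacked into a single DataFrame for downstream analysis. The following is a minimal sketch using pandas.concat; combine_results and the sample labels are hypothetical, and the toy tables below only mimic the shape of the members output, with made-up values.

```python
import pandas as pd


def combine_results(results, sample_ids):
    """Stack per-sample member tables into one DataFrame.

    Hypothetical helper; sample_ids must list one label per entry in
    results, in the same order as the workflow arguments.
    """
    if len(results) != len(sample_ids):
        raise ValueError("need exactly one sample id per result")
    # keys= adds an outer index level holding the sample label.
    return pd.concat(results, keys=sample_ids, names=["sample", "compartments"])


# Toy stand-ins shaped like the members tables (values are made up).
toy = [
    pd.DataFrame({"growth_rate": [0.5, 0.5]}, index=["strain_1", "strain_2"]),
    pd.DataFrame({"growth_rate": [0.4]}, index=["strain_1"]),
]
combined = combine_results(toy, ["n2", "n1"])

# Mean growth rate per sample.
print(combined.groupby(level="sample")["growth_rate"].mean())
```

With the real output, labels such as [f"n{len(t)}" for t in taxonomies] would tag each sample by its number of strains. The medium row has a NaN growth rate, which mean() skips by default.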

In [4]:
results[2]
Out[4]:
                    abundance  growth_rate  reactions  metabolites
compartments
Escherichia_coli_1  0.25       0.873922     95         72
Escherichia_coli_2  0.25       0.873922     95         72
Escherichia_coli_3  0.25       0.873922     95         72
Escherichia_coli_4  0.25       0.873922     95         72
medium              NaN        NaN          20         20