Workflows
Micom was designed to create and analyze personalized metabolic models for microbial communities, so most analyses in micom have to be run for many samples. Because all methods currently implemented in micom can be run independently for each sample, this workload is easy to parallelize. To make this simple, micom provides a workflow module that lets you run analyses for many samples in parallel. It also integrates with the micom logger and includes workarounds for some memory leaks in optlang, which improves memory usage. As a rule of thumb you will need one CPU and about 1 GB of RAM for each sample, so on a server with 16 cores and at least 16 GB of RAM you can run up to 16 samples in parallel.
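The rule of thumb above can be turned into a small helper for choosing the number of parallel jobs. This is only a sketch: `max_parallel_samples` is a hypothetical helper, not part of micom, and the available RAM is passed in by hand because the Python standard library has no portable way to query it.

```python
import os


def max_parallel_samples(n_cores, ram_gb, ram_per_sample_gb=1.0):
    """Hypothetical helper: cap parallel samples by cores and by memory."""
    # Each sample needs one core and roughly `ram_per_sample_gb` of RAM.
    by_memory = int(ram_gb // ram_per_sample_gb)
    # Always allow at least one job, even on very small machines.
    return max(1, min(n_cores, by_memory))


# os.cpu_count() may return None, so fall back to a single core.
cores = os.cpu_count() or 1
# Assumption: this machine has 16 GB of RAM available for the analysis.
n_jobs = max_parallel_samples(cores, ram_gb=16)
```

The resulting `n_jobs` could then be used as the number of parallel jobs for a workflow. If your models are larger than the ones used in this section, raising `ram_per_sample_gb` is probably the safer choice.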
For a workflow you will need two things:
- A function that takes arguments for a single sample and performs your analysis
- A list of arguments for each sample
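If an analysis needs more than one value per sample, one option is to pack those values into a single tuple per sample and unpack them inside the function. The sketch below assumes that the function receives one list entry as its only argument, as in the example in this section. `scale_sample` and its inputs are hypothetical, and a plain list comprehension stands in for the parallel run.

```python
def scale_sample(args):
    """Hypothetical per-sample function: unpack one tuple of arguments."""
    sample_id, values, factor = args
    return sample_id, [v * factor for v in values]


# One tuple per sample; every tuple holds all arguments for that sample.
arguments = [
    ("sample_1", [1, 2], 10),
    ("sample_2", [3, 4], 10),
]

# Serial stand-in for the parallel run: one call per list entry.
results = [scale_sample(args) for args in arguments]
```

With this shape, the function and the argument list are the two things a workflow needs.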
A short example makes this clearer. Assume that we want to run the cooperative tradeoff method for our E. coli example with varying numbers of E. coli strains.
In [1]:
from micom.data import test_taxonomy
taxonomies = [test_taxonomy(n=n) for n in range(2, 12)]
taxonomies[2]
Out[1]:
| id | genus | species | reactions | metabolites | file |
---|---|---|---|---|---|---|
0 | Escherichia_coli_1 | Escherichia | Eschericia coli | 95 | 72 | /home/cdiener/code/micom/micom/data/e_coli_cor... |
1 | Escherichia_coli_2 | Escherichia | Eschericia coli | 95 | 72 | /home/cdiener/code/micom/micom/data/e_coli_cor... |
2 | Escherichia_coli_3 | Escherichia | Eschericia coli | 95 | 72 | /home/cdiener/code/micom/micom/data/e_coli_cor... |
3 | Escherichia_coli_4 | Escherichia | Eschericia coli | 95 | 72 | /home/cdiener/code/micom/micom/data/e_coli_cor... |
These taxonomies will be our arguments. Each entry in taxonomies defines a single sample, so we have 10 samples in total. Now we need a function that takes a single sample's arguments as input (as a set of abundances) and runs the cooperative tradeoff method. So let us implement that:
In [2]:
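from micom import Community


def run_tradeoff(tax):
    # Build the community model for a single sample from its taxonomy.
    com = Community(tax, progress=False, solver="gurobi")
    # Run the cooperative tradeoff method and return the per-member results.
    sol = com.cooperative_tradeoff()
    return sol.members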
This is all we need to run the analysis in parallel.
In [3]:
from micom.workflows import workflow
results = workflow(run_tradeoff, taxonomies, n_jobs=2)
100%|██████████| 10/10 [00:06<00:00, 1.57sample(s)/s]
results is a list that contains one entry for each sample, in the same order as the input arguments.
In [4]:
results[2]
Out[4]:
compartments | abundance | growth_rate | reactions | metabolites |
---|---|---|---|---|
Escherichia_coli_1 | 0.25 | 0.873922 | 95 | 72 |
Escherichia_coli_2 | 0.25 | 0.873922 | 95 | 72 |
Escherichia_coli_3 | 0.25 | 0.873922 | 95 | 72 |
Escherichia_coli_4 | 0.25 | 0.873922 | 95 | 72 |
medium | NaN | NaN | 20 | 20 |
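Because the results come back in the same order as the arguments, they can be matched to sample labels by position. The sketch below uses only the standard library. `label_results`, the sample names, and the placeholder results are hypothetical and stand in for the real output of a workflow.

```python
def label_results(sample_ids, results):
    """Hypothetical helper: pair each result with its sample label by position."""
    if len(sample_ids) != len(results):
        raise ValueError("need exactly one result per sample")
    return dict(zip(sample_ids, results))


# Assumed labels for the ten samples with 2 to 11 strains.
sample_ids = [f"n_strains_{n}" for n in range(2, 12)]
# Placeholder values standing in for the per-sample results.
placeholder_results = list(range(2, 12))

labeled = label_results(sample_ids, placeholder_results)
```

Checking the lengths first guards against silently dropping samples, since `zip` would otherwise stop at the shorter input.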