Building communities
micom constructs communities from a specification given as a pandas
DataFrame. This DataFrame needs at least two columns: “id”, which
specifies the ID of the organism/tissue, and “file”, which points to a
file containing the corresponding individual model.
To make this more concrete, let's look at a small example. micom
comes with a function that generates a simple example community
specification consisting of several copies of a small E. coli model
containing only the central carbon metabolism.
In [10]:
from micom.data import test_taxonomy
taxonomy = test_taxonomy()
taxonomy
Out[10]:
| | id | genus | species | reactions | metabolites | file |
|---|---|---|---|---|---|---|
| 0 | Escherichia_coli_1 | Escherichia | Eschericia coli | 95 | 72 | /home/cdiener/code/micom/micom/data/e_coli_cor... |
| 1 | Escherichia_coli_2 | Escherichia | Eschericia coli | 95 | 72 | /home/cdiener/code/micom/micom/data/e_coli_cor... |
| 2 | Escherichia_coli_3 | Escherichia | Eschericia coli | 95 | 72 | /home/cdiener/code/micom/micom/data/e_coli_cor... |
| 3 | Escherichia_coli_4 | Escherichia | Eschericia coli | 95 | 72 | /home/cdiener/code/micom/micom/data/e_coli_cor... |
| 4 | Escherichia_coli_5 | Escherichia | Eschericia coli | 95 | 72 | /home/cdiener/code/micom/micom/data/e_coli_cor... |
As we can see, this specification contains the required fields plus some additional information. In fact, the specification may contain any number of additional columns, which will be saved along with the community model. One special column is “abundance”, which we will get to know soon :) The rationale for requiring the community information in this table format is that the table can be attached as-is as a supplement to your project or publication and describes your community composition unambiguously.
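You can also write a specification by hand. The sketch below builds a minimal DataFrame with the two required columns and one optional extra column. The IDs, genus names and file paths are hypothetical placeholders, and `check_specification` is a hypothetical helper (not part of micom) that fails early if a required column is missing or an ID is repeated; requiring unique IDs is our own assumption about what a sensible specification looks like.

```python
import pandas as pd

# A hand-written community specification. IDs and file paths are hypothetical
# placeholders; point "file" at your own model files.
spec = pd.DataFrame({
    "id": ["taxon_a", "taxon_b", "taxon_c"],
    "file": ["models/taxon_a.xml", "models/taxon_b.json", "models/taxon_c.mat"],
    # Optional extra column; additional columns are kept with the community model.
    "genus": ["GenusA", "GenusB", "GenusC"],
})

REQUIRED_COLUMNS = {"id", "file"}


def check_specification(df):
    """Hypothetical sanity check: required columns present and IDs unique."""
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"specification is missing columns: {sorted(missing)}")
    if df["id"].duplicated().any():
        raise ValueError("every row needs a unique 'id'")
    return df


check_specification(spec)
print(spec)
```

Once the files exist, a table like this could be passed to Community in the same way as the test taxonomy.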
To convert the specification into a community model we use the
Community class from micom, which derives from the cobrapy
Model class.
In [11]:
from micom import Community
com = Community(taxonomy)
print("Built a community with a total of {} reactions.".format(len(com.reactions)))
100%|██████████| 5/5 [00:00<00:00, 5.96models/s]
Built a community with a total of 495 reactions.
The resulting model includes the correctly scaled exchange reactions
between each individual and the internal medium, and it initializes the
bounds of the external imports to the maximum found across all models.
The original taxonomy is kept in the com.taxonomy attribute.
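The printed total of 495 reactions can be checked against the taxonomy table above, where each of the 5 copies has 95 reactions. The individual models account for 475 reactions, leaving 20 reactions added by micom. The E. coli core model has 20 exchange reactions, so these 20 are most likely one external medium exchange per exchanged metabolite; that reading is our assumption, not something stated in this tutorial.

```latex
5 \times 95 = 475, \qquad 495 - 475 = 20
```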
Note that micom can figure out how to read a variety of file types
based on the file extension. It currently supports:
- .pickle for pickled models
- .xml or .gz for XML models
- .json for JSON models
- .mat for COBRA Toolbox models
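Choosing a reader by extension amounts to a lookup on the last file suffix. The sketch below is a hypothetical illustration, not micom's internal loader: it maps each supported extension to the name of the cobrapy reader usually used for that format (`cobra.io.read_sbml_model`, `cobra.io.load_json_model`, `cobra.io.load_matlab_model`) or to `pickle.load`. Treating `.gz` as gzip-compressed SBML is an assumption.

```python
from pathlib import Path

# Hypothetical extension-to-reader table mirroring the list above. The values
# name the readers usually used for each format; micom's own loader may differ.
READERS = {
    ".pickle": "pickle.load",
    ".xml": "cobra.io.read_sbml_model",
    ".gz": "cobra.io.read_sbml_model",  # assumed: compressed SBML such as model.xml.gz
    ".json": "cobra.io.load_json_model",
    ".mat": "cobra.io.load_matlab_model",
}


def pick_reader(path):
    """Return the reader name for `path`, judged only by its last suffix."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"unsupported model file type: {path}")
    return READERS[suffix]


print(pick_reader("models/Abiotrophia_defectiva_ATCC_49176.xml"))
# prints cobra.io.read_sbml_model
```

Because only the last suffix is inspected, `model.xml.gz` resolves via `.gz`, which is why that entry is needed separately.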
In [12]:
com.taxonomy
Out[12]:
| id (index) | id | genus | species | reactions | metabolites | file | abundance |
|---|---|---|---|---|---|---|---|
| Escherichia_coli_1 | Escherichia_coli_1 | Escherichia | Eschericia coli | 95 | 72 | /home/cdiener/code/micom/micom/data/e_coli_cor... | 0.2 |
| Escherichia_coli_2 | Escherichia_coli_2 | Escherichia | Eschericia coli | 95 | 72 | /home/cdiener/code/micom/micom/data/e_coli_cor... | 0.2 |
| Escherichia_coli_3 | Escherichia_coli_3 | Escherichia | Eschericia coli | 95 | 72 | /home/cdiener/code/micom/micom/data/e_coli_cor... | 0.2 |
| Escherichia_coli_4 | Escherichia_coli_4 | Escherichia | Eschericia coli | 95 | 72 | /home/cdiener/code/micom/micom/data/e_coli_cor... | 0.2 |
| Escherichia_coli_5 | Escherichia_coli_5 | Escherichia | Eschericia coli | 95 | 72 | /home/cdiener/code/micom/micom/data/e_coli_cor... | 0.2 |
As you may notice, we have gained a new column called “abundance”. This
column quantifies the relative quantity of each individual in the
community. Since we did not specify it in the original taxonomy,
micom assumes that all individuals are present in the same quantity,
here 1/5 = 0.2 each.
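If you do have measurements, you can add your own “abundance” column to the specification. A minimal sketch, assuming hypothetical read counts in a column we call "reads": dividing each count by the total turns them into relative abundances that sum to 1. Whether micom rescales supplied abundances itself is not covered here; normalizing them yourself keeps the table unambiguous.

```python
import pandas as pd

# Hypothetical read counts for three taxa; replace with your own measurements.
spec = pd.DataFrame({
    "id": ["taxon_a", "taxon_b", "taxon_c"],
    "file": ["models/taxon_a.xml", "models/taxon_b.xml", "models/taxon_c.xml"],
    "reads": [500, 300, 200],
})

# Relative abundance: each taxon's share of the total, so the column sums to 1.
spec["abundance"] = spec["reads"] / spec["reads"].sum()
print(spec[["id", "abundance"]])
```

With equal counts this reduces to 1/n per taxon, the same 0.2 that micom filled in for the five E. coli copies.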
The community presented here is quite simplistic. For microbial
communities, micom includes a larger taxonomy covering 773 microbial
species from the AGORA paper.
In this taxonomy the “file” column only contains the base names of the
files, but you can easily prepend any path, as demonstrated in the following:
In [13]:
from micom.data import agora
tax = agora.copy()
tax.file = "models/" + tax.file # assuming you have downloaded the AGORA models to the "models" folder
tax.head()
Out[13]:
| | organism | id | kingdom | phylum | class | order | family | genus | species | oxygen_status | ... | draft_created | platform | kbase_genome_id | pubseed_id | ncbi_id | genome_size | genes | reactions | metabolites | file |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abiotrophia defectiva ATCC 49176 | Abiotrophia_defectiva_ATCC_49176 | Bacteria | Firmicutes | Bacilli | Lactobacillales | Aerococcaceae | Abiotrophia | Abiotrophia defectiva | Facultative anaerobe | ... | 07/01/14 | ModelSEED | NaN | Abiotrophia defectiva ATCC 49176 (592010.4) | 592010.0 | 2041839 | 598 | 1069 | 840 | models/Abiotrophia_defectiva_ATCC_49176.xml |
| 1 | Acidaminococcus fermentans DSM 20731 | Acidaminococcus_fermentans_DSM_20731 | Bacteria | Firmicutes | Negativicutes | Acidaminococcales | Acidiaminococcaceae | Acidaminococcus | Acidaminococcus fermentans | Obligate anaerobe | ... | 04/17/16 | Kbase | kb\|g.2555 | Acidaminococcus fermentans DSM 20731 (591001.3) | 591001.0 | 2329769 | 646 | 1090 | 903 | models/Acidaminococcus_fermentans_DSM_20731.xml |
| 2 | Acidaminococcus intestini RyC-MR95 | Acidaminococcus_intestini_RyC_MR95 | Bacteria | Firmicutes | Negativicutes | Selenomonadales | Acidaminococcaceae | Acidaminococcus | Acidaminococcus intestini | Obligate anaerobe | ... | 08/03/14 | ModelSEED | NaN | Acidaminococcus intestini RyC-MR95 (568816.4) | 568816.0 | 2487765 | 599 | 994 | 827 | models/Acidaminococcus_intestini_RyC_MR95.xml |
| 3 | Acidaminococcus sp. D21 | Acidaminococcus_sp_D21 | Bacteria | Firmicutes | Negativicutes | Selenomonadales | Acidaminococcaceae | Acidaminococcus | unclassified Acidaminococcus | Obligate anaerobe | ... | 06/29/12 | ModelSEED | NaN | Acidaminococcus sp. D21 (563191.3) | 563191.0 | 2238973 | 598 | 851 | 768 | models/Acidaminococcus_sp_D21.xml |
| 4 | Acinetobacter calcoaceticus PHEA-2 | Acinetobacter_calcoaceticus_PHEA_2 | Bacteria | Proteobacteria | Gammaproteobacteria | Pseudomonadales | Moraxellaceae | Acinetobacter | Acinetobacter calcoaceticus | Aerobe | ... | 04/18/16 | Kbase | kb\|g.3519 | Acinetobacter calcoaceticus PHEA-2 (871585.3) | 871585.0 | 3862530 | 1026 | 1561 | 1165 | models/Acinetobacter_calcoaceticus_PHEA_2.xml |
5 rows × 24 columns
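Before building a large community it can help to confirm that every prefixed path actually exists. A minimal sketch with two hypothetical helpers (not part of micom): `prefix_model_dir` does the same prefixing as above but via `pathlib`, so the directory separator is handled for you, and `missing_files` lists the entries that are not on disk. The two-row table is a stand-in for the AGORA taxonomy.

```python
from pathlib import Path

import pandas as pd


def prefix_model_dir(tax, model_dir):
    """Return a copy of `tax` whose "file" entries are prefixed with `model_dir`."""
    out = tax.copy()
    out["file"] = [str(Path(model_dir) / name) for name in out["file"]]
    return out


def missing_files(tax):
    """List the entries in the "file" column that do not exist on disk."""
    return [path for path in tax["file"] if not Path(path).is_file()]


# Hypothetical two-row taxonomy standing in for the AGORA table.
demo = pd.DataFrame({"id": ["a", "b"], "file": ["a.xml", "b.xml"]})
demo = prefix_model_dir(demo, "models")
print(missing_files(demo))  # model files you still need to download
```

Working on a copy leaves the original table untouched, mirroring the `agora.copy()` call above.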
Saving and loading communities
Constructing large community models can be slow due to performance limitations of the solvers: adding a single variable or constraint to a model with 10 variables is much faster than adding it to a model with 10 million variables. Thus, we recommend saving the constructed community in a serialized format, which is much faster to load repeatedly.
In [14]:
%time com = Community(taxonomy)
100%|██████████| 5/5 [00:00<00:00, 6.46models/s]
CPU times: user 784 ms, sys: 4.18 ms, total: 788 ms
Wall time: 785 ms
In [15]:
%time com.to_pickle("community.pickle")
CPU times: user 15.7 ms, sys: 4.13 ms, total: 19.8 ms
Wall time: 19.4 ms
In [16]:
from micom import load_pickle
%time com = load_pickle("community.pickle")
CPU times: user 152 ms, sys: 3.92 ms, total: 156 ms
Wall time: 154 ms
As we can see, loading the model from the pickle file is much faster than creating it de novo (a wall time of 154 ms versus 785 ms here).
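A common way to use this is to build once and reuse the saved file afterwards. The sketch below shows that pattern with the standard library's `pickle` module and a hypothetical `load_or_build` helper; the comment shows how the same pattern would look with the micom calls used above (`Community`, `to_pickle`, `load_pickle`).

```python
import os
import pickle

# With micom, the same pattern would use the calls shown in this tutorial:
#   if os.path.exists("community.pickle"):
#       com = load_pickle("community.pickle")
#   else:
#       com = Community(taxonomy)
#       com.to_pickle("community.pickle")


def load_or_build(path, build):
    """Load a pickled object from `path` if it exists; otherwise call `build()`,
    pickle the result to `path`, and return it."""
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    obj = build()
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)
    return obj
```

The cached file does not notice when the taxonomy or the model files change, so delete it whenever you want the community rebuilt.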