{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Building communities\n", "\n", "`micom` constructs communities from a specification given as a pandas DataFrame. The DataFrame needs at least two columns, \"id\" and \"file\", which specify the ID of the organism/tissue and the file containing its individual model.\n", "\n", "To make more sense of that, we can look at a small example. `micom` comes with a function that generates a simple example community specification consisting of several copies of a small *E. coli* model containing only the central carbon metabolism." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " id genus species reactions metabolites \\\n", "0 Escherichia_coli_1 Escherichia Eschericia coli 95 72 \n", "1 Escherichia_coli_2 Escherichia Eschericia coli 95 72 \n", "2 Escherichia_coli_3 Escherichia Eschericia coli 95 72 \n", "3 Escherichia_coli_4 Escherichia Eschericia coli 95 72 \n", "4 Escherichia_coli_5 Escherichia Eschericia coli 95 72 \n", "\n", " file \n", "0 /home/cdiener/code/micom/micom/data/e_coli_cor... \n", "1 /home/cdiener/code/micom/micom/data/e_coli_cor... \n", "2 /home/cdiener/code/micom/micom/data/e_coli_cor... \n", "3 /home/cdiener/code/micom/micom/data/e_coli_cor... \n", "4 /home/cdiener/code/micom/micom/data/e_coli_cor... " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from micom.data import test_taxonomy\n", "\n", "taxonomy = test_taxonomy()\n", "taxonomy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can see, this specification contains the required fields and some additional information. In fact, the specification may contain any number of additional columns, which will be saved along with the community model. One special example is \"abundance\", which we will get to know soon :) The logic behind requiring the community information in this table format is that the table can be attached as a supplement to your project or publication as is, and it describes your community composition unambiguously.\n", "\n", "To convert the specification into a community model we will use the `Community` class from `micom`, which derives from the cobrapy `Model` class." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 5/5 [00:00<00:00, 5.96models/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Build a community with a total of 495 reactions.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "from micom import Community\n", "\n", "com = Community(taxonomy)\n", "print(\"Build a community with a total of {} reactions.\".format(len(com.reactions)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This includes the correctly scaled exchange reactions with the internal medium and initializes the external imports to the maximum found in all models. The original taxonomy is maintained in the `com.taxonomy` attribute.\n", "\n", "Note that `micom` can figure out how to read a variety of different file types based on the extension. It currently supports:\n", "\n", "- `.pickle` for pickled models\n", "- `.xml` or `.gz` for XML models\n", "- `.json` for JSON models\n", "- `.mat` for COBRAtoolbox models\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " id genus species \\\n", "id \n", "Escherichia_coli_1 Escherichia_coli_1 Escherichia Eschericia coli \n", "Escherichia_coli_2 Escherichia_coli_2 Escherichia Eschericia coli \n", "Escherichia_coli_3 Escherichia_coli_3 Escherichia Eschericia coli \n", "Escherichia_coli_4 Escherichia_coli_4 Escherichia Eschericia coli \n", "Escherichia_coli_5 Escherichia_coli_5 Escherichia Eschericia coli \n", "\n", " reactions metabolites \\\n", "id \n", "Escherichia_coli_1 95 72 \n", "Escherichia_coli_2 95 72 \n", "Escherichia_coli_3 95 72 \n", "Escherichia_coli_4 95 72 \n", "Escherichia_coli_5 95 72 \n", "\n", " file \\\n", "id \n", "Escherichia_coli_1 /home/cdiener/code/micom/micom/data/e_coli_cor... \n", "Escherichia_coli_2 /home/cdiener/code/micom/micom/data/e_coli_cor... \n", "Escherichia_coli_3 /home/cdiener/code/micom/micom/data/e_coli_cor... \n", "Escherichia_coli_4 /home/cdiener/code/micom/micom/data/e_coli_cor... \n", "Escherichia_coli_5 /home/cdiener/code/micom/micom/data/e_coli_cor... \n", "\n", " abundance \n", "id \n", "Escherichia_coli_1 0.2 \n", "Escherichia_coli_2 0.2 \n", "Escherichia_coli_3 0.2 \n", "Escherichia_coli_4 0.2 \n", "Escherichia_coli_5 0.2 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "com.taxonomy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, we have gained a new column called `abundance`. This column quantifies the relative quantity of each individual in the community. Since we did not specify it in the original taxonomy, `micom` assumes that all individuals are present in the same quantity." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The community presented here is pretty simplistic. For microbial communities, `micom` includes a larger taxonomy for 773 microbial species from the [AGORA paper](https://doi.org/10.1038/nbt.3703). 
Here the \"file\" column contains only the base names of the files, but you can easily prepend any path, as demonstrated in the following:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " organism id \\\n", "0 Abiotrophia defectiva ATCC 49176 Abiotrophia_defectiva_ATCC_49176 \n", "1 Acidaminococcus fermentans DSM 20731 Acidaminococcus_fermentans_DSM_20731 \n", "2 Acidaminococcus intestini RyC-MR95 Acidaminococcus_intestini_RyC_MR95 \n", "3 Acidaminococcus sp. D21 Acidaminococcus_sp_D21 \n", "4 Acinetobacter calcoaceticus PHEA-2 Acinetobacter_calcoaceticus_PHEA_2 \n", "\n", " kingdom phylum class order \\\n", "0 Bacteria Firmicutes Bacilli Lactobacillales \n", "1 Bacteria Firmicutes Negativicutes Acidaminococcales \n", "2 Bacteria Firmicutes Negativicutes Selenomonadales \n", "3 Bacteria Firmicutes Negativicutes Selenomonadales \n", "4 Bacteria Proteobacteria Gammaproteobacteria Pseudomonadales \n", "\n", " family genus species \\\n", "0 Aerococcaceae Abiotrophia Abiotrophia defectiva \n", "1 Acidiaminococcaceae Acidaminococcus Acidaminococcus fermentans \n", "2 Acidaminococcaceae Acidaminococcus Acidaminococcus intestini \n", "3 Acidaminococcaceae Acidaminococcus unclassified Acidaminococcus \n", "4 Moraxellaceae Acinetobacter Acinetobacter calcoaceticus \n", "\n", " oxygen_status ... \\\n", "0 Facultative anaerobe ... \n", "1 Obligate anaerobe ... \n", "2 Obligate anaerobe ... \n", "3 Obligate anaerobe ... \n", "4 Aerobe ... \n", "\n", " draft_created platform kbase_genome_id \\\n", "0 07/01/14 ModelSEED NaN \n", "1 04/17/16 Kbase kb|g.2555 \n", "2 08/03/14 ModelSEED NaN \n", "3 06/29/12 ModelSEED NaN \n", "4 04/18/16 Kbase kb|g.3519 \n", "\n", " pubseed_id ncbi_id genome_size \\\n", "0 Abiotrophia defectiva ATCC 49176 (592010.4) 592010.0 2041839 \n", "1 Acidaminococcus fermentans DSM 20731 (591001.3) 591001.0 2329769 \n", "2 Acidaminococcus intestini RyC-MR95 (568816.4) 568816.0 2487765 \n", "3 Acidaminococcus sp. 
D21 (563191.3) 563191.0 2238973 \n", "4 Acinetobacter calcoaceticus PHEA-2 (871585.3) 871585.0 3862530 \n", "\n", " genes reactions metabolites \\\n", "0 598 1069 840 \n", "1 646 1090 903 \n", "2 599 994 827 \n", "3 598 851 768 \n", "4 1026 1561 1165 \n", "\n", " file \n", "0 models/Abiotrophia_defectiva_ATCC_49176.xml \n", "1 models/Acidaminococcus_fermentans_DSM_20731.xml \n", "2 models/Acidaminococcus_intestini_RyC_MR95.xml \n", "3 models/Acidaminococcus_sp_D21.xml \n", "4 models/Acinetobacter_calcoaceticus_PHEA_2.xml \n", "\n", "[5 rows x 24 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from micom.data import agora\n", "\n", "tax = agora.copy()\n", "tax.file = \"models/\" + tax.file # assuming you have downloaded the AGORA models to the \"models\" folder\n", "tax.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Saving and loading communities\n", "\n", "Constructing large community models can be slow due to performance limitations of the solvers. In essence, adding a single variable/constraint to a model with 10 variables is much faster than adding it to a model with 10 million variables. Thus, we recommend saving the constructed community in a serialized format afterwards, which is much faster to load repeatedly." 
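, "\n", "\n", "One way to apply this recommendation is a build-once, load-many helper. The following is only a sketch: it uses the generic `pickle` module from the Python standard library, and both `load_or_build` and the dummy `slow_build` are hypothetical names that are not part of `micom`. The `to_pickle` and `load_pickle` functions that `micom` itself provides are timed in the cells that follow.

```python
import os
import pickle
import tempfile


def load_or_build(path, build):
    '''Return the object pickled at `path`, building and saving it first if needed.

    `build` is any zero-argument callable. This helper is hypothetical and
    not part of micom.
    '''
    if os.path.exists(path):
        # Cache hit: deserialize instead of rebuilding.
        with open(path, 'rb') as handle:
            return pickle.load(handle)
    # Cache miss: run the expensive construction once and store the result.
    obj = build()
    with open(path, 'wb') as handle:
        pickle.dump(obj, handle)
    return obj


build_calls = []


def slow_build():
    # Dummy stand-in for an expensive construction step.
    build_calls.append(1)
    return {'reactions': 495}


cache_path = os.path.join(tempfile.mkdtemp(), 'community.pickle')
first = load_or_build(cache_path, slow_build)   # builds and writes the cache
second = load_or_build(cache_path, slow_build)  # reads the cache, no rebuild
```

With `micom`, the same existence check could presumably wrap `Community(taxonomy)`, `com.to_pickle(path)` and `load_pickle(path)`; that substitution is an assumption and is not tested here."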
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 5/5 [00:00<00:00, 6.46models/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 784 ms, sys: 4.18 ms, total: 788 ms\n", "Wall time: 785 ms\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "%time com = Community(taxonomy)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 15.7 ms, sys: 4.13 ms, total: 19.8 ms\n", "Wall time: 19.4 ms\n" ] } ], "source": [ "%time com.to_pickle(\"community.pickle\")" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 152 ms, sys: 3.92 ms, total: 156 ms\n", "Wall time: 154 ms\n" ] } ], "source": [ "from micom import load_pickle\n", "%time com = load_pickle(\"community.pickle\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can see loading the model from the pickle format is much faster than creating it *de novo*." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }