mbtools supports aligning shotgun metagenomic reads. Before proceeding, we recommend that you preprocess the reads as described in an earlier vignette.
We use minimap2 for all alignments because it performs as well as other aligners but does not require building a reference index explicitly. This way your reference database can simply be a (compressed) FASTA file.
As example data we will use 3 samples generated with the polyester read sampler from a list of 10 reference genomes in equal abundances.
Let’s create our file list for the example data and reference database:
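library(mbtools)

# Read files for the example samples
fi <- system.file("extdata/shotgun", package = "mbtools") %>%
    find_read_files()
# Reference database: a compressed FASTA file
ref <- system.file("extdata/genomes/zymo_mock.fna.gz",
                   package = "mbtools")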
The file list contains 3 paired-end samples.
As always, we will need a config object.
## $reference
## [1] "/Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/genomes/zymo_mock.fna.gz"
##
## $build_index
## [1] FALSE
##
## $threads
## [1] 3
##
## $alignment_dir
## [1] "alignments"
##
## $max_hits
## [1] 100
##
## $use_existing
## [1] FALSE
##
## $limited_memory
## [1] FALSE
##
## attr(,"class")
## [1] "config"
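The call that created this config object is not shown here. As a minimal sketch, assuming mbtools exports a config builder named `config_align` that accepts the fields printed above as arguments (the function name is an assumption and is not confirmed by this text), it might look like this:

```r
library(mbtools)

# Reference database from the example data shipped with the package.
ref <- system.file("extdata/genomes/zymo_mock.fna.gz", package = "mbtools")

# ASSUMPTION: `config_align` is the config builder for alignments and takes
# the fields shown in the listing above (reference, threads, ...) as arguments.
config <- config_align(reference = ref, threads = 3)

# Printing the object lists every field, including those left at defaults.
config
```

Fields that are not set explicitly would keep their defaults, which may differ from the values printed above.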
This will be sufficient to align reads. As always, the first argument can also be an artifact from quality_control or preprocess.
## INFO [2019-05-28 16:44:18] Aligning 3 samples on 3 threads. Keeping up to 100 secondary alignments.
## INFO [2019-05-28 16:44:20] Finished aligning even1.
## INFO [2019-05-28 16:44:23] Finished aligning even2.
## INFO [2019-05-28 16:44:25] Finished aligning even3.
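The log above comes from the alignment step itself, whose call is not shown here. The following is a sketch of what that step might look like, assuming the alignment function is named `align_short_reads` and the config builder is named `config_align` (both names are assumptions), with the file list as the first argument and the config as the second:

```r
library(mbtools)

# Example reads and reference shipped with the package.
fi <- find_read_files(system.file("extdata/shotgun", package = "mbtools"))
ref <- system.file("extdata/genomes/zymo_mock.fna.gz", package = "mbtools")

# ASSUMPTION: `config_align` builds the alignment config.
config <- config_align(reference = ref, threads = 3)

# ASSUMPTION: `align_short_reads` runs the alignment and returns the
# output artifact described in the text.
alns <- align_short_reads(fi, config)
```

With 3 threads and 3 samples, each sample can be aligned in parallel, which matches the first line of the log.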
You will get an output artifact that logs the created alignments…
## id alignment success
## 1: even1 alignments/even1.bam TRUE
## 2: even2 alignments/even2.bam TRUE
## 3: even3 alignments/even3.bam TRUE
…the size of all the alignments on disk…
## 2.4 Mb
…and the logs in case something goes wrong.
## [M::mm_idx_gen::1.824*1.17] collected minimizers
## [M::mm_idx_gen::2.019*1.34] sorted minimizers
## [M::main::2.020*1.34] loaded/built the index for 10 target sequence(s)
## [M::mm_mapopt_update::2.020*1.34] mid_occ = 1000
## [M::mm_idx_stat] kmer size: 21; skip: 11; is_hpc: 0; #seq: 10
## [M::mm_idx_stat::2.102*1.33] distinct minimizers: 10112254 (98.45% are singletons); average occurrences: 1.040; average spacing: 5.997
## [M::worker_pipeline::2.234*1.35] mapped 12598 sequences
## [M::main] Version: 2.17-r941
## [M::main] CMD: minimap2 -acx sr -t 3 --secondary=yes -N 100 -I 100G /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/genomes/zymo_mock.fna.gz /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/shotgun/even1_S1_L001_R1_001.fasta.gz /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/shotgun/even1_S1_L001_R2_001.fasta.gz
## [M::main] Real time: 2.307 sec; CPU: 3.088 sec; Peak RSS: 0.476 GB
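The `CMD` line in the log shows the minimap2 invocation that was run for one sample. The sketch below reassembles that command in base R to make the pieces easier to read. The shortened file names are placeholders for the full paths in the log, and the pairing of `threads` with `-t` and `max_hits` with `-N` is inferred from the matching values in the config and the log, not from the mbtools source.

```r
# Values copied from the config listing and the CMD line in the log.
threads <- 3
max_hits <- 100

# Placeholder relative paths standing in for the full paths in the log.
reference <- "zymo_mock.fna.gz"
reads <- c("even1_S1_L001_R1_001.fasta.gz",
           "even1_S1_L001_R2_001.fasta.gz")

args <- c(
    "-acx", "sr",        # -a: SAM output, -c: CIGAR, -x sr: short-read preset
    "-t", threads,       # number of threads
    "--secondary=yes",   # report secondary alignments
    "-N", max_hits,      # keep at most this many secondary alignments
    "-I", "100G",        # load up to this many target bases per index batch
    reference, reads     # reference first, then the forward and reverse reads
)

cmd <- paste("minimap2", paste(args, collapse = " "))
cat(cmd, "\n")

# Run only if minimap2 is installed and the placeholder files exist.
if (nzchar(Sys.which("minimap2")) && all(file.exists(c(reference, reads)))) {
    sam <- system2("minimap2", args = args, stdout = TRUE)
}
```

Note that the reference is passed as a plain compressed FASTA file; minimap2 builds the index on the fly, which is what the `mm_idx_gen` lines in the log report.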