vignettes/01_preprocessing.Rmd
To filter and trim the raw reads we usually rely on the DADA2 functions, but wrap them in a reproducible workflow step.
## Registered S3 methods overwritten by 'ggplot2':
## method from
## [.quosures rlang
## c.quosures rlang
## print.quosures rlang
## Also loading:
## - dada2=1.12.0
## - data.table=1.12.2
## - ggplot2=3.1.1
## - magrittr=1.5
## - phyloseq=1.28.0
## - ShortRead=1.42.0
## - yaml=2.2.0
## Found tools:
## - minimap2=2.17-r941
## - slimm=0.3.4
## - samtools=1.9
##
## Attaching package: 'mbtools'
## The following object is masked _by_ 'package:BiocGenerics':
##
## normalize
We will again use our helper function to get a list of sequencing files.
## forward
## 1: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/F3D0_S188_L001_R1_001.fastq.gz
## 2: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/F3D1_S189_L001_R1_001.fastq.gz
## 3: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/F3D2_S190_L001_R1_001.fastq.gz
## 4: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/F3D3_S191_L001_R1_001.fastq.gz
## 5: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/Mock_S280_L001_R1_001.fastq.gz
## reverse
## 1: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/F3D0_S188_L001_R2_001.fastq.gz
## 2: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/F3D1_S189_L001_R2_001.fastq.gz
## 3: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/F3D2_S190_L001_R2_001.fastq.gz
## 4: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/F3D3_S191_L001_R2_001.fastq.gz
## 5: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/Mock_S280_L001_R2_001.fastq.gz
##      id injection_order lane
## 1: F3D0             188    1
## 2: F3D1             189    1
## 3: F3D2             190    1
## 4: F3D3             191    1
## 5: Mock             280    1
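The annotations in the table above (id, injection order, lane) are encoded in the Illumina-style file names. The following is a minimal base R sketch of how such names could be parsed; it is not the helper's actual implementation, and the regular expression and the variable names (`pattern`, `annotated`, `paired`) are assumptions made for illustration.

```r
# File names copied from the listing above (directory part omitted).
files <- c(
  "F3D0_S188_L001_R1_001.fastq.gz",
  "F3D0_S188_L001_R2_001.fastq.gz",
  "Mock_S280_L001_R1_001.fastq.gz",
  "Mock_S280_L001_R2_001.fastq.gz"
)

# Assumed layout: <id>_S<injection order>_L<lane>_R<1|2>_001.fastq.gz
pattern <- "^(.+)_S([0-9]+)_L([0-9]+)_R([12])_001\\.fastq\\.gz$"

# regexec() returns the full match plus one entry per capture group.
matches <- regmatches(files, regexec(pattern, files))
stopifnot(all(lengths(matches) == 5))  # every file must follow the layout
parts <- do.call(rbind, matches)

annotated <- data.frame(
  file = files,
  id = parts[, 2],
  injection_order = as.integer(parts[, 3]),
  lane = as.integer(parts[, 4]),
  direction = ifelse(parts[, 5] == "1", "forward", "reverse"),
  stringsAsFactors = FALSE
)

# Put forward and reverse files of the same sample side by side.
fwd <- annotated[annotated$direction == "forward", ]
rev <- annotated[annotated$direction == "reverse", ]
paired <- merge(
  fwd[, c("id", "injection_order", "lane", "file")],
  rev[, c("id", "file")],
  by = "id", suffixes = c("_forward", "_reverse")
)
print(paired)
```

Pairing by the parsed id rather than by file order keeps forward and reverse reads matched even if the directory listing is not sorted.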
Every mbtools workflow step comes with a corresponding config_* function that returns an example/default configuration. You can change values afterwards or pass the parameters in directly. We will specify a temporary directory as the storage location for the preprocessed data and truncate the forward reads to 240 bp and the reverse reads to 200 bp (based on our previous quality assessment).
## $threads
## [1] 1
##
## $out_dir
## [1] "/var/folders/55/dv0p21y96g1cq84sr1zd3kym0000gr/T//RtmpXCaPpi"
##
## $trimLeft
## [1] 10
##
## $truncLen
## [1] 240 200
##
## $maxEE
## [1] 2
##
## $truncQ
## [1] 2
##
## $maxN
## [1] 0
##
## attr(,"class")
## [1] "config"
We can see that there are some more parameters that we could specify.
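To make the link to DADA2 concrete, here is a hedged sketch of how a configuration like the one printed above could be handed to `dada2::filterAndTrim`. The `config` object below is a hand-built stand-in, not the output of the real config_* function, and the one-to-one mapping of its entries onto `filterAndTrim` arguments is an assumption about what the workflow step does internally. The DADA2 part runs only if dada2 is installed and uses the example files bundled with that package.

```r
# Hypothetical stand-in for the configuration shown above; in the real
# workflow this object is returned by the step's config_* function.
config <- structure(
  list(threads = 1, trimLeft = 10, maxEE = 2, truncQ = 2, maxN = 0),
  class = "config"
)

# Values can be changed afterwards with ordinary list assignment.
config$out_dir <- tempdir()
config$truncLen <- c(240, 200)  # forward, reverse

if (requireNamespace("dada2", quietly = TRUE)) {
  # Example paired-end reads shipped with dada2.
  fwd <- system.file("extdata", "sam1F.fastq.gz", package = "dada2")
  rev <- system.file("extdata", "sam1R.fastq.gz", package = "dada2")

  counts <- dada2::filterAndTrim(
    fwd, file.path(config$out_dir, "sam1F_filtered.fastq.gz"),
    rev, file.path(config$out_dir, "sam1R_filtered.fastq.gz"),
    trimLeft = config$trimLeft,
    truncLen = config$truncLen,
    maxEE = config$maxEE,
    truncQ = config$truncQ,
    maxN = config$maxN,
    multithread = FALSE
  )
  print(counts)  # matrix with columns reads.in and reads.out
}
```

Keeping all parameters in one list means the same object can be saved alongside the results, which is what makes the step reproducible.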
We can now run our preprocessing step.
## INFO [2019-05-28 16:35:48] Preprocessing reads for 5 paired-end samples...
## INFO [2019-05-28 16:35:55] 4.03e+04/4.48e+04 (89.75%) reads passed preprocessing.
This reports the percentage of reads that passed on the logging interface, but you can also inspect the raw and preprocessed read counts for each sample in detail:
##      raw preprocessed   id
## 1:  7793         6992 F3D0
## 2:  5869         5210 F3D1
## 3: 19620        17706 F3D2
## 4:  6758         6114 F3D3
## 5:  4779         4280 Mock
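The per-sample pass rates follow directly from this table. Below is a minimal base R sketch with the counts copied from the output above; the names `passed`, `pass_rate`, `mean_rate` and `pooled_rate` are introduced here for illustration and are not part of mbtools.

```r
# Read counts copied from the table above.
passed <- data.frame(
  raw = c(7793, 5869, 19620, 6758, 4779),
  preprocessed = c(6992, 5210, 17706, 6114, 4280),
  id = c("F3D0", "F3D1", "F3D2", "F3D3", "Mock")
)

# Fraction of reads that survived preprocessing in each sample.
passed$pass_rate <- passed$preprocessed / passed$raw

# Two ways to summarize: average the per-sample rates, or pool all reads.
mean_rate <- mean(passed$pass_rate)
pooled_rate <- sum(passed$preprocessed) / sum(passed$raw)

print(passed)
cat(sprintf("mean of per-sample rates: %.2f%%\n", 100 * mean_rate))
cat(sprintf("pooled rate: %.2f%%\n", 100 * pooled_rate))
```

With these counts the mean of the per-sample rates is 89.75%, which matches the logged figure, while the pooled rate is 89.92%. This suggests that the log reports the per-sample mean, although that is an inference from the numbers rather than something stated in the text.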