This function can be used for a variety of applications. The most common ones are:
- removing host-derived sequences
- removing ribosomal sequences
- removing contaminants
It uses minimap2 to align the reads to the reference and identify hits, and it does not require a prebuilt index.
```r
remove_reference(reads, out, reference, alignments = NA, threads = 3)
```
Argument | Description
---|---|
reads | A character vector containing the read files in FASTQ format. Can be generated using |
out | A folder in which to save the filtered FASTQ files. |
reference | Path to a FASTA file (optionally gzipped) containing the sequences to filter out. Can be a genome or a set of transcripts. |
alignments | Whether to keep the alignments. If not `NA`, this should be a string giving the path of the output BAM file. |
threads | The number of threads to use for mapping. |
Returns a numeric vector with two entries: the number of sequences remaining after filtering (unmapped) and the number of removed sequences (mapped).
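The two returned counts can be turned into a quick summary of how much of the input was removed. The sketch below defines a hypothetical helper, `removal_summary()`, which is not part of the package. It relies only on the documented order of the entries (unmapped first, mapped second) and does not assume the entries are named. The commented call at the end assumes the package providing `remove_reference()` is loaded and that the input files exist; the file names are placeholders.

```r
# Hypothetical helper (not part of the package): summarizes the
# two-element vector returned by remove_reference().
# Entry 1 = sequences kept (unmapped), entry 2 = sequences removed (mapped).
removal_summary <- function(counts) {
  stopifnot(is.numeric(counts), length(counts) == 2)
  kept <- counts[[1]]
  removed <- counts[[2]]
  total <- kept + removed
  # Fraction of all input sequences that mapped to the reference;
  # NA when there were no sequences at all.
  c(kept = kept, removed = removed,
    fraction_removed = if (total > 0) removed / total else NA_real_)
}

# Typical use (placeholder paths; requires the input files to exist):
# counts <- remove_reference(reads, out = "filtered",
#                            reference = "host_genome.fa.gz", threads = 4)
# removal_summary(counts)
```

For example, `removal_summary(c(900, 100))` reports that 10% of the sequences mapped to the reference and were removed.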