This can be used for a variety of applications. The most common ones are:

  • removing sequences from the host

  • removing ribosomal sequences

  • removing contaminants

This function uses minimap2 to align and identify hits and does not require a prebuilt index.

remove_reference(reads, out, reference, alignments = NA, threads = 3)

Arguments

reads

A character vector containing the read files in fastq format. Can be generated using find_read_files.

out

A folder to which to save the filtered fastq files.

reference

Path to a fasta file (can be gzipped) that contains the sequences to filter out. Can be a genome or a set of transcripts.

alignments

Whether to keep the alignments. If not NA, it should be a string giving the path to the output BAM file.

threads

How many threads to use for mapping.

Value

A numeric vector with two entries: the number of sequences remaining after filtering (non-mapped) and the number of removed sequences (mapped).
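
As a rough illustration of how the returned vector might be used, the sketch below computes the fraction of removed reads. The counts are made-up placeholder values rather than real output, and the code assumes the entries come in the order described above (non-mapped first, mapped second).

```r
# Hypothetical return value: 9500 reads kept (non-mapped), 500 removed (mapped).
# These numbers are placeholders for illustration only.
counts <- c(9500, 500)

kept <- counts[1]     # sequences remaining after filtering
removed <- counts[2]  # sequences that mapped to the reference

# Fraction of all input sequences that were removed.
frac_removed <- removed / sum(counts)

cat(sprintf("Kept %d reads, removed %d reads (%.1f%% removed)\n",
            kept, removed, 100 * frac_removed))
```

A high removed fraction when filtering host sequences may simply reflect a host-rich sample, so it is worth checking against what is expected for the sample type.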

Examples

NULL
#> NULL
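
A minimal sketch of a typical call, assuming the package that provides remove_reference is attached and minimap2 is installed. All file paths are hypothetical placeholders. The call is skipped when the function or the input files are not available.

```r
# Hypothetical input paths; replace them with your own files.
reads <- c("data/sample_1.fastq", "data/sample_2.fastq")
reference <- "refs/host_transcripts.fa.gz"  # gzipped fasta is allowed
out_dir <- "filtered"

# Only run when the function and all input files are actually available.
can_run <- exists("remove_reference", mode = "function") &&
    all(file.exists(c(reads, reference)))

if (can_run) {
    counts <- remove_reference(
        reads,
        out = out_dir,
        reference = reference,
        alignments = NA,  # NA: do not keep the alignments
        threads = 3
    )
    # Two entries: non-mapped (kept) and mapped (removed) sequences.
    print(counts)
} else {
    message("Skipping: remove_reference or the input files are not available.")
}
```

To keep the alignments, pass a file path such as "filtered/host.bam" (again a placeholder) as alignments instead of NA.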