Raptor
A fast and space-efficient pre-filter
index

raptor search

Main Parameters

-​-index

The path to the index. For partitioned indices, the suffix _x, where x is a number, must be omitted.

-​-query

File containing query sequences.

Many file types and compressions are supported. Click to show a list.

Supported file extensions are (possibly followed by bz2, gz, or bgzf):

  • embl
  • fasta
  • fa
  • fna
  • ffn
  • faa
  • frn
  • fas
  • fastq
  • fq
  • genbank
  • gb
  • gbk
  • sam

-​-output

The output file name.

  • Format
    ###<text> | Meta-information
    ##<text> | Meta-information
    #<number><tab><filepaths> | Assigns each input file a number. Multiple filepaths are separated by a whitespace
    #QUERY_NAME<tab>USER_BINS | Header for the results
    <query_id><tab>[<number>...] | A line for each query, listing matches in input files, if any. Multiple hits are separated by a comma.
  • Example
    ### Minimiser parameters
    ## Window size = 19
    ## Shape = 1111111111111111111
    ## Shape size (length) = 19
    ## Shape count (number of 1s) = 19
    ### Search parameters
    ## Query file = "/data/query.fq"
    ## Pattern size = 65
    ## Output file = "search.out"
    ## Threads = 1
    ## tau = 0.9999
    ## p_max = 0.4
    ## Percentage threshold = nan
    ## Errors = 0
    ## Cache thresholds = false
    ### Index parameters
    ## Index = "/data/index.hibf"
    ## Index hashes = 2
    ## Index parts = 1
    ## False positive rate = 0.05
    ## Index is HIBF = true
    #0 /data/bin1.fa
    #1 /data/bin2.fa
    #2 /data/bin3.fa
    #3 /data/bin4.fa
    #QUERY_NAME USER_BINS
    query1
    query2 1
    query3 0,1,2

-​-threads

The number of threads to use. Sequences in the query file will be processed in parallel. Negligible effect on RAM usage for unpartitioned indices. Moderate effect for partitioned indices.

-​-quiet

By default, runtime and memory statistics are printed to stderr at the end.

This flag disables this behaviour.

-​-error

The number of allowed errors.

Note
Mutually exclusive with –threshold.

-​-threshold

Ratio of k-mers that need to be found for a hit to occur.

Note
Mutually exclusive with –error.

-​-query_length

The sequence length of a query. Used to determine thresholds. The sequence lengths should have little to no variance.

If not provided:

  • the median of sequence lengths in the query file is used.
  • a warning is emitted if there is a high variance in sequence lengths.
  • an error occurs if any sequence is shorter than the window size.

-​-tau

The higher tau, the lower the threshold.

Note
Has no effect when using --threshold or w == k.

-​-p_max

The higher p_max, the higher the threshold.

Note
Has no effect when using --threshold or w == k.

-​-cache-thresholds

Stores the computed thresholds with a unique name next to the index. In the next search call using this option, the stored thresholds are re-used. Two files are stored:

  • threshold_*.bin: Depends on query_length, window, kmer/shape, errors, and tau.
  • correction_*.bin: Depends on query_length, window, kmer/shape, p_max, and fpr.
Note
Has no effect when using --threshold or w == k.