Prague Stringology Conference 2017

Strahil Ristov, Robert Vaser and Mile Šikić

Trade-offs in Query and Target Indexing for the Selection of Candidates in Protein Homology Searches

We compare two recent, similar and complementary indexing methods for fast seed discovery (Ristov, 2016; Vaser et al., 2016). Both methods are based on the principle of counting matches on a diagonal, with the goal of finding the value and/or position of the best match between two sequences under Hamming distance over an alphabet of k-mers, where k can equal 1. The matching k-mers in the two sequences are found by scanning one sequence and using the index of the other. Indexing the shorter of the two sequences is easier to perform on-line; however, if the index is constructed off-line on the longer sequence, the number of comparison operations is potentially much smaller. We present an analysis of this effect for different sequence lengths in real data, in the context of protein search.
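
The principle of counting matches on a diagonal can be illustrated with a minimal Python sketch. It is not the implementation of either compared method: the function names `build_kmer_index` and `best_diagonal` and the dictionary-based index are assumptions made for illustration only. One sequence is indexed by k-mer position, the other is scanned, and each shared k-mer adds one to the counter of the diagonal (index position minus scan position) on which it lies.

```python
from collections import Counter, defaultdict


def build_kmer_index(seq, k):
    """Map every k-mer of seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def best_diagonal(indexed, scanned, k=1):
    """Return (diagonal, matches) for the diagonal with the most k-mer matches.

    A diagonal is the offset i - j between position i in the indexed
    sequence and position j in the scanned sequence.
    Returns (None, 0) when the two sequences share no k-mer.
    """
    index = build_kmer_index(indexed, k)
    counts = Counter()
    for j in range(len(scanned) - k + 1):
        # Each occurrence of the scanned k-mer in the index is one comparison.
        for i in index.get(scanned[j:j + k], ()):
            counts[i - j] += 1
    if not counts:
        return None, 0
    return max(counts.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from the paper's data.
    print(best_diagonal("MKVLAAGIV", "KVLA", k=1))
```

In this sketch the work of the scanning loop grows with the length of the scanned sequence and the number of index hits per k-mer, so which of the two sequences is passed as `indexed` decides how many positions have to be scanned.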
