Prague Stringology Conference 2011

Kalle Karhu

Improving Exact Search of Multiple Patterns From a Compressed Suffix Array

Self-indexes are largely studied and widely applied structures in string matching. However, the exact matching of multiple patterns using self-indexes is a topic that has not been the subject of concentrated study although it is an area that may have direct and indirect applications and uses in fields such as bioinformatics. This paper presents a method of improving the exact search of multiple patterns from a compressed suffix array. The proposed method is able to cut down run-times for the handled patterns by as much as 71.6%. A set of 1000 patterns of length 1000 nucleotides each is found from a text of 50 MB in size 14.0% faster than by searching the patterns using the locate functionality of the compressed suffix array.

Download paper: Article in PostScript Article in PDF BibTeX Reference
 PostScript   PDF   BibTeX reference 
Download presentation: Presentation