Prague Stringology Conference 2020

Simone Faro and Francesco Pio Marino

Reducing Time and Space in Indexed String Matching by Characters Distance Text Sampling

Abstract:
Sampled string matching is an efficient approach to the string matching problem which tries to overcome the prohibitive space requirements of indexed matching, on the one hand, and drastically reduce searching time for the online solutions, on the other hand. Sampled string matching dates back to 1991, however practical solutions to the problem only appeared more recently. They are able to speed up the online searching up to 9 times while using less than 5 % of the text size. In this paper we take into account the problem of indexing sampled texts in order to reduce the space requirements of indexed matching and maintaining its very high practical and theoretical performances. Specifically we present a new efficient indexed string matching approach based on a characters distance sampling. The main idea is to sample the distances between consecutive occurrences of a given set of pivot characters and then to create a suffix array of the sampled text. From our experimental results it turns out that the newly presented approach is able to obtain a searching time gain up to 91 %, when compared to the standard indexed approach, while using less than 15 % of the space needed for the standard suffix array.

Download paper: Article in PostScript Article in PDF BibTeX Reference
 PostScript   PDF   BibTeX reference 
Download presentation: Presentation