Prague Stringology Conference 2013

Domenico Cantone and Simone Faro

Improved and Self-Tuned Occurrence Heuristics

Abstract:
In this note we present three efficient variations of the occurrence heuristic, adopted by many exact string matching algorithms and firstly introduced in the well-known Boyer-Moore algorithm. Our first heuristic, called improved-occurrence heuristic, is a simple improvement of the rule introduced by Sunday in his Quick-Search algorithm. Our second heuristic, called worst-occurrence heuristic, achieves its speed-up by selecting the relative position which yields the largest average advancement. Finally, our third heuristic, called jumping-occurrence heuristic, uses two characters for computing the next shift, whose distance allows one to maximize the average advancement. The worst-occurrence and jumping-occurrence heuristics tune their parameters according to the text characters' distribution. Experimental results show that the new proposed heuristics achieve very good results on average, especially in the case of small alphabets.

Download paper: Article in PostScript Article in PDF BibTeX Reference
 PostScript   PDF   BibTeX reference 
Download presentation: Presentation