The Prague Stringology Conference 2002

Frantisek Franek, William F. Smyth and Xiangdong Xiao

A Note on Crochemore's Repetitions Algorithm a Fast Space-Efficient Approach

Abstract:
The space requirement of Crochemore's repetitions algorithm is generally estimated to be about 20MN bytes of memory, where $N$ is the length of the input string and M the number of bytes required to store the integer N. The same algorithm can also be used in other contexts, for instance to compute the suffix tree of the input string in O(N log N) time for the purpose of data compression. In such contexts the large space requirement of the algorithm is a significant drawback. There are of course several newer space-efficient algorithms with the same time complexity that can compute suffix trees or arrays. However, in actual implementations, these algorithms may not be faster than Crochemore's. Therefore, we consider it interesting enough to describe a new approach based on the same mathematical principles and observations that were put forth in Crochemore's original paper, but whose space requirement is 10MN bytes. Additional advantages of the approach are the ease with which it can be implemented in C/C++ (as we have done) and the apparent speed of such an implementation in comparison to other implementations of the original algorithm.

Download paper: Article in PostScript Article in PDF
 PostScript   PDF