Prague Stringology Conference 2025

Hamed Hasibi, Mhaskar Neerja and William F. Smyth

Approximate Longest Common Substring of Multiple Strings: Experimental Evaluation

Abstract:
Determining an \emph{Approximate Longest Common Substring} (ALCS) among a collection of strings \(\s{S} = \{s_1, s_2, \dots, s_m\}\), \(m \ge 2\), is especially significant in computational biology — for instance, when tracking related mutations across several genomic sequences. To address this, the Restricted $k$-$t$ Longest Common Substring ($Rkt$-LCS) problem was introduced by Hasibi \emph{et al.} in~\cite{hasibi2025approximateLCS}, along with several efficient solutions under various distance metrics. In this paper, we describe and evaluate two parallel strategies for computing the $Rkt$-LCS of a set \s{S} of strings under the Hamming distance metric. The first strategy parallelizes the computation of the core data structure introduced in \cite{hasibi2025approximateLCS}, while the second makes use of a Graphics Processing Unit (GPU) that we demonstrate executes over a hundred times faster than its CPU counterpart.

Download paper: Article in PostScript Article in PDF BibTeX Reference
 PostScript   PDF   BibTeX reference