Kimmo Fredriksson and Szymon Grabowski
Efficient Algorithms for (δ, γ, α)-Matching
Abstract: |
We propose new algorithms for (δ, γ, α)-matching. In this string matching problem we are given a pattern P=p_{0}p_{1}...p_{m-1} and a text T=t_{0}t_{1}...t_{n-1} over some integer alphabet Σ = {0...σ-1}. The pattern symbol p_{i} matches the text symbol t_{j} iff |p_{i}-t_{j}| ≤ δ. The pattern P (δ,γ)-matches some text substring t_{j}...t_{j+m-1} iff for all i it holds that |p_{i}-t_{j+i}| ≤ δ and ∑|p_{i}-t_{j+i}| ≤ γ. Finally, in (δ, γ, α)-matching we also permit at most α length gaps (text substrings) between each matching text symbol. The only known previous algorithm runs in O(mn) time. We give several algorithms that improve the average case up to O(n) for small α, and the worst case to O(min{mn,|M|α}) or O(mn log (γ/w)), where M = {(i,j); |p_{i}-t_{j}| ≤ δ} and w is the number of bits in a machine word. We conclude with experimental results showing that the algorithms are very efficient in practice. |
Download paper: | |||
PostScript | BibTeX reference |
Download presentation: |