The Prague Stringology Conference 2003

Veli Makinen, Gonzalo Navarro and Esko Ukkonen

Matching Numeric Strings under Noise

Abstract:
Numeric string is a sequence of symbols from an alphabet Σ subset of U where U$ is some numerical universe closed under addition and subtraction. Given two numeric strings A=a1 ... am and B=b1 ... bn and a distance function d(A,B) that gives the score of the best (partial) matching of A and B, the transposition invariant distance is mint in U{d(A+t,B)}, where A + t=(a1+t)(a2+t)...(am+t). The corresponding matching problem is to find occurrences j of A in B where d(A+t,Bj'...j) is smaller than some given threshold and Bj'...j is a substring of B. In this paper, we give efficient algorithms for matching numeric strings - with and without transposition invariance - under noise; we consider distance functions d(A,B) such that symbols a \in A and b\in B can be matched if |b-a|≤ δ, or the κ largest differences |b-a| can be discarded.

Download paper: Article in PostScript Article in PDF
 PostScript   PDF