The Prague Stringology Conference 2005

Amihood Amir

Asynchronous Pattern Matching - Metrics

Abstract:
Traditional approximate pattern matching (e.g. with Hamming distance or edit distance errors) assumes that various types of errors may occur in the data, but it implicitly assumes that the order of the data remains unchanged. Over the years, some applications have identified types of "errors" where the data remains correct but its order is compromised. The earliest example is the "swap" error, motivated by a common typing error. Other widely known examples, such as transpositions, reversals and interchanges, are motivated by biology. We propose that it is time to formally separate the concepts of "errors in data" and "errors in address", since they present different algorithmic challenges that are solved by different techniques. The "errors in address" model, which we call asynchronous pattern matching because the data does not arrive in a synchronous sequential manner, is rich in problems not addressed hitherto. We will consider some reasonable metrics for asynchronous pattern matching, such as the number of inversions or the number of generalized swaps, and show some efficient algorithms for these problems. As expected, the techniques needed to solve these problems are not taken from the standard pattern matching "toolkit".
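
The abstract does not define the inversion metric precisely. As an illustration only, assume it is the minimum number of adjacent transpositions that turn the pattern into an equal-length text window that is a rearrangement of it. Pairing the k-th occurrence of each symbol in the window with its k-th occurrence in the pattern yields a permutation whose inversion count is that minimum. The sketch below is a naive per-location computation (roughly O(n m log m) overall), not the efficient algorithms of the paper; the function names inversion_distance and inversion_scores are hypothetical.

```python
from collections import defaultdict, deque


def _sort_and_count(a):
    """Merge sort that also returns the number of inversions in `a`."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, inv_left = _sort_and_count(a[:mid])
    right, inv_right = _sort_and_count(a[mid:])
    merged, inv = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] precedes every remaining element of `left`.
            merged.append(right[j])
            j += 1
            inv += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv


def inversion_distance(pattern, window):
    """Minimum adjacent transpositions turning `pattern` into `window`,
    or None if `window` is not a rearrangement of `pattern` (assumed metric)."""
    if len(pattern) != len(window) or sorted(pattern) != sorted(window):
        return None
    # Queue of pattern positions for each symbol, in left-to-right order.
    positions = defaultdict(deque)
    for idx, ch in enumerate(pattern):
        positions[ch].append(idx)
    # Map each window symbol to the next unused pattern position of that symbol.
    perm = [positions[ch].popleft() for ch in window]
    return _sort_and_count(perm)[1]


def inversion_scores(text, pattern):
    """Naive scan: the assumed inversion distance at every text location."""
    m = len(pattern)
    return [inversion_distance(pattern, text[i:i + m])
            for i in range(len(text) - m + 1)]
```

For example, inversion_distance("aab", "baa") is 2 (aab to aba to baa). A location whose window is not a rearrangement of the pattern gets None, reflecting that in this model only the order of the data, not the data itself, may be wrong.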
