Manolis Christodoulakis and Gerhard Brey
Edit Distance with Single-Symbol Combinations and Splits
Abstract: |
In this article we introduce new variants of the edit distance string similarity measure, where apart from the traditional insertion, deletion and substitution operations, two new operations are supported. The first one is called a combination and it allows two or more symbols from one string, to be ``matched'' against one symbol of the other. The dual of a combination, is the operation of a split, where one symbol from the first string is broken down into a sequence of two or more other symbols, that can then be matched against an equal number of symbols from the second string. The notions of combining and splitting symbols can be defined in a variety of ways, depending on how the application in hand defines similarity. Here we introduce three different possible definitions, and we provide an algorithm that deals with one of them. Our algorithm requires O(L) time for preprocessing, and O(m n k) time for computing the edit distance, where L is the total length of all the valid combinations/splits, and k is an upper bound on the number of valid splits of any single symbol. |
Download paper: | |||
PostScript | BibTeX reference |
Download presentation: |