Samuel R. Buss, Peter N. Yianilos.
"A Bipartite Matching Approach to Approximate String Comparison and Search."
Download article: postscript or PDF.
Abstract: Approximate string comparison and search
is an important part of applications that range from natural language to the
interpretation of DNA. This paper presents a bipartite weighted graph matching approach to
these problems, based on the authors' linear time matching algorithms Our approach's
tolerance to permutation of symbols or blocks, distinguishes it from the widely used edit
distance and finite state machine methods. A close relationship with the earlier related
`proximity comparison' method is established.
Under the linear cost model, a simple O(1) time per position online algorithm is presented for comparing two strings given a fixed alignment. Heuristics are given for optimal alignment. In the approximate string search problem, one string advances in a fixed direction relative to the other with each time step. We introduce a new online algorithm for this setting which dynamically maintains an optimal bipartite weighted matching.
We discuss the application of our algorithms to natural language text search, including prefilters to improve efficiency, and the use of polygraphic symbols to improve search quality. Our approach is used in the LikeIt text search utility now under development. Its overall design and objectives are summarized.
Back to Sam Buss's publications page.