Math 176 - Advanced Data Structures
Walking Through Two Inverted Lists
Incrementing inverted list positions: Suppose we have two inverted lists,
< f1, p1, f2, p2, f3, p3, ... >
and < g1, q1, g2, q2, g3, q3, ... >
indicating that the n-th occurrence of word #1 is in file number fn at character position pn, and that the n-th occurrence of word #2 is in file number gn at character position qn.
Definition. The position < f, p > is before the position < g, q > if either f < g, or f = g and p < q.
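The definition above can be written out as a small Python sketch. The names Position and is_before are hypothetical, chosen only for illustration, and a position is assumed to be stored as a (file number, character position) pair.

```python
from typing import Tuple

# Assumption: a position is a (file number, character position) pair.
Position = Tuple[int, int]


def is_before(a: Position, b: Position) -> bool:
    """Return True if position a is before position b, per the definition."""
    f, p = a
    g, q = b
    # Earlier file wins; within the same file, earlier character position wins.
    return f < g or (f == g and p < q)
```

Python's built-in tuple comparison is lexicographic, so for such pairs `a < b` gives the same answer as `is_before(a, b)`.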
As we walk through the inverted lists, we are looking for close occurrences of the two words and/or for files (documents) in which both words occur. To walk through the lists, we keep two indices n and m. These indicate that we are currently considering position <fn,pn> of word #1 and position <gm,qm> of word #2. At each step, we increment either n or m. The choice of which one to increment is based on the next positions in the two lists. Namely,
This will ensure that we do not miss any places where there are close pairs of occurrences of the two words. (However, if there are multiple occurrences of the words in close proximity, we will find only a subset of the possible close pairs.)
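The walk described above might be sketched in Python as follows. This is only a sketch under a stated assumption: at each step it compares the next positions in the two lists and increments the index whose next position comes first (if one list has no next position, the other index is incremented). The names walk_pairs, close_pairs and max_gap are hypothetical, and each inverted list is assumed to be stored as a list of (file number, character position) pairs in increasing order.

```python
from typing import Iterator, List, Tuple

# Assumption: a position is a (file number, character position) pair.
Position = Tuple[int, int]


def walk_pairs(word1: List[Position],
               word2: List[Position]) -> Iterator[Tuple[Position, Position]]:
    """Yield every pair of positions considered while walking both lists."""
    if not word1 or not word2:
        return
    n, m = 0, 0
    while True:
        yield word1[n], word2[m]
        has_next1 = n + 1 < len(word1)
        has_next2 = m + 1 < len(word2)
        if not has_next1 and not has_next2:
            return  # both lists are at their last position
        # Assumed rule: advance the list whose next position is earlier.
        # Tuple comparison is lexicographic, which matches "is before".
        if has_next1 and (not has_next2 or word1[n + 1] < word2[m + 1]):
            n += 1
        else:
            m += 1


def close_pairs(word1: List[Position], word2: List[Position],
                max_gap: int) -> List[Tuple[Position, Position]]:
    """Keep the visited pairs that lie in the same file, within max_gap characters."""
    return [(a, b) for a, b in walk_pairs(word1, word2)
            if a[0] == b[0] and abs(a[1] - b[1]) <= max_gap]
```

For example, with word #1 at positions (1, 1) and (1, 10) and word #2 at positions (1, 9) and (1, 11), the walk visits three pairs, and `close_pairs(..., max_gap=2)` keeps the two pairs involving position (1, 10).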