Math 176 - Advanced Data Structures
Walking Through Two Inverted Lists

Incrementing inverted list positions: Suppose we have two inverted lists,

< f₁, p₁, f₂ , p₂, f₃, p₃, ... >

and < g₁, q₁, g₂, q₂, g₃, q₃, ... >

indicating that the n-th occurrence of word #1 is in file number f_n at character position p_n, and that the n-th occurrence of word #2 is in file number g_n at character position q_n.

Definition. The position < f, p > is before the position < g, q > if either f<g or if f=g and p<q.
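
The definition above is a lexicographic comparison of (file, position) pairs. As a minimal sketch, assuming each position is stored as a Python tuple (the name is_before is ours, not part of the notes), it can be written as:

```python
def is_before(a, b):
    """Return True if position a = (f, p) is before position b = (g, q)."""
    f, p = a
    g, q = b
    # Earlier file, or the same file and an earlier character position.
    return f < g or (f == g and p < q)


# Python compares tuples lexicographically, so a < b gives the same answer.
print(is_before((1, 50), (2, 3)))   # earlier file
print(is_before((2, 3), (2, 7)))    # same file, earlier position
print(is_before((2, 7), (2, 7)))    # equal positions: not before
```

Because the built-in tuple comparison agrees with this definition, a position list stored as tuples can be compared with < directly.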

As we walk through the inverted lists, we are seeking close occurrences of the two words and/or files (documents) in which both words occur. To walk through the lists, we keep two indices n and m. These indicate that we are currently considering the position <f_n,p_n> of word #1 and the position <g_m,q_m> of word #2. At each step, we increment exactly one of n or m. The choice of which one to increment is based on the next positions in the two lists. Namely,

If position < f_(n+1), p_(n+1) > is before position < g_(m+1), q_(m+1) >, then increment n.
Otherwise, increment m.
Exception: if n or m is already at the end of its list, increment the other index.
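
The rule above can be sketched in Python as follows. This is an illustration under our own assumptions, not the course's reference code: each inverted list is a list of (file, position) tuples, indices are 0-based rather than 1-based, the walk stops once both indices are at the ends of their lists, and the name walk is hypothetical.

```python
def walk(list1, list2):
    """Yield every index pair (n, m) visited while walking both lists."""
    if not list1 or not list2:
        return
    n, m = 0, 0
    yield (n, m)
    while n + 1 < len(list1) or m + 1 < len(list2):
        if n + 1 >= len(list1):
            m += 1                           # exception: list 1 is exhausted
        elif m + 1 >= len(list2):
            n += 1                           # exception: list 2 is exhausted
        elif list1[n + 1] < list2[m + 1]:    # tuple "<" is the "before" test
            n += 1
        else:
            m += 1
        yield (n, m)


word1 = [(1, 10), (1, 40), (3, 5)]
word2 = [(1, 12), (2, 7), (3, 9)]
print(list(walk(word1, word2)))
# prints [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)]
```

Each step advances exactly one index, so the walk visits at most len(list1) + len(list2) - 1 index pairs.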

This will ensure that we do not miss any places where there are close pairs of occurrences of the two words. (However, if there are multiple occurrences of the words in close proximity, we will find only a subset of the possible close pairs.)
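
A small example makes the parenthetical remark concrete. The sketch below assumes that "close" means "in the same file and at most max_gap characters apart"; the notes do not fix a definition, so max_gap and both function names are hypothetical. It compares the pairs seen by the walk with a brute-force check of all pairs.

```python
def close_pairs_by_walk(list1, list2, max_gap):
    """Close pairs among the index pairs that the walk visits."""
    found = []
    if not list1 or not list2:
        return found
    n = m = 0
    while True:
        (f, p), (g, q) = list1[n], list2[m]
        if f == g and abs(p - q) <= max_gap:
            found.append((list1[n], list2[m]))
        at_end1 = n + 1 == len(list1)
        at_end2 = m + 1 == len(list2)
        if at_end1 and at_end2:
            return found
        # Advance n if list 2 is exhausted or the next position of word #1
        # is before the next position of word #2; otherwise advance m.
        if at_end2 or (not at_end1 and list1[n + 1] < list2[m + 1]):
            n += 1
        else:
            m += 1


def close_pairs_brute_force(list1, list2, max_gap):
    """All close pairs, found by testing every combination."""
    return [(a, b) for a in list1 for b in list2
            if a[0] == b[0] and abs(a[1] - b[1]) <= max_gap]


word1 = [(1, 10), (1, 12)]
word2 = [(1, 11), (1, 13)]
print(len(close_pairs_by_walk(word1, word2, 3)))      # prints 3
print(len(close_pairs_brute_force(word1, word2, 3)))  # prints 4
```

Here the walk never considers positions 10 and 13 together, so that pair is not reported, but the region where the words occur close together is still found.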