Multi de Bruijn Sequences

Multi de Bruijn Sequences

Glenn Tesler,
Multi de Bruijn sequences,
Journal of Combinatorics, 2017, 8(3):439-474.
Journal, arXiv preprint

We generalize the notion of a de Bruijn sequence to a “multi de Bruijn sequence”: a cyclic or linear sequence that contains every k-mer over an alphabet of size q exactly m times. For example, over the binary alphabet {0,1}, the cyclic sequence (00010111) and the linear sequence 000101110 each contain two instances of each 2-mer 00,01,10,11. We derive formulas for the number of such sequences. The formulas and derivation generalize classical de Bruijn sequences (the case m=1). We also determine the number of multisets of aperiodic cyclic sequences containing every k-mer exactly m times; for example, the pair of cyclic sequences (00011)(011) contains two instances of each 2-mer listed above. This uses an extension of the Burrows-Wheeler Transform due to Mantaci et al, and generalizes a result by Higgins for the case m=1.

Files and links

Software: Cyclic multi de Bruijn sequences: Cyclic sequences with m copies of each k-mer over an alphabet of size q. Since each cyclic sequence can be represented by multiple rotations, we output the smallest rotation as the representative. E.g., 00110011, 01100110, 11001100, 10011001 are equivalent under rotation, and the smallest is 00110011. On-Line Encyclopedia of Integer Sequences:

Support

This work was supported in part by National Science Foundation grant CCF-1115206.