journal

Fully-Online Suffix Tree and Directed Acyclic Word Graph Construction for Multiple Texts

Takuya Takagi, Shunsuke Inenaga, Hiroki Arimura, Dany Breslauer, Diptarama Hendrian

Algorithmica Vol.82, pp.1346-1377 (2020), [peer-reviewed]

Abstract / 概要

We consider the construction of the suffix tree and the directed acyclic word graph (DAWG) indexing data structures for a collection $\mathcal{T}$ of texts, where a new symbol may be appended to any text in $\mathcal{T} = {T_1, \ldots, T_{K} }$, at any time. This fully-online scenario, which arises in dynamically indexing multi-sensor data, is a natural generalization of the long solved semi-online text indexing problem, where texts $T_1, \ldots, T_{K}$ are permanently fixed before the next text $T_{k_1}$ is processed for each $k$ $(1 \leq k < K)$. We first show that a direct application of Weiner’s right-to-left online construction for the suffix tree of a single text to fully-online multiple texts requires superlinear time. This also means that Blumer et al.’s left-to-right online construction for the DAWG of a single text requires superlinear time in the fully-online setting. We then present our fully-online versions of these algorithms that run in $O(N \log{\sigma})$ time and $O(N)$ space, where $N$ is the total length of the texts in $\mathcal{T}$ and $\sigma$ is their alphabet size. We then show how to extend Ukkonen’s left-to-right online suffix tree construction to fully-online multiple strings, with the aid of Weiner’s suffix tree for the reversed texts.