conference

Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching

Kento Iseri, Tomohiro I, Diptarama Hendrian, Dominik Köppl, Ryo Yoshinaka, Ayumi Shinohara

ICALP2024, LIPIcs Vol.297, pp.89:1-89:19 (2024), [peer-reviewed]
Event Date: July 8-12, 2024

Abstract / 概要

A parameterized string (p-string) is a string over an alphabet $(\Sigma_s \cup \Sigma_p)$, where $\Sigma_s$ and Σ_p are disjoint alphabets for static symbols (s-symbols) and for parameter symbols (p-symbols), respectively. Two p-strings $x$ and $y$ are said to parameterized match (p-match) if and only if $x$ can be transformed into $y$ by applying a bijection on $\Sigma_p$ to every occurrence of p-symbols in $x$. The indexing problem for p-matching is to preprocess a p-string $T$ of length $n$ so that we can efficiently find the occurrences of substrings of $T$ that p-match with a given pattern. Let $\sigma_s$ and respectively $\sigma_p$ be the numbers of distinct s-symbols and p-symbols that appear in $T$ and $\sigma = \sigma_s + \sigma_p$. Extending the Burrows-Wheeler Transform (BWT) based index for exact string pattern matching, Ganguly et al. [SODA 2017] proposed parameterized BWTs (pBWTs) to design the first compact index for p-matching, and posed an open problem on how to construct the pBWT-based index in compact space, i.e., in $O(n \lg |\Sigma_s \cup \Sigma_p|)$ bits of space. Hashimoto et al. [SPIRE 2022] showed how to construct the pBWT for $T$, under the assumption that $\Sigma_s \cup \Sigma_p = [0..O(σ)]$, in $O(n \lg \sigma)$ bits of space and $O(n \frac{\sigma_p \lg{n}}{\lg \lg{n}})$ time in an online manner while reading the symbols of $T$ from right to left. In this paper, we refine Hashimoto et al.’s algorithm to work in $O(n \lg{\sigma})$ bits of space and $O(n \frac{\lg{\sigma_p} \lg{n}}{\lg\lg{n}})$ time in a more general assumption that $\Sigma_s \cup \Sigma_p = [0..n^{O(1)}]$. Our result has an immediate application to constructing parameterized suffix arrays in $O(n \frac{\lg{σ_p} \lg{n}}{\lg\lg{n}})$ time and $O(n \lg{\sigma})$ bits of working space. We also show that our data structure can support backward search, a core procedure of BWT-based indexes, at any stage of the online construction, making it the first compact index for p-matching that can be constructed in compact space and even in an online manner.