Subsequence Matching and LCS with Segment Number Constraints
The 14th International Conference on Algorithms and Complexity (CIAC 2025),
Lecture Notes in Computer Science Vol.15680, pp.136-150
(2025), [peer-reviewed]
Event Date:
June 10-12, 2025
Abstract / 概要
The longest common subsequence (LCS) is a fundamental problem in string processing which has numerous algorithmic studies, extensions, and applicaions. A sequence $u_1,\ldots,j_f$ of $f$ strings is said to be an ($f$-)segmentation of a string $P$ if $P = u_1 \cdots u_f$. Li et al. [BIBM 2022] proposed a new variant of the LCS problem for given strings $T_1$, $T_2$ and an integer $f$, which we hereby call the segmental LCS problem (SegLCS), of finding (the length of) a longest string $P$ that has an $f$-segmentation which can be embedded into both $T_1$ and $T_2$. Li et al. [IJTCS-FAW 2024] gave a dynamic programming solution that solves SegLCS in $O(f n_1 n_2)$ time with $O(f n_1 + n_2)$ space, where $n_1 = |T_1|$, $n_2 = |T_2|$ and $n_1 \leq n_2$. Recently, Banerjee et al. [ESA 2024] presented an algorithm which, for a constant $f \geq 3$, solves SegLCS in $\tilde{O}((n_1 n_2)^{1 - (1/3)^{f-2}})$ time ($\tilde{O}(\cdot)$ suppresses polylogarithmic factors). In this paper, we deal with SegLCS as well as the problem of segmental subsequence pattern matching, SegE, that asks to determine whether a pattern $P$ of length $m$ has an $f$-segmentation that can be embedded into a text $T$ of length $n$. When $f = 1$, this is equivalent to substring matching c, and when $f = |P|$, this is equivalent to subsequence matching. Our focus in this article is the case of general values of $f$, and our main contributions are threefold: (1) $O((nm)^{1-\epsilon})$-time conditional lower bound for SegE under the strong exponential-time hypothesis (SETH), for any constant $\epsilon > 0$. (2) $O(mn)$-tie algorithm for SegE. (3) $O(f n_2(n_1 - \ell +1))$-time algorithm for Se cgLCS where $\ell$ is the solution length.