conference

Subsequence Matching and LCS with Segment Number Constraints

Yuki Yonemoto, Takuya Mieno, Shunsuke Inenaga, Ryo Yoshinaka, Ayumi Shinohara

The 14th International Conference on Algorithms and Complexity (CIAC 2025), Lecture Notes in Computer Science Vol.15680, pp.136-150 (2025), [peer-reviewed]
Event Date: June 10-12, 2025

Abstract / 概要

The longest common subsequence (LCS) is a fundamental problem in string processing which has numerous algorithmic studies, extensions, and applicaions. A sequence $u_1,\ldots,j_f$ of $f$ strings is said to be an ($f$-)segmentation of a string $P$ if $P = u_1 \cdots u_f$. Li et al. [BIBM 2022] proposed a new variant of the LCS problem for given strings $T_1$, $T_2$ and an integer $f$, which we hereby call the segmental LCS problem (SegLCS), of finding (the length of) a longest string $P$ that has an $f$-segmentation which can be embedded into both $T_1$ and $T_2$. Li et al. [IJTCS-FAW 2024] gave a dynamic programming solution that solves SegLCS in $O(f n_1 n_2)$ time with $O(f n_1 + n_2)$ space, where $n_1 = |T_1|$, $n_2 = |T_2|$ and $n_1 \leq n_2$. Recently, Banerjee et al. [ESA 2024] presented an algorithm which, for a constant $f \geq 3$, solves SegLCS in $\tilde{O}((n_1 n_2)^{1 - (1/3)^{f-2}})$ time ($\tilde{O}(\cdot)$ suppresses polylogarithmic factors). In this paper, we deal with SegLCS as well as the problem of segmental subsequence pattern matching, SegE, that asks to determine whether a pattern $P$ of length $m$ has an $f$-segmentation that can be embedded into a text $T$ of length $n$. When $f = 1$, this is equivalent to substring matching c, and when $f = |P|$, this is equivalent to subsequence matching. Our focus in this article is the case of general values of $f$, and our main contributions are threefold: (1) $O((nm)^{1-\epsilon})$-time conditional lower bound for SegE under the strong exponential-time hypothesis (SETH), for any constant $\epsilon > 0$. (2) $O(mn)$-tie algorithm for SegE. (3) $O(f n_2(n_1 - \ell +1))$-time algorithm for Se cgLCS where $\ell$ is the solution length.