Grammar Compression with Probabilistic Context-Free Grammar
Data Compression Conference (DCC2020),
, pp.386
(2020), [peer-reviewed]
Event Date:
March 24-27, 2020
@ online
Abstract / 概要
We propose a new approach for universal lossless text compression, based on grammar compression. In the literature, a target string $T$ has been compressed as a context-free grammar $G$ in Chomsky normal form satisfying $L(G) = T$. Such a grammar is often called a straight-line program (SLP). In this paper, we consider a probabilistic grammar $G$ that generates $T$, but not necessarily as a unique element of $L(G)$. In order to recover the original text $T$ unambiguously, we keep both the grammar G and the derivation tree of $T$ from the start symbol in $G$, in compressed form. We show some simple evidence that our proposal is indeed more efficient than SLPs for certain texts, both from theoretical and practical points of view.