conference

Grammar Compression with Probabilistic Context-Free Grammar

Hiroaki Naganuma, Diptarama Hendrian, Ryo Yoshinaka, Ayumi Shinohara, Naoki Kobayashi

Data Compression Conference (DCC2020), , pp.386 (2020), [peer-reviewed]
Event Date: March 24-27, 2020 @ online

開催地: online

online

DOI External Link

Abstract / 概要

We propose a new approach for universal lossless text compression, based on grammar compression. In the literature, a target string $T$ has been compressed as a context-free grammar $G$ in Chomsky normal form satisfying $L(G) = T$. Such a grammar is often called a straight-line program (SLP). In this paper, we consider a probabilistic grammar $G$ that generates $T$, but not necessarily as a unique element of $L(G)$. In order to recover the original text $T$ unambiguously, we keep both the grammar G and the derivation tree of $T$ from the start symbol in $G$, in compressed form. We show some simple evidence that our proposal is indeed more efficient than SLPs for certain texts, both from theoretical and practical points of view.

DOI: 10.1109/dcc47342.2020.00093