Scalability and robustness are not an issue when compression is applied to massive data storage in the context of distributed computing.
Speeding up on-the-fly compression for data transmission is more controversial. In this case, a compression technique that merges
an adaptive and a non-adaptive approach has to be considered.
A practical implementation of LZW (Lempel, Ziv and Welch) compression, called LZC (where C refers to the Unix command compress), has
this characteristic. Its non-adaptive phases work with bounded-size prefix dictionaries built by LZW factorization during the adaptive phases.
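To make the mechanism concrete, here is a minimal sketch (in Python, not the paper's code) of greedy LZW factorization with a bounded dictionary. In LZC-like fashion, the dictionary grows during the adaptive phase and is frozen once the bound is reached, which is what makes the subsequent phase non-adaptive; the names lzw_factorize and MAX_ENTRIES are illustrative assumptions.

```python
MAX_ENTRIES = 4096  # hypothetical bound; real LZC grows codeword length up to 16 bits

def lzw_factorize(text: str) -> list[str]:
    """Greedily split `text` into longest dictionary phrases (LZW factorization)."""
    dictionary = {chr(c) for c in range(256)}  # initial single-character phrases
    factors = []
    i = 0
    while i < len(text):
        # Greedy step: extend the match while it stays in the dictionary
        # (valid because the LZW dictionary is prefix-closed).
        j = i + 1
        while j < len(text) and text[i:j + 1] in dictionary:
            j += 1
        factors.append(text[i:j])
        # Adaptive phase: add the factor plus the next input character; once
        # the bound is reached, the dictionary is frozen (non-adaptive phase).
        if len(dictionary) < MAX_ENTRIES and j < len(text):
            dictionary.add(text[i:j + 1])
        i = j
    return factors

print(lzw_factorize("abababab"))  # ['a', 'b', 'ab', 'aba', 'b']
```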
To improve compression effectiveness, we suggest applying LZMW (Lempel, Ziv, Miller and Wegman) factorization
to LZC compression (LZCMW) during the adaptive phases, since it builds better dictionaries than LZW. The LZMW heuristic was originally conceived
with a dictionary bounded by a least recently used strategy; we introduce the LZCMW heuristic in order to obtain non-adaptive phases.
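For contrast, a sketch of LZMW factorization under the same kind of bound follows (again illustrative Python, not the paper's code). The only change with respect to the LZW sketch above is the dictionary update rule: each new phrase is the concatenation of the two most recent factors, so phrases grow faster. A side effect is that the dictionary is no longer prefix-closed, so the longest match cannot be found by simple prefix extension.

```python
MAX_ENTRIES = 4096  # hypothetical bound, as in the LZW sketch above

def lzmw_factorize(text: str) -> list[str]:
    """Greedily split `text` into longest dictionary phrases (LZMW factorization)."""
    dictionary = {chr(c) for c in range(256)}
    max_len = 1  # length of the longest dictionary phrase
    factors: list[str] = []
    prev = ""  # previously emitted factor
    i = 0
    while i < len(text):
        # The LZMW dictionary is not prefix-closed, so probe decreasing
        # lengths (real implementations use more sophisticated structures).
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in dictionary:
                break
        factor = text[i:i + length]
        factors.append(factor)
        # Adaptive phase: the new phrase concatenates the last two factors.
        if prev and len(dictionary) < MAX_ENTRIES:
            dictionary.add(prev + factor)
            max_len = max(max_len, len(prev) + len(factor))
        prev = factor
        i += len(factor)
    return factors

print(lzmw_factorize("abababab"))  # ['a', 'b', 'ab', 'ab', 'ab']
```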
All the heuristics mentioned above employ the greedy approach. Finally, we present a worst-case analysis of the greedy approach with respect to
the optimal solution decodable by the LZC decompressor. This analysis suggests that parallelizing on-the-fly compression is not suitable
for highly disseminated data, since the non-adaptive phases are too far from optimal.
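As a small illustration of the kind of gap such an analysis measures, the following sketch (illustrative Python with a toy dictionary of our own choosing) contrasts greedy longest-match parsing with an optimal parse over a fixed prefix-closed dictionary, i.e., the situation in a non-adaptive phase. The dynamic program computes the minimum number of factors; it ignores the LZC decodability constraint discussed above and only exhibits the parsing gap over a static dictionary.

```python
def greedy_parse(text: str, dictionary: set[str]) -> list[str]:
    """Longest-match (greedy) factorization over a fixed dictionary."""
    factors, i = [], 0
    while i < len(text):
        length = max(len(p) for p in dictionary if text.startswith(p, i))
        factors.append(text[i:i + length])
        i += length
    return factors

def optimal_parse(text: str, dictionary: set[str]) -> int:
    """Minimum number of dictionary factors covering `text` (DP over prefixes)."""
    INF = float("inf")
    best = [0] + [INF] * len(text)  # best[i] = optimal cost of text[:i]
    for i in range(len(text)):
        if best[i] == INF:
            continue
        for phrase in dictionary:
            if text.startswith(phrase, i):
                best[i + len(phrase)] = min(best[i + len(phrase)], best[i] + 1)
    return best[len(text)]

dictionary = {"a", "b", "ab", "ba", "bab", "babb"}  # toy prefix-closed dictionary
text = "ababb"
print(greedy_parse(text, dictionary))   # ['ab', 'ab', 'b']: greedy uses 3 factors
print(optimal_parse(text, dictionary))  # 2: the optimal parse is 'a' + 'babb'
```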