LZ-78

Main features

dictionary compression method
one-pass compression method
adaptive compression method
symetric compression method
the output of the algorithm is a collection of pairs (i,a) - i is an index into the dictionary and a is the next symbol

LZ78 has high requirements on space, because the dictionary can occupy the whole free memory. Several ways exist to bypass this problem. If we run out of memory, we can freeze the dictionary, or delete the whole dictionary and begin to make new one. There are also other methods. In the UNIX utility compress this problem is solved by the freezing of the dictionary and by monitoring the compression ratio. If this ratio falls below the predefined treshold, the entire dictionary is deleted and a new one is being created.

History

LZ78 was published by Abraham Lempel and Jacob Ziv in 1978. Though initially popular, the popularity of LZ78 later dampened, possibly because for the first few decades after it was introduced, parts of LZ78 were patent encumbered in the United States. The most popular modification of LZ78 is LZW made by Terry Welche.

Description

LZ78-based schemes work by entering phrases into a dictionary and then, when a repeat occurrence of that particular phrase is found, outputting the dictionary index instead of the phrase.

Every step LZ78 will send a pair (i,a) to the output, where i is an index of the phrase into the dictionary and a is the next symbol following immediately after the found phrase. The dictionary is represented like the trie with numbered nodes. If we go from the root to a certain node, we will get phrase from the input text.

In each step we look for the longest phrase in dictionary, that would correspond to the unprocessed part of the input text. Index of this phrase together with the symbol, which follows the found part in input text, are then send to the output. The old phrase extended by the new symbol is then put into dictionary. This new phrase is numbered by the smallest possible number.

The coding will start with tree, that has only one node, which represents empty string.

Pseudo-code


 1:  begin
 2:     initialize a dictionary by empty phrase P
 3:     while (not EOF) do
 4:      begin
 5:        readSymbol(X)
 6:        if (F.X> is in the dictionary) then
 7:           F = F.X   
 8:        else 
 9:         begin
10:           output(pointer(F),X)
11:           encode X to the dictionary
12:           initialize phrase F by empty character
13:         end
14:      end
15:  end  
16:  
17:  DECODING 
18:  begin
19:     init dictionary by an empty phrase
20:     while (not EOF) do
21:      begin
22:        read pair of index and character (i,X) from input
23:        put new phrase phrase(i).X into distionary
24:        generate phrase to the output 
25:      end
26:  end

References

LZ78. Wikipedia, the free encyclopedia.
Arkadi Kagan: Lempel-Ziv Compressions - LZ78 viewed from the programmer's point of view.
Jacob Ziv and Abraham Lempel: Compression of Individual Sequences Via Variable-Rate Coding. IEEE Transactions on Information Theory, September 1978.
LZ77/LZSS and derivatives area - list of derivatives LZ77 algorithm libraries, papers and sources. Compression Links Info.