Shannon-Fano coding
Main features
- statistic compression method
- two-pass compression method
- semi-adaptive compression method
- asymetric compression method
- the result compression is worse in comparison with Huffman coding
- optimality of output is not guaranteed
Time and space complexities
Time complexity of algorithm: O(n + |Σ| * log|Σ|)
Space complexity of algorithm: O(|Σ|)
From description of the algorithm we can compute time and space complexity by dividing the problem into three phases. Time complexity of the first phase of computing counts of symbols appearance is linearly dependent on length of the input text. The next phase is ordering symbols, which is implementation dependent. The last phase is generating of the output. This also linearly depends on length of the input text. Space complexity grows linearly with the count of different symbols in the input string.
History
This method dates from the year 1949. It was published by Claude Elwood Shannon (he is designated as the father of theory of information) with Warren Weaver and by Robert Mario Fano independently.
Description
As it can be seen in pseudocode of this algorithm, there are two passes through an input data. Based on frequency of appearance of symbols in the input text, codes for each symbol are generated. Using a method from top to bottom, the prefix code of variable length is generated. First, symbols are ordered by counts of their appearance. Next, list of symbols is recursively divided into two parts of almost same frequency of symbols occurrence.
First, count of different symbols is written to the output. Next, the encoded tree is written to the output. This tree is constructed from parts of divided list of symbols. Left branches of the tree are marked by 1, right branches by 0. This numbers are created by walk through the tree from root and from left to right. Only necessary numbers are written. Next, Fibonacci code of count of leaves is determined and written to the output. Then the coding of the input begins.
Pseudo-code
1: begin
2: count source units
3: sort source units to non-decreasing order
4: SF-SplitS
5: output(count of symbols, encoded tree, symbols)
6: write output
7: end
8:
9: procedure SF-Split(S)
10: begin
11: if (|S|>1) then
12: begin
13: divide S to S1 and S2 with about same count of units
14: add 1 to codes in S1
15: add 0 to codes in S2
16: SF-Split(S1)
17: SF-Split(S2)
18: end
19: end
References
- Shannon-Fano coding. Wikipedia, the free encyclopedia
- Claude Elwood Shannon. Wikipedia, the free encyclopedia.
- C. E. Shannon: A Mathematical Theory of Communication. The Bell System Technical Journal, Vol. 27, July, October, 1948.
- C. E. Shannon: Prediction and Entropy of Printed English. The Bell System Technical Journal, Vol. 30, 1951.
- C. E. Shannon: Communication Theory of Secrecy Systems. The Bell System Technical Journal, Vol. 28, 1949.