Huffman codes P62266


Statement
 

pdf   zip

thehtml

Suppose that n different symbols are used in the writing of a certain language. A simple way to code a text consists in assigning ⌈ log2 n ⌉ bits to each symbol. For instance, consider the five vowels and the three more frequent consonants in Catalan. According to the following table, a “normal” coding of the word LIE would be 011100000:

||
Letter“Normal” codificationRelative frequenceHuffman coding
E0002101
A00119111
S01012.7101
L01111.6100
I10010.6001
R10110.2000
O1108.61101
U1116.31100
||

Now, suppose that we know the probabilities (or relative frequences) of every symbol (for instance, according to the table, out of 100 symbols among the eight chosen, 21 are e’s, 19 are a’s, etcetera). We can obtain a more efficient coding on the average by assigning less bits to frequent letters. According to the table, the Huffman coding of the word LIE would be 10000101, with only eight bits instead of nine.

The construction of a Huffman coding is relatively simple: Repeatedly, choose the two less frequent symbols, arbitrarily assign a bit (0 or 1) to each one to make them different, and consider them both a unique symbol from now on. Stop when only one symbol remains.

(To see the tree corresponding to the Huffman algorithm for the eight characters of the table above, consult the pdf or ps version of this statement.)

In this example, the average length of the Huffman coding of a symbol is only

0.21*2 + 0.19*3 + 0.127*3 + 0.116*3 + 0.106*3 + 0.102*3 + 0.086*4 + 0.063*4 ≃ 2.9390 ,

smaller than the average length of a “normal” coding, which obviously is 3. For more biased probability distributions, much more significant savings can be obtained.

Write a program that reads the relative frequences of some letters, and computes the average length of their Huffman coding.

Input

Input consists of the number of symbols n ≥ 2, followed by the relative frequences of the n symbols. These frequences are all non-negative, and their sum is 100.

Output

Print with four digits after the decimal point the expected number of bits per letter.

Public test cases
  • Input

    8
    19 21 10.6 8.6 6.3 12.7 11.6 10.2
    

    Output

    expected number of bits per letter: 2.9390
    
  • Input

    4
    5 2 3 90
    

    Output

    expected number of bits per letter: 1.1500
    
  • Information
    Author
    Salvador Roura
    Language
    English
    Translator
    Carlos Molina
    Original language
    Catalan
    Other languages
    Catalan
    Official solutions
    C++ Python
    User solutions
    C++ Python