Golomb Rice coding

Contents

Using Huffman coding for exponential probability distributions (finite alphabet case)

Start with an alphabet of 256 symbols. Define the set of data symbols and the probability associated with each element.

symbols = 0:255;
theta = 0.9;
p = (1-theta)*theta.^symbols ;
figure(1),semilogy(p)
title([' theta = ' num2str(theta)]),xlabel('integer i'),ylabel('probability p(i)')

First create the normal Huffman code dictionary for 255 symbols

dict = huffmandict(symbols,p);
CLs = [];
% Extract the codeword lengths
for i = 1:length(symbols)
    CLs(i) = length( dict{i,2} );
end
figure(2),stem(CLs(1:100)),title('First 100 codelengths')

Example for symbol i=41

dict{42,1} % the symbol
dict{42,2} % its codeword, designed by Hufman function
ans =

    41


ans =

     1     0     1     1     1     1     1     1     0

Details on the construction of Golomb codes for an infinite alphabet

Find the parameter ell of the distribution p = (1-theta)*theta.^symbols; Candidate "ell" parameters for the given theta

ell = 0:10;
% The optimal ell is that where the string theta.^ell +theta.^(ell-1)  and the string theta.^ell +theta.^(ell+1)
% are  straddling the value 1
high= theta.^ell +theta.^(ell-1);
low = theta.^ell +theta.^(ell+1);
ell_opt = find( (high >1) & (low <=1)) - 1
ell_opt =

     7

Construct the Huffman code for the extended source with ell_opt symbols

p_red = (1-theta)*theta.^(0: (ell_opt-1) )/(1-theta^ell_opt)
dict1 = huffmandict(0: (ell_opt-1),p_red);
p_red =

    0.1917    0.1725    0.1553    0.1397    0.1258    0.1132    0.1019

CLs_red = [];
% Extract the codeword lengths
for i = 1:length(0: (ell_opt-1))
    CLs_red(i) = length( dict1{i,2} );
end
figure(3),plot(0:ell_opt-1,CLs_red,'or',[-1 0],[0 4],'w.') ,title('Codelengths for the Huffman code of the reduced source')
 xlabel('reduced symbol i'), ylabel('length of codeword for i')

Example: Encode the integer i=41 with Golomb code

represent i by two integers, (first_part,second_part) such that i= first_part*ell_opt + second_part

i = 41
first_part = floor(i/ell_opt)
second_part = i -first_part*ell_opt
[41 5*7+6]
i =

    41


first_part =

     5


second_part =

     6


ans =

    41    41

The first_part = 5 belongs to {0,...,ell_opt-1} and is encoded by dict1

dict1{5+1,1}
dict1{5+1,2}
ans =

     5


ans =

     1     0     0

The second_part= 6 is encoded in unary, as a five repetition of 1, followed by a 0

Therefore 41 is encoded by Golomb code as

codewordG = [[1     0     0] [ 1 1 1 1 1 0] ]
% while it is encoded by Huffman code as
dict{42,2}
codewordG =

     1     0     0     1     1     1     1     1     0


ans =

     1     0     1     1     1     1     1     1     0

Due to non-unicity of Huffman coding, the two codes of 41 are different, but their codelength are the same

[length(codewordG ) length( dict{42,2} )]
ans =

     9     9

Checking the codeword length for all symbols 0...255

obtained by Huffman and by Golomb codes

for i=0:255
    CLH(i+1) = length(dict{i+1,2});
    first_part = floor(i/ell_opt);
    second_part = i -first_part*ell_opt;

    CL_GR(i+1) =  first_part +1+   length(dict1{second_part+1,2});
end
% the codeword length are for most symbols the same
[CLH; CL_GR]
ans =

  Columns 1 through 13

     3     4     4     4     4     4     4     4     5     5     5     5     5
     3     4     4     4     4     4     4     4     5     5     5     5     5

  Columns 14 through 26

     5     5     6     6     6     6     6     6     6     7     7     7     7
     5     5     6     6     6     6     6     6     6     7     7     7     7

  Columns 27 through 39

     7     7     7     8     8     8     8     8     8     8     9     9     9
     7     7     7     8     8     8     8     8     8     8     9     9     9

  Columns 40 through 52

     9     9     9     9    10    10    10    10    10    10    10    11    11
     9     9     9     9    10    10    10    10    10    10    10    11    11

  Columns 53 through 65

    11    11    11    11    11    12    12    12    12    12    12    12    13
    11    11    11    11    11    12    12    12    12    12    12    12    13

  Columns 66 through 78

    13    13    13    13    13    13    14    14    14    14    14    14    14
    13    13    13    13    13    13    14    14    14    14    14    14    14

  Columns 79 through 91

    15    15    15    15    15    15    15    16    16    16    16    16    16
    15    15    15    15    15    15    15    16    16    16    16    16    16

  Columns 92 through 104

    16    17    17    17    17    17    17    17    18    18    18    18    18
    16    17    17    17    17    17    17    17    18    18    18    18    18

  Columns 105 through 117

    18    18    19    19    19    19    19    19    19    20    20    20    20
    18    18    19    19    19    19    19    19    19    20    20    20    20

  Columns 118 through 130

    20    20    20    21    21    21    21    21    21    21    22    22    22
    20    20    20    21    21    21    21    21    21    21    22    22    22

  Columns 131 through 143

    22    22    22    22    23    23    23    23    23    23    23    24    24
    22    22    22    22    23    23    23    23    23    23    23    24    24

  Columns 144 through 156

    24    24    24    24    24    25    25    25    25    25    25    25    26
    24    24    24    24    24    25    25    25    25    25    25    25    26

  Columns 157 through 169

    26    26    26    26    26    26    27    27    27    27    27    27    27
    26    26    26    26    26    26    27    27    27    27    27    27    27

  Columns 170 through 182

    28    28    28    28    28    28    28    29    29    29    29    29    29
    28    28    28    28    28    28    28    29    29    29    29    29    29

  Columns 183 through 195

    29    30    30    30    30    30    30    30    31    31    31    31    31
    29    30    30    30    30    30    30    30    31    31    31    31    31

  Columns 196 through 208

    31    31    32    32    32    32    32    32    32    33    33    33    33
    31    31    32    32    32    32    32    32    32    33    33    33    33

  Columns 209 through 221

    33    33    33    34    34    34    34    34    34    34    34    35    35
    33    33    33    34    34    34    34    34    34    34    35    35    35

  Columns 222 through 234

    35    35    35    35    36    36    36    36    36    36    37    37    37
    35    35    35    35    36    36    36    36    36    36    36    37    37

  Columns 235 through 247

    37    37    37    38    38    38    38    38    38    38    39    39    39
    37    37    37    37    37    38    38    38    38    38    38    38    39

  Columns 248 through 256

    39    39    39    39    39    40    40    40    40
    39    39    39    39    39    39    40    40    40

except these 7 symbols (from those with very small probability)

find(CLH~=CL_GR)
ans =

   219   232   238   239   245   246   253