Golomb Rice coding

Using Huffman coding for exponential probability distributions (finite alphabet case)

Start with an alphabet of 256 symbols. Define the set of data symbols and the probability associated with each element.

```symbols = 0:255;
theta = 0.9;
p = (1-theta)*theta.^symbols ;
figure(1),semilogy(p)
title([' theta = ' num2str(theta)]),xlabel('integer i'),ylabel('probability p(i)')
```

First create the normal Huffman code dictionary for 255 symbols

```dict = huffmandict(symbols,p);
```
```CLs = [];
% Extract the codeword lengths
for i = 1:length(symbols)
CLs(i) = length( dict{i,2} );
end
figure(2),stem(CLs(1:100)),title('First 100 codelengths')
```

Example for symbol i=41

```dict{42,1} % the symbol
dict{42,2} % its codeword, designed by Hufman function
```
```ans =

41

ans =

1     0     1     1     1     1     1     1     0

```

Details on the construction of Golomb codes for an infinite alphabet

Find the parameter ell of the distribution p = (1-theta)*theta.^symbols; Candidate "ell" parameters for the given theta

```ell = 0:10;
% The optimal ell is that where the string theta.^ell +theta.^(ell-1)  and the string theta.^ell +theta.^(ell+1)
% are  straddling the value 1
high= theta.^ell +theta.^(ell-1);
low = theta.^ell +theta.^(ell+1);
ell_opt = find( (high >1) & (low <=1)) - 1
```
```ell_opt =

7

```

Construct the Huffman code for the extended source with ell_opt symbols

```p_red = (1-theta)*theta.^(0: (ell_opt-1) )/(1-theta^ell_opt)
dict1 = huffmandict(0: (ell_opt-1),p_red);
```
```p_red =

0.1917    0.1725    0.1553    0.1397    0.1258    0.1132    0.1019

```
```CLs_red = [];
% Extract the codeword lengths
for i = 1:length(0: (ell_opt-1))
CLs_red(i) = length( dict1{i,2} );
end
figure(3),plot(0:ell_opt-1,CLs_red,'or',[-1 0],[0 4],'w.') ,title('Codelengths for the Huffman code of the reduced source')
xlabel('reduced symbol i'), ylabel('length of codeword for i')
```

Example: Encode the integer i=41 with Golomb code

represent i by two integers, (first_part,second_part) such that i= first_part*ell_opt + second_part

```i = 41
first_part = floor(i/ell_opt)
second_part = i -first_part*ell_opt
[41 5*7+6]
```
```i =

41

first_part =

5

second_part =

6

ans =

41    41

```

The first_part = 5 belongs to {0,...,ell_opt-1} and is encoded by dict1

```dict1{5+1,1}
dict1{5+1,2}
```
```ans =

5

ans =

1     0     0

```

The second_part= 6 is encoded in unary, as a five repetition of 1, followed by a 0

Therefore 41 is encoded by Golomb code as

```codewordG = [[1     0     0] [ 1 1 1 1 1 0] ]
% while it is encoded by Huffman code as
dict{42,2}
```
```codewordG =

1     0     0     1     1     1     1     1     0

ans =

1     0     1     1     1     1     1     1     0

```

Due to non-unicity of Huffman coding, the two codes of 41 are different, but their codelength are the same

```[length(codewordG ) length( dict{42,2} )]
```
```ans =

9     9

```

Checking the codeword length for all symbols 0...255

obtained by Huffman and by Golomb codes

```for i=0:255
CLH(i+1) = length(dict{i+1,2});
first_part = floor(i/ell_opt);
second_part = i -first_part*ell_opt;

CL_GR(i+1) =  first_part +1+   length(dict1{second_part+1,2});
end
% the codeword length are for most symbols the same
[CLH; CL_GR]
```
```ans =

Columns 1 through 13

3     4     4     4     4     4     4     4     5     5     5     5     5
3     4     4     4     4     4     4     4     5     5     5     5     5

Columns 14 through 26

5     5     6     6     6     6     6     6     6     7     7     7     7
5     5     6     6     6     6     6     6     6     7     7     7     7

Columns 27 through 39

7     7     7     8     8     8     8     8     8     8     9     9     9
7     7     7     8     8     8     8     8     8     8     9     9     9

Columns 40 through 52

9     9     9     9    10    10    10    10    10    10    10    11    11
9     9     9     9    10    10    10    10    10    10    10    11    11

Columns 53 through 65

11    11    11    11    11    12    12    12    12    12    12    12    13
11    11    11    11    11    12    12    12    12    12    12    12    13

Columns 66 through 78

13    13    13    13    13    13    14    14    14    14    14    14    14
13    13    13    13    13    13    14    14    14    14    14    14    14

Columns 79 through 91

15    15    15    15    15    15    15    16    16    16    16    16    16
15    15    15    15    15    15    15    16    16    16    16    16    16

Columns 92 through 104

16    17    17    17    17    17    17    17    18    18    18    18    18
16    17    17    17    17    17    17    17    18    18    18    18    18

Columns 105 through 117

18    18    19    19    19    19    19    19    19    20    20    20    20
18    18    19    19    19    19    19    19    19    20    20    20    20

Columns 118 through 130

20    20    20    21    21    21    21    21    21    21    22    22    22
20    20    20    21    21    21    21    21    21    21    22    22    22

Columns 131 through 143

22    22    22    22    23    23    23    23    23    23    23    24    24
22    22    22    22    23    23    23    23    23    23    23    24    24

Columns 144 through 156

24    24    24    24    24    25    25    25    25    25    25    25    26
24    24    24    24    24    25    25    25    25    25    25    25    26

Columns 157 through 169

26    26    26    26    26    26    27    27    27    27    27    27    27
26    26    26    26    26    26    27    27    27    27    27    27    27

Columns 170 through 182

28    28    28    28    28    28    28    29    29    29    29    29    29
28    28    28    28    28    28    28    29    29    29    29    29    29

Columns 183 through 195

29    30    30    30    30    30    30    30    31    31    31    31    31
29    30    30    30    30    30    30    30    31    31    31    31    31

Columns 196 through 208

31    31    32    32    32    32    32    32    32    33    33    33    33
31    31    32    32    32    32    32    32    32    33    33    33    33

Columns 209 through 221

33    33    33    34    34    34    34    34    34    34    34    35    35
33    33    33    34    34    34    34    34    34    34    35    35    35

Columns 222 through 234

35    35    35    35    36    36    36    36    36    36    37    37    37
35    35    35    35    36    36    36    36    36    36    36    37    37

Columns 235 through 247

37    37    37    38    38    38    38    38    38    38    39    39    39
37    37    37    37    37    38    38    38    38    38    38    38    39

Columns 248 through 256

39    39    39    39    39    40    40    40    40
39    39    39    39    39    39    40    40    40

```

except these 7 symbols (from those with very small probability)

```find(CLH~=CL_GR)
```
```ans =

219   232   238   239   245   246   253

```