# Arithmetic coding principles

## Contents

Assume the probabilities of the symbols are known to both encoder and decoder

Define symbols

## Pick a string to encode and build the cumulative probability table

Choose as string to be compressed 'Arithmetic'

```s = 'Arithmetic'%
uniq = unique(s)
duniq = double(uniq)
ds = double(s)
hh = hist(ds,duniq)
```
```s =

Arithmetic

uniq =

Acehimrt

duniq =

65    99   101   104   105   109   114   116

ds =

65   114   105   116   104   109   101   116   105    99

hh =

1     1     1     1     2     1     1     2

```

Keep probability values with 13 bits precision; Stop if any of them has probability 0

```prec = 48
probs = hh/sum(hh);
probst = floor(probs*2^prec)/2^prec
% normalize their sum to 1
sp = sum(probst);
if (any(probs) <=0 )
display('Error: all probabilities must be strictly poistive')
STOP1
end
```
```prec =

48

probst =

Columns 1 through 3

0.099999999999998   0.099999999999998   0.099999999999998

Columns 4 through 6

0.099999999999998   0.199999999999999   0.099999999999998

Columns 7 through 8

0.099999999999998   0.199999999999999

```

Construct the cumulative probability table (Low and High values for each symbol)

```lows = 0; highs = probst(1);
for i = 2:length(duniq)
lows(i) = lows(i-1)+ probst(i-1);
highs(i) = highs(i-1)+ probst(i);
end
InitialTable = [probst' lows' highs']
```
```InitialTable =

0.099999999999998                   0   0.099999999999998
0.099999999999998   0.099999999999998   0.199999999999996
0.099999999999998   0.199999999999996   0.299999999999994
0.099999999999998   0.299999999999994   0.399999999999991
0.199999999999999   0.399999999999991   0.599999999999991
0.099999999999998   0.599999999999991   0.699999999999989
0.099999999999998   0.699999999999989   0.799999999999987
0.199999999999999   0.799999999999987   0.999999999999986

```

we assume that both decoder and encoder have the table "InitialTable"

## Perform encoding

```LowEnc = 0; HighEnc = 1; Progress = [LowEnc HighEnc]; prob_message = 1;
for i = 1:length(ds)
current_symbol = ds(i);
j = find( duniq == current_symbol );
los = lows(j); his = highs(j);
current_interval = HighEnc-LowEnc;
HighEnc = LowEnc + current_interval*his;
LowEnc = LowEnc + current_interval*los;
Progress = [Progress; LowEnc HighEnc];
prob_message = prob_message * probs(j);
end
```

The processing done at the encoder is found in the table "Progress"

``` Progress
```
```Progress =

0   1.000000000000000
0   0.099999999999998
0.069999999999997   0.079999999999997
0.073999999999997   0.075999999999997
0.075599999999997   0.075999999999997
0.075719999999997   0.075759999999997
0.075743999999997   0.075747999999997
0.075744799999997   0.075745199999997
0.075745119999997   0.075745199999997
0.075745151999997   0.075745167999997
0.075745153599997   0.075745155199997

```

## Choose as message any number between LowEnc and HighEnc

Show binary and integer values for the variables

```intLowEnc = floor(LowEnc*2^40)
binLowEnc = bitget( intLowEnc, 40:-1:1)
intHighEnc = floor(HighEnc*2^40)
binHighEnc = bitget( intHighEnc, 40:-1:1)
[binHighEnc' binLowEnc' (1:40)']
message = (LowEnc+HighEnc)/2

ind = find(binHighEnc~= binLowEnc,1)
ind1 = find( binLowEnc(ind:end) == 0,1);
ind2 = ind+ind1-1;
binLowEnc(ind2) = 1;
```
```intLowEnc =

8.328267713000000e+10

binLowEnc =

Columns 1 through 13

0     0     0     1     0     0     1     1     0     1     1     0     0

Columns 14 through 26

1     0     0     0     0     0     0     1     0     0     0     1     1

Columns 27 through 39

0     0     1     1     0     1     1     0     0     0     1     0     1

Column 40

0

intHighEnc =

8.328267889000000e+10

binHighEnc =

Columns 1 through 13

0     0     0     1     0     0     1     1     0     1     1     0     0

Columns 14 through 26

1     0     0     0     0     0     0     1     0     0     0     1     1

Columns 27 through 39

0     1     0     1     0     0     0     1     1     0     1     0     1

Column 40

0

ans =

0     0     1
0     0     2
0     0     3
1     1     4
0     0     5
0     0     6
1     1     7
1     1     8
0     0     9
1     1    10
1     1    11
0     0    12
0     0    13
1     1    14
0     0    15
0     0    16
0     0    17
0     0    18
0     0    19
0     0    20
1     1    21
0     0    22
0     0    23
0     0    24
1     1    25
1     1    26
0     0    27
1     0    28
0     1    29
1     1    30
0     0    31
0     1    32
0     1    33
1     0    34
1     0    35
0     0    36
1     1    37
0     0    38
1     1    39
0     0    40

message =

0.075745154399997

ind =

28

```

This is the chosen bitstream

```binLowEnc(1:ind2)
```
```ans =

Columns 1 through 13

0     0     0     1     0     0     1     1     0     1     1     0     0

Columns 14 through 26

1     0     0     0     0     0     0     1     0     0     0     1     1

Columns 27 through 28

0     1

```

This is the representation of the bitstream as a subunitary number

```message = binLowEnc(1:ind2)*(2.^(-(1:ind2)'))
```
```message =

0.075745154172182

```

The probability associated with the message is prob_message and the ideal codelength is - log2(prob_message) bits

```ideal_codelength = - log2(prob_message)
```
```ideal_codelength =

29.219280948873621

```

## Perform decoding

``` MessageDecoded = [];
crt_message = message;
for i = 1:length(ds)
for j = 1:length(duniq)
los = lows(j); his = highs(j);
if( (los <= crt_message) && (his > crt_message) )
break
end
end
current_interval = his-los;
crt_message = (crt_message - los)/current_interval;
MessageDecoded = [MessageDecoded uniq(j)]
end
```
```MessageDecoded =

A

MessageDecoded =

Ar

MessageDecoded =

Ari

MessageDecoded =

Arit

MessageDecoded =

Arith

MessageDecoded =

Arithm

MessageDecoded =

Arithme

MessageDecoded =

Arithmet

MessageDecoded =

Arithmeti

MessageDecoded =

Arithmetic

```