Arithmetic coding principles

Contents

Assume the probabilities of the symbols are known to both encoder and decoder

Define symbols

Pick a string to encode and build the cumulative probability table

Choose as string to be compressed 'Arithmetic'

s = 'Arithmetic'%
uniq = unique(s)
duniq = double(uniq)
ds = double(s)
hh = hist(ds,duniq)
s =

Arithmetic


uniq =

Acehimrt


duniq =

    65    99   101   104   105   109   114   116


ds =

    65   114   105   116   104   109   101   116   105    99


hh =

     1     1     1     1     2     1     1     2

Keep probability values with 13 bits precision; Stop if any of them has probability 0

prec = 48
probs = hh/sum(hh);
probst = floor(probs*2^prec)/2^prec
% normalize their sum to 1
sp = sum(probst);
if (any(probs) <=0 )
    display('Error: all probabilities must be strictly poistive')
    STOP1
end
prec =

    48


probst =

  Columns 1 through 3

   0.099999999999998   0.099999999999998   0.099999999999998

  Columns 4 through 6

   0.099999999999998   0.199999999999999   0.099999999999998

  Columns 7 through 8

   0.099999999999998   0.199999999999999

Construct the cumulative probability table (Low and High values for each symbol)

lows = 0; highs = probst(1);
for i = 2:length(duniq)
    lows(i) = lows(i-1)+ probst(i-1);
    highs(i) = highs(i-1)+ probst(i);
end
InitialTable = [probst' lows' highs']
InitialTable =

   0.099999999999998                   0   0.099999999999998
   0.099999999999998   0.099999999999998   0.199999999999996
   0.099999999999998   0.199999999999996   0.299999999999994
   0.099999999999998   0.299999999999994   0.399999999999991
   0.199999999999999   0.399999999999991   0.599999999999991
   0.099999999999998   0.599999999999991   0.699999999999989
   0.099999999999998   0.699999999999989   0.799999999999987
   0.199999999999999   0.799999999999987   0.999999999999986

we assume that both decoder and encoder have the table "InitialTable"

Perform encoding

LowEnc = 0; HighEnc = 1; Progress = [LowEnc HighEnc]; prob_message = 1;
for i = 1:length(ds)
    current_symbol = ds(i);
    j = find( duniq == current_symbol );
    los = lows(j); his = highs(j);
    current_interval = HighEnc-LowEnc;
    HighEnc = LowEnc + current_interval*his;
    LowEnc = LowEnc + current_interval*los;
    Progress = [Progress; LowEnc HighEnc];
    prob_message = prob_message * probs(j);
end

The processing done at the encoder is found in the table "Progress"

 Progress
Progress =

                   0   1.000000000000000
                   0   0.099999999999998
   0.069999999999997   0.079999999999997
   0.073999999999997   0.075999999999997
   0.075599999999997   0.075999999999997
   0.075719999999997   0.075759999999997
   0.075743999999997   0.075747999999997
   0.075744799999997   0.075745199999997
   0.075745119999997   0.075745199999997
   0.075745151999997   0.075745167999997
   0.075745153599997   0.075745155199997

Choose as message any number between LowEnc and HighEnc

Show binary and integer values for the variables

intLowEnc = floor(LowEnc*2^40)
binLowEnc = bitget( intLowEnc, 40:-1:1)
intHighEnc = floor(HighEnc*2^40)
binHighEnc = bitget( intHighEnc, 40:-1:1)
[binHighEnc' binLowEnc' (1:40)']
message = (LowEnc+HighEnc)/2

ind = find(binHighEnc~= binLowEnc,1)
ind1 = find( binLowEnc(ind:end) == 0,1);
ind2 = ind+ind1-1;
binLowEnc(ind2) = 1;
intLowEnc =

     8.328267713000000e+10


binLowEnc =

  Columns 1 through 13

     0     0     0     1     0     0     1     1     0     1     1     0     0

  Columns 14 through 26

     1     0     0     0     0     0     0     1     0     0     0     1     1

  Columns 27 through 39

     0     0     1     1     0     1     1     0     0     0     1     0     1

  Column 40

     0


intHighEnc =

     8.328267889000000e+10


binHighEnc =

  Columns 1 through 13

     0     0     0     1     0     0     1     1     0     1     1     0     0

  Columns 14 through 26

     1     0     0     0     0     0     0     1     0     0     0     1     1

  Columns 27 through 39

     0     1     0     1     0     0     0     1     1     0     1     0     1

  Column 40

     0


ans =

     0     0     1
     0     0     2
     0     0     3
     1     1     4
     0     0     5
     0     0     6
     1     1     7
     1     1     8
     0     0     9
     1     1    10
     1     1    11
     0     0    12
     0     0    13
     1     1    14
     0     0    15
     0     0    16
     0     0    17
     0     0    18
     0     0    19
     0     0    20
     1     1    21
     0     0    22
     0     0    23
     0     0    24
     1     1    25
     1     1    26
     0     0    27
     1     0    28
     0     1    29
     1     1    30
     0     0    31
     0     1    32
     0     1    33
     1     0    34
     1     0    35
     0     0    36
     1     1    37
     0     0    38
     1     1    39
     0     0    40


message =

   0.075745154399997


ind =

    28

This is the chosen bitstream

binLowEnc(1:ind2)
ans =

  Columns 1 through 13

     0     0     0     1     0     0     1     1     0     1     1     0     0

  Columns 14 through 26

     1     0     0     0     0     0     0     1     0     0     0     1     1

  Columns 27 through 28

     0     1

This is the representation of the bitstream as a subunitary number

message = binLowEnc(1:ind2)*(2.^(-(1:ind2)'))
message =

   0.075745154172182

The probability associated with the message is prob_message and the ideal codelength is - log2(prob_message) bits

ideal_codelength = - log2(prob_message)
ideal_codelength =

  29.219280948873621

Perform decoding

 MessageDecoded = [];
crt_message = message;
for i = 1:length(ds)
    for j = 1:length(duniq)
        los = lows(j); his = highs(j);
        if( (los <= crt_message) && (his > crt_message) )
            break
        end
    end
    current_interval = his-los;
    crt_message = (crt_message - los)/current_interval;
    MessageDecoded = [MessageDecoded uniq(j)]
end
MessageDecoded =

A


MessageDecoded =

Ar


MessageDecoded =

Ari


MessageDecoded =

Arit


MessageDecoded =

Arith


MessageDecoded =

Arithm


MessageDecoded =

Arithme


MessageDecoded =

Arithmet


MessageDecoded =

Arithmeti


MessageDecoded =

Arithmetic