# Arithmetic coding principles

## Contents

Assume the probabilities of the symbols are known to both encoder and decoder

Define symbols

## Pick a string to encode and build the cumulative probability table

Choose as string to be compressed 'Arithmetic'

s = 'Arithmetic'% uniq = unique(s) duniq = double(uniq) ds = double(s) hh = hist(ds,duniq)

s = Arithmetic uniq = Acehimrt duniq = 65 99 101 104 105 109 114 116 ds = 65 114 105 116 104 109 101 116 105 99 hh = 1 1 1 1 2 1 1 2

Keep probability values with 13 bits precision; Stop if any of them has probability 0

prec = 48 probs = hh/sum(hh); probst = floor(probs*2^prec)/2^prec % normalize their sum to 1 sp = sum(probst); if (any(probs) <=0 ) display('Error: all probabilities must be strictly poistive') STOP1 end

prec = 48 probst = Columns 1 through 3 0.099999999999998 0.099999999999998 0.099999999999998 Columns 4 through 6 0.099999999999998 0.199999999999999 0.099999999999998 Columns 7 through 8 0.099999999999998 0.199999999999999

Construct the cumulative probability table (Low and High values for each symbol)

lows = 0; highs = probst(1); for i = 2:length(duniq) lows(i) = lows(i-1)+ probst(i-1); highs(i) = highs(i-1)+ probst(i); end InitialTable = [probst' lows' highs']

InitialTable = 0.099999999999998 0 0.099999999999998 0.099999999999998 0.099999999999998 0.199999999999996 0.099999999999998 0.199999999999996 0.299999999999994 0.099999999999998 0.299999999999994 0.399999999999991 0.199999999999999 0.399999999999991 0.599999999999991 0.099999999999998 0.599999999999991 0.699999999999989 0.099999999999998 0.699999999999989 0.799999999999987 0.199999999999999 0.799999999999987 0.999999999999986

we assume that both decoder and encoder have the table "InitialTable"

## Perform encoding

LowEnc = 0; HighEnc = 1; Progress = [LowEnc HighEnc]; prob_message = 1; for i = 1:length(ds) current_symbol = ds(i); j = find( duniq == current_symbol ); los = lows(j); his = highs(j); current_interval = HighEnc-LowEnc; HighEnc = LowEnc + current_interval*his; LowEnc = LowEnc + current_interval*los; Progress = [Progress; LowEnc HighEnc]; prob_message = prob_message * probs(j); end

The processing done at the encoder is found in the table "Progress"

Progress

Progress = 0 1.000000000000000 0 0.099999999999998 0.069999999999997 0.079999999999997 0.073999999999997 0.075999999999997 0.075599999999997 0.075999999999997 0.075719999999997 0.075759999999997 0.075743999999997 0.075747999999997 0.075744799999997 0.075745199999997 0.075745119999997 0.075745199999997 0.075745151999997 0.075745167999997 0.075745153599997 0.075745155199997

## Choose as message any number between LowEnc and HighEnc

Show binary and integer values for the variables

intLowEnc = floor(LowEnc*2^40) binLowEnc = bitget( intLowEnc, 40:-1:1) intHighEnc = floor(HighEnc*2^40) binHighEnc = bitget( intHighEnc, 40:-1:1) [binHighEnc' binLowEnc' (1:40)'] message = (LowEnc+HighEnc)/2 ind = find(binHighEnc~= binLowEnc,1) ind1 = find( binLowEnc(ind:end) == 0,1); ind2 = ind+ind1-1; binLowEnc(ind2) = 1;

intLowEnc = 8.328267713000000e+10 binLowEnc = Columns 1 through 13 0 0 0 1 0 0 1 1 0 1 1 0 0 Columns 14 through 26 1 0 0 0 0 0 0 1 0 0 0 1 1 Columns 27 through 39 0 0 1 1 0 1 1 0 0 0 1 0 1 Column 40 0 intHighEnc = 8.328267889000000e+10 binHighEnc = Columns 1 through 13 0 0 0 1 0 0 1 1 0 1 1 0 0 Columns 14 through 26 1 0 0 0 0 0 0 1 0 0 0 1 1 Columns 27 through 39 0 1 0 1 0 0 0 1 1 0 1 0 1 Column 40 0 ans = 0 0 1 0 0 2 0 0 3 1 1 4 0 0 5 0 0 6 1 1 7 1 1 8 0 0 9 1 1 10 1 1 11 0 0 12 0 0 13 1 1 14 0 0 15 0 0 16 0 0 17 0 0 18 0 0 19 0 0 20 1 1 21 0 0 22 0 0 23 0 0 24 1 1 25 1 1 26 0 0 27 1 0 28 0 1 29 1 1 30 0 0 31 0 1 32 0 1 33 1 0 34 1 0 35 0 0 36 1 1 37 0 0 38 1 1 39 0 0 40 message = 0.075745154399997 ind = 28

This is the chosen bitstream

binLowEnc(1:ind2)

ans = Columns 1 through 13 0 0 0 1 0 0 1 1 0 1 1 0 0 Columns 14 through 26 1 0 0 0 0 0 0 1 0 0 0 1 1 Columns 27 through 28 0 1

This is the representation of the bitstream as a subunitary number

message = binLowEnc(1:ind2)*(2.^(-(1:ind2)'))

message = 0.075745154172182

The probability associated with the message is prob_message and the ideal codelength is - log2(prob_message) bits

ideal_codelength = - log2(prob_message)

ideal_codelength = 29.219280948873621

## Perform decoding

MessageDecoded = []; crt_message = message; for i = 1:length(ds) for j = 1:length(duniq) los = lows(j); his = highs(j); if( (los <= crt_message) && (his > crt_message) ) break end end current_interval = his-los; crt_message = (crt_message - los)/current_interval; MessageDecoded = [MessageDecoded uniq(j)] end

MessageDecoded = A MessageDecoded = Ar MessageDecoded = Ari MessageDecoded = Arit MessageDecoded = Arith MessageDecoded = Arithm MessageDecoded = Arithme MessageDecoded = Arithmet MessageDecoded = Arithmeti MessageDecoded = Arithmetic