Open and read the file with student names

% When the file is on the internet, use:
if(0)
    URL = 'http://www.cs.tut.fi/~tabus/course/SC/Participants2018BonusPoints.txt'
    A = urlread(URL)
else
    % When the file 'Participants2018BonusPoints.txt' is in the current directory, use:
    fileID = fopen('Participants2018BonusPoints.txt');
    A = fread(fileID,'*char')';
    fclose(fileID);
end

% Find the letters used in the file
Alphabet = unique(A)
nA = length(Alphabet)
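% Find the frequency of occurrence of each letter.
% First convert the data from chars to numeric character codes
Ad = double(A);
Alphabetd = unique(Ad)
% Count the occurrences of each code (the bin centers are the codes themselves)
hh = hist(Ad,unique(Ad))
% Display a table with columns: symbol, character code, count
[char(Alphabetd)' ones(nA,1)*'  ' num2str(Alphabetd')  ones(nA,1)*'  ' num2str(hh')]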
Alphabet =


 -ABCDEFGHIJKLMNOPQRSTVWYZabcdefghijklmnopqrstuvyzäéïö


nA =

    56


Alphabetd =

  Columns 1 through 13

    10    13    32    45    65    66    67    68    69    70    71    72    73

  Columns 14 through 26

    74    75    76    77    78    79    80    81    82    83    84    86    87

  Columns 27 through 39

    89    90    97    98    99   100   101   102   103   104   105   106   107

  Columns 40 through 52

   108   109   110   111   112   113   114   115   116   117   118   121   122

  Columns 53 through 56

   228   233   239   246


hh =

  Columns 1 through 13

    46    46    95     1    19     5     2     3     4     1     1     9     5

  Columns 14 through 26

    12     4     8    13     5     4     5     1     5    17     9     3     1

  Columns 27 through 39

     1     2   120     4     5    11    61     4    11    43    61     3    18

  Columns 40 through 52

    36    25    71    45     9     2    40    27    23    29     8     9     1

  Columns 53 through 56

     8     1     1     1


ans =


   10   46
   13   46
    32   95
-   45    1
A   65   19
B   66    5
C   67    2
D   68    3
E   69    4
F   70    1
G   71    1
H   72    9
I   73    5
J   74   12
K   75    4
L   76    8
M   77   13
N   78    5
O   79    4
P   80    5
Q   81    1
R   82    5
S   83   17
T   84    9
V   86    3
W   87    1
Y   89    1
Z   90    2
a   97  120
b   98    4
c   99    5
d  100   11
e  101   61
f  102    4
g  103   11
h  104   43
i  105   61
j  106    3
k  107   18
l  108   36
m  109   25
n  110   71
o  111   45
p  112    9
q  113    2
r  114   40
s  115   27
t  116   23
u  117   29
v  118    8
y  121    9
z  122    1
ä  228    8
é  233    1
ï  239    1
ö  246    1
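
The same kind of count table can also be produced without hist. The following is a minimal sketch, not part of the original script: it assumes a short illustrative string instead of the course file, and the names idx and counts are introduced here only for the example. It relies on the third output of unique together with accumarray.

```matlab
% Illustrative input (not the course file): a short test string
A = 'Claude Shannon';

% Alphabet holds the sorted distinct characters; idx maps each
% character of A to its position in Alphabet
[Alphabet, ~, idx] = unique(A);

% Count how many times each alphabet position occurs (row vector)
counts = accumarray(idx(:), 1)';

% Show symbol, character code and count side by side
nA = length(Alphabet);
disp([Alphabet' repmat('  ', nA, 1) num2str(double(Alphabet)') ...
      repmat('  ', nA, 1) num2str(counts')])
```

Because idx already points into the alphabet, this counting does not depend on how bin centers are interpreted.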

Find the entropy of the text

The counts of each letter are in hh; dividing them by the total count gives the frequency of occurrence of each letter.

frequ = hh/sum(hh)
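% Entropy (vectorized):
H = -sum(frequ .*log2(frequ))
% The same entropy computed with an explicit loop, as a check
H1 = 0;
for i = 1:nA
    H1 = H1- frequ(i) .*log2(frequ(i));
end
% Both values should agree
[H H1]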
% The text needs about 4.79 bits per symbol on average

% Shannon code
Codes_length_Shannon = ceil(log2(1./frequ))
% Check Kraft inequality
SumKraft = sum( 2.^(-Codes_length_Shannon ) )
% Does the Kraft inequality hold?
SumKraft <= 1
[char(Alphabetd)' ones(nA,1)*'  ' num2str(Alphabetd') ones(nA,1)*'  ' num2str(hh') ones(nA,1)*'  ' num2str(Codes_length_Shannon')]
frequ =

  Columns 1 through 7

    0.0458    0.0458    0.0946    0.0010    0.0189    0.0050    0.0020

  Columns 8 through 14

    0.0030    0.0040    0.0010    0.0010    0.0090    0.0050    0.0120

  Columns 15 through 21

    0.0040    0.0080    0.0129    0.0050    0.0040    0.0050    0.0010

  Columns 22 through 28

    0.0050    0.0169    0.0090    0.0030    0.0010    0.0010    0.0020

  Columns 29 through 35

    0.1195    0.0040    0.0050    0.0110    0.0608    0.0040    0.0110

  Columns 36 through 42

    0.0428    0.0608    0.0030    0.0179    0.0359    0.0249    0.0707

  Columns 43 through 49

    0.0448    0.0090    0.0020    0.0398    0.0269    0.0229    0.0289

  Columns 50 through 56

    0.0080    0.0090    0.0010    0.0080    0.0010    0.0010    0.0010


H =

    4.7936


ans =

    4.7936    4.7936


Codes_length_Shannon =

  Columns 1 through 13

     5     5     4    10     6     8     9     9     8    10    10     7     8

  Columns 14 through 26

     7     8     7     7     8     8     8    10     8     6     7     9    10

  Columns 27 through 39

    10     9     4     8     8     7     5     8     7     5     5     9     6

  Columns 40 through 52

     5     6     4     5     7     9     5     6     6     6     7     7    10

  Columns 53 through 56

     7    10    10    10


SumKraft =

    0.6973


ans =

  logical

   1


ans =


   10   46   5
   13   46   5
    32   95   4
-   45    1  10
A   65   19   6
B   66    5   8
C   67    2   9
D   68    3   9
E   69    4   8
F   70    1  10
G   71    1  10
H   72    9   7
I   73    5   8
J   74   12   7
K   75    4   8
L   76    8   7
M   77   13   7
N   78    5   8
O   79    4   8
P   80    5   8
Q   81    1  10
R   82    5   8
S   83   17   6
T   84    9   7
V   86    3   9
W   87    1  10
Y   89    1  10
Z   90    2   9
a   97  120   4
b   98    4   8
c   99    5   8
d  100   11   7
e  101   61   5
f  102    4   8
g  103   11   7
h  104   43   5
i  105   61   5
j  106    3   9
k  107   18   6
l  108   36   5
m  109   25   6
n  110   71   4
o  111   45   5
p  112    9   7
q  113    2   9
r  114   40   5
s  115   27   6
t  116   23   6
u  117   29   6
v  118    8   7
y  121    9   7
z  122    1  10
ä  228    8   7
é  233    1  10
ï  239    1  10
ö  246    1  10
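
For Shannon code lengths, the average code length L always satisfies H <= L < H + 1. The sketch below checks this on a small example, assuming the symbol counts of the string 'Claude Shannon' rather than the course file; the names p, len and Lavg are introduced only for this illustration.

```matlab
% Illustrative counts: the 11 distinct symbols of 'Claude Shannon'
counts = [1 1 1 2 1 1 1 1 3 1 1];
p = counts / sum(counts);          % relative frequencies

H = -sum(p .* log2(p));            % entropy, bits per symbol
len = ceil(log2(1 ./ p));          % Shannon code lengths
Lavg = sum(p .* len);              % average code length, bits per symbol
SumKraft = sum(2 .^ (-len));       % Kraft sum, must not exceed 1

fprintf('H = %.4f, Lavg = %.4f, Kraft sum = %.4f\n', H, Lavg, SumKraft);

% Shannon bound: the average length is within one bit of the entropy
boundHolds = (H <= Lavg) && (Lavg < H + 1)
```

The same three lines (len, Lavg, SumKraft) can be applied to frequ from the script above to see how far the Shannon code is from the entropy of the full text.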

Questions to be answered by March 16 for bonus points

Send an email to ioan.tabus@tut.fi, or bring the paper with your answer to me during lecture time.

% Q1: Find the Shannon codelength for each symbol of the alphabet of your
% name.
% (Example: for the name Claude Shannon you have the eleven symbols
% ['space' 'C' 'S' 'a' 'd' 'e' 'h' 'l' 'n' 'o' 'u'] with frequencies
% [1 1 1 2 1 1 1 1 3 1 1] and Shannon codelengths [4 4 4 3 4 4 4 4 3 4 4].
% The tree has maximum depth 4.)

% Q2: Draw the prefix tree corresponding to the Shannon codelengths found
% at Q1 (you can draw it in a Word file, or on paper and bring the paper to
% me). Note: here you should draw only the prefix tree needed for your letters.

% Q3: What is your name (first name, space, family name) when coded using the
% tree of the Shannon code that you drew at Q2?

% Q4: Write the binary representation of the tree found at Q2 (see page 25 in the slides).

% Q5: Write down the binary message containing first the tree of the Shannon
% code and then your name.
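
As an illustration of how codelengths can be turned into codewords, here is a minimal sketch for the example name from Q1. It assumes a canonical assignment (shorter codewords first, consecutive binary values), which is only one of many valid prefix trees with these depths and is not necessarily the construction used in the slides. The names codes, order and encoded are hypothetical, introduced only for this example.

```matlab
% Example alphabet and Shannon codelengths from Q1 (first symbol is the space)
symbols = ' CSadehlnou';
lengths = [4 4 4 3 4 4 4 4 3 4 4];

% Process symbols from the shortest to the longest codeword (sort is stable)
[sortedLen, order] = sort(lengths);

codes = cell(1, numel(lengths));
code = 0;                      % next free codeword, as an integer
prevLen = sortedLen(1);
for k = 1:numel(sortedLen)
    % When the length grows, append zeros (shift left) to keep the code prefix-free
    code = code * 2^(sortedLen(k) - prevLen);
    codes{order(k)} = dec2bin(code, sortedLen(k));
    code = code + 1;
    prevLen = sortedLen(k);
end

% Encode the name by concatenating the codewords of its symbols
name = 'Claude Shannon';
[~, pos] = ismember(name, symbols);
encoded = [codes{pos}];

for k = 1:numel(symbols)
    fprintf('''%s''  %s\n', symbols(k), codes{k});
end
fprintf('Encoded name: %s (%d bits)\n', encoded, numel(encoded));
```

The construction works because the Kraft sum of these lengths does not exceed 1, so the running integer never overflows the available codewords at any depth.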