Open and read the file with student names

% When the file is on the internet, use:
if(0)
    URL = 'http://www.cs.tut.fi/~tabus/course/SC/Participants2019BonusPoints.txt'
    A = urlread(URL)
else
    % When the file 'Participants2019BonusPoints.txt' is in the current
    % directory use:
    fileID = fopen('Participants2019BonusPoints.txt');
    A = fread(fileID,'*char')';
    fclose(fileID);
end

% Find the letters used in the file
Alphabet = unique(A)
nA = length(Alphabet)
% Find the number of occurrences of each letter.
% First convert the data from chars to numeric codes.
Ad = double(A);
Alphabetd = unique(Ad)
hh = hist(Ad,unique(Ad))
[char(Alphabetd)' ones(nA,1)*'  ' num2str(Alphabetd')  ones(nA,1)*'  ' num2str(hh')]
Alphabet =

    '
           '-ABCDEFGHJKLMNOPRSTVZabcdefghijklmnoprstuvwyz¤©Ã'


nA =

    52


Alphabetd =

  Columns 1 through 13

    10    13    32    39    45    65    66    67    68    69    70    71    72

  Columns 14 through 26

    74    75    76    77    78    79    80    82    83    84    86    90    97

  Columns 27 through 39

    98    99   100   101   102   103   104   105   106   107   108   109   110

  Columns 40 through 52

   111   112   114   115   116   117   118   119   121   122   164   169   195


hh =

  Columns 1 through 13

    31    31    68     1     2     5     5     4     1     7     2     2    12

  Columns 14 through 26

     8     3     5    14     4     2     6     1     7     4     4     1    80

  Columns 27 through 39

     2     5     5    36     1     5    18    66     1    20    25    16    34

  Columns 40 through 52

    26     6    30    22    13    18     5     1     7     2     9     3    12


ans =

  52×10 char array

    '↵   10  31'
    '←   13  31'
    '    32  68'
    ''   39   1'
    '-   45   2'
    'A   65   5'
    'B   66   5'
    'C   67   4'
    'D   68   1'
    'E   69   7'
    'F   70   2'
    'G   71   2'
    'H   72  12'
    'J   74   8'
    'K   75   3'
    'L   76   5'
    'M   77  14'
    'N   78   4'
    'O   79   2'
    'P   80   6'
    'R   82   1'
    'S   83   7'
    'T   84   4'
    'V   86   4'
    'Z   90   1'
    'a   97  80'
    'b   98   2'
    'c   99   5'
    'd  100   5'
    'e  101  36'
    'f  102   1'
    'g  103   5'
    'h  104  18'
    'i  105  66'
    'j  106   1'
    'k  107  20'
    'l  108  25'
    'm  109  16'
    'n  110  34'
    'o  111  26'
    'p  112   6'
    'r  114  30'
    's  115  22'
    't  116  13'
    'u  117  18'
    'v  118   5'
    'w  119   1'
    'y  121   7'
    'z  122   2'
    '¤  164   9'
    '©  169   3'
    'Ã  195  12'
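
The last three rows (codes 164, 169 and 195) are probably not separate letters. Because fread with '*char' reads the file one byte at a time, they look like the bytes of UTF-8 encoded characters: 195 164 (hex C3 A4) encodes 'ä' and 195 169 (hex C3 A9) encodes 'é'. This would also explain why the count for 195 (12) equals the sum of the counts for 164 and 169 (9 + 3). A minimal sketch, assuming the file is UTF-8 encoded; the byte sequence below is a made-up example, not taken from the participants file:

```matlab
% Hypothetical example bytes: 'M', the UTF-8 pair 195 164, 'k', 'i'.
bytes = uint8([77 195 164 107 105]);

% Decode the bytes as UTF-8 (assumption: the file uses this encoding).
decoded = native2unicode(bytes, 'UTF-8');

% The two bytes 195 164 collapse into the single character code 228.
codes = double(decoded)
nChars = numel(decoded)
```

Applying the same call to uint8(A) before calling unique would give an alphabet of characters instead of bytes, so the counts and the entropy per symbol would generally differ from the byte-level values shown here.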

Find the entropy of the text

The counts of each letter are in hh; dividing them by the total count gives the frequency of occurrence of each letter.

frequ = hh/sum(hh)
% Entropy:
H = -sum(frequ .*log2(frequ))
H1 = 0;
for i = 1:nA
    H1 = H1- frequ(i) .*log2(frequ(i));
end
[H H1]
% The text needs about 4.81 bits per symbol on average

% Shannon code
Codes_length_Shannon = ceil(log2(1./frequ))
% Check Kraft inequality
SumKraft = sum( 2.^(-Codes_length_Shannon ) )
% Does the Kraft inequality hold?
SumKraft <= 1
[char(Alphabetd)' ones(nA,1)*'  ' num2str(Alphabetd') ones(nA,1)*'  ' ...
    num2str(hh') ones(nA,1)*'  ' num2str(Codes_length_Shannon')]
frequ =

  Columns 1 through 7

    0.0444    0.0444    0.0974    0.0014    0.0029    0.0072    0.0072

  Columns 8 through 14

    0.0057    0.0014    0.0100    0.0029    0.0029    0.0172    0.0115

  Columns 15 through 21

    0.0043    0.0072    0.0201    0.0057    0.0029    0.0086    0.0014

  Columns 22 through 28

    0.0100    0.0057    0.0057    0.0014    0.1146    0.0029    0.0072

  Columns 29 through 35

    0.0072    0.0516    0.0014    0.0072    0.0258    0.0946    0.0014

  Columns 36 through 42

    0.0287    0.0358    0.0229    0.0487    0.0372    0.0086    0.0430

  Columns 43 through 49

    0.0315    0.0186    0.0258    0.0072    0.0014    0.0100    0.0029

  Columns 50 through 52

    0.0129    0.0043    0.0172


H =

    4.8141


ans =

    4.8141    4.8141


Codes_length_Shannon =

  Columns 1 through 13

     5     5     4    10     9     8     8     8    10     7     9     9     6

  Columns 14 through 26

     7     8     8     6     8     9     7    10     7     8     8    10     4

  Columns 27 through 39

     9     8     8     5    10     8     6     4    10     6     5     6     5

  Columns 40 through 52

     5     7     5     5     6     6     8    10     7     9     7     8     6


SumKraft =

    0.6865


ans =

  logical

   1


ans =

  52×14 char array

    '↵   10  31   5'
    '←   13  31   5'
    '    32  68   4'
    ''   39   1  10'
    '-   45   2   9'
    'A   65   5   8'
    'B   66   5   8'
    'C   67   4   8'
    'D   68   1  10'
    'E   69   7   7'
    'F   70   2   9'
    'G   71   2   9'
    'H   72  12   6'
    'J   74   8   7'
    'K   75   3   8'
    'L   76   5   8'
    'M   77  14   6'
    'N   78   4   8'
    'O   79   2   9'
    'P   80   6   7'
    'R   82   1  10'
    'S   83   7   7'
    'T   84   4   8'
    'V   86   4   8'
    'Z   90   1  10'
    'a   97  80   4'
    'b   98   2   9'
    'c   99   5   8'
    'd  100   5   8'
    'e  101  36   5'
    'f  102   1  10'
    'g  103   5   8'
    'h  104  18   6'
    'i  105  66   4'
    'j  106   1  10'
    'k  107  20   6'
    'l  108  25   5'
    'm  109  16   6'
    'n  110  34   5'
    'o  111  26   5'
    'p  112   6   7'
    'r  114  30   5'
    's  115  22   5'
    't  116  13   6'
    'u  117  18   6'
    'v  118   5   8'
    'w  119   1  10'
    'y  121   7   7'
    'z  122   2   9'
    '¤  164   9   7'
    '©  169   3   8'
    'Ã  195  12   6'
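
As a cross-check, the average length of the Shannon code can be compared with the entropy; for Shannon codelengths it always satisfies H <= L < H + 1. A minimal sketch that re-enters the counts hh from the output above so that it runs on its own (the names Lavg and withinOneBit are introduced here for illustration only):

```matlab
% Counts copied from the hh output above (52 symbols, 698 characters in total).
hh = [31 31 68 1 2 5 5 4 1 7 2 2 12 ...
      8 3 5 14 4 2 6 1 7 4 4 1 80 ...
      2 5 5 36 1 5 18 66 1 20 25 16 34 ...
      26 6 30 22 13 18 5 1 7 2 9 3 12];
frequ = hh/sum(hh);
H = -sum(frequ .* log2(frequ));               % entropy, bits per symbol
Codes_length_Shannon = ceil(log2(1./frequ));  % Shannon codelengths
% Average codelength in bits per symbol (Lavg is a name used only here).
Lavg = sum(frequ .* Codes_length_Shannon)
% The Shannon code is within one bit of the entropy.
withinOneBit = (H <= Lavg) && (Lavg < H + 1)
```

The gap between Lavg and H is the price of rounding each ideal length log2(1/p) up to an integer.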

Questions to be answered by March 16 for bonus points

Send an email to ioan.tabus@tuni.fi, or bring the paper with your answer to me during lecture time.

% Q1: Find the Shannon codelength for each letter of the alphabet from your
% name.
% (Example: for the name Claude Shannon you have the eleven symbols
% ['space' 'C' 'S' 'a' 'd' 'e' 'h' 'l' 'n' 'o' 'u'] with frequencies
% [1 1 1 2 1 1 1 1 3 1 1] and Shannon codelengths [4 4 4 3 4 4 4 4 3 4 4].
% The tree has maximum depth 4.)

% Q2: Draw the prefix tree corresponding to the Shannon codelengths found
% in Q1 (you can draw it in a Word file, or on paper and bring the paper to
% me). Note: here you should draw only the prefix tree needed for your
% letters.

% Q3: What is your name (first name, space, family name) coded using the
% tree of the Shannon code that you drew in Q2?

% Q4: Write the binary representation of the tree found at Q2 (see page 25
% in the slides).

% Q5: Write down the binary message containing first the tree of the Shannon
% code and then your name.
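
The worked example in Q1 can be reproduced with a few lines. This is only a sketch for checking codelengths, under the assumption that frequencies are counted over the characters of the name itself (including the space), as in the 'Claude Shannon' example; the variable names are chosen here for illustration.

```matlab
name = 'Claude Shannon';                      % example name from Q1
[symbols, ~, idx] = unique(name);             % distinct characters, sorted by code
counts = accumarray(idx(:), 1)';              % occurrences of each symbol
lengths = ceil(log2(numel(name) ./ counts));  % Shannon codelengths
SumKraft = sum(2.^(-lengths));                % must be <= 1 for a prefix code to exist
n = numel(symbols);
% Same table layout as in the sections above: symbol, count, codelength.
[symbols' ones(n,1)*'  ' num2str(counts') ones(n,1)*'  ' num2str(lengths')]
```

Replacing name with another string gives the symbols, counts and codelengths for that string in the same way.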