Open and read the file with student names

When the file 'Participants2017.txt' is in the current directory, use:

fileID = fopen('Participants2017.txt');
A1 = fread(fileID,'*char')';
fclose(fileID);

% When the file is on the internet, use:
URL = 'http://www.cs.tut.fi/~tabus/course/SC/Participants2017.txt'
A = urlread(URL)
% Find the letters used in the file
Alphabet = unique(A)
nA = length(Alphabet)
% Find the number of occurrences of each letter
% First convert the data from chars to numeric character codes
Ad = double(A);
Alphabetd = unique(Ad)
hh = hist(Ad,unique(Ad))
[char(Alphabetd)' num2str(Alphabetd') num2str(hh')]
URL =

http://www.cs.tut.fi/~tabus/course/SC/Participants2017.txt


A =

Adhikari Bishwo Prakash
Ahvensalmi Arto Tapani
Aronen Tapio Lauri Mikael
Balnius Gerardas Ramunas
Bhattacharya Sounak
Chuchvara Aleksandra
Da Graca Gama Filipe Xavier
Darvishmohammadi Bahareh
Diaz Benito Marta
Er�palo Mikko Olavi
Gharib Shayan
Harju Manu Pietari
Husseini Sahar
Ib�nez S�ez Natalia
Jafari Kianoush
Jalalinejad Fatemeh
Javanmardi Farhad
J�rvenp�� Toni Kristian
Kirszenberg Alexandre
Kwan Ian
Kylm�l� Jarkko-Matti Jaakko
Laaksonen Lauri Kristian
Lee Emilia
Lempi�inen Pekka Juhani
Linsenmaier Hugo Claude Robert
Lotvonen Atro Johannes
Malm Anna Elisa Johanna
Malysh Konstantin
Marmai Dana� Anne Eleonore Marie
Mir Abdul Rasool Khan Mir Abdul Naser
Moreschini Sergio
M�kinen Roope Uolevi
Oravuo Juho Oskari
Pascual Olivas Pablo
Perez-Macias Martin Jose Maria
Pham Huu Thanh Binh
Polewczyk Joanna Ewa
Rezaei Yousefi Zeinab
Sainson Antoine Jacques Ren�
Sankala Heikki Juhana
Teo Sherilyn
Vicentijevic Marko
Yang Zhen



Alphabet =


 -ABCDEFGHIJKLMNOPRSTUVXYZabcdefghijklmnopqrstuvwxyz�


nA =

    55


Alphabetd =

  Columns 1 through 6

          10          13          32          45          65          66

  Columns 7 through 12

          67          68          69          70          71          72

  Columns 13 through 18

          73          74          75          76          77          78

  Columns 19 through 24

          79          80          82          83          84          85

  Columns 25 through 30

          86          88          89          90          97          98

  Columns 31 through 36

          99         100         101         102         103         104

  Columns 37 through 42

         105         106         107         108         109         110

  Columns 43 through 48

         111         112         113         114         115         116

  Columns 49 through 54

         117         118         119         120         121         122

  Column 55

       65533


hh =

  Columns 1 through 13

    43    43    84     2    12     6     2     4     5     3     4     5     2

  Columns 14 through 26

    14     8     7    17     2     4     8     6     8     5     1     1     1

  Columns 27 through 39

     2     2   122     8    10    11    60     2     4    32    71     3    21

  Columns 40 through 52

    29    15    67    42     7     1    49    31    20    23    12     4     1

  Columns 53 through 55

     6     7    12


ans =


   10 43
   13 43
    32 84
-   45  2
A   65 12
B   66  6
C   67  2
D   68  4
E   69  5
F   70  3
G   71  4
H   72  5
I   73  2
J   74 14
K   75  8
L   76  7
M   77 17
N   78  2
O   79  4
P   80  8
R   82  6
S   83  8
T   84  5
U   85  1
V   86  1
X   88  1
Y   89  2
Z   90  2
a   97122
b   98  8
c   99 10
d  100 11
e  101 60
f  102  2
g  103  4
h  104 32
i  105 71
j  106  3
k  107 21
l  108 29
m  109 15
n  110 67
o  111 42
p  112  7
q  113  1
r  114 49
s  115 31
t  116 20
u  117 23
v  118 12
w  119  4
x  120  1
y  121  6
z  122  7
�65533 12
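
The counting step above can also be sketched without hist. The snippet below is a minimal illustration, assuming a short made-up sample string instead of the course file; it counts occurrences with unique and accumarray and prints the code and count in separate columns so that they do not run together (as in the row for 'a' above). Note that the last alphabet entry in the output above, code 65533, is the Unicode replacement character (U+FFFD), which suggests that the accented letters in the names were not decoded with the file's actual encoding, so all of them are counted as one symbol.

```matlab
% Minimal sketch: count symbol occurrences with unique + accumarray.
% The sample string is illustrative only; it is not the course file.
A = sprintf('Kwan Ian\nLee Emilia\n');
Ad = double(A);                        % chars -> numeric character codes
[Alphabetd, ~, idx] = unique(Ad);      % idx maps each character to its alphabet entry
hh = accumarray(idx(:), 1)';           % number of occurrences of each alphabet entry
nA = numel(Alphabetd);
% Print one row per symbol: character code, then count.
% Codes are printed instead of characters so that newlines do not break the table.
for k = 1:nA
    fprintf('%6d %4d\n', Alphabetd(k), hh(k));
end
```

For exact integer codes this should give the same counts as hist(Ad,unique(Ad)), since each value falls on its own bin center.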

Find the entropy of the text

The counts of each letter are in hh. Dividing them by the total count gives the relative frequency of occurrence of each letter.

frequ = hh/sum(hh)
% Entropy:
H = -sum(frequ .*log2(frequ))
H1 = 0;
for i = 1:nA
    H1 = H1- frequ(i) .*log2(frequ(i));
end
[H H1]
% A symbol-by-symbol code for this text needs at least about 4.81 bits per symbol on average

% Shannon code
Codes_length_Shannon = ceil(log2(1./frequ))
% Check Kraft inequality
SumKraft = sum( 2.^(-Codes_length_Shannon ) )
% Is the Kraft inequality satisfied?
SumKraft <= 1
[char(Alphabetd)' ones(nA,1)*'  ' num2str(Alphabetd') ones(nA,1)*'  ' num2str(hh') ones(nA,1)*'  ' num2str(Codes_length_Shannon')]
frequ =

  Columns 1 through 7

    0.0443    0.0443    0.0865    0.0021    0.0124    0.0062    0.0021

  Columns 8 through 14

    0.0041    0.0051    0.0031    0.0041    0.0051    0.0021    0.0144

  Columns 15 through 21

    0.0082    0.0072    0.0175    0.0021    0.0041    0.0082    0.0062

  Columns 22 through 28

    0.0082    0.0051    0.0010    0.0010    0.0010    0.0021    0.0021

  Columns 29 through 35

    0.1256    0.0082    0.0103    0.0113    0.0618    0.0021    0.0041

  Columns 36 through 42

    0.0330    0.0731    0.0031    0.0216    0.0299    0.0154    0.0690

  Columns 43 through 49

    0.0433    0.0072    0.0010    0.0505    0.0319    0.0206    0.0237

  Columns 50 through 55

    0.0124    0.0041    0.0010    0.0062    0.0072    0.0124


H =

    4.8080


ans =

    4.8080    4.8080


Codes_length_Shannon =

  Columns 1 through 13

     5     5     4     9     7     8     9     8     8     9     8     8     9

  Columns 14 through 26

     7     7     8     6     9     8     7     8     7     8    10    10    10

  Columns 27 through 39

     9     9     3     7     7     7     5     9     8     5     4     9     6

  Columns 40 through 52

     6     7     4     5     8    10     5     5     6     6     7     8    10

  Columns 53 through 55

     8     8     7


SumKraft =

    0.7725


ans =

     1


ans =


     10   43   5
     13   43   5
      32   84   4
-     45    2   9
A     65   12   7
B     66    6   8
C     67    2   9
D     68    4   8
E     69    5   8
F     70    3   9
G     71    4   8
H     72    5   8
I     73    2   9
J     74   14   7
K     75    8   7
L     76    7   8
M     77   17   6
N     78    2   9
O     79    4   8
P     80    8   7
R     82    6   8
S     83    8   7
T     84    5   8
U     85    1  10
V     86    1  10
X     88    1  10
Y     89    2   9
Z     90    2   9
a     97  122   3
b     98    8   7
c     99   10   7
d    100   11   7
e    101   60   5
f    102    2   9
g    103    4   8
h    104   32   5
i    105   71   4
j    106    3   9
k    107   21   6
l    108   29   6
m    109   15   7
n    110   67   4
o    111   42   5
p    112    7   8
q    113    1  10
r    114   49   5
s    115   31   5
t    116   20   6
u    117   23   6
v    118   12   7
w    119    4   8
x    120    1  10
y    121    6   8
z    122    7   8
�  65533   12   7
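
The relation between the entropy and the Shannon code lengths can be checked on a small case. The sketch below is only an illustration, assuming the short string 'Claude Shannon' instead of the course file; the variable names (txt, p, len, Lavg, kraft) are chosen here and do not appear in the script above. It computes the entropy H, the Shannon lengths ceil(log2(1/p)), the average code length and the Kraft sum. For a Shannon code the average length is expected to satisfy H <= Lavg < H + 1.

```matlab
% Minimal sketch on a short illustrative string (not the course file).
txt = 'Claude Shannon';
[symbols, ~, idx] = unique(double(txt));   % alphabet as character codes
counts = accumarray(idx(:), 1)';           % occurrences of each symbol
p = counts / sum(counts);                  % relative frequencies
H = -sum(p .* log2(p));                    % entropy, bits per symbol
len = ceil(log2(1 ./ p));                  % Shannon code lengths
Lavg = sum(p .* len);                      % average code length, bits per symbol
kraft = sum(2 .^ (-len));                  % Kraft sum, must not exceed 1
fprintf('H = %.4f, Lavg = %.4f, Kraft sum = %.4f\n', H, Lavg, kraft);
```

A Kraft sum below 1 means that some leaves of the code tree are left unused, so the Shannon lengths are valid but not necessarily the shortest possible.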

Questions to be answered by March 16 for bonus points

Task reformulated on March 14 (sending answers to the old task is fine, but that requires drawing larger trees).

Send an email to ioan.tabus@tut.fi, or bring the paper with your answer to me during lecture time.

% Q1: Find the Shannon codelength for each letter of the alphabet from your
% name.
% (Example: for the name Claude Shannon you have the eleven symbols
% ['space' 'C' 'S' 'a' 'd' 'e' 'h' 'l' 'n' 'o' 'u'] with frequencies
% [1 1 1 2 1 1 1 1 3 1 1] and Shannon codelengths [4 4 4 3 4 4 4 4 3 4 4].
% The tree has maximum depth 4.)

% Q2: Draw the prefix tree corresponding to the Shannon codelengths found
% at Q1 (you can draw on a word file, or on a paper and bring the paper to
% me). Note: here you can draw only the prefix tree needed for your letters

% Q3: What is your name (first name, space, family name) coded using the tree of the Shannon code which you
% drew at Q2?

% Q4: Write the binary representation of the tree found at Q2 (see page 25 in the slides).

% Q5: Write down the binary message containing first the tree of the Shannon
% Code and then your name.
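
One possible way to turn a set of code lengths into actual prefix codewords is sketched below for the Claude Shannon example of Q1. This is a hedged illustration, not necessarily the construction used in the lecture slides: it uses a canonical assignment (shorter codewords first, consecutive binary values), which is only one of many valid prefix codes with these lengths, so a tree drawn by hand may look different. The variable names are chosen here for illustration.

```matlab
% Minimal sketch: canonical prefix codewords from given code lengths.
% Symbols and lengths are those of the Claude Shannon example in Q1.
symbols = {'space','C','S','a','d','e','h','l','n','o','u'};
len = [4 4 4 3 4 4 4 4 3 4 4];
[sortedLen, order] = sort(len);        % shortest codewords are assigned first
codes = cell(size(len));
code = 0;                              % next free codeword, as an integer
prevLen = sortedLen(1);
for k = 1:numel(len)
    % When the length grows, append zeros (shift left) to stay prefix-free
    code = code * 2^(sortedLen(k) - prevLen);
    codes{order(k)} = dec2bin(code, sortedLen(k));
    code = code + 1;
    prevLen = sortedLen(k);
end
for k = 1:numel(len)
    fprintf('%-5s %s\n', symbols{k}, codes{k});
end
```

Because the Kraft inequality holds for these lengths, the counter never runs past the available codewords of a given length, and no codeword is a prefix of another.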