Period 5, 23.3.2009-8.5.2009
In this course, students will learn several preprocessing, normalization and analysis methods for mining large-scale data. The methods are applied to DNA microarray data but are applicable to other types of large-scale data as well. Students will learn the basics of microarray technology together with several data mining approaches. The course provides basic knowledge of the computational methods used in systems biology.


The first exam of the course is on Friday 15.5.2009. Remember to bring your calculator to the exam.
An alternative exam will be organized on Monday 18.5. at 9am in room TC314. If you want to take part in it, please contact Reija Autio (


Lectures: 2h/week: Wednesday 10 - 12, TB222
Demonstration Lectures: 2h/week: Wednesday 12 - 14, TC415
Exercises: 2h/week: Friday 10 - 12, TC415. The latest information is available in POP.

If you have any questions, please contact Reija Autio (


You need a Lintula user account before the exercises; see Applying for a new user account.
If you have no previous MATLAB experience, it would be useful to read the first parts of MATLAB: Getting Started or the MATLAB Primer (or Lyhyt MATLAB-opas, "A short MATLAB guide", in Finnish).


The lectures are based on the book by Jarno Tuimala and M. Minna Laine (CSC), "DNA Microarray Data Analysis", which is available as a PDF and can be ordered free of charge from CSC in

The contents of the course:
  1. DNA Microarray experiments
  2. Statistical methods for large scale data analysis
  3. Design and analysis of microarray experiments
  4. Data classification and clustering
The book Gentleman, R., Carey, V., Huber, W., Irizarry, R.A., and Dudoit, S., Bioinformatics and Computational Biology Solutions Using R and Bioconductor, Springer 2005, will be used as additional material.


  1. Exam
  2. Assignment
  3. Exercises

The grading scale is numeric (1-5). The grade is based 60% on the exam, 30% on the assignment and 10% on participation in the exercises.
Both the exam and the assignment must be completed.

Prerequisites: SGN-6106 Computational Systems Biology I or equivalent knowledge.


The lecture slides will be available for copying on the shelf between the doors of TC313 and TC315.
Lecture slides for lecture 3: Differentially Expressed Genes
Lecture slides for lecture 5: Gene regulatory networks & sequence annotations


Exercise 1
Exercise 2: The preprocessed data needed in Exercise 2 can be downloaded here. The MATLAB file readDataR.m will also be used.
Exercise 3
Exercise 4: Because of difficulties saving code to the home folders during the exercise session, the code is available here: Exercise 4 codes.
Exercise 5
Exercise 6: The correct answers are available here: Exercise 6 codes.


The R code used in Demonstration 2 can be downloaded here: Demonstration 2
The MATLAB code used in Demonstration 3 can be downloaded here: Demonstration 3a and Demonstration 3b


The course includes an assignment. The deadline for returning the assignment is June 6th 2009.
The data needed in the assignment can be downloaded here.

SGN-6156 in Course Catalog 2008-2009

Last updated: 30.4.2009 by Reija Autio