OHJ-2206 General Instructions for the Projects

AV 2010-08-26

Addition 2011-12-20General remarks on the instructions and topicsRemarks on the programInput and outputSubmitting, testing and fixing the programThe documentSpecial rules for programs based on systematic search

Addition 2011-12-20

Please take the following into account in your projects:

Problems with the above are not the students' fault. Therefore, the delay caused by them is not counted against the student — they do not reduce the mark or spoil the possibility of getting an “early submission works” bonus. However, to make testing possible, the students must fix them if they are found in testing and the teacher requests for a fix.

General remarks on the instructions and topics

It is not easy to design project topics that meet the pedagogical goals, are interesting from the point of view of the students, and for which the Internet does not present too ready solutions or parts of solutions. Also the effort needed to build a testing environment sets a restriction to how many and what kind of topics may be made available.

Topic descriptions may contain requirements that seem harsh but are stated in a vague form. This is especially the case regarding the performance of the programs. Please do not be scared by them. That they have been stated in a vague form is partly because of lack of experience on how good solutions can be obtained with reasonable effort. Their purpose is to rule out solutions which are technically trivial but whose asymptotic time complexity is clearly worse than that of the intended solutions. When accepting or rejecting solutions, the spirit of the law is more important than the letter of the law.

There is only little experience on some of the topics, so it is possible that things do not work as the teacher had planned. It is also possible that there are errors in the instructions. If the topic seems too difficult or too easy, or something else seems strange, please contact the teacher as soon as possible.

The descriptions of some topics contain mathematical symbols. If the web browser that you use does not show them correctly, then the description may look weird. The descriptions have been proof-read with Firefox.

For some project topics, the literature presents a very ready-made solution. You must not use such sources of information. When the teacher is aware of such sources, the topic description contains an explicit restriction like “do not use literature that is from the year 2008 or younger”. If you find such a source that is not mentioned in the topic description, please tell the teacher.

Remarks on the program

The program must be written in the standard C++ language. You may use all features of the language, including all algorithms and data structures in the standard library. However, their use for crucial data structures is not recommended, because experience has shown that it often leads to mediocre or even bad performance. Please notice that the C++ standard does not yet contain hash tables (unordered maps), although many compilers support them.

The program must be efficient regarding run time and memory consumption. Efficiency is assessed by a comparison to programs made by other students or the teacher. If there is an obvious trivial algorithm, the program should be clearly more efficient than it. It is not required that the program is asymptotically the best possible; it is only required that it is asymptotically clearly better than trivial algorithms. If the program is exceptionally efficient, the student is awarded one, or in extreme cases two, additional points.

If the program contains exceptionally good ideas that are reported well in the document, it may earn one, or in extreme cases two, additional points. A good idea may thus bring additional points in double: once via good performance in the tests, and a second time via its explanation in the document.

The program must have been written by you. You may copy small examples like those presented in textbooks, but, other than that, every line of the program must have been written by you. You may, however, copy code from your other programs, as long as that code has been written by you.

At the level of ideas (and not code), you may use all publicly available information, except when otherwise stated in the topic web page. You may also use ideas that other students of the course present in their seminar talks. Similarly, other students have the right to use ideas that you present in your seminar talks. When using ideas, etec., that you obtained from elsewhere, always mention the source in the document. Furthermore, be sure that you understand how they work and how fast they are, instead of copying them blindly.

Input and output

The rules regarding input and output have two (partially contradictory) goals: reading the input and writing the output should be as easy as possible to program; and, where reasonable, the same rules should apply to both the input and the output.

The program may trust that the input obeys the rules. It needs not detect and survive illegal input. The output of the program must obey the rules. Usual C++ input and output operators are okay in most cases. However, they are not necessarily the fastest way.

Towards the beginning of the input there are usually numbers that describe the size of the input. They make it possible for the program to reserve a data structure of the right size at start.

Unless otherwise stated in the description of the topic, the following rules apply to the input and output:

Submitting, testing and fixing the program

The program is submitted to the teacher via email. It may be one file or a .zip or similar package that contains many files.

For testing, the program or package is copied into an empty directory. If it is a package, it is unpacked. Then the program is compiled with the following command:

g++ -ansi -W -Wall -pedantic -O2 *.c* -o student_code.out

This implies that no make-files are used, even if the package contains any. The program must compile without compile-time errors and warnings. If the student submits a package, it must open in such a way that the preceding compilation command succeeds immediately.

The program must pass the tests executed by the teacher without errors excluding the heaviest test runs. In the heaviest test runs the program is allowed to fail by crashing, reporting running out of memory, computing too long, etc. In no circumstances the program is allowed to output a wrong answer.

If the program fails the tests, the teacher sends it back to be fixed. This may be repeated many times, if necessary. The teacher may reject the program immediately if it turns out that fundamental changes are needed to make it meet the requirements. The teacher may also reject the program if it has been fixed at least twice and it still seems to contain either one significant or many small errors. If three months after the original deadline the program still fails to meet the requirements, the teacher may reject it. The teacher may give one, or in extreme cases two, negative points, if exceptionally many rounds are needed to fix the program. If the first version meets the requirements or only contains insignificant errors that are all fixed in the second version, the teacher gives one additional point.

The teacher does not provide (significant) public test data. The students may share test data with each other. It is, however, recommended that each student tries on her/his own to find good ways of testing his/her program. If the student invents and gives to the teacher one or more test cases that are exceptionally good in revealing errors or bad performance, the teacher may give the student an additional point. The student may earn an additional point also by developing a good method of testing her/his program automatically and reporting it well in the document.

Please also read the section The programming project on the homepage of the course.

The document

The document must be submitted as a PDF file or printout on paper. It may be submitted in the same package with the program, as an email attachment, or separately.

The document has to be written for a reader who has good knowledge on data structures and algorithms, but who has no knowledge on your project topic other than what is described on the web page of the topic.

Unlike an industrial program document, the document must concentrate on the key ideas and the performance of the program, and pay little attention to routine details. The overall operation and main data structures and algorithms of the program are important. It is not necessary to mention every subroutine, class, or variable. However, everything that is important for assessing the performance must be mentioned. Typically this includes all arrays, dynamically reserved data structures, use of algorithms and data structures from the C++ standard library, loops, and subroutines that call itself recursively directly or indirectly. However, data structures whose size is a small constant (like arrays with size 5) need not be mentioned, and similarly with loops and recursion.

If something is implemented like described in textbooks, then it suffices to refer to the concepts in textbooks. For instance, it suffices to say that “this array is then sorted with heapsort”, or ”the coordinates of the squares that have been taken into the candidate solution are stored into an array that is used as a stack and whose elements are of type Square”.

The writing style in the Cormen, Leiserson, Rivest & Stein textbook is a very good example.

Always mention in the document the sources of ideas and material that you obtained from elsewhere. Always mentioning sources is an important part of the ethics of science. You must usually present the borrowed ideas in your own words. Copying short pieces of text from elsewhere is allowed according to the usual rules of citing scientific material, but usually ideas need longer descriptions than that.

When referring to ideas and material obtained from elsewhere, please use the usual numeric citation system: for instance, [3] in the document, and a numbered list of sources at the end of the document, ordered alphabetically according to the author. Write “private communication” and date as the source of information that you got from a discussion with someone.

The minimal requirement for the document is that the teacher understands the data structures, algorithms, overall operation, and an upper limit to the run time (O-notation) without excess effort. Please do realise that this is not a trivial goal. When writing the document, please take into account the hints on the homepage of the course about making the seminar talks understandable to the audience.

If the document is better than this, it may earn one, or in extreme cases two, additional points.

Special rules for programs based on systematic search

For some project topics no better algorithms are known than various forms of systematic search. In those cases, it is acceptable that the program consumes very much time with some small inputs.

If the run time of the program as a whole is exponential, then it needs not be presented with the O-notation; it suffices that it is mentioned that it is exponential. However, then the run times of finding a valid extension to the solution candidate or testing the validity of a potential extension, adding the extension to the candidate, and removing it from there must be presented with the O-notation.