OHJ-2206 project topic 2009:
Optimal state lumping of Markov chains

The problem

Markov chains are used for modelling state–transition systems whose transitions are controlled by probabilities. For instance, in a children's board game the states are positions on the board, and the probabilities of the next states are determined by throwing a dice: with probability 1/6, the next position is found by taking one step; with probability 1/6, the next position is found by taking two steps; and so on, up to six steps. With Markov chains, one can solve problems such as: What is the probability of hitting a dangerous square that is located 20 steps ahead of the current position on the board?

Mathematically, a Markov chain is a directed graph (S,Δ,Q) whose edges are labelled with probabilities. In the case of this programming project, also an initial partition I of S is needed. It is used to distinguish between states with different properties. In our board game example, dangerous states would be in one block of I and the remaining states in another block. A block is a set of a partition.

There are variants of Markov chains where the edge labels are not probabilities. Furthermore, the use of real numbers between 0 and 1 would cause problems with rounding errors. Because of these, in this project we use integers as edge labels. Even so, one can represent probabilities simply by multiplying them with a suitable number, like 1000.

namemathematical symbolcondition
statesSset
transitionsΔ ΔS × S
weightsQmapping from transitions to integers, that is, the weight of the transition (s1,s2) is the integer Q(s1,s2)
initial blocksIpartition of S

To simplify discussion, we extend Q to S2 by letting

Q(s1,s2) = 0 when (s1,s2) ∉ Δ.

That is, no transition from s1 to s2 is equivalent to there being a transition whose weight is zero. If the input contains a transition with zero weight, it can be ignored.

When BS we define

Q(s,B) = ∑s′ ∈ B Q(s,s′)

That is, Q(s,B) is the sum of the weights of the transitions from s to states in B.

Two or more states in a Markov chain may be equivalent from the point of view of the interesting aspects of the behaviour of the system. Obviously, a dangerous state in our board game example is not equivalent to a safe state. More generally, equivalent states must belong to the same block in I. This does not suffice, however, because two states in the same initial block may have different future behaviour. A safe square on the board whose next square is dangerous is different from a safe square whose next square is safe.

A well-working notion of equivalence is obtained via the concept of compatible partition. Two states are equivalent if and only if they are in the same block of it. As justified above, they must be in the same initial block. Furthermore, for each possible kind of next state — that is, for each block — equivalent states have the same probability of their next state being of that kind — that is, in that block.

A partition J of S is compatible with Q if and only if the following hold for every states s1 and s2 that are in the same block of J:
  1. s1 and s2 are in the same block of I.
  2. For every block B of J we have Q(s1,B) = Q(s2,B).

The behaviour of the Markov chain does not change, when equivalent states are fused into one state. This is called state lumping. Lumping makes the Markov chain smaller and thus easier to analyse. Of course, we want to lump as many states as possible. Therefore, we want to use the coarsest compatible partition, that is, the compatible partition with the biggest blocks. It can be proven that the coarsest compatible partition is unique.

The coarsest lumping problem consists of finding the coarsest compatible partition, and collapsing each of its blocks into a single state. This yields an as small a Markov chain as possible without changing the behaviour. The result is (S′,Δ′,Q′) and I′, where

The goal of the project is to write a program that solves the coarsest lumping problem. However, because of testing reasons, the program need not write the solution in full. It suffices that it writes the most important size information, namely |S′| and |Δ′|.

The following publication presents an efficient algorithm for solving the coarsest lumping problem: Derisavi, S., Hermanns, H., Sanders, W. H.: Optimal State-Space Lumping in Markov Chains. Information Processing Letters Volume 87, Issue 6 (September 2003) 309–315. It can be obtained from the teacher. However, it is better not to implement the algorithm precisely as described in the publication.

First, the algorithm contains an error. When a block has been split, the algorithm puts all except the largest of the resulting subblocks into the worklist of the algorithm. It is sometimes correct to leave the largest out, but the algorithm leaves it out more often than is legal. More precisely, it may be left out if the original block is not in the worklist. Please either implement this or always put all subblocks into the worklist.

Second, the algorithm does not handle properly the situation where there are transitions from a state s to the so-called splitter block B but the sum of their weights is zero, that is, when Q(s,B) = 0 although (s,s′) ∈ Δ for some s′ ∈ B. This should be handled as if there were no transitions from the state to the block.

Finally, the splay tree and even the binary tree part of the algorithm is hard to implement. Please do not implement them. Instead, sort the states that would be put into the tree according to the sum of the weights of the output transitions to the splitter block. This makes the theoretical complexity worse, but in practice the loss is not big.

The input

The input consists of natural numbers. It consists of the following parts:

<The input>::= <Sizes> <Blocks> <Transitions>
<Sizes>::= <number of states> <number of transitions> <number of initial blocks>
<Blocks>::= <Block>*      // repeated (number of initial blocks − 1) times
<Block>::= <state>* <zero>
<Transitions>::= <Transition>*      // repeated (number of transitions) times
<Transition>::= <tail state> <head state> <weight>

The states are given as numbers in the range 1, 2, …, <number of states>. Weights are integers. <zero> is 0. One of the blocks is not listed, because it can be computed from the listed blocks. That is, if I = {B1, B2, …, Bk}, then B1, …, Bk−1 are listed in <Block>*, and Bk = S − (B1 ∪ … ∪ Bk−1).

The output

The output consists of the following two numbers, in this order:

However, if you want, you can make your program print the resulting Markov chain in full, using the same syntax as the input. This is useful for testing your program, because you can then easily run it a second time, using the first output as the second input. If your program is correct, then the second output has the same size information as the first output. (Actually, the second output is isomorphic with the first output, but isomorphism is much more difficult to check than the size information.) The teacher's testing environment accepts both kinds of outputs.

Read also general instructions of the projects.