Markov Process

Support/Figures/Pasted image 20250119135611.png
Imagine that we have a set of $m$ states. Then, the idea of a Markov process is that no matter where we are in the chain of transitions, no matter how many transitions we have seen, or what path we followed to get to the current state, $X_{n+1}$ is determined by $X_n$ alone (plus some noise). That is, if $X_n = i$, then there is a probability for each $j = 1, \dots, m$ that $X_{n+1} = j$, and such transition probabilities exist for every state $i$ of $X_n$.
A nice way to draw a Markov process is to draw a directed graph (which may have self loops, but no multi-edges) whose vertices are the states $1, 2, \dots, m$ and whose edges $(i,j)$ represent a single-step transition from state $i$ to state $j$, with a probability weight $p(i,j)$ on that edge. (Note that $p(i,j)$ can be different from $p(j,i)$.)
These edge weights could be written $p_{ij}$ as well. We can also work with the complete directed graph, with all self loops, and put $p(i,j) = 0$ when we can't get from state $i$ to state $j$.

Then, the question is: starting from state $i$ and taking $n$ transitions or steps, what is the probability that I get to state $j$?
Denote by $r_{ij}(n)$ the probability of starting at state $i$ and reaching state $j$ after $n$ steps.

Then, recursively, using the complete-directed-graph idea, for each state $k = 1, \dots, m$ (and since transitions are independent of previous history), we can look at the probability of reaching $k$ from $i$ in $n-1$ steps, written $r_{ik}(n-1)$, multiply it by the probability of reaching $j$ from $k$ in one step, $p_{kj}$, and add up the branches over each $k$.

That is, $$r_{ij}(n) = \sum_{k=1}^mp_{kj} \cdot r_{ik}(n-1)$$
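This recursion can be sketched in a few lines of Python. A minimal sketch: the two-state matrix `P` is a made-up example, with `P[i][j]` holding $p_{ij}$.

```python
# Sketch: n-step transition probabilities r_ij(n) via the recursion
# r_ij(n) = sum_k r_ik(n-1) * p_kj, with base case r_ij(0) = 1 iff i == j.

def n_step_probs(P, n):
    m = len(P)
    # r[i][j] starts as the identity: in zero steps we stay where we are.
    r = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for _ in range(n):
        # One application of the recursion: r(n) = r(n-1) * P.
        r = [[sum(r[i][k] * P[k][j] for k in range(m)) for j in range(m)]
             for i in range(m)]
    return r

# Hypothetical 2-state chain.
P = [[0.9, 0.1],
     [0.5, 0.5]]
r2 = n_step_probs(P, 2)   # two-step probabilities r_ij(2)
```

Each row of the result is still a distribution (it sums to one), which is a quick sanity check on the recursion.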

Note that for any state $i$, the probabilities of the outgoing edges to any state (maybe even itself) should add up to one: $$\sum_{j=1}^m p_{ij} = 1, \qquad i = 1, \dots, m.$$

Generic limits for $r_{ij}(n)$ as $n \to \infty$:
Support/Figures/Pasted image 20250119142337.png

Recurrent and Transient states:

When drawing graphs of Markov chains, it is customary to omit zero-probability edges; hence in the graph of the Markov chain, there is a path between two vertices iff there is a nonzero probability of reaching (in at least one direction) from one state to the other.

A graph $G$ is a Markov graph if $V(G) = \{1, 2, \dots, m\}$, the set of all possible states, $(i,j) \in E(G) \iff p_{ij} > 0$, and moreover for any vertex $i$, $\sum_{j=1}^m p_{ij} = 1$.

Define the reachability relation on $V(G)$: $i \to j$ iff there is a path from $i$ to $j$.
Now, call a state $i$ recurrent if whenever $j$ is reachable from $i$ (that is, whenever $j$ is on some path from $i$), $i$ is also reachable from $j$ (there is always a path from $j$ back to $i$).

Restricted to the recurrent states, mutual reachability is an equivalence relation, and its equivalence classes partition the recurrent states into sets where, once you get there, you stay in the same equivalence class (like two-way connected components). So once you're in any class $[i]$, there is no way to reach any $j \notin [i]$.

These equivalence classes are called recurrent classes.
Notice that the recurrent classes are not a full partition of $V(G)$: some vertices are NOT in ANY recurrent class. Such vertices/states are called transient.
Eventually, moving between transient states, we will have to escape into some recurrent class, and we can never get back out.

If $i$ is a transient state, then in the long run $P(X_n = i) \to 0$ as $n \to \infty$.
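The recurrent/transient classification can be sketched directly from the definition above: a state is recurrent iff everything it can reach can reach it back. The 3-state chain below is a made-up example.

```python
# Sketch: classify states as recurrent or transient from the transition
# probabilities alone (an edge exists wherever P[i][j] > 0).

def reachable(P, i):
    """States reachable from i (including i itself), by depth-first search."""
    m = len(P)
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v in range(m):
            if P[u][v] > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def classify(P):
    m = len(P)
    reach = [reachable(P, i) for i in range(m)]
    # i is recurrent iff every j reachable from i can also reach i.
    recurrent = [i for i in range(m) if all(i in reach[j] for j in reach[i])]
    transient = [i for i in range(m) if i not in recurrent]
    return recurrent, transient

# Hypothetical chain: state 0 can fall into {1, 2} and never return,
# so 0 is transient and {1, 2} is a recurrent class.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.3, 0.7],
     [0.0, 0.6, 0.4]]
rec, tra = classify(P)
```

Grouping the recurrent states by mutual reachability would then recover the recurrent classes themselves.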

Support/Figures/Pasted image 20250119143925.png
Support/Figures/Pasted image 20250119144129.png
In a general Markov chain, you have some recurrent classes of states, with all their edges internal, and some transient states that can get to some vertex of some recurrent class (and by definition can never leave it).

Periodic Markov chains

Suppose $G$ is a Markov graph. Then $G$ is periodic if $V(G)$ can be partitioned into color classes $C_0, C_1, \dots, C_{d-1}$ (with $d \ge 2$) such that there exists an edge from (a vertex in) $C_i$ to $C_j$ if and only if $j = i + 1 \pmod d$, and $d$ is the minimum such partitioning (fewest number of classes).
Support/Figures/Pasted image 20250119145700.png
if we have a self loop, the chain is not periodic :)
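One way to compute the period of a state is as the gcd of the lengths of closed walks from it back to itself. A sketch under an assumption: scanning walk lengths only up to `2*m` is a heuristic horizon that works for small chains, not a guaranteed bound.

```python
# Sketch: period of state i = gcd of lengths n with A^n[i][i] nonzero,
# scanned up to a heuristic horizon of 2*m steps.
from math import gcd

def bool_matmul(X, Y):
    m = len(X)
    return [[any(X[i][k] and Y[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

def period(A, i):
    m = len(A)
    g = 0
    power = [[r == c for c in range(m)] for r in range(m)]  # identity
    for n in range(1, 2 * m + 1):
        power = bool_matmul(power, A)   # now holds A^n as a boolean matrix
        if power[i][i]:
            g = gcd(g, n)
    return g   # 0 means no return to i was seen within the horizon

# Hypothetical 3-cycle 0 -> 1 -> 2 -> 0: every state has period 3.
cycle3 = [[False, True, False],
          [False, False, True],
          [True, False, False]]
# Adding a self loop at 0 gives a closed walk of length 1, so period 1.
loopy = [[True, True, False],
         [False, False, True],
         [True, False, False]]
```

The self-loop case illustrates the remark above: a length-1 return forces the gcd to 1, so the chain is aperiodic.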

Steady state Markov theorem:

So before we go on, let us talk about the matrix interpretation.
So if I have some states $1, 2, \dots, m$ and some initial random variable $X_0$, which encapsulates the distribution $p_{X_0}(i)$ for each $i = 1, \dots, m$, then we can write $X_0$ as a vector $v_0 \in \mathbb{R}^m$ where $\sum_{j=1}^m v_0^j = 1$ and each component is between 0 and 1, of course.
Now, using the Markov graph $G$, make an $m \times m$ matrix where the $(i,j)$-th entry is the probability edge weight of the edge $(j,i)$ (or zero if that edge is not in $G$). We are reversing the direction here.

That is, $M_{ij} = p_{ji}$ for each pair $(i,j)$.
Why do we reverse the direction? Just see below (we do it to preserve the notation of multiplying vectors on the right).

So what is $M v_0$? Well, as a computation, $$(M\mathbf{v_{0}})^i = \sum_{k=1}^m M[i][k]\, \mathbf{v_{0}}^k$$
Hence, the $i$th entry of $M v_0$ can be seen as:

$$(M\mathbf{v_{0}})^i = \sum_{k=1}^m p_{ki}\, v_0^k$$

Probabalistically, we can say $$ P(X_{1} = i) = \sum_{k=1}^m p_{ki} P(X_{0} = k)$$
So the above looks like the total probability theorem, right? The probability that $X_1 = i$ is the sum over the disjoint events $X_0 = k$ of $P(X_1 = i \mid X_0 = k)\, P(X_0 = k)$, and the conditional probability $P(X_1 = i \mid X_0 = k)$ is exactly the transition probability $p_{ki}$.

Therefore, if $M$ is the matrix of a Markov graph $G$, where $M[i][j] = p(j,i)$ if $(j,i) \in E(G)$ and $0$ otherwise, and the initial random variable $X_0$ is represented by the distribution $v_0$, then the distribution of $X_1$ is represented by $Mv_0$. Inductively, since Markov chains are determined only by the current state and forget the previous states, we can treat $X_1$ as the initial state, with $v_1 = Mv_0$, to get that $X_2$ is represented by the distribution $Mv_1 = M^2 v_0$.

Therefore, using the transition matrix $M$, we have that $X_k$ is represented by the distribution $M^k v_0$.

We sort of want to know whether the distribution of the states after $n$ transitions, given by $M^n v_0$, approaches a steady distribution $v$ as $n$ becomes very large, independent of the initial choice of $v_0$.

If a Markov chain has more than one recurrent class (and assume $v_0$ is deterministic, that is, we choose exactly one initial state $j$, without any uncertainty): if $v_0$ starts at a transient vertex, then with nonzero probability we enter one of the (at least 2) recurrent classes, after which we cannot leave it. However, if we start in the OTHER recurrent class, we can never enter this one. So the steady-state distribution depends on where we start.

A Markov chain that has more than one recurrent class is called "reducible"; intuitively, given a long time, the distribution settles into one (of the many) recurrent classes and never leaves, so each recurrent class can be treated as a separate chain.
A Markov chain is irreducible if there is exactly one recurrent class and no transient vertices.

So we can turn the transition matrix $M$ into an adjacency matrix by just taking the ceiling of each entry: a nonzero probability $p_{ji}$ means $\lceil M \rceil[i][j] = 1$. So the adjacency matrix is $A = \lceil M \rceil^T$.

So the nice thing about $A$ is that $A^1[i][j]$ represents the number of length-1 paths from $i$ to $j$, $A^2[i][j]$ represents length-2 paths, and so on. To determine reachability, it is sufficient to compute $A^1 + A^2 + \dots + A^{m-1}$ (there are at most length-$(m-1)$ shortest paths), and wherever there is a nonzero element make it one, else leave it zero, to get the reachability matrix $A^*$.
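A minimal sketch of this reachability computation, using boolean matrices so that "nonzero" and "make it one" are just logical OR. The 3-state adjacency matrix is a made-up example.

```python
# Sketch: reachability matrix A* by OR-ing the boolean powers
# A^1, A^2, ..., A^(m-1), since a shortest path uses at most m-1 edges.

def bool_matmul(X, Y):
    m = len(X)
    return [[any(X[i][k] and Y[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

def reachability(A):
    m = len(A)
    power = A                         # holds A^n as n increases
    reach = [row[:] for row in A]     # running OR of the powers, starts at A^1
    for _ in range(m - 2):            # add A^2 through A^(m-1)
        power = bool_matmul(power, A)
        reach = [[reach[i][j] or power[i][j] for j in range(m)]
                 for i in range(m)]
    return reach

# Hypothetical 3-state graph: 0 -> 1 -> 2 -> 1 (state 0 is never re-entered).
A = [[False, True, False],
     [False, False, True],
     [False, True, False]]
R = reachability(A)   # R[i][j] says whether j is reachable from i
```

Working over booleans keeps the entries from blowing up; we only care whether a path exists, not how many there are.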

An aside

NOTE! There are many ways to get an algorithm for the sum $\sum_{i=1}^{m-1} A^i$. We could repeatedly square and store about $\log_2(m)$ powers $A, A^2, A^4, \dots, A^{2^{\lfloor \log_2 m \rfloor}}$, and for any power $k$ that we want to calculate, write $k$ in binary as $k = \sum_{j=0}^{t-1} a_j 2^j$; wherever $a_j = 1$, we have $A^{2^j}$ in our lookup table, and we multiply all of these together, doing $t = O(\log k)$ matrix multiplications for each $k$ that we have not already calculated. So how many matrix multiplications are we doing to calculate all of $A^1, A^2, \dots, A^{m-1}$? Well, first we do $O(\log_2 m)$ multiplications, one for each of $A, A^2, A^4, \dots$ (roughly, roughly). Then we have to calculate $A^k$ for each $k$ from $1$ to $m-1$ that is not a power of 2, and each such calculation takes $O(\log k)$ multiplications.
For a rough upper bound, suppose we just do each $k = 1, \dots, m-1$, without skipping the powers of two we have already computed.
So we do $O\!\left(\sum_{k=1}^{m-1} \log_2 k\right) + O(\log_2 m)$ total matrix multiplies, which is roughly $O(\log_2((m-1)!))$, which is roughly $O(m \log_2 m)$.

So we are doing $O(m \log m)$ matmuls; we might parallelize these matmuls or whatnot. Just saying that it's doable and not intractable. And then we have to do $O(m)$ matrix additions, which we could also parallelize.
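The square-and-multiply idea from the aside can be sketched as follows: `matpow` computes a single power $A^k$ with $O(\log k)$ boolean matrix multiplications. The 3-cycle adjacency matrix is a made-up example.

```python
# Sketch: A^k by binary exponentiation over boolean matrices.
# Squaring the base walks through A^1, A^2, A^4, ...; the bits of k
# pick which of these cached squares get multiplied into the result.

def bool_matmul(X, Y):
    m = len(X)
    return [[any(X[i][k] and Y[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

def matpow(A, k):
    m = len(A)
    result = [[i == j for j in range(m)] for i in range(m)]  # identity
    base = A
    while k:
        if k & 1:                         # this bit of k is set:
            result = bool_matmul(result, base)
        base = bool_matmul(base, base)    # advance to the next power of two
        k >>= 1
    return result

# Hypothetical 3-cycle 0 -> 1 -> 2 -> 0; three steps return every state home.
A = [[False, True, False],
     [False, False, True],
     [True, False, False]]
A3 = matpow(A, 3)
```

For the full sum one would cache the squares across calls, as the aside describes, rather than recomputing them per $k$.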

Now that we have the reachability matrix $A^*$ of the Markov graph $G$: if $A^*$ is symmetric, then $j$ is reachable from $i$ if and only if $i$ is reachable from $j$, which means there are no transient states (every state is recurrent), though there may still be more than one recurrent class.
We have said that $G$ is irreducible if there is only one recurrent class and no transient vertices. Equivalently, $G$ is irreducible if and only if ANY node $i$ is reachable from any other node $j$. That is, the reachability matrix $A^*$ of $G$ has a one in ALL of its entries.
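A sketch of this all-ones reachability test, done with a simple graph search per start vertex rather than matrix powers. The two 2-state chains are made-up examples.

```python
# Sketch: a chain is irreducible iff every state can reach every other,
# i.e. the reachability matrix is all ones. Checked by one search per state.

def irreducible(P):
    m = len(P)
    for i in range(m):
        # Depth-first search over edges with nonzero probability.
        seen, stack = {i}, [i]
        while stack:
            u = stack.pop()
            for v in range(m):
                if P[u][v] > 0 and v not in seen:
                    seen.add(v)
                    stack.append(v)
        if len(seen) < m:      # some state is unreachable from i
            return False
    return True

ring  = [[0.0, 1.0], [1.0, 0.0]]   # 0 <-> 1: one class, irreducible
split = [[1.0, 0.0], [0.0, 1.0]]   # two absorbing states: reducible
```

Each search costs $O(m^2)$ here, so the whole check is $O(m^3)$, comparable to a few matrix multiplications.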

So there is this process of getting $M_{ij} = p_{ji}$, then $A = \lceil M \rceil^T$, then $Q = \sum_{i=1}^{m-1} A^i$, then $A^* = \mathrm{BinSquash}(Q)$, where BinSquash gives 1 if $Q[i][j] > 0$ and 0 otherwise. We want $A^*$ to have all ones, so let us reverse engineer this: it means $Q$ has all positive entries.
Which means for every pair $(i,j)$ there exists $t \in \{1, 2, \dots, m-1\}$ such that $A^t[i][j] > 0$. So if this is violated, there exists (at least one) pair $(i,j)$ for which each of $A^1, A^2, \dots, A^{m-1}$ has a zero in the $(i,j)$-th entry, which is easier to work with.
So we will not push on what A might look like given these conditions. (at least for now)

Next, given the adjacency matrix $A$ of a Markov chain $G$, we say $G$ is periodic if it is possible to partition the vertices of $G$ into $C_0, C_1, \dots, C_{d-1}$ such that for any two vertices $u \in C_i$ and $w \in C_j$, an edge $(u, w)$ can exist only if $j = i + 1$ modulo $d$ (and $d \ge 2$ is the minimal size of any such partition).

Now, there is no harm in assuming that $C_0 = \{1, \dots, a_1\}$, $C_1 = \{a_1 + 1, \dots, a_2\}$, and so on. This is just a relabeling of the vertices of $G$ so that each color class has consecutive labels. If this relabeling is given by an isomorphism $f$, then the isomorphic adjacency matrix $B$ is given by $B[i][j] = A[f(i)][f(j)]$.
The matrix $B$ is particularly cool, as we can partition it into block matrices, where the block $C_{ij}$ is made by intersecting the rows with labels in class $i$ and the columns with labels in class $j$.
Writing the matrix in blocks like this, we see that the only blocks that can have ones in them are $C_{01}, C_{12}, C_{23}, \dots, C_{d-1,0}$, and all other blocks must be made entirely of zeros. (I will put a picture for this sometime.)

A matrix that DOES NOT have this property (under any relabeling) represents the adjacency of an aperiodic Markov chain.

Now to state the theorem: if $M$ is the state transition matrix of a Markov chain $G$, and $A$ is an adjacency matrix of $G$ (both $M$ and $A$ are allowed to be re-written, as long as they use the same isomorphism on $G$), then a steady-state distribution $v$ is one where $Mv = v$. If $M$ (and hence $A$) is irreducible and aperiodic, then there exists a steady state $v$ such that no matter the initial distribution $v_0$, for any $\epsilon > 0$ there exists $n = n(v_0, \epsilon) \in \mathbb{N}$ such that $\|v - M^n v_0\| < \epsilon$.

As far as I know, $n$ is allowed to depend on $v_0$ as well; just that no matter what $v_0$ is, it eventually has to converge (maybe at a different rate than some $u_0$) to $v$.
A lot of the ideas we built up today are refined and tied together into proofs here. (Note that I use a transposed transition matrix, so I can right-multiply column vectors the usual way, but the paper does it the other way around.)

Finally, if $X_n$ is a Markov chain on $m$ states, described by a transition matrix $M$ containing probabilities as its entries, $M_{ij} = p_{ji}$, with the columns of $M$ normalized (they sum to one), and such that $M$ is aperiodic and irreducible, then there exists a unique steady-state distribution $v$, which we can solve for from $Mv = v$ where $v \in [0,1]^m$ and the sum of the components of $v$ is 1.
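A sketch of solving $Mv = v$ by power iteration, which is exactly the convergence $M^n v_0 \to v$ from the theorem and so is valid when the chain is irreducible and aperiodic. The 2-state `M` is a made-up example, written transposed as in these notes ($M[i][j] = p_{ji}$).

```python
# Sketch: steady state by power iteration. Repeatedly apply M to a
# distribution until it stops changing; the fixed point satisfies M v = v.

def steady_state(M, tol=1e-12, max_iter=100_000):
    m = len(M)
    v = [1.0 / m] * m                 # any initial distribution works
    for _ in range(max_iter):
        w = [sum(M[i][k] * v[k] for k in range(m)) for i in range(m)]
        if max(abs(w[i] - v[i]) for i in range(m)) < tol:
            return w
        v = w
    return v

# Transposed transition matrix of a hypothetical 2-state chain
# (p_01 = 0.1, p_10 = 0.5), columns summing to one.
M = [[0.9, 0.5],
     [0.1, 0.5]]
v = steady_state(M)   # here v works out to [5/6, 1/6]
```

For this chain the balance equation $0.1\, v_0 = 0.5\, v_1$ together with $v_0 + v_1 = 1$ gives $v = (5/6,\ 1/6)$, which the iteration reproduces.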

Given the steady-state distribution $v$, and observing over a long, long number of transitions, the fraction of times we visit some state $j$ is equal (in the limit) to $v_j$.
And the fraction of times we transition from state $i$ to state $j$ is $v_i p_{ij}$.
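This long-run-fraction claim is easy to check empirically by simulating the chain. A sketch: the chain, the seed, and the step count are made-up, and `P` is row-stochastic here ($P[i][j] = p_{ij}$) since we sample the next state from the row `P[state]`.

```python
# Sketch: simulate the chain for many steps and record the fraction of
# time spent in each state; it should approach the steady state v.
import random

def simulate_fractions(P, steps, seed=0):
    rng = random.Random(seed)        # fixed seed for reproducibility
    m = len(P)
    counts = [0] * m
    state = 0
    for _ in range(steps):
        state = rng.choices(range(m), weights=P[state])[0]
        counts[state] += 1
    return [c / steps for c in counts]

# Same hypothetical 2-state chain as before, whose steady state is (5/6, 1/6).
P = [[0.9, 0.1],
     [0.5, 0.5]]
fracs = simulate_fractions(P, 200_000)
# fracs[0] lands close to 5/6, matching the steady-state prediction.
```

Counting the observed $(i \to j)$ transitions in the same run would likewise approach $v_i p_{ij}$.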