7 - Probability - Derived distributions

We want to combine some random variables, take some function of them, and then understand the distribution of the new random variable we obtain.
Support/Figures/Pasted image 20250116005343.png
Support/Figures/Pasted image 20250116005705.png
For the discrete case, the idea is simple enough: for each $y$, find all $x$ such that $g(x) = y$. The probability of getting that $y$ is then the sum of the probabilities of getting each of those $x$'s whose image under $g$ is $y$: $p_Y(y) = \sum_{x : g(x) = y} p_X(x)$.
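Here's a minimal sketch of this summing-over-preimages idea; the PMF `p_X` and the choice $g(x) = x^2$ are made up for illustration:

```python
from collections import defaultdict

# A made-up PMF for X, just for illustration.
p_X = {-2: 0.1, -1: 0.2, 0: 0.3, 1: 0.2, 2: 0.2}

def derived_pmf(p_X, g):
    """PMF of Y = g(X): for each y, sum p_X(x) over all x with g(x) = y."""
    p_Y = defaultdict(float)
    for x, p in p_X.items():
        p_Y[g(x)] += p
    return dict(p_Y)

# Y = X^2 collapses x and -x onto the same value.
print(derived_pmf(p_X, lambda x: x * x))
# -> roughly {4: 0.3, 1: 0.4, 0: 0.3} (up to float rounding)
```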

But in the continuous case, the probability of getting any single point is zero. So what we can do instead is shown in the pictures below:
Support/Figures/Pasted image 20250116010202.png
Support/Figures/Pasted image 20250116010324.png
Support/Figures/Pasted image 20250116013004.png
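Assuming the recipe in the figures is the usual two-step one (first find the CDF of $Y$, then differentiate), here is a sketch on a made-up example, $X \sim Uniform(0,1)$ and $Y = X^2$, checked against a simulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# X ~ Uniform(0, 1), Y = X^2.  The CDF method gives, for 0 < y < 1,
#   F_Y(y) = P(X^2 <= y) = P(X <= sqrt(y)) = sqrt(y),
# and differentiating, f_Y(y) = 1 / (2 sqrt(y)).
x = rng.uniform(0.0, 1.0, size=1_000_000)
y = x**2

# Compare the empirical CDF of Y against the analytic sqrt(y).
grid = np.linspace(0.0, 1.0, 11)
empirical = [(y <= t).mean() for t in grid]
analytic = np.sqrt(grid)
print(np.max(np.abs(empirical - analytic)))  # ~ 0, up to sampling noise
```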
1d Convolution:
Support/Figures/Pasted image 20250116013333.png
Support/Figures/Pasted image 20250116013352.png
The idea behind finding the distribution of $X+Y$ is more or less clear. Suppose we want $f_{X+Y}(w)$, the density at $x + y = w$; then we need to enumerate (or rather, integrate over) all pairs $f_{X,Y}(x, w-x)$, and due to independence $f_{X,Y}(x, w-x) = f_X(x)\,f_Y(w-x)$:
$$f_{X+Y}(w) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(w-x)\, dx$$
This operation is called convolution.
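A quick numerical sketch of the discrete version, using two fair dice as an assumed example (here the sum over $x$ of $p_X(x)\,p_Y(w-x)$ is exactly what `np.convolve` computes):

```python
import numpy as np

# PMF of one fair die on the values 1..6.
die = np.full(6, 1 / 6)

# For independent X, Y the PMF of W = X + Y is the convolution
#   p_W(w) = sum_x p_X(x) * p_Y(w - x).
p_sum = np.convolve(die, die)  # supported on 2..12

for w, p in zip(range(2, 13), p_sum):
    print(w, round(p, 4))
# e.g. p_W(7) = 6/36 ~ 0.1667: the familiar triangular shape
```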
Support/Figures/Pasted image 20250119051833.png
In the above picture, the pink contours are lines along which $X+Y$ is constant.
Support/Figures/Pasted image 20250119052110.png
Support/Figures/Pasted image 20250119052209.png
Support/Figures/Pasted image 20250119052424.png
Let us talk about covariance. (Also, the $2$ in the formula in the picture above is WRONG!!!)

$X - E[X]$ tells you how far (positive or negative) $X$ deviates from its mean, and the same for $Y - E[Y]$. Taking their product sort of gives us a random variable that tells us: how often does $X$ get bigger than its mean along with $Y$ getting bigger than its mean? And how often does $X$ get smaller than its mean along with $Y$ getting smaller than its mean? The expected value of this product, $Cov(X,Y) = E[(X - E[X])(Y - E[Y])]$, is called the covariance. If it is positive, then on average $X$ and $Y$ get bigger and smaller than their respective means together. A negative covariance tells us that, on average, when one gets bigger than its mean, the other gets smaller (this is just the nature of products of signed quantities).
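To see the sign behavior numerically, here's a small sketch; the coefficients $0.8$ and $0.6$ are arbitrary choices that make $Y$ positively related to $X$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Build Y from X so that the two deviate from their means together.
x = rng.normal(size=100_000)
y = 0.8 * x + 0.6 * rng.normal(size=100_000)

dev_x = x - x.mean()
dev_y = y - y.mean()
cov = np.mean(dev_x * dev_y)   # sample version of E[(X-EX)(Y-EY)]
print(cov)                     # ~ 0.8: the deviations usually share a sign
print(np.cov(x, y)[0, 1])      # numpy's estimate agrees
```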

Suppose $X_i$, $i = 1, \dots, n$, are not independent.
$$Var(X_i) = E[(X_i - E[X_i])^2] = E[X_i^2] - E[X_i]^2$$
and similarly
$$Cov(X_i, X_j) = E[(X_i - E[X_i])(X_j - E[X_j])] = E[X_i X_j] - E[X_i]E[X_j]$$

Hence
$$\sum_i Var(X_i) + \sum_{i \neq j} Cov(X_i, X_j) = \sum_i \left(E[X_i^2] - E[X_i]^2\right) + \sum_{i \neq j} \left(E[X_i X_j] - E[X_i]E[X_j]\right)$$

Let $T = \sum_i X_i$. Then
$$Var(T) = E[(T - E[T])^2] = E[T^2] - E[T]^2 = E\left[\left(\sum_i X_i\right)^2\right] - \left(E\left[\sum_i X_i\right]\right)^2$$
Normally we write $(a+b)^2 = a^2 + b^2 + 2ab$, but thinking in terms of the ordered pairs, we can also write it as $a^2 + b^2 + ab + ba$.
Hence,
$$Var(T) = E\left[\sum_i X_i^2 + \sum_{i \neq j} X_i X_j\right] - \left(\sum_i E[X_i]\right)^2$$
Therefore
$$Var(T) = E\left[\sum_i X_i^2\right] + E\left[\sum_{i \neq j} X_i X_j\right] - \left(\sum_i E[X_i]\right)^2$$
Finally,
$$Var(T) = \sum_i E[X_i^2] + \sum_{i \neq j} E[X_i X_j] - \left(\sum_i E[X_i]\right)^2$$
Even more finally, we can expand the rightmost term as $\left(\sum_i E[X_i]\right)^2 = \sum_i E[X_i]^2 + \sum_{i \neq j} E[X_i]E[X_j]$ and get
$$Var(T) = \sum_i \left(E[X_i^2] - E[X_i]^2\right) + \sum_{i \neq j} \left(E[X_i X_j] - E[X_i]E[X_j]\right)$$

Therefore $$ Var\left( \sum_{i}X_{i} \right) = \sum_{i}Var(X_{i}) + \sum_{i\neq j}Cov(X_{i},X_{j})$$

To understand the formula intuitively, just think about it as a counting-the-pairs thing: when I sum the variances of each $X_i$, I only pick up the $i = j$ terms of $E[(T - E[T])^2]$; the $i \neq j$ cross terms, the pairwise covariances, are missing, and I need to add them back in. A quick simulation to check this identity on dependent variables is sketched below.
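Here the shared-noise construction is just an assumed example that makes the three variables positively correlated:

```python
import numpy as np

rng = np.random.default_rng(2)

# Three dependent variables: a shared noise term correlates them.
n = 500_000
shared = rng.normal(size=n)
xs = [shared + rng.normal(size=n) for _ in range(3)]
t = sum(xs)

lhs = t.var()
rhs = sum(x.var() for x in xs) + sum(
    np.cov(xs[i], xs[j])[0, 1]
    for i in range(3) for j in range(3) if i != j
)
print(lhs, rhs)  # both ~ 12: Var(T) = sum of variances + pairwise covariances
```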

In general, if $X, Y$ are independent, then $Cov(X,Y) = 0$. The converse, however, is not true.
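The standard counterexample is $Y = X^2$ with $X$ symmetric around $0$; a quick check:

```python
import numpy as np

rng = np.random.default_rng(3)

# X ~ N(0,1), Y = X^2.  Y is a deterministic function of X
# (maximally dependent), yet by symmetry
#   Cov(X, Y) = E[X^3] - E[X] E[X^2] = 0 - 0 = 0.
x = rng.normal(size=1_000_000)
y = x**2
print(np.cov(x, y)[0, 1])  # ~ 0 despite the perfect dependence
```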

Support/Figures/Pasted image 20250119060953.png
The fact that $X$ and $Y$ are linearly related iff $|\rho| = 1$ can be proved using the formula for covariance and the variance $\sigma_T^2 = E[T^2] - E[T]^2$.
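An empirical check that a linear relationship pushes $\rho$ to $\pm 1$; the coefficients below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)

x = rng.normal(size=100_000)

def rho(a, b):
    """Sample correlation of X and Y = aX + b."""
    y = a * x + b
    return np.corrcoef(x, y)[0, 1]

print(rho(2.0, 5.0))   # ~ +1: increasing linear map
print(rho(-3.0, 1.0))  # ~ -1: decreasing linear map
```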