We want to combine some random variables by applying a function to them, and then understand the distribution of the new random variable we obtain.
For the discrete case, the idea is simple enough: for each value $z$, find all pairs $(x, y)$ such that $g(x, y) = z$. Then the probability of getting that $z$ is the sum of the probabilities of getting each of those pairs whose image under $g$ is $z$: $P(Z = z) = \sum_{(x, y) \,:\, g(x, y) = z} P(X = x, Y = y)$.
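As a minimal sketch of this enumeration (the two fair dice, the independence assumption, and $g(x, y) = x + y$ are my own illustrative choices, not from the text above):

```python
from itertools import product
from collections import defaultdict

# Two independent fair dice (assumed example); P(X=x, Y=y) = P(X=x) * P(Y=y).
pmf_x = {x: 1 / 6 for x in range(1, 7)}
pmf_y = {y: 1 / 6 for y in range(1, 7)}

def pmf_of_function(pmf_x, pmf_y, g):
    """P(Z=z) = sum of P(X=x, Y=y) over all (x, y) with g(x, y) = z."""
    pmf_z = defaultdict(float)
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        pmf_z[g(x, y)] += px * py  # group pairs by their image under g
    return dict(pmf_z)

pmf_sum = pmf_of_function(pmf_x, pmf_y, lambda x, y: x + y)
print(pmf_sum[7])  # 6/36 ≈ 0.1667: six pairs map to 7
```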
But in the continuous case, the probability of getting any single point is zero, so instead we integrate the joint density over the region where $g(x, y)$ takes the value of interest. What we can do is shown in the picture below:
1d Convolution:
I mean, the idea of finding the distribution of $Z = X + Y$ is more or less clear. Suppose we want $f_Z(z)$; then we need to enumerate (or rather, integrate over) all pairs $(x, z - x)$, and due to independence the joint density factors, giving $f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\, dx$. This operation is called convolution.
In the above picture, the pink contours are the lines where $x + y$ is constant at a given value $z$.
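Here is a small numerical sketch of this convolution, assuming $X$ and $Y$ are independent standard normals (my choice of example, not from the text); their sum is $\mathcal{N}(0, 2)$, so the result should match that density:

```python
import numpy as np

# Discretize the densities of two independent standard normals (assumed example).
dx = 0.01
x = np.arange(-10, 10, dx)
f_x = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
f_y = f_x.copy()

# np.convolve computes the discrete sum; multiplying by dx turns the
# Riemann sum into an approximation of the integral ∫ f_X(x) f_Y(z - x) dx.
f_z = np.convolve(f_x, f_y, mode="same") * dx

# X + Y ~ N(0, 2), so f_Z(0) should be 1 / sqrt(4π) ≈ 0.2821.
print(f_z[len(x) // 2], 1 / np.sqrt(4 * np.pi))  # both ≈ 0.2821
```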
Let us talk about covariance. (Also, the 2 in the formula is WRONG!!!)
$X - \mathbb{E}[X]$ tells you how far (positive or negative) $X$ deviates from its mean, and the same for $Y - \mathbb{E}[Y]$. Taking their product sort of gives us a distribution which tells us: how often does $X$ get bigger than its mean along with $Y$ getting bigger than its mean? And how often does $X$ get smaller than its mean along with $Y$ getting smaller than its mean? The expected value of this product is called the covariance, $\operatorname{Cov}(X, Y) = \mathbb{E}\big[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\big]$. If it is positive, it tells us that on average $X$ and $Y$ get bigger and smaller than their respective means together. A negative covariance tells us that on average, when one gets bigger than its mean, the other gets smaller (this is just the nature of products).
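A quick simulation of this sign intuition (constructing $Y$ as $\pm X$ plus independent noise is my own illustrative setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
noise = rng.normal(size=n)

# Y tends to be above its mean exactly when X is: positive covariance.
y_pos = x + noise
# Y tends to be below its mean when X is above its own: negative covariance.
y_neg = -x + noise

# Cov(X, Y) = E[(X - E[X])(Y - E[Y])], estimated from samples.
cov = lambda a, b: np.mean((a - a.mean()) * (b - b.mean()))
print(cov(x, y_pos))  # ≈ +1
print(cov(x, y_neg))  # ≈ -1
```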
Suppose $X_1, \dots, X_n$ are not independent.

Hence $\operatorname{Var}\left(\sum_i X_i\right)$ need not equal $\sum_i \operatorname{Var}(X_i)$; let us compute it.

Let $Y_i = X_i - \mathbb{E}[X_i]$. Then $\operatorname{Var}\left(\sum_i X_i\right) = \mathbb{E}\left[\left(\sum_i Y_i\right)^2\right]$.

Normally we write $\left(\sum_i Y_i\right)^2 = \sum_i \sum_j Y_i Y_j$, but thinking about the pairs, we can also write $\left(\sum_i Y_i\right)^2 = \sum_i Y_i^2 + 2 \sum_{i < j} Y_i Y_j$.

Hence, $\operatorname{Var}\left(\sum_i X_i\right) = \sum_i \mathbb{E}[Y_i^2] + 2 \sum_{i < j} \mathbb{E}[Y_i Y_j]$.

Therefore $\operatorname{Var}\left(\sum_i X_i\right) = \sum_i \operatorname{Var}(X_i) + 2 \sum_{i < j} \operatorname{Cov}(X_i, X_j)$.

Finally, $\operatorname{Cov}(X_i, X_j) = \mathbb{E}[Y_i Y_j] = \mathbb{E}[X_i X_j] - \mathbb{E}[X_i]\,\mathbb{E}[X_j]$.

Even more finally, we can expand the rightmost term and get $\operatorname{Var}\left(\sum_i X_i\right) = \sum_i \operatorname{Var}(X_i) + 2 \sum_{i < j} \big(\mathbb{E}[X_i X_j] - \mathbb{E}[X_i]\,\mathbb{E}[X_j]\big)$.
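As a sanity check of the final formula, here is a small simulation with three deliberately dependent variables (the construction is arbitrary, chosen only to make the covariances nonzero):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Three dependent variables built from a shared component z.
z = rng.normal(size=n)
xs = [z + rng.normal(size=n),
      2 * z + rng.normal(size=n),
      -z + rng.normal(size=n)]

cov = lambda a, b: np.mean((a - a.mean()) * (b - b.mean()))

lhs = np.var(sum(xs))  # Var of the sum
rhs = sum(cov(x, x) for x in xs) \
    + 2 * sum(cov(xs[i], xs[j]) for i in range(3) for j in range(i + 1, 3))
print(lhs, rhs)  # the two agree up to floating-point noise
```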
To understand the formula intuitively, just think about it like a pigeonhole-principle, pair-counting thing. When I sum the variances of each $X_i$, I am only counting the square terms of the expansion and missing the pairwise covariances between them; those have to be added back in, and each unordered pair shows up twice in the expansion, which is where the 2 comes from.
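Concretely, for $n = 3$ (a worked expansion added for illustration), taking expectations of

$$(Y_1 + Y_2 + Y_3)^2 = \underbrace{Y_1^2 + Y_2^2 + Y_3^2}_{\text{square terms} \;\to\; \sum_i \operatorname{Var}(X_i)} + \underbrace{2 Y_1 Y_2 + 2 Y_1 Y_3 + 2 Y_2 Y_3}_{\text{each of the } \binom{3}{2} \text{ pairs, twice} \;\to\; 2 \sum_{i<j} \operatorname{Cov}(X_i, X_j)}$$

recovers the formula term by term.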
In general, if $X$ and $Y$ are independent, then $\operatorname{Cov}(X, Y) = 0$. The converse is however not true: for example, if $X \sim \text{Uniform}(-1, 1)$ and $Y = X^2$, then $\operatorname{Cov}(X, Y) = \mathbb{E}[X^3] - \mathbb{E}[X]\,\mathbb{E}[X^2] = 0$, yet $Y$ is completely determined by $X$.
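A sketch of that counterexample in simulation (the seed and sample size are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=1_000_000)
y = x**2  # Y is a deterministic function of X: maximally dependent

# Cov(X, Y) = E[X^3] - E[X] E[X^2] = 0 by the symmetry of X around 0.
cov = np.mean((x - x.mean()) * (y - y.mean()))
print(cov)  # ≈ 0, even though X and Y are clearly not independent
```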
The fact that $X$ and $Y$ are linearly related iff $|\operatorname{corr}(X, Y)| = 1$ can be proved using the formula for covariance and the fact that the variance $\operatorname{Var}(aX + bY)$ is always $\ge 0$.
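A numerical illustration of the easy direction (the specific $a, b$ values are my own picks): for $Y = aX + b$ we have $\operatorname{Cov}(X, Y) = a \operatorname{Var}(X)$ and $\operatorname{Var}(Y) = a^2 \operatorname{Var}(X)$, so the correlation comes out as $\operatorname{sign}(a)$:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100_000)

# corr(X, aX + b) = a Var(X) / (|a| Var(X)) = sign(a)
for a, b in [(3.0, 1.0), (-0.5, 2.0)]:
    y = a * x + b
    print(a, np.corrcoef(x, y)[0, 1])  # prints ±1 up to floating-point error
```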