Information

Data communicated or received that resolves some amount of uncertainty about a fact/circumstance.
For example, if we draw a card from a shuffled 52-card deck, the following are in increasing order of information: it's a red card, it's a heart, it's the two of hearts.
The "self information" of a value x of a random variable XpX() is: $$ I(x) = -\log(p_{X}(x))$$
Support/Figures/Pasted image 20250216140037.png
The entropy of a random variable is the expected amount of information of that variable. Remember that for any deterministic function g and a random variable X, g(X) is a random variable as well. Therefore, the entropy of an rv is defined as: $$ H(X) = E[I(X)] = \begin{cases}
-\int_{-\infty}^\infty f_{X}(x)\log(f_{X}(x))\,dx & \text{continuous rv} \\
-\sum_{x}p_{X}(x)\log(p_{X}(x)) & \text{discrete rv}
\end{cases}$$
Support/Figures/Pasted image 20250216142410.png
If we have a sequence of data drawn from an rv X, then (as the sequence gets long) if the average number of bits used to transmit each symbol is less than H(X), we haven't transmitted enough information to recover the data; if it's more, we're inefficiently transmitting too much. For long sequences of data with underlying rv X, the ideal is to transmit an average of H(X) bits per symbol.
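As a sanity check of the discrete case, a minimal Python sketch (with a made-up 4-symbol source) computing H(X) in bits:

```python
import math

def entropy(pmf: dict) -> float:
    """H(X) = -sum_x p(x) * log2(p(x)), in bits (discrete case)."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# Hypothetical 4-symbol source: the skewed distribution needs fewer
# bits per symbol on average than the 2 bits a uniform one would need.
skewed  = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
uniform = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}

print(entropy(skewed))   # 1.75 bits per symbol
print(entropy(uniform))  # 2.0 bits per symbol
```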

Encodings

Support/Figures/Pasted image 20250216145004.png
The first one is a valid encoding, since there is an unambiguous way to decode given the key, and it is fixed-length since each symbol in the set gets a fixed number of bits as its encoding. The second one is also valid, but is a variable-length encoding. The third one is invalid because there is ambiguity in the decoding given the key.
Support/Figures/Pasted image 20250216145907.png
In order to ensure an unambiguous encoding, we can draw the encoding as a binary tree and check that each symbol is a leaf. Then there is a unique path from the root to each symbol, and we can decode unambiguously.
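One way to make the "every symbol is a leaf" check concrete: a code decodes unambiguously in this sense exactly when no codeword is a prefix of another. A small sketch, with illustrative codebooks standing in for the ones in the figure:

```python
def is_prefix_free(code: dict) -> bool:
    """True if no codeword is a prefix of another codeword,
    i.e. every symbol sits at a leaf of the encoding tree."""
    words = sorted(code.values())
    # After sorting, any prefix relation must occur between neighbours.
    return all(not words[i + 1].startswith(words[i]) for i in range(len(words) - 1))

# Illustrative codebooks (not the exact ones from the figures):
fixed    = {"A": "00", "B": "01", "C": "10", "D": "11"}   # valid, fixed length
variable = {"A": "0", "B": "10", "C": "110", "D": "111"}  # valid, variable length
bad      = {"A": "0", "B": "01", "C": "10", "D": "11"}    # "0" is a prefix of "01"

print(is_prefix_free(fixed), is_prefix_free(variable), is_prefix_free(bad))
# True True False
```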
Support/Figures/Pasted image 20250216153006.png
Fixed-length encoding gives us random-access decoding: if we have N symbols, each codeword takes $\lceil \log_2(N) \rceil$ bits, so to start decoding from the k'th letter of the message we skip roughly $k\log_2(N)$ bits and decode from there. More generally, if each symbol is assigned $t$ bits in the encoding, we skip exactly $(k-1)t$ bits and start from there instead.
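A sketch of that arithmetic, assuming a fixed-length code over a plain bitstring (the symbols and codebook here are made up):

```python
def decode_kth(bits: str, code: dict, k: int) -> str:
    """Decode only the k-th symbol (1-indexed) of a fixed-length encoding:
    every codeword has the same width t, so we jump straight to (k-1)*t."""
    t = len(next(iter(code.values())))          # fixed codeword width
    decode = {v: s for s, v in code.items()}    # invert the codebook
    start = (k - 1) * t                         # skip (k-1)*t bits
    return decode[bits[start:start + t]]

# 4 symbols -> t = ceil(log2(4)) = 2 bits per symbol.
code = {"A": "00", "B": "01", "C": "10", "D": "11"}
message = "".join(code[s] for s in "BADCAB")
print(decode_kth(message, code, 3))  # 'D' - the 3rd symbol, decoded without touching the rest
```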
Support/Figures/Pasted image 20250216153644.png
Support/Figures/Pasted image 20250216153804.png
Support/Figures/Pasted image 20250216165715.png
Support/Figures/Pasted image 20250216170047.png
Notice that $-1$ in two's complement is just a string of all 1's, so subtracting A from it just flips each bit of A.
Therefore in two's complement, $-A = \bar{A} + 1$, where $\bar{A}$ is the bitwise complement of $A$.
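A quick check of that identity, assuming 8-bit words (the mask keeps results within 8 bits):

```python
BITS = 8
MASK = (1 << BITS) - 1  # 0b11111111

def twos_complement_negate(a: int) -> int:
    """-A = (bitwise complement of A) + 1, modulo 2^BITS."""
    return ((~a) + 1) & MASK

for a in [1, 5, 42, 127]:
    neg = twos_complement_negate(a)
    # Adding A back should wrap around to 0 in 8-bit arithmetic.
    assert (a + neg) & MASK == 0
    print(f"{a:4d} -> {neg:08b}")

print(f"{-1 & MASK:08b}")  # 11111111: -1 is a string of all 1s
```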

Support/Figures/Pasted image 20250216170601.png
The rough idea is that for a symbol $x$ we assign roughly $I(x) = -\log_2(p(x))$ bits.
Support/Figures/Pasted image 20250216171003.png

Let $S$ be a set of symbols, with a probability $p: S \to [0,1]$ on each symbol.
The entropy of the set of symbols is $H(S) = -\sum_{s \in S} p(s)\log_2(p(s))$.
Let $g: S \to \mathbb{N}$ denote the number of bits each symbol is mapped to under a valid encoding.

The expected length of the encoding of a symbol under $g$ is $H_g(S) = \sum_{s \in S} p(s)\,g(s)$.
For a message of $N$ symbols, the lower bound on the expected encoding length is $N \cdot H(S)$, and the expected encoding length under a particular encoding $g$ is $N \cdot H_g(S)$.
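A small sketch comparing the entropy lower bound $H(S)$ with the expected length $H_g(S)$ of two concrete (made-up) codes:

```python
import math

def H(p: dict) -> float:
    """Entropy of the symbol set: -sum p(s) * log2(p(s))."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def H_g(p: dict, g: dict) -> float:
    """Expected codeword length under encoding g: sum p(s) * g(s)."""
    return sum(p[s] * g[s] for s in p)

p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
g_good  = {"a": 1, "b": 2, "c": 3, "d": 3}   # lengths matching -log2(p(s))
g_fixed = {"a": 2, "b": 2, "c": 2, "d": 2}   # 2-bit fixed-length code

print(H(p))             # 1.75  (lower bound, bits per symbol)
print(H_g(p, g_good))   # 1.75  (achieves the bound for this source)
print(H_g(p, g_fixed))  # 2.0   (valid, but wasteful for this source)
```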

Huffman's Algorithm:

Let $S = \{s_1, s_2, \ldots, s_n\}$ and $p: S \to [0,1]$ a probability on each symbol. We construct an encoding tree $T$ as follows:

Algorithm: Huffman Encoding
Input: a finite set of symbols $S$ and a probability map $p: S \to [0,1]$.
Output: an encoding tree $T$.
- While $|S| > 1$:
	- pick $a, b \in S$ such that $p(a), p(b)$ are the two lowest probabilities under $p$
	- $S \leftarrow S \setminus \{a, b\}$
	- make a new node $x$ with the subtree rooted at $a$ as $\text{left}(x)$ and the subtree rooted at $b$ as $\text{right}(x)$
	- $S \leftarrow S \cup \{x\}$, with $p(x) = p(a) + p(b)$
- The single remaining element of $S$ is the root of $T$.
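A compact runnable sketch of the procedure, using a heap to pull out the two lowest-probability nodes each round (a standard implementation choice, not necessarily how the figure's tree is drawn):

```python
import heapq
from itertools import count

def huffman_code(p: dict) -> dict:
    """Build a Huffman code for symbols with probability map p.
    Returns {symbol: bitstring}."""
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    # Each heap entry: (probability, tie, tree); a tree is a symbol or a (left, right) pair.
    heap = [(prob, next(tie), sym) for sym, prob in p.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        pa, _, a = heapq.heappop(heap)   # two lowest probabilities
        pb, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (pa + pb, next(tie), (a, b)))  # merge under a new node
    root = heap[0][2]

    code = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: recurse left (0) and right (1)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: record the codeword
            code[node] = prefix or "0"   # degenerate single-symbol case
    walk(root, "")
    return code

print(huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'} (up to tie-breaking)
```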

Support/Figures/Pasted image 20250216175002.png
Informally:

Now, if $T$ is a Huffman tree for a set of symbols $S$ and a probability map $p$, arrange $S$ in non-increasing order under $p$.