Information

Data communicated or received that resolves some amount of uncertainty about a fact/circumstance.
For example, if we draw a card from a shuffled 52-card deck, the following are in increasing order of information: it's a red card, it's a heart, it's the two of hearts.
The "self information" of a value x of a random variable XpX() is: $$ I(x) = -\log(p_{X}(x))$$
Support/Figures/Pasted image 20250216140037.png
The entropy of a random variable is the expected amount of information of that variable. Remember that for any deterministic function g and a random variable X, g(X) is a random variable as well. Therefore, the entropy of an rv is defined as: $$ H(X) = E[I(X)] = \begin{cases}
-\int_{-\infty}^\infty f_{X}(x)\log(f_{X}(x))\,dx & \text{continuous rv} \\
-\sum_{x}p_{X}(x)\log(p_{X}(x)) & \text{discrete rv}
\end{cases}$$
Support/Figures/Pasted image 20250216142410.png
If we have a sequence of data drawn from an rv X, then (as the sequence gets long) if the average number of bits used to transmit each symbol is less than H(X), we haven't transmitted enough information to recover the data; if it's more, we're inefficiently transmitting too much. For long sequences of data with underlying rv X, the ideal is to transmit an average of H(X) bits per symbol.
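As a sanity check of the discrete case, a minimal Python sketch (with a made-up 4-symbol source) computing H(X) in bits:

```python
import math

def entropy(pmf: dict) -> float:
    """H(X) = -sum_x p(x) * log2(p(x)), in bits (discrete case)."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# Hypothetical 4-symbol source: the skewed distribution needs fewer
# bits per symbol on average than the 2 bits a uniform one would need.
skewed  = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
uniform = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}

print(entropy(skewed))   # 1.75 bits per symbol
print(entropy(uniform))  # 2.0 bits per symbol
```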

Encodings

Support/Figures/Pasted image 20250216145004.png
The first one is a valid encoding, since there is an unambiguous way to decode given the key, and it is fixed-length since each symbol in the set gets a fixed number of bits as its encoding. The second one is also valid, but is a variable-length encoding. The third one is invalid because there is ambiguity in the decoding given the key.
Support/Figures/Pasted image 20250216145907.png
In order to ensure an unambiguous encoding, we can draw the encoding as a binary tree and check that each symbol is a leaf. Then there is a unique path from the root to each symbol, and we can decode unambiguously.
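One way to make the "every symbol is a leaf" check concrete: a code decodes unambiguously in this sense exactly when no codeword is a prefix of another. A small sketch, with illustrative codebooks standing in for the ones in the figure:

```python
def is_prefix_free(code: dict) -> bool:
    """True if no codeword is a prefix of another codeword,
    i.e. every symbol sits at a leaf of the encoding tree."""
    words = sorted(code.values())
    # After sorting, any prefix relation must occur between neighbours.
    return all(not words[i + 1].startswith(words[i]) for i in range(len(words) - 1))

# Illustrative codebooks (not the exact ones from the figures):
fixed    = {"A": "00", "B": "01", "C": "10", "D": "11"}   # valid, fixed length
variable = {"A": "0", "B": "10", "C": "110", "D": "111"}  # valid, variable length
bad      = {"A": "0", "B": "01", "C": "10", "D": "11"}    # "0" is a prefix of "01"

print(is_prefix_free(fixed), is_prefix_free(variable), is_prefix_free(bad))
# True True False
```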
Support/Figures/Pasted image 20250216153006.png
Fixed-length encoding gives us random-access decoding: if we have N symbols, each codeword takes $\lceil \log_2(N) \rceil$ bits, so to start decoding from the k'th letter of the message we skip roughly $k\log_2(N)$ bits and decode from there. More generally, if each symbol is assigned $t$ bits in the encoding, we skip exactly $(k-1)t$ bits and start from there instead.
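A sketch of that arithmetic, assuming a fixed-length code over a plain bitstring (the symbols and codebook here are made up):

```python
def decode_kth(bits: str, code: dict, k: int) -> str:
    """Decode only the k-th symbol (1-indexed) of a fixed-length encoding:
    every codeword has the same width t, so we jump straight to (k-1)*t."""
    t = len(next(iter(code.values())))          # fixed codeword width
    decode = {v: s for s, v in code.items()}    # invert the codebook
    start = (k - 1) * t                         # skip (k-1)*t bits
    return decode[bits[start:start + t]]

# 4 symbols -> t = ceil(log2(4)) = 2 bits per symbol.
code = {"A": "00", "B": "01", "C": "10", "D": "11"}
message = "".join(code[s] for s in "BADCAB")
print(decode_kth(message, code, 3))  # 'D' - the 3rd symbol, decoded without touching the rest
```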
Support/Figures/Pasted image 20250216153644.png
Support/Figures/Pasted image 20250216153804.png
Support/Figures/Pasted image 20250216165715.png
Support/Figures/Pasted image 20250216170047.png
Notice that $-1$ in two's complement is just a string of all 1's, so subtracting A from it just flips each bit of A.
Therefore in two's complement, $-A = \bar{A} + 1$, where $\bar{A}$ is the bitwise complement of $A$.
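A quick check of that identity, assuming 8-bit words (the mask keeps results within 8 bits):

```python
BITS = 8
MASK = (1 << BITS) - 1  # 0b11111111

def twos_complement_negate(a: int) -> int:
    """-A = (bitwise complement of A) + 1, modulo 2^BITS."""
    return ((~a) + 1) & MASK

for a in [1, 5, 42, 127]:
    neg = twos_complement_negate(a)
    # Adding A back should wrap around to 0 in 8-bit arithmetic.
    assert (a + neg) & MASK == 0
    print(f"{a:4d} -> {neg:08b}")

print(f"{-1 & MASK:08b}")  # 11111111: -1 is a string of all 1s
```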

Support/Figures/Pasted image 20250216170601.png
The rough idea is that for a symbol $x$ we assign roughly $I(x) = -\log_2(p(x))$ bits.
Support/Figures/Pasted image 20250216171003.png

Let $S$ be a set of symbols, with a probability $p: S \to [0,1]$ on each symbol.
The entropy of the set of symbols is $H(S) = -\sum_{s \in S} p(s)\log_2(p(s))$.
Let $g: S \to \mathbb{N}$ denote the number of bits each symbol is mapped to under a valid encoding.

The expected length of the encoding of a symbol under $g$ is $H_g(S) = \sum_{s \in S} p(s)\,g(s)$.
For a message of $N$ symbols, the lower bound on the expected encoding length is $N \cdot H(S)$, and the expected encoding length under a particular encoding $g$ is $N \cdot H_g(S)$.
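A small sketch comparing the entropy lower bound $H(S)$ with the expected length $H_g(S)$ of two concrete (made-up) codes:

```python
import math

def H(p: dict) -> float:
    """Entropy of the symbol set: -sum p(s) * log2(p(s))."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def H_g(p: dict, g: dict) -> float:
    """Expected codeword length under encoding g: sum p(s) * g(s)."""
    return sum(p[s] * g[s] for s in p)

p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
g_good  = {"a": 1, "b": 2, "c": 3, "d": 3}   # lengths matching -log2(p(s))
g_fixed = {"a": 2, "b": 2, "c": 2, "d": 2}   # 2-bit fixed-length code

print(H(p))             # 1.75  (lower bound, bits per symbol)
print(H_g(p, g_good))   # 1.75  (achieves the bound for this source)
print(H_g(p, g_fixed))  # 2.0   (valid, but wasteful for this source)
```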

Huffman's Algorithm:

Let $S = \{s_1, s_2, \ldots, s_n\}$ and $p: S \to [0,1]$ a probability on each symbol. We construct an encoding tree $T$ as follows:

Algorithm: Huffman Encoding
Input: a finite set of symbols $S$ and a probability map $p: S \to [0,1]$.
Output: an encoding tree $T$.
- While $|S| > 1$:
	- pick $a, b \in S$ such that $p(a), p(b)$ are the two lowest probabilities under $p$
	- $S \leftarrow S \setminus \{a, b\}$
	- make a new node $x$ with the subtree rooted at $a$ as $\text{left}(x)$ and the subtree rooted at $b$ as $\text{right}(x)$
	- $S \leftarrow S \cup \{x\}$, with $p(x) = p(a) + p(b)$
- The single remaining element of $S$ is the root of $T$.
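A compact runnable sketch of the procedure, using a heap to pull out the two lowest-probability nodes each round (a standard implementation choice, not necessarily how the figure's tree is drawn):

```python
import heapq
from itertools import count

def huffman_code(p: dict) -> dict:
    """Build a Huffman code for symbols with probability map p.
    Returns {symbol: bitstring}."""
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    # Each heap entry: (probability, tie, tree); a tree is a symbol or a (left, right) pair.
    heap = [(prob, next(tie), sym) for sym, prob in p.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        pa, _, a = heapq.heappop(heap)   # two lowest probabilities
        pb, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (pa + pb, next(tie), (a, b)))  # merge under a new node
    root = heap[0][2]

    code = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: recurse left (0) and right (1)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: record the codeword
            code[node] = prefix or "0"   # degenerate single-symbol case
    walk(root, "")
    return code

print(huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'} (up to tie-breaking)
```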

Support/Figures/Pasted image 20250216175002.png
Informally:

Now, if $T$ is a Huffman tree for a set of symbols $S$ and a probability map $p$, arrange $S$ in non-increasing order under $p$.