0 - Understanding Broadcasting

Notation: We will use parentheses to denote the shape of a tensor. For instance, a matrix $A$ with shape $m \times n$ will be written as $A$, $(m, n)$.

Suppose we have two tensors, $X$, $(x_{0}, x_{1}, \dots, x_{m - 1})$ and $Y$, $(y_{0}, y_{1}, \dots, y_{n - 1})$.

Now, consider performing an element-wise multiplication, $X * Y$. The rules for NumPy broadcasting are as follows:

First, align the shapes of the tensors by right-justifying them (assuming $n \leq m$; otherwise swap the roles of $X$ and $Y$).
$$ (x_{0}, x_{1}, \dots, x_{k}, \dots, x_{m-2}, x_{m-1}) $$
$$ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (y_{0}, \dots, y_{n-2}, y_{n-1}) $$
Here, each $y_{i}$ is aligned with $x_{k + i}$, where $k = m - n$. The leading dimensions $x_{0}, \dots, x_{k-1}$ have no counterpart in $Y$; there, $Y$ is treated as if it had size 1. Now, $X$ and $Y$ are eligible for broadcasting if, for each pair $(y_{i}, x_{k + i})$, at least one of the following holds:

$y_{i} = x_{k + i}$
$y_{i} = 1$ (in which case $y_{i}$ is expanded to $x_{k + i}$ )
$x_{k + i} = 1$ (in which case $x_{k + i}$ is expanded to $y_{i}$ )
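
The eligibility rules above can be sketched as a small shape-only check. This is a minimal illustration rather than NumPy's own implementation, and the function name `broadcast_shape` is made up for this note. Left-padding the shorter shape with 1s plays the role of right-justifying the two shapes.

```python
def broadcast_shape(x_shape, y_shape):
    """Return the broadcast result shape, or raise ValueError if incompatible.

    Hypothetical helper for illustration; it mirrors the three rules in the text.
    """
    ndim = max(len(x_shape), len(y_shape))
    # Right-justify: pad the shorter shape on the left with 1s.
    x_padded = (1,) * (ndim - len(x_shape)) + tuple(x_shape)
    y_padded = (1,) * (ndim - len(y_shape)) + tuple(y_shape)

    result = []
    for x_dim, y_dim in zip(x_padded, y_padded):
        if x_dim == y_dim or y_dim == 1:
            result.append(x_dim)      # equal sizes, or y is expanded to x
        elif x_dim == 1:
            result.append(y_dim)      # x is expanded to y
        else:
            raise ValueError(
                f"shapes {tuple(x_shape)} and {tuple(y_shape)} cannot be broadcast"
            )
    return tuple(result)
```

For example, `broadcast_shape((2, 3, 4), (3, 1))` aligns $(3, 1)$ with the last two dimensions of $(2, 3, 4)$, while `broadcast_shape((2, 3), (4,))` raises because $3 \neq 4$ and neither is 1.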

In the above discussion, we described how the shapes of the tensors are aligned for broadcasting. Now, let's discuss what happens to the actual tensor values: conceptually, the values are repeated (copied) along every expanded dimension until both tensors have the same shape.
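
As a small illustration of the value copying, the sketch below uses `np.broadcast_to` to make the expansion explicit. The arrays are arbitrary examples chosen for this note. In practice NumPy does not physically copy the data; `np.broadcast_to` returns a read-only view that behaves as if the values were repeated.

```python
import numpy as np

X = np.arange(6).reshape(2, 3)   # shape (2, 3)
Y = np.array([10, 20, 30])       # shape (3,), aligned as (1, 3)

# Make the expansion explicit: every row of Y_expanded repeats Y.
Y_expanded = np.broadcast_to(Y, X.shape)   # shape (2, 3), read-only view

# Implicit broadcasting gives the same result as using the expanded tensor.
Z = X * Y
assert np.array_equal(Z, X * Y_expanded)
```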

Support/Figures/Pasted image 20241019214738.png source: NumPy broadcast docs

Support/Figures/Pasted image 20241019214837.png

Element-Wise operations:

Consider the above tensors $X$, $(x_{0}, x_{1}, \dots, x_{m - 1})$ and $Y$, $(y_{0}, y_{1}, \dots, y_{n - 1})$, where $n \leq m$. Let $(\cdot)$ denote any element-wise operator. Let us write Python functions to calculate $Z = X \cdot Y$ and $J = \sum_{i} (Z^{2})_{i}$, that is, the sum of all elements of the element-wise square $Z^{2}$.
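
A minimal sketch of the forward computation, assuming NumPy arrays as inputs. The names `elementwise` and `forward` are chosen for this note only; NumPy's operators already apply broadcasting, so no explicit expansion is needed.

```python
import numpy as np

def elementwise(X, Y, op):
    """Compute Z = X (op) Y element-wise, relying on NumPy broadcasting."""
    if op == "+":
        return X + Y
    if op == "-":
        return X - Y
    if op == "*":
        return X * Y
    if op == "/":
        return X / Y
    raise ValueError(f"unsupported operator: {op!r}")

def forward(X, Y, op):
    """Return (Z, J), where J is the sum of all elements of Z squared."""
    Z = elementwise(X, Y, op)
    J = np.sum(Z ** 2)
    return Z, J
```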

Then, using the multivariable chain rule, we will evaluate $\frac{\partial J}{\partial Y}$ for the element-wise operations $+, *, /, -$. To make sure our implementations are correct, we will test our functions on 3-D tensors, where each dimension has size at most four, by comparing the result against a numerical approximation of $\frac{\partial J}{\partial Y}$.
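
One way this could look in code is sketched below; it is not a definitive implementation. Since $J$ is the sum of the squared elements of $Z$, we have $\frac{\partial J}{\partial Z} = 2Z$, and the local derivative $\frac{\partial Z}{\partial Y}$ is $1$, $-1$, $X$, and $-X / Y^{2}$ for $+$, $-$, $*$, and $/$ respectively. Because one element of $Y$ may be copied to many positions of $Z$, the chain rule sums the contributions over every expanded dimension. All function names here (`unbroadcast`, `analytic_grad`, `numerical_grad`) are hypothetical, and the numerical check assumes central differences with a step of `1e-6`.

```python
import numpy as np

def apply_op(X, Y, op):
    """Element-wise Z = X (op) Y with NumPy broadcasting."""
    if op == "+":
        return X + Y
    if op == "-":
        return X - Y
    if op == "*":
        return X * Y
    if op == "/":
        return X / Y
    raise ValueError(f"unsupported operator: {op!r}")

def loss(X, Y, op):
    """J = sum of all elements of Z squared."""
    return np.sum(apply_op(X, Y, op) ** 2)

def local_grad(X, Y, op):
    """Element-wise dZ/dY, before summing over broadcast dimensions."""
    if op == "+":
        return 1.0
    if op == "-":
        return -1.0
    if op == "*":
        return X
    if op == "/":
        return -X / Y ** 2
    raise ValueError(f"unsupported operator: {op!r}")

def unbroadcast(grad, shape):
    """Sum grad over the dimensions that broadcasting expanded, back to shape."""
    extra = grad.ndim - len(shape)
    if extra > 0:
        # Leading dimensions that Y did not have at all.
        grad = grad.sum(axis=tuple(range(extra)))
    # Dimensions where Y had size 1 but was expanded.
    axes = tuple(i for i, s in enumerate(shape) if s == 1 and grad.shape[i] != 1)
    if axes:
        grad = grad.sum(axis=axes, keepdims=True)
    return grad

def analytic_grad(X, Y, op):
    """dJ/dY via the chain rule: (dJ/dZ) * (dZ/dY), reduced to Y's shape."""
    Z = apply_op(X, Y, op)
    return unbroadcast(2.0 * Z * local_grad(X, Y, op), Y.shape)

def numerical_grad(X, Y, op, eps=1e-6):
    """Central-difference approximation of dJ/dY (Y must be a float array)."""
    grad = np.zeros_like(Y)
    for idx in np.ndindex(*Y.shape):
        Y_plus, Y_minus = Y.copy(), Y.copy()
        Y_plus[idx] += eps
        Y_minus[idx] -= eps
        grad[idx] = (loss(X, Y_plus, op) - loss(X, Y_minus, op)) / (2 * eps)
    return grad

# Check on 3-D tensors with every dimension at most four.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 2.0, size=(2, 3, 4))
Y = rng.uniform(1.0, 2.0, size=(1, 3, 1))   # kept away from 0 so "/" is safe
for op in "+-*/":
    assert np.allclose(analytic_grad(X, Y, op), numerical_grad(X, Y, op),
                       rtol=1e-5, atol=1e-6)
```

The values are drawn from $[1, 2)$ only to keep the division well conditioned; other ranges would work as long as $Y$ stays away from zero.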