0 - Understanding Broadcasting

Notation: We will use parentheses to denote the shape of a matrix. For instance, a matrix $A$ with shape $m \times n$ will be written as $A, (m, n)$.

Suppose we have two tensors, $X, (x_0, x_1, \dots, x_{m-1})$ and $Y, (y_0, y_1, \dots, y_{n-1})$.

Now, consider performing an element-wise multiplication, $X \odot Y$. The rules for NumPy broadcasting are as follows:

First, align the shapes of the tensors by right-justifying them (assuming $n < m$).

$$
\begin{array}{ccccccc}
(x_{0}, & x_{1}, & \dots, & x_{k}, & \dots, & x_{m-2}, & x_{m-1}) \\
 & & & (y_{0}, & \dots, & y_{n-2}, & y_{n-1})
\end{array}
$$

Here, each $y_i$ is aligned with $x_{k+i}$, where $k = m - n$. The leading dimensions $x_0, \dots, x_{k-1}$ have no counterpart in $Y$, so $Y$ is treated as having size 1 along them. Now, $X$ and $Y$ are eligible for broadcasting if, for each pair $(y_i, x_{k+i})$, at least one of the following holds:

  1. $y_i = x_{k+i}$
  2. $y_i = 1$ (in which case $y_i$ is expanded to $x_{k+i}$)
  3. $x_{k+i} = 1$ (in which case $x_{k+i}$ is expanded to $y_i$)
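The three rules above can be sketched as a small shape checker. This is a minimal illustration, not NumPy's internal implementation; the function name `broadcast_shape` is hypothetical. The result is compared against NumPy's own `np.broadcast_shapes`, which assumes NumPy 1.20 or newer.

```python
import numpy as np


def broadcast_shape(x_shape, y_shape):
    """Return the broadcast shape of two shapes, or raise ValueError.

    Hypothetical helper that mirrors the three rules above.
    """
    # Right-justify: pad the shorter shape with leading 1s.
    ndim = max(len(x_shape), len(y_shape))
    x_pad = (1,) * (ndim - len(x_shape)) + tuple(x_shape)
    y_pad = (1,) * (ndim - len(y_shape)) + tuple(y_shape)

    out = []
    for xd, yd in zip(x_pad, y_pad):
        if xd == yd or yd == 1:   # rule 1, or rule 2 (y is expanded)
            out.append(xd)
        elif xd == 1:             # rule 3 (x is expanded)
            out.append(yd)
        else:
            raise ValueError(
                f"shapes {x_shape} and {y_shape} are not broadcastable"
            )
    return tuple(out)


shape = broadcast_shape((4, 3, 2), (3, 1))
print(shape)
# Cross-check against NumPy's own rule.
print(shape == np.broadcast_shapes((4, 3, 2), (3, 1)))
```

Padding with leading 1s is just a convenient way to express right-justification: a missing dimension behaves exactly like a dimension of size 1.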

In the above discussion, we described how the shapes of the tensors are aligned for broadcasting. Now, let's discuss what happens to the actual tensor values: the values are copied along the expandable dimensions.
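To see the value expansion concretely, here is a small sketch using `np.broadcast_to`. The arrays `X` and `Y` are made-up examples. Note that NumPy does not physically copy the data: it returns a read-only view whose stride along the expanded axis is zero, which behaves as if the values had been copied.

```python
import numpy as np

X = np.arange(6).reshape(2, 3)   # shape (2, 3)
Y = np.array([10, 20, 30])       # shape (3,)

# Expand Y to X's shape: the single row of Y is repeated along axis 0.
Y_expanded = np.broadcast_to(Y, X.shape)
print(Y_expanded)

# The implicit broadcast in X * Y matches the explicit expansion.
Z = X * Y
print(np.array_equal(Z, X * Y_expanded))

# A stride of 0 along axis 0 means every row reads the same memory.
print(Y_expanded.strides[0])
```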

Support/Figures/Pasted image 20241019214738.png source: NumPy broadcast docs

Support/Figures/Pasted image 20241019214837.png

Element-Wise operations:

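Consider the tensors $X, (x_0, x_1, \dots, x_{m-1})$ and $Y, (y_0, y_1, \dots, y_{n-1})$ from above, where $n \le m$. Let $\odot$ denote any element-wise operator. Let us write Python functions to calculate $Z = X \odot Y$ and $J = \operatorname{sum}(Z^2)$, the sum of all elements of the element-wise square of $Z$.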
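One possible sketch of the forward computation is shown here. The names `OPS` and `forward` are hypothetical, and the operator is selected by a string key purely for convenience. Broadcasting between `X` and `Y` is left to NumPy.

```python
import operator

import numpy as np

# Hypothetical lookup table for the four element-wise operators.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def forward(X, Y, op):
    """Return Z = X op Y (with broadcasting) and J = sum of Z squared."""
    Z = OPS[op](X, Y)
    J = np.sum(Z ** 2)
    return Z, J


X = np.ones((2, 3))
Y = np.array([1.0, 2.0, 3.0])   # broadcast along axis 0
Z, J = forward(X, Y, "*")
print(Z.shape, J)
```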

Then, using the multivariable chain rule, we will evaluate $\frac{\partial J}{\partial Y}$ for the element-wise operations $+, -, /, \times$. To make sure our implementations are correct, we will test our functions with 3D tensors, where each dimension is at most four, by comparing the result against a numerical approximation of $\frac{\partial J}{\partial Y}$.
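A minimal sketch of this gradient check follows; it is an illustration under stated assumptions rather than a definitive implementation. The names `OPS`, `unbroadcast`, `grad_Y`, and `numerical_grad_Y` are hypothetical. The key step is that every element of $Y$ that was expanded during broadcasting contributes to several elements of $Z$, so by the chain rule its gradient is the sum over the expanded axes. The numerical side uses central differences with an assumed step of `1e-6`, and `Y` is sampled away from zero so that division stays well conditioned.

```python
import operator

import numpy as np

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def unbroadcast(grad, shape):
    """Sum grad over the axes that broadcasting expanded, back to `shape`."""
    extra = grad.ndim - len(shape)
    if extra > 0:                       # axes that Y did not have at all
        grad = grad.sum(axis=tuple(range(extra)))
    axes = tuple(i for i, s in enumerate(shape)
                 if s == 1 and grad.shape[i] != 1)
    if axes:                            # axes where Y had size 1
        grad = grad.sum(axis=axes, keepdims=True)
    return grad


def grad_Y(X, Y, op):
    """Analytic dJ/dY for J = sum((X op Y) ** 2)."""
    Z = OPS[op](X, Y)
    dJ_dZ = 2.0 * Z
    # Local derivative dZ/dY for each operator.
    local = {"+": 1.0, "-": -1.0, "*": X, "/": -X / Y ** 2}[op]
    return unbroadcast(dJ_dZ * local, Y.shape)


def numerical_grad_Y(X, Y, op, eps=1e-6):
    """Central-difference approximation of dJ/dY."""
    Y = np.asarray(Y, dtype=float)
    grad = np.zeros_like(Y)
    for idx in np.ndindex(Y.shape):
        Yp, Ym = Y.copy(), Y.copy()
        Yp[idx] += eps
        Ym[idx] -= eps
        Jp = np.sum(OPS[op](X, Yp) ** 2)
        Jm = np.sum(OPS[op](X, Ym) ** 2)
        grad[idx] = (Jp - Jm) / (2.0 * eps)
    return grad


rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3, 2))
for y_shape in [(4, 3, 2), (1, 3, 1), (3, 1), (2,)]:
    Y = rng.uniform(1.0, 2.0, size=y_shape)   # kept away from zero
    for op in OPS:
        err = np.max(np.abs(grad_Y(X, Y, op) - numerical_grad_Y(X, Y, op)))
        print(y_shape, op, err)
```

Summing with `keepdims=True` on the size-1 axes keeps the gradient in exactly the shape of `Y`, which is what a gradient with respect to `Y` must have.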