Notation: We will use parentheses to denote the shape of a matrix (or tensor). For instance, a matrix with $m$ rows and $n$ columns, i.e. of shape $m \times n$, will be written as $(m, n)$.
Suppose we have two tensors, $X$ with shape $(x_{0}, x_{1}, \dots, x_{m-1})$, and $Y$ with shape $(y_{0}, y_{1}, \dots, y_{n-1})$.
Now, consider performing an element-wise multiplication, $Z = X \odot Y$ (written `X * Y` in NumPy). NumPy's broadcasting rules decide whether the two shapes are compatible:
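First, align the shapes of the tensors by right-justifying them (assuming $m \ge n$):
$$
\begin{array}{rrrrrrr}
(x_{0}, & \dots, & x_{k-1}, & x_{k}, & \dots, & x_{m-2}, & x_{m-1}) \\
 & & & (y_{0}, & \dots, & y_{n-2}, & y_{n-1})
\end{array}
$$
Here, each $y_{j}$ is aligned with $x_{k+j}$, where $k = m - n$ and $0 \le j < n$. Now, $X$ and $Y$ are eligible for broadcasting if, for each pair $(x_{k+j}, y_{j})$, one of the following holds:
- $x_{k+j} = y_{j}$ (no expansion is needed)
- $x_{k+j} = 1$ (in which case $x_{k+j}$ is expanded to $y_{j}$)
- $y_{j} = 1$ (in which case $y_{j}$ is expanded to $x_{k+j}$)

The leading dimensions $x_{0}, \dots, x_{k-1}$ have no partner in $Y$; NumPy treats the shape of $Y$ as if it were padded on the left with ones, so $Y$ is expanded along those dimensions as well.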
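The compatibility rule above fits in a short function. This is a sketch: `broadcast_shape` is a hypothetical helper name, and its result is checked against NumPy's own `np.broadcast_shapes` (available in NumPy 1.20 and later). Padding the shorter shape with ones on the left means it also handles $m < n$.

```python
import numpy as np

def broadcast_shape(x_shape, y_shape):
    """Return the broadcast shape of x_shape and y_shape, or raise ValueError.

    Hypothetical helper: right-aligns the shapes by left-padding the shorter
    one with 1s, then applies the pairwise rule to each aligned pair.
    """
    ndim = max(len(x_shape), len(y_shape))
    xs = (1,) * (ndim - len(x_shape)) + tuple(x_shape)
    ys = (1,) * (ndim - len(y_shape)) + tuple(y_shape)
    out = []
    for a, b in zip(xs, ys):
        if a == b or b == 1:   # equal sizes, or y's dimension is expanded
            out.append(a)
        elif a == 1:           # x's dimension is expanded
            out.append(b)
        else:
            raise ValueError(f"incompatible dimensions {a} and {b}")
    return tuple(out)

# Compare against NumPy's implementation on a few shape pairs.
for xs, ys in [((8, 1, 6, 1), (7, 1, 5)), ((5, 4), (1,)), ((3, 4), (4,))]:
    assert broadcast_shape(xs, ys) == np.broadcast_shapes(xs, ys)

print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))  # (8, 7, 6, 5)
```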
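The discussion above describes how the shapes of the tensors are aligned for broadcasting. As for the actual tensor values, each tensor's values are copied along every dimension in which it is expanded, including any leading dimensions that the shorter tensor lacks. This copying is conceptual: NumPy carries out the operation without allocating the repeated values in memory.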
Source: NumPy broadcasting documentation
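A small illustration of the value copying, using `np.broadcast_to` to build the expanded view explicitly. The arrays are made-up examples, and the dtype is fixed to `int64` so the printed strides are predictable across platforms.

```python
import numpy as np

y = np.array([10, 20, 30], dtype=np.int64)   # shape (3,)
y_b = np.broadcast_to(y, (2, 3))             # read-only view, shape (2, 3)

# Conceptually, y is copied along the new leading axis...
assert np.array_equal(y_b, np.tile(y, (2, 1)))
# ...but no copy is made: the view shares y's memory and the new axis has stride 0.
assert np.shares_memory(y_b, y)
print(y_b.strides)  # (0, 8)

# Broadcasting both operands: x (2, 1) is copied along axis 1, y (3,) along axis 0.
x = np.array([[1], [2]], dtype=np.int64)
z = x * y                                    # shape (2, 3)
assert np.array_equal(z, np.broadcast_to(x, (2, 3)) * np.broadcast_to(y, (2, 3)))
print(z.tolist())  # [[10, 20, 30], [20, 40, 60]]
```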
Element-wise operations:
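Consider the tensors from above, $X$ with shape $(x_{0}, \dots, x_{m-1})$ and $Y$ with shape $(y_{0}, \dots, y_{n-1})$, where $m \ge n$. Let $\circ$ denote any element-wise operator, and let $Z = X \circ Y$ be the broadcast result. Given a scalar loss $L$ and the upstream gradient $\frac{\partial L}{\partial Z}$, let us write Python functions to calculate $\frac{\partial L}{\partial X}$ and $\frac{\partial L}{\partial Y}$.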
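A minimal sketch of such functions, assuming $\frac{\partial L}{\partial Z}$ is supplied as an array with $Z$'s shape. The names `sum_to_shape` and `elementwise_backward` are hypothetical, not part of NumPy or necessarily the approach developed later in this article. The idea: every entry of $X$ is copied into each position it was broadcast to, so the chain rule adds up the gradient from all of those copies, which amounts to summing over the broadcast axes.

```python
import numpy as np

def sum_to_shape(grad, shape):
    """Hypothetical helper: sum `grad` (Z's shape) down to `shape` (X's or Y's)."""
    shape = tuple(shape)
    # Leading axes the operand did not have at all: sum them away.
    extra = grad.ndim - len(shape)
    if extra > 0:
        grad = grad.sum(axis=tuple(range(extra)))
    # Axes where the operand had size 1 but Z did not: sum, keeping size 1.
    axes = tuple(i for i, d in enumerate(shape) if d == 1 and grad.shape[i] != 1)
    if axes:
        grad = grad.sum(axis=axes, keepdims=True)
    return grad

def elementwise_backward(x, y, grad_z, dfdx, dfdy):
    """Hypothetical helper: return (dL/dx, dL/dy) for z = f(x, y) with broadcasting.

    dfdx(x, y) and dfdy(x, y) give the element-wise partial derivatives of f;
    grad_z is dL/dz, an array with z's (broadcast) shape.
    """
    # Chain rule on the broadcast shape, then undo broadcasting by summing.
    grad_x = sum_to_shape(grad_z * dfdx(x, y), x.shape)
    grad_y = sum_to_shape(grad_z * dfdy(x, y), y.shape)
    return grad_x, grad_y

# Example: multiplication, where d(x*y)/dx = y and d(x*y)/dy = x.
x = np.arange(6.0).reshape(2, 1, 3)
y = np.arange(4.0).reshape(4, 1)
grad_z = np.ones((2, 4, 3))  # pretend dL/dz is all ones
gx, gy = elementwise_backward(x, y, grad_z, lambda x, y: y, lambda x, y: x)
print(gx.shape, gy.shape)  # (2, 1, 3) (4, 1)
```

For addition, both partial derivatives are ones, so one could pass `lambda x, y: np.ones_like(x)` and `lambda x, y: np.ones_like(y)`; the summation step is unchanged.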
Then, using the multivariable chain rule, we will derive these gradients for specific element-wise operations $\circ$. To make sure our implementations are correct, we will test our functions on 3-D tensors whose dimensions are each at most 4, by comparing them against numerical (finite-difference) approximations of $\frac{\partial L}{\partial X}$ and $\frac{\partial L}{\partial Y}$.
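A sketch of such a numerical test for multiplication, under these assumptions: a hypothetical scalar loss $L = \sum w \odot (X \odot Y)$ with random weights $w$ (so $\frac{\partial L}{\partial Z} = w$), central finite differences with $\varepsilon = 10^{-6}$, and random broadcast-compatible shapes where $X$ is 3-D and $Y$ has 1 to 3 dimensions, each at most 4. Letting $Y$ be shorter is an added assumption so that the leading-axis summation also gets exercised. The helper `sum_to_shape` is repeated so the block runs on its own; all helper names are hypothetical.

```python
import numpy as np

def sum_to_shape(grad, shape):
    """Sum `grad` (Z's shape) down to `shape` by summing over broadcast axes."""
    shape = tuple(shape)
    extra = grad.ndim - len(shape)
    if extra > 0:
        grad = grad.sum(axis=tuple(range(extra)))
    axes = tuple(i for i, d in enumerate(shape) if d == 1 and grad.shape[i] != 1)
    if axes:
        grad = grad.sum(axis=axes, keepdims=True)
    return grad

def numerical_grad(f, a, eps=1e-6):
    """Central finite-difference estimate of df/da for a scalar-valued f."""
    a = np.array(a, dtype=np.float64)  # float copy we can perturb in place
    grad = np.zeros_like(a)
    for idx in np.ndindex(a.shape):
        orig = a[idx]
        a[idx] = orig + eps
        f_plus = f(a)
        a[idx] = orig - eps
        f_minus = f(a)
        a[idx] = orig  # restore before moving to the next entry
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

def random_shapes(rng, max_dim=4):
    """Random broadcast-compatible shapes: x is 3-D, y has 1 to 3 dims."""
    z = [int(d) for d in rng.integers(1, max_dim + 1, size=3)]
    x_shape = tuple(d if rng.random() < 0.5 else 1 for d in z)
    y_full = tuple(d if rng.random() < 0.5 else 1 for d in z)
    y_shape = y_full[3 - int(rng.integers(1, 4)):]  # keep the last 1 to 3 dims
    return x_shape, y_shape

rng = np.random.default_rng(0)
for _ in range(50):
    x_shape, y_shape = random_shapes(rng)
    x = rng.standard_normal(x_shape)
    y = rng.standard_normal(y_shape)
    # Scalar loss L = sum(w * (x * y)), so dL/dz = w.
    w = rng.standard_normal(np.broadcast_shapes(x_shape, y_shape))
    grad_x = sum_to_shape(w * y, x.shape)  # analytic dL/dx
    grad_y = sum_to_shape(w * x, y.shape)  # analytic dL/dy
    num_x = numerical_grad(lambda a: np.sum(w * (a * y)), x)
    num_y = numerical_grad(lambda b: np.sum(w * (x * b)), y)
    assert grad_x.shape == x.shape and grad_y.shape == y.shape
    assert np.allclose(grad_x, num_x, atol=1e-6)
    assert np.allclose(grad_y, num_y, atol=1e-6)
print("all gradient checks passed")
```

To test another operation, only the two analytic gradient lines and the two lambdas need to change.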