# NumPy-style broadcasting for R TensorFlow users

We develop, train, and deploy TensorFlow models from R. But that doesn’t mean we don’t make use of documentation, blog posts, and examples written in Python. We look up specific functionality in the official TensorFlow API docs; we get inspiration from other people’s code.

Depending on how comfortable you are with Python, there’s a problem. For example: You’re supposed to know how broadcasting works. And perhaps you’d say you’re vaguely familiar with it: So when arrays have different shapes, some elements get duplicated until their shapes match and … and isn’t R vectorized anyway?

While such a global notion may work in general, like when skimming a blog post, it’s not enough to understand, say, examples in the TensorFlow API docs. In this post, we’ll try to arrive at a more exact understanding, and check it on concrete examples.

Speaking of examples, here are two motivating ones.

## Broadcasting in action

The first uses TensorFlow’s `matmul` to multiply two tensors. Would you like to guess the result – not the numbers, but how it comes about in general? Does this even run without error – shouldn’t matrices be two-dimensional (rank-2 tensors, in TensorFlow speak)?

```r
library(tensorflow)

a <- tf$constant(keras::array_reshape(1:12, dim = c(2, 2, 3)))
a
# tf.Tensor(
# [[[ 1.  2.  3.]
#   [ 4.  5.  6.]]
#
#  [[ 7.  8.  9.]
#   [10. 11. 12.]]], shape=(2, 2, 3), dtype=float64)

b <- tf$constant(keras::array_reshape(101:106, dim = c(1, 3, 2)))
b
# tf.Tensor(
# [[[101. 102.]
#   [103. 104.]
#   [105. 106.]]], shape=(1, 3, 2), dtype=float64)

c <- tf$matmul(a, b)
```

Second, here’s a “real example” from a TensorFlow Probability (TFP) GitHub issue (translated to R, but keeping the semantics). In TFP, we can have batches of distributions. That, per se, is no surprise. But look at this:

```r
library(tfprobability)
d <- tfd_normal(loc = c(0, 1), scale = matrix(1.5:4.5, ncol = 2, byrow = TRUE))
d
# tfp.distributions.Normal("Normal", batch_shape=[2, 2], event_shape=[], dtype=float64)
```

We create a batch of four normal distributions, each with a different scale (1.5, 2.5, 3.5, 4.5). But wait: there are only two location parameters given. So which location goes with which scale? Luckily, TFP developers Brian Patton and Chris Suter explained how it works: TFP actually does broadcasting – with distributions – just like with tensors!

We’ll get back to both examples at the end of this post. Our main focus will be to explain broadcasting as done in NumPy, since NumPy-style broadcasting is what numerous other frameworks (e.g., TensorFlow) have adopted.

Before that, though, let’s quickly review a few basics about NumPy arrays: how to index or slice them (indexing usually referring to single-element extraction, while slicing yields – well – slices containing several elements); how to parse their shapes; some terminology and related background. Though not complicated per se, these are the kinds of things that can be confusing to infrequent Python users; yet they’re often a prerequisite to making successful use of Python documentation.

Stated upfront: we’ll really restrict ourselves to the basics here; for example, we won’t touch advanced indexing, which – like lots more – can be looked up in detail in the NumPy documentation.

## A few facts about NumPy

### Basic slicing

For simplicity, we’ll use the terms indexing and slicing more or less synonymously from now on. The basic device here is a slice, namely, a `start:stop` structure indicating, for a single dimension, which range of elements to include in the selection.

In contrast to R, Python indexing is zero-based, and the end index is exclusive:

```python
import numpy as np
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

x[1:7]
# array([1, 2, 3, 4, 5, 6])
```
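Because R indexing is one-based and inclusive, translating between the two is an easy place for off-by-one errors. As a rule of thumb, R’s `x[i:j]` corresponds to Python’s `x[i-1:j]`. The helper `r_range` below is hypothetical (not part of NumPy) and only illustrates that rule:

```python
import numpy as np

def r_range(x, i, j):
    """Select elements i..j of x using R's conventions (one-based,
    both ends included) by translating to a Python slice.
    Hypothetical helper, for illustration only."""
    return x[i - 1:j]

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# R's x[2:7] picks the 2nd through 7th elements ...
picked = r_range(x, 2, 7)
print(picked)  # [1 2 3 4 5 6]

# ... which is exactly the Python slice x[1:7]
print(np.array_equal(picked, x[1:7]))  # True
```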

Leaving out `stop` slices through to the end; leaving out `start` slices from the beginning:

```python
x[5:]
# array([5, 6, 7, 8, 9])

x[:7]
# array([0, 1, 2, 3, 4, 5, 6])
```

A lone colon selects everything:

```python
x[:]
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
```

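Multi-dimensional arrays are sliced with one index or slice per axis, separated by commas. Here, `x[1, :]` selects the second row (index `1`) with all its columns:

```python
x = np.array([[1, 2], [3, 4], [5, 6]])
x
# array([[1, 2],
#        [3, 4],
#        [5, 6]])

x[1, :]
# array([3, 4])
```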

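Trailing axes may be left out, in which case they are selected in full; a trailing comma doesn’t change anything either:

```python
x[1]
# array([3, 4])

x[1, ]
# array([3, 4])
```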

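The same holds in higher dimensions. Take an array of shape `(2, 3, 1)`:

```python
x = np.array([[[1], [2], [3]], [[4], [5], [6]]])
x
# array([[[1],
#         [2],
#         [3]],
#
#        [[4],
#         [5],
#         [6]]])

x.shape
# (2, 3, 1)
```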

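Here, `x[0,]` picks the first entry along the first axis, leaving a `(3, 1)` array. The ellipsis (`...`) says the same thing explicitly: it stands for as many full slices (`:`) as needed:

```python
x[0,]
# array([[1],
#        [2],
#        [3]])

x[0, ...]
# array([[1],
#        [2],
#        [3]])
```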
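The ellipsis is not limited to trailing positions: placed first, it fills in full slices for the leading axes instead. A minimal sketch, using an array of shape `(2, 3, 1)` filled with 1 through 6, that selects along the *last* axis:

```python
import numpy as np

x = np.array([[[1], [2], [3]], [[4], [5], [6]]])  # shape (2, 3, 1)

# `...` stands for "as many `:` as needed", so x[..., 0] == x[:, :, 0]
last = x[..., 0]
print(last.shape)                         # (2, 3)
print(np.array_equal(last, x[:, :, 0]))   # True

# in trailing position: x[0, ...] == x[0, :, :] == x[0]
print(np.array_equal(x[0, ...], x[0]))    # True
```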

### Syntax for array creation

Arrays of a given shape are often created by reshaping a flat one:

```python
np.zeros(24).reshape(4, 3, 2)
```

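When writing out arrays as nested lists, the nesting determines the shape: the outermost brackets correspond to the first axis, the innermost ones to the last. Compare these three arrays, each holding three zeros:

```python
c1 = np.array([[[0, 0, 0]]])
c2 = np.array([[[0], [0], [0]]])
c3 = np.array([[[0]], [[0]], [[0]]])

c1.shape  # (1, 1, 3)
c2.shape  # (1, 3, 1)
c3.shape  # (3, 1, 1)
```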
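One way to read such literals: count the opening brackets to get the number of axes, then read off each axis’s length from the outside in. The sketch below checks this via `ndim`, and shows that `reshape` can move between the three shapes, since all of them hold the same three elements:

```python
import numpy as np

c1 = np.array([[[0, 0, 0]]])
c2 = np.array([[[0], [0], [0]]])
c3 = np.array([[[0]], [[0]], [[0]]])

for arr in (c1, c2, c3):
    # three opening brackets -> three axes; three elements in total
    print(arr.ndim, arr.shape, arr.size)

# same elements, different arrangement: reshape converts between them
print(c1.reshape(3, 1, 1).shape)                 # (3, 1, 1)
print(np.array_equal(c1.reshape(1, 3, 1), c2))   # True

# np.zeros also accepts the target shape directly
print(np.zeros(24).reshape(4, 3, 2).shape == np.zeros((4, 3, 2)).shape)  # True
```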

### A bit of terminology

In NumPy, dimensions are called *axes*. Consider this two-dimensional array:

```python
a = np.array([[1, 2, 3], [4, 5, 6]])
a
# array([[1, 2, 3],
#        [4, 5, 6]])
```

By default, NumPy stores it in row-major (“C”) order, so its elements lie in memory like this:

```
1 2 3 4 5 6
```

R, like Fortran, uses column-major order instead; the same matrix would be laid out as:

```
1 4 2 5 3 6
```
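NumPy can show both readings: `ravel()` flattens an array, by default in row-major order (`order="C"`), while `order="F"` gives the column-major, R-style reading. A short sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# row-major ("C") reading: rows one after the other
print(a.ravel(order="C"))  # [1 2 3 4 5 6]

# column-major ("F", for Fortran) reading, as R would store it
print(a.ravel(order="F"))  # [1 4 2 5 3 6]

# reshape honors the order too: this mimics R's matrix(1:6, nrow = 2)
print(np.arange(1, 7).reshape((2, 3), order="F"))
# [[1 3 5]
#  [2 4 6]]
```

This is likely also why the R examples in this post build arrays with `keras::array_reshape()`, which (per its documentation) reshapes using row-major semantics by default, rather than by setting `dim()`.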

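Closely related are *strides*: for every axis, the number of bytes to skip in memory to get to the next element along that axis. With 8-byte (`int64`) integers, as is typical on 64-bit platforms, the three arrays from above have these strides:

```python
c1 = np.array([[[0, 0, 0]]])
c1.shape    # (1, 1, 3)
c1.strides  # (24, 24, 8)

c2 = np.array([[[0], [0], [0]]])
c2.shape    # (1, 3, 1)
c2.strides  # (24, 8, 8)

c3 = np.array([[[0]], [[0]], [[0]]])
c3.shape    # (3, 1, 1)
c3.strides  # (8, 8, 8)
```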
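For a row-major array, strides follow directly from the shape and the element size: the last axis steps over one element (`itemsize` bytes), and each earlier axis steps over the whole block of elements behind it. A minimal sketch of that computation; `c_strides` is a hypothetical helper, not a NumPy function, and the arrays are pinned to `int64` so the numbers don’t depend on the platform’s default integer type:

```python
import numpy as np

def c_strides(shape, itemsize):
    """Compute row-major (C-order) strides in bytes for a given shape.
    Hypothetical helper, for illustration; NumPy computes these itself."""
    strides = []
    step = itemsize
    for n in reversed(shape):
        strides.append(step)
        step *= n
    return tuple(reversed(strides))

for shape in [(1, 1, 3), (1, 3, 1), (3, 1, 1), (2, 3)]:
    arr = np.zeros(shape, dtype=np.int64)
    # our computation next to what NumPy reports
    print(shape, c_strides(shape, arr.itemsize), arr.strides)
```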

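## Broadcasting in NumPy

The simplest case of broadcasting is combining an array with a scalar: the scalar behaves as if it were stretched to the array’s shape.

```python
a = np.array([1, 2, 3])
b = 1
a + b
# array([2, 3, 4])
```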

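Arrays of different rank can be combined too: here, the shape-`(3,)` array `a` is added to every row of the `(2, 3)` array `b`:

```python
a = np.array([1, 2, 3])
b = np.array([[1, 2, 3], [4, 5, 6]])
a + b
# array([[2, 4, 6],
#        [5, 7, 9]])
```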

But not every combination of shapes works. With `a` of length 2, we get an error:

```python
a = np.array([1, 2])
b = np.array([[1, 2, 3], [4, 5, 6]])
a + b
# ValueError: operands could not be broadcast together with shapes (2,) (2,3)
```

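The rule: NumPy compares shapes element-wise, starting with the trailing (rightmost) dimensions and working its way left. Two dimensions are compatible when they are equal, or when one of them is 1; if one array has fewer dimensions, the missing leading ones count as 1. Each dimension of the result is the larger of the two. In the failed example, `(2,)` and `(2, 3)` line up `2` against `3`, which is not allowed. Here, on the other hand, all dimensions are compatible:

```
# array 1, shape: 8  1  6  1
# array 2, shape:    7  1  5
# result,  shape: 8  7  6  5
```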
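The rule is short enough to write down directly. Below is a minimal sketch of it; `broadcast_shape` is a hypothetical name (recent NumPy versions ship `np.broadcast_shapes` for the same job). It walks both shapes from the right, treating missing leading dimensions as 1:

```python
from itertools import zip_longest

def broadcast_shape(s1, s2):
    """Return the broadcast result of shapes s1 and s2, or raise ValueError.
    A sketch of NumPy's rule, for illustration only."""
    result = []
    # walk both shapes from the right; a missing leading axis counts as 1
    for d1, d2 in zip_longest(reversed(s1), reversed(s2), fillvalue=1):
        if d1 == d2 or d2 == 1:
            result.append(d1)
        elif d1 == 1:
            result.append(d2)
        else:
            raise ValueError(f"incompatible shapes {s1} and {s2}")
    return tuple(reversed(result))

print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))  # (8, 7, 6, 5)
print(broadcast_shape((2, 3), (3,)))             # (2, 3)
try:
    broadcast_shape((2,), (2, 3))
except ValueError as e:
    print(e)  # incompatible shapes (2,) and (2, 3)
```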

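Applied to a `(2, 3)` array: a `(2,)` array doesn’t broadcast (trailing sizes 3 and 2 differ), while a `(3,)` array does:

```python
a = np.zeros([2, 3])  # shape (2, 3)
b = np.zeros([2])     # shape (2,)
c = np.zeros([3])     # shape (3,)

a + b  # ValueError: operands could not be broadcast together with shapes (2,3) (2,)

a + c
# array([[0., 0., 0.],
#        [0., 0., 0.]])
```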
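“Duplicating” is a helpful mental model, but NumPy doesn’t actually copy anything. `np.broadcast_to` makes the stretched version visible: it returns a read-only view whose stretched axis has a stride of 0, so every “row” reads the same memory. A small sketch with 8-byte floats (non-zero values, to make the repetition visible):

```python
import numpy as np

c = np.array([1.0, 2.0, 3.0])       # shape (3,)
stretched = np.broadcast_to(c, (2, 3))

print(stretched)
# [[1. 2. 3.]
#  [1. 2. 3.]]

# the new leading axis has stride 0: moving along it stays in place
print(c.strides, stretched.strides)  # (8,) (0, 8)

# the view shares memory with c; no data was duplicated
print(np.shares_memory(c, stretched))  # True
```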

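To combine a length-2 vector with the `(2, 3)` array column-wise instead, the vector needs shape `(2, 1)`. There are several ways to get there:

```python
# start with a 1-d array of shape (2,): neither a row nor a column
c = np.array([0, 0])
c.shape
# (2,)

# way 1: reshape
c.reshape(2, 1).shape
# (2, 1)

# way 2: np.newaxis inserts a new axis
c[:, np.newaxis].shape
# (2, 1)

# way 3: None does the same
c[:, None].shape
# (2, 1)

# way 4: construct directly as (2, 1), paying attention to the brackets
c = np.array([[0], [0]])
c.shape
# (2, 1)
```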
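Where the new axis goes matters: inserting it in front yields a row-like `(1, 2)` instead. `np.expand_dims` is yet another way to insert an axis at a given position. A quick sketch:

```python
import numpy as np

c = np.array([0, 0])  # shape (2,)

print(c[:, np.newaxis].shape)            # (2, 1): column-like
print(c[np.newaxis, :].shape)            # (1, 2): row-like
print(np.expand_dims(c, axis=1).shape)   # (2, 1)
print(np.expand_dims(c, axis=0).shape)   # (1, 2)

# a (2, 1) and a (1, 2) broadcast against each other to (2, 2)
print((c[:, np.newaxis] + c[np.newaxis, :]).shape)  # (2, 2)
```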

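Now `c` broadcasts against a `(2, 3)` array, but not against a `(3, 2)` one: aligned on the right, `(3, 2)` and `(2, 1)` clash in their first dimension, 3 versus 2.

```python
c = np.array([[0], [0]])
c.shape
# (2, 1)

a = np.zeros([2, 3])
a.shape
# (2, 3)
a + c
# array([[0., 0., 0.],
#        [0., 0., 0.]])

a = np.zeros([3, 2])
a.shape
# (3, 2)
a + c
# ValueError: operands could not be broadcast together with shapes (3,2) (2,1)
```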

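Inserting an axis like this is also how to compute an outer product, multiplying every element of `a` with every element of `b`:

```python
a = np.array([0.0, 10.0, 20.0, 30.0])
a.shape
# (4,)

b = np.array([1.0, 2.0, 3.0])
b.shape
# (3,)

a[:, np.newaxis] * b
# array([[ 0.,  0.,  0.],
#        [10., 20., 30.],
#        [20., 40., 60.],
#        [30., 60., 90.]])
```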
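For this particular pattern, NumPy also has a dedicated function, `np.outer`. Comparing the two is a quick sanity check that inserting an axis and broadcasting really computes the outer product:

```python
import numpy as np

a = np.array([0.0, 10.0, 20.0, 30.0])  # shape (4,)
b = np.array([1.0, 2.0, 3.0])          # shape (3,)

# (4, 1) * (3,) -> (4, 1) * (1, 3) -> (4, 3)
via_broadcasting = a[:, np.newaxis] * b
via_outer = np.outer(a, b)

print(via_broadcasting.shape)                        # (4, 3)
print(np.array_equal(via_broadcasting, via_outer))   # True

# without the new axis, shapes (4,) and (3,) are incompatible
try:
    a * b
except ValueError as e:
    print("ValueError:", e)
```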

## TensorFlow

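TensorFlow follows the same broadcasting rules as NumPy. Two quick examples. First, adding tensors of shapes `(4, 1)` and `(4,)` results in a `(4, 4)` tensor:

```r
a <- tf$ones(shape = c(4L, 1L))
a
# tf.Tensor(
# [[1.]
#  [1.]
#  [1.]
#  [1.]], shape=(4, 1), dtype=float32)

b <- tf$constant(c(1, 2, 3, 4))
b
# tf.Tensor([1. 2. 3. 4.], shape=(4,), dtype=float32)

a + b
# tf.Tensor(
# [[2. 3. 4. 5.]
#  [2. 3. 4. 5.]
#  [2. 3. 4. 5.]
#  [2. 3. 4. 5.]], shape=(4, 4), dtype=float32)
```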
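Since the rules are the same, the NumPy version of this computation gives the same result: aligned on the right, `(4, 1)` and `(4,)` become `(4, 1)` and `(1, 4)`, and both are stretched to `(4, 4)`. A NumPy sketch for comparison:

```python
import numpy as np

a = np.ones((4, 1))                  # shape (4, 1): a column
b = np.array([1.0, 2.0, 3.0, 4.0])   # shape (4,): acts like a row (1, 4)

result = a + b
print(result.shape)  # (4, 4)
print(result)
# [[2. 3. 4. 5.]
#  [2. 3. 4. 5.]
#  [2. 3. 4. 5.]
#  [2. 3. 4. 5.]]
```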

And second, when we add tensors with shapes `(3, 3)` and `(3,)`, the 1-d tensor gets added to every row (not every column):

```r
a <- tf$constant(matrix(1:9, ncol = 3, byrow = TRUE), dtype = tf$float32)
a
# tf.Tensor(
# [[1. 2. 3.]
#  [4. 5. 6.]
#  [7. 8. 9.]], shape=(3, 3), dtype=float32)

b <- tf$constant(c(100, 200, 300))
b
# tf.Tensor([100. 200. 300.], shape=(3,), dtype=float32)

a + b
# tf.Tensor(
# [[101. 202. 303.]
#  [104. 205. 306.]
#  [107. 208. 309.]], shape=(3, 3), dtype=float32)
```
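To add `b` to every *column* instead, it needs an explicit second axis, making it `(3, 1)`. The NumPy sketch below contrasts both variants on the same values:

```python
import numpy as np

a = np.arange(1.0, 10.0).reshape(3, 3)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = np.array([100.0, 200.0, 300.0])     # shape (3,)

rows = a + b                 # (3,) acts like (1, 3): added to every row
cols = a + b[:, np.newaxis]  # (3, 1): added to every column

print(rows)
# [[101. 202. 303.]
#  [104. 205. 306.]
#  [107. 208. 309.]]
print(cols)
# [[101. 102. 103.]
#  [204. 205. 206.]
#  [307. 308. 309.]]
```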

Now, back to the initial `matmul` example.

## Back to the puzzles

The documentation for `matmul` says:

> The inputs must, following any transpositions, be tensors of rank >= 2 where the inner 2 dimensions specify valid matrix multiplication dimensions, and any further outer dimensions specify matching batch size.

So here (see the code just below), the inner two dimensions look good – `(2, 3)` and `(3, 2)` – while the one (one and only, in this case) batch dimension shows mismatching values `2` and `1`, respectively. A case for broadcasting, then: both “batches” of `a` get matrix-multiplied with `b`.

```r
a <- tf$constant(keras::array_reshape(1:12, dim = c(2, 2, 3)))
a
# tf.Tensor(
# [[[ 1.  2.  3.]
#   [ 4.  5.  6.]]
#
#  [[ 7.  8.  9.]
#   [10. 11. 12.]]], shape=(2, 2, 3), dtype=float64)

b <- tf$constant(keras::array_reshape(101:106, dim = c(1, 3, 2)))
b
# tf.Tensor(
# [[[101. 102.]
#   [103. 104.]
#   [105. 106.]]], shape=(1, 3, 2), dtype=float64)

c <- tf$matmul(a, b)
c
# tf.Tensor(
# [[[ 622.  628.]
#   [1549. 1564.]]
#
#  [[2476. 2500.]
#   [3403. 3436.]]], shape=(2, 2, 2), dtype=float64)
```
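NumPy’s `np.matmul` (the `@` operator) treats leading axes as batch dimensions in the same way, so the example can be reproduced there; the values match the R output above:

```python
import numpy as np

a = np.arange(1.0, 13.0).reshape(2, 2, 3)     # two 2 x 3 matrices
b = np.arange(101.0, 107.0).reshape(1, 3, 2)  # one 3 x 2 matrix

c = a @ b  # batch dims (2,) and (1,) broadcast to (2,); 2x3 @ 3x2 -> 2x2
print(c.shape)  # (2, 2, 2)
print(c)
# [[[ 622.  628.]
#   [1549. 1564.]]
#
#  [[2476. 2500.]
#   [3403. 3436.]]]

# same as multiplying each batch of a with the single matrix in b
print(np.array_equal(c[1], a[1] @ b[0]))  # True
```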

Let’s quickly check that this really is what happens, by multiplying both batches separately:

```r
tf$matmul(a[1, , ], b)
# tf.Tensor(
# [[[ 622.  628.]
#   [1549. 1564.]]], shape=(1, 2, 2), dtype=float64)

tf$matmul(a[2, , ], b)
# tf.Tensor(
# [[[2476. 2500.]
#   [3403. 3436.]]], shape=(1, 2, 2), dtype=float64)
```

Is it too weird to wonder whether broadcasting would also happen for matrix dimensions? E.g., could we `matmul` tensors of shapes `(2, 4, 1)` and `(2, 3, 1)`, with the `4 x 1` matrix being broadcast to `4 x 3`? A quick test shows that it doesn’t.
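NumPy’s `np.matmul` draws the same line: batch dimensions broadcast, the two matrix dimensions don’t. A quick check with the shapes from the question above:

```python
import numpy as np

x = np.ones((2, 4, 1))
y = np.ones((2, 3, 1))

try:
    np.matmul(x, y)  # inner dims 1 and 3 must match; no broadcasting here
except ValueError as e:
    print("ValueError:", e)

# matching inner dimensions work, as long as the batch dims broadcast
print(np.matmul(np.ones((2, 4, 1)), np.ones((1, 1, 3))).shape)  # (2, 4, 3)
```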

To see how much it pays off, when dealing with TensorFlow operations, to overcome one’s initial reluctance and actually consult the documentation, let’s try another one.

In the documentation for `matvec`, we are told:

> Multiplies matrix `a` by vector `b`, producing `a` * `b`. The matrix `a` must, following any transpositions, be a tensor of rank >= 2, with `shape(a)[-1] == shape(b)[-1]`, and `shape(a)[:-2]` able to broadcast with `shape(b)[:-1]`.

As we understand it, given input tensors of shapes `(2, 2, 3)` and `(2, 3)`, `matvec` should perform two matrix-vector multiplications: one for each batch, as indexed by each input’s leftmost dimension. Let’s check this; so far, there is no broadcasting involved:

```r
# two matrices
a <- tf$constant(keras::array_reshape(1:12, dim = c(2, 2, 3)))
a
# tf.Tensor(
# [[[ 1.  2.  3.]
#   [ 4.  5.  6.]]
#
#  [[ 7.  8.  9.]
#   [10. 11. 12.]]], shape=(2, 2, 3), dtype=float64)

# two vectors
b <- tf$constant(keras::array_reshape(101:106, dim = c(2, 3)))
b
# tf.Tensor(
# [[101. 102. 103.]
#  [104. 105. 106.]], shape=(2, 3), dtype=float64)

c <- tf$linalg$matvec(a, b)
c
# tf.Tensor(
# [[ 614. 1532.]
#  [2522. 3467.]], shape=(2, 2), dtype=float64)
```

Double-checking, we manually multiply the corresponding matrices and vectors, and get:

```r
tf$linalg$matvec(a[1, , ], b[1, ])
# tf.Tensor([ 614. 1532.], shape=(2,), dtype=float64)

tf$linalg$matvec(a[2, , ], b[2, ])
# tf.Tensor([2522. 3467.], shape=(2,), dtype=float64)
```

The same. Now, will we see broadcasting if `b` has just a single batch?

```r
b <- tf$constant(keras::array_reshape(101:103, dim = c(1, 3)))
b
# tf.Tensor([[101. 102. 103.]], shape=(1, 3), dtype=float64)

c <- tf$linalg$matvec(a, b)
c
# tf.Tensor(
# [[ 614. 1532.]
#  [2450. 3368.]], shape=(2, 2), dtype=float64)
```

Multiplying every batch of `a` with `b`, for comparison:

```r
tf$linalg$matvec(a[1, , ], b)
# tf.Tensor([[ 614. 1532.]], shape=(1, 2), dtype=float64)

tf$linalg$matvec(a[2, , ], b)
# tf.Tensor([[2450. 3368.]], shape=(1, 2), dtype=float64)
```

It worked!
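The documented semantics can be reproduced outside TensorFlow. Here is a minimal NumPy sketch; `matvec` below is a hypothetical helper, not a NumPy function. It turns `b` into a batch of column vectors, lets `np.matmul` broadcast the batch dimensions, and drops the trailing axis again:

```python
import numpy as np

def matvec(a, b):
    """Batched matrix-vector product with broadcasting over batch dims.
    Hypothetical helper mimicking tf.linalg.matvec's documented shapes."""
    # b: (..., k) -> (..., k, 1), so matmul sees a batch of k x 1 matrices
    return np.matmul(a, b[..., np.newaxis])[..., 0]

a = np.arange(1.0, 13.0).reshape(2, 2, 3)

b2 = np.arange(101.0, 107.0).reshape(2, 3)  # one vector per batch
print(matvec(a, b2))
# [[ 614. 1532.]
#  [2522. 3467.]]

b1 = np.arange(101.0, 104.0).reshape(1, 3)  # a single vector, broadcast
print(matvec(a, b1))
# [[ 614. 1532.]
#  [2450. 3368.]]
```

Note that a single `(2, 3)` matrix combined with a `(1, 3)` vector batch yields shape `(1, 2)`, since the empty batch shape of the matrix broadcasts against `(1,)`.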

Now, on to the other motivating example, using tfprobability.

### Broadcasting everywhere

Here again is the setup:

```r
library(tfprobability)
d <- tfd_normal(loc = c(0, 1), scale = matrix(1.5:4.5, ncol = 2, byrow = TRUE))
d
# tfp.distributions.Normal("Normal", batch_shape=[2, 2], event_shape=[], dtype=float64)
```

What’s going on? Let’s inspect location and scale separately:

```r
d$loc
# tf.Tensor([0. 1.], shape=(2,), dtype=float64)

d$scale
# tf.Tensor(
# [[1.5 2.5]
#  [3.5 4.5]], shape=(2, 2), dtype=float64)
```

Just focusing on these tensors and their shapes, and having been told that there’s broadcasting going on, we can reason like this: Aligning both shapes on the right and extending `loc`’s shape by `1` (on the left), we get `(1, 2)`, which may be broadcast with `(2, 2)`. In matrix-speak, `loc` is treated as a row and duplicated.

Meaning: We have two distributions with mean 0 (one of scale 1.5, the other of scale 3.5), and also two with mean 1 (corresponding scales being 2.5 and 4.5).
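The pairing can be made explicit with NumPy’s `np.broadcast_arrays`, which returns its inputs stretched to their common shape. This sketch only mirrors the shape logic with plain arrays; it does not involve TFP:

```python
import numpy as np

loc = np.array([0.0, 1.0])                  # shape (2,)
scale = np.array([[1.5, 2.5], [3.5, 4.5]])  # shape (2, 2)

loc_b, scale_b = np.broadcast_arrays(loc, scale)
print(loc_b)
# [[0. 1.]
#  [0. 1.]]

# each (mean, scale) pair describes one of the four distributions
for m, s in zip(loc_b.ravel(), scale_b.ravel()):
    print(m, s)
# 0.0 1.5
# 1.0 2.5
# 0.0 3.5
# 1.0 4.5
```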

Here’s a more direct way to see this:

```r
d$mean()
# tf.Tensor(
# [[0. 1.]
#  [0. 1.]], shape=(2, 2), dtype=float64)

d$stddev()
# tf.Tensor(
# [[1.5 2.5]
#  [3.5 4.5]], shape=(2, 2), dtype=float64)
```

Puzzle solved!

Summing up, broadcasting is simple “in theory” (its rules are), but may need some practice to get right, especially since functions and operators have their own views on which parts of their inputs should broadcast and which shouldn’t. Really, there is no way around looking up the actual behavior in the documentation.

Hopefully, though, you’ve found this post to be a good start into the topic. Maybe, like the author, you feel like you might see broadcasting going on anywhere in the world now. Thanks for reading!