Bits of Numpy
Useful Strategies to use in Numpy to get rid of loops in Python
import numpy as np
Click the Open in Colab icon at the top of this blog to read and play along with the code yourself!
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more. numpy.org
Python
is a higher-level language than C
, which means it abstracts the details of the computer - memory management, pointers, etc. - away from you and lets you write programs in a way that is closer to how humans think.
It is true that C
code usually runs 10 to 100 times faster than Python code if you measure only the execution time. However, if you also include the development time, Python often beats C. For many projects the development time is far more critical than the run-time performance. Longer development time converts directly into extra costs, fewer features and slower time to market. Source
So, to get the advantage of both worlds, i.e., faster execution time and faster development time, we can make use of NumPy.
I will show you exactly how we can achieve this by introducing you to the following NumPy strategies :-
- UFuncs (Universal Functions)
- Aggregations
- Broadcasting
- Slicing, masking and fancy indexing
Let's look at each of these with some simple examples
ufuncs
are special functions defined in NumPy that operate element-wise on arrays
I will create a Python List L
and a numpy array A
with some elements in it.
L = list(range(10_000)) #create a list of integers from 0 to 9,999
A = np.array(L) #convert the Python list to a numpy array
L[0:10], A[0:10] #print 1st 10 elements of the list/array
Adding a number to every element is done using a list comprehension for L
and a ufunc for A
L = [(l+10) for l in L] #add 10 to each element
A = A + 10 #add 10 to each element
L[0:10], A[0:10] #print 1st 10 elements of the list/array
Compare which one runs faster using the line magic command -> %timeit
(This line magic command runs the line a given number of times and gives us the average time it took to run that particular line of code.)
%timeit -n 1000 [(l+10) for l in L]
%timeit -n 1000 A + 10
print(f"Speed increment is almost {464/5.5 :.2f} times")
There are many ufuncs available, some of them are listed below :-
- Arithmetic operators :- +, -, *, /, //, %, ....
- Bitwise operators :- &, |, ~, ^, >>, <<, ....
- Comparison operators :- <, >, >=, <=, ==, !=, ....
- Trigonometric functions :- np.sin, np.cos, np.tan, ....
- Exponential functions :- np.exp, np.log, np.log10, ....
Take a look at official documentation for more ufuncs
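For a quick taste (a small sketch reusing the array A from above, after the +10 was applied), here are a few of these ufuncs applied element-wise:
A >= 15 #element-wise comparison, returns a Boolean array of the same shape
np.sin(A[:5]), np.exp(A[:5] / 10_000) #element-wise trigonometric and exponential ufuncs
A[:5] & 1 #element-wise bitwise AND: 1 for odd elements, 0 for even ones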
Aggregations
are functions that summarize the values in an array
Let's create a Python list of lists L
and a numpy 2d-array A
with some elements in it.
L = [[1,11], [2,22], [3,33], [4,44]] #create a list of lists with 2 elements in each inner list
A = np.array(L, dtype=np.int64) #convert L into a 2d-array of shape 4 x 2 (np.int is removed in recent NumPy versions, so use np.int64)
L, A, A.shape
Now let's say we have to find the minimum value in the given array by specifying axis
i.e. along which dimension we want our function to be applied. This can be done easily by using the following aggregations.
A.min(axis = 0) #Gives us the minimum value in each column
A.min(axis = 1) #Gives us the minimum value in each row
Now, let's find the summation
of all the values in the array, and also along each axis separately.
A.sum(axis=0) #Gives us the summation in each column
A.sum(axis=1) #Gives us the summation in each row
A.sum() #Gives us the summation of all elements if we don't specify any axis
Similarly, we can apply any aggregation function we want to an array; some of them are given below :-
Function : Description
- np.mean() : Compute the arithmetic mean along the specified axis.
- np.std() : Compute the standard deviation along the specified axis.
- np.var() : Compute the variance along the specified axis.
- np.sum() : Sum of array elements over a given axis.
- np.prod() : Return the product of array elements over a given axis.
- np.cumsum() : Return the cumulative sum of the elements along a given axis.
- np.cumprod() : Return the cumulative product of elements along a given axis.
- np.min(), np.max() : Return the minimum / maximum of an array or minimum along an axis.
- np.argmin(), np.argmax() : Returns the indices of the minimum / maximum values along an axis
- np.all() : Test whether all array elements along a given axis evaluate to True.
- np.any() : Test whether any array element along a given axis evaluates to True. source
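A few of these in action on the same 4 x 2 array A (a quick sketch; the values in the comments are what you get for this particular array):
A.mean(axis=0) #arithmetic mean of each column -> array([ 2.5, 27.5])
A.cumsum(axis=0) #running sum down each column
np.argmax(A, axis=1) #index of the maximum in each row -> array([1, 1, 1, 1])
(A > 3).any(axis=0) #is any element in each column greater than 3? -> array([ True,  True])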
Broadcasting
is a set of rules by which ufuncs
operate on arrays of different sizes and/or different dimensions
For instance, it's obvious there is no way to add a 3×3 matrix with a 4×5 matrix, but what if we want to add one scalar (which can be represented as a 1×1 array) with a matrix? Or a vector of size 3 with a 3×4 matrix? In both cases, we can find a way to make sense of this operation.
Broadcasting gives specific rules to codify when shapes are compatible when trying to do an elementwise operation, and how the array of the smaller shape is expanded to match the array of the bigger shape. It's essential to master those rules if you want to be able to write code that executes quickly.
Broadcasting with a scalar is the easiest type of broadcasting. When we have an array a
and a scalar, we just imagine an array of the same shape as a
filled with that scalar and perform the operation:
a = np.array([1, 2, 4, 8, 12])
a > 3
How are we able to do this comparison? 3
is being broadcast to have the same dimensions as a
. Note that this is done without actually creating an array full of 3s in memory (that would be very inefficient).
This is very useful if you want to normalize your dataset by subtracting the mean (a scalar) from the entire data set (a matrix) and dividing by the standard deviation (another scalar):
m = np.array([[1., 2, 3], [4,5,6], [7,8,9]])
(m - 5) / 2.73
What if we have different means for each row of the matrix? In that case you will need to broadcast a vector to a matrix.
We can broadcast a vector to a matrix as follows:
c = np.array([10.,20,30])
m = np.array([[1., 2, 3], [4,5,6], [7,8,9]])
m.shape,c.shape
m + c
Here the elements of c
are expanded to make three rows that match, making the operation possible. Again, Numpy doesn't actually create three copies of c
in memory.
t = np.expand_dims(c, axis=1)
t.nbytes #nbytes: This attribute gives the total bytes consumed by the elements of the NumPy array.
If we want to broadcast in the other dimension, we have to change the shape of our vector to make it a 3×1 matrix. This is done with np.expand_dims
in Numpy (the equivalent of PyTorch's unsqueeze method):
c = np.array([10.,20,30])
m = np.array([[1., 2, 3], [4,5,6], [7,8,9]])
c = np.expand_dims(c, axis=1)
m.shape,c.shape
This time, c
is expanded on the column side:
c+m
Like before, only three scalars are stored in memory:
t = np.expand_dims(c, axis=1)
t.nbytes
Adding or removing extra dimensions in numpy using None
c.shape, c[None,:].shape,c[:,None].shape, c.squeeze(1).shape
You can always omit trailing colons, and ...
means all preceding dimensions:
c[None].shape,c[...,None].shape
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions and works its way backward, adding 1 when it meets empty dimensions. Two dimensions are compatible when one of the following is true:
- They are equal.
- One of them is 1, in which case that dimension is broadcast to make it the same as the other.
Arrays do not need to have the same number of dimensions. For example, if you have a 256×256×3 array of RGB values, and you want to scale each color in the image by a different value, you can multiply the image by a one-dimensional array with three values. Lining up the sizes of the trailing axes of these arrays according to the broadcast rules, shows that they are compatible:
Image (3d Array): 256 x 256 x 3
Scale (1d Array): (1) (1) 3
Result (3d Array): 256 x 256 x 3
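As a code sketch of that image-scaling example (using a smaller 4 x 4 x 3 image here so the shapes stay easy to inspect):
img = np.random.random((4, 4, 3)) #a small "image" with 3 colour channels
scale = np.array([0.5, 1.0, 2.0]) #one scale factor per channel
scaled = img * scale #shapes (4, 4, 3) and (3,) broadcast to (4, 4, 3)
img.shape, scale.shape, scaled.shape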
Try to determine what dimensions you will get as the output when you multiply the 2 vectors below (v1 * v2) :
v1 = [[0],
[1],
[2]] with shape 3 x 1
v2 = [0,1,2] with shape [3]
print("""
Example 1 :- Answer
---------------------
""")
v1 = np.arange(3).reshape(3,1)
v2 = np.arange(3)
v3 = v1 * v2
print(f"The shape of (v1 * v2) is :- {v3.shape}\n")
print("The reason is as follows :- \n")
print("shape of v1 = [3 x 1]\n")
print("shape of v2 = [3]\n")
print("when you try to mutiply arrays of above shape,\n")
print("step 1 :- shape of v2 will become [1 x 3] to match the number of axis of v1\n")
print("step 2 :- final shapes of v1 and v2 will be equal to [3 x 3], beacuse both axis wil be broadcasted to the higher dimensions to make it the same as the other")
Here is a slightly more complicated exercise: try to determine what dimensions to add (and where) when you need to normalize a batch of images of size 64 x 3 x 256 x 256
(no. of imgs x no. of channels x height x width
) with vectors of three elements (one for the mean and one for the standard deviation).
print("""
Example 2 :- Answer
--------------------
""")
print("Create an array of given shape with random floats sampled from a univariate normal(Gaussian) distribution of mean 0 and variance 1\n")
imgs=np.random.randn(64,3,256,256)
print(f"Shape of created image is :- {imgs.shape}\n")
print(f"[4 x 4] values of 1st channel of 1st image :- \n")
print(f"{imgs[0,0,0:4,0:4]}\n")
imgs_mean = imgs.mean(axis=(0,2,3))
imgs_std = imgs.std(axis=(0,2,3))
print(f"taking mean over all the axis expect the channel axis gives us :- \n")
print(f"{imgs_mean}\n")
print(f"taking std over all the axis expect the channel axis gives us :- \n")
print(f"{imgs_std}\n")
print(f"As expected, mean is almost 0 and variance is 1 for all the 3 channels.\n")
print("So, as you might have guessed, to normalize a batch of images of size [64 x 3 x 256 x 256] with vector of three elements, we need to add additional axis at 0, 2 and 3 (or 2 and 3)which makes the final shapes of imgs_mean, imgs_std == [1 x 3 x 1 x 1] (or [3 x 1 x 1] bcz, last dimension will be added automatically via broadcasting)")
indexing
in Python lists accepts integers/slices
L = list(range(10))
L
L[2] #integer index
L[1:3] #slice for multiple elements
Indexing in Numpy arrays works similarly
A = np.array(L)
A[2], A[1:3]
But numpy offers other fast and convenient kinds of indexing as well
Masking :- Indexing with Boolean masks
L = [2,4,8,10,12,14,16]
A = np.array(L)
mask = np.array([False, True, False, True, False, True, False])
mask
A[mask]
Masks
are often constructed using comparison operators and Boolean logic.
mask = (A > 2) & (A < 14) # "&" Bitwise and operator
A[mask]
Fancy indexing :- Passing a list/array of indices
A = np.random.randint(10,20, size=(7)) #generate 7 random integers between 10 and 19
idx = [1, 4, 6] #access the elements at indices 1, 4 and 6
A, A[idx]
Indexing in multiple dimensions
m = np.arange(6).reshape(2,3)
m
m[abs(m - 3) < 2]
All of these operations can be combined and composed in nearly limitless ways !
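For example (a small sketch with a fresh 3 x 4 array, called m2 here, that combines slicing, masking and fancy indexing in one go):
m2 = np.arange(12).reshape(3,4)
m2[1:, [0, 2]] #slice the rows, fancy-index the columns -> array([[ 4,  6], [ 8, 10]])
m2[m2.sum(axis=1) > 10, ::2] #mask the rows by their sum, then slice every other column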
Here comes the most exciting part, as an example to work with:
we will implement the Nearest Neighbors
algorithm using Numpy and then compare it with the scikit-learn
implementation
The naive approach requires 3 nested loops to find nearest neighbors with a 2-dimensional data array (points x coordinates); a sketch of it is shown right after the distance formula below.
In general, for points given by Cartesian coordinates in $n$-dimensional Euclidean space, the distance is
$ d\left( p,q\right) = \sqrt {\sum _{i=1}^{n} \left( q_{i}-p_{i}\right)^2 } $
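For reference, here is a minimal sketch of that pure-Python triple loop (nearest_neighbors_naive is a name introduced here just for illustration); the vectorized version below replaces all three loops with broadcasting and aggregation:
def nearest_neighbors_naive(points):
    #for each point, return the index of its nearest neighbour (pure Python, O(N^2 * D))
    n, d = len(points), len(points[0])
    nearest = []
    for i in range(n): #loop over query points
        best_j, best_dist = None, float("inf")
        for j in range(n): #loop over candidate neighbours
            if i == j:
                continue #skip the point itself
            dist = 0.0
            for k in range(d): #loop over coordinates
                dist += (points[i][k] - points[j][k]) ** 2
            if dist < best_dist:
                best_j, best_dist = j, dist
        nearest.append(best_j)
    return nearest

nearest_neighbors_naive([[0, 0], [1, 1], [0.1, 0.2]]) #-> [2, 2, 0]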
Create 1000 points in 3 dimensions
X = np.random.random((1000,3))
X.shape
Broadcast to find Pair-wise differences
diff = X.reshape(1000,1,3) - X
diff.shape
Aggregate to find pair-wise squared distances
D = (diff ** 2).sum(2)
D.shape
Set diagonals to infinity to skip self-neighbors
i = np.arange(1000)
D[i,i] = np.inf
print the indices of nearest neighbors
i = np.argmin(D,1)
i[:10]
Double check with scikit-learn
from sklearn.neighbors import NearestNeighbors
D, i = NearestNeighbors().fit(X).kneighbors(X,2)
i[0:10,1]
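Since D and i are reused for the scikit-learn output, here is a small self-contained check (using fresh names D_np, nn_np and nn_sk introduced only for this comparison) that both implementations pick the same nearest neighbor for every point:
diff = X.reshape(1000,1,3) - X
D_np = (diff ** 2).sum(2)
D_np[np.arange(1000), np.arange(1000)] = np.inf #exclude self-neighbors
nn_np = np.argmin(D_np, 1) #NumPy nearest neighbors
_, nn_sk = NearestNeighbors().fit(X).kneighbors(X, 2)
np.all(nn_np == nn_sk[:, 1]) #should print True: both implementations agree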
- Writing Python is fast, loops can be slow
- Numpy pushes loops into its compiled layer
- Fast development time of Python
- Fast execution time of compiled code
Strategies:-
- ufuncs for element-wise operations
- Aggregations for array summarizations
- Broadcasting for combining arrays
- Slicing, masking and indexing for selecting and operating on subsets of arrays
This is my first blog post, so I'm happy to hear your comments and advice.
Happy Learning
This blog post is a notebook version of a great talk given by Jake VanderPlas at PyCon 2015. The talk is old, but the content is gold. Many thanks to him :)
- numpy.org
- scikit-learn.org
- Broadcasting topic is taken from the fast.ai course notebook -> Chapter 17, Foundations
- Why Python is slow?
Many thanks to Professor Jeremy Howard for inspiring me to learn deep learning, to write posts, and to help others along the way. One such instance is the tweet below ;)
This approach, of writing up a talk that you like and posting it publicly, is great for everyone:
— Jeremy Howard (@jeremyphoward) May 26, 2021
- The speaker gets a sharable text version of their talk
- The writer gets the appreciation of the speaker and the audience
- People can save time by reading instead of listening