There is a common saying among developers of low-level languages, such as those schooled in C++. They admit that Python shortens development time, but claim that it sacrifices runtime performance in the process.
While this is certainly true for many applications, few developers know that there are ways to speed up Python's execution time without compromising its ease of use. Even advanced Python developers don't always use the tools available to them for optimizing computations.
This matters because, in the world of intensive programming, where repeating millions of function calls is common practice, a runtime of 50 microseconds per call is considered slow. Those 50 microseconds, repeated over a million function calls, add up to an extra 50 seconds of runtime, and that is definitely slow.
NumPy features like the ndarray.size attribute, the np.zeros function, and NumPy's all-important indexing tools can radically improve the performance and convenience of working with large data arrays.
What Is NumPy and What Is It for?
On its own, Python is a powerful general-purpose programming language. The NumPy library (along with SciPy and Matplotlib) turns it into an even more robust environment for serious scientific computing.
NumPy's main object is the homogeneous multidimensional array (ndarray), an n-dimensional grid of values. You can use this object as a table of same-type elements indexed by tuples of non-negative integers.
For the most part, only Python programmers in academic settings make full use of the computational opportunities this approach offers. Most other developers rely on Python lists. That is a perfectly feasible method for relatively small matrices, but it gets very unwieldy for large ones.
For example, if you were trying to create a cube array of 1000 cells per side, a 1-billion-cell 3D matrix, you would be stuck with a minimum size of 12 GB using Python lists. A 32-bit Python build cannot address that much memory, so you would be forced onto a 64-bit build, where the nested "list of lists" stores an 8-byte pointer for every element, each pointing to a Python object stored elsewhere. If that sounds inefficient and infeasible, that's because it is.
With NumPy, you could store the same billion cells as single-precision (float32, 4-byte) values in a single array of about 4 GB. Manipulating or computing on that data also takes far less time than iterating over nested Python lists.
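To put the sizes above in concrete terms, here is a small sketch that computes the memory footprint of a NumPy array from its element count and dtype. It assumes float32 elements (4 bytes each), which is what the 4 GB figure implies; NumPy's default float64 would double it. The billion-cell case is calculated rather than allocated, so the sketch runs on any machine.

```python
import numpy as np

# A 100 x 100 x 100 float32 cube is small enough to actually allocate
small = np.zeros((100, 100, 100), dtype=np.float32)
print(small.nbytes)  # 4000000 bytes, i.e. 4 MB

# For 1000 x 1000 x 1000 cells, compute the size instead of allocating it
cells = 1000 ** 3
float32_bytes = cells * np.dtype(np.float32).itemsize  # 4 bytes per cell
float64_bytes = cells * np.dtype(np.float64).itemsize  # 8 bytes per cell
print(float32_bytes / 1e9, "GB as float32")  # 4.0 GB as float32
print(float64_bytes / 1e9, "GB as float64")  # 8.0 GB as float64
```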
Basic NumPy Functions
To use NumPy, you have to become familiar with its functions and routines. One reason Python developers outside academia hesitate to do this is that there are so many of them. For an exhaustive list, consult SciPy.org.
However, getting started with the basics is easy to do. Knowing that NumPy establishes an N-dimensional matrix for elements of the same type, you can immediately begin working with its array functions.
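NumPy refers to dimensions as axes. Keep this in mind while familiarizing yourself with the core array attributes: ndim (the number of axes), shape (the length along each axis), size (the total number of elements), dtype (the element type), and itemsize (the size of each element in bytes).
Using these attributes to describe an array in NumPy would look something like this: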
>>> import numpy as np
>>> a = np.arange(15).reshape(3, 5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> a.shape
(3, 5)
>>> a.ndim
2
>>> a.dtype.name
'int64'
>>> a.itemsize
8
>>> a.size
15
>>> type(a)
<class 'numpy.ndarray'>
>>> b = np.array([6, 7, 8])
>>> b
array([6, 7, 8])
>>> type(b)
<class 'numpy.ndarray'>
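The example creates an array a and then inspects its shape, number of axes (ndim), element type (dtype), bytes per element (itemsize), and total number of elements (size). It also shows that b, built from a plain Python list, is an ndarray as well. The exact dtype (here int64, with an itemsize of 8) depends on your platform.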
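The attributes in the session above are related to one another, which makes them handy for quick sanity checks. This sketch also uses nbytes, an additional ndarray attribute not mentioned above that reports the total memory taken by the elements.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)

# ndim is the length of the shape tuple (one entry per axis)
assert a.ndim == len(a.shape) == 2

# size is the product of the axis lengths: 3 * 5 = 15
assert a.size == int(np.prod(a.shape)) == 15

# nbytes is the element count times the bytes per element
assert a.nbytes == a.size * a.itemsize

# itemsize and dtype are platform-dependent, so no fixed output is assumed
print(a.shape, a.ndim, a.size, a.dtype, a.itemsize, a.nbytes)
```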
How to Create Arrays
Since NumPy is all about creating and indexing arrays, it makes sense that there are multiple ways to create new ones. You can create arrays from regular Python lists, and you can create new arrays filled with 1s or 0s as placeholder content.
Creating an Array from a Python List
If you have a regular Python list or tuple, you can turn it into a NumPy array by passing it to np.array. The dtype of the resulting array is deduced from the types of the elements in the sequence, as in the following example:
>>> import numpy as np
>>> a = np.array([2,3,4])
>>> a
array([2, 3, 4])
>>> a.dtype
dtype('int64')
>>> b = np.array([1.2, 3.5, 5.1])
>>> b.dtype
dtype('float64')
The calls above follow a specific format for np.array that many NumPy beginners get wrong. Notice the parentheses and the square brackets around the list of numbers that make up the argument:
>>> a = np.array([x,y,z])
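Many coders new to NumPy use only parentheses, as in np.array(2, 3, 4), which passes the numbers as separate arguments. np.array expects a single sequence as its first argument (the second positional argument is reserved for the dtype), so this call raises an error instead of building the array, often followed by some confused debugging and a final "ah-hah!" moment.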
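The difference between the two calls can be checked directly. This sketch passes a single list (and, equivalently, a single tuple) to np.array, then tries the mistaken multi-argument form inside a try block. The exact exception type and message depend on the NumPy version, so the sketch catches both TypeError and ValueError rather than assuming one.

```python
import numpy as np

# Correct: one argument that is itself a sequence (a list or a tuple)
from_list = np.array([2, 3, 4])
from_tuple = np.array((2, 3, 4))
mixed = np.array([1, 2.5])  # ints and floats together are upcast to float64

# Incorrect: three separate arguments instead of one sequence
mistake_raised = False
try:
    np.array(2, 3, 4)
except (TypeError, ValueError) as exc:
    mistake_raised = True
    print("np.array(2, 3, 4) failed:", exc)

print(from_list, from_tuple, mixed.dtype)
```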
np.array's behavior is actually quite simple: it transforms a sequence of sequences into a two-dimensional array, a sequence of sequences of sequences into a three-dimensional array, and so on for any number of dimensions.
This is one of the main ways NumPy delivers on its promise to radically speed up indexing for very large arrays. The result behaves like a "list of lists", but it is a single array of arbitrary dimensions, stored in one contiguous block of memory and indexed with a single tuple such as a[1, 2].
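Both ideas, the nesting rule and single-tuple indexing, fit in a few lines. The variable names here are illustrative only; the sketch builds 1-D, 2-D and 3-D arrays from increasingly nested lists, then compares indexing a nested list with indexing the equivalent ndarray.

```python
import numpy as np

flat = np.array([1, 2, 3])                             # sequence -> 1-D
table = np.array([[1, 2, 3], [4, 5, 6]])               # sequence of sequences -> 2-D
cube = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])  # three levels -> 3-D

for arr in (flat, table, cube):
    print(arr.ndim, arr.shape)  # prints 1 (3,), then 2 (2, 3), then 3 (2, 2, 2)

nested = [[1, 2, 3], [4, 5, 6]]

# A nested list needs one lookup per level; an ndarray takes one tuple index
assert nested[1][2] == table[1, 2] == 6

# Pulling out a whole column is a single slice on the array...
column = table[:, 1]
# ...but needs a loop (here a list comprehension) on the nested list
column_from_list = [row[1] for row in nested]

# np.array stores the elements in one contiguous (C-order) block of memory
assert table.flags['C_CONTIGUOUS']
```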
Making a Placeholder Matrix Using np.zeros
Programmers often need to create an array before they know the values of its elements. Even then, the size of the matrix is usually easy to determine, and once you know it, you can create a placeholder array in NumPy filled with placeholder content, in this case zeros.
Creating a placeholder matrix full of zeros lets you fix the size of an array at the outset of your computation, so you won't have to grow the array later on. NumPy arrays have a fixed size, so growing one means allocating a new array and copying every existing element into it, an expensive operation you generally want to avoid unless absolutely necessary.
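The two approaches can be compared in a short sketch. The preallocated version uses np.zeros as described above; the "growing" version uses np.append, which returns a new array (a full copy) on every call, so it illustrates the cost the paragraph warns about. The squared values are placeholder computations chosen only for illustration.

```python
import numpy as np

n = 1000

# Preallocate once with np.zeros, then fill the slots in place
squares = np.zeros(n)
for i in range(n):
    squares[i] = i ** 2

# Growing instead: each np.append allocates a new array and copies
# everything gathered so far, so the total copying grows quadratically
grown = np.array([])
for i in range(n):
    grown = np.append(grown, i ** 2)

assert np.array_equal(squares, grown)
```

Both loops produce the same values; only the preallocated one avoids re-copying the array on every iteration.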
Here's an example of how np.zeros works:
>>> np.zeros((3, 4))
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
Note that you have to spell np.zeros exactly as written. There is no np.zeroes function; calling it raises an AttributeError, as many beginning NumPy users find out when their first placeholder arrays fail to load.
Other Placeholder Arrays: np.ones and np.empty
There are other functions for creating placeholder arrays in NumPy; the two main ones are np.ones and np.empty. Both accept a dtype argument for the created array, which defaults to float64 (64-bit double-precision floating point).
Here is an example of how to create an np.ones array in Python using NumPy:
>>> np.ones((2, 3, 4), dtype=np.int16)  # dtype can also be specified
array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int16)
You can use np.empty to create an uninitialized array. Its contents are not generated at all: they are whatever values happened to be left in that block of memory, so they vary from run to run:
>>> np.empty((2, 3))  # uninitialized, output may vary
array([[3.73603959e-262, 6.02658058e-154, 6.55490914e-260],
       [5.30498948e-313, 3.14673309e-307, 1.00000000e+000]])
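A short sketch of the default dtype and of the safe way to use np.empty: because its contents are leftover memory, every element should be written before anything reads it. The fill value 7.0 is arbitrary.

```python
import numpy as np

# With no dtype argument, both functions produce float64 arrays
assert np.ones((2, 3)).dtype == np.float64
assert np.empty((2, 3)).dtype == np.float64

# An explicit dtype overrides the default
counts = np.ones((2, 3, 4), dtype=np.int16)

# np.empty skips initialization, so overwrite every element before reading
buf = np.empty((2, 3))
buf[:] = 7.0
print(buf)
```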
Creating a Sequenced Array Using np.arange and np.linspace
You can use np.arange to create a sequenced array in much the same way a standard Python programmer uses range to generate a sequence of integers. Unlike range, np.arange also accepts floating-point arguments. Here are two examples:
>>> np.arange(10, 30, 5)  # from 10 up to, but not including, 30 in steps of 5
array([10, 15, 20, 25])
>>> np.arange(0, 2, 0.3)  # also accepts float arguments like 0.3
array([0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])
Notice, however, that when np.arange is given floating-point arguments, it is generally not possible to predict reliably how many elements you will get. This is a consequence of the finite precision of floating-point arithmetic.
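The sketch below first shows that, with integer arguments, np.arange produces the same values as Python's range. It then uses one well-known floating-point case: with start 1, stop 1.3 and step 0.1, rounding error in (1.3 - 1) / 0.1 makes the element count come out as 4 rather than 3, so a value at or just above the supposedly excluded stop appears. The last digits of that value may differ between NumPy versions, so the checks only look at the length and at whether the stop was reached.

```python
import numpy as np

# Integer arguments: same values as range(10, 30, 5), stop excluded
ints = np.arange(10, 30, 5)
assert ints.tolist() == list(range(10, 30, 5)) == [10, 15, 20, 25]

# Float arguments: 1.3 is meant to be excluded, but floating-point rounding
# in (1.3 - 1) / 0.1 yields one element more than intended
floats = np.arange(1, 1.3, 0.1)
print(len(floats), floats)  # 4 elements; the last one is not below 1.3
```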
For this reason, NumPy programmers typically prefer np.linspace, which takes the number of elements they want as an argument instead of a step size. Here is how it works:
>>> from numpy import pi
>>> np.linspace(0, 2, 9)  # 9 numbers from 0 to 2, both ends included
array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])
>>> x = np.linspace( 0, 2*pi, 100 ) # useful to evaluate function over multiple points
>>> f = np.sin(x)
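Continuing the session above in script form, this sketch samples sin over [0, 2π] with np.linspace and shows two of its optional parameters: endpoint=False, which leaves out the stop value, and retstep=True, which also returns the spacing between samples. The sample counts are arbitrary.

```python
import numpy as np

# 9 evenly spaced values from 0 to 2, both ends included (step 0.25)
grid = np.linspace(0, 2, 9)

# endpoint=False drops the stop value: 8 values from 0 up to 1.75
half_open = np.linspace(0, 2, 8, endpoint=False)

# retstep=True also returns the step size that was used
samples, step = np.linspace(0, 2 * np.pi, 100, retstep=True)
values = np.sin(samples)  # evaluate sin at all 100 points at once

print(step, values.min(), values.max())
```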
This is just the start of using NumPy functions like np.zeros to create and manage arrays of data. You can use them to build your first price graphs, Big Data comparison tables, and more.