Optimizing Python (Using Aliases)

Introduction

In the using built-ins lesson, we learned some approximate quantifications of the number of libraries and modules we can use in Python. The take-away: there are lots of them, and if there is a well-tested, trusted library that solves all or part of our problem at hand, we should use it.

There is an avoidable issue with performance related to the use of libraries like this. Suppose we need to calculate a square root of a value x. We would access the library with import math, and then do the calculation via the math.sqrt(x) function call. The Python interpreter has to look into the math module, find the sqrt() function, and then execute the code. The problem, performance-wise, is that process happens every time the function is used, i.e., the Python interpreter does not remember that it has already looked up the sqrt() function. This is costly if the function call look-up happens inside of a loop.

Consider this code that creates a list of n pseudo-random values in the range [0, 1). We have not implemented the tips from the list comprehension lesson here.

1
2
3
4
5
6
# Sample 1 - naive creation of list of random values.
# n contains the number of values to create
values = []

for i in range(n):
    values.append(random.random())

Using aliases to prevent repeated look-ups

To improve the performance of the code in Sample 1, we can use an alias, which is where we have more than one name for the same object. In Sample 1, random.random is the name of a function object that produces a pseudo-random number in the range [0, 1). To prevent repeated look-ups of the function, we can remember the name of the function in our own code via an alias, as shown in Sample 2:

1
2
3
4
5
6
7
# Sample 2 - creation of list of random values with aliases
# n contains the number of values to create
values = []

rand = random.random
for i in range(n):
    values.append(rand())

In Sample 2, we assign an alias, a.k.a. another name, for the random number generating function with the rand = random.random statement.

Note how we do not include parentheses here! If we did this, rand = random.random(), Python would execute the function, and our name rand would be a random number between 0 and 1. Without the parentheses, Python simply gives another name, of our own choosing, to the function.

The performance implication is that now, instead of having to look into the random module each time to find the required function, the Python interpreter can access the function via the local name rand.

Incorporating previous lessons

Before we examine the speedup caused by using aliases, let us see what happens when we also use list comprehension in our list creating code. First, Sample 3 shows code to create the list of random numbers, with list comprehension, but without using aliases:

1
2
3
# Sample 3 - creation of list with list comprehension but no aliases
# n contains the number of values to create
values = [random.random() for i in range(n)]

Sample 4, which uses both list comprehension and an alias for random.random, should be the fastest version of this code so far.

1
2
3
4
# Sample 4 - creation of list with list comprehension and aliases
# n contains the number of values to create
rand = random.random
values = [rand() for i in range(n)]

The following graph shows the time taken by each of the four samples for a variety of list sizes, and the speedup comparing Sample 1 with Sample 4. We can see from the graph that using list comprehension and an alias for our function results in code that is almost twice as fast as the naive code in Sample 1.

Random list creation speedup

Tip: If you are using the dot (.) operator to access a function or some other object in a module, and you are doing so inside a loop, consider using an alias to improve your code’s performance.

Exercise

You are probably familiar with the Cartesian coordinate system, where the coordinates of a point on a plane are specified by the distance of the point from the x and y axes, respectively. For example, this image shows the Cartesian coordinates for a point with a negative x and positive y value.

Cartesian coordinates

In some mathematical, physics, and computing applications, we often would rather work with polar coordinates, where the coordinate is specified by the angle from the rightward facing axis and the distance from the point to (0, 0). This figure shows the polar coordinate equivalent of the point in the previous image:

Polar coordinates

The sample code below creates a large list of randomly-generated Cartesian points in the unit square, i.e., each point is in [1, 1]. Look up how to convert Cartesian coordinates to polar, and then write code to produce a new list containing (r, Θ) polar coordinates for each of the Cartesian points in the list named cartesians. Use list comprehension and aliases to make the code work faster.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
'''
Function alias exercise 

Convert cartesian coordinates to polar.
'''

import math
import random

# create n cartesian coordinates in the unit square
n = 1_000_000
uni = random.uniform
cartesians = [(uni(-1, 1), uni(-1, 1)) for i in range(n)]

# write code here to create a new list called polars.
# the new list should contain the polar coordinate
# equivalents of each of the coors in cartesians.

Speedup

We refer to speedup as a way to compare our optimized code with its original, slower counterpart. Speedup is a measure of how well we have improved the runtime of our code. If T₀ is the time taken by the original code, and T₁ is the time taken by the improved code, then speedup is defined as

Speedup

A speedup of 2 would mean that our optimized code is twice as fast as the original, while any value under 1 would mean that we actually made our code slower!

All performance figures on this page were obtained on a Windows 10 PC with an Intel® Core™ i5-9600K CPU @ 3.70GHz and 32GB of RAM, in Python 3.6.9 running in an Ubuntu Windows Subsystem for Linux environment.

Next lesson

The next lesson in the workshop is about aggregate functions