Python tips, questions, and answers

From Physics 7682 Course Wiki



Python syntax

List comprehensions

This refers to a syntax for creating lists algorithmically in one expression. A list comprehension consists of two, or optionally three, parts, all enclosed in square brackets []:

[ list_elements  for_statements   optional_if_statement ]

list_elements is an expression that is evaluated to produce each element of the list; for_statements is one or more Python for clauses that are iterated over to construct the list; optional_if_statement is a single if clause (which can be a compound expression using and, or, etc.) that filters the results, keeping only those elements for which the condition evaluates to True.

This is perhaps best understood with a few examples (feel free to add more examples of your own):

[x+3 for x in somelist]         # return a list which adds 3 to every element of somelist

[x for x in somelist if x > 0]  # return a list containing only those elements of somelist that are positive

[(i,j) for i in range(10) for j in range(10)]    # return a list of all pairs (tuples) of integers from 0 through 9:
                                                 # [(0,0), (0,1), ..., (1,0),(1,1),...(9,8),(9,9)]
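The examples above can be checked directly with a small made-up list standing in for somelist. The sketch also writes out the explicit nested for loops that a comprehension with two for clauses stands for; the if i != j filter is added here only to show where the if clause goes.

```python
# A made-up example list standing in for "somelist" above
somelist = [3, -1, 4, -1, 5]

print([x + 3 for x in somelist])        # [6, 2, 7, 2, 8]
print([x for x in somelist if x > 0])   # [3, 4, 5]

# Two for clauses plus an if clause (the filter is added for illustration)
pairs = [(i, j) for i in range(3) for j in range(3) if i != j]

# The same list built with explicit loops: the for clauses nest from
# left to right, and the if test sits inside the innermost loop
pairs_loop = []
for i in range(3):
    for j in range(3):
        if i != j:
            pairs_loop.append((i, j))

print(pairs == pairs_loop)   # True
print(pairs)                 # [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]
```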

Integer division

The decision in Python to have division of two integers result in so-called "floor division" generated considerable controversy over the years, and this behavior was finally changed in Python 3.0. (By "floor division", we mean that dividing two integers yields the largest integer less than or equal to the exact quotient, i.e., the result of dividing the two corresponding floating point numbers, rounded down.) As described in PEP 238, however, Python 2 users don't have to switch to Python 3 in order to get the division semantics they prefer. If you want to dispense with "classic division" (i.e., floor division for integers) and instead use "true division", then insert the following statement at the top of any module (source file) where you want true division:

from __future__ import division

Once you've done so, 5/2 evaluates to 2.5 instead of 2.
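The two kinds of division can be compared with a short sketch, written to run under Python 3 (or under Python 2 with the import above at the top of the file). It also uses the // operator, which performs floor division explicitly in Python 2.2 and later as well as in Python 3; the negative example shows that floor division rounds toward negative infinity, not toward zero. compare_division is a hypothetical helper for illustration.

```python
import math

def compare_division(a, b):
    """Hypothetical helper: return (true quotient, floor quotient)."""
    return a / b, a // b

for a, b in [(5, 2), (-5, 2), (6, 3)]:
    true_q, floor_q = compare_division(a, b)
    # Floor division gives the largest integer <= the exact quotient
    assert floor_q == math.floor(true_q)
    print("%d / %d = %s,   %d // %d = %s" % (a, b, true_q, a, b, floor_q))
```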

Python packages

IPython interpreter

  • ipython and pylab: If you plan to use pylab within an ipython session, try the following from the command line when you start up the session:
    ipython -pylab

This will import the pylab module and spare you from having to call pylab.show() to display plots; instead, the results of plotting commands appear in the current figure window as soon as they are executed. (In later versions of IPython, the option is spelled ipython --pylab.)

  • Note: with ipython -pylab, the pylab module is both imported as a module (so that its commands can be called with the module name as a prefix, e.g., pylab.plot()) and imported into the main namespace, so that pylab commands can also be executed without the pylab. prefix.
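The dual import described in the note corresponds, roughly, to running both import pylab and from pylab import * yourself. That correspondence is an assumption about the effect, not a copy of what IPython does internally. So that the sketch runs without matplotlib installed, it shows the same namespace pattern with the standard-library math module.

```python
# Same pattern as "import pylab" + "from pylab import *", shown with math
import math          # the module object is available as "math"
from math import *   # its public names are also copied into this namespace

# Both spellings now refer to the same function object
print(math.sqrt is sqrt)           # True
print(math.sqrt(16.0), sqrt(16.0))
```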

Python Debugger

There's a good discussion online, entitled "Debugging in Python", of how to use the Python debugger module (pdb).
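As a minimal sketch of one pdb entry point, post-mortem debugging opens the debugger in the frame where an exception was raised. The functions mean and safe_mean and the DEBUG flag are hypothetical. Two other common entry points, not shown here, are running a whole script under the debugger with python -m pdb myscript.py (myscript.py being a placeholder) and placing pdb.set_trace(), or from Python 3.7 on the built-in breakpoint(), at the line where execution should stop.

```python
import pdb
import sys

# Hypothetical switch: set to True to open the pdb prompt when an error occurs
DEBUG = False

def mean(values):
    """Average of a sequence (raises ZeroDivisionError if it is empty)."""
    return sum(values) / float(len(values))

def safe_mean(values):
    """Like mean(), but returns NaN for an empty sequence.

    With DEBUG enabled, pdb is started first so the local variables of the
    failing frame (e.g. values) can be inspected with commands such as p, l, u, d.
    """
    try:
        return mean(values)
    except ZeroDivisionError:
        if DEBUG:
            # Post-mortem debugging: start pdb at the frame that raised
            pdb.post_mortem(sys.exc_info()[2])
        return float("nan")

print(safe_mean([1, 2, 3]))   # 2.0
print(safe_mean([]))          # nan
```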

Benchmarking

Using the "time" module, it is fairly straightforward to microbenchmark different ways of accomplishing the same task in Python (the standard library also includes a "timeit" module designed specifically for timing small pieces of code). The general procedure is to call a function on a large data set within a loop, recording the time at the start and end of the loop. Comparing these elapsed times for different methods gives an estimate of their relative performance. Occasionally, care must be taken to ensure that the functions being benchmarked do not produce a trivial or unused result (this last point applies mostly to compiled languages, where optimizing compilers can "compile out" unnecessary operations).
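The procedure just described can be sketched as follows. The names benchmark, sum_builtin, and sum_loop are hypothetical; the two summing functions are simply two ways of doing the same task, and their results are accumulated so the work is not left unused. time.time() measures wall-clock time, so results vary with machine load (on Python 3, time.perf_counter() is a higher-resolution alternative). The last line does a comparable measurement with timeit.

```python
import time
import timeit

def benchmark(func, data, n_loops=100):
    """Hypothetical helper: call func(data) n_loops times.

    Returns (elapsed wall-clock seconds, accumulated result). Adding up the
    results keeps the work from being trivially unused.
    """
    total = 0
    start = time.time()
    for _ in range(n_loops):
        total += func(data)
    stop = time.time()
    return stop - start, total

def sum_builtin(xs):
    """Sum using the built-in sum()."""
    return sum(xs)

def sum_loop(xs):
    """Sum using an explicit Python loop."""
    s = 0
    for x in xs:
        s += x
    return s

data = list(range(10000))

for func in (sum_builtin, sum_loop):
    elapsed, total = benchmark(func, data)
    print("%-12s %.4f s  (check: %d)" % (func.__name__, elapsed, total))

# timeit runs the loop itself and picks a suitable timer
print("timeit:      %.4f s" % timeit.timeit(lambda: sum_builtin(data), number=100))
```

Comparing the printed times for the two functions gives the relative estimate described above; the check values should agree, which confirms both methods did the same work.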

Scipy vs. map()

Using scipy, it is possible to write functions that act on and return arrays in addition to performing their task with scalars (this is how good MATLAB functions are written). The advantage of this approach is that computers are very good at performing the same operation over and over again on a set of data. (This is traditionally called "vector processing," and it is what made old Cray supercomputers so special. More recently, a limited form of this called "stream processing" is receiving a great deal of attention due to the prevalence of graphics cards.) In Python there is an additional, often larger, benefit: the loop over the array elements runs inside compiled code rather than in the Python interpreter. Thus, calling scipy.sin() on a scipy array is an extremely efficient way to compute the sine of a set of numbers.
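A minimal sketch of one function serving both scalars and arrays, assuming NumPy is installed (SciPy is built on NumPy, and older SciPy versions re-exported these functions under names like scipy.sin). The function f is a made-up example.

```python
import math
import numpy as np

def f(x):
    """Works for a scalar or an array, because it uses only NumPy functions."""
    return np.sin(x) ** 2 + 1.0

# Scalar input gives a scalar result
print(f(math.pi / 2))

# Array input gives an array result; the loop over elements runs inside NumPy
x = np.linspace(0.0, math.pi, 5)
y_array = f(x)

# The same values computed with one Python-level call per element
y_loop = np.array([f(xi) for xi in x])
print(np.allclose(y_array, y_loop))   # True
```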

A function can also be applied to every element in a set of data using map() or a list comprehension. The question, then, is whether these methods are "smart" enough to take advantage of the fact that scipy functions can operate on whole arrays and, if not, how much overhead comes from calling the same function over and over, once per data element. To find out, we can write a benchmark that times the same operation on a large data set using each of these approaches.

from __future__ import print_function  # lets the print() calls below also run on Python 2
import time
import numpy as np  # older SciPy versions re-exported these as scipy.pi, scipy.cos, etc.

# Function to benchmark: works on scalars and, elementwise, on arrays
def myfunc(a, b):
    u = np.pi*(1.0 - np.cos(a))
    k = np.e*(np.pi*b)**2
    return u + k

# Number of elements in data set
listSize = 100000

# Number of loops
nLoops = 100

# Generate random data sets
alist = np.random.random(listSize)
blist = np.random.random(listSize)

# Benchmark array functions: one call processes all elements at once
eSum = np.zeros(listSize)
startTime = time.time()
for i in range(nLoops):
    eSum += myfunc(alist, blist)
endTime = time.time()
print("Mean: %f" % np.mean(eSum))
print("Scipy: %f s" % (endTime - startTime))

# Benchmark map(): one Python-level call per element
# (list() is needed on Python 3, where map() returns an iterator)
eSum = np.zeros(listSize)
startTime = time.time()
for i in range(nLoops):
    eSum += list(map(myfunc, alist, blist))
endTime = time.time()
print("Mean: %f" % np.mean(eSum))
print("Map(): %f s" % (endTime - startTime))

# Benchmark list comprehension (only 1 argument varies; b is fixed at 0.5,
# so this mean differs from the other two)
eSum = np.zeros(listSize)
startTime = time.time()
for i in range(nLoops):
    eSum += [myfunc(a, 0.5) for a in alist]
endTime = time.time()
print("Mean: %f" % np.mean(eSum))
print("Lists: %f s" % (endTime - startTime))

Output:

Mean: 944.526175
Scipy: 0.449259 s
Mean: 944.526175
Map(): 67.699522 s
Mean: 720.593796
Lists: 58.868948 s

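The results of this benchmark tell us that applying the function element by element with map() is roughly 150 times slower than simply calling the exact same function once with array arguments (67.7 s versus 0.45 s), and the list comprehension is nearly as slow (about 130 times). The cost comes from calling a Python function once per data element instead of letting the array functions loop over the data internally. Thus, if performance is a consideration, vector operations should be performed by array-aware scipy functions whenever possible.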

NOTE regarding Scipy vs. Map(): You might want to repeat this comparison for different size lists. Scipy arrays are optimized for operations over longer arrays, but there is some overhead in constructing arrays, and operations over lists might be more competitive for smaller list/array sizes.
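One way to follow up on this note is sketched below, assuming NumPy is available. The names myfunc_array, myfunc_scalar, and compare are hypothetical; the scalar version uses the math module on plain Python floats, which is the natural competitor for short lists, and the array timing deliberately includes converting the lists to arrays. No results are claimed here: where (or whether) the two approaches cross over depends on the machine and library versions.

```python
import math
import random
import timeit

import numpy as np

def myfunc_array(a, b):
    """The benchmark's myfunc, written with NumPy functions (arrays in, array out)."""
    return np.pi * (1.0 - np.cos(a)) + np.e * (np.pi * b) ** 2

def myfunc_scalar(a, b):
    """Hypothetical pure-Python version of the same formula for single floats."""
    return math.pi * (1.0 - math.cos(a)) + math.e * (math.pi * b) ** 2

def compare(size, number=100):
    """Time both approaches on random inputs of the given size.

    Returns (array_seconds, list_seconds). The array timing includes
    converting the Python lists to arrays, since that construction
    overhead is what might make small arrays less competitive.
    """
    alist = [random.random() for _ in range(size)]
    blist = [random.random() for _ in range(size)]
    t_array = timeit.timeit(
        lambda: myfunc_array(np.array(alist), np.array(blist)), number=number)
    t_list = timeit.timeit(
        lambda: [myfunc_scalar(a, b) for a, b in zip(alist, blist)],
        number=number)
    return t_array, t_list

for size in (1, 10, 100, 1000, 10000):
    t_array, t_list = compare(size)
    print("%6d elements: arrays %.5f s, lists %.5f s" % (size, t_array, t_list))
```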
