A HTML version of this presentation is also available.

IPython Logo SciPy Logo Matplotlib Logo

Why Python for Scientific Computing?

  • Using these modules you have access to an environment that rivals MATLAB, IDL, etc
  • All software is open source and free to use, no licenses involved
  • Can translate your interactive work into scripts and then into production software
  • Use Python with compiled code to gain speed advantages with advantages of Python's flexibility
  • Python is a general purpose language, not a math tool with a language added to it

Major Modules

  • IPython
  • Numpy and SciPy
  • Matplotlib
  • h5py


  • Interactive shell giving capabilities beyond the default Python REPL
  • Most useful with plotting libraries such as matplotlib
  • Now has browser-based notebooks ala MATLAB notebooks
  • Can be embedded in another program
  • Interactive parallel computation tools

REPL = Read-eval-print loop


  • Built in help
    • Get help by appending ? to functions
    • %pdoc, %pdef, %psource, %pfile
  • Command history
    • Can be logged to a file
    • Repeat commands similar to in bash
  • Access to shell commands through command line
  • Bookmarks
  • Variables from %run scripts available in command line
  • Tab completion


  • Multidimensional array objects
    • Many operations implemented in compiled code for speed
    • Allowing use vectorized code
  • Various routines for fast operation on arrays
    • Basic mathematical operations
    • Shape manipulation, sorting, selecting
    • I/O - Reading and writing to disk
    • discrete Fourier transformations
    • Basic linear algebra
    • Basic statistical operations
    • Random simulations


  • Collections of additional mathematical algorithms built on top of NumPy
  • Built on many existing C, C++ and Fortran math packages
Subpackage Description
cluster Clustering algorithms
constants Physical and mathematical constants
fftpack Fast Fourier Transform routines
integrate Integration and ordinary differential equation solvers
interpolate Interpolation and smoothing splines
io Input and Output
linalg Linear algebra
ndimage N-dimensional image processing
odr Orthogonal distance regression
optimize Optimization and root-finding routines
signal Signal processing
sparse Sparse matrices and associated routines
spatial Spatial data structures and algorithms
special Special functions
stats Statistical distributions and functions
weave C/C++ integration

Numpy Examples

>>> from numpy import *
>>> x = array([[1,2,3],[4,5,6]])
>>> x.shape
(2, 3)
>>> y = array([[1,2],[3,4],[5,6]])
>>> y.shape
(3, 2)
>>> dot(x,y) # matrix multiplication (2,3) x (3,2) -> (2,2)
array([[22, 28],
       [49, 64]])

>>> x.sum()

>>> a = array([[1,2,7],[4,9,6]])
>>> a.mean()

>>> a = array([3,5,7,9])
>>> where(a <= 6)
(array([0, 1]),)

For Users of MATLAB and IDL

  • NumPy uses 0 based indexing instead of 1 based like in MATLAB
  • NumPy is row first indexing like MATLAB and unlike the column first indexing in IDL
  • If using array class in NumPy then matrix multiplication is performed with the dot() function

Some MATLAB and IDL Equivalency

ndims(a) size(a, /N_DIMENSIONS) ndim(a) or a.ndim Get number of dimensions
size(a) size(a, /DIMENSIONS) shape(a) or a.shape Get size of dimensions
[1 2 3 ; 4 5 6] [[1, 2, 3], [4, 5, 6]] array([[1.,2.,3.], [4.,5.,6.]]) 2x3 array literal
a(2,5) a(4,1) a[1,4] Array element access
a(2,:) a(*,1) a[1,:] Array splicing
a.^3 a ^ 3 a**3 or power(a,3) Element-wise exponentiation
a * b a # b or b ## a dot(a,b) Array multiplication
a .* b a * b a * b Per element multiplication
a.' transpose(a) a.T or a.transpose() Array transpose
zeros(3,4) dblarr(3,4) zeros((3,4)) Array initialized to zeros
ones(3,4) dblarr(3,4)+1 ones((3,4)) Initialized to ones
eye(3) identity(3) eye(3) or identity(3) Identity matrix
find(a > 0.5) where(a > 0.5) where(a > 0.5) Finding indices
  • The above refers to using NumPy the array class. Multiplication will be different if using the matrix class.
  • See references for more examples


  • Object oriented Pythonic plotting and graphics library
  • Builds on top of NumPy
  • Aims to provide the ability to make publication quality plots
  • Multiple graphics back ends
    • On screen display, with various interfaces
    • File output: PostScript, PDF, SGV, PNG
  • pylab component provides MATLAB like functions
  • Can use LaTex expressions in text for mathematical expressions

matplotlib Simple Example

from pylab import *

t = arange(0.0, 2.0, 0.01)
s = sin(2*pi*t)
plot(t, s, linewidth=1.0)

xlabel('time (s)')
ylabel('voltage (mV)')
title('About as simple as it gets, folks')
Simple Plot Example

matplotlib Subplot Example

import matplotlib.pyplot as plt
import numpy as np

# Simple data to display in various forms
x = np.linspace(0, 2 * np.pi, 400)
y = np.sin(x ** 2)

# Two subplots, the axes array is 1-d
f, axarr = plt.subplots(2, sharex=True)
axarr[0].plot(x, y)
axarr[0].set_title('Sharing X axis')
axarr[1].scatter(x, y)
Subplots Example


  • Pythonic interface to HDF5 files
  • Builds on top of NumPy
  • Interact with HDF5 files as if they were Python and NumPy data structures

h5py Examples

>>> import h5py
>>> f = h5py.File('filename.hdf5','w')
>>> print f.filename
>>> f.create_dataset('identity', data=numpy.identity(5))
<HDF5 dataset "identity": shape (5, 5), type "<f8">
>>> f['identity']
<HDF5 dataset "identity": shape (5, 5), type "<f8">
>>> f['identity'][:]
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.]])
>>> f.keys()
>>> f.create_group("mydatagroup")
<HDF5 group "/mydatagroup" (0 members)>
>>> f['mydatagroup'].create_dataset('mystuff', (10,30)
<HDF5 dataset "mystuff": shape (10, 30), type "<f4">

Making Tools

  • These Python modules all build off each other
  • Easily leverage them to build custom tools

L2 Analysis Tool

  • Uses IPython, NumPy, Matplotlib and h5py
  • Provides an extended IPython shell
  • Loads OCO L2 output HDF files and correlates between files on the same data
  • Exposes a large number of analysis and plotting routines that already know how to use loaded data
  • Also built using metaprogramming to facilitate adding new functionally just be defining functions with desired dataset names in header

Loading the Tool

$ analyze_l2.py l2_geom_aggregate.h5
*** Launcing L2 Analysis Shell ***
Available analysis routines:
get_sv_apriori, get_sv_names, get_sv_result
plot_abo2_radiance_scaling_slope_time, plot_abo2_chi2_sza, plot_abo2_rms_time,
  plot_abo2_radiance_scaling_sza, plot_abo2_rad_mean_uncert_time,
  plot_abo2_rad_mean_time, plot_abo2_radiance_scaling_slope_sza,
  plot_abo2_albedo_ap_ret_corr, plot_abo2_albedo_sza, plot_abo2_chi2_time,
  plot_abo2_rms_sza, plot_abo2_albedo_time, plot_abo2_radiance_scaling_time
plot_aerosol_ice_sza, plot_aerosol_water_sza, plot_aerosol_kahn_2b_time,
  plot_aerosol_kahn_2b_sza, plot_aerosol_water_time, plot_aerosol_ice_time,
  plot_aerosol_kahn_3b_sza, plot_aerosol_kahn_3b_time
plot_psurf_delta_bar_std, plot_psurf_sza, plot_psurf_ap_sza, plot_psurf_ap_time,
  plot_psurf_time, plot_psurf_delta_bar_mean
plot_sco2_albedo_time, plot_sco2_albedo_sza, plot_sco2_chi2_sza,
  plot_sco2_chi2_time, plot_sco2_radiance_scaling_time, plot_sco2_rms_time,
  plot_sco2_rms_sza, plot_sco2_albedo_ap_ret_corr,
  plot_sco2_radiance_scaling_sza, plot_sco2_radiance_scaling_slope_sza,
  plot_sco2_radiance_scaling_slope_time, plot_sco2_rad_mean_time,
plot_wco2_chi2_time, plot_wco2_rms_sza, plot_wco2_chi2_sza,
  plot_wco2_albedo_ap_ret_corr, plot_wco2_rad_mean_uncert_time,
  plot_wco2_radiance_scaling_slope_time, plot_wco2_radiance_scaling_slope_sza,
  plot_wco2_rms_time, plot_wco2_radiance_scaling_sza,
  plot_wco2_radiance_scaling_time, plot_wco2_albedo_sza, plot_wco2_albedo_time,
plot_windspeed_time, plot_windspeed_sza
plot_xco2_time, plot_xco2_bar_std, plot_xco2_sza, plot_xco2_bar_mean

L2A: In <1>:

Example Plot

L2A: In <1>: plot_xco2_time()
L2A: Out<1>: <matplotlib.axes.AxesSubplot at 0xeebe7d0>
Plot of XCO2

Integrating Large Fortran Program

  • We have several large Fortan 90 packages provided by external partners
  • They have fairly large number of complicated inputs, large number of derived types
  • Their interfaces often change slightly between versions
  • These packages are in the critical path time wise

Code Generation

  • Built using G3 F2PY, in development version beyond what is in SciPy
  • Parses Fortan code using F2PY
  • Uses Python Jinja templating engine to define interface to be written
  • Creates C++ interface that exposes Fortran on to C++ side through iso_c_bindings and Blitz++ arrays with no copying
  • Presents Fortran user defined types (like structs) as C++ classes
  • Somewhat generic but built for type of interface our partner provides: lots of types that are initialized and passed to subroutines

f2py (SciPy version)

  • Create interfaces between Python and Fortran
    • Automatically
    • With hand written interface files
  • Included with SciPy
  • Takes care of difference in data storage order
  • Can not automatically generate interfaces for complicated routines, end up creating intermediary routines
  • No support for Fortran user defined types
  • Can not take advantage of iso_c_bindings in Fortran 2003

f2py example

Given a simple source file:

subroutine super_fast_add(a, b, c)
     integer, intent(in)  :: a
     integer, intent(in)  :: b
     integer, intent(out) :: c
     c = a + b
end subroutine

Create and compile the interface module in one command:

$ f2py -c adder.f90 -m adder

Use from Python

>>> import addder
>>> print adder.super_fast_add(1,1)


  • CFFI = Common Foreign Function Interface
  • Provides an interface to calling C libraries from Python
  • Keep all Python logic in Python without writing interface files
  • Can work at the API (Application Programming Interface) or ABI (Application Binary Interface)
  • Can also be used with Fortan code at the ABI level thanks to iso_c_binding
  • Easy to use with distutils for distributing software using CFFI

Example Using CFFI with C

Straight out of CFFI documentation

from cffi import FFI
ffi = FFI()
ffi.cdef("""     // some declarations from the man page
    struct passwd {
        char *pw_name;
    struct passwd *getpwuid(int uid);
C = ffi.verify("""   // passed to the real C compiler
#include <sys/types.h>
#include <pwd.h>
""", libraries=[])   # or a list of libraries to link with
p = C.getpwuid(0)
assert ffi.string(p.pw_name) == 'root'

Example Using CFFI with Fortran

subroutine super_fast_add(a, b, c) bind (C, name="fortran_add_test")

     use iso_c_binding
     integer(kind=c_int), intent(in)  :: a
     integer(kind=c_int), intent(in)  :: b
     integer(kind=c_int), intent(out) :: c
     c = a + b
end subroutine
$ gfortran -shared adder_isoc.f90 -o adder_isoc.so
>>> from cffi import FFI
>>> ffi = FFI()
>>> ffi.cdef("void fortran_add_test(int *a, int *b, int *c);")
>>> F = ffi.dlopen("adder_isoc.so")
>>> a = ffi.new("int*", 2)
>>> b = ffi.new("int*", 2)
>>> c = ffi.new("int*")
>>> F.fortran_add_test(a, b, c)
>>> print c[0]

Other Possibilities for Interfacing

  • Cython - http://www.cython.org/
    • Works at the API level
    • Requires definition of interface files
    • You intertwine Python and C in interface files which are a superset of Python
  • ctypes - http://docs.python.org/2/library/ctypes.html
    • Works at ABI Level
    • Foreign function library for Python
    • Loads shared libraries and sets up ability to call them
    • Built into Python

Other Possibilities for Interfacing

  • Weave - http://www.scipy.org/Weave
    • Allows C code to be inlined inside of Python
    • Compiles extension modules on the fly
    • Modules are cached for subsequent use
    • Built into SciPy

Other Useful Packages