Python for Scientific Computing

James McDuffie

ipython_logo.png scipy_logo.png matplotlib_logo.png

Presenter Notes

Why Python for Scientific Computing?

  • Using these modules you have access to an environment that rivals MATLAB, IDL, etc
  • All software is open source and free to use, no licenses involved
  • Can translate your interactive work into scripts and then into production software
  • Use Python with compiled code to gain speed advantages with advantages of Python's flexibility
  • Python is a general purpose language, not a math tool with a language added to it

Presenter Notes

Major Modules

  • IPython
  • Numpy and SciPy
  • Matplotlib
  • h5py

Presenter Notes

IPython

  • Interactive shell giving capabilities beyond the default Python REPL
  • Most useful with plotting libraries such as matplotlib
  • Now has browser-based notebooks ala MATLAB notebooks
  • Can be embedded in another program
  • Interactive parallel computation tools

REPL = Read-eval-print loop

Presenter Notes

IPython

  • Built in help
    • Get help by appending ? to functions
    • %pdoc, %pdef, %psource, %pfile
  • Command history
    • Can be logged to a file
    • Repeat commands similar to in bash
  • Access to shell commands through command line
  • Bookmarks
  • Variables from %run scripts available in command line
  • Tab completion

Presenter Notes

Numpy

  • Multidimensional array objects
    • Many operations implemented in compiled code for speed
    • Allowing use vectorized code
  • Various routines for fast operation on arrays
    • Basic mathematical operations
    • Shape manipulation, sorting, selecting
    • I/O - Reading and writing to disk
    • discrete Fourier transformations
    • Basic linear algebra
    • Basic statistical operations
    • Random simulations

Presenter Notes

SciPy

  • Collections of additional mathematical algorithms built on top of NumPy
  • Built on many existing C, C++ and Fortran math packages
Subpackage Description
cluster Clustering algorithms
constants Physical and mathematical constants
fftpack Fast Fourier Transform routines
integrate Integration and ordinary differential equation solvers
interpolate Interpolation and smoothing splines
io Input and Output
linalg Linear algebra
ndimage N-dimensional image processing
odr Orthogonal distance regression
optimize Optimization and root-finding routines
signal Signal processing
sparse Sparse matrices and associated routines
spatial Spatial data structures and algorithms
special Special functions
stats Statistical distributions and functions
weave C/C++ integration

Presenter Notes

Numpy Examples

>>> from numpy import *
>>> x = array([[1,2,3],[4,5,6]])
>>> x.shape
(2, 3)
>>> y = array([[1,2],[3,4],[5,6]])
>>> y.shape
(3, 2)
>>> dot(x,y) # matrix multiplication (2,3) x (3,2) -> (2,2)
array([[22, 28],
       [49, 64]])

>>> x.sum()
21

>>> a = array([[1,2,7],[4,9,6]])
>>> a.mean()
4.833333333333333

>>> a = array([3,5,7,9])
>>> where(a <= 6)
(array([0, 1]),)

Presenter Notes

For Users of MATLAB and IDL

  • NumPy uses 0 based indexing instead of 1 based like in MATLAB
  • NumPy is row first indexing like MATLAB and unlike the column first indexing in IDL
  • If using array class in NumPy then matrix multiplication is performed with the dot() function

Presenter Notes

Some MATLAB and IDL Equivalency

MATLAB IDL NumPy Notes
ndims(a) size(a, /N_DIMENSIONS) ndim(a) or a.ndim Get number of dimensions
size(a) size(a, /DIMENSIONS) shape(a) or a.shape Get size of dimensions
[1 2 3 ; 4 5 6] [[1, 2, 3], [4, 5, 6]] array([[1.,2.,3.], [4.,5.,6.]]) 2x3 array literal
a(2,5) a(4,1) a[1,4] Array element access
a(2,:) a(*,1) a[1,:] Array splicing
a.^3 a ^ 3 a**3 or power(a,3) Element-wise exponentiation
a * b a # b or b ## a dot(a,b) Array multiplication
a .* b a * b a * b Per element multiplication
a.' transpose(a) a.T or a.transpose() Array transpose
zeros(3,4) dblarr(3,4) zeros((3,4)) Array initialized to zeros
ones(3,4) dblarr(3,4)+1 ones((3,4)) Initialized to ones
eye(3) identity(3) eye(3) or identity(3) Identity matrix
find(a > 0.5) where(a > 0.5) where(a > 0.5) Finding indices
  • The above refers to using NumPy the array class. Multiplication will be different if using the matrix class.
  • See references for more examples

Presenter Notes

matplotlib

  • Object oriented Pythonic plotting and graphics library
  • Builds on top of NumPy
  • Aims to provide the ability to make publication quality plots
  • Multiple graphics back ends
    • On screen display, with various interfaces
    • File output: PostScript, PDF, SGV, PNG
  • pylab component provides MATLAB like functions
  • Can use LaTex expressions in text for mathematical expressions

Presenter Notes

matplotlib Simple Example

from pylab import *

t = arange(0.0, 2.0, 0.01)
s = sin(2*pi*t)
plot(t, s, linewidth=1.0)

xlabel('time (s)')
ylabel('voltage (mV)')
title('About as simple as it gets, folks')
grid(True)
savefig("test.png")
show()
simple_plot.png

Presenter Notes

matplotlib Subplot Example

import matplotlib.pyplot as plt
import numpy as np

# Simple data to display in various forms
x = np.linspace(0, 2 * np.pi, 400)
y = np.sin(x ** 2)

# Two subplots, the axes array is 1-d
f, axarr = plt.subplots(2, sharex=True)
axarr[0].plot(x, y)
axarr[0].set_title('Sharing X axis')
axarr[1].scatter(x, y)
subplots_demo_01.png

Presenter Notes

h5py

  • Pythonic interface to HDF5 files
  • Builds on top of NumPy
  • Interact with HDF5 files as if they were Python and NumPy data structures

Presenter Notes

h5py Examples

>>> import h5py
>>> f = h5py.File('filename.hdf5','w')
>>> print f.filename
filename.hdf5
>>> f.create_dataset('identity', data=numpy.identity(5))
<HDF5 dataset "identity": shape (5, 5), type "<f8">
>>> f['identity']
<HDF5 dataset "identity": shape (5, 5), type "<f8">
>>> f['identity'][:]
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.]])
>>> f.keys()
[u'identity']
>>> f.create_group("mydatagroup")
<HDF5 group "/mydatagroup" (0 members)>
>>> f['mydatagroup'].create_dataset('mystuff', (10,30)
<HDF5 dataset "mystuff": shape (10, 30), type "<f4">

Presenter Notes

Making Tools

  • These Python modules all build off each other
  • Easily leverage them to build custom tools

Presenter Notes

L2 Analysis Tool

  • Uses IPython, NumPy, Matplotlib and h5py
  • Provides an extended IPython shell
  • Loads OCO L2 output HDF files and correlates between files on the same data
  • Exposes a large number of analysis and plotting routines that already know how to use loaded data
  • Also built using metaprogramming to facilitate adding new functionally just be defining functions with desired dataset names in header

Presenter Notes

Loading the Tool

$ analyze_l2.py l2_geom_aggregate.h5
*** Launcing L2 Analysis Shell ***
Available analysis routines:
cloud_flag
filter_cloud_free
filter_outcome
filter_solar_zenith
filter_surface_type
filter_tccon_site
get_data
get_sv_apriori, get_sv_names, get_sv_result
new_axis
outcome
plot_abo2_radiance_scaling_slope_time, plot_abo2_chi2_sza, plot_abo2_rms_time,
  plot_abo2_radiance_scaling_sza, plot_abo2_rad_mean_uncert_time,
  plot_abo2_rad_mean_time, plot_abo2_radiance_scaling_slope_sza,
  plot_abo2_albedo_ap_ret_corr, plot_abo2_albedo_sza, plot_abo2_chi2_time,
  plot_abo2_rms_sza, plot_abo2_albedo_time, plot_abo2_radiance_scaling_time
plot_aerosol_ice_sza, plot_aerosol_water_sza, plot_aerosol_kahn_2b_time,
  plot_aerosol_kahn_2b_sza, plot_aerosol_water_time, plot_aerosol_ice_time,
  plot_aerosol_kahn_3b_sza, plot_aerosol_kahn_3b_time
plot_div_histogram
plot_iter_histogram
plot_psurf_delta_bar_std, plot_psurf_sza, plot_psurf_ap_sza, plot_psurf_ap_time,
  plot_psurf_time, plot_psurf_delta_bar_mean
plot_quality_flag
plot_radiance_scaling
plot_sco2_albedo_time, plot_sco2_albedo_sza, plot_sco2_chi2_sza,
  plot_sco2_chi2_time, plot_sco2_radiance_scaling_time, plot_sco2_rms_time,
  plot_sco2_rms_sza, plot_sco2_albedo_ap_ret_corr,
  plot_sco2_radiance_scaling_sza, plot_sco2_radiance_scaling_slope_sza,
  plot_sco2_radiance_scaling_slope_time, plot_sco2_rad_mean_time,
  plot_sco2_rad_mean_uncert_time
plot_spectral_fits
plot_statevector_diff_trend
plot_wco2_chi2_time, plot_wco2_rms_sza, plot_wco2_chi2_sza,
  plot_wco2_albedo_ap_ret_corr, plot_wco2_rad_mean_uncert_time,
  plot_wco2_radiance_scaling_slope_time, plot_wco2_radiance_scaling_slope_sza,
  plot_wco2_rms_time, plot_wco2_radiance_scaling_sza,
  plot_wco2_radiance_scaling_time, plot_wco2_albedo_sza, plot_wco2_albedo_time,
  plot_wco2_rad_mean_time
plot_windspeed_time, plot_windspeed_sza
plot_xco2_time, plot_xco2_bar_std, plot_xco2_sza, plot_xco2_bar_mean
quality_flag
surface_type
tccon_sites

L2A: In <1>:

Presenter Notes

Example Plot

L2A: In <1>: plot_xco2_time()
L2A: Out<1>: <matplotlib.axes.AxesSubplot at 0xeebe7d0>
xco2_plot.png

Presenter Notes

Integrating Large Fortran Program

  • We have several large Fortan 90 packages provided by external partners
  • They have fairly large number of complicated inputs, large number of derived types
  • Their interfaces often change slightly between versions
  • These packages are in the critical path time wise

Presenter Notes

Code Generation

  • Built using G3 F2PY, in development version beyond what is in SciPy
  • Parses Fortan code using F2PY
  • Uses Python Jinja templating engine to define interface to be written
  • Creates C++ interface that exposes Fortran on to C++ side through iso_c_bindings and Blitz++ arrays with no copying
  • Presents Fortran user defined types (like structs) as C++ classes
  • Somewhat generic but built for type of interface our partner provides: lots of types that are initialized and passed to subroutines

Presenter Notes

f2py (SciPy version)

  • Create interfaces between Python and Fortran
    • Automatically
    • With hand written interface files
  • Included with SciPy
  • Takes care of difference in data storage order
  • Can not automatically generate interfaces for complicated routines, end up creating intermediary routines
  • No support for Fortran user defined types
  • Can not take advantage of iso_c_bindings in Fortran 2003

Presenter Notes

f2py example

Given a simple source file:

subroutine super_fast_add(a, b, c)
     integer, intent(in)  :: a
     integer, intent(in)  :: b
     integer, intent(out) :: c
     c = a + b
end subroutine

Create and compile the interface module in one command:

$ f2py -c adder.f90 -m adder

Use from Python

>>> import addder
>>> print adder.super_fast_add(1,1)
2

Presenter Notes

cffi

  • CFFI = Common Foreign Function Interface
  • Provides an interface to calling C libraries from Python
  • Keep all Python logic in Python without writing interface files
  • Can work at the API (Application Programming Interface) or ABI (Application Binary Interface)
  • Can also be used with Fortan code at the ABI level thanks to iso_c_binding
  • Easy to use with distutils for distributing software using CFFI

Presenter Notes

Example Using CFFI with C

Straight out of CFFI documentation

from cffi import FFI
ffi = FFI()
ffi.cdef("""     // some declarations from the man page
    struct passwd {
        char *pw_name;
        ...;
    };
    struct passwd *getpwuid(int uid);
""")
C = ffi.verify("""   // passed to the real C compiler
#include <sys/types.h>
#include <pwd.h>
""", libraries=[])   # or a list of libraries to link with
p = C.getpwuid(0)
assert ffi.string(p.pw_name) == 'root'

Presenter Notes

Example Using CFFI with Fortran

subroutine super_fast_add(a, b, c) bind (C, name="fortran_add_test")

     use iso_c_binding
     integer(kind=c_int), intent(in)  :: a
     integer(kind=c_int), intent(in)  :: b
     integer(kind=c_int), intent(out) :: c
     c = a + b
end subroutine
$ gfortran -shared adder_isoc.f90 -o adder_isoc.so
>>> from cffi import FFI
>>> ffi = FFI()
>>> ffi.cdef("void fortran_add_test(int *a, int *b, int *c);")
>>> F = ffi.dlopen("adder_isoc.so")
>>> a = ffi.new("int*", 2)
>>> b = ffi.new("int*", 2)
>>> c = ffi.new("int*")
>>> F.fortran_add_test(a, b, c)
>>> print c[0]
4

Presenter Notes

Other Possibilities for Interfacing

  • Cython - http://www.cython.org/
    • Works at the API level
    • Requires definition of interface files
    • You intertwine Python and C in interface files which are a superset of Python
  • ctypes - http://docs.python.org/2/library/ctypes.html
    • Works at ABI Level
    • Foreign function library for Python
    • Loads shared libraries and sets up ability to call them
    • Built into Python

Presenter Notes

Other Possibilities for Interfacing

  • Weave - http://www.scipy.org/Weave
    • Allows C code to be inlined inside of Python
    • Compiles extension modules on the fly
    • Modules are cached for subsequent use
    • Built into SciPy

Presenter Notes

Other Useful Packages

Presenter Notes

References

Presenter Notes