Introduction to Python

From CSDMS

Introduction

At the 2012 NCED Summer Institute in Earth Surface Dynamics (SIESD), we realized that there is a large community desire to transition to open source programming languages and tools, but that a roadblock is the lack of a coherent set of information to allow individual scientists to act on these desires without needing to invest a large amount of time. Here I will introduce programming in Python, a fairly intuitive open-source interpreted programming language, and some basic finite difference numerical methods through a simple example of a linear-diffusive hillslope.

Python

What is it?

Python is a general-purpose, high-level, interpreted programming language. This means that:

  • General-purpose: You can use Python for data analysis, numerical modeling, mapping and GIS, web design, interfacing with computer hardware, plotting and graphics, database management, image processing, and certainly lots of additional applications that scientists do not often have need to use. These applications are embodied in specific modules that can be imported into your Python script.
  • High-level: Python uses array operations and pre-constructed functions to allow you to write your code in many fewer lines than would be required in a low-level language like C.
  • Interpreted: Some languages use a compiler to convert what you write into bytecode that can be interpreted by the computer. Python is "interpreted", meaning that it calls pre-compiled chunks of code that are combined to run the program. Other common interpreted languages are Bash and Matlab. Interpreted languages are easier and faster, but can be slower for computationally-intensive applications. However, your own compiled C code can be included in Python scripts for these applications, and the ease of use is a huge advantage!

Downloading and Installing

We will be using Python 2.7, the Scipy and Numpy packages that include tools for many common scientific and numerical applications and n-dimensional arrays, and the Matplotlib package that comprises a range of plotting tools.

I also highly recommend:

  • IPython, a high-quality python shell (terminal window with syntax highlighting, tab-completion, etc.)
  • A nice text editor that does syntax highlighting: gedit for any OS (though most commonly on Linux), notepad++ for Windows, and TextWrangler for Mac all seem good choices.

In addition, some colleagues of mine really like Spyder, which is a all-inclusive interface to edit and execute Python code, and is reminiscent of Matlab.

The following sections contain instructions to install these on the three major operating systems:

Windows

This is a cross-platform guide to installing the enthought Python distribution. Haven't tested it yet.

Mac

Probably best to use Anaconda distribution, which you can download from: https://www.anaconda.com/distribution/

Linux

In Linux, the easiest way to install Python and its packages is with the package manager. This, as you probably already know, installs the required tools and all of their dependencies (programs that must be installed in order that the tool that you want to use works properly). I will be providing instructions for Ubuntu, but these should be easily adaptable to Debian and fairly similar to those for Red Hat / Fedora.

sudo apt-get install python2.7 python-numpy python-scipy python-matplotlib

You will have to enter your password for these programs to be installed.

I like the default gedit, but you may have your own favorite syntax-highlighting text editor (e.g., Kate).

To obtain IPython and Spyder, you type, respectively:

sudo apt-get install ipython
sudo apt-get install spyder

Nice and simple!

(As an alternative for those who run Ubuntu and like graphical interfaces, just open the Ubuntu Software Center and click "install" on Python, Numpy, Scipy, Matplotlib, IPython, and Spyder.)

Helpful tools and information

  • iPython
  • Spyder
  • Mathesaurus Matlab-Numpy

How to Write a Model in Python

Theory

Numerical models are fundamentally a computer-readable way to interpret rules that we have provided about how the natural world should work. Here, we are creating a simple model of a hillslope profile with rivers incising on either end. We don't worry about regolith development or any specific hillslope processes. In our case, these rules take the form of partial differential equations. It is extremely important to have a full grasp of and—insofar as it is possible—analytical solution for these equations before beginning.

Here, we make the simple assumption that regolith discharge is a linear function of slope:

[math]\displaystyle{ Q = - k S }[/math]

Here, [math]\displaystyle{ Q }[/math] is regolith discharge [square meters per year], [math]\displaystyle{ S }[/math] is slope [unitless], and [math]\displaystyle{ k }[/math] is a constant of proportionality that scales slope to discharge. The negative sign is present because we want positive discharge where there are negative (i.e. downhill) slopes (or, in other words, elevation decreasing as [math]\displaystyle{ x }[/math]-distance increases).

We want all of our equations in terms of elevation [math]\displaystyle{ z }[/math], and distance across the hillslope profile, [math]\displaystyle{ x }[/math]. Substituting these into [math]\displaystyle{ S }[/math] gives us:

[math]\displaystyle{ Q = - k \frac{dz}{dx} }[/math]

Hillslope evolution occurs when we combine these discharges to conserve mass. We use a continuity equation for conservation of volume to state that:

[math]\displaystyle{ \frac{\partial z}{\partial t} = - \frac{\partial Q}{\partial x} }[/math]

Combining these two terms together gives us the diffusion equation:

[math]\displaystyle{ \frac{\partial z}{\partial t} = \frac{\partial}{\partial x} \left(k \frac{\partial z}{\partial x}\right) }[/math]

If we assume that the hillslope diffusivity, [math]\displaystyle{ k }[/math], is uniform across the landscape, we obtain the diffusion equation in its more familiar form:

[math]\displaystyle{ \frac{\partial z}{\partial t} = k \frac{\partial^2 z}{\partial x^2} }[/math]

Source code

First, I will write the whole source code. Then I will break it down into sections where I explain each component. Then I will present a new model that uses an implicit matrix-style solution instead of a foward-time-stepping solution, and write about numerical (finite difference) methods.

#! /usr/bin/python

# Import modules
import numpy as np
from matplotlib import pyplot as plt

"""
Copyright 2012 Andrew D. Wickert

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
"""

################################################################################
# hill.py
# 
# HILLSLOPE PROFILE
# 
# Center of hill is at 0, two incising channels at steady rate at edges
# no channel-hillslope feedback and assuming entire hill is made of mobile 
# regolith
# 
# This is a very simple example code in which variables are defined inline, 
# and the equations are solved by forward difference methods without 
# consideration for stability
# 
# Written by Andrew D. Wickert; August, 2013
################################################################################

# We will have the simple boundary condition of rivers on either end of the 
# hillslope that are incising at the same, steady rate.
zdot_channel = 2E-4 # [m], 0.5 mm/yr boundary incision

# time
dt = 100. # [years], the time step (from guessing, not from stability analysis)
t_final = 4E5 # [years], the final time for hillslope evolution
timesteps = np.arange(0,t_final + dt/10.,dt) # [years]

# Set up domain
dx = 10 # [m]
xmax = 500 # [m]
x = np.arange(-xmax, xmax + dx/10., dx) # [m], +dx/10. to make sure that edges are included
z = t_final * zdot_channel * np.ones(x.shape) # [m], elevation in meters - set such that the edges are 0 at t_final

# We're using a very simplified assumption that the rate of hillslope 
# material transport is linearly proportional to local slope, and that this 
# constant of proportionality is constant across the hill
k = 5E-3 # Slope -- soil flux scaling

# Loop through time and evolve elevation
for t in timesteps:
  # The channels set the boundaries
  z[0] -= zdot_channel * dt
  z[-1] -= zdot_channel * dt
  # for np.diff, out[n] = a[n+1] - a[n]
  # We are calculating the slopes between each of the cells, then using 
  # these to solve for discharge of material between interior cells.
  S = np.diff(z) # [-] slope
  Q = -k*S # [m**2/yr] discharge, goes downslope, hence the negative sign # POSITIVE - WHY? CANCELLED OUT AGAIN!
  # The change in internal elevation, due to conservation of mass, is equal 
  # once again to the x-defivative
  dzdt_interior = np.diff(Q)
  z[1:-1] -= dzdt_interior * dt
  # (It would have been quicker to take both derivatives at once, but would be 
  # somewhat less intuitive for descriptive purposes)

# Plot
plt.plot(x,z,'k-',linewidth=3)
plt.title('Final hillslope profile', fontsize=20, weight='bold')
plt.xlabel('Distance across hillside [m]', fontsize=16)
plt.ylabel('Elevation [m]', fontsize=16)
plt.xlim((x[0],x[-1]))
plt.show()

The "Shebang"

This script starts with:

#! /usr/bin/python

This is a "shebang" that tells the program where to look for the Python interpreter on Unix-like systems (the most popular being Mac and Linux, the former more popular for desktop applications and the latter more popular for supercomputers). You can find where Python is located by typing:

which python

at the Unix command prompt (i.e. terminal).

Module Import (and Namespaces)

# Import modules
import numpy as np
from matplotlib import pyplot as plt

Python is a general-purpose language, which means that for specific functionalities, you have to import modules. These modules are sets of functions that share a namespace (np and plt, respectively). Numpy does general mathematical and n-dimensional array operations, and Pyplot does basic plotting. "Sharing a namespace" means that to call functions within these modules, we must use the np and plt prefixes, e.g., to plot a parabola:

a = np.array([[0,1,2,3,4],[0,1,4,9,16]]) # Create an array with 2 rows and 3 columns - nested sets of brackets add dimensions to the array
plt.plot(a[0,:], a[1,:], 'ko-') # Plot the first row of "a" as ''x''-values and the second row as ''y''-values with a dashed line between points
plt.show() # Show the plot in a display screen

Namespaces are important, because it means that you cannot easily accidentally overwrite the name of a function by giving a variable that name. For example, typing "np.mean = 1.5" seems a less-likely accident than writing "mean = 1.5".

Copyright Notice

The aforementioned code is trivial, but lately (2012) it seems that big, ugly patent wars are being fought over trivial chunks of code. So better to be safe than sorry. Below is text included for version 3 of the GNU General Public License (GPL). This is an open-source "copyleft" license, meaning that if this code is included in a project, the project must be open-source with a "copyleft" license as well. This is a very good license for science, which is a community-driven collaborative effort. To give your code the GNU GPL v3, include the following text (though replacing my name with yours in the copyright, unless you are feeling generous).

In order for your code to be included in CSDMS, you need to include some type of open-source license, even if it is not as strong a license as the GPL v3.

"""
Copyright 2012 Andrew D. Wickert

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
"""

On a technical note, the triple quotes at the beginning and end of the block are used in Python to create blocks of text as strings. These can be used later inside functions, blocks of code designed for a single purpose, to provide accessible documentation. For now, since we are not assigning this string anywhere, it simply functions as a comment.

Introductory Comments and Authorship

This section is huge overkill for this short of a model, but I put it in here to be complete.

################################################################################
# hill.py
# 
# HILLSLOPE PROFILE
# 
# Center of hill is at 0, two incising channels at steady rate at edges
# no channel-hillslope feedback and assuming entire hill is made of mobile 
# regolith
# 
# This is a very simple example code in which variables are defined inline, 
# and the equations are solved by forward difference methods without 
# consideration for stability
# 
# Written by Andrew D. Wickert; August, 2013
################################################################################

Initialization

# We will have the simple boundary condition of rivers on either end of the 
# hillslope that are incising at the same, steady rate.
zdot_channel = 2E-4 # [m], 0.5 mm/yr boundary incision

# time
dt = 100. # [years], the time step (from guessing, not from stability analysis)
t_final = 4E5 # [years], the final time for hillslope evolution
timesteps = np.arange(0,t_final + dt/10.,dt) # [years]

# Set up domain
dx = 10 # [m]
xmax = 500 # [m]
x = np.arange(-xmax, xmax + dx/10., dx) # [m], +dx/10. to make sure that edges are included
z = t_final * zdot_channel * np.ones(x.shape) # [m], elevation in meters - set such that the edges are 0 at t_final

# We're using a very simplified assumption that the rate of hillslope 
# material transport is linearly proportional to local slope, and that this 
# constant of proportionality is constant across the hill
k = 5E-3 # Slope -- soil flux scaling

Main Loop

# Loop through time and evolve elevation
for t in timesteps:
  # The channels set the boundaries
  z[0] -= zdot_channel * dt
  z[-1] -= zdot_channel * dt
  # for np.diff, out[n] = a[n+1] - a[n]
  # We are calculating the slopes between each of the cells, then using 
  # these to solve for discharge of material between interior cells.
  S = np.diff(z) # [-] slope
  Q = -k*S # [m**2/yr] discharge, goes downslope, hence the negative sign # POSITIVE - WHY? CANCELLED OUT AGAIN!
  # The change in internal elevation, due to conservation of mass, is equal 
  # once again to the x-defivative
  dzdt_interior = np.diff(Q)
  z[1:-1] -= dzdt_interior * dt
  # (It would have been quicker to take both derivatives at once, but would be 
  # somewhat less intuitive for descriptive purposes)

Plotting

# Plot
plt.plot(x,z,'k-',linewidth=3)
plt.title('Final hillslope profile', fontsize=20, weight='bold')
plt.xlabel('Distance across hillside [m]', fontsize=16)
plt.ylabel('Elevation [m]', fontsize=16)
plt.xlim((x[0],x[-1]))
plt.show()