Code Tutorials


Much of modern science relies on practical computing, and Chemistry is no exception.

Dr. Bonham’s programming language of choice is Python, which is one of the easiest, powerful languages to learn. He’s primarily self-taught from an excellent (and free) book on the subject, A Byte of Python.

Below is a heavily commented and explained snippet of code that performs a non-linear least squares regression to an arbitrary model and evaluates the goodness of fit for a set of data. In other words, it checks how well your data matches some equation that you think it should fit.

It’s a good program to give you a feel for how easily you can achieve things in Python that would be difficult or impossible in Excel, or without using a very expensive fitting package.

To use this code, you’ll need to have Python and a few helper packages on your system.

If you use Windows, I recommend ActivePython 2.7, 32-bit
If you use OS X, I recommend the Python available through Mac Ports
You’ll also need Numerical Python, Scientific Python, and Matplotlib, all for the 2.7 and 32-bit version of python (all also available directly through Mac Ports on OS X).

Once all that is installed, you can run the user-friendly code editing program IDLE.
In IDLE, paste all the below code into a New file. Save it, hit Run in the menus, and it will do its stuff!

So, here’s the code for Non-Linear Fitting: Python for Chemists

## An introduction / explanatory python program for Chemists
## ---------------------------------------------------------
## A common situation you might find yourself in is trying to fit data to a
## non-linear model.  Doing this in Excel can be a pain, and proprietary
## solutions can cost hundreds of dollars.
##
## Instead, we can use python, a free and open source programming language.
## This is an annotate program, explaining what is going on, and how to solve
## problems using python.
##
## In Python, anything following an # is ignored by the computer, so we can
## put all of our comments and explanations behind them.
 
## ---------------------------------------------------------------------------
## First, we will add additional functionality to Python that allows us to do curve 
## fitting.  These modules of additional tools are also available for free.
 
import numpy	# This loads Numerical Python, which is good for math
from scipy.optimize import leastsq    # This loads optimization routines from
                                      # Scientific Python
import pylab	# This loads Math Plotting functions, to make pretty graphs
 
## ----------------------------------------------------------------------------
## Next, we will get into the core of the program.  In programming, you define
## "Functions", or sets of steps to follow.  You can then use them as a shortcut,
## telling the computer to "Make a Sandwich", rather then telling it to "get out bread,
## then get out a knife, etc."
##
## We will need to set up the computer to receive x and y data from the user, then 
## perform a non-linear least-squares regression fit of the data.  Finally, we will tell
## it how to make pretty graphs of the data for the user to see.
 
## ------------------------------------------------------------------------------
## Let's start with a function/shortcut to collect user data.
 
def GetData():  # this line tells the computer that the shorthand for all these steps
				# is called "GetData".  When I later type "GetData()", I mean
				# "do all this stuff".
 
	print("Non-Linear Fitting for Dose-Response Data\n")	# this shows a nice title on
								# the screen to start
 
	input_x = raw_input("Type X values, comma seperated, then hit return\n")
				# this line does a lot!  It prints a prompt for X data to the user,
				# then grabs that data and saves it as "input_x"
				# the user will input data, so it will receive something like:
				# 1,2,3,4,5
 
	input_y = raw_input("Type Y values, comma seperated, then hit return\n")
				# this line then does the same for our Y data
 
	# Now we have a problem.  The data the user typed is still just text to the computer.
	# We need to let the computer know that it is a list of numbers.  That means taking
	# out the commas and storing each value into an array or list (like a shopping list). 
 
	x = numpy.array(input_x.split(','),dtype=float)
		 	# turns the X values (i.e. "1,2,3,4,5") into an array, which
			# we represent as [1,2,3,4,5] , although it actually has
			# no commas-- we could equally well describe it as 
			# [ 1 2 3 4 5]
 
	y = numpy.array(input_y.split(','),dtype=float)
			# this does the same thing for Y values, turning 
			# "10,15,40,80,90" into [ 10 15 40 80 90]
 
	# Now, we have nice arrays of our data.  The next step is to pass this data to
	# a new shortcut/function which will do all the math.
	# It's considered good form to keep all these parts separate, so if you need to
	# change one, you don't have to worry about changing everything.
 
	FittingMath(x,y)	# this calls a new shortcut and hands it our arrays of data
 
	return				# this tells the computer that we're at the end of this shortcut,
					# and it can do something else.
 
## --------------------------------------------------------------------------------
## Next, we'll write a function that tells the computer how to fit the data to a model
 
def FittingMath(x,y):		# this lines tells the computer this shortcut is named
							# FittingMath, and that it will get handed a set of x and y
							# data to act on
 
	# For this example, we'll be fitting dose-response data-- like an increase in 
	# fluorescence as two chemicals react in a concentration-dependent manner
 
	# The first step is telling the computer the equation that the data "should" fit:
	# We'll use a 3 parameter dose response curve, usually called Sigmoidal Dose
	# Response.  It fits the model:
	# Y = Bottom + (Top - Bottom) / (1 + 10^(Log(KD)-X) ) and looks like an "S"
 
	equation = lambda variables,x:	(# this line defines an equation that is a function
                			# of x and a set of variables 
 
				variables[0]  +	# the 0th piece of data in the set of variables is
									# the Bottom value
				(variables[1]-variables[0])	/	# the 1st piece of data in variables 
									# is the Top value
				(1+10**((variables[2]-x))))	        # the 2nd piece of data is the log(KD) value
 
	# Next, we tell the computer how to measure the "goodness" of the fit:  by how close
	# it's guess is to actual values.  We call that the error of the fit.
 
	error = lambda variables,x,y: (	# Error as an equation needs our variables, x, and y
		 equation(variables,x)		# It is simply our predicted value
			-y					)	# minus the actual value
 
	# for this to work, we must give initial guesses for all of our variables
 
	variable_guesses = [	# this is a list of our guesses for the initial variables	
		numpy.min(y),		# Bottom will be the minimum y value
		numpy.max(y),		# Top will be the maximum Y value
		numpy.mean(x)	]	# and Log(Kd) will be the average X value
 
	# Lastly, now that the computer knows the equation, our guesses, and how to look for
	# errors, we can call the function that will fit the data, and grab all of the output
 
 
 
	output = leastsq(                   # We'll use a routine for least-squares regression
					error, 			# using our error equation
					variable_guesses, 	# our initial variable guesses
					args=(x,y),		# and our X and Y data
					full_output=1 ) 	# and getting lots of information back
 
	# Then, we'll process the out put to:
	variables = output[0]	# grab the final variables that the computer finds and
	fitinfo = output[2]		# information about the quality / goodness of the fit.
 
	## We might also want a standard degree of goodness of fit like R^2.
	## We can call another function/shortcut to find that for us.
 
	r_squared = residuals(y,fitinfo)	# use the shortcut "residuals" to find R^2
 
	## Let's tell the user the answer
	print("The fitted variables are: "+str(variables) )	# Print out the answer
 
	x_range = numpy.arange(		# First, let's split X into a lot of points so that our  
					# curve looks smooth
		numpy.min(x),numpy.max(x),abs(numpy.max(x)/100))
 
	## And then call a function to draw a pretty graph
	Grapher( 				# This function is called Grapher
		x,y,				# and needs our X and Y data to graph
		equation(variables,x_range),	# as well as the predicted data,
		x_range,			# our smoothed data
		variables[2],			# the Kd value,
		r_squared )			# and the R^2 value
 
	return				# this tells the computer that we're at the end of this shortcut,
					# and it can do something else.
 
## --------------------------------------------------------------------------------
## We can quickly define a shortcut/ function to find R^2 values
 
def residuals(y,fitinfo):	# To do so, we need y values and the fit information
 
	fit_error = 0			# We'll start with 0 assumed error and
	fit_variance = 0		# 0 assumed variance
 
	# Then, we'll go through a list of each Y data point, and check the error and
	# the variance
	for i in range(len(fitinfo['fvec'])):		# This complex block tells us that
							# we'll look at each Y data point
 
		fit_error += (fitinfo['fvec'][i])**2	# Here, we add a given point's error to
							# our total error
 
		fit_variance += (y[i]-numpy.mean(y))**2	# Here, we add a give point's variance
							# to our total variance
 
	r_squared = 1 - (fit_error/fit_variance)	# and then solve for our R^2 value
 
	return r_squared		# Tell that R^2 value to the function that asked
 
## ---------------------------------------------------------------------------------
## The last bit is printing an informative graph for the user.
 
def Grapher(					# This shortcut is for drawing a graph
		x,y,					# Using our X and Y data to draw points
		equation,				# the equation to draw a curve
		x_range,				# smoothed data,
		kd_value,r_squared):	# and label it with our Kd and R^2 values
 
 
	## Now, let's define a figure
	fig = pylab.figure(1)				# This is a pylab-made figure
	axis = fig.add_subplot(111)			# It has one set of axis on it
 
	axis.plot(x,y,'ro',label='data')	# We draw x,y data points on it
 
	axis.plot(x_range,equation,label='fit')	# and a smooth curve of our fit
 
	axis.set_xlabel('Concentration')	# The X axis is labeled concentration
	axis.set_ylabel('Intensity')		# and the Y axis is labeled Intensity
	axis.set_title("Dose Response Fit")	# Give the window a Title
 
 
	# Let's make room on the plot for our axis labels
	fig.subplots_adjust(bottom=0.15)
	fig.subplots_adjust(left=0.15)
 
	# Then add some text listing the Kd and R^2
	text = axis.text(0.05, 0.95, 'Kd: %s'% kd_value,
							 transform=axis.transAxes, va='top')
 
	rtext = axis.text(0.05, 0.90, 'R^2: %s'% r_squared,
							   transform=axis.transAxes, va='top')
 
	# Lastly, let's display the graph to the user
	pylab.show()
 
 
 
## ----------------------------------------------------------------------------------
## Everything till now has told the computer _how_ to do things, but hasn't told it
## to actually _do_ anything.  Let's change that.
 
if __name__ == '__main__':		#If we're running this program...
	GetData()			# then activate the GetData shortcut and start the show!

That looks like a lot of code, but almost all of it is comments, explanations, and advice. If we remove that and leave just the program, it looks much smaller:

import numpy
from scipy.optimize import leastsq
import pylab
 
def GetData():
	print("Non-Linear Fitting for Dose-Response Data\n")
	input_x = raw_input("Type X values, comma seperated, then hit return\n")
	input_y = raw_input("Type Y values, comma seperated, then hit return\n")
 
	x = numpy.array(input_x.split(','),dtype=float)
	y = numpy.array(input_y.split(','),dtype=float)
 
	FittingMath(x,y)
	return
 
def FittingMath(x,y):
	equation = lambda variables,x:	(variables[0]+(variables[1]-variables[0])/(1+10**((variables[2]-x))))
	error = lambda variables,x,y: (equation(variables,x)-y)
	variable_guesses = [numpy.min(y),numpy.max(y),numpy.mean(x)]
 
	output = leastsq(error,variable_guesses,args=(x,y),full_output=1 )
	variables = output[0]
	fitinfo = output[2]
	r_squared = residuals(y,fitinfo)
 
	print("The fitted variables are: "+str(variables) )
 
	x_range = numpy.arange(numpy.min(x),numpy.max(x),abs(numpy.max(x)/100))
 
	Grapher(x,y,equation(variables,x_range),x_range,variables[2],r_squared)
	return
 
def residuals(y,fitinfo):
	fit_error = 0
	fit_variance = 0
 
	for i in range(len(fitinfo['fvec'])):
		fit_error += (fitinfo['fvec'][i])**2
		fit_variance += (y[i]-numpy.mean(y))**2
	r_squared = 1 - (fit_error/fit_variance)
	return r_squared
 
def Grapher(x,y,equation,x_range,kd_value,r_squared):
	fig = pylab.figure(1)
	axis = fig.add_subplot(111)
	axis.plot(x,y,'ro',label='data')
	axis.plot(x_range,equation,label='fit')
	axis.set_xlabel('Concentration')
	axis.set_ylabel('Intensity')
	axis.set_title("Dose Response Fit")
	fig.subplots_adjust(bottom=0.15)
	fig.subplots_adjust(left=0.15)
	text = axis.text(0.05, 0.95, 'Kd: %s'% kd_value,transform=axis.transAxes, va='top')
	rtext = axis.text(0.05, 0.90, 'R^2: %s'% r_squared,transform=axis.transAxes, va='top')
	pylab.show()
 
if __name__ == '__main__':
	GetData()