Much of modern science relies on practical computing, and Chemistry is no exception.
Dr. Bonham’s programming language of choice is Python, which is one of the easiest, powerful languages to learn. He’s primarily self-taught from an excellent (and free) book on the subject, A Byte of Python.
Below is a heavily commented and explained snippet of code that performs a non-linear least squares regression to an arbitrary model and evaluates the goodness of fit for a set of data. In other words, it checks how well your data matches some equation that you think it should fit.
It’s a good program to give you a feel for how easily you can achieve things in Python that would be difficult or impossible in Excel, or without using a very expensive fitting package.
To use this code, you’ll need to have Python and a few helper packages on your system.
I recommend Anaconda Python 3.6, a free and very well-maintained and user-friendly package of Python and assocciated tools for data science.
Once it is installed, you can run Microsoft Visual Studio Code (vscode) to input and edit python files.
Paste all the below code into a New file. Save it, run the program (Debugging -> Start Debugging, or F5), and it will do its stuff!
You can see an online implementation of this code at http://doseresponse.bonhamcode.com
So, here’s the code for Non-Linear Fitting: Python for Chemists
## An introduction / explanatory python program for Chemists
## ---------------------------------------------------------
## A common situation you might find yourself in is trying to fit data to a
## non-linear model. Doing this in Excel can be a pain, and proprietary
## solutions can cost hundreds of dollars.
##
## Instead, we can use python, a free and open source programming language.
## This is an annotate program, explaining what is going on, and how to solve
## problems using python.
##
## In Python, anything following an # is ignored by the computer, so we can
## put all of our comments and explanations behind them.
## ---------------------------------------------------------------------------
## First, we will add additional functionality to Python that allows us to do curve
## fitting. These modules of additional tools are also available for free.
import numpy # This loads Numerical Python, which is good for math
from scipy.optimize import leastsq # This loads optimization routines from
# Scientific Python
import pylab # This loads Math Plotting functions, to make pretty graphs
## ----------------------------------------------------------------------------
## Next, we will get into the core of the program. In programming, you define
## "Functions", or sets of steps to follow. You can then use them as a shortcut,
## telling the computer to "Make a Sandwich", rather then telling it to "get out bread,
## then get out a knife, etc."
##
## We will need to set up the computer to receive x and y data from the user, then
## perform a non-linear least-squares regression fit of the data. Finally, we will tell
## it how to make pretty graphs of the data for the user to see.
## ------------------------------------------------------------------------------
## Let's start with a function/shortcut to collect user data.
def GetData(): # this line tells the computer that the shorthand for all these steps
# is called "GetData". When I later type "GetData()", I mean
# "do all this stuff".
print("Non-Linear Fitting for Dose-Response Data\n") # this shows a nice title on
# the screen to start
input_x = raw_input("Type X values, comma seperated, then hit return\n")
# this line does a lot! It prints a prompt for X data to the user,
# then grabs that data and saves it as "input_x"
# the user will input data, so it will receive something like:
# 1,2,3,4,5
input_y = raw_input("Type Y values, comma seperated, then hit return\n")
# this line then does the same for our Y data
# Now we have a problem. The data the user typed is still just text to the computer.
# We need to let the computer know that it is a list of numbers. That means taking
# out the commas and storing each value into an array or list (like a shopping list).
x = numpy.array(input_x.split(','),dtype=float)
# turns the X values (i.e. "1,2,3,4,5") into an array, which
# we represent as [1,2,3,4,5] , although it actually has
# no commas-- we could equally well describe it as
# [ 1 2 3 4 5]
y = numpy.array(input_y.split(','),dtype=float)
# this does the same thing for Y values, turning
# "10,15,40,80,90" into [ 10 15 40 80 90]
# Now, we have nice arrays of our data. The next step is to pass this data to
# a new shortcut/function which will do all the math.
# It's considered good form to keep all these parts separate, so if you need to
# change one, you don't have to worry about changing everything.
FittingMath(x,y) # this calls a new shortcut and hands it our arrays of data
return # this tells the computer that we're at the end of this shortcut,
# and it can do something else.
## --------------------------------------------------------------------------------
## Next, we'll write a function that tells the computer how to fit the data to a model
def FittingMath(x,y): # this lines tells the computer this shortcut is named
# FittingMath, and that it will get handed a set of x and y
# data to act on
# For this example, we'll be fitting dose-response data-- like an increase in
# fluorescence as two chemicals react in a concentration-dependent manner
# The first step is telling the computer the equation that the data "should" fit:
# We'll use a 3 parameter dose response curve, usually called Sigmoidal Dose
# Response. It fits the model:
# Y = Bottom + (Top - Bottom) / (1 + 10^(Log(KD)-X) ) and looks like an "S"
equation = lambda variables,x: (# this line defines an equation that is a function
# of x and a set of variables
variables[0] + # the 0th piece of data in the set of variables is
# the Bottom value
(variables[1]-variables[0]) / # the 1st piece of data in variables
# is the Top value
(1+10**((variables[2]-x)))) # the 2nd piece of data is the log(KD) value
# Next, we tell the computer how to measure the "goodness" of the fit: by how close
# it's guess is to actual values. We call that the error of the fit.
error = lambda variables,x,y: ( # Error as an equation needs our variables, x, and y
equation(variables,x) # It is simply our predicted value
-y ) # minus the actual value
# for this to work, we must give initial guesses for all of our variables
variable_guesses = [ # this is a list of our guesses for the initial variables
numpy.min(y), # Bottom will be the minimum y value
numpy.max(y), # Top will be the maximum Y value
numpy.mean(x) ] # and Log(Kd) will be the average X value
# Lastly, now that the computer knows the equation, our guesses, and how to look for
# errors, we can call the function that will fit the data, and grab all of the output
output = leastsq( # We'll use a routine for least-squares regression
error, # using our error equation
variable_guesses, # our initial variable guesses
args=(x,y), # and our X and Y data
full_output=1 ) # and getting lots of information back
# Then, we'll process the out put to:
variables = output[0] # grab the final variables that the computer finds and
fitinfo = output[2] # information about the quality / goodness of the fit.
## We might also want a standard degree of goodness of fit like R^2.
## We can call another function/shortcut to find that for us.
r_squared = residuals(y,fitinfo) # use the shortcut "residuals" to find R^2
## Let's tell the user the answer
print("The fitted variables are: "+str(variables) ) # Print out the answer
x_range = numpy.arange( # First, let's split X into a lot of points so that our
# curve looks smooth
numpy.min(x),numpy.max(x),abs(numpy.max(x)/100))
## And then call a function to draw a pretty graph
Grapher( # This function is called Grapher
x,y, # and needs our X and Y data to graph
equation(variables,x_range), # as well as the predicted data,
x_range, # our smoothed data
variables[2], # the Kd value,
r_squared ) # and the R^2 value
return # this tells the computer that we're at the end of this shortcut,
# and it can do something else.
## --------------------------------------------------------------------------------
## We can quickly define a shortcut/ function to find R^2 values
def residuals(y,fitinfo): # To do so, we need y values and the fit information
fit_error = 0 # We'll start with 0 assumed error and
fit_variance = 0 # 0 assumed variance
# Then, we'll go through a list of each Y data point, and check the error and
# the variance
for i in range(len(fitinfo['fvec'])): # This complex block tells us that
# we'll look at each Y data point
fit_error += (fitinfo['fvec'][i])**2 # Here, we add a given point's error to
# our total error
fit_variance += (y[i]-numpy.mean(y))**2 # Here, we add a give point's variance
# to our total variance
r_squared = 1 - (fit_error/fit_variance) # and then solve for our R^2 value
return r_squared # Tell that R^2 value to the function that asked
## ---------------------------------------------------------------------------------
## The last bit is printing an informative graph for the user.
def Grapher( # This shortcut is for drawing a graph
x,y, # Using our X and Y data to draw points
equation, # the equation to draw a curve
x_range, # smoothed data,
kd_value,r_squared): # and label it with our Kd and R^2 values
## Now, let's define a figure
fig = pylab.figure(1) # This is a pylab-made figure
axis = fig.add_subplot(111) # It has one set of axis on it
axis.plot(x,y,'ro',label='data') # We draw x,y data points on it
axis.plot(x_range,equation,label='fit') # and a smooth curve of our fit
axis.set_xlabel('Concentration') # The X axis is labeled concentration
axis.set_ylabel('Intensity') # and the Y axis is labeled Intensity
axis.set_title("Dose Response Fit") # Give the window a Title
# Let's make room on the plot for our axis labels
fig.subplots_adjust(bottom=0.15)
fig.subplots_adjust(left=0.15)
# Then add some text listing the Kd and R^2
text = axis.text(0.05, 0.95, 'Kd: %s'% kd_value,
transform=axis.transAxes, va='top')
rtext = axis.text(0.05, 0.90, 'R^2: %s'% r_squared,
transform=axis.transAxes, va='top')
# Lastly, let's display the graph to the user
pylab.show()
## ----------------------------------------------------------------------------------
## Everything till now has told the computer _how_ to do things, but hasn't told it
## to actually _do_ anything. Let's change that.
if __name__ == '__main__': #If we're running this program...
GetData() # then activate the GetData shortcut and start the show!
That looks like a lot of code, but almost all of it is comments, explanations, and advice. If we remove that and leave just the program, it looks much smaller:
import numpy
from scipy.optimize import leastsq
import pylab
def GetData():
print("Non-Linear Fitting for Dose-Response Data\n")
input_x = raw_input("Type X values, comma seperated, then hit return\n")
input_y = raw_input("Type Y values, comma seperated, then hit return\n")
x = numpy.array(input_x.split(','),dtype=float)
y = numpy.array(input_y.split(','),dtype=float)
FittingMath(x,y)
return
def FittingMath(x,y):
equation = lambda variables,x: (variables[0]+(variables[1]-variables[0])/(1+10**((variables[2]-x))))
error = lambda variables,x,y: (equation(variables,x)-y)
variable_guesses = [numpy.min(y),numpy.max(y),numpy.mean(x)]
output = leastsq(error,variable_guesses,args=(x,y),full_output=1 )
variables = output[0]
fitinfo = output[2]
r_squared = residuals(y,fitinfo)
print("The fitted variables are: "+str(variables) )
x_range = numpy.arange(numpy.min(x),numpy.max(x),abs(numpy.max(x)/100))
Grapher(x,y,equation(variables,x_range),x_range,variables[2],r_squared)
return
def residuals(y,fitinfo):
fit_error = 0
fit_variance = 0
for i in range(len(fitinfo['fvec'])):
fit_error += (fitinfo['fvec'][i])**2
fit_variance += (y[i]-numpy.mean(y))**2
r_squared = 1 - (fit_error/fit_variance)
return r_squared
def Grapher(x,y,equation,x_range,kd_value,r_squared):
fig = pylab.figure(1)
axis = fig.add_subplot(111)
axis.plot(x,y,'ro',label='data')
axis.plot(x_range,equation,label='fit')
axis.set_xlabel('Concentration')
axis.set_ylabel('Intensity')
axis.set_title("Dose Response Fit")
fig.subplots_adjust(bottom=0.15)
fig.subplots_adjust(left=0.15)
text = axis.text(0.05, 0.95, 'Kd: %s'% kd_value,transform=axis.transAxes, va='top')
rtext = axis.text(0.05, 0.90, 'R^2: %s'% r_squared,transform=axis.transAxes, va='top')
pylab.show()
if __name__ == '__main__':
GetData()