Much of modern science relies on practical computing, and Chemistry is no exception.

Dr. Bonham’s programming language of choice is Python, which is one of the easiest, powerful languages to learn. He’s primarily self-taught from an excellent (and free) book on the subject, A Byte of Python.

Below is a **heavily** commented and explained snippet of code that performs a non-linear least squares regression to an arbitrary model and evaluates the goodness of fit for a set of data. In other words, it checks how well your data matches some equation that you think it should fit.

It’s a good program to give you a feel for how easily you can achieve things in Python that would be difficult or impossible in Excel, or without using a very expensive fitting package.

To use this code, you’ll need to have Python and a few helper packages on your system.

If you use Windows, I recommend ActivePython 2.7, 32-bit

If you use OS X, I recommend the Python available through Mac Ports

You’ll also need Numerical Python, Scientific Python, and Matplotlib, all for the 2.7 and 32-bit version of python (all also available directly through Mac Ports on OS X).Once all that is installed, you can run the user-friendly code editing program IDLE.

In IDLE, paste all the below code into a New file. Save it, hit Run in the menus, and it will do its stuff!

So, here’s the code for **Non-Linear Fitting: Python for Chemists**

## An introduction / explanatory python program for Chemists ## --------------------------------------------------------- ## A common situation you might find yourself in is trying to fit data to a ## non-linear model. Doing this in Excel can be a pain, and proprietary ## solutions can cost hundreds of dollars. ## ## Instead, we can use python, a free and open source programming language. ## This is an annotate program, explaining what is going on, and how to solve ## problems using python. ## ## In Python, anything following an # is ignored by the computer, so we can ## put all of our comments and explanations behind them. ## --------------------------------------------------------------------------- ## First, we will add additional functionality to Python that allows us to do curve ## fitting. These modules of additional tools are also available for free. import numpy # This loads Numerical Python, which is good for math from scipy.optimize import leastsq # This loads optimization routines from # Scientific Python import pylab # This loads Math Plotting functions, to make pretty graphs ## ---------------------------------------------------------------------------- ## Next, we will get into the core of the program. In programming, you define ## "Functions", or sets of steps to follow. You can then use them as a shortcut, ## telling the computer to "Make a Sandwich", rather then telling it to "get out bread, ## then get out a knife, etc." ## ## We will need to set up the computer to receive x and y data from the user, then ## perform a non-linear least-squares regression fit of the data. Finally, we will tell ## it how to make pretty graphs of the data for the user to see. ## ------------------------------------------------------------------------------ ## Let's start with a function/shortcut to collect user data. def GetData(): # this line tells the computer that the shorthand for all these steps # is called "GetData". When I later type "GetData()", I mean # "do all this stuff". print("Non-Linear Fitting for Dose-Response Data\n") # this shows a nice title on # the screen to start input_x = raw_input("Type X values, comma seperated, then hit return\n") # this line does a lot! It prints a prompt for X data to the user, # then grabs that data and saves it as "input_x" # the user will input data, so it will receive something like: # 1,2,3,4,5 input_y = raw_input("Type Y values, comma seperated, then hit return\n") # this line then does the same for our Y data # Now we have a problem. The data the user typed is still just text to the computer. # We need to let the computer know that it is a list of numbers. That means taking # out the commas and storing each value into an array or list (like a shopping list). x = numpy.array(input_x.split(','),dtype=float) # turns the X values (i.e. "1,2,3,4,5") into an array, which # we represent as [1,2,3,4,5] , although it actually has # no commas-- we could equally well describe it as # [ 1 2 3 4 5] y = numpy.array(input_y.split(','),dtype=float) # this does the same thing for Y values, turning # "10,15,40,80,90" into [ 10 15 40 80 90] # Now, we have nice arrays of our data. The next step is to pass this data to # a new shortcut/function which will do all the math. # It's considered good form to keep all these parts separate, so if you need to # change one, you don't have to worry about changing everything. FittingMath(x,y) # this calls a new shortcut and hands it our arrays of data return # this tells the computer that we're at the end of this shortcut, # and it can do something else. ## -------------------------------------------------------------------------------- ## Next, we'll write a function that tells the computer how to fit the data to a model def FittingMath(x,y): # this lines tells the computer this shortcut is named # FittingMath, and that it will get handed a set of x and y # data to act on # For this example, we'll be fitting dose-response data-- like an increase in # fluorescence as two chemicals react in a concentration-dependent manner # The first step is telling the computer the equation that the data "should" fit: # We'll use a 3 parameter dose response curve, usually called Sigmoidal Dose # Response. It fits the model: # Y = Bottom + (Top - Bottom) / (1 + 10^(Log(KD)-X) ) and looks like an "S" equation = lambda variables,x: (# this line defines an equation that is a function # of x and a set of variables variables[0] + # the 0th piece of data in the set of variables is # the Bottom value (variables[1]-variables[0]) / # the 1st piece of data in variables # is the Top value (1+10**((variables[2]-x)))) # the 2nd piece of data is the log(KD) value # Next, we tell the computer how to measure the "goodness" of the fit: by how close # it's guess is to actual values. We call that the error of the fit. error = lambda variables,x,y: ( # Error as an equation needs our variables, x, and y equation(variables,x) # It is simply our predicted value -y ) # minus the actual value # for this to work, we must give initial guesses for all of our variables variable_guesses = [ # this is a list of our guesses for the initial variables numpy.min(y), # Bottom will be the minimum y value numpy.max(y), # Top will be the maximum Y value numpy.mean(x) ] # and Log(Kd) will be the average X value # Lastly, now that the computer knows the equation, our guesses, and how to look for # errors, we can call the function that will fit the data, and grab all of the output output = leastsq( # We'll use a routine for least-squares regression error, # using our error equation variable_guesses, # our initial variable guesses args=(x,y), # and our X and Y data full_output=1 ) # and getting lots of information back # Then, we'll process the out put to: variables = output[0] # grab the final variables that the computer finds and fitinfo = output[2] # information about the quality / goodness of the fit. ## We might also want a standard degree of goodness of fit like R^2. ## We can call another function/shortcut to find that for us. r_squared = residuals(y,fitinfo) # use the shortcut "residuals" to find R^2 ## Let's tell the user the answer print("The fitted variables are: "+str(variables) ) # Print out the answer x_range = numpy.arange( # First, let's split X into a lot of points so that our # curve looks smooth numpy.min(x),numpy.max(x),abs(numpy.max(x)/100)) ## And then call a function to draw a pretty graph Grapher( # This function is called Grapher x,y, # and needs our X and Y data to graph equation(variables,x_range), # as well as the predicted data, x_range, # our smoothed data variables[2], # the Kd value, r_squared ) # and the R^2 value return # this tells the computer that we're at the end of this shortcut, # and it can do something else. ## -------------------------------------------------------------------------------- ## We can quickly define a shortcut/ function to find R^2 values def residuals(y,fitinfo): # To do so, we need y values and the fit information fit_error = 0 # We'll start with 0 assumed error and fit_variance = 0 # 0 assumed variance # Then, we'll go through a list of each Y data point, and check the error and # the variance for i in range(len(fitinfo['fvec'])): # This complex block tells us that # we'll look at each Y data point fit_error += (fitinfo['fvec'][i])**2 # Here, we add a given point's error to # our total error fit_variance += (y[i]-numpy.mean(y))**2 # Here, we add a give point's variance # to our total variance r_squared = 1 - (fit_error/fit_variance) # and then solve for our R^2 value return r_squared # Tell that R^2 value to the function that asked ## --------------------------------------------------------------------------------- ## The last bit is printing an informative graph for the user. def Grapher( # This shortcut is for drawing a graph x,y, # Using our X and Y data to draw points equation, # the equation to draw a curve x_range, # smoothed data, kd_value,r_squared): # and label it with our Kd and R^2 values ## Now, let's define a figure fig = pylab.figure(1) # This is a pylab-made figure axis = fig.add_subplot(111) # It has one set of axis on it axis.plot(x,y,'ro',label='data') # We draw x,y data points on it axis.plot(x_range,equation,label='fit') # and a smooth curve of our fit axis.set_xlabel('Concentration') # The X axis is labeled concentration axis.set_ylabel('Intensity') # and the Y axis is labeled Intensity axis.set_title("Dose Response Fit") # Give the window a Title # Let's make room on the plot for our axis labels fig.subplots_adjust(bottom=0.15) fig.subplots_adjust(left=0.15) # Then add some text listing the Kd and R^2 text = axis.text(0.05, 0.95, 'Kd: %s'% kd_value, transform=axis.transAxes, va='top') rtext = axis.text(0.05, 0.90, 'R^2: %s'% r_squared, transform=axis.transAxes, va='top') # Lastly, let's display the graph to the user pylab.show() ## ---------------------------------------------------------------------------------- ## Everything till now has told the computer _how_ to do things, but hasn't told it ## to actually _do_ anything. Let's change that. if __name__ == '__main__': #If we're running this program... GetData() # then activate the GetData shortcut and start the show! |

That looks like a **lot** of code, but almost all of it is comments, explanations, and advice. If we remove that and leave just the program, it looks *much* smaller:

import numpy from scipy.optimize import leastsq import pylab def GetData(): print("Non-Linear Fitting for Dose-Response Data\n") input_x = raw_input("Type X values, comma seperated, then hit return\n") input_y = raw_input("Type Y values, comma seperated, then hit return\n") x = numpy.array(input_x.split(','),dtype=float) y = numpy.array(input_y.split(','),dtype=float) FittingMath(x,y) return def FittingMath(x,y): equation = lambda variables,x: (variables[0]+(variables[1]-variables[0])/(1+10**((variables[2]-x)))) error = lambda variables,x,y: (equation(variables,x)-y) variable_guesses = [numpy.min(y),numpy.max(y),numpy.mean(x)] output = leastsq(error,variable_guesses,args=(x,y),full_output=1 ) variables = output[0] fitinfo = output[2] r_squared = residuals(y,fitinfo) print("The fitted variables are: "+str(variables) ) x_range = numpy.arange(numpy.min(x),numpy.max(x),abs(numpy.max(x)/100)) Grapher(x,y,equation(variables,x_range),x_range,variables[2],r_squared) return def residuals(y,fitinfo): fit_error = 0 fit_variance = 0 for i in range(len(fitinfo['fvec'])): fit_error += (fitinfo['fvec'][i])**2 fit_variance += (y[i]-numpy.mean(y))**2 r_squared = 1 - (fit_error/fit_variance) return r_squared def Grapher(x,y,equation,x_range,kd_value,r_squared): fig = pylab.figure(1) axis = fig.add_subplot(111) axis.plot(x,y,'ro',label='data') axis.plot(x_range,equation,label='fit') axis.set_xlabel('Concentration') axis.set_ylabel('Intensity') axis.set_title("Dose Response Fit") fig.subplots_adjust(bottom=0.15) fig.subplots_adjust(left=0.15) text = axis.text(0.05, 0.95, 'Kd: %s'% kd_value,transform=axis.transAxes, va='top') rtext = axis.text(0.05, 0.90, 'R^2: %s'% r_squared,transform=axis.transAxes, va='top') pylab.show() if __name__ == '__main__': GetData() |