# Uppyn's Blog

The world through Dutch eyes in Sydney

## Learning R – simple linear regression

April 3, 2010

Basic linear regression is very important in engineering and science. In linear regression you try to fit a straight line to your data. The objective is, of course, to investigate whether that line reveals a meaningful relationship between the variables in your data. Linear regression can also be used to test whether the new-found relationship is statistically significant.

Depending on the format of your data, it is pretty easy to do a simple linear regression in R. To do that, we first need to get some data. In this case I use a simple data file that contains measurements of the boiling point of water (BPt) and the atmospheric pressure (Pressure) taken at different heights. The data is available on the internet, but I downloaded it to my hard disk. Here is the content of the file:

BPt Pressure

194.5 20.79

194.3 20.79

197.9 22.4

198.4 22.67

199.4 23.15

199.9 23.35

200.9 23.89

201.1 23.99

201.4 24.02

201.3 24.01

203.6 25.14

204.6 26.57

209.5 28.49

208.6 27.76

210.7 29.04

211.9 29.88

212.2 30.06

The following command reads the data file into R and associates the data with the variable ‘Alps’ (R is case sensitive!):

Alps <- read.table("alps-R (1).dat", header=TRUE)
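If you do not have the file at hand, the same data can be read straight from a string. This is only a sketch for following along: the name `alps_text` is my own, and the values are copied from the file listing above.

```r
# The 17 observations from the file listing, supplied inline
# instead of being read from "alps-R (1).dat".
alps_text <- "BPt Pressure
194.5 20.79
194.3 20.79
197.9 22.4
198.4 22.67
199.4 23.15
199.9 23.35
200.9 23.89
201.1 23.99
201.4 24.02
201.3 24.01
203.6 25.14
204.6 26.57
209.5 28.49
208.6 27.76
210.7 29.04
211.9 29.88
212.2 30.06"

# read.table() accepts a 'text' argument in place of a file name.
Alps <- read.table(text = alps_text, header = TRUE)

# Show the column names, types and first values.
str(Alps)
```

The first line of the string becomes the column names because of `header=TRUE`, exactly as with the file.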

I then need to attach the data so that I can refer to the columns as ‘BPt’ and ‘Pressure’ instead of the longer ‘Alps$BPt’ and ‘Alps$Pressure’.

attach(Alps)

The result is that the commands ‘Pressure’ and ‘Alps$Pressure’ produce the same output:

> Pressure

[1] 20.79 20.79 22.40 22.67 23.15 23.35 23.89 23.99 24.02 24.01 25.14 26.57

[13] 28.49 27.76 29.04 29.88 30.06
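As an aside, the short column names are also reachable without `attach`. The sketch below uses `with()`, which evaluates an expression inside a data frame; to keep it short it rebuilds only the first three rows of the data, so the data frame here is an illustrative subset, not the full file.

```r
# Illustrative subset: the first three rows of the data file.
Alps <- data.frame(
  BPt      = c(194.5, 194.3, 197.9),
  Pressure = c(20.79, 20.79, 22.40)
)

# with() looks up 'Pressure' inside Alps, so no attach() is needed.
p_with   <- with(Alps, Pressure)
p_dollar <- Alps$Pressure

# Both routes return the same vector.
identical(p_with, p_dollar)
```

The choice is a matter of taste: `attach` saves typing in an interactive session, while `with()` and `$` make it explicit which data frame a column comes from.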

To visualise the data I can say:

plot(Alps)

And I get, as expected, this standard plot:

Now I would like to make a linear model in R. This turns out to be fairly easy once you understand the R notation. I need to tell R the following:

Alps.lm=glm(Pressure ~ BPt)

Alps.lm is an object that contains everything I need for my linear model. I call the standard R function ‘glm’, which creates the linear model with Pressure as the dependent variable and BPt as the independent variable. The operator ~ tells R which variable is dependent (left of the ~) and which is independent (right of the ~).
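For a plain straight-line fit, R also has the function `lm`. With its default settings (gaussian family), `glm` should give the same coefficients. The sketch below checks that on the data from the file; the names `fit_glm` and `fit_lm` are my own, and the `data=` argument is used so the example works without `attach`.

```r
# The data from the file listing, rebuilt as a data frame.
Alps <- data.frame(
  BPt = c(194.5, 194.3, 197.9, 198.4, 199.4, 199.9, 200.9, 201.1, 201.4,
          201.3, 203.6, 204.6, 209.5, 208.6, 210.7, 211.9, 212.2),
  Pressure = c(20.79, 20.79, 22.40, 22.67, 23.15, 23.35, 23.89, 23.99, 24.02,
               24.01, 25.14, 26.57, 28.49, 27.76, 29.04, 29.88, 30.06)
)

# Same formula, two fitting functions.
fit_glm <- glm(Pressure ~ BPt, data = Alps)  # default family is gaussian
fit_lm  <- lm(Pressure ~ BPt, data = Alps)   # ordinary least squares

# Print the intercept and slope of each fit.
coef(fit_glm)
coef(fit_lm)

# Compare them up to numerical tolerance.
all.equal(coef(fit_glm), coef(fit_lm))
```

The summaries differ in layout: `summary(fit_lm)` reports R-squared and an F-statistic, where the `glm` summary reports deviances and AIC.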

I can now ask R for a summary of this generalised linear model:

summary(Alps.lm)

And I get a lot:

Call:

glm(formula = Pressure ~ BPt)

Deviance Residuals:

Min 1Q Median 3Q Max

-0.25717 -0.11246 -0.05102 0.14283 0.64994

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -81.06373 2.05182 -39.51 <2e-16 ***

BPt 0.52289 0.01011 51.74 <2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.05420953)

Null deviance: 145.93778 on 16 degrees of freedom

Residual deviance: 0.81314 on 15 degrees of freedom

AIC: 2.5629

Number of Fisher Scoring iterations:

Without going into the details, the summary tells me the slope of the line I am trying to fit (0.52) and the value where the line intercepts the Y-axis (-81.06). So I can now try to plot this line. R provides a very easy way to do this:

abline(Alps.lm)

And this simple command will add the regression line to my graph:
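To make the numbers in the summary a bit more concrete, here is a small sketch that uses only values printed in the output above. It computes the pressure the fitted line predicts at a boiling point of 200 (a value I picked for illustration) and the share of the null deviance that the line accounts for. The variable names are my own.

```r
# Estimates copied from the coefficient table in the summary.
intercept <- -81.06373
slope     <- 0.52289

# Predicted pressure on the fitted line at BPt = 200 (illustrative value).
bpt       <- 200
predicted <- intercept + slope * bpt
predicted   # about 23.51

# Deviances copied from the summary.
null_deviance     <- 145.93778
residual_deviance <- 0.81314

# Share of the null deviance accounted for by the line.
explained <- 1 - residual_deviance / null_deviance
explained   # about 0.994
```

The predicted value of about 23.51 sits between the measured pressures at 199.9 (23.35) and 200.9 (23.89), which is what you would hope for from a line that fits well.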

At this point I have achieved all that I wanted: I have fitted a linear model to my data, I have learned the slope and the intercept of that line, and I have plotted the line onto my graph. All of that with these 5 lines of code:

Alps <- read.table("alps-R (1).dat", header=TRUE)

attach(Alps)

Alps.lm=glm(Pressure ~ BPt)

plot(Alps)

abline(Alps.lm)

I would like to see you do that with Excel in just 5 steps!
