# Uppyn's Blog

The world through Dutch eyes in Sydney

## Learning R – simple linear regression

Basic linear regression is very important in engineering and science. In linear regression you try to fit a straight line through the data points you have. The objective is of course to find out whether that line reveals a meaningful relationship between the variables in your data. Linear regression also lets you test a hypothesis: is the relationship you found statistically significant, or could it just be a product of chance?
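To make "fitting a line" a bit more concrete: ordinary least squares picks the slope b = cov(x, y) / var(x) and the intercept a = mean(y) - b * mean(x), and the usual significance test divides the slope by its standard error to get a t value. Below is a minimal sketch that computes these by hand and compares them with R's own `lm`. The `x` and `y` values are made up for illustration; they are not the Alps data.

```r
# Hypothetical toy data, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Least-squares slope and intercept computed by hand
b <- cov(x, y) / var(x)
a <- mean(y) - b * mean(x)

# Residual variance on n - 2 degrees of freedom, then the slope's standard error
n <- length(x)
res <- y - (a + b * x)
s2 <- sum(res^2) / (n - 2)
se_b <- sqrt(s2 / sum((x - mean(x))^2))

# t statistic and two-sided p-value for the hypothesis "slope = 0"
t_b <- b / se_b
p_b <- 2 * pt(-abs(t_b), df = n - 2)

# The same quantities from R's built-in linear model
fit <- lm(y ~ x)
coef(summary(fit))
```

The hand-computed `b`, `a`, `se_b`, `t_b` and `p_b` should match the `x` and `(Intercept)` rows of the table R prints.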

Depending on the format of your data, it is pretty easy to do a simple linear regression in R. To do that, we first need some data. In this case I use a simple data file that contains measurements of the boiling point of water (BPt) and the air pressure (Pressure) at different heights in the Alps. The data is available on the internet, but I downloaded it to my hard disk and read it into R like this:

Alps <- read.table("alps-R (1).dat", header = TRUE)
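The layout of the data file isn't shown in this post. `read.table` expects plain text with one observation per line and the columns separated by whitespace; with `header = TRUE` the first line supplies the column names. Here is a minimal sketch, assuming the file looks like that with columns named BPt and Pressure. It feeds a made-up two-row excerpt to `read.table` through its `text` argument instead of a file; the name `Alps_demo` and the numbers are placeholders, not the real measurements.

```r
# Hypothetical excerpt in the assumed layout: a header row, then whitespace-separated values
alps_text <- "
BPt Pressure
200.0 24.0
210.0 29.0
"

# header = TRUE turns the first (non-blank) line into column names
Alps_demo <- read.table(text = alps_text, header = TRUE)
str(Alps_demo)
```

Column names are case-sensitive: if the header says BPt, R will only recognise `BPt`, not `Bpt`.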

I then attach the data, so that I can refer to the columns as ‘BPt’ and ‘Pressure’ instead of the longer ‘Alps$BPt’ and ‘Alps$Pressure’.

attach(Alps)

As a result, typing ‘Pressure’ gives the same output as ‘Alps$Pressure’:

> Pressure

[1] 20.79 20.79 22.40 22.67 23.15 23.35 23.89 23.99 24.02 24.01 25.14 26.57

[13] 28.49 27.76 29.04 29.88 30.06
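`attach` puts the columns of a data frame on R's search path. That is convenient, but it can silently mask (or be masked by) other objects with the same name. The minimal sketch below uses a hypothetical data frame `demo` in place of `Alps`. It shows that after `attach` the bare column name and the `$` form give the same vector. It then shows `with()` and the `data` argument of the model function, which give the same convenience without attaching anything.

```r
# Hypothetical data frame standing in for Alps
demo <- data.frame(BPt = c(200, 205, 210), Pressure = c(24.0, 26.5, 29.0))

attach(demo)
same <- identical(Pressure, demo$Pressure)  # the bare name now finds the column
detach(demo)                                # take it off the search path again

# Alternatives that never touch the search path
m1 <- with(demo, mean(Pressure))
fit <- glm(Pressure ~ BPt, data = demo)
coef(fit)
```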

To visualise the data I can say:

plot(Alps)

And I get, as expected, a standard scatter plot of the data.
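If the data frame holds just two columns, `plot()` on it draws the first column along the x axis and the second along the y axis. A formula makes the choice of axes explicit and lets you add labels. Here is a minimal sketch with a hypothetical `demo` data frame in place of `Alps` (the axis labels are my own wording). It draws into a temporary PDF file so that it also runs without a screen.

```r
# Hypothetical data frame standing in for Alps
demo <- data.frame(BPt = c(195, 200, 205, 210), Pressure = c(21.0, 23.8, 26.1, 29.0))

out <- tempfile(fileext = ".pdf")
pdf(out)                       # draw into a file rather than a window
plot(Pressure ~ BPt, data = demo,
     xlab = "Boiling point (BPt)", ylab = "Pressure",
     main = "Pressure against boiling point")
dev.off()
```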

Now I would like to make a linear model in R. This turns out to be fairly easy once you understand the R notation. I need to tell R the following:

Alps.lm=glm(Pressure ~ BPt)

Alps.lm is an object that will contain everything I need for my linear model. I call the standard R function ‘glm’, which fits the linear model with Pressure as the dependent variable and BPt as the independent variable. The ~ operator tells R which is which: the dependent variable goes on the left, the independent variable on the right.
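Without a `family` argument, `glm` uses the gaussian family with an identity link. That is ordinary least squares, so it gives the same line as R's `lm`. A minimal sketch with made-up numbers (the `demo` data frame is hypothetical) checks this:

```r
# Hypothetical data standing in for the Alps measurements
demo <- data.frame(BPt = c(195, 198, 201, 204, 207, 210),
                   Pressure = c(21.0, 22.4, 24.1, 25.9, 27.2, 29.1))

fit_glm <- glm(Pressure ~ BPt, data = demo)  # family = gaussian() is the default
fit_lm  <- lm(Pressure ~ BPt, data = demo)

coef(fit_glm)
coef(fit_lm)
```

For plain linear regression `lm` is the more usual choice; its summary also reports R-squared and an F test, which the `glm` summary does not.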

I can now ask R for a summary of this generalised linear model:

summary(Alps.lm)

And I get a lot:

Call:

glm(formula = Pressure ~ BPt)

Deviance Residuals:

Min 1Q Median 3Q Max

-0.25717 -0.11246 -0.05102 0.14283 0.64994

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -81.06373 2.05182 -39.51 <2e-16 ***

BPt 0.52289 0.01011 51.74 <2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1

(Dispersion parameter for gaussian family taken to be 0.05420953)

Null deviance: 145.93778 on 16 degrees of freedom

Residual deviance: 0.81314 on 15 degrees of freedom

AIC: 2.5629

Number of Fisher Scoring iterations:
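Most of this printout can also be pulled out as numbers. The coefficient table is a matrix. For a gaussian `glm`, the dispersion is the residual deviance divided by its degrees of freedom: here 0.81314 / 15 ≈ 0.0542, matching the dispersion parameter above. A minimal sketch on a hypothetical `demo` data frame:

```r
# Hypothetical data standing in for the Alps measurements
demo <- data.frame(BPt = c(195, 198, 201, 204, 207, 210),
                   Pressure = c(21.0, 22.4, 24.1, 25.9, 27.2, 29.1))
fit <- glm(Pressure ~ BPt, data = demo)
s <- summary(fit)

coef_table <- coef(s)                      # Estimate, Std. Error, t value, Pr(>|t|)
slope_p <- coef_table["BPt", "Pr(>|t|)"]   # p-value for the slope

# The dispersion is the residual deviance per residual degree of freedom
disp_by_hand <- deviance(fit) / df.residual(fit)
c(s$dispersion, disp_by_hand)
```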

Without going into the details, the summary tells me the slope of the line I am fitting (0.52) and its intercept (-81.06), the value where the line crosses the Y-axis at BPt = 0. The very small p-values (<2e-16) in the last column also tell me that both are statistically significant. So now I can plot this line. R provides a very easy way to do this:
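With the two numbers from the summary, the fitted line is Pressure = -81.06373 + 0.52289 * BPt. For example, at BPt = 200 this gives -81.06373 + 104.578 = 23.51427. The sketch below does that arithmetic with the coefficients copied from the summary above. It then shows that `predict()` does the same calculation from a model object, using a hypothetical `demo` data set since the real file isn't included here.

```r
# Coefficients copied from the summary output above
intercept <- -81.06373
slope <- 0.52289

# Fitted pressure for a boiling point of 200 (in whatever units BPt has in the file)
bpt <- 200
p_blog <- intercept + slope * bpt
p_blog

# The same arithmetic via predict(), on a hypothetical stand-in data set
demo <- data.frame(BPt = c(195, 198, 201, 204, 207, 210),
                   Pressure = c(21.0, 22.4, 24.1, 25.9, 27.2, 29.1))
fit <- glm(Pressure ~ BPt, data = demo)
p_model <- predict(fit, newdata = data.frame(BPt = 200))
p_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["BPt"]] * 200
```

The intercept on its own corresponds to a negative pressure, which is physically impossible. It only fixes where the line sits; it is not something you could measure.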

abline(Alps.lm)

And this simple command adds the regression line to my graph.
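`abline()` with a model object takes the intercept and slope from the model's coefficients and draws y = a + b x across the plot. Passing the two numbers as `a` and `b` draws the same line. A least-squares line also always passes through the point (mean of x, mean of y), which makes a quick sanity check. A minimal sketch on a hypothetical `demo` data frame, drawn into a temporary PDF:

```r
# Hypothetical data standing in for the Alps measurements
demo <- data.frame(BPt = c(195, 198, 201, 204, 207, 210),
                   Pressure = c(21.0, 22.4, 24.1, 25.9, 27.2, 29.1))
fit <- glm(Pressure ~ BPt, data = demo)
a <- coef(fit)[["(Intercept)"]]
b <- coef(fit)[["BPt"]]

out <- tempfile(fileext = ".pdf")
pdf(out)
plot(Pressure ~ BPt, data = demo)
abline(fit, col = "red")          # line taken from coef(fit)
abline(a = a, b = b, lty = 2)     # the same line from the two numbers
points(mean(demo$BPt), mean(demo$Pressure), pch = 4)  # the point of means
dev.off()

# Fitted value at mean(BPt) minus mean(Pressure): zero up to rounding
a + b * mean(demo$BPt) - mean(demo$Pressure)
```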

At this point I have achieved all that I wanted: I have fitted a linear model to my data, I have learned the slope and the intercept of that line, and I have plotted the line onto my graph. All of that with these 5 lines of code:

Alps <- read.table("alps-R (1).dat", header = TRUE)

attach(Alps)

Alps.lm=glm(Pressure ~ BPt)

plot(Alps)

abline(Alps.lm)
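For comparison, the same steps can also be written without `attach` by handing the data frame to the model through its `data` argument. The sketch below assumes a whitespace-separated file with BPt and Pressure columns. So that it runs on its own, it first writes a small hypothetical file to a temporary path; the numbers are placeholders, not the real Alps data.

```r
# Hypothetical stand-in for "alps-R (1).dat" (placeholder numbers, not the real data)
path <- tempfile(fileext = ".dat")
writeLines(c("BPt Pressure",
             "195 21.0", "198 22.4", "201 24.1",
             "204 25.9", "207 27.2", "210 29.1"), path)

Alps <- read.table(path, header = TRUE)
Alps.lm <- glm(Pressure ~ BPt, data = Alps)   # no attach needed

pdf(tempfile(fileext = ".pdf"))               # use a file so this runs without a screen
plot(Pressure ~ BPt, data = Alps)
abline(Alps.lm)
dev.off()
```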

I would like to see you do that with Excel in just 5 steps!