Uppyn's Blog

The world through Dutch eyes in Sydney

Learning R – simple linear regression

Basic linear regression is very important in engineering and science. In linear regression you try to fit a straight line to your data. The objective is of course to investigate whether that line reveals a meaningful relationship between the variables in your data. Linear regression can also be used to test whether the relationship you find is statistically significant.

Depending on the format of your data, it is pretty easy to do a simple linear regression in R. To do that, we first need some data. In this case I use a simple data file that contains measurements of the boiling point of water (BPt) and the pressure (Pressure), taken at different heights. The data is available on the internet, but I downloaded it to my hard disk. Here is the content of the file:

BPt Pressure
194.5 20.79
194.3 20.79
197.9 22.4
198.4 22.67
199.4 23.15
199.9 23.35
200.9 23.89
201.1 23.99
201.4 24.02
201.3 24.01
203.6 25.14
204.6 26.57
209.5 28.49
208.6 27.76
210.7 29.04
211.9 29.88
212.2 30.06
The following command reads the data file into R and associates the data with the variable ‘Alps’ (R is case sensitive!):

Alps<-read.table("alps-R (1).dat", header=TRUE)
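If you do not have the file on disk, the same data frame can be built from the numbers listed above. This is only a sketch: the name `alps_text` is mine and not part of the original file, and it relies on the `text` argument of `read.table`, which reads from a string instead of a file.

```r
# Inline copy of the data shown above, so no file on disk is needed.
# 'alps_text' is a hypothetical helper name used only for this sketch.
alps_text <- "BPt Pressure
194.5 20.79
194.3 20.79
197.9 22.4
198.4 22.67
199.4 23.15
199.9 23.35
200.9 23.89
201.1 23.99
201.4 24.02
201.3 24.01
203.6 25.14
204.6 26.57
209.5 28.49
208.6 27.76
210.7 29.04
211.9 29.88
212.2 30.06"

# header=TRUE takes the column names from the first line
Alps <- read.table(text = alps_text, header = TRUE)

# Show the structure: two numeric columns, one row per measurement
str(Alps)
```

Checking the result with `str` is a quick way to confirm that both columns were read as numbers and that the column names have the capitalisation you expect.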

I then need to attach the data so that I can refer to the columns as ‘BPt’ and ‘Pressure’ instead of the longer ‘Alps$BPt’ and ‘Alps$Pressure’.

attach(Alps)

The result is that the commands ‘Pressure’ and ‘Alps$Pressure’ now return the same values:

> Pressure

[1] 20.79 20.79 22.40 22.67 23.15 23.35 23.89 23.99 24.02 24.01 25.14 26.57

[13] 28.49 27.76 29.04 29.88 30.06
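Attaching is convenient for a short session, but the columns can also be reached without it. The sketch below rebuilds the data frame from the values listed earlier (so it runs on its own) and shows three standard ways to get at the same column. It is an alternative, not what I did in this post.

```r
# Rebuild the data from the values listed above (no file needed)
Alps <- data.frame(
  BPt = c(194.5, 194.3, 197.9, 198.4, 199.4, 199.9, 200.9, 201.1, 201.4,
          201.3, 203.6, 204.6, 209.5, 208.6, 210.7, 211.9, 212.2),
  Pressure = c(20.79, 20.79, 22.40, 22.67, 23.15, 23.35, 23.89, 23.99, 24.02,
               24.01, 25.14, 26.57, 28.49, 27.76, 29.04, 29.88, 30.06)
)

# Three ways to reach the same column without attach()
p1 <- Alps$Pressure          # dollar notation
p2 <- Alps[["Pressure"]]     # double-bracket notation
p3 <- with(Alps, Pressure)   # evaluate an expression inside the data frame

# All three give the same vector
print(identical(p1, p2) && identical(p1, p3))
```

The advantage of these forms is that nothing is added to the search path, so a variable called ‘Pressure’ from another data set cannot get in the way.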

To visualise the data I can say:

plot(BPt, Pressure)

And I get, as expected, this standard plot:

Now I would like to make a linear model in R. This turns out to be fairly easy once you understand the R notation. I need to tell R the following:

Alps.lm=glm(Pressure ~ BPt)

Alps.lm is an object that will contain everything that I need for my linear model. I call a standard R function ‘glm’ and this creates the linear model with Pressure as the dependent variable and BPt as the independent variable. The operator ~ tells R which is which: the dependent variable goes on the left and the independent variable on the right.
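A side note on ‘glm’: when no family is given it uses the gaussian family, and in that case it fits the same straight line as R's ordinary linear model function ‘lm’. The sketch below rebuilds the data from the values listed earlier and compares the two; the names `fit_glm` and `fit_lm` are only for illustration.

```r
# Rebuild the data from the values listed above (no file needed)
Alps <- data.frame(
  BPt = c(194.5, 194.3, 197.9, 198.4, 199.4, 199.9, 200.9, 201.1, 201.4,
          201.3, 203.6, 204.6, 209.5, 208.6, 210.7, 211.9, 212.2),
  Pressure = c(20.79, 20.79, 22.40, 22.67, 23.15, 23.35, 23.89, 23.99, 24.02,
               24.01, 25.14, 26.57, 28.49, 27.76, 29.04, 29.88, 30.06)
)

# Same formula, two fitting functions; data= avoids the need for attach()
fit_glm <- glm(Pressure ~ BPt, data = Alps)  # default family is gaussian
fit_lm  <- lm(Pressure ~ BPt, data = Alps)   # ordinary least squares

# Intercept and slope from both fits
print(coef(fit_glm))
print(coef(fit_lm))

# all.equal compares the coefficients up to numerical tolerance
print(all.equal(coef(fit_glm), coef(fit_lm)))
```

For a plain straight-line fit either function will do; ‘lm’ additionally reports R-squared in its summary, while ‘glm’ reports deviances and the AIC.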

I can now ask R for a summary of this generalised linear model:

summary(Alps.lm)

And I get a lot:


Call:
glm(formula = Pressure ~ BPt)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-0.25717  -0.11246  -0.05102   0.14283   0.64994

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -81.06373    2.05182  -39.51   <2e-16 ***
BPt           0.52289    0.01011   51.74   <2e-16 ***

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.05420953)

    Null deviance: 145.93778  on 16  degrees of freedom
Residual deviance:   0.81314  on 15  degrees of freedom
AIC: 2.5629

Number of Fisher Scoring iterations:

Without going into the details, summary tells me the slope of the line I am trying to fit (0.52) and the value where the line intercepts the Y-axis (-81.06). So I can now try to plot this line. R provides a very easy way to do this:

abline(Alps.lm)

This simple command adds the regression line to my graph.
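The intercept and slope do not have to be read off the printed summary; they can be pulled out of the fitted object and used to calculate a pressure for a given boiling point. The sketch below rebuilds the data from the values listed earlier. The value 200 for BPt and the names `fit`, `manual` and `auto` are just my choices for illustration.

```r
# Rebuild the data from the values listed above (no file needed)
Alps <- data.frame(
  BPt = c(194.5, 194.3, 197.9, 198.4, 199.4, 199.9, 200.9, 201.1, 201.4,
          201.3, 203.6, 204.6, 209.5, 208.6, 210.7, 211.9, 212.2),
  Pressure = c(20.79, 20.79, 22.40, 22.67, 23.15, 23.35, 23.89, 23.99, 24.02,
               24.01, 25.14, 26.57, 28.49, 27.76, 29.04, 29.88, 30.06)
)

fit <- glm(Pressure ~ BPt, data = Alps)

# Extract the two numbers that define the line
b <- coef(fit)
intercept <- b[["(Intercept)"]]
slope <- b[["BPt"]]

# Pressure on the fitted line at BPt = 200, calculated by hand ...
manual <- intercept + slope * 200

# ... and the same calculation done by predict()
auto <- predict(fit, newdata = data.frame(BPt = 200))

print(c(manual = manual, predict = unname(auto)))

# Coefficient table as a matrix: estimate, std. error, t value, p-value
print(coef(summary(fit)))
```

With the rounded values from the summary this works out to roughly -81.06 + 0.52289 × 200, or about 23.5, which sits nicely between the measured pressures around a boiling point of 200.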

At this point I have achieved all that I wanted: I have fitted a linear model to my data, I have learned the slope and the intercept of that line, and I have plotted the line onto my graph. All of that with these 5 lines of code:

Alps<-read.table("alps-R (1).dat", header=TRUE)

attach(Alps)

plot(BPt, Pressure)

Alps.lm=glm(Pressure ~ BPt)

abline(Alps.lm)

I would like to see you do that with Excel in just 5 steps!

