Principal Components Analysis in R

Principal components are particularly useful for datasets with a large number of variables that may be correlated with one another. By building new variables (the components) as linear combinations of the original ones, we reduce the number of ‘variables’ to be included in a model. The aim is to keep the components that explain the largest share of the variation in the dataset.

How to do Principal Components Analysis using R?

First we need some data, so let’s create it:

x1 <- c(122, 21, 105, 101, 155, 131, 115, 53, 75, 45)
x2 <- c(117, 32, 140, 105, 149, 146, 82, 60, 82, 37)

Then we standardise each variable by subtracting its mean and dividing by its standard deviation:

x1_scaled<-(x1-mean(x1))/sd(x1)
x2_scaled<-(x2-mean(x2))/sd(x2)

And create a matrix that holds both results:

X<-cbind(x1_scaled,x2_scaled)
X

PCA output 1
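As a quick check on the manual standardisation above, the same matrix can be obtained with base R's scale() function. The sketch below repeats the data so that it runs on its own; the names X_manual, X_scale and same_values are only illustrative.

```r
x1 <- c(122, 21, 105, 101, 155, 131, 115, 53, 75, 45)
x2 <- c(117, 32, 140, 105, 149, 146, 82, 60, 82, 37)

# Manual standardisation, as in the text
X_manual <- cbind(x1_scaled = (x1 - mean(x1)) / sd(x1),
                  x2_scaled = (x2 - mean(x2)) / sd(x2))

# scale() centres each column and divides it by its sample standard deviation
X_scale <- scale(cbind(x1, x2))

# Compare the numbers only, ignoring column names and the extra attributes
# that scale() attaches to its result
same_values <- isTRUE(all.equal(X_manual, X_scale, check.attributes = FALSE))
print(same_values)
```

Either route should give columns with mean 0 and standard deviation 1, so which one to use is mostly a matter of taste.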

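You can use the prcomp function to run the Principal Component Analysis on the data. Because X is already standardised, the center and scale. arguments are redundant here, but they are harmless and are what you would use on raw data:

pca_example <- prcomp(X, center = TRUE, scale. = TRUE)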
pca_example

PCA output 4
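To get a feel for where these numbers come from, the sketch below compares the prcomp result with an eigen-decomposition of the correlation matrix. For standardised data the two should agree: the component standard deviations are the square roots of the eigenvalues, and the loadings match the eigenvectors up to sign. The data are repeated so the block runs on its own, and names such as eig, sdev_match and loadings_match are only illustrative.

```r
x1 <- c(122, 21, 105, 101, 155, 131, 115, 53, 75, 45)
x2 <- c(117, 32, 140, 105, 149, 146, 82, 60, 82, 37)
X <- scale(cbind(x1, x2))

pca_example <- prcomp(X, center = TRUE, scale. = TRUE)

# Eigen-decomposition of the correlation matrix of the two variables
eig <- eigen(cor(X))

# Component standard deviations versus square roots of the eigenvalues
sdev_match <- isTRUE(all.equal(pca_example$sdev, sqrt(eig$values)))

# Loadings versus eigenvectors; the sign of each column is arbitrary,
# so absolute values are compared
loadings_match <- isTRUE(all.equal(abs(unname(pca_example$rotation)),
                                   abs(eig$vectors)))

print(c(sdev_match = sdev_match, loadings_match = loadings_match))
```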

To see the original data expressed in terms of the newly defined principal components, look at the scores:

pca_example$x

PCA output 2
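The scores are simply the standardised data projected onto the loadings. A minimal sketch of that relationship, with the data repeated so it runs on its own (scores_manual and scores_match are illustrative names):

```r
x1 <- c(122, 21, 105, 101, 155, 131, 115, 53, 75, 45)
x2 <- c(117, 32, 140, 105, 149, 146, 82, 60, 82, 37)
X <- scale(cbind(x1, x2))

pca_example <- prcomp(X, center = TRUE, scale. = TRUE)

# Multiply the standardised data by the rotation (loadings) matrix
scores_manual <- X %*% pca_example$rotation

# These should match the scores stored by prcomp
scores_match <- isTRUE(all.equal(scores_manual, pca_example$x,
                                 check.attributes = FALSE))
print(scores_match)

# The two score columns should be uncorrelated with each other
print(round(cor(pca_example$x), 10))
```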

We can now plot the initial values and the new scores side by side to compare them:

PCA output 3 graphs
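One possible way to draw such a pair of graphs with base R graphics is sketched below. The panel titles, axis labels and layout are assumptions for illustration and may differ from the original figure.

```r
x1 <- c(122, 21, 105, 101, 155, 131, 115, 53, 75, 45)
x2 <- c(117, 32, 140, 105, 149, 146, 82, 60, 82, 37)
X <- scale(cbind(x1, x2))

pca_example <- prcomp(X, center = TRUE, scale. = TRUE)

# Two panels side by side; keep the old settings so they can be restored
op <- par(mfrow = c(1, 2))

# Left: the standardised variables
plot(X, main = "Standardised data",
     xlab = "x1_scaled", ylab = "x2_scaled", asp = 1)

# Right: the same observations expressed as principal component scores
plot(pca_example$x, main = "Principal component scores",
     xlab = "PC1", ylab = "PC2", asp = 1)

par(op)
```

Using asp = 1 keeps the same scale on both axes, which makes it easier to see that the second plot is a rotation of the first.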

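The aim of Principal Components Analysis is to reduce dimensions. This example had two variables and we have retained two Principal Components, so 100% of the variance is explained and no reduction has actually taken place. In practice we would keep only the first few components, those that explain most of the variance.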
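To decide how many components to keep, it helps to look at the share of variance each one explains. The sketch below passes the raw columns to prcomp and lets it do the standardisation, which should be equivalent to the manual scaling used earlier. The names var_explained and pc1 are illustrative.

```r
x1 <- c(122, 21, 105, 101, 155, 131, 115, 53, 75, 45)
x2 <- c(117, 32, 140, 105, 149, 146, 82, 60, 82, 37)

# Let prcomp centre and scale the raw variables itself
pca_example <- prcomp(cbind(x1, x2), center = TRUE, scale. = TRUE)

# Proportion of the total variance explained by each component
var_explained <- pca_example$sdev^2 / sum(pca_example$sdev^2)
print(var_explained)
print(cumsum(var_explained))

# summary() reports the same quantities in its importance table
print(summary(pca_example)$importance)

# Reducing to one dimension: keep only the first column of scores
pc1 <- pca_example$x[, 1, drop = FALSE]
```

With only two variables the cumulative proportion necessarily reaches 1 at the second component; with more variables, the same table shows where adding further components stops paying off.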
