Principal components are especially useful for datasets with a large number of variables that may be correlated with one another. By building new variables (the components) as linear combinations of the original ones, we reduce the number of ‘variables’ that need to be included in a model. The aim is to keep the components that explain the largest share of the variation in the dataset.
How to do Principal Components Analysis using R?
First we need some data, so let’s create it:
x1 <- c(122, 21, 105, 101, 155, 131, 115, 53, 75, 45)
x2 <- c(117, 32, 140, 105, 149, 146, 82, 60, 82, 37)
Then we will scale it with:
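A minimal sketch of the scaling step, assuming base R's scale() is the intended tool. The names x1_scaled and x2_scaled are illustrative and not part of the original example.

```r
# Data from the example above
x1 <- c(122, 21, 105, 101, 155, 131, 115, 53, 75, 45)
x2 <- c(117, 32, 140, 105, 149, 146, 82, 60, 82, 37)

# scale() subtracts the mean and divides by the standard deviation.
# It returns a one-column matrix, so as.numeric() converts it back to a vector.
# (x1_scaled / x2_scaled are assumed names, chosen for illustration.)
x1_scaled <- as.numeric(scale(x1))
x2_scaled <- as.numeric(scale(x2))

mean(x1_scaled)  # zero, up to floating-point rounding
sd(x1_scaled)    # one, up to floating-point rounding
```

After scaling, both variables are on the same footing, so neither dominates the components simply because it has a larger spread.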
And create a variable that holds both results:
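One way to combine the two scaled vectors is cbind(), which binds them as the columns of a matrix. This is a sketch under the assumption that X, the object passed to prcomp later on, is exactly this two-column matrix.

```r
# Data from the example above
x1 <- c(122, 21, 105, 101, 155, 131, 115, 53, 75, 45)
x2 <- c(117, 32, 140, 105, 149, 146, 82, 60, 82, 37)

# Assumption: X is the matrix of scaled variables, one column per variable
X <- cbind(x1 = as.numeric(scale(x1)),
           x2 = as.numeric(scale(x2)))

dim(X)   # 10 rows (observations) and 2 columns (variables)
head(X)
```

Since prcomp can standardise the columns itself through its scale. argument, passing pre-scaled data is harmless but not strictly required.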
You can use the prcomp function to run the Principal Components Analysis on the data:
pca_example <- prcomp(X, center = TRUE, scale. = TRUE)
To see how the original observations are expressed in terms of the newly defined principal components, you can look at the new scores with:
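In a prcomp result the scores are stored in the x element and the loadings (the weights that define each component) in rotation. The sketch below rebuilds the example from the raw data so that it runs on its own. The name scores is illustrative.

```r
# Data from the example above
x1 <- c(122, 21, 105, 101, 155, 131, 115, 53, 75, 45)
x2 <- c(117, 32, 140, 105, 149, 146, 82, 60, 82, 37)
X <- cbind(x1, x2)

# center/scale. standardise the columns before the decomposition
pca_example <- prcomp(X, center = TRUE, scale. = TRUE)

# Scores: coordinates of each observation on PC1 and PC2
scores <- pca_example$x
head(scores)

# Loadings: how each original variable contributes to each component
pca_example$rotation

# The scores are the standardised data projected onto the loadings
all.equal(unname(scores), unname(scale(X) %*% pca_example$rotation))
```

The two score columns are uncorrelated with each other, which is what distinguishes them from the original, correlated variables.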
We can then plot the initial values and the new scores side by side to see the difference between them:
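A possible way to draw the comparison with base graphics is shown below. The panel layout, titles and axis labels are illustrative choices, not taken from the original figure.

```r
# Data and PCA from the example above
x1 <- c(122, 21, 105, 101, 155, 131, 115, 53, 75, 45)
x2 <- c(117, 32, 140, 105, 149, 146, 82, 60, 82, 37)
X <- cbind(x1, x2)
pca_example <- prcomp(X, center = TRUE, scale. = TRUE)
scores <- pca_example$x

# Two panels side by side; keep the old settings so they can be restored
op <- par(mfrow = c(1, 2))

# Left: the original variables, which are clearly correlated
plot(X[, "x1"], X[, "x2"], xlab = "x1", ylab = "x2",
     main = "Original data", pch = 19)

# Right: the same observations expressed as principal component scores
plot(scores[, "PC1"], scores[, "PC2"], xlab = "PC1", ylab = "PC2",
     main = "PCA scores", pch = 19, asp = 1)
abline(h = 0, v = 0, lty = 2)  # the component axes pass through the origin

par(op)
```

In the second panel the cloud of points is rotated so that its longest direction lies along PC1, and most of the spread is visible on that axis alone.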
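The aim of Principal Components Analysis is to reduce dimensions. This example has two variables and we have retained both principal components, so 100% of the variance is explained. Keeping only the first component would reduce the data to a single dimension, at the cost of the variance carried by the second.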
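To check how much variance each component explains, the squared standard deviations stored in sdev can be turned into proportions. This is a sketch that rebuilds the example from the raw data, and the name var_explained is illustrative.

```r
# Data and PCA from the example above
x1 <- c(122, 21, 105, 101, 155, 131, 115, 53, 75, 45)
x2 <- c(117, 32, 140, 105, 149, 146, 82, 60, 82, 37)
pca_example <- prcomp(cbind(x1, x2), center = TRUE, scale. = TRUE)

# sdev holds the standard deviation of each component;
# squaring gives the variance, dividing by the total gives the proportion
var_explained <- pca_example$sdev^2 / sum(pca_example$sdev^2)
var_explained
cumsum(var_explained)  # the cumulative proportion reaches 1 once both components are kept

# summary() reports the same proportions in a table
summary(pca_example)
```

With more variables, the cumulative proportion is a common guide for deciding how many components to keep.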