If you are hoping for blinding insight in this post, then I think you had better stop reading now. A friend asked me to show him how to display the observation labels (or class labels) in a PCA biplot. Given how long prcomp has been around, this is hardly new information. It might now even be a feature of plot.prcomp. However, for posterity, and for the searchers, I will give some simple solutions. I will, for the sake of pedagogy, even stop effing and blinding about how Fisher’s Iris data set deserves to be left to crumble into a distant memory of when we used to do computing with a Jacquard loom.
I will make use of tidyverse functions in this example, because kids these days. I would not worry about it too much. It just makes some of the data manipulation slightly more transparent. Firstly I will load the data. The Iris data set is an internal R data set so the data command will do it.
data(iris)
names(iris)
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
## [5] "Species"
Now I will pull out the labels into a separate vector:
library(tidyverse)
iris.labels = iris %>% pull(Species) %>% as.character()
iris.data = iris %>% select(-Species)
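If you would rather not load the tidyverse at all, the same split can be sketched in base R. The names labels.base and data.base are mine, chosen so they do not clobber the objects above.

```r
# Base R sketch of the same split; labels.base and data.base are
# illustrative names, not part of the original example.
data(iris)
# Species as a plain character vector
labels.base = as.character(iris$Species)
# Every column except Species
data.base = iris[, names(iris) != "Species"]
str(data.base)
```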
Performing the PCA itself is pretty straightforward. I have chosen to scale the variables in this example for no particular reason. The post is not about PCA so it does not matter too much.
pc = prcomp(iris.data, scale. = TRUE)
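For the curious, here is a quick check of what scale. = TRUE does. With every variable standardised to unit variance, the component variances should add up to the number of variables (four here). The names pc.check, pc.var and var.explained exist only for this sketch.

```r
data(iris)
# Refit under a throwaway name so nothing above is overwritten
pc.check = prcomp(iris[, 1:4], scale. = TRUE)
# Variance of each principal component
pc.var = pc.check$sdev^2
# With scaled inputs these sum to the number of variables
sum(pc.var)
# Proportion of the total variance carried by each component
var.explained = pc.var / sum(pc.var)
round(var.explained, 3)
```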
The scores (the projected values of the observations) are stored in a matrix called pc$x. First we will produce a plot with just the observation numbers.
plot(pc$x[,1:2], type = 'n')
text(pc$x[,1:2], labels = 1:nrow(iris.data))
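If you want to convince yourself that the scores really are the projected observations, you can rebuild them by hand: centre and scale the data, then multiply by the loadings stored in the rotation element. This is only a sketch, and pc.check, X and scores.by.hand are throwaway names of my own.

```r
data(iris)
pc.check = prcomp(iris[, 1:4], scale. = TRUE)
# Centre and scale the measurements the same way prcomp was asked to
X = scale(as.matrix(iris[, 1:4]))
# Project the observations onto the principal axes (the loadings)
scores.by.hand = X %*% pc.check$rotation
# Compare with the scores prcomp stored, ignoring attributes
all.equal(scores.by.hand, pc.check$x, check.attributes = FALSE)
```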

It would be nice to put the species labels on instead of numbers. The species names are rather long though, so I am going to recode them:
iris.labels = recode(iris.labels, 'setosa' = 's', 'versicolor' = 'v', 'virginica' = 'i')
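A base R alternative to recode, if you want one, is a named lookup vector indexed by the species names. The names short.codes and labels.short are made up for this sketch.

```r
data(iris)
# Named vector: the names are the old values, the elements the new codes
short.codes = c(setosa = "s", versicolor = "v", virginica = "i")
# Index by species name, then drop the names from the result
labels.short = unname(short.codes[as.character(iris$Species)])
table(labels.short)
```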
And now we can plot the labels if we want:
plot(pc$x[,1:2], type = 'n')
text(pc$x[,1:2], labels = iris.labels)

Hey, you promised us ggplot2, ya bum!
Okay, okay. To do this we need to coerce the scores into a data.frame. The nice thing about doing this is that the principal components are conveniently labelled PC1, PC2, etc. This makes the mapping fairly easy.
library(ggplot2)
pcs = data.frame(pc$x)
p = pcs %>%
  ggplot(aes(x = PC1, y = PC2, label = iris.labels)) +
  geom_text()
p
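Since single letters can get lost in a crowd, one possible extension is to colour the labels by species as well. This is only a sketch: it rebuilds everything from scratch so that it runs on its own, and pc.check, short.codes, pcs.col and p.col are my names rather than anything from the example above.

```r
library(ggplot2)
data(iris)
pc.check = prcomp(iris[, 1:4], scale. = TRUE)
short.codes = c(setosa = "s", versicolor = "v", virginica = "i")
# Scores plus the species and its one-letter code in a single data frame
pcs.col = data.frame(pc.check$x,
                     Species = iris$Species,
                     label = unname(short.codes[as.character(iris$Species)]))
# Map the short code to the text and the species to the colour
p.col = ggplot(pcs.col, aes(x = PC1, y = PC2, label = label, colour = Species)) +
  geom_text()
p.col
```

Keeping the labels inside the data frame, rather than in a separate vector, means the rows and labels cannot drift out of step if you later filter or reorder the scores.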