# Producing a biplot with labels

If you are hoping for blinding insight in this post, then I think you had better stop reading now. A friend asked me to show him how to display the observation labels (or class labels) in a PCA biplot. Given how long `prcomp` has been around, this is hardly new information. It might even be a feature of `plot.prcomp` by now. However, for posterity, and for the searchers, I will give some simple solutions. I will, for the sake of pedagogy, even stop effing and blinding about how Fisher’s Iris data set deserves to be left to crumble into a distant memory of when we used to do computing with a Jacquard loom.

I will make use of `tidyverse` functions in this example, because kids these days. I would not worry about it too much. It just makes some of the data manipulation slightly more transparent. Firstly I will load the data. The Iris data set is an internal R data set so the `data` command will do it.

```
library(dplyr)  # tidyverse package providing %>%, pull(), select() and recode()
data(iris)
names(iris)
```
```
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"
## [5] "Species"
```

Now I will pull out the labels into a separate vector, and keep the four measurements in a data frame of their own:

```
iris.labels = iris %>%
  pull(Species) %>%
  as.character()
iris.data = iris %>%
  select(-Species)
```

Performing the PCA itself is pretty straightforward. I have chosen to scale the variables in this example for no particular reason. The post is not about PCA so it does not matter too much.

```
pc = prcomp(iris.data, scale. = TRUE)
```

The scores (the projected values of the observations) are stored in a matrix called `pc$x`. First we will produce a plot with just the observation numbers.

```
# type = 'n' sets up the axes without drawing any points
plot(pc$x[,1:2], type = 'n')
text(pc$x[,1:2], labels = 1:nrow(iris.data))
```
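If you want to convince yourself of what the scores actually are, here is a minimal sketch in base R. It assumes the same scaled PCA as above, and `manual.scores` is just a name I have made up for the illustration. The idea is that the scores should be nothing more than the centred and scaled data multiplied by the loadings in `pc$rotation`.

```r
# Self-contained sketch: rebuild the scores by hand and compare with pc$x
data(iris)
iris.data = iris[, 1:4]                  # the four numeric measurements
pc = prcomp(iris.data, scale. = TRUE)

# Centre and scale each column, then project onto the loadings
# (manual.scores is a hypothetical name used only in this sketch)
manual.scores = scale(iris.data) %*% pc$rotation

# Compare the hand-built scores with those stored by prcomp,
# ignoring attributes such as dimnames and scaling metadata
all.equal(manual.scores, pc$x, check.attributes = FALSE)
```

The first two columns of that matrix are what the plots in this post are drawn from.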

It would be nice to put the species labels on instead of numbers. The species names are rather long though, so I am going to recode them to single letters:

```
iris.labels = recode(iris.labels,
                     'setosa' = 's',
                     'versicolor' = 'v',
                     'virginica' = 'i')
```

And now we can plot the labels if we want:

```
plot(pc$x[,1:2], type = 'n')
text(pc$x[,1:2], labels = iris.labels)
```
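Strictly speaking, the plots above show only the scores and not the variable arrows of a full biplot. If you want both, one option is the base R `biplot` function, which accepts an `xlabs` argument for labelling the observations. What follows is a sketch rather than the one true way. It is self-contained, uses base R only, and the names `short` and `short.labels` are my own inventions for the example.

```r
# Self-contained sketch: a full biplot with short species labels
data(iris)
pc = prcomp(iris[, 1:4], scale. = TRUE)

# Hypothetical lookup table from species name to a one-letter label
short = c(setosa = 's', versicolor = 'v', virginica = 'i')
short.labels = unname(short[as.character(iris$Species)])

# xlabs replaces the default row-number labels on the observations;
# the variable arrows keep their column names
biplot(pc, xlabs = short.labels)
```

Note that `biplot` rescales the scores by default (its `scale` argument), so the axes will not match the plain scatter plots above exactly.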

### Hey you promised us `ggplot2` ya bum!

Okay, okay. To do this we need to coerce the scores into a `data.frame`. The nice thing about doing this is that the principal components are conveniently labelled `PC1`, `PC2`, etc. This makes the mapping fairly easy.

```
library(ggplot2)
pcs = data.frame(pc$x)
p = pcs %>%
  ggplot(aes(x = PC1,
             y = PC2,
             label = iris.labels)) +
  geom_text()
p
```
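If single letters are not enough to tell the groups apart, you could also map species to colour. Here is a sketch, assuming `ggplot2` is installed. The `Species` and `label` columns, and the `short` lookup, are things I have added to the data frame for the illustration and are not produced by `prcomp`.

```r
library(ggplot2)

# Self-contained sketch: letter labels coloured by species
data(iris)
pc = prcomp(iris[, 1:4], scale. = TRUE)

# Hypothetical lookup table from species name to a one-letter label
short = c(setosa = 's', versicolor = 'v', virginica = 'i')

# Keep the scores, the species and the short label together in one data frame
pcs = data.frame(pc$x,
                 Species = iris$Species,
                 label = unname(short[as.character(iris$Species)]))

p = ggplot(pcs, aes(x = PC1, y = PC2, label = label, colour = Species)) +
  geom_text()
p
```

Putting the labels inside the data frame, rather than relying on a vector in the global environment, keeps the plot working if you later filter or reorder the rows.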

Et voilà!