Unsupervised Learning


The Goals of Unsupervised Learning


The Challenge of Unsupervised Learning


Principal Components Analysis



Computation of Principal Components


Geometry of PCA

  • The loading vector \(\phi_1\), with elements \(\phi_{11}, \phi_{21},...,\phi_{p1}\), defines a direction in feature space along which the data vary the most

  • If we project the n data points \(x_1,...,x_n\) onto this direction, the projected values are the principal component scores \(z_{11},...,z_{n1}\) themselves
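
The geometric picture above can be checked numerically. The following is a minimal sketch, assuming NumPy and a small synthetic data set (the data, the seed, and the names `Xc`, `phi1`, `z1`, and `dirs` are illustrative, not from the slides). It takes the first right singular vector of the centered data as \(\phi_1\), projects the observations onto it to get the scores, and compares the variance of those scores with the variance along random unit directions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: n = 200 observations of p = 3 correlated features.
n, p = 200, 3
A = rng.normal(size=(p, p))
X = rng.normal(size=(n, p)) @ A

# Center each column so that projections onto any direction have mean zero.
Xc = X - X.mean(axis=0)

# Rows of Vt are the right singular vectors of the centered matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
phi1 = Vt[0]      # first loading vector (unit length)
z1 = Xc @ phi1    # first principal component scores z_11, ..., z_n1

# Variance of the data projected onto 1000 random unit directions.
dirs = rng.normal(size=(1000, p))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
random_vars = (Xc @ dirs.T).var(axis=0)

print("variance along phi1:", z1.var())
print("largest variance along a random direction:", random_vars.max())
```

No random direction should give a larger variance than \(\phi_1\), which is what "the direction along which the data vary the most" means here.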


PCA Continued

  • The second principal component is the linear combination of \(X_1,...,X_p\) that has maximal variance among all linear combinations that are uncorrelated with \(Z_1\)

  • The second principal component scores \(z_{12},z_{22},...,z_{n2}\) take the form

    • \(z_{i2} = \phi_{12}x_{i1} + \phi_{22}x_{i2} + ... + \phi_{p2}x_{ip}\)

      • Where \(\phi_2\) is the second principal component loading vector, with elements \(\phi_{12}, \phi_{22},...,\phi_{p2}\)
  • It turns out that constraining \(Z_2\) to be uncorrelated with \(Z_1\) is equivalent to constraining the direction \(\phi_2\) to be orthogonal (perpendicular) to the direction \(\phi_1\). The same holds for later components: each new direction is orthogonal to all earlier ones

  • The principal component directions \(\phi_1, \phi_2, \phi_3, \ldots\) are the ordered sequence of right singular vectors of the (column-centered) matrix \(\mathbf{X}\), and the variances of the components are \(\frac{1}{n}\) times the squares of the singular values. There are at most \(\min(n - 1, p)\) principal components

  • Illustration

    • USArrests data: For each of the fifty states in the United States, the data set contains the number of arrests per 100,000 residents for each of three crimes: Assault, Murder, and Rape. We also record UrbanPop (the percent of the population in each state living in urban areas)

    • The principal component score vectors have length n = 50, and the principal component loading vectors have length p = 4

    • PCA was performed after standardizing each variable to have mean zero and standard deviation one
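
The pipeline described above (standardize, then take the singular value decomposition) can be sketched as follows. This is a minimal illustration, assuming NumPy and a synthetic stand-in with the same shape as USArrests (n = 50, p = 4). The numbers are randomly generated and are not the real arrest figures, and the names `Xs`, `loadings`, `scores`, and `variances` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in with the same shape as USArrests (n = 50, p = 4).
# These are NOT the real arrest figures; columns are put on different scales
# to mimic variables measured in different units.
n, p = 50, 4
scales = np.array([1.0, 10.0, 100.0, 5.0])
X = (rng.normal(size=(n, p)) @ rng.normal(size=(p, p))) * scales

# Standardize each variable to mean zero and standard deviation one.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# Right singular vectors of the standardized matrix are the loading vectors.
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
loadings = Vt.T          # column k is the (k+1)-th loading vector, length p
scores = Xs @ loadings   # column k is the (k+1)-th score vector, length n
variances = s**2 / n     # variance of each principal component

print("loadings shape:", loadings.shape)
print("scores shape:", scores.shape)
print("component variances:", variances)
```

In this sketch the loading vectors are mutually orthogonal, the score vectors are uncorrelated, and the component variances sum to p because every standardized variable has variance one.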


  • The first two principal components for the USArrests data

    • The blue state names represent the scores for the first two principal components

    • The orange arrows indicate the first two principal component loading vectors (with axes on the top and right). For example, the loading for Rape on the first component is 0.54, and its loading on the second principal component is 0.17 [the word Rape is centered at the point (0.54, 0.17)]

    • This figure is known as a biplot, because it displays both the principal component scores and the principal component loadings

  • PCA loadings