Motivation

It's hard to visualize and correlate high-dimensional data, so PCA is used to reduce the dimensionality of the data while keeping most of the information in the original data.

PCA

Orthogonal projection of the data onto a lower-dimensional linear space that:

  • Maximizes the variance of the projected data
  • Minimizes the mean squared distance between the data points and their projections (equivalent to the first criterion, as shown below)
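
These two criteria are in fact the same objective. For a centered data point $x^{(i)}$ and its orthogonal projection $\tilde{x}^{(i)}$ onto the subspace, the Pythagorean theorem gives

$$\|x^{(i)}\|^2 = \|\tilde{x}^{(i)}\|^2 + \|x^{(i)} - \tilde{x}^{(i)}\|^2,$$

and summing over all $m$ points leaves the left-hand side fixed by the data; hence maximizing the projected variance $\frac{1}{m}\sum_{i=1}^{m}\|\tilde{x}^{(i)}\|^2$ is the same as minimizing the mean squared error $\frac{1}{m}\sum_{i=1}^{m}\|x^{(i)} - \tilde{x}^{(i)}\|^2$.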

PCA Applications

  • Data Visualization
  • Data Compression
  • Noise Reduction

PCA Vectors

Find the vectors onto which to project the data so as to minimize the projection error.

  • Originate from the center of mass (the mean) of the data.
  • Principal component 1 points in the direction of the largest variance.
  • Each subsequent principal component is orthogonal to the previous ones and points in the direction of largest variance of the residual subspace (see the sketch below).
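
A small synthetic check of these properties, as a sketch in Octave/MATLAB (the data, mixing matrix, and variable names are made up for illustration):

```matlab
% Illustrative demo: principal directions of synthetic 2-D data
% (Octave, or MATLAB R2016b+ for implicit broadcasting)
m = 1000;
X = randn(m, 2) * [2 0.8; 0.8 1];   % correlated Gaussian samples (hypothetical data)
X_norm = X - mean(X);               % center the data at its mass center
[U, S, V] = svd((1/m) * (X_norm' * X_norm));
disp(U(:, 1)' * U(:, 2));           % ~0: the components are orthogonal
disp(var(X_norm * U(:, 1)));        % variance along PC1 ...
disp(var(X_norm * U(:, 2)));        % ... is larger than variance along PC2
```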

PCA Algorithm

  • Given: training set $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$, where each $x^{(i)} \in \mathbb{R}^n$
  • Goal: reduce the data from $n$ dimensions to $k$ dimensions
  • Steps (a code sketch follows the list):
    1. Preprocessing (mean normalization and feature scaling if needed):
      • Calculate the mean $\mu_j$ of each feature.
      • Subtract $\mu_j$ from the corresponding column of the data matrix, so every feature has zero mean.
    2. Calculate the covariance matrix $\Sigma = \frac{1}{m} X^T X$, where the rows of $X$ are the normalized examples.
    3. Extract $U$ from [U, S, V] = svd(Sigma) using MATLAB or Octave.
    4. Take the first $k$ principal component vectors (eigenvectors), i.e. the first $k$ columns of $U$.
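
A minimal Octave/MATLAB sketch of these steps, assuming `X` is an m-by-n data matrix with one example per row and `k` is the chosen target dimension (both names are illustrative):

```matlab
% PCA via SVD (Octave, or MATLAB R2016b+ for implicit broadcasting)
[m, n] = size(X);                    % m examples, n features
mu = mean(X);                        % 1-by-n vector of feature means
X_norm = X - mu;                     % mean normalization
Sigma = (1/m) * (X_norm' * X_norm);  % n-by-n covariance matrix
[U, S, V] = svd(Sigma);              % columns of U are the eigenvectors
U_reduce = U(:, 1:k);                % first k principal components
Z = X_norm * U_reduce;               % m-by-k projected data
X_approx = Z * U_reduce' + mu;       % approximate reconstruction in n dimensions
```

The last line inverts the projection; it is what enables the compression and noise-reduction applications above, since `X_approx` recovers the original data up to the variance discarded with the dropped components.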