Machine Learning - PCA
Motivation
It’s hard to visualize and correlate data with many dimensions (more than 3). PCA is used to reduce the dimensionality of the data while keeping most of the information in the original data.
PCA
Orthogonal projection of the data onto a lower-dimensional linear space that:
- Maximizes the variance of the projected data
- Minimizes the mean squared difference between the data points and their projections (see the objective below)
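These two criteria are equivalent: for mean-centered data, each point’s squared norm splits (by the Pythagorean theorem) into the squared projection plus the squared residual, so maximizing projected variance is the same as minimizing the mean squared projection error. As a worked form of the first criterion (notation mine, matching the algorithm below), the first direction is the unit vector that maximizes the variance of the projections:

$$
u_1 = \arg\max_{\|u\| = 1} \; \frac{1}{m} \sum_{i=1}^{m} \left( u^T x^{(i)} \right)^2
$$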
PCA Applications
- Data Visualization
- Data Compression
- Noise Reduction
PCA Vectors
Find vectors onto which to project the data so as to minimize the projection error.
- Originate from the center of mass.
- Principal component 1 points in the direction of the largest variance.
- Each subsequent principal component is orthogonal to the previous ones and points in the direction of the largest variance of the residual subspace (formalized just below).
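A hedged formalization of this greedy construction (notation mine, consistent with the objective above): the $k$-th component maximizes the projected variance subject to orthogonality with all earlier components,

$$
u_k = \arg\max_{\|u\| = 1,\; u \perp u_1, \ldots, u_{k-1}} \; \frac{1}{m} \sum_{i=1}^{m} \left( u^T x^{(i)} \right)^2
$$

which is why the principal components form an orthonormal set ordered by captured variance.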
PCA Algorithm
- Given: Training set $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$ where each $x^{(i)} \in \mathbb{R}^n$
- Goal: Reduce data from $n$-dimensions to $k$-dimensions ($k < n$)
- Preprocessing (Mean normalization and feature scaling if needed):
  - Mean normalization steps:
    - Calculate the mean of the data: $\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$
    - Subtract the mean from each column vector in the data matrix: $x^{(i)} := x^{(i)} - \mu$
- Calculate the covariance matrix: $\Sigma = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \left( x^{(i)} \right)^T$
- Extract $U$ from $\Sigma$ using
  `[u,s,v] = svd(sigma)`
  in MATLAB or Octave.
- Take the first $k$ principal component vectors (eigenvectors) from the matrix $U$ (a runnable sketch of the full pipeline follows this list).
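Putting the steps together, here is a minimal Octave/MATLAB sketch of the whole pipeline, assuming examples are stored as rows of the data matrix. The variable names (`X`, `Sigma`, `U_reduce`, `Z`) and the toy data are my own illustrative choices, not from the original post:

```matlab
% X is an m-by-n data matrix: m examples (rows), n features (columns).
% Toy data, used only to make the sketch runnable.
X = [2.5 2.4; 0.5 0.7; 2.2 2.9; 1.9 2.2; 3.1 3.0; 2.3 2.7];
[m, n] = size(X);
k = 1;                        % target number of dimensions, k < n

% Mean normalization: subtract the per-feature mean from every example.
mu = mean(X);                 % 1-by-n vector of column means
X_norm = X - mu;              % broadcasting subtracts mu from each row

% Optional feature scaling, if features have very different ranges:
% X_norm = X_norm ./ std(X_norm);

% Covariance matrix (n-by-n).
Sigma = (1 / m) * (X_norm' * X_norm);

% Singular value decomposition; for a symmetric matrix like Sigma,
% the columns of U are its eigenvectors (the principal components).
[U, S, V] = svd(Sigma);

% Keep the first k principal component vectors.
U_reduce = U(:, 1:k);         % n-by-k

% Project the data from n dimensions down to k dimensions.
Z = X_norm * U_reduce;        % m-by-k
```

As a usage note, the compressed data `Z` can be mapped back to the original space with `X_approx = Z * U_reduce' + mu`; comparing `X_approx` against `X` shows how much information the first $k$ components keep.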