When we run PCA on a dataset, we’ll get a set of features that is a linear combination of the existing features and data on how much of the original variation in the data is kept. It’s hard to know which features to play around with when you’re looking at 10 features, much less 100 or 1000. ML Models use features as independent variables for classification. In the machine learning field, it’s common for datasets to come with 10s, 100s, or even 1000s of features. The most common applications of PCA are at the start of a project that we want to use machine learning on for data cleaning and as a data compression technique.
PCA is a dimensionality reduction technique.
Now let’s get into something a little more complex – Principal Component Analysis (PCA) in Python. regression) and Logistic Regression is a version of regression that uses a softmax function to do classification. Just to recap, Linear Regression is the simplest implementation of continuous prediction (i.e. So far we’ve covered Linear Regression and Logistic Regression. Welcome to the third module in our Machine Learning series.