Principal component analysis (PCA) is a dimensionality reduction and visualization technique. It takes a dataset with many dimensions (more than 3) and reduces it, either so the data can be plotted in 2D or simply so there are fewer dimensions to work with.

For example, let us consider the iris dataset. It has 4 feature columns (plus one target variable, ‘species’): sepal_length, sepal_width, petal_length and petal_width. This is a 4-dimensional dataset. Now the problem is: how do we visualize this on a 2D scale?
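One way to answer that question is to project the four columns onto their first two principal components. The snippet below is a minimal sketch of that idea using plain NumPy and a singular value decomposition, not a full PCA implementation. The helper name `pca_project` is made up for illustration, and the array holds only six rows taken from the iris dataset (two per species) to keep the example short.

```python
import numpy as np

# Six rows from the iris dataset (two per species), in the column order
# sepal_length, sepal_width, petal_length, petal_width.
X = np.array([
    [5.1, 3.5, 1.4, 0.2],  # setosa
    [4.9, 3.0, 1.4, 0.2],  # setosa
    [7.0, 3.2, 4.7, 1.4],  # versicolor
    [6.4, 3.2, 4.5, 1.5],  # versicolor
    [6.3, 3.3, 6.0, 2.5],  # virginica
    [5.8, 2.7, 5.1, 1.9],  # virginica
])


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components.

    Hypothetical helper for illustration only.
    """
    # PCA works on centered data, so subtract each column's mean.
    X_centered = X - X.mean(axis=0)
    # The rows of Vt are the principal directions, ordered by the
    # amount of variance they capture.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Coordinates of every sample in the reduced space.
    projected = X_centered @ components.T
    # Share of the total variance captured by each kept component.
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    return projected, explained_ratio[:n_components]


projected, explained_ratio = pca_project(X, n_components=2)
print(projected.shape)   # one (x, y) pair per flower
print(explained_ratio)   # variance share of each of the two components
```

Each row of `projected` is now a point you can put on a 2D scatter plot, colored by species. On the full dataset you would more likely reach for a library implementation such as scikit-learn's `PCA`, which follows the same centering and projection steps.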

Exploratory data analysis (EDA) is the first step in any data science project. It gives us an overview of the data and surfaces meaningful insights with just a few lines of code.

EDA is crucial for getting a sense of which features matter and for building a practical, intuitive understanding of your data set.
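To give a feel for what "a few lines of code" can look like, here is a small sketch of a first look at a table with pandas. The DataFrame is a toy example built by hand (a handful of iris rows with one value deliberately blanked out), not the data set used in this article, and the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only: a few iris rows, with one
# sepal_width value deliberately set to missing.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, np.nan, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# How big is the table, and what type is each column?
print(df.shape)
print(df.dtypes)

# How many values are missing in each column?
missing = df.isna().sum()
print(missing)

# Summary statistics for the numeric columns.
print(df.describe())

# How balanced are the classes of the target variable?
class_counts = df["species"].value_counts()
print(class_counts)
```

These few calls already answer the basic questions: how many rows and columns there are, which columns are numeric, where values are missing, and whether the target classes are balanced.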

Getting started with your data science project can be daunting, especially as a beginner. Here is a 10-step checklist you can refer to when getting started with your EDA process.

The data set I’m using can be found here:

This is a classification problem. Here the task is to…


