Principal component analysis (PCA) is a dimensionality reduction technique that is also widely used for visualization: a dataset with many dimensions (more than 3) can be reduced either so it can be plotted in 2D or simply to lower the number of features in the dataset.

Photo by Isaac Smith on Unsplash

For example, consider the iris dataset. It has 4 feature columns (plus one target variable, ‘species’): sepal_length, sepal_width, petal_length, and petal_width. This is a 4-dimensional dataset. The problem is: how do we visualize it in 2D?
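One way to answer that question is sketched below. It assumes scikit-learn, which the text above does not name, and its bundled copy of the iris data. Standardizing the features first is also an assumption: a common choice so that no single measurement dominates the components, not something the original prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the 4 feature columns (sepal/petal length and width) and the species labels.
iris = load_iris()
X = iris.data      # shape: (150, 4)
y = iris.target    # species encoded as 0, 1, 2

# Put every feature on the same scale before PCA (an assumed preprocessing step).
X_scaled = StandardScaler().fit_transform(X)

# Project the 4-dimensional data onto its first 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Each row of X_2d is now a point that can be drawn on a 2D scatter plot.
print(X_2d.shape)
# Fraction of the total variance kept by each of the two components.
print(pca.explained_variance_ratio_)
```

The two columns of `X_2d` can be passed to any plotting library as x and y coordinates, with `y` used to color the points by species.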

Exploratory data analysis (EDA) is the first step in any data science project. It gives us an overview of the data and can surface meaningful insights with just a few lines of code.

EDA is crucial for judging which features matter and for building a practical, intuitive understanding of your dataset.

Getting started with a data science project can be daunting, especially as a beginner. Here is a 10-step checklist you can refer to when starting your EDA process.

Photo by Franki Chamaki on Unsplash

The dataset I’m using can be found here:

This is a classification problem. Here the task is to…


Computer Science student and Data Science enthusiast. Currently looking for data analyst roles.
