Top 10 Data Science Projects for Beginners in 2020

Finding and working on data science projects is an important part of your career as an aspiring data scientist.

Projects boost your confidence, skills, and knowledge in data science, which helps when you appear for interviews.

Showcasing projects on your resume, from basic datasets to more challenging ones, with a good level of accuracy, will help you get a data science job.

Below I’ve shared the top 10 data science project ideas for 2020.

In this blog, we will list data science project examples in R and Python, two of the most commonly used programming languages in the industry.

Movie Recommendation System

The movie recommendation system is an R project that will help you grow your machine learning skills. It is a recommendation system that offers users suggestions based on their browsing history and on their preferences among the movies they have watched previously.

Recommendation systems are mainly of two types: collaborative filtering and content-based. This project is based on collaborative filtering.

This type of recommendation system suggests movies based on the browsing history of other people with similar movie preferences.

  • Language: R
  • Dataset: MovieLens
  • Packages: recommenderlab, ggplot2, data.table, reshape2
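
The project itself is written in R, but the idea behind user-based collaborative filtering can be sketched in a few lines of plain Python. This is only a minimal illustration, not the project's code: the users, titles, and ratings are invented, and cosine similarity is just one possible similarity measure.

```python
from math import sqrt

# Toy user -> {movie: rating} data; names and ratings are invented.
ratings = {
    "ana": {"Alien": 5, "Heat": 4, "Up": 1},
    "ben": {"Alien": 4, "Heat": 5, "Up": 2, "Jaws": 5},
    "cy": {"Alien": 1, "Heat": 2, "Up": 5, "Coco": 5},
}


def cosine(a, b):
    """Cosine similarity computed over the movies both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=1):
    """Rank unseen movies by the similarity-weighted sum of others' ratings."""
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for movie, rating in their.items():
            if movie not in ratings[user]:
                scores[movie] = scores.get(movie, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("ana", ratings))
```

Here "ana" rates movies much like "ben" does, so the movie only "ben" has seen is ranked ahead of the one only "cy" has seen.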

Source Code

Movie Recommendation SourceCode

Sentiment Analysis Project

Almost every data-driven organization uses a sentiment analysis model to determine the attitude of its customers toward the company’s products. You can also use it to gauge an audience’s sentiment toward a certain movie, reform, post, or anything else.

If you are keen on machine learning and want to sharpen your skills in it, this project would be perfect for you. This R project is based on classification.

Sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the consumer’s attitude towards a particular product or topic is positive, negative, or neutral.

  • Language: R
  • Dataset: janeaustenr
  • Packages: tidytext
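
To make the positive, negative, or neutral labelling concrete, here is a minimal Python sketch of word-lexicon scoring. It assumes a lexicon-based approach, and the six-word lexicon below is hypothetical; a real project would load a full sentiment lexicon and work on a much larger text.

```python
import re

# Hypothetical mini-lexicon, for illustration only.
LEXICON = {"good": 1, "happy": 1, "love": 1, "bad": -1, "sad": -1, "hate": -1}


def sentiment(text):
    """Label text positive, negative, or neutral by summing word scores."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for sample in ["I love this good movie", "I hate sad endings", "It starts at noon"]:
    print(sample, "->", sentiment(sample))
```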

Source Code

Fake News Detection

Fake news is false information that spreads widely today, whether on social media, on news channels, or in everyday conversation.

In this data science project, we use Python to build a model that can classify whether a piece of news is real or fake.

To implement this project, you should be familiar with the concept of fake news, with TfidfVectorizer and PassiveAggressiveClassifier, and with the Python libraries pandas, NumPy, and scikit-learn (sklearn).

  • Language: Python
  • Dataset: news.csv
  • Packages: pandas, NumPy, scikit-learn
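
To show what TF-IDF weighting and a passive-aggressive classifier do under the hood, here is a rough standard-library sketch. It is not the scikit-learn implementation and does not read news.csv: the headlines are invented, tokenization is a plain whitespace split, and the labels +1 (fake) and -1 (real) are a convention chosen for this example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one L2-normalised {term: weight} dict per document."""
    tokenised = [doc.lower().split() for doc in docs]
    n = len(docs)
    df = Counter(term for toks in tokenised for term in set(toks))
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        # Smoothed inverse document frequency: rare terms weigh more.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors


def score(w, x):
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_pa(vectors, labels, epochs=5, C=1.0):
    """Passive-aggressive (PA-I) training; labels are +1 or -1."""
    w = {}
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            loss = max(0.0, 1.0 - y * score(w, x))
            if loss > 0:  # stay passive unless the margin is violated
                tau = min(C, loss / sum(v * v for v in x.values()))
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + tau * y * v
    return w


def predict(w, x):
    return 1 if score(w, x) >= 0 else -1


# Invented headlines: the first two are labelled fake (+1), the rest real (-1).
docs = [
    "aliens secretly control the moon",
    "miracle pill cures everything overnight",
    "city council approves new budget",
    "local school opens new library",
]
labels = [1, 1, -1, -1]
vectors = tfidf(docs)
weights = train_pa(vectors, labels)
print([predict(weights, x) for x in vectors])
```

In the actual project, scikit-learn’s TfidfVectorizer and PassiveAggressiveClassifier would take the place of these hand-written functions.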

Source Code

Fake News Detection Source Code


Chatbot

A chatbot is one of the most popular projects among aspiring data scientists, and chatbots play an important role in business. They are used to provide better service to customers with less manpower. A chatbot uses deep learning techniques to interact with customers, and you can easily implement this project with Python.

There are two types of chatbot: a domain-specific chatbot, which solves a particular problem, and an open-domain chatbot, which can be asked any type of question and therefore requires huge amounts of training data.

  • Language: Python
  • Dataset: Intents JSON file
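
As a rough idea of what an intents file can look like and how a domain-specific bot might use it, here is a small Python sketch. The JSON layout (tag, patterns, responses) and its contents are assumptions made for illustration, and simple word overlap stands in for the deep learning model the project would train.

```python
import json

# Assumed layout of an intents file; the tags and phrases are invented.
INTENTS_JSON = """
{"intents": [
  {"tag": "greeting", "patterns": ["hello", "hi there"],
   "responses": ["Hello!"]},
  {"tag": "hours", "patterns": ["when are you open", "opening hours"],
   "responses": ["We are open 9 to 5."]}
]}
"""


def reply(message, intents, fallback="Sorry, I did not understand."):
    """Pick the intent whose pattern shares the most words with the message."""
    words = set(message.lower().split())
    best_tag, best_overlap, best_response = None, 0, fallback
    for intent in intents["intents"]:
        for pattern in intent["patterns"]:
            overlap = len(words & set(pattern.lower().split()))
            if overlap > best_overlap:
                best_tag, best_overlap = intent["tag"], overlap
                best_response = intent["responses"][0]
    return best_tag, best_response


intents = json.loads(INTENTS_JSON)
print(reply("what are your opening hours", intents))
```

A trained model would replace the word-overlap step with a classifier that predicts the tag, while the lookup of a response for that tag stays much the same.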

Source Code

ChatBot SourceCode

Driver Drowsiness Detection

Many accidents occur because of driver drowsiness. A drowsy driver is dangerous to himself and to others. That’s why this Python project was introduced: it detects drowsy drivers and alerts them with a beeping alarm.

This Python project is based on a deep learning model that assesses whether the driver’s eyes are closed or open. A webcam is required to work on this project.

  • Language: Python
  • Packages: OpenCV, TensorFlow, Pygame, Keras
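
The alarm logic around the model can be sketched without a webcam. In this hypothetical example, a list of per-frame "open"/"closed" labels stands in for the model’s predictions, and the scoring rule and threshold are assumptions rather than the project’s actual values.

```python
def drowsiness_alarm(eye_states, threshold=15):
    """Yield (score, alarm) per frame from a stream of 'open'/'closed' labels.

    The score rises while the eyes are closed and falls when they open;
    the alarm is on while the score is above the threshold.
    """
    score = 0
    for state in eye_states:
        if state == "closed":
            score += 1
        else:
            score = max(0, score - 1)
        yield score, score > threshold


# Simulated predictions: three frames with open eyes, then five closed.
frames = ["open"] * 3 + ["closed"] * 5
results = list(drowsiness_alarm(frames, threshold=3))
print(results)
```

Counting over several frames, rather than reacting to a single closed-eye frame, keeps an ordinary blink from triggering the alarm.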

Source Code

DriverDetection SourceCode

Speech Emotion Recognition

Speech emotion recognition (SER) is a very compelling Python project. It attempts to perceive human emotions from speech. In the project, you’ll learn how to build an MLP classifier capable of recognizing emotions from a human voice.

Different sound files are used as the dataset for recognizing emotion. By working on the project you’ll also gain experience with the Librosa package, which is used for analyzing music and audio.

  • Language: Python
  • Dataset: RAVDESS
  • Packages: Librosa, SoundFile, NumPy, scikit-learn, PyAudio

Source Code

Speech Emotion SourceCode

Breast Cancer Classification

If you want to gain proficiency in machine learning as well as in deep learning, then go for this Python project. You’ll become familiar with terms like deep neural networks, convolutional neural networks, recurrent neural networks, deep belief networks, etc.

Along with this, you’ll also get familiar with the Keras library. In the project, you build a classifier. 80% of the image dataset is used to train it and the rest is used for validation.

  • Language: Python
  • Dataset: IDC (Invasive Ductal Carcinoma)
  • Packages: NumPy, OpenCV, Pillow, TensorFlow, Keras, Imutils, scikit-learn, Matplotlib
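
The 80/20 split between training and validation data can be sketched like this. The file names are hypothetical placeholders for image patches, and the fixed seed is only there to make the shuffle repeatable.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of items and split it into training and validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for image patches.
paths = [f"patch_{i:03d}.png" for i in range(10)]
train, val = split_dataset(paths)
print(len(train), len(val))
```

Shuffling before the cut matters when files are stored grouped by class, since an unshuffled split could leave one class out of the validation set.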

Source Code

CancerDetection SourceCode

Customer Segmentation

Customer segmentation is a basic project and one of the most important applications of unsupervised learning. Companies use clustering to identify segments of people with similar behavior so that they can target their potential user base.

By working on the project you’ll get well acquainted with K-means clustering, a popular method for clustering unlabelled datasets. With the help of customer segmentation, companies get to know their customers and their requirements better. Here, data on demographics, economic status, geography, and behavioural patterns is very important.

  • Language: R
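
The project uses R, but the K-means loop itself is short enough to sketch in plain Python. The customer data below is invented, and seeding the centroids with the first k points is a simplification chosen to keep the example deterministic; real implementations usually pick starting points more carefully.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters


# Invented (annual income in thousands, spending score) pairs.
customers = [(15, 80), (70, 20), (16, 75), (18, 82), (72, 25), (75, 18)]
centroids, clusters = kmeans(customers, k=2)
print(centroids)
print(clusters)
```

On this toy data the low-income, high-spending customers end up in one segment and the high-income, low-spending customers in the other.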

Source Code

Customer Segmentation SourceCode

Gender and Age Detection

To upgrade your computer vision skills, you can take up the gender and age detection Python project. In it you build a model that recognizes the age and gender of a person from a single image of their face. Exact age and gender are hard to detect, though, because of factors like makeup, facial expressions, and lighting. That’s why the task is treated as a classification problem instead of a regression problem.

  • Language: Python
  • Dataset: Adience
  • Packages: OpenCV
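
Treating age as classification means the model outputs one probability per age range rather than a single number. The sketch below shows only that decoding step; the age ranges are assumed to follow the commonly used Adience labels, and the probabilities are made up rather than produced by a real network.

```python
# Assumed age ranges (commonly used Adience labels) and gender classes.
AGE_BUCKETS = ["0-2", "4-6", "8-12", "15-20", "25-32", "38-43", "48-53", "60-100"]
GENDERS = ["Male", "Female"]


def decode(probabilities, labels):
    """Return the label with the highest predicted probability."""
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best]


# Made-up network outputs for one face.
age_probs = [0.01, 0.02, 0.05, 0.12, 0.60, 0.15, 0.04, 0.01]
gender_probs = [0.3, 0.7]
print(decode(age_probs, AGE_BUCKETS), decode(gender_probs, GENDERS))
```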

Source Code

Gender and Age Detection SourceCode

Credit Card Fraud Detection

Credit card fraud has skyrocketed. The objective of this project is to build a classifier that detects whether a card transaction is genuine or not. In this project, various machine learning algorithms are used to differentiate between fraudulent and non-fraudulent transactions.

Moreover, by working on this project, you will learn how to build machine learning algorithms for classification.

  • Language: R or Python
  • Dataset: Credit card transaction data
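
As one example of a classification algorithm for this task, here is a bare-bones logistic regression trained by gradient descent. The two features and all eight transactions are invented for illustration; a real dataset is far larger and heavily imbalanced, and the project may well use other algorithms.

```python
import math


def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit logistic regression weights by batch gradient descent."""
    n_features = len(rows[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1 / (1 + math.exp(-z))  # predicted fraud probability
            for j, xj in enumerate(x):
                grad_w[j] += (p - y) * xj
            grad_b += p - y
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b


def is_fraud(x, w, b):
    """Flag a transaction when the model's fraud probability exceeds 0.5."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


# Invented features: (amount scaled to 0..1, 1 if foreign merchant else 0).
rows = [(0.05, 0), (0.10, 0), (0.20, 0), (0.15, 0),
        (0.90, 1), (0.95, 1), (0.85, 1), (0.80, 0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = fraudulent
w, b = train_logistic(rows, labels)
print(is_fraud((0.9, 1), w, b), is_fraud((0.05, 0), w, b))
```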

Source Code

Credit Card SourceCode


The projects we have discussed are some of the best data science projects you can do in 2020. If you have good knowledge of Python and R, then doing a data science project is not a hard nut to crack.

You don’t have to be great to start, but you have to start to be great.

