## Feature Engineering Tutorial Series 5: Outliers

An outlier is a data point that differs significantly from the rest of the data. “An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism.” [D. Hawkins. Identification of Outliers, Chapman and Hall, 1980.] Should Read more…

## Feature Engineering Tutorial Series 4: Linear Model Assumptions

Linear models make the following assumptions about the independent variables X used to predict Y:

- There is a linear relationship between X and the outcome Y
- The independent variables X are normally distributed
- There is little or no collinearity among the independent variables
- Homoscedasticity (homogeneity of variance)

Examples of linear Read more…

## Feature Engineering Tutorial Series 3: Rare Labels

Labels that occur rarely

Categorical variables are those whose values are selected from a group of categories, also called labels. Different labels appear in the dataset with different frequencies. Some categories appear frequently, whereas others appear in only a small number of observations. For Read more…
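To make the idea of label frequencies concrete, here is a minimal sketch in plain Python. It counts how often each label occurs and folds infrequent ones into a single placeholder category. The function name `group_rare_labels`, the 5% cutoff, the `"Rare"` placeholder, and the toy `cities` column are all assumptions made for illustration; they are not taken from the tutorial itself.

```python
from collections import Counter


def group_rare_labels(values, min_fraction=0.05, rare_label="Rare"):
    """Replace labels whose relative frequency is below min_fraction.

    min_fraction and rare_label are illustrative defaults, not values
    prescribed by the tutorial.
    """
    counts = Counter(values)
    total = len(values)
    # Labels that appear in at least min_fraction of the observations.
    frequent = {label for label, n in counts.items() if n / total >= min_fraction}
    return [v if v in frequent else rare_label for v in values]


# Hypothetical toy column: two common labels and two rare ones.
cities = ["London"] * 50 + ["Paris"] * 45 + ["Oslo"] * 3 + ["Lima"] * 2
grouped = group_rare_labels(cities)
print(Counter(grouped))
```

In this toy column, "Oslo" and "Lima" each cover less than 5% of the rows, so they end up in the shared "Rare" category while "London" and "Paris" are left unchanged.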

## Data Types Every Data Scientist Should Know

One of the central concepts of data science is gaining insights from data. Statistics is an excellent tool for unlocking such insights. In this post, we’ll look at some basic types of data (variables) that can be present in your dataset.

What is a Variable?

A variable is any characteristic, Read more…

## Matplotlib Crash Course

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It is a cross-platform library for making 2D plots from data in arrays. It can be used in Python and IPython shells, Jupyter notebooks, and web application servers. Matplotlib is written in Python and makes Read more…

## Multi-Label Image Classification on Movies Poster using CNN

Multi-Label Image Classification in Python

In this project, we are going to train our model on a set of labeled movie posters. The model will predict the genres of the movie based on the movie poster. We will consider a set of 25 genres. Each poster can have more than Read more…

## Breast Cancer Detection Using CNN

Breast Cancer Detection Using CNN in Python

Breast cancer is the most commonly occurring cancer in women and the second most common cancer overall. There were over 2 million new cases in 2018, making it a significant health problem today. The key challenge in breast cancer detection is Read more…

## Classify Dog or Cat with the Help of a Convolutional Neural Network (CNN)

Use of Dropout and Batch Normalization in 2D CNN on Dog Cat Image Classification in TensorFlow 2.0

We are going to predict cat or dog with the help of a convolutional neural network. I have taken the dataset from Kaggle: https://www.kaggle.com/tongpython/cat-and-dog. This dataset has two classes, cats and dogs Read more…

## Credit Card Fraud Detection using CNN

Classification using CNN

It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase. In this project we are going to build a model using CNN which predicts if the transaction is genuine Read more…

## Bank Customer Satisfaction Prediction Using CNN and Feature Selection

Feature Selection and CNN

In this project we are going to build a neural network to predict whether a particular bank customer is satisfied or not. To do this we are going to use Convolutional Neural Networks. The dataset we are going to use contains 370 features. We are going Read more…

## Airline Passenger Prediction using RNN – LSTM

Prediction of number of passengers for an airline using LSTM

In this project we are going to build a model to predict the number of passengers for an airline. To do so we are going to use Recurrent Neural Networks, more precisely Long Short-Term Memory.

Recurrent Neural Network

Neural Networks are Read more…

## Human Activity Recognition Using Accelerometer Data

Prediction of Human Activity

In this project we are going to use accelerometer data to train a model that can predict human activity. We are going to use 2D Convolutional Neural Networks to build the model.

Source: “Deep Neural Network Example” by Nils Ackermann is licensed under Creative Commons CC Read more…

## Deep Learning with Tensorflow 2.0 Tutorial – Getting Started with Tensorflow 2.0 and Keras for Beginners

Classification using Fashion MNIST dataset

What is TensorFlow?

TensorFlow is one of the best libraries to implement deep learning. TensorFlow is a software library for numerical computation of mathematical expressions, using data flow graphs. Nodes in the graph represent mathematical operations, while the edges represent the multidimensional data arrays (tensors) that flow between Read more…

## Complete Seaborn Python Tutorial for Data Visualization in Python

Visualizing statistical relationships

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Statistical analysis is a process of understanding how variables in a dataset relate to each other and how those relationships depend on other variables. Visualization Read more…

## Feature Selection Based on Univariate ROC_AUC for Classification and MSE for Regression | Machine Learning | KGP Talkie

Feature Selection Based on Univariate ROC_AUC for Classification and MSE for Regression

Watch Full Playlist: https://www.youtube.com/playlist?list=PLc2rvfiptPSQYzmDIFuq2PqN2n28ZjxDH

What is ROC_AUC

The Receiver Operating Characteristic (ROC) curve is well known for evaluating classification performance. Owing to its superiority in dealing with imbalanced and cost-sensitive data, the ROC curve has been exploited as a Read more…

## Feature Selection using Fisher Score and Chi2 (χ2) Test | Titanic Dataset | Machine Learning | KGP Talkie

Feature Selection using Fisher Score and Chi2 (χ2) Test

Watch Full Playlist: https://www.youtube.com/playlist?list=PLc2rvfiptPSQYzmDIFuq2PqN2n28ZjxDH

What is Fisher Score and Chi2 (χ2) Test

Fisher score is one of the most widely used supervised feature selection methods. However, it selects each feature independently according to their scores under the Fisher criterion, which Read more…

## Feature Selection Based on Univariate (ANOVA) Test for Classification | Machine Learning | KGP Talkie

Feature Selection Based on Univariate (ANOVA) Test for Classification

Watch Full Playlist: https://www.youtube.com/playlist?list=PLc2rvfiptPSQYzmDIFuq2PqN2n28ZjxDH

What is Univariate (ANOVA) Test

The elimination process aims to reduce the size of the input feature set and at the same time to retain the class discriminatory information for classification problems. An F-test is any statistical Read more…

## Feature Selection Based on Mutual Information (Entropy) Gain for Classification and Regression | Machine Learning | KGP Talkie

Feature Selection Based on Mutual Information (Entropy) Gain

Watch Full Playlist: https://www.youtube.com/playlist?list=PLc2rvfiptPSQYzmDIFuq2PqN2n28ZjxDH

What is Mutual Information

The elimination process aims to reduce the size of the input feature set and at the same time to retain the class discriminatory information for classification problems. Mutual information (MI) is a measure of Read more…

## Feature Selection with Filtering Method | Constant, Quasi Constant and Duplicate Feature Removal

Filtering method

Watch Full Playlist: https://www.youtube.com/playlist?list=PLc2rvfiptPSQYzmDIFuq2PqN2n28ZjxDH

Unnecessary and redundant features not only slow down the training of an algorithm, but they also affect its performance. There are several advantages of performing feature selection before training machine learning models: Models with fewer features have higher Read more…
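As a rough illustration of the three filters named in the title (constant, quasi-constant, and duplicate feature removal), the sketch below works on a plain dictionary of columns rather than any particular library. The function name `drop_uninformative_columns`, the 0.99 threshold, and the toy `table` are assumptions chosen for this example; the tutorial itself may use different tools and cutoffs.

```python
from collections import Counter


def drop_uninformative_columns(table, quasi_threshold=0.99):
    """Drop constant, quasi-constant, and duplicate columns.

    table maps column name -> list of hashable values.
    quasi_threshold is an illustrative cutoff: a column is dropped when
    its most common value covers at least this share of the rows.
    """
    kept = {}
    seen = set()
    for name, values in table.items():
        if not values:
            continue  # an empty column carries no information
        top_count = Counter(values).most_common(1)[0][1]
        if top_count / len(values) >= quasi_threshold:
            continue  # constant (share == 1.0) or quasi-constant
        key = tuple(values)
        if key in seen:
            continue  # exact duplicate of a column already kept
        seen.add(key)
        kept[name] = values
    return kept


# Hypothetical toy data: one constant column and one duplicated column.
table = {
    "age": [23, 35, 41, 29],
    "age_copy": [23, 35, 41, 29],
    "country": ["IN", "IN", "IN", "IN"],
    "income": [10, 20, 30, 40],
}
cleaned = drop_uninformative_columns(table)
print(list(cleaned))
```

Here `country` is dropped because it is constant and `age_copy` because it repeats `age`; the first occurrence of a duplicated column is the one that is kept.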

## Feature Dimension Reduction Using LDA and PCA with Python | Principal Component Analysis in Feature Selection | KGP Talkie

Feature Dimension Reduction

Watch Full Playlist: https://www.youtube.com/playlist?list=PLc2rvfiptPSQYzmDIFuq2PqN2n28ZjxDH

What is LDA (Linear Discriminant Analysis)?

The idea behind LDA is simple. Mathematically speaking, we need to find a new feature space onto which to project the data in order to maximize class separability. Linear Discriminant Analysis is a supervised algorithm as it takes the Read more…