## Feature Engineering Tutorial Series 6: Variable magnitude

Does the magnitude of the variable matter? In linear regression models, the scale of the variables used to estimate the output matters. Linear models are of the type y = w x + b, where the regression coefficient w represents the expected change in y for a one-unit change in x Read more…
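As a rough illustration of why the coefficient depends on the units of x, here is a minimal sketch that fits y = w x + b by ordinary least squares twice, once with x in kilometres and once in metres. The `fit_line` helper and the fare data are hypothetical and are not taken from the tutorial itself.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w*x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


# Hypothetical data: the fare grows by 2 for every kilometre travelled.
distance_km = [1.0, 2.0, 3.0, 4.0]
fare = [5.0, 7.0, 9.0, 11.0]

# The same distances expressed in metres.
distance_m = [d * 1000 for d in distance_km]

w_km, b_km = fit_line(distance_km, fare)
w_m, b_m = fit_line(distance_m, fare)

print(f"per km: w={w_km:.4f}, b={b_km:.4f}")
print(f"per m:  w={w_m:.4f}, b={b_m:.4f}")
```

The intercept is unchanged, but the coefficient shrinks by the same factor by which x grew, so coefficients of variables measured on different scales are not directly comparable.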

## Feature Engineering Tutorial Series 5: Outliers

An outlier is a data point which is significantly different from the remaining data. “An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism.” [D. Hawkins. Identification of Outliers, Chapman and Hall, 1980.] Should Read more…

## Feature Engineering Tutorial Series 4: Linear Model Assumptions

Linear models make the following assumptions about the independent variables X used to predict Y:

- There is a linear relationship between X and the outcome Y
- The independent variables X are normally distributed
- There is little or no co-linearity among the independent variables
- Homoscedasticity (homogeneity of variance)

Examples of linear Read more…

## Feature Engineering Series Tutorial 3: Rare Labels

Labels that occur rarely: Categorical variables are those whose values are selected from a group of categories, also called labels. Different labels appear in the dataset with different frequencies. Some categories appear frequently in the dataset, whereas others appear in only a small number of observations. For Read more…
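To make the idea of label frequency concrete, the following sketch counts how often each label occurs and flags those below a cut-off. The `find_rare_labels` helper, the 5% threshold, and the city data are all assumptions chosen for illustration. The tutorial may use a different threshold or tooling.

```python
from collections import Counter


def find_rare_labels(values, min_fraction=0.05):
    """Return labels whose share of observations is below min_fraction."""
    counts = Counter(values)
    total = len(values)
    return sorted(
        label for label, count in counts.items()
        if count / total < min_fraction
    )


# Hypothetical categorical variable with two frequent and two rare labels.
cities = ["London"] * 50 + ["Paris"] * 45 + ["Oslo"] * 3 + ["Riga"] * 2

rare = find_rare_labels(cities)
print(rare)
```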

## Feature Engineering Series Tutorial 2: Cardinality in Machine Learning

Cardinality refers to the number of possible values that a feature can assume. For example, the variable “US State” has 50 possible values. Binary features, of course, can assume only one of two values (0 or 1). The values of a categorical variable are selected from Read more…
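In practice, the cardinality observed in a dataset is the number of distinct values a column actually takes. A minimal sketch, assuming a small hypothetical table stored as a list of dictionaries (the column names `state` and `is_member` are illustrative only):

```python
def cardinality(rows, column):
    """Number of distinct values observed in one column."""
    return len({row[column] for row in rows})


# Hypothetical records: one categorical and one binary feature.
rows = [
    {"state": "CA", "is_member": 1},
    {"state": "NY", "is_member": 0},
    {"state": "CA", "is_member": 0},
    {"state": "TX", "is_member": 1},
]

state_cardinality = cardinality(rows, "state")
member_cardinality = cardinality(rows, "is_member")
print(state_cardinality, member_cardinality)
```

The observed cardinality can be lower than the number of possible values. Here only three of the 50 states appear in the sample.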

## Feature Engineering Series Tutorial 1: Missing Values and its Mechanisms

Missing data, or missing values, occur when no value is stored for certain observations within a variable. Incomplete data is an unavoidable problem in most data sources, and may have a significant impact on the conclusions that can be derived from the data. Why is data missing? The source of missing Read more…

## Types of Data types every Data Scientist should know

One of the central concepts of data science is gaining insights from data. Statistics is an excellent tool for unlocking such insights. In this post, we’ll see some basic types of data (variables) which can be present in your dataset. What is a Variable? A variable is any characteristic, Read more…

## Word Embedding and NLP with TF2.0 and Keras on Twitter Sentiment Data

Word Embedding and Sentiment Analysis: What is Word Embedding? Natural Language Processing (NLP) refers to computer systems designed to understand human language. Human language, like English or Hindi, consists of words and sentences, and NLP attempts to extract information from these sentences. Machine learning and deep learning algorithms only take numeric Read more…

## Multi-Label Image Classification on Movies Poster using CNN

Multi-Label Image Classification in Python: In this project, we are going to train our model on a set of labeled movie posters. The model will predict the genres of the movie based on the movie poster. We will consider a set of 25 genres. Each poster can have more than Read more…

## Breast Cancer Detection Using CNN

Breast Cancer Detection Using CNN in Python: Breast cancer is the most commonly occurring cancer in women and the second most common cancer overall. There were over 2 million new cases in 2018, making it a significant health problem today. The key challenge in breast cancer detection is Read more…

## Classify Dog or Cat by the help of Convolutional Neural Network(CNN)

Use of Dropout and Batch Normalization in 2D CNN on Dog Cat Image Classification in TensorFlow 2.0: We are going to predict whether an image shows a cat or a dog with the help of a convolutional neural network. I have taken the dataset from Kaggle: https://www.kaggle.com/tongpython/cat-and-dog. In this dataset there are two classes, cats and dogs Read more…

## Credit Card Fraud Detection using CNN

Classification using CNN: It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase. In this project we are going to build a model using CNN which predicts if the transaction is genuine Read more…

## NLP: End to End Text Processing for Beginners

Complete Text Processing for Beginners: Everything we express (either verbally or in writing) carries huge amounts of information. The topic we choose, our tone, our selection of words: everything adds some type of information that can be interpreted and from which value can be extracted. In theory, we can understand Read more…

## Text Generation using Tensorflow, Keras and LSTM

Automatic Text Generation: Automatic text generation is the generation of natural language texts by computer. It has applications in automatic documentation systems, automatic letter writing, automatic report generation, etc. In this project, we are going to generate words given a set of input words. We are going to train the Read more…

## Bank Customer Satisfaction Prediction Using CNN and Feature Selection

Feature Selection and CNN: In this project we are going to build a neural network to predict whether a particular bank customer is satisfied or not. To do this we are going to use Convolutional Neural Networks. The dataset which we are going to use contains 370 features. We are going Read more…

## Airline Passenger Prediction using RNN – LSTM

Prediction of the number of passengers for an airline using LSTM: In this project we are going to build a model to predict the number of passengers of an airline. To do so we are going to use Recurrent Neural Networks, more precisely Long Short-Term Memory. Recurrent Neural Network: Neural Networks are Read more…

## Words Embedding using GloVe Vectors

NLP Tutorial – GloVe Vectors Embedding with TF2.0 and Keras: GloVe stands for global vectors for word representation. It is an unsupervised learning algorithm developed by Stanford for generating word embeddings by aggregating a global word-word co-occurrence matrix from a corpus. The resulting embeddings show interesting linear substructures of the word in Read more…

## Star Rating Prediction

Star Rating Prediction of Amazon Products Reviews. Objective: In this notebook, we are going to predict the ratings of Amazon product reviews from the given reviewText column. Natural Language Processing (NLP) is a sub-field of artificial intelligence that deals with understanding and processing human language. In light of new Read more…

## Human Activity Recognition Using Accelerometer Data

Prediction of Human Activity: In this project we are going to use accelerometer data to train a model so that it can predict human activity. We are going to use 2D Convolutional Neural Networks to build the model. source = “Deep Neural Network Example” by Nils Ackermann is licensed under Creative Commons CC Read more…

## Multi-step-Time-series-predicting using RNN LSTM

Household Power Consumption Prediction using RNN-LSTM: Power outages can cause huge economic losses to society. Therefore, it is very important to predict power consumption. Given the rise of smart electricity meters and the wide adoption of electricity generation technology like solar panels, there is a wealth of Read more…