DistilBERT – Smaller, faster, cheaper, lighter and of course Distilled!

Sentiment Classification Using DistilBERT. Problem Statement: We will use the IMDB Movie Reviews Dataset and classify the sentiment of each review as positive or negative. The motivational BERT: BERT became an essential ingredient of many NLP deep learning pipelines. It is considered Read more…

Sentiment Analysis Using Scikit-learn

Sentiment Analysis. Objective: In this notebook we will perform binary classification, i.e. we will classify the sentiment as positive or negative according to the 'Reviews' column of the IMDB dataset. We will use TF-IDF for text vectorization and a linear Support Vector Machine for classification. Natural Read more…

Multi-Label Text Classification on Stack Overflow Tag Prediction

Multi-Label Text Classification. In this notebook, we will use the dataset “StackSample: 10% of Stack Overflow Q&A”, specifically its questions and tags data. We will develop a text classification model that analyzes a textual comment and predicts the multiple labels associated with each question. We will implement a Read more…

IMDB Review Sentiment Classification using RNN LSTM

Sentiment Classification in Python. In this notebook we are going to implement an LSTM model to classify reviews. We will perform binary classification, i.e. we will classify the reviews as positive or negative according to their sentiment. Recurrent Neural Network: Neural networks are a set of algorithms Read more…

Feature Selection Based on Univariate ROC_AUC for Classification and MSE for Regression | Machine Learning | KGP Talkie

Feature Selection Based on Univariate ROC_AUC for Classification and MSE for Regression. Watch Full Playlist: https://www.youtube.com/playlist?list=PLc2rvfiptPSQYzmDIFuq2PqN2n28ZjxDH What is ROC_AUC? The Receiver Operating Characteristic (ROC) curve is widely used for evaluating classification performance. Owing to its superiority in dealing with imbalanced and cost-sensitive data, the ROC curve has been exploited as a Read more…

Feature Selection using Fisher Score and Chi2 (χ2) Test | Titanic Dataset | Machine Learning | KGP Talkie

Feature Selection using Fisher Score and Chi2 (χ2) Test. Watch Full Playlist: https://www.youtube.com/playlist?list=PLc2rvfiptPSQYzmDIFuq2PqN2n28ZjxDH What is Fisher Score and Chi2 (χ2) Test? Fisher score is one of the most widely used supervised feature selection methods. However, it selects each feature independently according to its score under the Fisher criterion, which Read more…

Feature Selection Based on Univariate (ANOVA) Test for Classification | Machine Learning | KGP Talkie

Feature Selection Based on Univariate (ANOVA) Test for Classification. Watch Full Playlist: https://www.youtube.com/playlist?list=PLc2rvfiptPSQYzmDIFuq2PqN2n28ZjxDH What is Univariate (ANOVA) Test? The elimination process aims to reduce the size of the input feature set and at the same time to retain the class discriminatory information for classification problems. An F-test is any statistical Read more…

Feature Selection Based on Mutual Information (Entropy) Gain for Classification and Regression | Machine Learning | KGP Talkie

Feature Selection Based on Mutual Information (Entropy) Gain. Watch Full Playlist: https://www.youtube.com/playlist?list=PLc2rvfiptPSQYzmDIFuq2PqN2n28ZjxDH What is Mutual Information? The elimination process aims to reduce the size of the input feature set and at the same time to retain the class discriminatory information for classification problems. Mutual information (MI) is a measure of Read more…

Feature Selection with Filtering Method | Constant, Quasi Constant and Duplicate Feature Removal

Filtering method. Watch Full Playlist: https://www.youtube.com/playlist?list=PLc2rvfiptPSQYzmDIFuq2PqN2n28ZjxDH Unnecessary and redundant features not only slow down the training of an algorithm, but they also affect its performance. There are several advantages of performing feature selection before training machine learning models: Models with fewer features have higher Read more…
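The three filters named in this post's title (constant, quasi-constant and duplicate feature removal) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the post's own code: the function name `filter_features`, the `quasi_threshold` parameter and the toy data are all assumptions made for this example.

```python
from collections import Counter


def filter_features(columns, quasi_threshold=0.99):
    """Return the names of the features that survive the three filters.

    `columns` maps a feature name to its list of values (assumed non-empty).
    `quasi_threshold` is a hypothetical cut-off: a feature is dropped when its
    most frequent value accounts for at least this share of the rows.
    """
    kept = []
    seen = set()
    for name, values in columns.items():
        # Share of rows taken by the most frequent value.
        top_share = Counter(values).most_common(1)[0][1] / len(values)
        if top_share >= quasi_threshold:
            continue  # constant (share == 1.0) or quasi-constant feature
        key = tuple(values)
        if key in seen:
            continue  # exact duplicate of a feature that was already kept
        seen.add(key)
        kept.append(name)
    return kept


# Toy data, invented for illustration only.
example = {
    "a": [1, 2, 3, 4],
    "b": [5, 5, 5, 5],  # constant
    "c": [1, 2, 3, 4],  # duplicate of "a"
    "d": [0, 0, 0, 1],  # quasi-constant at a 0.75 threshold
}
print(filter_features(example, quasi_threshold=0.75))
```

With the default threshold of 0.99, feature "d" would be kept, because its most frequent value covers only 75% of the rows.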

Feature Dimension Reduction Using LDA and PCA with Python | Principal Component Analysis in Feature Selection | KGP Talkie

Feature Dimension Reduction. Watch Full Playlist: https://www.youtube.com/playlist?list=PLc2rvfiptPSQYzmDIFuq2PqN2n28ZjxDH What is LDA (Linear Discriminant Analysis)? The idea behind LDA is simple. Mathematically speaking, we need to find a new feature space onto which to project the data in order to maximize class separability. Linear Discriminant Analysis is a supervised algorithm as it takes the Read more…

Use of Linear and Logistic Regression Coefficients with Lasso (L1) and Ridge (L2) Regularization for Feature Selection in Machine Learning

Watch Full Playlist: https://www.youtube.com/playlist?list=PLc2rvfiptPSQYzmDIFuq2PqN2n28ZjxDH Linear Regression: Let’s first understand what exactly linear regression is. It is a straightforward approach to predicting the response y on the basis of predictor variables x, with an error term ε. There is a linear relation between x and y. 𝑦𝑖 = 𝛽0 + Read more…

Recursive Feature Elimination (RFE) by Using Tree Based and Gradient Based Estimators | Machine Learning | KGP Talkie

Recursive Feature Elimination (RFE). Playlist: https://www.youtube.com/playlist?list=PLc2rvfiptPSQYzmDIFuq2PqN2n28ZjxDH As its name suggests, it eliminates features recursively, builds a model using the remaining attributes, and then recalculates the model accuracy. It works by training the model on the full dataset and then trying to remove the least performing Read more…

Step Forward, Step Backward and Exhaustive Feature Selection | Wrapper Method | KGP Talkie

Wrapping method. Uses of the wrapping method: Uses combinations of variables to determine predictive power. Finds the best combination of variables. More computationally expensive than the filter method. Aims to perform better than the filter method. Not recommended for a large number of features. Forward Step Selection: In this wrapping method, it selects one best Read more…

Lasso and Ridge Regularisation for Feature Selection in Classification | Embedded Method | KGP Talkie

What is Regularisation? Regularisation adds a penalty on the different parameters of the model to reduce the freedom of the model. Hence, the model will be less likely to fit the noise of the training data, which improves its ability to generalize. There are basically three types of Read more…

Logistic Regression with Python in Machine Learning | KGP Talkie

What is Logistic Regression? Logistic Regression is a machine learning algorithm used for classification problems; it is a predictive analysis algorithm based on the concept of probability. Logistic regression is basically a supervised classification algorithm. In a classification problem, the target variable (or output), y, can take only Read more…

PCA with Python | Principal Component Analysis Machine Learning | KGP Talkie

Principal Component Analysis (PCA). According to Wikipedia, PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components. Principal components: These Read more…