Feature Engineering Tutorial Series 4: Linear Model Assumptions

Linear models make the following assumptions about the independent variables X used to predict Y: there is a linear relationship between X and the outcome Y; the independent variables X are normally distributed; there is little or no collinearity among the independent variables; and the variance is homoscedastic (homogeneity of variance). Examples of linear Read more…
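
As a rough illustration of the collinearity assumption listed above, the sketch below computes the Pearson correlation between pairs of independent variables in plain Python. The toy data, the variable names, and the 0.8 cut-off are assumptions made for this example; they do not come from the tutorial itself.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy features: x2 is roughly 2 * x1, x3 is not related to x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8]
x3 = [5.0, 1.0, 4.0, 2.0, 3.0]

r_12 = pearson(x1, x2)
r_13 = pearson(x1, x3)

# Assumed rule of thumb: flag pairs whose absolute correlation exceeds 0.8.
THRESHOLD = 0.8
for name, r in (("x1 vs x2", r_12), ("x1 vs x3", r_13)):
    status = "possibly collinear" if abs(r) > THRESHOLD else "ok"
    print(f"{name}: r = {r:.3f} ({status})")
```

A pairwise check like this only catches collinearity between two variables at a time; it is a first look, not a full diagnostic.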

Matplotlib Crash Course

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It is a cross-platform library for making 2D plots from data in arrays. It can be used in Python and IPython shells, Jupyter notebooks, and web application servers. Matplotlib is written in Python and makes Read more…

Pandas Crash Course

What is Pandas? pandas is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool for the Python programming language. It is a high-level data manipulation tool developed by Wes McKinney. It is built on the NumPy package and its key data structure is called the DataFrame. Read more…

LinkedIn Profile Scraper in Python

LinkedIn Profile Scraping using Selenium and Beautiful Soup. Scraping LinkedIn profiles is a very useful activity, especially for public relations and marketing tasks. In this project, we are going to scrape important data from a LinkedIn profile. The first part of this project is to automatically log in to our Read more…

SpaCy – Introduction for NLP | Combining NLP Models and Custom rules

Combining NLP Models and Creation of Custom Rules using SpaCy. Objective: In this article, we are going to create some custom rules for our requirements and add them to our pipeline, such as expanding named entities and identifying a person’s organization name from a given text. For example, the Read more…

Multi-step-Time-series-predicting using RNN LSTM

Household Power Consumption Prediction using RNN-LSTM. Power outages can cause huge economic losses. Therefore, it is very important to predict power consumption. Given the rise of smart electricity meters and the wide adoption of electricity generation technology like solar panels, there is a wealth of Read more…

Complete Seaborn Python Tutorial for Data Visualization in Python

Visualizing statistical relationships Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Statistical analysis is a process of understanding how variables in a dataset relate to each other and how those relationships depend on other variables. Visualization Read more…

DistilBERT – Smaller, faster, cheaper, lighter and of course Distilled!

Sentiment Classification Using DistilBERT. Problem Statement: We will use the IMDB Movie Reviews Dataset and classify the sentiment of each review as positive or negative. The motivational BERT: BERT became an essential ingredient of many NLP deep learning pipelines. It is considered Read more…

Sentiment Analysis Using Scikit-learn

Sentiment Analysis Objective: In this notebook we are going to perform binary classification, i.e. we will classify the sentiment as positive or negative according to the `Reviews` column of the IMDB dataset. We will use TF-IDF for text vectorization and a Linear Support Vector Machine for classification. Natural Read more…
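
To give a feel for what TF-IDF vectorization does before any classifier is trained, here is a minimal pure-Python sketch. It uses the textbook weighting (term frequency times log(N / document frequency)), which is an assumption made for illustration and is not the exact smoothed formula scikit-learn applies. The function name `tfidf` and the three sample reviews are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical mini-corpus standing in for the review text.
reviews = ["a great movie", "a terrible movie", "great acting"]
weights = tfidf(reviews)

for review, vector in zip(reviews, weights):
    print(review, {term: round(w, 3) for term, w in vector.items()})
```

Terms that occur in fewer reviews (such as "terrible" here) receive a higher weight than terms shared across reviews, which is what makes the resulting vectors useful as input to a linear classifier.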