georgiannacambel, Author at KGP Talkie

Feature Engineering Tutorial Series 6: Variable magnitude

Does the magnitude of the variable matter? In Linear Regression models, the scale of variables used to estimate the output matters. Linear models are of the type y = w x + b, where the regression coefficient w represents the expected change in y for a one unit change in x Read more

By georgiannacambel, 6 years ago (4 October 2020)
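As a rough illustration of the teaser above, the sketch below fits y = w x + b by ordinary least squares on the same feature expressed in two units. The data, the `fit_line` helper and the kilometre/metre choice are assumptions made up for this example; they are not taken from the tutorial itself.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = w*x + b with a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = y_bar - w * x_bar
    return w, b


# Hypothetical data: the outcome is exactly 2 * km + 1.
km = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

# Same feature, different magnitude: kilometres converted to metres.
metres = [k * 1000 for k in km]

w_km, b_km = fit_line(km, y)
w_m, b_m = fit_line(metres, y)

print(f"per km:    w = {w_km:.4f}, b = {b_km:.4f}")
print(f"per metre: w = {w_m:.4f}, b = {b_m:.4f}")
```

The fitted line is the same in both cases, but w shrinks by a factor of 1000 when x is measured in metres, because w is the change in y per one unit of x.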

Feature Selection Machine Learning Matplotlib Numpy Pandas Python

Feature Engineering Tutorial Series 5: Outliers

An outlier is a data point that is significantly different from the remaining data. “An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism.” [D. Hawkins, Identification of Outliers, Chapman and Hall, 1980.] Should Read more

By georgiannacambel, 6 years ago (3 October 2020)

Feature Selection Machine Learning Matplotlib Numpy Pandas Python

Feature Engineering Tutorial Series 4: Linear Model Assumptions

Linear models make the following assumptions about the independent variables X used to predict Y: there is a linear relationship between X and the outcome Y; the independent variables X are normally distributed; there is little or no collinearity among the independent variables; homoscedasticity (homogeneity of variance). Examples of linear Read more

By georgiannacambel, 6 years ago (2 October 2020)

Feature Selection Machine Learning Matplotlib Numpy Pandas Python

Feature Engineering Series Tutorial 3: Rare Labels

Labels that occur rarely. Categorical variables are those whose values are selected from a group of categories, also called labels. Different labels appear in the dataset with different frequencies. Some categories appear frequently, whereas others appear in only a few observations. For Read more

By georgiannacambel, 6 years ago (1 October 2020)
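The idea of labels that appear in only a few observations can be sketched with a frequency count. The `rare_labels` helper, the sample data and the 5% cut-off below are assumptions chosen for illustration; the teaser does not say which threshold the tutorial uses.

```python
from collections import Counter


def rare_labels(values, threshold=0.05):
    """Return labels whose relative frequency is below `threshold`.

    The 5% default is an assumed cut-off, not one stated in the tutorial.
    """
    counts = Counter(values)
    total = len(values)
    return sorted(label for label, n in counts.items() if n / total < threshold)


# Hypothetical categorical variable with 100 observations.
labels = ["A"] * 60 + ["B"] * 37 + ["C"] * 2 + ["D"] * 1

rare = rare_labels(labels)
print(rare)
```

Here "C" and "D" cover 2% and 1% of the observations, so both fall under the assumed threshold.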

Feature Selection Machine Learning Numpy Pandas Python

Feature Engineering Series Tutorial 2: Cardinality in Machine Learning

Cardinality refers to the number of possible values that a feature can assume. For example, the variable “US State” is one that has 50 possible values. The binary features, of course, could only assume one of two values (0 or 1). The values of a categorical variable are selected from Read more

By georgiannacambel, 6 years ago (29 September 2020)
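A minimal sketch of the cardinality idea from the teaser above: count the distinct values a feature takes in the data at hand. The `cardinality` helper and the sample columns are hypothetical, and the count reflects only the values observed, not every value the feature could possibly assume.

```python
def cardinality(values):
    """Number of distinct values observed for a feature."""
    return len(set(values))


# Hypothetical columns from a small dataset.
us_state = ["CA", "NY", "CA", "TX", "NY", "WA"]
is_member = [0, 1, 1, 0, 1, 0]

state_cardinality = cardinality(us_state)
member_cardinality = cardinality(is_member)

print(state_cardinality, member_cardinality)
```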

Machine Learning Numpy Pandas Python

Feature Engineering Series Tutorial 1: Missing Values and its Mechanisms

Missing data, or missing values, occur when no data / no value is stored for certain observations within a variable. Incomplete data is an unavoidable problem in most data sources, and may have a significant impact on the conclusions that can be derived from the data. Why is data missing? The source of missing Read more

By georgiannacambel, 6 years ago (28 September 2020)
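To make the missing-values teaser above concrete, the sketch below measures how much of a variable is missing. The row format (a list of dictionaries), the `missing_fraction` helper and the sample records are assumptions for this example; a value counts as missing when the key is absent or set to None.

```python
def missing_fraction(rows, column):
    """Share of rows in which `column` is absent or None."""
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


# Hypothetical observations; some values were never stored.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 48000},
    {"age": 29},
    {"age": 41, "income": None},
]

age_missing = missing_fraction(rows, "age")
income_missing = missing_fraction(rows, "income")

print(f"age: {age_missing:.0%} missing, income: {income_missing:.0%} missing")
```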

Feature Selection Machine Learning Matplotlib Numpy Pandas Python

Types of Data types every Data Scientist should know

One of the central concepts of data science is gaining insights from data. Statistics is an excellent tool for unlocking such insights in data. In this post, we’ll see some basic types of data (variables) which can be present in your dataset. What is a Variable? A variable is any characteristic, Read more

By georgiannacambel, 6 years ago (26 September 2020)

Matplotlib Python

Matplotlib Crash Course

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It is a cross-platform library for making 2D plots from data in arrays. It can be used in Python and IPython shells, Jupyter notebook and web application servers also. Matplotlib is written in Python and makes Read more

By georgiannacambel, 6 years ago (19 September 2020)

Machine Learning Pandas Python

Data Visualization with Pandas

Data visualization is the discipline of trying to understand data by placing it in a visual context so that patterns, trends and correlations that might not otherwise be detected can be exposed. pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, in Python programming Read more

By georgiannacambel, 6 years ago (18 September 2020)

Pandas Python

Pandas Crash Course

What is Pandas? pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool for the Python programming language. It is a high-level data manipulation tool developed by Wes McKinney. It is built on the NumPy package and its key data structure is called the DataFrame. Read more

By georgiannacambel, 6 years ago (17 September 2020)

Machine Learning Natural Language Processing (NLP) Python Spacy Text Processing

Resume and CV Summarization

Resume NER Training In this blog, we are going to create a model using SpaCy which will extract the main points from a resume. We are going to train the model on almost 200 resumes. After the model is ready, we will extract the text from a new resume and Read more

By georgiannacambel, 6 years ago (14 September 2020)

Deep Learning Keras Machine Learning Natural Language Processing (NLP) Numpy Pandas Python Tensorflow 2 Text Processing

Word Embedding and NLP with TF2.0 and Keras on Twitter Sentiment Data

Word Embedding and Sentiment Analysis What is Word Embedding? Natural Language Processing (NLP) refers to computer systems designed to understand human language. Human language, such as English or Hindi, consists of words and sentences, and NLP attempts to extract information from these sentences. Machine learning and deep learning algorithms only take numeric Read more

By georgiannacambel, 6 years ago (13 September 2020)

Machine Learning Natural Language Processing (NLP) Pandas Python Spacy Text Processing

Amazon and IMDB Review Sentiment Classification using SpaCy

Sentiment Classification using SpaCy What is NLP? Natural Language Processing (NLP) is the field of Artificial Intelligence concerned with the processing and understanding of human language. Since its inception during the 1950s, machine understanding of language has played a pivotal role in translation, topic modeling, document indexing, information retrieval, and Read more

By georgiannacambel, 6 years ago (12 September 2020)

Natural Language Processing (NLP) Python Spacy Text Processing

Processing Pipeline in SpaCy

What is SpaCy? spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. If you’re working with a lot of text, you’ll eventually want to know more about it. For example, what’s it about? What do the words mean in context? Who is doing what to whom? Read more

By georgiannacambel, 6 years ago (11 September 2020)

Natural Language Processing (NLP) Python Spacy Text Processing

Phone Number, Email, Emoji Extraction in SpaCy for NLP

Text Extraction in SpaCy spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. If you’re working with a lot of text, you’ll eventually want to know more about it. For example, what’s it about? What do the words mean in context? Who is doing what to Read more

By georgiannacambel, 6 years ago (10 September 2020)

Natural Language Processing (NLP) Spacy Text Processing

Rule-Based Phrase Text Extraction and Matching Using SpaCy

Text Extraction and Matching spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. If you’re working with a lot of text, you’ll eventually want to know more about it. For example, what’s it about? What do the words mean in context? Who is doing what to Read more

By georgiannacambel, 6 years ago (9 September 2020)

Natural Language Processing (NLP) Python

Working with Text Files in Python for NLP

Working with text files: working with f-strings for formatted print; working with .CSV and .TSV files to read and write; working with %%writefile to create simple .txt files (works in Jupyter Notebook only); working with Python’s built-in file read and write. String Formatter String formatting enables Read more

By georgiannacambel, 6 years ago (8 September 2020)
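The f-string formatting mentioned in the teaser above can be sketched in a few lines. The variable names and values are made up for this example and are not taken from the post.

```python
# Hypothetical values to format.
name = "Ada"
score = 0.91234

# An f-string evaluates the expressions in braces; a format spec follows the colon.
as_percent = f"{name} scored {score:.2%}"
as_decimal = f"{name} scored {score:.3f}"

print(as_percent)
print(as_decimal)
```

The `.2%` spec multiplies by 100 and keeps two decimals, while `.3f` prints a fixed three decimals.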

Deep Learning Keras Machine Learning Matplotlib Numpy Pandas Python Tensorflow 2

Multi-Label Image Classification on Movies Poster using CNN

Multi-Label Image Classification in Python In this project, we are going to train our model on a set of labeled movie posters. The model will predict the genres of the movie based on the movie poster. We will consider a set of 25 genres. Each poster can have more than Read more

By georgiannacambel, 6 years ago (7 September 2020)

Python

LinkedIn Profile Scrapper in Python

LinkedIn Profile Scraping using Selenium and Beautiful Soup Scraping LinkedIn profiles is a very useful activity, especially for public relations and marketing tasks. In this project, we are going to scrape important data from a LinkedIn profile. The first part of this project is to automatically log in to our Read more

By georgiannacambel, 6 years ago (6 September 2020)

Python

LinkedIn Auto Connect Bot with Personalized Messaging

Auto Connect Bot for LinkedIn In this project, we are going to create a bot that finds the people in your LinkedIn suggestions and sends a connection request to each one of them with a message. It also finds the suggestions of your suggestions and sends them a connection request. Read more

By georgiannacambel, 6 years ago (5 September 2020)