Feature Engineering: Variable Magnitude
Understand the impact of feature magnitude on ML algorithms, and learn scaling techniques in Python including Standard, MinMax, and Robust scaling.
Explore data analysis, exploratory data analysis (EDA), feature engineering, and the implementation of classic machine learning models using Scikit-Learn, Pandas, and NumPy.
Detect and handle outliers in Python using IQR and Z-score methods, with boxplot and Q-Q plot visualization and practical boundary calculations.
Detect and fix violations of linear model assumptions — linearity, normality, homoscedasticity, and multicollinearity — with Q-Q plots and log transforms.
Learn what rare labels are in categorical variables, why they cause overfitting and train/test mismatches, and how to group them safely in Python.
Understand cardinality in categorical variables and its effect on model performance. Learn to handle high-cardinality features using Python techniques.
Understand MCAR, MAR, and MNAR missing data mechanisms and their impact on machine learning. Covers detection, analysis, and treatment strategies using Python.
A practical guide to the four variable types in any dataset — numeric, categorical, date-time, and mixed — with examples from a real loan dataset.
A hands-on crash course covering matplotlib's pyplot API and object-oriented interface: line plots, scatter, bar, histograms, box plots, subplots, and axis controls.
A hands-on guide to building line, bar, histogram, box, scatter, KDE, Andrews curve, and subplot visualizations directly from a pandas DataFrame or Series.
Learn the fundamentals of pandas DataFrames, loading CSVs, column operations, handling missing values, mean imputation, and correlation analysis.
A hands-on guide to seaborn covering relational, categorical, distribution, and regression plots with the tips, fmri, iris, and Titanic datasets.
Build a resume parser using spaCy NER trained on 200 resumes. Extract names, skills, and experience fields automatically from new CV documents in Python.
Build a Python pipeline that transcribes live microphone audio and classifies sentiment polarity in real time using NLTK and TextBlob.
Predict Amazon product star ratings from review text using TF-IDF vectorization and a Support Vector Machine classifier in Python with scikit-learn.
Build a binary sentiment classifier for IMDB movie reviews using TF-IDF text vectorization and a Linear Support Vector Machine in Python with scikit-learn.
Predict Stack Overflow tags with multi-label classification — TF-IDF vectorization, OneVsRest strategy, and Hamming loss and Jaccard score evaluation.
Scrape public LinkedIn profile data using Selenium and BeautifulSoup in Python. Covers automated login, profile extraction, and exporting structured results.
Build a LinkedIn automation bot in Python using Selenium and BeautifulSoup that sends personalized connection requests to suggested profiles automatically.
Automate HD wallpaper downloads from Unsplash using Python and the Unsplash API. Covers API authentication, search parameters, and automatic image saving.
Select features with ROC-AUC for classification and MSE for regression — score every feature individually, rank them, and keep the most predictive.
Apply Fisher Score and Chi-squared tests for feature selection on the Titanic dataset in Python. Covers categorical feature scoring with scikit-learn chi2.
Use univariate ANOVA F-tests to rank and select the most informative classification features with f_classif and SelectKBest in scikit-learn.
Learn how to use mutual information (information gain) to select the most predictive features for classification and regression in Python with scikit-learn.
Remove constant, quasi-constant, and duplicate features from ML datasets using Python. Covers VarianceThreshold and correlation-based duplicate feature removal.
Learn how to use linear and logistic regression coefficients with Lasso (L1) and Ridge (L2) regularization to select the most informative features in Python.
Apply Recursive Feature Elimination (RFE) with Random Forest and Gradient Boosting to select the most predictive breast cancer dataset features.
Learn how to use wrapper-based feature selection — Sequential Forward, Backward, and Exhaustive Search — with mlxtend and scikit-learn on the Wine dataset.
Learn how Lasso (L1) and Ridge (L2) regularization act as embedded feature selectors. Apply SelectFromModel and RidgeClassifierCV on the Titanic dataset in Python.
Reduce high-dimensional feature spaces with LDA and PCA in scikit-learn — applied to the Santander dataset with accuracy and speed comparisons.
Learn how PCA works, then reduce 30 breast-cancer features to 2 components with scikit-learn while retaining maximum variance.
From sigmoid to cost function — build a Titanic survival classifier with scikit-learn, recursive feature elimination, and ROC-AUC evaluation.
Implement a tuned K-Nearest Neighbors classifier with scikit-learn, including feature standardization and cross-validation to find the optimal K.
Learn how K-Means clustering works and implement it with scikit-learn — centroid initialization, the elbow method, inertia, and cluster visualization.
Learn linear regression with scikit-learn on the Boston housing dataset — simple and multiple regression, feature selection, and R², MAE, MSE evaluation.
Learn how Random Forest combines decision trees through bagging — train a regressor and a classifier with scikit-learn and extract feature importances.
Cut training time by splitting data across parallel estimators — implement a BaggingClassifier with SVM on Iris and benchmark against a single model.
Learn how bagging, boosting, and voting combine models to boost accuracy — train Random Forest, AdaBoost, Gradient Boosting, and XGBoost with scikit-learn.
Train decision tree classifiers and regressors in Python with scikit-learn. Covers splitting criteria, key hyperparameters, pruning, and model evaluation.
Learn how SVMs work — hyperplanes, margin maximization, and kernel tricks — and train classifiers on the breast cancer dataset with scikit-learn.