Human Activity Recognition Using Accelerometer Data

Published by Aarya on

Prediction of Human Activity

In this project we are going to use accelerometer data to train a model that can predict human activity. We are going to use 2D Convolutional Neural Networks to build the model.

source = “Deep Neural Network Example” by Nils Ackermann is licensed under Creative Commons CC BY-ND 4.0


Dataset Link: or

This WISDM dataset contains data collected under controlled laboratory conditions. The total number of examples is 1,098,207. The dataset contains six different labels (Downstairs, Jogging, Sitting, Standing, Upstairs, Walking).

Here we are importing the necessary libraries. We will be using tensorflow-keras to build the CNN. We are also importing the necessary layers required to build the CNN.

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.layers import Conv2D, MaxPool2D
from tensorflow.keras.optimizers import Adam
  • pandas is used to read the dataset.
  • numpy is used to perform basic array operations.
  • pyplot from matplotlib is used to visualize the results.
  • train_test_split from sklearn is used to split the data into training and testing datasets.
  • LabelEncoder from sklearn is used to encode target labels with value between 0 and number of classes-1.
  • StandardScaler from sklearn is used to bring all the data in the same scale.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder

Load and process the Dataset

If we try to read this data directly using pd.read_csv() we will get an error because the file is not cleanly formatted. So we will read the file with plain Python and pre-process it ourselves.

Using open() we will first open the file. Then we will read all the lines of the file into the read variable. Now we will consider all the lines one by one using a for loop. For each line the following operations will be performed-

  • line = line.split(',') splits the line wherever there is a comma and returns an array of separated elements.
  • last = line[5].split(';')[0] removes the semicolon after the last element in the array.
  • last = last.strip() removes any extra space.
  • Then if last is not empty we copy all the elements into temp.
  • Now that the line is ready we append it to processedList

try and except is used for error handling. In this process if we get an error, the number of the line which is throwing that error is displayed.

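file = open('WISDM_ar_v1.1/WISDM_ar_v1.1_raw.txt')
lines = file.readlines()
file.close()

processedList = []

for i, line in enumerate(lines):
    try:
        line = line.split(',')
        last = line[5].split(';')[0]
        last = last.strip()
        if last == '':
            # Assumed behaviour: stop reading at the first record whose z value
            # is empty (only 343416 of the 1,098,207 examples are loaded below)
            break
        temp = [line[0], line[1], line[2], line[3], line[4], last]
        processedList.append(temp)
    except IndexError:
        # The line has fewer than six comma-separated fields
        print('Error at line number: ', i)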
Error at line number:  281873
Error at line number:  281874
Error at line number:  281875
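
To make the per-line steps concrete, here is a minimal sketch that applies them to a single record. The helper name parse_line is hypothetical, and the raw string is reconstructed from the first processed row, assuming each raw record ends with a semicolon.

```python
def parse_line(raw_line):
    """Split one raw WISDM record into its six fields, or return None."""
    fields = raw_line.split(',')
    if len(fields) < 6:
        return None  # malformed record: too few comma-separated fields
    # Drop the trailing semicolon and any surrounding whitespace from z
    last = fields[5].split(';')[0].strip()
    if last == '':
        return None  # the z value is missing
    return fields[:5] + [last]

# Assumed raw format: user,activity,timestamp,x,y,z;
raw = '33,Jogging,49105962326000,-0.6946377,12.680544,0.50395286;\n'
row = parse_line(raw)
print(row)
```

Returning None for unusable lines is just one possible design. The loop above instead relies on try and except to report them.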

Now we have the processedList. It is a list of lists. Each inner list has the user ID, activity, timestamp and then the x, y, z data.

[['33', 'Jogging', '49105962326000', '-0.6946377', '12.680544', '0.50395286'],
 ['33', 'Jogging', '49106062271000', '5.012288', '11.264028', '0.95342433'],
 ['33', 'Jogging', '49106112167000', '4.903325', '10.882658', '-0.08172209'],
 ['33', 'Jogging', '49106222305000', '-0.61291564', '18.496431', '3.0237172'],
 ['33', 'Jogging', '49106332290000', '-1.1849703', '12.108489', '7.205164'],
 ['33', 'Jogging', '49106442306000', '1.3756552', '-2.4925237', '-6.510526'],
 ['33', 'Jogging', '49106542312000', '-0.61291564', '10.56939', '5.706926'],
 ['33', 'Jogging', '49106652389000', '-0.50395286', '13.947236', '7.0553403'],
 ['33', 'Jogging', '49106762313000', '-8.430995', '11.413852', '5.134871'],
 ['33', 'Jogging', '49106872299000', '0.95342433', '1.3756552', '1.6480621']]

Now we will create a DataFrame with the processed data and proper column names. data.head() will display the first 5 rows of data.

columns = ['user', 'activity', 'time', 'x', 'y', 'z']
data = pd.DataFrame(data = processedList, columns = columns)

data has 343416 rows and 6 columns.

(343416, 6)

data.info() gives more information about data. It shows that all the values are stored as strings (dtype object).
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 343416 entries, 0 to 343415
Data columns (total 6 columns):
user        343416 non-null object
activity    343416 non-null object
time        343416 non-null object
x           343416 non-null object
y           343416 non-null object
z           343416 non-null object
dtypes: object(6)
memory usage: 15.7+ MB

Now we will see if any null values are present in the dataset using isnull().

user        0
activity    0
time        0
x           0
y           0
z           0
dtype: int64

To see the distribution of data we will see the count of each unique activity using value_counts().

Walking       137375
Jogging       129392
Upstairs       35137
Downstairs     33358
Sitting         4599
Standing        3555
Name: activity, dtype: int64

Balance this data

From the data distribution shown above we can observe that the data is unbalanced. Standing has far fewer examples than Walking and Jogging. If we use this data directly, the model will be biased towards Walking and Jogging.

As we saw earlier the data is in string data type. Here we have converted the xyz values into floating values using astype('float').

data['x'] = data['x'].astype('float')
data['y'] = data['y'].astype('float')
data['z'] = data['z'].astype('float')

We can see that the data type of xyz has changed.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 343416 entries, 0 to 343415
Data columns (total 6 columns):
user        343416 non-null object
activity    343416 non-null object
time        343416 non-null object
x           343416 non-null float64
y           343416 non-null float64
z           343416 non-null float64
dtypes: float64(3), object(3)
memory usage: 15.7+ MB

Now we will plot x, y and z for a few seconds. The sampling rate of this data is 20 Hz, so we set a variable Fs = 20. activities is a list of all the unique activities.

Fs = 20
activities = data['activity'].value_counts().index
Index(['Walking', 'Jogging', 'Upstairs', 'Downstairs', 'Sitting', 'Standing'], dtype='object')

Now we will plot xyz for each activity for 10 seconds.

def plot_activity(activity, data):
    fig, (ax0, ax1, ax2) = plt.subplots(nrows=3, figsize=(15, 7), sharex=True)
    plot_axis(ax0, data['time'], data['x'], 'X-Axis')
    plot_axis(ax1, data['time'], data['y'], 'Y-Axis')
    plot_axis(ax2, data['time'], data['z'], 'Z-Axis')
    fig.suptitle(activity)  # label the figure with the activity name
    plt.show()

def plot_axis(ax, x, y, title):
    ax.plot(x, y, 'g')
    ax.set_title(title)  # label each subplot with its axis name
    ax.set_ylim([min(y) - np.std(y), max(y) + np.std(y)])
    ax.set_xlim([min(x), max(x)])

for activity in activities:
    data_for_plot = data[(data['activity'] == activity)][:Fs*10]
    plot_activity(activity, data_for_plot)

Here we will remove the columns user and time from the dataset by using drop().

df = data.drop(['user', 'time'], axis = 1).copy()
Walking       137375
Jogging       129392
Upstairs       35137
Downstairs     33358
Sitting         4599
Standing        3555
Name: activity, dtype: int64

As this data is highly imbalanced, we will take only the first 3555 rows for each activity into separate dataframes. Then we will combine them into a single dataframe, balanced_data. The final shape of balanced_data is 21330 rows and 4 columns.

Walking = df[df['activity']=='Walking'].head(3555).copy()
Jogging = df[df['activity']=='Jogging'].head(3555).copy()
Upstairs = df[df['activity']=='Upstairs'].head(3555).copy()
Downstairs = df[df['activity']=='Downstairs'].head(3555).copy()
Sitting = df[df['activity']=='Sitting'].head(3555).copy()
Standing = df[df['activity']=='Standing'].copy()

# DataFrame.append() was removed in pandas 2.0; pd.concat() does the same job
balanced_data = pd.concat([Walking, Jogging, Upstairs, Downstairs, Sitting, Standing])
(21330, 4)

Now the data is balanced. We can see this by calling value_counts() on the activity column of balanced_data.

Upstairs      3555
Walking       3555
Jogging       3555
Standing      3555
Sitting       3555
Downstairs    3555
Name: activity, dtype: int64
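
The same balancing can be written without hard-coding 3555 or naming each activity. The sketch below is one possible alternative, shown on a made-up toy dataframe (toy, n_min and balanced_toy are illustrative names, not part of the project code): it finds the size of the smallest class and keeps that many leading rows of every class.

```python
import pandas as pd

# Toy stand-in for df: 'activity' plus one sensor column (values are made up)
toy = pd.DataFrame({
    'activity': ['Walking'] * 5 + ['Jogging'] * 4 + ['Standing'] * 2,
    'x': range(11),
})

# Size of the smallest class; every class is cut down to this many rows
n_min = toy['activity'].value_counts().min()

# head(n) on a groupby keeps the first n rows of every group
balanced_toy = toy.groupby('activity').head(n_min)
print(balanced_toy['activity'].value_counts())
```

Like head(3555) above, this keeps the earliest rows of each class rather than a random sample.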

As we can see above, the values in activity are of data type string. We will convert them into numeric values using LabelEncoder from sklearn which we have already imported. fit_transform() fits the label encoder and returns the encoded labels. We will add a new column in the dataset with the name label which will have the encoded values.

label = LabelEncoder()
balanced_data['label'] = label.fit_transform(balanced_data['activity'])

We can use .classes_ attribute to recover the mapping of classes.

array(['Downstairs', 'Jogging', 'Sitting', 'Standing', 'Upstairs',
       'Walking'], dtype=object)
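
As a small illustration of how the encoding works, the sketch below fits a LabelEncoder on the six activity names alone and builds a name-to-integer dictionary. The variable names encoder and mapping are only for this example.

```python
from sklearn.preprocessing import LabelEncoder

activities = ['Walking', 'Jogging', 'Upstairs', 'Downstairs', 'Sitting', 'Standing']

encoder = LabelEncoder()
encoder.fit(activities)

# classes_ is sorted, so the integer label is the position in that array
mapping = {name: int(code) for code, name in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform turns integer predictions back into activity names
print(encoder.inverse_transform([1, 5]))
```

The integers follow alphabetical order of the activity names, not the order in which they appear in the data.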

Standardization of data

Here we are reading the feature space into X and the label into y.

X = balanced_data[['x', 'y', 'z']]
y = balanced_data['label']

Now we will bring all the values in X in the same range using StandardScaler() from sklearn which we have already imported. scaled_X contains the scaled values of x, y, z and the labels.

scaler = StandardScaler()
X = scaler.fit_transform(X)

scaled_X = pd.DataFrame(data = X, columns = ['x', 'y', 'z'])
scaled_X['label'] = y.values
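
To see what StandardScaler actually computes, here is a minimal sketch on three made-up readings. It compares the scaler's output with the formula (value - column mean) / column standard deviation, computed by hand with NumPy.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Three made-up accelerometer readings (columns: x, y, z)
raw = np.array([[ 0.5, 12.0, 1.0],
                [-1.5,  9.0, 3.0],
                [ 4.0, 10.5, 5.0]])

scaled = StandardScaler().fit_transform(raw)

# The same result by hand: subtract the column mean, divide by the column std
manual = (raw - raw.mean(axis=0)) / raw.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```

After scaling, every column has mean 0 and standard deviation 1, so no single axis dominates because of its range.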


Frame Preparation

We are going to divide the data into frames of 4 seconds. To do this we will import scipy.stats.

import scipy.stats as stats

We will multiply the frequency by 4 seconds. Hence we will consider 80 observations at a time. Hop size will be 40 which means there will be some overlapping.

Fs = 20
frame_size = Fs*4 # 80
hop_size = Fs*2 # 40

get_frames() creates frames of 4 seconds, i.e. 80 observations, advancing by 40 observations each time. The label for a 4-second frame is the mode of the labels of the 80 observations that make up the frame. get_frames() returns two NumPy arrays: frames, containing all the 4-second frames, and labels, containing the corresponding labels. These are stored in X and y respectively. X contains 532 frames, each having 80 values of x, y and z. y contains 532 labels for the frames in X.

def get_frames(df, frame_size, hop_size):

    N_FEATURES = 3

    frames = []
    labels = []
    for i in range(0, len(df) - frame_size, hop_size):
        x = df['x'].values[i: i + frame_size]
        y = df['y'].values[i: i + frame_size]
        z = df['z'].values[i: i + frame_size]
        # Retrieve the most often used label in this segment
        # (np.atleast_1d handles both old and new SciPy return shapes)
        mode = stats.mode(df['label'].values[i: i + frame_size])[0]
        label = np.atleast_1d(mode)[0]
        # One row per time step, one column per axis: shape (frame_size, 3)
        frames.append(np.column_stack([x, y, z]))
        labels.append(label)

    # Bring the segments into a better shape
    frames = np.asarray(frames).reshape(-1, frame_size, N_FEATURES)
    labels = np.asarray(labels)

    return frames, labels

X, y = get_frames(scaled_X, frame_size, hop_size)

X.shape, y.shape
((532, 80, 3), (532,))

We have 3555 observations for each of the 6 activities, so 3555*6 = 21330 observations in total. A new frame starts every hop_size = 40 observations, and the last start index must be below 21330 - 80 = 21250, which gives 532 frames.
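
The frame count can be checked with a few lines of plain Python that reproduce the loop bounds used in get_frames(). The variable names n_rows and starts are only for this check.

```python
n_rows = 3555 * 6      # rows in balanced_data
frame_size = 80
hop_size = 40

# Same start indices as the for loop inside get_frames()
starts = range(0, n_rows - frame_size, hop_size)
print(len(starts))            # number of frames
print(starts[0], starts[-1])  # first and last start index
```

Because of the loop bound, the last few observations of the data do not start a frame of their own.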


Here we are dividing the data into training data and test data using train_test_split() from sklearn which we have already imported. We are going to use 80% of the data for training the model and 20% of the data for testing. random_state controls the shuffling applied to the data before applying the split. stratify = y splits the data in a stratified fashion, using y as the class labels.

We can see that we have 425 samples in the training dataset and 107 samples in the test dataset.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0, stratify = y)
X_train.shape, X_test.shape
((425, 80, 3), (107, 80, 3))
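
To illustrate what stratify does, the sketch below splits made-up labels (six classes of equal size, with a dummy feature column) and counts how many samples of each class end up on each side. The names X_demo, y_demo, y_tr and y_te are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 6 classes with 90 samples each, one dummy feature
y_demo = np.repeat(np.arange(6), 90)
X_demo = np.zeros((len(y_demo), 1))

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo)

# Count the samples of each class in the two splits
train_counts = np.bincount(y_tr)
test_counts = np.bincount(y_te)
print(train_counts, test_counts)
```

With stratification, every class keeps roughly the same share in the training and test sets, which matters when there are only a few hundred frames.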

The entire dataset is 3-dimensional but each sample in the data is 2-dimensional.

X_train[0].shape, X_test[0].shape
((80, 3), (80, 3))

Conv2D expects each sample to be 3-dimensional (height, width, channels), so we are going to reshape() our data to add a channel dimension of size 1.

X_train = X_train.reshape(425, 80, 3, 1)
X_test = X_test.reshape(107, 80, 3, 1)

Now we can see that each sample in the dataset is 3-dimensional.

X_train[0].shape, X_test[0].shape
((80, 3, 1), (80, 3, 1))
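
The sample counts 425 and 107 do not have to be typed by hand. As a sketch, here are three equivalent ways to add the channel axis, shown on an array of zeros with the same shape as the training data.

```python
import numpy as np

frames = np.zeros((425, 80, 3))   # same shape as X_train before reshaping

# Three equivalent ways to append a channel axis of length 1
a = frames.reshape(frames.shape[0], 80, 3, 1)
b = frames[..., np.newaxis]
c = np.expand_dims(frames, axis=-1)
print(a.shape, b.shape, c.shape)
```

The last two forms keep working if the number of frames changes, for example after tuning frame_size or hop_size.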

2D CNN Model

Sequential() model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.

Conv2D() is a 2D convolution layer. It creates a convolution kernel that is convolved with the layer's input to produce a tensor of outputs. In image processing, a kernel is a convolution matrix or mask that can be used for blurring, sharpening, embossing, edge detection, and more by convolving it with an image. In the first Conv2D() layer we learn a total of 16 filters, each of size (2,2). We use the ReLU activation function. The rectified linear activation function, or ReLU for short, is a piecewise linear function that outputs its input directly if it is positive and outputs zero otherwise.


The Dropout layer randomly sets the outgoing edges of hidden units to 0 at each update of the training phase. The value passed to Dropout specifies the probability with which the outputs of the layer are dropped.

Flatten() is used to convert the data into a 1-dimensional array for inputting it to the next layer.

Dense layer is the regular deeply connected neural network layer with 64 neurons. The output layer is also a dense layer with 6 neurons for the 6 classes. The activation function used is softmax. Softmax converts a real vector to a vector of categorical probabilities. The elements of the output vector are in range (0, 1) and sum to 1. Softmax is often used as the activation for the last layer of a classification network because the result could be interpreted as a probability distribution.
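
The two activation functions described above are easy to write out by hand. The sketch below is a plain NumPy illustration, not the Keras implementation. It applies ReLU and softmax to a made-up vector of scores.

```python
import numpy as np

def relu(v):
    # Keep positive values, replace negative values with zero
    return np.maximum(0, v)

def softmax(v):
    # Subtracting the maximum avoids overflow and does not change the result
    shifted = v - np.max(v)
    exp = np.exp(shifted)
    return exp / exp.sum()

scores = np.array([-2.0, 0.0, 1.0, 3.0])
print(relu(scores))
probs = softmax(scores)
print(probs, probs.sum())
```

The largest score gets the largest probability, and the probabilities sum to 1, which is why the output layer can be read as a distribution over the 6 activities.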

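model = Sequential()
model.add(Conv2D(16, (2, 2), activation = 'relu', input_shape = X_train[0].shape))
model.add(Dropout(0.1))  # dropout rate is an assumed value

model.add(Conv2D(32, (2, 2), activation='relu'))
model.add(Dropout(0.2))  # dropout rate is an assumed value

model.add(Flatten())  # 3D feature maps -> 1D vector for the dense layers

model.add(Dense(64, activation = 'relu'))
model.add(Dropout(0.5))  # dropout rate is an assumed value

model.add(Dense(6, activation='softmax'))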
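
It can help to trace how the input shape (80, 3, 1) changes through the two convolution layers. The sketch below does the arithmetic in plain Python, assuming the Keras defaults of stride 1 and 'valid' padding, which the model code does not override. conv_output is a hypothetical helper.

```python
def conv_output(size, kernel, stride=1):
    """Output length of a convolution with 'valid' padding along one axis."""
    return (size - kernel) // stride + 1

height, width = 80, 3
for filters in (16, 32):
    # Each (2, 2) kernel shortens both spatial axes by one
    height, width = conv_output(height, 2), conv_output(width, 2)
    print(f'Conv2D({filters}, (2, 2)) -> ({height}, {width}, {filters})')

# Number of values passed to the first Dense layer after flattening
flattened = height * width * 32
print(flattened)
```

After the second convolution the width has shrunk to 1, so a third (2, 2) convolution would not fit without padding.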

Here we are compiling the model and fitting it to the training data. We will use 10 epochs to train the model. An epoch is an iteration over the entire data provided. validation_data is the data on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data. As metrics = ['accuracy'] the model will be evaluated based on the accuracy.

model.compile(optimizer=Adam(learning_rate = 0.001), loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])
history = model.fit(X_train, y_train, epochs = 10, validation_data= (X_test, y_test), verbose=1)
Train on 425 samples, validate on 107 samples
Epoch 1/10
425/425 [==============================] - 1s 2ms/sample - loss: 1.6548 - accuracy: 0.2400 - val_loss: 1.3757 - val_accuracy: 0.4206
Epoch 2/10
425/425 [==============================] - 0s 292us/sample - loss: 1.3048 - accuracy: 0.4871 - val_loss: 1.0143 - val_accuracy: 0.7103
Epoch 3/10
425/425 [==============================] - 0s 294us/sample - loss: 0.9848 - accuracy: 0.6659 - val_loss: 0.7149 - val_accuracy: 0.8598
Epoch 4/10
425/425 [==============================] - 0s 273us/sample - loss: 0.7407 - accuracy: 0.7459 - val_loss: 0.4961 - val_accuracy: 0.8411
Epoch 5/10
425/425 [==============================] - 0s 299us/sample - loss: 0.5676 - accuracy: 0.8188 - val_loss: 0.3573 - val_accuracy: 0.9065
Epoch 6/10
425/425 [==============================] - 0s 296us/sample - loss: 0.4372 - accuracy: 0.8494 - val_loss: 0.2836 - val_accuracy: 0.9159
Epoch 7/10
425/425 [==============================] - 0s 301us/sample - loss: 0.3648 - accuracy: 0.8871 - val_loss: 0.2614 - val_accuracy: 0.9065
Epoch 8/10
425/425 [==============================] - 0s 315us/sample - loss: 0.3070 - accuracy: 0.9035 - val_loss: 0.3019 - val_accuracy: 0.8598
Epoch 9/10
425/425 [==============================] - 0s 287us/sample - loss: 0.3254 - accuracy: 0.8918 - val_loss: 0.2392 - val_accuracy: 0.9065
Epoch 10/10
425/425 [==============================] - 0s 303us/sample - loss: 0.2385 - accuracy: 0.9388 - val_loss: 0.2269 - val_accuracy: 0.8972
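
The log is easier to read once we pick out the best epochs. The sketch below copies the validation numbers from the log above into plain lists (in a real run they would come from history.history) and finds the epoch with the highest validation accuracy and the one with the lowest validation loss.

```python
# Validation metrics copied from the training log above, one value per epoch
val_accuracy = [0.4206, 0.7103, 0.8598, 0.8411, 0.9065,
                0.9159, 0.9065, 0.8598, 0.9065, 0.8972]
val_loss = [1.3757, 1.0143, 0.7149, 0.4961, 0.3573,
            0.2836, 0.2614, 0.3019, 0.2392, 0.2269]

# Epochs are numbered from 1, list positions from 0
best_acc_epoch = val_accuracy.index(max(val_accuracy)) + 1
best_loss_epoch = val_loss.index(min(val_loss)) + 1
print(best_acc_epoch, max(val_accuracy))
print(best_loss_epoch, min(val_loss))
```

Validation accuracy moves up and down from epoch to epoch here, which is expected with only 107 validation samples.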

We will now plot the model accuracy and model loss. In model accuracy we will plot the training accuracy and validation accuracy and in model loss we will plot the training loss and validation loss.

def plot_learningCurve(history, epochs):
  # Plot training & validation accuracy values
  epoch_range = range(1, epochs+1)
  plt.plot(epoch_range, history.history['accuracy'])
  plt.plot(epoch_range, history.history['val_accuracy'])
  plt.title('Model accuracy')
  plt.ylabel('Accuracy')
  plt.xlabel('Epoch')
  plt.legend(['Train', 'Val'], loc='upper left')
  plt.show()  # finish the accuracy figure before drawing the loss

  # Plot training & validation loss values
  plt.plot(epoch_range, history.history['loss'])
  plt.plot(epoch_range, history.history['val_loss'])
  plt.title('Model loss')
  plt.ylabel('Loss')
  plt.xlabel('Epoch')
  plt.legend(['Train', 'Val'], loc='upper left')
  plt.show()

plot_learningCurve(history, 10)

Confusion Matrix

  • A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known.
  • Each row of the matrix represents the instances in a predicted class while each column represents the instances in an actual class (or vice versa)
  • The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another).
  • All correct predictions are located in the diagonal of the table, so it is easy to visually inspect the table for prediction errors, as they will be represented by values outside the diagonal. For two classes the confusion matrix looks like this-

where: TP = True Positive; FP = False Positive; TN = True Negative; FN = False Negative.
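
As a small worked example of the two-class case, the sketch below builds a confusion matrix with sklearn from made-up labels. Note that sklearn puts the true classes in the rows and the predicted classes in the columns, so for labels 0 and 1 the layout is [[TN, FP], [FN, TP]].

```python
from sklearn.metrics import confusion_matrix

# Made-up ground truth and predictions for a two-class problem
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_hat  = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm.ravel()
print(cm)
print(tn, fp, fn, tp)
```

The six correct predictions sit on the diagonal, and the two mistakes (one false positive, one false negative) sit off it.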

Detailed video is available here:

To calculate the confusion matrix we will use confusion_matrix from sklearn. We will be using mlxtend to plot the confusion matrix. You can install it using the command or from the link mentioned.

pip install mlxtend

from mlxtend.plotting import plot_confusion_matrix
from sklearn.metrics import confusion_matrix

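model.predict() returns the predicted class probabilities for the input samples, and np.argmax() picks the most likely class for each sample. Older versions of Keras offered predict_classes() for this, but it was removed in TensorFlow 2.6.

y_pred = np.argmax(model.predict(X_test), axis=1)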
mat = confusion_matrix(y_test, y_pred)
plot_confusion_matrix(conf_mat=mat, class_names=label.classes_, show_normed=True, figsize=(7,7))

As you can see we are getting 100% accuracy for Sitting and Standing. The confusion matrix also tells us that our model is getting confused between Upstairs and Downstairs.
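
The per-class percentages shown in the plot can also be computed directly from the matrix. The sketch below uses a hypothetical 3-class matrix (not the result of this model) with true classes in the rows, and divides the diagonal by the row sums.

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted)
mat_demo = np.array([[18,  0,  0],
                     [ 0, 15,  3],
                     [ 0,  4, 14]])

# Share of each true class that was predicted correctly
per_class_recall = mat_demo.diagonal() / mat_demo.sum(axis=1)

# Share of all samples that were predicted correctly
overall_accuracy = mat_demo.diagonal().sum() / mat_demo.sum()
print(per_class_recall.round(3), round(overall_accuracy, 3))
```

In this made-up example the first class is always recognised, while the other two are sometimes mistaken for each other, the same kind of pattern as the Upstairs and Downstairs confusion described above.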

We have got a decent accuracy for this data. If you want to further increase the accuracy you can play around with many things. You can try training the model with more data or you can even try tuning frame_size and hop_size.

Lastly, you can save the model using save_weights().



Hi, I am Aarya Tadvalkar! Currently, I am pursuing Computer Engineering. I have a keen interest in Machine Learning and Data Science. I am always enthusiastic about learning new things and expanding my knowledge!
