Discovering 135 Nights of Sleep with Data, Anomaly Detection, and Time Series

Introduction

This experiment is all about sleep and data. Here, I want to investigate how I’ve been sleeping since I started my backpacking adventure almost six months ago. Broadly, I want to learn about my sleep times, their trend, and my nightly restless time by applying techniques such as descriptive statistics, time series analysis, and anomaly detection. To that end, I came up with the following key questions.

  • At what time do I go to bed? When do I wake up?
  • How much time do I spend sleeping?
  • Is there a correlation between time in bed and time sleeping?
  • On average, how many “restless” moments do I suffer per night?
  • How much time do I spend awake each night?
  • How has my sleep pattern evolved? What’s my weekly routine? What’s the overall trend?
  • What starting and ending times are outliers?

The Data

All of the data I’m using to answer these existential questions comes from my Fitbit watch. This fantastic device, which I wear almost 24/7, spends every single night restlessly tracking the information I’m about to dissect. In total, my dataset consists of 135 rows, or sleep sessions, recorded since May 28 (the day I started backpacking). The dataset’s features include the sleep start time, end time, and minutes after wakeup. That said, I want to clarify that many of the metrics Fitbit calculates aren’t, in my opinion, well documented, so I have no idea how the device derives them. Still, I won’t question the values and will assume that they are correct and accurate.

The Tools

The experiment employs both R and Python. With R, I performed the exploratory data analysis and drew most of the plots. Python, on the other hand, took care of the time series analysis with the Prophet package and of the anomaly detection with scikit-learn’s One-Class SVM.

Getting the Data

As with most data-related problems, this one also starts with gathering the data. To get it, I used Fitbit’s API through the Python package python-fitbit. To be more specific, I used the “Get Sleep Logs by Date” endpoint, a method that takes a date as input and returns that day’s sleeping sessions and all the information that comes with them. The following code presents how I did it.

import fitbit
import os
import argparse
import datetime
import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument('--base_date', '-bd', help="Starting date", type=str,
                    default='2019-05-28')
args = parser.parse_args()


def run():
    # use Germany locale so the units are in the metric system
    client = fitbit.Fitbit(os.environ['FITBIT_KEY'],
                           os.environ['FITBIT_SECRET'],
                           access_token=os.environ['ACCESS_TOKEN'],
                           refresh_token=os.environ['REFRESH_TOKEN'],
                           system='en_DE')
    base_date = args.base_date
    df = get_sleep_data(client, base_date)
    df.to_csv('data/df.csv', index=False, encoding='utf-8')


def get_sleep_data(client, base_date):
    """
    This function retrieves sleep data, from base_date until today
    """
    # build the list of days to query, one API call per day
    start = datetime.datetime.strptime(base_date, '%Y-%m-%d')
    delta = datetime.datetime.today() - start
    dates = [start + datetime.timedelta(days=i) for i in range(delta.days + 1)]
    sleep_data = []

    for date in dates:
        single_day_sleep = client.get_sleep(date.date())
        # fall back to an empty dict if the summary carries no stages
        stages = single_day_sleep.get('summary', {}).get('stages') or {}
        for sleep_activity in single_day_sleep.get('sleep'):
            # ignore naps
            if not sleep_activity.get('isMainSleep'):
                continue
            sleep_data.append((sleep_activity.get('dateOfSleep'),
                               sleep_activity.get('efficiency'),
                               sleep_activity.get('startTime'),
                               sleep_activity.get('endTime'),
                               sleep_activity.get('timeInBed'),
                               sleep_activity.get('minutesAsleep'),
                               sleep_activity.get('restlessCount'),
                               sleep_activity.get('minutesAfterWakeup'),
                               sleep_activity.get('minutesToFallAsleep'),
                               sleep_activity.get('minutesAwake'),
                               sleep_activity.get('restlessDuration'),
                               stages.get('deep'),
                               stages.get('light'),
                               stages.get('rem'),
                               stages.get('wake')))

    return pd.DataFrame(sleep_data, columns=['date', 'efficiency', 'startTime',
                                             'endTime', 'timeInBed',
                                             'minutesAsleep',
                                             'restlessCount',
                                             'minutesAfterWakeup',
                                             'minutesToFallAsleep',
                                             'minutesAwake',
                                             'restlessDuration', 'deep',
                                             'light', 'rem', 'wake'])


if __name__ == "__main__":
    print('Starting....')
    run()
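
With the CSV written, the question from the introduction about time in bed versus time asleep can be checked in a couple of lines. The sketch below is only an illustration: the numbers are made up, and it assumes pandas’ Pearson correlation on the `timeInBed` and `minutesAsleep` columns collected above, which is not necessarily what the R analysis used.

```python
import pandas as pd

# Made-up rows with the same column names as the collected dataset;
# in practice this frame would come from pd.read_csv('data/df.csv').
df = pd.DataFrame({
    "timeInBed": [480, 455, 510, 390, 540],
    "minutesAsleep": [430, 410, 450, 350, 470],
})

# Pearson correlation between minutes in bed and minutes asleep
r = df["timeInBed"].corr(df["minutesAsleep"], method="pearson")
print(f"correlation: {r:.3f}")
```

A value close to 1 would mean that longer nights in bed go hand in hand with more minutes actually asleep.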

Sleep Times

I’ll open the discussion with a look at my sleep start and end times. On average (using the median), I go to sleep at 1 a.m. and wake up at 9 a.m., giving me precisely the eight recommended hours. However, we’ll soon see that this isn’t entirely accurate. The next two histograms show the distribution of these hours (the black vertical line indicates the median).
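
A median of bedtimes needs a little care, because 11:30 p.m. and 12:30 a.m. are one hour apart on the clock but 23 hours apart as numbers. The following is a minimal sketch of one way to handle that, not necessarily how the figures above were computed: the helper `median_clock_time` and its `pivot_hour` parameter are hypothetical, and the sample timestamps are invented in the format I assume Fitbit’s `startTime` uses.

```python
from datetime import datetime
from statistics import median


def median_clock_time(timestamps, pivot_hour=12):
    """Median time of day for timestamps that may straddle midnight.

    Minutes are shifted so the day "starts" at pivot_hour, which keeps
    23:30 and 00:30 next to each other instead of 23 hours apart.
    """
    day = 24 * 60
    shifted = []
    for ts in timestamps:
        t = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f")
        minutes = t.hour * 60 + t.minute
        shifted.append((minutes - pivot_hour * 60) % day)
    # undo the shift to get back to ordinary clock minutes
    mid = (median(shifted) + pivot_hour * 60) % day
    return f"{int(mid // 60):02d}:{int(mid % 60):02d}"


# invented sample, not real data
sample = [
    "2019-06-01T23:30:00.000",
    "2019-06-02T01:00:00.000",
    "2019-06-03T02:15:00.000",
]
print(median_clock_time(sample))  # prints 01:00
```

A naive median over the raw clock minutes of this sample would land on 02:15, because the 23:30 bedtime gets sorted after the other two instead of before them.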

Restless Time

Admit it. You never fall asleep the minute your body touches the bed. In the period between lying down and falling asleep, we’re just there, in limbo, trying to cross the gates to Sleepytown (or Napcity, as a friend likes to say). Fitbit calculates this “wandering” time, and here I’ll present mine.

Start and End Times with Anomaly Detection

Back when I showed the “start sleep” and “end sleep” graphs, I didn’t point out many of their peculiarities except for that fateful day when I went to bed at 7 a.m. That point is an outlier, an observation that differs significantly from the rest. But was that the only outlier in these two graphs? I don’t know (OK, I do, but I’m not going to spoil it!), but hey, we can find out!

"""
This script fits a One Class SVM
Code for plotting the decision function was taken from:
https://scikit-learn.org/stable/auto_examples/svm/plot_oneclass.html#sphx-glr-auto-examples-svm-plot-oneclass-py
"""

from sklearn.svm import OneClassSVM
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# setting the Seaborn aesthetics.
sns.set(font_scale=1.5)

df = pd.read_csv('data/start_times.csv', encoding='utf-8')
X_train = df[['weekday', 'time']]

clf = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
clf.fit(X_train)

# plot of the decision frontier
xx, yy = np.meshgrid(np.linspace(0, 8, 500), np.linspace(-2, 25, 500))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.title("\"Sleep Times\" Decision Boundary")
# uncomment the next line to see the "ripples" of the boundary
# plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), 0, 7), cmap=plt.cm.PuBu)
a = plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='darkred')
plt.contourf(xx, yy, Z, levels=[0, Z.max()], colors='palevioletred')
b1 = plt.scatter(X_train.iloc[:, 0], X_train.iloc[:, 1])
plt.xlabel('Day of the week (as number)')
plt.ylabel('Time of the day')
plt.grid(True)
plt.show()
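
The plot shows the boundary, but the fitted model can also list the flagged nights directly. Below is a minimal sketch under stated assumptions: the data is synthetic (random weekday and hour pairs plus one invented 7 a.m. bedtime), not the contents of `data/start_times.csv`, and the hyperparameters are simply copied from the script above.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic stand-in for the (weekday, time) pairs; not the real data.
rng = np.random.default_rng(0)
weekday = rng.integers(1, 8, size=60)           # days 1..7
hour = rng.normal(loc=1.0, scale=0.5, size=60)  # bedtimes around 1 a.m.
X = np.column_stack([weekday, hour]).astype(float)
X = np.vstack([X, [[3.0, 7.0]]])  # one invented, very late night

clf = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1).fit(X)

labels = clf.predict(X)            # +1 = inlier, -1 = outlier
scores = clf.decision_function(X)  # negative = outside the boundary
outliers = X[labels == -1]
print(outliers)
```

Keep in mind that `nu` acts as an upper bound on the fraction of training points treated as outliers, so with `nu=0.1` roughly one night in ten can end up flagged regardless of how unusual it really is.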

Time Asleep Trend with Time Series

I had the impression that my time in bed has changed since I started backpacking. In the beginning, I spent around a month in the mountains of Austria. There, I didn’t do much; just hiking, playing Switch, and resting (a lot!). But then, I arrived in Asia, and here the story has been a different one. On those first days, I barely slept; the excitement and the desire to see and eat everything had me up until the wee hours. Then, a month later, the weariness finally reached me, and there, at the northern beaches of Malaysia and the southern ones of Thailand, I rested (a bit). But not for long.

"""
This script fits a time series model using my Fitbit time-in-bed data.
"""
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# newer releases of the package are imported as `from prophet import Prophet`
from fbprophet import Prophet

# setting the Seaborn aesthetics.
sns.set()

df = pd.read_csv('data/time_in_bed.csv')

m = Prophet(changepoint_prior_scale=0.5)
m.fit(df)
forecast = m.predict(df)
fig = m.plot_components(forecast)
# this plot shows the trend, weekly and daily seasonality
# but for this case, the daily doesn't make any sense
plt.show()
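
Prophet expects its input frame to have exactly two columns: `ds` for the datestamp and `y` for the value to model. The script above reads a file that is already in that shape; the sketch below shows one way such a frame could be derived from the columns collected earlier. The rows are invented, and using `timeInBed` as `y` is my assumption based on the file name.

```python
import pandas as pd

# Invented rows shaped like the columns collected from the API.
raw = pd.DataFrame({
    "date": ["2019-05-30", "2019-05-28", "2019-05-29"],
    "timeInBed": [510, 480, 455],
})

# Rename to the two column names Prophet requires and parse the dates.
prophet_df = (
    raw.rename(columns={"date": "ds", "timeInBed": "y"})
       .assign(ds=lambda d: pd.to_datetime(d["ds"]))
       .sort_values("ds")
       .reset_index(drop=True)
)
print(prophet_df)
```

Since the data has one row per night, the daily component carries no information here; Prophet’s constructor also accepts `daily_seasonality=False` for cases where that component should be switched off explicitly.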

Recap and Conclusion

Sleep. The fantastic state of mind that brings us closer to our dreams, re-energizes our bodies, and takes us to a new day. Yet, as natural and recurrent as it is, I barely know anything about how I perform this soothing activity. In this article, I showed how I turned to my trusty Fitbit’s sleep data to learn about my sleep patterns with anomaly detection and time series analysis.
