13  Analysing Chicago Bike Times - DRAFT 🛠

author: Daniel Slutsky

last update: 2024-10-23

This tutorial demonstrates a simple analysis of time patterns in transportation data.

13.1 Question

Can we distinguish weekends from weekdays in terms of the hours in which people tend to use their bikes?

13.2 Setup

(ns noj-book.chicago-bike-times
  (:require [tablecloth.api :as tc]
            [tech.v3.dataset :as ds]
            [tech.v3.dataset.modelling :as dsmod]
            [tech.v3.datatype.datetime :as datetime]
            [tech.v3.dataset.reductions :as reductions]
            [scicloj.metamorph.ml :as ml]
            [scicloj.kindly.v4.kind :as kind]
            [clojure.string :as str]
            [scicloj.metamorph.ml.regression]
            [scicloj.tableplot.v1.hanami :as hanami]
            [scicloj.tableplot.v1.plotly :as plotly]
            [fastmath.transform :as transform]
            [fastmath.core :as fastmath]))

13.3 Reading data

You may learn more about the Cyclistic Bike Share 2023 dataset in our Chicago bike trips tutorial.

(defonce raw-trips
  (-> "data/chicago-bikes/202304_divvy_tripdata.csv.gz"
      (tc/dataset {:key-fn keyword
                   :parser-fn {"started_at"
                               [:local-date-time
                                "yyyy-MM-dd HH:mm:ss"]
                               "ended_at"
                               [:local-date-time
                                "yyyy-MM-dd HH:mm:ss"]}})))

13.4 Processing data

(def processed-trips
  (-> raw-trips
      (tc/add-columns {:day (fn [ds]
                              (->> ds
                                   :started_at
                                   (datetime/long-temporal-field
                                    :days)))
                       :day-of-week (fn [ds]
                                      (->> ds
                                           :started_at
                                           (datetime/long-temporal-field
                                            :day-of-week)))
                       :hour (fn [ds]
                               (->> ds
                                    :started_at
                                    (datetime/long-temporal-field
                                     :hours)))})
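      ;; Rebuild an hour-resolution timestamp from :day and :hour.
      ;; The year and month are hard-coded because this file covers April 2023 only.
      (tc/map-columns :truncated-datetime
                      [:day :hour]
                      (fn [d h]
                        (format "2023-04-%02dT%02d:00:00" d h)))))

(-> processed-trips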
    (tc/select-columns [:started_at :truncated-datetime :day :day-of-week :hour]))

data/chicago-bikes/202304_divvy_tripdata.csv.gz [426590 5]:

:started_at :truncated-datetime :day :day-of-week :hour
2023-04-02T08:37:28 2023-04-02T08:00:00 2 7 8
2023-04-19T11:29:02 2023-04-19T11:00:00 19 3 11
2023-04-19T08:41:22 2023-04-19T08:00:00 19 3 8
2023-04-19T13:31:30 2023-04-19T13:00:00 19 3 13
2023-04-19T12:05:36 2023-04-19T12:00:00 19 3 12
2023-04-19T12:17:34 2023-04-19T12:00:00 19 3 12
2023-04-19T09:35:48 2023-04-19T09:00:00 19 3 9
2023-04-11T16:13:43 2023-04-11T16:00:00 11 2 16
2023-04-11T16:29:24 2023-04-11T16:00:00 11 2 16
2023-04-19T17:35:40 2023-04-19T17:00:00 19 3 17
2023-04-14T07:12:34 2023-04-14T07:00:00 14 5 7
2023-04-24T07:27:02 2023-04-24T07:00:00 24 1 7
2023-04-12T08:16:48 2023-04-12T08:00:00 12 3 8
2023-04-28T07:24:54 2023-04-28T07:00:00 28 5 7
2023-04-21T07:15:06 2023-04-21T07:00:00 21 5 7
2023-04-11T15:46:42 2023-04-11T15:00:00 11 2 15
2023-04-29T21:20:21 2023-04-29T21:00:00 29 6 21
2023-04-24T09:16:05 2023-04-24T09:00:00 24 1 9
2023-04-18T07:53:51 2023-04-18T07:00:00 18 2 7
2023-04-29T07:33:55 2023-04-29T07:00:00 29 6 7
2023-04-18T08:00:32 2023-04-18T08:00:00 18 2 8
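
The output above suggests that `:day-of-week` follows the ISO convention (1 = Monday, ..., 7 = Sunday), so Saturday and Sunday appear as 6 and 7. As a cross-check, here is a minimal sketch that derives the same fields with plain `java.time` interop rather than `tech.v3.datatype.datetime`. The names `time-fields`, `csv-format`, and `hour-format` are hypothetical helpers, not part of the tutorial's code. Truncating with `ChronoUnit/HOURS` is shown as one possible alternative to the hard-coded `format` string; it would also work for files that cover other months.

```clojure
(import '[java.time LocalDateTime]
        '[java.time.format DateTimeFormatter]
        '[java.time.temporal ChronoUnit])

;; Pattern of the started_at column in the CSV file.
(def csv-format (DateTimeFormatter/ofPattern "yyyy-MM-dd HH:mm:ss"))

;; Explicit output pattern, so that zero seconds are always printed.
(def hour-format (DateTimeFormatter/ofPattern "yyyy-MM-dd'T'HH:mm:ss"))

(defn time-fields
  "Hypothetical helper: parse a timestamp string and return the
  fields used in this tutorial."
  [s]
  (let [t (LocalDateTime/parse s csv-format)]
    {:day (.getDayOfMonth t)
     ;; ISO numbering: 1 = Monday ... 7 = Sunday
     :day-of-week (.getValue (.getDayOfWeek t))
     :hour (.getHour t)
     ;; Drop minutes and seconds, keeping the real year and month.
     :truncated-datetime (.format (.truncatedTo t ChronoUnit/HOURS)
                                  hour-format)}))

(time-fields "2023-04-02 08:37:28")
;; => {:day 2, :day-of-week 7, :hour 8,
;;     :truncated-datetime "2023-04-02T08:00:00"}
```

The result agrees with the first row of the table above: 2 April 2023 was a Sunday, encoded as 7.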

13.5 The time series of hourly counts

(def hourly-time-series
  (-> processed-trips
      (tc/group-by [:truncated-datetime :day-of-week :hour])
      (tc/aggregate {:n tc/row-count})
      (tc/order-by [:truncated-datetime])))

(-> hourly-time-series
    (plotly/layer-line {:=x :truncated-datetime
                        :=y :n}))

The plot shows a clear daily seasonal pattern, and possibly some weekly seasonality as well.
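
One way to put a number on such a visual impression is the sample autocorrelation at a given lag: for hourly data, a daily cycle would show up as a high value at lag 24, and a weekly cycle at lag 168. The following is a minimal sketch in plain Clojure, not part of the tutorial's code. The function `autocorrelation` and the toy series are hypothetical, and the sketch assumes a series with exactly one value per hour and no gaps (hours with zero trips would first need to be filled in with zeros).

```clojure
(defn autocorrelation
  "Hypothetical helper: sample autocorrelation of the numbers `xs`
  at the given `lag`, assuming equally spaced observations."
  [xs lag]
  (let [n        (count xs)
        mean     (/ (reduce + xs) (double n))
        centered (mapv #(- % mean) xs)
        ;; Sum of squared deviations from the mean.
        denom    (reduce + (map * centered centered))
        ;; Sum of products of each deviation with the one `lag` steps later.
        numer    (reduce + (map * centered (drop lag centered)))]
    (/ numer denom)))

;; Synthetic series with period 4, standing in for real hourly counts.
(def toy-series (take 96 (cycle [0 1 2 1])))

(autocorrelation toy-series 4) ; close to 1: the series repeats every 4 steps
(autocorrelation toy-series 2) ; negative: half a period apart
```

Applied to the `:n` column of a gap-free hourly series, comparing the values at lags 24 and 168 with those at neighbouring lags would indicate how strong the daily and weekly patterns are.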

13.6 Analysis

Counts by hour

(-> processed-trips
    (tc/group-by [:hour])
    (tc/aggregate {:n tc/row-count})
    (tc/order-by [:hour])
    (plotly/layer-bar {:=x :hour
                       :=y :n}))

Counts by day-of-week and hour

(-> processed-trips
    (tc/group-by [:day-of-week :hour])
    (tc/aggregate {:n tc/row-count})
    (tc/group-by :day-of-week)
    (tc/without-grouping->
        (tc/order-by [:name]))
    (tc/process-group-data #(plotly/layer-bar
                             %
                             {:=x :hour
                              :=y :n}))
    kind/table)
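name  group-id  data
1     4         [bar chart of :n by :hour]
2     2         [bar chart of :n by :hour]
3     1         [bar chart of :n by :hour]
4     3         [bar chart of :n by :hour]
5     6         [bar chart of :n by :hour]
6     5         [bar chart of :n by :hour]
7     0         [bar chart of :n by :hour]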

13.7 Intermediate conclusion

The plots show that weekends differ from weekdays in the hours at which people tend to use their bikes.
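
Because a month contains more weekdays than weekend days, raw counts are not directly comparable between the two groups. A minimal sketch of one way to compare them is to convert the counts into each group's share of trips per hour. The helpers `weekend?` and `hourly-shares` and the sample rows are hypothetical. The sketch assumes rows shaped like the aggregated data above (`:day-of-week`, `:hour`, `:n`) and treats days 6 and 7 as the weekend.

```clojure
(defn weekend?
  "True for Saturday (6) and Sunday (7), assuming ISO day numbering."
  [day-of-week]
  (>= day-of-week 6))

(defn hourly-shares
  "Hypothetical helper: for each day type, map every hour to its share
  of that day type's total trips."
  [rows]
  (into {}
        (for [[day-type rs] (group-by #(if (weekend? (:day-of-week %))
                                         :weekend
                                         :weekday)
                                      rows)]
          (let [total   (reduce + (map :n rs))
                ;; Sum the counts of all days of this type, per hour.
                by-hour (reduce (fn [m {:keys [hour n]}]
                                  (update m hour (fnil + 0) n))
                                (sorted-map)
                                rs)]
            [day-type
             (into (sorted-map)
                   (map (fn [[h n]] [h (/ n total)]))
                   by-hour)]))))

;; Made-up rows for illustration only, not taken from the dataset.
(def sample-rows
  [{:day-of-week 1 :hour 8  :n 30}
   {:day-of-week 1 :hour 17 :n 50}
   {:day-of-week 2 :hour 8  :n 20}
   {:day-of-week 6 :hour 8  :n 5}
   {:day-of-week 6 :hour 14 :n 15}])

(hourly-shares sample-rows)
;; => {:weekday {8 1/2, 17 1/2}, :weekend {8 1/4, 14 3/4}}
```

Plotting the two share curves side by side would show where the weekday and weekend profiles diverge, independently of how many days each group contains.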

13.8 Exploring further - DRAFT

How are they different?

(-> hourly-time-series
    (tc/add-column :predicted-n
                   (fn [ds]
                     (-> ds
                         (ds/categorical->one-hot [:day-of-week :hour])
                         (dsmod/set-inference-target :n)
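                         ;; Drop the timestamp, which is not a predictor, and one
                         ;; level of each one-hot factor to serve as the reference
                         ;; category.
                         (tc/drop-columns [:truncated-datetime
                                           :day-of-week-7
                                           :hour-23])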
                         (ml/train {:model-type :fastmath/ols})
                         :model-data
                         :fitted)))
    (plotly/base {:=x :truncated-datetime})
    (plotly/layer-line {:=y :n})
    (plotly/layer-line {:=y :predicted-n}))

(-> processed-trips
    (tc/group-by [:day-of-week :hour])
    (tc/aggregate {:n tc/row-count})
    (tc/order-by [:day-of-week :hour])
    (tc/add-column :predicted-n
                   (fn [ds]
                     (-> ds
                         (ds/categorical->one-hot [:day-of-week :hour])
                         (dsmod/set-inference-target :n)
                         (tc/drop-columns [:day-of-week-7
                                           :hour-23])
                         (ml/train {:model-type :fastmath/ols})
                         :model-data
                         :fitted)))
    (tc/group-by :day-of-week)
    (tc/without-grouping->
        (tc/order-by [:name]))
    (tc/process-group-data (fn [ds]
                             (-> ds
                                 (plotly/base {:=x :hour})
                                 (plotly/layer-bar {:=y :n})
                                 (plotly/layer-line {:=y :predicted-n}))))
    kind/table)
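name  group-id  data
1     0         [bar chart of :n with line of :predicted-n, by :hour]
2     1         [bar chart of :n with line of :predicted-n, by :hour]
3     2         [bar chart of :n with line of :predicted-n, by :hour]
4     3         [bar chart of :n with line of :predicted-n, by :hour]
5     4         [bar chart of :n with line of :predicted-n, by :hour]
6     5         [bar chart of :n with line of :predicted-n, by :hour]
7     6         [bar chart of :n with line of :predicted-n, by :hour]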
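
The models above one-hot encode `:day-of-week` and `:hour` and then drop `:day-of-week-7` and `:hour-23`. Dropping one level per factor is the usual way to avoid perfectly collinear indicator columns: the dropped level becomes the reference category, and each remaining coefficient measures a difference from it. The following is a minimal sketch of that encoding in plain Clojure. The function `one-hot` is a hypothetical illustration and not the implementation of `ds/categorical->one-hot`; only the column naming is copied from the code above.

```clojure
(defn one-hot
  "Hypothetical helper: encode `value` as 0/1 indicators, one for
  every level in `levels` except the `reference` level."
  [prefix levels reference value]
  (into {}
        (for [level levels
              :when (not= level reference)]
          [(keyword (str prefix "-" level))
           (if (= level value) 1 0)])))

;; Saturday (6) with Sunday (7) as the reference level:
(one-hot "day-of-week" (range 1 8) 7 6)
;; six indicators, :day-of-week-1 to :day-of-week-6, only the last set to 1

;; The reference level itself is encoded as all zeros:
(one-hot "day-of-week" (range 1 8) 7 7)
```

Note that a model with separate day-of-week and hour indicators is additive: it can shift the hourly curve up or down for each day, but it cannot change the shape of the curve between weekdays and weekends. Capturing a different shape would require interaction terms between the two factors.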
source: notebooks/noj_book/chicago_bike_times.clj