20 Analysing Chicago Bike Times - DRAFT 🛠
author: Daniel Slutsky
last update: 2024-10-23
This tutorial demonstrates a simple analysis of time patterns in transportation data.
20.1 Question
Can we distinguish weekends from weekdays in terms of the hours in which people tend to use their bikes?
20.2 Setup
(ns noj-book.chicago-bike-times
  (:require [tablecloth.api :as tc]
            [tech.v3.dataset :as ds]
            [tech.v3.dataset.modelling :as dsmod]
            [tech.v3.datatype.datetime :as datetime]
            [tech.v3.dataset.reductions :as reductions]
            [scicloj.metamorph.ml :as ml]
            [scicloj.kindly.v4.kind :as kind]
            [clojure.string :as str]
            [scicloj.metamorph.ml.regression]
            [scicloj.tableplot.v1.hanami :as hanami]
            [scicloj.tableplot.v1.plotly :as plotly]
            [fastmath.transform :as transform]
            [fastmath.core :as fastmath]
            [java-time.api :as java-time]))
20.3 Reading data
You may learn more about the Cyclistic Bike Share 2023 dataset in our Chicago bike trips tutorial.
(defonce raw-trips
  (-> "data/chicago-bikes/202304_divvy_tripdata.csv.gz"
      (tc/dataset {:key-fn keyword
                   :parser-fn {"started_at"
                               [:local-date-time
                                "yyyy-MM-dd HH:mm:ss"]
                               "ended_at"
                               [:local-date-time
                                "yyyy-MM-dd HH:mm:ss"]}})))
20.4 Processing data
(def processed-trips
  (-> raw-trips
      ;; derive the ISO day of week (1 = Monday) and the hour of day
      (tc/add-columns {:day-of-week (fn [ds]
                                      (->> ds
                                           :started_at
                                           (datetime/long-temporal-field
                                            :day-of-week)))
                       :hour (fn [ds]
                               (->> ds
                                    :started_at
                                    (datetime/long-temporal-field
                                     :hours)))})
      ;; round each start time down to the beginning of its hour
      (tc/map-columns :truncated-datetime
                      [:started_at]
                      #(java-time/truncate-to % :hours))))
(-> processed-trips
    (tc/select-columns [:started_at :truncated-datetime :day-of-week :hour]))
data/chicago-bikes/202304_divvy_tripdata.csv.gz [426590 4]:
:started_at | :truncated-datetime | :day-of-week | :hour |
---|---|---|---|
2023-04-02T08:37:28 | 2023-04-02T08:00 | 7 | 8 |
2023-04-19T11:29:02 | 2023-04-19T11:00 | 3 | 11 |
2023-04-19T08:41:22 | 2023-04-19T08:00 | 3 | 8 |
2023-04-19T13:31:30 | 2023-04-19T13:00 | 3 | 13 |
2023-04-19T12:05:36 | 2023-04-19T12:00 | 3 | 12 |
2023-04-19T12:17:34 | 2023-04-19T12:00 | 3 | 12 |
2023-04-19T09:35:48 | 2023-04-19T09:00 | 3 | 9 |
2023-04-11T16:13:43 | 2023-04-11T16:00 | 2 | 16 |
2023-04-11T16:29:24 | 2023-04-11T16:00 | 2 | 16 |
2023-04-19T17:35:40 | 2023-04-19T17:00 | 3 | 17 |
… | … | … | … |
2023-04-14T07:12:34 | 2023-04-14T07:00 | 5 | 7 |
2023-04-24T07:27:02 | 2023-04-24T07:00 | 1 | 7 |
2023-04-12T08:16:48 | 2023-04-12T08:00 | 3 | 8 |
2023-04-28T07:24:54 | 2023-04-28T07:00 | 5 | 7 |
2023-04-21T07:15:06 | 2023-04-21T07:00 | 5 | 7 |
2023-04-11T15:46:42 | 2023-04-11T15:00 | 2 | 15 |
2023-04-29T21:20:21 | 2023-04-29T21:00 | 6 | 21 |
2023-04-24T09:16:05 | 2023-04-24T09:00 | 1 | 9 |
2023-04-18T07:53:51 | 2023-04-18T07:00 | 2 | 7 |
2023-04-29T07:33:55 | 2023-04-29T07:00 | 6 | 7 |
2023-04-18T08:00:32 | 2023-04-18T08:00 | 2 | 8 |
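To make the three derived columns concrete, here is a minimal sketch that computes the same fields for a single timestamp using plain `java.time` interop rather than the dataset functions used above. The helper name `derive-time-fields` is hypothetical and only serves this illustration; the day-of-week numbering follows ISO-8601 (1 = Monday, 7 = Sunday), which matches the values shown in the table.

```clojure
(import '(java.time LocalDateTime)
        '(java.time.temporal ChronoUnit))

;; Hypothetical helper (not part of the tutorial's code):
;; derives the three fields for one LocalDateTime.
(defn derive-time-fields
  [^LocalDateTime t]
  {:day-of-week        (.getValue (.getDayOfWeek t)) ; ISO: 1 = Mon .. 7 = Sun
   :hour               (.getHour t)                  ; 0 .. 23
   :truncated-datetime (.truncatedTo t ChronoUnit/HOURS)})

;; The first row of the table above:
(derive-time-fields (LocalDateTime/parse "2023-04-02T08:37:28"))
```

For this input, the day of week is 7 (a Sunday), the hour is 8, and the truncated value is 2023-04-02T08:00, in agreement with the first row of the table.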
20.5 The time series of hourly counts
(def hourly-time-series
  (-> processed-trips
      (tc/group-by [:truncated-datetime])
      (tc/aggregate {:n tc/row-count})
      (tc/order-by [:truncated-datetime])))

(-> hourly-time-series
    (plotly/layer-line {:=x :truncated-datetime
                        :=y :n}))
The plot shows a clear daily cycle, and possibly a weaker weekly one as well.
20.6 Analysis
Counts by hour
(-> processed-trips
    (tc/group-by [:hour])
    (tc/aggregate {:n tc/row-count})
    (tc/order-by [:hour])
    (plotly/layer-bar {:=x :hour
                       :=y :n}))
Counts by day-of-week and hour
(def i->day (comp [:Mon :Tue :Wed :Thu :Fri :Sat :Sun] dec))

(-> processed-trips
    ;; count trips for every (day-of-week, hour) pair
    (tc/group-by [:day-of-week :hour])
    (tc/aggregate {:n tc/row-count})
    ;; draw one bar chart of hourly counts per day of week
    (tc/group-by [:day-of-week])
    (tc/aggregate {:plot (fn [ds]
                           [(plotly/layer-bar
                             ds
                             {:=x :hour
                              :=y :n})])})
    (tc/order-by [:day-of-week])
    (tc/map-columns :day-of-week
                    [:day-of-week]
                    i->day)
    kind/table)
day-of-week | plot-0 |
---|---|
Mon | |
Tue | |
Wed | |
Thu | |
Fri | |
Sat | |
Sun | |
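The `i->day` helper works because a Clojure vector can be called as a function of its zero-based index, so composing it with `dec` turns the 1-based day number into a label. The sketch below repeats that definition so that it runs on its own, and adds a `weekend?` predicate, which is a hypothetical helper introduced only for illustration.

```clojure
;; Same definition as in the tutorial: day number (1 = Mon .. 7 = Sun) -> label.
(def i->day (comp [:Mon :Tue :Wed :Thu :Fri :Sat :Sun] dec))

;; Hypothetical helper: true for Saturday and Sunday.
(defn weekend?
  [day-number]
  (contains? #{:Sat :Sun} (i->day day-number)))

(map i->day [1 6 7])   ; the labels for Monday, Saturday and Sunday
(filter weekend? (range 1 8))
```

Note that a day number outside 1..7 throws an `IndexOutOfBoundsException`, since the vector lookup has no default value.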
20.7 Intermediate conclusion
The plots show that weekends differ from weekdays in the hours at which people tend to use their bikes.
20.8 Exploring further - DRAFT
How are they different?
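One possible starting point, offered only as a sketch and not as the tutorial's method, is to compare the share of rides per hour within weekdays and within weekends, so that the two groups are comparable even though they cover a different number of days. The code below uses plain Clojure maps instead of the dataset; the function names `day-type` and `hourly-shares` are hypothetical, and the example rows are made-up toy numbers rather than values from the bike data. It assumes the same numbering as above, with 6 and 7 standing for Saturday and Sunday.

```clojure
;; Hypothetical helper: classify an ISO day number (1 = Mon .. 7 = Sun).
(defn day-type
  [day-of-week]
  (if (>= day-of-week 6) :weekend :weekday))

;; Hypothetical helper: for rows shaped like {:day-of-week .. :hour .. :n ..},
;; return, per day type, the fraction of that type's rides falling in each hour.
(defn hourly-shares
  [rows]
  (into {}
        (for [[dtype rs] (group-by (comp day-type :day-of-week) rows)]
          (let [total   (reduce + (map :n rs))
                by-hour (reduce (fn [acc {:keys [hour n]}]
                                  (update acc hour (fnil + 0) n))
                                (sorted-map)
                                rs)]
            [dtype (into (sorted-map)
                         (map (fn [[hour n]] [hour (/ n total)]))
                         by-hour)]))))

;; Toy rows, invented for illustration only.
(def toy-rows
  [{:day-of-week 1 :hour 8  :n 30}
   {:day-of-week 1 :hour 17 :n 10}
   {:day-of-week 6 :hour 8  :n 5}
   {:day-of-week 6 :hour 13 :n 15}])

(hourly-shares toy-rows)
```

Because the counts are integers, the shares come out as exact ratios; wrap the division in `double` if decimal fractions are preferred for plotting.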