This is part of the Scicloj Clojure Data Scrapbook.
Noj - getting started - from raw data to a blog post
This is a getting-started tutorial for Noj (scinojure) - a recommended way to use the emerging Clojure data stack.
It is considered part of the Clojure Data Scrapbook, but is maintained as a separate repo to serve as a self-contained small example of a data-analysis report.
Video tutorial:
Question
Can we distinguish weekends from weekdays in terms of the hours in which people tend to use their bikes?
Setup
(ns index
  (:require [tablecloth.api :as tc]
            [tech.v3.datatype.datetime :as datetime]
            [scicloj.noj.v1.vis.hanami :as hanami]
            [aerial.hanami.templates :as ht]
            [scicloj.kindly.v4.kind :as kind]))
Reading data
You may learn more about the Cyclistic Bike Share 2023 dataset in our Chicago bike trips tutorial.
;; Read the gzipped CSV once, parsing both timestamp columns as local date-times.
(defonce raw-trips
  (-> "data/202304_divvy_tripdata.csv.gz"
      (tc/dataset {:key-fn keyword
                   :parser-fn {"started_at"
                               [:local-date-time
                                "yyyy-MM-dd HH:mm:ss"]
                               "ended_at"
                               [:local-date-time
                                "yyyy-MM-dd HH:mm:ss"]}})))
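To see what the pattern string passed to :parser-fn matches, here is a minimal sketch that parses one timestamp with plain java.time interop. It does not go through the dataset reader, and the names trip-time-format and parse-trip-time, as well as the sample timestamp, are hypothetical and used only for illustration.

```clojure
(import '(java.time LocalDateTime)
        '(java.time.format DateTimeFormatter))

;; The same pattern string that appears in :parser-fn.
(def trip-time-format
  (DateTimeFormatter/ofPattern "yyyy-MM-dd HH:mm:ss"))

;; Hypothetical helper: parse one timestamp string into a LocalDateTime.
(defn parse-trip-time [^String s]
  (LocalDateTime/parse s trip-time-format))

(str (parse-trip-time "2023-04-01 08:15:30"))
;; => "2023-04-01T08:15:30"
```

A string that does not match the pattern (for example, one using a "T" separator) would make this parse call throw, which is why the pattern has to agree with the layout of the CSV columns.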
Processing data
;; Derive the hour of day and the day of week from each trip's start time.
(def processed-trips
  (-> raw-trips
      (tc/add-columns {:hour (fn [ds]
                               (->> ds
                                    :started_at
                                    (datetime/long-temporal-field :hours)))
                       :day-of-week (fn [ds]
                                      (->> ds
                                           :started_at
                                           (datetime/long-temporal-field :day-of-week)))})))
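As a rough illustration of what the two derived columns hold, the sketch below extracts the same fields from individual values with plain java.time interop. It is not the columnwise tech.v3.datatype.datetime implementation. The sample timestamps and the names sample-starts, hour-of-day and iso-day-of-week are hypothetical, and it assumes day-of-week follows ISO numbering (1 = Monday, 7 = Sunday).

```clojure
(import '(java.time LocalDateTime))

;; Hypothetical sample start times.
(def sample-starts
  [(LocalDateTime/parse "2023-04-01T08:15:30")   ; a Saturday
   (LocalDateTime/parse "2023-04-03T17:45:00")]) ; a Monday

;; Hour of day, 0 to 23.
(defn hour-of-day [^LocalDateTime t]
  (.getHour t))

;; ISO day of week, 1 (Monday) to 7 (Sunday).
(defn iso-day-of-week [^LocalDateTime t]
  (.getValue (.getDayOfWeek t)))

(mapv hour-of-day sample-starts)
;; => [8 17]

(mapv iso-day-of-week sample-starts)
;; => [6 1]
```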
Analysis
;; Count trips per hour of day and draw the counts as a bar chart.
(def hours-plot
  (-> processed-trips
      (tc/group-by [:hour])
      (tc/aggregate {:n tc/row-count})
      (tc/order-by [:hour])
      (hanami/plot ht/bar-chart
                   {:X "hour"
                    :Y "n"})))
hours-plot
(kind/pprint hours-plot)
{:encoding
 {:y {:field "n", :type "quantitative"},
  :x {:field "hour", :type "quantitative"}},
 :usermeta {:embedOptions {:renderer :svg}},
 :mark {:type "bar", :tooltip true},
 :width 400,
 :background "floralwhite",
 :height 300,
 :data
 {:values
  "hour,n\n0,5111\n1,3235\n2,1874\n3,1165\n4,950\n5,2802\n6,8863\n7,17065\n8,22170\n9,17316\n10,17489\n11,20593\n12,24050\n13,24837\n14,25900\n15,30730\n16,40374\n17,46889\n18,38281\n19,27107\n20,17252\n21,14388\n22,10822\n23,7327\n",
  :format {:type "csv"}}}
(kind/portal hours-plot)
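The group-by, aggregate and order-by steps behind hours-plot amount to counting rows per hour and sorting by hour. A minimal sketch of that idea in plain Clojure, on hypothetical toy data rather than the real trips, and without Tablecloth:

```clojure
;; Hypothetical toy data: the start hours of six trips.
(def sample-hours [8 17 17 8 9 17])

(def trips-per-hour
  (->> sample-hours
       frequencies                               ; count rows per distinct hour
       (map (fn [[hour n]] {:hour hour :n n}))   ; one row per hour, as in the aggregate step
       (sort-by :hour)))                         ; order rows by hour

trips-per-hour
;; => ({:hour 8, :n 2} {:hour 9, :n 1} {:hour 17, :n 3})
```

The result has the same two fields, hour and n, that the plot specification maps to the x and y encodings.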
;; Count trips per (day-of-week, hour), then draw one bar chart per day of week.
(-> processed-trips
    (tc/group-by [:day-of-week :hour])
    (tc/aggregate {:n tc/row-count})
    (tc/group-by [:day-of-week])
    (hanami/plot ht/bar-chart
                 {:X "hour"
                  :Y "n"})
    (tc/order-by [:day-of-week]))
Result: a table with one row per day-of-week value (1 to 7), each holding a bar chart of trip counts (n) by hour.
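One way to make the weekend versus weekday comparison explicit is to collapse the seven days into two groups before counting by hour. The sketch below does this in plain Clojure on hypothetical toy rows. It is not part of the original analysis, the names sample-trips, weekend?, hourly-counts and counts-by-day-type are made up for illustration, and it assumes ISO day numbering (6 = Saturday, 7 = Sunday).

```clojure
;; Hypothetical toy rows, shaped like rows of processed-trips.
(def sample-trips
  [{:day-of-week 1 :hour 8}
   {:day-of-week 1 :hour 17}
   {:day-of-week 2 :hour 17}
   {:day-of-week 6 :hour 13}
   {:day-of-week 7 :hour 13}
   {:day-of-week 7 :hour 14}])

;; Assumes ISO numbering: 6 = Saturday, 7 = Sunday.
(defn weekend? [day-of-week]
  (contains? #{6 7} day-of-week))

;; Trip counts per hour, sorted by hour.
(defn hourly-counts [rows]
  (into (sorted-map) (frequencies (map :hour rows))))

(def counts-by-day-type
  (->> sample-trips
       (group-by (fn [row]
                   (if (weekend? (:day-of-week row)) :weekend :weekday)))
       (map (fn [[day-type rows]] [day-type (hourly-counts rows)]))
       (into {})))

counts-by-day-type
;; => {:weekday {8 1, 17 2}, :weekend {13 2, 14 1}}
```

On the real data, the same two-group split could be plotted side by side instead of reading the difference off seven separate charts.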
Conclusion
Yes. Weekends are different from weekdays in terms of the hours in which people tend to use their bikes.
source: src/index.clj