This is part of the Scicloj Clojure Data Scrapbook.
Noj - getting started - from raw data to a blog post
This is a getting-started tutorial for Noj (scinojure), a recommended way to use the emerging Clojure data stack.
It is considered part of the Clojure Data Scrapbook, but it is maintained as a separate repo so that it can serve as a small, self-contained example of a data-analysis report.
Video tutorial:
Question
Can we distinguish weekends from weekdays in terms of the hours in which people tend to use their bikes?
Setup
```clojure
(ns index
  (:require [tablecloth.api :as tc]
            [tech.v3.datatype.datetime :as datetime]
            [scicloj.noj.v1.vis.hanami :as hanami]
            [aerial.hanami.templates :as ht]
            [scicloj.kindly.v4.kind :as kind]))
```

Reading data
You may learn more about the Cyclistic Bike Share 2023 dataset in our Chicago bike trips tutorial.
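The `:parser-fn` entries used when reading the file ask for the `started_at` and `ended_at` columns to be parsed as local date-times with the pattern `yyyy-MM-dd HH:mm:ss`. To illustrate that conversion on a single value, here is a minimal sketch using plain `java.time` interop. `trip-time-format`, `parse-trip-time`, and the sample timestamp are hypothetical. This is not necessarily how tablecloth parses the column internally.

```clojure
;; Hypothetical helpers illustrating the "yyyy-MM-dd HH:mm:ss" pattern
;; requested in :parser-fn; tablecloth may parse differently internally.
(def trip-time-format
  (java.time.format.DateTimeFormatter/ofPattern "yyyy-MM-dd HH:mm:ss"))

(defn parse-trip-time
  "Parses a timestamp string such as \"2023-04-15 17:30:05\" into a
  java.time.LocalDateTime (no time zone attached)."
  [s]
  (java.time.LocalDateTime/parse s trip-time-format))

;; Made-up sample value, printed back in ISO-8601 form.
(str (parse-trip-time "2023-04-15 17:30:05"))
;; => "2023-04-15T17:30:05"
```

A `LocalDateTime` carries no time zone, so the hours extracted later are the wall-clock hours as written in the file.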
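```clojure
;; Read the gzipped CSV, turning column names into keywords and parsing
;; the two timestamp columns as local date-times.
(defonce raw-trips
  (-> "data/202304_divvy_tripdata.csv.gz"
      (tc/dataset {:key-fn keyword
                   :parser-fn {"started_at"
                               [:local-date-time
                                "yyyy-MM-dd HH:mm:ss"]
                               "ended_at"
                               [:local-date-time
                                "yyyy-MM-dd HH:mm:ss"]}})))
```

Processing data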
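For each trip, the two derived columns come down to reading two fields off the start time. The sketch below does this for a single value with plain `java.time` methods. `hour-of`, `day-of-week-of`, and the sample timestamp are hypothetical. The sketch assumes that `:day-of-week` in `datetime/long-temporal-field` follows the ISO convention of `java.time.DayOfWeek` (1 = Monday, ..., 7 = Sunday). That convention decides which values count as the weekend, so it is worth confirming in the tech.ml.dataset documentation.

```clojure
;; Hypothetical single-value versions of the :hour and :day-of-week columns.
(defn hour-of
  "Hour of the day, 0-23."
  [^java.time.LocalDateTime t]
  (.getHour t))

(defn day-of-week-of
  "ISO day of week as an integer: 1 = Monday ... 7 = Sunday."
  [^java.time.LocalDateTime t]
  (.getValue (.getDayOfWeek t)))

;; Made-up start time: Saturday, 15 April 2023, 17:30:05.
(let [t (java.time.LocalDateTime/of 2023 4 15 17 30 5)]
  [(hour-of t) (day-of-week-of t)])
;; => [17 6]
```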
```clojure
;; Derive the hour of day and the day of week from each trip's start time.
(def processed-trips
  (-> raw-trips
      (tc/add-columns {:hour (fn [ds]
                               (->> ds
                                    :started_at
                                    (datetime/long-temporal-field :hours)))
                       :day-of-week (fn [ds]
                                      (->> ds
                                           :started_at
                                           (datetime/long-temporal-field :day-of-week)))})))
```

Analysis
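The `tc/group-by`, `tc/aggregate`, and `tc/order-by` steps in `hours-plot` count trips per hour before plotting. The same computation on a plain sequence of maps can be sketched with core Clojure functions. `hour-counts` and the sample rows are hypothetical; tablecloth does the real work on columnar data.

```clojure
;; Hypothetical plain-Clojure analogue of the hours-plot pipeline.
(defn hour-counts
  "Given trips as maps with an :hour key, returns one {:hour h :n count}
  map per hour that occurs, sorted by hour."
  [trips]
  (->> trips
       (group-by :hour)                 ; like (tc/group-by [:hour])
       (map (fn [[hour rows]]           ; like (tc/aggregate {:n tc/row-count})
              {:hour hour :n (count rows)}))
       (sort-by :hour)))                ; like (tc/order-by [:hour])

(hour-counts [{:hour 17} {:hour 8} {:hour 17} {:hour 0}])
;; => ({:hour 0, :n 1} {:hour 8, :n 1} {:hour 17, :n 2})
```

As with grouping in tablecloth, hours with no trips produce no row at all, rather than a row with `:n 0`.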
```clojure
;; Count trips per hour and plot the counts as a bar chart.
(def hours-plot
  (-> processed-trips
      (tc/group-by [:hour])
      (tc/aggregate {:n tc/row-count})
      (tc/order-by [:hour])
      (hanami/plot ht/bar-chart
                   {:X "hour"
                    :Y "n"})))
```

```clojure
hours-plot
```

```clojure
(kind/pprint hours-plot)
```

```clojure
{:encoding
 {:y {:field "n", :type "quantitative"},
  :x {:field "hour", :type "quantitative"}},
 :usermeta {:embedOptions {:renderer :svg}},
 :mark {:type "bar", :tooltip true},
 :width 400,
 :background "floralwhite",
 :height 300,
 :data
 {:values
  "hour,n\n0,5111\n1,3235\n2,1874\n3,1165\n4,950\n5,2802\n6,8863\n7,17065\n8,22170\n9,17316\n10,17489\n11,20593\n12,24050\n13,24837\n14,25900\n15,30730\n16,40374\n17,46889\n18,38281\n19,27107\n20,17252\n21,14388\n22,10822\n23,7327\n",
  :format {:type "csv"}}}
```

```clojure
(kind/portal hours-plot)
```

```clojure
;; Count trips per (day of week, hour), then draw one bar chart per day of week.
(-> processed-trips
    (tc/group-by [:day-of-week :hour])
    (tc/aggregate {:n tc/row-count})
    (tc/group-by [:day-of-week])
    (hanami/plot ht/bar-chart
                 {:X "hour"
                  :Y "n"})
    (tc/order-by [:day-of-week]))
```

[Figure: one bar chart of trip counts (n) by hour for each day-of-week value, 1 to 7.]
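The second pipeline groups twice. First it groups by day of week and hour to count trips, and then it groups by day of week alone so that each day gets its own chart. A plain-Clojure sketch of the resulting structure, a sorted map from day of week to hourly counts, might look like this. `day-hour-counts` and the sample rows are hypothetical.

```clojure
;; Hypothetical plain-Clojure analogue of the per-day pipeline.
(defn day-hour-counts
  "Given trips as maps with :day-of-week and :hour keys, returns a sorted
  map from day-of-week to a seq of {:hour h :n count}, sorted by hour."
  [trips]
  (->> trips
       (group-by :day-of-week)              ; one group per day of week
       (map (fn [[day rows]]
              [day (->> rows
                        (group-by :hour)    ; count trips per hour within the day
                        (map (fn [[hour hour-rows]]
                               {:hour hour :n (count hour-rows)}))
                        (sort-by :hour))]))
       (into (sorted-map))))                ; like (tc/order-by [:day-of-week])

(day-hour-counts [{:day-of-week 6 :hour 12}
                  {:day-of-week 1 :hour 8}
                  {:day-of-week 1 :hour 17}
                  {:day-of-week 1 :hour 8}])
;; => {1 ({:hour 8, :n 2} {:hour 17, :n 1}), 6 ({:hour 12, :n 1})}
```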
Conclusion
Yes. Weekends are different from weekdays in terms of the hours in which people tend to use their bikes.
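This conclusion rests on comparing the per-day charts by eye. One way to put a number on the difference is to compare the share of trips in each hour on weekends with the share on weekdays, using the total variation distance. That distance is 0 for identical distributions and 1 for non-overlapping ones. This is offered as a suggestion, not as part of the original analysis. The sketch assumes ISO day numbering (6 = Saturday, 7 = Sunday), and all function names in it are hypothetical.

```clojure
;; Hypothetical helpers for quantifying the weekend/weekday difference.
(defn weekend?
  "Assumes ISO numbering: 6 = Saturday, 7 = Sunday."
  [day-of-week]
  (>= day-of-week 6))

(defn hourly-shares
  "Fraction of the given hours that fall in each hour 0-23, as a 24-element vector."
  [hours]
  (let [counts (frequencies hours)
        total  (count hours)]
    (mapv #(/ (get counts % 0) total) (range 24))))

(defn total-variation
  "Half the sum of absolute differences between two distributions."
  [p q]
  (/ (reduce + (map #(Math/abs (double (- %1 %2))) p q)) 2))

(defn weekend-weekday-distance
  "Total variation distance between the weekend and weekday hourly shares,
  or nil when either group has no trips."
  [trips]
  (let [{weekend true weekday false} (group-by (comp weekend? :day-of-week) trips)]
    (when (and (seq weekend) (seq weekday))
      (total-variation (hourly-shares (map :hour weekend))
                       (hourly-shares (map :hour weekday))))))
```

A value near 0 would mean the two hourly patterns are nearly the same. The closer the value is to 1, the less the weekend and weekday patterns overlap.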
source: src/index.clj