This is part of the Scicloj Clojure Data Scrapbook.

Noj - getting started - from raw data to a blog post

This is a getting-started tutorial for Noj (scinojure) - a recommended way to use the emerging Clojure data stack.

It is considered part of the Clojure Data Scrapbook, but is maintained as a separate repo to serve as a self-contained small example of a data-analysis report.

Question

Can we distinguish weekends from weekdays in terms of the hours in which people tend to use their bikes?

Setup

(ns index
  (:require [tablecloth.api :as tc]
            [tech.v3.datatype.datetime :as datetime]
            [scicloj.noj.v1.vis.hanami :as hanami]
            [aerial.hanami.templates :as ht]
            [scicloj.kindly.v4.kind :as kind]))

Reading data

You may learn more about the Cyclistic Bike Share 2023 dataset in our Chicago bike trips tutorial.

(defonce raw-trips
  (-> "data/202304_divvy_tripdata.csv.gz"
      (tc/dataset {:key-fn keyword
                   :parser-fn {"started_at"
                               [:local-date-time
                                "yyyy-MM-dd HH:mm:ss"]
                               "ended_at"
                               [:local-date-time
                                "yyyy-MM-dd HH:mm:ss"]}})))

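The format string given to :parser-fn uses the java.time.format.DateTimeFormatter pattern syntax. As a minimal sketch, the same pattern can be tried on its own with plain Java interop. The timestamp below is an illustrative value, not a row taken from the file, and parse-timestamp is a hypothetical helper:

```clojure
(import '(java.time LocalDateTime)
        '(java.time.format DateTimeFormatter))

;; Same pattern string as in the :parser-fn option above.
(def timestamp-format
  (DateTimeFormatter/ofPattern "yyyy-MM-dd HH:mm:ss"))

;; Hypothetical helper, for illustration only.
(defn parse-timestamp [s]
  (LocalDateTime/parse s timestamp-format))

;; Illustrative input: returns a LocalDateTime for 1 April 2023, 08:15:30.
(parse-timestamp "2023-04-01 08:15:30")
```
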
Processing data

(def processed-trips
  (-> raw-trips
      (tc/add-columns {:hour (fn [ds]
                               (->> ds
                                    :started_at
                                    (datetime/long-temporal-field
                                     :hours)))
                       :day-of-week (fn [ds]
                                      (->> ds
                                           :started_at
                                           (datetime/long-temporal-field
                                            :day-of-week)))})))
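
To see what the two new columns hold, here is a sketch for a single timestamp using java.time directly. It assumes that long-temporal-field follows the java.time conventions: hour of day from 0 to 23, and ISO day of week from 1 (Monday) to 7 (Sunday). The helper hour-and-day-of-week is hypothetical and not part of the tutorial code:

```clojure
(import '(java.time LocalDateTime))

;; Hypothetical helper: the two derived fields for one timestamp.
(defn hour-and-day-of-week [^LocalDateTime t]
  {:hour        (.getHour t)
   :day-of-week (.getValue (.getDayOfWeek t))})

;; 1 April 2023 was a Saturday, which is day 6 in ISO numbering.
(hour-and-day-of-week (LocalDateTime/parse "2023-04-01T17:30:00"))
;; => {:hour 17, :day-of-week 6}
```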

Analysis

(def hours-plot
  (-> processed-trips
      (tc/group-by [:hour])
      (tc/aggregate {:n tc/row-count})
      (tc/order-by [:hour])
      (hanami/plot ht/bar-chart
                   {:X "hour"
                    :Y "n"})))
hours-plot
(kind/pprint hours-plot)
{:encoding
 {:y {:field "n", :type "quantitative"},
  :x {:field "hour", :type "quantitative"}},
 :usermeta {:embedOptions {:renderer :svg}},
 :mark {:type "bar", :tooltip true},
 :width 400,
 :background "floralwhite",
 :height 300,
 :data
 {:values
  "hour,n\n0,5111\n1,3235\n2,1874\n3,1165\n4,950\n5,2802\n6,8863\n7,17065\n8,22170\n9,17316\n10,17489\n11,20593\n12,24050\n13,24837\n14,25900\n15,30730\n16,40374\n17,46889\n18,38281\n19,27107\n20,17252\n21,14388\n22,10822\n23,7327\n",
  :format {:type "csv"}}}
(kind/portal hours-plot)
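
The printed spec carries the aggregated counts as CSV text, so they can be inspected without the chart. The sketch below copies that text from the :values entry above and picks the busiest hours. The helper parse-counts and the name busiest-hours are hypothetical:

```clojure
(require '[clojure.string :as str])

;; CSV text copied from the :values entry of the printed spec.
(def hours-csv
  (str "hour,n\n0,5111\n1,3235\n2,1874\n3,1165\n4,950\n5,2802\n"
       "6,8863\n7,17065\n8,22170\n9,17316\n10,17489\n11,20593\n"
       "12,24050\n13,24837\n14,25900\n15,30730\n16,40374\n17,46889\n"
       "18,38281\n19,27107\n20,17252\n21,14388\n22,10822\n23,7327\n"))

;; Hypothetical helper: parse the CSV text into a sequence of maps.
(defn parse-counts [csv]
  (->> (str/split-lines csv)
       rest                       ; skip the header row
       (map #(str/split % #","))
       (map (fn [[hour n]]
              {:hour (Long/parseLong hour)
               :n    (Long/parseLong n)}))))

;; The three busiest hours, busiest first.
(def busiest-hours
  (->> (parse-counts hours-csv)
       (sort-by :n >)
       (take 3)
       (map :hour)))

busiest-hours
;; => (17 16 18)
```

Over all days combined, the late-afternoon hours dominate, with a smaller morning peak at 8.
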
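;; Count trips per (day-of-week, hour), then regroup by day
;; so that hanami/plot draws one bar chart per day of the week.
(-> processed-trips
    (tc/group-by [:day-of-week :hour])
    (tc/aggregate {:n tc/row-count})
    (tc/group-by [:day-of-week])
    (hanami/plot ht/bar-chart
                 {:X "hour"
                  :Y "n"})
    (tc/order-by [:day-of-week]))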
[Output: a table with columns day-of-week and plot, holding one bar chart of trips per hour for each day of the week, 1 to 7.]
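
The per-day view can also be collapsed into two groups and compared numerically. This is a minimal sketch in plain Clojure, not part of the original analysis. It assumes ISO day numbering (6 and 7 are Saturday and Sunday) and rows given as maps with :hour and :day-of-week keys. The names weekend? and hourly-shares are hypothetical, and the toy rows are made up only to show the shape of the result:

```clojure
;; Assumption: ISO numbering, so 6 = Saturday and 7 = Sunday.
(defn weekend? [day-of-week]
  (>= day-of-week 6))

;; Hypothetical helper: for weekdays and weekends separately,
;; the fraction of trips that started in each hour.
(defn hourly-shares [rows]
  (->> rows
       (group-by (comp weekend? :day-of-week))
       (map (fn [[is-weekend group]]
              (let [total (count group)]
                [(if is-weekend :weekend :weekday)
                 (->> (frequencies (map :hour group))
                      (map (fn [[hour n]] [hour (/ n total)]))
                      (into (sorted-map)))])))
       (into {})))

;; Made-up rows, not real trip data.
(def toy-rows
  [{:day-of-week 1 :hour 8}
   {:day-of-week 1 :hour 17}
   {:day-of-week 2 :hour 17}
   {:day-of-week 3 :hour 17}
   {:day-of-week 6 :hour 13}
   {:day-of-week 7 :hour 13}])

(hourly-shares toy-rows)
;; => {:weekday {8 1/4, 17 3/4}, :weekend {13 1}}
```

Using shares rather than raw counts keeps the two groups comparable, since there are five weekdays and only two weekend days.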

Conclusion

Yes. Weekends are different from weekdays in terms of the hours in which people tend to use their bikes.

source: src/index.clj