Data Science in Clojure at Rails Girls Summer of Code
In this post I wish to write briefly about the "data science in clojure" project in Rails Girls Summer of Code 2020.
Hoping to encourage people to join as students or coaches, I will try to argue why (in my opinion) it is important to the healthy growth of the clojure data science ecosystem: not only by actively addressing our goal to create a welcoming community, but also by creating a continuous process of experimentation and reflection about the usability of the emerging stack.
For several years, Rails Girls Summer of Code has been organizing Summer of Code projects, where students from underrepresented groups (specifically women and non-binary people) work on open source projects for a monthly stipend during the summer. The explicit main goal is to bring more diversity into Open Source.
Two Clojure projects were accepted to RGSoC for 2020:
- Kaocha, submitted by Arne Brasseur
- data science in clojure, which I submitted in dialogue with the scicloj organizing team
Here, we will talk about the latter.
At scicloj, diversity is an explicit goal as well. Some of the other goals we have been discussing recently are making it easy to get involved and contribute, experimenting with the tools we are building, and seeking a cohesive grammar for data science in Clojure. The RGSoC project will be addressing all of these goals.
As argued by Chris Nuernberger in one of the discussions that anticipated the creation of scicloj, a case-driven approach is an excellent way to continuously verify that we are creating a useful stack. Indeed, this principle has been affecting the growth of the clojure data science ecosystem in the last year. Writing tutorials has been our main way to learn about the state of the stack, and realize what may still be missing. At the same time, it has been a way to share what was already becoming possible.
Following this principle, the first component of the "data science in clojure" RGSoC project will be looking into data problems. The team will choose various old and new data science problems, and solve them in Clojure.
This will teach us about different aspects of usability of the emerging stack, not only in terms of functionality, but also in terms of clarity, simplicity and ease. In this way, the team will take an extremely important role in the ongoing discussion of scicloj regarding our goals and priorities.
To make this role explicit, the second component of the project will be discussing and prioritizing the arising issues and ideas together with the library maintainers. Selected issues will be solved by the maintainers, involving the team as much as possible in the technical process.
The project statement focuses on ClojisR - a bridge between Clojure and R, which is one of the pieces of the emerging stack. Being one of the library authors allowed me to submit it for the RGSoC project. However, the project does leave some room for expansion, and we will be happy to include other relevant libraries, as long as it is desired by the teams and by the maintainers of those libraries.
As commented recently by Teodor Heggelund, writing example walkthroughs can be done in different ways: some can be just about experimentation, and others can be about education -- that is, carefully teaching a certain practice or idea. In this RGSoC project, we seek both.
Therefore, the third component of the project will be choosing selected examples, organizing them and making a coherent story out of them, in the form of an open source book.
Often, discussing and solving specific problems can be a source of inspiration for broader reasoning about our journey. This kind of reflection is an important part of our work.
Thus, in the fourth component of the RGSoC project, the team members will be engaging in general community discussions, sharing their thoughts and experiences.
There are many ways to help out, and the two main ones are joining the project as students or as coaches. These are not small commitments. They require some careful thought and dedication. But I strongly believe that they may have a huge impact on our journey, in unique and beautiful ways.
Scicloj has been going through a long, spiral path of growing the ecosystem, which will eventually converge to a simple, stable, flexible and friendly environment for data science. On our way towards that goal, fresh opinions and diverse points of view are extremely important. I strongly believe that RGSoC will offer scicloj this kind of refreshment. That is one of the reasons I am so hopeful about this project.