Scientific Collaboration and Project Management in GitHub

Ryan Abernathey
Oct 26, 2020

This short blog post is a technical followup to a post from earlier this year:

In that post, I shared the following about my struggles with collaboration / project management tools, and the need for a better system, particularly given the fact that we are now all working remotely thanks to COVID-19:

Keeping track of many different projects and collaborations requires organization. This is especially important when team members are not regularly meeting face-to-face. I tend to struggle in this department. However, I have found that technology solutions can really help. In the past, I used a mix of different tools, such as Nirvana for personal task tracking and Basecamp for group projects. In the end, I had tasks and todos spread among many different systems. 🤨 My resolution for this year is to track every aspect of my work in GitHub. Between Issues and Projects, I realized that GitHub is actually the most flexible and powerful project management system out there.

We have been using GitHub for a few months this way — not just for version controlling software but for managing general scientific research projects. The main reasons for choosing GitHub are:

  1. Project management features. GitHub allows you to use issues to track TODO items for a project, and then organize issues from many different repos into Project Boards in order to track large-scale efforts. Nearly every project management tool has these features, but GitHub offers something more…
  2. Global namespace. Everyone is basically already on GitHub. Therefore, anyone we might want to collaborate with can be tagged in / assigned to issues. No forcing people to sign up for yet another account.
  3. Cross references. GitHub makes it easy to cross reference issues from different repos / organizations. This is important because our projects are often interrelated and dependent upon software (which is also managed via GitHub).
  4. Rich communication. GitHub issues are an extremely rich way to communicate. GitHub Markdown allows you to write complex, linked documents, embed images, and easily tag other people / projects. And with the Chrome GitHub MathJax extension, you can even write equations in GitHub.
Example scientific discussion on a GitHub issue, including a copy-pasted figure and a MathJax equation.

I’ll briefly explain some details of how we are organizing our scientific work on GitHub.

Project == Repo

Every scientific “project” gets its own GitHub repo. What is a project? This means different things to different people. To us, a project is a concrete piece of work that, when complete, roughly corresponds to a single scientific publication. The project is finished when the paper is published. A project typically has one “lead” (i.e. the lead author; usually a student or postdoc) and one or more collaborators, including the PI.

Most of our repos live in our group’s GitHub organization:

However, they don’t have to live there. They can live anywhere on GitHub: in a personal account, or in another group’s organization. This decentralization is part of the appeal of GitHub.

By the end of the project, the repo should contain all of the code needed to reproduce the project. For those looking for a template for their repo, I recommend Julius Busecke’s excellent “Cookie Cutter Science Project.”

However, you don’t have to start with anything at all. In contrast to a typical GitHub project, which is all about the code, our use of GitHub repos, especially in the early stages of a project, is primarily about tracking work that needs to be done. We use the issue tracker for this.

TODO == Issue

The bread and butter of our system are GitHub issues. Issues represent small and specific things that need to get done in order to move the project forward. We strive to enumerate every aspect of the project in the issue tracker. Of course you can’t do this all from scratch at the start of a project; rather we start with a few first steps and keep adding as the project evolves. Open issues represent things that need to be done in the future. Closed issues have already been accomplished / resolved. Here are some open issues from one project:

And here are the closed ones.

The list of closed issues provides a summary of everything that has been done so far on this project.

Anyone involved with a project can add issues to it. If you wake up in the middle of the night with an idea for a project…create an issue! Issues should be “assigned” to the person who needs to do them. Usually this is the project lead, but, for collaborative projects, many people can work on different things simultaneously.

Issues can be very short (e.g. “Download this data”) or extremely detailed, involving lots of discussion. Issue #1 above has 18 comments, including many figures and equations. This kind of discussion is where important technical questions get worked out in detail. One thing I love about GitHub is that this sort of detailed technical discussion can happen asynchronously, without need for a meeting.
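Issues are usually created through the web interface, but they can also be created programmatically. The following is a minimal sketch, not part of our actual workflow, of how a TODO could be turned into an issue through the GitHub REST API endpoint POST /repos/{owner}/{repo}/issues. The organization, repo, and user names are hypothetical placeholders, and the request is only built, not sent.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_issue_request(owner, repo, title, body="", assignees=(), token=None):
    """Build (but do not send) a request that would create one issue."""
    payload = {"title": title, "body": body, "assignees": list(assignees)}
    headers = {
        "Accept": "application/vnd.github+json",
        "Content-Type": "application/json",
    }
    if token:
        # A personal access token is required to actually create the issue.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=f"{API_ROOT}/repos/{owner}/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Hypothetical names, for illustration only.
request = build_issue_request(
    "example-org",
    "example-project",
    "Download this data",
    assignees=["project-lead"],
)
print(request.get_method(), request.full_url)
```

To submit it for real, pass a token and hand the request to urllib.request.urlopen. Keeping the "build" step separate makes it easy to inspect what would be sent first.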

Meeting == Milestone

GitHub allows you to organize issues into “milestones” with specific due dates. There are many possible uses for this depending on the nature of the project. We have found that it’s useful to have a milestone for each team meeting. This provides a nice focus and motivation for team members to get something done, which is always a challenge in a work-from-home situation!

This also provides an automatic structure and agenda for every meeting. Every meeting should:

  1. Review the progress on the issues targeted for that day’s milestone, discussing any challenges or setbacks.
  2. Brainstorm new issues for the future and decide which issues to assign to the next meeting’s milestone.
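As an illustration of how a milestone can double as an agenda, here is a small sketch that turns the issues in one milestone into a plain-text agenda. It assumes the issues have already been fetched as dictionaries with the "number", "title", and "state" fields that the GitHub REST API returns for issues (for example from GET /repos/{owner}/{repo}/issues with the milestone and state=all parameters). The function name and the sample data are hypothetical.

```python
def meeting_agenda(milestone_title, issues):
    """Format a milestone's issues as an agenda: finished work first, then open items."""
    closed = [issue for issue in issues if issue["state"] == "closed"]
    still_open = [issue for issue in issues if issue["state"] == "open"]

    lines = [f"Agenda for {milestone_title}"]
    lines.append(f"Review progress ({len(closed)} closed):")
    lines += [f"  #{issue['number']} {issue['title']}" for issue in closed]
    lines.append(f"Discuss challenges ({len(still_open)} still open):")
    lines += [f"  #{issue['number']} {issue['title']}" for issue in still_open]
    return "\n".join(lines)


# Made-up sample data, shaped like the API's issue objects.
sample_issues = [
    {"number": 1, "title": "Download this data", "state": "closed"},
    {"number": 2, "title": "Compute climatology", "state": "open"},
    {"number": 3, "title": "Draft figure 1", "state": "open"},
]

agenda = meeting_agenda("Meeting 2020-11-02", sample_issues)
print(agenda)
```

Issues that are still open at the end of the meeting are natural candidates to move to the next meeting's milestone.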

Everything in one place

If you are only involved in one project at a time, then you can just monitor that one repo. This may be the case for some students / postdocs. But supervisors, or people involved in many projects, will likely have their work spread over many different repos, in many different organizations. The key to resolving this and keeping track of everything is the magic url https://github.com/issues. This page allows you to see all the issues you have created, been assigned to, or been mentioned in.
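A similar cross-repo view is available programmatically through GitHub's issue search endpoint (GET /search/issues). The sketch below only builds the query URL. It uses the involves: search qualifier, which matches issues a user created, was assigned to, was mentioned in, or commented on, so it is slightly broader than the web page. The username and the helper name are placeholders.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.github.com/search/issues"


def my_issues_url(username, state="open", per_page=50):
    """Build a search URL for issues involving `username` across all repos."""
    query = f"is:issue is:{state} involves:{username}"
    params = {"q": query, "sort": "updated", "order": "desc", "per_page": per_page}
    # urlencode percent-encodes the colons and turns spaces into "+".
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Placeholder username, for illustration only.
url = my_issues_url("example-user")
print(url)
```

Fetching this URL with an authenticated request returns JSON whose "items" list contains the matching issues, most recently updated first.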

Another tool you can use to keep track of things is GitHub Project Boards, which provide a Kanban-style interface for visualizing task flow.

https://docs.github.com/en/free-pro-team@latest/github/managing-your-work-on-github/about-project-boards

Project boards can bring together issues and milestones from many different repos (within the same organization) to provide a high-level view of a large project.

We have not yet incorporated Project Boards into our workflow. However, I can imagine they could be very useful for keeping track of milestones and deliverables for a multi-year, multi-institution grant.

Integration with Slack and Email

The goal is to keep track of all work in GitHub. But TODOs often originate elsewhere, for example, from Slack or Email.

Slack has emerged as a great tool for keeping our research group connected during the pandemic, via informal chatting. However, Slack is a chat app, not a project management tool. Stuff gets lost in Slack. To keep Slack and GitHub connected, we use the Slack + GitHub connector:

We create a Slack channel for each project and then “subscribe” that channel to notifications from GitHub using the syntax

/github subscribe <organization>/<repo> comments

Now Slack posts on that channel every time there is activity on the repo! We can also open new issues directly from Slack.

For email, we have been experimenting with the “Fire” tool. This allows you to forward an email to a specific address and have it automatically converted to an issue. It’s not perfect, but still very useful.

Problems / Challenges

Overall I am pretty happy with this new approach to project management. It’s enabling us to be productive and communicate asynchronously across a range of different projects with everyone working remotely.

It’s important to note that this system requires effort. You have to commit to working this way: checking the issues / repos regularly, participating in detailed discussions, and using the issue trackers to set meeting agendas. The system is much less effective if you have a “shadow” organizational system where you keep your real priorities. But the hope is that this effort pays off in terms of overall productivity and efficiency.

There are still some challenges to this system that make it less than perfect:

  1. Keeping track of many issues is hard. I personally participate in hundreds of different discussions, and I do lose track of important conversations from time to time.
  2. Some colleagues don’t want to participate. Senior colleagues in particular will not be convinced to use such an unfamiliar tool. This forces you to have two different platforms (e.g. GitHub + email) for project coordination.
  3. Lack of integration with writing tools. When it comes time to write papers, we need to switch to something like Overleaf. We have not found a great way to automatically integrate this with GitHub (although that might be possible as well).

We will continue to iterate on this system, and we welcome suggestions / comments on how to improve it. I hope that sharing these thoughts is useful to others trying to adapt scientific research to a remote work situation.

Ryan Abernathey

Associate Professor, Earth & Environmental Sciences, Columbia University. https://rabernat.github.io/