
Creating a Digestible GitHub Digest

If you’ve ever “watched” a busy GitHub repository, your email inbox knows what it feels like to stand in front of a firehose. If the project in question has active code reviewers, the problem is often an order of magnitude worse, because every comment generates another email to all watchers. The CockroachDB repository averages 81 pull requests and 440 notification-generating comments per week.

Most of us who once paid close attention to incoming changes have since lost the ability to do so; these days, monitoring the stream requires a superhuman effort. The mere mortals among us can only pay attention to the pull requests we’ve authored or are tasked with reviewing. What’s surprising is that the watching functionality provided by GitHub is so coarse-grained. The dial apparently only has settings for “0” and “11”.

GitHub Digest Options

A search for GitHub digests yields some choices. Diffmatic is neat, as is this open source Ruby digest from the folks at Heroku.

Taking a cue from these, I spent a Free Friday repurposing the code from my earlier analysis of our GitHub stargazers to build repo-digest, a GitHub pull request digester that provides a daily appraisal of PRs in a concise, easily scanned format.

GitHub digest for CockroachDB

repo-digest produces a daily digest of one or more GitHub repos. By default, the digest includes every pull request opened or closed within the past 24 hours, but you can adjust that window with the --since command-line flag. The digest sorts all pull requests, open and closed alike, in descending order of total changes, so the most consequential PRs appear first.

Example usage:

repo-digest --since=2016-02-24T19:00:00-05:00 \
  --repos=cockroachdb/cockroach,cockroachdb/docs \
  --token=f87456b1112dadb2d831a5792bf2ca9a6afca7bc

How does it work? It’s a straightforward use of the GitHub API. Pull requests are queried in reverse date order from each repository in the comma-separated list given by the --repos flag. Each pull request is then queried individually to get more details and its list of changed files, and each changed file is queried in turn to get its addition and deletion counts.

The digest also provides insight into the focus of a pull request by listing its most important subdirectories. File changes are tallied for each subdirectory, and the subdirectories that together account for the top 80 percent of changes are deemed representative of the pull request. They are listed immediately below the additions and deletions totals:
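One reading of the 80 percent rule: group each file's changes by its directory, sort the directories by their tallies, and keep taking the largest until they cover 80 percent of all changes. The fileChange type and topDirs function are hypothetical, and grouping by a file's immediate parent directory is an assumption about how repo-digest defines a subdirectory.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// fileChange is a hypothetical per-file record: path plus lines changed.
type fileChange struct {
	Path    string
	Changes int // additions + deletions
}

// topDirs tallies changes per directory and returns, largest first, the
// directories that together cover at least `share` of all changes.
func topDirs(files []fileChange, share float64) []string {
	tally := map[string]int{}
	total := 0
	for _, f := range files {
		tally[path.Dir(f.Path)] += f.Changes
		total += f.Changes
	}
	dirs := make([]string, 0, len(tally))
	for d := range tally {
		dirs = append(dirs, d)
	}
	// Largest tally first; ties broken by name so output is deterministic.
	sort.Slice(dirs, func(i, j int) bool {
		if tally[dirs[i]] != tally[dirs[j]] {
			return tally[dirs[i]] > tally[dirs[j]]
		}
		return dirs[i] < dirs[j]
	})
	var out []string
	covered := 0
	for _, d := range dirs {
		// Stop once the directories taken so far reach the target share.
		if total == 0 || float64(covered) >= share*float64(total) {
			break
		}
		out = append(out, d)
		covered += tally[d]
	}
	return out
}

func main() {
	// Sample data for illustration only.
	files := []fileChange{
		{"sql/parser/sql.go", 600},
		{"sql/parser/sql.y", 250},
		{"sql/executor.go", 100},
		{"docs/RFCS/new.md", 50},
	}
	fmt.Println(topDirs(files, 0.8))
}
```

Here sql/parser alone accounts for 850 of 1,000 changed lines, so it is the only directory listed; a PR spread more evenly across the tree would list several.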

Details of a single PR in the CockroachDB GitHub digest

Customizing the look of the digest

The images shown here have been styled to match Cockroach Labs branding, but the template is flexible and easy to customize. I used Go’s nifty templating language.

The default template is generically styled. You can create your own template and specify it with the --template= flag. repo-digest automatically inlines CSS styles so the output is suitable for sending via email.

The CockroachDB digest is currently being run daily via a cron job and disseminated to subscribers using a public Google group.
