Creating a digestible GitHub digest

If you’ve ever “watched” a busy GitHub repository, your email inbox has discovered what it feels like to step in front of a firehose. If the project in question has active code reviewers, the problem is often worse by an order of magnitude. Every comment yields another email to all watchers. The CockroachDB repository’s weekly average is at 81 pull requests and 440 notification-generating comments.

Most of us who once paid close attention to incoming changes have since lost the ability to do so; these days, monitoring the stream requires a superhuman effort. The mere mortals among us can only pay attention to the pull requests we’ve authored or are tasked with reviewing. What’s surprising is that the watching functionality provided by GitHub is so coarse-grained. The dial apparently only has settings for “0” and “11”.

GitHub Digest Options

A search for GitHub digests yields some choices. Diffmatic is neat, as is this open source Ruby digest from the folks at Heroku.

Taking a cue from these, I spent a bit of flex time repurposing my earlier work on analyzing our GitHub stargazers to build repo-digest, a GitHub pull request digester that provides a daily appraisal of PRs in a concise and parsable format.

repo-digest provides daily digesting of one or more GitHub repos. By default, the digest includes all pull requests that were opened or closed within the past 24 hours, but you can adjust that window with the --since command line flag. The digest sorts all pull requests, both open and closed, in descending order of total changes to highlight consequential PRs.

Example usage:

repo-digest --since=2016-02-24T19:00:00-05:00 \
  --repos=cockroachdb/cockroach,cockroachdb/docs \
  --token=f87456b1112dadb2d831a5792bf2ca9a6afca7bc

How does it work? It’s a straightforward use of the GitHub API. Pull requests are queried in reverse date order from the comma-separated list of repositories specified via the --repos flag. Each pull request is then queried in turn to get more details and its list of changed files. The changed files are in turn queried to get addition and deletion counts for each.
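The querying and sorting steps described above can be sketched roughly as follows. This is a minimal illustration, not repo-digest's actual code: the PullRequest type, the ListURLs and SortByTotalChanges helpers, and the choice of sort=updated as the "reverse date order" key are all assumptions made for the example. Only the URL shape follows GitHub's documented "list pull requests" REST endpoint, and no network calls are made.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// PullRequest holds the fields a digest needs. The type and its field names
// are hypothetical, not repo-digest's own.
type PullRequest struct {
	Repo      string
	Number    int
	Title     string
	Additions int
	Deletions int
}

// TotalChanges is the sort key: additions plus deletions.
func (pr PullRequest) TotalChanges() int { return pr.Additions + pr.Deletions }

// ListURLs turns a comma-separated --repos value into one GitHub REST API
// "list pull requests" URL per repository. Sorting by "updated" is an
// assumption; the post only says "reverse date order".
func ListURLs(repos string) []string {
	var urls []string
	for _, repo := range strings.Split(repos, ",") {
		repo = strings.TrimSpace(repo)
		if repo == "" {
			continue
		}
		q := url.Values{}
		q.Set("state", "all")
		q.Set("sort", "updated")
		q.Set("direction", "desc")
		urls = append(urls, "https://api.github.com/repos/"+repo+"/pulls?"+q.Encode())
	}
	return urls
}

// SortByTotalChanges orders pull requests in descending order of total
// changes, keeping the original order for ties.
func SortByTotalChanges(prs []PullRequest) {
	sort.SliceStable(prs, func(i, j int) bool {
		return prs[i].TotalChanges() > prs[j].TotalChanges()
	})
}

func main() {
	for _, u := range ListURLs("cockroachdb/cockroach,cockroachdb/docs") {
		fmt.Println(u)
	}
	// Made-up numbers, purely for illustration.
	prs := []PullRequest{
		{Repo: "cockroachdb/cockroach", Number: 1, Additions: 10, Deletions: 2},
		{Repo: "cockroachdb/docs", Number: 2, Additions: 300, Deletions: 40},
		{Repo: "cockroachdb/cockroach", Number: 3, Additions: 50, Deletions: 50},
	}
	SortByTotalChanges(prs)
	for _, pr := range prs {
		fmt.Printf("%s#%d: %d changes\n", pr.Repo, pr.Number, pr.TotalChanges())
	}
}
```

A real implementation would also need to send the token in an Authorization header, follow pagination, and stop once it reaches pull requests older than the --since cutoff.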

The digest provides additional insight into the focus of a pull request by listing its most important subdirectories. File changes are tallied for each subdirectory, and those that comprise the top 80th percentile are deemed representative of the pull request and listed immediately underneath the additions and deletions totals.
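One plausible reading of the subdirectory tally is sketched below. It assumes that "top 80th percentile" means the largest directories whose combined changes reach 80% of the pull request's total, which may not be the exact rule repo-digest applies. The FileChange and DirTotal types and the TopSubdirs function are hypothetical names introduced for this example.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// FileChange is one changed file in a pull request (hypothetical type).
type FileChange struct {
	Filename  string
	Additions int
	Deletions int
}

// DirTotal is the number of changed lines attributed to one directory.
type DirTotal struct {
	Dir     string
	Changes int
}

// TopSubdirs tallies changes per directory and returns the largest
// directories, in descending order, until their running total reaches
// the given fraction of all changes. This cutoff rule is an assumption.
func TopSubdirs(files []FileChange, fraction float64) []DirTotal {
	tally := map[string]int{}
	total := 0
	for _, f := range files {
		n := f.Additions + f.Deletions
		tally[path.Dir(f.Filename)] += n
		total += n
	}
	dirs := make([]DirTotal, 0, len(tally))
	for d, n := range tally {
		dirs = append(dirs, DirTotal{Dir: d, Changes: n})
	}
	// Largest first; break ties by name so the output is deterministic.
	sort.Slice(dirs, func(i, j int) bool {
		if dirs[i].Changes != dirs[j].Changes {
			return dirs[i].Changes > dirs[j].Changes
		}
		return dirs[i].Dir < dirs[j].Dir
	})
	var top []DirTotal
	running := 0
	for _, d := range dirs {
		if total == 0 || float64(running) >= fraction*float64(total) {
			break
		}
		top = append(top, d)
		running += d.Changes
	}
	return top
}

func main() {
	// Made-up file list, purely for illustration.
	files := []FileChange{
		{Filename: "sql/parser.go", Additions: 70, Deletions: 5},
		{Filename: "sql/exec.go", Additions: 8, Deletions: 2},
		{Filename: "storage/engine.go", Additions: 6, Deletions: 4},
		{Filename: "docs/README.md", Additions: 5, Deletions: 0},
	}
	for _, d := range TopSubdirs(files, 0.8) {
		fmt.Printf("%s: %d changes\n", d.Dir, d.Changes)
	}
}
```

With this made-up input, sql alone accounts for 85 of 100 changed lines, so it is the only directory listed.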

Customizing the look of the digest

The images shown here have been styled to match Cockroach Labs branding, but the template is flexible and very easy to customize. I used Go’s nifty templating language.

The default template is generically styled. You can create and specify your own template using the --template flag. repo-digest automatically inlines CSS styles to make the output suitable for sending via email.
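To give a feel for what a custom template might look like, here is a small sketch using Go's standard html/template package. The template text, the PullRequest fields, and the Render helper are invented for this example and are not repo-digest's default template or data model. The style attributes are written inline by hand here; the tool's automatic CSS inlining is not reproduced.

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
)

// PullRequest is a hypothetical view model passed to the template.
type PullRequest struct {
	Repo      string
	Number    int
	Title     string
	Additions int
	Deletions int
}

// digestTemplate is an illustrative template, not the one shipped with
// repo-digest. Styles sit directly on the elements, as email clients expect.
const digestTemplate = `<h1>Digest</h1>
{{range .}}<p><b>{{.Repo}}#{{.Number}}</b> {{.Title}} <span style="color: green">+{{.Additions}}</span> <span style="color: red">-{{.Deletions}}</span></p>
{{end}}`

// Render parses the template text and executes it against the pull requests.
// html/template escapes the data, so a title containing markup is safe.
func Render(tmplText string, prs []PullRequest) (string, error) {
	tmpl, err := template.New("digest").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, prs); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Made-up pull request, purely for illustration.
	out, err := Render(digestTemplate, []PullRequest{
		{Repo: "cockroachdb/docs", Number: 7, Title: "Fix <b>typo</b>", Additions: 3, Deletions: 1},
	})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Print(out)
}
```

A template file supplied on the command line could be read from disk and passed to a helper like Render in place of the constant.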

About the author

Spencer Kimball

Spencer Kimball is the co-founder and CEO of Cockroach Labs, where he maintains a delicate balance between a love for programming distributed systems and the excitement of helping the company grow smoothly. He cut his teeth on databases during the dot com heyday, and had a front row seat at Google for a decade’s worth of their evolution.
