What can we learn from our GitHub stars?

It’s been almost two years since CockroachDB became a GitHub project. In that time, the project has racked up more than 6,000 GitHub stars. Starring is a simple way for GitHub users to bookmark repositories that interest them. Naturally, we’ve wondered how people find out about our project. Are there things we could do to accelerate awareness and interest?

I decided to dedicate my Free Friday (our version of 20% time) to stargazers, a tool to query the CockroachDB repository for information about its GitHub stars and analyze the results.

GitHub and the New Era of Open Source Community

I’m going to wax rhapsodic about open source for a moment. My first experience with “open source” was typing in BASIC programs on an Apple IIc. I was soaking up the smarts of other software engineers by manually transcribing lines of code published in Byte magazine. Years later, I discovered an open source implementation of Pascal on a BBS. I spent hours trying to puzzle out how it worked using a translation dictionary (the comments were all in German). But wow, not having to type the whole thing in from a printout was a big step forward.

At UC Berkeley, the internet was suddenly available for the first time, and I discovered there was a whole universe beyond the blinkered world of Borland and Microsoft. gcc, bash, emacs, X11, Linux…all there for the looking, and for the taking. A true embarrassment of riches! You’d search for code tarballs located in FTP archives using archie, or by searching Usenet. Archaic by today’s standards, but I assure you, another big step forward. Which brings us very roughly to the present, and another evolutionary step.

GitHub, by some potent combination of critical mass and ease of use, has added a significant community dimension to open source projects. OSS projects have become living things, growing and evolving through the attention and ministration of many intelligences. And there are 10 million of them! If you take a little time to dig around using GitHub’s API, you can start to get some idea of how interconnected they are, and how they inform each other.

What drives GitHub stars?

The first thing I did was to take a look at the data and try to match up any notable discontinuities in the accumulation of GitHub stars with exogenous events. Turns out that press matters! Our first-ever mention was on Hacker News, then Wired, then another pivotal Hacker News story, followed by a glut of news when we announced that we were starting a company and had received funding (VB, Wired, WSJ). Interestingly, the announcement of FoundationDB being acquired by Apple drove interest as well. Conference talks work well too: there was a presentation by Tobias at FOSDEM and a talk I gave at CoreOS Fest.

[Figure: GitHub stars over time]
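
One way to look for such discontinuities programmatically is to bucket star timestamps by day and flag the days that sit far above the typical rate. The sketch below is not taken from the stargazers tool: the function names, the 5x threshold, and the sample timestamps are all invented for illustration. It assumes you already have one ISO 8601 timestamp per star.

```python
from collections import Counter
from datetime import date, datetime
from statistics import median


def daily_star_counts(starred_at):
    """Count new stars per calendar day from ISO 8601 timestamps."""
    days = (
        datetime.fromisoformat(ts.replace("Z", "+00:00")).date()
        for ts in starred_at
    )
    return Counter(days)


def spike_days(counts, factor=5.0):
    """Return days whose count is at least `factor` times the median.

    The median is taken over days that received at least one star;
    days with no stars do not appear in `counts` at all.
    """
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(day for day, n in counts.items() if n >= factor * typical)


# Made-up sample: a quiet stretch with one busy day.
sample = (
    ["2014-06-01T12:00:00Z"] * 2
    + ["2014-06-02T09:30:00Z"] * 3
    + ["2014-06-03T18:45:00Z"] * 40
    + ["2014-06-04T08:00:00Z"] * 2
)
counts = daily_star_counts(sample)
print(spike_days(counts))
```

The flagged days are only candidates. Matching them to a news story or a conference talk is still a manual step.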

Now, unless you happen to be a billionaire reality television star with an interesting hairdo, press isn’t something you can easily gin up on a whim by saying the same things over and over again. But you can create new content, this blog being one example, and if it’s interesting enough, it might get a mention on Hacker News and even spawn a discussion. In fact, the positive impact Hacker News can have on your project’s GitHub stars was analyzed in this Reddit post. Contentious topics aren’t something to fear either; they get noticed.

What else are our stargazers starring?

People seem to really like starring projects. Our 6,000+ “stargazers” (GitHub’s term for them) have starred more than 1.2M repos (for this analysis, we stopped counting at 300 stars per user, as some people apparently star many thousands). 227K of the 1.2M are unique, so there’s significant overlap in interests. If you think about it, that overlap represents other repositories which are correlated to CockroachDB in terms of our stargazers’ interests. Again, curious to learn more, we did an analysis counting the overlap and ranking the most commonly co-starred repositories.

The top 15:

CockroachDB’s Most Correlated Starred Repos
Repository Count
kubernetes/kubernetes 1,220
golang/go 1,159
tensorflow/tensorflow 1,087
docker/docker 1,074
jlevy/the-art-of-command-line 1,040
avelino/awesome-go 944
coreos/etcd 938
atom/electron 916
apple/swift 906
influxdata/influxdb 882
facebook/react-native 869
google/cayley 833
sindresorhus/awesome 826
facebook/react 802
gogits/gogs 795
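
The co-star ranking above boils down to a counting exercise. Here is a minimal sketch of it, assuming the starred repositories have already been fetched into a dictionary keyed by user. The function name, the data layout, and the sample users are hypothetical; only the 300-per-user cap comes from the analysis described above.

```python
from collections import Counter


def co_starred(stars_by_user, own_repo, cap=300, top=15):
    """Rank the repositories most often starred alongside `own_repo`.

    stars_by_user maps a user name to the list of repositories they starred.
    Returns (total stars counted, number of distinct repos, top ranking).
    """
    total = 0
    counts = Counter()
    for repos in stars_by_user.values():
        capped = set(repos[:cap])  # stop at `cap` stars per user
        total += len(capped)
        counts.update(capped)
    unique = len(counts)
    # Every stargazer has starred our own repo, so it carries no signal.
    counts.pop(own_repo, None)
    return total, unique, counts.most_common(top)


# Made-up sample data.
stars = {
    "alice": ["cockroachdb/cockroach", "golang/go", "coreos/etcd"],
    "bob": ["cockroachdb/cockroach", "golang/go"],
    "carol": ["cockroachdb/cockroach", "coreos/etcd", "golang/go", "docker/docker"],
}
total, unique, ranking = co_starred(stars, "cockroachdb/cockroach", top=2)
print(total, unique, ranking)
```

The gap between the total and the number of distinct repositories is the overlap: the smaller the distinct count relative to the total, the more the stargazers’ interests coincide.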

What projects do our stargazers contribute to?

But what about our stargazers themselves? What exactly are they up to? To answer that question, we took a look at what repositories our stargazers subscribe to. A subscription often implies contributory activity, and at the very least a high level of interest.

Here are the top repositories sorted by the number of subscribed stargazers who are also committers, with total commits, additions, and deletions made by our stargazers:

CockroachDB Stargazers’ Most Committed Repos
Repository Committers Commits Additions Deletions
revel/revel 18 164 10,036 8,635
coreos/etcd 17 3,056 870,718 617,198
pingcap/tidb 14 1,393 77,031 52,215
coreos/rkt 14 161 140,773 12,656
nsqio/nsq 13 1,096 70,485 61,103
influxdata/influxdb 12 778 131,715 185,005
meteor/meteor 11 631 27,877 14,621
boltdb/bolt 10 314 45,636 35,230
hashicorp/consul 9 45 1,346 494
docker/docker 8 617 63,243 29,214
golang/go 7 253 11,855 3,182
angular/angular.js 7 33 15,852 647
tornadoweb/tornado 7 69 1,424 781
kubernetes/kubernetes 6 1,063 164,511 72,548
elastic/elasticsearch 5 10 1,551 162
facebook/react 4 113 3,994 2,064
gogits/gogs 4 281 31,185 25,597

So a contingent of our stargazers are also active committers to highly correlated projects.

How much do our stargazers contribute to open source projects?

Turns out roughly 2,200 of the 6,000 stargazers have made at least one commit to a repository. To make the numbers meaningful, we included only those repositories with at least 25 of their own stargazers, 10 forks, or 10 open issues. All told, our stargazers made 728K commits to 36K repositories meeting those minimum thresholds! Across those committers, the average number of commits was 325 and the median was 64.
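
To make the thresholds and the two summary figures concrete, here is a small sketch of the calculation. It assumes a repository qualifies if it meets any one of the three minimums, as the wording above suggests. The field names, function names, and sample data are hypothetical.

```python
from statistics import mean, median


def is_significant(repo, min_stars=25, min_forks=10, min_issues=10):
    """A repo counts if it clears any one of the three thresholds."""
    return (
        repo["stargazers"] >= min_stars
        or repo["forks"] >= min_forks
        or repo["open_issues"] >= min_issues
    )


def commit_summary(commits_by_user, repos):
    """Return (mean, median) commits per user over significant repos only.

    commits_by_user maps user -> {repo name: commit count}.
    Users with no commits to a significant repo are left out.
    """
    totals = []
    for per_repo in commits_by_user.values():
        n = sum(c for name, c in per_repo.items() if is_significant(repos[name]))
        if n > 0:
            totals.append(n)
    if not totals:
        return 0.0, 0.0
    return mean(totals), median(totals)


# Made-up sample data.
repos = {
    "a/big": {"stargazers": 100, "forks": 3, "open_issues": 0},
    "b/forked": {"stargazers": 2, "forks": 12, "open_issues": 1},
    "c/toy": {"stargazers": 1, "forks": 0, "open_issues": 0},
}
commits = {
    "u1": {"a/big": 10, "c/toy": 500},
    "u2": {"b/forked": 4},
    "u3": {"c/toy": 7},
    "u4": {"a/big": 1, "b/forked": 100},
}
avg, med = commit_summary(commits, repos)
print(avg, med)
```

In the sample, one heavy committer pulls the mean well above the median, which is the same shape as the 325 versus 64 reported above.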

The top 15 most prolific stargazers:

CockroachDB Stargazers’ Aggregate Commit Stats
Commits Additions Deletions
21,914 1,228,112 683,999
19,336 19,710,668 21,386,983
18,273 26,754 19,683
8,156 180,943 71,252
7,134 3,277,017 2,251,260
6,484 1,514,754 1,048,409
6,482 640,969 292,383
5,782 846,438 592,989
5,458 674,662 284,865
5,288 2,154,789 2,198,961
5,004 1,113,409 894,033
4,976 712,732 506,236
4,788 312,345 220,628
4,666 7,895,406 8,456,320
4,661 911,605 709,218

Who’s paying attention to our stargazers?

Another thing we can look at is our stargazers’ followers. Turns out that in total, our stargazers are themselves followed by 216K other GitHub users. 112K of the 216K are unique, so again, there’s significant overlap. Perhaps not surprising, but that’s a lot of connectedness. Certainly to the extent we have any luck getting these people to contribute to or use CockroachDB, it’s difficult to imagine a better way to drive developer adoption.

Here’s a histogram of follower counts for our stargazers (note that the follower-count axis is log base 10, so 1 means 10 followers, 2 means 100, and 3 means 1,000):

[Figure: histogram of follower counts for CockroachDB stargazers]
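
For readers who want to reproduce this kind of histogram, the bucketing can be sketched as follows. Each user lands in the bucket given by the power of ten at or below their follower count. The function name, the separate bucket for users with no followers, and the sample counts are assumptions made for illustration.

```python
from collections import Counter


def log10_buckets(follower_counts):
    """Group integer follower counts by floor(log10(n)).

    Bucket 0 holds 1-9 followers, bucket 1 holds 10-99, and so on.
    Users with zero followers have no logarithm, so they go under None.
    """
    buckets = Counter()
    for n in follower_counts:
        if n <= 0:
            buckets[None] += 1
        else:
            # For a positive integer, digits - 1 equals floor(log10(n)),
            # and avoids floating-point rounding at exact powers of ten.
            buckets[len(str(n)) - 1] += 1
    return dict(buckets)


# Made-up sample data.
followers = [0, 3, 9, 10, 57, 100, 999, 1000, 24000]
buckets = log10_buckets(followers)
for exponent in sorted(k for k in buckets if k is not None):
    print(f"10^{exponent}+ followers: {buckets[exponent]}")
```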

Have our stargazers changed over time?

Is there an evolution in the attributes of new stargazers as they discover the CockroachDB project? We looked at the average counts for followers and total commits. Turns out earlier interest in the project is correlated with more GitHub involvement, as measured by both number of commits and number of followers.

But earlier stargazers have also had more time to gain followers and merge pull requests. What if we correct for that by normalizing by the age of each stargazer’s GitHub account? Even when normalized by age, both average followers and commits are still positively correlated with earlier interest in CockroachDB.

[Figure: average followers and commits of stargazers over time]
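
The age normalization can be sketched as a per-year rate, averaged over cohorts of stargazers in the order they starred the project. This is one plausible reading of the correction, not necessarily the exact calculation behind the chart. The function names, field names, cohort size, and sample data are hypothetical.

```python
from datetime import date


def per_year(count, account_created, as_of):
    """Divide a lifetime count by the account's age in years."""
    years = (as_of - account_created).days / 365.25
    return count / years if years > 0 else 0.0


def cohort_averages(stargazers, as_of, cohort_size):
    """Average followers per account-year for successive starring cohorts."""
    ordered = sorted(stargazers, key=lambda s: s["starred_at"])
    averages = []
    for i in range(0, len(ordered), cohort_size):
        cohort = ordered[i:i + cohort_size]
        rates = [per_year(s["followers"], s["created_at"], as_of) for s in cohort]
        averages.append(sum(rates) / len(rates))
    return averages


# Made-up sample data: four accounts, all exactly four years old.
created = date(2012, 1, 1)
sample = [
    {"starred_at": date(2015, 7, 1), "followers": 80, "created_at": created},
    {"starred_at": date(2014, 2, 1), "followers": 400, "created_at": created},
    {"starred_at": date(2015, 6, 1), "followers": 40, "created_at": created},
    {"starred_at": date(2014, 3, 1), "followers": 200, "created_at": created},
]
averages = cohort_averages(sample, as_of=date(2016, 1, 1), cohort_size=2)
print(averages)
```

If the normalized averages still fall from the earliest cohort to the latest, account age alone does not explain the difference. The same function works for commits by swapping the field.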

Interested in looking at another repository?

You can use the stargazers app yourself to analyze the composition of starring users on any GitHub repository. In the case of cockroachdb/cockroach, it required querying 24 GB of data from the API. At 5,000 API requests per hour, it took a couple of days to run, so it’s not exactly for the faint of heart if you have a repository with a substantial following.
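
Before starting a crawl of your own, a little arithmetic gives a feel for the cost. The sketch below assumes the 5,000 requests per hour mentioned above, reads "a couple of days" as roughly 48 hours, and assumes 100 items per page for list requests. The function names are invented, and real runs will vary with retries and caching.

```python
import math


def request_budget(hours, per_hour=5000):
    """Number of API requests available in the given number of hours."""
    return hours * per_hour


def hours_needed(requests, per_hour=5000):
    """Wall-clock hours to issue `requests` at a fixed hourly limit."""
    return requests / per_hour


def pages_needed(item_count, per_page=100):
    """Requests needed to list `item_count` items, one page per request."""
    return math.ceil(item_count / per_page)


# Roughly two days of crawling at the stated limit.
print(request_budget(48))

# Listing about 6,000 stargazers is cheap by comparison; most of the
# budget goes to the per-user follow-up queries.
print(pages_needed(6000))
```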
