What can we learn from our GitHub stars?
It’s been almost two years since CockroachDB became a GitHub project. In that time, the project has racked up more than 6,000 GitHub stars, which are a simple way for GitHub users to bookmark repositories that interest them. Naturally, we’ve wondered how people find out about our project. Are there things we could do to accelerate awareness and interest?
GitHub and the New Era of Open Source Community
I’m going to wax rhapsodic about open source for a moment. My first experience with “open source” was typing in BASIC programs on an Apple IIc. I was soaking up the smarts of other software engineers by manually transcribing lines of code published in Byte magazine. Years later, I discovered an open source implementation of Pascal on a BBS. I spent hours trying to puzzle out how it worked using a translation dictionary (the comments were all in German). But wow, not having to type the whole thing in from a printout was a big step forward.
At UC Berkeley, the internet was suddenly available for the first time, and I discovered there was a whole universe beyond the blinkered world of Borland and Microsoft. gcc, bash, emacs, X11, Linux…all there for the looking, and for the taking. A true embarrassment of riches! You’d search for code tarballs located in FTP archives using archie, or by searching Usenet. Archaic by today’s standards, but I assure you, another big step forward. Which brings us very roughly to the present, and another evolutionary step.
GitHub, by some potent combination of critical mass and ease of use, has added a significant community dimension to open source projects. OSS projects have become living things, growing and evolving through the attention and ministration of many intelligences. And there are 10 million of them! If you take a little time to dig around using GitHub’s API, you can start to get some idea of how interconnected they are, and how they inform each other.
What drives GitHub stars?
The first thing I did was to take a look at the data and try to match up any notable discontinuities in the accumulation of GitHub stars with exogenous events. Turns out that press matters! Our first ever mention was on Hacker News, then in Wired, then in another pivotal Hacker News story, followed by a glut of news when we announced that we were starting a company and had received funding (VB, Wired, WSJ). Interestingly, the announcement of FoundationDB being acquired by Apple drove interest as well. Conference talks work well too: there was a presentation by Tobias at FOSDEM and a talk I gave at CoreOS Fest.
Now, unless you happen to be a billionaire reality television star with an interesting hairdo, press isn’t something you can easily gin up on a whim by saying the same things over and over again. But you can create new content, this blog being one example, and if it’s interesting enough, it might get a mention on Hacker News and even spawn a discussion. In fact, the positive impact Hacker News can have on your project’s GitHub stars was analyzed in this Reddit post. Contentious topics aren’t something to fear either; they get noticed.
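For the curious, the discontinuity hunt described at the start of this section can be approximated in a few lines. This is only a rough sketch, not the analysis we actually ran: the function name `find_star_spikes`, the 3x-median threshold, and the sample dates are all made up for illustration.

```python
from collections import Counter
from datetime import date
from statistics import median

def find_star_spikes(starred_dates, factor=3.0):
    """Return (day, count) pairs whose star count exceeds factor x the median active day.

    Days with no stars at all are not part of the baseline, which is an
    arbitrary simplification for this sketch.
    """
    per_day = Counter(starred_dates)
    baseline = median(per_day.values())
    return sorted((day, n) for day, n in per_day.items() if n > factor * baseline)

# Hypothetical data: three quiet days and one day with a burst of stars.
sample = (
    [date(2015, 1, 1)] * 2
    + [date(2015, 1, 2)] * 3
    + [date(2015, 1, 3)] * 40
    + [date(2015, 1, 4)] * 4
)
spikes = find_star_spikes(sample)
print(spikes)
```

Each flagged day is then a candidate to line up by hand against press mentions, talks, and announcements.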
What else are our stargazers starring?
People seem to really like starring projects. Our 6,000+ “stargazers” (GitHub’s term for them) have starred more than 1.2M repos (for this analysis, we stopped counting at 300 stars per user, as some people apparently star many thousands). Only 227K of the 1.2M are unique, so there’s significant overlap in interests. If you think about it, that overlap represents other repositories that are correlated with CockroachDB in terms of our stargazers’ interests. Curious to learn more, we counted the overlap and ranked the most commonly co-starred repositories.
The top 15:
|CockroachDB’s Most Correlated Starred Repos|
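The co-starring count behind a ranking like this is simple to sketch. The version below is a minimal illustration under stated assumptions, not the code we ran: `rank_costarred`, the shape of `starred_by_user`, and the `example/...` repository names are hypothetical, and only the 300-star cap comes from the analysis described above.

```python
from collections import Counter

MAX_STARS_PER_USER = 300  # the per-user cap mentioned in the text

def rank_costarred(starred_by_user, own_repo, top_n=15):
    """Rank repositories by how many of our stargazers also starred them."""
    counts = Counter()
    for repos in starred_by_user.values():
        capped = repos[:MAX_STARS_PER_USER]
        # Use a set so each user counts at most once per repo,
        # and drop our own repo since every stargazer has starred it.
        counts.update(set(capped) - {own_repo})
    return counts.most_common(top_n)

# Hypothetical input: user login -> list of starred repositories.
starred = {
    "alice": ["cockroachdb/cockroach", "example/kv", "example/raft"],
    "bob": ["cockroachdb/cockroach", "example/kv"],
    "carol": ["cockroachdb/cockroach", "example/kv", "example/raft", "example/ui"],
}
ranking = rank_costarred(starred, "cockroachdb/cockroach")
print(ranking)
```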
What projects do our stargazers contribute to?
But what about our stargazers themselves? What exactly are they up to? To answer that question, we took a look at what repositories our stargazers subscribe to. A subscription often implies contributory activity, and at the very least a high level of interest.
Here are the top repositories sorted by the number of subscribed stargazers who are also committers, with total commits, additions, and deletions made by our stargazers:
|CockroachDB Stargazers’ Most Committed Repos|
So a contingent of our stargazers are also active committers to highly correlated projects.
How much do our stargazers contribute to open source projects?
Turns out roughly 2,200 of the 6,000 stargazers have made at least one commit to a repository. To make the numbers meaningful, we included only those repositories with at least 25 of their own stargazers, 10 forks, or 10 open issues. All told, our stargazers made 728K commits to 36K repositories meeting those minimum thresholds! Among the stargazers who committed, the average number of commits was 325 and the median was 64.
The top 15 most prolific stargazers:
|CockroachDB Stargazers’ Aggregate Commit Stats|
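The repository thresholds and the mean and median described above can be sketched as follows. This reads the thresholds as alternatives (a repository qualifies if it meets any one of them), which is an assumption about the original analysis; the field names, `commit_stats`, and the sample data are likewise hypothetical.

```python
from statistics import mean, median

def is_significant(repo):
    """Assumed reading: any one of the three thresholds is enough."""
    return (
        repo["stargazers"] >= 25
        or repo["forks"] >= 10
        or repo["open_issues"] >= 10
    )

def commit_stats(commits, repos):
    """Return (mean, median) commits per user, counting only significant repos.

    commits: list of (user, repo_name, commit_count) tuples
    repos:   repo_name -> dict with stargazers, forks, open_issues
    """
    per_user = {}
    for user, repo_name, n in commits:
        if is_significant(repos[repo_name]):
            per_user[user] = per_user.get(user, 0) + n
    totals = list(per_user.values())
    return mean(totals), median(totals)

# Hypothetical sample: "b/toy" fails every threshold and is ignored.
repos = {
    "a/big": {"stargazers": 120, "forks": 3, "open_issues": 0},
    "b/toy": {"stargazers": 2, "forks": 0, "open_issues": 1},
    "c/busy": {"stargazers": 5, "forks": 12, "open_issues": 4},
}
commits = [("u1", "a/big", 10), ("u1", "b/toy", 500), ("u2", "c/busy", 4), ("u3", "a/big", 1)]
avg, med = commit_stats(commits, repos)
print(avg, med)
```

A mean well above the median, as in our real numbers, is what you would expect when a handful of very prolific committers pull the average up.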
Who’s paying attention to our stargazers?
Another thing we can look at is our stargazers’ followers. Turns out that, summed across all of our stargazers, the follower counts come to 216K. Only 112K of those 216K followers are unique GitHub users, so again, there’s significant overlap. Perhaps not surprising, but that’s a lot of connectedness. Certainly to the extent we have any luck getting these people to contribute to or use CockroachDB, it’s difficult to imagine a better way to drive developer adoption.
Here’s a histogram of follower counts for our stargazers (note that the follower counts are plotted on a log10 scale, so 1 means 10 followers, 2 means 100, and 3 means 1,000):
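To make the log scale concrete, here is a small sketch of how follower counts might be bucketed by order of magnitude. The function name and sample counts are invented for illustration, and the real histogram may well have used finer bins.

```python
from collections import Counter

def log10_histogram(follower_counts):
    """Bucket positive integer counts by floor(log10(n)): 0 -> 1-9, 1 -> 10-99, ..."""
    buckets = Counter()
    for n in follower_counts:
        if n > 0:  # log10 is undefined at zero, so users with no followers are skipped
            # For a positive integer, the digit count minus one equals floor(log10(n)).
            buckets[len(str(n)) - 1] += 1
    return dict(sorted(buckets.items()))

# Hypothetical follower counts for six users.
histogram = log10_histogram([0, 3, 12, 85, 100, 4200])
print(histogram)
```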
Have our stargazers changed over time?
Is there an evolution in the attributes of new stargazers as they discover the CockroachDB project? We looked at the average counts for followers and total commits. Turns out earlier interest in the project is correlated with more GitHub involvement, measured by both number of commits and number of followers.
But what if we correct for the age of each stargazer’s GitHub account? The suspicion is that earlier stargazers have simply had more time to gain followers and merge pull requests. However, even when normalized by account age, both average followers and commits are still positively correlated with earlier interest in CockroachDB.
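One simple way to do this kind of normalization is to divide each total by the account’s age in years. The sketch below assumes that approach; the exact normalization used in our analysis isn’t spelled out here, and the function name and sample dates are hypothetical.

```python
from datetime import date

def per_year_rate(total, account_created, as_of):
    """Followers (or commits) per year of GitHub account age."""
    years = (as_of - account_created).days / 365.25
    return total / years if years > 0 else 0.0

# Hypothetical users: an older account with more followers overall,
# and a newer account that gains followers faster per year.
as_of = date(2015, 1, 1)
older = per_year_rate(100, date(2010, 1, 1), as_of)
newer = per_year_rate(30, date(2014, 1, 1), as_of)
print(round(older, 1), round(newer, 1))
```

Comparing the per-year rates instead of the raw totals removes the head start that older accounts have.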
Interested in looking at another repository?
You can use the stargazers app yourself to analyze the composition of starring users on any GitHub repository. In the case of cockroachdb/cockroach, it required querying 24 GB of data from the API. At 5,000 API requests per hour, it took a couple of days to run, so it’s not exactly for the faint of heart if you have a repository with a substantial following.
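If you want a feel for the time budget before you start, the rate-limit arithmetic is easy to sketch. Only the 5,000 requests per hour figure comes from the text above; the 48-hour run length is a round stand-in for “a couple of days,” and the helper names are made up.

```python
import math

RATE_LIMIT_PER_HOUR = 5000  # the hourly API request limit quoted above

def requests_in(hours, per_hour=RATE_LIMIT_PER_HOUR):
    """Maximum number of API requests that fit in the given number of hours."""
    return hours * per_hour

def hours_needed(total_requests, per_hour=RATE_LIMIT_PER_HOUR):
    """Whole hours needed to issue total_requests at the rate limit."""
    return math.ceil(total_requests / per_hour)

# Assumed run length: 48 hours, standing in for "a couple of days".
budget = requests_in(48)
print(budget, hours_needed(budget))
```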