Stargazers: A tool for analyzing your GitHub stars

It’s been over six years since CockroachDB became a GitHub project. In that time, the project has racked up more than 20,000 GitHub stars. Stars are a simple way for GitHub users to bookmark repositories that interest them. Naturally, we’ve wondered how people find out about our project. Are there things we could do to accelerate awareness and interest?

Years ago I built stargazers, a tool to query the CockroachDB repository for information about its GitHub stars and analyze the results. When this blog was originally written, we had 6,000+ stars (which felt like a lot), and the data here is based on that original set of 6,000 stargazers.

GitHub and the New Era of Open Source Community

I’m going to wax rhapsodic about open source for a moment. My first experience with “open source” was typing in BASIC programs on an Apple IIc. I was soaking up the smarts of other software engineers by manually transcribing lines of code published in Byte magazine. Years later, I discovered an open source implementation of Pascal on a BBS. I spent hours trying to puzzle out how it worked using a translation dictionary (the comments were all in German). But wow, not having to type the whole thing in from a printout was a big step forward.

At UC Berkeley, the internet was suddenly available for the first time, and I discovered there was a whole universe beyond the blinkered world of Borland and Microsoft. gcc, bash, emacs, X11, Linux…all there for the looking, and for the taking. A true embarrassment of riches! You’d search for code tarballs located in FTP archives using archie, or by searching Usenet. Archaic by today’s standards, but I assure you, another big step forward. Which brings us very roughly to the present, and another evolutionary step.

GitHub, by some potent combination of critical mass and ease of use, has added a significant community dimension to open source projects. OSS projects have become living things, growing and evolving through the attention and ministration of many intelligences. If you take a little time to dig around using GitHub’s API, you can start to get some idea of how interconnected they are, and how they inform each other.

What drives GitHub stars?

The first thing I did was to take a look at the data and try to match up any notable discontinuities in the accumulation of GitHub stars with exogenous events. Turns out that press matters! Our first-ever mention was on Hacker News, then Wired, then another pivotal Hacker News story, followed by a glut of news when we announced that we were starting a company and had received funding (VB, Wired, WSJ). Interestingly, the announcement of FoundationDB being acquired by Apple drove interest as well. Conference talks work well too: there was a presentation by Tobias at FOSDEM and a talk I gave at CoreOS Fest.

[Figure: GitHub stars over time]
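
If you want to hunt for those discontinuities programmatically rather than by eyeballing a chart, here is a minimal sketch. It assumes you have already collected the calendar date on which each star was given; the function names and the "3x the median day" threshold are illustrative choices of mine, not part of the stargazers tool.

```python
from collections import Counter
from datetime import date
from statistics import median

def daily_star_counts(starred_dates):
    """Count how many stars arrived on each calendar day."""
    return Counter(starred_dates)

def spike_days(starred_dates, factor=3.0):
    """Return the days whose star count exceeds `factor` times the median day.

    The median is taken over days that received at least one star, so
    completely quiet days do not drag the baseline down. The threshold
    `factor` is an arbitrary assumption; tune it for your repository.
    """
    counts = daily_star_counts(starred_dates)
    typical = median(counts.values())
    return sorted(day for day, n in counts.items() if n > factor * typical)

# Hypothetical data: two quiet days and one busy day.
example = (
    [date(2015, 1, 1)] * 2
    + [date(2015, 1, 2)] * 2
    + [date(2015, 1, 3)] * 10
)
spikes = spike_days(example)
```

Each flagged day is then a candidate to line up against a press mention, a talk, or a Hacker News thread.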

Needle-moving press mentions are difficult to gin up for small startups, but you can create new content (this blog being one example). If it’s interesting enough, it might get a mention on Hacker News and even spawn a discussion. In fact, the positive impact Hacker News can have on your project’s GitHub stars was analyzed in this Reddit post. Contentious topics aren’t something to fear either; they get noticed.

What else are our stargazers starring?

People seem to really like starring projects. At the time this blog was originally published, we had 6,000+ “stargazers” (GitHub’s term for them) who had collectively starred more than 1.2M repos (for this analysis, we stopped counting at 300 stars per user, as some people apparently star many thousands). Only 227K of those 1.2M are unique, so there’s significant overlap in interests. If you think about it, that overlap represents other repositories which are correlated to CockroachDB in terms of our stargazers’ interests. Again, curious to learn more, we did an analysis counting the overlap and ranking the most commonly co-starred repositories.

The top 15:

[Figure: Top 15 most correlated starred repos]
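
The overlap counting described above can be sketched in a few lines. This is only an illustration under my own assumptions: `starred_by_user` is a hypothetical dictionary mapping each stargazer to the list of repositories they starred, and the function names are made up for this example rather than taken from the stargazers tool.

```python
from collections import Counter

MAX_STARS_PER_USER = 300  # the per-user cap used in the analysis above

def co_starred_ranking(starred_by_user, own_repo, top_n=15):
    """Rank the repositories most often starred by the same users.

    Each user's list is truncated to MAX_STARS_PER_USER entries, and
    `own_repo` is excluded because every stargazer has starred it.
    """
    counts = Counter()
    for repos in starred_by_user.values():
        capped = repos[:MAX_STARS_PER_USER]
        # A set ensures each user counts at most once per repository.
        counts.update(set(capped) - {own_repo})
    return counts.most_common(top_n)

def overlap_summary(starred_by_user):
    """Return (total stars counted, number of unique repositories)."""
    capped = [repos[:MAX_STARS_PER_USER] for repos in starred_by_user.values()]
    total = sum(len(repos) for repos in capped)
    unique = len({repo for repos in capped for repo in repos})
    return total, unique

# Hypothetical data for three stargazers.
example = {
    "alice": ["x/y", "cockroachdb/cockroach", "p/q"],
    "bob": ["x/y", "cockroachdb/cockroach"],
    "carol": ["p/q", "x/y", "cockroachdb/cockroach"],
}
ranking = co_starred_ranking(example, "cockroachdb/cockroach", top_n=2)
```

The gap between the two numbers returned by `overlap_summary` is the same kind of gap as 1.2M total versus 227K unique: the bigger it is, the more your stargazers' interests cluster.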

What projects do our stargazers contribute to?

But what about our stargazers themselves? What exactly are they up to? To answer that question, we took a look at what repositories our stargazers subscribe to. A subscription often implies contributory activity, and at the very least a high level of interest.

Here are the top repositories sorted by the number of subscribed stargazers who are also committers, with total commits, additions, and deletions made by our stargazers:

[Figure: CockroachDB stargazers’ most committed repos]

How much do our stargazers contribute to open source projects?

Turns out roughly 2,200 of the 6,000 stargazers have made at least one commit to a repository. To make the numbers meaningful, we included only those repositories with at least 25 of their own stargazers, 10 forks, or 10 open issues. All told, our stargazers made 728K commits to 36K repositories meeting those minimum thresholds! The average number of commits was 325 and the median was 64.
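
As a rough illustration of that filtering step (not the actual stargazers code), the sketch below keeps a repository if it meets any one of the three thresholds, then computes the mean and median commits per contributing stargazer. Reading the thresholds as "any one of" is my assumption, and the dictionary keys and function names are hypothetical.

```python
from statistics import mean, median

def is_significant(repo):
    """True if the repo meets any one of the minimum thresholds."""
    return (
        repo["stargazers"] >= 25
        or repo["forks"] >= 10
        or repo["open_issues"] >= 10
    )

def commit_stats(commits_by_user):
    """Return (mean, median) commits per stargazer with at least one commit.

    `commits_by_user` maps a user to a list of (repo, commit_count) pairs.
    Commits to repositories below the thresholds are ignored, and users
    left with zero commits are excluded. Raises statistics.StatisticsError
    if no user has any qualifying commits.
    """
    totals = []
    for contributions in commits_by_user.values():
        total = sum(n for repo, n in contributions if is_significant(repo))
        if total > 0:
            totals.append(total)
    return mean(totals), median(totals)

# Hypothetical repositories and contributions.
big = {"stargazers": 400, "forks": 30, "open_issues": 12}
small = {"stargazers": 2, "forks": 0, "open_issues": 1}
example = {
    "alice": [(big, 10), (small, 5)],
    "bob": [(big, 30)],
    "carol": [(small, 7)],
}
avg, med = commit_stats(example)
```

When the mean sits far above the median, as 325 versus 64 does here, a small number of very prolific committers are pulling the average up.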

The top 15 most prolific stargazers:

[Figure: CockroachDB stargazers’ aggregate commit stats]

Another thing we can look at is our stargazers’ followers. Turns out that, added up, our stargazers have 216K followers on GitHub. Only 112K of those 216K are unique users, so again, there’s significant overlap. Perhaps not surprising, but that’s a lot of connectedness. Certainly to the extent we have any luck getting these people to contribute to or use CockroachDB, it’s difficult to imagine a better way to drive developer adoption.

Here’s a histogram of follower counts for our stargazers (note that the follower counts are on a base-10 logarithmic scale, so 1 means 10 followers, 2 means 100, and 3 means 1,000):

[Figure: Histogram of CockroachDB stargazers’ follower counts]
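
To make the logarithmic bucketing concrete, here is a small sketch that groups follower counts by order of magnitude. It is an assumption about how such a histogram could be built, not the code behind the chart; the function name is mine, and users with zero followers are simply skipped because the logarithm of zero is undefined.

```python
from collections import Counter

def log10_follower_histogram(follower_counts):
    """Bucket follower counts by floor(log10(n)).

    Bucket 0 holds 1-9 followers, bucket 1 holds 10-99, bucket 2 holds
    100-999, and so on. For a positive integer, floor(log10(n)) equals
    the number of decimal digits minus one, which avoids floating-point
    rounding at exact powers of ten.
    """
    buckets = Counter()
    for n in follower_counts:
        if n < 1:
            continue  # log10 is undefined for zero followers
        buckets[len(str(n)) - 1] += 1
    return dict(sorted(buckets.items()))

# Hypothetical follower counts.
histogram = log10_follower_histogram([0, 5, 10, 99, 100, 1500])
```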

Have our stargazers changed over time?

Is there an evolution in the attributes of new stargazers as they discover the CockroachDB project? We looked at the average counts for followers and total commits. Turns out earlier interest in the project is correlated with more GitHub involvement, measured by both number of commits and number of followers.

But what if we correct for this by normalizing by the age of the stargazer’s GitHub account? The suspicion is that earlier stargazers have had more time to gain followers and merge pull requests. However, even when normalized by age, both average followers and commits are still positively correlated with earlier interest in CockroachDB.

[Figure: Average followers and commits of new stargazers over time]
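
One simple way to do that normalization is to divide each count by the age of the account in years. The sketch below shows the idea; the per-year unit, the function name, and the sample dates are my own assumptions rather than the exact method used for the chart.

```python
from datetime import date

def per_year_rate(total, account_created, as_of):
    """Normalize a count (followers or commits) by account age in years.

    Returns 0.0 for accounts created on or after `as_of`, to avoid
    dividing by zero or a negative age.
    """
    age_days = (as_of - account_created).days
    if age_days <= 0:
        return 0.0
    return total / (age_days / 365.25)

# Hypothetical stargazer: 100 followers on an account two years old.
rate = per_year_rate(100, date(2013, 1, 1), date(2015, 1, 1))
```

Averaging these rates over stargazers grouped by when they starred the project gives an age-corrected version of the comparison.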

Interested in looking at another repository?

You can use the stargazers app yourself to analyze the composition of starring users on any GitHub repository. In the case of cockroachdb/cockroach, it required querying 24G of data from the API. At 5,000 API requests per hour, it took a couple of days to run, so it’s not exactly for the faint of heart if you have a repository with a substantial following.
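
Before kicking off a run on your own repository, a back-of-the-envelope estimate helps. The sketch below assumes the 5,000 requests per hour limit mentioned above and a page size of 100 items per request; the request total in the example is a made-up figure chosen to match "a couple of days", not a number measured from our run.

```python
import math

RATE_LIMIT_PER_HOUR = 5000  # the hourly API limit quoted above
PER_PAGE = 100              # assumed page size for list endpoints

def pages_needed(item_count, per_page=PER_PAGE):
    """Number of paginated requests needed to list `item_count` items."""
    return math.ceil(item_count / per_page)

def estimated_hours(total_requests, per_hour=RATE_LIMIT_PER_HOUR):
    """Lower bound on wall-clock hours if the rate limit is the bottleneck."""
    return total_requests / per_hour

stargazer_pages = pages_needed(6000)   # listing 6,000 stargazers
hours = estimated_hours(240_000)       # hypothetical total request count
```

Listing the stargazers themselves is cheap. The bulk of the requests come from following each stargazer out to their starred repositories, subscriptions, followers, and commit stats.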

About the author

Spencer Kimball is the co-founder and CEO of Cockroach Labs, where he maintains a delicate balance between a love for programming distributed systems and the excitement of helping the company grow smoothly. He cut his teeth on databases during the dot com heyday, and had a front row seat at Google for a decade’s worth of their evolution.
