Can We Trust GitHub Stars?

GitHub stars are an essential growth factor for many open source projects, but they can easily come from bot accounts. How can we trust GitHub stars again?

For open source projects on GitHub, stars are a key metric. Of course, there are ways to abuse this system, as you might have heard recently. As an open source company, we want our community's legitimacy to be transparent, and we want to help other open source projects do the same.

In the past, there have been many occurrences of people abusing the GitHub API:

  • Gaining followers quickly by making their account follow thousands of other people, hoping they follow back
  • Faking their GitHub contributions to make themselves look like outstanding open source developers
  • Automatically creating bot accounts to star repositories and artificially inflate their popularity (examples: here and here, among others)
starbot, an automated bot account creator that starred repositories (since taken down)

GitHub has taken down several of the repositories responsible for such abuses, but not all of them. The main reason these bots were so easy to write is that creating a GitHub account used to require no verification at all, not even an email confirmation link.

GitHub recently changed that by adding a verification step to account creation, but it turns out that the challenge can be switched to an audio challenge, which is significantly weaker than the standard one. Also, many of the bot accounts created over the last few years, before the verification procedures were introduced, are still there.

This means that some repositories might have used simple bot creation scripts in the past, or might currently be using more advanced ones that bypass the verification system. This has serious implications, because GitHub stars carry real weight in today's open source ecosystem.

The fastest-growing repositories in terms of stars are put on the trending page, sent in the GitHub daily mailing list to thousands of users, shown more frequently in the Explore tab, and attract more attention in general. It is also well known that many startups choose technologies based on their star counts, since stars are usually a good indicator of the size of the community behind a project.

That's why we started a side project called Astronomer, a name that refers both to analyzing GitHub stars and to fighting astroturfing.

It's an open source tool that uses the GitHub API to scan a repository's stargazers and compute an overall trust level for the repository, based on multiple statistics about those stargazers. Using Astronomer can help open source projects prove the authenticity of their communities, and stop accusing each other based on gut feelings.

Astronomer, scanning itself
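Astronomer's own data collection is not shown in this post. As a rough illustration of the kind of GitHub API call that scanning stargazers involves, here is a minimal sketch using the public REST endpoint GET /repos/{owner}/{repo}/stargazers; it is not Astronomer's code, and the helper names stargazers_request and fetch_stargazers are hypothetical. Requesting the application/vnd.github.star+json media type makes each item include a starred_at timestamp alongside the user.

```python
import json
import urllib.request

API = "https://api.github.com"


def stargazers_request(owner, repo, page=1, per_page=100, token=None):
    """Build (but do not send) a request for one page of a repository's stargazers."""
    url = f"{API}/repos/{owner}/{repo}/stargazers?per_page={per_page}&page={page}"
    # The star+json media type adds a starred_at timestamp to every item.
    headers = {"Accept": "application/vnd.github.star+json"}
    if token:
        # Authenticated requests get a much higher rate limit.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


def fetch_stargazers(owner, repo, page=1, token=None):
    """Return (login, starred_at) pairs for one page of stargazers (needs network)."""
    with urllib.request.urlopen(stargazers_request(owner, repo, page, token=token)) as resp:
        # Each item looks like {"starred_at": "...", "user": {"login": "...", ...}}
        return [(item["user"]["login"], item["starred_at"]) for item in json.load(resp)]
```

For example, fetch_stargazers("containous", "traefik", page=1) would return the first page of stargazers with their star dates. Any further per-user statistics (contributions, account age, repositories) would require additional API calls per stargazer.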

Astronomer can also give more detailed statistics for an in-depth look at a community. It computes several factors from which it derives the trust level:

  • Weighted contributions (older contributions being worth more trust)
  • Private contributions (having the lowest weight when computing overall trust)
  • Independent factors for different types of public contributions (issues created, pull requests created, code reviews, etc.), which help mitigate the impact of potential bots with large numbers of fake code contributions
  • Account age (older accounts being worth more trust)
  • Number of owned repositories
  • Every 5th percentile of the weighted contribution score, from 5 to 95
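The post does not give Astronomer's formulas, so the following is only a minimal sketch of how factors like those listed above could be computed. All weights, the one-extra-point-per-year age rule, and the names Stargazer, TYPE_WEIGHTS and trust_factors are hypothetical assumptions, not Astronomer's actual values or API.

```python
from dataclasses import dataclass, field
from datetime import date
from statistics import mean, quantiles

# Hypothetical weights: private contributions get the lowest weight,
# and each public contribution type has its own factor.
TYPE_WEIGHTS = {
    "private": 0.1,
    "commit": 1.0,
    "issue": 1.5,
    "pull_request": 2.0,
    "review": 2.0,
}


@dataclass
class Stargazer:
    login: str
    created: date      # account creation date
    owned_repos: int
    # contributions[year][kind] -> number of contributions of that kind
    contributions: dict = field(default_factory=dict)


def age_weight(year: int, today: date) -> float:
    """Older contributions earn more trust: +1 per year of age (hypothetical rule)."""
    return 1.0 + max(0, today.year - year)


def weighted_contribution_score(user: Stargazer, today: date) -> float:
    """Sum of count * type weight * age weight over all of a user's contributions."""
    return sum(
        TYPE_WEIGHTS.get(kind, 0.0) * count * age_weight(year, today)
        for year, by_kind in user.contributions.items()
        for kind, count in by_kind.items()
    )


def kind_factor(users, kind: str) -> float:
    """Average count of one contribution kind, kept separate from the other kinds."""
    return mean(
        sum(by_kind.get(kind, 0) for by_kind in u.contributions.values())
        for u in users
    )


def percentiles_5_to_95(scores):
    """The 5th, 10th, ..., 95th percentiles (19 values); needs at least two scores."""
    return quantiles(scores, n=20, method="inclusive")


def trust_factors(users, today: date) -> dict:
    """Compute the per-set factors (not a final trust level) for a list of stargazers."""
    scores = [weighted_contribution_score(u, today) for u in users]
    factors = {
        "weighted_contributions": mean(scores),
        "account_age_days": mean((today - u.created).days for u in users),
        "owned_repos": mean(u.owned_repos for u in users),
        "percentiles": percentiles_5_to_95(scores),
    }
    for kind in TYPE_WEIGHTS:
        factors[f"{kind}_factor"] = kind_factor(users, kind)
    return factors
```

Keeping one factor per contribution kind means a bot with thousands of fake commits mostly inflates commit_factor, while issue_factor and review_factor stay low, which is the kind of mitigation the list above describes.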

It scans two sets of stargazers: the first 200 users to star the project (since they are the most likely to be bots) and a random sample of stargazers. It computes a trust report for each set and combines the two into its final trust report.
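The two-set approach can be sketched as follows. This is a minimal illustration under stated assumptions, not Astronomer's implementation: it assumes the stargazer listing is paginated at 100 users per page and ordered from oldest to newest star, it samples whole random pages rather than individual users, and the 50/50 combine_reports merge is a hypothetical choice since the post does not say how the two reports are combined.

```python
import random

PER_PAGE = 100  # GitHub's REST API returns at most 100 items per page


def pages_to_scan(total_stargazers, first_n=200, random_pages=3, seed=None):
    """Pick 1-indexed pages: those holding the first `first_n` stars, plus random later pages.

    Assumes the listing is ordered from the oldest star to the newest.
    """
    total_pages = -(-total_stargazers // PER_PAGE)  # ceiling division
    first_count = min(total_pages, -(-first_n // PER_PAGE))
    first_pages = list(range(1, first_count + 1))
    remaining = list(range(first_count + 1, total_pages + 1))
    rng = random.Random(seed)
    sampled = sorted(rng.sample(remaining, min(random_pages, len(remaining))))
    return first_pages, sampled


def combine_reports(first, rand, first_weight=0.5):
    """Hypothetical merge: weighted average of every numeric factor both reports share."""
    return {
        key: first_weight * first[key] + (1 - first_weight) * rand[key]
        for key in first.keys() & rand.keys()
    }
```

For a repository with 1,050 stargazers, pages_to_scan returns pages [1, 2] for the earliest stars and three distinct pages from 3 to 11 for the random set. Sampling whole pages keeps the number of API calls small, at the cost of a less uniform sample than picking individual users.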

Astronomer then sends the trust reports it generates to the Astrolab server, which serves GitHub badges for the repositories that the community has scanned. To generate a badge for your repository, all you need to do is docker run astronomer, and it will generate one for you.

As expected, the results of scanning Traefik are very positive, since its community tends to contribute to many open source repositories. At the bottom of an Astronomer report, the generated GitHub badge URL is displayed, ready to be inserted in the repository's README file.

The next step for Astronomer will be to provide a web application to view detailed reports for all scanned repositories.

We sincerely hope that this tool will help the open source community as a whole, and that it will reduce the tension between competing open source projects by removing doubts about the legitimacy of their communities.


This is a companion discussion topic for the original entry at https://containo.us/blog/can-we-trust-github-stars-e8aa8b6b0baa/