4.5M Suspected Fake Stars in GitHub
Role and Meaning of GitHub Stars
- Many commenters say stars are essentially bookmarks: “I might want to look at this later,” not an endorsement of quality or even usage.
- Others treat star count as a rough popularity signal, e.g., when choosing between libraries (“10 stars vs 30k stars”).
- Several note that GitHub’s own docs present stars as a way to save repos, but an ecosystem has grown that treats them as clout, traction, or credibility.
Incentives and Star-Gaming
- Stars matter for CVs, perceived legitimacy, VC pitches, open-core traction, and funding; that creates strong incentives to buy or otherwise game them.
- Some see this as a classic prisoner’s dilemma: if gaming is allowed, not gaming becomes a disadvantage.
- Hackathon sponsorships that demand stars from participants, and ads pushing repos, are cited as manipulative even if not strictly fraudulent.
- One commenter notes the paper’s numbers: millions of suspected fake stars but far fewer unique accounts after de-duplication.
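The gap between flagged stars and unique accounts can be illustrated with a minimal sketch. The `(account, repo)` pairs and the `summarize` helper below are hypothetical and are not the paper's data or method; they only show why a small set of accounts can produce a much larger star count.

```python
from collections import Counter

# Hypothetical (account, repo) pairs for stars flagged as suspicious.
flagged_stars = [
    ("acct_a", "repo_1"),
    ("acct_a", "repo_2"),
    ("acct_a", "repo_3"),
    ("acct_b", "repo_1"),
    ("acct_b", "repo_2"),
    ("acct_c", "repo_3"),
]

def summarize(stars):
    """Return (flagged star count, unique account count, stars per account)."""
    unique_pairs = set(stars)  # drop exact duplicate star events
    per_account = Counter(account for account, _ in unique_pairs)
    return len(unique_pairs), len(per_account), per_account

total, accounts, per_account = summarize(flagged_stars)
print(f"{total} flagged stars from {accounts} unique accounts")
```

In this toy data one account contributes half of the flagged stars, which is the kind of concentration de-duplication exposes.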
Usefulness of Stars as a Metric
- Many argue stars are a poor quality metric: they are trivially easy to give and heavily influenced by a repo's age, hype, and the author's personality or brand.
- Others still find them useful as a first-pass filter or for sorting search results, especially when entering a new ecosystem.
- There’s skepticism that “N strangers clicked an icon” should ever be treated as a safety or security signal.
Alternative and Composite Signals
- Commonly suggested better indicators:
  - Recent commit activity and total commits.
  - Open vs. closed issues and PRs; issue resolution patterns.
  - Number of contributors and dependency usage / reverse dependencies.
  - Clone/download counts or imports (with caveats that these can also be gamed).
- Several propose multi-factor or third‑party “repo quality scores,” but doubt anyone would pay for such a service.
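A multi-factor score along the lines suggested above might look like the following sketch. Everything here is an assumption made for illustration: the `RepoSignals` fields, the weights, the 180-day decay, and the saturation caps are arbitrary choices, not values proposed in the discussion, and each input can itself be gamed.

```python
import math
from dataclasses import dataclass

@dataclass
class RepoSignals:
    """Hypothetical per-repo inputs; field names are illustrative only."""
    days_since_last_commit: int
    total_commits: int
    open_issues: int
    closed_issues: int
    contributors: int
    reverse_dependencies: int

def _log_scale(value: int, cap: int) -> float:
    """Map a non-negative count to [0, 1] on a log scale, saturating at cap."""
    return min(math.log1p(value) / math.log1p(cap), 1.0)

def quality_score(s: RepoSignals) -> float:
    """Weighted sum of normalized signals; the result lies in [0, 1]."""
    recency = math.exp(-s.days_since_last_commit / 180)  # decays as the repo goes stale
    total_issues = s.open_issues + s.closed_issues
    # With no issues at all, fall back to a neutral 0.5 rather than rewarding silence.
    resolution = s.closed_issues / total_issues if total_issues else 0.5
    parts = {
        "recency": (0.3, recency),
        "commits": (0.1, _log_scale(s.total_commits, 10_000)),
        "resolution": (0.2, resolution),
        "contributors": (0.2, _log_scale(s.contributors, 500)),
        "dependents": (0.2, _log_scale(s.reverse_dependencies, 10_000)),
    }
    return sum(weight * value for weight, value in parts.values())

# Made-up example repos.
active = RepoSignals(3, 2500, 40, 960, 85, 1200)
abandoned = RepoSignals(900, 12, 30, 2, 1, 0)
print(round(quality_score(active), 2), round(quality_score(abandoned), 2))
```

Log scaling keeps a single very large count from dominating the total, which is one way to soften the "10 stars vs 30k stars" effect for other signals.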
Detection, Defense, and Social/Trust Models
- Some think GitHub should detect and discount fake stars (e.g., only count “active developer” accounts); others argue that any rule set can easily be automated around.
- Web‑of‑trust ideas (prioritizing stars from people you follow or friends‑of‑friends) are discussed but criticized as gameable, low-signal, or misaligned with how developers actually use GitHub.
- Broader point: once a metric becomes a target it stops being a reliable measure (Goodhart’s law), so users must treat any single number with caution.
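The web-of-trust idea mentioned above can be sketched as a star count weighted by distance in the viewer's follow graph. This is a hypothetical illustration, not a GitHub feature: the `follows` mapping, the function names, and the weights (1.0 for accounts the viewer follows, 0.5 for friends-of-friends, 0 otherwise) are all assumptions.

```python
from collections import deque

def trust_distances(follows, viewer, max_depth=2):
    """Breadth-first search over the follow graph; returns {account: hops from viewer}."""
    dist = {viewer: 0}
    queue = deque([viewer])
    while queue:
        current = queue.popleft()
        if dist[current] == max_depth:
            continue  # do not expand beyond friends-of-friends
        for nxt in follows.get(current, ()):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist

def weighted_stars(stargazers, follows, viewer, weights=None):
    """Sum per-stargazer weights; accounts outside the viewer's network count as 0."""
    weights = weights or {0: 1.0, 1: 1.0, 2: 0.5}
    dist = trust_distances(follows, viewer)
    return sum(weights.get(dist.get(account), 0.0) for account in set(stargazers))

# Made-up follow graph and stargazer list.
follows = {"me": ["alice", "bob"], "alice": ["carol"], "bot1": ["bot2"]}
stargazers = ["alice", "carol", "bot1", "bot2", "bot3"]
print(len(stargazers), weighted_stars(stargazers, follows, "me"))
```

Here five raw stars shrink to a weighted count of 1.5 because only two stargazers are within two hops of the viewer. The thread's objections still apply: most developers follow few accounts, so the signal is sparse, and one compromised or careless followed account lets outside stars back in.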