Github Health Categories and Metrics Suggestions #7
Comments
Thanks, these are some good ideas!
This issue was moved to ComputationalMystics/ResearchProject#1
sgoggins pushed a commit that referenced this issue on Jan 1, 2023:
feat: add stubs for complexity metrics
Hello Everyone,
In working with @germonprez in his class at the University of Nebraska Omaha, I was asked to identify categories and some metrics associated with them. He suggested posting this here for discussion.
When presented with this problem, I first took a step back and looked at how repositories flourish on GitHub. It is important to note that the code itself is not the only thing of concern: the goal is to assess and rank repositories according to what a given user cares about. From this I derived five key areas:
• Community – Active contributors to the Repository and their growth and activity.
• Code – Quality and reliability of the source code itself.
• Assistance – Quality and helpfulness of issue resolution.
• Adaptability – The ability for the code to have a variety of uses.
• Licensing – Terms under which the code may be used.
It is important to segregate these concerns and be able to judge them separately, as this makes the system more adaptable to varying needs. An entity that only plans to use the source internally may not need to consider the license, but is still concerned about the quality of the code and the community that supports it.
Community refers to the active contributors who support a repository. Looking at each contributor's activity, both on the repository and on GitHub in general, indicates how much time they commit to this repo compared with their other projects. This category should also track the interaction, closeness, and growth of the contributors over time. A few metrics that would apply are:
• number of contributors
• frequency of contributions
• activity level of contributors
• Truck Factor (“the number of developers it would need to lose to destroy its progress,” from the Influence Analysis of GitHub Repositories)
• Time to become contributor
• Distribution of work across community
• rate of acceptance of new contributions
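Several of these community metrics can be computed directly from a commit log. As a minimal sketch, here is one common heuristic for the Truck Factor: the smallest number of top contributors whose commits cover a majority of all commits. The 50% coverage threshold and the sample commit log are illustrative assumptions, not part of the original proposal.

```python
from collections import Counter

def truck_factor(commit_authors, coverage=0.5):
    """Smallest number of top contributors whose combined commits
    cover `coverage` of all commits -- a rough Truck Factor proxy.
    The 0.5 threshold is an arbitrary, illustrative choice."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    covered = 0
    for n, (_author, c) in enumerate(counts.most_common(), start=1):
        covered += c
        if covered / total >= coverage:
            return n
    return len(counts)

# Hypothetical commit log: one author name per commit.
log = ["alice"] * 60 + ["bob"] * 25 + ["carol"] * 10 + ["dan"] * 5
```

With this log, `truck_factor(log)` returns 1, since one contributor alone accounts for a majority of commits; the same counts also give the "distribution of work across community" metric for free.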
Code is probably both the easiest category to describe and the hardest to evaluate. Ideally, we want to know that the code is routinely kept up to date, is clean and well documented, and will continue to stay that way for the foreseeable future. This is easier said than done; one thing that makes the category easier to analyze is that it has the most metadata to work with. A few metrics that would apply are:
• number of updates
• regularity of updates
• time since last update
• number of pull rejections
• number of CVEs and how long they remain open – https://www.coreinfrastructure.org/programs/census-project
• 3rd party Dependencies (if obtainable) – https://www.coreinfrastructure.org/programs/census-project
• stars
• overall size of the repository / commits
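The update-related metrics above reduce to arithmetic on commit timestamps. The following sketch, using made-up dates, computes "time since last update" plus the mean and maximum gap between commits as a proxy for "regularity of updates"; the field names are my own, not an established schema.

```python
from datetime import date

def update_metrics(commit_dates, today):
    """Days since the last commit, plus mean/max gap between commits --
    rough proxies for 'time since last update' and 'regularity of updates'."""
    dates = sorted(commit_dates)
    gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
    return {
        "days_since_last": (today - dates[-1]).days,
        "mean_gap_days": sum(gaps) / len(gaps) if gaps else None,
        "max_gap_days": max(gaps) if gaps else None,
    }

# Hypothetical commit history.
history = [date(2023, 1, 1), date(2023, 1, 8), date(2023, 1, 22)]
metrics = update_metrics(history, today=date(2023, 2, 1))
```

Here `metrics` comes out as 10 days since the last commit, with gaps of 7 and 14 days between commits (mean 10.5, max 14); a rising max gap over time would be an early warning sign for this category.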
Assistance is exactly what it sounds like: as a user of the code, how much help can you get in implementing it? While this may not be directly relevant to some entities, it is indirectly relevant to everyone: lack of support leads to lower adoption, which in turn leaves a smaller set of stakeholders willing to keep the project going. A few metrics that would apply are:
• number of open issues, broken down by label
• time to close
• communication level (response time and number)
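The "time to close" metric can be sketched as the median open-to-close duration over closed issues. The issue data below is hypothetical; a real implementation would pull opened/closed timestamps from the GitHub issues API.

```python
from datetime import date
from statistics import median

def median_close_days(issues):
    """Median days from open to close over *closed* issues.
    `issues` is a list of (opened, closed) date pairs; `closed` is
    None for still-open issues, which are excluded from the median."""
    durations = [(closed - opened).days
                 for opened, closed in issues if closed is not None]
    return median(durations) if durations else None

# Hypothetical issue-tracker export.
issues = [
    (date(2023, 1, 1), date(2023, 1, 3)),   # closed in 2 days
    (date(2023, 1, 2), date(2023, 1, 7)),   # closed in 5 days
    (date(2023, 1, 5), date(2023, 1, 14)),  # closed in 9 days
    (date(2023, 1, 10), None),              # still open
]
```

For this sample, `median_close_days(issues)` is 5; the excluded open issues feed the "number of open issues" metric instead.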
Adaptability refers to the degree to which the project could be easily adapted to your specific needs. While this is very useful, it is also extremely hard to determine from metrics. However, I believe a couple could give small, indirect indications of flexibility: the number of forks of the repository, followed by the number of downloads. A large number of forks with fewer downloads tends to indicate code that can be extended in many directions, whereas a low number of forks with a large number of downloads may indicate a project that is specific but widely useful. More research will be needed to test and refine these assumptions.
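The fork/download balance described above can be expressed as a crude classifier. The 5% fork-to-download threshold and the labels below are arbitrary placeholders, not researched values; they only illustrate how the heuristic could be operationalized.

```python
def adaptability_signal(forks, downloads, fork_ratio=0.05):
    """Crude adaptability signal from the fork/download balance.
    The 5% threshold is an arbitrary, illustrative cutoff."""
    if downloads == 0:
        return "unknown"
    if forks / downloads >= fork_ratio:
        return "extensible"   # many forks relative to downloads
    return "widely-used"      # many downloads, relatively few forks
```

For example, 500 forks against 1,000 downloads classifies as "extensible", while 10 forks against 100,000 downloads classifies as "widely-used".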
Licensing refers to the usability of the code. More restrictive licenses may be a turn-off, or may simply require more adaptability and community to be viable. A couple of metrics for licenses would be:
• Is there a license
• Number of licenses
• Flexibility of licenses
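The three license metrics could be combined into a single summary, given some ranking of license flexibility. The permissiveness scores below are a subjective assumption for illustration only, not an established scale; a real system would want a vetted mapping (e.g., from SPDX identifiers).

```python
# Illustrative permissiveness ranking (higher = more flexible).
# The ordering is a subjective assumption, not an established scale.
PERMISSIVENESS = {"MIT": 3, "Apache-2.0": 3, "LGPL-3.0": 2, "GPL-3.0": 1}

def license_metrics(licenses):
    """Summarize the three proposed license metrics for a repository.
    Unknown licenses score 0 (least flexible) as a conservative default."""
    return {
        "has_license": bool(licenses),
        "count": len(licenses),
        "min_flexibility": min(
            (PERMISSIVENESS.get(name, 0) for name in licenses), default=0
        ),
    }
```

Taking the minimum reflects that a multi-licensed project is, for a cautious consumer, only as usable as its most restrictive license.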