Github Health Categories and Metrics Suggestions #7
Comments
Thanks, these are some good ideas!
This issue was moved to ComputationalMystics/ResearchProject#1
sgoggins pushed a commit that referenced this issue on Jan 1, 2023:
feat: add stubs for complexity metrics
Hello Everyone,
In working with @germonprez in his class at the University of Nebraska Omaha, I was asked to identify categories and some metrics associated with them. He suggested posting this here for discussion.
When presented with this problem, I first took a step back and looked at how repositories flourish on GitHub. It is important to note that the code itself is not the only thing of concern: the goal is to assess and rank repositories according to what a given user cares about. From this I derived five key areas:
• Community – Active contributors to the Repository and their growth and activity.
• Code – Quality and reliability of the source code itself.
• Assistance – Quality and helpfulness of issue resolution.
• Adaptability – The ability for the code to have a variety of uses.
• Licensing – Terms under which the code may be used.
It is important to segregate these concerns and be able to judge them separately, as this makes the system more adaptable to varying needs. An entity that only plans to use the source internally may not need to consider the license, but is still concerned about the quality of the code and the community that supports it.
Community refers to the active contributors who support a repository. Looking at each contributor's activity, both on the repository and on GitHub in general, indicates how much time they commit to this repo compared with their other projects. This category should also track the interaction, closeness, and growth of the contributors over time. A few metrics that would apply are:
• number of contributors
• frequency of contributions
• activity level of contributors
• Truck Factor (“the number of developers it would need to lose to destroy its progress,” from the Influence Analysis of GitHub Repositories)
• Time to become contributor
• Distribution of work across community
• rate of acceptance of new contributions
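Several of these community metrics can be computed directly from a commit log. As a minimal sketch, here is one common heuristic for the Truck Factor: the smallest number of top contributors whose commits cover a majority of all commits. The 50% coverage threshold and the sample commit log are illustrative assumptions, not part of the original proposal.

```python
from collections import Counter

def truck_factor(commit_authors, coverage=0.5):
    """Smallest number of top contributors whose combined commits
    cover `coverage` of all commits -- a rough Truck Factor proxy.
    The 0.5 threshold is an arbitrary, illustrative choice."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    covered = 0
    for n, (_author, c) in enumerate(counts.most_common(), start=1):
        covered += c
        if covered / total >= coverage:
            return n
    return len(counts)

# Hypothetical commit log: one author name per commit.
log = ["alice"] * 60 + ["bob"] * 25 + ["carol"] * 10 + ["dan"] * 5
```

With this log, `truck_factor(log)` returns 1, since one contributor alone accounts for a majority of commits; the same counts also give the "distribution of work across community" metric for free.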
Code is probably both the easiest category to describe and the hardest to evaluate. Ideally, we want to know that the code is routinely kept up to date, is clean and well documented, and will continue to stay that way for the foreseeable future. This is easier said than done; one thing that makes the category easier to analyze is that it has the most metadata to work with. A few metrics that would apply are:
• number of updates
• regularity of updates
• time since last update
• number of pull rejections
• number of CVEs and how long they remain open – https://www.coreinfrastructure.org/programs/census-project
• 3rd party Dependencies (if obtainable) – https://www.coreinfrastructure.org/programs/census-project
• stars
• overall size of the repository / commits
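The update-related metrics above reduce to arithmetic on commit timestamps. The following sketch, using made-up dates, computes "time since last update" plus the mean and maximum gap between commits as a proxy for "regularity of updates"; the field names are my own, not an established schema.

```python
from datetime import date

def update_metrics(commit_dates, today):
    """Days since the last commit, plus mean/max gap between commits --
    rough proxies for 'time since last update' and 'regularity of updates'."""
    dates = sorted(commit_dates)
    gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
    return {
        "days_since_last": (today - dates[-1]).days,
        "mean_gap_days": sum(gaps) / len(gaps) if gaps else None,
        "max_gap_days": max(gaps) if gaps else None,
    }

# Hypothetical commit history.
history = [date(2023, 1, 1), date(2023, 1, 8), date(2023, 1, 22)]
metrics = update_metrics(history, today=date(2023, 2, 1))
```

Here `metrics` comes out as 10 days since the last commit, with gaps of 7 and 14 days between commits (mean 10.5, max 14); a rising max gap over time would be an early warning sign for this category.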
Assistance is exactly what it sounds like: as a user of the code, how much help can you get in implementing it? While this may not be directly relevant to some entities, it is indirectly relevant to everyone: lack of support leads to lower adoption, which in turn leaves a smaller set of stakeholders willing to keep the project going. A few metrics that would apply are:
• number of open issues, broken down by label
• time to close
• communication level (response time and number)
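The "time to close" metric can be sketched as the median open-to-close duration over closed issues. The issue data below is hypothetical; a real implementation would pull opened/closed timestamps from the GitHub issues API.

```python
from datetime import date
from statistics import median

def median_close_days(issues):
    """Median days from open to close over *closed* issues.
    `issues` is a list of (opened, closed) date pairs; `closed` is
    None for still-open issues, which are excluded from the median."""
    durations = [(closed - opened).days
                 for opened, closed in issues if closed is not None]
    return median(durations) if durations else None

# Hypothetical issue-tracker export.
issues = [
    (date(2023, 1, 1), date(2023, 1, 3)),   # closed in 2 days
    (date(2023, 1, 2), date(2023, 1, 7)),   # closed in 5 days
    (date(2023, 1, 5), date(2023, 1, 14)),  # closed in 9 days
    (date(2023, 1, 10), None),              # still open
]
```

For this sample, `median_close_days(issues)` is 5; the excluded open issues feed the "number of open issues" metric instead.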
Adaptability refers to the degree to which the project could be easily adapted to your specific needs. While this is very useful, it is also extremely hard to determine from metrics. However, I believe a couple could give small, indirect indications of flexibility: the number of forks of the repository, followed by the number of downloads. A large number of forks with fewer downloads tends to indicate code that can be extended in many directions, whereas a low number of forks with a large number of downloads may indicate a project that is specific but widely useful. More research will be needed to test and refine these assumptions.
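The fork/download balance described above can be expressed as a crude classifier. The 5% fork-to-download threshold and the labels below are arbitrary placeholders, not researched values; they only illustrate how the heuristic could be operationalized.

```python
def adaptability_signal(forks, downloads, fork_ratio=0.05):
    """Crude adaptability signal from the fork/download balance.
    The 5% threshold is an arbitrary, illustrative cutoff."""
    if downloads == 0:
        return "unknown"
    if forks / downloads >= fork_ratio:
        return "extensible"   # many forks relative to downloads
    return "widely-used"      # many downloads, relatively few forks
```

For example, 500 forks against 1,000 downloads classifies as "extensible", while 10 forks against 100,000 downloads classifies as "widely-used".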
Licensing refers to the usability of the code. More restrictive licenses may be a turn-off, or may simply require more adaptability and community to be viable. A couple of metrics for licenses would be:
• Is there a license
• Number of licenses
• Flexibility of licenses
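The three license metrics could be combined into a single summary, given some ranking of license flexibility. The permissiveness scores below are a subjective assumption for illustration only, not an established scale; a real system would want a vetted mapping (e.g., from SPDX identifiers).

```python
# Illustrative permissiveness ranking (higher = more flexible).
# The ordering is a subjective assumption, not an established scale.
PERMISSIVENESS = {"MIT": 3, "Apache-2.0": 3, "LGPL-3.0": 2, "GPL-3.0": 1}

def license_metrics(licenses):
    """Summarize the three proposed license metrics for a repository.
    Unknown licenses score 0 (least flexible) as a conservative default."""
    return {
        "has_license": bool(licenses),
        "count": len(licenses),
        "min_flexibility": min(
            (PERMISSIVENESS.get(name, 0) for name in licenses), default=0
        ),
    }
```

Taking the minimum reflects that a multi-licensed project is, for a cautious consumer, only as usable as its most restrictive license.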