Skip to content

Latest commit

 

History

History
27 lines (15 loc) · 1.25 KB

README.md

File metadata and controls

27 lines (15 loc) · 1.25 KB

github-classifier

short description

This repository contains a deep-learning based classification tool for software repositories. The tool utilizes the ecore metamodel 'type graph' and a graph convolutional network. To use the tool, run 'main.py' after adding the directory containing the repositories you want to classify.

If you want to train the tool with different labels, replace the current labels with your own (or add them to the labels) in GraphClasses.py, and in function 'multi_hot_encoding' in Encoder.py. Optionally also in function 'count_class_elements' in CustomDataset.py if you want to know the number of samples in each class in your dataset. The labels in the tool are not mutually exclusive and are multi-hot encoded.

Currently, the tool only processes Python files.

labels

Application, Framework, Library, Plugin

data

Dataset with Python software repositories from GitHub, all with a dependency on at least one ML library. The labeled repositories the tool is trained with are in data/labeled_dataset_repos.xlsx.

requirements

pyecore~=0.14.0 or higher versions

autopep8

GRaViTY tool for visualizing the metamodels, see "https://github.com/GRaViTY-Tool/gravity-tool?tab=readme-ov-file" for instructions on how to install the tool