Machine learning code snippets semantic classification

This repository contains the source code for the experiments in the paper "Machine learning code snippets semantic classification" (Valeriy Berezovskiy, Anastasia Gorodilova, Ekaterina Trofimova, Andrey Ustyuzhanin).

Preparation

Start by cloning the repository:

git clone https://github.com/vorobeevich/ml-snippets-classification

We highly recommend using conda (Anaconda) to run the experiments.

After installation, make a new environment:

conda create --name cssc

conda activate cssc

Install the libraries listed in requirements.txt. The appropriate torch build depends on your GPU and CUDA version; see the "Start Locally" page on the PyTorch website.

Data

Download the annotated data (7947 snippets) and the output of the partition algorithm from our Google Drive:

chmod +x ./src/scripts/load_data.sh

./src/scripts/load_data.sh

The full Code4ML dataset (annotated data, the complete set of 2.5 million snippets, and our model's predictions on all of them) is available on Zenodo.

The dataset is described in the paper "Code4ML: a Large-scale Dataset of annotated Machine Learning Code".

Usage

To reproduce any experiment from our paper, run the training script with the corresponding config. Note that results may not be exactly reproducible across platforms, even with a fixed random seed, because libraries such as torch are not fully deterministic.

python src/scripts/train.py --device [ID OF CUDA DEVICE] --config src/configs/[CHOOSE CONFIG TO RUN]
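
To illustrate the note on random seeds, here is a minimal sketch of how seeds are commonly fixed in a Python training script. It is an assumption about typical practice, not the code used in `src/scripts/train.py`; the names `seed_everything` and `draw` are hypothetical. Even with all of this in place, some GPU kernels can still produce slightly different results across hardware and library versions.

```python
import os
import random


def seed_everything(seed: int) -> None:
    """Seed every RNG that is available in the current environment.

    Hypothetical helper for illustration; not part of this repository.
    """
    random.seed(seed)
    # Only affects hash randomization in subprocesses started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)

    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # numpy is optional for this sketch

    try:
        import torch

        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
        # Ask cuDNN for deterministic kernels; some ops may still vary.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass  # torch is optional for this sketch


def draw(seed: int, n: int = 3) -> list:
    """Return n pseudo-random floats after reseeding, to show repeatability."""
    seed_everything(seed)
    return [random.random() for _ in range(n)]


if __name__ == "__main__":
    # The same seed gives the same sequence within one environment.
    print(draw(0) == draw(0))
```

Calling `seed_everything` once at the start of a run, before any model or data loader is created, is the usual placement.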
