CodeCSE

A simple pre-trained model for code and comment embeddings using contrastive learning. The pre-trained model is hosted at https://huggingface.co/sjiang1/codecse. See the inference script for how to download and use it.

Environment

This model was trained and tested with Python 3.9. Dependencies are listed in requirements.txt. This repository uses CodeBERT/GraphCodeBERT for data preparation, included as a git submodule. To initialize the submodule:

git submodule init
git submodule update

Inference

Run the example script for inference:

GCB_PATH=./CodeBERT/GraphCodeBERT/codesearch \
PYTHONPATH=./CodeBERT/GraphCodeBERT/codesearch:./codecse:$PYTHONPATH \
python inference.py

Note: The GraphCodeBERT path is placed at the beginning of PYTHONPATH because Python has a built-in 'parser' module, which conflicts with the 'parser' package in GraphCodeBERT/codesearch.
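To check which 'parser' module Python would actually import, you can inspect where the name resolves before running inference. The snippet below is a minimal sketch, assuming it is run with the same PYTHONPATH as the command above. The helper name `module_origin` is hypothetical and not part of this repository.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the location that `import name` would resolve to, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # find_spec can raise for malformed or half-initialized modules.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # With PYTHONPATH set as in the inference command, this is expected to
    # point inside GraphCodeBERT/codesearch/parser rather than the standard library.
    print(module_origin("parser"))
```

If the printed location is not under GraphCodeBERT/codesearch, the PYTHONPATH ordering is the first thing to check.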

Troubleshooting

Error: 'parser/my-languages.so' (not a mach-o file)

The error message below means that the built file 'my-languages.so' does not work on your machine.

OSError: dlopen(parser/my-languages.so, 0x0006): 
tried: 'parser/my-languages.so' (not a mach-o file), 
'/path/to/CodeCSE/CodeBERT/GraphCodeBERT/codesearch/parser/my-languages.so' (not a mach-o file)

To rebuild 'my-languages.so', please follow the instructions in GraphCodeBERT/codesearch#tree-sitter-optional.
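Before rebuilding, it can help to confirm which platform the existing shared library was built for. The snippet below is a rough sketch that reads the file's leading magic bytes. The helper name `binary_format` and the label strings are illustrative assumptions, and the path in the usage section assumes the command is run from GraphCodeBERT/codesearch.

```python
import os

# Leading magic bytes of common binary formats.
# Note: the universal (fat) Mach-O magic is shared with Java class files,
# which is not a concern when inspecting a .so file.
_MAGIC = {
    b"\x7fELF": "ELF (Linux)",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit (macOS)",
    b"\xce\xfa\xed\xfe": "Mach-O 32-bit (macOS)",
    b"\xca\xfe\xba\xbe": "Mach-O universal (macOS)",
}


def binary_format(path: str) -> str:
    """Guess the binary format of a file from its first four bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head[:2] == b"MZ":
        return "PE (Windows)"
    return _MAGIC.get(head, "unknown")


if __name__ == "__main__":
    so_path = "parser/my-languages.so"  # assumed relative location
    if os.path.exists(so_path):
        print(so_path, "->", binary_format(so_path))
```

A result of "ELF (Linux)" on a macOS machine would be consistent with the "not a mach-o file" error above.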

Error: 'ValueError: Incompatible Language version XX. Must be between YY and ZZ'

This error means that the tree-sitter package installed via pip is not compatible with the built 'parser/my-languages.so'. Upgrading tree-sitter should solve the problem.

python -m pip install tree-sitter --upgrade
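To see which tree-sitter version is installed before and after the upgrade, you can query the package metadata from Python. This is a small sketch using the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a pip package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("tree-sitter:", installed_version("tree-sitter"))
```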
