- numpy
- six
- nltk
- experiment-impact-tracker
- scikit-learn
- pandas
- flake8==3.8.3
- spacy==2.3.0
- tb-nightly==2.3.0a20200621
- tensorboard-plugin-wit==1.6.0.post3
- torch==1.6.0
- torchtext==0.4.0
- torchvision==0.7.0
- tqdm==4.46.1
- OpenNMT-py==2.0.0rc2
- Create a virtual environment with Python 3.6 installed (`virtualenv`).
- `git clone --recursive https://github.com/magnumresearchgroup/Magnum-NLC2CMD.git`
- Use `pip3 install -r requirements.txt` to install the two requirements files.
- Run `python3 main.py --mode preprocess --data_dir src/data --data_file nl2bash-data.json` and `cd src/model && onmt_build_vocab -config nl2cmd.yaml -n_sample 10347 --src_vocab_threshold 2 --tgt_vocab_threshold 2` to process the raw data.
- You can also download the original raw data here.
cd src/model && onmt_train -config nl2cmd.yaml
- Modify `world_size` in `src/model/nl2cmd.yaml` to the number of GPUs you are using and put their ids in `gpu_ranks`.
- You can also download one of our pre-trained models here.
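As a small illustration of the `world_size` and `gpu_ranks` settings described above, the sketch below derives both values from a list of GPU ids and prints them as YAML lines to copy into `nl2cmd.yaml`. The helper names `gpu_settings` and `to_yaml_lines` are hypothetical and not part of this repository.

```python
def gpu_settings(gpu_ids):
    """Return the world_size and gpu_ranks values for the given GPU ids.

    Hypothetical helper: world_size is the number of GPUs in use and
    gpu_ranks lists their ids, as described in the instructions above.
    """
    gpu_ids = list(gpu_ids)
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    return {"world_size": len(gpu_ids), "gpu_ranks": gpu_ids}


def to_yaml_lines(settings):
    """Format the two settings as lines that can be pasted into nl2cmd.yaml."""
    ranks = ", ".join(str(i) for i in settings["gpu_ranks"])
    return [
        f"world_size: {settings['world_size']}",
        f"gpu_ranks: [{ranks}]",
    ]


if __name__ == "__main__":
    # Example for a machine with two GPUs (ids 0 and 1).
    for line in to_yaml_lines(gpu_settings([0, 1])):
        print(line)
```

For a two-GPU machine this prints `world_size: 2` and `gpu_ranks: [0, 1]`.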
onmt_translate -model src/model/run/model_step_2000.pt -src src/data/invocations_proccess_test.txt -output pred_2000.txt -gpu 0 -verbose
- `python3 main.py --mode eval --annotation_filepath src/data/test_data.json --params_filepath src/configs/core/evaluation_params.json --output_folderpath src/logs --model_dir src/model/run --model_file model_step_2400.pt model_step_2500.pt`
- You can change `gpu=-1` in `src/model/predict.py` to `gpu=0`, and replace the corresponding code in `src/model/predict.py` with the following for faster inference:
    invocations = [' '.join(tokenize_eng(i)) for i in invocations]
    translated = translator.translate(invocations, batch_size=n_batch)
    commands = [t[:result_cnt] for t in translated[1]]
    confidences = [np.exp(list(map(lambda x: x.item(), t[:result_cnt]))) / 2 for t in translated[0]]
    for i in range(len(confidences)):
        confidences[i][0] = 1.0
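To make the confidence computation in the snippet above easier to follow, here is a minimal standalone sketch for a single invocation. It assumes the translator's scores are log-probabilities and uses plain floats with `math.exp` in place of tensors and `np.exp`; the function name `scores_to_confidences` is hypothetical.

```python
import math


def scores_to_confidences(log_scores, result_cnt):
    """Mirror the confidence logic of the snippet above for one invocation.

    Assumption: log_scores holds log-probabilities, best prediction first.
    """
    # Keep the top result_cnt scores, exponentiate, and halve them.
    confidences = [math.exp(s) / 2 for s in log_scores[:result_cnt]]
    # The top prediction is always reported with full confidence.
    if confidences:
        confidences[0] = 1.0
    return confidences


if __name__ == "__main__":
    print(scores_to_confidences([-0.1, 0.0, 0.0], 2))  # [1.0, 0.5]
```

Only the first `result_cnt` predictions are kept, and the first one is overwritten with `1.0` regardless of its score.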
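$$
\mathit{Score}(A(nlc)) =
\begin{cases}
\max_{p \in A(nlc)} S(p), & \text{if } \exists\, p \in A(nlc) \text{ such that } S(p) > 0; \\
\dfrac{1}{|A(nlc)|} \sum_{p \in A(nlc)} S(p), & \text{otherwise.}
\end{cases}
$$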
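The scoring rule above can be sketched in a few lines of Python. This is an illustration of the formula only, not the evaluation code used by this repository; the function name `aggregate_score` is hypothetical, and `scores` stands for the values S(p) of the predictions p in A(nlc).

```python
from typing import Sequence


def aggregate_score(scores: Sequence[float]) -> float:
    """Combine the per-prediction scores S(p) for one invocation.

    If any prediction has a positive score, return the maximum score;
    otherwise return the mean of all scores.
    """
    if not scores:
        raise ValueError("at least one prediction score is required")
    if any(s > 0 for s in scores):
        return max(scores)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(aggregate_score([-1.0, 0.5, 0.2]))  # 0.5 (a positive score exists)
    print(aggregate_score([-1.0, -0.5]))      # -0.75 (mean of the scores)
```

Under this rule a single positively scored prediction is enough to earn credit, while a set of predictions with no positive score is penalized by its average.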
- We used a machine with 2x Nvidia 2080Ti GPUs and 64 GB of memory, running Ubuntu 18.04 LTS.
- Change `batch_size` in `nl2cmd.yaml` to the largest value your GPU can support without an OOM error.
- Train multiple models by modifying `seed` in `nl2cmd.yaml`; you should also modify `save_model` to avoid overwriting existing models.
- Hand-pick the best-performing models on the local test set and put their directories in `main.py`.
This work was supported in part by NSF Award# 1552836, At-scale analysis of issues in cyber-security and software engineering.
See the LICENSE file for license rights and limitations (MIT).