A question #2
Comments
Hi, Those "predicted labels" come from exact name matching and BM25 retrieval. You can refer to Section 3.2 in our paper (https://arxiv.org/pdf/2202.05932.pdf). The contribution of BM25 in the retrieval stage is not very significant. That being said, if you want to approximately get the "predicted labels", you can implement a very simple exact name matching strategy. Specifically, if the name of a label appears in a document, it will be added to the "predicted labels". The result of this strategy should approximate what we show in MAG_candidates.json well.
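For readers who want to try the exact name matching idea described above, here is a minimal sketch. It is not the authors' code: the function name `exact_name_match`, the label IDs, the lowercasing, and the whole-word matching rule are all assumptions made for illustration, and BM25 retrieval is left out entirely.

```python
import re


def exact_name_match(document, label_names):
    """Return IDs of labels whose name appears verbatim in the document.

    `label_names` maps a (hypothetical) label ID to its name. Matching is
    case-insensitive and restricted to whole words; both choices are
    assumptions, not details confirmed by the thread.
    """
    # Lowercase and collapse runs of whitespace so multi-word names still match.
    text = " ".join(document.lower().split())
    predicted = []
    for label_id, name in label_names.items():
        name = " ".join(name.lower().split())
        if not name:
            continue  # skip empty names, which would match everywhere
        # Lookarounds instead of \b so names ending in symbols (e.g. "c++") work.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, text):
            predicted.append(label_id)
    return predicted


# Toy label set and document, invented for this example.
labels = {"L1": "machine learning", "L2": "graph theory", "L3": "learning"}
doc = "We study machine   learning methods for text classification."
print(exact_name_match(doc, labels))  # ['L1', 'L3']
```

Because every label whose name occurs is kept, short or generic names (like "learning" here) will match often; whether and how the released candidates filter such cases is not stated in the thread.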
Hello, are the "venue", "author", "reference", and "citation" properties required in {dataset}_test.json and {dataset}_train.json to run this model?
Hi, These fields are NOT required in {dataset}_test.json, but they are required in {dataset}_train.json. If your own datasets do not have such metadata information, you can use our MAG_train.json or PubMed_train.json for training and your own test set for testing. However, I cannot guarantee our model's performance in such a "transfer learning" setting.
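A quick way to see whether a training set carries the metadata mentioned above is to check each record for the four field names. The sketch below is only an illustration: it assumes each record is a JSON object stored on its own line, which the thread does not confirm, and the helper name `missing_metadata` is hypothetical. It tests only that the keys exist, not that their values are non-empty.

```python
import json

# Field names taken from the question above.
REQUIRED_TRAIN_FIELDS = ("venue", "author", "reference", "citation")


def missing_metadata(record, required=REQUIRED_TRAIN_FIELDS):
    """Return the required field names that are absent from one record."""
    return [field for field in required if field not in record]


# Two invented records, written one JSON object per line (an assumption).
lines = [
    '{"paper": "p1", "venue": "V", "author": ["a1"], "reference": [], "citation": []}',
    '{"paper": "p2", "venue": "V"}',
]
for line in lines:
    record = json.loads(line)
    print(record["paper"], missing_metadata(record))
```

Records that report missing fields would not be usable as training data according to the reply above, though they could still serve as test data.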
Oh, thanks. But if I use MAG_train.json for training and my own test set for testing, should {dataset}_label.json and {dataset}_candidates.json correspond to my own test set?
Yes, those two json files should correspond to your own test set. If you do not have ground truth labels and just want to do predictions, you can remove the last line in |
Hello! Thanks for your patient answer! I used my own data in test.json and those two JSON files. prepare.sh seemed to run successfully, but run.sh produced some errors, as follows. What might be the cause?
Namespace(adam_epsilon=1e-08, architecture='cross', bert_model='scibert_scivocab_uncased/', eval=False, eval_batch_size=128, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=1, learning_rate=5e-05, max_contexts_length=256, max_grad_norm=1.0, max_response_length=256, model_type='bert', num_train_epochs=1.0, output_dir='MAG_output/', poly_m=0, print_freq=500, seed=12345, test_file='MAG_input/test.txt', train_batch_size=4, train_dir='MAG_input/', use_pretrain=True, warmup_steps=100, weight_decay=0.01)
Hello, Sorry for my late reply. I re-ran the code on my side and it worked well, so I am not quite sure about the reason. My guess is that it is still due to package version issues. Could you please try switching to Python 3.6 and refer to https://github.com/yuzhimanhua/MICoL/blob/master/requirements.txt for the versions of torch and transformers? Thanks!
What is the origin of predicted_label in MAG_candidates.json?