
A question #2

Open
GCTTTTTT opened this issue Jul 7, 2022 · 7 comments

Comments

@GCTTTTTT commented Jul 7, 2022

I want to ask: what is the origin of predicted_label in MAG_candidates.json?

@yuzhimanhua (Owner)

Hi,

Those "predicted labels" come from exact name matching and BM25 retrieval. You can refer to Section 3.2 in our paper (https://arxiv.org/pdf/2202.05932.pdf).

The contribution of BM25 in the retrieval stage is not very significant. That being said, if you want to approximate the "predicted labels", you can implement a very simple exact name matching strategy: if the name of a label appears in a document, it is added to the "predicted labels". The result of this strategy should closely approximate what we show in MAG_candidates.json.
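As a concrete illustration, here is a minimal sketch of that exact name matching strategy. The document and label formats are simplified assumptions, not the repo's actual schema:

```python
# Minimal sketch of the exact name matching strategy described above.
# Assumptions: labels are plain strings; a document is a single text field.
def predict_labels(document_text, label_names):
    """Return every label whose name appears verbatim in the document."""
    text = document_text.lower()
    return [name for name in label_names if name.lower() in text]

# Toy example:
doc = "We study graph neural networks for extreme multi-label text classification."
labels = ["graph neural networks", "text classification", "speech recognition"]
print(predict_labels(doc, labels))
# ['graph neural networks', 'text classification']
```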

@GCTTTTTT (Author)

Hello, I want to ask whether the "venue", "author", "reference", and "citation" fields are required to run this model in {dataset}_test.json and {dataset}_train.json.

@yuzhimanhua (Owner)

Hi,

These fields are NOT required in {dataset}_test.json, but they are required in {dataset}_train.json.

If your own datasets do not have such metadata information, you can use our MAG_train.json or PubMed_train.json for training and your own test set for testing. However, I cannot guarantee our model's performance in such a "transfer learning" setting.
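To make the field requirements concrete, here are two hypothetical minimal records. Every field name other than "venue", "author", "reference", "citation", and "label" is an assumption about the schema, so check against the released MAG_train.json:

```python
# Hypothetical training record: the metadata fields are required here.
train_doc = {
    "paper": "12345",                # assumed ID field name
    "title": "A toy paper title",
    "abstract": "A toy abstract.",
    "venue": "Some Conference",      # required for training
    "author": ["a1", "a2"],          # required for training
    "reference": ["67890"],          # required for training
    "citation": ["13579"],           # required for training
    "label": ["toy_label"],
}

# Hypothetical test record: the same metadata fields can be omitted.
test_doc = {
    "paper": "24680",
    "title": "Another toy title",
    "abstract": "Another toy abstract.",
}
```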

@GCTTTTTT (Author)

Oh, thanks! But if I use MAG_train.json for training and my own test set for testing, should {dataset}_label.json and {dataset}_candidates.json correspond to my own test set?

@yuzhimanhua (Owner)

Yes, those two json files should correspond to your own test set.

If you do not have ground truth labels and just want to do predictions, you can remove the last line in run.sh: https://github.com/yuzhimanhua/MICoL/blob/master/run.sh#L12
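Since both files must line up with the test set, a quick sanity check like the sketch below may help. It assumes one JSON object per line and an ID field named "paper", both of which are guesses about the file layout:

```python
import json

def load_jsonl(path):
    """Load a file with one JSON object per line (assumed layout)."""
    with open(path) as f:
        return [json.loads(line) for line in f]

test = load_jsonl("MAG_test.json")          # your own test set
cands = load_jsonl("MAG_candidates.json")   # candidates for that test set

assert len(test) == len(cands), "one candidate entry per test document"
for t, c in zip(test, cands):
    assert t["paper"] == c["paper"], "entries must refer to the same document"
```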

@GCTTTTTT (Author)

Hello! Thanks for your patient answers! I used my own data in test.json and in those two json files. prepare.sh seems to have run successfully, but run.sh failed with the errors below. What might be the reason for these errors?

```
Namespace(adam_epsilon=1e-08, architecture='cross', bert_model='scibert_scivocab_uncased/', eval=False, eval_batch_size=128, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=1, learning_rate=5e-05, max_contexts_length=256, max_grad_norm=1.0, max_response_length=256, model_type='bert', num_train_epochs=1.0, output_dir='MAG_output/', poly_m=0, print_freq=500, seed=12345, test_file='MAG_input/test.txt', train_batch_size=4, train_dir='MAG_input/', use_pretrain=True, warmup_steps=100, weight_decay=0.01)
Traceback (most recent call last):
  File "main.py", line 158, in <module>
    tokenizer = TokenizerClass.from_pretrained(os.path.join(args.bert_model, "vocab.txt"), do_lower_case=True, clean_text=False)
  File "/home/hxx/miniconda3/envs/pytorch/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1653, in from_pretrained
    f"Calling {cls.__name__}.from_pretrained() with the path to a single file or url is not "
ValueError: Calling BertTokenizerFast.from_pretrained() with the path to a single file or url is not supported for this tokenizer. Use a model identifier or the path to a directory instead.
Namespace(adam_epsilon=1e-08, architecture='cross', bert_model='scibert_scivocab_uncased/', eval=True, eval_batch_size=128, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=1, learning_rate=5e-05, max_contexts_length=256, max_grad_norm=1.0, max_response_length=256, model_type='bert', num_train_epochs=1.0, output_dir='MAG_output/', poly_m=0, print_freq=500, seed=12345, test_file='MAG_input/test.txt', train_batch_size=4, train_dir='MAG_input/', use_pretrain=True, warmup_steps=100, weight_decay=0.01)
Traceback (most recent call last):
  File "main.py", line 158, in <module>
    tokenizer = TokenizerClass.from_pretrained(os.path.join(args.bert_model, "vocab.txt"), do_lower_case=True, clean_text=False)
  File "/home/hxx/miniconda3/envs/pytorch/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1653, in from_pretrained
    f"Calling {cls.__name__}.from_pretrained() with the path to a single file or url is not "
ValueError: Calling BertTokenizerFast.from_pretrained() with the path to a single file or url is not supported for this tokenizer. Use a model identifier or the path to a directory instead.
```

@yuzhimanhua (Owner) commented Aug 1, 2022

Hello,

Sorry for my late reply. I re-ran the code on my side and it worked well, so I am not quite sure of the reason. I suspect it is still a package version issue. Could you please try switching to Python 3.6 and refer to https://github.com/yuzhimanhua/MICoL/blob/master/requirements.txt for the required versions of torch and transformers?

Thanks!
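For readers who prefer to stay on a newer transformers instead of downgrading, the ValueError above already hints at a workaround: pass the model directory rather than the vocab.txt path. The sketch below is that adjustment, not a change shipped in the repo, and it deviates from the versions pinned in requirements.txt:

```python
from transformers import BertTokenizerFast

# Newer tokenizers reject a single-file path; point at the directory instead.
tokenizer = BertTokenizerFast.from_pretrained(
    "scibert_scivocab_uncased/",  # directory, not .../vocab.txt
    do_lower_case=True,
    clean_text=False,
)
```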
