
feat: pass use_fast parameter to get_tokenizer #106

Merged (10 commits) on Jul 21, 2021

Conversation

@nikitajz (Contributor) commented Jul 16, 2021

A proposed solution for issue #105.
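The change the PR title describes can be sketched as follows. This is a minimal sketch, not the actual diff: the signature is assumed from the title, and the real implementation lives in bert_score/utils.py.

```python
# Hedged sketch of the proposed change (names assumed from the PR title):
# get_tokenizer gains a use_fast flag and forwards it to
# AutoTokenizer.from_pretrained, which selects the Rust-backed "fast"
# tokenizer when one is available for the model.
def get_tokenizer(model_type, use_fast=False):
    # Imported lazily here only to keep the sketch self-contained.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(model_type, use_fast=use_fast)
```

Callers that relied on the old behaviour are unaffected because the flag defaults to `False`.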

@felixgwu (Collaborator) left a comment


Looks great! There are only a few tiny changes that need to be made.

Review comments on bert_score/utils.py and tests/test_score_function.py (resolved)
@nikitajz (Contributor, Author) commented Jul 19, 2021

@felixgwu After a few attempts I'm still getting failed tests due to OOM (OSError: [Errno 12] Cannot allocate memory), even for the old tests without a fast tokenizer. I've seen a related issue (I've since lost the link) reporting that Travis CI decreased its memory limit from 7 GB to 3.5 GB, which might be the cause. The tests do pass on my laptop, except for one unrelated failure caused by the commented-out model 'scibert-scivocab-uncased', which I mentioned earlier.
Apart from that, get_idf_dict seems worth refactoring (at least for fast tokenizers) to use tokenizer.encode_batch instead of parallelizing tokenizer.encode; see the related issue.
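The refactoring idea above can be sketched like this. This is not the library's implementation: `encode_batch` here stands for any callable that tokenizes a list of sentences in one call (as a fast tokenizer's batch API does), and the smoothed `log((N + 1) / (c + 1))` formula is assumed to match what `get_idf_dict` computes per token id.

```python
from collections import Counter
from math import log


def get_idf_dict_batched(sentences, encode_batch):
    """Sketch: build an idf dict from a single batched tokenizer call.

    encode_batch: callable mapping a list of strings to a list of
    token-id lists in one call, instead of parallelizing per-sentence
    encode calls across processes.
    """
    idf_count = Counter()
    for ids in encode_batch(sentences):
        # Count document frequency, not term frequency, so each token id
        # is counted at most once per sentence.
        idf_count.update(set(ids))
    num_docs = len(sentences)
    # Smoothed idf in the log((N + 1) / (c + 1)) form.
    return {tid: log((num_docs + 1) / (c + 1)) for tid, c in idf_count.items()}
```

A toy `encode_batch` (e.g. a whitespace split mapped through a vocabulary dict) is enough to exercise the function; with a real fast tokenizer the single batched call avoids the per-sentence multiprocessing overhead.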

@felixgwu felixgwu marked this pull request as ready for review July 19, 2021 19:59
@felixgwu (Collaborator) left a comment


Everything looks good to me now. I can either merge it now or wait, if you would like to spend more time fixing the Travis CI build. I really appreciate your help!

@nikitajz (Contributor, Author) commented
OK, let's see if I can devote more time to fixing the tests; otherwise, feel free to merge it.
Also, I'd recommend changing the model in the tests from roberta-large to something lighter, e.g. distilroberta-base.

@nikitajz (Contributor, Author) commented
I've refactored the tests a bit. I also added the missing assert statement to test_multi_refs_working; please check that it is correct. I temporarily added a skip to test_score_en_sci, which fails due to the commented-out model in utils. All tests pass locally.
Unfortunately, I wasn't able to quickly fix the failing tests on Travis CI. As far as I can tell, there are two options:

  • migrate to another CI host with more powerful VMs in the free tier
  • replace the roberta-large model in the tests with a lightweight one, such as distilroberta-base; but this also requires changing the behaviour of some tests, e.g. test_multi_refs, where the model is not specified explicitly but is chosen based on the specified language. Hence I didn't proceed with this option.
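For context, the language-keyed default the second option would have to work around can be sketched as follows. The `lang2model` table mirrors the idea in bert_score/utils.py, but the entries shown and the `resolve_model` helper are assumptions for illustration, not the library's API.

```python
# Stand-in for a language-to-default-model table; the "en" entry mapping
# to the heavyweight roberta-large is why tests like test_multi_refs pull
# a large model when only a language is given.
lang2model = {"en": "roberta-large", "zh": "bert-base-chinese"}


def resolve_model(lang=None, model_type=None):
    # Hypothetical helper: an explicit model_type overrides the
    # language-keyed default. Swapping in a lighter model such as
    # distilroberta-base would require the tests to pass model_type
    # explicitly instead of relying on lang.
    if model_type is not None:
        return model_type
    return lang2model[lang.lower()]
```

With such a hook, a CI-friendly test could request `distilroberta-base` directly while the default behaviour for end users stays unchanged.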

@felixgwu felixgwu merged commit 2a40716 into Tiiiger:master Jul 21, 2021
@felixgwu (Collaborator) commented
It looked good to me, so I just merged it. I would like to keep using roberta-large in the tests for now, since it is the default model. Thanks again for your help!
