Antonyms substitute #222

Merged
merged 18 commits into from
Oct 11, 2021

Conversation

ZhexiongLiu
Contributor

Add antonym substitution as a transformation, which increases the diversity of the content. Specifically, it either reverses the semantics or uses double negation to express similar semantics.

@kaustubhdhole
Collaborator

Hi @ZhexiongLiu, thank you for your changes, but unfortunately there was already a previous implementation which used antonyms: #150

I would suggest keeping track of previous PRs (merged as well as unmerged) and checking whether some of the suggestions here are of any help: #75

You could also email us beforehand if you would like to clarify anything about your transformation!

@ZhexiongLiu
Contributor Author

Hi @kaustubhdhole, thanks for sharing! I agree that #150 is close to our submission, but #150 only handles adjectives, whereas our submission handles adjectives, verbs, and nouns. The most interesting part is the double negation, which goes beyond #150. I hope you can reconsider our submission. Thanks!

@ZhexiongLiu ZhexiongLiu reopened this Aug 30, 2021
@kaustubhdhole
Collaborator

Okay @ZhexiongLiu apologies! Yes, you are good to go then. In that case, it is better to modify a previous transformation. But it's fine this way.

@ZhexiongLiu
Contributor Author

ZhexiongLiu commented Sep 1, 2021

Thanks @kaustubhdhole! Will it be merged into the repo? Is there anything else you need us to add?

@kaustubhdhole
Collaborator

Hi @ZhexiongLiu, we will be shortly assigning reviewers to each of the PRs.

@marco-digio marco-digio mentioned this pull request Sep 14, 2021
- Jing Zhang ([email protected], Emory University)

## What type of a transformation is this?
This transformation could introduce semantic diversity by adding antonyms. Specifically, it either reverses the semantics or uses double negation to express similar semantics.
Contributor

@vyraun vyraun Sep 27, 2021

Hi, could you give a few examples of when the semantics are reversed and when similar semantics result from the transformation? Just to make the readme better. Also, for use in sentiment analysis, the transformation has to guarantee that the label is either preserved or altered. Currently, the transformation does not provide this. For example, even if only one substitution is made, is it guaranteed that the label will be the same? "That restaurant is fantastically awful." --> "That restaurant is awfully fantastic." is a label change. But not all transformations will lead to a label change, as the examples in the test cases show. So, without knowing the label effects, it is hard to do automatic augmentation for text classification. @kaustubhdhole If the targeted task is text classification, is it necessary to make the transformation label preserving?

Contributor Author

@ZhexiongLiu ZhexiongLiu Sep 27, 2021

Hi @vyraun, thanks for your review! I am not sure whether we need to preserve the label during the transformation, but we could generally assume that an even number of substitutions preserves the sentiment, whereas an odd number reverses it. Your example should indeed change the label, but it is not representative of most cases, because "awful" and "fantastic" are themselves antonyms, which is unusual. We could develop a fine-grained algorithm focusing on ADJ or ADV only to ensure the label is preserved, but I would assume that is beyond the goal of this project.

Collaborator

So, if the meaning changes, it is important to have a way to track that, for example by modelling your transformation with a label (e.g. positive vs. negative sentiment); otherwise it can be hard to use the transformation in a subsequent experiment. Using a SentenceOperation would mean you could basically substitute one sentence for another in ideally any setting. Alternatively, you can mention specific tasks that would be agnostic to antonym changes (in the README, and add them in the TaskTypes class). That is probably hard to think of, since "antonym" theoretically means "opposite meaning" and, depending on where in the sentence the antonym is substituted, it can be hard to track consistency with a task. Another way to be slightly safer than the above approach, though still not ideal, is to use your approach in a question answering setting, where you can probably make changes only to words common to both the context and the question?
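
To make the question answering idea concrete, here is a minimal sketch that restricts substitutions to words shared by the context and the question and applies them to both texts. The function names `shared_words` and `substitute_shared` and the `replacements` table are hypothetical and are not part of this PR or of NL-Augmenter; a real version would presumably draw its replacements from WordNet.

```python
import re


def shared_words(context: str, question: str) -> set:
    """Return lowercase word tokens that occur in both the context and the question."""
    def tokenize(text: str) -> set:
        return set(re.findall(r"[a-z']+", text.lower()))

    return tokenize(context) & tokenize(question)


def substitute_shared(context: str, question: str, replacements: dict) -> tuple:
    """Apply replacements only to words present in both texts, and in both texts."""
    # Only words that are shared AND have a replacement are eligible.
    targets = shared_words(context, question) & set(replacements)

    def swap(text: str) -> str:
        def repl(match):
            word = match.group(0)
            return replacements[word.lower()] if word.lower() in targets else word

        return re.sub(r"[A-Za-z']+", repl, text)

    return swap(context), swap(question)


# Hypothetical replacement table: "sunday" is ignored because the question lacks it.
new_context, new_question = substitute_shared(
    "The shop is open on Sunday.",
    "When is the shop open?",
    {"open": "closed", "sunday": "monday"},
)
```

Because the same word is changed in both places, the question still refers to what the context describes, although whether the gold answer stays valid would still need checking per dataset.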

Contributor Author

@ZhexiongLiu ZhexiongLiu Oct 3, 2021

Hi @kaustubhdhole, thanks for your great suggestion! We updated our code so that the transformation only applies to sentences that have an even number of revertible words. We call this double negation, meaning that it reverts the sentence semantics back to the original (e.g. more interested -> less uninterested; often reputable -> infrequently disreputable). We evaluated this transformation and achieved 83% accuracy on the IMDB dataset and 91% on the QQP dataset for text classification. The code, readme, and test cases have all been updated. I think all the previous concerns are now addressed.
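
The even-count rule described in this comment can be sketched roughly as follows. This is only an illustration under assumptions: `TOY_ANTONYMS` is a tiny hand-written table standing in for a WordNet lookup, and `double_negate` is a hypothetical name, not the function in the PR.

```python
# Toy antonym table standing in for a WordNet lookup (assumption for illustration).
TOY_ANTONYMS = {
    "more": "less",
    "interested": "uninterested",
    "often": "infrequently",
    "reputable": "disreputable",
}


def double_negate(sentence: str) -> str:
    """Swap words for their antonyms only when an even number of words can be swapped."""
    tokens = sentence.split()
    swappable = [i for i, tok in enumerate(tokens) if tok.lower() in TOY_ANTONYMS]
    # An odd count would flip the overall meaning, so leave the sentence untouched.
    if not swappable or len(swappable) % 2 != 0:
        return sentence
    for i in swappable:
        tokens[i] = TOY_ANTONYMS[tokens[i].lower()]
    return " ".join(tokens)
```

With this table, "She is more interested in art" has two swappable words and becomes "She is less uninterested in art", while "She is interested in art" has only one and is returned unchanged.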

Contributor

Thanks @ZhexiongLiu , guaranteeing label preservation will make the transformation more usable. However, as in the counter-example I gave above, double negation doesn't guarantee label preservation: "That restaurant is fantastically awful." --> "That restaurant is awfully fantastic.", and of course, there could be more contexts when the general rule is not followed, e.g. "Ram is a talented and skilled archer" --> "Ram is a untalented and unskilled archer". Could you add a condition that if the two words in the applied double negation are already antonyms or synonyms, the transformation isn't applied?

Contributor Author

Thanks @vyraun, I've added a condition so that double negation is not applied if the two words are already antonyms or synonyms of each other. The IMDB result has now improved to 93. Great suggestions!
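
A rough sketch of such a guard, assuming the two candidate words are passed in as base forms. The pair tables and the name `should_apply_double_negation` are hypothetical stand-ins for illustration; the PR itself presumably consults WordNet rather than fixed sets.

```python
# Toy relation tables standing in for WordNet lookups (assumption for illustration).
ANTONYM_PAIRS = {frozenset(("fantastic", "awful"))}
SYNONYM_PAIRS = {frozenset(("talented", "skilled"))}


def should_apply_double_negation(first: str, second: str) -> bool:
    """Return False when the two candidate words are already antonyms or synonyms."""
    pair = frozenset((first.lower(), second.lower()))
    # Swapping both words of a related pair can flip the label instead of
    # preserving it, so such pairs are skipped.
    return pair not in ANTONYM_PAIRS and pair not in SYNONYM_PAIRS
```

Under these toy tables, the reviewer's two counter-examples (fantastic/awful and talented/skilled) are rejected, while an unrelated pair such as more/interested is still transformed.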

@vyraun
Contributor

vyraun commented Sep 27, 2021

Hi @ZhexiongLiu, I am one of the assigned reviewers for this task. Overall, nice contribution! The transformation, mainly intended for data augmentation purposes, substitutes English content words (V, N, Adj, Adv) with their antonyms via WordNet. I think the readme could be improved; please take a look at the comments. For the code, the keywords are missing. Also, it would be nice to add any robustness evaluation (https://github.com/GEM-benchmark/NL-Augmenter/tree/main/evaluation) results, but the point is that we do not know whether the transformation is label preserving, so maybe results could be added for the label-preserving subset only? Also, the test cases should be increased to at least 5, based on the review criteria (https://github.com/GEM-benchmark/NL-Augmenter/blob/main/docs/doc.md#review-criteria-for-submissions).

@ZhexiongLiu
Contributor Author

ZhexiongLiu commented Sep 27, 2021

Hi @vyraun, thanks for your suggestion! I have updated the readme to make it more concrete in terms of label preservation, and added the keywords in the code. Regarding robustness, we noticed that the evaluation is optional, so we evaluated our transformation on the IMDB dataset for text classification, where accuracy was 61 (previously 96). This is because the sentiment labels do not fully match after the transformation; we leave this as a next step, as we currently have no bandwidth to run a large model. As for the test cases, I have increased them to 5. Please let us know if you have further questions!

@vyraun
Contributor

vyraun commented Oct 3, 2021

Thanks @ZhexiongLiu for the changes. The PR is much stronger now. Please take a look at the above comment for one more suggestion.

@kaustubhdhole kaustubhdhole merged commit 25d6b23 into GEM-benchmark:main Oct 11, 2021