French Verbs Transformation #250
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Verb Synonym Substitution 🦎 + ⌨️ → 🐍 | ||
|
||
|
||
This transformation replaces verbs with synonyms in simple French sentences; a word is a candidate for substitution when its POS tag is VERB. It requires spacy_lefff (a spaCy extension for French POS tagging and lemmatization) and the nltk package with the Open Multilingual WordNet. | ||
|
||
Authors : Lisa Barthe and Louanes Hamla from Fablab by Inetum in Paris | ||
|
||
## What type of transformation is it? | ||
This transformation creates paraphrases of French sentences by swapping in a different word. The general meaning of the sentence is preserved, and a sentence can yield several paraphrases, each differing from the original by one verb. | ||
|
||
## Supported Task | ||
|
||
This perturbation can be used for any French task. | ||
|
||
## What does it intend to benefit? | ||
|
||
This perturbation would benefit any task that takes a sentence, paragraph, or document as input (text classification, text generation, etc.) and requires synthetic data augmentation or diversification. | ||
|
||
## What are the limitations of this transformation? | ||
This tool does not take the wider context into account, so the output sometimes does not match the general sense of the sentence. |
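As a rough illustration of the idea described in this README, the sketch below substitutes verbs in a pre-tagged sentence using a tiny hand-written synonym table. Both the `TOY_SYNONYMS` table and the `(token, POS)` input format are assumptions made for the example; the actual transformation relies on spacy_lefff for tagging and on the Open Multilingual WordNet for synonyms. Because the lookup never looks at neighbouring words, the sketch also hints at why a candidate can drift from the intended sense.

```python
from typing import Dict, List, Tuple

# Hypothetical synonym table standing in for the Open Multilingual WordNet lookup.
TOY_SYNONYMS: Dict[str, List[str]] = {
    "finir": ["terminer", "achever"],
    "calmer": ["soulager"],
}


def paraphrase_verbs(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return one paraphrase per (verb, synonym) pair, changing a single verb each time."""
    paraphrases = []
    for index, (token, pos) in enumerate(tagged):
        if pos != "VERB":
            continue
        # The lookup depends only on the verb itself, not on the rest of the sentence.
        for synonym in TOY_SYNONYMS.get(token, []):
            tokens = [word for word, _ in tagged]
            tokens[index] = synonym
            paraphrases.append(" ".join(tokens))
    return paraphrases


# Hand-tagged input (an assumption; a real run would get the tags from the POS tagger).
tagged_sentence = [
    ("je", "PRON"), ("vais", "AUX"), ("finir", "VERB"), ("ce", "DET"),
    ("devoir", "NOUN"), ("avant", "ADP"), ("demain", "ADV"),
]
print(paraphrase_verbs(tagged_sentence))
```

Each returned sentence differs from the input by exactly one verb, which matches the "one verb variation" described above.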
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
from .transformation import * | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
{ | ||
"type": "french_verbs_transformation", | ||
"test_cases": [ | ||
|
||
{ | ||
"class": "FrenchVerbsSynonymTransformation", | ||
"inputs": { | ||
"sentence": "je vais finir ce devoir avant demain" | ||
}, | ||
"outputs": [{ | ||
"sentence": "je vais terminer ce devoir avant demain" | ||
}] | ||
|
||
}, | ||
|
||
{ | ||
"class": "FrenchVerbsSynonymTransformation", | ||
"inputs": { | ||
"sentence": "Puis-je entrer ? Cela fait 10 minutes que je suis en face." | ||
}, | ||
"outputs": [{ | ||
"sentence": "Puis-je venir ? Cela fait 10 minutes que je suis en face." | ||
}] | ||
|
||
}, | ||
|
||
{ | ||
"class": "FrenchVerbsSynonymTransformation", | ||
"inputs": { | ||
"sentence": "Les psychologues vont devoir calmer les tensions" | ||
}, | ||
"outputs": [{ | ||
"sentence": "Les psychologues vont devoir soulager les tensions" | ||
}] | ||
|
||
}, | ||
|
||
|
||
{ | ||
"class": "FrenchVerbsSynonymTransformation", | ||
"inputs": { | ||
"sentence": "J'ai enfin pu faire remorquer la voiture !" | ||
}, | ||
"outputs": [{ | ||
"sentence": "J'ai enfin pu faire rouler la voiture !" | ||
Review comment: I am not sure if you only change one verb per instance, i.e., you only generate one additional sentence per instance? |
||
}] | ||
|
||
} | ||
|
||
|
||
] | ||
} |
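The test file pairs each input sentence with the list of sentences the transformation is expected to return. A minimal sketch of how such a file could be checked is given below; it is not the repository's actual test harness, and `failing_cases` and `stub_generate` are hypothetical names introduced only for the example. The JSON is a trimmed copy of the first test case.

```python
import json
from typing import Callable, List

# A trimmed copy of the test file structure (one case only).
TEST_JSON = """
{
  "type": "french_verbs_transformation",
  "test_cases": [
    {
      "class": "FrenchVerbsSynonymTransformation",
      "inputs": {"sentence": "je vais finir ce devoir avant demain"},
      "outputs": [{"sentence": "je vais terminer ce devoir avant demain"}]
    }
  ]
}
"""


def failing_cases(test_json: str, generate: Callable[[str], List[str]]) -> List[str]:
    """Return the input sentences whose generated outputs differ from the expected ones."""
    failures = []
    for case in json.loads(test_json)["test_cases"]:
        expected = [output["sentence"] for output in case["outputs"]]
        if generate(case["inputs"]["sentence"]) != expected:
            failures.append(case["inputs"]["sentence"])
    return failures


def stub_generate(sentence: str) -> List[str]:
    """Hypothetical stand-in for FrenchVerbsSynonymTransformation.generate."""
    return [sentence.replace("finir", "terminer")]


print(failing_cases(TEST_JSON, stub_generate))
```

Since every case lists exactly one expected sentence, a transformation that returned several candidates per input would fail this kind of exact comparison.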
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,83 @@ | ||
from textblob import TextBlob, Blobber, Word | ||
import re | ||
from textblob_fr import PatternTagger, PatternAnalyzer | ||
import nltk | ||
nltk.download('wordnet') | ||
from textblob.wordnet import NOUN, VERB, ADV, ADJ | ||
import spacy | ||
from spacy_lefff import LefffLemmatizer, POSTagger | ||
from spacy.language import Language | ||
from nltk.corpus import wordnet | ||
nltk.download('omw') | ||
Review comment: Maybe you could add nltk in initialize.py, in the same way as spacy. |
||
|
||
from interfaces.SentenceOperation import SentenceOperation | ||
from tasks.TaskTypes import TaskType | ||
|
||
@Language.factory('french_lemmatizer') | ||
def create_french_lemmatizer(nlp, name): | ||
    return LefffLemmatizer() | ||
|
||
@Language.factory('POSTagger') | ||
def create_POSTagger(nlp, name): | ||
    return POSTagger() | ||
|
||
|
||
nlp = spacy.load('fr_core_news_md') | ||
Review comment: You might want to use spacy like this. |
||
|
||
nlp.add_pipe('POSTagger', name ='pos') | ||
nlp.add_pipe('french_lemmatizer', name='lefff', after='pos') | ||
|
||
def synonym_transformation(text): | ||
    # Collect the surface form of every token tagged as a verb. | ||
    doc = nlp(text) | ||
    verbs = [d.text for d in doc if d.pos_ == "VERB"] | ||
    # Look up French WordNet synonyms for each verb; skip verbs that have none. | ||
    synonyms_verb_list = [] | ||
    for i in verbs : | ||
        dict_verb_synonyms = {} | ||
        dict_verb_synonyms['verb'] = i | ||
        dict_verb_synonyms['synonyms'] = list(set([l.name() for syn in wordnet.synsets(i, lang = 'fra', pos = VERB) for l in syn.lemmas('fra')])) | ||
        if len(dict_verb_synonyms['synonyms']) > 0: | ||
            synonyms_verb_list.append(dict_verb_synonyms) | ||
    # Keep synonyms that are similar to the verb without being identical to it. | ||
    valid_verb_list = [] | ||
    for j in synonyms_verb_list: | ||
        for k in j['synonyms']: | ||
            valid_verb_dict = {} | ||
            valid_verb_dict['verb'] = j['verb'] | ||
            valid_verb_dict['syn'] = k | ||
            if nlp(j['verb']).similarity(nlp(k)) > .60 and not nlp(j['verb']).similarity(nlp(k)) >= .999: | ||
                valid_verb_list.append(valid_verb_dict) | ||
    # Build one candidate sentence per retained (verb, synonym) pair. | ||
    text_verb_generated = [] | ||
    pertu=[] | ||
    for l in valid_verb_list: | ||
        text_verb_generated.append(text.replace(l['verb'], l['syn'])) | ||
    text_verb_generated.sort(reverse=True) | ||
    # Return the first candidate that is close to, but distinct from, the input sentence. | ||
    for sent in text_verb_generated: | ||
        if nlp(text).similarity(nlp(sent)) > .10 and not nlp(text).similarity(nlp(sent)) >= .999: | ||
            pertu.append(sent) | ||
            break | ||
|
||
    return pertu | ||
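One thing worth checking in `synonym_transformation` is that `text.replace(...)` substitutes every occurrence of the verb's characters, including occurrences inside longer words. The sketch below shows a possible whole-word alternative based on the standard `re` module. The helper name `replace_whole_word` is hypothetical and is not part of this PR.

```python
import re


def replace_whole_word(text: str, word: str, replacement: str) -> str:
    """Replace `word` only where it stands alone, not inside a longer word."""
    pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
    # A function replacement avoids interpreting backslashes in `replacement`.
    return re.sub(pattern, lambda _match: replacement, text)


sentence = "je vais finir de définir ce terme"
# str.replace also rewrites the "finir" inside "définir".
print(sentence.replace("finir", "terminer"))
# The whole-word version leaves "définir" untouched.
print(replace_whole_word(sentence, "finir", "terminer"))
```

The lookarounds rely on Python's Unicode-aware `\w`, so accented letters such as `é` count as word characters.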
|
||
|
||
|
||
|
||
|
||
class FrenchVerbsSynonymTransformation(SentenceOperation): | ||
    tasks = [ | ||
        TaskType.TEXT_CLASSIFICATION, | ||
        TaskType.TEXT_TO_TEXT_GENERATION, | ||
        TaskType.TEXT_TAGGING, | ||
    ] | ||
    languages = ["fr"] | ||
|
||
Review comment: Please add some keywords too. |
||
    def __init__(self, seed=0, max_outputs=1): | ||
        super().__init__(seed, max_outputs=max_outputs) | ||
Review comment: You don't use the max_outputs parameter. Does that mean you are generating all possible candidates? |
||
|
||
    def generate(self, sentence: str): | ||
        perturbed_texts = synonym_transformation( | ||
            sentence | ||
        ) | ||
        print("perturbed text inside of class",perturbed_texts) | ||
        return perturbed_texts | ||
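Regarding the review question about `max_outputs`: one possible way to honour the parameter is to cap the candidate list before returning it. The sketch below is only an assumption about how that could look. `limit_outputs` is a hypothetical helper, and it presumes that the `seed` and `max_outputs` values passed to the constructor are available inside `generate`.

```python
import random
from typing import List


def limit_outputs(candidates: List[str], max_outputs: int, seed: int = 0) -> List[str]:
    """Return at most `max_outputs` distinct candidates, chosen reproducibly."""
    max_outputs = max(max_outputs, 0)
    # Sort so the result does not depend on set iteration order.
    unique = sorted(set(candidates))
    if len(unique) <= max_outputs:
        return unique
    # A seeded generator makes the selection repeatable across runs.
    return random.Random(seed).sample(unique, max_outputs)


candidates = [
    "je vais terminer ce devoir avant demain",
    "je vais achever ce devoir avant demain",
    "je vais conclure ce devoir avant demain",
    "je vais terminer ce devoir avant demain",
]
print(limit_outputs(candidates, max_outputs=2, seed=0))
```

With `max_outputs=1` this would return a single sentence, which is consistent with the one-output test cases in this PR.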
|
||
|
Review comment: Please add an email. Maybe you can use this style: