
Added multi_pivot_paraphrases_generation transformation #252

Open · wants to merge 9 commits into base: main
44 changes: 44 additions & 0 deletions transformations/multi_pivot_paraphrases_generation/README.md
# From one English Sentence to a list of paraphrases 🦎 + ⌨️ → 🐍
This transformation generates a list of paraphrases for an English sentence by leveraging a pivot-translation approach.
Pivot translation is an approach where a sentence in a source language is translated into a foreign language, called the pivot language, and then translated back into the source language to obtain a paraphrase candidate, e.g. translate an English sentence to French, then translate back to English.

The paraphrase generation is divided into two steps:
- Step 1: candidate over-generation by leveraging pivot translation. At this step, we generate a pool of possible paraphrases.
- Step 2: candidate selection over the pool of paraphrases, since the pool can contain semantically unrelated or duplicate candidates.
We leverage an embedding model such as the Universal Sentence Encoder (USE) to disqualify candidate paraphrases from the pool by computing the cosine similarity score of the
USE embeddings of the reference sentence and the candidate paraphrase. Let R = USE_Embedding(reference_english_sentence) and P = USE_Embedding(candidate):
- if Cosine(R, P) < alpha => the candidate is semantically unrelated and is removed from the final list of paraphrases
- if Cosine(R, P) > beta => the candidate is a duplicate and is removed from the final list of paraphrases
- By default alpha = 0.5 and beta = 0.95; we set these values as suggested by the work of [Parikh et al.](https://arxiv.org/pdf/2004.03484.pdf)
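The selection rule above can be sketched as follows. This is a minimal sketch operating on precomputed embedding vectors with plain NumPy; the toy 2-D vectors only illustrate the alpha/beta thresholds and are not real USE embeddings:

```python
import numpy as np

def cosine(r, p):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(r, p) / (np.linalg.norm(r) * np.linalg.norm(p)))

def select_candidates(ref_emb, cand_embs, alpha=0.5, beta=0.95):
    """Keep indices of candidates whose cosine similarity to the reference
    lies strictly between alpha and beta: neither semantically unrelated
    (score < alpha) nor near-duplicates (score > beta)."""
    kept = []
    for i, p in enumerate(cand_embs):
        score = cosine(ref_emb, p)
        if alpha < score < beta:
            kept.append(i)
    return kept

# Toy 2-D "embeddings" purely to illustrate the thresholds:
ref = np.array([1.0, 0.0])
cands = [np.array([1.0, 0.05]),   # cos ~ 0.999 > beta  -> dropped as duplicate
         np.array([0.0, 1.0]),    # cos = 0.0   < alpha -> dropped as unrelated
         np.array([1.0, 0.6])]    # cos ~ 0.857         -> kept
print(select_candidates(ref, cands))  # -> [2]
```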

Please refer to test.json for all of the test cases covered.

This transformation translates an English sentence into a list of predefined languages using Hugging Face MarianMT and EasyNMT as machine translation models.
- The transformation supports two pivot-translation levels:
- If pivot level = 1 => translate through only one foreign language, e.g. English -> French -> English || English -> Arabic -> English || English -> Japanese -> English
- If pivot level = 2 => translate through two foreign languages, e.g. English -> French -> Arabic -> English || English -> Russian -> Chinese -> English
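The two pivot levels amount to chaining translations through one or two foreign languages before returning to English. A minimal sketch of that chaining logic; the `translate` callable is a stand-in for MarianMT/EasyNMT, not the actual implementation:

```python
def pivot_paraphrase(sentence, translate, pivots):
    """Translate through each pivot language in order, then back to English.
    `translate(text, src, tgt)` is any machine-translation callable.
    `pivots` has length 1 (pivot level 1) or 2 (pivot level 2)."""
    src = "en"
    text = sentence
    for lang in pivots:
        text = translate(text, src, lang)
        src = lang
    return translate(text, src, "en")

# Stub translator that just records the chain, to show the call order:
def fake_translate(text, src, tgt):
    return f"{text}->{tgt}"

print(pivot_paraphrase("Hello", fake_translate, ["fr"]))        # Hello->fr->en
print(pivot_paraphrase("Hello", fake_translate, ["ru", "zh"]))  # Hello->ru->zh->en
```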

Author name: Auday Berro ([email protected])

## What type of a transformation is this?
This transformation generates paraphrases for natural English sentences by leveraging pivot-translation techniques. Pivot translation makes it possible to obtain lexically and syntactically diverse paraphrases.

Collaborator: As @mille-s mentioned, you might want to specify how this is different from earlier PRs.

Author: Thank you for this question.

In general, to generate paraphrases we followed a data-flow principle, splitting the process into two main steps:
a) candidate over-generation: generate as many candidates as possible using pivot-translation techniques through a predefined set of pivot languages
b) candidate selection: semantically irrelevant paraphrase candidates are removed from the final list using cosine similarity scores.

To summarize, here are the differences:

  1. We generate the paraphrases by pivot-translation techniques using two pivot levels (1-pivot meaning we have one pivot language, 2-pivot meaning we have two pivot languages) that are configurable; the user can choose the level of paraphrase generation from the beginning, e.g. 1-level => English-Italian-English || 2-level => English-Chinese-Russian-English.
  2. What makes our work different from others is that we use a manually defined list of pivot languages so that the sentences are more distinct and semantically related to the reference sentence. The languages were selected according to two criteria: a) the more the grammar of the pivot language differs from the source language (in our case English), the more syntactic diversity we get; b) the closer the grammar of the pivot language is to the source language, the more lexical diversity and semantic relatedness we get. To summarize, we do not use a single pivot language as in other works; instead we use the entire list of predefined languages respecting the selected pivot level, e.g. if you choose the 1-pivot level, the paraphrases are generated by translating to each language in the list in turn and translating back to English.
  3. Generating paraphrases is not enough; we should ensure that the paraphrase candidates are semantically related to the reference sentence, since the machine translation engine may generate duplicate or semantically unrelated sentences. The idea is to ensure that the result is of high quality (in our case, semantically related to the reference sentence), so we need to perform a quality-control step, either during or after paraphrase generation. In our transformation we apply quality control after paraphrase generation by computing the cosine similarity of the embedding vectors of the reference sentence and the candidate paraphrase. We support Universal Sentence Encoder embeddings; other embedding models like BERT and ELMo could be added, but due to time constraints we used USE.
  4. Candidate selection, as mentioned in 3, is configurable; the user can choose whether or not to apply it after generation.
  5. The semantic-relatedness thresholds are configurable and can be changed; in our work we used a minimal score of 0.5 (if the cosine score is lower than 0.5, the candidate is considered semantically unrelated).
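The two-step data flow described above can be sketched end to end. This is a hypothetical sketch: `generate_candidates` and `similarity` are stand-ins for the pivot translators and the USE-based scorer, not the actual implementation:

```python
def paraphrase_pipeline(sentence, generate_candidates, similarity,
                        alpha=0.5, beta=0.95, apply_selection=True):
    """Step a) over-generate candidates; step b) optionally filter them
    by keeping only scores strictly between alpha and beta."""
    pool = generate_candidates(sentence)              # candidate over-generation
    if not apply_selection:                           # selection is configurable
        return pool
    return [c for c in pool
            if alpha < similarity(sentence, c) < beta]  # candidate selection

# Stubs purely to exercise the flow:
gen = lambda s: [s.upper(), s, "unrelated text"]
sim = lambda a, b: {("hi", "HI"): 0.8,               # kept
                    ("hi", "hi"): 1.0,               # duplicate -> dropped
                    ("hi", "unrelated text"): 0.1}[(a, b)]  # unrelated -> dropped
print(paraphrase_pipeline("hi", gen, sim))  # -> ['HI']
```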

## What tasks does it intend to benefit?
This transformation would benefit all tasks that take a sentence as input, like question generation, sentence generation, etc.

## What are the limitations of this transformation?

1. The transformation does not generate paraphrases for non-English sentences, e.g. it cannot generate paraphrases for German or Chinese sentences.

2. This transformation only generates paraphrases for natural-language English sentences.

## Previous Work


This work is partly inspired by the following work on robustness for Machine Translation:
```bibtex
@article{berroextensible,
title={An Extensible and Reusable Pipeline for Automated Utterance Paraphrases},
author={Berro, Auday and Zade, Mohammad-Ali Yaghub and Baez, Marcos and Benatallah, Boualem and Benabdeslem, Khalid}
}
```
from .transformation import *
16 changes: 16 additions & 0 deletions transformations/multi_pivot_paraphrases_generation/constants.py
# Hugging Face MarianMT machine translation models to load. Set of tuples of the form: (source-to-target language pair, Hugging Face MarianMT Helsinki-NLP model name)
HUGGINGFACE_MARIANMT_MODELS_TO_LOAD = {
('en2romance','Helsinki-NLP/opus-mt-en-ROMANCE'),
('romance2en','Helsinki-NLP/opus-mt-ROMANCE-en'),
('de2en','Helsinki-NLP/opus-mt-de-en'),
('ru2en','Helsinki-NLP/opus-mt-ru-en'),
('en2ar','Helsinki-NLP/opus-mt-en-ar'),
('en2zh','Helsinki-NLP/opus-mt-en-zh'),
('en2jap','Helsinki-NLP/opus-mt-en-jap'),
('en2ru','Helsinki-NLP/opus-mt-en-ru'),
('en2de','Helsinki-NLP/opus-mt-en-de'),
('zh2en','Helsinki-NLP/opus-mt-zh-en')
}


EASYNMT_MODEL_NAME = 'm2m_100_418M'
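For illustration, the pair identifiers above can be resolved to model names with a simple lookup. This is a sketch: `get_model_name` is a hypothetical helper, not part of the module, and only a subset of the pairs is repeated here:

```python
# Subset of the pairs from constants.py, for illustration:
HUGGINGFACE_MARIANMT_MODELS_TO_LOAD = {
    ('en2romance', 'Helsinki-NLP/opus-mt-en-ROMANCE'),
    ('romance2en', 'Helsinki-NLP/opus-mt-ROMANCE-en'),
    ('en2ar', 'Helsinki-NLP/opus-mt-en-ar'),
    ('en2zh', 'Helsinki-NLP/opus-mt-en-zh'),
}

def get_model_name(pair, models=HUGGINGFACE_MARIANMT_MODELS_TO_LOAD):
    """Return the Helsinki-NLP model name for a direction key like 'en2ar'."""
    for key, model_name in models:
        if key == pair:
            return model_name
    raise KeyError(f"no MarianMT model registered for {pair!r}")

print(get_model_name('en2ar'))  # -> Helsinki-NLP/opus-mt-en-ar
```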
22 changes: 22 additions & 0 deletions transformations/multi_pivot_paraphrases_generation/easy_nmt.py
""" EasyNMT - Easy to use, state-of-the-art Neural Machine Translation - https://github.com/UKPLab/EasyNMT """
from easynmt import EasyNMT

def load_easynmt_model(model_name='m2m_100_418M'):
    """
    Load an EasyNMT model.
    :param model_name: name of the model to load - for the list of supported models, visit https://github.com/UKPLab/EasyNMT#available-models
    :return: EasyNMT machine translation model
    """
    return EasyNMT(model_name)

def get_easynmt_translation(sentence, model, target_lang, source_lang=None):
    """
    Translate a sentence.
    :param sentence: sentence to translate
    :param model: EasyNMT model
    :param target_lang: target language for the translation
    :param source_lang: source language for the translation; if None, the source language is detected automatically
    :return: translated sentence
    """
    return model.translate(sentence, source_lang=source_lang, target_lang=target_lang)
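A usage sketch of the helper above, with a stub standing in for the real EasyNMT model (loading `m2m_100_418M` downloads model weights, so the stub only mirrors the `translate` keyword-argument interface; the stub's output format is invented for illustration):

```python
class StubModel:
    # Mimics EasyNMT's model.translate(sentence, source_lang=..., target_lang=...) signature.
    def translate(self, sentence, source_lang=None, target_lang=None):
        return f"[{target_lang}] {sentence}"

def get_easynmt_translation(sentence, model, target_lang, source_lang=None):
    return model.translate(sentence, source_lang=source_lang, target_lang=target_lang)

print(get_easynmt_translation("Hello world", StubModel(), "fr"))  # -> [fr] Hello world
```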
EasyNMT
numpy
206 changes: 206 additions & 0 deletions transformations/multi_pivot_paraphrases_generation/test.json
{
"type": "multi_pivot_paraphrases_generation",
"test_cases": [
{
"class": "MultiPivotParaphrasesGeneration",
"inputs": {
"Reference sentence": "How does COVID-19 spread?"
},
"outputs": [
Collaborator: Honestly these examples look great. :) I would suggest you also perform the robustness evaluation for your transformation (or at least in a separate PR).

Author: For the evaluation, I wrote to you in an email that I can't get the evaluation script to work properly; I've tried several times and I always have the same problem. The problem was a dependency conflict: the runtime environment was not able to download the suggested version of the transformers package.

{
"Paraphrase": "How is COVID-19 disseminated?"
},
{
"Paraphrase": "How is COVID-19 spread?"
},
{
"Paraphrase": "How did COVID-19 spread?"
},
{
"Paraphrase": "How is COVID-19 spreading?"
},
{
"Paraphrase": "How does COVID-19 spread?"
}
]
},
{
"class": "MultiPivotParaphrasesGeneration",
"inputs": {
"Reference sentence": "Book a flight from Lyon to Sydney?"
},
"outputs": [
{
"Paraphrase": "To book a flight from Lyon to Sydney?"
},
{
"Paraphrase": "Have you booked a flight from Lyon to Sydney?"
},
{
"Paraphrase": "What is the journey from Lyon to Sydney?"
},
{
"Paraphrase": "Book a flight from Lyon to Sydney?"
},
{
"Paraphrase": "Are you booking a flight from Lyon to Sydney?"
}
]
},
{
"class": "MultiPivotParaphrasesGeneration",
"inputs": {
"Reference sentence": "Reserve an Italian Restaurant near Paris"
},
"outputs": [
{
"Paraphrase": "Reserve an Italian restaurant near Paris"
},
{
"Paraphrase": "Italian restaurants near Paris"
},
{
"Paraphrase": "Book an Italian restaurant near Paris"
},
{
"Paraphrase": "It's a reservation at the Italian restaurant near Paris."
},
{
"Paraphrase": "Save the Italian restaurant near Paris."
}
]
},
{
"class": "MultiPivotParaphrasesGeneration",
"inputs": {
"Reference sentence": "how many 10 euros are worth in dollars"
},
"outputs": [
{
"Paraphrase": "how many 10 euros are worth in dollars"
},
{
"Paraphrase": "how much 10 euros are worth in dollars"
},
{
"Paraphrase": "10 Euros in Dollars."
},
{
"Paraphrase": "How many Euros are worth in United States dollars?"
},
{
"Paraphrase": "How much is 10 euros in dollars?"
},
{
"Paraphrase": "how many 10 euros is worth in dollars"
},
{
"Paraphrase": "how many 10 euros in dollars are worth"
}
]
},
{
"class": "MultiPivotParaphrasesGeneration",
"inputs": {
"Reference sentence": "which company makes the ipod?"
},
"outputs": [
{
"Paraphrase": "Which company is making iPods?"
},
{
"Paraphrase": "What company does the iPod make?"
},
{
"Paraphrase": "Which company does the ipod?"
},
{
"Paraphrase": "What kind of company does an iPod?"
},
{
"Paraphrase": "Which company manufactures ipods?"
},
{
"Paraphrase": "What company does the iPod do?"
},
{
"Paraphrase": "Which company makes the iPod?"
},
{
"Paraphrase": "What company manufactures the ipod?"
}
]
},
{
"class": "MultiPivotParaphrasesGeneration",
"inputs": {
"Reference sentence": "what states does the connecticut river flow through?"
},
"outputs": [
{
"Paraphrase": "In what states does the connected river flow?"
},
{
"Paraphrase": "What state is the link to the river?"
},
{
"Paraphrase": "What states is the connecticut river going through?"
},
{
"Paraphrase": "Where does the river flow? What is the way the Nile flows?"
},
{
"Paraphrase": "What are you running through the Connecticut River?"
},
{
"Paraphrase": "What states does the river connecticut flow through?"
},
{
"Paraphrase": "In what state does the river connecticut flow?"
},
{
"Paraphrase": "What states pass through the river Kinkito?"
},
{
"Paraphrase": "What conditions does the Connecticut River flow through?"
},
{
"Paraphrase": "What states the river connecticut flows?"
}
]
},
{
"class": "MultiPivotParaphrasesGeneration",
"inputs": {
"Reference sentence": "in which tournaments did west indies cricket team win the championship?"
},
"outputs": [
{
"Paraphrase": "In which tournaments did Western Indians win the championship?"
},
{
"Paraphrase": "What tournaments did the West Indies cricket team win the championship?"
},
{
"Paraphrase": "Which team won the World Cup in West India?"
},
{
"Paraphrase": "in which tournaments has West India cricket team won the championship?"
},
{
"Paraphrase": "In which tournaments did the cricket team of the West Indies win the championship?"
},
{
"Paraphrase": "What game did the Cricket Team of the West Indies win?"
},
{
"Paraphrase": "In what tournaments did the cricket team of the West Indies win the championship?"
},
{
"Paraphrase": "What tournament did the West Indies cricket team win?"
}
]
}
]
}