We propose an extensible and reusable pipeline tool that unifies, integrates, and extends various paraphrasing techniques (e.g. weak supervision, pivot translation) to automatically generate English paraphrases that are semantically relevant and diverse. The pipeline is organized as a two-step process:
- candidate over-generation, which combines several techniques to produce a large number of diverse but (potentially) noisy candidate paraphrases
- candidate selection, which discards semantically irrelevant paraphrases and duplicates, filtering out low-quality candidates.
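The two steps above can be sketched in a few lines of Python. The generators and the similarity function below are toy stand-ins for the pipeline's real techniques (pivot translation, WordNet substitution, T5 generation, USE/BERT similarity); only the over-generate/select control flow is meant to be illustrative.

```python
# Minimal sketch of the over-generate / select pipeline with toy components.

def overgenerate(sentence, generators):
    """Step 1: pool candidate paraphrases from every generation technique."""
    candidates = []
    for generate in generators:
        candidates.extend(generate(sentence))
    return candidates

def select(sentence, candidates, similarity, threshold=0.5):
    """Step 2: drop duplicates and semantically irrelevant candidates."""
    kept, seen = [], {sentence.lower()}
    for cand in candidates:
        key = cand.lower()
        if key in seen:
            continue  # duplicate of the input or an already-kept candidate
        if similarity(sentence, cand) >= threshold:
            kept.append(cand)
            seen.add(key)
    return kept

# Toy generators and a Jaccard word-overlap similarity, for illustration only.
def gen_swap(s):
    return [s.replace("movie", "film"), s]

def gen_syn(s):
    return [s.replace("great", "excellent")]

def overlap(a, b):
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

cands = overgenerate("a great movie", [gen_swap, gen_syn])
print(select("a great movie", cands, overlap))
# -> ['a great film', 'a excellent movie']
```

Note how the noisy pool (which here contains an exact duplicate of the input) is pruned in the selection step.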
The pipeline can be run through the command line (see section 6) or through the pipeline web interface (see section 5).
- Paraphrase generation through pivot translation, using online machine translators (e.g. the DeepL API and the MyMemory API) or pretrained neural translation models (e.g. Hugging Face MarianMT and EasyNMT)
- Paraphrase generation through a weak-supervision approach [1], replacing selected tokens with relevant NLTK-WordNet synonyms
- Paraphrase generation using a pretrained Hugging Face T5 transformer
- Filtering out bad paraphrases with a Hugging Face transformers BERT model and Universal Sentence Encoder semantic similarity
- Removing duplicates with a Hugging Face transformers BERT model
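Pivot translation round-trips a sentence through one or more pivot languages and keeps the back-translation as a paraphrase candidate. The sketch below shows only that flow; `toy_translate` is a hypothetical stand-in for a real backend (DeepL API, MyMemory API, MarianMT, EasyNMT).

```python
# Sketch of pivot translation: English -> pivot language -> back to English.

def pivot_paraphrase(sentence, translate, pivots):
    """Round-trip the sentence through each pivot language."""
    paraphrases = []
    for pivot in pivots:
        forward = translate(sentence, src="en", tgt=pivot)
        back = translate(forward, src=pivot, tgt="en")
        paraphrases.append(back)
    return paraphrases

# Tiny deterministic lookup table standing in for a translator.
_TOY = {
    ("en", "de", "the film was good"): "der film war gut",
    ("de", "en", "der film war gut"): "the movie was good",
}

def toy_translate(text, src, tgt):
    return _TOY.get((src, tgt, text), text)

print(pivot_paraphrase("the film was good", toy_translate, ["de"]))
# -> ['the movie was good']
```

Chaining several pivots (e.g. en→de→fr→en) increases diversity at the cost of more semantic drift, which is why the selection step follows.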
In order to generate paraphrases, follow these steps:
- Create and activate a virtual environment using Python 3.6.9:
- Linux
  Create the virtual environment:
  virtualenv -p python3.6.9 my_venv
  Activate the virtual environment:
  source ./my_venv/bin/activate
- Windows
  Unlike most Unix systems, Windows does not include a system-supported installation of Python.
  - Download the desired Python version (do NOT add it to PATH!), and remember the path\to\new_python.exe of the newly installed version.
  - Create the virtual environment: open Command Prompt and enter:
    virtualenv \path\to\my_env -p path\to\new_python.exe
  - Activate the virtual environment:
    .\my_venv\Scripts\activate.bat
  - Deactivate with:
    deactivate
Download Python 3.6.9 (instructions adapted from linuxize.com)
If your working environment does not include Python 3.6.9, follow these installation steps:
- First, update the package list and install the packages necessary to build Python from source:
  $ sudo apt install build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libsqlite3-dev libreadline-dev libffi-dev wget libbz2-dev
- Download the source code from the Python download page:
  $ wget https://www.python.org/ftp/python/3.6.9/Python-3.6.9.tgz
- Once the download is complete, extract the gzipped tarball:
  $ tar -xf Python-3.6.9.tgz
- Next, navigate to the Python source directory and run the configure script, which performs a number of checks to make sure all of the dependencies are present on your system:
  $ cd Python-3.6.9
  $ ./configure --enable-optimizations
  The --enable-optimizations option optimizes the Python binary by running multiple tests; it makes the build process slower.
- Start the Python build process using make:
  $ make -j 8
  For a faster build, set the -j flag to the number of cores in your processor. If you do not know it, you can find it by typing nproc. The system used in this guide has 8 cores, so we use the -j 8 flag.
- When the build is done, install the Python binaries by running:
  $ sudo make altinstall
  Do not use the standard make install, as it will overwrite the default system python3 binary.
- That’s it. Python 3.6.9 is installed and ready to be used. Verify it by typing:
  $ python3.6 --version
- Install the required packages inside the environment:
  pip install -r requirements.txt
- Download the Spacy model (for more models, see Spacy Models & Languages):
  python -m spacy download en_core_web_lg
- Make use of Google's Universal Sentence Encoder directly within Spacy (see Universal Sentence Encoder):
  pip install spacy-universal-sentence-encoder
  Note: By default we use the en_use_lg model; to use another model, modify load_model in ./synonym/nltk_wordnet.py, line 76.
- Run the pipeline using the web interface
  - run the app.py script: python app.py
  - open any browser and enter the following URL: http://localhost:5000/
- Run the pipeline using the command line
  Open the config.ini configuration file and update the values.
  Note: Please make sure you have filled in the required configs in config.ini, especially DEEPL and MYMEMORY.
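As an illustration only, a config.ini for this kind of pipeline might look like the fragment below. The section and key names here are assumptions, not the repository's actual schema; check the config.ini shipped with the project for the real field names.

```ini
; Hypothetical sketch of config.ini - the actual section and key
; names in the repository may differ.
[DEEPL]
api_key = YOUR_DEEPL_API_KEY

[MYMEMORY]
; MyMemory grants a higher request quota when a contact email is supplied.
email = you@example.com
```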
a. Put the sentences you want to paraphrase in a file, one per line (sentences separated by a line break). Save the file in the dataset folder (we suggest using a .txt extension).
b. Generate paraphrases by running the following command:
$ python main.py -f dataset.txt -l 1 -p false
| Parameter | Description |
|---|---|
| -f | path to the initial data file; utterances should be separated by a line break, and the file extension should be .txt |
| -p | generate paraphrases using an online translator (Google, DeepL, Yandex) or a pretrained neural machine translator (MarianMT, OpenNMT) |
| -l | sequential pivot-language translation level |
| -c | cut-off integer indicating how many paraphrases to select; e.g. -c 3 selects only the top 3 most semantically related paraphrases and drops the rest |
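For readers extending main.py, these flags could be parsed with argparse roughly as below. This is an assumed sketch of the interface, not the repository's actual code; the destination names (`file`, `pretrained`, `level`, `cutoff`) are hypothetical.

```python
# Hypothetical argparse sketch matching the documented flags.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Paraphrase generation pipeline")
    parser.add_argument("-f", dest="file", required=True,
                        help="initial data file path (.txt, one utterance per line)")
    parser.add_argument("-p", dest="pretrained", default="false",
                        help="'true' for pretrained NMT models, 'false' for online translators")
    parser.add_argument("-l", dest="level", type=int, default=1,
                        help="sequential pivot-language translation level")
    parser.add_argument("-c", dest="cutoff", type=int, default=None,
                        help="keep only the top-N semantically related paraphrases")
    return parser

args = build_parser().parse_args(["-f", "dataset.txt", "-l", "1", "-p", "false"])
print(args.file, args.level, args.cutoff)
# -> dataset.txt 1 None
```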
This will save the generated paraphrases in the result folder. The result file presents the paraphrases as a Python dictionary (key: initial sentence; value: list of paraphrases) in the following order:
- paraphrases generated by the translation API,
- paraphrases after filtering with the Universal Sentence Encoder,
- paraphrases after filtering with BERT,
- paraphrases after deduplication with BERT.
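The filtering and deduplication stages both reduce to comparing sentence embeddings by cosine similarity. The sketch below illustrates that logic; `toy_embed` is a bag-of-words stand-in for the real encoders (Universal Sentence Encoder, BERT) so the example stays self-contained, and the two thresholds are arbitrary illustration values.

```python
# Sketch of similarity-based filtering and deduplication over embeddings.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def toy_embed(sentence, vocab):
    """Bag-of-words count vector; a stand-in for USE/BERT embeddings."""
    words = sentence.lower().split()
    return [words.count(w) for w in vocab]

def filter_and_dedup(source, candidates, embed, keep=0.5, dup=0.95):
    """Keep candidates similar to the source, dropping near-duplicates."""
    src_vec = embed(source)
    kept, kept_vecs = [], []
    for cand in candidates:
        vec = embed(cand)
        if cosine(src_vec, vec) < keep:  # semantically irrelevant
            continue
        if any(cosine(vec, kv) >= dup for kv in kept_vecs):  # near-duplicate
            continue
        kept.append(cand)
        kept_vecs.append(vec)
    return kept

source = "the cat sat on the mat"
cands = ["the cat sat on a mat", "the cat sat on a mat", "stock prices fell today"]
vocab = sorted(set((source + " " + " ".join(cands)).split()))
print(filter_and_dedup(source, cands, lambda s: toy_embed(s, vocab)))
# -> ['the cat sat on a mat']
```

The off-topic candidate fails the relevance threshold and the repeated candidate fails the duplicate threshold, mirroring the USE/BERT filtering and BERT deduplication stages above.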
[1] Weir, Nathaniel and Crotty, Andrew and Galakatos, Alex and others. "DBPal: Weak Supervision for Learning a Natural Language Interface to Databases." arXiv preprint arXiv:1909.06182 (2019).