SAN-G-8055/qnlp_lorenz_etal_2021_resources

 
 


Code and resources for Lorenz et al. (2021) QNLP paper

This repository holds the code and the datasets for the paper:

  • Lorenz, Pearson, Meichanetzidis, Kartsaklis, Coecke (2021). QNLP in Practice: Running Compositional Models of Meaning on a Quantum Computer, arXiv:2102.12846 [cs.CL]

The paper presents results on the first NLP experiments conducted on Noisy Intermediate-Scale Quantum (NISQ) computers for datasets of size >= 100 sentences. Exploiting the formal similarity of the compositional model of meaning by Coecke et al. (2010) with quantum theory, we create representations for sentences that have a natural mapping to quantum circuits. We use these representations to implement and successfully train two NLP models that solve simple sentence classification tasks on quantum hardware. We describe in detail the main principles, the process and challenges of these experiments, in a way accessible to NLP researchers, thus paving the way for practical Quantum Natural Language Processing.

Code requirements

To run the code, you need Python 3.7 or later and the following packages:

  • discopy (v0.3.5)
  • pytket (v0.10.1)
  • qiskit (v0.25.4)
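
As a quick sanity check before opening the notebooks, the installed versions can be compared with the ones listed above. The following is a minimal sketch and not part of the repository; the helper names `installed_version` and `check_versions` are hypothetical, and it assumes the distribution names match the package names in the list.

```python
from typing import Callable, Dict, Optional, Tuple

# Versions pinned in the list above.
REQUIRED = {"discopy": "0.3.5", "pytket": "0.10.1", "qiskit": "0.25.4"}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        # Available in the standard library from Python 3.8 onwards.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Python 3.7 fallback; pkg_resources ships with setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_versions(
    required: Dict[str, str] = REQUIRED,
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose version differs to (expected, found)."""
    mismatches = {}
    for name, expected in required.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `check_versions()` returns an empty dictionary when all three packages are installed at the listed versions; otherwise each entry shows the expected version and the one found (`None` if the package is missing).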

Instructions

The two notebooks mc_task.ipynb and rp_task.ipynb in the folder code illustrate the implementation of the pipeline for the two binary classification tasks, MC and RP, respectively. See Sec. 5 of the paper for the general pipeline, Sec. 6 for a description of the tasks, and Sec. 7 for an explanation of the implementation in the notebooks, in particular Sec. 7.4 for the quantum runs. The analogous implementations of the classical simulations for the MC and RP tasks, as described in Sec. 7.3 of the paper, can be found in the notebooks mc_task_simulation.ipynb and rp_task_simulation.ipynb, respectively.

The folder datasets contains the raw data (train, dev and test subsets) as txt files, which the notebooks read as input.

In all four notebooks, any settings one may want to alter, such as the choice of 'ansatz', are set in the first cell, as explained there. In the two notebooks for the quantum runs, mc_task.ipynb and rp_task.ipynb, the first cell also lets the user choose the hardware on which to run the computation. By default, backend = AerBackend() invokes a simulator provided through pytket, which does not require an IBMQ account. With access to an IBMQ account, the user may use the corresponding commented-out lines in the first cell to set a device-specific backend instead: either IBMQEmulatorBackend(<backend_name>, <credentials>), which provides a simulator with a device-specific noise model, or IBMQBackend(<backend_name>, <credentials>), which runs on actual IBM quantum hardware of one's choice.
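
The three backend options described above could be wrapped in a small selector, sketched below. This is an illustration only and not the notebooks' code: the function `make_backend` and its `kind` labels are hypothetical, the import path `pytket.extensions.qiskit` is an assumption that may differ between pytket versions, and the credentials are passed through unchanged as keyword arguments because their exact form depends on the installed pytket release and the IBMQ account.

```python
def make_backend(kind="aer", backend_name=None, **credentials):
    """Return a pytket backend for one of the three options in the text.

    kind: "aer" (local simulator, no IBMQ account needed),
          "emulator" (simulator with a device-specific noise model), or
          "hardware" (actual IBM quantum device).
    """
    kinds = ("aer", "emulator", "hardware")
    if kind not in kinds:
        raise ValueError(f"kind must be one of {kinds}, got {kind!r}")
    if kind != "aer" and backend_name is None:
        raise ValueError("backend_name is required for device-specific backends")

    # Assumption: location of the pytket-qiskit extension; adjust the import
    # to match the one used in the first cell of the notebooks.
    from pytket.extensions.qiskit import (
        AerBackend,
        IBMQBackend,
        IBMQEmulatorBackend,
    )

    if kind == "aer":
        return AerBackend()
    if kind == "emulator":
        return IBMQEmulatorBackend(backend_name, **credentials)
    return IBMQBackend(backend_name, **credentials)
```

With the default arguments, `backend = make_backend()` corresponds to the default `backend = AerBackend()` setting. The arguments are validated before pytket is imported, so a typo in `kind` fails fast without touching the IBMQ account.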

Finally, running the notebook mc_task_simulation.ipynb or rp_task_simulation.ipynb requires making discopy compatible with jax by setting IMPORT_JAX = True in discopy's config.py.
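
To find the file that needs editing, one can ask Python where the installed discopy package lives. The sketch below is a hypothetical helper (`package_config_path` is not part of discopy); it assumes config.py sits next to the package's `__init__.py` and does not import the package or check that the file exists.

```python
import importlib.util
import os
from typing import Optional


def package_config_path(
    package: str = "discopy", filename: str = "config.py"
) -> Optional[str]:
    """Return the expected path of `filename` inside an installed package.

    Returns None if the package cannot be found.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py; the config file is
    # assumed to live in the same directory.
    return os.path.join(os.path.dirname(spec.origin), filename)
```

Running `print(package_config_path())` shows the path to open in an editor; in that file, set `IMPORT_JAX = True` as described above.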

Remarks

  • The fluctuations in the RP task are considerable, so users are advised to do several runs to get an impression of the variance.
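
One way to follow this recommendation is to repeat a run with several seeds and report the mean and sample standard deviation of the final metric. The sketch below is illustrative only: `summarise_runs` is a hypothetical helper, and `dummy_run` is a stand-in that returns a made-up number in place of a full training run from the notebooks.

```python
import random
import statistics
from typing import Callable, Iterable, Tuple


def summarise_runs(
    run_once: Callable[[int], float], seeds: Iterable[int]
) -> Tuple[float, float]:
    """Run `run_once(seed)` for every seed; return (mean, sample std dev)."""
    scores = [run_once(seed) for seed in seeds]
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return statistics.mean(scores), statistics.stdev(scores)


def dummy_run(seed: int) -> float:
    """Placeholder for one complete run; returns a fake accuracy value."""
    rng = random.Random(seed)
    return 0.7 + rng.uniform(-0.1, 0.1)


mean, spread = summarise_runs(dummy_run, seeds=range(10))
print(f"accuracy over 10 runs: {mean:.3f} +/- {spread:.3f}")
```

In practice, `dummy_run` would be replaced by a function that executes the whole pipeline for a given seed and returns the test accuracy.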

Links

For further help see:
