M1: Quantum representation of text

The aim of this module, related to OB1, is to analyse how natural language could be modelled within quantum theory, identifying the challenges and opportunities of employing this theory for text representation.
A sentence in a text is not just a “bag of words” but rather a kind of network in which words interact in particular ways, and each language has its own peculiarities in terms of syntax, semantics and pragmatics. Moreover, language is compositional: the meaning of a new phrase or sentence can typically be derived from its parts, even when the phrase itself has never been seen before or describes something quite unrealistic. According to Coecke et al. (2021), language is “quantum native”; for instance, quantum superposition can be used to represent the multiple meanings of an ambiguous word. This perspective rests mainly on two arguments:

1) Both quantum theory and NLP describe states using vector spaces, which suggests that natural language representations fit naturally on quantum hardware (Peral-García, Cruz-Benito, and García-Peñalvo, 2024).
2) Composing words into a sentence corresponds to composing the circuits that represent the individual words. The result is a circuit that prepares a state encoding the meaning of the sentence and that can be mapped directly onto a quantum circuit (Widdows et al., 2024).
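To make the two arguments above more concrete, the following is a minimal toy sketch in Python/NumPy, assuming small hand-picked vector spaces. It shows (1) an ambiguous word represented as a normalised superposition of two sense vectors, and (2) a DisCoCat-style transitive sentence whose meaning is obtained by contracting the subject and object vectors with a verb tensor in N ⊗ S ⊗ N. The spaces, words, the `relation` matrix and the helper `transitive_sentence` are hypothetical illustrations and are not taken from the cited works.

```python
import numpy as np


def normalise(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit Euclidean norm, as required of a quantum state."""
    return v / np.linalg.norm(v)


# --- Argument 1: meanings as vectors, ambiguity as superposition -----------
# Hypothetical 2-dimensional sense space; basis vectors are illustrative only.
bank_river = np.array([1.0, 0.0])  # "bank" as in river bank
bank_money = np.array([0.0, 1.0])  # "bank" as in financial institution
bank = normalise(bank_river + bank_money)  # equal superposition of both senses

# Born-rule probability of "measuring" each sense.
p_river = abs(np.vdot(bank_river, bank)) ** 2
p_money = abs(np.vdot(bank_money, bank)) ** 2

# --- Argument 2: composition as tensor contraction (DisCoCat-style) --------
# Noun space N has basis {alice, bob}; sentence space S has basis {false, true}.
alice = np.array([1.0, 0.0])
bob = np.array([0.0, 1.0])

# Hypothetical relation: who likes whom (rows = subject, columns = object).
relation = np.array([[0, 1],   # alice likes bob
                     [0, 0]])  # bob likes nobody

# A transitive verb lives in N (x) S (x) N, indexed as likes[subject, s, object].
likes = np.zeros((2, 2, 2))
for i in range(2):
    for j in range(2):
        likes[i, relation[i, j], j] = 1.0


def transitive_sentence(subj, verb, obj):
    """Contract the subject and object wires with the verb tensor -> vector in S."""
    return np.einsum("i,isj,j->s", subj, verb, obj)


print(p_river, p_money)
print(transitive_sentence(alice, likes, bob))  # [0. 1.] -> "true"
print(transitive_sentence(bob, likes, alice))  # [1. 0.] -> "false"
```

Swapping subject and object changes the resulting sentence vector, which is one way to see that this kind of representation is sensitive to word order, unlike a bag of words.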

Zeng and Coecke (2016) proposed a new paradigm for NLP in the context of quantum computing. Subsequently, formal mathematical representations such as QWIRE (Paykin, Rand, and Zdancewic, 2017) and the Distributional Compositional Categorical (DisCoCat) model of language meaning (Coecke and Kissinger, 2018; Coecke, 2021) were proposed.

In this module, these and other potential ways of formalising language through quantum theory will therefore be explored in depth, analysing their capabilities and limitations for language understanding and generation tasks. Representing language with these formalisms for Natural Language Understanding (NLU) and Natural Language Generation (NLG) tasks has already been attempted (Wu et al., 2021; Karamlou, Pfaffhauser and Wootton, 2022), and these attempts could constitute a starting point for our project. However, such tasks are currently addressed only to a very limited extent: NLG, for instance, is restricted to generating very short and simple sentences, and DisCoCat is not capable of modelling the meaning of large pieces of text.

Task 1.1. Analysis of quantum theory and how to apply it to text representation.

As mentioned above, previous work has shown that the nature of text fits within quantum theory. We will therefore compile the state of the art on existing methods for representing text, together with their potential and limitations.

Milestone: Comprehensive review of the state of the art and identification of the potential and limitations of using quantum theory to represent text.

Task 1.2. Defining quantum circuits for language understanding and generation tasks.

In this task, we will explore in depth how to build quantum circuits from text using DisCoCat or other mathematical formalisms that can model the semantics of a text (Galofaro, Toffano, and Doan, 2018). With the acquired knowledge, we will define a roadmap for applying these circuits to further NLP tasks and applications.
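As an illustration of what building a quantum circuit from text can mean in a DisCoCat-style setting, the sketch below simulates a toy two-word sentence ("Alice runs") in plain NumPy. It assumes one qubit per noun wire N and per sentence wire S, a hypothetical RY/CNOT ansatz for each word, and hand-picked parameters instead of trained ones. The grammatical connection between noun and verb is implemented as a post-selected Bell effect. The ansatz, the parameter values and the function names (`noun_circuit`, `intransitive_verb_circuit`, `sentence_circuit`) are assumptions made for illustration and are not taken from the cited works; dedicated QNLP toolkits would be used in practice.

```python
import numpy as np

ZERO = np.array([1.0, 0.0])  # |0>
# CNOT with the first (left-most in np.kron) qubit as control.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)


def ry(theta: float) -> np.ndarray:
    """Single-qubit Y rotation RY(theta), which is real-valued."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def noun_circuit(theta: float) -> np.ndarray:
    """Hypothetical one-qubit ansatz for a noun (wire N): RY(theta)|0>."""
    return ry(theta) @ ZERO


def intransitive_verb_circuit(theta_n: float, theta_s: float) -> np.ndarray:
    """Hypothetical two-qubit ansatz for an intransitive verb (wires N, S).

    RY on the N qubit, CNOT from N to S, then RY on the S qubit.
    """
    state = np.kron(ry(theta_n) @ ZERO, ZERO)
    state = CNOT @ state
    state = np.kron(np.eye(2), ry(theta_s)) @ state
    return state  # 4 amplitudes, index = 2 * n + s


def sentence_circuit(noun: np.ndarray, verb: np.ndarray):
    """Connect the noun to the verb's N wire with a post-selected Bell effect.

    Returns the normalised state of the remaining S qubit and the probability
    that the post-selection succeeds.
    """
    full = np.kron(noun, verb).reshape(2, 2, 2)  # qubits: noun, verb-N, verb-S
    # Project the first two qubits onto (|00> + |11>) / sqrt(2).
    out = (full[0, 0, :] + full[1, 1, :]) / np.sqrt(2)
    p_post = float(np.sum(out ** 2))  # amplitudes are real here
    return out / np.sqrt(p_post), p_post


# Hypothetical "trained" parameters for the sentence "Alice runs".
alice = noun_circuit(0.3)
runs = intransitive_verb_circuit(1.1, 0.7)
s_state, p_post = sentence_circuit(alice, runs)

# The circuit reproduces the tensor contraction s_k = sum_i n_i * v[i, k].
contracted = np.einsum("i,ik->k", alice, runs.reshape(2, 2))
assert np.allclose(s_state, contracted / np.linalg.norm(contracted))

print("P(sentence = true):", s_state[1] ** 2)
print("post-selection probability:", p_post)
```

The assertion checks that the circuit with post-selection yields the same sentence vector as the direct tensor contraction, up to normalisation. The post-selection probability hints at the sampling overhead such constructions incur on real hardware, which is one of the limitations worth examining in this task.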

Milestone: Collection of quantum circuits for language understanding and generation tasks.