Methodology
The proposed scientific method will be combined with an agile project management approach, such as Scrum (Srivastava, Bhardwaj, and Saraswat, 2017), which, although originally oriented to software development, is also appropriate for research (Ota, 2010). More specifically, it supports efficient project delivery, where the objectives and tasks have to be fulfilled within the scheduled time.
This approach begins with an initial planning phase and requirements analysis, where the objectives of each module are identified, the specific activities needed to achieve them are defined, and a master plan with clear milestones, resources, and schedules is established. The work is organised into short cycles or “sprints” that allow for the incremental implementation and evaluation of modules, ensuring tangible results at the end of each iteration and facilitating continuous adjustments based on progress. Throughout the process, a collaborative management framework is maintained to ensure coordination, adherence to deadlines, and the quality of outcomes. Project management tools for monitoring the modules and their tasks (e.g., Asana or Trello), or more general integrated frameworks such as Microsoft Teams (to which the University of Alicante subscribes, making it free of charge for its staff), are good candidates for use within the project. Other existing tools will also be analysed to decide which ones best fit the nature and aims of the project.
M1. Quantum representation of the text
The aim of this module, related to OB1, is to analyse how natural language could be modelled under quantum theory, identifying the challenges and opportunities of employing this theory in text representation.
The tasks associated with this module are:
Task 1.1. Analysis of quantum theory and how to apply it to text representation. As mentioned above, previous work has shown that the nature of text fits within quantum theory, so we will compile the state of the art on existing methods to represent text, together with their potential and limitations.
Milestone: Comprehensive review of the state of the art and identification of the potential and limitations of using quantum theory to represent text.
Task 1.2. Defining quantum circuits for language understanding and generation tasks. In this task, we will explore in depth how to build quantum circuits from text using DisCoCat or other mathematical formalisms that could model the semantics of a text (Galofaro, Toffano, and Doan, 2018). With the acquired knowledge, we will define a roadmap for applying these circuits to NLP tasks and applications.
Milestone: Collection of quantum circuits for language understanding and generation tasks.
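To make the compositional idea behind DisCoCat concrete, the following minimal sketch shows the classical tensor contraction that a DisCoCat-derived circuit is meant to realise for a transitive sentence. It is not a quantum circuit and not the project's method: the function name, the two-dimensional noun and sentence spaces, and all numerical values are illustrative assumptions.

```python
# Toy DisCoCat-style composition. All names and numbers are hypothetical.
# Nouns are vectors in a 2-d noun space N; a transitive verb is a tensor in
# N (x) S (x) N, where S is a 2-d sentence space.

def compose_transitive(subject, verb, obj):
    """Contract subject and object vectors with the verb tensor.

    verb[i][s][j] links subject dimension i, sentence dimension s and
    object dimension j. Returns the sentence meaning as a vector in S.
    """
    sentence_dim = len(verb[0])
    meaning = [0.0] * sentence_dim
    for i, subj_i in enumerate(subject):
        for s in range(sentence_dim):
            for j, obj_j in enumerate(obj):
                meaning[s] += subj_i * verb[i][s][j] * obj_j
    return meaning

# Assumed noun basis: (animate, inanimate).
dog = [1.0, 0.0]
ball = [0.0, 1.0]

# Assumed sentence basis: (plausible, implausible).
chases = [
    [[0.2, 0.8], [0.8, 0.2]],  # animate subject
    [[0.1, 0.0], [0.9, 1.0]],  # inanimate subject
]

dog_chases_ball = compose_transitive(dog, chases, ball)
ball_chases_dog = compose_transitive(ball, chases, dog)
print(dog_chases_ball, ball_chases_dog)
```

With these made-up values, "dog chases ball" scores higher on the "plausible" dimension than "ball chases dog", which illustrates how word order and grammatical role shape the composed meaning. In a quantum setting, the vectors and the verb tensor would be replaced by parameterised states and gates.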
M2. Quantum and Quantum-inspired Algorithms for NLP
The goal of this module, related to the specific objectives OB2 and OB3, is twofold. On the one hand, we aim to determine how quantum theory could be integrated into classical algorithms, for instance, for optimisation purposes, resulting in hybrid classical-quantum algorithms. On the other hand, we would like to explore novel quantum or quantum-inspired algorithms or improve existing ones. In both cases, the developed algorithms would later be applied to the NLP tasks and applications and to the use case defined in modules 3 and 4, respectively (M3 and M4).
The tasks foreseen in this module are:
Task 2.1. Analysis of the state of the art on available quantum algorithms and frameworks for NLP. A comprehensive review of the literature on quantum algorithms will be performed. The different approaches will show how quantum theory has been applied and will allow us to select the most feasible ones as a basis for our project and to improve on them. The goal is to evaluate and compare pure quantum, quantum-inspired and hybrid models, together with their level of interpretability and transparency.
Milestone: State of the art on available quantum algorithms and frameworks for NLP.
Task 2.2. Exploration of the use of quantum algorithms for optimisation. In this task, we will study, test and evaluate various quantum optimisation approaches (e.g., quantum-assisted approximate optimisation (Ruan et al., 2023), quantum annealing (Hegde et al., 2022), or adiabatic quantum computing (Zaech et al., 2022)) that might improve the long-term efficiency of NLP tasks and applications. To this end, common implementations of well-known algorithms will be collected and compared. Similar NLP problems already addressed with quantum computing will also be analysed, and effective approaches will be adapted to the tasks, applications and scenarios proposed in modules M3 and M4. Relevant benchmark problems will be defined across various classes of size and difficulty. All approaches will be compared to the classical state of the art and to classical problem-specific adaptations.
Milestone: A comprehensive report on quantum computing methodologies and frameworks for optimisation and their potential impact.
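As an illustration of what a small benchmark instance and its classical baseline could look like, the sketch below encodes a problem as a QUBO (quadratic unconstrained binary optimisation), the input form commonly used for quantum annealing, and solves it exhaustively. The instance, the function names and the choice of an exhaustive baseline are assumptions made for illustration only, not the benchmarks of the project.

```python
from itertools import product

def qubo_energy(Q, x):
    """Energy x^T Q x of a binary assignment x for QUBO coefficients Q.

    Q is a dict keyed by index pairs (i, j) with i <= j.
    """
    return sum(coeff * x[i] * x[j] for (i, j), coeff in Q.items())

def brute_force_minimum(Q, n):
    """Exhaustive classical baseline; only feasible for small n (2**n states)."""
    best = min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))
    return best, qubo_energy(Q, best)

# Hypothetical 3-variable instance: diagonal terms reward selecting an item,
# off-diagonal terms penalise selecting two neighbouring items together.
Q = {(0, 0): -1.0, (1, 1): -1.0, (2, 2): -1.0, (0, 1): 2.0, (1, 2): 2.0}

best_x, best_energy = brute_force_minimum(Q, 3)
print(best_x, best_energy)
```

An exhaustive search gives the exact optimum for small instances, so it can serve as ground truth when judging the solution quality of quantum or quantum-inspired solvers on the same instance. Larger size classes would need heuristic classical baselines instead.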
M3. Integrating quantum theory in NLP tasks and applications
Once we have determined both the best way to represent the information contained in a text using quantum theory (M1) and the way to integrate quantum algorithms for NLP (M2), this module, related to OB4, will analyse the possible scenarios for the application of these findings. To do that, it is essential to first analyse how to model several linguistic phenomena, with special attention to areas where quantum advancements could provide significant improvements, achieving more accurate and precise results compared to current approaches.
Three main tasks will be addressed, one of them related to text understanding and the other two to text production:
Task 3.1. Exploration of quantum NLP approaches to resolve linguistic phenomena. To determine how quantum theory can help to improve NLP tasks related to text understanding and generation, it is necessary to examine those aspects of these tasks where current approaches are not able to provide robust results. Therefore, phenomena such as anaphora resolution, word sense disambiguation or metaphor resolution will be investigated.
Milestone: Proposal with a novel approach that treats complex linguistic phenomena using quantum techniques.
Task 3.2. Exploration of quantum NLP approaches to perform text summarisation. This task aims to formalise the summarisation problem using principles from quantum theory, offering a novel perspective on how summarisation can be viewed and optimised as an information selection and generation problem. Building on the knowledge acquired in modules 1 and 2 and in task 3.1, quantum circuits will be used to represent and model the summarisation process, potentially leading to breakthroughs in efficiency and accuracy. Then, quantum-inspired optimisation processes would be used to conduct summarisation modelling for fitness evaluation, taking as a basis the ideas already investigated using classical optimisation algorithms (Zamuda and Lloret, 2020; Zamuda, Dugonik and Lloret, 2024) as well as the advances and novel ideas on the use of quantum approaches drawn from the literature (Niroula et al., 2022; Ulker and Ozer, 2024).
Milestone: Analysis and proposal of a novel quantum-inspired summarisation approach.
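One possible way to view summarisation as an information selection problem is sketched below: extractive sentence selection is written as a QUBO in which relevance is rewarded, redundancy between selected sentences is penalised, and a quadratic penalty enforces the summary length. This is a minimal sketch under stated assumptions, not the formulation the project will adopt; the scores, the penalty weight and the function names are hypothetical, and a classical exhaustive search stands in for a quantum or quantum-inspired optimiser.

```python
from itertools import product

def build_summary_qubo(relevance, redundancy, k, penalty):
    """QUBO coefficients {(i, j): weight} for selecting k sentences.

    Minimises  -sum_i relevance[i] * x_i
               + sum_{i<j} redundancy[i][j] * x_i * x_j
               + penalty * (sum_i x_i - k)**2   (constant term dropped).
    """
    n = len(relevance)
    Q = {}
    for i in range(n):
        # For binary x_i, x_i**2 == x_i, so linear terms sit on the diagonal.
        Q[(i, i)] = -relevance[i] + penalty * (1 - 2 * k)
        for j in range(i + 1, n):
            Q[(i, j)] = redundancy[i][j] + 2 * penalty
    return Q

def energy(Q, x):
    return sum(w * x[i] * x[j] for (i, j), w in Q.items())

# Hypothetical scores for four sentences; sentences 0 and 1 are near-duplicates.
relevance = [0.9, 0.8, 0.3, 0.6]
redundancy = [
    [0.0, 0.7, 0.1, 0.1],
    [0.7, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]

Q = build_summary_qubo(relevance, redundancy, k=2, penalty=2.0)
# Exhaustive search used here as a stand-in solver (small n only).
best = min(product((0, 1), repeat=len(relevance)), key=lambda x: energy(Q, x))
selected = [i for i, bit in enumerate(best) if bit]
print(selected)
```

With these illustrative scores, the two most relevant sentences are not both chosen because they overlap heavily; the selection favours a relevant but less redundant pair. The energy function plays the role of the fitness evaluation mentioned in the task description.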
Task 3.3. Exploration of quantum NLP approaches to perform automatic text simplification. Following the same guidelines as for the previous task (task 3.2), this task aims to analyse how to optimise the automatic text simplification process using the principles of quantum theory. Although several studies have been carried out so far on the automatic simplification of texts, not all the aspects to be simplified have been solved to the same extent. While linguistic obstacles such as numbers, superlatives, acronyms, enumerations or simple appositions have sufficiently robust solutions (see, for example, the SIMPLE.TEXT tool at https://simpletext.demos.gplsi.es/), other types of obstacles, such as some cases of difficult words or complex sentences, still require effort from the scientific community before the results obtained can really be of use to society (Saggion, 2024; Martínez et al., 2024). Therefore, in this task, novel perspectives based on quantum algorithms will be studied to obtain substantial improvements in automatic text simplification processes. This will build on studies carried out so far on the use of such algorithms in other complex NLP tasks such as machine translation (Varmantchaonala et al., 2024; Abbaszade et al., 2023) or summary generation (Piwowarski, Amini and Llamas, 2012).
Milestone: Novel approaches for the analysis and generation of summaries and simplified texts based on quantum theory.
M4. Use Case: Retrieval-Augmented Generation system for the innovation domain
This module, related to OB5, aims to design and develop a specialised Retrieval-Augmented Generation (RAG) system for retrieving and generating information on innovation. This system will integrate the advancements in QNLP tasks developed in M1, M2, and M3. This use case will serve as an experimental platform and testing ground to intrinsically and extrinsically evaluate the impact and effectiveness of the developments in NLP, ensuring a practical and results-oriented approach.
Task 4.1. Definition, development, and evaluation of a RAG system for the innovation domain. The objective of this task is to define, develop and evaluate a RAG system for the innovation domain that uses and integrates the results obtained in M1, M2 and M3. In addition, in the last RAG stage, concerning language generation, the impact on the LLM's generation of summaries and simplified text will be evaluated, and a comparison of classical and quantum-based approaches will be conducted.
Milestone: Proposal and development of a RAG system for the innovation domain.
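For orientation, the sketch below outlines the two stages of a RAG pipeline that precede generation: retrieving the passages most similar to a query and assembling them into a prompt for an LLM. It is a deliberately simple classical stand-in, assuming bag-of-words cosine similarity as the retriever; the documents, the query, the prompt layout and all function names are hypothetical, and the LLM call itself is left out.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lower-cased token counts; a placeholder for a real text representation."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Assemble retrieved passages and the question into one prompt string."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical mini-collection for the innovation domain.
docs = [
    "open innovation relies on external partners",
    "quantum annealing solves optimisation problems",
    "innovation funding supports small companies",
]
query = "how is innovation funding allocated"

top = retrieve(query, docs, k=2)
prompt = build_prompt(query, top)
print(prompt)
```

In such a pipeline, the representation and similarity functions are the natural places to swap in the representations from M1 and the algorithms from M2, while the generation step that consumes the prompt is where the summarisation and simplification approaches from M3 could be compared against classical ones.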
M5. Dissemination
The PIs and team members will ensure that enough project time is dedicated to the dissemination of project results. A project-dedicated website will be created that will eventually collect all data sets, corpora, tools and publications generated. Publications in the most relevant and prestigious NLP conferences and high-impact journals will be targeted as the main instrument for communicating relevant results about the project. Yearly thematic internal seminars will be organised within the GPLSI research group to create a strong link between the research conducted and the research topics addressed by the group.
M6. Project management
The PIs will be responsible for the management, ongoing monitoring and assessment of work progress, and for its mid-term and final-stage evaluation, ensuring quality and compliance with deadline and budget requirements. Project team meetings will be held as needed, and at least once every two weeks.
