HAIKU (Hybrid Artificial Intelligence on Knowledge and Linguistics)

Introduction

Knowledge management plays a key role in Artificial Intelligence (AI). Indeed, Knowledge Representation and Reasoning (KRR) remains one of the key subfields of AI, and numerous strategies for managing knowledge have been proposed and successfully evaluated in practical applications. More modern approaches to knowledge representation include ontologies based on Description Logics (DLs), Linked Open Data, and, more recently, knowledge graphs (KGs). These techniques were mainly developed by the Semantic Web (SW) community (hence they are usually called semantic technologies), but they can also be applied to non-Web-based applications. Complex AI systems usually need to combine a KRR module with other AI techniques, such as Machine Learning (ML) and Natural Language Processing (NLP), to achieve their goals. Such systems are examples of Hybrid (or neuro-symbolic) Artificial Intelligence, where symbolic Artificial Intelligence (represented by semantic technologies) is combined with numerical Artificial Intelligence (represented by machine learning).

The main objective of the project is to propose novel techniques for intelligent applications that combine semantic technologies (ontologies, KGs, etc.) and NLP techniques (Transformer-based language models), improving on state-of-the-art solutions to problems such as information heterogeneity, context-dependent information, unreliable results, uncertainty, the need for interpretability, vocabulary adaptability, and multilinguality. Achieving this objective will involve work along two fundamental lines. In the first line, the main effort will be devoted to developing mechanisms that improve querying and information access. In the second line, the effort will be devoted to developing mechanisms that improve NLP.

In particular, we plan to address the following research problems:

Improving current knowledge-based systems by solving specific problems in the field of semantic technologies, including building or learning new knowledge bases, exploiting them to discover knowledge, computing the semantic loss incurred when imprecise answers are provided, and developing novel methods for imprecise knowledge representation and flexible query answering.
Improving the integration between NLP and knowledge-based systems. This includes combining semantic technologies with different language models (both non-contextualized, e.g., word2vec, and contextualized, e.g., Transformer-based ones), using semantic technologies to solve NLP problems (such as using KGs to model multilingual data), and using NLP techniques to solve problems in the field of semantic technologies (such as using language models to find relationships between pairs of ontologies).

Within the framework of the State Plan for Scientific and Technical Research and Innovation 2024-2027, this project aims to solve problems within the strategic area "Digital transition and Artificial Intelligence".

Team members


Eduardo Mena (Main researcher 1)
Fernando Bobillo (Main researcher 2)
Carlos Bobed
Ignacio Huitzil
Jorge Bernad
Jorge Gracia
Ángel Corona

Collaborators

Maribel Acosta
Emma Anglés
Ángel Luis Garrido
Maxim Ionov
Miguel López-Otal
Álvaro Peiró
Lucía Pitarch
Carlota Quintana
Umberto Straccia