Graph databases are becoming widely successful as data models that make it possible to effectively represent and process complex relationships among various types of data. Data-graphs are a particular type of graph database whose representation allows data values, both in the paths and in the nodes, to be treated as first-class citizens by the query language. As with any other type of data repository, data-graphs may suffer from errors and discrepancies with respect to the real-world data they are intended to represent. In this talk, we explore the notion of probabilistic unclean data-graphs to capture the idea that the observed (unclean) data-graph is actually a noisy version of a clean one that correctly models the world but of which we have only partial knowledge. The factors that lead to such an observation depend heavily on the application domain and may result from different types of clerical errors or unintended transformations of the data. We therefore consider an epistemic probabilistic model that describes the distribution over all possible ways in which the clean (uncertain) data-graph could have been polluted. Based on this model, we study data cleaning and probabilistic query answering in this framework, and we present complexity results for the cases in which the data-graph is transformed by removing nodes and edges (subset), adding them (superset), or modifying them (update).
Dr. Nina Pardal is currently a Research Associate in the Department of Computer Science at the University of Sheffield, UK. She obtained a PhD in Mathematics from the University of Buenos Aires, Argentina, and a PhD in Computer Science from the University Paris-Nord, France. Her research interests lie in the areas of Graph Theory, Logic and Computability, Complexity, and Knowledge and Reasoning.