A project coordinated by IIIA.
Principal investigator:
Collaborating organisations:
AQuAS
Many studies in the health field are observational, that is, empirical and non-experimental (the researchers do not intervene in the data-generating process), and the corpus of observational data is growing at high speed. Traditionally, the algorithms used in this type of study have been statistical algorithms that work by searching for correlations in the data. Recently, algorithms based on machine learning, which also work by looking for correlations in data, have gained popularity. This has brought an increase in predictive capacity and a shift in approach from observation to prediction.
These approaches, however, do not explicitly take into account a fundamental property of the data-generating process: causal relationships. These relationships can be of great interest to researchers, since many studies in fact try to answer questions that are primarily causal: “Has the implementation of the protocol of interest produced a change in the variable of interest?” “How will a specific individual react to the application of the protocol, or how would an individual to whom it was applied have reacted had it not been applied?” “Do genes or eating habits cause this or that disease?” Approaches that ignore causal relationships carry an epistemological limitation, and trying to answer causal questions by using correlation as an approximation of causality remains, to this day, a limiting strategy.
The objectives of this thesis are twofold. On the one hand, to compare and benchmark causal analysis algorithms based on do-calculus and on machine learning, focusing on efficiency and versatility. On the other hand, to develop a general-purpose algorithm (for healthcare) that combines the two types of algorithms mentioned above, under a set of assumptions and conditions. This work will be carried out using open-source programming languages and specialised libraries such as the DoWhy library for Python.
The resulting algorithm will be tested and validated on several datasets managed by AQuAS and on the GCAT cohort, with the aim of answering causal questions relevant to the health field.
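As an illustration of why correlation can be a limiting approximation of causality, the following minimal sketch simulates a confounded observational dataset and compares a naive difference in means with a backdoor-adjusted estimate. Everything in it is an assumption made for illustration: the variables (a binary confounder `z`, a treatment `t`, an outcome `y`), the effect sizes, and the function names are hypothetical and are not taken from the project or from the AQuAS or GCAT data. It uses only the Python standard library rather than DoWhy, and it assumes the confounder is observed, which is what makes the backdoor adjustment valid here.

```python
import random


def simulate(n, seed=0):
    """Draw (z, t, y) triples from a hypothetical confounded process."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0           # confounder (e.g. severity)
        p_treat = 0.8 if z == 1 else 0.2             # z makes treatment more likely
        t = 1 if rng.random() < p_treat else 0
        y = 1.0 * t + 3.0 * z + rng.gauss(0.0, 1.0)  # true effect of t on y is 1.0
        rows.append((z, t, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_difference(rows):
    """Correlational estimate: observed mean of y, treated minus untreated."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def backdoor_adjusted(rows):
    """Backdoor adjustment: per-stratum effects of t, weighted by P(z)."""
    total = 0.0
    for value in (0, 1):
        stratum = [(t, y) for z, t, y in rows if z == value]
        treated = [y for t, y in stratum if t == 1]
        control = [y for t, y in stratum if t == 0]
        total += (mean(treated) - mean(control)) * len(stratum) / len(rows)
    return total


rows = simulate(100_000)
naive = naive_difference(rows)
adjusted = backdoor_adjusted(rows)
print(f"naive difference in means: {naive:.2f}")
print(f"backdoor-adjusted effect:  {adjusted:.2f}")
```

Under these assumed parameters the naive estimate converges to about 2.8, because treated individuals are also more likely to have `z = 1`, while the adjusted estimate converges to the true effect of 1.0. With real observational data the set of confounders is rarely known this cleanly, which is where the assumptions and conditions mentioned above become important.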