Title: Distance-based and probabilistic record linkage for re-identification of records with categorical variables
Publication Type: Conference Paper
Year of Publication: 2002
Authors: Domingo-Ferrer J, Torra V
Conference Name: Butlletí de l'ACIA
Publisher: Associació Catalana d'Intel·ligència Artificial

Record linkage methods identify the presence of the same individual in different data files (re-identification). This paper studies and compares the two main existing approaches to record linkage: probabilistic and distance-based. The performance of both approaches is compared on categorical data; to that end, a distance over ordinal and nominal scales is defined. The paper shows that, for categorical data, distance-based and probabilistic record linkage lead to similar results. This parallels earlier comparisons in the literature for numerical data, which also found similar behaviour for the two approaches. As a consequence, the distance proposed for ordinal and nominal scales is implicitly validated.
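To make the distance-based approach concrete, here is a minimal Python sketch under stated assumptions. It uses a hypothetical per-attribute distance: a 0/1 mismatch for nominal attributes, and the absolute rank difference divided by (number of categories - 1) for ordinal attributes, averaged over all attributes. This is an illustrative choice and not necessarily the exact distance defined in the paper. Each masked record is linked to its nearest original record. The function names (`nominal_distance`, `ordinal_distance`, `record_distance`, `link`, `reidentification_rate`) and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical schema: one (kind, categories) pair per attribute, where kind is
# "nominal" (unordered) or "ordinal" (categories listed in their natural order).
Schema = List[Tuple[str, Sequence[str]]]


def nominal_distance(a: str, b: str) -> float:
    # Assumed nominal distance: 0 if the categories coincide, 1 otherwise.
    return 0.0 if a == b else 1.0


def ordinal_distance(a: str, b: str, order: Sequence[str]) -> float:
    # Assumed ordinal distance: rank difference scaled to [0, 1]
    # by dividing by (number of categories - 1).
    if len(order) <= 1:
        return 0.0
    return abs(order.index(a) - order.index(b)) / (len(order) - 1)


def record_distance(x: Sequence[str], y: Sequence[str], schema: Schema) -> float:
    # Average of the per-attribute distances.
    total = 0.0
    for (kind, cats), a, b in zip(schema, x, y):
        if kind == "nominal":
            total += nominal_distance(a, b)
        else:
            total += ordinal_distance(a, b, cats)
    return total / len(schema)


def link(masked: List[Sequence[str]], original: List[Sequence[str]], schema: Schema) -> List[int]:
    # Distance-based linkage: each masked record is linked to the index of the
    # nearest original record (ties go to the lowest index).
    return [
        min(range(len(original)), key=lambda j: record_distance(m, original[j], schema))
        for m in masked
    ]


def reidentification_rate(links: List[int]) -> float:
    # Assumes masked[i] was derived from original[i], so a link i -> i is correct.
    return sum(1 for i, j in enumerate(links) if i == j) / len(links)


# Toy data (hypothetical): one nominal and one ordinal attribute.
schema: Schema = [
    ("nominal", ["red", "green", "blue"]),
    ("ordinal", ["low", "medium", "high"]),
]
original = [("red", "low"), ("green", "high"), ("blue", "medium")]
masked = [("red", "medium"), ("green", "high"), ("blue", "low")]

links = link(masked, original, schema)
print(links, reidentification_rate(links))  # [0, 1, 2] 1.0
```

With this toy data every masked record is nearest to the original it came from, so the re-identification rate is 1.0. A probabilistic linker would be evaluated in the same way, by counting how many of its links point to the correct original record.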