|Title||Distance-based and probabilistic record linkage for re-identification of records with categorical variables|
|Publication Type||Conference Paper|
|Year of Publication||2002|
|Authors||Domingo-Ferrer J, Torra V|
|Conference Name||Butlletí de l'ACIA|
|Editor||Associació Catalana d'Intel·ligència Artificial|
Record linkage methods identify the same individual across different data files (re-identification). This paper studies and compares the two main existing approaches to record linkage: probabilistic and distance-based. The two approaches are compared on categorical data; to that end, a distance over ordinal and nominal scales is defined. The paper shows that, for categorical data, distance-based and probabilistic record linkage lead to similar results. This parallels comparisons in the literature for numerical data, which also found similar behaviour for the two approaches. As a consequence, the distance proposed for ordinal and nominal scales is implicitly validated.
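The distance-based approach described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual distance definition, so everything here is an assumption made for illustration: a 0/1 mismatch distance for nominal values, a normalized rank difference for ordinal values, an unweighted average across attributes, and nearest-record linking. The function names and the toy data are hypothetical.

```python
from typing import Optional, Sequence


def nominal_distance(a, b) -> float:
    # Assumed nominal distance: 0 if the categories match, 1 otherwise.
    return 0.0 if a == b else 1.0


def ordinal_distance(a, b, scale: Sequence) -> float:
    # Assumed ordinal distance: rank difference normalized to [0, 1].
    if len(scale) < 2:
        return 0.0
    return abs(scale.index(a) - scale.index(b)) / (len(scale) - 1)


def record_distance(r1: Sequence, r2: Sequence,
                    scales: Sequence[Optional[Sequence]]) -> float:
    # scales[i] is None for a nominal attribute, or the ordered list of
    # categories for an ordinal attribute. Attributes are weighted equally.
    total = 0.0
    for x, y, scale in zip(r1, r2, scales):
        if scale is None:
            total += nominal_distance(x, y)
        else:
            total += ordinal_distance(x, y, scale)
    return total / len(scales)


def link_records(file_a, file_b, scales):
    # For each record in file_a, return the index of the nearest record
    # in file_b (ties go to the lowest index).
    return [
        min(range(len(file_b)),
            key=lambda j: record_distance(a, file_b[j], scales))
        for a in file_a
    ]


# Hypothetical toy data: one nominal attribute (colour), one ordinal (education).
EDUCATION = ["primary", "secondary", "tertiary"]
scales = [None, EDUCATION]
file_a = [("red", "primary"), ("blue", "tertiary")]
file_b = [("blue", "secondary"), ("red", "primary"), ("blue", "tertiary")]

print(link_records(file_a, file_b, scales))  # [1, 2]
```

In a re-identification experiment, the returned indices would be compared against the true correspondence between the two files to count correct links; a probabilistic method could then be scored on the same files for comparison.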