Application of Record Linkage Techniques for Road Infrastructure Data Integration: A Study Using R and Real Data
Abstract
Integrating heterogeneous road infrastructure databases is a significant challenge because of inconsistent nomenclature, typographical errors, and the lack of unique identifiers. This paper presents a practical application of Record Linkage techniques, implemented in the R programming language, to link urban road records drawn from several sources. Deterministic methods, probabilistic methods (based on the Fellegi-Sunter theory), and fuzzy matching techniques were applied to real data from municipalities in Galicia (Spain), using the R packages RecordLinkage, fuzzyjoin, and stringdist. The process comprised text normalization, blocking by municipality, and comparison based on string similarity measures (Levenshtein, Jaro-Winkler), and it produced a robust unification of records. The results show a substantial improvement in data quality, integrity, and usability, with precision above 90% and recall above 85% in the best scenarios. The approach has direct applications in infrastructure planning, road maintenance management, territorial analysis, and smart city development. The talk will include visual demonstrations and practical cases showing how Record Linkage techniques can contribute to building integrated urban databases, which are essential in the context of Society 5.0.

Keywords: Record Linkage, road infrastructure, fuzzy matching, data quality, data integration.
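
The pipeline summarized in the abstract (text normalization, blocking by municipality, comparison by string similarity) can be illustrated with a minimal sketch. It is not the authors' implementation: the study used the R packages RecordLinkage, fuzzyjoin, and stringdist, whereas this sketch uses only the Python standard library. The function names, the 0.85 threshold, the normalized Levenshtein score, and the sample records are all assumptions made for illustration.

```python
import unicodedata
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(cleaned.split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Levenshtein distance rescaled to [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def link(source_a, source_b, threshold=0.85):
    """Compare street names only within the same municipality (blocking)."""
    blocks = defaultdict(list)
    for rec in source_b:
        blocks[normalize(rec["municipality"])].append(rec)

    matches = []
    for rec_a in source_a:
        name_a = normalize(rec_a["street"])
        for rec_b in blocks.get(normalize(rec_a["municipality"]), []):
            score = similarity(name_a, normalize(rec_b["street"]))
            if score >= threshold:  # assumed cut-off, not taken from the paper
                matches.append((rec_a["id"], rec_b["id"], round(score, 3)))
    return matches


# Invented records for illustration only; not data from the study.
source_a = [
    {"id": "A1", "municipality": "Santiago de Compostela", "street": "Rúa do Hórreo"},
    {"id": "A2", "municipality": "Lugo", "street": "Rúa da Raíña"},
]
source_b = [
    {"id": "B1", "municipality": "SANTIAGO DE COMPOSTELA", "street": "Rua do Horeo"},
    {"id": "B2", "municipality": "Lugo", "street": "Praza Maior"},
    {"id": "B3", "municipality": "Ourense", "street": "Rúa da Raíña"},
]

matches = link(source_a, source_b)
print(matches)
```

In this sketch, blocking keeps record B3 from being compared with A2 even though the street names are identical, because the two records belong to different municipalities. A probabilistic (Fellegi-Sunter) approach would instead weight agreement on several fields rather than apply a single threshold to one score.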
License
(c) Copyright María José Ginzo Villamayor 2025

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International license.