Name Matching Experiment
(Part 6)

Eurospider has carried out a simple experiment with the popular Levenshtein distance string metric. Around 600 names taken from the media were used to search for hits in a test database of more than 1000 entries. For each of the 600 names, the test database contained the full and correct name, which differed from the name used in the media. The entries retrieved for each of the 600 names were ranked by ascending Levenshtein distance. Finally, yield and precision were measured for the case in which only the top n ranks are sifted. What can we learn from this?
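The ranking step described above can be sketched as follows. This is only a minimal illustration in Python, not the code used in the experiment: the function names, the lower-casing of names before comparison and the sample names are all assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def rank_by_distance(query: str, entries: list[str]) -> list[str]:
    """Return the entries sorted by ascending Levenshtein distance to query.
    Lower-casing is an assumption of this sketch, not part of the metric."""
    return sorted(entries, key=lambda e: levenshtein(query.lower(), e.lower()))


# Hypothetical names, purely for illustration.
database = ["Maria Keller", "John Smith", "Joan Smythe"]
ranking = rank_by_distance("Jon Smith", database)
print(ranking[0])  # the entry closest to the media spelling
```

Sifting the top n ranks then simply means looking at the first n entries of the list returned by `rank_by_distance`.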

[Chart: yield and precision plotted against the number of ranks sifted]

We can see that the more ranks are sifted, the more correct hits (true positives) are found. As expected, the precision declines: the more ranks are sifted, the more false hits (false positives) are found as well. The sharp drop in the precision curve means that the verification effort increases significantly, because each additional correct hit has to be picked out from a growing number of false ones.
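The trade-off can be made concrete with a small calculation. The sketch below assumes that yield means the share of searched names whose correct entry appears within the top n ranks, that precision means the share of sifted entries that are correct, and that every search returns at least n entries. The ranks are made up for illustration and are not the experiment's data.

```python
from typing import Optional


def yield_and_precision(ranks_of_correct: list[Optional[int]], n: int) -> tuple[float, float]:
    """ranks_of_correct holds, per searched name, the 1-based rank of the
    correct entry, or None if it was not retrieved at all."""
    queries = len(ranks_of_correct)
    # A search counts as a true positive if its correct entry is in the top n.
    true_positives = sum(1 for r in ranks_of_correct if r is not None and r <= n)
    sifted = queries * n  # entries a reviewer has to look at in total
    return true_positives / queries, true_positives / sifted


# Made-up ranks for six searched names (hypothetical data).
ranks = [1, 1, 2, 3, 7, None]
for n in (1, 3, 10):
    y, p = yield_and_precision(ranks, n)
    print(f"top {n}: yield={y:.2f} precision={p:.2f}")
```

With these invented ranks, going from n = 1 to n = 10 raises the yield from 2 of 6 to 5 of 6 correct entries, while the number of entries to verify grows from 6 to 60.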

Complete Revision of the Federal Data Protection Act

The draft of the complete revision of the Federal Data Protection Act is currently in political consultation. Data protection is to be strengthened by giving people more control over their private data and by reinforcing transparency regarding the handling of confidential data.


Eurospider Information Technology AG
Winterthurerstrasse 92
8006 Zürich

