
Researchers Re-identify 99.98% Of People In ‘Anonymized’ Datasets

Various companies collect data from our devices almost all the time. Privacy is always a concern, but they try to assure us that our data is in safe hands. They also say that if the data is shared with third parties, any information that could be used to identify people is redacted and de-identified.

It turns out that the techniques used to anonymize data aren’t that foolproof, according to researchers at Imperial College London who have published a paper on reverse-engineering incomplete datasets.

The researchers developed a machine learning model that can reverse-engineer an incomplete dataset. Using 15 demographic attributes, such as age, gender, and marital status, they found that 99.98% of Americans could be correctly re-identified in an anonymized dataset.

For this purpose, the researchers used 210 different datasets covering a “large range of uniqueness.” The data include information on around 11 million Americans.
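Why do a handful of attributes identify people so well? The short Python sketch below is an illustration only, not the researchers’ machine learning model. It measures how many records in a small, hypothetical dataset are unique on a chosen set of attributes. The field names and values are made up for this example, and the `zip_code` field is an extra attribute assumed here. A record that is unique on attributes an attacker already knows can be linked back to one person.

```python
from collections import Counter


def uniqueness(records, attributes):
    """Return the fraction of records whose combination of the given
    attributes appears exactly once in the dataset."""
    if not records:
        return 0.0
    # Build one key per record from the chosen attributes.
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    # A record is "unique" if no other record shares its key.
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Hypothetical toy records; field names and values are illustrative only.
people = [
    {"age": 34, "gender": "F", "marital_status": "married", "zip_code": "10001"},
    {"age": 34, "gender": "F", "marital_status": "single", "zip_code": "10001"},
    {"age": 34, "gender": "M", "marital_status": "married", "zip_code": "10002"},
    {"age": 51, "gender": "M", "marital_status": "married", "zip_code": "10002"},
    {"age": 34, "gender": "F", "marital_status": "married", "zip_code": "10001"},
]

print(uniqueness(people, ["gender"]))                 # 0.0
print(uniqueness(people, ["age", "gender"]))          # 0.4
print(uniqueness(people, ["age", "gender", "marital_status", "zip_code"]))  # 0.6
```

Even in this tiny example, the share of unique records rises from none to most as attributes are added. That is the intuition behind the finding that 15 demographic attributes are enough to single out nearly everyone.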

However, the goal of the study isn’t to show that so-called “anonymous” datasets can be de-anonymized. That has already been demonstrated, for example at DEF CON in 2017, where hackers legally obtained the browsing history of 3 million Germans and de-anonymized it.

Instead, the researchers set out to show how easily the techniques used to anonymize datasets can now be defeated. Their findings are a call to action for governments and companies to adopt more robust techniques that keep people’s identities secure.

They have also set up a website where you can check how easily you could be identified in an “anonymous” dataset.
