LibGuides: Data management for students: Anonymisation, pseudonymisation & encryption

1. Anonymisation

In general, there are two processes that reduce the sensitivity of personal research data and allow data to be shared whilst protecting participants' personal information: anonymisation and pseudonymisation. Anonymisation is the elimination or modification of personal data so that the data subject is no longer identifiable. In that case, the GDPR no longer applies. However, it is difficult to anonymise research data completely: if the data can still be attributed to an individual using additional information, for example by linking the data file to the key file or communication file, the data can be indirectly traced to a person and are therefore not anonymous. Anonymisation should be considered in the context of the whole project, including how it can be used alongside informed consent and access controls. For example, if a participant consents to their data being shared, anonymisation may not be required.
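As an illustration of "elimination or modification", the sketch below removes direct identifiers from a record and coarsens an exact age into a ten-year band. The column names (`name`, `address`, `phone`, `age`) and the sample record are hypothetical, and a transformation like this does not by itself guarantee that the data are anonymous in the sense of the GDPR; the remaining fields may still identify someone in combination.

```python
# Hypothetical column names; adapt these to your own dataset.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def anonymise(record):
    """Return a copy of the record without direct identifiers and with age coarsened."""
    # Elimination: drop every field that directly identifies the participant.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Modification: replace the exact age with a ten-year band.
    if "age" in cleaned:
        lower = (cleaned.pop("age") // 10) * 10
        cleaned["age_band"] = f"{lower}-{lower + 9}"
    return cleaned


record = {
    "name": "J. Jansen",
    "address": "Example Street 1",
    "phone": "000-0000000",
    "age": 34,
    "score": 7,
}
print(anonymise(record))  # {'score': 7, 'age_band': '30-39'}
```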

For more information on anonymisation, see the CESSDA Data Management Expert Guide.

2. Pseudonymisation

Pseudonymisation and anonymisation are two distinct terms which fall under different categories in the GDPR. Whereas anonymisation, in theory, irreversibly destroys any way of identifying the data subject, pseudonymisation allows the data subject to be re-identified with additional information. As a result, pseudonymised data are still considered personal data. The purpose of pseudonymisation is to conceal a person's identity from third parties. Pseudonymisation separates identifying data from non-identifying data and replaces the identifying data with artificial identifiers.

An example of pseudonymisation is replacing a respondent's identifying data in a medical examination with a unique respondent number. The medical data are then linked to this respondent number instead of a name, address, and place of residence. As a result, outsiders cannot see to whom the medical data belong. Only someone who holds the link between respondent number and identity (e.g. the researcher) can connect the medical data to a person.
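The respondent-number example can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the field names, the `R` prefix, and the function name `pseudonymise` are assumptions made for the example. The important point is that the function returns two separate objects: the research data, which carry only the respondent number, and the key table, which must be stored separately and securely.

```python
import secrets


def pseudonymise(records, id_fields=("name", "address")):
    """Split records into research data and a separate key table.

    The field names in `id_fields` are hypothetical examples.
    """
    key_table = {}       # respondent number -> identifying data (keep separate and secure)
    research_data = []   # non-identifying data linked only to the respondent number
    for record in records:
        # Random respondent number that cannot be derived from the identity.
        pseudonym = "R" + secrets.token_hex(4)
        while pseudonym in key_table:
            pseudonym = "R" + secrets.token_hex(4)
        key_table[pseudonym] = {f: record[f] for f in id_fields}
        row = {"respondent": pseudonym}
        row.update({k: v for k, v in record.items() if k not in id_fields})
        research_data.append(row)
    return research_data, key_table


records = [
    {"name": "J. Jansen", "address": "Example Street 1", "blood_pressure": "120/80"},
    {"name": "A. de Vries", "address": "Example Street 2", "blood_pressure": "135/85"},
]
research_data, key_table = pseudonymise(records)
print(research_data)  # contains respondent numbers, no names or addresses
```

Anyone who receives only `research_data` cannot see to whom the measurements belong; re-identification requires `key_table`.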

For more information on the difference between anonymisation and pseudonymisation, see https://www.tilburguniversity.edu/about/conduct-and-integrity/privacy-and-security/research-data/personal-data.

Direct and indirect identifiers

Personal data can be disclosed through two categories of identifiers.

Direct identifiers, such as a participant’s name, address, or telephone number, identify a person on their own;
Indirect identifiers can reveal an individual when they are combined with other information, for example by cross-referencing occupation, salary, age, and location.
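One rough way to spot risky indirect identifiers is to count how many participants share each combination of values: a combination that occurs only once points to a single person. The sketch below does this with the standard library. The function name `rare_combinations`, the threshold `k`, and the sample data are assumptions for illustration; passing this check does not prove that a dataset is anonymous.

```python
from collections import Counter


def rare_combinations(records, indirect_identifiers, k=2):
    """Return value combinations shared by fewer than k participants."""
    counts = Counter(
        tuple(record[field] for field in indirect_identifiers) for record in records
    )
    return [combo for combo, n in counts.items() if n < k]


# Hypothetical sample data.
records = [
    {"occupation": "nurse", "age_band": "30-39", "location": "Tilburg"},
    {"occupation": "nurse", "age_band": "30-39", "location": "Tilburg"},
    {"occupation": "mayor", "age_band": "50-59", "location": "Tilburg"},
]
print(rare_combinations(records, ["occupation", "age_band", "location"]))
# [('mayor', '50-59', 'Tilburg')]
```

Here the mayor is unique in the dataset, so that record could be traced to a person even though it contains no name.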

Sending data safely

When you want to send sensitive/confidential data to others, it is not safe to do so via email. A secure way to share sensitive and/or large files (up to 100 GB) is via SURF Filesender.

3. Encryption

Another way to protect confidential and secret data is encryption. Encrypting data makes the files unreadable unless you have the encryption key or password. There are several programs available for encrypting data files. Tilburg University recommends using 7Zip. Zip software is primarily meant to compress files so that they take up less space, but it can also be used to encrypt files. 7Zip is available free of charge via the TiU Software Center.
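For those who prefer scripting over the graphical interface, the sketch below builds a 7-Zip command line from Python. It assumes that the command-line executable is installed and reachable as `7z` (the name and location differ per system), and the file names are hypothetical. The switches used are `a` (add to archive), `-t7z` (7z format), `-mhe=on` (also encrypt the file names), and `-p` without a value, which should make 7-Zip prompt for the password so that it does not end up in the script or the command history.

```python
import subprocess


def build_7zip_command(archive, files, executable="7z"):
    """Build a 7-Zip command that creates a password-protected .7z archive."""
    return [
        executable,   # assumption: the 7-Zip command-line tool is on the PATH
        "a",          # add files to an archive
        "-t7z",       # use the 7z archive format
        "-mhe=on",    # encrypt the archive headers, so file names are hidden too
        "-p",         # no password given here, so 7-Zip asks for one interactively
        archive,
        *files,
    ]


def encrypt_files(archive, files):
    """Run 7-Zip; raises CalledProcessError if 7-Zip reports a failure."""
    subprocess.run(build_7zip_command(archive, files), check=True)


print(build_7zip_command("interviews.7z", ["interviews.csv"]))
```

Calling `encrypt_files("interviews.7z", ["interviews.csv"])` would then create the archive. Share the password with the recipient through a different channel than the file itself.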

Manual for using 7Zip
