That is the central message outlined in a recent joint paper exploring hash functions issued by the European Data Protection Supervisor (EDPS) and Spain's data protection authority, Agencia española de protección de datos (AEPD).
'Hashing' obscures data by replacing the original information with a hashed value. It involves generating a value, or values, from a string of text using a mathematical function. The value has a fixed size regardless of the size of the initial string of text. Hashing is commonly used to protect passwords, which might otherwise be more easily deciphered if intercepted by hackers.
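The fixed-size property can be illustrated with a short sketch. It assumes SHA-256 as the hash function, which is our choice for illustration and not one named in this article; the helper name `sha256_hex` is likewise hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two inputs of very different sizes (illustrative values only).
short_digest = sha256_hex("a")
long_digest = sha256_hex("a" * 10_000)

# SHA-256 always yields 256 bits, i.e. 64 hexadecimal characters,
# whatever the length of the input.
print(len(short_digest), len(long_digest))
```

The same input always produces the same digest, which is what allows a hash to act as a stable stand-in for the original value.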
As the new paper highlighted, hashing has been used as a tool to protect personal data for some time, but there have been doubts about the circumstances in which hashing can truly be said to have pseudonymised or even fully anonymised the underlying data. The EDPS and AEPD have tried to clarify those circumstances in their paper.
The issue matters because data protection law does not apply to truly anonymised data. EU data protection authorities have previously confirmed that, for personal data to be rendered anonymous, it "must be processed in such a way that it can no longer be used to identify a natural person by using 'all the means likely reasonably to be used' by either the controller or a third party".
The General Data Protection Regulation also now acknowledges pseudonymisation as an option for helping businesses meet their obligations on data security.
With that legal context in mind, the EDPS and AEPD paper is of particular relevance to businesses using new technologies such as geolocalisation and blockchain that rely on hashes.
According to the authorities, hash functions aspire to be irreversible. However, the order that is implicit in the hashing process, and that makes the hash a unique identifier, raises the risk that the original message could be re-identified through an analysis of the hash.
The more personal information that is linked to the hash, the higher the risk of identifying the contents of the hash. The existence of identifiers or pseudo identifiers in the original message also increases the risk of re-identification.
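One way to see why re-identification is a realistic risk is to consider an original message drawn from a small, predictable set of values. The sketch below is illustrative only: it assumes unsalted SHA-256 and a hypothetical six-digit customer number, neither of which comes from the EDPS and AEPD paper. Anyone who can guess the format of the original data can hash every candidate and compare the results.

```python
import hashlib
from typing import Iterable, Optional


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def reidentify(target_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate and return the one that matches, if any."""
    for candidate in candidates:
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


# Hypothetical scenario: a dataset exposes the hash of a six-digit
# customer number instead of the number itself.
published_hash = sha256_hex("004271")

# The attacker enumerates every possible six-digit value.
all_customer_numbers = (f"{n:06d}" for n in range(1_000_000))
recovered = reidentify(published_hash, all_customer_numbers)
print(recovered)
```

The search space here is only one million values, so the exhaustive comparison finishes quickly on ordinary hardware. The same reasoning applies to other structured identifiers, such as telephone numbers or national ID numbers.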
To prevent the re-identification of hashed information, the EDPS and AEPD recommend, among other things, that the initial message or the hash value is encrypted with a confidential key, or that a constant or random value, known as a 'salt', is added to all original messages before the hash is created.
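A minimal sketch of these two ideas follows. It is not the authorities' own method: the salted variant simply prepends a random value to the message before hashing, and the keyed variant uses HMAC, a standard keyed-hash construction that we have swapped in for "encrypting with a confidential key" because it is available in Python's standard library. The function names, the salt and key lengths, and the sample message are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def salted_hash(message: str, salt: bytes) -> str:
    """Hash the message with a salt prepended (SHA-256, hex output)."""
    return hashlib.sha256(salt + message.encode("utf-8")).hexdigest()


def keyed_hash(message: str, key: bytes) -> str:
    """Compute an HMAC-SHA-256 of the message under a confidential key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumed sizes: a 16-byte salt and a 32-byte key, generated randomly.
salt = secrets.token_bytes(16)
key = secrets.token_bytes(32)

message = "004271"  # hypothetical identifier
plain = hashlib.sha256(message.encode("utf-8")).hexdigest()
salted = salted_hash(message, salt)
keyed = keyed_hash(message, key)

# Neither protected value matches the plain hash, so a table of
# pre-computed plain hashes no longer identifies the message.
print(plain != salted, plain != keyed)
```

The protection depends on the salt or key staying confidential. If an attacker learns the value and the set of possible messages is small, the candidates can simply be hashed again with that value included.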
According to the EDPS and AEPD, a range of factors play into whether hashing can adequately protect personal data. These need to be considered by businesses when looking to apply the measure. The factors include:
The EDPS and AEPD said that those wishing to implement hashing techniques should first conduct a risk assessment to evaluate the potential risk of re-identification. This assessment should consider the hashing process itself and the other elements involved in the hashing system, they said. The type of information linked or linkable to the hash value should also be specifically considered.
To assess whether a hashing technique would render the personal data anonymous, the risk assessment must consider the organisational measures that guarantee the deletion of the information that allows for re-identification. There should also be a reasonable guarantee that the system will remain robust beyond the expected lifecycle of the personal data.
Beyond the risk analysis, the authorities said there are basic elements to consider to boost the effectiveness of hash functions for protecting information. They include: