Guidance for businesses on hashing data

Out-Law Analysis | 25 Nov 2019 | 4:18 pm | 3 min. read

Businesses need to assess how easily people might be re-identified after their data is 'hashed' to determine what further steps they need to take under data protection law to preserve the security of that data and honour individuals' data protection rights.

That is the central message of a recent joint paper on hash functions issued by the European Data Protection Supervisor (EDPS) and Spain's data protection authority, the Agencia Española de Protección de Datos (AEPD).

'Hashing' obscures data by replacing the original information with a hashed value. It involves generating a value or values from a string of text using a mathematical function. The value has a fixed size, independent of the size of the initial string of text. Hashing is commonly used to protect passwords, which might otherwise be more easily read if intercepted by hackers.
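
The fixed-size property can be seen in a short sketch. SHA-256 is assumed here purely for illustration, since the paper is not tied to one algorithm, and the helper name sha256_hex is hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_digest = sha256_hex("a")
long_digest = sha256_hex("a" * 10_000)

# Both digests are 64 hex characters (256 bits), whatever the input length.
print(len(short_digest), len(long_digest))

# The same input always produces the same digest.
print(sha256_hex("a") == short_digest)
```

Because the same input always gives the same output, a hash can act as a stable identifier for the original value, which is where the re-identification questions discussed in the paper arise.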

As the new paper highlighted, hashing has been used as a tool to protect personal data for some time, but there have been doubts about the circumstances in which hashing can truly be said to have pseudonymised or even fully anonymised the underlying data. The EDPS and AEPD have tried to clarify those circumstances in their paper.

The issue matters because data protection law does not apply to truly anonymised data. EU data protection authorities have previously confirmed that, for personal data to be rendered anonymous, it "must be processed in such a way that it can no longer be used to identify a natural person by using 'all the means likely reasonably to be used' by either the controller or a third party".

The General Data Protection Regulation also now acknowledges pseudonymisation as an option for helping businesses meet their obligations on data security.

With that legal context in mind, the EDPS and AEPD paper is of particular relevance to businesses using new technologies such as geolocalisation and blockchain that rely on hashes.

According to the authorities, hashing functions aspire to be irreversible. However, the order that is implicit in the hashing process, and that makes a hash a unique identifier, raises the risk that the original message could be identified through an analysis of the hash.

The more personal information that is linked to the hash, the higher the risk of identifying the contents of the hash. The existence of identifiers or pseudo identifiers in the original message also increases the risk of re-identification.

To prevent the re-identification of hashed information, the EDPS and AEPD recommend, among other things, that the initial message or the hash value is encrypted with a confidential key, or that a constant or random value, known as a 'salt', is added to all original messages before the hash is created.
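
A minimal sketch of these two ideas follows. It assumes SHA-256, and it uses HMAC, a keyed hash, as a stand-in for the paper's recommendation of encrypting with a confidential key; HMAC is not encryption, and the paper does not prescribe it. The function names, the example salt and the example key are all hypothetical.

```python
import hashlib
import hmac


def salted_hash(message: str, salt: bytes) -> str:
    """Hash the salt followed by the message with SHA-256."""
    return hashlib.sha256(salt + message.encode("utf-8")).hexdigest()


def keyed_hash(message: str, key: bytes) -> str:
    """Compute HMAC-SHA-256 of the message under a secret key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical values for illustration only. Real salts and keys should come
# from a secure random generator and be stored separately from the data.
EXAMPLE_SALT = b"example-salt-value"
EXAMPLE_KEY = b"example-secret-key"

email = "jane.doe@example.com"
plain = hashlib.sha256(email.encode("utf-8")).hexdigest()
salted = salted_hash(email, EXAMPLE_SALT)
keyed = keyed_hash(email, EXAMPLE_KEY)

# Someone who hashes the bare email address gets `plain`, which matches
# neither the salted nor the keyed value.
print(plain != salted, plain != keyed)
```

In both cases, a person who does not hold the salt or key cannot simply hash a guessed email address and compare the result against the stored value.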

According to the EDPS and AEPD, a range of factors determine whether hashing can adequately protect personal data. Businesses need to consider these when looking to apply the measure. The factors include:

  • the computation of the hash: the hashing technique, the algorithm and system used.
  • the message space: its entropy (the degree of order or disorder in a dataset), whether random elements are added, and the redundancy and repetitive structure of the message.
  • the link between the hash and other information in the processing environment: whether identifiers or pseudo identifiers are directly linked to the hashed information, or if information is indirectly linked to the hashed information.
  • the passwords and other random elements that have been introduced.
  • the ongoing management and audit of passwords, including physical security and human factors.
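
The message-space factor can be illustrated with a short sketch. It assumes that dates of birth in ISO format are hashed with unsalted SHA-256; the function names and the date range are hypothetical. Because only a few tens of thousands of dates are possible, every candidate can be hashed and compared.

```python
import hashlib
from datetime import date, timedelta
from typing import Optional


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def recover_birth_date(
    target_hash: str,
    start: date = date(1920, 1, 1),
    end: date = date(2019, 12, 31),
) -> Optional[str]:
    """Hash every date in the range and return the one that matches, if any."""
    day = start
    while day <= end:
        candidate = day.isoformat()
        if sha256_hex(candidate) == target_hash:
            return candidate
        day += timedelta(days=1)
    return None


# A hash that was meant to hide a date of birth.
leaked_hash = sha256_hex("1984-07-23")

# Roughly 36,500 candidate dates are enough to reverse it.
recovered = recover_birth_date(leaked_hash)
print(recovered)
```

The hash function itself is not broken in this example. The weakness lies in the small, highly structured set of possible messages, which is why the authorities treat entropy as a factor in its own right.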

The EDPS and AEPD said that those wishing to implement hashing techniques should first conduct a risk assessment to evaluate the potential re-identification risk. This assessment should consider the hashing process itself and the other elements of the hashing system, they said, as well as the type of information linked or linkable to the hash value.

To assess whether a hashing technique would render the personal data anonymous, the risk assessment must consider the organisational measures that guarantee the deletion of the information that allows for re-identification. There should also be a reasonable guarantee that the system will remain robust beyond the expected lifecycle of the personal data.

Beyond the risk analysis, the authorities said there are basic elements to consider to boost the effectiveness of hash functions for protecting information. They include:

  • a high level of information entropy when establishing the hash.
  • the use of single-use salt/random values.
  • when appropriate, the size of a salt may exceed the size of the hash block, provided that the former is not a multiple of the latter.
  • the use of appropriate random information generators for the implementation of cryptographic techniques.
  • secure access to the hashing process.
  • zero links with identifiers, pseudo identifiers and other information, especially in the same record and across records, tables or parallel chains.
  • the regular performance of audits of the hash system management procedures.
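
The single-use salt and random generator points above can be sketched as follows. The sketch assumes SHA-256 and a 16-byte salt, neither of which comes from the paper, and the function name is hypothetical. Python's secrets module draws on the operating system's cryptographically secure random source.

```python
import hashlib
import secrets
from typing import Tuple


def hash_with_fresh_salt(message: str, salt_bytes: int = 16) -> Tuple[bytes, str]:
    """Generate a new random salt and return it with the salted SHA-256 digest."""
    salt = secrets.token_bytes(salt_bytes)
    digest = hashlib.sha256(salt + message.encode("utf-8")).hexdigest()
    return salt, digest


# The same message hashed twice gets two different salts, so the two
# digests cannot be matched against each other.
salt_a, digest_a = hash_with_fresh_salt("jane.doe@example.com")
salt_b, digest_b = hash_with_fresh_salt("jane.doe@example.com")
print(digest_a != digest_b)
```

Whether each salt is then deleted or retained under access controls is an organisational decision of the kind the risk assessment is meant to cover.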