What are the real data protection issues?
The fact that BC, both public and private, is inherently transparent and immutable may clash with the data minimization principle and may make it impossible to respond to individuals’ requests to have their data corrected or deleted. Moreover, a BC is by definition unable to forget; as a result, the right to be forgotten cannot be enforced. The transparency and immutability issues can, to a large extent, be addressed by implementing innovative privacy-by-design measures (see below for examples). Notably, these innovations are driven mostly by efficiency rather than privacy considerations.
In its most basic form, BC can be used to store plain-text information on the ledger, which can be accessed by anyone with read rights. Storing all information on the BC takes up a large amount of block space and consumes a lot of energy, both to run and to cool the machines. Block space can be saved by separating (segregating) the signature (‘witness’) information from the transaction data (the ‘payload’), so that the network can process more transactions. These measures may also, to a certain extent, mitigate transparency issues.
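To illustrate the segregation idea, here is a simplified Python sketch. It assumes a toy transaction format (not Bitcoin’s actual serialization), and the signatures are placeholder strings rather than real cryptographic signatures. The transaction identifier is computed over the payload only, so the witness data can be stored separately or discarded without changing the identifier. The weight function follows the rule of Bitcoin’s Segregated Witness upgrade, under which non-witness bytes count four times as much as witness bytes toward the block limit.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Transaction:
    """Toy transaction: 'payload' says who pays whom; 'witness' holds signatures."""
    payload: dict
    witness: list

    def _payload_bytes(self) -> bytes:
        return json.dumps(self.payload, sort_keys=True).encode()

    def _witness_bytes(self) -> bytes:
        return json.dumps(self.witness).encode()

    def txid(self) -> str:
        # Double SHA-256 over the payload only: stripping the witness
        # leaves the identifier unchanged.
        return hashlib.sha256(hashlib.sha256(self._payload_bytes()).digest()).hexdigest()

    def weight(self) -> int:
        # SegWit-style accounting: payload bytes count 4x, witness bytes 1x.
        return 4 * len(self._payload_bytes()) + len(self._witness_bytes())

    def stripped(self) -> "Transaction":
        # Copy without witness data, e.g. for a node that has already
        # checked the signatures and no longer needs to keep them.
        return Transaction(payload=self.payload, witness=[])


tx = Transaction(payload={"from": "addr_A", "to": "addr_B", "amount": 5},
                 witness=["signature_by_A"])
print(tx.txid() == tx.stripped().txid())      # True
print(tx.weight() > tx.stripped().weight())   # True
```

Because the identifier never covers the witness, nodes can agree on which transactions exist while the signature data is kept apart from the payload.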
The immutability of BC also sits uneasily with, for example, smart contracts in more complex transactions (as contracts often have to be amended for unforeseen circumstances); with technological malfunction, including interference by hackers; and, more generally, with human error (users are known to lose their BC private keys). Addressing these issues will require relaxing the immutability of BC, which may at the same time make it possible to respond to individuals’ requests for deletion and to honor the right to be forgotten.
Immutability is not always an issue
As a side note, the immutability of BC is not always an issue. For certain applications (in particular, public registries), immutability is actually a requirement. Illustrative here is the judgment of the European Court of Justice (ECJ) in the Manni case. The plaintiff, Mr. Manni, requested deletion of his personal information from the Italian public company register, where information on his prior bankruptcy was recorded. He argued that this record was widely reused by data brokers and that, as a result, his reputation had been damaged, to the detriment of his new business. The ECJ balanced the public interest in legal certainty in trade and in the transparency of business information in the company register against the fundamental right to data protection, and concluded that, in this case, the interference with the right to data protection was not disproportionate, given the limited amount of personal information held in the company register.
In line with this ruling, registering limited personal data in a BC for public registers, such as land ownership, trademark ownership, and company registers, may therefore be justified. The case implies that the interests at stake must be balanced for each BC application. For other use cases, the balancing test may show that BC is not suitable because the impact on data protection would be disproportionate. An example would be using BC to give air passengers expedited access through the airport while also recording on the BC, for a loyalty program, all money spent in airport shops and restaurants and on subsequent transport and accommodation. Using BC for such a commercial loyalty program would likely be disproportionate.
Privacy-by-Design Options
Limit ledger storage. The original Bitcoin BC stores the full ledger on every node, which makes it impossible to change prior blocks and thus provides an indisputable ledger of all prior transactions. However, this also means that any personal data included on the ledger is shared with a large number of nodes (Bitcoin has approximately 9,500 nodes). Storing so many copies of personal data is at odds with the GDPR’s data minimization principle, which requires access to personal data to be limited to the fewest possible recipients.
A privacy-by-design solution is to stop storing the entire ledger on all nodes. In most BC implementations, the validity of a new block is verified by a consensus mechanism: the creator of the block provides a unique hash of its contents, the other nodes perform the same computation, and if their outcome matches, the block is verified. This requires the nodes to have access to the information included in the block. However, the nodes could still fulfill their verification function if they deleted the information after verification. This would increase the confidentiality of the personal data included in the block and, at the same time, has economic advantages: if each node has to store a full copy of the ledger, a large amount of storage capacity is required, which in turn requires a large investment in data storage and consumes a lot of energy. Storing the ledger in one (or a few) instances, rather than on every node, therefore has both privacy and economic advantages.
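The ‘verify, then delete’ idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: a block is modeled as a payload plus the hash claimed by its creator, and `Node` is a hypothetical class that recomputes the hash and, if it matches, retains only the hash rather than the payload. Real consensus involves more checks than a single hash comparison.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Block:
    payload: bytes       # may contain personal data
    claimed_hash: str    # hash supplied by the block's creator


@dataclass
class Node:
    """Hypothetical node that keeps only the hashes of blocks it has verified."""
    verified_hashes: list = field(default_factory=list)

    def verify_and_discard(self, block: Block) -> bool:
        recomputed = hashlib.sha256(block.payload).hexdigest()
        if recomputed != block.claimed_hash:
            return False  # reject: the creator's hash does not match the content
        # Keep the fingerprint; the payload itself is not stored by this node.
        self.verified_hashes.append(recomputed)
        return True


payload = b"transfer: alice -> bob, 5 units"
block = Block(payload=payload, claimed_hash=hashlib.sha256(payload).hexdigest())
node = Node()
print(node.verify_and_discard(block))                    # True
print(node.verify_and_discard(Block(payload, "0" * 64)))  # False
```

In this model, the full payloads would live only in the one (or few) storage instances mentioned above, while the verifying nodes hold nothing but hashes.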
Pruning. Most BC applications store all transactions since the start of the chain, dating back to the ‘genesis block’, which means that all transactions on that BC are stored indefinitely (and, as set out above, sometimes on all nodes). Storing data indefinitely is, by definition, at odds with the GDPR’s data minimization requirement and also brings ever-increasing storage requirements. For example, during a stress test, the BC of an Ethereum client grew to 40 gigabytes in the first three months of the test.
A privacy-by-design solution to this storage issue is pruning, which enables a node to verify a new block without processing historical transactions. The node downloads as many block headers as it can and determines which header is at the end of the longest chain. Starting from that header, the node goes back 100 blocks to verify that the chain matches up. Because this process removes the need to retain the entire chain history for verification purposes, unused blocks can be removed, which drastically lowers the required storage and builds data minimization into the BC. To ensure that no data is lost, the unused blocks can be stored in one or more archive nodes, which retain all data in case the rest of the network needs it in the future; the ‘active’ nodes no longer have to process these archived blocks.
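The header-selection, look-back, and archiving steps described above can be sketched in Python. This assumes a hypothetical, simplified header (a hash, a parent hash, and a height) and treats the ‘longest’ chain as the one with the greatest height, whereas real clients compare accumulated proof-of-work. The 100-block depth comes from the text; the archive is a plain dict standing in for an archive node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    block_hash: str
    parent_hash: str
    height: int


CHECK_DEPTH = 100  # blocks walked back from the tip, per the text


def best_tip(headers: dict) -> Header:
    # Simplification: the tip of the longest chain is the highest header.
    return max(headers.values(), key=lambda h: h.height)


def check_recent_chain(headers: dict, depth: int = CHECK_DEPTH) -> bool:
    # Walk back from the tip, confirming each header links to a known parent.
    current = best_tip(headers)
    for _ in range(depth):
        if current.height == 0:
            return True  # reached genesis before the full depth
        parent = headers.get(current.parent_hash)
        if parent is None or parent.height != current.height - 1:
            return False
        current = parent
    return True


def prune(bodies: dict, headers: dict, archive: dict, depth: int = CHECK_DEPTH) -> None:
    # Move block bodies older than the check window to the archive; keep headers.
    cutoff = best_tip(headers).height - depth
    for block_hash in list(bodies):
        if headers[block_hash].height < cutoff:
            archive[block_hash] = bodies.pop(block_hash)


# Usage: a toy chain of 250 blocks.
headers, parent = {}, "none"
for height in range(250):
    header = Header(f"h{height}", parent, height)
    headers[header.block_hash] = header
    parent = header.block_hash
bodies = {f"h{i}": f"transactions of block {i}" for i in range(250)}
archive = {}
print(check_recent_chain(headers))  # True
prune(bodies, headers, archive)
print(len(bodies), len(archive))    # 101 149
```

The active node keeps all headers but only the 101 most recent block bodies; everything older sits in the archive.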
Privacy-friendly consensus. A privacy-by-design solution for the transparency issue is the non-interactive zero-knowledge proof, which makes it possible to verify the correctness of a computation, e.g., a hash, without re-executing the computation or learning the data it was performed on. For example, the proposed currency Zerocoin works as follows. When a coin is purchased, a serial number is attributed to the coin, which can only be revealed using a random number. Using these two numbers, the user can generate a zero-knowledge proof that he or she knows both the serial number and the random number. The network can then verify this proof without having access to the coin’s serial number or the random number.
The potential use of zero-knowledge proof is not limited to the transfer of coins using BC but can be used to verify any computation without having access to the underlying information. This enables nodes to reach consensus on a new block without accessing the information on that block, and thus without sharing the personal data included on that block with the nodes.
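To make the idea concrete, here is a minimal Python sketch of a non-interactive zero-knowledge proof of knowledge in the spirit of the Zerocoin description above. It is not the actual Zerocoin protocol: a serial number is hidden in a Pedersen-style commitment together with a random number, and the holder proves knowledge of both without revealing either. The Fiat-Shamir heuristic (hashing the transcript to obtain the challenge) makes the proof non-interactive. The group parameters are toy-sized and insecure, and all function names are illustrative.

```python
import hashlib
import secrets

# Toy group parameters (insecure, for illustration only): p = 2q + 1 is a
# safe prime, and g = 4 and h = 9 are squares mod p, so each generates the
# subgroup of prime order q. In a real system, nobody may know log_g(h).
P, Q = 2039, 1019
G, H = 4, 9


def commit(serial: int, rand: int) -> int:
    """Pedersen-style commitment: hides `serial` using the random value `rand`."""
    return (pow(G, serial, P) * pow(H, rand, P)) % P


def challenge(*values: int) -> int:
    """Fiat-Shamir: derive the challenge by hashing the public transcript."""
    data = ",".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(serial: int, rand: int, c: int) -> tuple[int, int, int]:
    """Prove knowledge of (serial, rand) such that c = commit(serial, rand)."""
    a, b = secrets.randbelow(Q), secrets.randbelow(Q)  # one-time blinding values
    t = (pow(G, a, P) * pow(H, b, P)) % P
    e = challenge(G, H, c, t)
    return t, (a + e * serial) % Q, (b + e * rand) % Q


def verify(c: int, proof: tuple[int, int, int]) -> bool:
    """Check g^z1 * h^z2 == t * c^e (mod p); uses public values only."""
    t, z1, z2 = proof
    e = challenge(G, H, c, t)
    return (pow(G, z1, P) * pow(H, z2, P)) % P == (t * pow(c, e, P)) % P


serial, rand = secrets.randbelow(Q), secrets.randbelow(Q)  # known only to the holder
coin = commit(serial, rand)                                  # public
proof = prove(serial, rand, coin)                            # public
print(verify(coin, proof))  # True
```

The responses z1 and z2 are masked by the one-time values a and b, so the verifier learns that the prover knows an opening of the commitment but not the opening itself. Production schemes use far larger groups (or elliptic curves) and, in Zerocoin’s case, additional proofs.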
Editable BC. A more radical approach that solves a number of BC data protection issues is the editable BC, for which Accenture has been awarded a patent. The editable BC uses a ‘chameleon’ hash function, which allows the underlying information to be changed without changing the output of the hash function. This makes it possible to change information whose hash is already included on the BC, and thus to correct (human) errors or intentional (fraudulent) inaccuracies. It would also enable individuals to exercise their rights under the GDPR, e.g., the right to rectification and the right to be forgotten.
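The ‘chameleon’ property can be illustrated with the classic discrete-logarithm chameleon hash of Krawczyk and Rabin, sketched below in Python. This is an assumption for illustration only: it is not necessarily the construction used in Accenture’s patent, the group parameters are toy-sized, and the record contents are hypothetical. Anyone can compute the hash, but only the holder of the trapdoor key can find a new randomness value that makes an edited record hash to the same value, so the hash already recorded on the chain remains valid.

```python
import hashlib
import secrets

# Toy group: p = 2q + 1 is a safe prime and g = 4 generates the subgroup of
# prime order q. Real deployments use far larger parameters.
P, Q, G = 2039, 1019, 4


def to_scalar(message: bytes) -> int:
    """Map a record to an exponent in Z_q."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % Q


def keygen() -> tuple[int, int]:
    """Return (trapdoor x, public key h = g^x)."""
    x = 1 + secrets.randbelow(Q - 1)  # nonzero, so it is invertible mod q
    return x, pow(G, x, P)


def chameleon_hash(h: int, message: bytes, r: int) -> int:
    """CH(m, r) = g^m * h^r mod p, computable by anyone with the public key."""
    return (pow(G, to_scalar(message), P) * pow(h, r, P)) % P


def find_collision(x: int, old: bytes, r: int, new: bytes) -> int:
    """With the trapdoor, solve m + x*r = m' + x*r' (mod q) for r'."""
    m_old, m_new = to_scalar(old), to_scalar(new)
    return (r + (m_old - m_new) * pow(x, -1, Q)) % Q


trapdoor, public_key = keygen()
original = b"owner: J. Smith, parcel 42"          # hypothetical record
r = secrets.randbelow(Q)
digest = chameleon_hash(public_key, original, r)  # value recorded on the chain

corrected = b"owner: [erased]"
r_new = find_collision(trapdoor, original, r, corrected)
print(chameleon_hash(public_key, corrected, r_new) == digest)  # True
```

Without the trapdoor, finding such collisions is assumed to be as hard as computing discrete logarithms in a properly sized group, which is why the question of who holds the trapdoor determines who can rewrite the chain.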
Relaxing the immutability of BC comes at a price. To a large extent, trust in a BC application relies on the network’s consensus on the content of a block and on the immutability of that content thereafter. When this immutability is removed, other measures must be implemented to retain (or gain) enough trust for individuals and organizations to use the BC application as a record of their transactions. Trust could be retained if, for example, only a single trusted entity can make changes, just as only governments can make certain changes to governmental public registries. Alternatively, a very strict change management procedure could be implemented, which could include a consensus mechanism that verifies the legitimacy of a change. In any event, changes must be strictly logged so that they can always be reviewed and explained in the future.
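The requirement that changes be strictly logged could be met with an append-only, hash-chained audit log. The Python sketch below assumes a single hypothetical trusted editor (the ‘single trusted entity’ option from the text); the editor name, block numbers, and reasons are illustrative. Each log entry commits to the hash of the previous entry, so altering any past entry is detected on verification.

```python
import hashlib
import json

AUTHORIZED_EDITORS = {"registry-authority"}  # hypothetical single trusted entity
GENESIS = "0" * 64


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class ChangeLog:
    """Append-only log of ledger edits; each entry commits to the previous one."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, editor: str, block_id: int, reason: str) -> None:
        if editor not in AUTHORIZED_EDITORS:
            raise PermissionError(f"{editor!r} is not authorized to edit the ledger")
        prev = self.entries[-1]["entry_hash"] if self.entries else GENESIS
        body = {"editor": editor, "block_id": block_id, "reason": reason, "prev_hash": prev}
        self.entries.append({**body, "entry_hash": _digest(body)})

    def verify(self) -> bool:
        """Recompute every hash and link; any edit to a past entry breaks the chain."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if body["prev_hash"] != prev or _digest(body) != entry["entry_hash"]:
                return False
            prev = entry["entry_hash"]
        return True


log = ChangeLog()
log.append("registry-authority", block_id=17, reason="rectification request")
log.append("registry-authority", block_id=23, reason="erasure request")
print(log.verify())  # True
log.entries[0]["reason"] = "altered after the fact"
print(log.verify())  # False
```

Someone able to rewrite the entire log could also recompute every hash, so in practice the latest entry hash would need to be anchored elsewhere, for example on the chain itself.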
BC ‘self-sovereign’ identity management. The well-known use cases of BC mostly focus on administering transactions, but BC can also be deployed for privacy-enhancing purposes, for example, to facilitate ‘self-sovereign’ identity management.
In the offline world, an individual’s identity is mostly established by checking his or her driver’s license or passport. The strength of this system derives from the trusted central governmental authority that issues these proofs of identity. However, because the online world does not follow the national boundaries of the offline world, it is difficult to appoint such a trusted centralized authority for online proof of identity. There are now many initiatives to provide individuals with a digital identity. One example of how BC can be deployed for online identity management is the initiative of Microsoft and Accenture: a BC-based solution designed to give individuals direct control over who has access to their personal data. Rather than each service provider collecting and storing the personal data required to provide services to an individual, the personal data are stored off-chain, and the system only calls on these data when the individual grants access, which can be limited in both scope and time. For example, when an individual needs to prove his or her identity when renting a car, access to the identifying information can be limited to what is necessary for this proof, and for a short period only.
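The scoped, time-limited access in the car rental example can be modeled as follows. This is a hypothetical Python sketch, not the Microsoft/Accenture design: personal data stay in the individual’s off-chain store, and a service provider receives only the fields covered by an unexpired grant. The class names, field names, and the one-hour validity are assumptions for illustration, and the BC component itself is omitted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Grant:
    grantee: str
    fields: frozenset
    expires_at: datetime


class PersonalDataStore:
    """Hypothetical off-chain store, controlled by the individual."""

    def __init__(self, data: dict) -> None:
        self._data = data
        self._grants: list[Grant] = []

    def grant(self, grantee: str, fields: set, valid_for: timedelta) -> None:
        """Allow `grantee` to read only `fields`, and only for `valid_for`."""
        expires = datetime.now(timezone.utc) + valid_for
        self._grants.append(Grant(grantee, frozenset(fields), expires))

    def request(self, grantee: str, fields: set, now: Optional[datetime] = None) -> dict:
        """Return the requested fields if every one is covered by an unexpired grant."""
        now = now or datetime.now(timezone.utc)
        allowed = set()
        for g in self._grants:
            if g.grantee == grantee and g.expires_at > now:
                allowed |= g.fields
        denied = set(fields) - allowed
        if denied:
            raise PermissionError(f"no valid grant for {sorted(denied)}")
        return {f: self._data[f] for f in sorted(fields)}


store = PersonalDataStore({"name": "A. Example", "date_of_birth": "1990-01-01",
                           "driving_licence": "valid", "home_address": "Example Street 1"})
store.grant("car-rental-co", {"name", "driving_licence"}, timedelta(hours=1))
print(store.request("car-rental-co", {"name", "driving_licence"}))
# {'driving_licence': 'valid', 'name': 'A. Example'}
try:
    store.request("car-rental-co", {"home_address"})
except PermissionError as err:
    print("denied:", err)  # outside the granted scope
```

The rental company never holds its own copy of the home address or date of birth, and its access lapses automatically once the grant expires.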
Decentralized identity management has a number of benefits. From a privacy point of view, it enables individuals to take back control over their digital identity, a concept coined ‘self-sovereign identity’. Currently, many individuals are not aware of how their digital identity and personal data are used, e.g., for advertising purposes. A decentralized identity system would let individuals decide who gets access to which information, and for how long. A single decentralized identity system also has economic benefits. Right now, many companies store similar information about the same individuals. A decentralized identity management system makes this duplicated storage obsolete and ensures that companies have access to up-to-date information on an individual, insofar as the individual wants the company to have such access.
Read Part 1 of this post.
Read more about the interplay of blockchain and data protection from Dr. Moerel.
Note:
This blog post summarizes a full publication by Lokke Moerel, published in European Review of Private Law 6-2019 [825–852] and in The Cambridge Handbook of Smart Contracts, Blockchain Technology and Digital Platforms (September 2019).