- Why are there regulations about data?
- Legislation on Data Privacy – the GDPR
- What is personal information?
- The rights of privacy
- Practical Issues
- Strategies for compliance
Any IT professional with a responsibility for the custodianship of data will be aware of the regulations that, increasingly, define our various responsibilities for protecting that data. Many seem to regard those regulations as an irritant, in much the same way that dogs regard fleas.
However, there are good reasons for them. Some grew out of financial scandals, others were inspired by the need to control bad behaviour from bankers. Quite a few came from the need for counter-terrorism. They were needed to ensure that internet-based payments work securely and effectively, and are able to control fraud. In this article, I’ll be concentrating on another reason: the right of individuals to privacy. Some of the practical steps we take to ensure privacy come from more than one of these different requirements. Privacy is an increasing concern in society, and because of several notorious abuses of personal data, increasing regulation is going to be inevitable for many of us in IT.
There isn’t a great deal of agreement around the world about the relative importance of individual privacy, beyond perhaps a vague idea that privacy is a “good thing to have”.
Some countries have been pioneers for specific types of regulation; for example, the United States has led the way on regulations for healthcare data and financial transparency in corporates, and the UK on counter-terrorism protections. However, the more general guidance on data privacy has been driven from the European Union. This is because the members of the European Union are also signatories of the European Convention on Human Rights (ECHR).
Whatever one’s feelings about this convention and its interpretation by the European Court of Human Rights, it has produced the first explicit, international definition of the right to privacy. This has allowed the EU to then use this definition to specify what, in commercial practice, it means to respect “private and family life, his home and his correspondence”, and to produce legislation in sufficient detail.
Laudable as this is, it’s always been a big problem for technologists to understand how, practically, to enforce regulations that were devised and driven by politicians. This right to privacy has been subject to legislation for some time, in most western nations, but the current legislation, such as the EU’s 1995 Data Protection Directive, is vague on detail, and has shown little understanding of the constraints of data technology. As such, it has proven to be difficult to prosecute any but the most florid of abuses of the law.
The Data Protection Act, the UK version of this legislation, attempts to provide a more practical, sensible, and realistic implementation that harmonises with UK law, but the EU has regarded this as a failure to fully comply. This unsatisfactory situation will change in a year’s time when the new General Data Protection Regulation, already adopted by EU countries in April 2016, becomes law in Europe.
The GDPR aims to ‘harmonize’ data protection in Europe but, effectively, brings the EU members more closely in line with the stringent existing privacy regulations in Germany. The Digital Minister in the UK has confirmed that UK law will be amended to comply with the GDPR. This legislation will affect any organisation wishing to trade with Europe, or to hold personal information about individual Europeans. Inevitably, equivalent legislation will be adopted by other countries outside the EU, either because of trade requirements or because it represents the most thorough definition of best practice in managing personal data. The USA lacks legislation that is helpful to organisations that hold data on US citizens. The Patriot Act is not directly relevant, and it currently seems likely that the States will learn from the adoption of the GDPR and introduce similar legislation for domestic regulation.
The GDPR incorporates the requirements of several international standards for information protection, the most relevant being ISO 27001. However, it also covers the rather different aspects of the rights of personal data subjects: the right to be informed, and the right to have their data deleted or moved.
Essentially, the GDPR provides a broader definition of personal data, makes it far clearer what constitutes consent to the use of personal data, increases the requirement for accountability and frankness from organisations that use data, including the need to report incidents of security breaches, allows users to move data from one organisation to another, and requires a better standard of custodianship of data from organisations that hold personal information on individuals.
As it will take a year to change IT systems, now is probably a good time to start. This article will try to explain how this international change will affect our day-to-day work in IT, and what to do about it.
As a veteran of the Data Protection Act, I can say that there is likely to be a stage of some confusion, caused partly by over-statement of the enormity of the task. The primary reason why the GDPR tightens up its definition of good practice regarding the storage and handling of personal data is that it must become easier to prosecute offenders. The laws on privacy must have teeth.
However, the GDPR does not aim to restrict the ability of responsible organisations to do their work, and introduces very little that is not already in the repertoire or capacity of companies who have effective IT.
It will, though, come as a shock to the many organisations that have a strange, child-like unwillingness to take data access control and data security seriously. Currently, it is depressingly difficult to compel organisations to adopt those good data practices that have been common for the past forty years. This is going to change.
In the coming year, all organisations that handle personal data will need to audit their data, identify personal data, prove that they have plans in place to deal with issues that breach privacy, and must also ensure that their databases and data processing systems meet the basic rights of privacy in society. If they don’t, then it is time to put it right.
Unfortunately, compliance with the GDPR isn’t entirely sufficient because, in some countries, such as Germany, the requirements for privacy may be considerably more stringent than those specified by the GDPR. Compliance is therefore likely to continue to ratchet up the costs of providing IT, unless database systems are re-engineered to cater for society’s wishes for privacy. I’ll explain, later on in this article, what I believe will be needed to minimize the effort involved.
The GDPR defines two categories of personal data that require special handling. This is similar to the current 1995 Data Protection Directive, but made clearer. The GDPR adds cookies, location data, online identifiers and genetic data to the list. Even if you have rendered the data ‘pseudonymous’, it is still treated as personal data because it can enable the identification of individuals (albeit via an inference attack, or via a key if it is encrypted). If data is rendered fully anonymous (Recital 26), then data protection no longer applies. Note that data relating to criminal offences also has special protection but this isn’t covered by the GDPR.
Standard personal data
Names, addresses, phone numbers, identifiers and location data, IP addresses, mobile device IDs and cookie strings, and any data specific to the physical, physiological, mental, economic, cultural or social identity of that person, including shopping or web-surfing data. This is typical ‘Big Data’ information.
‘Special’ personal data
This is data relating to racial or ethnic origin, political opinions, religious or philosophical beliefs, trade-union membership, and health or sex life. It also covers ‘biometric data’, such as fingerprints, facial recognition and retinal scans, and genetic data.
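As a rough illustration of the two categories, the sketch below tags column names as ‘standard’ or ‘special’ personal data. The field names and the `classify_field` and `classify_table` helpers are hypothetical; a real inventory would be driven by your own data dictionary rather than by matching names, and anything the sketch cannot place is marked for human review rather than assumed to be harmless.

```python
# Hypothetical field names mapped onto the two categories described above.
SPECIAL_FIELDS = {
    "ethnic_origin", "political_opinion", "religion", "trade_union_membership",
    "health_record", "sex_life", "fingerprint", "retinal_scan", "genetic_profile",
}
STANDARD_FIELDS = {
    "name", "address", "phone_number", "ip_address", "device_id",
    "cookie_id", "location", "purchase_history",
}


def classify_field(field_name: str) -> str:
    """Return 'special', 'standard' or 'unclassified' for a field name."""
    key = field_name.lower()
    if key in SPECIAL_FIELDS:
        return "special"
    if key in STANDARD_FIELDS:
        return "standard"
    return "unclassified"  # needs a human decision, not a guess


def classify_table(columns: list[str]) -> dict[str, str]:
    """Classify every column of a table so it can be recorded in an inventory."""
    return {column: classify_field(column) for column in columns}


if __name__ == "__main__":
    print(classify_table(["name", "religion", "order_total"]))
```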
The right of individuals to have their “private and family life, home and correspondence” respected can be categorized as a right to be informed, in simple terms, how data is being used, and obliges the organization holding the data to get explicit consent for using it. It gives individuals the right to access the data and check whether it is correct, and have it altered or erased if necessary. If a decision is made automatically via profiling, on the basis of data held on an individual, they must be able to express their point of view, obtain an explanation of the decision and challenge it. Individuals should be able to move their data between organizations securely where this is appropriate (e.g. utility providers or healthcare).
Explicit consent to its use
Organizations may not process the personal information of individuals unless they have been freely given a specific, informed and unambiguous indication of consent by the data subject, either by a statement or by a clear ‘affirmative action’. The subjects of the data have the right to request a copy of their data to rectify any inaccuracies.
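One way to make this rule enforceable in an application is to record each consent against a single purpose and to check for it before processing. The following is a minimal sketch under that assumption; the `ConsentRecord` class and the `may_process` and `withdraw` functions are illustrative names, not anything defined by the GDPR.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConsentRecord:
    """One affirmative consent given by a data subject for one purpose."""
    subject_id: str
    purpose: str                      # consent is specific to a purpose
    given_at: datetime
    withdrawn_at: Optional[datetime] = None

    def is_active(self) -> bool:
        return self.withdrawn_at is None


def may_process(records: list[ConsentRecord], subject_id: str, purpose: str) -> bool:
    """Allow processing only if an unwithdrawn consent exists for this exact purpose."""
    return any(
        r.subject_id == subject_id and r.purpose == purpose and r.is_active()
        for r in records
    )


def withdraw(records: list[ConsentRecord], subject_id: str, purpose: str) -> None:
    """Mark matching consents as withdrawn, keeping the record for audit."""
    now = datetime.now(timezone.utc)
    for r in records:
        if r.subject_id == subject_id and r.purpose == purpose and r.is_active():
            r.withdrawn_at = now
```

Keeping the withdrawn record, rather than deleting it, means you can still show when consent was given and when it ended.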
The GDPR increases the number of disclosures that the organisation or agency holding the data (called in GDPR jargon the ‘controller’) must make before collecting personal data. In addition to the identity of the controller, the purposes for processing, and any recipients of personal data, organisations must disclose how long the data will be stored, and explain in simple language the right to withdraw consent at any time, the right to request access, rectification or restriction of processing, and the right to lodge a complaint (Article 15). Individuals can object to data being used for direct marketing at any time, and have the right to ask how and why their data is being used by an organization.
Individuals have the right to request that their personal data is deleted or removed without undue delay. This is an exercise of the ‘Right to be forgotten’ (Article 17). This is not an unlimited right, but must be balanced against legal freedom of expression, the public interest in health, scientific and historical research, and the exercise or defence of legal claims.
If you receive a legitimate request, you should inform other organisations that are using the data of the need for erasure. This right to erasure is restricted to circumstances where ‘retention of such data infringes this Regulation or Union or Member State law to which the controller is subject’. If the request doesn’t seem legitimate, the individual who made the request can seek a temporary ‘restriction of processing’ until the matter is settled (Article 18), which means flagging the data itself (Recital 67) as ‘restricted’.
If the storage of personal data was based on the user’s consent, or on a contract, then individuals will have the right to have their data transferred elsewhere in a “structured, commonly used, machine-readable and interoperable format”. They can ask to have a copy that they can check for accuracy.
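GDPR requires organisations holding data on individuals to notify the supervisory authority of certain specified types of breach within 72 hours of discovering the breach, and to inform the affected individuals without undue delay where the breach puts them at high risk.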
Protection of their data, both by design and default
GDPR requires that data protection safeguards be integrated into products and services from the earliest stage of development. Privacy should always be the default. When organizations allow a third party to handle the data in their care, they are still responsible for its protection and security. By design, data should only be collected where it is necessary to fulfil specific purposes, and it should be discarded when it is no longer required. By design, IT systems should protect the rights of data subjects.
Transparency means that ‘the data subject be informed of the existence of the processing operation and its purposes’, and that it should be easy for the data subject to respond.
Organizations must communicate with individuals on whom they hold personal data “in a concise, transparent, intelligible and easily accessible form, using clear and plain language” (Article 12), and provide ‘modalities’ through which users can route requests, or opt out of certain uses of their data. Organisations must deal with requests in a timely manner in a way that is both accessible and easy to understand. Transparency also means being prompt in informing the data subject of any breach that is likely to have included their data.
Although most of the GDPR is aimed at organisations who have used personal data in a reckless or immoral way, against all existing standards of professional practice, there are areas where the majority of organizations will need to make changes in order to comply. There are also certain areas where requirements are going to prove technically or administratively difficult, even where the organization is committed to adopting the correct practices.
Storage of personal data
There must be effective encryption of data both in transit and at rest. Only secure remote access to data is allowed. The GDPR does not define what techniques it considers adequate, but the general advice seems to be to follow the ICO’s recommendations, or the recommendations made by the European Union Agency for Network and Information Security (ENISA Privacy and Data Protection by Design, 2014). The problem with being specific about how to encrypt data is that it is a fast-moving technology in which encryption techniques are quickly discarded once they are proven to be “crackable”.
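For the ‘in transit’ half of this requirement, a client can at least refuse connections that are unverified or that use legacy protocol versions. The sketch below uses Python’s standard `ssl` module; the choice of TLS 1.2 as the floor is an assumption made for illustration, not a figure taken from the GDPR or from the ICO, and `make_client_tls_context` is a hypothetical helper name.

```python
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context that rejects unverified or legacy connections."""
    # The default context verifies the server certificate and its host name.
    context = ssl.create_default_context()
    # Assumed policy for this sketch: nothing older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


if __name__ == "__main__":
    ctx = make_client_tls_context()
    print(ctx.verify_mode, ctx.check_hostname, ctx.minimum_version)
```

Because acceptable techniques change quickly, it helps to keep a policy like this in one place so that it can be tightened without touching every caller.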
Erasure of data
In many database systems, there are practical problems in deleting personal data on request. Legacy systems can be tricky to adapt; new systems must be designed to allow erasure of data records upon request. Although this isn’t made very explicit, this erasure would need to include all records, including captured changes of data and backups. Obviously, data must still be erased if it is cloud-based. It must be possible to offer a written confirmation of the applicable records’ destruction, which would need to be an automated process.
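The shape of an automated erasure might look something like the sketch below, which deletes one subject’s rows from every table known to hold personal data and returns a receipt that could form the basis of a written confirmation. It uses an in-memory SQLite database purely for illustration; the table names, the `subject_id` column and the receipt layout are all hypothetical, and the sketch does not touch backups or change-capture records, which would need their own process.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical list of tables that hold rows keyed by subject_id.
PERSONAL_TABLES = ["customers", "orders", "customer_history"]


def erase_subject(conn: sqlite3.Connection, subject_id: int) -> dict:
    """Delete a subject's rows from every known table and return a receipt."""
    deleted = {}
    with conn:  # one transaction: either every table is cleared or none is
        for table in PERSONAL_TABLES:  # names come from the fixed list above
            cursor = conn.execute(
                f"DELETE FROM {table} WHERE subject_id = ?", (subject_id,)
            )
            deleted[table] = cursor.rowcount
    return {
        "subject_id": subject_id,
        "erased_at": datetime.now(timezone.utc).isoformat(),
        "rows_deleted": deleted,
    }


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with a little sample data."""
    conn = sqlite3.connect(":memory:")
    for table in PERSONAL_TABLES:
        conn.execute(f"CREATE TABLE {table} (subject_id INTEGER, detail TEXT)")
        conn.executemany(
            f"INSERT INTO {table} VALUES (?, ?)", [(1, "a"), (1, "b"), (2, "c")]
        )
    conn.commit()
    return conn


if __name__ == "__main__":
    print(erase_subject(demo_connection(), 1))
```

The sketch only works because the list of tables is complete, which is the practical reason for knowing where all personal data resides.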
Users of your systems must be able to transfer their data elsewhere, or to you when they register with you (Article 20). You are obliged to accommodate these requests, using a secure, standard, structured format. Again, this requires knowing where all the data in your care resides. There is no definition of this standard format. Neither XML nor JSON is secure, although transfer over HTTPS provides a measure of security.
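Since no standard format is defined, any example has to pick one. The sketch below assumes JSON with a simple version marker, and shows both an export and the matching import for when a user brings data to you. The package layout and the function names are hypothetical, and the security of the transfer itself (for example HTTPS) is deliberately outside the scope of the sketch.

```python
import json
from datetime import date


def export_subject_data(subject: dict, records: list[dict]) -> str:
    """Serialise everything held on one subject as a JSON document."""
    package = {
        "format_version": 1,  # hypothetical version marker
        "exported_on": date.today().isoformat(),
        "subject": subject,
        "records": records,
    }
    # sort_keys and indent keep the output stable and easy for a person to check.
    return json.dumps(package, indent=2, sort_keys=True, ensure_ascii=False)


def import_subject_data(document: str) -> dict:
    """Parse a package produced by export_subject_data, rejecting unknown versions."""
    package = json.loads(document)
    if package.get("format_version") != 1:
        raise ValueError("unsupported export format")
    return package


if __name__ == "__main__":
    print(export_subject_data({"id": 7, "name": "Ana"}, [{"meter": "E1", "reading": 120}]))
```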
Notification of Breaches
The need to notify all your users of any breaches of certain specified types means that you will need thorough audit logs that enable you to determine where the hacker has gone on your network and what data has been breached.
The GDPR insists on a high standard for data resilience, so that personal data is treated in compliance with all business continuity standards for the industry. It states that there must be a service level agreement that guarantees the provider of the data processing services has the ability to quickly restore data when necessary.
Any employee with a high level of access to personal data needs to have had background checks. Any organisation must have a sensible policy to restrict access to data to only those who need it. Personal data must have defence in depth against cyberattack, including intrusion detection and prevention, port scanning and protocol inspection, and perimeter anti-virus/malware blocking.
Any organization must be able to show that it meets its obligations to provide “privacy by design” and “privacy by default”. It will need to have documented a data protection impact assessment that identifies potential risks involved in processing this data, and describes the measures taken to ensure compliance with the requirements for the data.
Even if your data is stored in the cloud, or in a third-party colocation facility, the security of data is still your responsibility. The cloud provider must be able to report the location of all the data centres where the data is stored. When an individual exercises their right to erasure, you will need to be confident in assuring the user that all cloud-based copies have been erased.
The Data Protection Officers
Larger organizations in Europe that are engaged in the ‘systematic monitoring of data subjects’ will need to employ, or outsource, Data Protection Officers (DPOs), as is already customary in larger companies in Germany. All public authorities are required to appoint a DPO under the GDPR.
A DPO will obviously need to understand database technology, security, resilience, and access control, as well as being skilled in managing all the requirements for risk assessments, documentation and re-engineering of existing systems to comply. They also need, Article 39 states, an understanding of privacy regulations both within and outside the EU, and the privacy laws of all EU member states.
Public authorities will already know that they must recruit DPOs, so they are likely to scoop up all the available qualified people, leaving the private sector a considerable recruitment task.
The more you summarise the requirements of the GDPR, the more draconian and harsh they might seem. However, the full articles and recitals contain clauses that emphasize that the regulations must not be applied in a way that unduly restricts start-ups and enterprise, unless they are acting recklessly. The requirements of the GDPR are extraordinarily close to existing good practices for the custodianship of data.
Given this, if you are preparing for GDPR, I am confident in suggesting the following steps.
Identify your data
This may sound obvious, but many organisations, even public bodies, find it hard to explain precisely what categories of data they are storing, where and why. External audits become an expensive and protracted nightmare. If you haven’t an agreed, and well-understood framework for categorising data in your organisation, then this is a good place to start.
You can then use this framework to find out where the data is being held and in what system. If this data falls within the GDPR’s definition of personal data, can you easily report to individuals what data is being held on them? Can you delete part or all of this data, on request? Can you encrypt this data, both at rest and in transit? Has it adequate access-control? Are you doing anything really dumb, such as allowing developers to use it for developing or modifying the database that holds it? Are backups held securely and encrypted?
Until you know what data is affected, and where it is held, you will be helpless in trying to answer any of these questions.
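The questions above lend themselves to a simple inventory in which each store of personal data is recorded once and checked against the same list. The following is a minimal sketch of that idea; the `DataAsset` fields and the wording of the gaps are assumptions chosen to mirror the questions in this section, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass
class DataAsset:
    """One entry in a hypothetical inventory of where personal data is held."""
    name: str
    system: str
    category: str               # e.g. 'standard' or 'special'
    encrypted_at_rest: bool
    encrypted_in_transit: bool
    access_controlled: bool
    used_in_development: bool   # True if developers work against the live data
    backups_encrypted: bool


def compliance_gaps(asset: DataAsset) -> list[str]:
    """List the checklist questions that this asset currently fails."""
    checks = [
        (asset.encrypted_at_rest, "not encrypted at rest"),
        (asset.encrypted_in_transit, "not encrypted in transit"),
        (asset.access_controlled, "no adequate access control"),
        (not asset.used_in_development, "live data used for development"),
        (asset.backups_encrypted, "backups not encrypted"),
    ]
    return [problem for passed, problem in checks if not passed]


if __name__ == "__main__":
    patients = DataAsset("patients", "clinic_db", "special", True, True, True, True, False)
    print(compliance_gaps(patients))
```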
Identify access requirements within the organisation
You need to know what activities within the organization need to access personal data, and why it is necessary. The first reason you need this information is that the individuals about whom the data is held have the right to request this information from you. You also need this information to implement a data-access system that adopts the principle that the users of an application should have permissions to view, modify or delete only the data in the live database that is relevant to their job role. This is what is meant by ‘executing with minimum necessary privileges’. Note that the GDPR requires that you do a background check on any employee with a high level of access to personal data.
Identify the risks
The GDPR requires that ‘controllers’ and ‘processors’ of personal data make ‘Data Protection Impact Assessments’ or ‘Privacy Impact Assessments’. This will require you to identify, analyze, and document the risks to privacy. These impact assessments are similar to those described in ISO 27001, and are there to allow organisations to identify and fix problems at an early stage.
Consolidate the location of personal data
The more that you can restrict the number of databases that handle ‘special’ and ordinary personal data, the easier are the requirements for audit, operational conditions, network security, intrusion detection and logging.
Encrypt all ‘special’ personal data
Encryption is not, by itself, sufficient for the protection of data, but it is necessary for compliance and must be considered an effective method. Pseudonymization is widely practiced as a way of using real data for research, development, testing, demonstrating databases and for training staff, but pseudonymized data is not considered by the GDPR to be an effective way of protecting data because of its weakness to an ‘inference attack’. This means that pseudonymized personal data that is intended to be processed for purposes such as research must be handled in the same way as any other personal data. Indications are that full anonymization via data masking that is sufficiently thorough to be proof against inference attack represents the safest strategy where testing requires data that is close to production in volume and distribution. (Recital 26)
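To see why pseudonymized data is still treated as personal data, consider one common way of pseudonymizing an identifier: replacing it with a keyed hash. The sketch below uses HMAC-SHA256 from Python’s standard library as an example technique of my own choosing (the GDPR does not name one), with hypothetical function names. The same input always yields the same token, so records remain linkable, and anyone who holds the key can recover identities by hashing likely candidates.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def reidentify(token: str, candidates: list[str], key: bytes) -> list[str]:
    """Show that a key holder can recover identities by hashing candidate values."""
    return [c for c in candidates if hmac.compare_digest(pseudonymize(c, key), token)]


if __name__ == "__main__":
    key = b"demo-key-not-for-production"  # illustrative key only
    token = pseudonymize("alice@example.com", key)
    # The token is stable, so every row for this person can still be joined together.
    print(token == pseudonymize("alice@example.com", key))
    print(reidentify(token, ["bob@example.com", "alice@example.com"], key))
```

Even without the key, the stable token lets an attacker combine rows and infer identity from the remaining attributes, which is the inference attack mentioned above.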
Introduce full access control to personal data
This practice, often called ‘minimum necessary privileges’, is essential for the proper management of data, and allows a more accurate audit of data access and unsuccessful attempts at access. It minimises the requirement for background checks for users.
If all processes and database users can only access the subset of personal data that is essential for the task, it immediately eliminates all but a small fraction of possible SQL Injection attacks.
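The principle can be sketched in a few lines: each role is mapped to the columns it needs, and any request outside that set is refused rather than silently trimmed, so that the attempt can be audited. The roles, column names and `read_columns` helper below are hypothetical, and in practice the same rule is better enforced in the database itself (for example through schemas, views and permissions) than in application code alone.

```python
# Hypothetical mapping from job role to the columns that role needs.
ROLE_COLUMNS = {
    "billing_clerk": {"customer_id", "name", "billing_address"},
    "support_agent": {"customer_id", "name", "phone_number"},
}


class AccessDenied(Exception):
    """Raised when a role asks for a column it has no need to see."""


def read_columns(role: str, row: dict, requested: set[str]) -> dict:
    """Return only the requested columns, refusing anything outside the role's needs."""
    allowed = ROLE_COLUMNS.get(role, set())  # unknown roles get nothing
    forbidden = requested - allowed
    if forbidden:
        raise AccessDenied(f"{role} may not read: {sorted(forbidden)}")
    return {column: row[column] for column in requested}


if __name__ == "__main__":
    row = {"customer_id": 7, "name": "Ana", "billing_address": "1 High St",
           "phone_number": "555-0100"}
    print(read_columns("support_agent", row, {"name", "phone_number"}))
```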
Ensure that there is organization-wide data management
Many of the obligations placed on organisations are easier to implement if there is a centralised management of data. This is essential in order to demonstrate that you know where data goes, what category it belongs to, whether it is secure when in transit, where it is used and whether it is properly disposed of. Without this, an audit of compliance for GDPR would be time-consuming and frustrating for external auditors, who charge by the hour.
Implement audit of personal data
To respond properly to a request from an individual on whom you hold data, you not only need to explain clearly how the data is used, but also how and when the data got into the system, especially if a dispute arises over the correctness of the data.
Data also needs to be audited in the case of an intrusion. Audit has to be in place, and this information cannot otherwise be gathered retrospectively. Where changes to data cannot be confidently documented then disputes, prosecutions, audits and other legal processes can become highly distracting and expensive. As with disaster-recovery, it is useful to do occasional random checks to ensure that logging systems are being effectively maintained.
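Because this information cannot be reconstructed afterwards, the change has to be recorded at the moment it is made. The sketch below shows the idea with an in-memory record that appends an entry to a shared log on every update; the class and field names are hypothetical, and a production system would normally write to a protected, append-only store (or use the database’s own auditing features) rather than a Python list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """Who changed which field of whose data, when, and from what to what."""
    changed_at: datetime
    changed_by: str
    subject_id: str
    field: str
    old_value: object
    new_value: object


class AuditedRecord:
    """Holds one subject's data and records every change made to it."""

    def __init__(self, subject_id: str, data: dict, audit_log: list):
        self.subject_id = subject_id
        self._data = dict(data)
        self._audit_log = audit_log  # shared log, only ever appended to

    def update(self, field: str, new_value, changed_by: str) -> None:
        old_value = self._data.get(field)
        self._data[field] = new_value
        self._audit_log.append(AuditEntry(
            datetime.now(timezone.utc), changed_by,
            self.subject_id, field, old_value, new_value,
        ))

    def get(self, field: str):
        return self._data.get(field)


def history(audit_log: list, subject_id: str, field: str) -> list:
    """Return every recorded change to one field of one subject, oldest first."""
    return [e for e in audit_log if e.subject_id == subject_id and e.field == field]
```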
Make sure you have a robust intrusion-detection system
Breaches or intrusions are a fact of life, even in the most vigilant of organisations. Occasionally a hacker just gets very lucky indeed. As a data professional, you will probably have to deal with a breach. You need to react as quickly as possible and be completely transparent in explaining what has happened to anyone whose data has been compromised by the breach. To do this, you need a good intrusion-detection system in place, and your databases must log both failed and successful logins.
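Logging both failed and successful logins is only useful if something reads the log. As a very small illustration, the sketch below counts recent failed logins per source address and reports any that reach a threshold. The `LoginEvent` structure, the ten-minute window and the threshold of five are all assumptions made for the example; a real intrusion-detection system does far more than this.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class LoginEvent:
    """One login attempt, whether or not it succeeded."""
    at: datetime
    account: str
    source_ip: str
    succeeded: bool


def suspicious_sources(events, now, window=timedelta(minutes=10), threshold=5):
    """Return source IPs with at least `threshold` failed logins in the last `window`."""
    recent_failures = Counter(
        e.source_ip
        for e in events
        if not e.succeeded and timedelta(0) <= now - e.at <= window
    )
    return sorted(ip for ip, count in recent_failures.items() if count >= threshold)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    events = [LoginEvent(now - timedelta(minutes=i), "admin", "203.0.113.9", False)
              for i in range(5)]
    events.append(LoginEvent(now, "ana", "198.51.100.4", True))
    print(suspicious_sources(events, now))
```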
Draw up plans for dealing with data breaches, including timely notifications
All organisations who use personal data are obliged to report certain types of data breach to the relevant supervisory authority, and in some cases to the individuals affected. The breach can be as simple as a member of the staff of the organisation reading a medical record without authorisation, out of curiosity rather than clinical need, or where real data was used for testing an application.
Organizations are obliged to keep records that, in the event of a breach, can show that the organisation has thought through the impact of its systems and processes, and made informed choices about protecting personal data.
A plan needs to be in place for dealing with a range of severity of breach, so that if it happens, staff can react appropriately. Any notification of a breach needs to include the number of individuals affected, the type of data and approximate number of data records compromised, the name and contact details of the data protection officer or other responsible person, the possible consequences of the personal data breach and a description of the measures taken, or proposed to be taken, to deal with it.
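Part of such a plan can be prepared in advance as a template that holds the items listed above and works out the reporting deadline from the time of discovery. The sketch below assumes the 72-hour window mentioned earlier in this article; the `BreachNotification` class and its field names are hypothetical and simply mirror that list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)  # window referred to earlier in the article


@dataclass
class BreachNotification:
    """The items a breach notification needs, as listed in the text above."""
    discovered_at: datetime
    individuals_affected: int
    data_types: list[str]
    records_compromised: int   # approximate count is acceptable
    contact_name: str          # data protection officer or other responsible person
    contact_details: str
    possible_consequences: str
    measures_taken: str

    def deadline(self) -> datetime:
        """Latest time by which the notification should be sent."""
        return self.discovered_at + NOTIFICATION_WINDOW

    def missing_fields(self) -> list[str]:
        """Names of required items that have been left empty."""
        missing = [] if self.data_types else ["data_types"]
        for name in ("contact_name", "contact_details",
                     "possible_consequences", "measures_taken"):
            if not getattr(self, name).strip():
                missing.append(name)
        return missing
```

Checking `missing_fields()` before sending gives staff a quick way to see what still has to be established while the clock is running.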
Make all your IT systems secure
The best source of information on application security in general is OWASP, which is a not-for-profit charitable organisation. This maintains a lot of resources to help with the best-practices for security. The most appropriate international standard for security is ISO 27001, which covers most of the GDPR requirement. For database security, Simple-Talk has some good resources such as Schema-Based Access Control for SQL Server Databases and How to Get SQL Server Security Horribly Wrong. There is a series on data masking and database encryption too.
Database administrators, database developers and Ops people will need to take a lead in preparing organisations that use or process personal data to deal with upcoming legislation on data privacy. The GDPR is in the vanguard of this legislation but it is part of an international move to tighten up the way that IT handles personal data.
Except perhaps for the practical difficulties of implementing the ‘right to be forgotten’, the right to refuse to allow certain data to be held and the right to transfer data between organisations (such as energy suppliers or phone companies), the difficulties aren’t insurmountable, and the more you look at detail, the less draconian the regulations seem.
The most effective strategy will be to deal with it as soon as possible. There will be much temporary inconvenience. Developing databases against live data, for example, would seem to require all manner of bureaucratic tasks, even with pseudonymized data. Encryption, logging and auditing will be required, and there will be paperwork and effort to perform all the required assessments. Security in general will be a greater concern to senior IT management.
Nevertheless, in reading through the impending legislation in detail, there seems nothing particularly unreasonable about it: it conforms with what the best players in the industry already achieve, and this sort of legislation defends ordinary people from the worst abuses of ‘Big Data’. The security practices that it mandates are in line with OWASP opinion and ISO 27001 standards.
In short, we now need to implement all those practices that we’ve always known were necessary but had seemed lower priority than dealing with whatever the current crisis happens to be.