Assessing the Risk of a Data Situation and Recommending Appropriate Mitigations

By Matt Swarbrick

Subject: Computer Science
Published: 18 May 2020

CASE STUDY: TROUBLED FAMILIES PROGRAMME DATASET

EXECUTIVE SUMMARY

This report explains how the UK government plans to make private data available to third-party researchers for analysis. It also sets out the problems associated with this data situation and how to reduce the risks and blowback that may arise.

The report is written around the case study described below.

The Troubled Families Programme has been running in the UK for about six years and is now in its seventh. The programme aims to improve the welfare of disadvantaged families. These include families facing physical and mental health challenges, limited or no access to education, inadequate access to healthcare, underemployment, and exposure to crime in the areas where they live. Because these problems recur, the UK government has been collecting information to understand their complexity and to establish how severely particular families are affected. This information, collected from several government departments and local authorities, was combined to form the troubled families dataset. The dataset was created solely to allow the government to track the trajectories of the identified families and to measure how effective the current interventions have been.

The government plans to share an anonymised version of this dataset with third-party researchers so that they have the resources needed to carry out deeper analyses. Making the dataset available to a large number of researchers exposes it to a wide body of independent individuals and organisations, which could pose risks to the data, to the government and possibly to the lives of the people the data represents. The aim of this report is to highlight the risks associated with this plan and the ways in which they can be mitigated. The risks that could undermine the government's release of the data are discussed in the sections that follow.

INTRODUCTION

The amount of data collected each day keeps rising, and this data is shared across many environments, sectors and industries. In most scenarios, collecting data is not the hard part; managing the large volume collected is the lasting struggle. That struggle has brought the ethical use of data to prominence and has led to policies that provide checks and balances on how data is shared, with the purpose of protecting the people the information is about. In addition, the people and organisations charged with gathering, distributing and using this data keep examining the ethics of their practices and, in many cases, have to defend those practices in the face of public criticism. When standards of data management decline, human lives can be harmed, and trust in projects, deliverables, products and relationships between organisations can be lost.

Data Ethics refers to systemising, defending, and recommending concepts of right and wrong conduct in relation to data, especially personal data. It is becoming more relevant because of the growth in how much data is generated and shared, so ethical considerations need to be put in place. Data Ethics differs from Information Ethics: the former is more concerned with those who collect and disseminate structured or unstructured data, such as data brokers, governments and large corporations, while the latter focuses on issues related to intellectual property and on the concerns of librarians, archivists and information professionals (“Big Data Ethics”, n.d.). Data Ethics can be summarised in six major principles (“Big Data Ethics”, n.d.):

1. Ownership: this concerns who owns, and who collects, the data.
2. Transaction Transparency: if information belonging to an individual in the programme is used, that individual should have transparent access to the product the data is being used for.
3. Consent: any individual whose data is to be collected for onward use must be fully informed and must give their consent.
4. Privacy: if data is exchanged, protection and security measures must be in place for the individuals who could be harmed by its exposure.
5. Currency: individuals should be aware of any financial transactions resulting from the use of their personal data, and of the scale of those transactions.
6. Openness: aggregate data sets should be freely available.

RISKS ASSOCIATED WITH THE DATA SITUATION

Business Considerations Related to Data Sharing

Business considerations are an important risk in any data situation. Before data is shared outside an organisation, it must be checked and validated so that its release does not rebound on that organisation. Here, exposing the data to researchers raises concerns such as liability for government agencies, business costs and the loss of confidential business information. Once the information has been gathered by the different government agencies and local authorities, then collated and transformed, deep analysis by researchers could reveal weaknesses in the dataset, if any exist. For example, it could show that agencies tasked with providing basic amenities to families in the programme have failed to do so, which would reflect badly on those agencies. Researchers could also detect instances where data gathering was poorly conducted, which would call into question the findings of any report based on the data. Taken together, this could damage the image and reputation of the government organisations involved.

Concerns About Adversarial Science

Adversarial science, as defined by Wikipedia, supports conflicting one-sided positions held by individuals, groups or entire societies as inputs into a conflict resolution situation, typically with rewards for prevailing in the outcome. The data may be exposed to a large number of researchers whose intentions are diverse. Some may come with the sole intention of undermining the work and effort put into collating the dataset, or of discrediting the data and its source, which could in turn raise integrity issues for the government agencies.

Reidentification

Reidentification is a major risk in any data situation. It is the process by which participants are identified through deeper analysis of a dataset, especially where they were previously promised anonymity. Once the government makes the dataset available, there is no limit on how far researchers may explore it to understand it better. That exploration could involve matching it against other public or previously obtained datasets, which could uncover hidden information about particular people in the programme. In most cases researchers are not to blame, because a great deal of information is already publicly available and anyone who knows where to look can easily find it. The risk of reidentification is also nearly impossible to eliminate, because researchers need to understand the data fully in order to apply whatever algorithms are necessary to produce the expected results. This makes releasing the dataset a serious concern for the government's integrity: first, it exposes participants in the programme to risks or attacks; second, it makes it harder to attract participants in future.
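
The matching of a released dataset against another source can be illustrated with a minimal Python sketch. Every record, field name and postcode below is invented for illustration, and the `link` function is a hypothetical helper rather than a method described in this report. It shows how a record with no name attached can still be tied to a person when a handful of ordinary attributes (often called quasi-identifiers) match uniquely.

```python
# Toy illustration of a linkage attack; every record here is invented.

# "Anonymised" release: names removed, quasi-identifiers left intact.
released = [
    {"postcode": "ZZ1 1AA", "birth_year": 1984, "household_size": 5, "issue": "unemployment"},
    {"postcode": "ZZ1 1AA", "birth_year": 1990, "household_size": 2, "issue": "truancy"},
    {"postcode": "ZZ2 2BB", "birth_year": 1984, "household_size": 5, "issue": "mental health"},
]

# Hypothetical public source (for example an open register) that carries names.
public = [
    {"name": "Person A", "postcode": "ZZ1 1AA", "birth_year": 1984, "household_size": 5},
    {"name": "Person B", "postcode": "ZZ9 9ZZ", "birth_year": 1990, "household_size": 2},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "household_size")


def link(released_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Return (name, sensitive value) pairs where the quasi-identifiers match uniquely."""
    matches = []
    for pub in public_rows:
        signature = tuple(pub[k] for k in keys)
        candidates = [r for r in released_rows if tuple(r[k] for k in keys) == signature]
        if len(candidates) == 1:  # exactly one candidate: the person is reidentified
            matches.append((pub["name"], candidates[0]["issue"]))
    return matches


reidentified = link(released, public)
```

In this toy data, "Person A" is the only public record whose postcode, birth year and household size match exactly one released record, so the sensitive `issue` field becomes attributable to a named person even though the release contains no names.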

Fears Regarding Misuse of Shared Data

Another risk is the way the data is used once it becomes accessible. Data may be misused, or used for a purpose other than the one it was released for. Researchers often reuse data from past projects for secondary purposes, which can have adverse effects. Occasional use of the data for private or personal study could also lead researchers to compromise the security of participants in the programme. Secondary use of the dataset could trigger a series of legal actions against the government.

Third-party Breaches

There is a risk of third-party breaches, in which data made accessible to researchers falls, one way or another, into the hands of individuals outside the government agencies and the approved researchers. As the data becomes available to more researchers, the chance grows that individuals without clearance gain access through them. In addition, whatever channel is used to release the dataset (for example, email) puts the data at risk of interception by attackers who routinely target such channels. A leak of the data would be a serious problem.

MITIGATING THESE RISKS

De-identification

De-identification is the process of removing any information that could immediately allow researchers to identify a participant. It involves masking or redacting the parts of the dataset that might lead to the identification of a person. One way to mitigate the risk of participants being recognised is to set up a committee that checks the integrity of the collected data. This committee serves as a second level of review whose primary job is to confirm that the de-identification of participants was successful. The data should be thoroughly checked and scrutinised to remove all identifying or sensitive information, so that the dataset contains enough information to complete the task at hand but is difficult to merge with other data sources to produce meaningful information about individuals.
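
As a rough illustration of the masking and checking described above, the Python sketch below drops direct identifiers, coarsens the remaining attributes, and then measures the size of the smallest group of records that share the same coarsened values. That last check is borrowed from the idea of k-anonymity, which this report does not itself name; it is offered only as one possible test a review committee might run. All field names, records and helper functions are hypothetical.

```python
from collections import Counter

DIRECT_IDENTIFIERS = ("name", "address")           # removed outright
QUASI_IDENTIFIERS = ("postcode_area", "age_band")  # kept only in coarsened form


def deidentify(record):
    """Drop direct identifiers and coarsen postcode and age for one record."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    decade = (cleaned.pop("age") // 10) * 10
    cleaned["age_band"] = f"{decade}-{decade + 9}"                 # e.g. 34 -> "30-39"
    cleaned["postcode_area"] = cleaned.pop("postcode").split()[0]  # keep first part only
    return cleaned


def smallest_group(rows, keys=QUASI_IDENTIFIERS):
    """Size of the smallest set of rows sharing the same quasi-identifier values."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return min(counts.values())


# Invented records for illustration only.
raw = [
    {"name": "Person A", "address": "1 Example Road", "postcode": "ZZ1 1AA", "age": 34, "issue": "unemployment"},
    {"name": "Person B", "address": "2 Example Road", "postcode": "ZZ1 1AB", "age": 38, "issue": "truancy"},
    {"name": "Person C", "address": "3 Example Road", "postcode": "ZZ2 2BB", "age": 51, "issue": "mental health"},
]

released = [deidentify(r) for r in raw]
k = smallest_group(released)
```

Here `k` comes out as 1 because the third record is alone in its group, which means that record could still be singled out. A committee could treat any value below an agreed threshold as a signal to coarsen the attributes further or to suppress the affected records before release.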

Building Trust

Trust is the foundation for all productive relationships and is at the heart of making data-sharing efforts happen. Researchers and organizations must be able to trust each other and participants must be able to trust those same researchers and organizations. In research, institutions have relied heavily on contracts to help manage trust relationships. For example, consent forms provide exhaustive detail about expectations and obligations for participants in certain programs. Data-use agreements, terms of use, and other contracts provide differing ways of ensuring ethical management of research. These arrangements may be necessary, but they are insufficient. Trust also needs to be relational, with contracts serving as a way to punctuate what has already been agreed to rather than the sum total of how a relationship will work. Different elements enter into relationship trust. In some cases, people share core values and interests or are committed to a common cause.

Regulations provide a minimum standard for behaviour, but researchers need to do more than regulations mandate. Data-sharing policies can therefore provide a scaffolding, but the research community needs to set standards of excellence and strive to meet those standards (Sharing Clinical Research Data, 2013).

Enhanced IT Security

All communication and data exchange between the government and researchers should take place over encrypted channels, which reduces the risk of a leak or hack. The government should provide adequate funds for investment in IT security. Where IT security measures are already in place, they should be reviewed and strengthened throughout the project, for example by keeping firewalls up to date.
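
One small part of an encrypted channel can be sketched with Python's standard `ssl` module. The snippet below assumes a client that connects to a government data service over TLS; the function name `strict_tls_context` is hypothetical, and the choice of TLS 1.2 as the minimum version is an assumption for illustration, not a requirement stated in this report.

```python
import ssl


def strict_tls_context():
    """Build a client-side TLS context that verifies the server and rejects old protocol versions."""
    # create_default_context() turns on certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumed policy for this sketch: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = strict_tls_context()
```

A context like this can be passed to standard library clients, for example through the `context` argument of `urllib.request.urlopen` or `smtplib.SMTP.starttls`, so that data is not sent over an unverified or outdated connection.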

REFERENCES

Big Data Ethics. (n.d.). In Wikipedia. Retrieved April 19, 2019, from https://en.wikipedia.org/wiki/Big_data_ethics
Sharing Clinical Research Data: Workshop Summary. (2013). Retrieved from https://www.nap.edu/read/18267/chapter/4

Matt Swarbrick

Matt holds a BA and MA certificate from Cambridge, and is a subject-matter expert in Business and Management. Matt also writes about subjects like Finance, Economics and Computing/ICT.
