Disclaimer: This is an example of a student written essay.

Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of UKEssays.com.

The Ethics of Data Science

Paper Type: Free Essay Subject: Information Technology
Wordcount: 2366 words Published: 10th May 2021

The power of data and technology is growing in almost every field of human activity. According to the McKinsey Global Institute, the “global volume of data doubles” almost every three years because of the growth of digital platforms across the world (The age of analytics: Competing in a data-driven world, 2016). It is therefore important for data scientists and other researchers to put this data to social use. However, more data also means a greater danger of using it inappropriately, so data analysis must follow ethical practice. Ethics represents the moral principles that govern how an activity is conducted (BBC - Ethics - Introduction to ethics: Ethics: a general introduction, 2014). In a diverse field like data science, it is essential that the data collected for analysis is obtained ethically and used appropriately in building models. Predictive algorithms are among the most widely used tools for decision making in almost every field where data science is applied. While these algorithms are typically intended to benefit the world, researchers have found that many of the predictive algorithms developed by organisations lead to ethical issues. These issues concern privacy, fairness in using data respectfully, producing a shared benefit, governance of data accuracy, and transparency (Arvanitakis, 2018). The following paragraphs provide a deeper insight into the ethics of data science by considering a case study published by Princeton University. The case study describes how Minerva High School used data science to reduce the percentage of students leaving the school by improving its academic resources. This essay then discusses one of the most significant ethical concerns raised by this predictive algorithm, namely privacy (Optimizing Schools Case Study: 3, 2018).

Upon reaching a “depressing milestone” in its student dropout rate, Minerva High School’s principal, Mr. Vulcani, became concerned about the school’s future and decided to address the issue by using machine learning to predict the reasons for student dropouts. The school decided to use data science to flag at-risk students and provide them with specific learning resources to prevent them from dropping out, and thereby to increase the percentage of students who graduated from the school with their diplomas or undergraduate degrees. With access to the students’ details and behaviour, Mr. Vulcani approached a local data science company called Hephaestats to generate predictions on student dropout rates and provide guidance on fixing the issue. Having identified the causes behind the dropout rate, the school aimed to develop its teaching techniques. This case study is an example of algorithmic decision making being used in a broad area of people’s lives. Besides government, advertising and business, data science is also widely practised in the education system to make impactful decisions about students’ learning.

Data science is useful for recognising patterns and supporting effective decisions using various statistical measures and predictive techniques. Minerva High School provided Hephaestats with a vast amount of data collected from students, such as their behaviour, academic performance, attendance and more. The varied datasets from the school allowed the company to apply machine learning techniques to design a model. Hephaestats performed its analysis by looking at factors in the student data such as student demographics, academic information, teaching statistics and disciplinary factors. By analysing these factors broadly over previous years, Hephaestats was able to correlate the data of students who had dropped out with the current students’ information. Using the resulting model, Hephaestats identified the key factors responsible for students leaving the school and helped to improve the school’s learning environment. In the world of ‘big data’, this type of algorithmic decision making is used in almost every area to improve people’s lives. As in this case study, the results of predictive analysis can help a range of industries to enhance their business. However, the crucial issue arises when data ethics is considered. It is the responsibility of the company to make sure that the dataset and the data science methods used are ethical and unbiased. Despite the measures implemented by data scientists to preserve ethics, questions about data usage and algorithm performance still arise.

One of the biggest ethical issues in the field of data science is privacy and security (Chulu, 2018). Privacy is the ability to control the collection and usage of one’s personal information. Immense amounts of data are being collected in almost all companies and fields. However, this use of ‘big data’ can often put a user’s privacy in danger: the more information we gather, the more complicated it is to protect its privacy. It is vital that user privacy is always protected by the organisations that have access to the data. Yet there are cases where data privacy is violated to serve the company’s own purposes. Such cases are reflected in the public’s increased concern about privacy in the use of predictive algorithms. In the case study discussed earlier, although the outcome was positive, with a “praiseworthy increase” in graduation rates, the results raised concerns about the ethical use of data. Given the urgency of the situation, the school had failed to notify parents about the use of their children’s information. The data used to train the algorithms therefore aroused suspicion among parents, who argued that the “school was breaching privacy laws” by revealing student information to a commercial entity. Critics further argued that allowing a “blatant violation of privacy” set a bad example for the community.

Another substantial ethical concern is the discrimination and bias that can appear in algorithms (Chulu, 2018). In this case study, although the student information was provided to Hephaestats by the school itself, students were unhappy about being “treated as research subjects” and being compared with other students on their academic level. The results of the analysis cannot be fully accurate in predicting a student’s performance. They can be biased, so that a student with high educational capacity is sometimes rated as substandard by the algorithm. Moreover, the algorithm developed by Hephaestats raised concerns among teachers, who argued that their teaching experience was “overridden” by the system. We see from this that such algorithms can lead to sensitive decisions that may even deprive individuals of their rights. Although the models created by data scientists do not intentionally target an individual personally, a lack of transparency can lead to concern among the public. As one of the teachers from Minerva High School said, “blackboxing” the machine learning process is an important reason why the public doubts its fairness. Here, a black box is a system whose internal workings are hidden from view. The public is rarely willing to place such “blind faith” in a predictive algorithm, which makes it a source of argument about ethical use.
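
To illustrate the contrast with a black box, the following is a minimal sketch of a transparent risk score in which every feature's contribution can be shown to a teacher or parent. The feature names and weights are hypothetical and chosen only for illustration; the case study does not describe the model Hephaestats actually built.

```python
# Hypothetical weights for illustration only; not taken from the case study.
WEIGHTS = {
    "absence_rate": 2.0,
    "failed_courses": 0.5,
    "disciplinary_incidents": 0.3,
}


def explain_score(student):
    """Return the total risk score and each feature's contribution to it."""
    contributions = {name: WEIGHTS[name] * student[name] for name in WEIGHTS}
    return sum(contributions.values()), contributions


# A made-up student record using the hypothetical features above.
student = {"absence_rate": 0.25, "failed_courses": 2, "disciplinary_incidents": 0}
score, parts = explain_score(student)

# List the contributions from largest to smallest so the reasoning is visible.
for name, value in sorted(parts.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:+.2f}")
print(f"total risk score: {score:.2f}")
```

A simple additive score like this is easier to question and correct than a complex model, although it may predict less accurately.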

One of the fundamental reasons for biased predictive algorithms is social bias in the world today (Chulu, 2018). Machine learning is a widely used method for making predictions on big datasets. To arrive at an effective model, the algorithm must be trained and tested with large amounts of data. When the training data is biased because of unfairness in society, the model produced by the algorithm inherits the same bias. A lack of diversity in the training data is an underlying cause of biased algorithms. By increasing diversity in technology and providing greater transparency in the algorithm, it is possible to remove some of this bias. Similarly, Hephaestats’ representatives provided tools and some machine learning developments to the teachers and students to reduce ethical concerns.
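
One simple way to look for inherited bias is to compare how often a model flags students in different groups. The sketch below assumes the predictions are available as (group, flagged) pairs; the group labels and sample values are invented for illustration and do not come from the case study.

```python
from collections import defaultdict


def flag_rate_by_group(predictions):
    """Compute the share of students flagged as at-risk within each group.

    predictions: iterable of (group, flagged) pairs, where flagged is a bool.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for group, is_flagged in predictions:
        totals[group] += 1
        flagged[group] += int(is_flagged)
    return {group: flagged[group] / totals[group] for group in totals}


def max_rate_gap(rates):
    """Difference between the highest and lowest group flag rates."""
    return max(rates.values()) - min(rates.values())


# Invented sample: group "A" is flagged once in four, group "B" twice in four.
sample = [
    ("A", True), ("A", False), ("A", False), ("A", False),
    ("B", True), ("B", True), ("B", False), ("B", False),
]
rates = flag_rate_by_group(sample)
gap = max_rate_gap(rates)
print(rates, gap)
```

A large gap does not prove that the model is unfair, but it is a signal that the training data and the model deserve closer review.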

One way to address these privacy concerns is to conceal personal identities from the companies that carry out the data analysis (Stewart, 2020). In this case study, most parents raised privacy concerns by accusing the school of allowing their children’s personal information to be used by external companies. Whilst the results produced after deleting the direct identifiers from the dataset may not be as accurate as needed, doing so lets developers still generate predictive algorithms without violating privacy. An alternative approach is to use a pseudonymized dataset, in which artificial identifiers replace the real ones. Finally, the lack of transparency in algorithms, which is an important reason for ethical concern, should be minimised by providing access to how the algorithm functions. While it is necessary to offer the public an understanding of the algorithm, not all algorithms can provide a transparent record of what happened. In certain situations, complex machine learning algorithms are difficult to explain even for the developers themselves. In such cases, it is impossible for the public to gain insight into how the algorithm works and how biased results may arise. To resolve this issue, we must aim to improve algorithmic transparency and provide greater validation of how the developers formulated the predictive model.
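
The two privacy approaches described above, deleting direct identifiers and substituting artificial identifiers, can be sketched as follows. This is a minimal illustration under stated assumptions: the field names, the sample records and the secret key are hypothetical, and a keyed hash is only one possible way to generate pseudonyms.

```python
import hashlib
import hmac

# Hypothetical field names; the case study does not list the actual columns.
DIRECT_IDENTIFIERS = {"name", "address"}


def pseudonymize(record, secret_key):
    """Return a copy of the record without direct identifiers, with the
    student ID replaced by a keyed hash that acts as an artificial identifier."""
    token = hmac.new(
        secret_key, record["student_id"].encode("utf-8"), hashlib.sha256
    ).hexdigest()[:16]
    cleaned = {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS and field != "student_id"
    }
    cleaned["pseudonym"] = token
    return cleaned


# Invented sample records for illustration.
records = [
    {"student_id": "S001", "name": "Student A", "address": "1 Example Road",
     "attendance": 0.91, "failed_courses": 0},
    {"student_id": "S002", "name": "Student B", "address": "2 Example Road",
     "attendance": 0.64, "failed_courses": 3},
]

# The key would stay with the school, so the external company cannot
# recompute pseudonyms from known student IDs.
key = b"kept-by-the-school-only"
shared = [pseudonymize(record, key) for record in records]
print(shared)
```

Because the same student always maps to the same pseudonym, records from different years can still be linked for analysis. Pseudonymized data can sometimes be re-identified from the remaining fields, so this reduces the privacy risk rather than removing it.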

Whilst data science is now used almost everywhere in the development of companies, it is important that ethics are maintained at every level possible. As the world’s data grows, the responsibility to maintain privacy and fairness in using this data also grows. There are multiple ways to address ethical and privacy concerns. One is to use pseudonymized data to secure personal information and thereby reduce the risk of privacy violations. Furthermore, introducing diversity in tech sectors and demanding transparency of data help to reduce algorithmic bias.

Bibliography with referencing:

  1. The age of analytics: Competing in a data-driven world. McKinsey&Company. (2016). Retrieved 28 September 2020, from https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-age-of-analytics-competing-in-a-data-driven-world.
  2. BBC - Ethics - Introduction to ethics: Ethics: a general introduction. Bbc.co.uk. (2014). Retrieved 1 October 2020, from http://www.bbc.co.uk/ethics/introduction/intro_1.shtml#:~:text=Ethics%20is%20concerned%20with%20what,%2C%20habit%2C%20character%20or%20disposition.
  3. Arvanitakis, J. (2018). What are tech companies doing about ethical use of data? Not much. The Conversation. Retrieved 1 October 2020, from https://theconversation.com/what-are-tech-companies-doing-about-ethical-use-of-data-not-much-104845.
  4. Princeton University. (2018). Optimizing Schools Case Study: 3 [Ebook] (p. All). Retrieved 1 October 2020, from https://aiethics.princeton.edu/wp-content/uploads/sites/587/2018/10/Princeton-AI-Ethics-Case-Study-3.pdf.
  5. Chulu, H. (2018). Let us end algorithmic discrimination. Medium. Retrieved 28 September 2020, from https://medium.com/techfestival-2018/let-us-end-algorithmic-discrimination-98421b1334a3.
  6. Stewart, M. (2020). Data Privacy in the Age of Big Data. Medium. Retrieved 2 October 2020, from https://towardsdatascience.com/data-privacy-in-the-age-of-big-data-c28405e15508.
  7. Nicklin, A. (2018). Applying Ethics to Algorithms. Medium. Retrieved 27 September 2020, from https://towardsdatascience.com/applying-ethics-to-algorithms-3703b0f9dcf4.
  8. Jones, H. (2018). AI, Transparency and its Tug of War with Privacy. Medium. Retrieved 30 September 2020, from https://towardsdatascience.com/ai-transparency-and-its-tug-of-war-with-privacy-5b94c1d262ad.
  9. Heilweil, R. (2020). Why algorithms can be racist and sexist. Vox. Retrieved 9 October 2020, from https://www.vox.com/recode/2020/2/18/21121286/algorithms-bias-discrimination-facial-recognition-transparency.
  10. Schlenker, L. (2019). The Ethics of Data Science*. Medium. Retrieved 4 October 2020, from https://towardsdatascience.com/the-ethics-of-data-science-e3b1828affa2.

