Teaching AI Ethics: Privacy

This is the fifth post in a series exploring the nine areas of AI ethics outlined in this original post. Each post goes into detail on the ethical concern and provides practical ways to discuss these issues in a variety of subject areas. For the previous post on copyright, click here.

There are growing concerns about the impact of Artificial Intelligence technologies on our privacy. AI systems are often “black boxes”, making it hard to understand how they arrive at their decisions and raising questions about transparency.

The use of personal data to train AI systems, and the potential for data breaches and cyber attacks, also pose significant privacy risks to individuals and organisations. As I discussed in the first post in this series, AI systems can perpetuate biases and have unintended consequences that violate individual privacy rights. In this blog post, I’ll explore these ethical concerns around privacy and AI and present a few questions for discussing this area across a range of subjects.

Here’s the original PDF infographic which covers all nine areas:

Leonfurze_com_AIEthics

Where does all that data come from?

Developers of large language models, such as the ones behind ChatGPT, often scrape their training data indiscriminately from the web without paying any attention to individual rights. These models are trained on vast swathes of internet data, which often includes personal information that has been collected without consent or used in violation of privacy laws. This has raised concerns about the ethics of developing AI models that rely on data collected without regard for individual privacy rights.

The lack of transparency and accountability around the collection and use of personal data in AI development has been a longstanding issue. The vast amount of data required to train these models means that personal information is often collected without explicit consent or knowledge of the individuals affected. Critics argue that developers of large language models prioritise the creation of powerful algorithms over individual privacy rights, and that the industry is not sufficiently regulated.

These concerns have landed OpenAI in trouble with European regulators, particularly under the General Data Protection Regulation (GDPR). The Italian regulator recently issued a temporary emergency decision demanding that OpenAI stop using the personal information of millions of Italians included in its training data, citing a lack of legal justification for using people’s personal information in ChatGPT. The GDPR protects the data of over 400 million people across Europe, and applies even to personal data that is freely available online. The decision by the Italian regulator highlights the growing concerns around the development of large AI models and the use of personal information in training data.

In Canada, the federal privacy commissioner is also investigating OpenAI following a complaint that the company has been unlawfully collecting and using personal data.

Protecting personal privacy

As I covered in the first post in this series, AI systems have the potential to perpetuate and amplify biases in data, leading to discrimination against certain groups or individuals. This is a serious concern when it comes to privacy, as these biases can lead to the exclusion or mistreatment of individuals based on their personal characteristics. For example, members of the public may be surveilled based on skin colour, place of residence, or other factors that are part of the data used to train the models. These concerns extend into many areas of the AI industry including facial and affect recognition, which I’ll talk about in a later post.

The presence of personal data in AI training sets is also a significant privacy concern. In the creation of these models, personal data has been collected without the explicit consent or knowledge of the individuals affected, and there may be inadequate protections in place to ensure that this data is used ethically and responsibly. Data breaches and cyber attacks are also a huge risk for AI systems. Several weeks ago, OpenAI experienced a breach due to a bug in one of the code libraries it uses, which revealed the first and last names and email addresses of some ChatGPT Plus subscribers, along with some of their payment details.

Case Study: AI Defamation

As covered in my article on truth and academic integrity, artificial intelligence has the potential to generate false information, leading to serious privacy concerns. In a recent case in Hepburn Shire, Australia, OpenAI once again faces the possibility of legal action, this time for defamation. ChatGPT incorrectly described regional mayor Brian Hood as a guilty party in a foreign bribery scandal. The mayor was actually a whistleblower who had reported the bribe payments.

ChatGPT’s errors arose from its indiscriminate data-scraping, as well as the inability of these models to distinguish between true and false claims. As a result, it generated convincing but incorrect information. Although OpenAI, the company that created ChatGPT, has taken some steps to protect people’s privacy, such as removing personal information from training data, such actions may not be sufficient to prevent the spread of false information.

This case highlights the legal challenges associated with suing AI companies for defamation, particularly given the issue of jurisdiction. Although the legal implications of AI technologies like ChatGPT are still uncharted, the case demonstrates the need for more cooperative efforts between AI developers, social media companies, and government agencies to mitigate the risk of generating misleading information.

When personal data – even publicly available data, like the original news story about Brian Hood’s involvement in the bribery case – is combined with a language model’s capacity for generating falsehoods, we have a recipe for damaging output.
