diff --git a/site/join_in/join_in.md b/site/join_in/join_in.md index ce70fb1b..478ce331 100644 --- a/site/join_in/join_in.md +++ b/site/join_in/join_in.md @@ -37,21 +37,62 @@ Discussion questions: 2) Do you think ChatGPT is a soft or a hard bullshitter? I.e. do you think it has the intention to mislead its audience or not? 3) What do you think of the implications of the anthropomorphizing of AI tools? E.g. hallucination, learning, training, perception etc. +[Read the writeup here!](https://dataethicsclub.com/write_ups/2024-09-25_writeup.html) + ### 9th October -To be decided. +Material: [Time to reality check the promises of machine learning-powered precision medicine](https://www.thelancet.com/journals/landig/article/PIIS2589-7500(20)30200-4/fulltext) + +Discussion questions: +1) How should we more meaningfully assess applications of ML to medicine (and other fields)? +2) Why do you think most of the reviewed ML methods are producing classifications (i.e. diagnosed or not diagnosed) instead of predicting a continuum of risk? +3) Do you think precision medicine itself is an epistemological dead end? What about stratified medicine? (identifying and predicting subgroups with a better and worse response) + +[Read the writeup here!](https://dataethicsclub.com/write_ups/2024-10-09_writeup.html) ### 23rd October -To be decided. +Material: [Transparent communication of evidence does not undermine public trust in evidence](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9802351/) -### 6th November - DEC Special! -To be decided. +Suggested discussion questions: +1. Were you surprised by the findings of the study? Do they mirror experiences in your own domain? +2. What do you think is behind the difference between the results of the nuclear power study and the vaccine study? +3. Should research articles have a place in persuading the public, or should their intention always be to focus on robust, trustworthy information? + +[Read the writeup here!](https://dataethicsclub.com/write_ups/2024-10-28_writeup.html) + +### 6th November - DEC Social Special +Very optional reading: [Data Ethics Club: Creating a collaborative space to discuss data ethics](https://www.cell.com/patterns/fulltext/S2666-3899(22)00134-9?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS2666389922001349%3Fshowall%3Dtrue) + +Structure: +Go into breakout rooms and everybody does intros and shares (note taking optional if people don't want their details on the blog): +1. Who are you professionally? +2. What brings you to data ethics club? +3. Who are you outside of data? + +Then as a group your discussion is about: +1. What's something you all have in common? +2. What's something that makes you all different? +3. As a group, if you hypothetically had unlimited time and unlimited money, what would you do to try and put your Data Ethics knowledge into use for societal good? + +Then come back at the end and share! + +[Read the writeup here!](https://dataethicsclub.com/write_ups/2024-11-06_writeup.html) ### 20th November -To be decided. +Material: [A giant biotechnology company might be about to go bust. What will happen to the millions of people’s DNA it holds?](https://theconversation.com/a-giant-biotechnology-company-might-be-about-to-go-bust-what-will-happen-to-the-millions-of-peoples-dna-it-holds-241557) + +Suggested discussion questions: +1. What are the potential risks that could come about with 23andMe using people’s data in the way they’ve outlined in their service agreements? +2. 
If you were the Chief Executive of 23andMe, what would you be prioritising to make sure highly personal genetic data was being protected if the company is sold (or even if it isn’t!)? +3. Data security aside, would you want to take a DNA test knowing that you might find out things about your family or about health conditions you could develop? + +[HackMD link](https://hackmd.io/T4-wGcVaSyyqlu4Hozg4GQ?both) ### 4th December To be decided. +### 18th December +To be decided. + ## Past Meetings diff --git a/site/write_ups/2024-08-31_writeup.md b/site/write_ups/2024-08-31_writeup.md index 1348c631..f37e9acb 100644 --- a/site/write_ups/2024-08-31_writeup.md +++ b/site/write_ups/2024-08-31_writeup.md @@ -2,13 +2,14 @@ blogpost: true date: August 1st, 2024 author: Jessica Woodgate -category: Write Up +category: Bookclub tags: data feminism, challenge power --- # Data Feminism: Chapter 1 – The Power Chapter ```{admonition} What's this? -This is summary of the discussions from Data Feminism Book Club, where we spoke and wrote about [Data Feminism](https://data-feminism.mitpress.mit.edu) by Catherine D’Ignazio and Lauren F. Klein over the summer of 2024. +This is a summary of the discussions from Data Feminism Book Club, where we spoke +and wrote about [Data Feminism](https://data-feminism.mitpress.mit.edu) by Catherine D’Ignazio and Lauren F. Klein over the summer of 2024. We hope you enjoy this writeup. We aim to run another Book Club in the summer of 2025! In the mean time, the co-organisers would be really enthusiastic to support anyone interested in running another book club! Please reach out if you want to get involved. The summary was written by Jessica Woodgate, who tried to synthesise everyone's contributions to this document and the discussion. "We" = "someone at Data Ethics Club". @@ -213,4 +214,4 @@ Thinking about how to present data is supported by [Nicole Dalzell’s three que ```{admonition} We hope you enjoyed this writeup of our discussion of chapter 1 of Data Feminism from our Data Feminism book club over the summer of 2024. We hope to run another in the summer of 2025! In the mean time, the co-organisers would be really enthusiastic to support anyone interested in running another book club! Please reach out if you want to get involved. -``` +``` \ No newline at end of file diff --git a/site/write_ups/2024-09-25_writeup.md b/site/write_ups/2024-09-25_writeup.md new file mode 100644 index 00000000..647f1f2e --- /dev/null +++ b/site/write_ups/2024-09-25_writeup.md @@ -0,0 +1,113 @@ +--- +blogpost: true +date: September 25, 2024 +author: Jessica Woodgate +category: Write Up +tags: ChatGPT, LLMs, bullshit, hallucination +--- + +# Data Ethics Club: [ChatGPT is Bullsh*t](https://link.springer.com/article/10.1007/s10676-024-09775-5) + +```{admonition} What's this? +This is a summary of Wednesday 25th September's Data Ethics Club discussion, where we spoke and wrote about the article [ChatGPT is Bullsh*t](https://link.springer.com/article/10.1007/s10676-024-09775-5). +The summary was written by Jessica Woodgate, who tried to synthesise everyone's contributions to this document and the discussion. "We" = "someone at Data Ethics Club". +Huw Day helped with the final edit. +``` + +## Article Summary + +Tackling the problem of large language models (LLMs) outputting false statements is increasingly important as LLMs are employed across more areas of society. Falsities generated by LLMs are commonly referred to as ‘hallucinations’.
This paper argues that hallucination “is an inapt metaphor which will misinform the public, policymakers, and other interested parties”. To better address the topic, the paper suggests that the label ‘bullshit’ is more appropriate than ‘hallucinate’, as LLMs are designed to give the impression that they are accurately representing the world. The paper distinguishes between ‘hard’ bullshit, requiring an active attempt to deceive the audience, and ‘soft’ bullshit, requiring a lack of concern for the truth. LLM outputs are framed as soft bullshit at a minimum, and hard bullshit if we view LLMs as having intentions, for example, in virtue of how they are designed. + +## Discussion Summary + +### Do you think that the label of ChatGPT as a bullshit machine is fair? + +Giving LLMs like ChatGPT labels is a path to improve understanding about the mechanics of how the tools work. Understanding mechanics and defining the (limits of) LLM capabilities is important to reduce harms and ensure they are used correctly. Users need to understand how models are designed in order to evaluate the output. Intentional metaphors can be helpful in conveying what a system is designed to do. However, if used inappropriately, metaphors can be misleading. For instance, the ‘learning’ part of ‘machine learning’ is actually something that looks more like recombining. Examples of the kinds of problems that can arise from misconceptions about the capabilities of AI can be seen in domains like digital health. In digital health, we are finding that people look to ChatGPT to diagnose problems [which it is not yet capable of doing reliably](https://theconversation.com/how-good-is-chatgpt-at-diagnosing-disease-a-doctor-puts-it-through-its-paces-203281). + +To encapsulate what is really going on when an LLM is prompted, it is important to understand the ‘temperature’ parameter, of which the paper provides a good explanation. The temperature parameter [“defines the randomness of LLM response. The higher the temperature, the more diverse and creative the output”](https://medium.com/@albert_88839/large-language-model-settings-temperature-top-p-and-max-tokens-1a0b54dcb25e). In other words, temperature enables the model to be tuned to choose more randomly amongst likely words, rather than choosing the most likely word. The effect of this, the paper conveys, is more “creative and human-like text” as well as a higher likelihood of falsehoods. + +In framing the effects of elements like temperature, the label ‘bullshit’ does seem appropriate. The paper argues that the temperature parameter shows that the goal of LLMs is not to convey helpful information, but “to provide a normal-seeming response to a prompt”. LLMs are designed to “give the impression” that the answer is accurate, rather than giving an accurate answer. Bullshit, understood as an indifference to the truth, encapsulates the way that LLMs are designed to prioritise objectives. Whilst bullshit might seem a bit clickbait-y, it is effective at drawing you in. Some of us experienced confirmation bias just looking at the title, feeling that it puts into words how we feel about the topic. + +For those of us that are more sceptical of ChatGPT, rather than seeing it as right or wrong, we thought that the paper presents a useful paradigm within which to frame a conversation. Paradigm discussions in data science are really important, such as [machine learning vs. AI paradigms](https://www.datacamp.com/blog/the-difference-between-ai-and-machine-learning).
In discussing paradigms, analytical philosophy surrounding language and precise definitions facilitates scoping areas and highlighting misconceptions. It is fun to utilise a well-defined but playful word like bullshit in this analysis. + +The crux of the paper contrasts bullshit with the term currently used when LLMs output false information – ‘hallucinate’. Hallucinate seems to be a term which has entered common language by slowly creeping in, rather than through the careful and thought-out selection of words that we see in philosophy. We wondered why it has been used as phrasing in the first place. One idea the term hallucinate may be appropriate comes from the fact that in humans, hallucinations can have an element of truth in the individual’s life. This could mirror how in LLMs, the ‘random’ output has some root in its training data. + +Despite some elements of ‘hallucinate’ that fit with what LLMs do, it doesn’t quite capture the meaning. Hallucinate has misleading elements, such as the implication that it is something that ‘just happens’. Human hallucinations are without intention; bullshit is a term that is less neutral and encompasses the effect of intention. Intention includes both the output of the system itself, and the work that goes into building the systems. Saying that a system is hallucinating insinuates that its output is not the designers’ fault. Hallucinate thus propagates a false narrative that distances the system designers from the system’s output, thereby side-stepping accountability. + +The false narrative around AI glosses over the human input in these systems, masking the role of intention by presenting a façade that no humans are involved. In reality, there is [a lot of human work that goes into AI](https://www.theverge.com/features/23764584/ai-artificial-intelligence-data-notation-labor-scale-surge-remotasks-openai-chatbots) (as discussed in a [previous Data Ethics Club](https://dataethicsclub.com/write_ups/2024-05-08_writeup.html)). Users are duped into thinking that the technology works much better than it really does. + +Labelling LLM outputs as bullshit is supported by the notion that common use cases for LLMs are areas where people already bullshit. We see LLMs used in everyday practices where convincingness is prioritised above accuracy, like in emails, marketing, and research proposals. There was the [lawyer that used fake cases generated by ChatGPT in his brief](https://www.legaldive.com/news/chatgpt-fake-legal-cases-generative-ai-hallucinations/651557/). We also saw fraud and liability as massive use cases for LLMs and wondered if there is any responsibility for those who break the law by using LLMs in these ways. + +Bullshit helps to tie the output and effects of LLMs to their human designers, highlighting the human responsibility in the creation of LLMs. Giving the interface of LLMs agency distracts from the designers involved in the system, and their bias. Using the label bullshit helps to reforge this connection to intention and responsibility, thereby improving awareness about underlying mechanics. + +Whilst we saw more strengths to the label ‘bullshit’ than ‘hallucinate’, there are some aspects of ‘bullshit’ that we had difficulty with. It is not an easy word to use generically and has various connotations which need qualification. ‘Bullshit’ as a term is intended for people and seems a bit anthropomorphic, even with the disclaimers in the paper. ‘Bullshit machine’ could be better, or perhaps ‘bullshit facilitator’. 
We also considered the term [‘confabulation’](https://en.wikipedia.org/wiki/Confabulation), which is “a memory error consisting of the production of fabricated, distorted, or misinterpreted memories about oneself or the world”. + +Although ChatGPT does have a tendency to spit out falsehoods, it might be unfair to call it a bullshit machine if 90% of the output is true. If it is just a predictive tool, which usually predicts correctly, its output might not necessarily be classed as bullshit. We expect much more from machines than humans; as humans, we can go our entire lives believing statements to be true and telling other people those statements are true, to find out one day we are wrong. If someone bullshits us, and we believe it to be true and share it with others, we wouldn’t consider ourselves to be bullshitters. + +### Do you think ChatGPT is a soft or a hard bullshitter? I.e. do you think it has the intention to mislead its audience or not? + +We liked the clear distinction between soft and hard bullshitting, where a soft bullshitter need not have an intention to mislead, but a hard bullshitter does. We would welcome more of a distinction between a hard bullshitter and a liar. Hard bullshitting might be prevented from collapsing into lying if there are underlying ulterior motives. + +Whether ChatGPT is soft or hard could depend on its version, with a difference in labels between earlier and later versions. Pre-reinforcement learning (RL), there was less effort dedicated to making sure that ChatGPT was not misleading people. The use of human-assisted RL (e.g. [RL from human feedback](https://huggingface.co/blog/rlhf)) seems to play an important part in distinguishing between soft and hard bullshit. + +On one hand, using human-assisted RL might make ChatGPT a soft bullshitter, because an effort (no matter how small) has been made to not mislead. Framing ChatGPT as a soft bullshitter could also be supported by the disclaimer at the bottom, alerting people that it might offer false information. If ChatGPT is devoid of intention, or the output isn’t presented as truth or knowledge, it could be labelled soft bullshit. However, hard vs. soft bullshit makes it seem like hard is worse; it should be made clear that soft is still bad and can be very disruptive. For example, if it is soft, we might care less about verifying it, whereas we would try to correct it if it were hard. + +On the other hand, the use of human-assisted RL may contribute to producing something closer to hard bullshit, as there is a transition from repeating probable words towards being ‘convincing’. RL is used both to make the output more truthful and to make it look more ‘truthy’. If we only care about intention in distinguishing between hard and soft bullshit, doing RL changes the appropriate label as it changes the role of intention. Knowing ‘true’ vs. ‘false’ is not enough information to learn to give better answers, and designers will have to make some choices in what defines a better answer. The choices that need to be made when involving RL could be interpreted as the intention of the designers. When intention is involved, ChatGPT thus enters into the space of hard bullshit where the model is trying to convince. + +Intention may belong to the system’s creators, or to the system itself. The idea of the system itself having intention is supported by the importance of process.
In the same way that students should learn from the process of writing, it is the practice that is important, not how polished the end result is. This means that no matter how frequently ChatGPT is accurate, it may still be a bullshit machine if it doesn’t go through the right process. For example, not telling the user how sure it is of its answer may increase the system's level of accountability. The significance of process is why the [controversial ad for Gemini got pulled]( https://www.theverge.com/2024/8/2/24212078/google-gemini-olympics-ad-backlash), in which a father used the tool to write an athlete a fan letter on behalf of his daughter. The advert arguably encourages “taking the easy way out instead of practicing self-expression”. + +We found that the paper seemed to jump around a bit regarding who was being accused of bullshitting; whether it was the users, the creators, or something else. There also seems to be a scale of intentionality in bullshitting, from students avoiding doing their work, to politicians misleading the public. Downstream, it is difficult to delineate responsibility for the use of the system between the users and the designers. For intentionality in LLMs, we wondered [how far developers have ethical agency, and where the buck stops](https://link.springer.com/article/10.1007/s43681-022-00256-3). The people who develop LLMs do not intend to mislead audiences, they intend to develop useful tools. + +However, commercialising and selling tools for specific unfit purposes could be classed as intentionally misleading. Designing bots to exhibit human-like qualities has some intention to deceive. The intent of the dataset an LLM was trained on is also important, for example, whether it is academic articles, or articles from tabloid newspapers. The information it was trained on would steer the intent; this would be the intent of the designers and sources, not the LLM. A lack of availability of underlying sources contributes to the bullshit factor. + +The existence of intention to mislead likely results from the [industry fixation on innovation. The obsession with innovation amongst other factors, has been found to be a significant barrier inhibiting ethical wisdom in the AI developer community](https://link.springer.com/article/10.1007/s43681-024-00458-x ). There is always a human at the top of the chain who is pulling the AI along; if we are looking to hold someone responsible, you just have to follow that chain. + +Even when the intention to mislead is absent, knowing how to train LLMs to portray truth is complicated by the fact that the concept of truth is a philosophical minefield. If it was easy to define truth, we wouldn’t have a justice system or journalists. There are situations where we presume the truth is knowable, and situations where we presume truth is unknowable. When an LLM hallucinates, compared to when it tells the truth, it feels like something procedurally different is happening, but really it is exactly the same process. + +The authors are correct to point out that the relationship to the truth is irrelevant to LLMs. LLMs aren’t tracking ‘truth’ as they have no commitment or connection to the whole picture. Whether LLM outputs are true or false, the intent is always the same. With the variation in temperature parameter, we know that they frequently don’t give us the best guess, trading off accuracy with authenticity. This trade-off produces speech which looks human but is indifferent to the truth. 
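To make the temperature discussion above a little more concrete, here is a minimal, illustrative sketch of temperature-scaled sampling (the five-word vocabulary and the scores are invented for the example, not taken from ChatGPT or any real model):

```python
import numpy as np

# Invented next-word scores (logits) for a toy prompt; the words and numbers
# are made up for illustration only.
vocab = ["the", "a", "of", "to", "banana"]
logits = np.array([4.0, 3.5, 2.0, 1.5, 0.1])

def sample_next_word(logits, temperature, rng):
    """Pick the next word using temperature-scaled softmax sampling."""
    if temperature == 0:
        # Zero temperature: always take the single most likely word.
        return vocab[int(np.argmax(logits))]
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    # Higher temperature flattens probs, so less likely words get picked more often.
    return rng.choice(vocab, p=probs)

rng = np.random.default_rng(seed=0)
for t in (0, 0.7, 1.5):
    print(t, [sample_next_word(logits, t, rng) for _ in range(8)])
```

At temperature zero the sketch always returns the single most likely word; as the temperature rises, less likely words are sampled more often, which is the trade-off between human-like variety and accuracy described in the paper.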
Even with zero temperature, LLMs won’t make stuff up but just go with the consensus of text they are trained on. Going with the consensus seems to align more with parroting than tracking truth. + +Considering the difficulties with assigning intentionality, accountability, and defining truth, we wondered if the distinction between hard and soft bullshit is persistently useful. It is difficult to quantify each term, and you can’t extend either definition to all applications of LLMs. Assigning labels to the system themselves is compounded by irresponsible use of LLMs. Once you’ve decided that LLMs are hard bullshitters, we wondered if they could become soft bullshitters, e.g. by learning, or labelling their own uncertainties with data. Accuracy can be a training goal. + +### What do you think of the implications of the anthropomorphising of AI tools? E.g. hallucination, learning, training, perception etc. + +To truly answer any of the questions we have discussed above, we must address anthropomorphism. As humans, we have a natural tendency to anthropomorphise; we can’t get away from our own human experience, conceptualising the world by reference to ourselves. We wondered if over-anthropomorphism is inherent across humanity, or if it is something seen especially in Anglican traditions. We see anthropomorphism in many systems other than LLMs, such as in robotics. In the Robotics Process Automation office, workers will name their software bot (e.g. Bobby Bot) and talk about the bot as if it is a colleague, for example saying “Bobby’s having a bad day today” when describing lots of exceptions. + +Anthropomorphism risks prescribing more intentionality than we mean to, similar to the effects of [pareidolia](https://en.wikipedia.org/wiki/Pareidolia), which is the tendency for perception to “impose a meaningful interpretation on nebulous stimulus”. We had some doubts about as to whether it is possible to ascribe intention to mislead to LLMs themselves, as this may be a case of anthropomorphism. + +The problem with anthropomorphism is that it does a disservice to how the system works, which affects broad fields (e.g. science) by trivialising and bypassing the underlying mechanics. As well as disguising technical aspects, anthropomorphism has social repercussions. People respond well to human projections on non-human items, and can build human-like relationships with them, finding friendship and companionship (this is explored in a [previous Data Ethics Club](https://dataethicsclub.com/write_ups/2024-02-14_writeup.html), the podcast [Black Box](https://www.theguardian.com/technology/series/blackbox), and TV show [Humans](https://www.imdb.com/title/tt4122068/)). + +To some extent, the dynamics of LLMs as bullshit machines are not so different to human relationships – we all have friends who are bullshitters and can navigate LLMs in similar ways to how we navigate these friendships. However, the similarity to human relationships poses a risk to people that are vulnerable and can induce trust where it might not be deserved. When we consider the technology in isolation, it is weird to think about whether or not we can trust it. However, when the tools are anthropomorphised, trusting them becomes a natural consequence. When that trust is proven misplaced, such as ChatGPT outputting falsehoods, trust in written language is lost, and people start to consume text differently. + +### What change would you like to see on the basis of this piece? Who has the power to make that change? 
+ +If the technology will increase efficiency and save lives, then there are strong arguments in favour of using it. However, at the moment there are a range of responses to good and bad practice of AI in research and education. These responses need to be streamlined to orient AI development with society. We know how to deal with cats, but not lions; we should not release lions into the city without knowing who can control them, or how. + +We thought that the paper is kind of missing the ‘so what?’ element at the end, without showing the dangers of the technology. Lying is already a part of our society – why is ChatGPT different? + +One reason ChatGPT is different is the scope of its repercussions; we have seen that the repercussions are large and affect many domains. For example, search engines appear to be devolving as the internet is gradually filled with generated content, [discussed in a previous Data Ethics Club](https://dataethicsclub.com/write_ups/2024-02-28_writeup.html). Another destructive repercussion of LLMs is [their environmental impact](https://medium.com/darrowai/code-green-addressing-the-environmental-impact-of-language-models-0161eb790c21). The Washington Post estimates that [generating one 100-word email uses about a bottle of water, or enough electricity to power 14 LED light bulbs for an hour; if 1 in 10 working Americans wrote one such email a week, that would use as much water as all Rhode Island households consume in 1.5 days, or as much electricity as all D.C. households consume in 20 days](https://www.washingtonpost.com/technology/2024/09/18/energy-ai-use-electricity-water-data-centers/). There could be creative solutions to the environmental effects of AI, such as the [data centre in Devon which uses the heat it generates to warm a public swimming pool](https://www.bbc.com/news/technology-64939558). + +To move forwards with technologies like ChatGPT, we thus need to be clear about how we are using them and what we are using them for. Many of us feel that LLMs are useful tools – there are cases of [LLMs doing maths to a grad student level of proficiency](https://mathstodon.xyz/@tao/110601051375142142) – and just want to clarify when they are useful. Whatever your opinion about them, LLMs are going to be used, so we need to decide when we can rely on them and what the good use cases are. + +Correct language plays an important role in clarifying the usefulness of LLMs. Applying the label of bullshit machine to LLMs can inform how you use them, as you enter into interactions with the expectation that a lot of it might be bullshit. LLMs tell us things, and we should verify that the outputs are true, under the assumption that they are not. Only when we have verified the facts should we share them. There are tools you can buy to check literature and understand which parts of a textual artefact are true, e.g. [Wolfram Alpha](https://www.wolframalpha.com) can be used as a truth checker. + +Contemplating what might come in the future, and how to reduce harm, is informed by looking to scenarios where similar concerns have arisen before. Where we are now with ChatGPT could be paralleled with [worries that people had about Wikipedia when it started](https://en.wikipedia.org/wiki/Criticism_of_Wikipedia). The fears about Wikipedia were largely overblown, and perhaps this will also be true regarding the fears surrounding ChatGPT. In the future, ChatGPT could be used as a conversation starter, but not as the ‘main thing’.
However, the key factor that differentiates the two is that Wikipedia has citations, making the sources evident. To circumvent this gap with ChatGPT, aside from including the actual citations, LLMs could include ‘confidence ratings’ on the statements they’re generating, based on the probabilities of the words being strung together. + +## Attendees + +- Huw Day, Data Scientist, Jean Golding Institute, University of Bristol, https://www.linkedin.com/in/huw-day/ +- Amy Joint, Programme Manager, ISRCTN clinical study registry +- Vanessa Hanschke, PhD Interactive AI, University of Bristol +- Zoë Turner, Senior Data Scientist, The Strategy Unit (NHS) +- Paul Matthews, Senior Lecturer, UWE Bristol, https://scholar.social/@paulusm +- Virginia Scarlett, Data and Information Specialist, HHMI Janelia :grimacing: +- Joe Slater, Philosophy, University of Glasgow Philosophy Department. +- Chris Jones, Data Scientist +- Joe Carver, Data Scientist, Brandwatch +- Dani Shanley, Philosophy, Maastricht University +- Mike Hicks, Philosophy, University of Glasgow +- [Kamilla Wells](https://www.linkedin.com/in/kamilla-wells/), Citizen Developer, Australian Public Service, Brisbane +- Euan Bennet, Lecturer, University of Glasgow +- [Robin Dasler](https://www.linkedin.com/in/robindasler), data product manager, California +- Helen Sheehan, PhD Student, University of Bristol +- Matimba Swana, PhD Student, University of Bristol +- [Dan Levy](https://www.linkedin.com/in/danrsl/), Data Analyst, BNSSG ICB (NHS, Bristol) diff --git a/site/write_ups/2024-10-09_writeup.md b/site/write_ups/2024-10-09_writeup.md new file mode 100644 index 00000000..f96a8eb7 --- /dev/null +++ b/site/write_ups/2024-10-09_writeup.md @@ -0,0 +1,87 @@ +--- +blogpost: true +date: October 09, 2024 +author: Jessica Woodgate +category: Write Up +tags: ML, medicine, AI applications +--- + +# Data Ethics Club: [Time to reality check the promises of machine learning-powered precision medicine](https://www.thelancet.com/journals/landig/article/PIIS2589-7500(20)30200-4/fulltext) + +```{admonition} What's this? +This is summary of Wednesday 9th October's Data Ethics Club discussion, where we spoke and wrote about the article [Time to reality check the promises of machine learning-powered precision medicine](https://www.thelancet.com/journals/landig/article/PIIS2589-7500(20)30200-4/fulltext) by Jack Wilkinson, Kellyn F Arnold, Eleanor J Murray, Maarten van Smeden, Kareem Carr, Rachel Sippy, Marc de Kamps, Andrew Beam, Stefan Konigorski, Professor Christoph Lippert, Professor Mark S Gilthorpe, and Peter W G Tennant. +The summary was written by Jessica Woodgate, who tried to synthesise everyone's contributions to this document and the discussion. "We" = "someone at Data Ethics Club". +Huw Day helped with the final edit. +``` + +## Article Summary + +Precision medicine attempts to identify healthcare pathways through the needs of individuals rather than the “average person”. Machine learning (ML) has been suggested as a technique that can be used to tailor therapies to each person as an individual, as well as automating diagnosis and prognosis. The paper questions the capabilities of ML with respect to precision medicine, asking whether it is realistic to assume that ML could achieve accurate and personalised therapy. Firstly, there is a lack of robust scientific evidence to support the superiority of ML over health professional assessment in diagnosis accuracy. 
Secondly, it is unlikely that ML will identify the best treatment for individuals, as causal inference is difficult to achieve without making assumptions based on scientific insight. Most health states are so complex that the chance of something happening is extremely difficult to predict at an individual level. The complexity of health conditions is not resolvable by collecting more data or building more elaborate models, as there are fundamental limitations in our understanding of physical and biological processes. + +The paper suggests that it may be more pragmatic to aim towards stratified medicine, identifying and predicting subgroups, rather than personalised medicine. However, due to the challenge of differentiating true signal from noise and the difference between deducing association vs causation, this route is more complex than simply applying ML to data. ML does have potential to advance scientific knowledge, but it is important not to give overinflated promises. If rhetoric surpasses actual capabilities, public trust in ML could be irreparably damaged when ML does not meet those expectations. + +## Discussion Summary + +### How should we more meaningfully assess applications of ML to medicine (and other fields)? + +To meaningfully assess applications of ML, it is important to fight against the hype surrounding AI. A part of countering hype involves being intentional with how we talk about ML and how we foster critical thinking. As a society, we aren’t effectively teaching how to analyse AI. Healthy scepticism should be baked into education to prevent people from just accepting the information that they are presented with. When discussing AI tools, as well as talking about the potential of what they can do, we should carefully consider their mechanics and where the “learning” comes from. We wondered about the kinds of skills necessary to critique AI tools, and whether current medical professionals are equipped with those skills. + +How we assess ML changes if we consider it as a tool versus as a replacement for a doctor. The paper seems to be warning against implementing ML to replace medical professionals. We also felt it important to consider ML technologies as tools to enhance practices, rather than replace what is already there. There is a concern that using ML for precision medicine could be an attempt to take clinicians out of the loop, as the people who build and purchase ML are not necessarily those who will be in clinical practice. Precision medicine tools could end up similar to digital pregnancy tests, which are now much more common than non-digital tests that require a medical professional. It is common sense to compare the outcomes of a model against the advice of medical professionals, looking for meaningful outcomes and using expert opinion as context. An actual clinician can give a contextual perspective of what the most likely outcomes are. The ability of ML to unpick complex data without worrying about the clinical context might be a strength rather than a weakness if we view ML as an assistive tool rather than a replacement for a doctor. + +Using ML as a tool to assist clinicians, rather than replace them, can free up clinicians’ time to do other things. It seems more appropriate to apply ML to admin purposes than to medical decisions. Having ML cover mundane admin tasks could add a lot of value, as admin can be much more of a burden to medical practitioners than the medicine itself.
The NHS trialled a tool that would summarise GP consultations and provide suggestions; [one GP said that “this was the first day of my job that I ever left on time”](https://timharford.com/2024/07/8801/). Stratifying cases could be a valid application, helping to minimise the risk of catastrophic mistakes. A stratified approach could help to raise the most at-risk patients to the top of the list so that they are seen sooner. Partitioning images to find the ‘interesting’ areas could be a worthwhile application. There are also many complex conditions which are hard to categorically diagnose, which clinicians address by asking more and more questions and slowly ruling things out. ML could help clinicians rule out possibilities sooner. + +ML should be assistive rather than as replacement because ML isn’t actually very well designed for a lot of problems. There is a narrative around ML as a magical fix-all solution, but its capabilities are often overhyped – “when you have a hammer, every problem looks like a nail”. In an industry that prioritises innovation, there is a lot of [intentional ignorance and bullshitting](https://dataethicsclub.com/write_ups/2024-09-25_writeup.html). Some of the discourse around ML hype comes also from politicians, looking to resolve workforce issues for example by replacing radiologists. Radiology is a good example of an area of medicine which has a lot of ML attention, but there are less urgent settings outside of medicine (e.g. astrophysics) which are less likely to get the same amount of funding. + +Hyping up ML tools is exacerbated by technology companies who promise to solve problems that they don’t really understand. [Studies have found that digital health companies have a lack of understanding of clinical robustness](https://www.jmir.org/2022/6/e37677/). A lot of the time, it seems that the intention behind development is to find out what can be done, rather than what should be done. The technology industry has a tendency to develop tools for the sake of it, before figuring out if the use case is valid. Launching projects before properly assessing their validity is amplified by a lack of user-centric design. To evaluate whether we should be using ML in healthcare we should be asking what the biggest barriers to achieving positive patient outcomes are, including whether algorithms are on this list, or whether outcomes are affected more by the workload of clinical staff and cost/access to healthcare. + +Unsuitability of ML for certain tasks is highlighted by the opaqueness of its decision-making and how easily this procedure is biased. As a human, it is challenging to understand if the model has come to a decision for what we would consider to be valid reasons, or because (for example) the model is good at identifying dark spots on the left hand side of an image. ML is highly affected by the quality of training data it receives; [if white skin is overrepresented then models will not work as well for black skin](https://www.unmasking.ai/). Whilst startups may go through all the appropriate regulation and testing procedures, they could still be using algorithms that are ten years out of date because of cost and convenience, rather than newer methods which might be more suitable. + +In addition to issues with transparency and bias, difficulties with data collection make the usual ML methodology inappropriate in the context of medicine. ML generally works best by getting all the data you can and letting the algorithm decide what’s relevant. 
Many medical ML tools use big data and deep learning to discover when conditions are interconnected through common pathways (e.g. [omics and big data](https://pmc.ncbi.nlm.nih.gov/articles/PMC6325641/)). ML with massive datasets has its uses, but relying on manually collecting data is unlikely to be useful. In many cases, gathering lots of data simply isn’t feasible, such as with patients for whom intrusion and unnecessary procedures are harmful. Collecting data about health conditions may also contribute to [class imbalance problems](https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/), where there are disproportionately more data available for some classes than others. We are only putting things into the model which we have already thought to measure; there may be many more factors that we aren’t aware of which are influential over the state of a health condition. We did not think it possible to predict individual outcomes from population data, as inferring individual outcomes is intensely complex. For some conditions, the state of a disease is on a continuum of risk which is not fully understood, making it difficult to use those conditions as training sets. When working on an individualised basis, it is difficult to identify what is actually working versus what is a placebo effect. The bodies and health of individuals are constantly changing; even if it is possible to decipher one point in time for one individual, it will quickly change. + +Whilst there are challenges with collecting data in sensitive applications such as medicine, extra data would undoubtedly improve ML models – at least for stratified groups. Beneficial data would include previous medical history and longitudinal studies collecting samples from patients, regardless of disease condition. One could also examine whether there are genetic markers for neurodivergence and look at ethnic populations where certain conditions are more prevalent. + +However, rather than just trying bigger and better models, it is important to weigh up the financial benefit of increasingly complex versus simpler models. We would like to see more evaluations, such as cost/benefit analyses, of the potential gains of adopting more complex models. Having an unexplainably complex model is unreasonable if the people who actually use it won’t be able to understand it. If users have a lack of or incorrect understanding, there is a higher chance that they will use it badly. Making deterministic inferences which humans are able to interpret is more valuable than a ‘magic box’ producing an output that the user has to blindly trust. + +### Why do you think most of the reviewed ML methods are producing classifications (i.e. diagnosed or not diagnosed) instead of predicting a continuum of risk? + +Producing classifications instead of prediction continuums may provide benefits by reducing the number of possible outcomes. Simplifying outcomes through classification can reduce variability of model performance, as there are fewer possible outcomes to predict. Using classifications instead of continuums also has benefits for human-readability. Even if an outcome does naturally exist on a continuum, humans would often prefer to conceptualise it as a classification in order to make the outcome easier to understand and explain. + +One of the reasons it is tricky to understand ML models is the difficulty of communicating uncertainty.
It is important to accurately portray risk and uncertainty, but there is a gap between the statistical literacy of the general public and current ways of explaining uncertainty. In terms of public education, there are lots of misconceptions about how to use scatter plots to show relationships and [confidence intervals](https://www.statology.org/understanding-confidence-intervals-what-they-are-and-how-to-use-them/). Regarding the models themselves, often classifiers do not tell you the certainty with which they produce classifications. ML will typically produce a ‘final’ dataset or answer given the patterns that the model has detected. Clinicians, on the other hand, naturally integrate uncertainty into their processes, as they are trained to become more certain in their understandings by asking the right kinds of questions. + +The way uncertainty is presented can significantly affect the implications of the output. For example, stating that “this person is 60% ready for discharge” is quite different to “this person is ready for discharge”. In making ML models more interpretable, it is important to carefully consider the level of detail necessary to communicate the right meaning. + +Tolerating uncertainty is especially difficult for patients; an incentive for implementing AI in medicine is the hope that AI can increase the certainty of doctors. However, the promise of AI which can provide ‘objective’ answers with certainty is misleading as it will be influenced by the biases of those who developed the tool. [Automating Inequality by Virginia Eubanks](https://virginia-eubanks.com/automating-inequality/) explores how values of the people who develop ML tools are baked into those tools. Historically there has not been a lot of diversity in developers of ML tools, limiting the perspective of values integrated into tools with adverse consequences for those affected by the tools’ use. + +### Do you think precision medicine itself is an epistemological dead end? What about stratified medicine? (identifying and predicting subgroups with a better and worse response) + +To help unpick the definitions of precision and personalised medicine, we thought that the definition of personalised medicine could have been drawn out more clearly in the paper. Regarding the distinction between personalised and stratified, we wondered if stratification was a ‘type’ of personalisation, and how valuable the distinction is to understand which type of personalisation is most appropriate for particular applications. We wondered if it would be possible to have precision medicine tailored to a specific genome to target a tumour or ‘reprogram’ the immune system. + +Precision medicine seems to be surrounded by a lot of misconceptions, especially regarding how to make inferences from data. It is not possible to make inferences based on data that we don’t already have. We need to be honest about what we don’t know, which in terms of causes for individual’s medical conditions, is a lot. Diversity in data is important to make classifications; if you have 100 patients and only 1of them has a disease, it is unlikely that the model will pick the disease up. Difficulties with missing or sparse data make it hard to think about how it is possible to use ML to personalise medicine in a way that will truly add value. + +Attempting to make inferences from non-existent data is another example of an ‘AI’ problem which is actually a data problem. 
Quite often, companies will think that they need AI when they actually have a data problem that can’t be solved with AI. + +In addition to issues with missing data, precision/personalised medicine has a causal inference problem, as discussed in the paper. Those of us who have worked with prediction models have encountered the causal inference problem, frequently asking “it’s interesting, but what’s the utility?”. Outcomes, especially in healthcare, are very complex and it is extremely challenging to know which one of many potential mechanisms was the most effective. + +With stratified medicine, it is possible to perform statistical analysis on groups of people, predicting likelihood based on the average of a sample. With precision medicine, people seem to be treated as a one-off certainty. Treating people as a one-off certainty doesn’t make sense statistically; it thus doesn’t seem reasonable to expect an ML model to make accurate one-off predictions. + +To illustrate why it seems unreasonable to predict individualised outcomes, we discussed some of our research into predicting horse fatalities. In the USA and Canada, the incidence of fatality among racehorses is about 1.3 per 1000 race starts. A model predicting which horses will die will never beat the trivial baseline of “horses are immortal”, which is correct over 99.8% of the time (a short sketch of this baseline arithmetic appears at the end of this writeup). As what we are looking for (a horse death) is a sparse data point, which ML finds difficult to accurately predict, it is not a productive use of time to build a model which tries to predict individual fatalities. Instead, we can focus on using explainable models to understand the risk factors for horse fatality. We can then use the information we know about risk factors to identify the highest risk horses based on their individual histories. In this way, we can explain to the industry what interventions they could put in place to reduce the risk of horse fatality. The result of our ongoing work is that the incidence of horse fatalities in the USA and Canada has decreased by 38% since 2009. We asked: if we’d spent our resources on a fancy, computationally expensive, unexplainable ML model, would it have the same impact? In our informed reckon, absolutely no chance. + +### What change would you like to see on the basis of this piece? Who has the power to make that change? + +Appropriate benchmarks will need to be decided on to measure the success of precision medicine applications, for instance, whether we are comparing outputs with how often they match the answers clinicians give, or how successful the ML outputs are in treating patients compared to treatments from clinicians. It is important to look at outcomes and processing speeds; for example, if ML can catch conditions early it may improve survival chances or open up more appropriate treatment options. We were uncertain as to how it would be possible to test the effectiveness of ML, and whether it would be ethical to run a clinical trial comparing medical professionals against precision medicine tools. + +In addition to assessing the accuracy of tools for precision medicine, it is important to think about the effects of predictions regarding privacy and security. Unintended effects can be especially impactful in sensitive applications like healthcare.
For example, disclosing private details has significant repercussions as in the example of [Target predicting that a teenager was pregnant before she had told her family](https://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/), discussed in both [Hello World](https://hannahfry.co.uk/book/hello-world/) by Hannah Fry and [Data Feminism](https://data-feminism.mitpress.mit.edu/) by Catherine D’Ignazio and Lauren F. Klein. We also had concerns regarding how precision medicine tools would store personal data; companies such as [23AndMe are potentially at risk of selling customers’ data](https://theweek.com/tech/23andme-dna-sale) as the business is undergoing significant leadership changes. + +In the world of equitable health, we wondered what the cost implications of fields like precision medicine are. If precision medicine were to become an effective pathway, the amount of resources it requires (e.g. processing of data, generation of individual treatments) suggests that precision medicine could be out of reach to the masses, furthering the gap between the ultra-rich and everyone else. We see similar problems in domains like [preimplantation genetic testing](https://www.acog.org/clinical/clinical-guidance/committee-opinion/articles/2020/03/preimplantation-genetic-testing) which evaluates embryos for single gene disorders before transferring them to the uterus. + +Instead of wasting resources on fancy models which are inappropriate to be deployed in reality, we should spend those resources supporting doctors better which will in turn support patient outcomes. Reducing workloads of clinical staff would also give them more time to properly engage with research, cultivating a more appropriate array of intersectional perspectives. + +Integrating ML tools into human professions may have adverse consequences for intellectual capabilities. In the [Cautionary Tales podcast](https://timharford.com/2024/07/8801/), Tim Harford discusses how relying on AI to support us with tasks that require skills can deskill us, as we miss the opportunity to build experience in those skills leaving us less capable when the AI fails. For example, using co-pilot and ChatGPT in coding classes stops people from getting help from other humans and learning the processes themselves. It is crucial that we use AI for tasks which won’t deskill our working population. To think about how AI can help us now, we also need to think about where the gap in human knowledge will be in ten years’ time. 
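As a footnote to the horse-fatality example above, here is a minimal sketch of the baseline-accuracy arithmetic; the 1.3-per-1000 incidence figure is the one quoted in the discussion, and the rest is purely illustrative:

```python
# Baseline-accuracy arithmetic for the horse-fatality example above.
# The incidence (about 1.3 fatalities per 1000 race starts) is the figure
# quoted in the discussion; the "model" here is purely illustrative.
incidence = 1.3 / 1000

# A trivial model that predicts "no fatality" for every race start is right
# whenever nothing happens, i.e. almost all of the time.
baseline_accuracy = 1 - incidence
print(f"'Horses are immortal' baseline accuracy: {baseline_accuracy:.2%}")  # ~99.87%

# Which is why raw accuracy says little here: a useful model has to be judged
# on how well it identifies or explains the rare high-risk cases instead.
```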
+ +## Attendees + +- Huw Day, Data Scientist, Jean Golding Institute, [LinkedIn](https://www.linkedin.com/in/huw-day/) +- Amy Joint, Programme Manager, ISRCTN Clinical Study Registry, [LinkedIn](https://www.linkedin.com/in/amyjoint/), [Twitter](https://twitter.com/AmyJointSci) +- Dan Levy, Data Analyst, BNSSG ICB (NHS, Bristol), [LinkedIn](https://www.linkedin.com/in/danrsl/) +- Euan Bennet, Lecturer, University of Glasgow +- Virginia Scarlett, Data and Information Specialist, HHMI Janelia Research Campus diff --git a/site/write_ups/2024-10-28_writeup.md b/site/write_ups/2024-10-28_writeup.md new file mode 100644 index 00000000..a0e6a323 --- /dev/null +++ b/site/write_ups/2024-10-28_writeup.md @@ -0,0 +1,94 @@ +--- +blogpost: true +date: October 29, 2024 +author: Jessica Woodgate +category: Write Up +tags: Transparency, trust, communication +--- + +# Data Ethics Club: [Transparent communication of evidence does not undermine public trust in evidence](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9802351/) + +```{admonition} What's this? +This is a summary of Wednesday 23rd October's Data Ethics Club discussion, where we spoke and wrote about the article [Transparent communication of evidence does not undermine public trust in evidence](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9802351/) by John R. Kerr, Claudia R. Schneider, Alexandra L. J. Freeman, Theresa Marteau, and Sander van der Linden. +The summary was written by Jessica Woodgate, who tried to synthesise everyone's contributions to this document and the discussion. "We" = "someone at Data Ethics Club". +Huw Day helped with the final edit. +``` + +## Article Summary + +The paper examines whether public trust in scientific information is undermined by communicating risks, benefits, and uncertainties according to a set of guidelines. In the experiments conducted, participants read either a persuasive message, or a balanced and informed message adhering to evidence communication recommendations. The studies found that balanced messages are perceived as just as trustworthy or more trustworthy than persuasive messages, noting that prior beliefs moderated how the balanced messages were perceived. In one of the studies, participants who had read the persuasive message voiced significantly stronger support for the issue, despite rating the message as less trustworthy. + +## Discussion Summary + +### Were you surprised by the findings of the study? Do they mirror experiences in your own domain? + +We found the paper’s discussion around the findings thought-provoking and interesting to read, although we were not surprised by the results. The findings were a bit self-evident and the covid results were especially intuitive; we felt the balanced approach to be more trustworthy in both contexts. We questioned whether the results were that strong from a psychological standpoint. + +Whilst we agree that there are benefits to having balanced communication, we wondered if the paper diminishes the case for persuasive communication. Sometimes, it is important to tailor communication to the audience. It can be difficult to maintain neutrality, as engaging people in statistics and numbers without any emotion or story is challenging. From our experience with upskilling people in data and communicating directly with industry partners, we found that science in itself should be convincing enough, but may not be. + +Consistently providing balanced information is not always a viable goal as people will interpret messages in different ways.
People have a tendency to adopt insights in favour of various conclusions and widen confidence intervals in their minds. Data needs to be interpreted to be understood, and this interpretation is coloured by the experiences of the person analysing the data. When working with children’s data, for example, there are recommendations at the ends of reports which the reader will interpret; points like ‘neglect’ may be experienced in different ways. The way data will be interpreted is also influenced by the way the data is presented. The best propaganda is truth, but it is easy to present *enough* and not *all* information. Being selective with which parts of the truth are portrayed can foster false narratives, highlighting the impact information control. + +Sometimes, it is logistically challenging to convey balanced information. For example, in vaccination centres not all volunteers are able to pass on information; there is more focus on collecting consent for vaccination than giving statistics. We wondered where to locate expertise if people have more questions. People are not always capable of being informed. For example, when giving epidurals to people in labour the urgency of the situation overrides attempts to fully discuss risks. + +Identifying where exactly the balance lies is not easy and sometimes attempting to provide balance can cause more harm than good. For issues which are much more weighted, including both sides might do more harm than good. For example, presenting a climate researcher and a fossil fuels business owner as 50/50 in climate change issues can detract from accurately presenting facts as one side of the argument is much more incentivised than the other. + +However, making some attempt at providing a balanced approach is important as sometimes only attempting to persuade and omitting balanced information can backfire. Even if the information provided in the persuasive case is generally true, disregarding generally irrelevant but more complete information can foster misinformation and conspiracy theories. We have found that people in industry can have a different idea of what constitutes sufficient evidence, and we have had to explain to them the difference between scientific and anecdotal evidence. We would like to upskill everyone in data skills so that they get a better understanding of how to interpret data, but some of us thought this might not work. In medicine, for example, it might be difficult to put interpretive skills into the hands of patients without guiding their hands. + +Discussing how to communicate data interpretation and teach interpretive skills led us to ask whether the roles of medical professionals are to persuade or inform. The informed consent model is very important in the context of medicine. Consent and informing are closely monitored in clinical trials; providing information to those who need or want it is a necessity. Psychologists among us found it surprising that midwives are informing over persuading. Systems tend to focus on educating a public who doesn’t “know enough”; we have found the Spanish healthcare system to be very paternalistic with restricted access to information for patients. Paternalistic attitudes could be helpful or harmful to the patients; there is a spectrum from informing to persuading to coercing. Examples of coercing are policies that stop welfare payments for people that aren’t vaccinated. + +### What do you think is behind the difference between the results of the nuclear power study and the vaccine study? 
+
+The paper found different trends of opinion in the vaccination and nuclear scenarios; differences between the studies may occur because the mechanism for participation in each situation is quite different. Whilst there is active and individual participation in the vaccination scenario, there is passive and public participation in the nuclear scenario. It is much easier for people to take personal action in the case of vaccines than nuclear power. The proximity to each scenario is different; whilst it is fairly common for people to interact with vaccines or be jabbed involuntarily as a child, directly confronting the concept of nuclear power (e.g. by visiting Hiroshima) is much rarer. Whilst the existential risk is on a personal level for vaccination, it is more community oriented for nuclear power. Saying this, vaccines do also influence herd immunity and the choice to vaccinate can be made for community benefit (coincidental with individual benefit).
+
+The type and level of life experience that people have also affects the way they engage with issues. This study was conducted in the UK, but we would be curious about the attitudes of different countries towards evaluating the news and informing themselves. We tend to do studies in our home countries and conclude that they are universally applicable. However, there are cultural differences; [France, Switzerland, and Germany are not necessarily taught that the news is neutral](https://reutersinstitute.politics.ox.ac.uk/our-research/bias-bullshit-and-lies-audience-perspectives-low-trust-media) and are largely aware that there is implicit bias. In Russia, on the other hand, [the news is tightly controlled and questioning official messaging is discouraged](https://www.cnn.com/2023/02/27/europe/russia-propaganda-information-ukraine-anniversary-cmd-intl/index.html).
+
+The timing of the study may have influenced the results, as the experiments were conducted in 2021, after the onset of covid. The outcome of the study might have been different if it had been done pre-covid, as during covid a lot of people were living under very specific restrictions. Now, and before covid, nobody was prevented from entering bars and restaurants because they didn’t have a flu jab. Anti-vaccine sentiment was likely amplified for many during covid, but pro-vaccine sentiment could also have been increased among those who wanted more freedom. In the wake of fake news and misinformation exploding into the public domain, people might be less likely to trust arguments they encounter.
+
+Results for each study may suffer from selection bias, as the study was conducted online, which restricts the data set. In the vaccine study there was a sampling problem, as there weren’t a lot of people with strong anti-vaccination prior beliefs and the population was fairly homogeneous in this regard. Statistics are thus missing due to low numbers of people with particular life experiences and beliefs. Having a homogeneous population could explain why it is difficult to see a moderating effect on the trustworthiness of information.
+
+People have a variety of prior opinions which influence how they interact with different issues and how likely they are to trust certain arguments. For example, it seems likely that people are more likely to have a pre-formed opinion on vaccines than nuclear science. Having pre-formed opinions can affect how people perceive an argument.
If a ‘balanced’ view includes both sides of an argument, people may see their own side presented and think “oh good, you agreed with me”, without registering that the other view is also presented.
+
+Different issues also incite different emotional responses and social pressure, as nothing is ever fully neutral. Sometimes, people manipulate emotions to scare others into taking action by using specific examples like “I had a friend who took this vaccine and died”. Social pressure to take a particular view changes for different issues; it’s more acceptable to be sceptical of nuclear power than vaccines, possibly because there is more evidence of the drawbacks of nuclear power. Getting vaccinated is generally viewed as quite a pro-social thing to do; it would be interesting to conduct the study with a more personal issue with less community focus (e.g. diet or fitness). We also wondered to what extent the results were specific to the covid vaccine versus other vaccines. Covid in particular was surrounded by somewhat novel interventions such as lockdowns, which could influence how people respond.
+
+People will react differently according to the person who is trying to convince them. There are conflicting results regarding the effects that bots have on convincing people about conspiracy theories. Some studies have found that [bots can get conspiracy theorists to change their mind](https://www.nature.com/articles/d41586-024-02966-6); other research finds that [facts are ineffective in changing conspiracy theorists’ minds](https://www.turing.ac.uk/blog/facts-dont-change-minds-and-theres-data-prove-it). We wondered at what point a conspiracy theory crosses the line to become enlightened.
+
+How people interpret messages can be affected by the type of education they have. Data literacy is key for people to understand what the data is telling us, and stories are often required to help us understand data. In many schools and even universities, critical thinking isn’t taught very well, which may influence how people make decisions regarding trust. The study doesn’t match groups for reading age or education level, so it is unclear how education may have affected the results.
+
+Considering that people interpret information in different ways, we wondered if “nudging” is a good idea. Nudging revolves around subtly trying to get people to behave in certain ways by providing little information. As this paper demonstrates, persuasion techniques are not one size fits all and can backfire when people need more information than they are provided with. Tools that deliver nudges should be able to provide extra information when the receiver requests it, but this does not always happen.
+
+### Should research articles have a place in persuading the public, or should their intention always be to focus on robust, trustworthy information?
+
+Attempting to persuade is not always sufficient, as persuasive techniques might miss important information, such as recognising the doubts that people have or acknowledging the negatives. Most of the time people know when you’re trying to manipulate them. Information should always be available, and being open about both sides can strengthen arguments by highlighting things which might not otherwise have been talked about. In saying this, we should be practical with respect to the fact that we live in a society and have to be strategic about where information is kept and who provides it.
+
+On the other hand, sometimes being persuasive is needed to make change happen, and for things to improve we don’t always want to be neutral. If you don’t make the case for your viewpoint, sometimes it can be ignored. Sometimes the facts are available, but they aren’t very impactful unless there is a story that prompts enough public shock for change to happen. Except for political opinions which have been built over decades, a lot of opinions can be easily swayed. Even political opinions can change, though; historically a lot of sexual misconduct was seen as the woman’s fault, but today this is changing.
+
+Getting the right balance between explaining things in enough detail to communicate robust information and not overwhelming people is tricky. People often say they want nuance, but a lot of the time nuance isn’t effective and explaining things clearly is more likely to garner support. When politicians explain complex and nuanced things they are criticised from both sides, but they receive a more welcoming response when they use more lucid language. On a personal level, some people are more statistically driven than others. Researchers can be very involved in their specialities and all the background which underpins them, but it is important to remember that the general public don’t need to know every detail.
+
+When trying to communicate something scientific to the public, it is important to think about how it will be heard by people with less scientific understanding. From our experience in the veterinary field, we have found that people are quite good at removing jargon but can forget that even some non-jargon or non-scientific words can be difficult for a wider audience to interpret. Communicating maths is also challenging, and a lot of people are thrown off when academic terms are used. It is important to be able to explain things better and in less abstract terms. To be appropriate for a wider audience, it is beneficial to aim for a young reading age of 5-9; [in 2012, 1 in 6 adults in England had a literacy level equivalent to the ages of 5-7](https://literacytrust.org.uk/parents-and-families/adult-literacy/what-do-adult-literacy-levels-mean/). Using a young reading age as a starting point and then gradually building up the complexity helps to create a more accessible narrative. Some of us have a series of graphs we use to communicate good or bad science. In an example of [a graph we use to explain “to a 6 year old”](https://www.canva.com/design/DAGUZEy3yCw/4ENPFX7yqz9u5CVoha4jvg/edit), we essentially just make it obvious that the further to the right a point is, the worse it is. People seem to intuitively understand this even though there are other complicated things happening, including confidence intervals.
+
+Tailoring research articles so that they can be understood by the public may not be necessary, as many people won’t read the articles directly. Most people will get their information through people in the middle who interpret research and influence public views. We’ve seen academics on Twitter using strong language and interpreting the same data completely differently. More and more journals are also producing plain language summaries. [Good summaries don’t influence the number of people who read the full article, but they do influence the number of people who talk about the article in blogs, on social media, and in the news](https://resource-cms.springernature.com/springer-cms/rest/v1/content/25366086/data/v2).
Whether the article itself should be easy to read may depend on the topic; 9 year olds don’t need to understand high energy physics, but making sure public health research is interpretable could be beneficial.
+
+It is difficult to say how important it is that researchers intend to make articles trustworthy, as the extent to which we can conflate trust with choosing an appropriate action is challenging to discern. We were uncertain about the extent to which trust influences people’s decisions, and wondered about the significance of people trusting scientific research yet not taking a recommended action. There seems to be low trust in institutions at the moment, which could explain why people aren’t taking recommended actions like getting vaccinated.
+
+### What change would you like to see on the basis of this piece? Who has the power to make that change?
+
+We see the relevance of the paper to our own work; the goal of the paper was to help people who need to communicate information to do so in a trustworthy fashion, and communication is an important part of many of our roles. The studies in the paper were the culmination of a lot of work to identify guidelines for trustworthy, informative (not persuasive) communication. We liked how there was a detailed reflection on how the authors would have done the studies differently if they did them again. The guidelines are pertinent for some of our current work with a team who want to push an intervention based on a result that may not be sound due to missing data. The guidelines are also relevant to regulatory compliance, where people often have to make accountable decisions risking audit. Being precise when making accountable decisions may be more relevant in the public sector than the private sector, as the public sector may involve more compliance-informed and recorded decision making.
+
+The paper recommends pre-empting misinterpretations, but we wondered how achievable this is in reality. Often, attempts to pre-empt misinterpretations may not pierce people’s wilful ignorance, as many are much more comfortable resting in the illusion that everything is fine. However, if people are emphatic that there is no issue whatsoever when there are issues, we can run into problems. It is key to find the balance between pre-empting misunderstandings and being open about the risk associated with all scientific interventions. As a community, we need to get better at communicating the contingency of our scientific findings.
+
+## Attendees
+
+- Huw Day, Data Scientist, Jean Golding Institute, [LinkedIn](https://www.linkedin.com/in/huw-day/)
+- Amy Joint, Programme Manager, ISRCTN Clinical Study Registry, BioMed Central, [LinkedIn](https://www.linkedin.com/in/amyjoint/)
+- Vanessa Hanschke, PhD Student, University of Bristol
+- Noshin Mohamed, Service Manager for QA in children's and young people's service
+- Euan Bennet, Lecturer, University of Glasgow
+- Alex Freeman, ex-Winton Centre, Cambridge University (co-author on the paper)
+- Paul Smith, Statistician, NHS Blood and Transplant (Bristol)
+- Sarah Jones, vet working in a clinical pathology lab at Glasgow uni
+- [Kamilla Wells](https://www.linkedin.com/in/kamilla-wells/), Citizen Developer, Australian Public Service, Brisbane
+- [Zoë Turner](https://github.com/Lextuga007), Senior Data Scientist, NHS
+- Jessica Bowden, Neuroscience Research Associate, University of Bristol
+- [Robin Dasler](https://www.linkedin.com/in/robindasler/), product manager for academic data curation software, California
+- Dan Levy, Data Analyst, BNSSG ICB (NHS, Bristol), [LinkedIn](https://www.linkedin.com/in/danrsl/)
+- Veronica Blanco Gutierrez, Midwife and PhD candidate in Digital Health, University of Bristol, [LinkedIn](https://www.linkedin.com/in/veronica-blanco-gutierrez/)
diff --git a/site/write_ups/2024-11-06_writeup.md b/site/write_ups/2024-11-06_writeup.md
new file mode 100644
index 00000000..beb390a7
--- /dev/null
+++ b/site/write_ups/2024-11-06_writeup.md
@@ -0,0 +1,59 @@
+---
+blogpost: true
+date: November 14, 2024
+author: Jessica Woodgate
+category: Write Up
+tags: social
+---
+
+# Data Ethics Club Social Special: [Data Ethics Club: Creating a collaborative space to discuss data ethics](https://www.cell.com/patterns/fulltext/S2666-3899(22)00134-9?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS2666389922001349%3Fshowall%3Dtrue)
+
+```{admonition} What's this?
+This is a summary of Wednesday 6th November’s Data Ethics Club discussion, where we spoke and wrote about Data Ethics Club! For reading, you could check out our paper [Data Ethics Club: Creating a collaborative space to discuss data ethics](https://www.cell.com/patterns/fulltext/S2666-3899(22)00134-9?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS2666389922001349%3Fshowall%3Dtrue) by Nina H. Di Cara, Natalie Zelenka, Huw Day, Euan D.S. Bennet, Vanessa Hanschke, Valerio Maggio, Ola Michalec, Charles Radclyffe, Roman Shkunov, Emma Tonkin, Zoë Turner, and Kamilla Wells.
+The summary was written by Jessica Woodgate, who tried to synthesise everyone's contributions to this document and the discussion. "We" = "someone at Data Ethics Club".
+Huw Day, Amy Joint, Vanessa Hanschke, Nina Di Cara and Natalie Thurlby helped with the final edit.
+```
+
+## Discussion Summary
+
+For this instalment of Data Ethics Club, we had a more relaxed, getting-to-know-each-other type of session. The very optional reading was the DEC paper [Data Ethics Club: Creating a collaborative space to discuss data ethics](https://www.cell.com/patterns/fulltext/S2666-3899(22)00134-9?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS2666389922001349%3Fshowall%3Dtrue), where we present the ideas behind, and organisation of, Data Ethics Club.
+
+### What’s something you all have in common?
+
+Despite our different backgrounds, we have similar approaches to how we use and think about data. A lot of us are problem solvers and communicators, and we are all passionate about data ethics. Some of us also share experience in teaching or research.
The combination of problem solving skills and an interest in ethics fosters the desire to question the world around us. After encountering a few questionable scenarios with data, we began to examine the ethical questions. We use Data Ethics Club (DEC) as an avenue to discuss why it is that we use data in the ways that we do.
+
+Participating in discussions at DEC has helped us to learn things about data ethics that we can take into other aspects of our lives. In our own work there are lots of questions that arise regarding data handling, privacy, anonymity, etc. which relate to data ethics. We have encountered colleagues who are interested in data ethics and using data correctly but find it hard to provide straightforward solutions or answers. Before coming along to DEC, we were curious but apprehensive about forming opinions. We come to DEC because we are interested in hearing other people’s views about data, learning more about AI and ethics, and gaining more understanding of how we approach the ethical side of AI. Identifying with different experiences at DEC exposes us to a range of perspectives and new insights.
+
+Encountering different perspectives at DEC is helpful when we have to convince or inform others regarding data ethics and safety - something that many of us have experienced. Sometimes we have had to warn people not to simply send data insecurely via email. We’ve found it difficult to meaningfully get back to people regarding secondary data usage. We use our skills to help companies be more inclusive by thinking about data collection and ethics, trying to ensure that fairness is applied in building risk profiles or making predictions. To help implement data ethics, [Data Hazards](https://datahazards.com/) is a useful project for structuring data ethics assessments, and [Diversily](https://www.diversily.com/) work towards empowering businesses to be more inclusive. Diversily have [a playbook](https://diversily.thinkific.com/courses/the-inclusive-innovation-playbook) to help product designers and innovators contribute to a more equitable world.
+
+Research is something that we strongly feel should be accessible to anyone, anywhere. However, when we’ve worked with second-hand data we’ve found that processes to access data are complicated and ethical assessment is unevenly applied. Paradigms should be adjusted to put end users and data owners at the heart of data science. We are passionate about implementing participatory research; people need to be part of research to understand how their data will be used and the consequences of providing their data. On an institutional level, involving research participants and safeguarding secondary uses of their data is challenging when there is no ongoing contact with the people from whom data is sourced. We would like participants to be involved in future decisions involving their data.
+
+Good data management and secondary data usage are undeniably important, yet we have found there to be a lot of competing standards and uneven approaches when it comes to data handling. One relevant application for data management is medicine. Some of us have experience working in medical research or clinical trials and were shocked by the lack of respect for people’s data, such as there being no protection over people’s signatures or email collections. We also have experience working with genetic data, from which lots of data handling questions arise, including who has access to the data and who safeguards it.
+
+Through our interests in data ethics and systemic issues we have been incentivised to diversify our skills base, such as by improving our maths. Some of us are polymaths and can cross between different fields with ease. Many of us have changed career direction, which we have found to be highly rewarding. We encourage people to scratch the itch and try a variety of things.
+
+### What’s something that makes you all different?
+
+Many different paths have taken us to DEC, and we come from an array of backgrounds including maths, zoology, philosophy, astrophysics, English, teaching, product management, and the civil service. Diverse backgrounds are a feature, not a bug, of DEC. It’s not possible for someone to be an expert in absolutely everything, so interdisciplinary group effort is needed to tackle difficult problems.
+
+Diversity fuels good discussions. Often people come to DEC thinking they have no relevant experience and intending to stay on mute, then later realising that they do have things to add. We would like anybody to feel welcome at DEC - "if websites came with a counter on how many times the organisers have broken the website, DEC would probably be a lot less intimidating". Our varying use cases and applications mean that we come to data ethics from different angles. Different angles enrich discussions, gently nudging people into forming opinions where they might not have had them before, or into questioning previously held opinions.
+
+### As a group, if you hypothetically had unlimited time and unlimited money, what would you do to try and put your Data Ethics knowledge into use for societal good?
+
+We see [data ethics as leading to data justice](https://dataethicsclub.com/write_ups/2024-08-31_writeup.html#imagine-what-would-it-look-like-if-we-were-data-justice-club-rather-than-data-ethics-club-see-table-2-1), from which we can take actionable insights to initiate change. Ethics is not just something that is done in a dusty old room; it can be done anywhere. We would like to widen involvement in ethics by pursuing outreach and research on how to help people manage or safeguard their data.
+
+Ethical approval of better, clearer, and more diverse data should be the base level from which AI development begins. Streamlining ethical approval processes would improve AI development from a data ethics standpoint. Working on how we talk about AI is also important. [We and AI](https://weandai.org/) work towards enabling critical thinking around AI, and [Better Images of AI](https://betterimagesofai.org/) is campaigning to counter harmful stereotypes surrounding AI. The video [I, HATE, I, ROBOT](https://youtu.be/zYnQGWjsGXQ?si=IANO2Vh4Fs7Mpewg) led us to talk about creating a film called “I, data” starring someone unknown in acting, as an evolution of “I, robot”.
+
+However, trying to do not-for-profit data science is difficult; time and money are crucial for seeing projects through to fruition.
+
+## Attendees
+- Huw Day, Data Scientist, Jean Golding Institute, University of Bristol, [LinkedIn](https://www.linkedin.com/in/huw-day/)
+- Amy Joint, Programme Manager, ISRCTN (UK's Clinical Study Registry)
+- Zoë Turner, Senior Data Scientist, NHS Midlands and Lancashire CSU (Strategy Unit) and NHS-R Community
+- Euan Bennet, Lecturer, University of Glasgow
+- [Ismael Kherroubi Garcia](https://www.linkedin.com/in/ismaelkherroubi/), AI Ethics Consultant, Kairoi
+- [Kamilla Wells](https://www.linkedin.com/in/kamilla-wells/), Citizen Developer, Australian Public Service, Brisbane
+- Chris Jones, Data Scientist
+- Zosia Beckles, Research Information Analyst, University of Bristol
+- Bing Wang, EPR Senior Configuration Designer, Great Ormond Street Hospital for Children NHS Foundation Trust, London
+- Christina Palantza, PhD student at the University of Bristol