Linguistics 575: Societal Impacts of NLP

Autumn Quarter, 2021

Course Info

Lecture: Wednesdays, 3:30-5:50 in SAV 130 and online (Zoom link in Canvas)
Course Canvas (discussion board, assignment submission, grades)

Instructor Info

Emily M. Bender
Office Hours: (Most) Mondays 4-5pm & (most) Thursdays 12-1pm & by appointment, online (Zoom link in course Canvas)
Email: ebender at u

Syllabus

Description

The goal of this course is to better understand the ethical considerations that arise in the deployment of NLP technology, including how to identify people likely to be impacted by the use of the technology (direct and indirect stakeholders), what kinds of risks the technology poses, and how to design systems in ways that better support stakeholder values.

Through discussions of readings in the growing research literature on fairness, accountability, transparency and ethics (FATE) in NLP and allied fields, and value sensitive design, we will seek to answer the following questions:

What can go wrong, when we use NLP systems, in terms of specific harms to people?
How can we fix/prevent/mitigate those harms?
What are our responsibilities as NLP researchers and developers in this regard?

Course projects are expected to take the form of a term paper analyzing some particular NLP task or data set in terms of the concepts developed through the quarter and looking forward to how ethical best practices could be developed for that task/data set.

Prerequisites: Graduate standing. The primary audience for this course is expected to be CLMS students, but graduate students in other programs are also welcome.

Accessibility policies

If you have already established accommodations with Disability Resources for Students (DRS), please communicate your approved accommodations to me at your earliest convenience so we can discuss your needs in this course.

If you have not yet established services through DRS, but have a temporary health condition or permanent disability that requires accommodations (conditions include but are not limited to: mental health, attention-related, learning, vision, hearing, physical or health impacts), you are welcome to contact DRS at 206-543-8924 or uwdrs@uw.edu or disability.uw.edu. DRS offers resources and coordinates reasonable accommodations for students with disabilities and/or temporary health conditions. Reasonable accommodations are established through an interactive process between you, your instructor(s) and DRS. It is the policy and practice of the University of Washington to create inclusive and accessible learning environments consistent with federal and state law.

Washington state law requires that UW develop a policy for accommodation of student absences or significant hardship due to reasons of faith or conscience, or for organized religious activities. The UW's policy, including more information about how to request an accommodation, is available at Faculty Syllabus Guidelines and Resources. Accommodations must be requested within the first two weeks of this course using the Religious Accommodations Request form available at https://registrar.washington.edu/students/religious-accommodations-request/.

[Note from Emily: The above language is all language suggested by UW and in the immediately preceding paragraph in fact required by UW. I absolutely support the content of both and am struggling with how to contextualize them so they sound less cold. My goal is for this class to be accessible. I'm glad the university has policies that help facilitate that. If there is something you need that doesn't fall under these policies, I hope you will feel comfortable bringing that up with me as well.]

Requirements

KWLA paper (approx 7 pages) (15)
Exercise 1 (10)
Exercise 2 (10)
Participation in discussions (incl. Canvas) (15)
Peer feedback on term project (5)
Term project (45)

Schedule of Topics and Assignments (still subject to change)

NOTE: still need to place the ethics statement assignment.

Date Topic Reading Due
9/29 Introduction, organization
Why are we here? What do we hope to accomplish? No reading assumed for first day

10/1 KWLA papers: K & W due 11pm

10/6 Foundational readings Choose two articles from Foundations below, and be prepared to discuss our reading questions:

Whose "truth" or "right" should we be considering?
In a perfect world, what does this look like? (And what is a "perfect world"?)
What are the harms that the paper identifies? How are they quantified and what perspective is involved in that?
How does the paper engage with systems of power?
What are the implications for or connections to NLP?

NB: For book-length pieces, it's fine to choose a chapter to read.

10/13 Value sensitive design Choose two articles from Value sensitive design below, and be prepared to discuss our reading questions:

What techniques are proposed in this paper?
What's the relationship between value sensitive design and NLP or AI? How can we apply the ideas in this paper to NLP related tasks?
What has changed and what has stayed the same between old and new? What are key principles that emerge as central to the enterprise?
Whose notions of ethics/whose values are centered?

10/20 Bias and discrimination Choose two articles from Bias/Discrimination below, and be prepared to discuss our reading questions:

What is the operational definition of bias in this paper?
What kinds of bias/bias against whom did the author try to mitigate and what solutions did they find?
What are the real-world harms linked to the bias?
How was the bias discovered (how did they think to check for this?) and how was it quantified/measured?
What planning is proposed to consider before building a dataset/system?
What has changed between older and newer papers in this space in terms of how we think about/talk about bias?

10/27 Labor conditions/crowdsourcing + demographic variables Choose two articles from Demographic variables and Crowdsourcing below (with at least one from Demographic variables), and be prepared to discuss our reading questions:

What are the implications for workers' power and solidarity?
What are the connections to or implications for NLP?
What are the best practices suggested/normative suggestions in the paper?
What are the main ethical issues brought up in these papers?
What is the relationship between this paradigm and data collection paradigms based on surveillance capitalism?
What conditions and complications does crowd-sourcing hide?

10/29 Term paper proposals due

11/3 SciComm and Ethics Education Choose two articles from SciComm and Ethics Education below, and be prepared to discuss our reading questions:

Who is the audience of the SciComm work to be produced?
How does the article address complexity, in the context of making complex issues accessible to the public?
What is recommended about how to do SciComm well?
To what extent and why should we feel obligated to communicate to the public?

11/5 SciComm exercise due

11/10 Content Moderation/Toxicity Detection Choose two articles from Content Moderation/Toxicity Detection below, and be prepared to discuss our reading questions:

What is the definition of abuse/toxic content?
Who gets to define that/decide what counts in this context?
What is the intended use case?
What are the failure modes (false positive, false negative) and who would be affected by them?
Who should moderation be protecting? (In particular, is there value in protecting bots from harassment?)
How is automated content moderation motivated and how is it related to human moderation? Why automate / how is the automated system situated within a deployed process?
What is the evidence that the system is applicable across a relevant set of domains? What are the dependencies on existing resources that would limit the applicability?

11/12 Term paper outline due

11/17 Social Media + Privacy Choose two articles from Social Media and Privacy below, and be prepared to discuss our shared investigation questions:

What does "privacy" mean and why do the communities we represent value it?
How do we balance privacy against competing values?
How should social media data be best used (or not) in research?
What stance should commercial entities take in providing data to the public or researchers for research purposes?

11/24 Policy/regulation/guides + NLP for social good Choose two articles total from Changing Practice and NLP for Social Good below, and be prepared to discuss our reading questions:

Why should we write ethical considerations sections?
How are ethical considerations sections perceived: by researchers, by regulators, by the lay public?
What is the purpose of adopting a code of ethics?
What other practices can be recommended and why (open source, licensing, IRBs, ...)?
What positive impact has come from NLP for social good projects?
What is "social good" and who does it benefit?
What particular ethical concerns do "NLP for social good" projects raise?
Term paper draft due

12/1 Language Variation and Emergent Bias Choose two articles from Language Variation and Emergent Bias below, and be prepared to discuss our reading questions:

What are the real-world consequences of emergent bias in language technology?
How can emergent bias be measured and what different types do we observe?
Is it possible to prevent implicit standardization?
Dual use consideration: does more inclusive language tech enable more surveillance of marginalized groups?

12/3 KWLA papers due
Comments on partner's paper draft due

12/8 Documentation and Transparency Choose two articles from Documentation and Transparency below, and be prepared to discuss our reading questions:

What types of information should be included in the documentation?
How does documentation help mitigate harms and what types of harms can be mitigated based on documentation?
How to balance transparency and data subject privacy?
How to balance transparency with corporate IP and other forces pushing for secrecy?
How to balance documentation and "benefits" of dataset scale?

12/10 Paper annotations due

12/14 Final papers due 11pm

Bibliography

NOTE This is still very much a work in progress! I have more papers to add, and some of these need to be recategorized.

Overviews/Calls to Action

Amblard, M. (2016). Pour un TAL responsable. Traitement Automatique des Langues, 57 (2), 21-45.
boyd danah. (Sept 13, 2019). Facing the great reckoning head-on. Medium.
Crawford, K., & Calo, R. (2016). There is a blind spot in AI research. Nature, 538 (7625), 311.
Executive Office of the President National Science and Technology Council Committee on Technology. (2016). Preparing for the future of artificial intelligence. (See especially p. 30)
Fort, K., Adda, G., & Cohen, K. B. (2016). Ethique et traitement automatique des langues et de la parole: entre truismes et tabous. Traitement Automatique des Langues, 57 (2), 7-19.
Grissom II, A. (2019). Thinking about how NLP is used to serve power: Current and future trends. Presentation at Widening NLP 2019. [Slides] [Video]
Hovy, D., & Spruit, S. L. (2016, August). The social impact of natural language processing. In Proceedings of the 54th annual meeting of the association for computational linguistics (volume 2: Short papers) (pp. 591-598). Berlin, Germany: Association for Computational Linguistics.
- Another presentation by Hovy on related material
Lefeuvre-Halftermeyer, A., Govaere, V., Antoine, J.-Y., Allegre, W., Pouplin, S., Departe, J.-P., et al. (2016). Typologie des risques pour une analyse éthique de l'impact des technologies du TAL. Traitement Automatique des Langues, 57 (2), 47-71.
Markham, A. (May 18, 2016). OKCupid data release fiasco: It's time to rethink ethics education. Data & Society: Points.
O'Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. NY: Crown Publishing Group.
Rogaway, P. (2015). The moral character of cryptographic work.
Shneiderman, B. (2016). Opinion: The dangers of faulty, biased, or malicious algorithms requires independent oversight. Proceedings of the National Academy of Sciences, 113 (48), 13538-13540.
Sourour, B. (Nov 13, 2016). The code I'm still ashamed of. Medium.

Foundations

Alkhatib, A. 2021. To Live in Their Utopia: Why Algorithmic Systems Create Absurd Outcomes
Birhane, A. 2021. Algorithmic injustice: a relational ethics approach
Baria, A. and Cross, K. 2021. The brain is a computer is a brain: neuroscience's internal debate and the social significance of the Computational Metaphor
Calo et al (eds). 2021. Telling Stories: On Culturally Responsive Artificial Intelligence
Crenshaw, K. (1989). Demarginalizing the intersection of race and sex: A black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics. U. Chi. Legal f., 139.
Crenshaw, K. (1990). Mapping the margins: Intersectionality, identity politics, and violence against women of color. Stan. L. Rev., 43, 1241.
Dai, J. 2020. THE PARADOX OF SOCIALLY RESPONSIBLE COMPUTING The limits and potential of teaching tech ethics
Ekstrand, M. et al. 2021. Fairness and Discrimination in Information Access Systems
Hanna, A. and Park, T.M. 2020. Against Scale: Provocations and Resistances to Scale Thinking
Hoffman, A. L. 2020. Terms of Inclusion: Data, Discourse, Violence
Lewis, J.E. et al. 2020. Indigenous Protocol and Artificial Intelligence
Martin, K. Forthcoming. Algorithmic Bias and Corporate Responsibility: How companies hide behind the false veil of the technological imperative
Mohamed, S. et al. 2020. Decolonial AI: Decolonial Theory as Sociotechnical Foresight in Artificial Intelligence
Paullada, A. et al. 2021. Data and its (dis)contents: A survey of dataset development and use in machine learning research Patterns 2.
Pipkin, E. 2021. A City Is a City -- Against the metaphorization of data
Raji, D. 2020. How our data encodes systematic racism MIT Tech Review.
Sambasivan, N. et al. 2021. "Everyone wants to do the model work, not the data work": Data Cascades in High-Stakes AI. CHI 2021.
Selbst, A.D. et al. 2019. Fairness and Abstraction in Sociotechnical Systems. FAT* 19.
Vallor et al. 2018. Overview of Ethics in Tech Practice
Vinsel, L. 2021. You're Doing It Wrong: Notes on Criticism and Technology Hype
Whittaker, M. 2021. From Ethics to Organizing: Getting Serious about AI (Video of a talk)
Zimmermann, A. and Zevenbergen B. 2019. AI Ethics: Seven Traps

Philosophical Underpinnings

Bartky, S. L. (2002). "Sympathy and solidarity" and other essays (Vol. 32). Rowman & Littlefield.
Bryson, J. J. (2015). Artificial intelligence and pro-social behaviour. In C. Misselhorn (Ed.), Collective agency and cooperation in natural and artificial systems: Explanation, implementation and simulation (pp. 281-306). Cham: Springer International Publishing.
Butler, J. (2005). Giving an account of oneself. Oxford University Press. (Available online, through UW libraries)
Cho, S., Crenshaw, K. W., & McCall, L. (2013). Toward a field of intersectionality studies: Theory, applications, and praxis. Signs: Journal of Women in Culture and Society, 38 (4), 785-810.
DeLaTorre, M. A. (2013). Ethics: A liberative approach. Fortress Press. (Available online through UW Libraries; read intro + chapter of choice)
Edgar, S. L. (2003). Morality and machines: Perspectives on computer ethics. Jones & Bartlett Learning. (Available online through UW libraries)
Fieser, J., & Dowden, B. (Eds.). (2016). Internet encyclopedia of philosophy: Entries on Ethics
Liamputtong, P. (2006). Researching the vulnerable: A guide to sensitive research methods. Sage. (Available online, through UW libraries)
Prabhumoye, S., Mayfield, E., & Black, A. W. (2019, August). Principled frameworks for evaluating ethics in NLP systems. In Proceedings of the 2019 workshop on widening nlp (pp. 118-121). Florence, Italy: Association for Computational Linguistics.
Quinn, M. J. (2014). Ethics for the information age. Pearson.
Zalta, E. N. (Ed.). (2019). The Stanford encyclopedia of philosophy (Winter 2016 ed.): Entries on Ethics

Value Sensitive Design and Other Design Approaches

Borning, A., & Muller, M. (2012). Next steps for value sensitive design. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1125-1134).
Friedman, B. (1996). Value-sensitive design. ACM Interactions, 3 (6), 17-23.
Friedman, B., & Hendry, D. G. (2019). Value sensitive design: Shaping technology with moral imagination. MIT Press.
Friedman, B., Hendry, D. G., Borning, A., et al. (2017). A survey of value sensitive design methods. Foundations and Trends in Human-Computer Interaction, 11 (2), 63-125.
Friedman, B., & Kahn Jr., P. H. (2008). Human values, ethics, and design. In J. A. Jacko & A. Sears (Eds.), The human-computer interaction handbook (Revised second ed., pp. 1241-1266). Mahwah, NJ.
Jacobs, N and Huldtgren, A. 2021. Why value sensitive design needs ethical commitments Ethics and Information Technology 23.
Leidner, J. L., & Plachouras, V. (2017). Ethical by design: Ethics best practices for natural language processing. In Proceedings of the first ACL workshop on ethics in natural language processing (pp. 30-40). Valencia, Spain: Association for Computational Linguistics.
Nathan, L. P., Klasnja, P. V., & Friedman, B. (2007). Value scenarios: a technique for envisioning systemic effects of new technologies. In CHI '07 extended abstracts on human factors in computing systems (pp. 2585-2590).
Schnoebelen, T. (2017). Goal-oriented design for ethical machine learning and NLP. In Proceedings of the first ACL workshop on ethics in natural language processing (pp. 88-93). Valencia, Spain: Association for Computational Linguistics.
Spiekermann, S. & Winkler, T. 2021. Twenty years of value sensitive design: a review of methodological practices in VSD projects Ethics and Information Technology 23.
Umbrello, S and van de Poel, I. 2021. Mapping value sensitive design onto AI for social good principles AI and Ethics 1.
Yoo, D. 2021. Stakeholder Tokens: a constructive method for value sensitive design stakeholder analysis Ethics and Information Technology 23.
Young, M., Magassa, L., & Friedman, B. (2019). Toward inclusive tech policy design: A method for underrepresented voices to strengthen tech policy documents. Ethics and Information Technology, 21(2), 89-103.

Documentation and Transparency

Bender, E. M., & Friedman, B. (2018). Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6, 587-604.
Bender, E. M., Friedman, B. and McMillan-Major, A. (2021). A Guide for Writing Data Statements for Natural Language Processing.
Birhane, A., Prabhu, V.U., and Kahembwe, E. 2021. Multimodal datasets: misogyny, pornography, and malignant stereotypes
Couillault, A. et al. 2014. Evaluating Corpora Documentation with regards to the Ethics and Big Data Charter. LREC 2014.
Dodge et al. 2021. Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus EMNLP 2021.
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., and Crawford, K. (2018). Datasheets for datasets. Proceedings of the 5th Workshop on Fairness, Accountability, and Transparency in Machine Learning.
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., and Crawford K. (2021). Datasheets for datasets. CACM 64(12):86-92.
Holland, S., Hosny, A., Newman, S., Joseph, J., & Chmielinski, K. (2018). The dataset nutrition label: A framework to drive higher data quality standards.
Jo, E. S. and Gebru, T. 2020. Lessons from archives: strategies for collecting sociocultural data in machine learning. FAT* '20.
Mieskes, M. (2017, April). A quantitative study of data in the NLP community. In Proceedings of the first ACL workshop on ethics in natural language processing (pp. 23-29). Valencia, Spain: Association for Computational Linguistics.
Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., et al. (2019). Model cards for model reporting. In Proceedings of the conference on fairness, accountability, and transparency (pp. 220-229). New York, NY, USA: ACM.
Moss, E. et al. 2021. Assembling Accountability: Algorithmic Impact Assessment for the Public Interest
Partnership on AI. (2019). ABOUT-ML: Annotation and benchmarking on understanding and transparency of machine learning lifecycles (ABOUT ML).

Other Best Practices

Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete problems in AI safety. CoRR, abs/1606.06565.
Kulynych, B. et al. 2020. POTs: protective optimization technologies. FAT* 2020.
Markham, A. (2012). Fabrication as ethical practice: Qualitative inquiry in ambiguous Internet contexts. Information, Communication & Society, 15(3), 334-353.
Ratto, M. (2011). Critical making: Conceptual and material studies in technology and social life. The Information Society, 27 (4), 252-260.
Russell, S., Dewey, D., & Tegmark, M. (2015). Research priorities for robust and beneficial artificial intelligence. AI Magazine.
Shilton, K., & Anderson, S. (2016). Blended, not bossy: Ethics roles, responsibilities and expertise in design. Interacting with Computers.
Shilton, K., & Sayles, S. (2016). "We aren't all going to be on the same page about ethics": Ethical practices and challenges in research on digital and social media. In 2016 49th Hawaii international conference on system sciences (HICSS) (pp. 1909-1918).

Bias/Discrimination

Antoniak and Mimno. 2021. Bad Seeds: Evaluating Lexical Methods for Bias Measurement ACL 2021.
Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜. In Proceedings of FAccT 2021, pp. 610-623.
Blodgett, S. et al. 2020. Language (Technology) is Power: A Critical Survey of "Bias" in NLP ACL 2020.
Blodgett, S. et al. 2021. Stereotyping Norwegian Salmon: An Inventory of Pitfalls in Fairness Benchmark Datasets ACL 2021.
Bolukbasi, T., Chang, K.-W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in neural information processing systems 29 (pp. 4349-4357). Curran Associates, Inc.
Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora necessarily contain human biases. Science.
Caliskan, A., Bryson, J., & Narayanan, A. (2016). A story of discrimination and unfairness. (Talk presented at HotPETS 2016)
Davani, A.M. et al. 2021. Dealing with Disagreements: Looking Beyond the Majority Vote in Subjective Annotations, to appear in TACL.
Friedman, B., & Nissenbaum, H. (1996). Bias in computer systems. ACM Transactions on Information Systems (TOIS), 14(3), 330-347.
Garg, N., Schiebinger, L., Jurafsky, D., & Zou, J. (2018). Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115 (16), E3635-E3644.
Goldfarb-Tarrant, S., et al. 2021. Intrinsic Bias Metrics Do Not Correlate with Application Bias. ACL 2021.
Gonen, H., & Goldberg, Y. (2019). Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them. In Proceedings of the 2019 conference of the north American chapter of the association for computational linguistics: Human language technologies, volume 1 (long and short papers) (pp. 609-614). Minneapolis, Minnesota: Association for Computational Linguistics.
Guynn, J. (Jun 10, 2016). 'Three black teenagers' Google search sparks outrage. USA Today.
Herbelot, A., Redecker, E. von, & Müller, J. (2012). Distributional techniques for philosophical enquiry. In Proceedings of the 6th workshop on language technology for cultural heritage, social sciences, and humanities (pp. 45-54). Avignon, France: Association for Computational Linguistics.
Hovy, D. and Prabhumoye, S. 2021. Five sources of bias in natural language processing
Liang, P. et al. 2021. Towards Understanding and Mitigating Social Biases in Language Models. Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021.
Manzini, T., Yao Chong, L., Black, A. W., & Tsvetkov, Y. (2019). Black is to criminal as caucasian is to police: Detecting and removing multiclass bias in word embeddings. In Proceedings of the 2019 conference of the north American chapter of the association for computational linguistics: Human language technologies, volume 1 (long and short papers) (pp. 615-621). Minneapolis, Minnesota: Association for Computational Linguistics.
Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. NYU Press.
Rogers, A. 2021. Changing the World by Changing the Data. ACL 2021.
Rudinger, R., May, C., & Van Durme, B. (2017). Social bias in elicited natural language inferences. In Proceedings of the first ACL workshop on ethics in natural language processing (pp. 74-79). Valencia, Spain: Association for Computational Linguistics.
Schluter, N. (2018). The word analogy testing caveat. In Proceedings of the 2018 conference of the north American chapter of the association for computational linguistics: Human language technologies, volume 2 (short papers) (pp. 242-246). New Orleans, Louisiana: Association for Computational Linguistics.
Schmidt, B. (2015). Rejecting the gender binary: A vector-space operation. (Blog post, accessed 12/29/16)
Shen, J.H. et al. 2018. Darling or Babygirl? Investigating Stylistic Bias in Sentiment Analysis FATML 2018.
Sweeney, L. (May 1, 2013). Discrimination in online ad delivery. Communications of the ACM, 56 (5), 44-54.
Talat, Z. et al. 2021. Disembodied Machine Learning: On the Illusion of Objectivity in NLP.
Xu, A. et al. 2021. Detoxifying Language Models Risks Marginalizing Minority Voices. NAACL 2021.

Fairness

Hardt, M. (Sep 26, 2014). How big data is unfair: Understanding sources of unfairness in data driven decision making. Medium.
Rao, D. (n.d.). Fairness in machine learning. (slides)
Zliobaite, I. (2015). On the relation between accuracy and fairness in binary classification. CoRR, abs/1505.05723.

Other resources on bias

Angwin, J., & Larson, J. (Dec 30, 2016). Bias in criminal risk scores is mathematically inevitable, researchers say. ProPublica.
boyd, d. (2015). What world are we building? (Everett C Parker Lecture. Washington, DC, October 20)
Brennan, M. (2015). Can computers be racist? big data, inequality, and discrimination. (online; Ford Foundation)
Clark, J. (Jun 23, 2016). Artificial intelligence has a 'sea of dudes' problem. Bloomberg Technology.
Crawford, K. (Apr 1, 2013). The hidden biases in big data. Harvard Business Review.
Daumé III, H. (Nov 8, 2016). Bias in ML, and teaching AI. (Blog post, accessed 1/17/17)
Larson, J., Angwin, J., & Parris Jr., T. (Oct 19, 2016). Breaking the black box: How machines learn to be racist. ProPublica.
Emspak, J. (Dec 29, 2016). How a machine learns prejudice: Artificial intelligence picks up bias from human creators--not from hard, cold logic. Scientific American.
Jacob. (May 8, 2016). Deep learning racial bias: The avenue Q theory of ubiquitous racism. Medium.

More papers in the Proceedings of the First Workshop on Gender Bias in Natural Language Processing

Demographic variables

Ahmed, A. 2018. Trans Competent Interaction Design: A Qualitative Study on Voice, Identity, and Technology. Interacting with Computers.
Field, A. et al. 2021. A Survey of Race, Racism, and Anti-Racism in NLP ACL 2021.
Larson, B. (2017). Gender as a variable in natural-language processing: Ethical considerations. In Proceedings of the first ACL workshop on ethics in natural language processing (pp. 1-11). Valencia, Spain: Association for Computational Linguistics.
Scheuerman, M. K., Spiel, K., Haimson, O. L., Hamidi, F., & Branham, S. M. 2020. HCI Guidelines for Gender Equity and Inclusivity

Chatbots

Abercrombie, G. et al. 2021. Alexa, Google, Siri: What are Your Pronouns? Gender and Anthropomorphism in the Design and Perception of Conversational Assistants Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing.
Cercas Curry, A., & Rieser, V. (2018). #MeToo Alexa: How conversational systems respond to sexual harassment. In Proceedings of the second ACL workshop on ethics in natural language processing (pp. 7-14). New Orleans, Louisiana, USA: Association for Computational Linguistics.
Dinan, E. et al. 2021. Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
Elder, A. (2019). Conversation from beyond the grave? A neo-Confucian ethics of chatbots of the dead. Journal of Applied Philosophy.
Fessler, Leah. (Feb 22, 2017). SIRI, DEFINE PATRIARCHY: We tested bots like Siri and Alexa to see who would stand up to sexual harassment. Quartz.
Fung, P. (Dec 3, 2015). Can robots slay sexism? World Economic Forum.
Mott, N. (Jun 8, 2016). Why you should think twice before spilling your guts to a chatbot. Passcode.
Lee, N. et al. 2019. Exploring Social Bias in Chatbots using Stereotype Knowledge. Proceedings of the 2019 Workshop on Widening NLP.
Paolino, J. (Jan 4, 2017). Google home vs Alexa: Two simple user experience design gestures that delighted a female user. Medium.
Seaman Cook, J. (Apr 8, 2016). From Siri to sexbots: Female AI reinforces a toxic desire for passive, agreeable and easily dominated women. Salon.
Twitter. (Apr 7, 2016). Automation rules and best practices. (Web page, accessed 12/29/16)
Yao, M. (n.d.). Can bots manipulate public opinion? (Web page, accessed 12/29/16)

Privacy

Abadi, M., Chu, A., Goodfellow, I., Brendan McMahan, H., Mironov, I., Talwar, K., et al. (2016). Deep Learning with Differential Privacy. ArXiv e-prints.
Amazon.com. 2017. Memorandum of Law in Support of Amazon's Motion to Quash Search Warrant
Asher-Schapiro, A. and Sherfinski, D. 2021. INSIGHT-AI surveillance takes U.S. prisons by storm Thomson Reuters Foundation News. November 16, 2021.
Brant, T. (Dec 27, 2016). Amazon Alexa data wanted in murder investigation. PC Mag.
Carlini, N. et al. 2020. Extracting Training Data from Large Language Models.
Friedman, B., Kahn Jr, P. H., Hagman, J., Severson, R. L., & Gill, B. (2006). The watcher and the watched: Social judgments about privacy in a public place. Human-Computer Interaction, 21(2), 235-272.
Golbeck, J., & Mauriello, M. L. (2016). User perception of facebook app data access: A comparison of methods and privacy concerns. Future Internet, 8(2), 9.
Grissom II, A. (2019). Thinking about how NLP is used to serve power: Current and future trends. Presentation at Widening NLP 2019. [Slides] [Video]
Lewis, D., Moorkens, J., & Fatema, K. (2017). Integrating the management of personal data protection and open science with research ethics. In Proceedings of the first ACL workshop on ethics in natural language processing (pp. 60-65). Valencia, Spain: Association for Computational Linguistics.
Narayanan, A., & Shmatikov, V. (2010). Myths and fallacies of "personally identifiable information". Communications of the ACM, 53 (6), 24-26.
Nissenbaum, H. (2009). Privacy in context: Technology, policy, and the integrity of social life. Stanford: Stanford University Press.
Shilton, K. et al. 2021. Excavating awareness and power in data science: A manifesto for trustworthy pervasive data research. Big Data & Society.
Solove, D. J. (2007). 'I've got nothing to hide' and other misunderstandings of privacy. San Diego Law Review, 44 (4), 745-772.
Steel, E., & Angwin, J. (Aug 4, 2010). On the Web's cutting edge, anonymity in name only. The Wall Street Journal.
Tene, O., & Polonetsky, J. (2012). Big data for all: Privacy and user control in the age of analytics. Northwestern Journal of Technology and Intellectual Property, 11(45), 239-273.
Vitak, J., Shilton, K., & Ashktorab, Z. (2016). Beyond the Belmont principles: Ethical challenges, practices, and beliefs in the online data research community. In Proceedings of the 19th ACM conference on computer-supported cooperative work & social computing (pp. 941-953).

Social Media

Association of Internet Researchers. 2019. Internet Research: Ethical Guidelines 3.0
Dym, B. and Fiesler, C. 2020. Ethical and privacy considerations for research using online fandom data
Fiesler, C. and Proferes, N. 2018. "Participant" Perceptions of Twitter Research Ethics
Hallinan, B., Brubaker, J. R., & Fiesler, C. (2019). Unexpected expectations: Public reaction to the facebook emotional contagion study. New Media & Society, 1-19. [Tweet thread]
Metcalf, J., & Crawford, K. (2016). Where are human subjects in big data research? The emerging ethics divide. Big Data & Society 3(1).
Roose, K. 2021. Inside Facebook's Data Wars. The New York Times July 14, 2021.
Shilton, K., & Sayles, S. (2016). "We aren't all going to be on the same page about ethics": Ethical practices and challenges in research on digital and social media. In 2016 49th Hawaii international conference on system sciences (HICSS) (pp. 1909-1918).
Townsend, L., & Wallace, C. (2015). Social media research: A guide to ethics. The University of Aberdeen.
Tufekci, Z. 2014. Big Questions for Social Media Big Data: Representativeness, Validity and Other Methodological Pitfalls Eighth International AAAI Conference on Weblogs and Social Media.
Williams, M. L., Burnap, P., & Sloan, L. (2017). Towards an ethical framework for publishing twitter data in social research: Taking into account users views, online context and algorithmic estimation. Sociology, 51 (6), 1149-1168.
Woodfield, K. (Ed.). (2017). The ethics of online research. Emerald Publishing Limited.
- See especially Chs 2, 5, 7 and 8

Content moderation/Toxicity detection

Dwoskin et al, 2019 Content moderators at YouTube, Facebook and Twitter see the worst of the web — and suffer silently, Washington Post July 25, 2019.
Jiang, J.A., Scheuerman, M.K., Fiesler, C. and Brubaker, J.R. 2021 Understanding international perceptions of the severity of harmful content online
Jones, L. 2020. Twitter wants you to know that you're still SOL if you get a death threat -- unless you're President Donald Trump
Gehman, S. et al. 2020. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models. Findings of EMNLP.
Marshall, B. 2021. Algorithmic misogynoir in content moderation practice
Röttger, P. et al. 2021. HateCheck: Functional Tests for Hate Speech Detection Models. ACL 2021.
Saitov, K. and Derczynski, L. 2021. Abusive Language Recognition in Russian. Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing.
Sap, M. et al. 2019. The Risk of Racial Bias in Hate Speech Detection ACL 2019.
Sigurbergsson, G.I. and Derczynski, L. 2020. Offensive Language and Hate Speech Detection for Danish. Proceedings of the 12th Language Resources and Evaluation Conference.
Simonite, Tom. 2021. Facebook Is Everywhere; Its Moderation Is Nowhere Close. WIRED
Talat, Z. et al. 2017. Understanding Abuse: A Typology of Abusive Language Detection Subtasks. ACL 2017.
Zeinert, P., Inie, N. and Derczynski, L. 2021. Annotating Online Misogyny. ACL 2021.

Crowdsourcing and labor conditions

Bederson, B. B., & Quinn, A. J. (2011). Web workers unite! Addressing challenges of online laborers. In CHI '11 extended abstracts on human factors in computing systems (pp. 97-106).
Callison-Burch, C. (2016). Crowd workers. (Slides from Crowdsourcing and Human Computation, accessed online 12/30/16)
Callison-Burch, C. (2016). Ethics of crowdsourcing. (Slides from Crowdsourcing and Human Computation, accessed online 12/30/16)
Fort, K., Adda, G., & Cohen, K. B. (2011). Amazon mechanical turk: Gold mine or coal mine? Computational Linguistics, 37 (2), 413-420.
Irani, L. and Silberman, M.S. 2013. Turkopticon: interrupting worker invisibility in Amazon Mechanical Turk CHI 2013.
Kummerfeld, J.K. 2021. Quantifying and Avoiding Unfair Qualification Labour in Crowdsourcing. ACL 2021.
Lefeuvre-Halftermeyer, A.; Govaere, V.; Antoine, J.-Y.; Allegre, W.; Pouplin, S.; Departe, J.-P.; Slimani, S. & Spagnulo, A. Typologie des risques pour une analyse éthique de l'impact des technologies du TAL Revue TAL, ATALA (Association pour le Traitement Automatique des Langues), 2016, 57, 47-71
Shmueli et al. 2021. Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing. NAACL 2021.
Snyder, J. (2010). Exploitation and sweatshop labor: Perspectives and issues. Business Ethics Quarterly, 20 (2), 187-213.

Language Variation and Emergent Bias

Bender, E.M. 2019. The #BenderRule: On Naming the Languages We Study and Why It Matters [audio paper] [works cited]
Broussard, M. (10 May 2018). Agenda: Why the scots are such a struggle for Alexa and Siri. The Herald.
Garimella, A., Banea, C., Hovy, D., & Mihalcea, R. (2019). Women's syntactic resilience and men's grammatical luck: Gender-bias in part-of-speech tagging and dependency parsing. In Proceedings of the 57th annual meeting of the association for computational linguistics (pp. 3493-3498). Florence, Italy: Association for Computational Linguistics.
Hovy, D., & Søgaard, A. (2015). Tagging performance correlates with author age. In Proceedings of the 53rd annual meeting of the Association for Computational Linguistics and the 7th international joint conference on natural language processing (volume 2: Short papers) (pp. 483-488). Beijing, China: Association for Computational Linguistics.
Huang, X., & Paul, M. J. (2019). Neural user factor adaptation for text classification: Learning to generalize across author demographics. In Proceedings of the eighth joint conference on lexical and computational semantics (*SEM 2019) (pp. 136-146). Minneapolis, Minnesota: Association for Computational Linguistics.
Jørgensen, A., Hovy, D., & Søgaard, A. (2015). Challenges of studying and processing dialects in social media. In Proceedings of the workshop on noisy user-generated text (pp. 9-18). Beijing, China: Association for Computational Linguistics.
Joshi, P. et al. 2020. The State and Fate of Linguistic Diversity and Inclusion in the NLP World ACL 2020.
Jurgens, D., Tsvetkov, Y., & Jurafsky, D. (2017). Incorporating dialectal variability for socially equitable language identification. In Proceedings of the 55th annual meeting of the association for computational linguistics (volume 2: Short papers) (pp. 51-57). Vancouver, Canada: Association for Computational Linguistics.
Koenecke, A. et al. 2020. Racial disparities in automated speech recognition
Markl and Lai. 2021. Context-sensitive evaluation of automatic speech recognition: considering user experience & language variation
Tatman, R. (2017). Gender and dialect bias in YouTube's automatic captions. In Proceedings of the first ACL workshop on ethics in natural language processing (pp. 53-59). Valencia, Spain: Association for Computational Linguistics.
Wassink, A. 2021. Uneven success: Racial Bias in Automatic Speech Recognition. (Video of talk; [slides])

Biomedical NLP, Mental Health and Social Media

Andrade, N. N. Gomes de, Pawson, D., Muriello, D., Donahue, L., & Guadagno, J. (2018, Dec 01). Ethics and artificial intelligence: Suicide prevention on Facebook. Philosophy & Technology, 31(4), 669-684.
Barnett, I., & Torous, J. (2019, 04). Ethics, Transparency, and Public Health at the Intersection of Innovation and Facebook's Suicide Prevention Efforts. Annals of Internal Medicine, 170 (8), 565-566.
Benton, A., Coppersmith, G., & Dredze, M. (2017). Ethical research protocols for social media health research. In Proceedings of the first ACL workshop on ethics in natural language processing (pp. 94-102). Valencia, Spain: Association for Computational Linguistics.
Elder, A. (2019). Conversation from beyond the grave? A neo-Confucian ethics of chatbots of the dead. Journal of Applied Philosophy.
Linthicum, K. P., Schafer, K. M., & Ribeiro, J. D. (2019). Machine learning in suicide science: Applications and ethics. Behavioral Sciences & the Law, 37 (3), 214-222.
McKernan, L. C., Clayton, E. W., & Walsh, C. G. (2018). Protecting life while preserving liberty: Ethical recommendations for suicide prevention with artificial intelligence. Frontiers in Psychiatry, 9.
Šuster, S., Tulkens, S., & Daelemans, W. (2017). A short review of ethical challenges in clinical natural language processing. In Proceedings of the first ACL workshop on ethics in natural language processing (pp. 80-87). Valencia, Spain: Association for Computational Linguistics.
Tucker, R. P., Tackett, M. J., Glickman, D., & Reger, M. A. (2019). Ethical and practical considerations in the use of a predictive model to trigger suicide prevention interventions in healthcare settings. Suicide and Life-Threatening Behavior, 49 (2), 382-392.

NLP Apps Addressing Ethical Issues/NLP for Social Good

Demszky, D., Garg, N., Voigt, R., Zou, J., Shapiro, J., Gentzkow, M., et al. (2019). Analyzing polarization in social media: Method and application to tweets on 21 mass shootings. In Proceedings of the 2019 conference of the north American chapter of the association for computational linguistics: Human language technologies, volume 1 (long and short papers) (pp. 2970-3005). Minneapolis, Minnesota: Association for Computational Linguistics.
Fokkens, A. (2016). Reading between the lines. (Slides presented at Language Analysis Portal Launch event, University of Oslo, Sept 2016)
Jurgens, D., Hemphill, L., & Chandrasekharan, E. (2019). A just and comprehensive strategy for using NLP to address online abuse. In Proceedings of the 57th annual meeting of the association for computational linguistics (pp. 3658-3666). Florence, Italy: Association for Computational Linguistics.
Lee, N., Bang, Y., Shin, J., & Fung, P. (2019). Understanding the shades of sexism in popular TV series. In Proceedings of the 2019 workshop on widening NLP (pp. 122-125). Florence, Italy: Association for Computational Linguistics.
Gershgorn, D. (Feb 27, 2017). NOT THERE YET: Alphabet's hate-fighting AI doesn't understand hate yet. Quartz.
Google.com. (2017). The women missing from the silver screen and the technology used to find them. Blog post, accessed March 1, 2017.
Greenberg, A. (2016). Inside Google's Internet Justice League and Its AI-Powered War on Trolls. Wired.
Kelion, L. (Mar 1, 2017). Facebook artificial intelligence spots suicidal users. BBC News.
Madnani, N., Loukina, A., Davier, A. von, Burstein, J., & Cahill, A. (2017). Building better open-source tools to support fairness in automated scoring. In Proceedings of the first ACL workshop on ethics in natural language processing (pp. 41-52). Valencia, Spain: Association for Computational Linguistics.
Munger, K. (2016). Tweetment effects on the tweeted: Experimentally reducing racist harassment. Political Behavior, 1-21.
Munger, K. (Nov 17, 2016). This researcher programmed bots to fight racism on twitter. It worked. Washington Post.
Murgia, M. (Feb 23, 2017). Google launches robo-tool to flag hate speech online. Financial Times.
The times is partnering with jigsaw to expand comment capabilities. (Sep 20, 2016). The New York Times.
Qian, J., ElSherief, M., Belding, E., & Wang, W. Y. (2019). Learning to decipher hate symbols. In Proceedings of the 2019 conference of the north American chapter of the association for computational linguistics: Human language technologies, volume 1 (long and short papers) (pp. 3006-3015). Minneapolis, Minnesota: Association for Computational Linguistics.
Waseem, Z. (2016). Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In Proceedings of the first workshop on nlp and computational social science (pp. 138-142). Austin, Texas: Association for Computational Linguistics.
Waseem, Z., & Hovy, D. (2016). Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of the naacl student research workshop (pp. 88-93). San Diego, California: Association for Computational Linguistics.
Wiegand, M., Ruppenhofer, J., & Kleinbauer, T. (2019). Detection of Abusive Language: the Problem of Biased Datasets. In Proceedings of the 2019 conference of the north American chapter of the association for computational linguistics: Human language technologies, volume 1 (long and short papers) (pp. 602-608). Minneapolis, Minnesota: Association for Computational Linguistics.

Fake News Challenge
Jigsaw Challenges
Perspective (from Jigsaw)
- But see: Hosseini, H, S. Kannan, B. Zhang and R. Poovendran. 2017. Deceiving Google's Perspective API Built for Detecting Toxic Comments. ArXiv.
Textio. See also:
- CEO Kieran Snyder's posts on medium.com
- Recording of Kieran Snyder's NLP Meetup talk from Aug 15, 2016

Other Issues in NLP: Carbon Emissions, Generation, ...

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8).
Seabrook, J. (2019). The next word: Where will predictive text take us? The New Yorker, 14 Oct 2019.
Parra Escartín, C., Reijers, W., Lynn, T., Moorkens, J., Way, A., & Liu, C.-H. (2017). Ethical considerations in NLP shared tasks. In Proceedings of the first ACL workshop on ethics in natural language processing (pp. 66-73). Valencia, Spain: Association for Computational Linguistics.
Smiley, C., Schilder, F., Plachouras, V., & Leidner, J. L. (2017). Say the right thing right: Ethics issues in natural language generation systems. In Proceedings of the first ACL workshop on ethics in natural language processing (pp. 103-108). Valencia, Spain: Association for Computational Linguistics.
Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and policy considerations for deep learning in NLP. In Proceedings of the 57th annual meeting of the association for computational linguistics (pp. 3645-3650). Florence, Italy: Association for Computational Linguistics.
Šuster, S., Tulkens, S., & Daelemans, W. (2017). A short review of ethical challenges in clinical natural language processing. In Proceedings of the first ACL workshop on ethics in natural language processing (pp. 80-87). Valencia, Spain: Association for Computational Linguistics.
Torbati, Y. (Sept, 2019). Google says Google Translate can't replace human translators. Immigration officials have used it to vet refugees. ProPublica.
Vincent, J. (Feb, 2019). AI researchers debate the ethics of sharing potentially harmful programs. The Verge.

SciComm and Ethics Education

Burns, T. W., O'Connor, D. J., & Stocklmayer, S. M. (2003). Science communication: A contemporary definition. Public Understanding of Science, 12(2), 183-202.
Di Bari, M., & Gouthier, D. (2002). Tropes, science and communication. Journal of Communication, 2(1).
Fischhoff, B. (2013). The sciences of science communication. Proceedings of the National Academy of Sciences, 110(Supplement 3), 14033-14039.
Gero, K.I. et al. (2021). What Makes Tweetorials Tick: How Experts Communicate Complex Topics on Twitter, Proc. ACM Hum.-Comput. Interact. 5, CSCW2, Article 422.
Mooney, C. (2010). Do scientists understand the public? American Academy of Arts & Sciences.
Ngumbi, E. (2018, January 26). If you want to explain your science to the public, here's some advice. Scientific American.
Phillips, C. M. L., & Beddoes, K. (2013). Really changing the conversation: The deficit model and public understanding of engineering. In Proceedings of the 120th ASEE annual conference & exposition.
Shepherd, M. (2016, November 22). 9 tips for communicating science to people who are not scientists. Forbes, 1-4.
Simis, M. J., Madden, H., Cacciatore, M. A., & Yeo, S. K. (2016). The lure of rationality: Why does the deficit model persist in science communication? Public Understanding of Science, 25 (4), 400-414.
Yong, E. 2021. What Even Counts as Science Writing Anymore? The Atlantic.

Changing Practice: Policy, regulation, and guidelines

ACM Ethics Task Force. (2016). Code 2018 | ACM ethics. (Web page, accessed 1/5/17)
The IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems. (2016). Ethically aligned design: A vision for prioritizing human wellbeing with artificial intelligence and autonomous systems (AI/AS) (Version 1 -- For Public Discussion).
ACL Ethics FAQ. 2021. EMNLP version
US FDA. 2021. Good Machine Learning Practice for Medical Device Development: Guiding Principles

Ashurst, C. et al. 2020. A Guide to Writing the NeurIPS Impact Statement
Daumé III, H. (Dec 12, 2016). Should the NLP and ML Communities have a Code of Ethics? (Blog post, accessed 12/30/16)
Etlinger, S., & Groopman, J. (2015). The trust imperative: A framework for ethical data use.
Moran, C. 2021. Machine Learning, Ethics, and Open Source Licensing (Part II/II)
Nanayakkara, P. et al. 2021. Unpacking the Expressed Consequences of AI Research in Broader Impact Statements
Patton, D. U. (16 April 2018). Ethical guidelines for social media research with vulnerable groups. Medium.
Rakova et al. 2021. Where Responsible AI meets Reality: Practitioner Perspectives on Enablers for Shifting Organizational Practices
Schwartz et al. 2021. A Proposal for Identifying and Managing Bias in Artificial Intelligence
Stix, C. 2021. Actionable Principles for Artificial Intelligence Policy: Three Pathways

Ethics Statements

Reading notes

Schmaltz 2018 is a proposal around this practice. The other references here are examples of papers that include an ethics statement. For this week, we'll be writing our own ethics statements, either for our own papers or for others we have selected.

Papers:

Al-khazraji, S., Berke, L., Kafle, S., Yeung, P., & Huenerfauth, M. (2018). Modeling the speed and timing of American Sign Language to generate realistic animations. In Proceedings of the 20th international ACM SIGACCESS conference on computers and accessibility (pp. 259-270). New York, NY, USA: ACM.
Chen, H., Cai, D., Dai, W., Dai, Z., & Ding, Y. (2019). Charge-based prison term prediction with deep gating network. (To appear at EMNLP 2019)
Schmaltz, A. (2018). On the utility of lay summaries and AI safety disclosures: Toward robust, open research oversight. In Proceedings of the second ACL workshop on ethics in natural language processing (pp. 1-6). New Orleans, Louisiana, USA: Association for Computational Linguistics.