Date: Sunday, July 10
|11:00–11:30||Introduction to Queer in AI and Initiatives|
|11:30–12:30||Panel on Non-Binary Representation in Language Technologies|
|14:00–14:30||Lightning Talks by Sponsors|
|14:30–15:30||Panel on Gender as a Variable in NLP|
|16:00–16:45||Socials with Sponsors|
Detecting Harmful Online Conversational Content towards LGBTQIA+ Individuals
Jamell Dacon, Harry Shomer, Shaylynn Crum-Dacon, Jiliang Tang
Harmful online content from real-world conversations has become a major issue. We therefore introduce a real-world dataset for the task of harmful conversational content detection, to study and understand stereotypical societal biases against LGBTQIA+ individuals.
Outed by an Algorithm: A Study on Facebook’s Friends Recommendation System for Queer, Trans and Gender-Non-Conforming Users
This paper raises the issue of Facebook’s friends recommendation system indirectly outing queer users to their communities. The author proposes a few business rules to add to the algorithm in order to protect queer users’ privacy and safety.
Towards WinoQueer: Developing a Benchmark for Anti-Queer Bias in Large Language Models
Virginia K. Felkner, Ho-Chun Herbert Chang, Eugene Jang, Jonathan May
We introduce a new benchmark dataset to quantify the biases against queer and trans people that are encoded in large language models (LLMs) such as BERT. Results show mitigation of bias is possible via finetuning the models on data written by and/or about queer people.
Use of a Stylometric Map-Based Corpus for Tracking Individual Variation in Relation to Gender and Sex
Theodore Daniel Manning, Harleigh Niyu, Alejandro Jorge Napolitano Jawerbaum, Patrick Juola
Map Lemon is a corpus of linguistic variation currently in its infancy. Incidental findings using this corpus indicate that it may be possible to disambiguate the gender identity of a given writer.
Overview of STEM Science as Process, Data, Material, and Method Named Entities
The STEM-NER-60k corpus posits a structured model of scholarly article abstracts in terms of process, method, material, and data entities across 10 different STEM domains. This study presents, for the first time, an analysis of a large-scale multidisciplinary corpus as an inadvertent feasibility test of characterizing Science with domain-independent concepts for over 1M entities.
CNSC: Czech news article dataset for classification of originating source and its credibility
The proposed CNSC dataset contains over 22,000 unique articles spanning nine major Czech news sources across the spectrum of trustworthiness and political profiling. For this workshop, we will explore the trends in the data that concern reporting about LGBTQ+-related topics. This provides context on the potentially harmful effects that such reporting has on classification methods trained on highly biased data collected in areas like Eastern Europe.
Tackling Gender Microaggressions in Hindi
In this paper, I create a small gender microaggressions dataset in Hindi and crowdsource its labeling, and then construct a novel pipeline to detect microaggressions in Hindi.