For NAACL 2022, the D&I committee has put together a sample of the exciting research that will be presented at the conference. Read along for all the details and follow the work of these authors:


Connor Baumler

Recognition of They/Them as Singular Personal Pronouns in Coreference Resolution

NAACL Main Conference

We propose a method to test coreference resolution systems’ ability to differentiate singular and plural they/them pronouns. We find that existing systems are biased toward resolving “they” pronouns as plural, even when the correct resolution is clear to humans.

Author Bio: Connor is a first-year Ph.D. student at the University of Maryland.

The author is available for Research Internship (Industry) opportunities!

Research Interests: fairness


Wanrong Zhu

Diagnosing Vision-and-Language Navigation: What Really Matters?

Imagination-Augmented Natural Language Understanding

NAACL Main Conference

Vision-and-language navigation (VLN) is a multimodal task where an agent follows natural language instructions and navigates in visual environments. In this work, we conduct a series of diagnostic experiments to unveil VLN agents’ focus during navigation.

Author Bio: Wanrong Zhu is a third-year PhD candidate in Computer Science at the University of California, Santa Barbara. Her research focuses on introducing visual information to assist natural language understanding/generation.

The author is available for Research Internship (Industry) opportunities!

Research Interests: Multimodal


Davide Locatelli

Measuring Alignment Bias in Neural Seq2Seq Semantic Parsers

*SEM: The Eleventh Joint Conference on Lexical and Computational Semantics

In this work we investigate whether neural semantic parsers with a seq2seq architecture have a bias for monotonic or non-monotonic alignments. To answer this question we augment the popular Geo semantic parsing dataset with alignment annotations and create Geo-Aligned. We then study the performance of standard seq2seq models on the examples that can be aligned monotonically versus examples that require more complex alignments. Our empirical study shows that performance is significantly better over monotonic alignments.

Author Bio: Davide is a first-year Ph.D. student at the Universitat Politècnica de Catalunya. There, he works towards the development of interactive machine learning algorithms for compositional models of natural language.

The author is available for Research Internship (Industry) opportunities!

Research Interests: Interactive learning, compositionality, emergent communication, semantic parsing


Juan Vásquez

HeteroCorpus: A Corpus for Heteronormative Language Detection

Workshop on Gender Bias in Natural Language Processing

To our knowledge, no other corpus that aims at detecting heteronormativity in English has been created yet. For this reason we present HeteroCorpus, a novel corpus for the detection of heteronormative language using computational methods.

Author Bio: Juan Vásquez is a second-year MSc student in Computer Science with a specialization in Artificial Intelligence at IIMAS, UNAM. His research focuses on D&I in AI, data engineering, the design of explainable algorithms, and applied AI for the understanding of social phenomena.

The author is available for PhD opportunities!

Research Interests: interpretability


Jack

Outed by an Algorithm: A Study on Facebook’s Friends Recommendation System for Queer, Trans and Gender-Non-Conforming Users.

Queer in AI Workshop

This paper raises the issue of Facebook’s friends recommendation system indirectly outing queer users to their communities. The author proposes a few business rules to add to the algorithm in order to protect queer users’ privacy and safety.

Author Bio: Jack is a Tech Ethicist using his degrees in Informatics and Philosophy to explore the ethical and societal impacts of new technologies, especially on the queer community.

The author is available for Research Internship (Industry) opportunities!

Research Interests: online privacy, social media, data ethics, recommendation systems, gender bias, AI ethics


Maria Leonor Pacheco

A Holistic Framework for Analyzing the COVID-19 Vaccine Debate

NAACL Main Conference

The COVID-19 pandemic has led to an infodemic of low-quality information leading to poor health decisions. Combating the outcomes of this infodemic requires reasoning about the decisions individuals make. In this work we propose a holistic analysis framework connecting stance and reason analysis, and fine-grained entity-level moral sentiment analysis. We study how to model the dependencies between the different levels of analysis and incorporate human insights into the learning process.

Author Bio: Maria is a Postdoctoral Researcher at Microsoft Research, NYC. In Fall 2023, she will join the Department of Computer Science at the University of Colorado Boulder as an Assistant Professor. Maria completed her PhD in Computer Science at Purdue University under the supervision of Prof. Dan Goldwasser. Her current research focuses broadly on neural-symbolic methods to model natural language discourse.

Research Interests: structured prediction, neuro-symbolic NLP, discourse, computational social science


Gati L. Martin

SwahBERT: Language Model of Swahili

NAACL Main Conference

Swahili is a widely spoken language in Africa, but due to a lack of data, it has received less attention in NLP research and is categorized as a low-resource language. With the growth of digital platforms, we train a monolingual language model and introduce an emotion dataset for Swahili.

Author Bio: Gati is currently a Ph.D. student at Soonchunhyang University. Her research interests include deep learning, text mining, and medical data analysis.

The author is available for Research Internship (Industry) opportunities!

Research Interests: natural language processing, deep learning, medical data analysis


David Ifeoluwa Adelani

A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation

NAACL Main Conference

This work investigates how to optimally leverage existing pre-trained models to create low-resource translation systems for 16 African languages. We focus on two questions: 1) How can pre-trained models be used for languages not included in the initial pre-training? and 2) How can the resulting translation models effectively transfer to new domains?

Author Bio: David Adelani is a final-year PhD student at Saarland University and a member of Masakhane. His research focuses on NLP for African languages, multilingual representation learning, and privacy in NLP.

The author is available for Faculty opportunities!

Research Interests: Low-resource NLP, Multilinguality


Neeraja Kirtane

Mitigating Gender Stereotypes in Hindi and Marathi.

Gender bias in Natural Language Processing

Work on identifying and mitigating gender bias has mostly been done for English. We propose methods to address and mitigate gender stereotypes in Hindi and Marathi, which are gendered, low-resource languages.

Author Bio: Neeraja is a final-year undergraduate student at Manipal Institute of Technology, India. Her research focuses on low-resource languages in NLP and on making NLP systems free of bias.

The author is available for Research Internship (Academia) opportunities!

Research Interests: Fairness in NLP, NLP for low-resource languages


Irene Li

LiGCN: Label-interpretable Graph Convolutional Networks for Multi-label Text Classification

DLG4NLP

Multi-label text classification (MLTC) is an attractive and challenging task in natural language processing (NLP). Compared with single-label text classification, MLTC has a wider range of applications in practice. In this paper, we propose a heterogeneous graph convolutional network model to solve the MLTC problem by modeling tokens and labels as nodes in a heterogeneous graph.

Author Bio: Irene is a final-year Ph.D. student at Yale University. Her research interests include NLP applications and graph neural networks.

The author is available for Postdoctoral opportunities!

Research Interests: graph neural networks


Minh-Tien Nguyen

Make The Most of Prior Data: A Solution for Interactive Text Summarization with Preference Feedback

NAACL Main Conference

The paper introduces a new framework to train summarization models interactively with preference feedback. By properly leveraging offline data and a novel reward model, we improve both ROUGE scores and sample efficiency.

Author Bio: Minh-Tien Nguyen is an AI researcher at Cinnamon AI. His research focuses on extending NLP technologies to business intelligence.

The author is available for Faculty opportunities!

Research Interests: Information extraction; text summarization; conversational AI


Sabrina J. Mielke

Reducing conversational agents’ overconfidence through linguistic calibration

NAACL Main Conference

We evaluate state-of-the-art chatbots for closed-book QA accuracy, finding that these models are poorly linguistically calibrated, meaning that the amount of confidence or hedging expressed linguistically does not correspond to the chance of correctness. We do, however, find that the likelihood of correctness can be accurately predicted from model representations. By incorporating such metacognitive features into the training of a controllable generation model, we obtain a dialogue agent with greatly improved linguistic calibration.

Author Bio: Sabrina J. Mielke is a final-year PhD student at the Johns Hopkins University, currently researching open-vocabulary language modeling for vocabulary selection and unit discovery. While her pre-PhD work focused on formal language theory applied to parsing and translation, during her PhD she published on morphology, fair language model comparison, stochastic romanization (at Google AI), metacognition and calibration for chatbots (at Facebook AI Research), and tokenization (at HuggingFace).

Research Interests: generative language modeling, multilinguality, subwords, vocabulary, discrete units


Muhammad Reza Qorib

Frustratingly Easy System Combination for Grammatical Error Correction

NAACL Main Conference

We propose a simple system combination for grammatical error correction that outperforms previous state-of-the-art systems and prior system combination methods.

Author Bio: Reza is a PhD student in the Department of Computer Science, National University of Singapore. He received his bachelor’s degree in computer science from Universitas Indonesia in 2018. His research area is Natural Language Processing (NLP), with a current focus on Grammatical Error Correction.

The author is available for Research Internship (Industry) opportunities!

Research Interests: Grammatical Error Correction


Yujie Lu

Imagination-Augmented Natural Language Understanding

NAACL Main Conference

Human brains integrate linguistic and perceptual information simultaneously to understand natural language, and hold the critical ability to render imaginations. Such abilities enable us to construct new abstract concepts or concrete objects, and are essential in involving practical knowledge to solve problems in low-resource scenarios. Therefore, we introduce an Imagination-Augmented Cross-modal Encoder (iACE) to solve natural language understanding tasks from a novel learning perspective – imagination-augmented cross-modal understanding.

Author Bio: Yujie Lu is a first-year CS PhD student at UC Santa Barbara, advised by William Wang and Miguel Eckstein, in the Natural Language Processing Group and the Vision and Image Understanding Lab. Yujie's current research focuses on Vision and Language (grounding, multi-modal, embodied agent) that connects humans and robots. Yujie is passionate about building self-learning, human-like agents that can understand and interact with our multi-modal, dynamic world.

The author is available for Research Internship (Industry) opportunities!

Research Interests: Vision and Language (grounding, multi-modal, embodied agent)


Joseph Marvin Imperial

A Baseline Readability Model for Cebuano

Workshop on Innovative Use of NLP for Building Educational Applications (BEA)

Research in readability assessment for low-resource languages like Filipino has shown that traditional features such as syllable patterns are still among the best predictors of complexity. We focus our sights on Cebuano, another major Philippine language, to develop the first-ever ML-based readability model and corpus for this language.

Author Bio: Joseph Imperial is a Filipino NLP researcher broadly interested in research on text complexity, readability assessment, and controllable language generation. He is also an incoming Ph.D. student at the University of Bath for the CDT ART-AI Program where he will work on building fair, interpretable, and trustworthy automatic assessment models.

Research Interests: readability assessment, text complexity, controlled language generation


Gaoyue Zhou

Putting the Con in Context: Identifying Deceptive Actors in the Game of Mafia

NAACL Main Conference

While neural networks demonstrate a remarkable ability to model linguistic content, capturing contextual information related to a speaker’s conversational role is an open area of research. In this work, we analyze the effect of speaker role on language use through the game of Mafia, in which participants are assigned either an honest or a deceptive role.

Author Bio: Gaoyue Zhou is a first-year MS in Robotics student at the CMU Robotics Institute. Previously, she completed her undergraduate degree in Computer Science and Applied Mathematics at UC Berkeley. During her time at Berkeley, she had the pleasure of working with Prof. Sergey Levine and Prof. John DeNero in the Berkeley Artificial Intelligence Research (BAIR) Lab.

The author is available for PhD opportunities!

Research Interests: interpretability; robotics and language; generalizability


Kurt Micallef

Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese

Deep Learning for Low-Resource NLP (DeepLo)

We release new Maltese BERT-based language models which use significantly less data than the corpora used for high-resource languages. Furthermore, we find that using carefully curated pre-training data is important and at times results in significant improvements. Our analysis shows that the amount of data needed to achieve considerable improvements does not need to be overly ambitious.

Author Bio: Kurt is an early-stage researcher at the University of Malta. He is currently working on improving Natural Language Processing tools and technologies for the Maltese language.

Research Interests: Maltese, Low-Resource Languages, Multilinguality


Laurie Burchell

Exploring diversity in back translation for low-resource machine translation

DeepLo

A common way to improve back translation is to try to increase the ‘diversity’ of the generated corpus. We split ‘diversity’ into two aspects, lexical and syntactic, and introduce metrics to understand how diversity in the training data affects final neural machine translation performance.

Author Bio: Laurie is a third-year PhD student in the CDT for NLP at the University of Edinburgh. Her research focuses on data augmentation and filtering for low-resource machine translation.

Research Interests: machine translation, low-resource machine translation, data augmentation, language identification, data filtering


Hanxun Zhong

Less is More: Learning to Refine Dialogue History for Personalized Dialogue Generation

NAACL Main Conference

Since the dialogue history is usually long and noisy, most existing methods truncate the dialogue history to model the user’s personality. We propose to refine the user dialogue history on a large scale, based on which we can handle more dialogue history and obtain more abundant and accurate persona information.

Author Bio: Hanxun Zhong is a master's student at Renmin University. His research interests focus on NLP, especially dialogue systems.

The author is available for Research Internship (Industry) opportunities!

Research Interests: personality


Yuchen Eleanor Jiang

BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation

NAACL Main Conference

Is BLEU sufficient for document-level machine translation? The answer is no, as texts have properties beyond their individual sentences. BlonDe is an alternative automatic metric for document-level MT evaluation. As shown by experiments and human evaluation, BlonDe is a lot more selective than BLEU for document-level MT and shows a larger quality difference between human and machine translations.

Author Bio: Eleanor is a second-year Ph.D. student at ETH Zürich. Her research focuses on generating and understanding long texts. To this end, she has recently been dabbling in document-level machine translation, coreference resolution, and coherence modeling. She also likes fun applications of natural language generation (news generation, lyrics generation, etc.).

The author is available for Research Internship (Industry) opportunities!

Research Interests: long text generation and understanding; discourse and coherence; structured prediction; machine translation


Wongyu Kim

Emp-RFT: Empathetic Response Generation via Recognizing Feature Transitions between Utterances

NAACL Main Conference

We present a dialogue system that generates empathetic responses. Even when dialogues become long, the system comprehends the context well.

Author Bio: Wongyu Kim is an M.S. student at Yonsei University.

The author is available for PhD opportunities!

Research Interests: NLP, Dialogue System


Bonaventure F. P. Dossou

A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation

NAACL Main Conference

In our paper, we create MAFAND (a new African news corpus covering 16 languages including 8 new languages). We demonstrate that we can add new languages, across new domains by fine-tuning large pre-trained models on small quantities of high-quality translation data.

The author is available for Research Internship (Industry) opportunities!

Research Interests: machine translation, multilinguality, speech recognition, named entity recognition, drug discovery


Deeksha Varshney

Commonsense and Named Entity Aware Knowledge Grounded Dialogue Generation

NAACL Main Conference

Despite demonstrating efficacy in empirical evaluation, past work on the task of Dialogue Generation has a few significant drawbacks. In particular, there is no explicit representation of entities, semantic relations, or conversation structures. We propose CNTF, a novel knowledge-grounded dialogue generation model that utilizes dialogue context, unstructured textual information, and structural knowledge to facilitate explicit reasoning.

Author Bio: Deeksha is a final-year Ph.D. student at the Indian Institute of Technology Patna, India. Their research focuses on extending NLP techniques for building robust Dialogue Systems.

The author is available for Postdoctoral opportunities!

Research Interests: Knowledge Grounded Dialogue Generation


Natalia Ponomareva

Private NLP workshop, paper 5: Training Text-to-Text Transformers with Privacy Guarantees

PrivateNLP

Recent high-capacity Large Language Models (LLMs) are pre-trained on huge corpora of data that are usually treated as public. This data can still contain personally identifiable information, such as names and phone numbers, as well as copyrighted material. In this work we investigate Differential Privacy (DP) for private pre-training of LLMs and demonstrate that downstream tasks are not affected by this. Further, we show that DP helps reduce memorization and can be implemented at a moderate cost in training speed.

Research Interests: domain adaptation, domain generalization, large language models, differential privacy


Kasturi Bhattacharjee

What do Users Care About? Detecting Actionable Insights from User Feedback

NAACL Main Conference

Users often leave feedback on a myriad of aspects of a product which, if leveraged successfully, can help yield useful insights that can lead to further improvements down the line. Detecting actionable insights can be challenging owing to large amounts of data as well as the absence of labels in real-world scenarios. In this work, we present an aggregation and graph-based ranking strategy for unsupervised detection of these insights from real-world, noisy, user-generated feedback. Our proposed approach significantly outperforms strong baselines on two real-world user feedback datasets and one academic dataset.

Author Bio: Kasturi is an Applied Scientist at AWS AI Labs and has a Ph.D. in Computer Science from UC Santa Barbara. Her research interests lie in sentiment analysis and emotion detection, generative models, semi-supervised and unsupervised learning from structured and unstructured data, with a focus on low resource scenarios and noisy user-generated text. Her research is geared towards building ML & NLP solutions for real-world customer problems, especially those that capture user opinions, intent and attributes from online text.

Research Interests: aspect-based sentiment analysis, generative models, semi-supervised learning


Zhiyu Chen

KETOD: Knowledge-Enriched Task-Oriented Dialogue

Existing studies in dialogue system research mostly treat task-oriented dialogue and chit-chat as separate domains. Towards building a human-like assistant that can converse naturally and seamlessly with users, it is important to build a dialogue system that conducts both types of conversations effectively. In this work, we investigate how task-oriented dialogue and knowledge-grounded chit-chat can be effectively integrated into a single model.

Author Bio: Zhiyu Chen is a final-year Ph.D. student at the University of California, Santa Barbara. Her research focuses on building natural language interfaces for AI assistants, including question answering, dialogue, and natural language generation.

Research Interests: Question Answering, Dialogue


Jamell Dacon

Towards a Multi-Layered Dialectal Analysis: A Case Study of African American English

HCI+NLP

This paper, accepted to the HCI+NLP workshop, highlights the need for dialectal language in NLP data to create more robust language models.

Author Bio: Jamell Dacon (Mell) is a fourth year Ph.D. student in the Department of Computer Science and Engineering at Michigan State University (MSU). Their current research focuses on examining subjectivity in textual data and linguistic analyses of social media behavior.

The author is available for Research Internship (Industry) opportunities!

Research Interests: Natural Language Processing, fairness in M, computational linguistics


Zoey Liu

Data-driven Model Generalizability in Crosslinguistic Low-resource Morphological Segmentation

NAACL Main Conference

Common designs of model evaluation typically focus on monolingual settings, where different models are compared according to their performance on a single data set that is assumed to be representative of all possible data for the task at hand. While this may be reasonable for a large data set, this assumption is difficult to maintain in low-resource scenarios, where artifacts of the data collection can yield data sets that are outliers, potentially making conclusions about model performance coincidental. We illustrate these concerns using morphological segmentation as a test case.

Author Bio: Zoey Liu is a computing innovation fellow supported by the NSF; her host institution is Boston College. Her research focuses on language typology and crosslinguistic generalizability.

Research Interests: multilinguality


Vandita Grover

Understanding the Sarcastic Nature of Emojis with SarcOji, Vandita Grover and Hema Banati

5th International Workshop on Emoji Understanding and Applications in Social Media

In this work, we present SarcOji which has been compiled from five publicly available sarcasm datasets. SarcOji contains labeled English texts which all have emojis. We also analyze SarcOji to determine if there is an incongruence in the polarity of text and emojis used therein. Further, emojis’ usage, occurrences, and positions in the context of sarcasm are also studied in this compiled dataset.

Author Bio: Vandita Grover is a Ph.D. scholar at Department of Computer Science, University of Delhi. Her research interests include Sentiment Analysis, Web-Mining, and Machine Learning. She has over a decade of experience as an Assistant Professor at the University of Delhi.

Research Interests: sentiment analysis, explainable AI


Yangyang Zhao

A Versatile Adaptive Curriculum Learning Framework for Task-oriented Dialogue Policy Learning

NAACL Main Conference

We present a novel versatile adaptive curriculum learning (VACL) framework, which is a substantial step toward applying automatic curriculum learning to dialogue policy tasks. It supports evaluating the difficulty of dialogue tasks using only the learning experiences of the dialogue policy, and skip-level selection according to their learning needs to maximize learning efficiency. Moreover, an attractive feature of VACL is the construction of a generic, elastic global curriculum while training a good dialogue policy; this curriculum can guide the learning of different dialogue policies without extra re-training effort.

Author Bio: Yangyang Zhao is a final-year Ph.D. student at the South China University of Technology, but is now in joint training at Utrecht University. Her research focuses on dialogue systems, reinforcement learning, and curriculum learning.

The author is available for Faculty opportunities!

Nouha Dziri

On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?

NAACL Main Conference

Knowledge-grounded conversational models are known to suffer from producing factually invalid statements, a phenomenon commonly called hallucination. We conduct a comprehensive human study on both existing knowledge-grounded conversational benchmarks and several state-of-the-art models. Our study reveals that the standard benchmarks consist of more than 60% hallucinated responses, leading to models that not only hallucinate but even amplify hallucinations.

Author Bio: Nouha Dziri is a Ph.D. candidate at the University of Alberta working within the Alberta Machine Intelligence Institute under the supervision of Osmar Zaiane. Her research interests lie in building trustworthy conversational models from three perspectives: modelling, data, and evaluation. She has interned at Google Research, Microsoft Research, and Mila. Her work has been published in top-tier venues including TACL, NAACL and EMNLP. She actively serves as a reviewer for NLP conferences, journals, and workshops and was recognized among the best reviewers at ACL 2021. She is also a proponent of diversity and has given several talks to inspire women to pursue careers in STEM.

The author is available for Postdoctoral opportunities!

Research Interests: conversational AI


Julia Mendelsohn

Modeling Framing in Immigration Discourse on Social Media (Poster at WiNLP satellite workshop, paper published at NAACL 2021)

WiNLP satellite workshop

The framing of political issues can influence policy and public opinion. By creating a new dataset of immigration-related tweets labeled for multiple framing typologies from political communication theory, we develop supervised models to detect frames. We demonstrate how users’ ideology and region impact framing choices, and how a message’s framing influences audience responses.

Author Bio: Julia is a third-year PhD student at the University of Michigan School of Information. Her research is at the intersection of NLP and computational social science, and she is especially interested in developing NLP technologies to better understand political discussions online.

Research Interests: computational social science, fairness, bias


Man Luo

Neural Retriever and Go Beyond: A Thesis Proposal

Student Research Workshop

This paper is about neural information retrievers, discussing the limitations of recent state-of-the-art retrievers and proposing solutions.

Author Bio: Man Luo is a fourth-year Ph.D. student at Arizona State University. Her research focuses on information retrieval, question answering, and multimodal representation learning and search.

The author is available for Postdoctoral opportunities!

Research Interests: Information retrieval, question answering, multimodal representation learning and search.


Danilo Ribeiro

Entailment tree explanations via Iterative Retrieval-Generation Reasoner

NAACL Main Conference

Explainability of large language models' output remains elusive. We propose an architecture called Iterative Retrieval-Generation Reasoner (IRGR), which generates structured explanations for a given hypothesis given a set of premises.

Author Bio: Danilo Ribeiro is a fifth-year PhD student at Northwestern University, advised by Dr. Kenneth Forbus. The goal of Danilo's research is to build intelligent agents that are able to incorporate knowledge and reasoning when processing natural language, either by enhancing current NLP systems or by creating innovative ways of learning and applying knowledge to solve language tasks.

The author is available for Research Internship (Industry) opportunities!

Research Interests: Question-answering, Dialog, Explainability


Burcu Can Buglalilar

TurkishDelightNLP: A Neural Turkish NLP Toolkit

NAACL Main Conference

We introduce a neural Turkish NLP toolkit called TurkishDelightNLP that performs computational linguistic analyses from the morphological level to the semantic level, covering tasks such as stemming, morphological segmentation, morphological tagging, part-of-speech tagging, dependency parsing, and semantic parsing, as well as high-level NLP tasks such as named entity recognition. The toolkit is publicly available at: http://rgcl.wlv.ac.uk/TurkishNLP/

Author Bio: Burcu Can is a Reader in Computational Linguistics and a member of RGCL, Wolverhampton.

The author is available for Research Internship (Industry) opportunities!

Research Interests: morphology, syntax, semantics, representation learning


Hadas Orgad

How Gender Debiasing Affects Internal Model Representations, and Why It Matters

NAACL Main Conference

Common studies of gender bias in NLP focus either on extrinsic bias measured by model performance on a downstream task or on intrinsic bias found in models’ internal representations. However, the relationship between extrinsic and intrinsic bias is relatively unknown. This study bridges the gap between extrinsic and intrinsic evaluations. Our framework provides a comprehensive perspective on bias in NLP models, which can be applied to deploy NLP systems in a more informed manner.

Author Bio: Hadas is a first-year PhD student at the Technion, advised by Yonatan Belinkov. Hadas is interested in understanding, defining, and improving robustness in NLP models. This includes social biases and other biases resulting from spurious correlations in the data.

The author is available for Research Internship (Industry) opportunities!

Research Interests: robustness, interpretability, bias, fairness


Alejandro Rodriguez Perez

Distributed Text Representations Using Transformers for Noisy Written Language

LatinX in NLP Research Workshop

Traditionally, Natural Language Processing methods rely on words as the core components of a text. These methods have shown limitations when dealing with noisy text. In contrast, we propose a character-based approach that is robust to the high syntactic noise of our target texts.

Author Bio: Alejandro is a first-year Ph.D. student at Baylor University. His research focuses on Natural Language Processing, particularly on natural language representation.

The author is available for Research Internship (Industry) opportunities!

Research Interests: language representation


Armanda Lewis Twitter URL

Multimodal large language models for inclusive collaboration learning tasks

NAACL Main Conference

We summarize a project that utilizes large language models to facilitate inclusive collaboration, and discuss considerations for incorporating large language models into downstream tasks.

Author Bio: Armanda is a Ph.D. student at New York University. Her research interests include leveraging multimodal NLP within downstream educational tasks, building interpretable machine learning/AI tools to support learning and the arts, and examining inclusion and equity in the context of data science education. Her dissertation focuses on multimodal NLP to facilitate inclusive collaboration.

Research Interests: multimodality, QA, generative speech, interpretability, NLP in downstream educational tasks


Mashrura Tasnim Twitter URL

DEPAC: a Corpus for Depression and Anxiety Detection from Speech

Workshop on Computational Linguistics and Clinical Psychology (CLPsych)

We introduce DEPAC, a corpus of 2000+ high-quality audio recordings for detecting depression and anxiety from speech. The dataset also includes 200+ hand-curated features. Machine learning models trained on DEPAC show promise in measuring depression severity.

Author Bio: Mashrura Tasnim received the B.Sc. and M.Sc. degrees in computer science and engineering from Bangladesh University of Engineering and Technology, Dhaka, Bangladesh, in 2014 and 2017, respectively. She is currently pursuing a Ph.D. in computing science at the University of Alberta, Edmonton, Alberta, Canada. Her research interests include the application of machine learning and artificial intelligence to the development of systems involving wearable sensors to monitor and support individuals with psychological disorders.

The author is available for Research Internship (Industry) opportunities!

Research Interests: Machine learning, Mental health, Speech analysis


Abderrahmane Issam Twitter URL

Goud.ma: a News Article Dataset for Summarization in Moroccan Darija

North Africans in NLP

We introduce Goud.ma, a dataset of over 158k news articles for abstractive summarization in Moroccan Darija, which we release publicly in an effort to encourage the diversity of NLP evaluation tasks in Darija.

Author Bio: Abderrahmane Issam is a Data Scientist working on NLP for Moroccan Arabic. His research focus is on NLP for Low Resource Languages.

Research Interests: NLP for Low Resource Languages, Multilingual NLP


Samee Ibraheem

Putting the “Con” in Context: Identifying Deceptive Actors in the Game of Mafia

NAACL Main Conference

Come improve your Mafia skills via a bot generating utterances of honest & deceptive players!

Author Bio: Samee Ibraheem is a PhD student in Computer Science at UC Berkeley advised by Professor John DeNero. His research interests lie in integrating speaker attribute information into NLP systems, with a focus on applications that are relevant to online security. Samee has been awarded the NSF Graduate Research Fellowship, the UC Berkeley Chancellor’s Fellowship, and previously received a Bachelor’s magna cum laude in Neurobiology with a minor in Computer Science from Harvard University.

The author is available for Research Internship (Industry) opportunities!

Research Interests: Leveraging Speaker Context for NLP


Matyáš Boháček Twitter URL

Developing disinformation detection models in low-resource contexts: Czech news article dataset for source-level credibility

QueerInAI workshop

Most disinformation (fake news) detection datasets are in English, and collecting one is both time-consuming and expensive. We present a novel methodology for constructing such datasets in a low-resource context and demonstrate it by creating a news article dataset for Czech. Our workshop poster then analyzes biases toward LGBTQ-related reporting in models trained on this dataset and outlines guidelines, in addition to the original methodology, to better prevent them.

Author Bio: Matyáš is a high school student at the Gymnasium of Johannes Kepler in Prague, working in ML research with the local university and corporate lab. His work focuses primarily on sign language representations, disinformation analysis, and media forensics.

The author is available for Research Internship (Industry) opportunities!

Research Interests: disinformation, low-resource


Sunny Rai Twitter URL

Identifying Human Needs through Social Media: A study on Indian cities during COVID-19

10th International Workshop on Natural Language Processing for Social Media

During a crisis, people often express their needs and struggles through social media posts. Taking inspiration from Frustration-Aggression theory, we propose a model to classify tweets expressing frustration, which serves as an indicator of unmet needs. Our study reveals the major causes of frustration during the lockdown and the second wave of the COVID-19 pandemic in India. Our proposed approach can be useful for the timely identification and prioritization of emerging human needs in the event of a crisis.

Author Bio: This paper will be presented by Rohan Joseph. Rohan Joseph is a final-year B.Tech student at Mahindra University, India. His research focuses on metaphor generation and human needs detection on social media. He aims to design NLP solutions for social well-being and would like to further his learning through interdisciplinary collaborations.

The author is available for Postdoctoral opportunities!

Research Interests: Computational social science, creative text processing, NLP for well-being


Sudeshna Das

Resilience of Named Entity Recognition under Adversarial Attack

DADC workshop

Author Bio: Sudeshna Das is a PhD candidate at the Indian Institute of Technology Kharagpur, India. Her primary area of research is Human Language Technology and her doctoral thesis deals with gender bias detection.

The author is available for Postdoctoral opportunities!

Research Interests: AI for Social Good


Ao Jia

Beyond Emotion: A Multi-Modal Dataset for Human Desire Understanding

NAACL Main Conference

Desire is a strong wish to do or have something, which involves not only a linguistic expression but also the underlying cognitive phenomena driving human feelings. Desire understanding is a strikingly understudied task, and it is difficult for machines to model and understand desire due to the unavailability of benchmarking datasets with desire and emotion labels. To bridge this gap, we present MSED, the first multi-modal and multi-task sentiment, emotion, and desire dataset, which contains 9,190 text-image pairs.

Author Bio: Ao Jia is an undergraduate student at the Beijing Institute of Technology. Their work intends to explore desire detection in NLP.

The author is available for PhD opportunities!

Research Interests: sentiment analysis