Invited Speakers

Fei Xia, University of Washington

BIO:

Fei Xia is a Full Professor in the Linguistics Department at the University of Washington (UW) and an adjunct faculty member in the Department of Biomedical Informatics and Medical Education at the UW School of Medicine. Her research covers a wide range of NLP tasks such as grammar engineering, resource development, machine translation, information extraction, clinical NLP, and NLP for low-resource languages. Her work is supported by grants from NSF, NIH, IARPA, Microsoft, IBM, and UW, and is carried out in collaboration with researchers from both academia and industry. Fei Xia received her Bachelor's degree from Peking University and her M.S. and Ph.D. from the University of Pennsylvania. After graduation, she worked at the IBM T. J. Watson Research Center before joining UW in 2005.

ABSTRACT:

If Data Could Talk

During my first year of graduate study at the University of Pennsylvania, I took Dr. Martha Palmer's course on natural language processing (NLP). I was so intrigued by the topics that, by the end of that semester, I decided to switch my research focus from databases to NLP and became Martha's student. Throughout the years, Martha has always been my role model, epitomizing the qualities of a great researcher, leader, and mentor. One valuable lesson that I have learned from Martha is the importance of closely examining the data. Although it may sound simple, this step is often overlooked in many recent studies. With the aid of numerous NLP packages, researchers can easily load, process, and evaluate NLP systems on benchmark datasets without delving into the specifics of the data itself. In this talk, I will underscore the importance of scrutinizing the data by describing two recent studies conducted by my team. One study pertains to the evaluation of Large Language Models (LLMs), while the other examines gender bias in educational materials and WordNet. In this context, the term "data" extends beyond training and evaluation data. It encompasses a broader range of information, including lexical resources, instructions provided to LLMs, system outputs, and any other relevant sources of information used in the research or analysis.

Owen Rambow, Stony Brook University

BIO:

ABSTRACT:

Propositional Content and Commitment to Truth

Two of the things I have learned from Martha Palmer are relevant to this talk. First, data annotation is not a necessary evil that should be done as quickly as possible so that we can get to the real work (i.e., tweaking machine learning). Instead, it represents a conceptualization of what we think is an important aspect of language or language use. Developing an annotation manual and carrying out annotation exercises help us refine the concepts we are defining. Second, how we represent propositional content (who-did-what-to-whom) matters. In this talk, I will summarize work that goes beyond the lexical semantics of propositional content to pragmatics. We not only communicate propositional content; we also signal to our discourse partners to what extent we are committed to the truth of that content. For example, we can include hedges, or we can talk about desires rather than factual content, or we can report what others said, leaving our own commitment open. This notion of commitment has been studied under different names (veridicality, factuality, factivity, belief, commitment) in various disciplines (linguistics, philosophy, psychology, NLP, AI). I will summarize past theoretical approaches, some annotation efforts, and some recent machine learning approaches. I will conclude by linking this notion back to the notion of propositional content: our commitment to the truth of a proposition can differ for different parts of that proposition. This remains future work.