Kyudo retreat @ NY 2023 Spring.

[Google Scholar] [Github] [LinkedIn] [Twitter]

[Harvard Website]

Contact


Hi, I am a Ph.D. student at Harvard-MGB AIM, jointly with Maastricht University, under the guidance of Hugo Aerts, Ph.D. and Danielle S. Bitterman, M.D. I am the recipient of the 2024 Google PhD Fellowship in Natural Language Processing, mentored by Asma Ghandeharioun, Ph.D. I am also affiliated with the Boston Children's Hospital Computational Health Informatics Program (CHIP), where we have the privilege of collaborating closely with Guergana Savova, Ph.D. and Tim Miller, Ph.D.

I am deeply interested in the knowledge representation and features of large language models, particularly their ability to translate across modalities. My goal is to develop more interpretable AI systems for critical domains such as healthcare. Additionally, I am passionate about enhancing patient communication and establishing robust safety evaluation methods for high-stakes tasks. It is crucial to assess the impact of AI on all healthcare stakeholders—including patients, providers, and others.

During COVID-19, I completed my M.S. in Computational Linguistics at Brandeis University, where I was fortunate to be advised by Professor Nianwen Xue, Ph.D. There I fully explored my interests and met many wonderful people and friends.

Before Brandeis, I spent four years as an undergraduate studying Math, Japanese, and Linguistics at St. Olaf College, where I really enjoyed my liberal arts education. Click here if you want to learn more about my undergrad.

During my free time, I enjoy dragon boating and kyudo 🏹. Feel free to contact me if you want to chat!


News


Selected Publications

(* indicates equal contribution)

🐰 RABBITS: Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks
*Jack Gallifant, *Shan Chen, Pedro Moreira, ... Leo Anthony Celi, Thomas Hartvigsen, and Danielle S. Bitterman
EMNLP 2024
[🤗] [Tweet] [arXiv] [Industrial adaptation] [Code]

Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias
*Shan Chen, *Jack Gallifant, Mingye Gao, Pedro Moreira, ... Leo Anthony Celi, William G. La Cava, and Danielle S. Bitterman
NeurIPS 2024
[Website] [Code] [Data]

LCD Benchmark: Long Clinical Document Benchmark on Mortality Prediction for Language Models
Wonjin Yoon, Shan Chen, ... Danielle S. Bitterman, Majid Afshar, and Timothy Miller
Under Review
[medRxiv] [Code] [Coda Bench]

OncQA: The impact of using an AI chatbot to respond to patient questions
Shan Chen, Marco Guevara ... Hugo Aerts, Timothy Miller, Guergana Savova, Raymond Mak, Majid Afshar, and Danielle S. Bitterman
Lancet Digital Health
[arXiv] [Code] [🤗] [Article] [NYTimes]

Measuring Pointwise V-Usable Information In-Context-ly
Sheng Lu, Shan Chen, Yingya Li, Danielle S. Bitterman, Guergana Savova, and Iryna Gurevych
EMNLP 2023
[EMNLP] [Code] [Tweet] [Tutorial]

Large Language Models to Identify Social Determinants of Health in Electronic Health Records
*Marco Guevara, *Shan Chen, Spencer Thomas ... Hugo Aerts, Guergana Savova, Raymond Mak, and Danielle S. Bitterman
npj Digital Medicine
[Dataset] [arXiv] [Code] [🤗]

Use of Artificial Intelligence Chatbots for Cancer Treatment Information
Shan Chen, Benjamin Kann, Michael Foote, Hugo Aerts, Guergana Savova, Raymond Mak and Danielle S. Bitterman
JAMA ONC
[arXiv] [Code] [Data] [Article] [News]
Editorial by Atul J. Butte, MD, PhD! [JAMA]
News coverage by Bloomberg, NBC, and many others about this work! [News]

Evaluation of ChatGPT Family of Models for Biomedical Reasoning and Classification
*Shan Chen, *Yingya Li, Sheng Lu, Hoang Van, Hugo Aerts, Guergana Savova, and Danielle S. Bitterman
JAMIA
[JAMIA] [arXiv] [Code]

Natural language processing to automatically extract the presence and severity of esophagitis in notes of patients undergoing radiotherapy
Shan Chen, Marco Guevara, Nicolas Ramirez ... Hugo Aerts, Tim Miller, Guergana Savova, Raymond Mak, and Danielle S. Bitterman
JCO CCI
[ASTRO] [Paper] [arXiv] [Code]
Oral Presentation @ ASTRO 2023 < 9%

Medications detection in tweets using transformer networks and multi-task learning
Dongfang Xu, Shan Chen, and Tim Miller
Proceedings of the BioCreative VII Challenge 2021
[Paper] [Code]
First place!!!


Mentoring



Selected Honors



Invited Talks



Service