Welcome! I’m Harry (pronouns: he, him). I’m a PhD candidate in the Department of Biomedical Informatics at Columbia University, where I am advised by Noémie Elhadad. For most of my PhD, I was also a Visiting Postgraduate Research Fellow at Harvard Medical School. Recently, I served a 2-year term on the Journal of the American Medical Informatics Association Student Editorial Board and was a co-organizer of the 2023 Machine Learning for Health (ML4H) Symposium. I am also a DAAD AInet Fellow for Safety and Security in AI through the German Academic Exchange Service, Deutscher Akademischer Austauschdienst (DAAD).
My long-term goal is to advance health equity science. Building on two decades of domestic and international experience in clinical and public health informatics, my research focuses on human-centered artificial intelligence (AI) and the development of systematic, scalable, data-driven approaches to promote health equity. My work examines and applies methods such as machine learning, natural language processing, and spatiotemporal analysis in addition to traditional biostatistics and epidemiology. I am particularly interested in using and interrogating multimodal data sources and the vast toolbox that computational learning offers to better understand, improve, and facilitate the study of health in populations and communities that are marginalized. Generally, my research can be grouped into four primary domains:
My PhD was funded by training fellowships from the National Library of Medicine and the National Institute of Allergy and Infectious Diseases. I am also the recipient of a Computational and Data Science Fellowship from the Association for Computing Machinery (ACM) Special Interest Group in High Performance Computing (SIGHPC). I have a Master of Philosophy in Biomedical Informatics from Columbia University, a Master of Applied Science in Spatial Analysis from the Johns Hopkins Bloomberg School of Public Health, and a Bachelor of Arts in Sociology and History from Yale University.
harry [dot] reyes [at] columbia [dot] edu
Dept of Biomedical Informatics
Columbia University
622 West 168th Street, PH20
New York, NY 10032
Citations: 1199; h-index: 14
Mining the Health Disparities and Minority Health Bibliome: A Computational Scoping Review and Gap Analysis of 200,000+ Articles.
Science Advances.
2024.
10(4):eadf9033. PMID: 38266089.
Development of Machine Learning-Based Mpox Surveillance Models in a Learning Health System.
medRxiv [Preprint].
2024 September.
doi: 10.1101/2024.09.25.24314318.
Enabling a Learning Public Health System: Enhanced Surveillance of HIV and Other Sexually Transmitted Infections.
medRxiv [Preprint].
2024 April.
doi: 10.1101/2024.09.25.24314318.
Professional-Patient Boundaries: a National Survey of Primary Care Physicians’ Attitudes and Practices.
J Gen Intern Med.
2020.
35(2):457–464. PMID: 31755012.
A Scoping Review of Ethics Considerations in Clinical Natural Language Processing.
JAMIA Open.
2022.
5(2):ooac039. PMID: 35663112.
What Works in Medication Reconciliation: An On-Treatment and Site Analysis of the MARQUIS2 Study.
BMJ Qual Saf.
2023.
32(8):457-469. PMID: 36948542.
Ranked #1 among the top research articles of 2023 by BMJ Quality and Safety
Auditing Learned Associations in Deep Learning Approaches to Extract Race and Ethnicity from Clinical Text. AMIA Annu Symp Proc. 2024 January. 289-298.
Bear Don’t Walk IV OJ, Pichon A, Reyes Nieva H, Sun T, Altosaar J, Natarajan K, Perotte A, Tarczy-Hornoch P, Demner-Fushman D, Elhadad N.
Exploring Gender Disparities in Time to Diagnosis. Machine Learning for Health (ML4H) Workshop at the Conference on Neural Information Processing Systems (NeurIPS). 2020 December. 1-6.
Sun TY, Bear Don’t Walk OJ IV, Chen JL, Reyes Nieva H, Elhadad N.
Time of day and the decision to prescribe antibiotics.
JAMA Intern Med.
2014.
174(12):2029-31. PMID: 25286067.
Scale-up of networked HIV treatment in Nigeria: creation of an integrated electronic medical records system.
Int J Med Inform.
2015.
84(1):58-68. PMID: 25301692.
Characteristics of Disease-Specific and Generic Diagnostic Pitfalls: A Qualitative Study.
JAMA Netw Open.
2022.
5(1):e2144531. PMID: 35061037.
Leveraging Computational Methods to Advance Health Equity Science Through Evidence Synthesis, Strategic Monitoring, and Precision Public Health
The studies presented in this dissertation seek to advance health equity science by drawing from informatics-based methods and subfields of artificial intelligence such as machine learning, natural language processing, and symbolic reasoning. This thesis employs robust methods for big data collection, integration, and analysis to leverage existing and emerging data sources including a large corpus of biomedical literature, electronic health records from the largest public health information exchange in the United States, open government datasets, proprietary national insurance claims datasets, and public health reporting data.
Health Disparities and Minority Health (HDMH) Monitor / Principal developer
The Health Disparities and Minority Health (HDMH) Monitor is an online article repository and interactive dashboard that leverages natural language processing and machine learning methods to support scientific discovery via automated archive, search, information synthesis, and data visualization of articles concerning HDMH in MEDLINE. It is based on a large-scale computational scoping review aimed at characterizing major topics found among nearly a quarter million scientific articles in the HDMH literature, examining change in topic mention over time, identifying notable gaps in coverage, and deriving actionable insights for further inquiry.
C-REACT: Contextualized Race and Ethnicity Annotations for Clinical Text / Co-developer
The C-REACT dataset is a large, publicly available corpus of sentences from clinical notes manually annotated for information related to race and ethnicity (RE). The corpus contains two sets of gold-standard annotations for RE data. The first set contains granular RE information such as patient country of origin and spoken language. The second set contains RE labels manually assigned by clinicians. The corpus is intended to improve understanding of the granular RE information contained within clinical notes and how this information might be used to infer RE.
HERA: Health Equity Research Assessment / Co-developer
The Health Equity Research Assessment (HERA) is a large-scale characterization conducted across Observational Health Data Sciences and Informatics (OHDSI) sites with heterogeneous populations and insurance coverage types, allowing for identification of persistent and generalizable trends in diagnosis differences. The HERA dashboard and visualizations can be used to download study data, further investigate health differences, and generate novel hypotheses.
Covidwatcher is an app and online portal that surveyed users about their exposure to COVID-19, symptoms, access to medical care, and impact on daily life. The data collected was used to track the spread of coronavirus, giving citizens real-time information about hot spots, and enabling health care officials to deploy resources where needed most.
Harvard PEPFAR Nigeria Adult quality improvement tool / Co-developer
Co-developed and evaluated a utility to extract information from an EHR data warehouse and generate measures based on 15 adult quality of care indicators at 33 Harvard PEPFAR sites in Nigeria. The module reviews continuity of care, drug therapy initiation, loss to follow-up, laboratory monitoring, disease screening based on clinical symptom assessment, treatment failure, and treatment response.
Reyes Nieva H, Zucker J, Elhadad N. Elucidating Health Inequities and Research Gaps in HIV and Other Sexually Transmitted Infections Using Data Mining and a Large Language Model: A Computational Scoping Review
September 2024
STI Prevention; Atlanta, GA
Reyes Nieva H. Challenges, Opportunities, and Considerations: Promoting Inclusive Research in the Era of Big Data
November 2023
American Medical Informatics Association Annual Symposium; New Orleans, LA
Invited presenter and panelist for session on “Advancing Diversity, Equity, and Inclusion in Biomedical Informatics Research: Strategies and Best Practices for Using Inclusive Language Across the Research Lifecycle”
Reyes Nieva H, Tucker EG, Castor D, Yin MT, Gordon P, Elhadad N, Zucker J. Health Information Exchange Enables Enhanced STI Surveillance Using Electronic Health Record Data
July 2023
HIV and STI 2023 World Congress; Chicago, IL
Reyes Nieva H, Zucker J, Tucker EG, McLean J, DeLaurentis C, Gunaratne S, Elhadad N. Development and Validation of Machine and Deep Learning Classifiers for Monkeypox
May 2023
Symposium on Artificial Intelligence for Learning Health Systems (SAIL); Río Grande, Puerto Rico
Reyes Nieva H, Elhadad N. Mining the Health Disparities and Minority Health Bibliome: A Computational Scoping Review
November 2022
American Medical Informatics Association Annual Symposium; Washington, DC
Spotlighted and invited for special presentation by AMIA DEI Committee for “demonstrating best practices in promoting diversity, equity, and inclusion through scholarly communications in biomedical informatics”
Sun T, Hardin J, Reyes Nieva H, Natarajan K, Cheng RF, Ryan P, Elhadad N. Large-scale Characterization of Gender Differences in Age at Diagnosis and Time to Diagnosis in Longitudinal Observational Health Data
October 2022
National Institutes of Health (NIH) Office of Research on Women’s Health (ORWH) Workshop on Gender and Health: Impacts of Structural Sexism, Gender Norms, Relational Power Dynamics, and Gender Inequities; Virtual event due to COVID-19
Reyes Nieva H, Sun TY, Gorman SR, Mao G, Elhadad N. Differential Presentation and Delays in Treatment for Acute Myocardial Infarction Associated with Sex and Race/Ethnicity
November 2021
American Medical Informatics Association Annual Symposium; San Diego, CA
Pang C, Chen R, Reyes Nieva H, Kalluri KS, Sun TY, Jiang X, Rodriguez VA, Natarajan K. Characterization and Comparison of Embedding Algorithms for Phenotyping across a Network of Observational Databases
November 2020
American Medical Informatics Association Annual Symposium; Virtual event due to COVID-19
Boskey E, Tabaac A, Wigell R, Wolf K, Lage I, Landrum S, Reyes Nieva H, Bearnot B, Streed C. Using patterns of missing EHR data to identify care disparities in gender diverse patients
October 2020
APHA Annual Meeting and Expo; Virtual event due to COVID-19
Reyes Nieva H, Blackley S, Streed C, Fiskio J, Zhou L. High physician and clinic-level variation in documentation of sexual orientation and gender identity in the electronic health record
April 2018
New England Science Symposium; Boston, MA
Received the Ruth and William Silen, MD Oral Presentation Award
Mlaver E, Dalal AK, Reyes Nieva H, Chang F, Hanna J, Ravindran S, McNally K, Stade D, Morrison C, Bates D, Dykes P. An Analysis of Patient Portal Use in the Acute Care Setting
November 2015
American Medical Informatics Association Annual Symposium; San Francisco, CA
Reyes Nieva H, Palm K, Zucconi T. Advocacy and Implementation: Gathering Sexual Orientation and Gender Identity Demographics in the Clinical Setting
September 2015
GLMA 33rd Annual Conference; Portland, OR
Reyes Nieva H, Doctor JN, Friedberg MW, Birks C, Fiskio JM, Volk LA, Linder JA. Comparing Clinicians’ Perception of Their Own and Their Peers’ Antibiotic Prescribing to Actual Antibiotic Prescribing for Acute Respiratory Infections in Primary Care
April 2014
Society of General Internal Medicine Annual Meeting; San Diego, CA
Received the Outstanding Quality and Patient Safety Oral Presentation Award