Publications



		2026 (13 Publications)
		Vijay M Tiyyala, Cerina Dubois, Clarissa Madar, Ryan Vandrey, Johannes Thrul, Mark Dredze, John W Ayers. Characterizing public comments via Regulations.gov in response to proposed cannabis rescheduling in the United States. Addiction, 2026. [PDF] [Bibtex] [Close] @article{https://doi.org/10.1111/add.70410, abstract = {Abstract Aims The United States Drug Enforcement Administration's (DEA) proposed rescheduling of cannabis from Schedule I to Schedule III under the Controlled Substances Act marks a significant shift in federal policy. Understanding public sentiment toward this policy is critical for guiding the current cannabis rescheduling effort as well as future reforms. The objective of this study is to characterize public comments submitted to Regulations.gov regarding the DEA's cannabis rescheduling proposal and identify underlying justifications for support or opposition. Design A mixed-methods analysis was conducted. Setting Online public comments submitted to Regulations.gov regarding the DEA's cannabis rescheduling proposal. Participants 42 913 public comments submitted between 21 May and 22 July 2024. Measurements Comments were analyzed for sentiment towards the proposed rescheduling (support, oppose or insufficient rescheduling) and thematic justifications using manual and automated natural language processing techniques. A two-stage annotation approach was employed: manual coding of 200 randomly sampled comments by multiple independent evaluators, followed by automated classification of all 42 913 comments using open source Large Language Model (LLM) validated against the manual annotations. 
Findings Using LLM-based classification validated against human annotations [88\% agreement, F1 (harmonic mean of precision and recall) = 0.86], we found that among 42 913 comments, 28.85\% [95\% confidence interval (CI) = 28.44\%--29.24\%] supported rescheduling, 6.74\% (95\% CI = 6.50\%--6.99\%) opposed and 63.50\% (95\% CI = 63.06\%--63.99\%) deemed the proposal insufficient, favoring further rescheduling or complete de-scheduling of cannabis. Among the 200 manually annotated comments, therapeutic benefits (56.7\%, 95\% CI = 46.7\%--66.7\%) and economic impacts (27.8\%, 95\% CI = 18.9\%--37.8\%) were the most common justifications among supporters. Public health risks (100.0\%, 95\% CI = 100.0\%--100.0\%), addictiveness concerns (71.4\%, 95\% CI = 42.9\%--100.0\%) and concerns about underage use (57.1\%, 95\% CI = 14.3\%--85.7\%) were predominant in opposing comments. Insufficient rescheduling comments cited therapeutic benefits (37.8\%, 95\% CI = 28.5\%--48.0\%), economic impacts (28.6\%, 95\% CI = 19.4\%--37.8\%) and criminal justice reform (26.5\%, 95\% CI = 18.4\%--35.7\%) as primary justifications. Conclusions Public sentiment on Regulations.gov supports the United States Drug Enforcement Administration's proposal for cannabis rescheduling, though the majority views the proposed Schedule III classification as inadequate and supports further rescheduling or complete de-scheduling of cannabis.}, author = {Tiyyala, Vijay M. 
and Dubois, Cerina and Madar, Clarissa and Vandrey, Ryan and Thrul, Johannes and Dredze, Mark and Ayers, John W.}, date-added = {2026-05-08 19:28:49 -0400}, date-modified = {2026-05-08 19:29:31 -0400}, file = {https://doi.org/10.1111/add.70410}, journal = {Addiction}, month = {April}, title = {Characterizing public comments via Regulations.gov in response to proposed cannabis rescheduling in the United States}, year = {2026}, bdsk-url-1 = {https://onlinelibrary.wiley.com/doi/abs/10.1111/add.70410}, bdsk-url-2 = {https://doi.org/10.1111/add.70410} }

		Ashley M Witmer, Carlos Aguirre, Susanna Lewis, Zoena Howland, Lida King, Susan Han, Mark Dredze, Holly C Wilcox, James Aluri. The relationship between stressors and seeking a referral to mental health care among college students using an online screening platform. Journal of American College Health, 2026;0(0):1--10. [PDF] [Bibtex] [Close] @article{Witmer24042026, abstract = { Objective: To examine whether experiencing recent stressors predicted seeking a referral to mental health care among college students using the Interactive Screening Program (ISP). Participants: A total of sixty-three thousand four hundred seven college students across 58 postsecondary institutions who participated in the ISP from 2009 to 2024. Methods: Natural Language Processing (NLP) methods categorized stressors students reported and determined whether students sought a referral to care. Results: Reporting any recent stressor increased the likelihood of seeking a referral (aOR: 1.56, 95\% CI: 1.42--1.71). Relationship issues, abuse, mental health, workplace issues, and family dynamics were positively associated with referral-seeking, while financial issues and academic responsibilities were negatively associated. Each additional stressor was associated with a 16\% increase in referral-seeking odds (aOR: 1.16, 95\% CI: 1.14--1.18). Conclusions: Findings provide insight into the stressors college students face and highlight the ISP's proactive ability to identify students experiencing various stressors and connect them to care. }, author = {Ashley M. Witmer and Carlos Aguirre and Susanna Lewis and Zoena Howland and Lida King and Susan Han and Mark Dredze and Holly C. 
Wilcox and James Aluri}, date-added = {2026-05-08 19:27:28 -0400}, date-modified = {2026-05-08 19:27:28 -0400}, doi = {10.1080/07448481.2026.2649523}, file = {https://doi.org/10.1080/07448481.2026.2649523}, journal = {Journal of American College Health}, note = {PMID: 42030196}, number = {0}, pages = {1--10}, publisher = {Taylor & Francis}, title = {The relationship between stressors and seeking a referral to mental health care among college students using an online screening platform}, volume = {0}, year = {2026}, bdsk-url-1 = {https://doi.org/10.1080/07448481.2026.2649523} }

		Zhen Wang, Fan Bai, Zhongyan Luo, Jinyan Su, Kaiser Sun, Xinle Yu, Jieyuan Liu, Kun Zhou, Claire Cardie, Mark Dredze, Eric P Xing, Zhiting Hu. FIRE-Bench: Evaluating Agents on the Rediscovery of Scientific Insights. International Conference on Machine Learning (ICML), 2026. [Bibtex] [Close] @inproceedings{Wang:2026aa, abstract = {Autonomous agents powered by large language models (LLMs) promise to accelerate scientific discovery, but rigorously evaluating their capacity for verifiable discovery remains a central challenge. Existing benchmarks face a trade-off: they either rely on LLM-as-judge evaluations of automatically generated papers, or optimize isolated performance metrics that provide only coarse proxies for scientific insight. To address this, we introduce FIRE-Bench (Full-cycle Insight Rediscovery Evaluation), a benchmark that evaluates agents through the rediscovery of established findings from recent, high-impact machine learning research. Agents are given only a high-level research question from a published study and must autonomously design experiments, implement code, execute their plans, and derive conclusions supported by empirical evidence. We evaluate a range of state-of-the-art agents with frontier model backbones, such as gpt-5, on FIRE-Bench. Our results show that full-cycle scientific research remains challenging for current agent systems: even the strongest agents achieve limited rediscovery success, exhibit high variance across runs, and display recurring failure modes in experimental design, execution, and evidence-based reasoning. Overall, FIRE-Bench provides a rigorous and diagnostic framework for measuring progress toward reliable agent-driven scientific discovery.}, author = {Zhen Wang and Fan Bai and Zhongyan Luo and Jinyan Su and Kaiser Sun and Xinle Yu and Jieyuan Liu and Kun Zhou and Claire Cardie and Mark Dredze and Eric P. 
Xing and Zhiting Hu}, booktitle = {International Conference on Machine Learning (ICML)}, date-added = {2026-04-30 20:30:25 -0400}, date-modified = {2026-04-30 20:31:44 -0400}, title = {FIRE-Bench: Evaluating Agents on the Rediscovery of Scientific Insights}, year = {2026} }

		Matthew R Allen, Vijay M Tiyyala, Karthik Ramesh, Nimit Desai, Job Shiach, Mark Dredze, John W Ayers. A Novel Workflow for Artificial Intelligence-Enhanced Patient Messaging Services. Applied Clinical Informatics, 2026;17(02):269--274. [PDF] [Bibtex] [Close] @article{Allen2026, author = {Allen, Matthew R. and Tiyyala, Vijay M. and Ramesh, Karthik and Desai, Nimit and Shiach, Job and Dredze, Mark and Ayers, John W.}, date-added = {2026-04-27 22:23:11 -0400}, date-modified = {2026-04-27 22:23:48 -0400}, doi = {10.1055/a-2852-9026}, file = {https://thieme-connect.de/products/ejournals/abstract/10.1055/a-2852-9026}, journal = {Applied Clinical Informatics}, note = {Published online: 27 April 2026}, number = {02}, pages = {269--274}, title = {A Novel Workflow for Artificial Intelligence-Enhanced Patient Messaging Services}, volume = {17}, year = {2026}, bdsk-url-1 = {https://doi.org/10.1055/a-2852-9026} }

		Akanksha Suresh, Priyanka Fernandes, Ayah Zirikly, Elaine C Thompson, Anne R Links, Keith Harrigian, Brant Chee, Mark Dredze, Mary Catherine Beach, Somnath Saha. Exploring the Patient's Lifeworld: A Qualitative Study of Personalizing Language in Electronic Health Records. Journal of General Internal Medicine, 2026. [PDF] [Bibtex] [Close] @article{Suresh:2026aa, abstract = {Physicians' understanding of patients as persons can bolster relationships and help patients feel seen. Including personal details in electronic health records (EHR) may enhance care, but the types of details physicians document about patients' lifeworlds remain largely uncharacterized.}, author = {Suresh, Akanksha and Fernandes, Priyanka and Zirikly, Ayah and Thompson, Elaine C. and Links, Anne R. and Harrigian, Keith and Chee, Brant and Dredze, Mark and Beach, Mary Catherine and Saha, Somnath}, date = {2026/03/31}, date-added = {2026-04-19 22:26:28 -0400}, date-modified = {2026-04-19 22:26:39 -0400}, doi = {10.1007/s11606-026-10396-5}, file = {https://doi.org/10.1007/s11606-026-10396-5}, id = {Suresh2026}, isbn = {1525-1497}, journal = {Journal of General Internal Medicine}, title = {Exploring the Patient's Lifeworld: A Qualitative Study of Personalizing Language in Electronic Health Records}, year = {2026}, bdsk-url-1 = {https://doi.org/10.1007/s11606-026-10396-5} }

		James Aluri, Ashley M Witmer, Susanna Lewis, Carlos Aguirre, Zoena Howland, Lida King, Susan Han, Mark Dredze, Holly C Wilcox. Sociodemographic and mental health characteristics associated with participating in a digital mental health screening platform on college campuses. Journal of American College Health, 2026;0(0):1--7. [PDF] [Bibtex] [Close] @article{Aluri10032026, abstract = {The authors sought to identify sociodemographic and mental health characteristics associated with participating in an online mental health screening platform. US college students who participated in the Interactive Screening Program (ISP, n = 2,817) or the Healthy Minds Study (HMS, n = 17,880) at ten campuses that administered both between 2016 and 2022. The HMS was used as a proxy of the student body at that institution. This cross-sectional analysis linked data from the HMS and the ISP. Logistic regression was used to model the odds of ISP participation. In multivariate regression models, female gender (aOR 1.87, 95\% CI: 1.01-3.47; REF: male gender) and PHQ-9 ≥ 10 (aOR 5.74, 95\% CI: 2.05-16.06, REF: PHQ-9 < 10) were associated with ISP participation. The ISP is more likely to be used by students with moderate depression symptoms. The ISP's ability to engage male students could be strengthened.}, author = {James Aluri and Ashley M. Witmer and Susanna Lewis and Carlos Aguirre and Zoena Howland and Lida King and Susan Han and Mark Dredze and Holly C. Wilcox}, date-added = {2026-04-19 22:20:28 -0400}, date-modified = {2026-04-19 22:20:43 -0400}, doi = {10.1080/07448481.2026.2629343}, file = {https://doi.org/10.1080/07448481.2026.2629343}, journal = {Journal of American College Health}, note = {PMID: 41807134}, number = {0}, pages = {1--7}, publisher = {Taylor & Francis}, title = {Sociodemographic and mental health characteristics associated with participating in a digital mental health screening platform on college campuses}, volume = {0}, year = {2026}, bdsk-url-1 = {https://doi.org/10.1080/07448481.2026.2629343} }

		Kaiser Sun, Fan Bai, Mark Dredze. Task Matters: Knowledge Requirements Shape LLM Responses to Context--Memory Conflict. Association for Computational Linguistics (ACL) (Findings), 2026. [Bibtex] [Close] @inproceedings{Sun:2026aa, abstract = {Large language models (LLMs) rely on both contextual knowledge and parametric memory, yet these sources can conflict. Prior analysis largely focused on contextual question answering, suggesting that models tend to favor parametric knowledge under conflict, but this setting assumes that tasks should always rely on the provided passage. It therefore remains unclear how LLMs behave when \emph{tasks demand different kinds and degrees of knowledge utilization}. We address this gap with a model-agnostic diagnostic framework that holds underlying knowledge constant while injecting controlled conflicts across tasks with varying knowledge requirements. Evaluating representative open-source LLMs, we find that: (1) performance degradation under conflict correlates with a task's knowledge reliance rather than conflict plausibility alone; (2) strategies such as explanatory rationales or reiteration increase context reliance, helping context-only tasks but harming those that require parametric knowledge; and (3) these behaviors bias model-based evaluation, raising concerns about the reliability of LLMs as judges. Together, our findings show that context--memory conflict is fundamentally task-dependent and motivate task-aware approaches to balancing context and memory in LLM deployment and evaluation.}, author = {Kaiser Sun and Fan Bai and Mark Dredze}, booktitle = {Association for Computational Linguistics (ACL) (Findings)}, date-added = {2026-04-06 23:30:53 -0400}, date-modified = {2026-04-06 23:31:17 -0400}, title = {Task Matters: Knowledge Requirements Shape LLM Responses to Context--Memory Conflict}, year = {2026} }

		Fatima Jahara, Mark Dredze, Sharon Levy. Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles. Association for Computational Linguistics (ACL) (Findings), 2026. [Bibtex] [Close] @inproceedings{Jahara:2026aa, abstract = {While recent safety guardrails effectively suppress overtly biased outputs, subtler forms of social bias emerge during complex logical reasoning tasks that evade current evaluation benchmarks. To fill this gap, we introduce a new evaluation framework, PRIME (Puzzle Reasoning for Implicit Biases in Model Evaluation), that uses logic grid puzzles to systematically probe the influence of social stereotypes on logical reasoning and decision making in LLMs. Our use of logic puzzles enables automatic generation and verification, as well as variability in complexity and biased settings. PRIME includes stereotypical, anti-stereotypical, and neutral puzzle variants generated from a shared puzzle structure, allowing for controlled and fine-grained comparisons. We evaluate multiple model families across puzzle sizes and test the effectiveness of prompt-based mitigation strategies. Focusing our experiments on gender stereotypes, our findings highlight that models consistently reason more accurately when solutions align with stereotypical associations. This demonstrates the significance of PRIME for diagnosing and quantifying social biases perpetuated in the deductive reasoning of LLMs, where fairness is critical.}, author = {Fatima Jahara and Mark Dredze and Sharon Levy}, booktitle = {Association for Computational Linguistics (ACL) (Findings)}, date-added = {2026-04-06 23:29:58 -0400}, date-modified = {2026-04-06 23:30:43 -0400}, title = {Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles}, year = {2026} }

		Jen-tse Huang, Chang Chen, Shiyang Lai, Wenxuan Wang, Michelle R Kaufman, Mark Dredze. Probing Multimodal Large Language Models on Cognitive Biases in Chinese Short-Video Misinformation. Association for Computational Linguistics (ACL) (Findings), 2026. [Bibtex] [Close] @inproceedings{Huang:2026ab, abstract = {Short-video platforms have become major channels for misinformation, where deceptive claims frequently leverage visual experiments and social cues. While Multimodal Large Language Models (MLLMs) have demonstrated impressive reasoning capabilities, their robustness against misinformation entangled with cognitive biases remains under-explored. In this paper, we introduce a comprehensive evaluation framework using a high-quality, manually annotated dataset of 200 short videos spanning four health domains. This dataset provides fine-grained annotations for three deceptive patterns---experimental errors, logical fallacies, and fabricated claims---each verified by evidence such as national standards and academic literature. We evaluate eight frontier MLLMs across five modality settings. Experimental results demonstrate that Gemini-2.5-Pro achieves the highest performance in the multimodal setting with a belief score of 71.5/100, while o3 performs the worst at 35.2. Furthermore, we investigate social cues that induce false beliefs in videos and find that models are susceptible to biases like authoritative channel IDs.}, author = {Jen-tse Huang and Chang Chen and Shiyang Lai and Wenxuan Wang and Michelle R Kaufman and Mark Dredze}, booktitle = {Association for Computational Linguistics (ACL) (Findings)}, date-added = {2026-04-06 23:29:01 -0400}, date-modified = {2026-04-06 23:29:46 -0400}, title = {Probing Multimodal Large Language Models on Cognitive Biases in Chinese Short-Video Misinformation}, year = {2026} }

		Heyuan Huang, Alexandra DeLucia, Vijay Murari Tiyyala, Mark Dredze. MedScore: Generalizable Factuality Evaluation of Free-Form Medical Answers by Domain-adapted Claim Decomposition and Verification. Association for Computational Linguistics (ACL) (Findings), 2026. [Bibtex] [Close] @inproceedings{Huang:2026aa, abstract = {While Large Language Models (LLMs) can generate fluent and convincing responses, they are not necessarily correct. This is especially apparent in the popular decompose-then-verify factuality evaluation pipeline, where LLMs evaluate generations by decomposing the generations into individual, valid claims. Factuality evaluation is especially important for medical answers, since incorrect medical information could seriously harm the patient. However, existing factuality systems are a poor match for the medical domain, as they are typically only evaluated on objective, entity-centric, formulaic texts such as biographies and historical topics. This differs from condition-dependent, conversational, hypothetical, sentence-structure diverse, and subjective medical answers, which makes decomposition into valid facts challenging. We propose MedScore, a new pipeline to decompose medical answers into condition-aware valid facts and verify against in-domain corpora. Our method extracts up to three times more valid facts than existing methods, reducing hallucination and vague references, and retaining condition-dependency in facts. 
The resulting factuality score significantly varies by decomposition method, verification corpus, and used backbone LLM, highlighting the importance of customizing each step for reliable factuality evaluation by using our open-source generalizable and modularized pipeline for domain adaptation.}, author = {Heyuan Huang and Alexandra DeLucia and Vijay Murari Tiyyala and Mark Dredze}, booktitle = {Association for Computational Linguistics (ACL) (Findings)}, date-added = {2026-04-06 23:24:34 -0400}, date-modified = {2026-04-06 23:25:16 -0400}, title = {MedScore: Generalizable Factuality Evaluation of Free-Form Medical Answers by Domain-adapted Claim Decomposition and Verification}, year = {2026} }

		Minqian Liu, Ioana Baldini, David Rabinowitz, David S Rosenberg, Sebastian Gehrmann, Mark Dredze. Domain Generalizable AI Guardrails with Augmented Policy Training. Association for Computational Linguistics (ACL), 2026. [Bibtex] [Close] @inproceedings{Liu:2026aa, abstract = {AI guardrail systems support usage policies by determining whether a user query or a generated response is allowed or forbidden under the policy. Fine-tuned guardrails -- such as LlamaGuard and ShieldGemma -- include policy definitions in prompts during training that can be updated during inference to aid generalization. However, our analysis reveals that these models still overfit the training policies, which prevents adaptation to new domains. We propose Augmented Policy Training (APT), a training recipe that enhances guardrail adaptability to unseen policies by using a suite of policy perturbation strategies during training to reduce overfitting and increase generalization. Notably, a small 1B model trained in this manner achieves comparable or better performance than existing 8B guardrails on unseen policies. Our work reveals critical limitations of existing AI guardrails, offers a promising solution, and provides actionable insights for adapting systems to new domains and policies.}, author = {Minqian Liu and Ioana Baldini and David Rabinowitz and David S Rosenberg and Sebastian Gehrmann and Mark Dredze}, booktitle = {Association for Computational Linguistics (ACL)}, date-added = {2026-04-06 23:23:15 -0400}, date-modified = {2026-04-06 23:24:05 -0400}, title = {Domain Generalizable AI Guardrails with Augmented Policy Training}, year = {2026} }

		Ashley M Witmer, Carlos Aguirre, Susanna Lewis, Zoena Howland, Lida King, Susan Han, Mark Dredze, Holly C Wilcox, James Aluri. Higher-risk psychiatric and sociodemographic characteristics predicting referral-seeking among college students using the interactive screening program. Journal of Affective Disorders, 2026. [PDF] [Bibtex] [Close] @article{Witmer:2026aa, abstract = {This study examined whether sociodemographic and higher-risk psychiatric characteristics predicted whether college students sought a referral to mental health care through the American Foundation for Suicide Prevention's Interactive Screening Program (ISP). Using data from 63,407 college students across 58 institutions of higher education from 2009 to 2024, natural language processing (NLP) methods were used to classify student-counselor online message exchanges to determine whether students sought a referral to care, with robust agreement between human coders and the NLP model. Logistic regression models were used to examine the relationships between seeking a referral and higher-risk psychiatric and sociodemographic characteristics of ISP participants. Students were more likely to seek a referral if they had a PHQ-9 score ≥ 10 (aOR: 1.55, 95% CI: 1.44--1.66), were not currently in therapy (aOR: 2.02, 95% CI: 1.80--2.28), reported recent self-harm (aOR: 1.13, 95% CI: 1.00--1.26), recent suicidal ideation (aOR: 1.22, 95% CI: 1.12--1.33), a lifetime suicide attempt (aOR: 1.10, 95% CI: 1.00--1.22), and were aged 25+ (aOR: 1.38, 95% CI: 1.29--1.47). Participants identifying as genderqueer (aOR: 0.70, 95% CI: 0.53--0.91), Hispanic/Latin(x) (aOR: 0.86, 95% CI: 0.78--0.96), and ``other'' race and ethnicity (aOR: 0.79, 95% CI: 0.63--0.97) were less likely to seek referrals. Findings underscore the ISP's effectiveness in connecting students with significant mental health challenges to care and ability to bridge gaps in care by facilitating connections to appropriate resources. 
However, disparities in referral-seeking among genderqueer, Hispanic/Latin(x), and participants of ``other'' races and ethnicities highlight the need for further work to address factors that might discourage help-seeking.}, author = {Ashley M. Witmer and Carlos Aguirre and Susanna Lewis and Zoena Howland and Lida King and Susan Han and Mark Dredze and Holly C. Wilcox and James Aluri}, date-added = {2026-02-24 16:12:35 -0500}, date-modified = {2026-02-24 16:14:40 -0500}, file = {https://doi.org/10.1016/j.jad.2026.121449}, journal = {Journal of Affective Disorders}, month = {15 June}, pages = {121449}, title = {Higher-risk psychiatric and sociodemographic characteristics predicting referral-seeking among college students using the interactive screening program}, volume = {403}, year = {2026} }

		Johannes Thrul, Nicholas Dobbins, Bernal Jimenez Gutierrez, Cerina Dubois, Clarissa Madar, Nazia Qureshi, Amrit Baral, Ahmed Hassoon, Paul Nagy, Mark Dredze, Ryan Vandrey. Electronic health records and cannabis in the Johns Hopkins medical system. "Cannabis use and health: Green slope, blue slope, or double black?". Symposium, Winter Brain Conference, Big Sky, MT., 2026. [Bibtex] [Close] @inproceedings{Thrul:2026aa, author = {Johannes Thrul and Nicholas Dobbins and Bernal Jimenez Gutierrez and Cerina Dubois and Clarissa Madar and Nazia Qureshi and Amrit Baral and Ahmed Hassoon and Paul Nagy and Mark Dredze and Ryan Vandrey}, booktitle = {"Cannabis use and health: Green slope, blue slope, or double black?". Symposium, Winter Brain Conference, Big Sky, MT.}, date-added = {2026-02-04 19:16:00 -0600}, date-modified = {2026-02-04 19:18:31 -0600}, keywords = {abstract}, month = {January}, title = {Electronic health records and cannabis in the Johns Hopkins medical system.}, year = {2026} }

		2025 (26 Publications)
		Brittany Nesbitt, Danielle Virgadamo, Carlos Aguirre, Matthew DeCamp, Mark Dredze, Keith Harrigian, Tenzin Lhaksampa, Jennifer M Meuchel, Aja M Meyer, Alex Walker, Ayah Zirikly, Margaret S Chisolm, Peter P Zandi, Leslie Miller. Testing a Dashboard Intervention for Tracking Digital Social Media Activity in Clinical Care of Individuals With Mood and Anxiety Disorders: Protocol and Design Considerations for a Pragmatic Randomized Trial. JMIR Res Protoc, 2025. [PDF] [Bibtex] [Close] @article{info:doi/10.2196/63279, abstract = {Background: Mood and anxiety disorders are prevalent mental health diagnoses. Numerous studies have shown that measurement-based care, which is used to monitor patient symptoms, functioning, and treatment progress and help guide clinical decisions and collaboration on treatment goals, can improve outcomes in patients with these disorders. Including digital information regarding patients' electronic communications and social media activity is an innovative approach to augmenting measurement-based care. Recent data indicate interest and willingness from both mental health clinicians and patients to share this type of digital information in treatment sessions. However, the clinical benefit of systematically doing this has been minimally evaluated. Objective: This study aims to develop an electronic dashboard for tracking patients' digital social activity and a protocol for a pragmatic randomized trial to test the feasibility and efficacy of using the dashboard in real-world clinical care of patients with depression or anxiety disorders. Methods: We developed a personalized electronic dashboard that tracks patients' electronic communications and social media activity, visualizes data on these interactions through key graphics and figures, and provides a tool that can be readily integrated into routine clinical care for use by clinicians and patients during treatment sessions. 
We then designed a randomized trial to evaluate the feasibility and effectiveness of using the electronic dashboard in real-world care compared to treatment as usual. The trial included patients aged ≥12 years with a mood or anxiety disorder who were receiving treatment in outpatient psychiatry clinics in the Johns Hopkins Health System and the Kennedy Krieger Institute. The primary outcome includes changes in patient-rated depression symptoms. Secondary outcomes include changes in patient-rated anxiety symptoms and overall functioning. Exploratory analyses examine the impact of the intervention on measures of therapeutic alliance and the detection of clinically actionable targets. Results: We successfully developed an electronic dashboard for tracking patients' electronic communications and social media activity, and we implemented a protocol for evaluating the feasibility and efficacy of using the dashboard in routine care for mood or anxiety disorders. The protocol was approved by the Johns Hopkins University School of Medicine Institutional Review Board. In this study, we report the technological, ethical, and pragmatic considerations in developing the dashboard and testing it in a real-world setting. Conclusions: The integration of an electronic dashboard to monitor digital social activity in mental health care treatment is novel. This study examines the feasibility and effectiveness of the dashboard and the challenges in implementing this protocol. The lessons learned from developing and implementing the study will inform ongoing discussions about the value of gathering collateral information on patients' digital social activity and how to do so in a way that is acceptable and clinically effective. 
Trial Registration: ClinicalTrials.gov NCT03925038; https://clinicaltrials.gov/study/NCT03925038 International Registered Report Identifier (IRRID): DERR1-10.2196/63279 }, author = {Nesbitt, Brittany and Virgadamo, Danielle and Aguirre, Carlos and DeCamp, Matthew and Dredze, Mark and Harrigian, Keith and Lhaksampa, Tenzin and Meuchel, Jennifer M and Meyer, Aja M and Walker, Alex and Zirikly, Ayah and Chisolm, Margaret S and Zandi, Peter P and Miller, Leslie}, date-added = {2026-04-19 22:23:28 -0400}, date-modified = {2026-04-19 22:23:36 -0400}, day = {5}, doi = {10.2196/63279}, file = {http://www.ncbi.nlm.nih.gov/pubmed/40053788}, issn = {1929-0748}, journal = {JMIR Res Protoc}, keywords = {digital mental health; mental health; dashboards; psychiatry; measurement-based care; electronic communication; social media; depression; anxiety; personal health information}, month = {Mar}, pages = {e63279}, title = {Testing a Dashboard Intervention for Tracking Digital Social Media Activity in Clinical Care of Individuals With Mood and Anxiety Disorders: Protocol and Design Considerations for a Pragmatic Randomized Trial}, volume = {14}, year = {2025}, bdsk-url-1 = {http://www.ncbi.nlm.nih.gov/pubmed/40053788}, bdsk-url-2 = {https://doi.org/10.2196/63279} }

		Leslie Miller, Tenzin C Lhaksampa, Alex Walker, Carlos Aguirre, Matthew DeCamp, Keith Harrigian, Jennifer Meuchel, Aja M Meyer, Brittany Nesbitt, Sazal Sthapit, Jason Straub, Danielle Virgadamo, Ayah Zirikly, Mark Dredze, Margaret S Chisolm, Peter P Zandi. Dashboard Intervention for Tracking Digital Social Media Activity in the Clinical Care of Individuals With Mood and Anxiety Disorders: Randomized Trial. JMIR Ment Health, 2025. [PDF] [Bibtex] [Close] @article{info:doi/10.2196/74212, abstract = {Background: Digital social activity, defined as interactions on social media and electronic communication platforms, has become increasingly important. Social factors impact mental health and can contribute to depression and anxiety. Therefore, incorporating digital social activity into routine mental health care has the potential to improve outcomes. Objective: This study aimed to compare treatment augmented with an electronic dashboard of patient's digital social activity versus treatment-as-usual on patient-rated outcomes symptoms of depression in a randomized trial of patients with mood and anxiety disorders. Methods: We developed a personalized electronic dashboard summarizing a participant's digital social activity. This dashboard, collaboratively discussed during mental health visits, was used to augment clinical care and tested in a randomized trial against treatment-as-usual. Clinicians and patients were recruited from outpatient psychiatry clinics. Patients were eligible if they were 12 years or older and were receiving treatment for a mood or anxiety disorder. Psychiatric symptoms measures for depression (primary outcome measure) and anxiety (secondary outcome measure) were obtained at each clinic visit as part of measurement-based standard of care. Baseline and 3-month follow-up assessments included a measure of mental health status and therapeutic alliance measure. Collateral information and clinical action scale were also collected at each visit. 
Results: A total of 103 patients consented to participate, 97 of whom were randomized to the dashboard arm (n=49) or the treatment-as-usual arm (n=48). There were no differences in psychiatry symptom rating scores or mental health status between the two arms. However, there was a significant increase in the discussion of digital social activity with the intervention, and it did not appear to change patient therapeutic alliance. Conclusions: The incorporation of a personalized electronic dashboard into clinical care was feasible and led to an increased discussion of digital social activity, but there was no impact on mental health outcomes. Trial Registration: Clinicaltrials.gov NCT03925038; https://clinicaltrials.gov/study/NCT03925038 International Registered Report Identifier (IRRID): RR2-10.2196/63279 }, author = {Miller, Leslie and Lhaksampa, Tenzin C and Walker, Alex and Aguirre, Carlos and DeCamp, Matthew and Harrigian, Keith and Meuchel, Jennifer and Meyer, Aja M and Nesbitt, Brittany and Sthapit, Sazal and Straub, Jason and Virgadamo, Danielle and Zirikly, Ayah and Dredze, Mark and Chisolm, Margaret S and Zandi, Peter P}, date-added = {2026-04-19 22:21:32 -0400}, date-modified = {2026-04-19 22:21:50 -0400}, day = {11}, doi = {10.2196/74212}, file = {https://doi.org/10.2196/74212}, issn = {2368-7959}, journal = {JMIR Ment Health}, keywords = {social media; mood disorders; anxiety; measurement-based care; randomized trial; electronic dashboard}, month = {Nov}, pages = {e74212}, title = {Dashboard Intervention for Tracking Digital Social Media Activity in the Clinical Care of Individuals With Mood and Anxiety Disorders: Randomized Trial}, volume = {12}, year = {2025}, bdsk-url-1 = {https://doi.org/10.2196/74212} }

		Ahmed Hassoon, Christine Lin, Hyun Yi (Jacqualine) Woo, Ruxandra Irimia, Jill A Marsteller, Anthony Li, Antonio Bander, Hubert Leo, Xiaoyi Peng, David Rastall, Mark Dredze. Guiding artificial intelligence in public health and medicine with epidemiology: A lifecycle framework for mitigating AI misalignment. Annals of Epidemiology, 2025. [PDF] [Bibtex] [Close] @article{Hassoon:2025aa, author = {Ahmed Hassoon and Christine Lin and Hyun Yi (Jacqualine) Woo and Ruxandra Irimia and Jill A. Marsteller and Anthony Li and Antonio Bander and Hubert Leo and Xiaoyi Peng and David Rastall and Mark Dredze}, date-added = {2025-11-20 14:21:44 -0500}, date-modified = {2025-11-23 18:36:26 -0500}, file = {https://www.sciencedirect.com/science/article/abs/pii/S1047279725003369}, journal = {Annals of Epidemiology}, title = {Guiding artificial intelligence in public health and medicine with epidemiology: A lifecycle framework for mitigating AI misalignment}, year = {2025} }

		Mahsa Yarmohammadi, Alexandra DeLucia, Lillian C Chen, Leslie Miller, Heyuan Huang, Sonal Joshi, Jonathan Lasko, Sarah Collica, Ryan Moore, Haoling Qiu, Peter P Zandi, Damianos Karakos, Mark Dredze. MedExpert: An Expert-Annotated Dataset for Medical Chatbot Evaluation. Machine Learning for Health (ML4H), 2025. [Bibtex] [Close] @inproceedings{Yarmohammadi:2025aa, author = {Mahsa Yarmohammadi and Alexandra DeLucia and Lillian C. Chen and Leslie Miller and Heyuan Huang and Sonal Joshi and Jonathan Lasko and Sarah Collica and Ryan Moore and Haoling Qiu and Peter P Zandi and Damianos Karakos and Mark Dredze}, booktitle = {Machine Learning for Health (ML4H)}, date-added = {2025-10-28 01:01:39 -0400}, date-modified = {2025-10-28 01:03:05 -0400}, title = {MedExpert: An Expert-Annotated Dataset for Medical Chatbot Evaluation}, year = {2025} }

		David A Broniatowski, Wei Zhong, Joseph R Simons, Amelia M Jamison, Mark Dredze, Lorien C Abroms. Explaining Twitter's inability to effectively moderate content during the COVID-19 pandemic. Scientific Reports, 2025. [PDF] [Bibtex] [Close] @article{Broniatowski:2025ab, author = {David A. Broniatowski and Wei Zhong and Joseph R. Simons and Amelia M. Jamison and Mark Dredze and Lorien C. Abroms}, date-added = {2025-09-30 09:48:41 -0400}, date-modified = {2025-10-16 07:42:17 -0400}, file = {https://www.nature.com/articles/s41598-025-20033-6}, journal = {Scientific Reports}, number = {36096}, title = {Explaining Twitter's inability to effectively moderate content during the COVID-19 pandemic}, volume = {15}, year = {2025} }

		Jonathan Liu, Damianos Karakos, Mark Dredze, Jonathan Lasko, Haoling Qiu, Mahsa Yarmohammadi. Statistically Significant Results on Biases and Errors of LLMs Do Not Guarantee Generalizable Results (Demo). NeurIPS Workshop on GenAI for Health: Potential, Trust, and Policy Compliance, 2025. [PDF] [Bibtex] [Close] @inproceedings{Liu:2025aa, author = {Jonathan Liu and Damianos Karakos and Mark Dredze and Jonathan Lasko and Haoling Qiu and Mahsa Yarmohammadi}, booktitle = {NeurIPS Workshop on GenAI for Health: Potential, Trust, and Policy Compliance}, date-added = {2025-09-27 22:10:43 -0400}, date-modified = {2025-11-07 11:34:10 -0500}, file = {https://arxiv.org/abs/2511.02246}, keywords = {workshop}, title = {Statistically Significant Results on Biases and Errors of LLMs Do Not Guarantee Generalizable Results (Demo)}, year = {2025} }

		Michelle R Kaufman, Kate Wright, Rosalyn Shin, Elise Tirza Ohene-Kyei, Oluwatimilehin Fatoki, Tahilin Sanchez Karver, Carlos Aguirre, Mark Dredze, Ayah Zirikly. The power of social media activism in the #YesAllWomen Movement. Humanities & Social Sciences Communications, 2025. [PDF] [Bibtex] [Close] @article{Kaufman:2025aa, author = {Michelle R Kaufman and Kate Wright and Rosalyn Shin and Elise Tirza Ohene-Kyei and Oluwatimilehin Fatoki and Tahilin Sanchez Karver and Carlos Aguirre and Mark Dredze and Ayah Zirikly}, date-added = {2025-09-26 18:30:57 -0400}, date-modified = {2025-09-26 18:32:09 -0400}, file = {https://doi.org/10.1057/s41599-025-05647-5}, journal = {Humanities \& Social Sciences Communications}, number = {1469}, title = {The power of social media activism in the \#YesAllWomen Movement}, volume = {12}, year = {2025} }

		Karan Desai, Vijay M Tiyyala, Pranav Tiyyala, Atharva Yeola, Alejandra Gallegos-Rangel, Alejandro Montiel-Torres, Matthew R Allen, Mark Dredze, Ryan G Vandrey, Johannes Thrul, Eric C Leas, Mike Hogarth, Davey M Smith, John W Ayers. Waldo: Automated Discovery of Adverse Events from Unstructured Self Reports. PLOS Digital Health, 2025. [PDF] [Bibtex] [Close] @article{Desai:2025aa, author = {Karan Desai and Vijay M. Tiyyala and Pranav Tiyyala and Atharva Yeola and Alejandra Gallegos-Rangel and Alejandro Montiel-Torres and Matthew R. Allen and Mark Dredze and Ryan G. Vandrey and Johannes Thrul and Eric C. Leas and Mike Hogarth and Davey M. Smith and John W. Ayers}, date-added = {2025-09-13 22:09:59 -0400}, date-modified = {2026-04-19 22:22:32 -0400}, file = {https://doi.org/10.1371/journal.pdig.0001011}, journal = {PLOS Digital Health}, title = {Waldo: Automated Discovery of Adverse Events from Unstructured Self Reports}, year = {2025} }

		Leslie Miller, Tenzin Lhaksampa, Alex Walker, Carlos Aguirre, Matthew DeCamp, Keith Harrigian, Jennifer Meuchel, Aja M Meyer, Brittany Nesbitt, Sazal Sthapit, Jason Straub, Danielle Virgadamo, Ayah Zirikly, Mark Dredze, Margaret Chisolm, Peter P Zandi. Results from a Randomized Trial of a Dashboard Intervention for Tracking Digital Social Media Activity in Clinical Care of Individuals with Mood and Anxiety Disorders. JMIR Mental Health, 2025. [PDF] [Bibtex] [Close] @article{Miller:2025aa, author = {Leslie Miller and Tenzin Lhaksampa and Alex Walker and Carlos Aguirre and Matthew DeCamp and Keith Harrigian and Jennifer Meuchel and Aja M Meyer and Brittany Nesbitt and Sazal Sthapit and Jason Straub and Danielle Virgadamo and Ayah Zirikly and Mark Dredze and Margaret Chisolm and Peter P Zandi}, date-added = {2025-08-21 09:45:20 -0400}, date-modified = {2025-08-22 09:04:05 -0400}, file = {http://dx.doi.org/10.2196/74212}, journal = {JMIR Mental Health}, title = {Results from a Randomized Trial of a Dashboard Intervention for Tracking Digital Social Media Activity in Clinical Care of Individuals with Mood and Anxiety Disorders}, year = {2025} }

		Fan Bai, Hamid Hassanzadeh, Ardavan Saeedi, Mark Dredze. Label-Guided In-Context Learning for Named Entity Recognition. Empirical Methods in Natural Language Processing (EMNLP), 2025. [Bibtex] [Close] @inproceedings{Bai:2025aa, author = {Fan Bai and Hamid Hassanzadeh and Ardavan Saeedi and Mark Dredze}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, date-added = {2025-08-20 16:00:48 -0400}, date-modified = {2025-08-20 16:01:06 -0400}, title = {Label-Guided In-Context Learning for Named Entity Recognition}, year = {2025} }

		Isabel Cachola, Daniel Khashabi, Mark Dredze. Evaluating the Evaluators: Are readability metrics good measures of readability? Empirical Methods in Natural Language Processing (EMNLP), 2025. [Bibtex] [Close] @inproceedings{Cachola:2025aa, author = {Isabel Cachola and Daniel Khashabi and Mark Dredze}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, date-added = {2025-08-20 16:00:20 -0400}, date-modified = {2025-08-20 16:00:41 -0400}, title = {Evaluating the Evaluators: Are readability metrics good measures of readability?}, year = {2025} }

		Miriam Wanner, Benjamin Van Durme, Mark Dredze. DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation. Empirical Methods in Natural Language Processing (EMNLP), 2025. [Bibtex] [Close] @inproceedings{Wanner:2025aa, author = {Miriam Wanner and Benjamin Van Durme and Mark Dredze}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, date-added = {2025-08-20 15:59:28 -0400}, date-modified = {2025-08-20 16:00:14 -0400}, title = {DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation}, year = {2025} }

		Mary Catherine Beach, Keith Harrigian, Brant Chee, Anne Links, Alya Ahmad, Ayah Zirikly, Dingfen Han, Emily Boss, Shari Lawson, Mustapha Saheed, Yahan Li, Mark Dredze, Somnath Saha. Racial bias in clinician assessment of patient credibility: Evidence from electronic health records. PLOS ONE, 2025. [Bibtex] [Close] @article{Beach:2025aa, author = {Mary Catherine Beach and Keith Harrigian and Brant Chee and Anne Links and Alya Ahmad and Ayah Zirikly and Dingfen Han and Emily Boss and Shari Lawson and Mustapha Saheed and Yahan Li and Mark Dredze and Somnath Saha}, date-added = {2025-07-14 08:03:38 -0400}, date-modified = {2025-07-14 08:04:20 -0400}, journal = {PLOS ONE}, title = {Racial bias in clinician assessment of patient credibility: Evidence from electronic health records}, year = {2025} }

		Kuleen Sasse, Carlos Aguirre, Isabel Cachola, Sharon Levy, Mark Dredze. Making FETCH! Happen: Finding Emergent Dog Whistles Through Common Habitats. Association for Computational Linguistics (ACL), 2025. [Bibtex] [Close] @inproceedings{Sasse:2025aa, author = {Kuleen Sasse and Carlos Aguirre and Isabel Cachola and Sharon Levy and Mark Dredze}, booktitle = {Association for Computational Linguistics (ACL)}, date-added = {2025-05-15 22:48:51 -0400}, date-modified = {2025-05-15 22:49:33 -0400}, title = {Making FETCH! Happen: Finding Emergent Dog Whistles Through Common Habitats}, year = {2025} }

		Johannes Thrul, Nic Dobbins, Cerina Dubois, Clarissa Madar, Paul Nagy, Mark Dredze, Ryan Vandrey. Using large language models to identify and extract medicinal cannabis use data from electronic medical records in a US academic health system. Research Society on Marijuana, 2025. [Bibtex] [Close] @inproceedings{Thrul:2025aa, author = {Johannes Thrul and Nic Dobbins and Cerina Dubois and Clarissa Madar and Paul Nagy and Mark Dredze and Ryan Vandrey}, booktitle = {Research Society on Marijuana}, date-added = {2025-04-30 23:29:35 -0400}, date-modified = {2025-04-30 23:32:31 -0400}, keywords = {abstract}, title = {Using large language models to identify and extract medicinal cannabis use data from electronic medical records in a US academic health system}, year = {2025} }

		Sebastian Gehrmann, Claire Huang, Xian Teng, Sergei Yurovski, Iyanuoluwa Shode, Chirag S Patel, Arjun Bhorkar, Naveen Thomas, John Doucette, David Rosenberg, Mark Dredze, David Rabinowitz. Understanding and Mitigating Risks of Generative AI in Financial Services. ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2025. [PDF] [Bibtex] [Close] @inproceedings{Gehrmann:2025aa, author = {Sebastian Gehrmann and Claire Huang and Xian Teng and Sergei Yurovski and Iyanuoluwa Shode and Chirag S. Patel and Arjun Bhorkar and Naveen Thomas and John Doucette and David Rosenberg and Mark Dredze and David Rabinowitz}, booktitle = {ACM Conference on Fairness, Accountability, and Transparency (FAccT)}, date-added = {2025-04-11 17:03:15 -0400}, date-modified = {2025-04-29 21:36:15 -0400}, file = {https://arxiv.org/abs/2504.20086}, title = {Understanding and Mitigating Risks of Generative AI in Financial Services}, year = {2025} }

		Kaiser Sun, Mark Dredze. Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models. NAACL Workshop on Representation Learning for NLP (RepL4NLP), 2025. [Bibtex] [Close] @inproceedings{Sun:2025aa, author = {Kaiser Sun and Mark Dredze}, booktitle = {NAACL Workshop on Representation Learning for NLP (RepL4NLP)}, date-added = {2025-03-17 14:23:15 -0400}, date-modified = {2025-03-17 14:24:38 -0400}, title = {Amuro \& Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models}, year = {2025} }

		Craig Leets, Angela Nielsen, Meghan Buchanan, Ayah Zirikly, Mark Dredze, Leslie Miller, Holly C Wilcox, Carlos Gallo. Outreach to Adolescents in Crisis on Social Media, YouthLine's Safe Social Spaces, 2019 to 2024. American Journal of Public Health (AJPH), 2025. [PDF] [Bibtex] [Close] @article{Leets:2025aa, author = {Craig Leets and Angela Nielsen and Meghan Buchanan and Ayah Zirikly and Mark Dredze and Leslie Miller and Holly C. Wilcox and Carlos Gallo}, date-added = {2025-03-13 00:16:46 -0400}, date-modified = {2025-03-13 00:17:53 -0400}, file = {https://doi.org/10.2105/AJPH.2024.307970}, journal = {American Journal of Public Health (AJPH)}, title = {Outreach to Adolescents in Crisis on Social Media, YouthLine's Safe Social Spaces, 2019 to 2024}, year = {2025} }

		David Broniatowski, Wei Zhong, Joseph Simons, Mark Dredze, Lorien Abroms. A Simulation Approach to Determining the Flexibility of Social Media Platforms. International Engineering Systems Symposium (CESUN), 2025. [Bibtex] [Close] @inproceedings{Broniatowski:2025aa, author = {David Broniatowski and Wei Zhong and Joseph Simons and Mark Dredze and Lorien Abroms}, booktitle = {International Engineering Systems Symposium (CESUN)}, date-added = {2025-03-13 00:04:44 -0400}, date-modified = {2025-03-13 00:05:26 -0400}, keywords = {workshop}, title = {A Simulation Approach to Determining the Flexibility of Social Media Platforms}, year = {2025} }

		Alexander Spangher, Tenghao Huang, Yiqin Huang, Lucas Spangher, Sewon Min, Mark Dredze. A Novel Multi-Document Retrieval Benchmark: Journalist Source-Selection in Newswriting. NAACL Workshop on Knowledge-Augmented Methods for NLP (KnowledgeNLP), 2025. [Bibtex] [Close] @inproceedings{Spangher:2025aa, author = {Alexander Spangher and Tenghao Huang and Yiqin Huang and Lucas Spangher and Sewon Min and Mark Dredze}, booktitle = {NAACL Workshop on Knowledge-Augmented Methods for NLP (KnowledgeNLP)}, date-added = {2025-03-11 06:14:50 -0500}, date-modified = {2025-03-11 06:16:15 -0500}, keywords = {workshop}, title = {A Novel Multi-Document Retrieval Benchmark: Journalist Source-Selection in Newswriting}, year = {2025} }

		Tyrus Vong, Nicholas Rizer, Vedant Jain, Valerie L Thompson, Mark Dredze, Eili Y Klein, Jeremiah S Hinson, Tanjala Purnell, Stephen Kwak, Tinsay Woreta, Alexandra T Strauss. Automated identification of incidental hepatic steatosis on Emergency Department imaging using large language models. Hepatol Commun, 2025. [PDF] [Bibtex] [Close] @article{Vong:2025aa, abstract = {BACKGROUND: Hepatic steatosis is a precursor to more severe liver disease, increasing morbidity and mortality risks. In the Emergency Department, routine abdominal imaging often reveals incidental hepatic steatosis that goes undiagnosed due to the acute nature of encounters. Imaging reports in the electronic health record contain valuable information not easily accessible as discrete data elements. We hypothesized that large language models could reliably detect hepatic steatosis from reports without extensive natural language processing training. METHODS: We identified 200 adults who had CT abdominal imaging in the Emergency Department between August 1, 2016, and December 31, 2023. Using text from imaging reports and structured prompts, 3 Azure OpenAI models (ChatGPT 3.5, 4, 4o) identified patients with hepatic steatosis. We evaluated model performance regarding accuracy, inter-rater reliability, sensitivity, and specificity compared to physician reviews. RESULTS: The accuracy for the models was 96.2% for v3.5, 98.3% for v4, and 98.8% for v4o. Inter-rater reliability ranged from 0.99 to 1.00 across 10 iterations. Mean model confidence scores were 2.9 (SD 0.8) for v3.5, 3.9 (SD 0.3) for v4, and 4.0 (SD 0.07) for v4o. Incorrect evaluations were 76 (3.8%) for v3.5, 34 (1.7%) for v4, and 25 (1.3%) for v4o. All models showed sensitivity and specificity above 0.9. CONCLUSIONS: Large language models can assist in identifying incidental conditions from imaging reports that otherwise may be missed opportunities for early disease intervention. 
Large language models are a democratization of natural language processing by allowing for a user-friendly, expansive analyses of electronic medical records without requiring the development of complex natural language processing models.}, author = {Tyrus Vong and Nicholas Rizer and Vedant Jain and Valerie L Thompson and Mark Dredze and Eili Y Klein and Jeremiah S Hinson and Tanjala Purnell and Stephen Kwak and Tinsay Woreta and Alexandra T Strauss}, date-added = {2025-03-06 17:37:40 -0500}, date-modified = {2025-03-06 17:41:16 -0500}, doi = {10.1097/HC9.0000000000000638}, file = {https://pubmed.ncbi.nlm.nih.gov/39969431/}, journal = {Hepatol Commun}, month = {March}, number = {3}, title = {Automated identification of incidental hepatic steatosis on Emergency Department imaging using large language models.}, volume = {9}, year = {2025}, bdsk-url-1 = {https://doi.org/10.1097/HC9.0000000000000638} }

		Bang An, Shiyue Zhang, Mark Dredze. RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models. Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), 2025. [PDF] [Bibtex] [Close] @inproceedings{An:2025aa, abstract = {Efforts to ensure the safety of large language models (LLMs) include safety fine-tuning, evaluation, and red teaming. However, despite the widespread use of the Retrieval-Augmented Generation (RAG) framework, AI safety work focuses on standard LLMs, which means we know little about how RAG use cases change a model's safety profile. We conduct a detailed comparative analysis of RAG and non-RAG frameworks with eleven LLMs. We find that RAG can make models less safe and change their safety profile. We explore the causes of this change and find that even combinations of safe models with safe documents can cause unsafe generations. In addition, we evaluate some existing red teaming methods for RAG settings and show that they are less effective than when used for non-RAG settings. Our work highlights the need for safety research and red-teaming methods specifically tailored for RAG LLMs.}, annote = {(<b>Ranked in the top 3% of 28m research outputs by <a href="https://www.altmetric.com/details/176504970#score"><span class="pub_link">Altmetric</span></a></b>)}, author = {Bang An and Shiyue Zhang and Mark Dredze}, booktitle = {Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL)}, date-added = {2025-01-22 21:55:23 -0500}, date-modified = {2025-05-12 00:58:00 -0400}, file = {https://aclanthology.org/2025.naacl-long.281/}, title = {RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models}, year = {2025} } (Ranked in the top 3% of 28m research outputs by Altmetric)

		Hanjie Chen, Zhouxiang Fang, Yash Singla, Mark Dredze. Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions. Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), 2025. [PDF] [Bibtex] [Close] @inproceedings{Chen:2025aa, author = {Hanjie Chen and Zhouxiang Fang and Yash Singla and Mark Dredze}, booktitle = {Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL)}, date-added = {2025-01-22 21:54:11 -0500}, date-modified = {2026-03-23 23:08:41 -0400}, file = {https://aclanthology.org/2025.naacl-long.182v2.pdf}, title = {Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions}, year = {2025} }

		David Mueller, Mark Dredze, Nicholas Andrews. Can Optimization Trajectories Explain Multi-Task Transfer? Transactions on Machine Learning Research, 2025. [PDF] [Bibtex] [Close] @article{Mueller:2025aa, author = {David Mueller and Mark Dredze and Nicholas Andrews}, date-added = {2025-01-20 08:30:47 -0500}, date-modified = {2025-01-20 08:32:18 -0500}, file = {https://openreview.net/forum?id=QQE5j2OsLW}, journal = {Transactions on Machine Learning Research}, title = {Can Optimization Trajectories Explain Multi-Task Transfer?}, year = {2025} }

		Emma Pierson, Divya Shanmugam, Rajiv Movva, Jon Kleinberg, Monica Agrawal, Mark Dredze, Kadija Ferryman, Judy Wawira Gichoya, Dan Jurafsky, Pang Wei Koh, Karen Levy, Sendhil Mullainathan, Ziad Obermeyer, Harini Suresh, Keyon Vafa. Using Large Language Models to Promote Health Equity. The New England Journal of Medicine Artificial Intelligence (NEJM AI), 2025. [PDF] [Bibtex] [Close] @article{Pierson:2025aa, abstract = {While the discussion about the effects of large language models (LLMs) on health equity has been largely cautionary, LLMs also present significant opportunities for improving health equity. We highlight three such opportunities: improving the detection of human bias; creating structured datasets relevant to health equity; and improving equity of access to health information.}, author = {Emma Pierson and Divya Shanmugam and Rajiv Movva and Jon Kleinberg and Monica Agrawal and Mark Dredze and Kadija Ferryman and Judy Wawira Gichoya and Dan Jurafsky and Pang Wei Koh and Karen Levy and Sendhil Mullainathan and Ziad Obermeyer and Harini Suresh and Keyon Vafa}, date-added = {2025-01-13 13:25:32 -0500}, date-modified = {2025-01-13 13:28:32 -0500}, file = {https://doi.org/10.1056/AIp2400889}, journal = {The New England Journal of Medicine Artificial Intelligence (NEJM AI)}, title = {Using Large Language Models to Promote Health Equity}, year = {2025} }

		Brittany Nesbitt, Danielle Virgadamo, Carlos Aguirre, Matthew DeCamp, Mark Dredze, Keith Harrigian, Tenzin Lhaksampa, Jenn Meuchel, Aja Meyer, Alex Walker, Ayah Zirikly, Margaret Chisolm, Peter Zandi, Leslie Miller. Design Considerations and Protocol for Testing a Dashboard Intervention for Tracking Digital Social Media Activity in Clinical Care of Individuals with Mood and Anxiety Disorders. JMIR Research Protocols, 2025. [Bibtex] [Close] @article{Nesbitt:2025aa, author = {Brittany Nesbitt and Danielle Virgadamo and Carlos Aguirre and Matthew DeCamp and Mark Dredze and Keith Harrigian and Tenzin Lhaksampa and Jenn Meuchel and Aja Meyer and Alex Walker and Ayah Zirikly and Margaret Chisolm and Peter Zandi and Leslie Miller}, date-added = {2025-01-02 13:47:45 -0500}, date-modified = {2025-01-02 13:52:00 -0500}, journal = {JMIR Research Protocols}, title = {Design Considerations and Protocol for Testing a Dashboard Intervention for Tracking Digital Social Media Activity in Clinical Care of Individuals with Mood and Anxiety Disorders}, year = {2025} }

		2024 (20 Publications)
		Karan S Desai, Hollie Keene, Mark Dredze, Davey M Smith, John W Ayers. Characterizing Services Advertised on Crisis Pregnancy Center Websites. JAMA Internal Medicine, 2024. [PDF] [Bibtex] [Close] @article{Desai:2024aa, abstract = {Crisis pregnancy centers (CPCs) are nonprofit organizations dedicated to an antiabortion agenda and exist primarily to promote alternatives to abortion. Concerns about CPCs' claims and practices are widespread among clinicians and rulemakers, but those concerns rely on anecdotes or small community studies. We systematically investigated CPC websites to characterize where they operate and what services they advertise.}, annote = {(<b>Ranked in the top 0.3% of 27m research outputs by <a href="https://jamanetwork.altmetric.com/details/171421751#score"><span class="pub_link">Altmetric</span></a></b>)}, author = {Karan S. Desai and Hollie Keene and Mark Dredze and Davey M. Smith and John W. Ayers}, date-added = {2024-12-02 21:23:02 -0500}, date-modified = {2024-12-11 21:02:25 -0500}, file = {https://jamanetwork.com/journals/jamainternalmedicine/article-abstract/2827342}, journal = {JAMA Internal Medicine}, title = {Characterizing Services Advertised on Crisis Pregnancy Center Websites}, year = {2024} } (Ranked in the top 0.3% of 27m research outputs by Altmetric)

		Alanna J Bergman, Katherine C McNabb, Michael V Relf, Mark Dredze. "Where No One Has Gone Before": Questions to Ensure the Ethical, Rigorous, and Thoughtful Application of Artificial Intelligence in the Analysis of HIV Research. Journal of the Association of Nurses in AIDS Care, 2024. [PDF] [Bibtex] [Close] @article{Bergman:2024aa, abstract = {ChatGPT, an artificial intelligence (AI) system released by OpenAI on November 30th, 2022, has upended scientific and educational paradigms, reshaping the way that we think about teaching, writing, and now research. Since that time, qualitative data analytic software programs such as ATLAS.ti have quickly incorporated AI into their programs to assist with or even replace human coding. Qualitative research is key to understanding the complexity and nuance of HIV-related behaviors, through descriptive and historical textual research, as well as the lived experiences of people with HIV. This commentary weighs the pros and cons of the use of AI coding in HIV-related qualitative research. We pose guiding questions that may help researchers evaluate the application and scope of AI in qualitative research as determined by the research question, underlying epistemology, and goal(s). Qualitative data encompasses a variety of media, methodologies, and styles that exist on a spectrum underpinned by epistemology. The research question and the data sources are informed by the researcher's epistemological viewpoint. Given the heterogeneous applications of qualitative research in nursing, medicine, and public health there are circumstances where qualitative AI coding is appropriate, but this should be congruent with the aims and underlying epistemology of the research.}, author = {Alanna J Bergman and Katherine C McNabb and Michael V Relf and Mark Dredze}, date-added = {2024-11-13 19:54:55 -0500}, date-modified = {2024-12-04 09:22:15 -0500}, file = {https://doi.org/10.1097/JNC.0000000000000483}, journal = {Journal of the Association of Nurses in AIDS Care}, number = {5}, pages = {450-455}, title = {"Where No One Has Gone Before": Questions to Ensure the Ethical, Rigorous, and Thoughtful Application of Artificial Intelligence in the Analysis of HIV Research}, volume = {35}, year = {2024} }

		Fan Bai, Keith Harrigian, Joel Stremmel, Hamid Hassanzadeh, Ardavan Saeedi, Mark Dredze. Give me Some Hard Questions: Synthetic Data Generation for Clinical QA. Machine Learning for Health (ML4H) (Findings), 2024. [Bibtex] [Close] @inproceedings{Bai:2024ab, author = {Fan Bai and Keith Harrigian and Joel Stremmel and Hamid Hassanzadeh and Ardavan Saeedi and Mark Dredze}, booktitle = {Machine Learning for Health (ML4H) (Findings)}, date-added = {2024-11-03 21:34:34 -0500}, date-modified = {2024-11-03 21:35:14 -0500}, title = {Give me Some Hard Questions: Synthetic Data Generation for Clinical QA}, year = {2024} }

		Yahan Li, Keith Harrigian, Ayah Zirikly, Mark Dredze. Are Clinical T5 Models Better for Clinical Text? Machine Learning for Health (ML4H), 2024. [Bibtex] [Close] @inproceedings{Li:2024aa, author = {Yahan Li and Keith Harrigian and Ayah Zirikly and Mark Dredze}, booktitle = {Machine Learning for Health (ML4H)}, date-added = {2024-11-03 21:33:15 -0500}, date-modified = {2024-11-03 21:34:27 -0500}, title = {Are Clinical T5 Models Better for Clinical Text?}, year = {2024} }

		Matthew R Allen, Gwenyth Portillo Wightman, Zechariah Zhu, Adam Poliak, Davey M Smith, Mark Dredze, John W Ayers. Pharmacovigilance in the Age of Legalized Cannabis: Using Social Media to Monitor Drug--Drug Interactions Between Immunosuppressants and Cannabis-Derived Products. Drug Safety, 2024. [PDF] [Bibtex] [Close] @article{Allen:2024aa, author = {Matthew R. Allen and Gwenyth Portillo Wightman and Zechariah Zhu and Adam Poliak and Davey M. Smith and Mark Dredze and John W. Ayers}, date-added = {2024-09-27 10:40:59 -0400}, date-modified = {2024-09-27 10:42:39 -0400}, file = {https://doi.org/10.1007/s40264-024-01481-x}, journal = {Drug Safety}, title = {Pharmacovigilance in the Age of Legalized Cannabis: Using Social Media to Monitor Drug--Drug Interactions Between Immunosuppressants and Cannabis-Derived Products}, year = {2024} }

		Jordi Armengol-Estapé, Lingyu Li, Sebastian Gehrmann, Achintya Gopal, David S Rosenberg, Gideon S Mann, Mark Dredze. Can We Statically Locate Knowledge in Large Language Models? Financial Domain and Toxicity Reduction Case Studies. EMNLP Workshop on analyzing and interpreting neural networks for NLP (BlackboxNLP), 2024. [PDF] [Bibtex] [Close] @inproceedings{armengol-estape2024can, author = {Jordi Armengol-Estap{\'e} and Lingyu Li and Sebastian Gehrmann and Achintya Gopal and David S Rosenberg and Gideon S. Mann and Mark Dredze}, booktitle = {EMNLP Workshop on analyzing and interpreting neural networks for NLP (BlackboxNLP)}, date-modified = {2024-12-04 09:36:53 -0500}, file = {https://aclanthology.org/2024.blackboxnlp-1.9/}, keywords = {workshop}, title = {Can We Statically Locate Knowledge in Large Language Models? Financial Domain and Toxicity Reduction Case Studies}, year = {2024}, bdsk-url-1 = {https://openreview.net/forum?id=ryqwV6rrAi} }

		Sharon Levy, William Adler, Tahilin Sanchez Karver, Mark Dredze, Michelle R Kaufman. Gender Bias in Decision-Making with Large Language Models: A Study of Relationship Conflicts. Empirical Methods in Natural Language Processing (EMNLP) (Findings), 2024. [PDF] [Bibtex] [Close] @inproceedings{William-Adler:2024aa, author = {Sharon Levy and William Adler and Tahilin Sanchez Karver and Mark Dredze and Michelle R Kaufman}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP) (Findings)}, date-added = {2024-09-20 13:27:50 -0400}, date-modified = {2024-12-04 09:37:22 -0500}, file = {https://arxiv.org/abs/2410.11084}, title = {Gender Bias in Decision-Making with Large Language Models: A Study of Relationship Conflicts}, year = {2024} }

		Sharon Levy, Tahilin Sanchez Karver, William Adler, Michelle R Kaufman, Mark Dredze. Evaluating Biases in Context-Dependent Sexual and Reproductive Health Questions. Empirical Methods in Natural Language Processing (EMNLP) (Findings), 2024. [PDF] [Bibtex] [Close] @inproceedings{Levy:2024aa, author = {Sharon Levy and Tahilin Sanchez Karver and William Adler and Michelle R Kaufman and Mark Dredze}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP) (Findings)}, date-added = {2024-09-20 13:27:23 -0400}, date-modified = {2024-12-04 09:37:58 -0500}, file = {https://aclanthology.org/2024.findings-emnlp.332.pdf}, title = {Evaluating Biases in Context-Dependent Sexual and Reproductive Health Questions}, year = {2024} }

		Fan Bai, Junmo Kang, Gabriel Stanovsky, Dayne Freitag, Mark Dredze, Alan Ritter. Schema-Driven Information Extraction from Heterogeneous Tables. Empirical Methods in Natural Language Processing (EMNLP) (Findings), 2024. [PDF] [Bibtex] [Close] @inproceedings{Bai:2024aa, author = {Fan Bai and Junmo Kang and Gabriel Stanovsky and Dayne Freitag and Mark Dredze and Alan Ritter}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP) (Findings)}, date-added = {2024-09-20 13:26:48 -0400}, date-modified = {2024-12-04 09:38:43 -0500}, file = {https://aclanthology.org/2024.findings-emnlp.600.pdf}, title = {Schema-Driven Information Extraction from Heterogeneous Tables}, year = {2024} }

		Alexander Spangher, Nanyun Peng, Sebastian Gehrmann, Mark Dredze. Do LLMs Plan Like Human Writers? Comparing Journalist Coverage of Press Releases with LLMs. Empirical Methods in Natural Language Processing (EMNLP), 2024. [PDF] [Bibtex] [Close] @inproceedings{Spangher:2024aa, author = {Alexander Spangher and Nanyun Peng and Sebastian Gehrmann and Mark Dredze}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, date-added = {2024-09-20 13:26:07 -0400}, date-modified = {2024-11-25 07:38:24 -0500}, file = {https://aclanthology.org/2024.emnlp-main.1216/}, title = {Do LLMs Plan Like Human Writers? Comparing Journalist Coverage of Press Releases with LLMs}, year = {2024} }

		Mark Dredze, Genta Indra Winata, Prabhanjan Kambadur, Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, David S Rosenberg, Sebastian Gehrmann. Academics Can Contribute to Domain-Specialized Language Models. Empirical Methods in Natural Language Processing (EMNLP), 2024. [PDF] [Bibtex] [Close] @inproceedings{Dredze:2024aa, author = {Mark Dredze and Genta Indra Winata and Prabhanjan Kambadur and Shijie Wu and Ozan Irsoy and Steven Lu and Vadim Dabravolski and David S Rosenberg and Sebastian Gehrmann}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, date-added = {2024-09-20 13:22:42 -0400}, date-modified = {2024-12-04 09:39:55 -0500}, file = {https://aclanthology.org/2024.emnlp-main.293/}, title = {Academics Can Contribute to Domain-Specialized Language Models}, year = {2024} }

		Carlos Aguirre, Kuleen Sasse, Isabel Cachola, Mark Dredze. Selecting Shots for Demographic Fairness in Few-Shot Learning with Large Language Models. EMNLP Workshop on NLP for Positive Impact, 2024. [PDF] [Bibtex] [Close] @inproceedings{Aguirre:2024ab, author = {Carlos Aguirre and Kuleen Sasse and Isabel Cachola and Mark Dredze}, booktitle = {EMNLP Workshop on NLP for Positive Impact}, date-added = {2024-09-11 23:42:53 -0400}, date-modified = {2024-12-04 10:05:39 -0500}, file = {https://aclanthology.org/2024.nlp4pi-1.4/}, keywords = {workshop}, title = {Selecting Shots for Demographic Fairness in Few-Shot Learning with Large Language Models}, year = {2024} }

		Carlos Aguirre, Mark Dredze. Transferring Fairness using Multi-Task Learning with Limited Demographic Information. EMNLP Workshop on NLP for Positive Impact, 2024. [PDF] [Bibtex] [Close] @inproceedings{Aguirre:2024aa, author = {Carlos Aguirre and Mark Dredze}, booktitle = {EMNLP Workshop on NLP for Positive Impact}, date-added = {2024-09-11 23:41:43 -0400}, date-modified = {2024-12-04 10:06:11 -0500}, file = {https://aclanthology.org/2024.nlp4pi-1.3/}, keywords = {workshop}, title = {Transferring Fairness using Multi-Task Learning with Limited Demographic Information}, year = {2024} }

		Keith Harrigian, Diep Tran, Tina Tang, Anthony Gonzales, Paul Nagy, Hadi Kharrazi, Mark Dredze, Cindy X Cai. Improving the identification of diabetic retinopathy and related conditions in the electronic health record using natural language processing methods. Ophthalmology Science, 2024. [PDF] [Bibtex] [Close] @article{Harrigian:2024aa, author = {Keith Harrigian and Diep Tran and Tina Tang and Anthony Gonzales and Paul Nagy and Hadi Kharrazi and Mark Dredze and Cindy X. Cai}, date-added = {2024-08-12 12:27:10 -0500}, date-modified = {2024-08-12 12:28:27 -0500}, file = {https://doi.org/10.1016/j.xops.2024.100578}, journal = {Ophthalmology Science}, title = {Improving the identification of diabetic retinopathy and related conditions in the electronic health record using natural language processing methods}, year = {2024} }

		John W Ayers, Adam Poliak, Nikolas T Beros, Michael Paul, Mark Dredze, Michael Hogarth, Davey M Smith. A Digital Cohort Approach for Social Media Monitoring: A Cohort Study of People Who Vape E-Cigarettes. American Journal of Preventive Medicine (AJPM), 2024;67(1):147-154. [PDF] [Bibtex] [Close] @article{Ayers:2024aa, abstract = {Introduction. The evidence hierarchy in public health emphasizes longitudinal studies, whereas social media monitoring relies on aggregate analyses. Authors propose integrating longitudinal analyses into social media monitoring by creating a digital cohort of individual account holders, as demonstrated by a case study analysis of people who vape. Methods. All English language X posts mentioning vape or vaping were collected from January 1, 2017 through December 31, 2020. The digital cohort was composed of people who self-reported vaping and posted at least 10 times about vaping during the study period to determine the (1) prevalence, (2) success rate, and (3) timing of cessation behaviors. Results. There were 25,112 instances where an account shared at least 10 posts about vaping, with 619 (95% CI= 616, 622) mean person-days and 43,810,531 cumulative person-days of observation. Among a random sample of accounts, 39% (95% CI= 35, 43) belonged to persons who vaped. Among this digital cohort, 27% (95% CI= 21, 33) reported making a quit attempt. For all first quit attempts, 26% (95% CI= 19, 33) were successful on the basis of their subsequent vaping posts. Among those with a failed first cessation attempt, 13% (95% CI= 6, 19) subsequently made an additional quit attempt, of whom 36% (95% CI= 11, 61) were successful. On average, a quit attempt occurred 531 days (95% CI= 474, 588) after their first vaping-related post. If their quit attempt failed, any second quit attempt occurred 361 days (95% CI= 250, 474) after their first quit attempt. Conclusions. 
By aligning with standard epidemiologic surveillance practices, this approach can greatly enhance the usefulness of social media monitoring in informing public health decision making, such as yielding insights into the timing of cessation behaviors among people who vape.}, author = {John W. Ayers and Adam Poliak and Nikolas T. Beros and Michael Paul and Mark Dredze and Michael Hogarth and Davey M. Smith}, date-added = {2024-07-01 16:16:40 -0400}, date-modified = {2024-07-01 16:19:44 -0400}, file = {https://doi.org/10.1016/j.amepre.2024.01.016}, journal = {American Journal of Preventive Medicine (AJPM)}, month = {July}, number = {1}, pages = {147-154}, title = {A Digital Cohort Approach for Social Media Monitoring: A Cohort Study of People Who Vape E-Cigarettes}, volume = {67}, year = {2024} }

		Eric C Leas, John W Ayers, Nimit Desai, Mark Dredze, Michael Hogarth, Davey M Smith. Using Large Language Models to Support Content Analysis: A Case Study of ChatGPT for Adverse Event Detection. Journal of Medical Internet Research (JMIR), 2024. @article{Leas:2024aa, abstract = {This study explores the potential of using large language models to assist content analysis by conducting a case study to identify adverse events (AEs) in social media posts. The case study compares ChatGPT's performance with human annotators' in detecting AEs associated with delta-8-tetrahydrocannabinol, a cannabis-derived product. Using the identical instructions given to human annotators, ChatGPT closely approximated human results, with a high degree of agreement noted: 94.4% (9436/10,000) for any AE detection (Fleiss κ=0.95) and 99.3% (9931/10,000) for serious AEs (κ=0.96). These findings suggest that ChatGPT has the potential to replicate human annotation accurately and efficiently. The study recognizes possible limitations, including concerns about the generalizability due to ChatGPT's training data, and prompts further research with different models, data sources, and content analysis tasks. The study highlights the promise of large language models for enhancing the efficiency of biomedical research.}, author = {Eric C Leas and John W Ayers and Nimit Desai and Mark Dredze and Michael Hogarth and Davey M Smith}, date-added = {2024-05-16 09:09:22 -0400}, date-modified = {2024-05-16 09:11:28 -0400}, file = {https://doi.org/10.2196/52499}, journal = {Journal of Medical Internet Research (JMIR)}, number = {e52499}, title = {Using Large Language Models to Support Content Analysis: A Case Study of ChatGPT for Adverse Event Detection}, volume = {26}, year = {2024} }

		David Mueller, Mark Dredze, Nicholas Andrews. Multi-Task Transfer Matters During Instruction-Tuning. Association for Computational Linguistics (ACL), 2024. @inproceedings{Mueller:2024aa, abstract = {Instruction-tuning trains a language model on hundreds of tasks jointly to improve a model's ability to learn in-context; however, the mechanisms that drive in-context learning are poorly understood and, as a result, the role of instruction-tuning on in-context generalization is poorly understood as well. In this work, we study the impact of instruction-tuning on multi-task transfer: how well a model's parameters adapt to an unseen task via fine-tuning. We find that instruction-tuning negatively impacts a model's transfer to unseen tasks, and that model transfer and in-context generalization are highly correlated, suggesting that this catastrophic forgetting may impact in-context learning. We study methods to improve model transfer, finding that multi-task training---how well the training tasks are optimized---can significantly impact ICL generalization; additionally, we find that continual training on unsupervised pre-training data can mitigate forgetting and improve ICL generalization as well. Finally, we demonstrate that, early into training, the impact of instruction-tuning on model transfer to tasks impacts in-context generalization on that task. Overall, we provide significant evidence that multi-task transfer is deeply connected to a model's ability to learn a task in-context.}, author = {David Mueller and Mark Dredze and Nicholas Andrews}, booktitle = {Association for Computational Linguistics (ACL)}, date-added = {2024-05-16 09:08:16 -0400}, date-modified = {2024-08-13 00:25:53 -0500}, file = {https://aclanthology.org/2024.findings-acl.883.pdf}, title = {Multi-Task Transfer Matters During Instruction-Tuning}, year = {2024} }

		Miriam Wanner, Seth Ebner, Zhengping Jiang, Mark Dredze, Benjamin Van Durme. A Closer Look at Claim Decomposition. Joint Conference on Lexical and Computational Semantics (SEM 2023), 2024. @inproceedings{Wanner:2024aa, abstract = {As generated text becomes more commonplace, it is increasingly important to evaluate how well-supported such text is by external knowledge sources. Many approaches for evaluating textual support rely on some method for decomposing text into its individual subclaims which are scored against a trusted reference. We investigate how various methods of claim decomposition -- especially LLM-based methods -- affect the result of an evaluation approach such as the recently proposed FActScore, finding that it is sensitive to the decomposition method used. This sensitivity arises because such metrics attribute overall textual support to the model that generated the text even though error can also come from the metric's decomposition step. To measure decomposition quality, we introduce an adaptation of FActScore, which we call DecompScore. We then propose an LLM-based approach to generating decompositions inspired by Bertrand Russell's theory of logical atomism and neo-Davidsonian semantics and demonstrate its improved decomposition quality over previous methods.}, author = {Miriam Wanner and Seth Ebner and Zhengping Jiang and Mark Dredze and Benjamin Van Durme}, booktitle = {Joint Conference on Lexical and Computational Semantics (SEM 2023)}, date-added = {2024-05-02 10:22:34 -0400}, date-modified = {2024-06-25 15:00:42 -0400}, file = {https://arxiv.org/abs/2403.11903}, title = {A Closer Look at Claim Decomposition}, year = {2024} }

		Matthew R Allen, Nimit Desai, Aiden Namazi, Eric Leas, Mark Dredze, Davey M Smith, John W Ayers. Characteristics of X (Formerly Twitter) Community Notes Addressing COVID-19 Vaccine Misinformation. JAMA, 2024. @article{10.1001/jama.2024.4800, abstract = {{Social media can magnify health misinformation, especially about vaccination. Platform countermeasures have included censoring, shadowbanning (limiting distribution without disclosure), and adding warning labels to problematic content. Yet, evaluating these countermeasures is challenging due to restrictive public disclosures about their inner workings. In late 2022, X (formerly Twitter) introduced Community Notes, a crowdsourced misinformation countermeasure. Anonymous volunteer contributors independently identify posts containing misinformation and propose corrections called ``notes.'' Notes labeled as helpful by contributors who disagreed on past notes (to rely on a diversity of perspectives) are shown alongside the original post. Because Community Notes is open source, we were able to evaluate the topics, accuracy, and credibility of notes addressing COVID-19 vaccination.}}, annote = {(<b>Ranked in the top 0.6% of 26m research outputs by <a href="https://jamanetwork.altmetric.com/details/162819169#score"><span class="pub_link">Altmetric</span></a></b>)}, author = {Matthew R. Allen and Nimit Desai and Aiden Namazi and Eric Leas and Mark Dredze and Davey M. Smith and John W. Ayers}, date-added = {2024-05-01 17:07:05 -0400}, date-modified = {2024-05-23 15:08:07 -0400}, doi = {10.1001/jama.2024.4800}, file = {https://doi.org/10.1001/jama.2024.4800}, journal = {JAMA}, month = {04}, title = {Characteristics of X (Formerly Twitter) Community Notes Addressing COVID-19 Vaccine Misinformation}, year = {2024}, bdsk-url-1 = {https://doi.org/10.1001/jama.2024.4800} }

		Paiheng Xu, David A Broniatowski, Mark Dredze. Twitter social mobility data reveal demographic variations in social distancing practices during the COVID-19 pandemic. Scientific Reports, 2024. @article{Xu:2024aa, author = {Paiheng Xu and David A Broniatowski and Mark Dredze}, date-added = {2024-01-12 15:39:59 -0500}, date-modified = {2024-01-12 15:40:57 -0500}, file = {https://doi.org/10.1038/s41598-024-51555-0}, journal = {Scientific Reports}, number = {1165}, title = {Twitter social mobility data reveal demographic variations in social distancing practices during the COVID-19 pandemic}, volume = {14}, year = {2024} }

		2023 (14 Publications)
		Keith Harrigian, Tina Tang, Anthony Franco Gonzales, Cindy Cai, Mark Dredze. An Eye on Clinical BERT: Investigating Language Model Generalization for Diabetic Eye Disease Phenotyping. Machine Learning for Health (ML4H) (Findings), 2023. @inproceedings{Harrigian:2023ab, abstract = {Diabetic eye disease is a major cause of blindness worldwide. The ability to monitor relevant clinical trajectories and detect lapses in care is critical to managing the disease and preventing blindness. Alas, much of the information necessary to support these goals is found only in the free text of the electronic medical record. To fill this information gap, we introduce a system for extracting evidence from clinical text of 19 clinical concepts related to diabetic eye disease and inferring relevant attributes for each. In developing this ophthalmology phenotype system, we are also afforded a unique opportunity to evaluate the effectiveness of clinical language models at adapting to new clinical domains. Across multiple training paradigms, we find that BERT language models pretrained on out-of-distribution clinical data offer no significant improvement over BERT language models pretrained on non-clinical data for our domain. Our study calls into question recent work which suggests advances in clinical language modeling are driven by improvements in the transfer of generalizable clinical knowledge.}, author = {Keith Harrigian and Tina Tang and Anthony Franco Gonzales and Cindy Cai and Mark Dredze}, booktitle = {Machine Learning for Health (ML4H) (Findings)}, date-added = {2023-11-01 21:52:52 -0400}, date-modified = {2024-02-29 22:37:43 -0500}, file = {https://arxiv.org/pdf/2311.08687.pdf}, title = {An Eye on Clinical BERT: Investigating Language Model Generalization for Diabetic Eye Disease Phenotyping}, year = {2023} }

		Alexandra DeLucia, Mark Dredze, Anna L Buczak. A Multi-instance Learning Approach to Civil Unrest Event Detection on Twitter. RANLP Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE), 2023. @inproceedings{DeLucia:2023ab, author = {Alexandra DeLucia and Mark Dredze and Anna L. Buczak}, booktitle = {RANLP Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)}, date-added = {2023-09-06 09:35:17 -0400}, date-modified = {2023-09-06 10:00:02 -0400}, file = {2023_case_workshop.pdf}, title = {A Multi-instance Learning Approach to Civil Unrest Event Detection on Twitter}, year = {2023} }

		John W Ayers, Zechariah Zhu, Keith Harrigian, Gwenyth P Wightman, Mark Dredze, Steffanie A Strathdee, Davey M Smith. Managing HIV During the COVID-19 Pandemic: A Study of Help-Seeking Behaviors on a Social Media Forum. AIDS and Behavior, 2023. @article{Ayers:2023ac, author = {John W. Ayers and Zechariah Zhu and Keith Harrigian and Gwenyth P. Wightman and Mark Dredze and Steffanie A. Strathdee and Davey M. Smith}, date-added = {2023-08-17 13:58:25 -0400}, date-modified = {2023-08-17 14:00:28 -0400}, file = {https://doi.org/10.1007/s10461-023-04134-9}, journal = {AIDS and Behavior}, title = {Managing HIV During the COVID-19 Pandemic: A Study of Help-Seeking Behaviors on a Social Media Forum}, year = {2023} }

		Katherine Hoops, Paul S Nestadt, Mark Dredze. The case for social media standards on suicide. The Lancet Psychiatry, 2023. @article{Hoops:2023aa, author = {Katherine Hoops and Paul S Nestadt and Mark Dredze}, date-added = {2023-08-17 13:35:52 -0400}, date-modified = {2023-08-17 13:36:57 -0400}, file = {https://doi.org/10.1016/S2215-0366(23)00222-5}, journal = {The Lancet Psychiatry}, title = {The case for social media standards on suicide}, year = {2023} }

		John W Ayers, Zechariah Zhu, Adam Poliak, Eric C Leas, Mark Dredze, Michael Hogarth, Davey M Smith. Evaluating Artificial Intelligence Responses to Public Health Questions. JAMA Network Open, 2023. @article{Ayers:2023ab, abstract = {Artificial intelligence (AI) assistants have the potential to transform public health by offering accurate and actionable information to the general public. Unlike web-based knowledge resources (eg, Google Search) that return numerous results and require the searcher to synthesize information, AI assistants are designed to receive complex questions and provide specific answers. However, AI assistants often fail to recognize and respond to basic health questions. ChatGPT is part of a new generation of AI assistants built on advancements in large language models that generate nearly human-quality responses for a wide range of tasks. Although studies have focused on using ChatGPT as a supporting resource for healthcare professionals, it is unclear how well ChatGPT handles general health inquiries from the lay public. In this cross-sectional study, we evaluated ChatGPT responses to public health questions.}, annote = {(<b>Ranked in the top 0.15% of 23m research outputs by <a href="https://jamanetwork.altmetric.com/details/149693988#score"><span class="pub_link">Altmetric</span></a></b>)}, author = {John W. Ayers and Zechariah Zhu and Adam Poliak and Eric C. Leas and Mark Dredze and Michael Hogarth and Davey M. Smith}, date-added = {2023-06-07 20:29:06 -0500}, date-modified = {2023-06-12 00:39:48 -0400}, file = {https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2805756}, journal = {JAMA Network Open}, number = {6}, title = {Evaluating Artificial Intelligence Responses to Public Health Questions}, volume = {6}, year = {2023} }

		Gwenyth Portillo Wightman, Alexandra DeLucia, Mark Dredze. Strength in Numbers: Estimating Confidence of Large Language Models by Prompt Agreement. ACL Workshop on Trustworthy Natural Language Processing (TrustNLP), 2023. @inproceedings{Wightman:2023aa, abstract = {Large language models have achieved impressive few-shot performance on a wide variety of tasks. However, in many settings, users require confidence estimates for model predictions. While traditional classifiers produce scores for each label, language models instead produce scores for the generation which may not be well calibrated. We compare generations across diverse prompts and show that these can be used to create confidence scores. By utilizing more prompts we can get more precise confidence estimates and use response diversity as a proxy for confidence. We evaluate this approach across ten multiple-choice question-answering datasets using three models: T0, FLAN-T5, and GPT-3. In addition to analyzing multiple human written prompts, we automatically generate more prompts using a language model in order to produce finer-grained confidence estimates. Our method produces more calibrated confidence estimates compared to the log probability of the answer to a single prompt. These improvements could benefit users who rely on prediction confidence for integration into a larger system or in decision-making processes.}, author = {Gwenyth Portillo Wightman and Alexandra DeLucia and Mark Dredze}, booktitle = {ACL Workshop on Trustworthy Natural Language Processing (TrustNLP)}, date-added = {2023-05-23 09:41:22 -0400}, date-modified = {2024-05-23 15:33:03 -0400}, file = {https://aclanthology.org/2023.trustnlp-1.28/}, keywords = {workshop}, title = {Strength in Numbers: Estimating Confidence of Large Language Models by Prompt Agreement}, year = {2023} }

		Elliot Schumacher, James Mayfield, Mark Dredze. On the Surprising Effectiveness of Name Matching Alone in Autoregressive Entity Linking. ACL Workshop on Matching, 2023. @inproceedings{Schumacher:2023aa, abstract = {Fifteen years of work on entity linking has established the importance of different information sources in making linking decisions: mention and entity name similarity, contextual relevance, and features of the knowledge base. Modern state-of-the-art systems build on these features, including through neural representations (Wu et al., 2020). In contrast to this trend, the autoregressive language model GENRE (De Cao et al., 2021) generates normalized entity names for mentions and beats many other entity linking systems, despite making no use of knowledge base (KB) information. How is this possible? We analyze the behavior of GENRE on several entity linking datasets and demonstrate that its performance stems from memorization of name patterns. In contrast, it fails in cases that might benefit from using the KB. We experiment with a modification to the model to enable it to utilize KB information, highlighting challenges to incorporating traditional entity linking information sources into autoregressive models.}, author = {Elliot Schumacher and James Mayfield and Mark Dredze}, booktitle = {ACL Workshop on Matching}, date-added = {2023-05-23 00:39:48 -0400}, date-modified = {2024-05-23 15:35:21 -0400}, file = {https://aclanthology.org/2023.matching-1.6/}, keywords = {workshop}, title = {On the Surprising Effectiveness of Name Matching Alone in Autoregressive Entity Linking}, year = {2023} }

		Keith Harrigian, Ayah Zirikly, Brant Chee, Alya Ahmad, Anne Links, Som Saha, Mary Catherine Beach, Mark Dredze. Characterization of Stigmatizing Language in Medical Records. Association for Computational Linguistics (ACL), 2023. @inproceedings{Harrigian:2023aa, abstract = {Widespread disparities in clinical outcomes exist between different demographic groups in the United States. A new line of work in medical sociology has demonstrated physicians often use stigmatizing language in electronic medical records within certain groups, such as black patients, which may exacerbate disparities. In this study, we characterize these instances at scale using a series of domain-informed NLP techniques. We highlight important differences between this task and analogous bias-related tasks studied within the NLP community (e.g., classifying microaggressions). Our study establishes a foundation for NLP researchers to contribute timely insights to a problem domain brought to the forefront by recent legislation regarding clinical documentation transparency. We release data, code, and models.}, annote = {[<a href="https://physionet.org/content/stigmatizing-language/1.0.0/"><span class="pub_link">Code and Data</span></a>]}, author = {Keith Harrigian and Ayah Zirikly and Brant Chee and Alya Ahmad and Anne Links and Som Saha and Mary Catherine Beach and Mark Dredze}, booktitle = {Association for Computational Linguistics (ACL)}, date-added = {2023-05-02 22:14:59 -0400}, date-modified = {2023-05-02 22:15:20 -0400}, file = {https://aclanthology.org/2023.acl-short.28.pdf}, title = {Characterization of Stigmatizing Language in Medical Records}, year = {2023} }

		Shiyue Zhang, Shijie Wu, Ozan Irsoy, Steven Lu, Mohit Bansal, Mark Dredze, David Rosenberg. MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies. Association for Computational Linguistics (ACL), 2023. @inproceedings{Zhang:2023ab, abstract = {Autoregressive language models are trained by minimizing the cross-entropy of the model distribution Q relative to the data distribution P -- that is, minimizing the forward cross-entropy, which is equivalent to maximum likelihood estimation (MLE). We have observed that models trained in this way may ``over-generalize'', in the sense that they produce non-human-like text. Moreover, we believe that reverse cross-entropy, i.e., the cross-entropy of P relative to Q, is a better reflection of how a human would evaluate text generated by a model. Hence, we propose learning with MixCE, an objective that mixes the forward and reverse cross-entropies. We evaluate models trained with this objective on synthetic data settings (where P is known) and real data, and show that the resulting models yield better generated text without complex decoding strategies.}, author = {Shiyue Zhang and Shijie Wu and Ozan Irsoy and Steven Lu and Mohit Bansal and Mark Dredze and David Rosenberg}, booktitle = {Association for Computational Linguistics (ACL)}, date-added = {2023-05-02 22:14:10 -0400}, date-modified = {2024-05-23 15:41:14 -0400}, file = {https://aclanthology.org/2023.acl-long.502/}, title = {MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies}, year = {2023} }

		Elizabeth Spaulding, Gary Kazantsev, Mark Dredze. Joint End-to-end Semantic Proto-role Labeling. Association for Computational Linguistics (ACL), 2023. @inproceedings{Spaulding:2023aa, abstract = {Semantic proto-role labeling (SPRL) assigns properties to arguments based on a series of binary labels. While multiple studies have evaluated various approaches to SPRL, it has only been studied in-depth as a standalone task using gold predicate/argument pairs. How do SPRL systems perform as part of an information extraction pipeline? We model SPRL jointly with predicate-argument extraction using a deep transformer model. We find that proto-role labeling is surprisingly robust in this setting, with only a small decrease when using predicted arguments. We include a detailed analysis of each component of the joint system, and an error analysis to understand correlations in errors between system stages. Finally, we study the effects of annotation errors on SPRL.}, author = {Elizabeth Spaulding and Gary Kazantsev and Mark Dredze}, booktitle = {Association for Computational Linguistics (ACL)}, date-added = {2023-05-02 22:13:40 -0400}, date-modified = {2024-05-23 15:40:22 -0400}, file = {https://aclanthology.org/2023.acl-short.63/}, title = {Joint End-to-end Semantic Proto-role Labeling}, year = {2023} }

		Jingyu Zhang, Alexandra DeLucia, Chenyu Zhang, Mark Dredze. Geo-Seq2seq: Twitter User Geolocation on Noisy Data through Sequence to Sequence Learning. Association for Computational Linguistics (ACL), 2023. @inproceedings{Zhang:2023aa, abstract = {Location information can support social media analyses by providing geographic context. Some of the most accurate and popular Twitter geolocation systems rely on rule-based methods that examine the user-provided profile location, which fail to handle informal or noisy location names. We propose Geo-Seq2seq, a sequence-to-sequence (seq2seq) model for Twitter user geolocation that rewrites noisy, multilingual user-provided location strings into structured English location names. We train our system on tens of millions of multilingual location string and geotagged-tweet pairs. Compared to leading methods, our model vastly increases coverage (i.e., the number of users we can geolocate) while achieving comparable or superior accuracy. Our error analysis reveals that constrained decoding helps the model produce valid locations according to a location database. Finally, we measure biases across language, country of origin, and time to evaluate fairness, and find that while our model can generalize well to unseen temporal data, performance does vary by language and country.}, author = {Jingyu Zhang and Alexandra DeLucia and Chenyu Zhang and Mark Dredze}, booktitle = {Association for Computational Linguistics (ACL)}, date-added = {2023-05-02 22:12:41 -0400}, date-modified = {2024-05-23 15:36:20 -0400}, file = {https://aclanthology.org/2023.findings-acl.294/}, title = {Geo-Seq2seq: Twitter User Geolocation on Noisy Data through Sequence to Sequence Learning}, year = {2023} }

		John W Ayers, Adam Poliak, Mark Dredze, Eric C Leas, Zechariah Zhu, Jessica B Kelley, Dennis J Faix, Aaron M Goodman, Christopher A Longhurst, Michael Hogarth, Davey M Smith. Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum. JAMA Internal Medicine, 2023. @article{Ayers:2023aa, abstract = {Importance The rapid expansion of virtual health care has caused a surge in patient messages concomitant with more work and burnout among health care professionals. Artificial intelligence (AI) assistants could potentially aid in creating answers to patient questions by drafting responses that could be reviewed by clinicians. Objective To evaluate the ability of an AI chatbot assistant (ChatGPT), released in November 2022, to provide quality and empathetic responses to patient questions. Design, Setting, and Participants In this cross-sectional study, a public and nonidentifiable database of questions from a public social media forum (Reddit's r/AskDocs) was used to randomly draw 195 exchanges from October 2022 where a verified physician responded to a public question. Chatbot responses were generated by entering the original question into a fresh session (without prior questions having been asked in the session) on December 22 and 23, 2022. The original question along with anonymized and randomly ordered physician and chatbot responses were evaluated in triplicate by a team of licensed health care professionals. Evaluators chose ``which response was better'' and judged both ``the quality of information provided'' (very poor, poor, acceptable, good, or very good) and ``the empathy or bedside manner provided'' (not empathetic, slightly empathetic, moderately empathetic, empathetic, and very empathetic). Mean outcomes were ordered on a 1 to 5 scale and compared between chatbot and physicians. Results Of the 195 questions and responses, evaluators preferred chatbot responses to physician responses in 78.6% (95% CI, 75.0%-81.8%) of the 585 evaluations. Mean (IQR) physician responses were significantly shorter than chatbot responses (52 [17-62] words vs 211 [168-245] words; t = 25.4; P < .001). Chatbot responses were rated of significantly higher quality than physician responses (t = 13.3; P < .001). The proportion of responses rated as good or very good quality (≥ 4), for instance, was higher for chatbot than physicians (chatbot: 78.5%, 95% CI, 72.3%-84.1%; physicians: 22.1%, 95% CI, 16.4%-28.2%). This amounted to 3.6 times higher prevalence of good or very good quality responses for the chatbot. Chatbot responses were also rated significantly more empathetic than physician responses (t = 18.9; P < .001). The proportion of responses rated empathetic or very empathetic (≥4) was higher for chatbot than for physicians (chatbot: 45.1%, 95% CI, 38.5%-51.8%; physicians: 4.6%, 95% CI, 2.1%-7.7%). This amounted to 9.8 times higher prevalence of empathetic or very empathetic responses for the chatbot. Conclusions In this cross-sectional study, a chatbot generated quality and empathetic responses to patient questions posed in an online forum. Further exploration of this technology is warranted in clinical settings, such as using chatbot to draft responses that physicians could then edit. Randomized trials could assess further if using AI assistants might improve responses, lower clinician burnout, and improve patient outcomes.}, annote = {(<b>Ranked in #550 out of 25m (top 0.003%) research outputs by <a href="https://jamanetwork.altmetric.com/details/146820735#score"><span class="pub_link">Altmetric</span></a></b>; Most viewed JAMA IM article of 2023)}, author = {John W. Ayers and Adam Poliak and Mark Dredze and Eric C. Leas and Zechariah Zhu and Jessica B. Kelley and Dennis J. Faix and Aaron M. Goodman and Christopher A. Longhurst and Michael Hogarth and Davey M. Smith}, date-added = {2023-04-29 23:48:43 -0400}, date-modified = {2023-12-21 00:20:04 -0500}, file = {https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2804309}, journal = {JAMA Internal Medicine}, title = {Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum}, year = {2023} }

		Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, Gideon Mann. BloombergGPT: A Large Language Model for Finance. arXiv 2303.17564, 2023. @inproceedings{wu2023bloomberggpt, abstract = {The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in literature. In this work, we present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg's extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. As a next step, we plan to release training logs (Chronicles) detailing our experience in training BloombergGPT.}, annote = {(<b>Ranked in #4033 out of 23m (top 0.02%) research outputs by <a href="https://www.altmetric.com/details/144663962"><span class="pub_link">Altmetric</span></a></b>)}, author = {Shijie Wu and Ozan Irsoy and Steven Lu and Vadim Dabravolski and Mark Dredze and Sebastian Gehrmann and Prabhanjan Kambadur and David Rosenberg and Gideon Mann}, booktitle = {arXiv 2303.17564}, date-added = {2023-04-01 22:41:44 -0400}, date-modified = {2023-05-23 00:41:48 -0400}, eprint = {2303.17564}, file = {https://arxiv.org/abs/2303.17564}, primaryclass = {cs.LG}, title = {BloombergGPT: A Large Language Model for Finance}, year = {2023} }

		Alexandra DeLucia, Adam Poliak, Zechariah Zhu, Stephanie R Pitts, Mario Navarro, Sharareh Shojaie, John W Ayers, Mark Dredze. Automated Discovery of Perceived Health-related Concerns about E-cigarettes from Reddit. Annual Meeting of the Society for Research on Nicotine and Tobacco, 2023. @inproceedings{DeLucia:2023aa, author = {Alexandra DeLucia and Adam Poliak and Zechariah Zhu and Stephanie R. Pitts and Mario Navarro and Sharareh Shojaie and John W. Ayers and Mark Dredze}, booktitle = {Annual Meeting of the Society for Research on Nicotine and Tobacco}, date-added = {2022-11-10 22:14:48 -0800}, date-modified = {2024-05-23 15:35:56 -0400}, file = {https://www.cs.jhu.edu/~aadelucia/assets/research/Health%20Concerns%20SRNT%202023.pdf}, keywords = {abstract}, title = {Automated Discovery of Perceived Health-related Concerns about E-cigarettes from Reddit}, year = {2023} }

		2022 (18 Publications)
		Carlos Aguirre, Mark Dredze, Philip Resnik. Using Open-Ended Stressor Responses to Predict Depressive Symptoms across Demographics. Machine Learning for Health (ML4H) (Extended Abstract), 2022. @inproceedings{Aguirre:2022aa, abstract = {Stressors have been shown to be related to depression; however, this relationship is complex. In this work, we study the relationship between open-ended text responses about stressors and depressive symptoms across gender and racial/ethnic groups. Open-ended text responses in survey instruments promise more nuanced insights compared to more traditional inquiries, e.g. multiple choice questions, which may be crucial in settings such as mental health. However, they often require expensive analysis, such as coding and annotation. Language models offer solutions for automatically analyzing these responses, but with many possible risks, such as biases. We train language models using self-reported stressors to predict depressive symptoms, finding a relationship between stressors and depression. Further, we analyze stressors, finding different trends across demographic groups. Finally, we find that these differences translate to downstream performance differences across demographic groups.}, author = {Carlos Aguirre and Mark Dredze and Philip Resnik}, booktitle = {Machine Learning for Health (ML4H) (Extended Abstract)}, date-added = {2022-11-07 22:23:48 -0500}, date-modified = {2022-11-29 08:46:48 -0500}, file = {https://arxiv.org/abs/2211.07932}, keywords = {workshop}, title = {Using Open-Ended Stressor Responses to Predict Depressive Symptoms across Demographics}, year = {2022} }

		Moniba Keymanesh, Adrian Benton, Mark Dredze. What Makes Data-to-Text Generation Hard for Pretrained Language Models? EMNLP Workshop on Generation, Evaluation & Metrics (GEM), 2022. @inproceedings{Keymanesh:2023aa, abstract = {Expressing natural language descriptions of structured facts or relations -- data-to-text generation (D2T) -- increases the accessibility of structured knowledge repositories. Previous work shows that pre-trained language models (PLMs) perform remarkably well on this task after fine-tuning on a significant amount of task-specific training data. On the other hand, while auto-regressive PLMs can generalize from a few task examples, their efficacy at D2T is largely unexplored. Furthermore, we have an incomplete understanding of the limits of PLMs on D2T. In this work, we conduct an empirical study of both fine-tuned and auto-regressive PLMs on the DART multi-domain D2T dataset. We consider their performance as a function of the amount of task-specific data and how the data is incorporated into the models: zero and few-shot learning, and fine-tuning of model weights. In addition, we probe the limits of PLMs by measuring performance on subsets of the evaluation data: novel predicates and abstractive test examples. To improve the performance on these subsets, we investigate two techniques: providing predicate descriptions in the context and re-ranking generated candidates by information reflected in the source. Finally, we conduct a human evaluation of model errors and show that D2T generation tasks would benefit from datasets with more careful manual curation.}, author = {Moniba Keymanesh and Adrian Benton and Mark Dredze}, booktitle = {EMNLP Workshop on Generation, Evaluation & Metrics (GEM)}, date-added = {2022-10-30 01:31:54 -0500}, date-modified = {2022-12-12 21:58:17 -0500}, file = {https://arxiv.org/abs/2205.11505}, keywords = {workshop}, title = {What Makes Data-to-Text Generation Hard for Pretrained Language Models?}, year = {2022} }

		Elliot Schumacher, James Mayfield, Mark Dredze. Zero-shot Cross-Language Transfer of Monolingual Entity Linking Models. EMNLP Workshop on Multilingual Representation Learning, 2022. @inproceedings{Schumacher:2022aa, abstract = {Most entity linking systems, whether mono or multilingual, link mentions to a single English knowledge base. Few have considered linking non-English text to a non-English KB, and therefore, transferring an English entity linking model to both a new document and KB language. We consider the task of zero-shot cross-language transfer of entity linking systems to a new language and KB. We find that a system trained with multilingual representations does reasonably well, and propose improvements to system training that lead to improved recall in most datasets, often matching the in-language performance. We further conduct a detailed evaluation to elucidate the challenges of this setting.}, annote = {(<b>Best Paper Award Honorable Mention</b>)}, author = {Elliot Schumacher and James Mayfield and Mark Dredze}, booktitle = {EMNLP Workshop on Multilingual Representation Learning}, date-added = {2022-10-25 07:23:56 -0400}, date-modified = {2022-12-20 00:39:34 -0500}, file = {2022_replearn_entity_linking.pdf}, keywords = {workshop}, title = {Zero-shot Cross-Language Transfer of Monolingual Entity Linking Models}, year = {2022} }

		David Mueller, Mark Dredze, Nicholas Andrews. The Importance of Temperature in Multi-Task Optimization. NeurIPS Workshop on Optimization for Machine Learning (OPT), 2022. @inproceedings{Mueller:2022aa, abstract = {The promise of multi-task learning is that optimizing a single model on multiple related tasks will lead to a better solution for all tasks than independently trained models. In practice, optimization difficulties, such as conflicting gradients, can result in negative transfer, where multi-task models perform worse than single-task models. In this work, we identify the optimization temperature---the ratio of learning rate to batch size---as a key factor in negative transfer. Temperature controls the level of noise in each optimization step, which prior work has shown to have a strong correlation with generalization. We demonstrate that, in some multi-task settings, negative transfer may arise due to poorly set optimization temperature, rather than inherently high task conflict. The implication of this finding is that in some settings, SGD with a carefully controlled temperature achieves comparable, and in some cases superior, performance to that of specialized optimization procedures such as PCGrad, MGDA, and GradNorm. In particular, our results suggest that the significant additional computational burden of these specialized methods may not always be necessary. Finally, we observe a conflict between the optimal temperatures of different tasks in a multi-task objective, with different levels of noise promoting better generalization for different tasks. Our work suggests the need for novel multi-task optimization methods which consider individual task noise levels and their impact on generalization.}, author = {David Mueller and Mark Dredze and Nicholas Andrews}, booktitle = {NeurIPS Workshop on Optimization for Machine Learning (OPT)}, date-added = {2022-10-20 14:24:24 -0400}, date-modified = {2022-12-12 21:53:55 -0500}, file = {2022_opt_temperature.pdf}, keywords = {workshop}, title = {The Importance of Temperature in Multi-Task Optimization}, year = {2022} }

		Zach Wood-Doughty, Ilya Shpitser, Mark Dredze. Generating Synthetic Text Data to Evaluate Causal Inference Methods. American Causal Inference Conference, 2022. @inproceedings{Wood-Doughty:2022aa, abstract = {Drawing causal conclusions from observational data requires making assumptions about the true data-generating process. Causal inference research typically considers low-dimensional data, such as categorical or numerical fields in structured medical records. High-dimensional and unstructured data such as natural language complicates the evaluation of causal inference methods; such evaluations rely on synthetic datasets with known causal effects. Models for natural language generation have been widely studied and perform well empirically. However, existing methods are not immediately applicable to producing synthetic datasets for causal evaluations, as they do not allow for quantifying a causal effect on the text itself. In this work, we develop a framework for adapting existing generation models to produce synthetic text datasets with known causal effects. We use this framework to perform an empirical comparison of four recently proposed methods for estimating causal effects from text data. We release our code and synthetic datasets.}, author = {Zach Wood-Doughty and Ilya Shpitser and Mark Dredze}, booktitle = {American Causal Inference Conference}, date-added = {2022-10-14 10:50:47 -0400}, date-modified = {2022-10-21 13:37:49 -0400}, file = {https://arxiv.org/abs/2102.05638}, keywords = {abstract}, title = {Generating Synthetic Text Data to Evaluate Causal Inference Methods}, year = {2022} }

		Alexandra DeLucia, Shijie Wu, Aaron Mueller, Carlos Aguirre, Philip Resnik, Mark Dredze. Bernice: A Multilingual Pre-trained Encoder for Twitter. Empirical Methods in Natural Language Processing (EMNLP), 2022. @inproceedings{DeLucia:2022vt, abstract = {The language of Twitter differs significantly from that of other domains commonly included in large language model training. While tweets are typically multilingual and contain informal language, including emoji and hashtags, most pre-trained language models for Twitter are either monolingual, adapted from other domains rather than trained exclusively on Twitter, or are trained on a limited amount of in-domain Twitter data. We introduce Bernice, the first multilingual RoBERTa language model trained from scratch on 2.5 billion tweets with a custom tweet-focused tokenizer. We evaluate on a variety of monolingual and multilingual Twitter benchmarks, finding that our model consistently exceeds or matches the performance of a variety of models adapted to social media data as well as strong multilingual baselines, despite being trained on less data overall. We posit that it is more efficient compute- and data-wise to train completely on in-domain data with a specialized domain-specific tokenizer.}, author = {Alexandra DeLucia and Shijie Wu and Aaron Mueller and Carlos Aguirre and Philip Resnik and Mark Dredze}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, date-added = {2022-10-06 08:02:59 -0400}, date-modified = {2022-12-12 21:58:55 -0500}, file = {2022_emnlp_bernice.pdf}, title = {Bernice: A Multilingual Pre-trained Encoder for Twitter}, year = {2022} }

		David Mueller, Nicholas Andrews, Mark Dredze. Do Text-to-Text Multi-Task Learners Suffer from Task Conflict? Findings of the Empirical Methods in Natural Language Processing (EMNLP), 2022. [PDF] [Bibtex] [Close] @inproceedings{Mueller:2022wv, abstract = {Traditional multi-task learning architectures train a single model across multiple tasks through a shared encoder followed by task-specific decoders. Learning these models often requires specialized training algorithms that address task-conflict in the shared parameter updates, which otherwise can lead to negative transfer. A new type of multi-task learning within NLP homogenizes multi-task architectures as a shared encoder and language model decoder, which does surprisingly well across a range of diverse tasks (Raffel et al., 2020). Does this new architecture suffer from task-conflicts that require specialized training algorithms? We study how certain factors in the shift towards text-to-text models affects multitask conflict and negative transfer, finding that both directional conflict and transfer are surprisingly constant across architectures.}, author = {David Mueller and Nicholas Andrews and Mark Dredze}, booktitle = {Findings of the Empirical Methods in Natural Language Processing (EMNLP)}, date-added = {2022-10-06 08:01:49 -0400}, date-modified = {2022-12-12 22:00:04 -0500}, file = {2022_emnlp_text-to-text.pdf}, title = {Do Text-to-Text Multi-Task Learners Suffer from Task Conflict?}, year = {2022} } Traditional multi-task learning architectures train a single model across multiple tasks through a shared encoder followed by task-specific decoders. Learning these models often requires specialized training algorithms that address task-conflict in the shared parameter updates, which otherwise can lead to negative transfer. 
A new type of multi-task learning within NLP homogenizes multi-task architectures as a shared encoder and language model decoder, which does surprisingly well across a range of diverse tasks (Raffel et al., 2020). Does this new architecture suffer from task-conflicts that require specialized training algorithms? We study how certain factors in the shift towards text-to-text models affects multitask conflict and negative transfer, finding that both directional conflict and transfer are surprisingly constant across architectures.

		Jingyu Zhang, Alexandra DeLucia, Mark Dredze. Changes in Tweet Geolocation over Time: A Study with Carmen 2.0. EMNLP Workshop on Noisy User-generated Text (W-NUT), 2022. [PDF] [Bibtex] [Close] @inproceedings{Zhang:2022tc, abstract = {Researchers across disciplines use Twitter geolocation tools to filter data for desired locations. These tools have largely been trained and tested on English tweets, often originating in the United States from almost a decade ago. Despite the importance of these tools for data curation, the impact of tweet language, country of origin, and creation date on tool performance remains largely unknown. We explore these issues with Carmen, a popular tool for Twitter geolocation. To support this study we introduce Carmen 2.0, a major update which includes the incorporation of GeoNames, a gazetteer that provides much broader coverage of locations. We evaluate using two new Twitter datasets, one for multilingual, multiyear geolocation evaluation, and another for usage trends over time. We found that language, country origin, and time does impact geolocation tool performance.}, author = {Jingyu Zhang and Alexandra DeLucia and Mark Dredze}, booktitle = {EMNLP Workshop on Noisy User-generated Text (W-NUT)}, date-added = {2022-09-13 09:29:16 -0400}, date-modified = {2022-12-12 22:00:50 -0500}, file = {https://aclanthology.org/2022.wnut-1.1/}, keywords = {workshop}, title = {Changes in Tweet Geolocation over Time: A Study with Carmen 2.0}, year = {2022} } Researchers across disciplines use Twitter geolocation tools to filter data for desired locations. These tools have largely been trained and tested on English tweets, often originating in the United States from almost a decade ago. Despite the importance of these tools for data curation, the impact of tweet language, country of origin, and creation date on tool performance remains largely unknown. We explore these issues with Carmen, a popular tool for Twitter geolocation. 
To support this study we introduce Carmen 2.0, a major update which includes the incorporation of GeoNames, a gazetteer that provides much broader coverage of locations. We evaluate using two new Twitter datasets, one for multilingual, multiyear geolocation evaluation, and another for usage trends over time. We found that language, country origin, and time does impact geolocation tool performance.

		Keith Harrigian, Mark Dredze. Then and Now: Quantifying the Longitudinal Validity of Self-Disclosed Depression Diagnoses. NAACL Workshop on Computational Linguistics and Clinical Psychology (CLPsych), 2022. [PDF] [Bibtex] [Close] @inproceedings{Harrigian:2022ti, abstract = {Self-disclosed mental health diagnoses, which serve as ground truth annotations of mental health status in the absence of clinical measures, underpin the conclusions behind most computational studies of mental health language from the last decade. However, psychiatric conditions are dynamic; a prior depression diagnosis may no longer be indicative of an individual's mental health, either due to treatment or other mitigating factors. We ask: to what extent are self-disclosures of mental health diagnoses actually relevant over time? We analyze recent activity from individuals who disclosed a depression diagnosis on social media over five years ago and, in turn, acquire a new understanding of how presentations of mental health status on social media manifest longitudinally. We also provide expanded evidence for the presence of personality-related biases in datasets curated using self-disclosed diagnoses. 
Our findings motivate three practical recommendations for improving mental health datasets curated using self-disclosed diagnoses: 1) Annotate diagnosis dates and psychiatric comorbidities; 2) Sample control groups using propensity score matching; 3) Identify and remove spurious correlations introduced by selection bias.}, author = {Keith Harrigian and Mark Dredze}, booktitle = {NAACL Workshop on Computational Linguistics and Clinical Psychology (CLPsych)}, date-added = {2022-05-20 03:03:25 -0400}, date-modified = {2022-06-22 20:50:33 -0400}, file = {https://arxiv.org/abs/2206.11155}, keywords = {workshop}, title = {Then and Now: Quantifying the Longitudinal Validity of Self-Disclosed Depression Diagnoses}, year = {2022} } Self-disclosed mental health diagnoses, which serve as ground truth annotations of mental health status in the absence of clinical measures, underpin the conclusions behind most computational studies of mental health language from the last decade. However, psychiatric conditions are dynamic; a prior depression diagnosis may no longer be indicative of an individual's mental health, either due to treatment or other mitigating factors. We ask: to what extent are self-disclosures of mental health diagnoses actually relevant over time? We analyze recent activity from individuals who disclosed a depression diagnosis on social media over five years ago and, in turn, acquire a new understanding of how presentations of mental health status on social media manifest longitudinally. We also provide expanded evidence for the presence of personality-related biases in datasets curated using self-disclosed diagnoses. Our findings motivate three practical recommendations for improving mental health datasets curated using self-disclosed diagnoses: 1) Annotate diagnosis dates and psychiatric comorbidities; 2) Sample control groups using propensity score matching; 3) Identify and remove spurious correlations introduced by selection bias.

		Ayah Zirikly, Mark Dredze. Explaining Models of Mental Health via Clinically Grounded Auxiliary Tasks. NAACL Workshop on Computational Linguistics and Clinical Psychology (CLPsych), 2022. [PDF] [Bibtex] [Close] @inproceedings{Zirikly:2022tj, abstract = {Models of mental health based on natural language processing can uncover latent signals of mental health from language. Models that indicate whether an individual is depressed, or has other mental health conditions, can aid in diagnosis and treatment. A critical aspect of integration of these models into the clinical setting relies on explaining their behavior to domain experts. In the case of mental health diagnosis, clinicians already rely on an assessment framework to make these decisions; that framework can help a model generate meaningful explanations. In this work we propose to use PHQ-9 categories as an auxiliary task to explaining a social media based model of depression. We develop a multi-task learning framework that predicts both depression and PHQ-9 categories as auxiliary tasks. We compare the quality of explanations generated based on the depression task only, versus those that use the predicted PHQ-9 categories. We find that by relying on clinically meaningful auxiliary tasks, we produce more meaningful explanations.}, author = {Ayah Zirikly and Mark Dredze}, booktitle = {NAACL Workshop on Computational Linguistics and Clinical Psychology (CLPsych)}, date-added = {2022-05-20 03:02:33 -0400}, date-modified = {2022-10-06 08:14:26 -0400}, file = {https://aclanthology.org/2022.clpsych-1.3/}, keywords = {workshop}, title = {Explaining Models of Mental Health via Clinically Grounded Auxiliary Tasks}, year = {2022} } Models of mental health based on natural language processing can uncover latent signals of mental health from language. Models that indicate whether an individual is depressed, or has other mental health conditions, can aid in diagnosis and treatment. 
A critical aspect of integration of these models into the clinical setting relies on explaining their behavior to domain experts. In the case of mental health diagnosis, clinicians already rely on an assessment framework to make these decisions; that framework can help a model generate meaningful explanations. In this work we propose to use PHQ-9 categories as an auxiliary task to explaining a social media based model of depression. We develop a multi-task learning framework that predicts both depression and PHQ-9 categories as auxiliary tasks. We compare the quality of explanations generated based on the depression task only, versus those that use the predicted PHQ-9 categories. We find that by relying on clinically meaningful auxiliary tasks, we produce more meaningful explanations.

		Keith Harrigian, Mark Dredze. The Problem of Semantic Shift in Longitudinal Monitoring of Social Media: A Case Study on Mental Health During the COVID-19 Pandemic. Conference on Web Science (WebSci), 2022. [PDF] [Bibtex] [Close] @inproceedings{Harrigian:2022wk, abstract = {Social media allows researchers to track societal and cultural changes over time based on language analysis tools. Many of these tools rely on statistical algorithms which need to be tuned to specific types of language. Recent studies have shown the absence of appropriate tuning, specifically in the presence of semantic shift, can hinder robustness of the underlying methods. However, little is known about the practical effect this sensitivity may have on downstream longitudinal analyses. We explore this gap in the literature through a timely case study: understanding shifts in depression during the course of the COVID-19 pandemic. We find that inclusion of only a small number of semantically-unstable features can promote significant changes in longitudinal estimates of our target outcome. At the same time, we demonstrate that a recently-introduced method for measuring semantic shift may be used to proactively identify failure points of language-based models and, in turn, improve predictive generalization.}, author = {Keith Harrigian and Mark Dredze}, booktitle = {Conference on Web Science (WebSci)}, date-added = {2022-03-30 11:12:44 -0400}, date-modified = {2022-10-06 08:08:53 -0400}, file = {https://arxiv.org/abs/2206.11160}, title = {The Problem of Semantic Shift in Longitudinal Monitoring of Social Media: A Case Study on Mental Health During the COVID-19 Pandemic}, year = {2022} } Social media allows researchers to track societal and cultural changes over time based on language analysis tools. Many of these tools rely on statistical algorithms which need to be tuned to specific types of language. 
Recent studies have shown the absence of appropriate tuning, specifically in the presence of semantic shift, can hinder robustness of the underlying methods. However, little is known about the practical effect this sensitivity may have on downstream longitudinal analyses. We explore this gap in the literature through a timely case study: understanding shifts in depression during the course of the COVID-19 pandemic. We find that inclusion of only a small number of semantically-unstable features can promote significant changes in longitudinal estimates of our target outcome. At the same time, we demonstrate that a recently-introduced method for measuring semantic shift may be used to proactively identify failure points of language-based models and, in turn, improve predictive generalization.

		Zach Wood-Doughty, Isabel Cachola, Mark Dredze. Model Distillation for Faithful Explanations of Medical Code Predictions. ACL Workshop on Biomedical Natural Language Processing (BioNLP), 2022. [PDF] [Bibtex] [Close] @inproceedings{Wood-Doughty:2022vw, abstract = {Machine learning models that offer excellent predictive performance often lack the interpretability necessary to support integrated human machine decision-making. In clinical medicine and other high-risk settings, domain experts may be unwilling to trust model predictions without explanations. Work in explainable AI must balance competing objectives along two different axes: 1) Models should ideally be both accurate and simple. 2) Explanations must balance faithfulness to the model's decision-making with their plausibility to a domain expert. We propose to use knowledge distillation, or training a student model that mimics the behavior of a trained teacher model, as a technique to generate faithful and plausible explanations. We evaluate our approach on the task of assigning ICD codes to clinical notes to demonstrate that the student model is faithful to the teacher model's behavior and produces quality natural language explanations.}, author = {Zach Wood-Doughty and Isabel Cachola and Mark Dredze}, booktitle = {ACL Workshop on Biomedical Natural Language Processing (BioNLP)}, date-added = {2022-03-28 21:41:15 -0400}, date-modified = {2022-10-06 08:09:25 -0400}, file = {https://aclanthology.org/2022.bionlp-1.41/}, keywords = {workshop}, title = {Model Distillation for Faithful Explanations of Medical Code Predictions}, year = {2022} } Machine learning models that offer excellent predictive performance often lack the interpretability necessary to support integrated human machine decision-making. In clinical medicine and other high-risk settings, domain experts may be unwilling to trust model predictions without explanations. 
Work in explainable AI must balance competing objectives along two different axes: 1) Models should ideally be both accurate and simple. 2) Explanations must balance faithfulness to the model's decision-making with their plausibility to a domain expert. We propose to use knowledge distillation, or training a student model that mimics the behavior of a trained teacher model, as a technique to generate faithful and plausible explanations. We evaluate our approach on the task of assigning ICD codes to clinical notes to demonstrate that the student model is faithful to the teacher model's behavior and produces quality natural language explanations.

		Shijie Wu, Benjamin Van Durme, Mark Dredze. Zero-shot Cross-lingual Transfer is Under-specified Optimization. ACL Workshop on Representation Learning for NLP (RepL4NLP), 2022. [PDF] [Bibtex] [Close] @inproceedings{Wu:2022wq, abstract = {Pretrained multilingual encoders enable zero-shot cross-lingual transfer, but often produce unreliable models that exhibit high performance variance on the target language. We postulate that this high variance results from zero-shot cross-lingual transfer solving an under-specified optimization problem. We show that any linear-interpolated model between the source language monolingual model and source + target bilingual model has equally low source language generalization error, yet the target language generalization error reduces smoothly and linearly as we move from the monolingual to bilingual model, suggesting that the model struggles to identify good solutions for both source and target languages using the source language alone. Additionally, we show that zero-shot solution lies in non-flat region of target language error generalization surface, causing the high variance.}, author = {Shijie Wu and Benjamin Van Durme and Mark Dredze}, booktitle = {ACL Workshop on Representation Learning for NLP (RepL4NLP)}, date-added = {2022-03-28 10:10:05 -0400}, date-modified = {2022-10-06 08:14:41 -0400}, file = {https://aclanthology.org/2022.repl4nlp-1.25/}, keywords = {workshop}, title = {Zero-shot Cross-lingual Transfer is Under-specified Optimization}, year = {2022} } Pretrained multilingual encoders enable zero-shot cross-lingual transfer, but often produce unreliable models that exhibit high performance variance on the target language. We postulate that this high variance results from zero-shot cross-lingual transfer solving an under-specified optimization problem. 
We show that any linear-interpolated model between the source language monolingual model and source + target bilingual model has equally low source language generalization error, yet the target language generalization error reduces smoothly and linearly as we move from the monolingual to bilingual model, suggesting that the model struggles to identify good solutions for both source and target languages using the source language alone. Additionally, we show that zero-shot solution lies in non-flat region of target language error generalization surface, causing the high variance.

		Xiaolei Huang, Franck Dernoncourt, Mark Dredze. Enriching Unsupervised User Embedding via Medical Concepts. Proceedings of the Conference on Health, Inference, and Learning, 2022. [PDF] [Bibtex] [Close] @inproceedings{pmlr-v174-huang22a, abstract = {Clinical notes in Electronic Health Records (EHR) present rich documented information of patients to inference phenotype for disease diagnosis and study patient characteristics for cohort selection. Unsupervised user embedding aims to encode patients into fixed-length vectors without human supervisions. Medical concepts extracted from the clinical notes contain rich connections between patients and their clinical categories. However, existing \textit{unsupervised} approaches of user embeddings from clinical notes do not explicitly incorporate medical concepts. In this study, we propose a concept-aware unsupervised user embedding that jointly leverages text documents and medical concepts from two clinical corpora, MIMIC-III and Diabetes. We evaluate user embeddings on both extrinsic and intrinsic tasks, including phenotype classification, in-hospital mortality prediction, patient retrieval, and patient relatedness. 
Experiments on the two clinical corpora show our approach exceeds unsupervised baselines, and incorporating medical concepts can significantly improve the baseline performance.}, author = {Xiaolei Huang and Franck Dernoncourt and Mark Dredze}, booktitle = {Proceedings of the Conference on Health, Inference, and Learning}, date-modified = {2022-10-06 08:10:38 -0400}, editor = {Flores, Gerardo and Chen, George H and Pollard, Tom and Ho, Joyce C and Naumann, Tristan}, file = {https://proceedings.mlr.press/v174/huang22a/huang22a.pdf}, month = {07--08 Apr}, pages = {63--78}, publisher = {PMLR}, series = {Proceedings of Machine Learning Research}, title = {Enriching Unsupervised User Embedding via Medical Concepts}, volume = {174}, year = {2022}, bdsk-url-1 = {https://proceedings.mlr.press/v174/huang22a.html} } Clinical notes in Electronic Health Records (EHR) present rich documented information of patients to inference phenotype for disease diagnosis and study patient characteristics for cohort selection. Unsupervised user embedding aims to encode patients into fixed-length vectors without human supervisions. Medical concepts extracted from the clinical notes contain rich connections between patients and their clinical categories. However, existing unsupervised approaches of user embeddings from clinical notes do not explicitly incorporate medical concepts. In this study, we propose a concept-aware unsupervised user embedding that jointly leverages text documents and medical concepts from two clinical corpora, MIMIC-III and Diabetes. We evaluate user embeddings on both extrinsic and intrinsic tasks, including phenotype classification, in-hospital mortality prediction, patient retrieval, and patient relatedness. Experiments on the two clinical corpora show our approach exceeds unsupervised baselines, and incorporating medical concepts can significantly improve the baseline performance.

		Joshua Dredze, Lisi Dredze, Mark Dredze. Interest in Teletherapy during the COVID-19 Pandemic as Measured by Google Search Trends. Association for Psychological Science Annual Convention (APS), 2022. [Bibtex] [Close] @inproceedings{Dredze:2022tl, author = {Joshua Dredze and Lisi Dredze and Mark Dredze}, booktitle = {Association for Psychological Science Annual Convention (APS)}, date-added = {2022-03-11 01:51:30 -0500}, date-modified = {2022-03-11 01:53:44 -0500}, keywords = {abstract}, title = {Interest in Teletherapy during the COVID-19 Pandemic as Measured by Google Search Trends}, year = {2022} }

		Sheena Panthaplackel, Adrian Benton, Mark Dredze. Updated Headline Generation: Creating Updated Summaries for Evolving News Stories. Association for Computational Linguistics (ACL), 2022. [PDF] [Bibtex] [Close] @inproceedings{Panthaplackel:2022wc, abstract = {We propose the task of updated headline generation, in which a system generates a headline for an updated article, considering both the previous article and headline. The system must identify the novel information in the article update, and modify the existing headline to reflect this. We introduce a new dataset for this task by automatically identifying contiguous article versions that are likely to require a substantive headline update from the NewsEdits corpus (Spangher and May, 2021). We find that models conditioned on the prior headline and body revisions produce headlines judged by humans to be as factual as gold headlines while making fewer unnecessary edits compared to a standard headline generation model. Our experiments establish benchmarks for this new contextual summarization task.}, author = {Sheena Panthaplackel and Adrian Benton and Mark Dredze}, booktitle = {Association for Computational Linguistics (ACL)}, date-added = {2022-02-24 09:54:21 -0500}, date-modified = {2022-08-05 16:59:31 -0400}, file = {https://aclanthology.org/2022.acl-long.446/}, title = {Updated Headline Generation: Creating Updated Summaries for Evolving News Stories}, year = {2022} } We propose the task of updated headline generation, in which a system generates a headline for an updated article, considering both the previous article and headline. The system must identify the novel information in the article update, and modify the existing headline to reflect this. We introduce a new dataset for this task by automatically identifying contiguous article versions that are likely to require a substantive headline update from the NewsEdits corpus (Spangher and May, 2021). 
We find that models conditioned on the prior headline and body revisions produce headlines judged by humans to be as factual as gold headlines while making fewer unnecessary edits compared to a standard headline generation model. Our experiments establish benchmarks for this new contextual summarization task.

		Adam Poliak, Paiheng Xu, Eric Leas, Mario Navarro, Stephanie Pitts, Andie Malterud, John W Ayers, Mark Dredze. A Machine Learning Approach For Discovering Tobacco Brands, Products, and Manufacturers in the United States. Annual Meeting of the Society for Research on Nicotine and Tobacco, 2022. [Bibtex] [Close] @inproceedings{Poliak:2022wa, author = {Adam Poliak and Paiheng Xu and Eric Leas and Mario Navarro and Stephanie Pitts and Andie Malterud and John W Ayers and Mark Dredze}, booktitle = {Annual Meeting of the Society for Research on Nicotine and Tobacco}, date-added = {2022-02-11 10:19:54 -0500}, date-modified = {2022-02-11 10:21:15 -0500}, keywords = {abstract}, title = {A Machine Learning Approach For Discovering Tobacco Brands, Products, and Manufacturers in the United States}, year = {2022} }

		David A Broniatowski, Daniel Kerchne, Fouzia Farooq, Xiaolei Huang, Amelia M Jamison, Mark Dredze, Sandra Crouse Quinn. Twitter and Facebook posts about COVID-19 are less likely to spread misinformation compared to other health topics. PLoS ONE, 2022. [PDF] [Bibtex] [Close] @article{Broniatowski:2022we, abstract = {The COVID-19 pandemic brought widespread attention to an ``infodemic'' of potential health misinformation. This claim has not been assessed based on evidence. We evaluated if health misinformation became more common during the pandemic. We gathered about 325 million posts sharing URLs from Twitter and Facebook during the beginning of the pandemic (March 8-May 1, 2020) compared to the same period in 2019. We relied on source credibility as an accepted proxy for misinformation across this database. Human annotators also coded a subsample of 3000 posts with URLs for misinformation. Posts about COVID-19 were 0.37 times as likely to link to ``not credible'' sources and 1.13 times more likely to link to ``more credible'' sources than prior to the pandemic. Posts linking to ``not credible'' sources were 3.67 times more likely to include misinformation compared to posts from ``more credible'' sources. Thus, during the earliest stages of the pandemic, when claims of an infodemic emerged, social media contained proportionally less misinformation than expected based on the prior year. Our results suggest that widespread health misinformation is not unique to COVID-19. Rather, it is a systemic feature of online health communication that can adversely impact public health behaviors and must therefore be addressed.}, author = {David A. Broniatowski and Daniel Kerchne and Fouzia Farooq and Xiaolei Huang and Amelia M. 
Jamison and Mark Dredze and Sandra Crouse Quinn}, date-added = {2022-01-03 14:23:06 -0500}, date-modified = {2022-01-12 16:08:07 -0500}, file = {https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0261768}, journal = {PLoS ONE}, title = {Twitter and Facebook posts about COVID-19 are less likely to spread misinformation compared to other health topics}, year = {2022} } The COVID-19 pandemic brought widespread attention to an ``infodemic'' of potential health misinformation. This claim has not been assessed based on evidence. We evaluated if health misinformation became more common during the pandemic. We gathered about 325 million posts sharing URLs from Twitter and Facebook during the beginning of the pandemic (March 8-May 1, 2020) compared to the same period in 2019. We relied on source credibility as an accepted proxy for misinformation across this database. Human annotators also coded a subsample of 3000 posts with URLs for misinformation. Posts about COVID-19 were 0.37 times as likely to link to ``not credible'' sources and 1.13 times more likely to link to ``more credible'' sources than prior to the pandemic. Posts linking to ``not credible'' sources were 3.67 times more likely to include misinformation compared to posts from ``more credible'' sources. Thus, during the earliest stages of the pandemic, when claims of an infodemic emerged, social media contained proportionally less misinformation than expected based on the prior year. Our results suggest that widespread health misinformation is not unique to COVID-19. Rather, it is a systemic feature of online health communication that can adversely impact public health behaviors and must therefore be addressed.

		2021 (17 Publications)
		Anna L Buczak, Benjamin D Baugher, Christine S Martin, Meg W Keiley-Listermann, James Howard II, Nathan H Parrish, Anton Q Stalick, Daniel S Berman, Mark H Dredze. Crystal Cube: Forecasting Disruptive Events. Applied Artificial Intelligence, 2021;0(0):1-24. [PDF] [Bibtex] [Close] @article{doi:10.1080/08839514.2021.2001179, abstract = {Disruptive events within a country can have global repercussions, creating a need for the anticipation and planning of these events. Crystal Cube (CC) is a novel approach to forecasting disruptive political events at least one month into the future. The system uses a recurrent neural network and a novel measure of event similarity between past and current events. We also introduce the innovative Thermometer of Irregular Leadership Change (ILC). We present an evaluation of CC in predicting ILC for 167 countries and show promising results in forecasting events one to twelve months in advance. We compare CC results with results using a random forest as well as previous work. }, author = {Anna L. Buczak and Benjamin D. Baugher and Christine S. Martin and Meg W. Keiley-Listermann and James Howard II and Nathan H. Parrish and Anton Q. Stalick and Daniel S. Berman and Mark H. Dredze}, date-added = {2021-11-16 16:59:03 -0500}, date-modified = {2022-10-06 08:11:30 -0400}, doi = {10.1080/08839514.2021.2001179}, eprint = {https://doi.org/10.1080/08839514.2021.2001179}, file = {https://doi.org/10.1080/08839514.2021.2001179}, journal = {Applied Artificial Intelligence}, number = {0}, pages = {1-24}, publisher = {Taylor & Francis}, title = {Crystal Cube: Forecasting Disruptive Events}, volume = {0}, year = {2021}, bdsk-url-1 = {https://doi.org/10.1080/08839514.2021.2001179} } Disruptive events within a country can have global repercussions, creating a need for the anticipation and planning of these events. Crystal Cube (CC) is a novel approach to forecasting disruptive political events at least one month into the future. 
The system uses a recurrent neural network and a novel measure of event similarity between past and current events. We also introduce the innovative Thermometer of Irregular Leadership Change (ILC). We present an evaluation of CC in predicting ILC for 167 countries and show promising results in forecasting events one to twelve months in advance. We compare CC results with results using a random forest as well as previous work.

		Zach Wood-Doughty, Isabel Cachola, Mark Dredze. Proxy Model Explanations for Time Series RNNs. IEEE International Conference on Machine Learning and Applications (ICMLA), 2021. [PDF] [Bibtex] [Close] @inproceedings{Wood-Doughty:2021uk, abstract = {While machine learning models can produce accurate predictions of complex real-world phenomena, domain experts may be unwilling to trust such a prediction without an explanation of the model's behavior. This concern has motivated widespread research and produced many methods for interpreting black-box models. Many such methods explain predictions one-by-one, which can be slow and inconsistent across a large dataset, and ill-suited for time series applications. We introduce a proxy model approach that is fast to train, faithful to the original model, and globally consistent in its explanations. We compare our approach to several previous methods and find both that methods disagree with one another and that our approach improves over existing methods in an application to political event forecasting.}, author = {Zach Wood-Doughty and Isabel Cachola and Mark Dredze}, booktitle = {IEEE International Conference on Machine Learning and Applications (ICMLA)}, date-added = {2021-09-19 13:08:58 -0400}, date-modified = {2021-09-19 13:10:16 -0400}, file = {2021_icmla_proxy.pdf}, title = {Proxy Model Explanations for Time Series RNNs}, year = {2021} } While machine learning models can produce accurate predictions of complex real-world phenomena, domain experts may be unwilling to trust such a prediction without an explanation of the model's behavior. This concern has motivated widespread research and produced many methods for interpreting black-box models. Many such methods explain predictions one-by-one, which can be slow and inconsistent across a large dataset, and ill-suited for time series applications. 
We introduce a proxy model approach that is fast to train, faithful to the original model, and globally consistent in its explanations. We compare our approach to several previous methods and find both that methods disagree with one another and that our approach improves over existing methods in an application to political event forecasting.

		Abhinav Chinta, Jingyu Zhang, Alexandra DeLucia, Mark Dredze, Anna L Buczak. Study of Manifestation of Civil Unrest on Twitter. EMNLP Workshop on Noisy User-generated Text (W-NUT), 2021. [PDF] [Bibtex] [Close] @inproceedings{Chinta:2021tb, abstract = {Twitter is commonly used for civil unrest detection and forecasting tasks, but there is a lack of work in evaluating how civil unrest manifests on Twitter across countries and events. We present two in-depth case studies for two specific large-scale events, one in a country with high (English) Twitter usage (Johannesburg riots in South Africa) and one in a country with low Twitter usage (Burayu massacre protests in Ethiopia). We show that while there is event signal during the events, there is little signal leading up to the events. In addition to the case studies, we train Ngram-based models on a larger set of Twitter civil unrest data across time, events, and countries and use machine learning explainability tools (SHAP) to identify important features. The models were able to find words indicative of civil unrest that generalized across countries. The 42 countries span Africa, Middle East, and Southeast Asia and the events occur between 2014 and 2019.}, author = {Abhinav Chinta and Jingyu Zhang and Alexandra DeLucia and Mark Dredze and Anna L. Buczak}, booktitle = {EMNLP Workshop on Noisy User-generated Text (W-NUT)}, date-added = {2021-09-14 18:55:41 -0400}, date-modified = {2022-10-06 08:12:02 -0400}, file = {https://aclanthology.org/2021.wnut-1.44/}, keywords = {workshop}, title = {Study of Manifestation of Civil Unrest on Twitter}, year = {2021} } Twitter is commonly used for civil unrest detection and forecasting tasks, but there is a lack of work in evaluating how civil unrest manifests on Twitter across countries and events. 
We present two in-depth case studies for two specific large-scale events, one in a country with high (English) Twitter usage (Johannesburg riots in South Africa) and one in a country with low Twitter usage (Burayu massacre protests in Ethiopia). We show that while there is event signal during the events, there is little signal leading up to the events. In addition to the case studies, we train Ngram-based models on a larger set of Twitter civil unrest data across time, events, and countries and use machine learning explainability tools (SHAP) to identify important features. The models were able to find words indicative of civil unrest that generalized across countries. The 42 countries span Africa, Middle East, and Southeast Asia and the events occur between 2014 and 2019.

		Mahsa Yarmohammadi, Shijie Wu, Marc Marone, Haoran Xu, Seth Ebner, Guanghui Qin, Yunmo Chen, Jialiang Guo, Craig Harman, Kenton Murray, Aaron Steven White, Mark Dredze, Benjamin Van Durme. Everything Is All It Takes: A Multipronged Strategy for Zero-Shot Cross-Lingual Information Extraction. Empirical Methods in Natural Language Processing (EMNLP), 2021. [PDF] [Bibtex] [Close] @inproceedings{Yarmohammadi:2021ts, abstract = {Zero-shot cross-lingual information extraction (IE) describes the construction of an IE model for some target language, given existing annotations exclusively in some other language, typically English. While the advance of pretrained multilingual encoders suggests an easy optimism of ``train on English, run on any language'', we find through a thorough exploration and extension of techniques that a combination of approaches, both new and old, leads to better performance than any one cross-lingual strategy in particular. We explore techniques including data projection and self-training, and how different pretrained encoders impact them. We use English-to-Arabic IE as our initial example, demonstrating strong performance in this setting for event extraction, named entity recognition, part-of-speech tagging, and dependency parsing. We then apply data projection and self-training to three tasks across eight target languages. 
Because no single set of techniques performs the best across all tasks, we encourage practitioners to explore various configurations of the techniques described in this work when seeking to improve on zero-shot training.}, author = {Mahsa Yarmohammadi and Shijie Wu and Marc Marone and Haoran Xu and Seth Ebner and Guanghui Qin and Yunmo Chen and Jialiang Guo and Craig Harman and Kenton Murray and Aaron Steven White and Mark Dredze and Benjamin Van Durme}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, date-added = {2021-08-26 11:11:44 -0400}, date-modified = {2022-10-06 08:14:59 -0400}, file = {https://aclanthology.org/2021.emnlp-main.149/}, title = {Everything Is All It Takes: A Multipronged Strategy for Zero-Shot Cross-Lingual Information Extraction}, year = {2021} } Zero-shot cross-lingual information extraction (IE) describes the construction of an IE model for some target language, given existing annotations exclusively in some other language, typically English. While the advance of pretrained multilingual encoders suggests an easy optimism of ``train on English, run on any language'', we find through a thorough exploration and extension of techniques that a combination of approaches, both new and old, leads to better performance than any one cross-lingual strategy in particular. We explore techniques including data projection and self-training, and how different pretrained encoders impact them. We use English-to-Arabic IE as our initial example, demonstrating strong performance in this setting for event extraction, named entity recognition, part-of-speech tagging, and dependency parsing. We then apply data projection and self-training to three tasks across eight target languages. Because no single set of techniques performs the best across all tasks, we encourage practitioners to explore various configurations of the techniques described in this work when seeking to improve on zero-shot training.

		John W Ayers, Brian Chu, Zechariah Zhu, Eric C Leas, Davey M Smith, Mark Dredze, David A Broniatowski. Spread of Misinformation About Face Masks and COVID-19 by Automated Software on Facebook. JAMA Internal Medicine, 2021. [PDF] [Bibtex] [Close] @article{Ayers:2021wp, abstract = {The dangers of misinformation spreading on social media during the COVID-19 pandemic are known. However, software that allows individuals to generate automated content and share it via counterfeit accounts (or ``bots'') to amplify misinformation has been overlooked, including how automated software can be used to disseminate original research while undermining scientific communication. We analyzed conversations on public Facebook groups, a platform known to be susceptible to automated misinformation, concerning the publication of the Danish Study to Assess Face Masks for the Protection Against COVID-19 Infection (DANMASK-19) to explore automated misinformation. We selected DANMASK-19 because it was widely discussed (it was the fifth most shared research article of all time as of March 2021 according to Altmetric) and demonstrated that masks are an important public health measure to control the pandemic.}, annote = {(<b>Ranked in the top 0.1% of 18m research outputs by <a href="https://jamanetwork.altmetric.com/details/107234596#score"><span class="pub_link">Altmetric</span></a></b>)}, author = {John W. Ayers and Brian Chu and Zechariah Zhu and Eric C. Leas and Davey M. Smith and Mark Dredze and David A. 
Broniatowski}, date-added = {2021-06-07 12:28:56 -0400}, date-modified = {2021-06-07 12:29:52 -0400}, file = {https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2780748}, journal = {JAMA Internal Medicine}, title = {Spread of Misinformation About Face Masks and COVID-19 by Automated Software on Facebook}, year = {2021} } (Ranked in the top 0.1% of 18m research outputs by Altmetric) The dangers of misinformation spreading on social media during the COVID-19 pandemic are known. However, software that allows individuals to generate automated content and share it via counterfeit accounts (or ``bots'') to amplify misinformation has been overlooked, including how automated software can be used to disseminate original research while undermining scientific communication. We analyzed conversations on public Facebook groups, a platform known to be susceptible to automated misinformation, concerning the publication of the Danish Study to Assess Face Masks for the Protection Against COVID-19 Infection (DANMASK-19) to explore automated misinformation. We selected DANMASK-19 because it was widely discussed (it was the fifth most shared research article of all time as of March 2021 according to Altmetric) and demonstrated that masks are an important public health measure to control the pandemic.

		Elliot Schumacher, James Mayfield, Mark Dredze. Cross-Lingual Transfer in Zero-Shot Cross-Language Entity Linking. Findings of the Association for Computational Linguistics (ACL), 2021. [PDF] [Bibtex] [Close] @inproceedings{Schumacher:2021us, abstract = {Cross-language entity linking grounds mentions in multiple languages to a single-language knowledge base. We propose a neural ranking architecture for this task that uses multilingual BERT representations of the mention and the context in a neural network. We find that the multilingual ability of BERT leads to robust performance in monolingual and multilingual settings. Furthermore, we explore zero-shot language transfer and find surprisingly robust performance. We investigate the zero-shot degradation and find that it can be partially mitigated by a proposed auxiliary training objective, but that the remaining error can best be attributed to domain shift rather than language transfer.}, author = {Elliot Schumacher and James Mayfield and Mark Dredze}, booktitle = {Findings of the Association for Computational Linguistics (ACL)}, date-added = {2021-05-06 09:10:52 -0400}, date-modified = {2021-05-06 09:11:36 -0400}, file = {https://arxiv.org/abs/2010.09828}, title = {Cross-Lingual Transfer in Zero-Shot Cross-Language Entity Linking}, year = {2021} } Cross-language entity linking grounds mentions in multiple languages to a single-language knowledge base. We propose a neural ranking architecture for this task that uses multilingual BERT representations of the mention and the context in a neural network. We find that the multilingual ability of BERT leads to robust performance in monolingual and multilingual settings. Furthermore, we explore zero-shot language transfer and find surprisingly robust performance. 
We investigate the zero-shot degradation and find that it can be partially mitigated by a proposed auxiliary training objective, but that the remaining error can best be attributed to domain shift rather than language transfer.

		David A Broniatowski, Mark Dredze, John W Ayers. ``First Do No Harm'': Effective Communication About COVID-19 Vaccines. American Journal of Public Health (AJPH), 2021;111(6):1055-1057. [PDF] [Bibtex] [Close] @article{Broniatowski:2021vx, abstract = {With effective COVID-19 vaccines in hand, we must now address the spread of information on social media that might encourage vaccine hesitancy. Although misinformation comes in many forms, including false claims, disinformation (e.g., deliberately false information), and rumors (e.g., unverified information), social media companies now seek to interdict this objectionable content---for the first time in their history---by removing content explicitly containing conspiracy theories and false or debunked claims about vaccines. Concurrently, social media users routinely disparage ``anti-vaxxers'' online, conflating a large group of vaccine-hesitant individuals who may be using social media to seek information about vaccination with a potentially much smaller group of ``vaccine refusers.'' Both strategies could cause more harm than good, necessitating a change in strategy informed by a large body of scientific evidence for making online communications about COVID-19 vaccines more effective.}, author = {David A. Broniatowski and Mark Dredze and John W. Ayers}, date-added = {2021-05-05 17:01:59 -0400}, date-modified = {2021-05-05 17:03:01 -0400}, file = {https://ajph.aphapublications.org/doi/10.2105/AJPH.2021.306288}, journal = {American Journal of Public Health (AJPH)}, month = {June}, number = {6}, pages = {1055-1057}, title = {``First Do No Harm'': Effective Communication About COVID-19 Vaccines}, volume = {111}, year = {2021} } With effective COVID-19 vaccines in hand, we must now address the spread of information on social media that might encourage vaccine hesitancy. 
Although misinformation comes in many forms, including false claims, disinformation (e.g., deliberately false information), and rumors (e.g., unverified information), social media companies now seek to interdict this objectionable content---for the first time in their history---by removing content explicitly containing conspiracy theories and false or debunked claims about vaccines. Concurrently, social media users routinely disparage ``anti-vaxxers'' online, conflating a large group of vaccine-hesitant individuals who may be using social media to seek information about vaccination with a potentially much smaller group of ``vaccine refusers.'' Both strategies could cause more harm than good, necessitating a change in strategy informed by a large body of scientific evidence for making online communications about COVID-19 vaccines more effective.

		John W Ayers, Eric C Leas, Mark Dredze, Theodore L Caputi, Shu-Hong Zhu, Joanna E Cohen. Did Philip Morris International use the e-cigarette, or vaping, product use associated lung injury (EVALI) outbreak to market IQOS heated tobacco? Tobacco Control, 2021. [PDF] [Bibtex] [Close] @article{Ayers:2021va, abstract = {25 July 2021 will mark the second anniversary of the e-cigarette, or vaping, product use associated lung injury (EVALI) outbreak. The concerns raised and news media attention focused on EVALI created a fertile environment for the tobacco industry to promote their e-cigarette alternatives, but this has not been studied. One such product is Philip Morris International's (PMI) heated tobacco product: `IQOS'. To assess how PMI promoted IQOS in the news during EVALI, we used `Tobacco Watcher' (www.tobaccowatcher.org), a fully automated and publicly available tobacco media analysis engine that warehouses news from more than 500 000 sources. We plotted trends in news stories mentioning `IQOS' finding the largest number of stories mentioning IQOS occurred on 25 September 2019, with 261 articles, more than double the next highest day previously recorded. While investigating this anomaly we discovered an official PMI press release entitled `Lung illnesses associated with use of vaping products in the US' was published the same day. In the release (see online supplemental material), PMI recounted the EVALI outbreak beginning: `Skepticism and fear around vaping has emerged following the cases of respiratory illness and deaths in the US associated with the use of e-cigarettes.' 
PMI then contrasted this against their IQOS heated tobacco product, writing `on April 30 2019, the FDA authorized IQOS for sale in the US, finding that marketing of the product would be `appropriate for the protection of public health'' (quotes used in the original release).}, author = {John W Ayers and Eric C Leas and Mark Dredze and Theodore L Caputi and Shu-Hong Zhu and Joanna E Cohen}, date-added = {2021-04-23 16:33:12 -0400}, date-modified = {2022-12-26 23:45:13 -0500}, file = {http://dx.doi.org/10.1136/tobaccocontrol-2021-056661}, journal = {Tobacco Control}, month = {April}, title = {Did Philip Morris International use the e-cigarette, or vaping, product use associated lung injury (EVALI) outbreak to market IQOS heated tobacco?}, year = {2021} } 25 July 2021 will mark the second anniversary of the e-cigarette, or vaping, product use associated lung injury (EVALI) outbreak. The concerns raised and news media attention focused on EVALI created a fertile environment for the tobacco industry to promote their e-cigarette alternatives, but this has not been studied. One such product is Philip Morris International's (PMI) heated tobacco product: `IQOS'. To assess how PMI promoted IQOS in the news during EVALI, we used `Tobacco Watcher' (www.tobaccowatcher.org), a fully automated and publicly available tobacco media analysis engine that warehouses news from more than 500 000 sources. We plotted trends in news stories mentioning `IQOS' finding the largest number of stories mentioning IQOS occurred on 25 September 2019, with 261 articles, more than double the next highest day previously recorded. While investigating this anomaly we discovered an official PMI press release entitled `Lung illnesses associated with use of vaping products in the US' was published the same day. 
In the release (see online supplemental material), PMI recounted the EVALI outbreak beginning: `Skepticism and fear around vaping has emerged following the cases of respiratory illness and deaths in the US associated with the use of e-cigarettes.' PMI then contrasted this against their IQOS heated tobacco product, writing `on April 30 2019, the FDA authorized IQOS for sale in the US, finding that marketing of the product would be `appropriate for the protection of public health'' (quotes used in the original release).

		Keith Harrigian, Carlos Aguirre, Mark Dredze. On the State of Social Media Data for Mental Health Research. NAACL Workshop on Computational Linguistics and Clinical Psychology (CLPsych), 2021. [PDF] [Bibtex] [Close] @inproceedings{Harrigian:2021wa, abstract = {Data-driven methods for mental health treatment and surveillance have become a major focus in computational science research in the last decade. However, progress in the domain remains bounded by the availability of adequate data. Prior systematic reviews have not necessarily made it possible to measure the degree to which data-related challenges have affected research progress. In this paper, we offer an analysis specifically on the state of social media data that exists for conducting mental health research. We do so by introducing an open-source directory of mental health datasets, annotated using a standardized schema to facilitate meta-analysis.}, author = {Keith Harrigian and Carlos Aguirre and Mark Dredze}, booktitle = {NAACL Workshop on Computational Linguistics and Clinical Psychology (CLPsych)}, date-added = {2021-04-19 10:27:43 -0400}, date-modified = {2021-04-19 10:28:08 -0400}, file = {https://aclanthology.org/2021.clpsych-1.2.pdf}, keywords = {workshop}, title = {On the State of Social Media Data for Mental Health Research}, year = {2021} } Data-driven methods for mental health treatment and surveillance have become a major focus in computational science research in the last decade. However, progress in the domain remains bounded by the availability of adequate data. Prior systematic reviews have not necessarily made it possible to measure the degree to which data-related challenges have affected research progress. In this paper, we offer an analysis specifically on the state of social media data that exists for conducting mental health research. We do so by introducing an open-source directory of mental health datasets, annotated using a standardized schema to facilitate meta-analysis.

		Carlos Aguirre, Mark Dredze. Qualitative Analysis of Depression Models by Demographics. NAACL Workshop on Computational Linguistics and Clinical Psychology (CLPsych), 2021. [PDF] [Bibtex] [Close] @inproceedings{Aguirre:2021ti, abstract = {Models for identifying depression using social media text exhibit biases towards different gender and racial/ethnic groups. Factors like representation and balance of groups within the dataset are contributory factors, but difference in content and social media use may further explain these biases. We present an analysis of the content of social media posts from different demographic groups. Our analysis shows that there are content differences between depression and control subgroups across demographic groups, and that temporal topics and demographic-specific topics are correlated with downstream depression model error. We discuss the implications of our work on creating future datasets, as well as designing and training models for mental health.}, author = {Carlos Aguirre and Mark Dredze}, booktitle = {NAACL Workshop on Computational Linguistics and Clinical Psychology (CLPsych)}, date-added = {2021-04-19 10:26:58 -0400}, date-modified = {2021-04-19 10:27:35 -0400}, file = {https://aclanthology.org/2021.clpsych-1.19.pdf}, keywords = {workshop}, title = {Qualitative Analysis of Depression Models by Demographics}, year = {2021} } Models for identifying depression using social media text exhibit biases towards different gender and racial/ethnic groups. Factors like representation and balance of groups within the dataset are contributory factors, but difference in content and social media use may further explain these biases. We present an analysis of the content of social media posts from different demographic groups. 
Our analysis shows that there are content differences between depression and control subgroups across demographic groups, and that temporal topics and demographic-specific topics are correlated with downstream depression model error. We discuss the implications of our work on creating future datasets, as well as designing and training models for mental health.

		Eli Sherman, Keith Harrigian, Carlos Aguirre, Mark Dredze. Towards Understanding the Role of Demographics in Deploying Social Media-Based Mental Health Surveillance Models. NAACL Workshop on Computational Linguistics and Clinical Psychology (CLPsych), 2021. [PDF] [Bibtex] [Close] @inproceedings{Eli-Sherman:2021wa, abstract = {Spurred by advances in machine learning and natural language processing, developing social media-based mental health surveillance models has received substantial recent attention. For these models to be maximally useful, it is necessary to understand how they perform on various subgroups, especially those defined in terms of protected characteristics. In this paper we study the relationship between user demographics -- focusing on gender -- and depression. Considering a population of Reddit users with known genders and depression statuses, we analyze the degree to which depression predictions are subject to biases along gender lines using domain-informed classifiers. We then study our models' parameters to gain qualitative insight into the differences in posting behavior across genders.}, author = {Eli Sherman and Keith Harrigian and Carlos Aguirre and Mark Dredze}, booktitle = {NAACL Workshop on Computational Linguistics and Clinical Psychology (CLPsych)}, date-added = {2021-04-19 10:25:25 -0400}, date-modified = {2021-04-19 10:28:23 -0400}, file = {https://aclanthology.org/2021.clpsych-1.23.pdf}, keywords = {workshop}, title = {Towards Understanding the Role of Demographics in Deploying Social Media-Based Mental Health Surveillance Models}, year = {2021} } Spurred by advances in machine learning and natural language processing, developing social media-based mental health surveillance models has received substantial recent attention. For these models to be maximally useful, it is necessary to understand how they perform on various subgroups, especially those defined in terms of protected characteristics. 
In this paper we study the relationship between user demographics -- focusing on gender -- and depression. Considering a population of Reddit users with known genders and depression statuses, we analyze the degree to which depression predictions are subject to biases along gender lines using domain-informed classifiers. We then study our models' parameters to gain qualitative insight into the differences in posting behavior across genders.

		Zach Wood-Doughty, Paiheng Xu, Xiao Liu, Mark Dredze. Using Noisy Self-Reports to Predict Twitter User Demographics. NAACL Workshop on Natural Language Processing for Social Media (SocialNLP), 2021. [PDF] [Bibtex] [Close] @inproceedings{Wood-Doughty:2021wr, abstract = {Computational social science studies often contextualize content analysis within standard demographics. Since demographics are unavailable on many social media platforms (e.g. Twitter), numerous studies have inferred demographics automatically. Despite many studies presenting proof-of-concept inference of race and ethnicity, training of practical systems remains elusive since there are few annotated datasets. Existing datasets are small, inaccurate, or fail to cover the four most common racial and ethnic groups in the United States. We present a method to identify self-reports of race and ethnicity from Twitter profile descriptions. Despite the noise of automated supervision, our self-report datasets enable improvements in classification performance on gold standard self-report survey data. The result is a reproducible method for creating large-scale training resources for race and ethnicity.}, annote = {[<a href="https://www.cs.jhu.edu/~mdredze/demographics-training-data/"><span class="pub_link">Data</span></a>]}, author = {Zach Wood-Doughty and Paiheng Xu and Xiao Liu and Mark Dredze}, booktitle = {NAACL Workshop on Natural Language Processing for Social Media (SocialNLP)}, date-added = {2021-04-11 16:29:10 -0400}, date-modified = {2021-04-11 16:31:16 -0400}, file = {https://aclanthology.org/2021.socialnlp-1.11.pdf}, keywords = {workshop}, title = {Using Noisy Self-Reports to Predict Twitter User Demographics}, year = {2021} } [Data] Computational social science studies often contextualize content analysis within standard demographics. Since demographics are unavailable on many social media platforms (e.g. Twitter), numerous studies have inferred demographics automatically. 
Despite many studies presenting proof-of-concept inference of race and ethnicity, training of practical systems remains elusive since there are few annotated datasets. Existing datasets are small, inaccurate, or fail to cover the four most common racial and ethnic groups in the United States. We present a method to identify self-reports of race and ethnicity from Twitter profile descriptions. Despite the noise of automated supervision, our self-report datasets enable improvements in classification performance on gold standard self-report survey data. The result is a reproducible method for creating large-scale training resources for race and ethnicity.

		Aaron Mueller, Mark Dredze. Fine-tuning Encoders for Improved Monolingual and Zero-shot Polylingual Neural Topic Modeling. North American Chapter of the Association for Computational Linguistics (NAACL), 2021. [PDF] [Bibtex] [Close] @inproceedings{Mueller:2021tw, abstract = {Neural topic models can augment or replace bag-of-words inputs with the learned representations of deep pre-trained transformer-based word prediction models. One added benefit when using representations from multilingual models is that they facilitate zero-shot polylingual topic modeling. However, while it has been widely observed that pre-trained embeddings should be fine-tuned to a given task, it is not immediately clear what supervision should look like for an unsupervised task such as topic modeling. Thus, we propose several methods for fine-tuning encoders to improve both monolingual and zero-shot polylingual neural topic modeling. We consider fine-tuning on auxiliary tasks, constructing a new topic classification task, integrating the topic classification objective directly into topic model training, and continued pre-training. 
We find that fine-tuning encoder representations on topic classification and integrating the topic classification task directly into topic modeling improves topic quality, and that fine-tuning encoder representations on any task is the most important factor for facilitating cross-lingual transfer.}, annote = {[<a href="https://github.com/aaronmueller/contextualized-topic-models"><span class="pub_link">Code</span></a>]}, author = {Aaron Mueller and Mark Dredze}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, date-added = {2021-03-11 21:02:27 -0500}, date-modified = {2021-03-11 21:02:46 -0500}, file = {https://arxiv.org/abs/2104.05064}, title = {Fine-tuning Encoders for Improved Monolingual and Zero-shot Polylingual Neural Topic Modeling}, year = {2021} } [Code] Neural topic models can augment or replace bag-of-words inputs with the learned representations of deep pre-trained transformer-based word prediction models. One added benefit when using representations from multilingual models is that they facilitate zero-shot polylingual topic modeling. However, while it has been widely observed that pre-trained embeddings should be fine-tuned to a given task, it is not immediately clear what supervision should look like for an unsupervised task such as topic modeling. Thus, we propose several methods for fine-tuning encoders to improve both monolingual and zero-shot polylingual neural topic modeling. We consider fine-tuning on auxiliary tasks, constructing a new topic classification task, integrating the topic classification objective directly into topic model training, and continued pre-training. We find that fine-tuning encoder representations on topic classification and integrating the topic classification task directly into topic modeling improves topic quality, and that fine-tuning encoder representations on any task is the most important factor for facilitating cross-lingual transfer.

		Xiaolei Huang, Michael J Paul, Franck Dernoncourt, Robin Burke, Mark Dredze. User Factor Adaptation for User Embedding via Multitask Learning. EACL Workshop on Domain Adaptation for NLP (Adapt-NLP), 2021. @inproceedings{Huang:2021aa, abstract = {Language varies across users and their interested fields in social media data: words authored by a user across his/her interests may have different meanings (e.g., cool) or sentiments (e.g., fast). However, most of the existing methods to train user embeddings ignore the variations across user interests, such as product and movie categories (e.g., drama vs. action). In this study, we treat the user interest as domains and empirically examine how the user language can vary across the user factor in three English social media datasets. We then propose a user embedding model to account for the language variability of user interests via a multitask learning framework. The model learns user language and its variations without human supervision. While existing work mainly evaluated the user embedding by extrinsic tasks, we propose an intrinsic evaluation via clustering and evaluate user embeddings by an extrinsic task, text classification. The experiments on the three English-language social media datasets show that our proposed approach can generally outperform baselines via adapting the user factor.}, author = {Xiaolei Huang and Michael J. Paul and Franck Dernoncourt and Robin Burke and Mark Dredze}, booktitle = {EACL Workshop on Domain Adaptation for NLP (Adapt-NLP)}, date-added = {2021-02-22 13:57:07 -0500}, date-modified = {2021-02-22 13:58:17 -0500}, file = {https://arxiv.org/abs/2102.11103}, keywords = {workshop}, title = {User Factor Adaptation for User Embedding via Multitask Learning}, year = {2021} }

		John W Ayers, Adam Poliak, Derek C Johnson, Eric C Leas, Mark Dredze, Theodore Caputi, Alicia L Nobles. Suicide-Related Internet Searches During the Early Stages of the COVID-19 Pandemic in the US. JAMA Network Open, 2021;4(1):e2034261. @article{Ayers:2021aa, abstract = {Experts anticipate that the societal fallout associated with the coronavirus disease 2019 (COVID-19) pandemic will increase suicidal behavior, and strategies to address this anticipated increase have been woven into policy decision-making without contemporaneous data. For instance, President Trump cited increased suicides as an argument against COVID-19 control measures during the first presidential debate on September 29, 2020. Given the time delays inherent in traditional population mental health surveillance, it is important for decision-makers to seek other contemporaneous data to evaluate potential associations. To assess the value that free and public internet search query trends can provide to rapidly identify associations, we monitored suicide-related internet search rates during the early stages of the COVID-19 pandemic in the US.}, author = {John W. Ayers and Adam Poliak and Derek C. Johnson and Eric C. Leas and Mark Dredze and Theodore Caputi and Alicia L.
Nobles}, date-added = {2021-01-26 11:23:03 -0500}, date-modified = {2021-01-26 11:24:18 -0500}, file = {https://doi.org/10.1001/jamanetworkopen.2020.34261}, journal = {JAMA Network Open}, number = {1}, pages = {e2034261}, title = {Suicide-Related Internet Searches During the Early Stages of the COVID-19 Pandemic in the US}, volume = {4}, year = {2021} }

		Carlos Aguirre, Keith Harrigian, Mark Dredze. Gender and Racial Fairness in Depression Research using Social Media. Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021. @inproceedings{Aguirre:2021aa, abstract = {Multiple studies have demonstrated that behaviors expressed on online social media platforms can indicate the mental health state of an individual. The widespread availability of such data has spurred interest in mental health research, using several datasets where individuals are labeled with mental health conditions. While previous research has raised concerns about possible biases in models produced from this data, no study has investigated how these biases manifest themselves with regards to demographic groups in data, such as gender and racial/ethnic groups. Here, we analyze the fairness of depression classifiers trained on Twitter data with respect to gender and racial demographic groups. We find that model performance differs for underrepresented groups, and we investigate sources of these biases beyond data representation. Our study results in recommendations on how to avoid these biases in future research.}, author = {Carlos Aguirre and Keith Harrigian and Mark Dredze}, booktitle = {Conference of the European Chapter of the Association for Computational Linguistics (EACL)}, date-added = {2021-01-11 21:45:34 -0500}, date-modified = {2021-01-11 21:45:48 -0500}, file = {https://aclanthology.org/2021.eacl-main.256/}, title = {Gender and Racial Fairness in Depression Research using Social Media}, year = {2021} }

		Aaron Mueller, Zach Wood-Doughty, Silvio Amir, Mark Dredze, Alicia L Nobles. Demographic Representation and Collective Storytelling in the Me Too Twitter Hashtag Activism Movement. Computer-supported Cooperative Work (CSCW), 2021. @inproceedings{Mueller:2021, abstract = {The #MeToo movement on Twitter has drawn attention to the pervasive nature of sexual harassment and violence. While #MeToo has been praised for providing support for self-disclosures of harassment or violence and shifting societal response, it has also been criticized for exemplifying how women of color have been discounted for their historical contributions to and excluded from feminist movements. Through an analysis of over 600,000 tweets from over 256,000 unique users, we examine online #MeToo conversations across gender and racial/ethnic identities and the topics that each demographic emphasized. We found that tweets authored by white women were overrepresented in the movement compared to other demographics, aligning with criticism of unequal representation. We found that intersected identities contributed differing narratives to frame the movement, co-opted the movement to raise visibility in parallel ongoing movements, employed the same hashtags both critically and supportively, and revived and created new hashtags in response to pivotal moments. Notably, tweets authored by black women often expressed emotional support and were critical about differential treatment in the justice system and by police. In comparison, tweets authored by white women and men often highlighted sexual harassment and violence by public figures and weaved in more general political discussions. We discuss the implications of this work for digital activism research and design, including suggestions to raise visibility by those who were under-represented in this hashtag activism movement.
Content warning: this article discusses issues of sexual harassment and violence.}, author = {Aaron Mueller and Zach Wood-Doughty and Silvio Amir and Mark Dredze and Alicia L Nobles}, booktitle = {Computer-supported Cooperative Work (CSCW)}, file = {https://dl.acm.org/doi/10.1145/3449181}, title = {Demographic Representation and Collective Storytelling in the Me Too Twitter Hashtag Activism Movement}, year = {2021} }

		2020 (28 Publications)
		Caitlin Weiger, Katherine C Smith, Joanna E Cohen, Mark Dredze, Meghan Bridgid Moran. How Internet Contracts Impact Research: Content Analysis of Terms of Service on Consumer Product Websites. Journal of Medical Internet Research Public Health Surveillance, 2020;6(4):e23579. @article{Weiger:2020um, abstract = {Background: Companies use brand websites as a promotional tool to engage consumers on the web, which can increase product use. Given that some products are harmful to the health of consumers, it is important for marketing associated with these products to be subject to public health surveillance. However, terms of service (TOS) governing the use of brand website content may impede such important research. Objective: The aim of this study is to explore the TOS for brand websites with public health significance to assess possible legal and ethical challenges for conducting research on consumer product websites. Methods: Using Statista, we purposefully constructed a sample of 15 leading American tobacco, alcohol, psychiatric pharmaceutical, fast-food, and gun brands that have associated websites. We developed and implemented a structured coding system for the TOS on these websites and coded for the presence versus absence of different types of restriction that might impact the ability to conduct research. Results: All TOS stated that by accessing the website, users agreed to abide by the TOS (15/15, 100%). A total of 11 out of 15 (73%) websites had age restrictions in their TOS. All alcohol brand websites (5/15, 33%) required users to enter their age or date of birth before viewing website content. Both websites for tobacco brands (2/15, 13%) further required that users register and verify their age and identity to access any website content and agree that they use tobacco products.
Only one website (1/15, 7%) allowed users to display, download, copy, distribute, and translate the website content as long as it was for personal and not commercial use. A total of 33% (5/15) of TOS unconditionally prohibited or put substantial restrictions on all of these activities and/or failed to specify if they were allowed or prohibited. Moreover, 87% (13/15) of TOS indicated that website access could be restricted at any time. A total of 73% (11/15) of websites specified that violating TOS could result in deleting user content from the website, revoking access by having the user's Internet Protocol address blocked, terminating log-in credentials, or enforcing legal action resulting in civil or criminal penalties. Conclusions: TOS create complications for public health surveillance related to e-marketing on brand websites. Recent court opinions have reduced the risk of federal criminal charges for violating TOS on public websites, but this risk remains unclear for private websites. The public health community needs to establish standards to guide and protect researchers from the possibility of legal repercussions related to such efforts.}, author = {Caitlin Weiger and Katherine C Smith and Joanna E Cohen and Mark Dredze and Meghan Bridgid Moran}, date-added = {2020-12-02 11:08:20 -0500}, date-modified = {2020-12-02 11:11:22 -0500}, file = {https://publichealth.jmir.org/2020/4/e23579/}, journal = {Journal of Medical Internet Research Public Health Surveillance}, number = {4}, pages = {e23579}, title = {How Internet Contracts Impact Research: Content Analysis of Terms of Service on Consumer Product Websites}, volume = {6}, year = {2020} }

		Rachel Dorn, Alicia L Nobles, Masoud Rouhizadeh, Mark Dredze. Examining the Feasibility of Off-the-Shelf Algorithms for Masking Directly Identifiable Information in Social Media Data. arXiv:2011.08324, 2020. @inproceedings{Dorn:2020bs, abstract = {The identification and removal/replacement of protected information from social media data is an understudied problem, despite being desirable from an ethical and legal perspective. This paper identifies types of potentially directly identifiable information (inspired by protected health information in clinical texts) contained in tweets that may be readily removed using off-the-shelf algorithms, introduces an English dataset of tweets annotated for identifiable information, and compiles these off-the-shelf algorithms into a tool (Nightjar) to evaluate the feasibility of using Nightjar to remove directly identifiable information from the tweets. Nightjar as well as the annotated data can be retrieved from this https URL.}, author = {Rachel Dorn and Alicia L. Nobles and Masoud Rouhizadeh and Mark Dredze}, booktitle = {arXiv:2011.08324}, date-added = {2020-11-18 17:20:18 -0500}, date-modified = {2020-11-18 17:35:06 -0500}, file = {https://arxiv.org/abs/2011.08324}, keywords = {unpublished}, title = {Examining the Feasibility of Off-the-Shelf Algorithms for Masking Directly Identifiable Information in Social Media Data}, year = {2020} }

		Zach Wood-Doughty, Ilya Shpitser, Mark Dredze. Sensitivity Analyses for Incorporating Machine Learning Predictions into Causal Estimates. NeurIPS Workshop on Causal Discovery & Causality-Inspired Machine Learning, 2020. @inproceedings{Wood-Doughty:2020ta, abstract = {Causal inference methods can yield insights into causation from observational datasets. When some necessary variables are unavailable for a causal analysis, machine learning systems may be able to infer those variables based on unstructured data such as images and text. However, if these inferred variables are to be incorporated into causal analyses, the error rate of the underlying classifier should affect the uncertainty in the causal conclusions. Past work has framed classifier accuracy as measurement error to incorporate predictions into consistently-estimated causal effects. However, this estimator is sensitive to small errors in its estimation of the classifier's error rate, leading to erratic outputs that are uninterpretable for any given analysis. In this paper we introduce three sensitivity analyses that capture the uncertainty from using a machine learning classifier in causal estimation, and show that our methods enable more robust analyses.}, author = {Zach Wood-Doughty and Ilya Shpitser and Mark Dredze}, booktitle = {NeurIPS Workshop on Causal Discovery & Causality-Inspired Machine Learning}, date-added = {2020-11-01 01:14:32 -0400}, date-modified = {2020-11-01 01:18:22 -0400}, file = {https://www.cmu.edu/dietrich/causality/CameraReadys-accepted%20papers/14%5CCameraReady%5Cpaper.pdf}, keywords = {workshop}, title = {Sensitivity Analyses for Incorporating Machine Learning Predictions into Causal Estimates}, year = {2020} }

		Paiheng Xu, David Broniatowski, Mark Dredze. Twitter Detects Who is Social Distancing During COVID-19. NeurIPS Workshop on Machine Learning in Public Health, 2020. @inproceedings{Xu:2020fx, author = {Paiheng Xu and David Broniatowski and Mark Dredze}, booktitle = {NeurIPS Workshop on Machine Learning in Public Health}, date-added = {2020-10-26 21:42:31 -0400}, date-modified = {2020-10-26 21:43:17 -0400}, keywords = {workshop}, title = {Twitter Detects Who is Social Distancing During COVID-19}, year = {2020} }

		Eric C Leas, Alicia L Nobles, Theodore L Caputi, Mark Dredze, Shu-Hong Zhu, Joanna E Cohen, John W Ayers. News coverage of the E-cigarette, or Vaping, product use Associated Lung Injury (EVALI) outbreak and internet searches for vaping cessation. Tobacco Control, 2020. @article{Leas:2010uq, abstract = {Background In the latter half of 2019, an outbreak of pulmonary disease in the USA resulted in 2807 hospitalisations and 68 deaths, as of 18 February 2020. Given the severity of the outbreak, we assessed whether articles during the outbreak era more frequently warned about the dangers of vaping and whether internet searches for vaping cessation increased. Methods Using Tobacco Watcher, a media monitoring platform that automatically identifies and categorises news articles from sources across the globe, we obtained all articles that (a) discussed the outbreak and (b) primarily warned about the dangers of vaping. We obtained internet search trends originating from the USA that mentioned `quit' or `stop' and `e cig(s),' `ecig(s),' `e-cig(s),' `e cigarette(s),' `e-cigarette(s),' `electronic cigarette(s),' `vape(s),' `vaping' or `vaper(s)' from Google Trends (eg, `how do I quit vaping?'). All data were obtained from 1 January 2014 to 18 February 2020 and ARIMA models were used with historical trends to forecast the ratio of observed to expected search volumes during the outbreak era. Results News of the vaping-induced pulmonary disease outbreak was first reported on 25 July 2019 with 195 articles, culminating in 44 512 articles by 18 February 2020. On average, news articles warning about the dangers of vaping were 130% (95% prediction interval (PI): −15 to 417) and searches for vaping cessation were 76% (95% PI: 28 to 182) higher than expected levels for the days during the period when the sources of the outbreak were unknown (25 July to 27 September 2019).
News and searches stabilised just after the US Centers for Disease Control and Prevention reported that a primary source of the outbreak was an additive used in marijuana vapes on 27 September 2019. In sum, there were 12 286 articles archived in Tobacco Watcher primarily warning about the dangers of vaping and 1 025 000 cessation searches following the outbreak. Conclusion The vaping-induced pulmonary disease outbreak spawned increased coverage about the dangers of vaping and internet searches for vaping cessation. Resources and strategies that respond to this elevated interest should become a priority among public health leaders.}, author = {Eric C Leas and Alicia L Nobles and Theodore L Caputi and Mark Dredze and Shu-Hong Zhu and Joanna E Cohen and John W Ayers}, date-added = {2020-10-22 14:53:55 -0400}, date-modified = {2020-10-22 14:57:07 -0400}, file = {http://doi.org/10.1136/tobaccocontrol-2020-055755}, journal = {Tobacco Control}, month = {October}, title = {News coverage of the E-cigarette, or Vaping, product use Associated Lung Injury (EVALI) outbreak and internet searches for vaping cessation}, year = {2020} }

		John W Ayers, Benjamin M Althouse, Adam Poliak, Eric C Leas, Alicia L Nobles, Mark Dredze, Davey Smith. Quantifying Public Interest in Police Reforms by Mining Internet Search Data Following George Floyd's Death. Journal of Medical Internet Research (JMIR), 2020;22(10):e22574. @article{Ayers:2020sj, abstract = {Background: The death of George Floyd while in police custody has resurfaced serious questions about police conduct that result in the deaths of unarmed persons. Objective: Data-driven strategies that identify and prioritize the public's needs may engender a public health response to improve policing. We assessed how internet searches indicative of interest in police reform changed after Mr Floyd's death. Methods: We monitored daily Google searches (per 10 million total searches) that included the terms ``police'' and ``reform(s)'' (eg, ``reform the police,'' ``best police reforms,'' etc) originating from the United States between January 1, 2010, through July 5, 2020. We also monitored searches containing the term ``police'' with ``training,'' ``union(s),'' ``militarization,'' or ``immunity'' as markers of interest in the corresponding reform topics. Results: The 41 days following Mr Floyd's death corresponded with the greatest number of police ``reform(s)'' searches ever recorded, with 1,350,000 total searches nationally. Searches increased significantly in all 50 states and Washington DC. By reform topic, nationally there were 1,220,000 total searches for ``police'' and ``union(s)''; 820,000 for ``training''; 360,000 for ``immunity''; and 72,000 for ``militarization.'' In terms of searches for all policy topics by state, 33 states searched the most for ``training,'' 16 for ``union(s),'' and 2 for ``immunity.'' States typically in the southeast had fewer queries related to any police reform topic than other states.
States that had a greater percentage of votes for President Donald Trump during the 2016 election searched more often for police ``union(s)'' while states favoring Secretary Hillary Clinton searched more for police ``training.'' Conclusions: The United States is at a historical juncture, with record interest in topics related to police reform with variability in search terms across states. Policy makers can respond to searches by considering the policies their constituencies are searching for online, notably police training and unions. Public health leaders can respond by engaging in the subject of policing and advocating for evidence-based policy reforms.}, annote = {(<b>Ranked in the top 3% of 16m research outputs by <a href="https://www.altmetric.com/details/92800438?src=bookmarklet#score"><span class="pub_link">Altmetric</span></a></b>)}, author = {John W Ayers and Benjamin M Althouse and Adam Poliak and Eric C Leas and Alicia L Nobles and Mark Dredze and Davey Smith}, date-added = {2020-10-21 02:13:41 -0400}, date-modified = {2020-10-21 15:02:12 -0400}, file = {http://www.jmir.org/2020/10/e22574/}, journal = {Journal of Medical Internet Research (JMIR)}, month = {October}, number = {10}, pages = {e22574}, title = {Quantifying Public Interest in Police Reforms by Mining Internet Search Data Following George Floyd's Death}, volume = {22}, year = {2020} }

		Eric C Leas, Erik M Hendrickson, Alicia L Nobles, Rory Todd, Davey M Smith, Mark Dredze, John W Ayers. Self-reported Cannabidiol (CBD) Use for Conditions With Proven Therapies. JAMA Network Open, 2020;3(10):e2020977. @article{Leas:2020ss, abstract = {Question Is the public using cannabidiol (CBD) to treat diagnosable conditions that have evidence-based therapies? Findings In this case series of 376 posts on a CBD forum on Reddit, most users reported taking CBD as a therapeutic for diagnosable conditions, including mental health, cardiological, dermatological, gastroenterological, ophthalmological, oral health, and sexual health conditions, many of which have other evidence-based treatment regimens. Meaning The findings suggest a need for interventions that address the use of CBD for unproven applications, including regulating therapeutic claims about CBD and redirecting patients to proven therapies in lieu of CBD.}, annote = {(<b>Ranked in the top 1% of 16m research outputs by <a href="https://jamanetwork.altmetric.com/details/92448837#score"><span class="pub_link">Altmetric</span></a></b>)}, author = {Eric C. Leas and Erik M. Hendrickson and Alicia L. Nobles and Rory Todd and Davey M. Smith and Mark Dredze and John W. Ayers}, date-added = {2020-10-16 16:32:48 -0400}, date-modified = {2020-10-16 16:34:04 -0400}, file = {https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2771735}, journal = {JAMA Network Open}, number = {10}, pages = {e2020977}, title = {Self-reported Cannabidiol (CBD) Use for Conditions With Proven Therapies}, volume = {3}, year = {2020} }

		Paiheng Xu, Mark Dredze, David A Broniatowski. The Twitter Social Mobility Index: Measuring Social Distancing Practices from Geolocated Tweets. Journal of Medical Internet Research (JMIR), 2020. @article{Xu:2020vl, abstract = {Background: Social distancing is an important component of the response to the novel Coronavirus (COVID-19) pandemic. Minimizing social interactions and travel reduces the rate at which the infection spreads, and "flattens the curve" such that the medical system can better treat infected individuals. However, it remains unclear how the public will respond to these policies as the pandemic continues. Objective: We present the Twitter Social Mobility Index, a measure of social distancing and travel derived from Twitter data. We use public geolocated Twitter data to measure how much a user travels in a given week. Methods: We collect 469,669,925 geotagged tweets from January 1, 2019 to April 27, 2020 in the United States. We analyze the aggregated mobility variance of a total of 3,768,959 Twitter users at the city and state level since the start of the COVID-19 pandemic. Results: We find a large reduction in travel in the United States after the implementation of social distancing policies, with larger reductions in states that were early adopters and smaller changes in states without policies. Our findings are presented on http://socialmobility.covid19dataresources.org and we will continue to update our analysis during the pandemic. Conclusions: Geolocated tweets are an effective way to track social distancing practices from a public resource, suggesting that it can be used as part of ongoing pandemic response planning.}, author = {Paiheng Xu and Mark Dredze and David A Broniatowski}, date-added = {2020-10-11 20:53:18 -0400}, date-modified = {2020-10-22 11:05:36 -0400}, file = {https://doi.org/10.2196/21499}, journal = {Journal of Medical Internet Research (JMIR)}, title = {The Twitter Social Mobility Index: Measuring Social Distancing Practices from Geolocated Tweets}, year = {2020} }

		Amelia Jamison, David A Broniatowski, Michael C Smith, Kajal S Parikh, Adeena Malik, Mark Dredze, Sandra C Quinn. Adapting and Extending a Typology to Identify Vaccine Misinformation on Twitter. American Journal of Public Health (AJPH), 2020;110(S3):S331--S339. @article{Jamison:2020il, abstract = {Objectives. To adapt and extend an existing typology of vaccine misinformation to classify the major topics of discussion across the total vaccine discourse on Twitter. Methods. Using 1.8 million vaccine-relevant tweets compiled from 2014 to 2017, we adapted an existing typology to Twitter data, first in a manual content analysis and then using latent Dirichlet allocation (LDA) topic modeling to extract 100 topics from the data set. Results. Manual annotation identified 22% of the data set as antivaccine, of which safety concerns and conspiracies were the most common themes. Seventeen percent of content was identified as provaccine, with roughly equal proportions of vaccine promotion, criticizing antivaccine beliefs, and vaccine safety and effectiveness. Of the 100 LDA topics, 48 contained provaccine sentiment and 28 contained antivaccine sentiment, with 9 containing both. Conclusions. Our updated typology successfully combines manual annotation with machine-learning methods to estimate the distribution of vaccine arguments, with greater detail on the most distinctive topics of discussion. With this information, communication efforts can be developed to better promote vaccines and avoid amplifying antivaccine rhetoric on Twitter.}, author = {Amelia Jamison and David A. Broniatowski and Michael C. Smith and Kajal S. Parikh and Adeena Malik and Mark Dredze and Sandra C. Quinn}, date-added = {2020-10-02 00:02:55 -0400}, date-modified = {2020-10-02 00:03:53 -0400}, file = {https://doi.org/10.2105/AJPH.2020.305940}, journal = {American Journal of Public Health (AJPH)}, month = {October}, number = {S3}, pages = {S331--S339}, title = {Adapting and Extending a Typology to Identify Vaccine Misinformation on Twitter}, volume = {110}, year = {2020} }

		David A Broniatowski, Amelia M Jamison, Neil F Johnson, Nicolás Velasquez, Rhys Leahy, Nicholas Johnson Restrepo, Mark Dredze, Sandra C Quinn. Facebook Pages, the ``Disneyland'' Measles Outbreak, and Promotion of Vaccine Refusal as a Civil Right, 2009--2019. American Journal of Public Health (AJPH), 2020;110(S3):S312--S318. @article{Broniatowski:2020bv, abstract = {Objectives. To understand changes in how Facebook pages frame vaccine opposition. Methods. We categorized 204 Facebook pages expressing vaccine opposition, extracting public posts through November 20, 2019. We analyzed posts from October 2009 through October 2019 to examine if pages' content was coalescing. Results. Activity in pages promoting vaccine choice as a civil liberty increased in January 2015, April 2016, and January 2019 (t[76] = 11.33 [P < .001]; t[46] = 7.88 [P < .001]; and t[41] = 17.27 [P < .001], respectively). The 2019 increase was strongest in pages mentioning US states (t[41] = 19.06; P < .001). Discussion about vaccine safety decreased (rs[119] = −0.61; P < .001) while discussion about civil liberties increased (rs[119] = 0.33; P < .001). Page categories increasingly resembled one another (civil liberties: rs[119] = −0.50 [P < .001]; alternative medicine: rs[84] = −0.77 [P < .001]; conspiracy theories: rs[119] = −0.46 [P < .001]; morality: rs[106] = −0.65 [P < .001]; safety and efficacy: rs[119] = −0.46 [P < .001]). Conclusions. The ``Disneyland'' measles outbreak drew vaccine opposition into the political mainstream, followed by promotional campaigns conducted in pages framing vaccine refusal as a civil right. Political mobilization in state-focused pages followed in 2019. Public Health Implications. Policymakers should expect increasing attempts to alter state legislation associated with vaccine exemptions, potentially accompanied by fiercer lobbying from specific celebrities.}, annote = {(<b>Ranked in the top 0.03% of 16m research outputs by <a href="https://apha.altmetric.com/details/91589606#score"><span class="pub_link">Altmetric</span></a></b>)}, author = {David A. Broniatowski and Amelia M. Jamison and Neil F. Johnson and Nicol{\'a}s Velasquez and Rhys Leahy and Nicholas Johnson Restrepo and Mark Dredze and Sandra C. Quinn}, date-added = {2020-10-02 00:01:03 -0400}, date-modified = {2020-10-02 00:02:33 -0400}, file = {https://doi.org/10.2105/AJPH.2020.305869}, journal = {American Journal of Public Health (AJPH)}, month = {October}, number = {S3}, pages = {S312--S318}, title = {Facebook Pages, the ``Disneyland'' Measles Outbreak, and Promotion of Vaccine Refusal as a Civil Right, 2009--2019}, volume = {110}, year = {2020} }

		Justin Sech, Alexandra DeLucia, Anna L Buczak, Mark Dredze. Civil Unrest on Twitter (CUT): A Dataset of Tweets to Support Research on Civil Unrest. EMNLP Workshop on Noisy User-generated Text (W-NUT), 2020. @inproceedings{Sech:2020tw, abstract = {We present CUT, a dataset for studying Civil Unrest on Twitter. Our dataset includes 4,381 tweets related to civil unrest, hand-annotated with information related to the study of civil unrest discussion and events. Our dataset is drawn from 42 countries from 2014 to 2019. We present baseline systems trained on this data for the identification of tweets related to civil unrest. We include a discussion of ethical issues related to research on this topic.}, annote = {[<a href="https://github.com/AADeLucia/JHU-CUT"><span class="pub_link">Data</span></a>]}, author = {Justin Sech and Alexandra DeLucia and Anna L Buczak and Mark Dredze}, booktitle = {EMNLP Workshop on Noisy User-generated Text (W-NUT)}, date-added = {2020-09-28 23:51:52 -0400}, date-modified = {2020-10-22 11:06:20 -0400}, file = {2020_emnlp_wnut_cut.pdf}, keywords = {workshop}, title = {Civil Unrest on Twitter (CUT): A Dataset of Tweets to Support Research on Civil Unrest}, year = {2020} }

		Shijie Wu, Mark Dredze. Do explicit alignments robustly improve multilingual encoders? Empirical Methods in Natural Language Processing (EMNLP), 2020. @inproceedings{Wu:2020zv, abstract = {Multilingual BERT (Devlin et al., 2019, mBERT), XLM-RoBERTa (Conneau et al., 2019, XLMR) and other unsupervised multilingual encoders can effectively learn crosslingual representation. Explicit alignment objectives based on bitexts like Europarl or MultiUN have been shown to further improve these representations. However, word-level alignments are often suboptimal and such bitexts are unavailable for many languages. In this paper, we propose a new contrastive alignment objective that can better utilize such signal, and examine whether these previous alignment methods can be adapted to noisier sources of aligned data: a randomly sampled 1 million pair subset of the OPUS collection. Additionally, rather than report results on a single dataset with a single model run, we report the mean and standard deviation of multiple runs with different seeds, on four datasets and tasks. Our more extensive analysis finds that, while our new objective outperforms previous work, overall these methods do not improve performance with a more robust evaluation framework. Furthermore, the gains from using a better underlying model eclipse any benefits from alignment training. These negative results dictate more care in evaluating these methods and suggest limitations in applying explicit alignment objectives.}, author = {Shijie Wu and Mark Dredze}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, date-added = {2020-09-15 16:31:10 -0400}, date-modified = {2020-10-22 11:11:24 -0400}, file = {2020_emnlp_alignments.pdf}, title = {Do explicit alignments robustly improve multilingual encoders?}, year = {2020} }

		Keith Harrigian, Carlos Aguirre, Mark Dredze. Do Models of Mental Health Based on Social Media Data Generalize? Findings of the Empirical Methods in Natural Language Processing (EMNLP), 2020. @inproceedings{Harrigian:2020uk, abstract = {Proxy-based methods for annotating mental health status in social media have grown popular in computational research due to their ability to gather large training samples. However, an emerging body of literature has raised new concerns regarding the validity of these types of methods for use in clinical applications. To further understand the robustness of distantly supervised mental health models, we explore the generalization ability of machine learning classifiers trained to detect depression in individuals across multiple social media platforms. Our experiments not only reveal that substantial loss occurs when transferring between platforms, but also that there exist several unreliable confounding factors that may enable researchers to overestimate classification performance. Based on these results, we enumerate recommendations for future mental health dataset construction.}, annote = {[<a href="https://github.com/kharrigian/emnlp-2020-mental-health-generalization"><span class="pub_link">Code</span></a>]}, author = {Keith Harrigian and Carlos Aguirre and Mark Dredze}, booktitle = {Findings of the Empirical Methods in Natural Language Processing (EMNLP)}, date-added = {2020-09-15 16:29:34 -0400}, date-modified = {2020-10-22 11:11:37 -0400}, file = {2020_emnlp_mental_health_domain_transfer.pdf}, title = {Do Models of Mental Health Based on Social Media Data Generalize?}, year = {2020} }

		Amelia M Jamison, David A Broniatowski, Mark Dredze, Anu Sangraula, Michael C Smith, Sandra C Quinn. Not just conspiracy theories: vaccine opponents and proponents add to the COVID-19 `infodemic' on Twitter. The Harvard Kennedy School (HKS) Misinformation Review, 2020. @article{Jamison:2020ew, abstract = {In February 2020, the World Health Organization announced an `infodemic' --- a deluge of both accurate and inaccurate health information --- that accompanied the global pandemic of COVID-19 as a major challenge to effective health communication. We assessed content from the most active vaccine accounts on Twitter to understand how existing online communities contributed to the `infodemic' during the early stages of the pandemic. While we expected vaccine opponents to share misleading information about COVID-19, we also found vaccine proponents were not immune to spreading less reliable claims. In both groups, the single largest topic of discussion consisted of narratives comparing COVID-19 to other diseases like seasonal influenza, often downplaying the severity of the novel coronavirus. When considering the scope of the `infodemic,' researchers and health communicators must move beyond focusing on known bad actors and the most egregious types of misinformation to scrutinize the full spectrum of information --- from both reliable and unreliable sources --- that the public is likely to encounter online.}, author = {Amelia M Jamison and David A Broniatowski and Mark Dredze and Anu Sangraula and Michael C Smith and Sandra C Quinn}, date-added = {2020-09-12 22:43:32 -0400}, date-modified = {2020-09-12 22:44:46 -0400}, file = {https://doi.org/10.37016/mr-2020-38}, journal = {The Harvard Kennedy School (HKS) Misinformation Review}, title = {Not just conspiracy theories: vaccine opponents and proponents add to the COVID-19 `infodemic' on Twitter}, year = {2020} }

		John W Ayers, Eric C Leas, Derek C Johnson, Adam Poliak, Benjamin M Althouse, Mark Dredze, Alicia L Nobles. Internet Searches for Acute Anxiety During the Early Stages of the COVID-19 Pandemic. JAMA Internal Medicine, 2020. @article{Ayers:2020fb, abstract = {There is widespread concern that the coronavirus disease 2019 (COVID-19) pandemic may harm population mental health, chiefly owing to anxiety about the disease and its societal fallout. But traditional population mental health surveillance (eg, telephone surveys, medical records) is time consuming, expensive, and may miss persons who do not participate or seek care. To evaluate the association of COVID-19 with anxiety on a population basis, we examined internet searches indicative of acute anxiety during the early stages of the COVID-19 pandemic.}, annote = {(<b>Ranked in the top 0.1% of 15m research outputs by <a href="https://jamanetwork.altmetric.com/details/88793728#score"><span class="pub_link">Altmetric</span></a></b>)}, author = {John W. Ayers and Eric C. Leas and Derek C. Johnson and Adam Poliak and Benjamin M. Althouse and Mark Dredze and Alicia L. Nobles}, date-added = {2020-08-25 17:08:59 -0400}, date-modified = {2020-08-25 17:10:12 -0400}, file = {https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2769543}, journal = {JAMA Internal Medicine}, title = {Internet Searches for Acute Anxiety During the Early Stages of the COVID-19 Pandemic}, year = {2020} }

		Alicia L Nobles, Eric C Leas, Mark Dredze, Christopher A Longhurst, Davey Smith, John W Ayers. Crowd-Diagnosis: When Patients Turn to Social Media to Obtain Clinical Diagnoses. Annual Symposium of the American Medical Informatics Association (AMIA), 2020. @inproceedings{Nobles:2020qc, abstract = {In a 2019 study published in the Journal of the American Medical Association, we coined the term ``crowd-diagnosis'' to describe when people turn to public social media to obtain a diagnosis. Using a case study of sexually transmitted infections, we found thousands requesting crowd-diagnoses, commonly posting pictures to aid in diagnosis and sometimes seeking diagnoses to overrule a doctor's diagnosis. Our goal is to extend this work to a more general setting by focusing on a popular social media forum dedicated to obtaining feedback on medical conditions and answering (RQ1) who requests crowd-diagnoses, (RQ2) for what health issues are crowd-diagnoses more frequently sought, and (RQ3) what crowd-diagnosis requests are most likely to receive a response?}, author = {Alicia L. Nobles and Eric C. Leas and Mark Dredze and Christopher A. Longhurst and Davey Smith and John W. Ayers}, booktitle = {Annual Symposium of the American Medical Informatics Association (AMIA)}, date-added = {2020-07-16 14:09:22 -0400}, date-modified = {2020-10-22 11:11:51 -0400}, file = {2020_amia_crowd_diagnosis.pdf}, keywords = {abstract}, title = {Crowd-Diagnosis: When Patients Turn to Social Media to Obtain Clinical Diagnoses}, year = {2020} }

		Alicia L Nobles, Eric C Leas, Seth Noar, Mark Dredze, Carl A Latkin, Steffanie A Strathdee, John W Ayers. Automated image analysis of instagram posts: Implications for risk perception and communication in public health using a case study of #HIV. PLoS One, 2020. @article{Nobles:2020lo, abstract = {People's perceptions about health risks, including their risk of acquiring HIV, are impacted in part by who they see portrayed as at risk in the media. Viewers in these cases are asking themselves ``do those portrayed as at risk look like me?'' An accurate perception of risk is critical for high-risk populations, who already suffer from a range of health disparities. Yet, to date no study has evaluated the demographic representation of health-related content from social media. The objective of this case study was to apply automated image recognition software to examine the demographic profile of faces in Instagram posts containing the hashtag #HIV (obtained from January 2017 through July 2018) and compare this to the demographic breakdown of those most at risk of a new HIV diagnosis (estimates of incidence of new HIV diagnoses from the 2017 US Centers for Disease Control HIV Surveillance Report). We discovered 26,766 Instagram posts containing #HIV authored in American English with 10,036 (37.5%) containing a detectable human face with a total of 18,227 faces (mean = 1.8, standard deviation [SD] = 1.7). Faces skewed older (47% vs. 11% were 35--39 years old), more female (41% vs. 19%), more white (43% vs. 26%), less black (31% vs 44%), and less Hispanic (13% vs 25%) on Instagram than for new HIV diagnoses. The results were similarly skewed among the subset of #HIV posts mentioning pre-exposure prophylaxis (PrEP). This disparity might lead Instagram users to potentially misjudge their own HIV risk and delay prophylactic behaviors. Social media managers and organic advocates should be encouraged to share images that better reflect at-risk populations so as not to further marginalize these populations and to reduce disparity in risk perception. Replication of our methods for additional diseases, such as cancer, is warranted to discover and address other misrepresentations.}, author = {Alicia L. Nobles and Eric C. Leas and Seth Noar and Mark Dredze and Carl A. Latkin and Steffanie A. Strathdee and John W. Ayers}, date-added = {2020-06-01 15:58:38 -0400}, date-modified = {2020-07-08 21:23:12 -0400}, file = {https://doi.org/10.1371/journal.pone.0231155}, journal = {PLoS One}, month = {May 4}, title = {Automated image analysis of instagram posts: Implications for risk perception and communication in public health using a case study of #HIV}, year = {2020} }

		Theodore L Caputi, John W Ayers, Mark Dredze, Nicholas Suplina, Sarah Burd-Sharps. Collateral Crises of Gun Preparation and the COVID-19 Pandemic: Infodemiology Study. JMIR Public Health and Surveillance, 2020;6(2):e19369. [PDF] [Bibtex] [Close] @article{Caputi:2020yt, abstract = {Background: In the past, national emergencies in the United States have resulted in increased gun preparation (ie, purchasing new guns or removing guns from storage); in turn, these gun actions have effected increases in firearm injuries and deaths. Objective: The aim of this paper was to assess the extent to which interest in gun preparation has increased amid the coronavirus disease (COVID-19) pandemic using data from Google searches related to purchasing and cleaning guns. Methods: We fit an Autoregressive Integrated Moving Average (ARIMA) model over Google search data from January 2004 up to the week that US President Donald Trump declared COVID-19 a national emergency. We used this model to forecast Google search volumes, creating a counterfactual of the number of gun preparation searches we would expect if the COVID-19 pandemic had not occurred, and reported observed deviations from this counterfactual. Results: Google searches related to preparing guns have surged to unprecedented levels, approximately 40% higher than previously reported spikes following the Sandy Hook, CT and Parkland, FL shootings and 158% (95% CI 73-270) greater than would be expected if the COVID-19 pandemic had not occurred. In absolute terms, approximately 2.1 million searches related to gun preparation were performed over just 34 days. States severely affected by COVID-19 appear to have some of the greatest increases in the number of searches. Conclusions: Our results corroborate media reports that gun purchases are increasing amid the COVID-19 pandemic and provide more precise geographic and temporal trends. 
Policy makers should invest in disseminating evidence-based educational tools about gun risks and safety procedures to avert a collateral public health crisis.}, author = {Theodore L Caputi and John W Ayers and Mark Dredze and Nicholas Suplina and Sarah Burd-Sharps}, date-added = {2020-05-28 13:06:40 -0400}, date-modified = {2020-10-22 11:08:47 -0400}, file = {https://doi.org/10.2196/19369}, journal = {JMIR Public Health and Surveillance}, month = {Apr-Jun}, number = {2}, pages = {e19369}, title = {Collateral Crises of Gun Preparation and the COVID-19 Pandemic: Infodemiology Study}, volume = {6}, year = {2020} } Background: In the past, national emergencies in the United States have resulted in increased gun preparation (ie, purchasing new guns or removing guns from storage); in turn, these gun actions have effected increases in firearm injuries and deaths. Objective: The aim of this paper was to assess the extent to which interest in gun preparation has increased amid the coronavirus disease (COVID-19) pandemic using data from Google searches related to purchasing and cleaning guns. Methods: We fit an Autoregressive Integrated Moving Average (ARIMA) model over Google search data from January 2004 up to the week that US President Donald Trump declared COVID-19 a national emergency. We used this model to forecast Google search volumes, creating a counterfactual of the number of gun preparation searches we would expect if the COVID-19 pandemic had not occurred, and reported observed deviations from this counterfactual. Results: Google searches related to preparing guns have surged to unprecedented levels, approximately 40% higher than previously reported spikes following the Sandy Hook, CT and Parkland, FL shootings and 158% (95% CI 73-270) greater than would be expected if the COVID-19 pandemic had not occurred. In absolute terms, approximately 2.1 million searches related to gun preparation were performed over just 34 days. 
States severely affected by COVID-19 appear to have some of the greatest increases in the number of searches. Conclusions: Our results corroborate media reports that gun purchases are increasing amid the COVID-19 pandemic and provide more precise geographic and temporal trends. Policy makers should invest in disseminating evidence-based educational tools about gun risks and safety procedures to avert a collateral public health crisis.

		Shijie Wu, Mark Dredze. Are All Languages Created Equal in Multilingual BERT? ACL Workshop on Representation Learning for NLP (RepL4NLP), 2020. [PDF] [Bibtex] [Close] @inproceedings{Wu:2020sy, abstract = {Multilingual BERT (mBERT) trained on 104 languages has shown surprisingly good cross-lingual performance on several NLP tasks, even without explicit cross-lingual signals. However, these evaluations have focused on cross-lingual transfer with high-resource languages, covering only a third of the languages covered by mBERT. We explore how mBERT performs on a much wider set of languages, focusing on the quality of representation for low-resource languages, measured by within-language performance. We consider three tasks: Named Entity Recognition (99 languages), Part-of-speech Tagging and Dependency Parsing (54 languages each). mBERT does better than or comparable to baselines on high resource languages but does much worse for low resource languages. Furthermore, monolingual BERT models for these languages do even worse. Paired with similar languages, the performance gap between monolingual BERT and mBERT can be narrowed. We find that better models for low resource languages require more efficient pretraining techniques or more data.}, annote = {(<b>Best Long Paper Award</b>)}, author = {Shijie Wu and Mark Dredze}, booktitle = {ACL Workshop on Representation Learning for NLP (RepL4NLP)}, date-added = {2020-05-08 12:22:17 -0400}, date-modified = {2022-11-23 10:48:49 -0500}, file = {https://www.aclweb.org/anthology/2020.repl4nlp-1.16/}, keywords = {workshop}, title = {Are All Languages Created Equal in Multilingual BERT?}, year = {2020} } (Best Long Paper Award) Multilingual BERT (mBERT) trained on 104 languages has shown surprisingly good cross-lingual performance on several NLP tasks, even without explicit cross-lingual signals. 
However, these evaluations have focused on cross-lingual transfer with high-resource languages, covering only a third of the languages covered by mBERT. We explore how mBERT performs on a much wider set of languages, focusing on the quality of representation for low-resource languages, measured by within-language performance. We consider three tasks: Named Entity Recognition (99 languages), Part-of-speech Tagging and Dependency Parsing (54 languages each). mBERT does better than or comparable to baselines on high resource languages but does much worse for low resource languages. Furthermore, monolingual BERT models for these languages do even worse. Paired with similar languages, the performance gap between monolingual BERT and mBERT can be narrowed. We find that better models for low resource languages require more efficient pretraining techniques or more data.

		Michael Liu, Theodore L Caputi, Mark Dredze, Aaron S Kesselheim, John W Ayers. Internet Searches for Unproven COVID-19 Therapies in the United States. JAMA Internal Medicine, 2020. [PDF] [Bibtex] [Close] @article{Liu:2020bx, abstract = {There are no highly effective prescription drug therapies supported by any reliable evidence for the ongoing coronavirus disease 2019 (COVID-19) pandemic of severe acute respiratory syndrome coronavirus 2. However, fears among the public can lead to searches for unproven therapies. Therefore, when several high-profile figures, including entrepreneur Elon Musk and President Donald Trump, endorsed the use of chloroquine, a malarial prophylaxis drug, and hydroxychloroquine (with the antibiotic azithromycin), a lupus and rheumatoid arthritis treatment, to treat COVID-19, it drew massive public attention that could shape individual decision-making. This attention is especially troublesome because chloroquine and hydroxychloroquine (1) are thus far only known to inhibit severe acute respiratory syndrome coronavirus 2 in vitro, (2) have potential cardiovascular toxic effects, and (3) can be confused with commercially available chloroquine-containing products, such as aquarium cleaner. Poisonings, including 1 fatality, attributed to persons taking chloroquine to prevent or treat COVID-19 without the supervision of a licensed physician have already been reported. To better understand the scope of demand for these drugs, we examined internet searches indicative of shopping for chloroquine and hydroxychloroquine.}, annote = {(<b>Ranked in the top 0.2% of 15m research outputs by <a href="https://jamanetwork.altmetric.com/details/80772312#score"><span class="pub_link">Altmetric</span></a></b>)}, author = {Michael Liu and Theodore L. Caputi and Mark Dredze and Aaron S. Kesselheim and John W. 
Ayers}, date-added = {2020-04-29 17:03:18 -0400}, date-modified = {2020-07-08 21:23:47 -0400}, file = {https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2765361}, journal = {JAMA Internal Medicine}, title = {Internet Searches for Unproven COVID-19 Therapies in the United States}, year = {2020} } (Ranked in the top 0.2% of 15m research outputs by Altmetric) There are no highly effective prescription drug therapies supported by any reliable evidence for the ongoing coronavirus disease 2019 (COVID-19) pandemic of severe acute respiratory syndrome coronavirus 2. However, fears among the public can lead to searches for unproven therapies. Therefore, when several high-profile figures, including entrepreneur Elon Musk and President Donald Trump, endorsed the use of chloroquine, a malarial prophylaxis drug, and hydroxychloroquine (with the antibiotic azithromycin), a lupus and rheumatoid arthritis treatment, to treat COVID-19, it drew massive public attention that could shape individual decision-making. This attention is especially troublesome because chloroquine and hydroxychloroquine (1) are thus far only known to inhibit severe acute respiratory syndrome coronavirus 2 in vitro, (2) have potential cardiovascular toxic effects, and (3) can be confused with commercially available chloroquine-containing products, such as aquarium cleaner. Poisonings, including 1 fatality, attributed to persons taking chloroquine to prevent or treat COVID-19 without the supervision of a licensed physician have already been reported. To better understand the scope of demand for these drugs, we examined internet searches indicative of shopping for chloroquine and hydroxychloroquine.

		Dian Hu, Christine Martin, Mark Dredze, David A Broniatowski. Chinese social media suggest decreased vaccine acceptance in China: An observational study on Weibo following the 2018 Changchun Changsheng vaccine incident. Vaccine, 2020;38(13):2764-2770. [PDF] [Bibtex] [Close] @article{Hu:2020kn, abstract = {China is home to the world's largest population, with the potential for disease outbreaks to affect billions. However, knowledge of Chinese vaccine acceptance trends is limited. In this work we use Chinese social media to track responses to the recent Changchun Changsheng Biotechnology vaccine scandal, which led to extensive discussion regarding vaccine safety and regulation in China. We analyzed messages from the popular Chinese microblogging platform Sina Weibo in July 2018 (n = 11,085) and August 2019 (n = 500). Thus, we consider Chinese vaccine acceptance before, during, immediately after, and one year after the scandal occurred. Results show that expressions of distrust in government pertaining to vaccines increased significantly during and immediately after the scandal. Self-reports of vaccination occurred both before, and one year after, the scandal; however, these self-reports changed from positive endorsements of vaccination to concerns about vaccine harms. Data suggest that expressed support for vaccine acceptance in China may be decreasing.}, author = {Dian Hu and Christine Martin and Mark Dredze and David A. Broniatowski}, date-added = {2020-04-20 18:12:07 -0400}, date-modified = {2020-04-20 18:13:22 -0400}, file = {https://doi.org/10.1016/j.vaccine.2020.02.027}, journal = {Vaccine}, month = {17 March}, number = {13}, pages = {2764-2770}, title = {Chinese social media suggest decreased vaccine acceptance in China: An observational study on Weibo following the 2018 Changchun Changsheng vaccine incident.}, volume = {38}, year = {2020} } China is home to the world's largest population, with the potential for disease outbreaks to affect billions. 
However, knowledge of Chinese vaccine acceptance trends is limited. In this work we use Chinese social media to track responses to the recent Changchun Changsheng Biotechnology vaccine scandal, which led to extensive discussion regarding vaccine safety and regulation in China. We analyzed messages from the popular Chinese microblogging platform Sina Weibo in July 2018 (n = 11,085) and August 2019 (n = 500). Thus, we consider Chinese vaccine acceptance before, during, immediately after, and one year after the scandal occurred. Results show that expressions of distrust in government pertaining to vaccines increased significantly during and immediately after the scandal. Self-reports of vaccination occurred both before, and one year after, the scandal; however, these self-reports changed from positive endorsements of vaccination to concerns about vaccine harms. Data suggest that expressed support for vaccine acceptance in China may be decreasing.

		David A Broniatowski, Sandra C Quinn, Mark Dredze, Amelia M Jamison. Vaccine Communication as Weaponized Identity Politics. American Journal of Public Health (AJPH), 2020;110(5):617--618. [PDF] [Bibtex] [Close] @article{Broniatowski:2020fx, author = {David A. Broniatowski and Sandra C. Quinn and Mark Dredze and Amelia M. Jamison}, date-added = {2020-04-14 14:20:36 -0400}, date-modified = {2020-04-14 14:25:09 -0400}, file = {https://doi.org/10.2105/AJPH.2020.305616}, journal = {American Journal of Public Health (AJPH)}, number = {5}, pages = {617--618}, title = {Vaccine Communication as Weaponized Identity Politics}, volume = {110}, year = {2020} }

		Elliot Schumacher, Andriy Mulyar, Mark Dredze. Clinical Concept Linking with Contextualized Neural Representations. Association for Computational Linguistics (ACL), 2020. [PDF] [Bibtex] [Close] @inproceedings{Schumacher:2020hq, abstract = {In traditional approaches to entity linking, linking decisions are based on three sources of information -- the similarity of the mention string to an entity's name, the similarity of the context of the document to the entity, and broader information about the knowledge base (KB). In some domains, there is little contextual information present in the KB and thus we rely more heavily on mention string similarity. We consider one example of this, concept linking, which seeks to link mentions of medical concepts to a medical concept ontology. We propose an approach to concept linking that leverages recent work in contextualized neural models, such as ELMo (Peters et al. 2018), which create a token representation that integrates the surrounding context of the mention and concept name. We find a neural ranking approach paired with contextualized embeddings provides gains over a competitive baseline (Leaman et al. 2013). Additionally, we find that a pre-training step using synonyms from the ontology offers a useful initialization for the ranker.}, author = {Elliot Schumacher and Andriy Mulyar and Mark Dredze}, booktitle = {Association for Computational Linguistics (ACL)}, date-added = {2020-04-04 22:03:37 -0400}, date-modified = {2020-07-08 21:24:42 -0400}, file = {https://www.aclweb.org/anthology/2020.acl-main.760/}, title = {Clinical Concept Linking with Contextualized Neural Representations}, year = {2020} } In traditional approaches to entity linking, linking decisions are based on three sources of information -- the similarity of the mention string to an entity's name, the similarity of the context of the document to the entity, and broader information about the knowledge base (KB). 
In some domains, there is little contextual information present in the KB and thus we rely more heavily on mention string similarity. We consider one example of this, concept linking, which seeks to link mentions of medical concepts to a medical concept ontology. We propose an approach to concept linking that leverages recent work in contextualized neural models, such as ELMo (Peters et al. 2018), which create a token representation that integrates the surrounding context of the mention and concept name. We find a neural ranking approach paired with contextualized embeddings provides gains over a competitive baseline (Leaman et al. 2013). Additionally, we find that a pre-training step using synonyms from the ontology offers a useful initialization for the ranker.

		David Mueller, Nicholas Andrews, Mark Dredze. Sources of Transfer in Multilingual Named Entity Recognition. Association for Computational Linguistics (ACL), 2020. [PDF] [Bibtex] [Close] @inproceedings{Mueller:2020lr, abstract = {Named-entities are inherently multilingual, and annotations in any given language may be limited. This motivates us to consider polyglot named-entity recognition (NER), where one model is trained using annotated data drawn from more than one language. However, a straightforward implementation of this simple idea does not always work in practice: naive training of NER models using annotated data drawn from multiple languages consistently underperforms models trained on monolingual data alone, despite having access to more training data. The starting point of this paper is a simple solution to this problem, in which polyglot models are fine-tuned on monolingual data to consistently and significantly outperform their monolingual counterparts. To explain this phenomenon, we explore the sources of multilingual transfer in polyglot NER models and examine the weight structure of polyglot models compared to their monolingual counterparts. We find that polyglot models efficiently share many parameters across languages and that fine-tuning may utilize a large number of those parameters.}, author = {David Mueller and Nicholas Andrews and Mark Dredze}, booktitle = {Association for Computational Linguistics (ACL)}, date-added = {2020-04-04 22:02:26 -0400}, date-modified = {2020-07-08 21:25:10 -0400}, file = {https://www.aclweb.org/anthology/2020.acl-main.720/}, title = {Sources of Transfer in Multilingual Named Entity Recognition}, year = {2020} } Named-entities are inherently multilingual, and annotations in any given language may be limited. This motivates us to consider polyglot named-entity recognition (NER), where one model is trained using annotated data drawn from more than one language. 
However, a straightforward implementation of this simple idea does not always work in practice: naive training of NER models using annotated data drawn from multiple languages consistently underperforms models trained on monolingual data alone, despite having access to more training data. The starting point of this paper is a simple solution to this problem, in which polyglot models are fine-tuned on monolingual data to consistently and significantly outperform their monolingual counterparts. To explain this phenomenon, we explore the sources of multilingual transfer in polyglot NER models and examine the weight structure of polyglot models compared to their monolingual counterparts. We find that polyglot models efficiently share many parameters across languages and that fine-tuning may utilize a large number of those parameters.

		Manya Wadhwa, Silvio Amir, Mark Dredze. Aligning Public Feedback To Requests For Comments On Regulations.gov. International Conference on Web and Social Media (ICWSM), 2020. [PDF] [Bibtex] [Close] @inproceedings{Wadhwa:2020lo, abstract = {In an effort to democratize the regulatory process, the United States Federal government created regulations.gov, a portal through which federal agencies can share proposed regulations and solicit feedback from the public. A proposed regulation will contain several requests for feedback on specific topics, and the public can then submit comments in response. While this reduces barriers to soliciting feedback, it still leaves regulators with a challenge: how to produce a summary and incorporate feedback from the sometimes tens of thousands of submitted comments. We propose an information retrieval system by which comments are aligned to specific regulatory requests. We evaluate several measures of semantic similarity for matching comments to information requests. We evaluate our proposed system over a dataset containing several regulations proposed for electronic cigarettes, an issue that energized tens of thousands of comments in response.}, author = {Manya Wadhwa and Silvio Amir and Mark Dredze}, booktitle = {International Conference on Web and Social Media (ICWSM)}, date-added = {2020-03-20 14:54:46 -0400}, date-modified = {2020-03-20 14:55:23 -0400}, file = {2020_icwsm_regulations_gov.pdf}, title = {Aligning Public Feedback To Requests For Comments On Regulations.gov}, year = {2020} } In an effort to democratize the regulatory process, the United States Federal government created regulations.gov, a portal through which federal agencies can share proposed regulations and solicit feedback from the public. A proposed regulation will contain several requests for feedback on specific topics, and the public can then submit comments in response. 
While this reduces barriers to soliciting feedback, it still leaves regulators with a challenge: how to produce a summary and incorporate feedback from the sometimes tens of thousands of submitted comments. We propose an information retrieval system by which comments are aligned to specific regulatory requests. We evaluate several measures of semantic similarity for matching comments to information requests. We evaluate our proposed system over a dataset containing several regulations proposed for electronic cigarettes, an issue that energized tens of thousands of comments in response.

		Alicia L Nobles, Eric C Leas, Mark Dredze, John W Ayers. Examining Peer-to-Peer and Patient-Provider Interactions on a Social Media Community Facilitating Ask the Doctor Services. International Conference on Web and Social Media (ICWSM), 2020. [PDF] [Bibtex] [Close] @inproceedings{Nobles:2020ht, abstract = {Ask the Doctor (AtD) services provide patients the opportunity to seek medical advice using online platforms. While these services represent a new mode of healthcare delivery, study of these online health communities and how they are used is limited. In particular, it is unknown if these platforms replicate existing barriers and biases in traditional healthcare delivery across demographic groups. We present an analysis of AskDocs, a subreddit that functions as a public AtD platform on social media. We examine the demographics of users, the health topics discussed, if biases present in offline healthcare settings exist on this platform, and how empathy is expressed in interactions between users and physicians. Our findings suggest a number of implications to enhance and support peer-to-peer and patient-provider interactions on online platforms.}, author = {Alicia L Nobles and Eric C Leas and Mark Dredze and John W Ayers}, booktitle = {International Conference on Web and Social Media (ICWSM)}, date-added = {2020-03-18 00:11:25 -0400}, date-modified = {2020-03-18 00:14:14 -0400}, file = {2020_icwsm_peer_to_peer_doctor_services.pdf}, title = {Examining Peer-to-Peer and Patient-Provider Interactions on a Social Media Community Facilitating Ask the Doctor Services}, year = {2020} } Ask the Doctor (AtD) services provide patients the opportunity to seek medical advice using online platforms. While these services represent a new mode of healthcare delivery, study of these online health communities and how they are used is limited. 
In particular, it is unknown if these platforms replicate existing barriers and biases in traditional healthcare delivery across demographic groups. We present an analysis of AskDocs, a subreddit that functions as a public AtD platform on social media. We examine the demographics of users, the health topics discussed, if biases present in offline healthcare settings exist on this platform, and how empathy is expressed in interactions between users and physicians. Our findings suggest a number of implications to enhance and support peer-to-peer and patient-provider interactions on online platforms.

		Joshua Dredze, Mark Dredze. Theoretical Orientations: Measuring Online Information Seeking from Google Search Queries. Association for Psychological Science Annual Convention (APS) (Conference canceled), 2020. [PDF] [Bibtex] [Close] @inproceedings{Dredze:2020a, author = {Joshua Dredze and Mark Dredze}, booktitle = {Association for Psychological Science Annual Convention (APS) (Conference canceled)}, date-added = {2020-03-04 22:40:30 -0500}, date-modified = {2020-06-04 16:20:57 -0400}, file = {2020_aps_poster_dredze.pdf}, keywords = {abstract}, title = {Theoretical Orientations: Measuring Online Information Seeking from Google Search Queries}, year = {2020} }

		Alicia L Nobles, Eric C Leas, Carl A Latkin, Mark Dredze, Steffanie A Strathdee, John W Ayers. #HIV: Alignment of HIV-Related Visual Content on Instagram with Public Health Priorities in the US. AIDS and Behavior, 2020. [PDF] [Bibtex] [Close] @article{Nobles:2020xh, abstract = {Instagram, with more than 1 billion monthly users, is the go-to social media platform to chronicle one's life via images, but how are people using the platform to present visual content about HIV? We analyzed public Instagram posts containing the hashtag ``#HIV'' (because they are self-tagged as related to HIV) between January 2017 and July 2018. We described the prevalence of co-occurring hashtags and explored thematic concepts in the images using automated image recognition and topic modeling. Twenty-eight percent of all #HIV posts included hashtags focused on awareness, followed by LGBTQ (24.5%) and living with HIV (17.9%). However, specific strategies were rarely cited, including testing (10.8%), treatment (10.3%), PrEP (6.2%) and condoms (4.1%). Image analyses revealed 44.5% of posts included infographics followed by people (21.3%) thereby humanizing HIV and stigmatized populations and promoting community mobilization. Novel content such as the handwriting image-theme (3.8%) where posters shared their HIV test results appeared. We discuss how this visual content aligns with public health priorities to reduce HIV in the US and the novel, organic messages that public health could help amplify.}, author = {Alicia L. Nobles and Eric C. Leas and Carl A. Latkin and Mark Dredze and Steffanie A. Strathdee and John W. 
Ayers}, date-added = {2020-01-13 13:39:10 -0500}, date-modified = {2020-10-22 11:09:28 -0400}, file = {https://doi.org/10.1007/s10461-019-02765-5}, journal = {AIDS and Behavior}, title = {#HIV: Alignment of HIV-Related Visual Content on Instagram with Public Health Priorities in the US}, year = {2020} } Instagram, with more than 1 billion monthly users, is the go-to social media platform to chronicle one's life via images, but how are people using the platform to present visual content about HIV? We analyzed public Instagram posts containing the hashtag ``#HIV'' (because they are self-tagged as related to HIV) between January 2017 and July 2018. We described the prevalence of co-occurring hashtags and explored thematic concepts in the images using automated image recognition and topic modeling. Twenty-eight percent of all #HIV posts included hashtags focused on awareness, followed by LGBTQ (24.5%) and living with HIV (17.9%). However, specific strategies were rarely cited, including testing (10.8%), treatment (10.3%), PrEP (6.2%) and condoms (4.1%). Image analyses revealed 44.5% of posts included infographics followed by people (21.3%) thereby humanizing HIV and stigmatized populations and promoting community mobilization. Novel content such as the handwriting image-theme (3.8%) where posters shared their HIV test results appeared. We discuss how this visual content aligns with public health priorities to reduce HIV in the US and the novel, organic messages that public health could help amplify.

		2019 (20 Publications)
		Benjamin M Althouse, Daniel M Weinberger, Samuel V Scarpino, Virginia E Pitzer, John W Ayers, Edward Wenger, Isaac Chun-Hai Fung, Mark Dredze, Hao Hu. Google searches accurately forecast RSV hospitalizations. Chest Infections, 2019. [PDF] [Bibtex] [Close] @article{Althouse:2019il, abstract = {Hospitalization of children with respiratory syncytial virus (RSV) is common and costly. Traditional sources of hospitalization data, useful for public health decision-makers and physicians to make decisions, are themselves costly to acquire and are subject to delays from gathering to publication. Here we use Google searches for RSV as a proxy for RSV hospitalizations.}, author = {Benjamin M Althouse and Daniel M Weinberger and Samuel V Scarpino and Virginia E Pitzer and John W Ayers and Edward Wenger and Isaac Chun-Hai Fung and Mark Dredze and Hao Hu}, date-added = {2020-04-29 00:51:30 -0400}, date-modified = {2021-07-13 23:14:12 -0400}, file = {https://doi.org/10.1016/j.chest.2019.02.077}, journal = {Chest Infections}, month = {April}, number = {4}, title = {Google searches accurately forecast RSV hospitalizations}, volume = {155}, year = {2019} } Hospitalization of children with respiratory syncytial virus (RSV) is common and costly. Traditional sources of hospitalization data, useful for public health decision-makers and physicians to make decisions, are themselves costly to acquire and are subject to delays from gathering to publication. Here we use Google searches for RSV as a proxy for RSV hospitalizations.

		Amelia M Jamison, David A Broniatowski, Mark Dredze, Zach Wood-Doughty, Dure-Aden Khan, Sandra Crouse-Quinn. Vaccine-related advertising in the Facebook Ad Archive. Vaccine, 2019. [PDF] [Bibtex] [Close] @article{Jamison:2019pf, abstract = {Background. In 2018, Facebook introduced Ad Archive as a platform to improve transparency in advertisements related to politics and ``issues of national importance.'' Vaccine-related Facebook advertising is publicly available for the first time. After measles outbreaks in the US brought renewed attention to the possible role of Facebook advertising in the spread of vaccine-related misinformation, Facebook announced steps to limit vaccine-related misinformation. This study serves as a baseline of advertising before new policies went into effect. Methods. Using the keyword `vaccine', we searched Ad Archive on December 13, 2018 and again on February 22, 2019. We exported data for 505 advertisements. A team of annotators sorted advertisements by content: pro-vaccine, anti-vaccine, not relevant. We also conducted a thematic analysis of major advertising themes. We ran Mann-Whitney U tests to compare ad performance metrics. Results. 309 advertisements were included in analysis with 163 (53%) pro-vaccine advertisements and 145 (47%) anti-vaccine advertisements. Despite a similar number of advertisements, the median number of ads per buyer was significantly higher for anti-vaccine ads. First time buyers are less likely to complete disclosure information and risk ad removal. Thematically, anti-vaccine advertising messages are relatively uniform and emphasize vaccine harms (55%). In contrast, pro-vaccine advertisements come from a diverse set of buyers (83 unique) with varied goals including promoting vaccination (49%), vaccine related philanthropy (15%), and vaccine related policy (14%). Conclusions. A small set of anti-vaccine advertisement buyers have leveraged Facebook advertisements to reach targeted audiences. 
By deeming all vaccine-related content an issue of ``national importance,'' Facebook has further politicized vaccines. The implementation of a blanket disclosure policy also limits which ads can successfully run on Facebook. Improving transparency and limiting misinformation should not be separate goals. Public health communication efforts should consider the potential impact on Facebook users' vaccine attitudes and behaviors.}, annote = {(<b>Ranked in the top 0.1% of 14.1m research outputs by <a href="https://www.altmetric.com/details/70263284#score"><span class="pub_link">Altmetric</span></a></b>)}, author = {Amelia M. Jamison and David A. Broniatowski and Mark Dredze and Zach Wood-Doughty and Dure-Aden Khan and Sandra Crouse-Quinn}, date-added = {2019-11-18 00:43:18 -0500}, date-modified = {2020-01-16 17:32:52 -0500}, file = {https://doi.org/10.1016/j.vaccine.2019.10.066}, journal = {Vaccine}, title = {Vaccine-related advertising in the Facebook Ad Archive}, year = {2019} }

		Elliot Schumacher, Mark Dredze. Learning unsupervised contextual representations for medical synonym discovery. JAMIA Open, 2019. [PDF] [Bibtex] [Close] @article{Schumacher:2019pi, abstract = {Objectives. An important component of processing medical texts is the identification of synonymous words or phrases. Synonyms can inform learned representations of patients or improve linking mentioned concepts to medical ontologies. However, medical synonyms can be lexically similar (``dilated RA'' and ``dilated RV'') or dissimilar (``cerebrovascular accident'' and ``stroke''); contextual information can determine if 2 strings are synonymous. Medical professionals utilize extensive variation of medical terminology, often not evidenced in structured medical resources. Therefore, the ability to discover synonyms, especially without reliance on training data, is an important component in processing training notes. The ability to discover synonyms from models trained on large amounts of unannotated data removes the need to rely on annotated pairs of similar words. Models relying solely on non-annotated data can be trained on a wider variety of texts without the cost of annotation, and thus may capture a broader variety of language. Materials and Methods. Recent contextualized deep learning representation models, such as ELMo (Peters et al., 2019) and BERT, (Devlin et al. 2019) have shown strong improvements over previous approaches in a broad variety of tasks. We leverage these contextualized deep learning models to build representations of synonyms, which integrate the context of surrounding sentence and use character-level models to alleviate out-of-vocabulary issues. Using these models, we perform unsupervised discovery of likely synonym matches, which reduces the reliance on expensive training data. Results. We use the ShARe/CLEF eHealth Evaluation Lab 2013 Task 1b data to evaluate our synonym discovery method. 
Comparing our proposed contextualized deep learning representations to previous non-neural representations, we find that the contextualized representations show consistent improvement over non-contextualized models in all metrics. Conclusions. Our results show that contextualized models produce effective representations for synonym discovery. We expect that the use of these representations in other tasks would produce similar gains in performance.}, author = {Elliot Schumacher and Mark Dredze}, date-added = {2019-11-05 15:02:21 -0500}, date-modified = {2019-11-26 15:56:17 -0500}, file = {https://academic.oup.com/jamiaopen/advance-article/doi/10.1093/jamiaopen/ooz057/5612165}, journal = {JAMIA Open}, title = {Learning unsupervised contextual representations for medical synonym discovery}, year = {2019} }

		Alicia L Nobles, Eric C Leas, Benjamin M Althouse, Mark Dredze, Christopher A Longhurst, Davey M Smith, John W Ayers. Requests for Diagnoses of Sexually Transmitted Diseases on a Social Media Platform. Journal of the American Medical Association (JAMA), 2019;322(17):1712-1713. [PDF] [Bibtex] [Close] @article{Nobles:2019kh, abstract = {Although many studies document the use of social media for sharing and requesting information on specific health conditions,1,2 whether individuals obtain diagnoses on social media platforms has not been investigated.3,4 The occurrence of requests for a diagnosis on social media (crowd-diagnosis) and determination as to whether the requested diagnosis was for a second opinion after seeing a health care professional were evaluated in a case study.}, annote = {(<b>Ranked in the top 0.05% of 13.9m research outputs by <a href="https://jamanetwork.altmetric.com/details/69814011#score"><span class="pub_link">Altmetric</span></a></b>)}, author = {Alicia L. Nobles and Eric C. Leas and Benjamin M. Althouse and Mark Dredze and Christopher A. Longhurst and Davey M. Smith and John W. 
Ayers}, date-added = {2019-11-05 14:23:12 -0500}, date-modified = {2020-10-22 11:09:56 -0400}, file = {https://jamanetwork.com/journals/jama/fullarticle/2753884}, journal = {Journal of the American Medical Association (JAMA)}, number = {17}, pages = {1712-1713}, title = {Requests for Diagnoses of Sexually Transmitted Diseases on a Social Media Platform}, volume = {322}, year = {2019} }

		Eric C Leas, Alicia L Nobles, Theodore L Caputi, Mark Dredze, Davey M Smith, John W Ayers. Trends in Internet Searches for Cannabidiol (CBD) in the United States. JAMA Network Open, 2019;2(10):e1913853. [PDF] [Bibtex] [Close] @article{Leas:2019qm, abstract = {Cannabidiol (CBD) is widely promoted as a panacea. For example, the cannabis brand MedMen claims CBD treats acne, anxiety, opioid addiction, pain, and menstrual problems.1 However, the US Food and Drug Administration has only approved highly purified CBD (Epidiolex) for treating epilepsy. To our knowledge, there is currently no population-focused surveillance of public interest in CBD. Consequently, many question whether CBD should be prioritized by public health leaders and regulators. This article describes public interest in CBD within the United States.}, annote = {(<b>Ranked in the top 0.5% of 13.6m research outputs by <a href="https://www.altmetric.com/details/69155726#score"><span class="pub_link">Altmetric</span></a></b>)}, author = {Eric C. Leas and Alicia L. Nobles and Theodore L. Caputi and Mark Dredze and Davey M. Smith and John W. Ayers}, date-added = {2019-10-23 12:20:06 -0400}, date-modified = {2019-10-23 12:21:25 -0400}, file = {https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2753393}, journal = {JAMA Network Open}, number = {10}, pages = {e1913853}, title = {Trends in Internet Searches for Cannabidiol (CBD) in the United States}, volume = {2}, year = {2019} }

		Eric C Leas, Mark Dredze, John W Ayers. Ignoring Data Delays Our Reaction to Emerging Public Health Tragedies Like 13 Reasons Why. JAMA Psychiatry, 2019. [PDF] [Bibtex] [Close] @article{Leas:2019yk, abstract = {We applaud Niederkrotenthaler and colleagues1 for adding another layer of evidence that 13 Reasons Why is harming the public by pushing some individuals toward suicide. However, their dismissal of some of the earliest evidence on this subject deserves a revision not because it undermines their central claim but because it makes it even stronger and can make psychiatric epidemiology more actionable in the future.}, author = {Eric C. Leas and Mark Dredze and John W. Ayers}, date-added = {2019-10-10 17:10:18 -0400}, date-modified = {2019-10-10 17:11:33 -0400}, file = {https://jamanetwork.com/journals/jamapsychiatry/article-abstract/2751529}, journal = {JAMA Psychiatry}, title = {Ignoring Data Delays Our Reaction to Emerging Public Health Tragedies Like 13 Reasons Why}, year = {2019} }

		Andriy Mulyar, Elliot Schumacher, Masoud Rouhizadeh, Mark Dredze. Phenotyping of Clinical Notes with Improved Document Classification Models Using Contextualized Neural Language Models. NeurIPS Workshop on Machine Learning for Health (ML4H), 2019. [PDF] [Bibtex] [Close] @inproceedings{Mulyar:2019pi, abstract = {Clinical notes contain an extensive record of a patient's health status, such as smoking status or the presence of heart conditions. However, this detail is not replicated within the structured data of electronic health systems. Phenotyping, the extraction of patient conditions from free clinical text, is a critical task which supports a variety of downstream applications such as decision support and secondary use of medical records. Previous work has resulted in systems which are high performing but require hand engineering, often of rules. Recent work in pretrained contextualized language models has enabled advances in representing text for a variety of tasks. We therefore explore several architectures for modeling phenotyping that rely solely on BERT representations of the clinical note, removing the need for manual engineering. We find these architectures are competitive with or outperform existing state of the art methods on two phenotyping tasks.}, author = {Andriy Mulyar and Elliot Schumacher and Masoud Rouhizadeh and Mark Dredze}, booktitle = {NeurIPS Workshop on Machine Learning for Health (ML4H)}, date-added = {2019-10-01 23:17:57 -0400}, date-modified = {2019-11-26 15:48:28 -0500}, file = {https://arxiv.org/abs/1910.13664}, keywords = {workshop}, title = {Phenotyping of Clinical Notes with Improved Document Classification Models Using Contextualized Neural Language Models}, year = {2019} }

		Zachary Wood-Doughty, David Broniatowski, Mark Dredze. Machine Learning Classifiers for Socio-Demographics of Social Media Users: Limitations and Possibilities. American Public Health Association (APHA), 2019. [PDF] [Bibtex] [Close] @inproceedings{Wood-Doughty:2019by, abstract = {Background: Social media analyses of health behaviors, such as vaccination, have shown significant promise for surveillance. However, existing research, both quantitative and qualitative, suggests that health behaviors vary significantly with socio-demographic factors, particularly race, ethnicity, and gender. Often, these factors are not explicitly disclosed on social media platforms and must instead be inferred, raising the possibility of methodological bias. This session examines existing tools and methodologies used to infer socio-demographics of social media users. Methods: We survey several socio-demographic classifiers that work with Twitter and Reddit data. These classifiers use features including users' language patterns, follower behaviors, and choice of names. These classifiers predict labels including users' gender, race and ethnicity, or filter out social media accounts run by organizations. Results: We explain how the data for these classifiers is collected, how the classification models are trained, and how they could be applied to public health research. We in particular discuss the limitations that these classifiers have, including possible methodological bias introduced by the challenges of large-scale data collection of social media users' demographic information. Discussion: Health behaviors vary with socio-demographic factors, which are challenging to measure on social media platforms. 
Machine learning classification of socio-demographics is possible, but requires interdisciplinary considerations.}, author = {Zachary Wood-Doughty and David Broniatowski and Mark Dredze}, booktitle = {American Public Health Association (APHA)}, date-added = {2019-09-15 23:09:07 -0400}, date-modified = {2019-09-15 23:10:28 -0400}, file = {2019_apha_poster_machine_learning_classifiers_for_socio_demographics_of_social_media_users.pdf}, keywords = {abstract}, title = {Machine Learning Classifiers for Socio-Demographics of Social Media Users: Limitations and Possibilities}, year = {2019} }

		Yuchen Zhou, Mark Dredze, David A Broniatowski, William D Adler. Elites and foreign actors among the alt-right: The Gab social media platform. First Monday, 2019. [PDF] [Bibtex] [Close] @article{Zhou:2019hh, abstract = {Content regulation and censorship of social media platforms is increasingly discussed by governments and the platforms themselves. To date, there has been little data-driven analysis of the effects of regulated content deemed inappropriate on online user behavior. We therefore compared Twitter --- a popular social media platform that occasionally removes content in violation of its Terms of Service --- to Gab --- a platform that markets itself as completely unregulated. Launched in mid-2016, Gab is, in practice, dominated by individuals who associate with the ``alt-right'' political movement in the United States. Despite its billing as ``The Free Speech Social Network,'' Gab users display more extreme social hierarchy and elitism when compared to Twitter. Although the framing of the site welcomes all people, Gab users' content is more homogeneous, preferentially sharing material from sites traditionally associated with the extremes of American political discourse, especially the far right. Furthermore, many of these sites are associated with state-sponsored propaganda from foreign governments. Finally, we discovered a significant presence of German language posts on Gab, with several topics focusing on German domestic politics, yet sharing significant amounts of content from U.S. and Russian sources. These results indicate possible emergent linkages between domestic politics in European and American far right political movements. Implications for regulation of social media platforms are discussed.}, annote = {(<b>Ranked in the top 3% of 13.6m research outputs by <a href="https://www.altmetric.com/details/65738967#score"><span class="pub_link">Altmetric</span></a></b>)}, author = {Yuchen Zhou and Mark Dredze and David A. Broniatowski and William D. 
Adler}, date-added = {2019-09-01 10:55:24 -0400}, date-modified = {2019-09-15 23:11:44 -0400}, file = {http://dx.doi.org/10.5210/fm.v24i9.10062}, journal = {First Monday}, month = {2 September}, number = {9}, title = {Elites and foreign actors among the alt-right: The Gab social media platform}, volume = {24}, year = {2019} }

		Michelle R Kaufman, Debangan Dey, Ciprian Crainiceanu, Mark Dredze. #MeToo and Google Inquiries Into Sexual Violence: A Hashtag Campaign Can Sustain Information Seeking. Journal of Interpersonal Violence, 2019. [PDF] [Bibtex] [Close] @article{Kaufman:2019pi, abstract = {The #MeToo Movement has brought new attention to sexual harassment and assault. While the movement originates with activist Tarana Burke, actor Alyssa Milano used the phrase on Twitter in October 2017 in response to multiple sexual harassment allegations against Hollywood producer Harvey Weinstein. Within 24 hours, 53,000 people tweeted comments and/or shared personal experiences of sexual violence. The study objective was to measure how information seeking via Google searches for sexual harassment and assault changed following Milano's tweet and whether this change was sustained in spite of celebrity scandals. Weekly Google search inquiries in the United States were downloaded for the terms metoo, sexual assault, sexual harassment, sexual abuse, and rape for January 1, 2017 to July 15, 2018. Seven related news events about perpetrator accusations were considered. Results showed that searches for metoo increased dramatically after the Weinstein accusation and stayed high during subsequent accusations. A small decrease in searches followed, but the number remained very high relative to baseline (the period before the Weinstein accusation). Searches for sexual assault and sexual harassment increased substantially immediately following the Weinstein accusation, stayed high during subsequent accusations, and saw a decline after the accusation of Matt Lauer (talk show host; last event considered). We estimated a 40% to 70% reduction in searches 6 months after the Lauer accusation, though the increase in searches relative to baseline remained statistically significant. For sexual abuse and rape, the number of searches returned close to baseline by 6 months. 
It appears that the #MeToo movement sparked greater information seeking that was sustained beyond the associated events. Given its recent ubiquitous use in the media and public life, hashtag activism such as #MeToo can be used to draw further attention to the next steps in addressing sexual assault and harassment, moving public web inquiries from information seeking to action.}, author = {Michelle R. Kaufman and Debangan Dey and Ciprian Crainiceanu and Mark Dredze}, date-added = {2019-08-29 00:05:10 -0400}, date-modified = {2019-09-15 23:12:11 -0400}, file = {https://doi.org/10.1177%2F0886260519868197}, journal = {Journal of Interpersonal Violence}, title = {#MeToo and Google Inquiries Into Sexual Violence: A Hashtag Campaign Can Sustain Information Seeking}, year = {2019} }

		Shijie Wu, Mark Dredze. Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT. Empirical Methods in Natural Language Processing (EMNLP), 2019. [PDF] [Bibtex] [Close] @inproceedings{Wu:2019rw, abstract = {Pretrained contextual representation models (Peters et al., 2018; Devlin et al., 2019) have pushed forward the state-of-the-art on many NLP tasks. A new release of BERT (Devlin, 2018) includes a model simultaneously pre-trained on 104 languages with impressive performance for zero-shot cross-lingual transfer on a natural language inference task. This paper explores the broader cross-lingual potential of mBERT (multilingual) as a zero-shot language transfer model on 5 NLP tasks covering a total of 39 languages from various language families: NLI, document classification, NER, POS tagging, and dependency parsing. We compare mBERT with the best-published methods for zero-shot cross-lingual transfer and find mBERT competitive on each task. Additionally, we investigate the most effective strategy for utilizing mBERT in this manner, determine to what extent mBERT generalizes away from language-specific features, and measure factors that influence cross-lingual transfer.}, author = {Shijie Wu and Mark Dredze}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, date-added = {2019-08-13 18:09:12 -0400}, date-modified = {2019-08-13 18:09:23 -0400}, file = {2019_mbert_emnlp.pdf}, title = {Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT}, year = {2019} }

		Tao Chen, Mark Dredze, Jonathan P Weiner, Hadi Kharrazi. Identifying vulnerable older adult populations by contextualizing geriatric syndrome information in clinical notes of electronic health records. Journal of the American Medical Informatics Association (JAMIA), 2019. [PDF] [Bibtex] [Close] @article{Chen:2019dp, abstract = {Objective Geriatric syndromes such as functional disability and lack of social support are often not encoded in electronic health records (EHRs), thus obscuring the identification of vulnerable older adults in need of additional medical and social services. In this study, we automatically identify vulnerable older adult patients with geriatric syndrome based on clinical notes extracted from an EHR system, and demonstrate how contextual information can improve the process. Materials and Methods We propose a novel end-to-end neural architecture to identify sentences that contain geriatric syndromes. Our model learns a representation of the sentence and augments it with contextual information: surrounding sentences, the entire clinical document, and the diagnosis codes associated with the document. We trained our system on annotated notes from 85 patients, tuned the model on another 50 patients, and evaluated its performance on the remaining 50 patients. Results Contextual information improved classification, with the most effective context coming from the surrounding sentences. At sentence level, our best performing model achieved a micro-F1 of 0.605, significantly outperforming context-free baselines. At patient level, our best model achieved a micro-F1 of 0.843. Discussion Our solution can be used to expand the identification of vulnerable older adults with geriatric syndromes. Since functional and social factors are often not captured by diagnosis codes in EHRs, the automatic identification of the geriatric syndrome can reduce disparities by ensuring consistent care across the older adult population. 
Conclusion EHR free-text can be used to identify vulnerable older adults with a range of geriatric syndromes.}, author = {Tao Chen and Mark Dredze and Jonathan P Weiner and Hadi Kharrazi}, date-added = {2019-07-03 18:19:29 -0400}, date-modified = {2019-09-15 23:12:32 -0400}, file = {https://doi.org/10.1093/jamia/ocz093}, journal = {Journal of the American Medical Informatics Association (JAMIA)}, title = {Identifying vulnerable older adult populations by contextualizing geriatric syndrome information in clinical notes of electronic health records}, year = {2019} } Objective Geriatric syndromes such as functional disability and lack of social support are often not encoded in electronic health records (EHRs), thus obscuring the identification of vulnerable older adults in need of additional medical and social services. In this study, we automatically identify vulnerable older adult patients with geriatric syndrome based on clinical notes extracted from an EHR system, and demonstrate how contextual information can improve the process. Materials and Methods We propose a novel end-to-end neural architecture to identify sentences that contain geriatric syndromes. Our model learns a representation of the sentence and augments it with contextual information: surrounding sentences, the entire clinical document, and the diagnosis codes associated with the document. We trained our system on annotated notes from 85 patients, tuned the model on another 50 patients, and evaluated its performance on the remaining 50 patients. Results Contextual information improved classification, with the most effective context coming from the surrounding sentences. At sentence level, our best performing model achieved a micro-F1 of 0.605, significantly outperforming context-free baselines. At patient level, our best model achieved a micro-F1 of 0.843. Discussion Our solution can be used to expand the identification of vulnerable older adults with geriatric syndromes. 
Since functional and social factors are often not captured by diagnosis codes in EHRs, the automatic identification of the geriatric syndrome can reduce disparities by ensuring consistent care across the older adult population. Conclusion EHR free-text can be used to identify vulnerable older adults with a range of geriatric syndromes.

		Silvio Amir, Mark Dredze, John W Ayers. Population Level Mental Health Surveillance over Social Media with Digital Cohorts. NAACL Workshop on Computational Linguistics and Clinical Psychology, 2019. [PDF] [Bibtex] [Close] @inproceedings{Amir:2019lq, abstract = {The ability to track mental health conditions via social media opened the doors for large-scale, automated mental health surveillance. However, inferring accurate population-level trends requires representative samples of the underlying population, which can be challenging given the biases inherent in social media data. While previous work has adjusted samples based on demographic estimates, the populations were selected based on specific outcomes, e.g. specific mental health conditions. We depart from these methods by conducting analyses over demographically representative digital cohorts of social media users. To validate this approach, we constructed a cohort of US-based Twitter users to measure the prevalence of depression and PTSD, and investigate how these illnesses manifest across demographic subpopulations. The analysis demonstrates that cohort-based studies can help control for sampling biases, contextualize outcomes, and provide deeper insights into the data.}, author = {Silvio Amir and Mark Dredze and John W. Ayers}, booktitle = {NAACL Workshop on Computational Linguistics and Clinical Psychology}, date-added = {2019-04-03 16:17:44 -0400}, date-modified = {2019-04-03 16:18:15 -0400}, file = {https://www.aclweb.org/anthology/W19-3013}, keywords = {workshop}, title = {Population Level Mental Health Surveillance over Social Media with Digital Cohorts}, year = {2019} } The ability to track mental health conditions via social media opened the doors for large-scale, automated mental health surveillance. However, inferring accurate population-level trends requires representative samples of the underlying population, which can be challenging given the biases inherent in social media data. 
While previous work has adjusted samples based on demographic estimates, the populations were selected based on specific outcomes, e.g. specific mental health conditions. We depart from these methods by conducting analyses over demographically representative digital cohorts of social media users. To validate this approach, we constructed a cohort of US-based Twitter users to measure the prevalence of depression and PTSD, and investigate how these illnesses manifest across demographic subpopulations. The analysis demonstrates that cohort-based studies can help control for sampling biases, contextualize outcomes, and provide deeper insights into the data.

		Alicia L Nobles, Mark Dredze, John W Ayers. "Repeal and replace": increased demand for intrauterine devices following the 2016 presidential election. Contraception, 2019. [PDF] [Bibtex] [Close] @article{Nobles:2019zt, abstract = {Objective To evaluate the public's interest in contraceptive options following heightened focus on a repeal of the Affordable Care Act (ACA) since the 2016 United States presidential election. Study design We monitored the fraction of Google searches emerging from the United States for the three most popular reversible contraceptive methods --- oral contraceptives, intrauterine devices (IUDs) and condoms --- from January 1, 2004, through October 31, 2017 (1 year after the presidential election). Results IUD searches were cumulatively 15% (95% CI: 10 to 20) higher than expected the year following the 2016 election, reflecting 10 to 21 million excess searches. IUD searches were statistically significantly higher in all states, except NV, and were consistent across states won by Trump or Clinton (Welch t test=0.60, p=.548). Conversely, searches for oral contraceptives and condoms remained stable (0%; 95% CI: −2 to 1) or declined (−4%; 95% CI: −5 to −2), respectively, following the election. Conclusions The etiology of increased searches for IUDs is likely multifaceted. However, it may largely be because IUDs will confer continued protection even after an ACA repeal, thereby providing a medical hedge against a possible repeal. Regardless, these data suggest the heightened focus on an ACA repeal is a concern to the record number of Americans seeking out information about IUDs.}, author = {Alicia L. Nobles and Mark Dredze and John W. 
Ayers}, date-added = {2019-04-01 00:45:47 -0400}, date-modified = {2019-04-01 00:48:15 -0400}, file = {https://doi.org/10.1016/j.contraception.2018.10.012}, journal = {Contraception}, title = {"Repeal and replace": increased demand for intrauterine devices following the 2016 presidential election}, year = {2019} } Objective To evaluate the public's interest in contraceptive options following heightened focus on a repeal of the Affordable Care Act (ACA) since the 2016 United States presidential election. Study design We monitored the fraction of Google searches emerging from the United States for the three most popular reversible contraceptive methods --- oral contraceptives, intrauterine devices (IUDs) and condoms --- from January 1, 2004, through October 31, 2017 (1 year after the presidential election). Results IUD searches were cumulatively 15% (95% CI: 10 to 20) higher than expected the year following the 2016 election, reflecting 10 to 21 million excess searches. IUD searches were statistically significantly higher in all states, except NV, and were consistent across states won by Trump or Clinton (Welch t test=0.60, p=.548). Conversely, searches for oral contraceptives and condoms remained stable (0%; 95% CI: −2 to 1) or declined (−4%; 95% CI: −5 to −2), respectively, following the election. Conclusions The etiology of increased searches for IUDs is likely multifaceted. However, it may largely be because IUDs will confer continued protection even after an ACA repeal, thereby providing a medical hedge against a possible repeal. Regardless, these data suggest the heightened focus on an ACA repeal is a concern to the record number of Americans seeking out information about IUDs.

		Tao Chen, Mark Dredze, Jonathan P Weiner, Leilani Hernandez, Joe Kimura, Hadi Kharrazi. Extraction of Geriatric Syndromes From Electronic Health Record Clinical Notes: Assessment of Statistical Natural Language Processing Methods. JMIR Medical Informatics, 2019. [PDF] [Bibtex] [Close] @article{Chen:2019fx, abstract = {Background: Geriatric syndromes in older adults are associated with adverse outcomes. However, despite being reported in clinical notes, these syndromes are often poorly captured by diagnostic codes in the structured fields of electronic health records (EHRs) or administrative records. Objective: We aim to automatically determine if a patient has any geriatric syndromes by mining the free text of associated EHR clinical notes. We assessed which statistical natural language processing (NLP) techniques are most effective. Methods: We applied conditional random fields (CRFs), a widely used machine learning algorithm, to identify each of 10 geriatric syndrome constructs in a clinical note. We assessed three sets of features and attributes for CRF operations: a base set, enhanced token, and contextual features. We trained the CRF on 3901 manually annotated notes from 85 patients, tuned the CRF on a validation set of 50 patients, and evaluated it on 50 held-out test patients. These notes were from a group of US Medicare patients over 65 years of age enrolled in a Medicare Advantage Health Maintenance Organization and cared for by a large group practice in Massachusetts. Results: A final feature set was formed through comprehensive feature ablation experiments. The final CRF model performed well at patient-level determination (macroaverage F1=0.834, microaverage F1=0.851); however, performance varied by construct. 
For example, at phrase-partial evaluation, the CRF model worked well on constructs such as absence of fecal control (F1=0.857) and vision impairment (F1=0.798) but poorly on malnutrition (F1=0.155), weight loss (F1=0.394), and severe urinary control issues (F1=0.532). Errors were primarily due to previously unobserved words (ie, out-of-vocabulary) and a lack of context. Conclusions: This study shows that statistical NLP can be used to identify geriatric syndromes from EHR-extracted clinical notes. This creates new opportunities to identify patients with geriatric syndromes and study their health outcomes.}, author = {Tao Chen and Mark Dredze and Jonathan P Weiner and Leilani Hernandez and Joe Kimura and Hadi Kharrazi}, date-added = {2019-03-10 21:46:00 -0400}, date-modified = {2019-03-14 09:26:23 -0400}, file = {http://dx.doi.org/10.2196/13039}, journal = {JMIR Medical Informatics}, title = {Extraction of Geriatric Syndromes From Electronic Health Record Clinical Notes: Assessment of Statistical Natural Language Processing Methods}, year = {2019} } Background: Geriatric syndromes in older adults are associated with adverse outcomes. However, despite being reported in clinical notes, these syndromes are often poorly captured by diagnostic codes in the structured fields of electronic health records (EHRs) or administrative records. Objective: We aim to automatically determine if a patient has any geriatric syndromes by mining the free text of associated EHR clinical notes. We assessed which statistical natural language processing (NLP) techniques are most effective. Methods: We applied conditional random fields (CRFs), a widely used machine learning algorithm, to identify each of 10 geriatric syndrome constructs in a clinical note. We assessed three sets of features and attributes for CRF operations: a base set, enhanced token, and contextual features. 
We trained the CRF on 3901 manually annotated notes from 85 patients, tuned the CRF on a validation set of 50 patients, and evaluated it on 50 held-out test patients. These notes were from a group of US Medicare patients over 65 years of age enrolled in a Medicare Advantage Health Maintenance Organization and cared for by a large group practice in Massachusetts. Results: A final feature set was formed through comprehensive feature ablation experiments. The final CRF model performed well at patient-level determination (macroaverage F1=0.834, microaverage F1=0.851); however, performance varied by construct. For example, at phrase-partial evaluation, the CRF model worked well on constructs such as absence of fecal control (F1=0.857) and vision impairment (F1=0.798) but poorly on malnutrition (F1=0.155), weight loss (F1=0.394), and severe urinary control issues (F1=0.532). Errors were primarily due to previously unobserved words (ie, out-of-vocabulary) and a lack of context. Conclusions: This study shows that statistical NLP can be used to identify geriatric syndromes from EHR-extracted clinical notes. This creates new opportunities to identify patients with geriatric syndromes and study their health outcomes.

		Elliot Schumacher, Mark Dredze. Discriminative Candidate Generation for Medical Concept Linking. Knowledge Base Construction (AKBC), 2019. [PDF] [Bibtex] [Close] @inproceedings{Schumacher:2019nx, abstract = {Linking mentions of medical concepts in a clinical note to a concept in an ontology enables a variety of tasks that rely on understanding the content of a medical record, such as identifying patient populations and decision support. Medical concept linking can be formulated as a two-step task: 1) a candidate generator, which selects likely candidates from the ontology for the given mention, and 2) a ranker, which orders the candidates based on a set of features to find the best one. In this paper, we propose a candidate generation system based on the DiscK framework [Chen and Van Durme, 2017]. Our system produces a candidate list with both high coverage and a ranking that is a useful starting point for the second step of the linking process. We integrate our candidate selection process into a current linking system, DNorm [Leaman et al., 2013]. The resulting system achieves similar accuracy paired with a gain in efficiency due to a large reduction in the number of potential candidates considered.}, author = {Elliot Schumacher and Mark Dredze}, booktitle = {Knowledge Base Construction (AKBC)}, date-added = {2019-02-16 20:43:58 -0500}, date-modified = {2019-02-16 20:45:10 -0500}, file = {2019_disck_akbc.pdf}, title = {Discriminative Candidate Generation for Medical Concept Linking}, year = {2019} } Linking mentions of medical concepts in a clinical note to a concept in an ontology enables a variety of tasks that rely on understanding the content of a medical record, such as identifying patient populations and decision support. 
Medical concept linking can be formulated as a two-step task: 1) a candidate generator, which selects likely candidates from the ontology for the given mention, and 2) a ranker, which orders the candidates based on a set of features to find the best one. In this paper, we propose a candidate generation system based on the DiscK framework [Chen and Van Durme, 2017]. Our system produces a candidate list with both high coverage and a ranking that is a useful starting point for the second step of the linking process. We integrate our candidate selection process into a current linking system, DNorm [Leaman et al., 2013]. The resulting system achieves similar accuracy paired with a gain in efficiency due to a large reduction in the number of potential candidates considered.

		Ran Zhao, Yuntian Deng, Mark Dredze, Arun Verma, David Rosenberg, Amanda Stent. Visual Attention Model for Cross-sectional Stock Return Prediction and End-to-End Multimodal Market Representation Learning. The Florida Artificial Intelligence Research Society (FLAIRS), 2019. [PDF] [Bibtex] [Close] @inproceedings{Zhao:2019qa, abstract = {Technical and fundamental analysis are traditional tools used to analyze stocks; however, the finance literature has shown that the price movement of each individual stock is highly correlated with that of other stocks, especially those within the same sector. In this paper we propose a general-purpose market representation that incorporates fundamental and technical indicators and relationships between individual stocks. We treat the daily stock market as a `market image' where rows (grouped by market sector) represent individual stocks and columns represent indicators. We apply a convolutional neural network over this market image to build market features in a hierarchical way. We use a recurrent neural network, with an attention mechanism over the market feature maps, to model temporal dynamics in the market. Our model outperforms strong baselines in both short-term and long-term stock return prediction tasks. 
We also show another use for our market image: to construct concise and dense market embeddings suitable for downstream prediction tasks.}, author = {Ran Zhao and Yuntian Deng and Mark Dredze and Arun Verma and David Rosenberg and Amanda Stent}, booktitle = {The Florida Artificial Intelligence Research Society (FLAIRS)}, date-added = {2019-02-12 23:35:16 -0500}, date-modified = {2019-02-12 23:36:15 -0500}, file = {2019_zhao_flairs.pdf}, title = {Visual Attention Model for Cross-sectional Stock Return Prediction and End-to-End Multimodal Market Representation Learning}, year = {2019} } Technical and fundamental analysis are traditional tools used to analyze stocks; however, the finance literature has shown that the price movement of each individual stock is highly correlated with that of other stocks, especially those within the same sector. In this paper we propose a general-purpose market representation that incorporates fundamental and technical indicators and relationships between individual stocks. We treat the daily stock market as a `market image' where rows (grouped by market sector) represent individual stocks and columns represent indicators. We apply a convolutional neural network over this market image to build market features in a hierarchical way. We use a recurrent neural network, with an attention mechanism over the market feature maps, to model temporal dynamics in the market. Our model outperforms strong baselines in both short-term and long-term stock return prediction tasks. We also show another use for our market image: to construct concise and dense market embeddings suitable for downstream prediction tasks.

		Joshua Dredze, Lisi Dredze, Mark Dredze. Measuring Online Information Seeking for Stimulants from Google Search Queries. American Psychological Association (APA), 2019. [PDF] [Bibtex] [Close] @inproceedings{Dredze:2019ee, author = {Joshua Dredze and Lisi Dredze and Mark Dredze}, booktitle = {American Psychological Association (APA)}, date-added = {2019-01-28 23:28:11 -0500}, date-modified = {2019-01-28 23:28:57 -0500}, file = {2019_dredze_apa.pdf}, keywords = {abstract}, title = {Measuring Online Information Seeking for Stimulants from Google Search Queries}, year = {2019} }

		John W Ayers, Alicia L Nobles, Mark Dredze. Media Trends for the Substance Abuse and Mental Health Services Administration 800-662-HELP Addiction Treatment Referral Services After a Celebrity Overdose. JAMA Internal Medicine, 2019. [PDF] [Bibtex] [Close] @article{Ayers:2019ao, abstract = {Despite a substantial investment in evidence-based addiction resources, only 10% of US individuals who need treatment for drug addiction receive it. The Substance Abuse and Mental Health Services Administration (SAMHSA) national helpline (800-662-HELP) is the only free, federally managed and endorsed US addiction treatment referral service, helping callers find the best local services that match their needs. We sought to determine awareness of the availability of this free resource.}, annote = {(<b>Ranked in the top 1% of 12.5m research outputs by <a href="https://jamanetwork.altmetric.com/details/53918599#score"><span class="pub_link">Altmetric</span></a></b>)}, author = {John W. Ayers and Alicia L. 
Nobles and Mark Dredze}, date-added = {2019-01-17 11:31:15 -0500}, date-modified = {2019-01-17 11:33:21 -0500}, file = {https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2720125?guestAccessKey=2d35350c-4391-43f0-9e4c-b7c41bd7a2d2&utm_source=jps&utm_medium=email&utm_campaign=author_alert-jamanetwork&utm_content=author-author_engagement&utm_term=1m}, journal = {JAMA Internal Medicine}, title = {Media Trends for the Substance Abuse and Mental Health Services Administration 800-662-HELP Addiction Treatment Referral Services After a Celebrity Overdose}, year = {2019} } (Ranked in the top 1% of 12.5m research outputs by Altmetric) Despite a substantial investment in evidence-based addiction resources, only 10% of US individuals who need treatment for drug addiction receive it. The Substance Abuse and Mental Health Services Administration (SAMHSA) national helpline (800-662-HELP) is the only free, federally managed and endorsed US addiction treatment referral service, helping callers find the best local services that match their needs. We sought to determine awareness of the availability of this free resource.

		Xiaolei Huang, Michael C Smith, Amelia M Jamison, David A Broniatowski, Mark Dredze, Sandra Crouse Quinn, Justin Cai, Michael J Paul. Can online self-reports assist in real-time identification of influenza vaccination uptake? A cross-sectional study of influenza vaccine-related tweets in the USA, 2013--2017. BMJ Open, 2019. [PDF] [Bibtex] [Close] @article{Huang:2018ys, abstract = {Introduction The Centers for Disease Control and Prevention (CDC) spend significant time and resources to track influenza vaccination coverage each influenza season using national surveys. Emerging data from social media provide an alternative solution to surveillance at both national and local levels of influenza vaccination coverage in near real time. Objectives This study aimed to characterise and analyse the vaccinated population from temporal, demographical and geographical perspectives using automatic classification of vaccination-related Twitter data. Methods In this cross-sectional study, we continuously collected tweets containing both influenza-related terms and vaccine-related terms covering four consecutive influenza seasons from 2013 to 2017. We created a machine learning classifier to identify relevant tweets, then evaluated the approach by comparing to data from the CDC's FluVaxView. We limited our analysis to tweets geolocated within the USA. Results We assessed 1 124 839 tweets. We found strong correlations of 0.799 between monthly Twitter estimates and CDC, with correlations as high as 0.950 in individual influenza seasons. We also found that our approach obtained geographical correlations of 0.387 at the US state level and 0.467 at the regional level. Finally, we found a higher level of influenza vaccine tweets among female users than male users, also consistent with the results of CDC surveys on vaccine uptake. Conclusion Significant correlations between Twitter data and CDC data show the potential of using social media for vaccination surveillance. 
Temporal variability is captured better than geographical and demographical variability. We discuss potential paths forward for leveraging this approach.}, author = {Xiaolei Huang and Michael C Smith and Amelia M Jamison and David A Broniatowski and Mark Dredze and Sandra Crouse Quinn and Justin Cai and Michael J Paul}, date-added = {2019-01-16 13:13:14 -0500}, date-modified = {2019-02-19 23:00:35 -0500}, file = {http://dx.doi.org/10.1136/bmjopen-2018-024018}, journal = {BMJ Open}, pages = {e024018}, title = {Can online self-reports assist in real-time identification of influenza vaccination uptake? A cross-sectional study of influenza vaccine-related tweets in the USA, 2013--2017}, volume = {9}, year = {2019} } Introduction The Centers for Disease Control and Prevention (CDC) spend significant time and resources to track influenza vaccination coverage each influenza season using national surveys. Emerging data from social media provide an alternative solution to surveillance at both national and local levels of influenza vaccination coverage in near real time. Objectives This study aimed to characterise and analyse the vaccinated population from temporal, demographical and geographical perspectives using automatic classification of vaccination-related Twitter data. Methods In this cross-sectional study, we continuously collected tweets containing both influenza-related terms and vaccine-related terms covering four consecutive influenza seasons from 2013 to 2017. We created a machine learning classifier to identify relevant tweets, then evaluated the approach by comparing to data from the CDC's FluVaxView. We limited our analysis to tweets geolocated within the USA. Results We assessed 1 124 839 tweets. We found strong correlations of 0.799 between monthly Twitter estimates and CDC, with correlations as high as 0.950 in individual influenza seasons. 
We also found that our approach obtained geographical correlations of 0.387 at the US state level and 0.467 at the regional level. Finally, we found a higher level of influenza vaccine tweets among female users than male users, also consistent with the results of CDC surveys on vaccine uptake. Conclusion Significant correlations between Twitter data and CDC data show the potential of using social media for vaccination surveillance. Temporal variability is captured better than geographical and demographical variability. We discuss potential paths forward for leveraging this approach.

		2018 (18 Publications)
		John W Ayers, Mark Dredze, Eric C Leas, Theodore L Caputi, Jon-Patrick Allem, Joanna E Cohen. Next generation media monitoring: Global coverage of electronic nicotine delivery systems (electronic cigarettes) on Bing, Google and Twitter, 2013-2018. PloS one, 2018;13(11):e0205822. [PDF] [Bibtex] [Close] @article{ayers2018next, abstract = {News media monitoring is an important scientific tool. By treating news reporters as data collectors and their reports as qualitative accounts of a fast changing public health landscape, researchers can glean many valuable insights. Yet, there have been surprisingly few innovations in public health media monitoring, with nearly all studies relying on labor-intensive content analyses limited to a small number of media reports. We propose to advance this subfield by using scalable machine learning. In potentially the largest contemporary public health media monitoring study to date, we systematically characterize global news reports surrounding electronic cigarettes or electronic nicotine delivery systems (ENDS) using natural language processing techniques. News reports including ENDS terms (e.g., ``electronic cigarettes'') from over 100,000 sources (all sources archived on Google News or Bing News, as well as all news articles shared on Twitter) were monitored for 1 January 2013 through 31 July 2018. The geographic and subject (e.g., prevalence, bans, quitting, warnings, marketing, prices, age, flavor and industry) foci of news articles, their popularity among readers who share news on social media, and the sentiment behind news articles were assessed algorithmically. Globally there were 86,872 ENDS news reports with coverage increasing from 8 (standard deviation [SD] = 8) stories per day in 2013 to 75 (SD = 56) stories per day during 2018. The focus of ENDS news spanned 148 nations, with the plurality focusing on the United States (34% of all news). 
Potentially overlooked hotspots of ENDS media activity included China, Egypt, Russia, Ukraine, and Paraguay. The most common subject was warnings about ENDS (18%), followed by bans on using ENDS (13%) and ENDS prices (9%). Flavor and age restrictions were the least covered news subjects (~1% each). Among different subject foci, reports on quitting cigarettes using ENDS had the highest probability of scoring in the top three deciles of popularity rankings. Moreover, ENDS news on quitting and prices had a more positive sentiment on average than news with other subject foci. Public health leaders can use these trends to stay abreast of how ENDS are portrayed in the media, and potentially how the public perceives ENDS. Because our analytical strategies are updated in near real time, we aim to make media monitoring part of standard practice to support evidence-based tobacco control in the future.}, author = {John W Ayers and Mark Dredze and Eric C Leas and Theodore L. Caputi and Jon-Patrick Allem and Joanna E Cohen}, date-added = {2018-12-13 22:50:23 -0500}, date-modified = {2018-12-13 22:52:26 -0500}, file = {https://doi.org/10.1371/journal.pone.0205822}, journal = {PloS one}, number = {11}, pages = {e0205822}, publisher = {Public Library of Science}, title = {Next generation media monitoring: Global coverage of electronic nicotine delivery systems (electronic cigarettes) on Bing, Google and Twitter, 2013-2018}, volume = {13}, year = {2018} } News media monitoring is an important scientific tool. By treating news reporters as data collectors and their reports as qualitative accounts of a fast changing public health landscape, researchers can glean many valuable insights. Yet, there have been surprisingly few innovations in public health media monitoring, with nearly all studies relying on labor-intensive content analyses limited to a small number of media reports. We propose to advance this subfield by using scalable machine learning. 
In potentially the largest contemporary public health media monitoring study to date, we systematically characterize global news reports surrounding electronic cigarettes or electronic nicotine delivery systems (ENDS) using natural language processing techniques. News reports including ENDS terms (e.g., ``electronic cigarettes'') from over 100,000 sources (all sources archived on Google News or Bing News, as well as all news articles shared on Twitter) were monitored for 1 January 2013 through 31 July 2018. The geographic and subject (e.g., prevalence, bans, quitting, warnings, marketing, prices, age, flavor and industry) foci of news articles, their popularity among readers who share news on social media, and the sentiment behind news articles were assessed algorithmically. Globally there were 86,872 ENDS news reports with coverage increasing from 8 (standard deviation [SD] = 8) stories per day in 2013 to 75 (SD = 56) stories per day during 2018. The focus of ENDS news spanned 148 nations, with the plurality focusing on the United States (34% of all news). Potentially overlooked hotspots of ENDS media activity included China, Egypt, Russia, Ukraine, and Paraguay. The most common subject was warnings about ENDS (18%), followed by bans on using ENDS (13%) and ENDS prices (9%). Flavor and age restrictions were the least covered news subjects (~1% each). Among different subject foci, reports on quitting cigarettes using ENDS had the highest probability of scoring in the top three deciles of popularity rankings. Moreover, ENDS news on quitting and prices had a more positive sentiment on average than news with other subject foci. Public health leaders can use these trends to stay abreast of how ENDS are portrayed in the media, and potentially how the public perceives ENDS. Because our analytical strategies are updated in near real time, we aim to make media monitoring part of standard practice to support evidence-based tobacco control in the future.

		Masoud Rouhizadeh, Elham Hatef, Mark Dredze, Christopher Chute, Hadi Kharrazi. Identifying Social Determinants of Health from Clinical Notes: A Rule-Based Approach. AMIA Natural Language Processing Working Group Pre-Symposium, 2018. [Bibtex] [Close] @inproceedings{Rouhizadeh:2018fq, author = {Masoud Rouhizadeh and Elham Hatef and Mark Dredze and Christopher Chute and Hadi Kharrazi}, booktitle = {AMIA Natural Language Processing Working Group Pre-Symposium}, date-added = {2018-09-02 14:25:18 -0400}, date-modified = {2018-09-02 14:26:14 -0400}, keywords = {workshop}, title = {Identifying Social Determinants of Health from Clinical Notes: A Rule-Based Approach}, year = {2018} }

		David A Broniatowski, Amelia M Jamison, SiHua Qi, Lulwah AlKulaib, Tao Chen, Adrian Benton, Sandra C Quinn, Mark Dredze. Weaponized Health Communication: Twitter Bots and Russian Trolls Amplify the Vaccine Debate. American Journal of Public Health (AJPH), 2018;108(10):1378-1384. [PDF] [Bibtex] [Close] @article{broniatowski:2018a, abstract = {Objectives. To understand how Twitter bots and trolls (``bots'') promote online health content. Methods. We compared bots' to average users' rates of vaccine-relevant messages, which we collected online from July 2014 through September 2017. We estimated the likelihood that users were bots, comparing proportions of polarized and antivaccine tweets across user types. We conducted a content analysis of a Twitter hashtag associated with Russian troll activity. Results. Compared with average users, Russian trolls (χ2(1) = 102.0; P < .001), sophisticated bots (χ2(1) = 28.6; P < .001), and ``content polluters'' (χ2(1) = 7.0; P < .001) tweeted about vaccination at higher rates. Whereas content polluters posted more antivaccine content (χ2(1) = 11.18; P < .001), Russian trolls amplified both sides. Unidentifiable accounts were more polarized (χ2(1) = 12.1; P < .001) and antivaccine (χ2(1) = 35.9; P < .001). Analysis of the Russian troll hashtag showed that its messages were more political and divisive. Conclusions. Whereas bots that spread malware and unsolicited content disseminated antivaccine messages, Russian trolls promoted discord. Accounts masquerading as legitimate users create false equivalency, eroding public consensus on vaccination. Public Health Implications. Directly confronting vaccine skeptics enables bots to legitimize the vaccine debate. More research is needed to determine how best to combat bot-driven content. (Am J Public Health. Published online ahead of print August 23, 2018: e1--e7. 
doi:10.2105/AJPH.2018.304567)}, annote = {(<b>Ranked #31 of 13.6 million research outputs by <a href="https://apha.altmetric.com/details/46880168#score"><span class="pub_link">Altmetric</span></a> and in the <a href="https://www.altmetric.com/top100/2018/">top-20 for 2018.</a></b>) [<a href="http://www.cs.jhu.edu/~mdredze/datasets/2018_ajph_weaponized_health_10k.zip"><span class="pub_link">Data</span></a>] }, author = {David A. Broniatowski and Amelia M. Jamison and SiHua Qi and Lulwah AlKulaib and Tao Chen and Adrian Benton and Sandra C. Quinn and Mark Dredze}, date-modified = {2019-11-19 14:04:26 -0500}, file = {https://ajph.aphapublications.org/doi/10.2105/AJPH.2018.304567}, journal = {American Journal of Public Health (AJPH)}, number = {10}, pages = {1378-1384}, title = {Weaponized Health Communication: Twitter Bots and Russian Trolls Amplify the Vaccine Debate}, volume = {108}, year = {2018} } (Ranked #31 of 13.6 million research outputs by Altmetric and in the top-20 for 2018.) [Data] Objectives. To understand how Twitter bots and trolls (``bots'') promote online health content. Methods. We compared bots' to average users' rates of vaccine-relevant messages, which we collected online from July 2014 through September 2017. We estimated the likelihood that users were bots, comparing proportions of polarized and antivaccine tweets across user types. We conducted a content analysis of a Twitter hashtag associated with Russian troll activity. Results. Compared with average users, Russian trolls (χ2(1) = 102.0; P < .001), sophisticated bots (χ2(1) = 28.6; P < .001), and ``content polluters'' (χ2(1) = 7.0; P < .001) tweeted about vaccination at higher rates. Whereas content polluters posted more antivaccine content (χ2(1) = 11.18; P < .001), Russian trolls amplified both sides. Unidentifiable accounts were more polarized (χ2(1) = 12.1; P < .001) and antivaccine (χ2(1) = 35.9; P < .001). 
Analysis of the Russian troll hashtag showed that its messages were more political and divisive. Conclusions. Whereas bots that spread malware and unsolicited content disseminated antivaccine messages, Russian trolls promoted discord. Accounts masquerading as legitimate users create false equivalency, eroding public consensus on vaccination. Public Health Implications. Directly confronting vaccine skeptics enables bots to legitimize the vaccine debate. More research is needed to determine how best to combat bot-driven content. (Am J Public Health. Published online ahead of print August 23, 2018: e1--e7. doi:10.2105/AJPH.2018.304567)

		Adrian Benton, Mark Dredze. Using Author Embeddings to Improve Tweet Stance Classification. EMNLP Workshop on Noisy User-generated Text (W-NUT), 2018. [PDF] [Bibtex] [Close] @inproceedings{Benton:2018dk, abstract = {Many social media classification tasks analyze the content of a message, but do not consider the context of the message. For example, in tweet stance classification -- where a tweet is categorized according to a view-point it espouses -- the expressed viewpoint depends on latent beliefs held by the user. In this paper we investigate whether incorporating knowledge about the author can improve tweet stance classification. Furthermore, since author information and embeddings are often unavailable for labeled training examples, we propose a semi-supervised pre-training method to predict user embeddings. Although the neural stance classifiers we learn are often outperformed by a baseline SVM, author embedding pre-training yields improvements over a non-pre-trained neural network on four out of five domains in the SemEval 2016 6A tweet stance classification task. In a tweet gun control stance classification dataset, improvements from pre-training are only apparent when training data is limited.}, author = {Adrian Benton and Mark Dredze}, booktitle = {EMNLP Workshop on Noisy User-generated Text (W-NUT)}, date-added = {2018-08-19 22:06:11 -0400}, date-modified = {2019-01-10 00:27:19 -0500}, file = {2018_benton_wnut.pdf}, keywords = {workshop}, pages = {184--194}, title = {Using Author Embeddings to Improve Tweet Stance Classification}, year = {2018} } Many social media classification tasks analyze the content of a message, but do not consider the context of the message. For example, in tweet stance classification -- where a tweet is categorized according to a view-point it espouses -- the expressed viewpoint depends on latent beliefs held by the user. 
In this paper we investigate whether incorporating knowledge about the author can improve tweet stance classification. Furthermore, since author information and embeddings are often unavailable for labeled training examples, we propose a semi-supervised pre-training method to predict user embeddings. Although the neural stance classifiers we learn are often outperformed by a baseline SVM, author embedding pre-training yields improvements over a non-pre-trained neural network on four out of five domains in the SemEval 2016 6A tweet stance classification task. In a tweet gun control stance classification dataset, improvements from pre-training are only apparent when training data is limited.

		Zachary Wood-Doughty, Nicholas Andrews, Mark Dredze. Convolutions Are All You Need (For Classifying Character Sequences). EMNLP Workshop on Noisy User-generated Text (W-NUT), 2018. [PDF] [Bibtex] [Close] @inproceedings{Wood-Doughty:2018qd, abstract = {While recurrent neural networks (RNNs) are widely used for text classification, they demonstrate poor performance and slow convergence when trained on long sequences. When text is modeled as characters instead of words, the longer sequences make RNNs a poor choice. Convolutional neural networks (CNNs), although somewhat less ubiquitous than RNNs, have an internal structure more appropriate for long-distance character dependencies. To better understand how CNNs and RNNs differ in handling long sequences, we use them for text classification tasks in several character-level social media datasets. The CNN models vastly outperform the RNN models in our experiments, suggesting that CNNs are superior to RNNs at learning to classify character-level data.}, author = {Zachary Wood-Doughty and Nicholas Andrews and Mark Dredze}, booktitle = {EMNLP Workshop on Noisy User-generated Text (W-NUT)}, date-added = {2018-08-19 22:03:13 -0400}, date-modified = {2019-01-10 00:27:40 -0500}, file = {2018_wnut_convolutions.pdf}, keywords = {workshop}, pages = {208--213}, title = {Convolutions Are All You Need (For Classifying Character Sequences)}, year = {2018} } While recurrent neural networks (RNNs) are widely used for text classification, they demonstrate poor performance and slow convergence when trained on long sequences. When text is modeled as characters instead of words, the longer sequences make RNNs a poor choice. Convolutional neural networks (CNNs), although somewhat less ubiquitous than RNNs, have an internal structure more appropriate for long-distance character dependencies. 
To better understand how CNNs and RNNs differ in handling long sequences, we use them for text classification tasks in several character-level social media datasets. The CNN models vastly outperform the RNN models in our experiments, suggesting that CNNs are superior to RNNs at learning to classify character-level data.

		Vedran Sekara, Alex Rutherford, Gideon Mann, Mark Dredze, Natalia Adler, Manuel García-Herranz. Trends in the Adoption of Corporate Child Labor Policies: An Analysis with Bloomberg Terminal ESG Data. Bloomberg Data for Good Exchange, 2018. [PDF] [Bibtex] [Close] @inproceedings{Sekara:2018uo, abstract = {Over 150 million children worldwide are estimated to be engaged in some form of child labor, with nearly one in every four children between the ages of 5 and 14 engaged in potentially harmful work in the world's poorest countries. Child labor compromises children's physical, mental, social and educational development. It also reinforces cycles of poverty, negatively affecting the ecosystem necessary for business to thrive in a sustainable manner. Against a backdrop of multiple international and national laws against child labor, corporations also adopt policies on child labor. However, new methods of globally dispersed production have made this commitment to sustainability issues across supply chains more challenging. In this work we examine, through the lens of Bloomberg's environmental, social and governance (ESG) and financial data, trends in corporate child labor policies and their relationship to classic economic variables as a first step in understanding sustainability issues across global supply networks.}, author = {Vedran Sekara and Alex Rutherford and Gideon Mann and Mark Dredze and Natalia Adler and Manuel Garc{\'\i}a-Herranz}, booktitle = {Bloomberg Data for Good Exchange}, date-added = {2018-08-15 22:43:44 -0400}, date-modified = {2018-08-15 22:44:32 -0400}, file = {2018_d4gx_child_labor.pdf}, title = {Trends in the Adoption of Corporate Child Labor Policies: An Analysis with Bloomberg Terminal ESG Data}, year = {2018} } Over 150 million children worldwide are estimated to be engaged in some form of child labor, with nearly one in every four children between the ages of 5 and 14 engaged in potentially harmful work in the world's poorest countries. 
Child labor compromises children's physical, mental, social and educational development. It also reinforces cycles of poverty, negatively affecting the ecosystem necessary for business to thrive in a sustainable manner. Against a backdrop of multiple international and national laws against child labor, corporations also adopt policies on child labor. However, new methods of globally dispersed production have made this commitment to sustainability issues across supply chains more challenging. In this work we examine, through the lens of Bloomberg's environmental, social and governance (ESG) and financial data, trends in corporate child labor policies and their relationship to classic economic variables as a first step in understanding sustainability issues across global supply networks.

		Zachary Wood-Doughty, Ilya Shpitser, Mark Dredze. Challenges of Using Text Classifiers for Causal Inference. Empirical Methods in Natural Language Processing (EMNLP), 2018. [PDF] [Bibtex] [Close] @inproceedings{Wood-Doughty:2018qe, abstract = {Causal understanding is essential for many kinds of decision-making, but causal inference from observational data has typically only been applied to structured, low-dimensional datasets. While text classifiers produce low-dimensional outputs, their use in causal inference has not previously been studied. To facilitate causal analyses based on language data, we consider the role that text classifiers can play in causal inference through established modeling mechanisms from the causality literature on missing data and measurement error. We demonstrate how to conduct causal analyses using text classifiers on simulated and Yelp data, and discuss the opportunities and challenges of future work that uses text data in causal inference.}, author = {Zachary Wood-Doughty and Ilya Shpitser and Mark Dredze}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, date-added = {2018-08-11 23:50:05 -0400}, date-modified = {2019-01-10 00:27:58 -0500}, file = {2018_emnlp_causal_nlp.pdf}, pages = {4586--4598}, title = {Challenges of Using Text Classifiers for Causal Inference}, year = {2018} } Causal understanding is essential for many kinds of decision-making, but causal inference from observational data has typically only been applied to structured, low-dimensional datasets. While text classifiers produce low-dimensional outputs, their use in causal inference has not previously been studied. To facilitate causal analyses based on language data, we consider the role that text classifiers can play in causal inference through established modeling mechanisms from the causality literature on missing data and measurement error. 
We demonstrate how to conduct causal analyses using text classifiers on simulated and Yelp data, and discuss the opportunities and challenges of future work that uses text data in causal inference.

		John W Ayers, Theodore L Caputi, Camille Nebeker, Mark Dredze. Don't quote me: reverse identification of research participants in social media studies. Nature Digital Medicine, 2018. [PDF] [Bibtex] [Close] @article{Ayers:2018eb, abstract = {We investigated if participants in social media surveillance studies could be reverse identified by reviewing all articles published on PubMed in 2015 or 2016 with the words ``Twitter'' and either ``read,'' ``coded,'' or ``content'' in the title or abstract. Seventy-two percent (95% CI: 63--80) of articles quoted at least one participant's tweet and searching for the quoted content led to the participant 84% (95% CI: 74--91) of the time. Twenty-one percent (95% CI: 13--29) of articles disclosed a participant's Twitter username thereby making the participant immediately identifiable. Only one article reported obtaining consent to disclose identifying information and institutional review board (IRB) involvement was mentioned in only 40% (95% CI: 31--50) of articles, of which 17% (95% CI: 10--25) received IRB-approval and 23% (95% CI: 16--32) were deemed exempt. Biomedical publications are routinely including identifiable information by quoting tweets or revealing usernames which, in turn, violates ICMJE ethical standards governing scientific ethics, even though said content is scientifically unnecessary. We propose that authors convey aggregate findings without revealing participants' identities, editors refuse to publish reports that reveal a participant's identity, and IRBs attend to these privacy issues when reviewing studies involving social media data. These strategies together will ensure participants are protected going forward.}, annote = {(<b>Ranked in the top 0.6% of 11.6 million research outputs by <a href="https://www.altmetric.com/details/45930481#score"><span class="pub_link">Altmetric</span></a></b>)}, author = {John W Ayers and Theodore L. 
Caputi and Camille Nebeker and Mark Dredze}, date-added = {2018-08-07 13:32:29 -0400}, date-modified = {2018-08-07 13:33:35 -0400}, file = {https://www.nature.com/articles/s41746-018-0036-2}, journal = {Nature Digital Medicine}, number = {30}, title = {Don't quote me: reverse identification of research participants in social media studies}, volume = {1}, year = {2018} } (Ranked in the top 0.6% of 11.6 million research outputs by Altmetric) We investigated if participants in social media surveillance studies could be reverse identified by reviewing all articles published on PubMed in 2015 or 2016 with the words ``Twitter'' and either ``read,'' ``coded,'' or ``content'' in the title or abstract. Seventy-two percent (95% CI: 63--80) of articles quoted at least one participant's tweet and searching for the quoted content led to the participant 84% (95% CI: 74--91) of the time. Twenty-one percent (95% CI: 13--29) of articles disclosed a participant's Twitter username thereby making the participant immediately identifiable. Only one article reported obtaining consent to disclose identifying information and institutional review board (IRB) involvement was mentioned in only 40% (95% CI: 31--50) of articles, of which 17% (95% CI: 10--25) received IRB-approval and 23% (95% CI: 16--32) were deemed exempt. Biomedical publications are routinely including identifiable information by quoting tweets or revealing usernames which, in turn, violates ICMJE ethical standards governing scientific ethics, even though said content is scientifically unnecessary. We propose that authors convey aggregate findings without revealing participants' identities, editors refuse to publish reports that reveal a participant's identity, and IRBs attend to these privacy issues when reviewing studies involving social media data. These strategies together will ensure participants are protected going forward.

		Yuki Lama, Tao Chen, Mark Dredze, Amelia M Jamison, Sandra C Quinn, David A Broniatowski. Discordance Between Human Papillomavirus Twitter Images and Disparities in Human Papillomavirus Risk and Disease in the United States: Mixed-Methods Analysis. Journal of Medical Internet Research (JMIR), 2018;20(9):e10244. [PDF] [Bibtex] [Close] @article{Lama:2018ss, abstract = {Background: Racial and ethnic minorities are disproportionately affected by human papillomavirus (HPV)-related cancer, many of which could have been prevented with vaccination. Yet, the initiation and completion rates of HPV vaccination remain low among these populations. Given the importance of social media platforms for health communication, we examined US-based HPV images on Twitter. We explored inconsistencies between the demographics represented in HPV images and the populations that experience the greatest burden of HPV-related disease. Objective: The objective of our study was to observe whether HPV images on Twitter reflect the actual burden of disease by select demographics and determine to what extent Twitter accounts utilized images that reflect the burden of disease in their health communication messages. Methods: We identified 456 image tweets about HPV that contained faces posted by US users between November 11, 2014 and August 8, 2016. We identified images containing at least one human face and utilized Face++ software to automatically extract the gender, age, and race of each face. We manually annotated the source accounts of these tweets into 3 types as follows: government (38/298, 12.8%), organizations (161/298, 54.0%), and individual (99/298, 33.2%) and topics (news, health, and other) to examine how images varied by message source. Results: Findings reflected the racial demographics of the US population but not the disease burden (795/1219, 65.22% white faces; 140/1219, 11.48% black faces; 71/1219, 5.82% Asian faces; and 213/1219, 17.47% racially ambiguous faces). 
Gender disparities were evident in the image faces; 71.70% (874/1219) represented female faces, whereas only 27.89% (340/1219) represented male faces. Among the 11-26 years age group recommended to receive HPV vaccine, HPV images contained more female-only faces (214/616, 34.3%) than males (37/616, 6.0%); the remainder of images included both male and female faces (365/616, 59.3%). Gender and racial disparities were present across different image sources. Faces from government sources were more likely to depict females (n=44) compared with males (n=16). Of male faces, 80% (12/15) of youth and 100% (1/1) of adults were white. News organization sources depicted high proportions of white faces (28/38, 97% of female youth and 12/12, 100% of adult males). Face++ identified fewer faces compared with manual annotation because of limitations with detecting multiple, small, or blurry faces. Nonetheless, Face++ achieved a high degree of accuracy with respect to gender, race, and age compared with manual annotation. Conclusions: This study reveals critical differences between the demographics reflected in HPV images and the actual burden of disease. Racial minorities are less likely to appear in HPV images despite higher rates of HPV incidence. 
Health communication efforts need to represent populations at risk better if we seek to reduce disparities in HPV infection.}, author = {Yuki Lama and Tao Chen and Mark Dredze and Amelia M Jamison and Sandra C Quinn and David A Broniatowski}, date-added = {2018-06-28 14:22:01 +0000}, date-modified = {2019-01-10 00:25:26 -0500}, file = {https://doi.org/10.2196/10244}, journal = {Journal of Medical Internet Research (JMIR)}, number = {9}, pages = {e10244}, title = {Discordance Between Human Papillomavirus Twitter Images and Disparities in Human Papillomavirus Risk and Disease in the United States: Mixed-Methods Analysis}, volume = {20}, year = {2018} } Background: Racial and ethnic minorities are disproportionately affected by human papillomavirus (HPV)-related cancer, many of which could have been prevented with vaccination. Yet, the initiation and completion rates of HPV vaccination remain low among these populations. Given the importance of social media platforms for health communication, we examined US-based HPV images on Twitter. We explored inconsistencies between the demographics represented in HPV images and the populations that experience the greatest burden of HPV-related disease. Objective: The objective of our study was to observe whether HPV images on Twitter reflect the actual burden of disease by select demographics and determine to what extent Twitter accounts utilized images that reflect the burden of disease in their health communication messages. Methods: We identified 456 image tweets about HPV that contained faces posted by US users between November 11, 2014 and August 8, 2016. We identified images containing at least one human face and utilized Face++ software to automatically extract the gender, age, and race of each face. 
We manually annotated the source accounts of these tweets into 3 types as follows: government (38/298, 12.8%), organizations (161/298, 54.0%), and individual (99/298, 33.2%) and topics (news, health, and other) to examine how images varied by message source. Results: Findings reflected the racial demographics of the US population but not the disease burden (795/1219, 65.22% white faces; 140/1219, 11.48% black faces; 71/1219, 5.82% Asian faces; and 213/1219, 17.47% racially ambiguous faces). Gender disparities were evident in the image faces; 71.70% (874/1219) represented female faces, whereas only 27.89% (340/1219) represented male faces. Among the 11-26 years age group recommended to receive HPV vaccine, HPV images contained more female-only faces (214/616, 34.3%) than males (37/616, 6.0%); the remainder of images included both male and female faces (365/616, 59.3%). Gender and racial disparities were present across different image sources. Faces from government sources were more likely to depict females (n=44) compared with males (n=16). Of male faces, 80% (12/15) of youth and 100% (1/1) of adults were white. News organization sources depicted high proportions of white faces (28/38, 97% of female youth and 12/12, 100% of adult males). Face++ identified fewer faces compared with manual annotation because of limitations with detecting multiple, small, or blurry faces. Nonetheless, Face++ achieved a high degree of accuracy with respect to gender, race, and age compared with manual annotation. Conclusions: This study reveals critical differences between the demographics reflected in HPV images and the actual burden of disease. Racial minorities are less likely to appear in HPV images despite higher rates of HPV incidence. Health communication efforts need to represent populations at risk better if we seek to reduce disparities in HPV infection.

		Katherine Smith, Caitlin Weiger, Errol Fields, Joanna E Cohen, Meghan Moran, Mark Dredze. Conducting public health surveillance research on consumer product websites. American Public Health Association (APHA), 2018. [PDF] [Bibtex] [Close] @inproceedings{Smith:2018jl, author = {Katherine Smith and Caitlin Weiger and Errol Fields and Joanna E Cohen and Meghan Moran and Mark Dredze}, booktitle = {American Public Health Association (APHA)}, date-added = {2018-06-05 03:27:25 +0000}, date-modified = {2018-06-05 03:28:54 +0000}, file = {2018_apha_conducting_public_health_surveillance_research.pdf}, keywords = {abstract}, title = {Conducting public health surveillance research on consumer product websites}, year = {2018} }

		Yuchen Zhou, Mark Dredze, David A Broniatowski, William Adler. Gab: The Alt-Right Social Media Platform. International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation (SBP-BRiMS), 2018. [PDF] [Bibtex] [Close] @inproceedings{Zhou:2018uk, abstract = {This study proposes the use of Gab as a vehicle for political science research regarding modern American politics and the Alt-Right population. We collect several million Gab messages posted on Gab website from August 2016 to February 2018. We conduct a preliminary analysis of Gab platform related to site use, growth and topics, which shows that Gab is a reasonable resource for Alt-Right study.}, author = {Yuchen Zhou and Mark Dredze and David A Broniatowski and William Adler}, booktitle = {International Conference on Social Computing, Behavioral-Cultural Modeling & Prediction and Behavior Representation in Modeling and Simulation (SBP-BRiMS)}, date-added = {2018-06-03 02:43:58 +0000}, date-modified = {2018-06-03 02:44:50 +0000}, file = {2018_sbpbrims_gab.pdf}, title = {Gab: The Alt-Right Social Media Platform}, year = {2018} } This study proposes the use of Gab as a vehicle for political science research regarding modern American politics and the Alt-Right population. We collect several million Gab messages posted on Gab website from August 2016 to February 2018. We conduct a preliminary analysis of Gab platform related to site use, growth and topics, which shows that Gab is a reasonable resource for Alt-Right study.

		Travis Wolfe, Annabelle Carrell, Mark Dredze, Benjamin Van Durme. Summarizing Entities using Distantly Supervised Information Extractors. SIGIR Workshop on Knowledge Graphs and Semantics for Text Retrieval, Analysis, and Understanding (KG4IR), 2018. [Bibtex] [Close] @inproceedings{Wolfe:2018il, author = {Travis Wolfe and Annabelle Carrell and Mark Dredze and Benjamin Van Durme}, booktitle = {SIGIR Workshop on Knowledge Graphs and Semantics for Text Retrieval, Analysis, and Understanding (KG4IR)}, date-added = {2018-06-01 02:17:46 +0000}, date-modified = {2019-01-10 00:28:28 -0500}, keywords = {workshop}, pages = {51-58}, title = {Summarizing Entities using Distantly Supervised Information Extractors}, year = {2018} }

		Alexis S Hammond, Michael J Paul, J Gregory Hobelmann, Animesh R Koratana, Mark Dredze, Margaret S Chisolm. Perceived Attitudes About Substance Use in Anonymous Social Media Posts Near College Campuses. Journal of Medical Internet Research Mental Health (JMIR MH), 2018;5(3):e52. [PDF] [Bibtex] [Close] @article{hammond:2018lq, author = {Alexis S. Hammond and Michael J. Paul and J. Gregory Hobelmann and Animesh R. Koratana and Mark Dredze and Margaret S. Chisolm}, date-added = {2018-05-30 13:34:04 +0000}, date-modified = {2019-01-10 00:25:47 -0500}, file = {https://doi.org/10.2196/mental.9903}, journal = {Journal of Medical Internet Research Mental Health (JMIR MH)}, number = {3}, pages = {e52}, title = {Perceived Attitudes About Substance Use in Anonymous Social Media Posts Near College Campuses}, volume = {5}, year = {2018} }

		Zachary Wood-Doughty, Praateek Mahajan, Mark Dredze. Johns Hopkins or johnny-hopkins: Classifying Individuals versus Organizations on Twitter. NAACL Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media, 2018. [PDF] [Bibtex] [Close] @inproceedings{Wood-Doughty:2018:peoples2, author = {Zachary Wood-Doughty and Praateek Mahajan and Mark Dredze}, booktitle = {NAACL Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media}, date-modified = {2019-01-10 00:28:51 -0500}, file = {2018_individual_vs_organization.pdf}, keywords = {workshop}, pages = {56-61}, title = {Johns Hopkins or johnny-hopkins: Classifying Individuals versus Organizations on Twitter}, year = {2018} }

		Zachary Wood-Doughty, Nicholas Andrews, Rebecca Marvin, Mark Dredze. Predicting Twitter User Demographics from Names Alone. NAACL Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media, 2018. [PDF] [Bibtex] [Close] @inproceedings{Wood-Doughty:2018:peoples1, author = {Zachary Wood-Doughty and Nicholas Andrews and Rebecca Marvin and Mark Dredze}, booktitle = {NAACL Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media}, date-modified = {2019-01-10 00:29:08 -0500}, file = {2018_peoples_name_demographics.pdf}, keywords = {workshop}, pages = {105--111}, title = {Predicting Twitter User Demographics from Names Alone}, year = {2018} }

		Theodore L Caputi, Eric C Leas, Mark Dredze, John W Ayers. Online Sales of Marijuana: An Unrecognized Public Health Dilemma. American Journal of Preventive Medicine (AJPM), 2018;54(5):719-721. [PDF] [Bibtex] [Close] @article{Caputi:2018dk, author = {Theodore L. Caputi and Eric C. Leas and Mark Dredze and John W. Ayers}, date-added = {2018-03-27 00:50:11 +0000}, date-modified = {2019-01-10 00:26:38 -0500}, file = {https://doi.org/10.1016/j.amepre.2018.01.032}, journal = {American Journal of Preventive Medicine (AJPM)}, number = {5}, pages = {719-721}, title = {Online Sales of Marijuana: An Unrecognized Public Health Dilemma}, volume = {54}, year = {2018} }

		Adrian Benton, Mark Dredze. Deep Dirichlet Multinomial Regression. North American Chapter of the Association for Computational Linguistics (NAACL), 2018. [PDF] [Bibtex] [Close] @inproceedings{Benton:2018dn, author = {Adrian Benton and Mark Dredze}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, date-added = {2018-02-15 02:14:16 +0000}, date-modified = {2019-01-10 00:29:25 -0500}, file = {2018_naacl_deep_dmr.pdf}, pages = {365--374}, title = {Deep Dirichlet Multinomial Regression}, year = {2018} }

		Tao Chen, Mark Dredze. Vaccine Images on Twitter: What is Shared and Why. Journal of Medical Internet Research (JMIR), 2018;20(4):e130. [PDF] [Bibtex] [Close] @article{Chen:2018tg, author = {Tao Chen and Mark Dredze}, date-added = {2018-01-29 14:32:03 +0000}, date-modified = {2019-01-10 00:26:53 -0500}, file = {http://www.jmir.org/2018/4/e130/}, journal = {Journal of Medical Internet Research (JMIR)}, number = {4}, pages = {e130}, title = {Vaccine Images on Twitter: What is Shared and Why}, volume = {20}, year = {2018} }

		2017 (23 Publications)
		Seth M Noar, Eric C Leas, Benjamin M Althouse, Mark Dredze, Dannielle Kelley, John W Ayers. Can a selfie promote public engagement with skin cancer? Preventive Medicine, 2017. [PDF] [Bibtex] [Close] @article{NOAR2017, abstract = {Social media may provide new opportunities to promote skin cancer prevention, but research to understand this potential is needed. In April of 2015, Kentucky native Tawny Willoughby (TW) shared a graphic skin cancer selfie on Facebook that subsequently went viral. We examined the volume of comments and shares of her original Facebook post; news volume of skin cancer from Google News; and search volume for skin cancer Google queries. We compared these latter metrics after TW's announcement against expected volumes based on forecasts of historical trends. TW's skin cancer selfie went viral on May 11, 2015 after the social media post had been shared approximately 50,000 times. All search queries for skin cancer increased 162% (95% CI 102 to 320) and 155% (95% CI 107 to 353) on May 13th and 14th, when news about TW's skin cancer selfie was at its peak, and remained higher through May 17th. Google searches about skin cancer prevention and tanning were also significantly higher than expected volumes. In practical terms, searches reached near-record levels - i.e., May 13th, 14th and 15th were respectively the 6th, 8th, and 40th most searched days for skin cancer since January 1, 2004 when Google began tracking searches. We conclude that an ordinary person's social media post caught the public's imagination and led to significant increases in public engagement with skin cancer prevention. Digital surveillance methods can rapidly detect these events in near real time, allowing public health practitioners to engage and potentially elevate positive effects.}, author = {Seth M. Noar and Eric C Leas and Benjamin M. Althouse and Mark Dredze and Dannielle Kelley and John W. 
Ayers}, date-added = {2017-12-13 16:11:09 +0000}, date-modified = {2019-01-10 00:17:36 -0500}, doi = {https://doi.org/10.1016/j.ypmed.2017.10.038}, file = {http://www.sciencedirect.com/science/article/pii/S0091743517304206}, issn = {0091-7435}, journal = {Preventive Medicine}, keywords = {Social media, Selfie, Health communication, Skin cancer, Prevention}, pages = {10.1016/j.ypmed.2017.10.038}, title = {Can a selfie promote public engagement with skin cancer?}, year = {2017}, bdsk-url-1 = {http://www.sciencedirect.com/science/article/pii/S0091743517304206}, bdsk-url-2 = {https://doi.org/10.1016/j.ypmed.2017.10.038} } Social media may provide new opportunities to promote skin cancer prevention, but research to understand this potential is needed. In April of 2015, Kentucky native Tawny Willoughby (TW) shared a graphic skin cancer selfie on Facebook that subsequently went viral. We examined the volume of comments and shares of her original Facebook post; news volume of skin cancer from Google News; and search volume for skin cancer Google queries. We compared these latter metrics after TW's announcement against expected volumes based on forecasts of historical trends. TW's skin cancer selfie went viral on May 11, 2015 after the social media post had been shared approximately 50,000 times. All search queries for skin cancer increased 162% (95% CI 102 to 320) and 155% (95% CI 107 to 353) on May 13th and 14th, when news about TW's skin cancer selfie was at its peak, and remained higher through May 17th. Google searches about skin cancer prevention and tanning were also significantly higher than expected volumes. In practical terms, searches reached near-record levels - i.e., May 13th, 14th and 15th were respectively the 6th, 8th, and 40th most searched days for skin cancer since January 1, 2004 when Google began tracking searches. 
We conclude that an ordinary person's social media post caught the public's imagination and led to significant increases in public engagement with skin cancer prevention. Digital surveillance methods can rapidly detect these events in near real time, allowing public health practitioners to engage and potentially elevate positive effects.

		Theodore L Caputi, Eric C Leas, Mark Dredze, Joanna E Cohen, John W Ayers. They're heating up: Internet search query trends reveal significant public interest in heat-not-burn tobacco products. PLoS ONE, 2017. [PDF] [Bibtex] [Close] @article{Caputi:2017xr, author = {Theodore L. Caputi and Eric C Leas and Mark Dredze and Joanna E. Cohen and John W. Ayers}, date-added = {2017-10-15 06:30:45 +0000}, date-modified = {2019-01-10 00:18:38 -0500}, file = {http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0185735}, journal = {PLoS ONE}, pages = {10.1371/journal.pone.0185735}, title = {They're heating up: Internet search query trends reveal significant public interest in heat-not-burn tobacco products}, year = {2017} }

		Benjamin Van Durme, Tom Lippincott, Kevin Duh, Deana Burchfield, Adam Poliak, Cash Costello, Tim Finin, Scott Miller, James Mayfield, Philipp Koehn, Craig Harman, Dawn Lawrie, Chandler May, Max Thomas, Julianne Chaloux, Annabelle Carrell, Tongfei Chen, Alex Comerford, Mark Dredze, Benjamin Glass, Shudong Hao, Patrick Martin, Rashmi Sankepally, Pushpendre Rastogi, Travis Wolfe, Ying-Ying Tran, Ted Zhang. CADET: Computer Assisted Discovery Extraction and Translation. International Joint Conference on Natural Language Processing (IJCNLP) (Demonstration Track), 2017. [PDF] [Bibtex] [Close] @inproceedings{Durme:2017mz, author = {Benjamin Van Durme and Tom Lippincott and Kevin Duh and Deana Burchfield and Adam Poliak and Cash Costello and Tim Finin and Scott Miller and James Mayfield and Philipp Koehn and Craig Harman and Dawn Lawrie and Chandler May and Max Thomas and Julianne Chaloux and Annabelle Carrell and Tongfei Chen and Alex Comerford and Mark Dredze and Benjamin Glass and Shudong Hao and Patrick Martin and Rashmi Sankepally and Pushpendre Rastogi and Travis Wolfe and Ying-Ying Tran and Ted Zhang}, booktitle = {International Joint Conference on Natural Language Processing (IJCNLP) (Demonstration Track)}, date-added = {2017-10-10 03:43:22 +0000}, date-modified = {2019-01-10 00:20:32 -0500}, file = {http://aclweb.org/anthology/I17-3002.pdf}, pages = {5-8}, title = {CADET: Computer Assisted Discovery Extraction and Translation}, year = {2017} }

		Michael C Smith, Mark Dredze, Sandra C Quinn, David A Broniatowski. Monitoring Real-time Spatial Public Health Discussions in the Context of Vaccine Hesitancy. AMIA Workshop on Social Media Mining for Health Applications, 2017. [PDF] [Bibtex] [Close] @inproceedings{Smith:2017pi, author = {Michael C. Smith and Mark Dredze and Sandra C Quinn and David A. Broniatowski}, booktitle = {AMIA Workshop on Social Media Mining for Health Applications}, date-added = {2017-09-28 03:21:28 +0000}, date-modified = {2017-09-28 03:22:29 +0000}, file = {2017_amia_nlp_workshop_spatial_health.pdf}, keywords = {workshop}, title = {Monitoring Real-time Spatial Public Health Discussions in the Context of Vaccine Hesitancy}, year = {2017} }

		Ning Gao, Mark Dredze, Douglas Oard. Enhancing Scientific Collaboration Through Knowledge Base Population and Linking for Meetings. Hawaii International Conference on System Sciences (HICSS), 2017. [PDF] [Bibtex] [Close] @inproceedings{Gao:2017bf, author = {Ning Gao and Mark Dredze and Douglas Oard}, booktitle = {Hawaii International Conference on System Sciences (HICSS)}, date-added = {2017-09-12 16:35:29 +0000}, date-modified = {2019-01-10 00:21:54 -0500}, file = {2017_hicss.pdf}, pages = {10.24251/HICSS.2018.076}, title = {Enhancing Scientific Collaboration Through Knowledge Base Population and Linking for Meetings}, year = {2017} }

		Michael J Paul, Mark Dredze. Social Monitoring for Public Health. Synthesis Lectures on Information Concepts, Retrieval, and Services, 2017;9(5):1-183. [PDF] [Bibtex] [Close] @article{doi:10.2200/S00791ED1V01Y201707ICR060, annote = {[<a href="http://www.cs.jhu.edu/~mdredze/publications/2017_social_monitoring_preprint.pdf"><span class="pub_link">Preprint (free)</span></a>]}, author = {Michael J. Paul and Mark Dredze}, date-added = {2017-09-05 02:27:41 +0000}, date-modified = {2018-03-06 03:12:48 +0000}, doi = {10.2200/S00791ED1V01Y201707ICR060}, file = {https://doi.org/10.2200/S00791ED1V01Y201707ICR060}, journal = {Synthesis Lectures on Information Concepts, Retrieval, and Services}, keywords = {book}, number = {5}, pages = {1-183}, title = {Social Monitoring for Public Health}, url = {https://doi.org/10.2200/S00791ED1V01Y201707ICR060}, volume = {9}, year = {2017}, bdsk-url-1 = {https://doi.org/10.2200/S00791ED1V01Y201707ICR060}, bdsk-url-2 = {http://dx.doi.org/10.2200/S00791ED1V01Y201707ICR060} } [Preprint (free)]

		Ning Gao, Gregory Sell, Douglas Oard, Mark Dredze. Leveraging Side Information for Speaker Identification with the Enron Conversational Telephone Speech Collection. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2017. [PDF] [Bibtex] [Close] @inproceedings{Gao:2017by, author = {Ning Gao and Gregory Sell and Douglas Oard and Mark Dredze}, booktitle = {IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)}, date-added = {2017-08-31 14:27:51 +0000}, date-modified = {2017-08-31 14:28:45 +0000}, file = {2017_asru_speakerid.pdf}, title = {Leveraging Side Information for Speaker Identification with the Enron Conversational Telephone Speech Collection}, year = {2017} }

		John W Ayers, Benjamin M Althouse, Eric C Leas, Mark Dredze, Jon-Patrick Allem. Internet searches for suicide following the release of 13 Reasons Why. JAMA Internal Medicine, 2017;57(4):238-240. [PDF] [Bibtex] [Close] @article{doi:10.1001/jamainternmed.2017.3333, annote = {(<b>Ranked in the top .02% of 8.2m research outputs by <a href="https://jamanetwork.altmetric.com/details/23320505"><span class="pub_link">Altmetric</span></a></b>, Read the <a href="http://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2646769"><span class="pub_link">JAMA IM Editorial</span></a>, Read Netflix cast's response to <a href="http://www.clevver.com/13-reasons-why-season-2-criticism/"><span class="pub_link">criticism</span></a>)}, author = {John W Ayers and Benjamin M. Althouse and Eric C Leas and Mark Dredze and Jon-Patrick Allem}, date-added = {2017-07-31 15:15:23 +0000}, date-modified = {2017-08-14 20:41:48 +0000}, doi = {10.1001/jamainternmed.2017.3333}, eprint = {/data/journals/intemed/0/jamainternal_ayers_2017_ld_170042.pdf}, file = {http://dx.doi.org/10.1001/jamainternmed.2017.3333}, journal = {JAMA Internal Medicine}, number = {4}, pages = {238-240}, title = {Internet searches for suicide following the release of {13 Reasons Why}}, volume = {57}, year = {2017}, bdsk-url-1 = {+%20http://dx.doi.org/10.1001/jamainternmed.2017.3333}, bdsk-url-2 = {http://dx.doi.org/10.1001/jamainternmed.2017.3333} } (Ranked in the top .02% of 8.2m research outputs by Altmetric, Read the JAMA IM Editorial, Read Netflix cast's response to criticism)

		Mark Dredze, Zachary Wood-Doughty, Sandra C Quinn, David A Broniatowski. Vaccine opponents' use of Twitter during the 2016 US presidential election: Implications for practice and policy. Vaccine, 2017;35(36):4670-4672. [PDF] [Bibtex] [Close] @article{Dredze:2017fv, abstract = {The recent inauguration of President Trump carries with it many public health policy implications. During the election, President Trump, like all political candidates, made policy commitments to various interest groups including vaccine skeptics. These groups celebrated the announcement that Robert Kennedy Jr., a noted proponent of a causal link between vaccines and autism, may chair a commission on vaccines. Furthermore, during the GOP primaries, Mr. Trump endorsed messages associated with vaccine refusal on Twitter, and met with prominent vaccine refusal advocates including Andrew Wakefield, who published the retracted and discredited 1998 Lancet article claiming to link autism to MMR vaccination. In this paper, we show that the new administration has mobilized vaccine refusal advocates, potentially enabling them to influence the national agenda in a manner that could lead to changes in existing vaccination policy.}, author = {Mark Dredze and Zachary Wood-Doughty and Sandra C Quinn and David A. Broniatowski}, date-added = {2017-07-06 01:40:20 +0000}, date-modified = {2019-01-10 00:23:42 -0500}, file = {https://doi.org/10.1016/j.vaccine.2017.06.066}, journal = {Vaccine}, month = {July}, number = {36}, pages = {4670-4672}, title = {Vaccine opponents' use of Twitter during the 2016 US presidential election: Implications for practice and policy}, volume = {35}, year = {2017} }

		Anietie Andy, Mark Dredze, Mugizi Rwebangira, Chris Callison-Burch. Constructing an Alias List for Named Entities during an Event. EMNLP Workshop on Noisy User-generated Text (W-NUT), 2017. [PDF] [Bibtex] [Close] @inproceedings{Andy:2017rr, author = {Anietie Andy and Mark Dredze and Mugizi Rwebangira and Chris Callison-Burch}, booktitle = {EMNLP Workshop on Noisy User-generated Text (W-NUT)}, date-added = {2017-07-04 14:43:55 +0000}, date-modified = {2017-08-16 00:46:24 +0000}, file = {wnut_alias_list_2017.pdf}, keywords = {workshop}, pages = {40-44}, title = {Constructing an Alias List for Named Entities during an Event}, year = {2017} }

		Nanyun Peng, Mark Dredze. Multi-task Domain Adaptation for Sequence Tagging. ACL Workshop on Representation Learning for NLP (RepL4NLP), 2017. [PDF] [Bibtex] [Close] @inproceedings{Peng:2017ye, abstract = {Many domain adaptation approaches rely on learning cross domain shared representations to transfer the knowledge learned in one domain to other domains. Traditional domain adaptation only considers adapting for one task. In this paper, we explore multi-task representation learning under the domain adaptation scenario. We propose a neural network framework that supports domain adaptation for multiple tasks simultaneously, and learns shared representations that better generalize for domain adaptation. We apply the proposed framework to domain adaptation for sequence tagging problems considering two tasks: Chinese word segmentation and named entity recognition. Experiments show that multi-task domain adaptation works better than disjoint domain adaptation for each task, and achieves the state-of-the-art results for both tasks in the social media domain.}, author = {Nanyun Peng and Mark Dredze}, booktitle = {ACL Workshop on Representation Learning for NLP (RepL4NLP)}, date-added = {2017-05-30 22:30:33 +0000}, date-modified = {2017-08-14 20:41:07 +0000}, file = {https://arxiv.org/abs/1608.02689}, keywords = {workshop}, pages = {91-100}, title = {Multi-task Domain Adaptation for Sequence Tagging}, year = {2017} }

		Zachary Wood-Doughty, Michael C Smith, David A Broniatowski, Mark Dredze. How Does Twitter User Behavior Vary Across Demographic Groups? ACL Workshop on Natural Language Processing and Computational Social Science, 2017. [PDF] [Bibtex] [Close] @inproceedings{Wood-Doughty:2017lr, abstract = {Demographically-tagged social media messages are a common source of data for computational social science. While these messages can indicate differences in beliefs and behaviors between demographic groups, we do not have a clear understanding of how different demographic groups use platforms such as Twitter. This paper presents a preliminary analysis of how groups' differing behaviors may confound analyses of the groups themselves. We analyzed one million Twitter users by first inferring demographic attributes, and then measuring several indicators of Twitter behavior. We find differences in these indicators across demographic groups, suggesting that there may be underlying differences in how different demographic groups use Twitter.}, author = {Zachary Wood-Doughty and Michael C Smith and David A Broniatowski and Mark Dredze}, booktitle = {ACL Workshop on Natural Language Processing and Computational Social Science}, date-added = {2017-05-24 12:38:57 +0000}, date-modified = {2017-08-14 20:40:28 +0000}, file = {2017_nlpcss_demographics.pdf}, keywords = {workshop}, pages = {83-89}, title = {How Does Twitter User Behavior Vary Across Demographic Groups?}, year = {2017} }

		Jon-Patrick Allem, Eric C Leas, Theodore L Caputi, Mark Dredze, Benjamin M Althouse, Seth M Noar, John W Ayers. The Charlie Sheen Effect on Rapid In-home Human Immunodeficiency Virus Test Sales. Prevention Science, 2017;18(5):541--544. [PDF] [Bibtex] [Close] @article{Allem:2017qd, abstract = {One in eight of the 1.2 million Americans living with human immunodeficiency virus (HIV) are unaware of their positive status, and untested individuals are responsible for most new infections. As a result, testing is the most cost-effective HIV prevention strategy and must be accelerated when opportunities are presented. Web searches for HIV spiked around actor Charlie Sheen's HIV-positive disclosure. However, it is unknown whether Sheen's disclosure impacted offline behaviors like HIV testing. The goal of this study was to determine if Sheen's HIV disclosure was a record-setting HIV prevention event and determine if Web searches presage increases in testing allowing for rapid detection and reaction in the future. Sales of OraQuick rapid in-home HIV test kits in the USA were monitored weekly from April 12, 2014, to April 16, 2016, alongside Web searches including the terms ``test,'' ``tests,'' or ``testing'' and ``HIV'' as accessed from Google Trends. Changes in OraQuick sales around Sheen's disclosure and prediction models using Web searches were assessed. OraQuick sales rose 95% (95% CI, 75--117; p < 0.001) in the week of Sheen's disclosure and remained elevated for 4 more weeks (p < 0.05). In total, there were 8225 more sales than expected around Sheen's disclosure, surpassing World AIDS Day by a factor of about 7. Moreover, Web searches mirrored OraQuick sales trends (r = 0.79), demonstrating their ability to presage increases in testing. The ``Charlie Sheen effect'' represents an important opportunity for a public health response, and in the future, Web searches can be used to detect and act on more opportunities to foster prevention behaviors.}, author = {Jon-Patrick Allem and Eric C. Leas and Theodore L. Caputi and Mark Dredze and Benjamin M. Althouse and Seth M. Noar and John W. Ayers}, date-added = {2017-05-18 12:43:26 +0000}, date-modified = {2019-01-10 00:19:11 -0500}, file = {https://link.springer.com/article/10.1007/s11121-017-0792-2}, journal = {Prevention Science}, number = {5}, pages = {541--544}, title = {The Charlie Sheen Effect on Rapid In-home Human Immunodeficiency Virus Test Sales}, volume = {18}, year = {2017} }

		Ning Gao, Douglas Oard, Mark Dredze. Support for Interactive Identification of Mentioned Entities in Conversational Speech. International Conference on Research and Development in Information Retrieval (SIGIR) (short paper), 2017. [PDF] [Bibtex] [Close] @inproceedings{Gao:2017wo, abstract = {Searching conversational speech poses several new challenges, among which is how the searcher will make sense of what they find. This paper describes our initial experiments with a freely available collection of Enron telephone conversations. Our goal is to help the user make sense of search results by finding information about mentioned people, places and organizations. Because automated entity recognition is not yet sufficiently accurate on conversational telephone speech, we ask the user to transcribe just the name, and to indicate where in the recording it was heard. We then seek to link that mention to other mentions of the same entity in a variety of sources (in our experiments, in email and in Wikipedia). We cast this as an entity linking problem, and achieve promising results by utilizing social network features to help compensate for the limited accuracy of automatic transcription for this challenging content.}, author = {Ning Gao and Douglas Oard and Mark Dredze}, booktitle = {International Conference on Research and Development in Information Retrieval (SIGIR) (short paper)}, date-added = {2017-04-14 21:08:02 +0000}, date-modified = {2017-08-14 20:39:25 +0000}, file = {2017_sigir_entitylinking.pdf}, pages = {953-956}, title = {Support for Interactive Identification of Mentioned Entities in Conversational Speech}, year = {2017} }

		Nicholas Andrews, Mark Dredze, Benjamin Van Durme, Jason Eisner. Bayesian Modeling of Lexical Resources for Low-Resource Settings. Association for Computational Linguistics (ACL), 2017. [PDF] [Bibtex] [Close] @inproceedings{Andrews:2017pb, abstract = {Lexical resources such as dictionaries and gazetteers are often used as auxiliary data for tasks such as part-of-speech induction and named-entity recognition. However, discriminative training with lexical features requires annotated data to reliably estimate the lexical feature weights and may result in overfitting the lexical features at the expense of features which generalize better. In this paper, we investigate a more robust approach: we stipulate that the lexicon is the result of an assumed generative process. Practically, this means that we may treat the lexical resources as observations under the proposed generative model. The lexical resources provide training data for the generative model without requiring separate data to estimate lexical feature weights. We evaluate the proposed approach in two settings: part-of-speech induction and low-resource named-entity recognition.}, author = {Nicholas Andrews and Mark Dredze and Benjamin Van Durme and Jason Eisner}, booktitle = {Association for Computational Linguistics (ACL)}, date-added = {2017-03-31 13:51:47 +0000}, date-modified = {2017-08-14 20:38:53 +0000}, file = {http://aclweb.org/anthology/P/P17/P17-1095.pdf}, pages = {1029-1039}, title = {Bayesian Modeling of Lexical Resources for Low-Resource Settings}, year = {2017} }

		Travis Wolfe, Mark Dredze, Benjamin Van Durme. Pocket Knowledge Base Population. Association for Computational Linguistics (ACL) (short paper), 2017. [PDF] [Bibtex] [Close] @inproceedings{Wolfe:2017dq, abstract = {Existing Knowledge Base Population methods extract relations from a closed relational schema with limited coverage, leading to sparse KBs. We propose Pocket Knowledge Base Population (PKBP), the task of dynamically constructing a KB of entities related to a query and finding the best characterization of relationships between entities. We describe novel Open Information Extraction methods which leverage the PKB to find informative trigger words. We evaluate using existing KBP shared-task data as well as new annotations collected for this work. Our methods produce high quality KBs from just text with many more entities and relationships than existing KBP systems.}, annote = {[<a href="https://hub.docker.com/r/hltcoe/pocket-knowledge-base-population/"><span class="pub_link">Code</span></a>]}, author = {Travis Wolfe and Mark Dredze and Benjamin Van Durme}, booktitle = {Association for Computational Linguistics (ACL) (short paper)}, date-added = {2017-03-31 12:59:33 +0000}, date-modified = {2017-08-14 20:38:20 +0000}, file = {2017_acl_pocket_kb.pdf}, pages = {305-310}, title = {Pocket Knowledge Base Population}, year = {2017} } [Code]

		Ning Gao, Mark Dredze, Douglas Oard. Person Entity Linking in Email with NIL Detection. Journal of the Association for Information Science and Technology (JASIST), 2017. [PDF] [Bibtex] [Close] @article{Gao:2017pb, abstract = {For each specific mention of an entity found in a text, the goal of entity linking is to determine whether the referenced entity is present in an existing knowledge base, and if so to determine which KB entity is the correct referent. Entity linking has been well explored for dissemination-oriented sources such as news stories, blogs, and microblog posts, but the limited work to date on ``conversational'' sources such as email or text chat has not yet attempted to determine when the referent entity is not in the knowledge base (a task known as ``NIL detection''). This article presents a supervised machine learning system for linking named mentions of people in email messages to a collection-specific knowledge base, and that is also capable of NIL detection. This system learns from manually annotated training examples to leverage a rich set of features. The entity linking accuracy for entities present in the knowledge base is substantially and significantly better than the best previously reported results on the Enron email collection, comparable accuracy is reported for the challenging NIL detection task, and these results are for the first time replicated on a second email collection from a different source with comparable results.}, author = {Ning Gao and Mark Dredze and Douglas Oard}, date-added = {2017-03-09 04:07:14 +0000}, date-modified = {2019-01-10 00:19:36 -0500}, file = {http://onlinelibrary.wiley.com/doi/10.1002/asi.23888/full}, journal = {Journal of the Association for Information Science and Technology (JASIST)}, pages = {10.1002/asi.23888}, title = {Person Entity Linking in Email with NIL Detection}, year = {2017} }

		Ann Irvine, Mark Dredze. Harmonic Grammar, Optimality Theory, and Syntax Learnability: An Empirical Exploration of Czech Word Order. Unpublished Manuscript, 2017. [PDF] [Bibtex] [Close] @unpublished{1702.05793, abstract = {This work presents a systematic theoretical and empirical comparison of the major algorithms that have been proposed for learning Harmonic and Optimality Theory grammars (HG and OT, respectively). By comparing learning algorithms, we are also able to compare the closely related OT and HG frameworks themselves. Experimental results show that the additional expressivity of the HG framework over OT affords performance gains in the task of predicting the surface word order of Czech sentences. We compare the perceptron with the classic Gradual Learning Algorithm (GLA), which learns OT grammars, as well as the popular Maximum Entropy model. In addition to showing that the perceptron is theoretically appealing, our work shows that the performance of the HG model it learns approaches that of the upper bound in prediction accuracy on a held out test set and that it is capable of accurately modeling observed variation.}, author = {Ann Irvine and Mark Dredze}, date-added = {2017-02-21 05:44:00 +0000}, date-modified = {2017-02-21 05:44:00 +0000}, eprint = {arXiv:1702.05793}, file = {https://arxiv.org/abs/1702.05793}, title = {Harmonic Grammar, Optimality Theory, and Syntax Learnability: An Empirical Exploration of Czech Word Order}, year = {2017} }

		Adrian Benton, Glen A Coppersmith, Mark Dredze. Ethical Research Protocols for Social Media Health Research. EACL Workshop on Ethics in Natural Language Processing, 2017. @inproceedings{Benton:2017lq, abstract = {Social media have transformed data driven research in political science, the social sciences, health, and medicine. Since health research often touches on sensitive topics that relate to ethics of treatment and patient privacy, similar ethical considerations should be acknowledged when using social media data in health research. While much has been said regarding the ethical considerations of social media research, health research leads to an additional set of concerns. We provide practical suggestions in the form of guidelines for researchers working with social media data in health research. These guidelines can inform an IRB proposal for researchers new to social media health research.}, author = {Adrian Benton and Glen A Coppersmith and Mark Dredze}, booktitle = {EACL Workshop on Ethics in Natural Language Processing}, date-added = {2017-02-12 01:21:49 +0000}, date-modified = {2017-08-14 20:35:45 +0000}, file = {ethicsnlp_2017.pdf}, keywords = {workshop}, pages = {94-102}, title = {Ethical Research Protocols for Social Media Health Research}, year = {2017} }

		John W Ayers, Eric C Leas, Jon-Patrick Allem, Adrian Benton, Mark Dredze, Benjamin M Althouse, Tess B Cruz, Jennifer B Unger. Why Do People Use Electronic Nicotine Delivery Systems (Electronic Cigarettes)? A Content Analysis of Twitter, 2012-2015. PLoS One, 2017. [PDF] [Bibtex] [Close] @article{Ayers:2017qf, abstract = {The reasons for using electronic nicotine delivery systems (ENDS) are poorly understood and are primarily documented by expensive cross-sectional surveys that use preconceived close-ended response options rather than allowing respondents to use their own words. We passively identify the reasons for using ENDS longitudinally from a content analysis of public postings on Twitter. All English language public tweets including several ENDS terms (e.g., ``e-cigarette'' or ``vape'') were captured from the Twitter data stream during 2012 and 2015. After excluding spam, advertisements, and retweets, posts indicating a rationale for vaping were retained. The specific reasons for vaping were then inferred based on a supervised content analysis using annotators from Amazon's Mechanical Turk. During 2012 quitting combustibles was the most cited reason for using ENDS with 43% (95%CI 39--48) of all reason-related tweets cited quitting combustibles, e.g., ``I couldn't quit till I tried ecigs,'' eclipsing the second most cited reason by more than double. Other frequently cited reasons in 2012 included ENDS's social image (21%; 95%CI 18--25), use indoors (14%; 95%CI 11--17), flavors (14%; 95%CI 11--17), safety relative to combustibles (9%; 95%CI 7--11), cost (3%; 95%CI 2--5) and favorable odor (2%; 95%CI 1--3). By 2015 the reasons for using ENDS cited on Twitter had shifted. Both quitting combustibles and use indoors significantly declined in mentions to 29% (95%CI 24--33) and 12% (95%CI 9--16), respectively. At the same time, social image increased to 37% (95%CI 32--43) and lack of odor increased to 5% (95%CI 2--5), the former leading all cited reasons in 2015. 
Our data suggest the reasons people vape are shifting away from cessation and toward social image. The data also show how the ENDS market is responsive to a changing policy landscape. For instance, smoking indoors was less frequently cited in 2015 as indoor smoking restrictions became more common. Because the data and analytic approach are scalable, adoption of our strategies in the field can inform follow-up survey-based surveillance (so the right questions are asked), interventions, and policies for ENDS.}, author = {John W Ayers and Eric C Leas and Jon-Patrick Allem and Adrian Benton and Mark Dredze and Benjamin M Althouse and Tess B Cruz and Jennifer B Unger}, date-added = {2017-02-07 15:09:22 +0000}, date-modified = {2019-01-10 00:19:55 -0500}, file = {http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0170702}, journal = {PLoS One}, pages = {10.1371/journal.pone.0170702}, title = {Why Do People Use Electronic Nicotine Delivery Systems (Electronic Cigarettes)? A Content Analysis of Twitter, 2012-2015}, year = {2017} } The reasons for using electronic nicotine delivery systems (ENDS) are poorly understood and are primarily documented by expensive cross-sectional surveys that use preconceived close-ended response options rather than allowing respondents to use their own words. We passively identify the reasons for using ENDS longitudinally from a content analysis of public postings on Twitter. All English language public tweets including several ENDS terms (e.g., ``e-cigarette'' or ``vape'') were captured from the Twitter data stream during 2012 and 2015. After excluding spam, advertisements, and retweets, posts indicating a rationale for vaping were retained. The specific reasons for vaping were then inferred based on a supervised content analysis using annotators from Amazon's Mechanical Turk. 
During 2012, quitting combustibles was the most cited reason for using ENDS: 43% (95% CI 39–48) of all reason-related tweets cited it (e.g., "I couldn't quit till I tried ecigs"), more than double the second most cited reason. Other frequently cited reasons in 2012 included ENDS's social image (21%; 95% CI 18–25), use indoors (14%; 95% CI 11–17), flavors (14%; 95% CI 11–17), safety relative to combustibles (9%; 95% CI 7–11), cost (3%; 95% CI 2–5) and favorable odor (2%; 95% CI 1–3). By 2015 the reasons for using ENDS cited on Twitter had shifted. Mentions of both quitting combustibles and use indoors declined significantly, to 29% (95% CI 24–33) and 12% (95% CI 9–16), respectively. At the same time, social image increased to 37% (95% CI 32–43) and lack of odor increased to 5% (95% CI 2–5), with social image leading all cited reasons in 2015. Our data suggest the reasons people vape are shifting away from cessation and toward social image. The data also show how the ENDS market is responsive to a changing policy landscape. For instance, smoking indoors was less frequently cited in 2015 as indoor smoking restrictions became more common. Because the data and analytic approach are scalable, adoption of our strategies in the field can inform follow-up survey-based surveillance (so the right questions are asked), interventions, and policies for ENDS.

		Anthony Nastasi, Tyler Bryant, Joseph K Canner, Mark Dredze, Melissa S Camp, Neeraja Nagarajan. Breast Cancer Screening and Social Media: a Content Analysis of Evidence Use and Guideline Opinions on Twitter. Journal of Cancer Education, 2017. [PDF] [Bibtex] [Close] @article{Nastasi:2017qq, abstract = {There is ongoing debate regarding the best mammography screening practices. Twitter has become a powerful tool for disseminating medical news and fostering healthcare conversations; however, little work has been done examining these conversations in the context of how users are sharing evidence and discussing current guidelines for breast cancer screening. To characterize the Twitter conversation on mammography and assess the quality of evidence used as well as opinions regarding current screening guidelines, individual tweets using mammography-related hashtags were prospectively pulled from Twitter from 5 November 2015 to 11 December 2015. Content analysis was performed on the tweets by abstracting data related to user demographics, content, evidence use, and guideline opinions. Standard descriptive statistics were used to summarize the results. Comparisons were made by demographics, tweet type (testable claim, advice, personal experience, etc.), and user type (non-healthcare, physician, cancer specialist, etc.). The primary outcomes were how users are tweeting about breast cancer screening, the quality of evidence they are using, and their opinions regarding guidelines. The most frequent user type of the 1345 tweets was ``non-healthcare'' with 323 tweets (32.5%). Physicians had 1.87 times higher odds (95% CI, 0.69--5.07) of providing explicit support with a reference and 11.70 times higher odds (95% CI, 3.41--40.13) of posting a tweet likely to be supported by the scientific community compared to non-healthcare users. Only 2.9% of guideline tweets approved of the guidelines while 14.6% claimed to be confused by them. 
Non-healthcare users comprise a significant proportion of participants in mammography conversations, with tweets often containing claims that are false, not explicitly backed by scientific evidence, and in favor of alternative ``natural'' breast cancer prevention and treatment. Furthermore, users appear to have low approval and confusion regarding screening guidelines. These findings suggest that more efforts are needed to educate and disseminate accurate information to the general public regarding breast cancer prevention modalities, emphasizing the safety of mammography and the harms of replacing conventional prevention and treatment modalities with unsubstantiated alternatives.}, author = {Anthony Nastasi and Tyler Bryant and Joseph K. Canner and Mark Dredze and Melissa S. Camp and Neeraja Nagarajan}, date-added = {2017-01-20 04:17:53 +0000}, date-modified = {2017-08-14 20:31:44 +0000}, file = {http://www.springer.com/-/4/AVm5DmFhX4Rvx3H72GBw}, journal = {Journal of Cancer Education}, pages = {1-8}, title = {Breast Cancer Screening and Social Media: a Content Analysis of Evidence Use and Guideline Opinions on Twitter}, year = {2017} } There is ongoing debate regarding the best mammography screening practices. Twitter has become a powerful tool for disseminating medical news and fostering healthcare conversations; however, little work has been done examining these conversations in the context of how users are sharing evidence and discussing current guidelines for breast cancer screening. To characterize the Twitter conversation on mammography and assess the quality of evidence used as well as opinions regarding current screening guidelines, individual tweets using mammography-related hashtags were prospectively pulled from Twitter from 5 November 2015 to 11 December 2015. Content analysis was performed on the tweets by abstracting data related to user demographics, content, evidence use, and guideline opinions. 
Standard descriptive statistics were used to summarize the results. Comparisons were made by demographics, tweet type (testable claim, advice, personal experience, etc.), and user type (non-healthcare, physician, cancer specialist, etc.). The primary outcomes were how users are tweeting about breast cancer screening, the quality of evidence they are using, and their opinions regarding guidelines. The most frequent user type of the 1345 tweets was "non-healthcare" with 323 tweets (32.5%). Physicians had 1.87 times higher odds (95% CI, 0.69–5.07) of providing explicit support with a reference and 11.70 times higher odds (95% CI, 3.41–40.13) of posting a tweet likely to be supported by the scientific community compared to non-healthcare users. Only 2.9% of guideline tweets approved of the guidelines while 14.6% claimed to be confused by them. Non-healthcare users comprise a significant proportion of participants in mammography conversations, with tweets often containing claims that are false, not explicitly backed by scientific evidence, and in favor of alternative "natural" breast cancer prevention and treatment. Furthermore, users appear to have low approval of, and confusion about, screening guidelines. These findings suggest that more efforts are needed to educate and disseminate accurate information to the general public regarding breast cancer prevention modalities, emphasizing the safety of mammography and the harms of replacing conventional prevention and treatment modalities with unsubstantiated alternatives.

		Xiaolei Huang, Michael C Smith, Michael J Paul, Dmytro Ryzhkov, Sandra C Quinn, David A Broniatowski, Mark Dredze. Examining Patterns of Influenza Vaccination in Social Media. AAAI Joint Workshop on Health Intelligence (W3PHIAI), 2017. [PDF] [Bibtex] [Close] @inproceedings{Huang:2017yg, abstract = {Traditional data on influenza vaccination has several limitations: high cost, limited coverage of underrepresented groups, and low sensitivity to emerging public health issues. Social media, such as Twitter, provide an alternative way to understand a population's vaccination-related opinions and behaviors. In this study, we build and employ several natural language classifiers to examine and analyze behavioral patterns regarding influenza vaccination in Twitter across three dimensions: temporality (by week and month), geography (by US region), and demography (by gender). Our best results are highly correlated official government data, with a correlation over 0.90, providing validation of our approach. We then suggest a number of directions for future work.}, annote = {[<a href="http://www.cs.jhu.edu/~mdredze/datasets/flu_vaccine.json.gz"><span class="pub_link">Data</span></a>]}, author = {Xiaolei Huang and Michael C. Smith and Michael J Paul and Dmytro Ryzhkov and Sandra C Quinn and David A Broniatowski and Mark Dredze}, booktitle = {AAAI Joint Workshop on Health Intelligence (W3PHIAI)}, date-added = {2016-12-05 22:39:35 +0000}, date-modified = {2019-01-10 00:23:09 -0500}, file = {2017_w3phi_vaccines.pdf}, keywords = {workshop}, pages = {542-546}, title = {Examining Patterns of Influenza Vaccination in Social Media}, year = {2017} } [Data] Traditional data on influenza vaccination has several limitations: high cost, limited coverage of underrepresented groups, and low sensitivity to emerging public health issues. Social media, such as Twitter, provide an alternative way to understand a population's vaccination-related opinions and behaviors. 
In this study, we build and employ several natural language classifiers to examine and analyze behavioral patterns regarding influenza vaccination in Twitter across three dimensions: temporality (by week and month), geography (by US region), and demography (by gender). Our best results are highly correlated with official government data, with a correlation over 0.90, providing validation of our approach. We then suggest a number of directions for future work.

		Neeraja Nagarajan, Husain Alshaikh, Anthony Nastasi, Blair J Smart, Zackary D Berger, Eric B Schneider, Mark Dredze, Joseph K Canner, Nita Ahuja. The Utility of Twitter in Generating High-Quality Conversations about Surgical Care. Academic Surgical Congress, 2017. [PDF] [Bibtex] [Close] @inproceedings{Nagarajan:2017kx, abstract = {Introduction: There is growing interest among various stakeholders in using social media sites to discuss healthcare issues. However, little is known about how social media sites are used to discuss surgical care. There is also a lack of understanding of the types of content generated and the quality of the information shared in social media platforms about surgical care issues. We therefore sought to identify and summarize conversations on surgical care in Twitter, a popular microblogging website. Methods: A comprehensive list of surgery-related hashtags was used to pull individual tweets from 3/27-4/27/2015. Four independent reviewers blindly analyzed 25 tweets to develop themes for extraction from a larger sample. The themes were broadly divided further to obtain data at the levels of the user, the tweet, the content of the tweet and personal information shared (Figure I). Standard descriptive statistical analysis and simple logistic regression analysis was used. Results: In total, 17,783 tweets were pulled and 1000 from 615 unique users were randomly selected for analysis. Most users were from North America (62.4%) and non-healthcare related individuals (31.8%). Healthcare organizations generated 12.4%, and surgeons 9.5%, of tweets. Overall, 67.4% were original tweets and 79.0% contained a hyperlink (11% to healthcare and 8.7% to peer-reviewed sources). The common areas of surgery discussed were global surgery/health systems (18.4%), followed by general surgery (15.6%). 
Among personal tweets (n=236), 31.1% concerned surgery on family/friends and 24.4% on the user; 61.1% discussed procedures already performed and 58.0% used positive language about their personal experience with surgical care. Surgical news/opinion was present in 45% of tweets and 13.7% contained evidence-based information. Non-healthcare professionals were 53.5% (95% CI: 3.8%-77.5%, p=0.039) and 72.8% (95% CI: 21.1%-91.7%, p=0.017) less likely to generate a tweet that contained evidence-based information and to quote from a peer-reviewed journal, respectively, when compared to other users. Conclusion: Our study demonstrates that while healthcare professionals and organizations tend to share higher quality data on surgical care on social media, non-health care related individuals largely drive the conversation. Fewer than half of all surgery-related tweets included surgical news/opinion; only 14% included evidence-based information and just 9% linked to peer-reviewed sources. As social media outlets become important sources of actionable information, leaders in the surgical community should develop professional guidelines to maximize this versatile platform to disseminate accurate and high-quality content on surgical issues to a wide range of audiences.}, author = {Neeraja Nagarajan and Husain Alshaikh and Anthony Nastasi and Blair J Smart and Zackary D Berger and Eric B. Schneider and Mark Dredze and Joseph K. Canner and Nita Ahuja}, booktitle = {Academic Surgical Congress}, date-added = {2016-11-16 03:40:32 +0000}, date-modified = {2016-11-16 03:43:14 +0000}, file = {http://www.asc-abstracts.org/abs2017/92-02-the-utility-of-twitter-in-generating-high-quality-conversations-about-surgical-care/}, keywords = {abstract}, title = {The Utility of Twitter in Generating High-Quality Conversations about Surgical Care}, year = {2017} } Introduction: There is growing interest among various stakeholders in using social media sites to discuss healthcare issues. 
However, little is known about how social media sites are used to discuss surgical care. There is also a lack of understanding of the types of content generated and the quality of the information shared in social media platforms about surgical care issues. We therefore sought to identify and summarize conversations on surgical care in Twitter, a popular microblogging website. Methods: A comprehensive list of surgery-related hashtags was used to pull individual tweets from 3/27-4/27/2015. Four independent reviewers blindly analyzed 25 tweets to develop themes for extraction from a larger sample. The themes were broadly divided further to obtain data at the levels of the user, the tweet, the content of the tweet and personal information shared (Figure I). Standard descriptive statistical analysis and simple logistic regression analysis were used. Results: In total, 17,783 tweets were pulled and 1000 from 615 unique users were randomly selected for analysis. Most users were from North America (62.4%), and 31.8% were non-healthcare-related individuals. Healthcare organizations generated 12.4%, and surgeons 9.5%, of tweets. Overall, 67.4% were original tweets and 79.0% contained a hyperlink (11% to healthcare and 8.7% to peer-reviewed sources). The most commonly discussed areas of surgery were global surgery/health systems (18.4%) and general surgery (15.6%). Among personal tweets (n=236), 31.1% concerned surgery on family/friends and 24.4% on the user; 61.1% discussed procedures already performed and 58.0% used positive language about their personal experience with surgical care. Surgical news/opinion was present in 45% of tweets and 13.7% contained evidence-based information. Non-healthcare professionals were 53.5% (95% CI: 3.8%-77.5%, p=0.039) and 72.8% (95% CI: 21.1%-91.7%, p=0.017) less likely to generate a tweet that contained evidence-based information and to quote from a peer-reviewed journal, respectively, when compared to other users. 
Conclusion: Our study demonstrates that while healthcare professionals and organizations tend to share higher quality data on surgical care on social media, non-healthcare-related individuals largely drive the conversation. Fewer than half of all surgery-related tweets included surgical news/opinion; only 14% included evidence-based information and just 9% linked to peer-reviewed sources. As social media outlets become important sources of actionable information, leaders in the surgical community should develop professional guidelines to maximize this versatile platform to disseminate accurate and high-quality content on surgical issues to a wide range of audiences.

		2016 (34 Publications)
		Travis Wolfe, Mark Dredze, Benjamin Van Durme. Feature Generation for Robust Semantic Role Labeling. Unpublished Manuscript, 2016. @unpublished{1702.07046, abstract = {Hand-engineered feature sets are a well understood method for creating robust NLP models, but they require a lot of expertise and effort to create. In this work we describe how to automatically generate rich feature sets from simple units called featlets, requiring less engineering. Using information gain to guide the generation process, we train models which rival the state of the art on two standard Semantic Role Labeling datasets with almost no task or linguistic insight.}, author = {Travis Wolfe and Mark Dredze and Benjamin Van Durme}, eprint = {arXiv:1702.07046}, file = {https://arxiv.org/abs/1702.07046}, title = {Feature Generation for Robust Semantic Role Labeling}, year = {2016} }

		Anietie Andy, Satoshi Sekine, Mugizi Rwebangira, Mark Dredze. Name Variation in Community Question Answering Systems. COLING Workshop on Noisy User-generated Text, 2016. [PDF] [Bibtex] [Close] @inproceedings{Andy:2016rz, abstract = {Community question answering systems are forums where users can ask and answer questions in various categories. Examples are Yahoo! Answers, Quora, and Stack Overflow. A common challenge with such systems is that a significant percentage of asked questions are left unanswered. In this paper, we propose an algorithm to reduce the number of unanswered questions in Yahoo! Answers by reusing the answer to the most similar past resolved question to the unanswered question, from the site. Semantically similar questions could be worded differently, thereby making it difficult to find questions that have shared needs. For example, Who is the best player for the Reds? and Who is currently the biggest star at Manchester United? have a shared need but are worded differently; also, Reds and Manchester United are used to refer to the soccer team Manchester United football club. In this research, we focus on question categories that contain a large number of named entities and entity name variations. We show that in these categories, entity linking can be used to identify relevant past resolved questions with shared needs as a given question by disambiguating named entities and matching these questions based on the disambiguated entities, identified entities, and knowledge base information related to these entities. We evaluated our algorithm on a new dataset constructed from Yahoo! Answers. The dataset contains annotated question pairs, (Qgiven, [Qpast, Answer]). 
We carried out experiments on several question categories and show that an entity-based approach gives good performance when searching for similar questions in entity rich categories.}, annote = {(<b>Best Paper Award</b>)}, author = {Anietie Andy and Satoshi Sekine and Mugizi Rwebangira and Mark Dredze}, booktitle = {COLING Workshop on Noisy User-generated Text}, date-added = {2016-10-30 05:16:37 +0000}, date-modified = {2017-08-14 21:32:47 +0000}, file = {2016_wnut_name_variation.pdf}, keywords = {workshop}, pages = {51-60}, title = {Name Variation in Community Question Answering Systems}, year = {2016} } (Best Paper Award) Community question answering systems are forums where users can ask and answer questions in various categories. Examples are Yahoo! Answers, Quora, and Stack Overflow. A common challenge with such systems is that a significant percentage of asked questions are left unanswered. In this paper, we propose an algorithm to reduce the number of unanswered questions in Yahoo! Answers by reusing the answer to the most similar past resolved question to the unanswered question, from the site. Semantically similar questions could be worded differently, thereby making it difficult to find questions that have shared needs. For example, Who is the best player for the Reds? and Who is currently the biggest star at Manchester United? have a shared need but are worded differently; also, Reds and Manchester United are used to refer to the soccer team Manchester United football club. In this research, we focus on question categories that contain a large number of named entities and entity name variations. We show that in these categories, entity linking can be used to identify relevant past resolved questions with shared needs as a given question by disambiguating named entities and matching these questions based on the disambiguated entities, identified entities, and knowledge base information related to these entities. 
We evaluated our algorithm on a new dataset constructed from Yahoo! Answers. The dataset contains annotated question pairs, (Qgiven, [Qpast, Answer]). We carried out experiments on several question categories and show that an entity-based approach performs well when searching for similar questions in entity-rich categories.

		Travis Wolfe, Mark Dredze, Benjamin Van Durme. A Study of Imitation Learning Methods for Semantic Role Labeling. EMNLP Workshop on Structured Prediction for NLP, 2016. @inproceedings{Wolfe:2016ul, abstract = {Global features have proven effective in a wide range of structured prediction problems but come with high inference costs. Imitation learning is a common method for training models when exact inference isn't feasible. We study imitation learning for Semantic Role Labeling (SRL) and analyze the effectiveness of the Violation Fixing Perceptron (VFP) (Huang et al., 2012) and Locally Optimal Learning to Search (LOLS) (Chang et al.,2015) frameworks with respect to SRL global features. We describe problems in applying each framework to SRL and evaluate the effectiveness of some solutions. We also show that action ordering, including easy first inference, has a large impact on the quality of greedy global models.}, author = {Travis Wolfe and Mark Dredze and Benjamin Van Durme}, booktitle = {EMNLP Workshop on Structured Prediction for NLP}, date-added = {2016-09-20 17:51:35 +0000}, date-modified = {2017-08-14 21:31:41 +0000}, file = {http://www.aclweb.org/anthology/W/W16/W16-5905.pdf}, keywords = {workshop}, pages = {44-53}, title = {A Study of Imitation Learning Methods for Semantic Role Labeling}, year = {2016} }

		Rebecca Knowles, Josh Carroll, Mark Dredze. Demographer: Extremely Simple Name Demographics. EMNLP Workshop on Natural Language Processing and Computational Social Science, 2016. @inproceedings{Knowles:2016qv, abstract = {The lack of demographic information available when conducting passive analysis of social media content can make it difficult to compare results to traditional survey results. We present DEMOGRAPHER, a tool that predicts gender from names, using name lists and a classifier with simple character-level features. By relying only on a name, our tool can make predictions even without extensive user-authored content. We compare DEMOGRAPHER to other available tools and discuss differences in performance. In particular, we show that DEMOGRAPHER performs well on Twitter data, making it useful for simple and rapid social media demographic inference.}, annote = {[<a href="https://bitbucket.org/mdredze/demographer"><span class="pub_link">Code</span></a>]}, author = {Rebecca Knowles and Josh Carroll and Mark Dredze}, booktitle = {EMNLP Workshop on Natural Language Processing and Computational Social Science}, date-added = {2016-09-20 14:16:10 +0000}, date-modified = {2017-08-14 21:30:54 +0000}, file = {2016_emnlp_workshop_demographer.pdf}, keywords = {workshop}, pages = {108-113}, title = {Demographer: Extremely Simple Name Demographics}, year = {2016} }

		John W Ayers, Eric C Leas, Mark Dredze, Jon-Patrick Allem, Jurek G Grabowski, Linda Hill. Pok\'emon go---a new distraction for drivers and pedestrians. JAMA Internal Medicine, 2016;176(12):1865-1866. @article{doi:10.1001/jamainternmed.2016.6274, abstract = {Pok{\'e}mon GO, an augmented reality game, has swept the nation. As players move, their avatar moves within the game, and players are then rewarded for collecting Pok{\'e}mon placed in real-world locations. By rewarding movement, the game incentivizes physical activity. However, if players use their cars to search for Pok{\'e}mon they negate any health benefit and incur serious risk. Motor vehicle crashes are the leading cause of death among 16- to 24-year-olds, whom the game targets. Moreover, according to the American Automobile Association, 59% of all crashes among young drivers involve distractions within 6 seconds of the accident. We report on an assessment of drivers and pedestrians distracted by Pok{\'e}mon GO and crashes potentially caused by Pok{\'e}mon GO by mining social and news media reports.}, annote = {(<b>Ranked in the top .02% of 6.5m research outputs by <a href="https://jamanetwork.altmetric.com/details/12034170"><span class="pub_link">Altmetric</span></a></b>)}, author = {John W Ayers and Eric C Leas and Mark Dredze and Jon-Patrick Allem and Jurek G Grabowski and Linda Hill}, date-modified = {2018-10-25 23:45:14 -0400}, doi = {10.1001/jamainternmed.2016.6274}, file = {http://dx.doi.org/10.1001/jamainternmed.2016.6274}, journal = {JAMA Internal Medicine}, number = {12}, pages = {1865-1866}, title = {Pok{\'e}mon go---a new distraction for drivers and pedestrians}, volume = {176}, year = {2016}, bdsk-url-1 = {http://dx.doi.org/10.1001/jamainternmed.2016.6274} }

		Mark Dredze, Nicholas Andrews, Jay DeYoung. Twitter at the Grammys: A Social Media Corpus for Entity Linking and Disambiguation. EMNLP Workshop on Natural Language Processing for Social Media, 2016. @inproceedings{Dredze:2016mz, abstract = {Work on cross document coreference resolution (CDCR) has primarily focused on news articles, with little to no work for social media. Yet social media may be particularly challenging since short messages provide little context, and informal names are pervasive. We introduce a new Twitter corpus that contains entity annotations for entity clusters that supports CDCR. Our corpus draws from Twitter data surrounding the 2013 Grammy music awards ceremony, providing a large set of annotated tweets focusing on a single event. To establish a baseline we evaluate two CDCR systems and consider the performance impact of each system component. Furthermore, we augment one system to include temporal information, which can be helpful when documents (such as tweets) arrive in a specific order. Finally, we include annotations linking the entities to a knowledge base to support entity linking.}, annote = {[<a href="https://bitbucket.org/noandrews/phyloinf"><span class="pub_link">Code</span></a>], [<a href="https://bitbucket.org/mdredze/tgx"><span class="pub_link">Data</span></a>]}, author = {Mark Dredze and Nicholas Andrews and Jay DeYoung}, booktitle = {EMNLP Workshop on Natural Language Processing for Social Media}, date-added = {2016-09-07 23:44:49 +0000}, date-modified = {2017-08-14 21:29:59 +0000}, file = {2016_emnlp_socialnlp.pdf}, keywords = {workshop}, pages = {20-25}, title = {Twitter at the Grammys: A Social Media Corpus for Entity Linking and Disambiguation}, year = {2016} }

		John W Ayers, Benjamin M Althouse, Eric C Leas, Ted Alcorn, Mark Dredze. Big Media Data Can Inform Gun Violence Prevention. Bloomberg Data for Good Exchange, 2016. @inproceedings{Ayers:2016uo, abstract = {The scientific method drives improvements in public health, but a strategy of obstructionism has impeded scientists from gathering even a minimal amount of information to address America's gun violence epidemic. We argue that in spite of a lack of federal investment, large amounts of publicly available data offer scientists an opportunity to measure a range of firearm-related behaviors. Given the diversity of available data -- including news coverage, social media, web forums, online advertisements, and Internet searches (to name a few) -- there are ample opportunities for scientists to study everything from trends in particular types of gun violence to gun related behaviors (such as purchases and safety practices) to public understanding of and sentiment towards various gun violence reduction measures. Science has been sidelined in the gun violence debate for too long. Scientists must tap the big media datastream and help resolve this crisis.}, author = {John W Ayers and Benjamin M. Althouse and Eric C Leas and Ted Alcorn and Mark Dredze}, booktitle = {Bloomberg Data for Good Exchange}, date-added = {2016-08-12 20:33:46 +0000}, date-modified = {2016-08-18 20:26:34 +0000}, file = {2016_bloomberg_guns_search.pdf}, keywords = {workshop}, title = {Big Media Data Can Inform Gun Violence Prevention}, year = {2016} }

		Adrian Benton, Braden Hancock, Glen A Coppersmith, John W Ayers, Mark Dredze. After Sandy Hook Elementary: A Year in the Gun Control Debate on Twitter. Bloomberg Data for Good Exchange, 2016. @inproceedings{Benton:2016rm, abstract = {The mass shooting at Sandy Hook elementary school on December 14, 2012 catalyzed a year of active debate and legislation on gun control in the United States. Social media hosted an active public discussion where people expressed their support and opposition to a variety of issues surrounding gun legislation. In this paper, we show how a content based analysis of Twitter data can provide insights and understanding into this debate. We estimate the relative support and opposition to gun control measures, along with a topic analysis of each camp by analyzing over 70 million gun-related tweets from 2013. We focus on spikes in conversation surrounding major events related to guns throughout the year. Our general approach can be applied to other important public health and political issues to analyze the prevalence and nature of public opinion.}, author = {Adrian Benton and Braden Hancock and Glen A Coppersmith and John W Ayers and Mark Dredze}, booktitle = {Bloomberg Data for Good Exchange}, date-added = {2016-08-12 20:32:56 +0000}, date-modified = {2016-08-12 20:33:40 +0000}, file = {2016_bloomberg_guns_twitter.pdf}, keywords = {workshop}, title = {After Sandy Hook Elementary: A Year in the Gun Control Debate on Twitter}, year = {2016} }

		Eric C Leas, Benjamin M Althouse, Mark Dredze, Nick Obradovich, James H Fowler, Seth M Noar, Jon-Patrick Allem, John W Ayers. Big data sensors of organic advocacy: The case of Leonardo DiCaprio and Climate Change. PLoS One, 2016;11(8):e0159885. @article{Leas:2016qd, abstract = {The strategies that experts have used to share information about social causes have historically been top-down, meaning the most influential messages are believed to come from planned events and campaigns. However, more people are independently engaging with social causes today than ever before, in part because online platforms allow them to instantaneously seek, create, and share information. In some cases this ``organic advocacy'' may rival or even eclipse top-down strategies. Big data analytics make it possible to rapidly detect public engagement with social causes by analyzing the same platforms from which organic advocacy spreads. To demonstrate this claim we evaluated how Leonardo DiCaprio's 2016 Oscar acceptance speech citing climate change motivated global English language news (Bloomberg Terminal news archives), social media (Twitter postings) and information seeking (Google searches) about climate change. Despite an insignificant increase in traditional news coverage (54%; 95%CI: -144 to 247), tweets including the terms ``climate change'' or ``global warming'' reached record highs, increasing 636% (95%CI: 573--699) with more than 250,000 tweets the day DiCaprio spoke. In practical terms the ``DiCaprio effect'' surpassed the daily average effect of the 2015 Conference of the Parties (COP) and the Earth Day effect by a factor of 3.2 and 5.3, respectively. At the same time, Google searches for ``climate change'' or ``global warming'' increased 261% (95%CI, 186--335) and 210% (95%CI 149--272) the day DiCaprio spoke and remained higher for 4 more days, representing 104,190 and 216,490 searches. This increase was 3.8 and 4.3 times larger than the increases observed during COP's daily average or on Earth Day. Searches were closely linked to content from DiCaprio's speech (e.g., ``hottest year''), as unmentioned content did not have search increases (e.g., ``electric car''). Because these data are freely available in real time our analytical strategy provides substantial lead time for experts to detect and participate in organic advocacy while an issue is salient. Our study demonstrates new opportunities to detect and aid agents of change and advances our understanding of communication in the 21st century media landscape.}, author = {Eric C Leas and Benjamin M Althouse and Mark Dredze and Nick Obradovich and James H Fowler and Seth M Noar and Jon-Patrick Allem and John W Ayers}, date-added = {2016-07-07 15:56:45 +0000}, date-modified = {2019-01-10 00:11:13 -0500}, file = {http://dx.doi.org/10.1371/journal.pone.0159885}, journal = {PLoS One}, number = {8}, pages = {e0159885}, title = {Big data sensors of organic advocacy: The case of Leonardo DiCaprio and Climate Change}, volume = {11}, year = {2016} }

		Michael J Paul, Margaret S Chisolm, Matthew W Johnson, Ryan G Vandrey, Mark Dredze. Assessing the validity of online drug forums as a source for estimating demographic and temporal trends in drug use. Journal of Addiction Medicine, 2016;10(5):324--330. @article{Michael-J.-Paul:2016fp, abstract = {Objectives: Addiction researchers have begun monitoring online forums to uncover self-reported details about use and effects of emerging drugs. The use of such online data sources has not been validated against data from large epidemiological surveys. This study aimed to characterize and compare the demographic and temporal trends associated with drug use as reported in online forums and in a large epidemiological survey. Methods: Data were collected from the website, drugs-forum.com, from January 2007 through August 2012 (143,416 messages posted by 8,087 members) and from the United States National Survey on Drug Use and Health (NSDUH) from 2007-2012. Measures of forum participation levels were compared with and validated against two measures from the NSDUH survey data: percentage of people using the drug in last 30 days and percentage using the drug more than 100 times in the past year. Results: For established drugs (e.g., cannabis), significant correlations were found across demographic groups between drugs-forum.com and the NSDUH survey data, while weaker, non-significant correlations were found with temporal trends. Emerging drugs (e.g., Salvia divinorum) were strongly associated with male users in the forum, in agreement with survey-derived data, and had temporal patterns that increased in synchrony with poison control reports. Conclusions: These results offer the first assessment of online drug forums as a valid source for estimating demographic and temporal trends in drug use. The analyses suggest that online forums are a reliable source for estimation of demographic associations and early identification of emerging drugs, but a less reliable source for measurement of long-term temporal trends.}, author = {Michael J. Paul and Margaret S. Chisolm and Matthew W. Johnson and Ryan G. Vandrey and Mark Dredze}, date-added = {2016-06-02 23:58:51 +0000}, date-modified = {2017-08-14 21:29:03 +0000}, file = {http://dx.doi.org/10.1097/ADM.0000000000000238}, journal = {Journal of Addiction Medicine}, month = {September/October}, number = {5}, pages = {324--330}, title = {Assessing the validity of online drug forums as a source for estimating demographic and temporal trends in drug use}, volume = {10}, year = {2016} }

		David A Broniatowski, Mark Dredze, Karen M Hilyard, Maeghan Dessecker, Sandra C Quinn, Amelia M Jamison, Michael J Paul, Michael C Smith. Both Mirror and Complement: A Comparison of Social Media Data and Survey Data about Flu Vaccination. American Public Health Association, 2016. @inproceedings{Broniatowski:2016yt, author = {David A Broniatowski and Mark Dredze and Karen M Hilyard and Maeghan Dessecker and Sandra C Quinn and Amelia M Jamison and Michael J. Paul and Michael C. Smith}, booktitle = {American Public Health Association}, date-added = {2016-06-02 01:36:35 +0000}, date-modified = {2016-07-07 15:45:58 +0000}, file = {2016_apha_twitter.pdf}, keywords = {abstract}, title = {Both Mirror and Complement: A Comparison of Social Media Data and Survey Data about Flu Vaccination}, year = {2016} }

		Matthew Biggerstaff, David Alper, Mark Dredze, Spencer Fox, Isaac Chun-Hai Fung, Kyle S Hickmann, Bryan Lewis, Roni Rosenfeld, Jeffrey Shaman, Ming-Hsiang Tsou, Paola Velardi, Alessandro Vespignani, Lyn Finelli. Results from the Centers for Disease Control and Prevention's Predict the 2013--2014 Influenza Season Challenge. BMC Infectious Diseases, 2016;16(357):10.1186/s12879-016-1669-x. @article{Biggerstaff:2016uk, abstract = {Background: Early insights into the timing of the start, peak, and intensity of the influenza season could be useful in planning influenza prevention and control activities. To encourage development and innovation in influenza forecasting, the Centers for Disease Control and Prevention (CDC) organized a challenge to predict the 2013--14 United States influenza season. Methods: Challenge contestants were asked to forecast the start, peak, and intensity of the 2013-2014 influenza season at the national level and at any or all Health and Human Services (HHS) region level(s). The challenge ran from December 1, 2013--March 27, 2014; contestants were required to submit 9 biweekly forecasts at the national level to be eligible. The selection of the winner was based on expert evaluation of the methodology used to make the prediction and the accuracy of the prediction as judged against the U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet). Results: Nine teams submitted 13 forecasts for all required milestones. The first forecast was due on December 2, 2013; 3/13 forecasts received correctly predicted the start of the influenza season within one week, 1/13 predicted the peak within 1 week, 3/13 predicted the peak ILINet percentage within 1%, and 4/13 predicted the season duration within 1 week. For the prediction due on December 19, 2013, the number of forecasts that correctly forecasted the peak week increased to 2/13, the peak percentage to 6/13, and the duration of the season to 6/13. As the season progressed, the forecasts became more stable and were closer to the season milestones. Conclusion: Forecasting has become technically feasible, but further efforts are needed to improve forecast accuracy so that policy makers can reliably use these predictions. CDC and challenge contestants plan to build upon the methods developed during this contest to improve the accuracy of influenza forecasts.}, author = {Matthew Biggerstaff and David Alper and Mark Dredze and Spencer Fox and Isaac Chun-Hai Fung and Kyle S. Hickmann and Bryan Lewis and Roni Rosenfeld and Jeffrey Shaman and Ming-Hsiang Tsou and Paola Velardi and Alessandro Vespignani and Lyn Finelli}, date-added = {2016-06-01 20:51:13 +0000}, date-modified = {2019-01-10 00:12:12 -0500}, file = {http://bmcinfectdis.biomedcentral.com/articles/10.1186/s12879-016-1669-x}, journal = {BMC Infectious Diseases}, number = {357}, pages = {10.1186/s12879-016-1669-x}, title = {Results from the Centers for Disease Control and Prevention's Predict the 2013--2014 Influenza Season Challenge}, volume = {16}, year = {2016} }

		Mark Dredze, Manuel García-Herranz, Alex Rutherford, Gideon Mann. Twitter as a Source of Global Mobility Patterns for Social Good. ICML Workshop on #Data4Good: Machine Learning in Social Good Applications, 2016. @inproceedings{Dredze:2016rt, abstract = {Data on human spatial distribution and movement is essential for understanding and analyzing social systems. However existing sources for this data are lacking in various ways; difficult to access, biased, have poor geographical or temporal resolution, or are significantly delayed. In this paper, we describe how geolocation data from Twitter can be used to estimate global mobility patterns and address these shortcomings. These findings will inform how this novel data source can be harnessed to address humanitarian and development efforts.}, author = {Mark Dredze and Manuel Garc{\'\i}a-Herranz and Alex Rutherford and Gideon Mann}, booktitle = {ICML Workshop on #Data4Good: Machine Learning in Social Good Applications}, date-added = {2016-05-11 16:27:10 +0000}, date-modified = {2016-05-24 19:12:39 +0000}, file = {http://arxiv.org/abs/1606.06343}, keywords = {workshop}, title = {Twitter as a Source of Global Mobility Patterns for Social Good}, year = {2016} }

		Mark Dredze, David A Broniatowski, Karen M Hilyard. Zika Vaccine Misconceptions: A social media analysis. Vaccine, 2016;34(30):3441-3442. @article{Dredze:2016oq, annote = {[<a href="http://www.cs.jhu.edu/~mdredze/datasets/zika_conspiracy.json.gz"><span class="pub_link">Data</span></a>]}, author = {Mark Dredze and David A Broniatowski and Karen M Hilyard}, date-added = {2016-05-04 02:11:03 +0000}, date-modified = {2017-08-14 21:26:48 +0000}, file = {http://dx.doi.org/10.1016/j.vaccine.2016.05.008}, journal = {Vaccine}, number = {30}, pages = {3441-3442}, title = {Zika Vaccine Misconceptions: A social media analysis}, volume = {34}, year = {2016} }

		Mark Dredze, Prabhanjan Kambadur, Gary Kazantsev, Gideon Mann, Miles Osborne. How Twitter is Changing the Nature of Financial News Discovery. SIGMOD Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets, 2016. @inproceedings{Dredze:2016qf, abstract = {Access to the most relevant and current information is critical to financial analysis and decision making. Historically, financial news has been discovered through company press releases, required disclosures and news articles. More recently, social media has reshaped the financial news landscape, radically changing the dynamics of news dissemination. In this paper we discuss the ways in which Twitter, a leading social media platform, has contributed to changes in this landscape. We explain why today Twitter is a valuable source of material financial information and describe opportunities and challenges in using this novel news source for financial information discovery.}, author = {Mark Dredze and Prabhanjan Kambadur and Gary Kazantsev and Gideon Mann and Miles Osborne}, booktitle = {SIGMOD Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets}, date-added = {2016-05-03 14:09:06 +0000}, date-modified = {2019-01-10 00:15:29 -0500}, file = {2016_dsmm.pdf}, keywords = {workshop}, pages = {10.1145/2951894.2951903}, title = {How Twitter is Changing the Nature of Financial News Discovery}, year = {2016} }

		Nanyun Peng, Mark Dredze. Improving Named Entity Recognition for Chinese Social Media with Word Segmentation Representation Learning. Association for Computational Linguistics (ACL) (short paper), 2016. @inproceedings{Peng:2016yg, abstract = {Named entity recognition, and other information extraction tasks, frequently use linguistic features such as part of speech tags or chunkings. For languages where word boundaries are not readily identified in text, word segmentation is a key first step to generating features for an NER system. While using word boundary tags as features is helpful, the signals that aid in identifying these boundaries may provide richer information for an NER system. New state-of-the-art word segmentation systems use neural models to learn representations for predicting word boundaries. We show that these same representations, jointly trained with an NER system, yield significant improvements in NER for Chinese social media. In our experiments, jointly training NER and word segmentation with an LSTM-CRF model yields nearly 5% absolute improvement over previously published results.}, author = {Nanyun Peng and Mark Dredze}, booktitle = {Association for Computational Linguistics (ACL) (short paper)}, date-added = {2016-04-15 21:01:46 +0000}, date-modified = {2017-08-14 21:25:57 +0000}, file = {http://aclweb.org/anthology/P/P16/P16-2025.pdf}, pages = {149-155}, title = {Improving Named Entity Recognition for Chinese Social Media with Word Segmentation Representation Learning}, year = {2016} }

		Adrian Benton, Raman Arora, Mark Dredze. Learning Multiview Embeddings of Twitter Users. Association for Computational Linguistics (ACL) (short paper), 2016. @inproceedings{Adrian-Benton:2016yg, abstract = {Low-dimensional vector representations are widely used as stand-ins for the text of words, sentences, and entire documents. These embeddings are used to identify similar words or make predictions about documents. In this work, we consider embeddings for social media users and demonstrate that these can be used to identify users who behave similarly or to predict attributes of users. In order to capture information from all aspects of a user's online life, we take a multiview approach, applying a weighted variant of Generalized Canonical Correlation Analysis (GCCA) to a collection of over 100,000 Twitter users. We demonstrate the utility of these multiview embeddings on three downstream tasks: user engagement, friend selection, and demographic attribute prediction.}, annote = {[<a href="http://www.cs.jhu.edu/~mdredze/datasets/multiview_embeddings/"><span class="pub_link">Code</span></a>]}, author = {Adrian Benton and Raman Arora and Mark Dredze}, booktitle = {Association for Computational Linguistics (ACL) (short paper)}, date-added = {2016-04-15 21:01:19 +0000}, date-modified = {2017-08-14 21:25:23 +0000}, file = {2016_acl_multiview.pdf}, pages = {14-19}, title = {Learning Multiview Embeddings of Twitter Users}, year = {2016} }

		David A Broniatowski, Mark Dredze, Karen M Hilyard. Effective Vaccine Communication during the Disneyland Measles Outbreak. Vaccine, 2016;34(28):3225-3228. [PDF] [Bibtex] [Close] @article{Broniatowski:2016rz, abstract = {Vaccine refusal rates have increased in recent years, highlighting the need for effective risk communication, especially over social media. Fuzzy-trace theory predicts that individuals encode bottom-line meaning (''gist'') and statistical information (''verbatim'') in parallel and those articles expressing a clear gist will be most compelling. We coded news articles (n = 4581) collected during the 2014−2015 Disneyland measles outbreak for content including statistics, stories, or bottom-line gists regarding vaccines and vaccine-preventable illnesses. We measured the extent to which articles were compelling by how frequently they were shared on Facebook. The most widely shared articles expressed bottom-line gists, although articles containing statistics were also more likely to be shared than articles lacking statistics. Stories had limited impact on Facebook shares. Results support Fuzzy Trace Theory's predictions regarding the distinct yet parallel impact of categorical gist and statistical verbatim information on public health communication.}, author = {David A Broniatowski and Mark Dredze and Karen M Hilyard}, date-added = {2016-04-15 19:36:17 +0000}, date-modified = {2019-01-10 00:13:00 -0500}, file = {http://dx.doi.org/10.1016/j.vaccine.2016.04.044}, journal = {Vaccine}, number = {28}, pages = {3225-3228}, title = {Effective Vaccine Communication during the Disneyland Measles Outbreak}, volume = {34}, year = {2016} } Vaccine refusal rates have increased in recent years, highlighting the need for effective risk communication, especially over social media. 
Fuzzy-trace theory predicts that individuals encode bottom-line meaning ("gist") and statistical information ("verbatim") in parallel and those articles expressing a clear gist will be most compelling. We coded news articles (n = 4581) collected during the 2014−2015 Disneyland measles outbreak for content including statistics, stories, or bottom-line gists regarding vaccines and vaccine-preventable illnesses. We measured the extent to which articles were compelling by how frequently they were shared on Facebook. The most widely shared articles expressed bottom-line gists, although articles containing statistics were also more likely to be shared than articles lacking statistics. Stories had limited impact on Facebook shares. Results support Fuzzy Trace Theory's predictions regarding the distinct yet parallel impact of categorical gist and statistical verbatim information on public health communication.

		Ning Gao, Mark Dredze, Douglas Oard. Knowledge Base Population for Organization Mentions in Email. NAACL Workshop on Automated Knowledge Base Construction (AKBC), 2016. [PDF] [Bibtex] [Close] @inproceedings{Gao:2016ys, abstract = {A prior study found that on average there are 6.3 named mentions of organizations found in email messages from the Enron collection, only about half of which could be linked to known entities in Wikipedia. That suggests a need for collection-specific approaches to entity linking, similar to those that have proven successful for person mentions. This paper describes a process for automatically constructing such a collection-specific knowledge base of organization entities for named mentions in Enron. A new public test collection for linking 130 mentions of organizations found in Enron email to either Wikipedia or to this new collection-specific knowledge base is also described. Together, Wikipedia entities plus the new collection-specific knowledge base cover 83% of the 130 organization mentions, a 14% (absolute) improvement over the 69% that could be linked to Wikipedia alone.}, author = {Ning Gao and Mark Dredze and Douglas Oard}, booktitle = {NAACL Workshop on Automated Knowledge Base Construction (AKBC)}, date-added = {2016-04-04 14:23:31 +0000}, date-modified = {2017-08-14 21:24:12 +0000}, file = {2016_akbc_email.pdf}, keywords = {workshop}, pages = {24-28}, title = {Knowledge Base Population for Organization Mentions in Email}, year = {2016} } A prior study found that on average there are 6.3 named mentions of organizations found in email messages from the Enron collection, only about half of which could be linked to known entities in Wikipedia. That suggests a need for collection-specific approaches to entity linking, similar to those that have proven successful for person mentions. This paper describes a process for automatically constructing such a collection-specific knowledge base of organization entities for named mentions in Enron. 
A new public test collection for linking 130 mentions of organizations found in Enron email to either Wikipedia or to this new collection-specific knowledge base is also described. Together, Wikipedia entities plus the new collection-specific knowledge base cover 83% of the 130 organization mentions, a 14% (absolute) improvement over the 69% that could be linked to Wikipedia alone.

		Michael C Smith, David A Broniatowski, Mark Dredze. Using Twitter to Examine Social Rationales for Vaccine Refusal. International Engineering Systems Symposium (CESUN), 2016. [PDF] [Bibtex] [Close] @inproceedings{Smith:2016rm, annote = {[<a href="http://www.cs.jhu.edu/~mdredze/datasets/vaccine_relevance_sentiment.json.gz"><span class="pub_link">Data</span></a>] [<a href="http://www.cs.jhu.edu/~mdredze/publications/2016_cesun_vaccine_rationales_poster.pdf"><span class="pub_link">Poster</span></a>]}, author = {Michael C Smith and David A. Broniatowski and Mark Dredze}, booktitle = {International Engineering Systems Symposium (CESUN)}, date-added = {2016-04-01 14:14:34 +0000}, date-modified = {2016-04-01 14:15:38 +0000}, file = {2016_cesun_vaccine_rationales.pdf}, keywords = {abstract}, title = {Using Twitter to Examine Social Rationales for Vaccine Refusal}, year = {2016} } [Data] [Poster]

		Mo Yu, Mark Dredze, Raman Arora, Matthew R Gormley. Embedding Lexical Features via Low-rank Tensors. North American Chapter of the Association for Computational Linguistics (NAACL), 2016. [PDF] [Bibtex] [Close] @inproceedings{Dredze:2016, abstract = {Modern NLP models rely heavily on engineered features, which often combine word and contextual information into complex lexical features. Such combination results in large numbers of features, which can lead to over-fitting. We present a new model that represents complex lexical features---comprised of parts for words, contextual information and labels---in a tensor that captures conjunction information among these parts. We apply low-rank tensor approximations to the corresponding parameter tensors to reduce the parameter space and improve prediction speed. Furthermore, we investigate two methods for handling features that include n-grams of mixed lengths. Our model achieves state-of-the-art results on tasks in relation extraction, PP-attachment, and preposition disambiguation.}, author = {Mo Yu and Mark Dredze and Raman Arora and Matthew R. Gormley}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, date-added = {2016-03-02 23:10:44 +0000}, date-modified = {2017-08-14 21:21:49 +0000}, file = {https://arxiv.org/abs/1604.00461.pdf}, pages = {1019-1029}, title = {Embedding Lexical Features via Low-rank Tensors}, year = {2016} } Modern NLP models rely heavily on engineered features, which often combine word and contextual information into complex lexical features. Such combination results in large numbers of features, which can lead to over-fitting. We present a new model that represents complex lexical features---comprised of parts for words, contextual information and labels---in a tensor that captures conjunction information among these parts. 
We apply low-rank tensor approximations to the corresponding parameter tensors to reduce the parameter space and improve prediction speed. Furthermore, we investigate two methods for handling features that include n-grams of mixed lengths. Our model achieves state-of-the-art results on tasks in relation extraction, PP-attachment, and preposition disambiguation.

		Mark Dredze, Miles Osborne, Prabhanjan Kambadur. Geolocation for Twitter: Timing Matters. North American Chapter of the Association for Computational Linguistics (NAACL) (short paper), 2016. [PDF] [Bibtex] [Close] @inproceedings{Dredze:2016rm, abstract = {Automated geolocation of social media messages can benefit a variety of downstream applications. However, these geolocation systems are typically evaluated without attention to how changes in time impact geolocation. Since different people in different locations write messages at different times, these factors can significantly vary the performance of a geolocation system over time. We demonstrate cyclical temporal effects on geolocation accuracy in Twitter, as well as rapid drops as test data moves beyond the time period of training data. We show that temporal drift can effectively be countered with even modest online model updates.}, author = {Mark Dredze and Miles Osborne and Prabhanjan Kambadur}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL) (short paper)}, date-added = {2016-03-02 23:08:03 +0000}, date-modified = {2017-08-14 21:21:17 +0000}, file = {2016_naacl_tweet_geolocation.pdf}, pages = {1064-1069}, title = {Geolocation for Twitter: Timing Matters}, year = {2016} } Automated geolocation of social media messages can benefit a variety of downstream applications. However, these geolocation systems are typically evaluated without attention to how changes in time impact geolocation. Since different people in different locations write messages at different times, these factors can significantly vary the performance of a geolocation system over time. We demonstrate cyclical temporal effects on geolocation accuracy in Twitter, as well as rapid drops as test data moves beyond the time period of training data. We show that temporal drift can effectively be countered with even modest online model updates.

		John W Ayers, Benjamin M Althouse, Mark Dredze, Eric C Leas, Seth M Noar. News and Internet Searches About Human Immunodeficiency Virus After Charlie Sheen's Disclosure. JAMA Internal Medicine, 2016;176(4):552-554. [PDF] [Bibtex] [Close] @article{Ayers:2016fu, abstract = {Celebrity Charlie Sheen publicly disclosed his human immunodeficiency virus (HIV)--positive status on November 17, 2015. Could Sheen's disclosure, like similar announcements from celebrities, generate renewed attention to HIV? We provide an early answer by examining news trends to reveal discussion of HIV in the mass media and Internet searches to reveal engagement with HIV-related topics around the time of Sheen's disclosure.}, annote = {(<b>Ranked in the top .03% of 4.8m research outputs by <a href="https://jamanetwork.altmetric.com/details/5943892#score"><span class="pub_link">Altmetric</span></a></b>)}, author = {John W. Ayers and Benjamin M. Althouse and Mark Dredze and Eric C. Leas and Seth M. Noar}, date-added = {2016-02-22 16:53:00 +0000}, date-modified = {2017-08-14 21:20:37 +0000}, file = {http://archinte.jamanetwork.com/article.aspx?articleid=2495274}, journal = {JAMA Internal Medicine}, keywords = {selected}, number = {4}, pages = {552-554}, title = {News and Internet Searches About Human Immunodeficiency Virus After Charlie Sheen's Disclosure}, volume = {176}, year = {2016} } (Ranked in the top .03% of 4.8m research outputs by Altmetric) Celebrity Charlie Sheen publicly disclosed his human immunodeficiency virus (HIV)--positive status on November 17, 2015. Could Sheen's disclosure, like similar announcements from celebrities, generate renewed attention to HIV? We provide an early answer by examining news trends to reveal discussion of HIV in the mass media and Internet searches to reveal engagement with HIV-related topics around the time of Sheen's disclosure.

		Neeraja Nagarajan, Blair J Smart, Anthony Nastasi, Zoya J Effendi, Sruthi Murali, Zackary D Berger, Eric B Schneider, Mark Dredze, Joseph K Canner. An Analysis of Twitter Conversations on Global Surgical Care. Annual CUGH Global Health Conference, 2016. [Bibtex] [Close] @inproceedings{Nagarajan:2016qa, annote = {(poster)}, author = {Neeraja Nagarajan and Blair J Smart and Anthony Nastasi and Zoya J. Effendi and Sruthi Murali and Zackary D Berger and Eric B Schneider and Mark Dredze and Joseph K. Canner}, booktitle = {Annual CUGH Global Health Conference}, date-added = {2016-01-25 21:36:55 +0000}, date-modified = {2017-04-27 14:16:32 +0000}, keywords = {abstract}, title = {An Analysis of Twitter Conversations on Global Surgical Care}, year = {2016} } (poster)

		John W Ayers, J Lee Westmaas, Eric C Leas, Adrian Benton, Yunqi Chen, Mark Dredze, Benjamin M Althouse. Leveraging Big Data to Improve Health Awareness Campaigns: A Novel Evaluation of the Great American Smokeout. JMIR Public Health and Surveillance, 2016;2(1):e16. [PDF] [Bibtex] [Close] @article{Ayers:2016rc, abstract = {Awareness campaigns are ubiquitous, but little is known about their potential effectiveness because traditional evaluations are often unfeasible. For 40 years, the ``Great American Smokeout'' (GASO) has encouraged media coverage and popular engagement with smoking cessation on the third Thursday of November as the nation's longest running awareness campaign. We proposed a novel evaluation framework for assessing awareness campaigns using the GASO as a case study by observing cessation-related news reports and Twitter postings, and cessation-related help seeking via Google, Wikipedia, and government-sponsored quitlines.}, author = {John W. Ayers and J. Lee Westmaas and Eric C. Leas and Adrian Benton and Yunqi Chen and Mark Dredze and Benjamin M. Althouse}, date-added = {2016-01-19 14:34:12 +0000}, date-modified = {2019-01-10 00:13:35 -0500}, file = {http://publichealth.jmir.org/2016/1/e16/}, journal = {JMIR Public Health and Surveillance}, keywords = {selected}, number = {1}, pages = {e16}, title = {Leveraging Big Data to Improve Health Awareness Campaigns: A Novel Evaluation of the Great American Smokeout}, volume = {2}, year = {2016} } Awareness campaigns are ubiquitous, but little is known about their potential effectiveness because traditional evaluations are often unfeasible. For 40 years, the ``Great American Smokeout'' (GASO) has encouraged media coverage and popular engagement with smoking cessation on the third Thursday of November as the nation's longest running awareness campaign. 
We proposed a novel evaluation framework for assessing awareness campaigns using the GASO as a case study by observing cessation-related news reports and Twitter postings, and cessation-related help seeking via Google, Wikipedia, and government-sponsored quitlines.

		Munmun De Choudhury, Emre Kiciman, Mark Dredze, Glen A Coppersmith, Mrinal Kumar. Discovering Shifts to Suicidal Ideation from Mental Health Content in Social Media. Conference on Human Factors in Computing Systems (CHI), 2016. [PDF] [Bibtex] [Close] @inproceedings{Choudhury:2016lq, abstract = {History of mental illness is a major factor behind suicide risk and ideation. However, research efforts toward characterizing and forecasting this risk are limited due to the paucity of information regarding suicide ideation, exacerbated by the stigma of mental illness. This paper fills gaps in the literature by developing a statistical methodology to infer which individuals could undergo transitions from mental health discourse to suicidal ideation. We utilize semi-anonymous support communities on Reddit as unobtrusive data sources to infer the likelihood of these shifts. We develop language and interactional measures for this purpose, as well as a propensity score matching based statistical approach. Our approach allows us to derive distinct markers of shifts to suicidal ideation. These markers can be modeled in a prediction framework to identify individuals likely to engage in suicidal ideation in the future. We discuss societal and ethical implications of this research.}, annote = {(<b>Honorable Mention Award</b>)}, author = {Munmun De Choudhury and Emre Kiciman and Mark Dredze and Glen A Coppersmith and Mrinal Kumar}, booktitle = {Conference on Human Factors in Computing Systems (CHI)}, date-added = {2015-12-14 14:16:17 +0000}, date-modified = {2017-08-14 21:18:18 +0000}, file = {2016_chi.pdf}, pages = {2098-2110}, title = {Discovering Shifts to Suicidal Ideation from Mental Health Content in Social Media}, year = {2016} } (Honorable Mention Award) History of mental illness is a major factor behind suicide risk and ideation. 
However, research efforts toward characterizing and forecasting this risk are limited due to the paucity of information regarding suicide ideation, exacerbated by the stigma of mental illness. This paper fills gaps in the literature by developing a statistical methodology to infer which individuals could undergo transitions from mental health discourse to suicidal ideation. We utilize semi-anonymous support communities on Reddit as unobtrusive data sources to infer the likelihood of these shifts. We develop language and interactional measures for this purpose, as well as a propensity score matching based statistical approach. Our approach allows us to derive distinct markers of shifts to suicidal ideation. These markers can be modeled in a prediction framework to identify individuals likely to engage in suicidal ideation in the future. We discuss societal and ethical implications of this research.

		Animesh R Koratana, Mark Dredze, Margaret S Chisolm, Matthew W Johnson, Michael J Paul. Studying Anonymous Health Issues and Substance Use on College Campuses with Yik Yak. AAAI Workshop on the World Wide Web and Public Health Intelligence, 2016. [PDF] [Bibtex] [Close] @inproceedings{Koratana:2016db, abstract = {This study investigates the public health intelligence utility of Yik Yak, a social media platform that allows users to anonymously post and view messages within precise geographic locations. Our dataset contains 122,179 "yaks" collected from 120 college campuses across the United States during 2015. We first present an exploratory analysis of the topics commonly discussed in Yik Yak, clarifying the health issues for which this may serve as a source of information. We then present an in-depth content analysis of data describing substance use, an important public health issue that is not often discussed in public social media, but commonly discussed on Yik Yak under the cloak of anonymity.}, author = {Animesh R Koratana and Mark Dredze and Margaret S Chisolm and Matthew W Johnson and Michael J. Paul}, booktitle = {AAAI Workshop on the World Wide Web and Public Health Intelligence}, date-added = {2015-11-30 21:42:07 +0000}, date-modified = {2019-01-10 00:16:12 -0500}, file = {w3phi16_yikyak.pdf}, keywords = {workshop}, pages = {778-782}, title = {Studying Anonymous Health Issues and Substance Use on College Campuses with Yik Yak}, year = {2016} } This study investigates the public health intelligence utility of Yik Yak, a social media platform that allows users to anonymously post and view messages within precise geographic locations. Our dataset contains 122,179 "yaks" collected from 120 college campuses across the United States during 2015. We first present an exploratory analysis of the topics commonly discussed in Yik Yak, clarifying the health issues for which this may serve as a source of information. 
We then present an in-depth content analysis of data describing substance use, an important public health issue that is not often discussed in public social media, but commonly discussed on Yik Yak under the cloak of anonymity.

		Adrian Benton, Michael J Paul, Braden Hancock, Mark Dredze. Collective Supervision of Topic Models for Predicting Surveys with Social Media. Association for the Advancement of Artificial Intelligence (AAAI), 2016. [PDF] [Bibtex] [Close] @inproceedings{Benton:2016dn, abstract = {This paper considers survey prediction from social media. We use topic models to correlate social media messages with survey outcomes and to provide an interpretable representation of the data. Rather than rely on fully unsupervised topic models, we use existing aggregated survey data to inform the inferred topics, a class of topic model supervision referred to as collective supervision. We introduce and explore a variety of topic model variants and provide an empirical analysis, with conclusions of the most effective models for this task.}, author = {Adrian Benton and Michael J. Paul and Braden Hancock and Mark Dredze}, booktitle = {Association for the Advancement of Artificial Intelligence (AAAI)}, date-added = {2015-11-13 15:21:27 +0000}, date-modified = {2017-08-14 21:17:45 +0000}, file = {aaai16_collective.pdf}, pages = {2892-2898}, title = {Collective Supervision of Topic Models for Predicting Surveys with Social Media}, year = {2016} } This paper considers survey prediction from social media. We use topic models to correlate social media messages with survey outcomes and to provide an interpretable representation of the data. Rather than rely on fully unsupervised topic models, we use existing aggregated survey data to inform the inferred topics, a class of topic model supervision referred to as collective supervision. We introduce and explore a variety of topic model variants and provide an empirical analysis, with conclusions of the most effective models for this task.

		Michael C Smith, David A Broniatowski, Michael J Paul, Mark Dredze. Towards Real-Time Measurement of Public Epidemic Awareness: Monitoring Influenza Awareness through Twitter. AAAI Spring Symposium on Observational Studies through Social Media and Other Human-Generated Content, 2016. [PDF] [Bibtex] [Close] @inproceedings{Smith:2016fv, abstract = {This study analyzes temporal trends in Twitter data pertaining to both influenza awareness and influenza infection during the 2012--13 influenza season in the US. We make use of classifiers to distinguish tweets that express a personal infection (``sick with the flu'') versus a more general awareness (``worried about the flu''). While previous research has focused on estimating prevalence of influenza infection, little is known about trends in public awareness of the disease. Our analysis shows that infection and awareness have very different trends. In contrast to infection trends, awareness trends have little regional variation, and our experiments suggest that public awareness is primarily driven by news media.}, author = {Michael C Smith and David A. Broniatowski and Michael J. Paul and Mark Dredze}, booktitle = {AAAI Spring Symposium on Observational Studies through Social Media and Other Human-Generated Content}, date-added = {2015-11-12 02:03:36 +0000}, date-modified = {2015-11-12 02:04:02 +0000}, file = {2016_ossm.pdf}, keywords = {workshop}, title = {Towards Real-Time Measurement of Public Epidemic Awareness: Monitoring Influenza Awareness through Twitter}, year = {2016} } This study analyzes temporal trends in Twitter data pertaining to both influenza awareness and influenza infection during the 2012--13 influenza season in the US. We make use of classifiers to distinguish tweets that express a personal infection (``sick with the flu'') versus a more general awareness (``worried about the flu''). 
While previous research has focused on estimating prevalence of influenza infection, little is known about trends in public awareness of the disease. Our analysis shows that infection and awareness have very different trends. In contrast to infection trends, awareness trends have little regional variation, and our experiments suggest that public awareness is primarily driven by news media.

		Blair J Smart, Neeraja Nagarajan, Joseph K Canner, Mark Dredze, Eric B Schneider, Minh Luu, Zackary D Berger, Jonathan A Myers. The Use of Social Media in Surgical Education: An Analysis of Twitter. Annual Academic Surgical Congress, 2016. [Bibtex] [Close] @inproceedings{Smart:2016rw, author = {Blair J. Smart and Neeraja Nagarajan and Joseph K. Canner and Mark Dredze and Eric B. Schneider and Minh Luu and Zackary D Berger and Jonathan A. Myers}, booktitle = {Annual Academic Surgical Congress}, date-added = {2015-11-04 02:35:55 +0000}, date-modified = {2015-11-04 02:35:55 +0000}, keywords = {abstract}, title = {The Use of Social Media in Surgical Education: An Analysis of Twitter}, year = {2016} }

		Neeraja Nagarajan, Blair J Smart, Mark Dredze, Joy L Lee, James Taylor, Jonathan A Myers, Eric B Schneider, Zackary D Berger, Joseph K Canner. How do Surgical Providers use Social Media? A Mixed-Methods Analysis using Twitter. Annual Academic Surgical Congress, 2016. [Bibtex] [Close] @inproceedings{Surgical-Providers-use-Social-Media-A-Mixed-Methods-Analysis-using-Twitter-Neeraja-Nagarajan:2016ys, author = {Neeraja Nagarajan and Blair J. Smart and Mark Dredze and Joy L. Lee and James Taylor and Jonathan A. Myers and Eric B. Schneider and Zackary D. Berger and Joseph K. Canner}, booktitle = {Annual Academic Surgical Congress}, date-added = {2015-11-02 04:24:29 +0000}, date-modified = {2015-11-02 04:24:55 +0000}, keywords = {abstract}, title = {How do Surgical Providers use Social Media? A Mixed-Methods Analysis using Twitter}, year = {2016} }

		John W Ayers, Benjamin M Althouse, Jon-Patrick Allem, Eric C Leas, Mark Dredze, Rebecca Williams. Revisiting the Rise of Electronic Nicotine Delivery Systems Using Search Query Surveillance. American Journal of Preventive Medicine (AJPM), 2016;50(6):e173-e181. [PDF] [Bibtex] [Close] @article{ayers-2015, abstract = {Introduction: Public perceptions of electronic nicotine delivery systems (ENDS) remain poorly understood because surveys are too costly to regularly implement and, when implemented, there are long delays between data collection and dissemination. Search query surveillance has bridged some of these gaps. Herein, ENDS' popularity in the U.S. is reassessed using Google searches. Methods: ENDS searches originating in the U.S. from January 2009 through January 2015 were disaggregated by terms focused on e-cigarette (e.g., e-cig) versus vaping (e.g., vapers); their geolocation (e.g., state); the aggregate tobacco control measures corresponding to their geolocation (e.g., clean indoor air laws); and by terms that indicated the searcher's potential interest (e.g., buy e-cigs likely indicates shopping)---all analyzed in 2015. Results: ENDS searches are rapidly increasing in the U.S., with 8,498,000 searches during 2014 alone. Increasingly, searches are shifting from e-cigarette- to vaping-focused terms, especially in coastal states and states where anti-smoking norms are stronger. For example, nationally, e-cigarette searches declined 9% (95% CI=1%, 16%) during 2014 compared with 2013, whereas vaping searches increased 136% (95% CI=97%, 186%), even surpassing e-cigarette searches. Additionally, the percentage of ENDS searches related to shopping (e.g., vape shop) nearly doubled in 2014, whereas searches related to health concerns (e.g., vaping risks) or cessation (e.g., quit smoking with e-cigs) were rare and declined in 2014. Conclusions: ENDS popularity is rapidly growing and evolving. 
These findings could inform survey questionnaire development for follow-up investigation and immediately guide policy debates about how the public perceives the health risks or cessation benefits of ENDS.}, author = {John W. Ayers and Benjamin M. Althouse and Jon-Patrick Allem and Eric C. Leas and Mark Dredze and Rebecca Williams}, date-added = {2015-10-20 12:09:04 +0000}, date-modified = {2017-08-14 20:30:48 +0000}, file = {http://dx.doi.org/10.1016/j.amepre.2015.12.008}, journal = {American Journal of Preventive Medicine (AJPM)}, month = {June}, number = {6}, pages = {e173-e181}, title = {Revisiting the Rise of Electronic Nicotine Delivery Systems Using Search Query Surveillance}, volume = {50}, year = {2016} } Introduction: Public perceptions of electronic nicotine delivery systems (ENDS) remain poorly understood because surveys are too costly to regularly implement and, when implemented, there are long delays between data collection and dissemination. Search query surveillance has bridged some of these gaps. Herein, ENDS' popularity in the U.S. is reassessed using Google searches. Methods: ENDS searches originating in the U.S. from January 2009 through January 2015 were disaggregated by terms focused on e-cigarette (e.g., e-cig) versus vaping (e.g., vapers); their geolocation (e.g., state); the aggregate tobacco control measures corresponding to their geolocation (e.g., clean indoor air laws); and by terms that indicated the searcher's potential interest (e.g., buy e-cigs likely indicates shopping)---all analyzed in 2015. Results: ENDS searches are rapidly increasing in the U.S., with 8,498,000 searches during 2014 alone. Increasingly, searches are shifting from e-cigarette- to vaping-focused terms, especially in coastal states and states where anti-smoking norms are stronger. 
For example, nationally, e-cigarette searches declined 9% (95% CI=1%, 16%) during 2014 compared with 2013, whereas vaping searches increased 136% (95% CI=97%, 186%), even surpassing e-cigarette searches. Additionally, the percentage of ENDS searches related to shopping (e.g., vape shop) nearly doubled in 2014, whereas searches related to health concerns (e.g., vaping risks) or cessation (e.g., quit smoking with e-cigs) were rare and declined in 2014. Conclusions: ENDS popularity is rapidly growing and evolving. These findings could inform survey questionnaire development for follow-up investigation and immediately guide policy debates about how the public perceives the health risks or cessation benefits of ENDS.

		Atul Nakhasi, Sarah G Bell, Ralph J Passarella, Michael J Paul, Mark Dredze, Peter J Pronovost. The Potential of Twitter as a Data Source for Patient Safety. Journal of Patient Safety, 2016. [PDF] [Bibtex] [Close] @article{Nakhasi:2015bh, abstract = {Background: Error-reporting systems are widely regarded as critical components to improving patient safety, yet current systems do not effectively engage patients. We sought to assess Twitter as a source to gather patient perspective on errors in this feasibility study. Methods: We included publicly accessible tweets in English from any geography. To collect patient safety tweets, we consulted a patient safety expert and constructed a set of highly relevant phrases, such as "doctor screwed up." We used Twitter's search application program interface from January to August 2012 to identify tweets that matched the set of phrases. Two researchers used criteria to independently review tweets and choose those relevant to patient safety; a third reviewer resolved discrepancies. Variables included source and sex of tweeter, source and type of error, emotional response, and mention of litigation. Results: Of 1006 tweets analyzed, 839 (83%) identified the type of error: 26% of which were procedural errors, 23% were medication errors, 23% were diagnostic errors, and 14% were surgical errors. A total of 850 (84%) identified a tweet source, 90% of which were by the patient and 9% by a family member. A total of 519 (52%) identified an emotional response, 47% of which expressed anger or frustration, 21% expressed humor or sarcasm, and 14% expressed sadness or grief. Of the tweets, 6.3% mentioned an intent to pursue malpractice litigation. Conclusions: Twitter is a relevant data source to obtain the patient perspective on medical errors. Twitter may provide an opportunity for health systems and providers to identify and communicate with patients who have experienced a medical error. 
Further research is needed to assess the reliability of the data.}, author = {Atul Nakhasi and Sarah G Bell and Ralph J Passarella and Michael J Paul and Mark Dredze and Peter J Pronovost}, date-added = {2015-10-14 04:02:10 +0000}, date-modified = {2019-01-10 00:14:04 -0500}, file = {http://journals.lww.com/journalpatientsafety/Abstract/publishahead/The_Potential_of_Twitter_as_a_Data_Source_for.99609.aspx}, journal = {Journal of Patient Safety}, month = {Jan}, pages = {10.1097/PTS.0000000000000253}, title = {The Potential of Twitter as a Data Source for Patient Safety}, year = {2016} }

		Brad J Bushman, Katherine Newman, Sandra L Calvert, Geraldine Downey, Mark Dredze, Michael Gottfredson, Nina G Jablonski, Ann S Masten, Calvin Morrill, Daniel B Neill, Daniel Romer, Daniel W Webster. Youth Violence: What We Know and What We Need to Know. American Psychologist, 2016;71(1):17-39. [PDF] [Bibtex] [Close] @article{Bushman:2015fj, abstract = {School shootings tear the fabric of society. In the wake of a school shooting, parents, pediatricians, policymakers, politicians, and the public search for ``the'' cause of the shooting. But there is no single cause. The causes of school shootings are extremely complex. After the Sandy Hook Elementary School rampage shooting in Newtown, Connecticut, we wrote a report for the National Science Foundation on what is known and not known about youth violence. This article summarizes and updates that report. After distinguishing violent behavior from aggressive behavior, we describe the prevalence of gun violence in the United States and age-related risks for violence. We delineate important differences between violence in the context of rare rampage school shootings, and much more common urban street violence. Acts of violence are influenced by multiple factors, often acting together. We summarize evidence on some major risk factors and protective factors for youth violence, highlighting individual and contextual factors, which often interact. We consider new quantitative ``data mining'' procedures that can be used to predict youth violence perpetrated by groups and individuals, recognizing critical issues of privacy and ethical concerns that arise in the prediction of violence. We also discuss implications of the current evidence for reducing youth violence, and we offer suggestions for future research. We conclude by arguing that the prevention of youth violence should be a national priority.}, author = {Brad J. Bushman and Katherine Newman and Sandra L. 
Calvert and Geraldine Downey and Mark Dredze and Michael Gottfredson and Nina G. Jablonski and Ann S. Masten and Calvin Morrill and Daniel B. Neill and Daniel Romer and Daniel W. Webster}, date-added = {2015-05-28 16:35:05 +0000}, date-modified = {2016-01-15 15:44:29 +0000}, file = {http://psycnet.apa.org/journals/amp/71/1/17/}, journal = {American Psychologist}, month = {Jan}, number = {1}, pages = {17-39}, title = {Youth Violence: What We Know and What We Need to Know}, volume = {71}, year = {2016} }

		2015 (28 Publications)
		Yu Wang, Eugene Agichtein, Tom Clark, Mark Dredze, Jeffrey Staton. Inferring latent user characteristics for analyzing political discussions in social media. Atlanta Computational Social Science Workshop, 2015. [Bibtex] [Close] @inproceedings{Wang:2015tg, author = {Yu Wang and Eugene Agichtein and Tom Clark and Mark Dredze and Jeffrey Staton}, booktitle = {Atlanta Computational Social Science Workshop}, date-added = {2016-01-07 15:10:47 +0000}, date-modified = {2016-01-07 15:10:47 +0000}, keywords = {workshop}, title = {Inferring latent user characteristics for analyzing political discussions in social media}, year = {2015} }

		Mark Dredze, David A Broniatowski, Michael C Smith, Karen M Hilyard. Understanding Vaccine Refusal: Why We Need Social Media Now. American Journal of Preventive Medicine (AJPM), 2015;50(4):550-552. [PDF] [Bibtex] [Close] @article{Dredze:2016db, annote = {[<a href="http://www.cs.jhu.edu/~mdredze/datasets/vaccine_relevance_sentiment.json.gz"><span class="pub_link">Data</span></a>]}, author = {Mark Dredze and David A. Broniatowski and Michael C Smith and Karen M. Hilyard}, date-added = {2015-09-22 14:08:28 +0000}, date-modified = {2017-08-14 21:17:03 +0000}, file = {http://www.ajpmonline.org/article/S0749-3797(15)00640-6/abstract}, journal = {American Journal of Preventive Medicine (AJPM)}, keywords = {selected}, number = {4}, pages = {550-552}, title = {Understanding Vaccine Refusal: Why We Need Social Media Now}, volume = {50}, year = {2015} } [Data]

		Mauricio Santillana, Andre T Nguyen, Mark Dredze, Michael J Paul, Elaine Nsoesie, John S Brownstein. Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance. PLOS Computational Biology, 2015. [PDF] [Bibtex] [Close] @article{Santillana:2015qv, abstract = {We present a machine learning-based methodology capable of providing real-time (``nowcast'') and forecast estimates of influenza activity in the US by leveraging data from multiple data sources including: Google searches, Twitter microblogs, nearly real-time hospital visit records, and data from a participatory surveillance system. Our main contribution consists of combining multiple influenza-like illnesses (ILI) activity estimates, generated independently with each data source, into a single prediction of ILI utilizing machine learning ensemble approaches. Our methodology exploits the information in each data source and produces accurate weekly ILI predictions for up to four weeks ahead of the release of CDC's ILI reports. We evaluate the predictive ability of our ensemble approach during the 2013--2014 (retrospective) and 2014--2015 (live) flu seasons for each of the four weekly time horizons. Our ensemble approach demonstrates several advantages: (1) our ensemble method's predictions outperform every prediction using each data source independently, (2) our methodology can produce predictions one week ahead of GFT's real-time estimates with comparable accuracy, and (3) our two and three week forecast estimates have comparable accuracy to real-time predictions using an autoregressive model. Moreover, our results show that considerable insight is gained from incorporating disparate data streams, in the form of social media and crowd sourced data, into influenza predictions in all time horizons.}, author = {Mauricio Santillana and Andre T. Nguyen and Mark Dredze and Michael J. Paul and Elaine Nsoesie and John S. 
Brownstein}, date-added = {2015-08-24 15:35:13 +0000}, date-modified = {2015-09-30 18:13:40 +0000}, file = {http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004513}, journal = {PLOS Computational Biology}, title = {Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance}, year = {2015} }

		Matthew R Gormley, Mark Dredze, Jason Eisner. Approximation-Aware Dependency Parsing by Belief Propagation. Transactions of the Association for Computational Linguistics (TACL), 2015. [PDF] [Bibtex] [Close] @article{TACL638, abstract = {We show how to train the fast dependency parser of Smith and Eisner (2008) for improved accuracy. This parser can consider higher-order interactions among edges while retaining O(n^3) runtime. It outputs the parse with maximum expected recall---but for speed, this expectation is taken under a posterior distribution that is constructed only approximately, using loopy belief propagation through structured factors. We show how to adjust the model parameters to compensate for the errors introduced by this approximation, by following the gradient of the actual loss on training data. We find this gradient by backpropagation. That is, we treat the entire parser (approximations and all) as a differentiable circuit, as others have done for loopy CRFs (Domke, 2010; Stoyanov et al., 2011; Domke, 2011; Stoyanov and Eisner, 2012). The resulting parser obtains higher accuracy with fewer iterations of belief propagation than one trained by conditional log-likelihood. }, author = {Matthew R Gormley and Mark Dredze and Jason Eisner}, date-added = {2015-08-14 14:03:41 +0000}, date-modified = {2017-02-22 18:41:25 +0000}, file = {https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/638}, issn = {2307-387X}, journal = {Transactions of the Association for Computational Linguistics (TACL)}, pages = {489--501}, title = {Approximation-Aware Dependency Parsing by Belief Propagation}, url = {https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/638}, volume = {3}, year = {2015}, bdsk-url-1 = {https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/638} }

		David A Broniatowski, Mark Dredze, Karen M Hilyard. News Articles are More Likely to be Shared if they Combine Statistics with Explanation. Conference of the Society for Medical Decision Making, 2015. [Bibtex] [Close] @inproceedings{Broniatowski:2015it, author = {David A Broniatowski and Mark Dredze and Karen M Hilyard}, booktitle = {Conference of the Society for Medical Decision Making}, date-added = {2015-08-06 03:39:09 +0000}, date-modified = {2015-08-06 03:41:25 +0000}, keywords = {abstract}, title = {News Articles are More Likely to be Shared if they Combine Statistics with Explanation}, year = {2015} }

		Matthew R Gormley, Mo Yu, Mark Dredze. Improved Relation Extraction with Feature-Rich Compositional Embedding Models. Empirical Methods in Natural Language Processing (EMNLP), 2015. [PDF] [Bibtex] [Close] @inproceedings{Matthew-R.-Gormley:2015ly, abstract = {Compositional embedding models build a representation (or embedding) for a linguistic structure based on its component word embeddings. We propose a Feature-rich Compositional Embedding Model (FCM) for relation extraction that is expressive, generalizes to new domains, and is easy-to-implement. The key idea is to combine both (unlexicalized) handcrafted features with learned word embeddings. The model is able to directly tackle the difficulties met by traditional compositional embeddings models, such as handling arbitrary types of sentence annotations and utilizing global information for composition. We test the proposed model on two relation extraction tasks, and demonstrate that our model outperforms both previous compositional models and traditional feature rich models on the ACE 2005 relation extraction task, and the SemEval 2010 relation classification task. The combination of our model and a loglinear classifier with hand-crafted features gives state-of-the-art results. We made our implementation available for general use.}, author = {Matthew R. Gormley and Mo Yu and Mark Dredze}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, date-added = {2015-07-30 04:13:47 +0000}, date-modified = {2017-08-14 21:15:53 +0000}, file = {https://aclweb.org/anthology/D/D15/D15-1205.pdf}, pages = {1774-1784}, title = {Improved Relation Extraction with Feature-Rich Compositional Embedding Models}, year = {2015} }

		Nanyun Peng, Mark Dredze. Named Entity Recognition for Chinese Social Media with Jointly Trained Embeddings. Empirical Methods in Natural Language Processing (EMNLP) (short paper), 2015. [PDF] [Bibtex] [Close] @inproceedings{Peng:2015rt, abstract = {We consider the task of named entity recognition for Chinese social media. The long line of work in Chinese NER has focused on formal domains, and NER for social media has been largely restricted to English. We present a new corpus of Weibo messages annotated for both name and nominal mentions. Additionally, we evaluate three types of neural embeddings for representing Chinese text. Finally, we propose a joint training objective for the embeddings that makes use of both (NER) labeled and unlabeled raw text. Our methods yield a 9% improvement over a state-of-the-art baseline.}, annote = {[<a href="https://github.com/hltcoe/golden-horse"><span class="pub_link">Code</span></a>]}, author = {Nanyun Peng and Mark Dredze}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP) (short paper)}, date-added = {2015-07-30 04:13:28 +0000}, date-modified = {2017-08-14 21:14:38 +0000}, file = {https://aclweb.org/anthology/D/D15/D15-1064.pdf}, pages = {548-554}, title = {Named Entity Recognition for Chinese Social Media with Jointly Trained Embeddings}, year = {2015} } [Code]

		Matthew Biggerstaff, David Alper, Mark Dredze, Spencer Fox, Isaac Chun-Hai Fung, Kyle S Hickmann, Bryan Lewis, Roni Rosenfeld, Jeffrey Shaman, Ming-Hsiang Tsou, Paola Velardi, Alessandro Vespignani, Lyn Finelli. Results from the Centers for Disease Control and Prevention's Predict the 2013--2014 Influenza Season Challenge. International Conference of Emerging Infectious Diseases Conference, 2015. [PDF] [Bibtex] [Close] @inproceedings{Biggerstaff:2015kb, author = {Matthew Biggerstaff and David Alper and Mark Dredze and Spencer Fox and Isaac Chun-Hai Fung and Kyle S. Hickmann and Bryan Lewis and Roni Rosenfeld and Jeffrey Shaman and Ming-Hsiang Tsou and Paola Velardi and Alessandro Vespignani and Lyn Finelli}, booktitle = {International Conference of Emerging Infectious Diseases Conference}, date-added = {2015-07-08 14:31:52 +0000}, date-modified = {2016-07-05 19:21:50 +0000}, file = {http://www.biomedcentral.com/1471-2334/16/357}, keywords = {abstract}, title = {Results from the Centers for Disease Control and Prevention's Predict the 2013--2014 Influenza Season Challenge}, year = {2015} }

		Ellie Pavlick, Travis Wolfe, Pushpendre Rastogi, Chris Callison-Burch, Mark Dredze, Benjamin Van Durme. FrameNet+: Fast Paraphrastic Tripling of FrameNet. Association for Computational Linguistics (ACL) (short paper), 2015. [PDF] [Bibtex] [Close] @inproceedings{Pavlick:2015bs, abstract = {We increase the lexical coverage of FrameNet through automatic paraphrasing. We use crowdsourcing to manually filter out bad paraphrases in order to ensure a high-precision resource. Our expanded FrameNet contains an additional 22K lexical units, a 3-fold increase over the current FrameNet, and achieves 40% better coverage when evaluated in a practical setting on New York Times data.}, annote = {[<a href="http://www.seas.upenn.edu/~nlp/resources/FN+.zip"><span class="pub_link">Data</span></a>]}, author = {Ellie Pavlick and Travis Wolfe and Pushpendre Rastogi and Chris Callison-Burch and Mark Dredze and Benjamin Van Durme}, booktitle = {Association for Computational Linguistics (ACL) (short paper)}, date-added = {2015-06-10 13:56:43 +0000}, date-modified = {2017-08-14 21:04:27 +0000}, file = {http://www.aclweb.org/anthology/P15-2067}, pages = {408-413}, title = {FrameNet+: Fast Paraphrastic Tripling of FrameNet}, year = {2015} } [Data]

		Nanyun Peng, Mo Yu, Mark Dredze. An Empirical Study of Chinese Name Matching and Applications. Association for Computational Linguistics (ACL) (short paper), 2015. [PDF] [Bibtex] [Close] @inproceedings{Peng:2015db, abstract = {Methods for name matching, an important component to support downstream tasks such as entity linking and entity clustering, have focused on alphabetic languages, primarily English. In contrast, logogram languages such as Chinese remain untested. We evaluate methods for name matching in Chinese, including both string matching and learning approaches. Our approach, based on new representations for Chinese, improves both name matching and a downstream entity clustering task.}, annote = {[<a href="https://github.com/hltcoe/mingpipe"><span class="pub_link">Code</span></a>]}, author = {Nanyun Peng and Mo Yu and Mark Dredze}, booktitle = {Association for Computational Linguistics (ACL) (short paper)}, date-added = {2015-06-10 13:53:30 +0000}, date-modified = {2017-08-14 21:03:02 +0000}, file = {https://aclweb.org/anthology/P/P15/P15-2062.pdf}, pages = {377-383}, title = {An Empirical Study of Chinese Name Matching and Applications}, year = {2015} } [Code]

		Travis Wolfe, Mark Dredze, James Mayfield, Paul McNamee, Craig Harman, Tim Finin, Benjamin Van Durme. Interactive Knowledge Base Population. Unpublished Manuscript, 2015. [PDF] [Bibtex] [Close] @unpublished{Wolfe:2015qr, abstract = {Most work on building knowledge bases has focused on collecting entities and facts from as large a collection of documents as possible. We argue for and describe a new paradigm where the focus is on a high-recall extraction over a small collection of documents under the supervision of a human expert, that we call Interactive Knowledge Base Population (IKBP).}, author = {Travis Wolfe and Mark Dredze and James Mayfield and Paul McNamee and Craig Harman and Tim Finin and Benjamin Van Durme}, date-added = {2015-06-07 17:49:34 +0000}, date-modified = {2016-01-22 05:31:27 +0000}, file = {http://arxiv.org/abs/1506.00301}, journal = {arXiv}, number = {arXiv:1506.00301}, title = {Interactive Knowledge Base Population}, year = {2015} }

		Mrinal Kumar, Mark Dredze, Glen A Coppersmith, Munmun De Choudhury. Detecting Changes in Suicide Content Manifested in Social Media Following Celebrity Suicides. Conference on Hypertext and Social Media, 2015. [PDF] [Bibtex] [Close] @inproceedings{Kumar:2015sf, abstract = {The Werther effect describes the increased rate of completed or attempted suicides following the depiction of an individual's suicide in the media, typically a celebrity. We present findings on the prevalence of this effect in an online platform: r/SuicideWatch on Reddit. We examine both the posting activity and post content after the death of ten high-profile suicides. Posting activity increases following reports of celebrity suicides, and post content exhibits considerable changes that indicate increased suicidal ideation. Specifically, we observe that post-celebrity suicide content is more likely to be inward focused, manifest decreased social concerns, and laden with greater anxiety, anger, and negative emotion. Topic model analysis further reveals content in this period to switch to a more derogatory tone that bears evidence of self-harm and suicidal tendencies. We discuss the implications of our findings in enabling better community support to psychologically vulnerable populations, and the potential of building suicide prevention interventions following high-profile suicides.}, author = {Mrinal Kumar and Mark Dredze and Glen A Coppersmith and Munmun De Choudhury}, booktitle = {Conference on Hypertext and Social Media}, date-added = {2015-05-31 22:19:22 +0000}, date-modified = {2017-08-14 20:55:58 +0000}, file = {2015_hypertext_suicide_reddit.pdf}, pages = {85-94}, title = {Detecting Changes in Suicide Content Manifested in Social Media Following Celebrity Suicides}, year = {2015} }

		Michael C Smith, David A Broniatowski, Michael J Paul, Mark Dredze. Tracking Public Awareness of Influenza through Twitter. 3rd International Conference on Digital Disease Detection (DDD), 2015. [Bibtex] [Close] @inproceedings{Smith:2015mz, annote = {(rapid fire talk)}, author = {Michael C Smith and David A Broniatowski and Michael J Paul and Mark Dredze}, booktitle = {3rd International Conference on Digital Disease Detection (DDD)}, date-added = {2015-05-22 03:57:02 +0000}, date-modified = {2015-05-22 03:57:02 +0000}, keywords = {abstract}, title = {Tracking Public Awareness of Influenza through Twitter}, year = {2015} } (rapid fire talk)

		Joanna E Cohen, Rebecca Shillenn, Mark Dredze, John W Ayers. Tobacco Watcher: Real-Time Global Tobacco Surveillance Using Online News Media. Annual Meeting of the Society for Research on Nicotine and Tobacco, 2015. [Bibtex] [Close] @inproceedings{Cohen:2015hl, author = {Joanna E. Cohen and Rebecca Shillenn and Mark Dredze and John W. Ayers}, booktitle = {Annual Meeting of the Society for Research on Nicotine and Tobacco}, date-added = {2015-05-20 20:13:55 +0000}, date-modified = {2015-05-20 20:13:55 +0000}, keywords = {abstract}, title = {Tobacco Watcher: Real-Time Global Tobacco Surveillance Using Online News Media}, year = {2015} }

		David A Broniatowski, Mark Dredze, Michael J Paul, Andrea Dugas. Using Social Media to Perform Local Influenza Surveillance in an Inner-City Hospital. JMIR Public Health and Surveillance, 2015. [PDF] [Bibtex] [Close] @article{Broniatowski:2015pi, abstract = {Background: Public health officials and policy makers in the United States expend significant resources at the national, state, county, and city levels to measure the rate of influenza infection. These individuals rely on influenza infection rate information to make important decisions during the course of an influenza season driving vaccination campaigns, clinical guidelines, and medical staffing. Web and social media data sources have emerged as attractive alternatives to supplement existing practices. While traditional surveillance methods take 1-2 weeks, and significant labor, to produce an infection estimate in each locale, web and social media data are available in near real-time for a broad range of locations. Objective: The objective of this study was to analyze the efficacy of flu surveillance from combining data from the websites Google Flu Trends and HealthTweets at the local level. We considered both emergency department influenza-like illness cases and laboratory-confirmed influenza cases for a single hospital in the City of Baltimore. Methods: This was a retrospective observational study comparing estimates of influenza activity of Google Flu Trends and Twitter to actual counts of individuals with laboratory-confirmed influenza, and counts of individuals presenting to the emergency department with influenza-like illness cases. Data were collected from November 20, 2011 through March 16, 2014. Each parameter was evaluated on the municipal, regional, and national scale. We examined the utility of social media data for tracking actual influenza infection at the municipal, state, and national levels. Specifically, we compared the efficacy of Twitter and Google Flu Trends data. 
Results: We found that municipal-level Twitter data was more effective than regional and national data when tracking actual influenza infection rates in a Baltimore inner-city hospital. When combined, national-level Twitter and Google Flu Trends data outperformed each data source individually. In addition, influenza-like illness data at all levels of geographic granularity were best predicted by national Google Flu Trends data. Conclusions: In order to overcome sensitivity to transient events, such as the news cycle, the best-fitting Google Flu Trends model relies on a 4-week moving average, suggesting that it may also be sacrificing sensitivity to transient fluctuations in influenza infection to achieve predictive power. Implications for influenza forecasting are discussed in this report.}, author = {David A Broniatowski and Mark Dredze and Michael J Paul and Andrea Dugas}, date-added = {2015-05-05 17:08:40 +0000}, date-modified = {2017-08-14 20:28:40 +0000}, file = {http://publichealth.jmir.org/2015/1/e5/}, journal = {JMIR Public Health and Surveillance}, month = {May 29}, number = {1}, title = {Using Social Media to Perform Local Influenza Surveillance in an Inner-City Hospital}, volume = {1}, year = {2015} }

		J Lee Westmaas, John W Ayers, Mark Dredze, Benjamin M Althouse. Evaluation of the Great American Smokeout by Digital Surveillance. Society of Behavioral Medicine, 2015. [Bibtex] [Close] @inproceedings{Westmaas:2015rw, annote = {(<b>Citation Award, given to the best submissions</b>)}, author = {J. Lee Westmaas and John W. Ayers and Mark Dredze and Benjamin M. Althouse}, booktitle = {Society of Behavioral Medicine}, date-added = {2015-04-27 13:44:46 +0000}, date-modified = {2015-04-27 13:45:13 +0000}, keywords = {abstract}, title = {Evaluation of the Great American Smokeout by Digital Surveillance}, year = {2015} }

		Glen A Coppersmith, Mark Dredze, Craig Harman, Kristy Hollingshead, Margaret Mitchell. CLPsych 2015 Shared Task: Depression and PTSD on Twitter. NAACL Workshop on Computational Linguistics and Clinical Psychology, 2015. [PDF] [Bibtex] [Close] @inproceedings{Coppersmith:2015eu, abstract = {This paper presents a summary of the Computational Linguistics and Clinical Psychology (CLPsych) 2015 shared and unshared tasks. These tasks aimed to provide apples-to-apples comparisons of various approaches to modeling language relevant to mental health from social media. The data used for these tasks is from Twitter users who state a diagnosis of depression or post traumatic stress disorder (PTSD) and demographically-matched community controls. The unshared task was a hackathon held at Johns Hopkins University in November 2014 to explore the data, and the shared task was conducted remotely, with each participating team submitted scores for a held-back test set of users. The shared task consisted of three binary classification experiments: (1) depression versus control, (2) PTSD versus control, and (3) depression versus PTSD. Classifiers were compared primarily via their average precision, though a number of other metrics are used along with this to allow a more nuanced interpretation of the performance measures.}, author = {Glen A Coppersmith and Mark Dredze and Craig Harman and Kristy Hollingshead and Margaret Mitchell}, booktitle = {NAACL Workshop on Computational Linguistics and Clinical Psychology}, date-added = {2015-04-21 04:19:36 +0000}, date-modified = {2017-08-14 20:27:46 +0000}, file = {clpsych15_shared_task.pdf}, journal = {NAACL Workshop on Computational Linguistics and Clinical Psychology}, keywords = {workshop}, pages = {31-39}, title = {CLPsych 2015 Shared Task: Depression and PTSD on Twitter}, year = {2015} }

		Mo Yu, Mark Dredze. Learning Composition Models for Phrase Embeddings. Transactions of the Association for Computational Linguistics (TACL), 2015. [PDF] [Bibtex] [Close] @article{TACL586, abstract = {Lexical embeddings can serve as useful representations for words for a variety of NLP tasks, but learning embeddings for phrases can be challenging. While separate embeddings are learned for each word, this is infeasible for every phrase. We construct phrase embeddings by learning how to compose word embeddings using features that capture phrase structure and context. We propose efficient unsupervised and task-specific learning objectives that scale our model to large datasets. We demonstrate improvements on both language modeling and several phrase semantic similarity tasks with various phrase lengths. We make the implementation of our model and the datasets available for general use.}, annote = {[<a href="https://github.com/Gorov/FCT_PhraseSim_TACL"><span class="pub_link">Code</span></a>]}, author = {Mo Yu and Mark Dredze}, date-modified = {2017-02-22 18:41:29 +0000}, file = {https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/586}, issn = {2307-387X}, journal = {Transactions of the Association for Computational Linguistics (TACL)}, pages = {227--242}, title = {Learning Composition Models for Phrase Embeddings}, volume = {3}, year = {2015} }

		Glen A Coppersmith, Mark Dredze, Craig Harman, Kristy Hollingshead. From ADHD to SAD: analyzing the language of mental health on Twitter through self-reported diagnoses. NAACL Workshop on Computational Linguistics and Clinical Psychology, 2015. [PDF] [Bibtex] [Close] @inproceedings{coppersmith15a, abstract = {Many significant challenges exist for the mental health field, but one in particular is a lack of data available to guide research. Language provides a natural lens for studying mental health -- much existing work and therapy have strong linguistic components, so the creation of a large, varied, language-centric dataset could provide significant grist for the field of mental health research. We examine a broad range of mental health conditions in Twitter data by identifying self-reported statements of diagnosis. We systematically explore language differences between ten conditions with respect to the general population, and to each other. Our aim is to provide guidance and a roadmap for where deeper exploration is likely to be fruitful.}, author = {Glen A Coppersmith and Mark Dredze and Craig Harman and Kristy Hollingshead}, booktitle = {NAACL Workshop on Computational Linguistics and Clinical Psychology}, date-added = {2015-03-27 23:02:03 +0000}, date-modified = {2017-08-14 20:23:12 +0000}, file = {clpsych15_self_reports.pdf}, keywords = {workshop}, pages = {1-10}, title = {From ADHD to SAD: analyzing the language of mental health on Twitter through self-reported diagnoses}, year = {2015} }

		Nanyun Peng, Francis Ferraro, Mo Yu, Nicholas Andrews, Jay DeYoung, Max Thomas, Matthew R Gormley, Travis Wolfe, Craig Harman, Benjamin Van Durme, Mark Dredze. A Chinese Concrete NLP Pipeline. North American Chapter of the Association for Computational Linguistics (NAACL) (Demo Paper), 2015. [PDF] [Bibtex] [Close] @inproceedings{Peng:2015yf, abstract = {Natural language processing research increasingly relies on the output of a variety of syntactic and semantic analytics. Yet integrating output from multiple analytics into a single framework can be time consuming and slow research progress. We present a CONCRETE Chinese NLP Pipeline: an NLP stack built using a series of open source systems integrated based on the CONCRETE data schema. Our pipeline includes data ingest, word segmentation, part of speech tagging, parsing, named entity recognition, relation extraction and cross document coreference resolution. Additionally, we integrate a tool for visualizing these annotations as well as allowing for the manual annotation of new data. We release our pipeline to the research community to facilitate work on Chinese language tasks that require rich linguistic annotations.}, author = {Nanyun Peng and Francis Ferraro and Mo Yu and Nicholas Andrews and Jay DeYoung and Max Thomas and Matthew R. Gormley and Travis Wolfe and Craig Harman and Benjamin Van Durme and Mark Dredze}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL) (Demo Paper)}, date-added = {2015-03-19 16:55:08 +0000}, date-modified = {2017-08-14 20:22:47 +0000}, file = {naacl15_demo_concrete.pdf}, pages = {86-90}, title = {A Chinese Concrete NLP Pipeline}, year = {2015} }

		Adrian Benton, Mark Dredze. Entity Linking for Spoken Language. North American Chapter of the Association for Computational Linguistics (NAACL) (short paper), 2015. [PDF] [Bibtex] [Close] @inproceedings{Benton:2015qq, abstract = {Research on entity linking has considered a broad range of text, including newswire, blogs and web documents in multiple languages. However, the problem of entity linking for spoken language remains unexplored. Spoken language obtained from automatic speech recognition systems poses different types of challenges for entity linking; transcription errors can distort the context, and named entities tend to have high error rates. We propose features to mitigate these errors and evaluate the impact of ASR errors on entity linking using a new corpus of entity linked broadcast news transcripts.}, annote = {[<a href="https://github.com/mdredze/speech_ner_entity_linking_data/"><span class="pub_link">Data</span></a>]}, author = {Adrian Benton and Mark Dredze}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL) (short paper)}, date-added = {2015-02-22 05:29:44 +0000}, date-modified = {2017-08-14 20:22:21 +0000}, file = {naacl15_spoken_entity_linking.pdf}, pages = {225-230}, title = {Entity Linking for Spoken Language}, year = {2015} }

		Travis Wolfe, Mark Dredze, Benjamin Van Durme. Predicate Argument Alignment using a Global Coherence Model. North American Chapter of the Association for Computational Linguistics (NAACL), 2015. [PDF] [Bibtex] [Close] @inproceedings{Wolfe:2015qf, abstract = {We present a joint model for predicate argument alignment. We leverage multiple sources of semantic information, including temporal ordering constraints between events. These are combined in a max-margin framework to find a globally consistent view of entities and events across multiple documents, which leads to improvements over a very strong local baseline.}, author = {Travis Wolfe and Mark Dredze and Benjamin Van Durme}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, date-added = {2015-02-22 05:27:04 +0000}, date-modified = {2017-08-14 20:21:53 +0000}, file = {naacl15_parma.pdf}, pages = {11-20}, title = {Predicate Argument Alignment using a Global Coherence Model}, year = {2015} }

		Mo Yu, Matthew R Gormley, Mark Dredze. Combining Word Embeddings and Feature Embeddings for Fine-grained Relation Extraction. North American Chapter of the Association for Computational Linguistics (NAACL) (short paper), 2015. [PDF] [Bibtex] [Close] @inproceedings{Yu:2015rt, abstract = {Compositional embedding models build a representation for a linguistic structure based on its component word embeddings. While recent work has combined these word embeddings with hand crafted features for improved performance, it was restricted to a small number of features due to model complexity, thus limiting its applicability. We propose a new model that conjoins features and word embeddings while maintaining a small number of parameters by learning feature embeddings jointly with the parameters of a compositional model. The result is a method that can scale to more features and more labels, while avoiding overfitting. We demonstrate that our model attains state-of-the-art results on ACE and ERE fine-grained relation extraction.}, author = {Mo Yu and Matthew R. Gormley and Mark Dredze}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL) (short paper)}, date-added = {2015-02-22 05:24:59 +0000}, date-modified = {2017-08-14 20:21:26 +0000}, file = {naacl15_feature_embeddings.pdf}, pages = {1374-1379}, title = {Combining Word Embeddings and Feature Embeddings for Fine-grained Relation Extraction}, year = {2015} }

		Michael J Paul, Mark Dredze. SPRITE: Generalizing Topic Models with Structured Priors. Transactions of the Association for Computational Linguistics (TACL), 2015. [PDF] [Bibtex] [Close] @article{Paul:2015sf, abstract = {We introduce SPRITE, a family of topic models that incorporates structure into model priors as a function of underlying components. The structured priors can be constrained to model topic hierarchies, factorizations, correlations, and supervision, allowing SPRITE to be tailored to particular settings. We demonstrate this flexibility by constructing a SPRITE-based model to jointly infer topic hierarchies and author perspective, which we apply to corpora of political debates and online reviews. We show that the model learns intuitive topics, outperforming several other topic models at predictive tasks.}, annote = {[<a href="https://bitbucket.org/adrianbenton/sprite"><span class="pub_link">Code</span></a>]}, author = {Michael J Paul and Mark Dredze}, date-added = {2015-01-13 14:10:08 +0000}, date-modified = {2017-08-14 20:20:55 +0000}, file = {https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/403}, journal = {Transactions of the Association for Computational Linguistics (TACL)}, pages = {43-58}, title = {{SPRITE}: Generalizing Topic Models with Structured Priors}, year = {2015} }

		Shiliang Wang, Michael J Paul, Mark Dredze. Social Media as a Sensor of Air Quality and Public Response in China. Journal of Medical Internet Research (JMIR), 2015. [PDF] [Bibtex] [Close] @article{Wang:2015e, abstract = {Background: Recent studies have demonstrated the utility of social media data sources for a wide range of public health goals, including disease surveillance, mental health trends, and health perceptions and sentiment. Most such research has focused on English-language social media for the task of disease surveillance. Objective: We investigated the value of Chinese social media for monitoring air quality trends and related public perceptions and response. The goal was to determine if this data is suitable for learning actionable information about pollution levels and public response. Methods: We mined a collection of 93 million messages from Sina Weibo, China's largest microblogging service. We experimented with different filters to identify messages relevant to air quality, based on keyword matching and topic modeling. We evaluated the reliability of the data filters by comparing message volume per city to air particle pollution rates obtained from the Chinese government for 74 cities. Additionally, we performed a qualitative study of the content of pollution-related messages by coding a sample of 170 messages for relevance to air quality, and whether the message included details such as a reactive behavior or a health concern. Results: The volume of pollution-related messages is highly correlated with particle pollution levels, with Pearson correlation values up to .718 (n= 74, P<.001). Our qualitative results found that 67.1% (114/170) of messages were relevant to air quality and of those, 78.9% (90/114) were a firsthand report. Of firsthand reports, 28% (32/90) indicated a reactive behavior and 19% (17/90) expressed a health concern. Additionally, 3 messages of 170 requested that action be taken to improve quality. 
Conclusions: We have found quantitatively that message volume in Sina Weibo is indicative of true particle pollution levels, and we have found qualitatively that messages contain rich details including perceptions, behaviors, and self-reported health effects. Social media data can augment existing air pollution surveillance data, especially perception and health-related data that traditionally requires expensive surveys or interviews.}, author = {Shiliang Wang and Michael J Paul and Mark Dredze}, date-added = {2014-12-12 21:11:01 +0000}, date-modified = {2017-08-14 20:19:33 +0000}, file = {http://www.jmir.org/2015/3/e22/}, journal = {Journal of Medical Internet Research (JMIR)}, number = {3}, title = {Social Media as a Sensor of Air Quality and Public Response in China}, volume = {17}, year = {2015} }

		Haoyu Wang, Eduard Hovy, Mark Dredze. The Hurricane Sandy Twitter Corpus. AAAI Workshop on the World Wide Web and Public Health Intelligence, 2015. [PDF] [Bibtex] [Close] @inproceedings{Wang:2015ve, abstract = {The growing use of social media has made it a critical component of disaster response and recovery efforts. Both in terms of preparedness and response, public health officials and first responders have turned to automated tools to assist with organizing and visualizing large streams of social media. In turn, this has spurred new research into algorithms for information extraction, event detection and organization, and information visualization. One challenge of these efforts has been the lack of a common corpus for disaster response on which researchers can compare and contrast their work. This paper describes the Hurricane Sandy Twitter Corpus: 6.5 million geotagged Twitter posts from the geographic area and time period of the 2012 Hurricane Sandy.}, annote = {[<a href="https://github.com/mdredze/twitter_sandy"><span class="pub_link">Data</span></a>]}, author = {Haoyu Wang and Eduard Hovy and Mark Dredze}, booktitle = {AAAI Workshop on the World Wide Web and Public Health Intelligence}, date-added = {2014-11-18 00:59:55 +0000}, date-modified = {2017-08-14 20:18:29 +0000}, file = {aaai_w3phi_sandy.pdf}, keywords = {workshop}, pages = {20-24}, title = {The Hurricane Sandy Twitter Corpus}, year = {2015} }

		Michael J Paul, Mark Dredze, David A Broniatowski, Nicholas Generous. Worldwide Influenza Surveillance through Twitter. AAAI Workshop on the World Wide Web and Public Health Intelligence, 2015. [Bibtex] [Close] @inproceedings{Paul:2015la, author = {Michael J Paul and Mark Dredze and David A Broniatowski and Nicholas Generous}, booktitle = {AAAI Workshop on the World Wide Web and Public Health Intelligence}, date-added = {2014-11-18 00:59:55 +0000}, date-modified = {2014-11-26 19:33:26 +0000}, keywords = {workshop}, title = {Worldwide Influenza Surveillance through Twitter}, year = {2015} }

		Joanna E Cohen, John W Ayers, Mark Dredze. Tobacco Watcher: Real-time Global Surveillance for Tobacco Control. World Conference on Tobacco or Health (WCTOH), 2015. [Bibtex] [Close] @inproceedings{Cohen:2015zl, author = {Joanna E Cohen and John W Ayers and Mark Dredze}, booktitle = {World Conference on Tobacco or Health (WCTOH)}, date-added = {2014-08-22 14:46:27 +0000}, date-modified = {2014-11-26 19:33:35 +0000}, keywords = {abstract}, title = {Tobacco Watcher: Real-time Global Surveillance for Tobacco Control}, year = {2015} }

		2014 (23 Publications)
		Ning Gao, Douglas Oard, Mark Dredze. A Test Collection for Email Entity Linking. NIPS Workshop on Automated Knowledge Base Construction, 2014. [PDF] [Bibtex] [Close] @inproceedings{Gao:2014ty, abstract = {Most prior work on entity linking has focused on linking name mentions found in third-person communication (e.g., news) to broad-coverage knowledge bases (e.g., Wikipedia). A restricted form of domain-specific entity linking has, however, been tried with email, linking mentions of people to specific email addresses. This paper introduces a new test collection for the task of linking mentions of people, organizations, and locations to Wikipedia. Annotation of 200 randomly selected entities of each type from the Enron email collection indicates that domain specific knowledge bases are indeed required to get good coverage of people and organizations, but that Wikipedia provides good (93%) coverage for the named mentions of locations in the Enron collection. Furthermore, experiments with an existing entity linking system indicate that the absence of a suitable referent in Wikipedia can easily be recognized by automated systems, with NIL precision (i.e., correct detection of the absence of a suitable referent) above 90% for all three entity types.}, author = {Ning Gao and Douglas Oard and Mark Dredze}, booktitle = {NIPS Workshop on Automated Knowledge Base Construction}, date-added = {2014-11-13 18:01:16 +0000}, date-modified = {2014-11-13 18:01:16 +0000}, file = {2014_nips_akbc_test_collection.pdf}, keywords = {workshop}, title = {A Test Collection for Email Entity Linking}, year = {2014} }

		Adrian Benton, Jay DeYoung, Adam Teichert, Mark Dredze, Benjamin Van Durme, Stephen Mayhew, Max Thomas. Faster (and Better) Entity Linking with Cascades. NIPS Workshop on Automated Knowledge Base Construction, 2014. [PDF] [Bibtex] [Close] @inproceedings{Benton:2014qe, abstract = {Entity linking requires ranking thousands of candidates for each query, a time consuming process and a challenge for large scale linking. Many systems rely on prediction cascades to efficiently rank candidates. However, the design of these cascades often requires manual decisions about pruning and feature use, limiting the effectiveness of cascades. We present Slinky, a modular, flexible, fast and accurate entity linker based on prediction cascades. We adapt the web-ranking prediction cascade learning algorithm, Cronus, in order to learn cascades that are both accurate and fast. We show that by balancing between accurate and fast linking, this algorithm can produce Slinky configurations that are significantly faster and more accurate than a baseline configuration and an alternate cascade learning method with a fixed introduction of features.}, author = {Adrian Benton and Jay DeYoung and Adam Teichert and Mark Dredze and Benjamin Van Durme and Stephen Mayhew and Max Thomas}, booktitle = {NIPS Workshop on Automated Knowledge Base Construction}, date-added = {2014-11-13 18:01:16 +0000}, date-modified = {2014-11-13 18:01:16 +0000}, file = {2014_nips_slinky_cascades.pdf}, keywords = {workshop}, title = {Faster (and Better) Entity Linking with Cascades}, year = {2014} }

		Mo Yu, Matthew R Gormley, Mark Dredze. Factor-based Compositional Embedding Models. NIPS Workshop on Learning Semantics, 2014. [PDF] [Bibtex] [Close] @inproceedings{Mo-Yu:2014qv, annote = {[<a href="https://github.com/Gorov/FCM_nips_workshop"><span class="pub_link">Code</span></a>]}, author = {Mo Yu and Matthew R Gormley and Mark Dredze}, booktitle = {NIPS Workshop on Learning Semantics}, date-added = {2014-11-09 02:28:58 +0000}, date-modified = {2014-11-09 02:29:20 +0000}, file = {fcm-learningsemantic-nips2014.pdf}, keywords = {workshop}, title = {Factor-based Compositional Embedding Models}, year = {2014} } [Code]

		Rebecca Knowles, Mark Dredze, Kathleen Evans, Elyse Lasser, Tom Richards, Jonathan Weiner, Hadi Kharrazi. High Risk Pregnancy Prediction from Clinical Text. NIPS Workshop on Machine Learning for Clinical Data Analysis, 2014. [PDF] [Bibtex] [Close] @inproceedings{Knowles:2014ly, author = {Rebecca Knowles and Mark Dredze and Kathleen Evans and Elyse Lasser and Tom Richards and Jonathan Weiner and Hadi Kharrazi}, booktitle = {NIPS Workshop on Machine Learning for Clinical Data Analysis}, date-added = {2014-11-07 17:29:03 +0000}, date-modified = {2014-11-07 17:29:32 +0000}, file = {hrob_nips2014.pdf}, keywords = {workshop}, title = {High Risk Pregnancy Prediction from Clinical Text}, year = {2014} }

		Michael J Paul, Mark Dredze, David A Broniatowski. Twitter Improves Influenza Forecasting. PLOS Currents Outbreaks, 2014. [PDF] [Bibtex] [Close] @article{Paul_Dredze_Broniatowski:2014, abstract = {Accurate disease forecasts are imperative when preparing for influenza epidemic outbreaks; nevertheless, these forecasts are often limited by the time required to collect new, accurate data. In this paper, we show that data from the microblogging community Twitter significantly improves influenza forecasting. Most prior influenza forecast models are tested against historical influenza-like illness (ILI) data from the U.S. Centers for Disease Control and Prevention (CDC). These data are released with a one-week lag and are often initially inaccurate until the CDC revises them weeks later. Since previous studies utilize the final, revised data in evaluation, their evaluations do not properly determine the effectiveness of forecasting. Our experiments using ILI data available at the time of the forecast show that models incorporating data derived from Twitter can reduce forecasting error by 17-30% over a baseline that only uses historical data. For a given level of accuracy, using Twitter data produces forecasts that are two to four weeks ahead of baseline models. Additionally, we find that models using Twitter data are, on average, better predictors of influenza prevalence than are models using data from Google Flu Trends, the leading web data source.}, author = {Michael J Paul and Mark Dredze and David A Broniatowski}, file = {http://currents.plos.org/outbreaks/article/twitter-improves-influenza-forecasting/}, journal = {PLOS Currents Outbreaks}, title = {Twitter Improves Influenza Forecasting}, year = {2014} } Accurate disease forecasts are imperative when preparing for influenza epidemic outbreaks; nevertheless, these forecasts are often limited by the time required to collect new, accurate data. 
In this paper, we show that data from the microblogging community Twitter significantly improves influenza forecasting. Most prior influenza forecast models are tested against historical influenza-like illness (ILI) data from the U.S. Centers for Disease Control and Prevention (CDC). These data are released with a one-week lag and are often initially inaccurate until the CDC revises them weeks later. Since previous studies utilize the final, revised data in evaluation, their evaluations do not properly determine the effectiveness of forecasting. Our experiments using ILI data available at the time of the forecast show that models incorporating data derived from Twitter can reduce forecasting error by 17-30% over a baseline that only uses historical data. For a given level of accuracy, using Twitter data produces forecasts that are two to four weeks ahead of baseline models. Additionally, we find that models using Twitter data are, on average, better predictors of influenza prevalence than are models using data from Google Flu Trends, the leading web data source.

		Joy L Lee, Matthew DeCamp, Mark Dredze, Margaret S Chisolm, Zackary D Berger. What Are Health-related Users Tweeting? A Qualitative Content Analysis of Health-related Users and their Messages on Twitter. Journal of Medical Internet Research (JMIR), 2014. [PDF] [Bibtex] [Close] @article{Lee:2014ve, author = {Joy L Lee and Matthew DeCamp and Mark Dredze and Margaret S. Chisolm and Zackary D Berger}, date-added = {2014-09-16 20:21:09 +0000}, date-modified = {2014-09-16 20:21:09 +0000}, file = {http://www.jmir.org/2014/10/e237}, journal = {Journal of Medical Internet Research (JMIR)}, number = {16(10):e237}, title = {What Are Health-related Users Tweeting? A Qualitative Content Analysis of Health-related Users and their Messages on Twitter}, year = {2014} }

		Michael J Paul, Mark Dredze. Discovering Health Topics in Social Media Using Topic Models. PLoS ONE, 2014. [PDF] [Bibtex] [Close] @article{Paul:2014rt, abstract = {By aggregating self-reported health statuses across millions of users, we seek to characterize the variety of health information discussed in Twitter. We describe a topic modeling framework for discovering health topics in Twitter, a social media website. This is an exploratory approach with the goal of understanding what health topics are commonly discussed in social media. This paper describes in detail a statistical topic model created for this purpose, the Ailment Topic Aspect Model (ATAM), as well as our system for filtering general Twitter data based on health keywords and supervised classification. We show how ATAM and other topic models can automatically infer health topics in 144 million Twitter messages from 2011 to 2013. ATAM discovered 13 coherent clusters of Twitter messages, some of which correlate with seasonal influenza (r = 0.689) and allergies (r = 0.810) temporal surveillance data, as well as exercise (r = .534) and obesity (r = −.631) related geographic survey data in the United States. These results demonstrate that it is possible to automatically discover topics that attain statistically significant correlations with ground truth data, despite using minimal human supervision and no historical data to train the model, in contrast to prior work. 
Additionally, these results demonstrate that a single general-purpose model can identify many different health topics in social media.}, annote = {[<a href="http://figshare.com/articles/Discovering_health_topics_in_social_media_using_topic_models/1007712"><span class="pub_link">Data</span></a>]}, author = {Michael J Paul and Mark Dredze}, date-added = {2014-08-04 14:35:06 +0000}, date-modified = {2014-08-04 15:23:14 +0000}, file = {http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0103408}, journal = {PLoS ONE}, number = {8}, title = {Discovering Health Topics in Social Media Using Topic Models}, volume = {9}, year = {2014} } [Data] By aggregating self-reported health statuses across millions of users, we seek to characterize the variety of health information discussed in Twitter. We describe a topic modeling framework for discovering health topics in Twitter, a social media website. This is an exploratory approach with the goal of understanding what health topics are commonly discussed in social media. This paper describes in detail a statistical topic model created for this purpose, the Ailment Topic Aspect Model (ATAM), as well as our system for filtering general Twitter data based on health keywords and supervised classification. We show how ATAM and other topic models can automatically infer health topics in 144 million Twitter messages from 2011 to 2013. ATAM discovered 13 coherent clusters of Twitter messages, some of which correlate with seasonal influenza (r = 0.689) and allergies (r = 0.810) temporal surveillance data, as well as exercise (r = .534) and obesity (r = −.631) related geographic survey data in the United States. These results demonstrate that it is possible to automatically discover topics that attain statistically significant correlations with ground truth data, despite using minimal human supervision and no historical data to train the model, in contrast to prior work. 
Additionally, these results demonstrate that a single general-purpose model can identify many different health topics in social media.

		David A Broniatowski, Michael J Paul, Mark Dredze. Twitter: Big Data Opportunities (Letter). Science, 2014;345(6193):148. [PDF] [Bibtex] [Close] @article{Broniatowski:2014nr, author = {David A Broniatowski and Michael J. Paul and Mark Dredze}, date-added = {2014-07-11 01:53:06 +0000}, date-modified = {2016-01-22 05:30:11 +0000}, file = {http://www.sciencemag.org/content/345/6193/148.1.full}, journal = {Science}, keywords = {selected}, number = {6193}, pages = {148}, title = {Twitter: Big Data Opportunities (Letter)}, volume = {345}, year = {2014} }

		Ahmed Abbasi, Donald Adjeroh, Mark Dredze, Michael J Paul, Fatemeh Mariam Zahedi, Huimin Zhao, Nitin Walia, Hemant Jain, Patrick Sanvanson, Reza Shaker, Marco D Huesch, Richard Beal, Wanhong Zheng, Marie Abate, Arun Ross. Social Media Analytics for Smart Health. IEEE Intelligent Systems, 2014;29(2):60--80. [PDF] [Bibtex] [Close] @article{Dredze:2014lq, author = {Ahmed Abbasi and Donald Adjeroh and Mark Dredze and Michael J. Paul and Fatemeh Mariam Zahedi and Huimin Zhao and Nitin Walia and Hemant Jain and Patrick Sanvanson and Reza Shaker and Marco D. Huesch and Richard Beal and Wanhong Zheng and Marie Abate and Arun Ross}, date-added = {2014-06-11 04:23:16 +0000}, date-modified = {2014-06-11 04:23:35 +0000}, file = {http://www.computer.org/csdl/mags/ex/2014/02/mex2014020060-abs.html}, journal = {IEEE Intelligent Systems}, month = {March -- April}, number = {2}, pages = {60--80}, title = {Social Media Analytics for Smart Health}, volume = {29}, year = {2014} }

		Byron C Wallace, Michael J Paul, Urmimala Sarkar, Thomas A Trikalinos, Mark Dredze. A Large-Scale Quantitative Analysis of Latent Factors and Sentiment in Online Doctor Reviews. Journal of the American Medical Informatics Association (JAMIA), 2014;21(6):1098--1103. [PDF] [Bibtex] [Close] @article{Wallace:2014qd, author = {Byron C. Wallace and Michael J. Paul and Urmimala Sarkar and Thomas A. Trikalinos and Mark Dredze}, date-added = {2014-05-28 00:12:56 +0000}, date-modified = {2017-08-14 20:12:54 +0000}, file = {http://dx.doi.org/10.1136/amiajnl-2014-002711}, journal = {Journal of the American Medical Informatics Association (JAMIA)}, number = {6}, pages = {1098--1103}, title = {A Large-Scale Quantitative Analysis of Latent Factors and Sentiment in Online Doctor Reviews}, volume = {21}, year = {2014} }

		Mark Dredze, Renyuan Cheng, Michael J Paul, David A Broniatowski. HealthTweets.org: A Platform for Public Health Surveillance using Twitter. AAAI Workshop on the World Wide Web and Public Health Intelligence, 2014. [PDF] [Bibtex] [Close] @inproceedings{Dredze:2014fk, abstract = {We present HealthTweets.org, a new platform for sharing the latest research results on Twitter data with researchers and public officials. In this demo paper, we describe data collection, processing, and features of the site. The goal of this service is to transition results from research to practice.}, annote = {[<a href="http://www.healthtweets.org"><span class="pub_link">Website</span></a>]}, author = {Mark Dredze and Renyuan Cheng and Michael J Paul and David A Broniatowski}, booktitle = {AAAI Workshop on the World Wide Web and Public Health Intelligence}, date-added = {2014-05-07 01:18:42 +0000}, date-modified = {2017-08-14 20:11:44 +0000}, file = {2014_w3phi_healthtweets.pdf}, keywords = {workshop}, pages = {2-3}, title = {HealthTweets.org: A Platform for Public Health Surveillance using Twitter}, year = {2014} } [Website] We present HealthTweets.org, a new platform for sharing the latest research results on Twitter data with researchers and public officials. In this demo paper, we describe data collection, processing, and features of the site. The goal of this service is to transition results from research to practice.

		Michael J Paul, Mark Dredze, David A Broniatowski. Challenges in Influenza Forecasting and Opportunities for Social Media. AAAI Workshop on the World Wide Web and Public Health Intelligence, 2014. [Bibtex] [Close] @inproceedings{paul_dredze_aaai:14, author = {Michael J Paul and Mark Dredze and David A Broniatowski}, booktitle = {AAAI Workshop on the World Wide Web and Public Health Intelligence}, date-added = {2014-05-07 01:15:56 +0000}, date-modified = {2014-05-07 01:19:17 +0000}, keywords = {workshop}, title = {Challenges in Influenza Forecasting and Opportunities for Social Media}, year = {2014} }

		Shiliang Wang, Michael J Paul, Mark Dredze. Exploring Health Topics in Chinese Social Media: An Analysis of Sina Weibo. AAAI Workshop on the World Wide Web and Public Health Intelligence, 2014. [PDF] [Bibtex] [Close] @inproceedings{Wang:2014fk, abstract = {This paper seeks to identify and characterize health-related topics discussed on the Chinese microblogging website, Sina Weibo. We identified nearly 1 million messages containing health-related keywords, filtered from a dataset of 93 million messages spanning five years. We applied probabilistic topic models to this dataset and identified the prominent health topics. We show that a variety of health topics are discussed in Sina Weibo, and that four flu-related topics are correlated with monthly influenza case rates in China.}, author = {Shiliang Wang and Michael J Paul and Mark Dredze}, booktitle = {AAAI Workshop on the World Wide Web and Public Health Intelligence}, date-added = {2014-05-07 01:15:30 +0000}, date-modified = {2017-08-14 20:10:46 +0000}, file = {2014_w3phi_weibo.pdf}, keywords = {workshop}, pages = {20-23}, title = {Exploring Health Topics in Chinese Social Media: An Analysis of Sina Weibo}, year = {2014} } This paper seeks to identify and characterize health-related topics discussed on the Chinese microblogging website, Sina Weibo. We identified nearly 1 million messages containing health-related keywords, filtered from a dataset of 93 million messages spanning five years. We applied probabilistic topic models to this dataset and identified the prominent health topics. We show that a variety of health topics are discussed in Sina Weibo, and that four flu-related topics are correlated with monthly influenza case rates in China.

		Mo Yu, Mark Dredze. Improving Lexical Embeddings with Semantic Knowledge. Association for Computational Linguistics (ACL) (short paper), 2014. [PDF] [Bibtex] [Close] @inproceedings{Yu:2014, abstract = {Word embeddings learned on unlabeled data are a popular tool in semantics, but may not capture the desired semantics. We propose a new learning objective that incorporates both a neural language model objective and prior knowledge from semantic resources to learn improved lexical semantic embeddings. We demonstrate that our embeddings improve over those learned solely on raw text in three settings: language modeling, measuring semantic similarity, and predicting human judgements.}, annote = {[<a href="https://github.com/Gorov/JointRCM"><span class="pub_link">Code</span></a>]}, author = {Mo Yu and Mark Dredze}, booktitle = {Association for Computational Linguistics (ACL) (short paper)}, date-added = {2014-04-18 02:58:45 +0000}, date-modified = {2017-08-14 20:10:15 +0000}, file = {2014_acl_embeddings.pdf}, pages = {545-550}, title = {Improving Lexical Embeddings with Semantic Knowledge}, year = {2014} } [Code] Word embeddings learned on unlabeled data are a popular tool in semantics, but may not capture the desired semantics. We propose a new learning objective that incorporates both a neural language model objective and prior knowledge from semantic resources to learn improved lexical semantic embeddings. We demonstrate that our embeddings improve over those learned solely on raw text in three settings: language modeling, measuring semantic similarity, and predicting human judgements.

		Nanyun Peng, Yiming Wang, Mark Dredze. Learning Polylingual Topic Models from Code-Switched Social Media Documents. Association for Computational Linguistics (ACL) (short paper), 2014. [PDF] [Bibtex] [Close] @inproceedings{Peng:2014fk, abstract = {Code-switched documents are common in social media, providing evidence for polylingual topic models to infer aligned topics across languages. We present Code-Switched LDA (csLDA), which infers language specific topic distributions based on code-switched documents to facilitate multi-lingual corpus analysis. We experiment on two code-switching corpora (English-Spanish Twitter data and English-Chinese Weibo data) and show that csLDA improves perplexity over LDA, and learns semantically coherent aligned topics as judged by human annotators.}, annote = {[<a href="https://github.com/VioletPeng/csLDA/"><span class="pub_link">Code</span></a>]}, author = {Nanyun Peng and Yiming Wang and Mark Dredze}, booktitle = {Association for Computational Linguistics (ACL) (short paper)}, date-added = {2014-04-18 02:58:16 +0000}, date-modified = {2017-08-14 20:09:36 +0000}, file = {2014_acl_cslda.pdf}, pages = {674-679}, title = {Learning Polylingual Topic Models from Code-Switched Social Media Documents}, year = {2014} } [Code] Code-switched documents are common in social media, providing evidence for polylingual topic models to infer aligned topics across languages. We present Code-Switched LDA (csLDA), which infers language specific topic distributions based on code-switched documents to facilitate multi-lingual corpus analysis. We experiment on two code-switching corpora (English-Spanish Twitter data and English-Chinese Weibo data) and show that csLDA improves perplexity over LDA, and learns semantically coherent aligned topics as judged by human annotators.

		Glen A Coppersmith, Mark Dredze, Craig Harman. Quantifying Mental Health Signals in Twitter. ACL Workshop on Computational Linguistics and Clinical Psychology, 2014. [PDF] [Bibtex] [Close] @inproceedings{Coppersmith:2014fk, abstract = {The ubiquity of social media provides a rich opportunity to enhance the data available to mental health clinicians and researchers, enabling a better-informed and better-equipped mental health field. We present analysis of mental health phenomena in publicly available Twitter data, demonstrating how rigorous application of simple natural language processing methods can yield insight into specific disorders as well as mental health writ large, along with evidence that as-of-yet undiscovered linguistic signals relevant to mental health exist in social media. We present a novel method for gathering data for a range of mental illnesses quickly and cheaply, then focus on analysis of four in particular: post-traumatic stress disorder (PTSD), major depressive disorder, bipolar disorder, and seasonal affective disorder. We intend for these proof-of-concept results to inform the necessary ethical discussion regarding the balance between the utility of such data and the privacy of mental health related information.}, author = {Glen A Coppersmith and Mark Dredze and Craig Harman}, booktitle = {ACL Workshop on Computational Linguistics and Clinical Psychology}, date-added = {2014-04-13 00:42:53 +0000}, date-modified = {2017-08-14 20:09:00 +0000}, file = {2014_acl_mental_health.pdf}, keywords = {workshop}, pages = {51-60}, title = {Quantifying Mental Health Signals in Twitter}, year = {2014} } The ubiquity of social media provides a rich opportunity to enhance the data available to mental health clinicians and researchers, enabling a better-informed and better-equipped mental health field. 
We present analysis of mental health phenomena in publicly available Twitter data, demonstrating how rigorous application of simple natural language processing methods can yield insight into specific disorders as well as mental health writ large, along with evidence that as-of-yet undiscovered linguistic signals relevant to mental health exist in social media. We present a novel method for gathering data for a range of mental illnesses quickly and cheaply, then focus on analysis of four in particular: post-traumatic stress disorder (PTSD), major depressive disorder, bipolar disorder, and seasonal affective disorder. We intend for these proof-of-concept results to inform the necessary ethical discussion regarding the balance between the utility of such data and the privacy of mental health related information.

		Glen A Coppersmith, Craig Harman, Mark Dredze. Measuring Post Traumatic Stress Disorder in Twitter. International Conference on Weblogs and Social Media (ICWSM), 2014. [PDF] [Bibtex] [Close] @inproceedings{Coppersmith:2014lr, abstract = {Traditional mental health studies rely on information primarily collected and analyzed through personal contact with a health care professional. Recent work has shown the utility of social media data for studying depression, but there have been limited evaluations of other mental health conditions. We consider post traumatic stress disorder (PTSD), a serious condition that affects millions worldwide, with especially high rates in military veterans. We show how to obtain a PTSD classifier for social media using simple searches of available Twitter data, a significant reduction in training data cost compared to previous work on mental health. We demonstrate its utility by an examination of language use from PTSD individuals, and by detecting elevated rates of PTSD at and around US military bases using our classifiers.}, author = {Glen A Coppersmith and Craig Harman and Mark Dredze}, booktitle = {International Conference on Weblogs and Social Media (ICWSM)}, date-added = {2014-03-11 01:37:14 +0000}, date-modified = {2017-08-14 20:53:50 +0000}, file = {2014_icwsm_ptsd.pdf}, pages = {579-582}, title = {Measuring Post Traumatic Stress Disorder in Twitter}, year = {2014} } Traditional mental health studies rely on information primarily collected and analyzed through personal contact with a health care professional. Recent work has shown the utility of social media data for studying depression, but there have been limited evaluations of other mental health conditions. We consider post traumatic stress disorder (PTSD), a serious condition that affects millions worldwide, with especially high rates in military veterans. 
We show how to obtain a PTSD classifier for social media using simple searches of available Twitter data, a significant reduction in training data cost compared to previous work on mental health. We demonstrate its utility by an examination of language use from PTSD individuals, and by detecting elevated rates of PTSD at and around US military bases using our classifiers.

		Miles Osborne, Mark Dredze. Facebook, Twitter and Google Plus for Breaking News: Is there a winner? International Conference on Weblogs and Social Media (ICWSM), 2014. [PDF] [Bibtex] [Close] @inproceedings{Osborne:2014fk, abstract = {Twitter is widely seen as being the go to place for breaking news. Recently however, competing Social Media have begun to carry news. Here we examine how Facebook, Google Plus and Twitter report on breaking news. We consider coverage (whether news events are reported) and latency (the time when they are reported). Using data drawn from three weeks in December 2013, we identify 29 major news events, ranging from celebrity deaths, plague outbreaks to sports events. We find that all media carry the same major events, but Twitter continues to be the preferred medium for breaking news, almost consistently leading Facebook or Google Plus. Facebook and Google Plus largely repost newswire stories and their main research value is that they conveniently package multiple sources of information together.}, annote = {[<a href="https://docs.google.com/document/d/1MCyNsRzWcAG336GEOTFxoLxvbbQRiLJnc_VG45esQi0/"><span class="pub_link">Supplement</span></a>]}, author = {Miles Osborne and Mark Dredze}, booktitle = {International Conference on Weblogs and Social Media (ICWSM)}, date-added = {2014-03-11 01:37:14 +0000}, date-modified = {2017-08-14 20:54:34 +0000}, file = {2014_icwsm_news.pdf}, pages = {611-614}, title = {Facebook, Twitter and Google Plus for Breaking News: Is there a winner?}, year = {2014} } [Supplement] Twitter is widely seen as being the go to place for breaking news. Recently however, competing Social Media have begun to carry news. Here we examine how Facebook, Google Plus and Twitter report on breaking news. We consider coverage (whether news events are reported) and latency (the time when they are reported). 
Using data drawn from three weeks in December 2013, we identify 29 major news events, ranging from celebrity deaths, plague outbreaks to sports events. We find that all media carry the same major events, but Twitter continues to be the preferred medium for breaking news, almost consistently leading Facebook or Google Plus. Facebook and Google Plus largely repost newswire stories and their main research value is that they conveniently package multiple sources of information together.

		Matthew R Gormley, Margaret Mitchell, Benjamin Van Durme, Mark Dredze. Low-Resource Semantic Role Labeling. Association for Computational Linguistics (ACL), 2014. [PDF] [Bibtex] [Close] @inproceedings{Gormley:2014uq, abstract = {We explore the extent to which high-resource manual annotations such as treebanks are necessary for the task of semantic role labeling (SRL). We examine how performance changes without syntactic supervision, comparing both joint and pipelined methods to induce latent syntax. This work highlights a new application of unsupervised grammar induction and demonstrates several approaches to SRL in the absence of supervised syntax. Our best models obtain competitive results in the high-resource setting and state-of-the-art results in the low resource setting, reaching 72.48% F1 averaged across languages. We release our code for this work along with a larger toolkit for specifying arbitrary graphical structure.}, annote = {[<a href="http://www.cs.jhu.edu/~mrg/software/"><span class="pub_link">Code</span></a>]}, author = {Matthew R. Gormley and Margaret Mitchell and Benjamin Van Durme and Mark Dredze}, booktitle = {Association for Computational Linguistics (ACL)}, date-added = {2014-03-06 17:04:56 +0000}, date-modified = {2017-08-09 19:42:46 +0000}, file = {2014_acl_srl.pdf}, pages = {1177-1187}, title = {Low-Resource Semantic Role Labeling}, year = {2014} } [Code] We explore the extent to which high-resource manual annotations such as treebanks are necessary for the task of semantic role labeling (SRL). We examine how performance changes without syntactic supervision, comparing both joint and pipelined methods to induce latent syntax. This work highlights a new application of unsupervised grammar induction and demonstrates several approaches to SRL in the absence of supervised syntax. 
Our best models obtain competitive results in the high-resource setting and state-of-the-art results in the low resource setting, reaching 72.48% F1 averaged across languages. We release our code for this work along with a larger toolkit for specifying arbitrary graphical structure.

		Nicholas Andrews, Jason Eisner, Mark Dredze. Robust Entity Clustering via Phylogenetic Inference. Association for Computational Linguistics (ACL), 2014. [PDF] [Bibtex] [Close] @inproceedings{Andrews:2014fk, abstract = {Entity clustering must determine when two named-entity mentions refer to the same entity. Typical approaches use a pipeline architecture that clusters the mentions using fixed or learned measures of name and context similarity. In this paper, we propose a model for cross-document coreference resolution that achieves robustness by learning similarity from unlabeled data. The generative process assumes that each entity mention arises from copying and optionally mutating an earlier name from a similar context. Clustering the mentions into entities depends on recovering this copying tree jointly with estimating models of the mutation process and parent selection process. We present a block Gibbs sampler for posterior inference and an empirical evaluation on several datasets. On a challenging Twitter corpus, our method outperforms the best baseline by 12.6 points of F1 score.}, annote = {[<a href="https://bitbucket.org/noandrews/phyloinf"><span class="pub_link">Code</span></a>]}, author = {Nicholas Andrews and Jason Eisner and Mark Dredze}, booktitle = {Association for Computational Linguistics (ACL)}, date-added = {2014-03-06 17:02:08 +0000}, date-modified = {2017-08-09 19:42:30 +0000}, file = {2014_acl_phylo.pdf}, pages = {775-785}, title = {Robust Entity Clustering via Phylogenetic Inference}, year = {2014} } [Code] Entity clustering must determine when two named-entity mentions refer to the same entity. Typical approaches use a pipeline architecture that clusters the mentions using fixed or learned measures of name and context similarity. In this paper, we propose a model for cross-document coreference resolution that achieves robustness by learning similarity from unlabeled data. 
The generative process assumes that each entity mention arises from copying and optionally mutating an earlier name from a similar context. Clustering the mentions into entities depends on recovering this copying tree jointly with estimating models of the mutation process and parent selection process. We present a block Gibbs sampler for posterior inference and an empirical evaluation on several datasets. On a challenging Twitter corpus, our method outperforms the best baseline by 12.6 points of F1 score.

		John W Ayers, Benjamin M Althouse, Mark Dredze. Could Behavioral Medicine Lead the Web Data Revolution? Journal of the American Medical Association (JAMA), 2014;311(14):1399--1400. [PDF] [Bibtex] [Close] @article{Ayers:2014fk, author = {John W. Ayers and Benjamin M. Althouse and Mark Dredze}, date-added = {2014-02-26 15:14:51 +0000}, date-modified = {2017-08-14 20:16:04 +0000}, file = {http://jama.jamanetwork.com/article.aspx?articleid=1838433}, journal = {Journal of the American Medical Association (JAMA)}, keywords = {selected}, month = {February 27}, number = {14}, pages = {1399--1400}, title = {Could Behavioral Medicine Lead the Web Data Revolution?}, volume = {311}, year = {2014} }

		John W Ayers, Benjamin M Althouse, Morgan Johnson, Mark Dredze, Joanna E Cohen. What's the Healthiest Day? Circaseptan (Weekly) Rhythms in Healthy Considerations. American Journal of Preventive Medicine (AJPM), 2014;47(1):73-76. [PDF] [Bibtex] [Close] @article{Ayers:2014lr, author = {John W. Ayers and Benjamin M. Althouse and Morgan Johnson and Mark Dredze and Joanna E. Cohen}, date-added = {2014-01-28 00:49:18 +0000}, date-modified = {2017-08-09 19:41:54 +0000}, file = {http://www.ajpmonline.org/article/S0749-3797(14)00099-3/abstract}, journal = {American Journal of Preventive Medicine (AJPM)}, number = {1}, pages = {73-76}, title = {What's the Healthiest Day? Circaseptan (Weekly) Rhythms in Healthy Considerations}, volume = {47}, year = {2014} }

		Benjamin M Althouse, Jon-Patrick Allem, Matt Childers, Mark Dredze, John W Ayers. Population Health Concerns During the United States' Great Recession. American Journal of Preventive Medicine (AJPM), 2014;46(2):166-170. [PDF] [Bibtex] [Close] @article{Althouse:2014lr, author = {Benjamin M Althouse and Jon-Patrick Allem and Matt Childers and Mark Dredze and John W Ayers}, date-added = {2013-10-16 17:02:34 +0000}, date-modified = {2018-10-25 23:45:58 -0400}, file = {http://www.ajpmonline.org/article/S0749-3797(13)00581-3/abstract}, journal = {American Journal of Preventive Medicine (AJPM)}, month = {February}, number = {2}, pages = {166-170}, title = {Population Health Concerns During the United States' Great Recession}, volume = {46}, year = {2014} }

		2013 (14 Publications)
		Mark Dredze, Bill Schilit. Facet suggestion for search query augmentation. US Patent 8,433,705, 2013. [Bibtex] [Close] @patent{dredze2013facet, author = {Mark Dredze and Bill Schilit}, date-added = {2016-01-25 03:56:34 +0000}, date-modified = {2016-02-02 13:58:56 +0000}, month = {April 30}, note = {US Patent 8,433,705}, publisher = {Google Patents}, title = {Facet suggestion for search query augmentation}, year = {2013} }

		David A Broniatowski, Michael J Paul, Mark Dredze. National and Local Influenza Surveillance through Twitter: An Analysis of the 2012-2013 Influenza Epidemic. PLOS ONE, 2013. [PDF] [Bibtex] [Close] @article{Paul:2013lm, abstract = {Social media have been proposed as a data source for influenza surveillance because they have the potential to offer real-time access to millions of short, geographically localized messages containing information regarding personal well-being. However, accuracy of social media surveillance systems declines with media attention because media attention increases ``chatter'' -- messages that are about influenza but that do not pertain to an actual infection -- masking signs of true influenza prevalence. This paper summarizes our recently developed influenza infection detection algorithm that automatically distinguishes relevant tweets from other chatter, and we describe our current influenza surveillance system which was actively deployed during the full 2012-2013 influenza season. Our objective was to analyze the performance of this system during the most recent 2012--2013 influenza season and to analyze the performance at multiple levels of geographic granularity, unlike past studies that focused on national or regional surveillance. Our system's influenza prevalence estimates were strongly correlated with surveillance data from the Centers for Disease Control and Prevention for the United States (r = 0.93, p < 0.001) as well as surveillance data from the Department of Health and Mental Hygiene of New York City (r = 0.88, p < 0.001). Our system detected the weekly change in direction (increasing or decreasing) of influenza prevalence with 85% accuracy, a nearly twofold increase over a simpler model, demonstrating the utility of explicitly distinguishing infection tweets from other chatter.}, author = {David A Broniatowski and Michael J. 
Paul and Mark Dredze}, date-added = {2013-11-15 01:51:22 +0000}, date-modified = {2017-08-09 19:39:10 +0000}, file = {http://dx.plos.org/10.1371/journal.pone.0083672}, journal = {PLOS ONE}, month = {December 9}, number = {12}, title = {National and Local Influenza Surveillance through Twitter: An Analysis of the 2012-2013 Influenza Epidemic}, volume = {8}, year = {2013} } Social media have been proposed as a data source for influenza surveillance because they have the potential to offer real-time access to millions of short, geographically localized messages containing information regarding personal well-being. However, accuracy of social media surveillance systems declines with media attention because media attention increases ``chatter'' -- messages that are about influenza but that do not pertain to an actual infection -- masking signs of true influenza prevalence. This paper summarizes our recently developed influenza infection detection algorithm that automatically distinguishes relevant tweets from other chatter, and we describe our current influenza surveillance system which was actively deployed during the full 2012-2013 influenza season. Our objective was to analyze the performance of this system during the most recent 2012--2013 influenza season and to analyze the performance at multiple levels of geographic granularity, unlike past studies that focused on national or regional surveillance. Our system's influenza prevalence estimates were strongly correlated with surveillance data from the Centers for Disease Control and Prevention for the United States (r = 0.93, p < 0.001) as well as surveillance data from the Department of Health and Mental Hygiene of New York City (r = 0.88, p < 0.001). Our system detected the weekly change in direction (increasing or decreasing) of influenza prevalence with 85% accuracy, a nearly twofold increase over a simpler model, demonstrating the utility of explicitly distinguishing infection tweets from other chatter.

		Travis Wolfe, Benjamin Van Durme, Mark Dredze, Nicholas Andrews, Charley Beller, Chris Callison-Burch, Jay DeYoung, Justin Snyder, Jonathan Weese, Tan Xu, Xuchen Yao. PARMA: A Predicate Argument Aligner. Association for Computational Linguistics (ACL) (short paper), 2013. [PDF] [Bibtex] [Close] @inproceedings{Wolfe:2013lr, abstract = {We introduce PARMA, a system for cross-document, semantic predicate and argument alignment. Our system integrates popular lexical semantic resources into a simple discriminative model. PARMA achieves state of the art results. We suggest that existing efforts have focussed on data that is too easy, and we provide a more difficult dataset based on MT translation references which has a lower baseline which we beat by 17% absolute F1. }, author = {Travis Wolfe and Benjamin Van Durme and Mark Dredze and Nicholas Andrews and Charley Beller and Chris Callison-Burch and Jay DeYoung and Justin Snyder and Jonathan Weese and Tan Xu and Xuchen Yao}, booktitle = {Association for Computational Linguistics (ACL) (short paper)}, date-added = {2013-05-13 10:06:54 -0400}, date-modified = {2017-08-09 19:24:06 +0000}, file = {http://www.aclweb.org/anthology/P13-2012}, pages = {63-68}, title = {PARMA: A Predicate Argument Aligner}, year = {2013} } We introduce PARMA, a system for cross-document, semantic predicate and argument alignment. Our system integrates popular lexical semantic resources into a simple discriminative model. PARMA achieves state of the art results. We suggest that existing efforts have focussed on data that is too easy, and we provide a more difficult dataset based on MT translation references which has a lower baseline which we beat by 17% absolute F1.

		Carolina Parada, Mark Dredze, Abhinav Sethy, Ariya Rastrow. Sub-Lexical and Contextual Modeling of Out-of-Vocabulary Words in Speech Recognition. Technical Report 10, Human Language Technology Center of Excellence, Johns Hopkins University, 2013. [PDF] [Bibtex] [Close] @techreport{parada_tech, abstract = {Large vocabulary speech recognition systems fail to recognize words beyond their vocabulary, many of which are information rich terms, like named entities or foreign words. Hybrid word/sub-word systems solve this problem by adding sub-word units to large vocabulary word based systems; new words can then be represented by combinations of sub-word units. We present a novel probabilistic model to learn the sub-word lexicon optimized for a given task. We consider the task of out-of-vocabulary (OOV) word detection, which relies on output from a hybrid system. We combine the proposed hybrid system with confidence based metrics to improve OOV detection performance. Previous work addresses OOV detection as a binary classification task, where each region is independently classified using local information. 
We propose to treat OOV detection as a sequence labeling problem, and we show that 1) jointly predicting out-of-vocabulary regions, 2) including contextual information from each region, and 3) learning sub-lexical units optimized for this task, lead to substantial improvements with respect to state-of-the-art on an English Broadcast News and MIT Lectures task.}, author = {Carolina Parada and Mark Dredze and Abhinav Sethy and Ariya Rastrow}, date-added = {2013-05-08 01:41:17 -0400}, date-modified = {2013-05-08 01:41:17 -0400}, file = {oovdet_techreport.pdf}, institution = {Human Language Technology Center of Excellence, Johns Hopkins University}, number = {10}, title = {Sub-Lexical and Contextual Modeling of Out-of-Vocabulary Words in Speech Recognition}, year = {2013} } Large vocabulary speech recognition systems fail to recognize words beyond their vocabulary, many of which are information rich terms, like named entities or foreign words. Hybrid word/sub-word systems solve this problem by adding sub-word units to large vocabulary word based systems; new words can then be represented by combinations of sub-word units. We present a novel probabilistic model to learn the sub-word lexicon optimized for a given task. We consider the task of out-of-vocabulary (OOV) word detection, which relies on output from a hybrid system. We combine the proposed hybrid system with confidence based metrics to improve OOV detection performance. Previous work addresses OOV detection as a binary classification task, where each region is independently classified using local information. We propose to treat OOV detection as a sequence labeling problem, and we show that 1) jointly predicting out-of-vocabulary regions, 2) including contextual information from each region, and 3) learning sub-lexical units optimized for this task, lead to substantial improvements with respect to state-of-the-art on an English Broadcast News and MIT Lectures task.

		Mark Dredze, Michael J Paul, Shane Bergsma, Hieu Tran. Carmen: A Twitter Geolocation System with Applications to Public Health. AAAI Workshop on Expanding the Boundaries of Health Informatics Using AI (HIAI), 2013. [PDF] [Bibtex] [Close] @inproceedings{Dredze:2013a, abstract = {Public health applications using social media often require accurate, broad-coverage location information. However, the standard information provided by social media APIs, such as Twitter, covers a limited number of messages. This paper presents Carmen, a geolocation system that can determine structured location information for messages provided by the Twitter API. Our system utilizes geocoding tools and a combination of automatic and manual alias resolution methods to infer location structures from GPS positions and user-provided profile data. We show that our system is accurate and covers many locations, and we demonstrate its utility for improving influenza surveillance.}, annote = {[<a href="https://github.com/mdredze/carmen"><span class="pub_link">Code</span></a>]}, author = {Mark Dredze and Michael J Paul and Shane Bergsma and Hieu Tran}, booktitle = {AAAI Workshop on Expanding the Boundaries of Health Informatics Using AI (HIAI)}, date-added = {2013-04-21 18:29:38 -0400}, date-modified = {2013-06-06 11:18:47 -0400}, file = {aaai13_geo.pdf}, keywords = {workshop}, title = {Carmen: A Twitter Geolocation System with Applications to Public Health}, year = {2013} } [Code] Public health applications using social media often require accurate, broad-coverage location information. However, the standard information provided by social media APIs, such as Twitter, covers a limited number of messages. This paper presents Carmen, a geolocation system that can determine structured location information for messages provided by the Twitter API. 
Our system utilizes geocoding tools and a combination of automatic and manual alias resolution methods to infer location structures from GPS positions and user-provided profile data. We show that our system is accurate and covers many locations, and we demonstrate its utility for improving influenza surveillance.

		Michael J Paul, Byron C Wallace, Mark Dredze. What Affects Patient (Dis)satisfaction? Analyzing Online Doctor Ratings with a Joint Topic-Sentiment Model. AAAI Workshop on Expanding the Boundaries of Health Informatics Using AI (HIAI), 2013. [PDF] [Bibtex] [Close] @inproceedings{Paul:2013fk, abstract = {We analyze patient reviews of doctors using a novel probabilistic joint model of aspect and sentiment based on factorial LDA. We leverage this model to exploit a small set of previously annotated reviews to automatically analyze the topics and sentiment latent in over 50,000 online reviews of physicians (and we make this dataset publicly available). The proposed model outperforms baseline models for this task with respect to model perplexity and sentiment classification. We report the most representative words with respect to positive and negative sentiment along three clinical aspects, thus complementing existing qualitative work exploring patient reviews of physicians. }, author = {Michael J Paul and Byron C Wallace and Mark Dredze}, booktitle = {AAAI Workshop on Expanding the Boundaries of Health Informatics Using AI (HIAI)}, date-added = {2013-04-21 18:28:58 -0400}, date-modified = {2013-06-06 11:20:37 -0400}, file = {aaai13_flda_sentiment.pdf}, keywords = {workshop}, title = {What Affects Patient (Dis)satisfaction? Analyzing Online Doctor Ratings with a Joint Topic-Sentiment Model}, year = {2013} } We analyze patient reviews of doctors using a novel probabilistic joint model of aspect and sentiment based on factorial LDA. We leverage this model to exploit a small set of previously annotated reviews to automatically analyze the topics and sentiment latent in over 50,000 online reviews of physicians (and we make this dataset publicly available). The proposed model outperforms baseline models for this task with respect to model perplexity and sentiment classification. 
We report the most representative words with respect to positive and negative sentiment along three clinical aspects, thus complementing existing qualitative work exploring patient reviews of physicians.

		Justin Snyder, Rebecca Knowles, Mark Dredze, Matthew R Gormley, Travis Wolfe. Topic Models and Metadata for Visualizing Text Corpora. North American Chapter of the Association for Computational Linguistics (NAACL) (Demo Paper), 2013. [PDF] [Bibtex] [Close] @inproceedings{Snyder:2013lr, abstract = {Effectively exploring and analyzing large text corpora requires visualizations that provide a high level summary. Past work has relied on faceted browsing of document metadata or on natural language processing of document text. In this paper, we present a new web-based tool that integrates topics learned from an unsupervised topic model in a faceted browsing experience. The user can manage topics, filter documents by topic and summarize views with metadata and topic graphs. We report a user study of the usefulness of topics in our tool.}, author = {Justin Snyder and Rebecca Knowles and Mark Dredze and Matthew R. Gormley and Travis Wolfe}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL) (Demo Paper)}, date-added = {2013-03-29 00:15:56 -0400}, date-modified = {2017-08-09 19:23:04 +0000}, file = {http://aclweb.org/anthology/N/N13/N13-3002.pdf}, pages = {5-9}, title = {Topic Models and Metadata for Visualizing Text Corpora}, year = {2013} } Effectively exploring and analyzing large text corpora requires visualizations that provide a high level summary. Past work has relied on faceted browsing of document metadata or on natural language processing of document text. In this paper, we present a new web-based tool that integrates topics learned from an unsupervised topic model in a faceted browsing experience. The user can manage topics, filter documents by topic and summarize views with metadata and topic graphs. We report a user study of the usefulness of topics in our tool.

		Damianos Karakos, Mark Dredze, Sanjeev Khudanpur. Estimating Confusions in the ASR Channel for Improved Topic-based Language Model Adaptation. Technical Report 8, Johns Hopkins University, 2013. [PDF] [Bibtex] [Close] @techreport{Karakos:2013fk, abstract = {Human language is a combination of elemental languages/domains/styles that change across and sometimes within discourses. Language models, which play a crucial role in speech recognizers and machine translation systems, are particularly sensitive to such changes, unless some form of adaptation takes place. One approach to speech language model adaptation is self-training, in which a language model's parameters are tuned based on automatically transcribed audio. However, transcription errors can misguide self-training, particularly in challenging settings such as conversational speech. In this work, we propose a model that considers the confusions (errors) of the ASR channel. By modeling the likely confusions in the ASR output instead of using just the 1-best, we improve self-training efficacy by obtaining a more reliable reference transcription estimate. We demonstrate improved topic-based language modeling adaptation results over both 1-best and lattice self-training using our ASR channel confusion estimates on telephone conversations.}, address = {http://arxiv.org/abs/1303.5148}, author = {Damianos Karakos and Mark Dredze and Sanjeev Khudanpur}, date-added = {2013-03-21 21:43:41 -0400}, date-modified = {2013-06-06 11:23:31 -0400}, file = {http://arxiv.org/abs/1303.5148}, institution = {Johns Hopkins University}, number = {8}, title = {Estimating Confusions in the ASR Channel for Improved Topic-based Language Model Adaptation}, year = {2013} } Human language is a combination of elemental languages/domains/styles that change across and sometimes within discourses. 
Language models, which play a crucial role in speech recognizers and machine translation systems, are particularly sensitive to such changes, unless some form of adaptation takes place. One approach to speech language model adaptation is self-training, in which a language model's parameters are tuned based on automatically transcribed audio. However, transcription errors can misguide self-training, particularly in challenging settings such as conversational speech. In this work, we propose a model that considers the confusions (errors) of the ASR channel. By modeling the likely confusions in the ASR output instead of using just the 1-best, we improve self-training efficacy by obtaining a more reliable reference transcription estimate. We demonstrate improved topic-based language modeling adaptation results over both 1-best and lattice self-training using our ASR channel confusion estimates on telephone conversations.

		Shane Bergsma, Mark Dredze, Benjamin Van Durme, Theresa Wilson, David Yarowsky. Broadly Improving User Classification via Communication-Based Name and Location Clustering on Twitter. North American Chapter of the Association for Computational Linguistics (NAACL), 2013. [PDF] [Bibtex] [Close] @inproceedings{bergsma:2013, abstract = {Hidden properties of social media users, such as their ethnicity, gender, and location, are often reflected in their observed attributes, such as their first and last names. Furthermore, users who communicate with each other often have similar hidden properties. We propose an algorithm that exploits these insights to cluster the observed attributes of hundreds of millions of Twitter users. Attributes such as user names are grouped together if users with those names communicate with other similar users. We separately cluster millions of unique first names, last names, and user-provided locations. The efficacy of these clusters is then evaluated on a diverse set of classification tasks that predict hidden user properties such as ethnicity, geographic location, gender, language, and race, using only profile names and locations when appropriate. 
Our readily-replicable approach and publicly released clusters are shown to be remarkably effective and versatile, substantially outperforming state-of-the-art approaches and human accuracy on each of the tasks studied.}, annote = {[<a href="http://old-site.clsp.jhu.edu/~sbergsma/TwitterClusters/"><span class="pub_link">Data</span></a>]}, author = {Shane Bergsma and Mark Dredze and Benjamin Van Durme and Theresa Wilson and David Yarowsky}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, date-added = {2013-02-13 12:54:21 -0500}, date-modified = {2017-08-09 19:22:43 +0000}, file = {http://aclweb.org/anthology/N/N13/N13-1121.pdf}, pages = {1010-1019}, title = {Broadly Improving User Classification via Communication-Based Name and Location Clustering on Twitter}, year = {2013} } [Data] Hidden properties of social media users, such as their ethnicity, gender, and location, are often reflected in their observed attributes, such as their first and last names. Furthermore, users who communicate with each other often have similar hidden properties. We propose an algorithm that exploits these insights to cluster the observed attributes of hundreds of millions of Twitter users. Attributes such as user names are grouped together if users with those names communicate with other similar users. We separately cluster millions of unique first names, last names, and user-provided locations. The efficacy of these clusters is then evaluated on a diverse set of classification tasks that predict hidden user properties such as ethnicity, geographic location, gender, language, and race, using only profile names and locations when appropriate. Our readily-replicable approach and publicly released clusters are shown to be remarkably effective and versatile, substantially outperforming state-of-the-art approaches and human accuracy on each of the tasks studied.

		Mahesh Joshi, Mark Dredze, William W Cohen, Carolyn P Rose. What's in a Domain? Multi-Domain Learning for Multi-Attribute Data. North American Chapter of the Association for Computational Linguistics (NAACL) (short paper), 2013. [PDF] [Bibtex] [Close] @inproceedings{joshi:2013, abstract = {Multi-Domain learning assumes that a single metadata attribute is used in order to divide the data into so-called domains. However, real-world datasets often have multiple metadata attributes that can divide the data into domains. It is not always apparent which single attribute will lead to the best domains, and more than one attribute might impact classification. We propose extensions to two multi-domain learning techniques for our multi-attribute setting, enabling them to simultaneously learn from several metadata attributes. Experimentally, they outperform the multi-domain learning baseline, even when it selects the single "best" attribute.}, author = {Mahesh Joshi and Mark Dredze and William W. Cohen and Carolyn P. Rose}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL) (short paper)}, date-added = {2013-02-13 12:53:55 -0500}, date-modified = {2017-08-09 19:22:23 +0000}, file = {http://aclweb.org/anthology/N/N13/N13-1080.pdf}, pages = {685-690}, title = {What's in a Domain? Multi-Domain Learning for Multi-Attribute Data}, year = {2013} } Multi-Domain learning assumes that a single metadata attribute is used in order to divide the data into so-called domains. However, real-world datasets often have multiple metadata attributes that can divide the data into domains. It is not always apparent which single attribute will lead to the best domains, and more than one attribute might impact classification. We propose extensions to two multi-domain learning techniques for our multi-attribute setting, enabling them to simultaneously learn from several metadata attributes. 
Experimentally, they outperform the multi-domain learning baseline, even when it selects the single "best" attribute.

		Alex Lamb, Michael J Paul, Mark Dredze. Separating Fact from Fear: Tracking Flu Infections on Twitter. North American Chapter of the Association for Computational Linguistics (NAACL) (short paper), 2013. [PDF] [Bibtex] [Close] @inproceedings{lamb:2013, abstract = {Twitter has been shown to be a fast and reliable method for disease surveillance of common illnesses like influenza. However, previous work has relied on simple content analysis, which conflates flu tweets that report infection with those that express concerned awareness of the flu. By discriminating these categories, as well as tweets about the authors versus about others, we demonstrate significant improvements on influenza surveillance using Twitter.}, annote = {[<a href="http://www.cs.jhu.edu/~mdredze/datasets/naacl13_flu_data.zip"><span class="pub_link">Data</span></a>]}, author = {Alex Lamb and Michael J. Paul and Mark Dredze}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL) (short paper)}, date-added = {2013-02-13 12:53:30 -0500}, date-modified = {2017-08-09 19:22:03 +0000}, file = {naacl_2013_flu.pdf}, pages = {789-795}, title = {Separating Fact from Fear: Tracking Flu Infections on Twitter}, year = {2013} } [Data] Twitter has been shown to be a fast and reliable method for disease surveillance of common illnesses like influenza. However, previous work has relied on simple content analysis, which conflates flu tweets that report infection with those that express concerned awareness of the flu. By discriminating these categories, as well as tweets about the authors versus about others, we demonstrate significant improvements on influenza surveillance using Twitter.

		Michael J Paul, Mark Dredze. Drug Extraction from the Web: Summarizing Drug Experiences with Multi-Dimensional Topic Models. North American Chapter of the Association for Computational Linguistics (NAACL), 2013. [PDF] [Bibtex] [Close] @inproceedings{Paul:2013, abstract = {Multi-dimensional latent text models, such as factorial LDA (f-LDA), capture multiple factors of corpora, creating structured output for researchers to better understand the contents of a corpus. We consider such models for clinical research of new recreational drugs and trends, an important application for mining current information for healthcare workers. We use a "three-dimensional" f-LDA variant to jointly model combinations of drug (marijuana, salvia, etc.), aspect (effects, chemistry, etc.) and route of administration (smoking, oral, etc.). Since a purely unsupervised topic model is unlikely to discover these specific factors of interest, we develop a novel method of incorporating prior knowledge by leveraging user generated tags as priors in our model. We demonstrate that this model can be used as an exploratory tool for learning about these drugs from the Web by applying it to the task of extractive summarization. In addition to providing useful output for this important public health task, our prior-enriched model provides a framework for the application of f-LDA to other tasks.}, author = {Michael J Paul and Mark Dredze}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, date-added = {2013-02-13 12:52:52 -0500}, date-modified = {2017-08-09 19:21:39 +0000}, file = {naacl_2013_drugs.pdf}, pages = {168-178}, title = {Drug Extraction from the Web: Summarizing Drug Experiences with Multi-Dimensional Topic Models}, year = {2013} } Multi-dimensional latent text models, such as factorial LDA (f-LDA), capture multiple factors of corpora, creating structured output for researchers to better understand the contents of a corpus. 
We consider such models for clinical research of new recreational drugs and trends, an important application for mining current information for healthcare workers. We use a "three-dimensional" f-LDA variant to jointly model combinations of drug (marijuana, salvia, etc.), aspect (effects, chemistry, etc.) and route of administration (smoking, oral, etc.). Since a purely unsupervised topic model is unlikely to discover these specific factors of interest, we develop a novel method of incorporating prior knowledge by leveraging user generated tags as priors in our model. We demonstrate that this model can be used as an exploratory tool for learning about these drugs from the Web by applying it to the task of extractive summarization. In addition to providing useful output for this important public health task, our prior-enriched model provides a framework for the application of f-LDA to other tasks.

		Koby Crammer, Alex Kulesza, Mark Dredze. Adaptive Regularization of Weight Vectors. Machine Learning, 2013;91(2):155-187. [PDF] [Bibtex] [Close] @article{Crammer:2013fk, abstract = {We present AROW, an online learning algorithm for binary and multiclass problems that combines large margin training, confidence weighting, and the capacity to handle non-separable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform especially well in the presence of label noise. We derive mistake bounds for the binary and multiclass settings that are similar in form to the second order perceptron bound. Our bounds do not assume separability. We also relate our algorithm to recent confidence-weighted online learning techniques. Empirical evaluations show that AROW achieves state-of-the-art performance on a wide range of binary and multiclass tasks, as well as robustness in the face of non-separable data.}, author = {Koby Crammer and Alex Kulesza and Mark Dredze}, date-added = {2012-12-31 13:42:35 -0500}, date-modified = {2013-04-14 02:05:41 -0400}, doi = {10.1007/s10994-013-5327-x}, file = {http://link.springer.com/article/10.1007%2Fs10994-013-5327-x}, journal = {Machine Learning}, number = {2}, pages = {155-187}, title = {Adaptive Regularization of Weight Vectors}, volume = {91}, year = {2013}, bdsk-url-1 = {http://dx.doi.org/10.1007/s10994-013-5327-x} } We present AROW, an online learning algorithm for binary and multiclass problems that combines large margin training, confidence weighting, and the capacity to handle non-separable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform especially well in the presence of label noise. We derive mistake bounds for the binary and multiclass settings that are similar in form to the second order perceptron bound. Our bounds do not assume separability. 
We also relate our algorithm to recent confidence-weighted online learning techniques. Empirical evaluations show that AROW achieves state-of-the-art performance on a wide range of binary and multiclass tasks, as well as robustness in the face of non-separable data.

		Delip Rao, Paul McNamee, Mark Dredze. Entity Linking: Finding Extracted Entities in a Knowledge Base. Multi-source, Multi-lingual Information Extraction and Summarization, 2013. [Bibtex] [Close] @incollection{Rao:2011fk, abstract = {In the menagerie of tasks for information extraction, entity linking is a new beast that has drawn a lot of attention from NLP practitioners and researchers recently. Entity Linking, also referred to as record linkage or entity resolution, involves aligning a textual mention of a named-entity to an appropriate entry in a knowledge base, which may or may not contain the entity. This has manifold applications ranging from linking patient health records to maintaining personal credit files, prevention of identity crimes, and supporting law enforcement. We discuss the key challenges present in this task and we present a high-performing system that links entities using max-margin ranking. We also summarize recent work in this area and describe several open research problems.}, author = {Delip Rao and Paul McNamee and Mark Dredze}, booktitle = {Multi-source, Multi-lingual Information Extraction and Summarization}, date-added = {2011-06-24 18:01:02 -0400}, date-modified = {2017-08-09 19:19:52 +0000}, pages = {93-115}, publisher = {Springer Berlin Heidelberg}, title = {Entity Linking: Finding Extracted Entities in a Knowledge Base}, year = {2013} } In the menagerie of tasks for information extraction, entity linking is a new beast that has drawn a lot of attention from NLP practitioners and researchers recently. Entity Linking, also referred to as record linkage or entity resolution, involves aligning a textual mention of a named-entity to an appropriate entry in a knowledge base, which may or may not contain the entity. This has manifold applications ranging from linking patient health records to maintaining personal credit files, prevention of identity crimes, and supporting law enforcement. 
We discuss the key challenges present in this task and we present a high-performing system that links entities using max-margin ranking. We also summarize recent work in this area and describe several open research problems.

		2012 (17 Publications)
		Kristian Hammond, Jerome Budzik, Lawrence Birnbaum, Kevin Livingston, Mark Dredze. Request initiated collateral content offering. US Patent 8,260,874, 2012. [Bibtex] [Close] @patent{hammond2012request, author = {Kristian Hammond and Jerome Budzik and Lawrence Birnbaum and Kevin Livingston and Mark Dredze}, date-added = {2016-01-25 03:57:21 +0000}, date-modified = {2018-10-25 23:40:17 -0400}, month = {September~4}, note = {US Patent 8,260,874}, publisher = {Google Patents}, title = {Request initiated collateral content offering}, year = {2012} }

		Mark Dredze. How Social Media Will Change Public Health. IEEE Intelligent Systems, 2012;27(4):81-84. [Bibtex] [Close] @article{Dredze:2012qy, abstract = {Recent work in machine learning and natural language processing has studied the health content of tweets and demonstrated the potential for extracting useful public health information from their aggregation. This article examines the types of health topics discussed on Twitter, and how tweets can both augment existing public health capabilities and enable new ones. The author also discusses key challenges that researchers must address to deliver high-quality tools to the public health community.}, annote = {[<a href="http://ieeexplore.ieee.org/arnumber=6285937"><span class="pub_link">Link</span></a>]}, author = {Mark Dredze}, date-added = {2012-09-03 14:03:36 -0400}, date-modified = {2012-09-03 14:04:08 -0400}, journal = {IEEE Intelligent Systems}, number = {4}, pages = {81-84}, title = {How Social Media Will Change Public Health}, volume = {27}, year = {2012} } [Link] Recent work in machine learning and natural language processing has studied the health content of tweets and demonstrated the potential for extracting useful public health information from their aggregation. This article examines the types of health topics discussed on Twitter, and how tweets can both augment existing public health capabilities and enable new ones. The author also discusses key challenges that researchers must address to deliver high-quality tools to the public health community.

		Michael J Paul, Mark Dredze. Factorial LDA: Sparse Multi-Dimensional Text Models. Neural Information Processing Systems (NIPS), 2012. [PDF] [Bibtex] [Close] @inproceedings{Paul:2012lr, abstract = {Latent variable models can be enriched with a multi-dimensional structure to consider the many latent factors in a text corpus, such as topic, author perspective and sentiment. We introduce factorial LDA, a multi-dimensional model in which a document is influenced by K different factors, and each word token depends on a K-dimensional vector of latent variables. Our model incorporates structured word priors and learns a sparse product of factors. Experiments on research abstracts show that our model can learn latent factors such as research topic, scientific discipline, and focus (methods vs. applications). Our modeling improvements reduce test perplexity and improve human interpretability of the discovered factors.}, author = {Michael J. Paul and Mark Dredze}, booktitle = {Neural Information Processing Systems (NIPS)}, date-added = {2012-09-03 14:02:17 -0400}, date-modified = {2017-02-20 17:40:15 +0000}, file = {nips_2012-flda.pdf}, title = {Factorial LDA: Sparse Multi-Dimensional Text Models}, year = {2012} } Latent variable models can be enriched with a multi-dimensional structure to consider the many latent factors in a text corpus, such as topic, author perspective and sentiment. We introduce factorial LDA, a multi-dimensional model in which a document is influenced by K different factors, and each word token depends on a K-dimensional vector of latent variables. Our model incorporates structured word priors and learns a sparse product of factors. Experiments on research abstracts show that our model can learn latent factors such as research topic, scientific discipline, and focus (methods vs. applications). Our modeling improvements reduce test perplexity and improve human interpretability of the discovered factors.

		Alex Lamb, Michael J Paul, Mark Dredze. Investigating Twitter as a Source for Studying Behavioral Responses to Epidemics. AAAI Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text, 2012. [PDF] [Bibtex] [Close] @inproceedings{lamb:2012, abstract = {We present preliminary results for mining concerned awareness of influenza tweets. We describe our data set construction and experiments with binary classification of data into influenza versus general messages and classification into concerned awareness and existing infection.}, author = {Alex Lamb and Michael J. Paul and Mark Dredze}, booktitle = {AAAI Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text}, date-added = {2012-08-02 23:20:09 -0400}, date-modified = {2012-08-02 23:20:27 -0400}, file = {aaai_2012_flu_concern.pdf}, keywords = {workshop}, title = {Investigating Twitter as a Source for Studying Behavioral Responses to Epidemics}, year = {2012} } We present preliminary results for mining concerned awareness of influenza tweets. We describe our data set construction and experiments with binary classification of data into influenza versus general messages and classification into concerned awareness and existing infection.

		Atul Nakhasi, Ralph J Passarella, Sarah G Bell, Michael J Paul, Mark Dredze, Peter J Pronovost. Malpractice and Malcontent: Analyzing Medical Complaints in Twitter. AAAI Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text, 2012. [PDF] [Bibtex] [Close] @inproceedings{nakhasi:2012, abstract = {In this paper we report preliminary results from a study of Twitter to identify patient safety reports, which offer an immediate, untainted, and expansive patient perspective unlike any other mechanism to date for this topic. We identify patient safety related tweets and characterize them by which medical populations caused errors, who reported these errors, what types of errors occurred, and what emotional states were expressed in response. Our long term goal is to improve the handling and reduction of errors by incorporating this patient input into the patient safety process.}, author = {Atul Nakhasi and Ralph J Passarella and Sarah G Bell and Michael J Paul and Mark Dredze and Peter J Pronovost}, booktitle = {AAAI Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text}, date-added = {2012-08-02 23:19:33 -0400}, date-modified = {2012-09-07 14:08:31 -0400}, file = {aaai_2012_patient_safety.pdf}, keywords = {workshop}, title = {Malpractice and Malcontent: Analyzing Medical Complaints in Twitter}, year = {2012} } In this paper we report preliminary results from a study of Twitter to identify patient safety reports, which offer an immediate, untainted, and expansive patient perspective unlike any other mechanism to date for this topic. We identify patient safety related tweets and characterize them by which medical populations caused errors, who reported these errors, what types of errors occurred, and what emotional states were expressed in response. Our long term goal is to improve the handling and reduction of errors by incorporating this patient input into the patient safety process.

		Michael J Paul, Mark Dredze. Experimenting with Drugs (and Topic Models): Multi-Dimensional Exploration of Recreational Drug Discussions. AAAI Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text, 2012. [PDF] [Bibtex] [Close] @inproceedings{Paul:2012fk, abstract = {Clinical research of new recreational drugs and trends requires mining current information from non-traditional text sources. In this work we support such research through the use of a multi-dimensional latent text model -- factorial LDA -- that captures orthogonal factors of corpora, creating structured output for researchers to better understand the contents of a corpus. Since a purely unsupervised model is unlikely to discover specific factors of interest to clinical researchers, we modify the structure of factorial LDA to incorporate prior knowledge, including the use of observed variables, informative priors and background components. The resulting model learns factors that correspond to drug type, delivery method (smoking, injection, etc.), and aspect (chemistry, culture, effects, health, usage). We demonstrate that the improved model yields better quantitative and more interpretable results.}, author = {Michael J. Paul and Mark Dredze}, booktitle = {AAAI Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text}, date-added = {2012-07-27 17:35:07 -0400}, date-modified = {2012-07-27 17:35:36 -0400}, file = {aaai_2012_drugs.pdf}, keywords = {workshop}, title = {Experimenting with Drugs (and Topic Models): Multi-Dimensional Exploration of Recreational Drug Discussions}, year = {2012} } Clinical research of new recreational drugs and trends requires mining current information from non-traditional text sources. 
In this work we support such research through the use of a multi-dimensional latent text model -- factorial LDA -- that captures orthogonal factors of corpora, creating structured output for researchers to better understand the contents of a corpus. Since a purely unsupervised model is unlikely to discover specific factors of interest to clinical researchers, we modify the structure of factorial LDA to incorporate prior knowledge, including the use of observed variables, informative priors and background components. The resulting model learns factors that correspond to drug type, delivery method (smoking, injection, etc.), and aspect (chemistry, culture, effects, health, usage). We demonstrate that the improved model yields better quantitative and more interpretable results.

		Ralph J Passarella, Atul Nakhasi, Sarah G Bell, Michael J Paul, Peter J Pronovost, Mark Dredze. Twitter as a Source for Learning about Patient Safety Events. Annual Symposium of the American Medical Informatics Association (AMIA), 2012. [Bibtex] [Close] @inproceedings{Passarella:2012fk, author = {Ralph J Passarella and Atul Nakhasi and Sarah G Bell and Michael J. Paul and Peter J Pronovost and Mark Dredze}, booktitle = {Annual Symposium of the American Medical Informatics Association (AMIA)}, date-added = {2012-06-24 15:33:40 -0400}, date-modified = {2012-06-24 15:36:11 -0400}, keywords = {abstract}, title = {Twitter as a Source for Learning about Patient Safety Events}, year = {2012} }

		Damianos Karakos, Brian Roark, Izhak Shafran, Kenji Sagae, Maider Lehr, Emily Prud'hommeaux, Puyang Xu, Nathan Glenn, Sanjeev Khudanpur, Murat Saraclar, Dan Bikel, Mark Dredze, Chris Callison-Burch, Yuan Cao, Keith Hall, Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post, Darcey Riley. Deriving conversation-based features from unlabeled speech for discriminative language modeling. International Speech Communication Association (INTERSPEECH), 2012. [PDF] [Bibtex] [Close] @inproceedings{Karakos:2012fk, abstract = {The perceptron algorithm was used in [1] to estimate discriminative language models which correct errors in the output of ASR systems. In its simplest version, the algorithm simply increases the weight of n-gram features which appear in the correct (oracle) hypothesis and decreases the weight of n-gram features which appear in the 1-best hypothesis. In this paper, we show that the perceptron algorithm can be successfully used in a semi-supervised learning (SSL) framework, where limited amounts of labeled data are available. Our framework has some similarities to graph-based label propagation [2] in the sense that a graph is built based on proximity of unlabeled conversations, and then it is used to propagate confidences (in the form of features) to the labeled data, based on which perceptron trains a discriminative model. The novelty of our approach lies in the fact that the confidence "flows" from the unlabeled data to the labeled data, and not vice-versa, as is done traditionally in SSL. 
Experiments conducted at the 2011 CLSP Summer Workshop on the conversational telephone speech corpora Dev04f and Eval04f demonstrate the effectiveness of the proposed approach.}, author = {Damianos Karakos and Brian Roark and Izhak Shafran and Kenji Sagae and Maider Lehr and Emily Prud'hommeaux and Puyang Xu and Nathan Glenn and Sanjeev Khudanpur and Murat Saraclar and Dan Bikel and Mark Dredze and Chris Callison-Burch and Yuan Cao and Keith Hall and Eva Hasler and Philipp Koehn and Adam Lopez and Matt Post and Darcey Riley}, booktitle = {International Speech Communication Association (INTERSPEECH)}, date-added = {2012-06-10 01:27:43 -0400}, date-modified = {2012-06-10 01:28:45 -0400}, file = {interspeech_2012_semisup.pdf}, title = {Deriving conversation-based features from unlabeled speech for discriminative language modeling}, year = {2012} } The perceptron algorithm was used in [1] to estimate discriminative language models which correct errors in the output of ASR systems. In its simplest version, the algorithm simply increases the weight of n-gram features which appear in the correct (oracle) hypothesis and decreases the weight of n-gram features which appear in the 1-best hypothesis. In this paper, we show that the perceptron algorithm can be successfully used in a semi-supervised learning (SSL) framework, where limited amounts of labeled data are available. Our framework has some similarities to graph-based label propagation [2] in the sense that a graph is built based on proximity of unlabeled conversations, and then it is used to propagate confidences (in the form of features) to the labeled data, based on which perceptron trains a discriminative model. The novelty of our approach lies in the fact that the confidence "flows" from the unlabeled data to the labeled data, and not vice-versa, as is done traditionally in SSL. 
Experiments conducted at the 2011 CLSP Summer Workshop on the conversational telephone speech corpora Dev04f and Eval04f demonstrate the effectiveness of the proposed approach.

		Ariya Rastrow, Mark Dredze, Sanjeev Khudanpur. Efficient Structured Language Modeling for Speech Recognition. International Speech Communication Association (INTERSPEECH), 2012. [PDF] [Bibtex] [Close] @inproceedings{Rastrow:2012, abstract = {The structured language model (SLM) of [1] was one of the first to successfully integrate syntactic structure into language models. We extend the SLM framework in two new directions. First, we propose a new syntactic hierarchical interpolation that improves over previous approaches. Second, we develop a general information-theoretic algorithm for pruning the underlying Jelinek-Mercer interpolated LM used in [1], which substantially reduces the size of the LM, enabling us to train on large data. When combined with hill-climbing [2] the SLM is an accurate model, space-efficient and fast for rescoring large speech lattices. Experimental results on broadcast news demonstrate that the SLM outperforms a large 4-gram LM.}, author = {Ariya Rastrow and Mark Dredze and Sanjeev Khudanpur}, booktitle = {International Speech Communication Association (INTERSPEECH)}, date-added = {2012-06-08 13:42:17 -0400}, date-modified = {2012-06-08 13:42:46 -0400}, file = {interspeech_2012_slm.pdf}, title = {Efficient Structured Language Modeling for Speech Recognition}, year = {2012} } The structured language model (SLM) of [1] was one of the first to successfully integrate syntactic structure into language models. We extend the SLM framework in two new directions. First, we propose a new syntactic hierarchical interpolation that improves over previous approaches. Second, we develop a general information-theoretic algorithm for pruning the underlying Jelinek-Mercer interpolated LM used in [1], which substantially reduces the size of the LM, enabling us to train on large data. When combined with hill-climbing [2] the SLM is an accurate model, space-efficient and fast for rescoring large speech lattices. 
Experimental results on broadcast news demonstrate that the SLM outperforms a large 4-gram LM.

		Nicholas Andrews, Jason Eisner, Mark Dredze. Name Phylogeny: A Generative Model of String Variation. Empirical Methods in Natural Language Processing (EMNLP), 2012. [PDF] [Bibtex] [Close] @inproceedings{Andrews:2012uq, abstract = {Many linguistic and textual processes involve transduction of strings. We show how to learn a stochastic transducer from an unorganized collection of strings (rather than string pairs). The role of the transducer is to organize the collection. Our generative model explains similarities among the strings by supposing that some strings in the collection were not generated ab initio, but were instead derived by transduction from other, "similar" strings in the collection. Our variational EM learning algorithm alternately reestimates this phylogeny and the transducer parameters. The final learned transducer can quickly link any test name into the final phylogeny, thereby locating variants of the test name. We find that our method can effectively find name variants in a corpus of web strings used to refer to persons in Wikipedia, improving over standard untrained distances such as Jaro-Winkler and Levenshtein distance.}, author = {Nicholas Andrews and Jason Eisner and Mark Dredze}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, date-added = {2012-05-21 23:15:53 -0400}, date-modified = {2017-08-14 20:52:06 +0000}, file = {http://aclweb.org/anthology-new/D/D12/D12-1032v2.pdf}, pages = {344-355}, title = {Name Phylogeny: A Generative Model of String Variation}, year = {2012} } Many linguistic and textual processes involve transduction of strings. We show how to learn a stochastic transducer from an unorganized collection of strings (rather than string pairs). The role of the transducer is to organize the collection. 
Our generative model explains similarities among the strings by supposing that some strings in the collection were not generated ab initio, but were instead derived by transduction from other, "similar" strings in the collection. Our variational EM learning algorithm alternately reestimates this phylogeny and the transducer parameters. The final learned transducer can quickly link any test name into the final phylogeny, thereby locating variants of the test name. We find that our method can effectively find name variants in a corpus of web strings used to refer to persons in Wikipedia, improving over standard untrained distances such as Jaro-Winkler and Levenshtein distance.

		Mahesh Joshi, Mark Dredze, William W Cohen, Carolyn P Rose. Multi-Domain Learning: When Do Domains Matter? Empirical Methods in Natural Language Processing (EMNLP), 2012. [PDF] [Bibtex] [Close] @inproceedings{Joshi:2012fk, abstract = {We present a systematic analysis of existing multi-domain learning approaches with respect to two questions. First, many multi-domain learning algorithms resemble ensemble learning algorithms. (1) Are multi-domain learning improvements the result of ensemble learning effects? Second, these algorithms are traditionally evaluated in a balanced label setting, although in practice many multi-domain settings have domain-specific label biases. When multi-domain learning is applied to these settings, (2) are multi-domain methods improving because they capture domain-specific class biases? An understanding of these two issues presents a clearer idea about where the field has had success in multi-domain learning, and it suggests some important open questions for improving beyond the current state of the art.}, author = {Mahesh Joshi and Mark Dredze and William W Cohen and Carolyn P Rose}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, date-added = {2012-05-21 23:14:55 -0400}, date-modified = {2017-08-14 20:50:14 +0000}, file = {emnlp_2012_multi_domain.pdf}, pages = {1302-1312}, title = {Multi-Domain Learning: When Do Domains Matter?}, year = {2012} } We present a systematic analysis of existing multi-domain learning approaches with respect to two questions. First, many multi-domain learning algorithms resemble ensemble learning algorithms. (1) Are multi-domain learning improvements the result of ensemble learning effects? Second, these algorithms are traditionally evaluated in a balanced label setting, although in practice many multi-domain settings have domain-specific label biases. 
When multi-domain learning is applied to these settings, (2) are multi-domain methods improving because they capture domain-specific class biases? An understanding of these two issues presents a clearer idea about where the field has had success in multi-domain learning, and it suggests some important open questions for improving beyond the current state of the art.

		Ariya Rastrow, Sanjeev Khudanpur, Mark Dredze. Revisiting the Case for Explicit Syntactic Information in Language Models. NAACL Workshop on the Future of Language Modeling for HLT, 2012. [PDF] [Bibtex] [Close] @inproceedings{Rastrow:2012fl, abstract = {Statistical language models used in deployed systems for speech recognition, machine translation and other human language technologies are almost exclusively n-gram models. They are regarded as linguistically naive, but estimating them from any amount of text, large or small, is straightforward. Furthermore, they have doggedly matched or outperformed numerous competing proposals for syntactically well-motivated models. This unusual resilience of n-grams, as well as their weaknesses, are examined here. It is demonstrated that n-grams are good word-predictors, even linguistically speaking, in a large majority of word-positions, and it is suggested that to improve over n-grams, one must explore syntax-aware (or other) language models that focus on positions where n-grams are weak.}, author = {Ariya Rastrow and Sanjeev Khudanpur and Mark Dredze}, booktitle = {NAACL Workshop on the Future of Language Modeling for HLT}, date-added = {2012-04-30 22:25:09 -0400}, date-modified = {2017-08-14 20:49:26 +0000}, file = {naacl2012_ngram_workshop.pdf}, keywords = {workshop}, pages = {50-58}, title = {Revisiting the Case for Explicit Syntactic Information in Language Models}, year = {2012} } Statistical language models used in deployed systems for speech recognition, machine translation and other human language technologies are almost exclusively n-gram models. They are regarded as linguistically naive, but estimating them from any amount of text, large or small, is straightforward. Furthermore, they have doggedly matched or outperformed numerous competing proposals for syntactically well-motivated models. This unusual resilience of n-grams, as well as their weaknesses, are examined here. 
It is demonstrated that n-grams are good word-predictors, even linguistically speaking, in a large majority of word-positions, and it is suggested that to improve over n-grams, one must explore syntax-aware (or other) language models that focus on positions where n-grams are weak.

		Spence Green, Nicholas Andrews, Matthew R Gormley, Mark Dredze, Christopher D Manning. Entity Clustering Across Languages. North American Chapter of the Association for Computational Linguistics (NAACL), 2012. [PDF] [Bibtex] [Close] @inproceedings{Green:2012uq, abstract = {Standard entity clustering systems commonly rely on mention (string) matching, syntactic features, and linguistic resources like English WordNet. When co-referent text mentions appear in different languages, these techniques cannot be easily applied. Consequently, we develop new methods for clustering text mentions across documents and languages simultaneously, producing cross-lingual entity clusters. Our approach extends standard clustering algorithms with cross-lingual mention and context similarity measures. Crucially, we do not assume a pre-existing entity list (knowledge base), so entity characteristics are unknown. On an Arabic-English corpus that contains seven different text genres, our best model yields a 24.3\% F1 gain over the baseline.}, author = {Spence Green and Nicholas Andrews and Matthew R Gormley and Mark Dredze and Christopher D. Manning}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, date-added = {2012-03-21 16:27:52 -0400}, date-modified = {2017-08-14 20:48:38 +0000}, file = {naacl2012_entity_clustering.pdf}, pages = {60-69}, title = {Entity Clustering Across Languages}, year = {2012} } Standard entity clustering systems commonly rely on mention (string) matching, syntactic features, and linguistic resources like English WordNet. When co-referent text mentions appear in different languages, these techniques cannot be easily applied. Consequently, we develop new methods for clustering text mentions across documents and languages simultaneously, producing cross-lingual entity clusters. Our approach extends standard clustering algorithms with cross-lingual mention and context similarity measures. 
Crucially, we do not assume a pre-existing entity list (knowledge base), so entity characteristics are unknown. On an Arabic-English corpus that contains seven different text genres, our best model yields a 24.3% F1 gain over the baseline.

		Matthew R Gormley, Mark Dredze, Benjamin Van Durme, Jason Eisner. Shared Components Topic Models. North American Chapter of the Association for Computational Linguistics (NAACL), 2012. [PDF] [Bibtex] [Close] @inproceedings{Gormley:2012fk, abstract = {With a few exceptions, extensions to latent Dirichlet allocation (LDA) have focused on the distribution over topics for each document. Much less attention has been given to the underlying structure of the topics themselves. As a result, most topic models generate topics independently from a single underlying distribution and require millions of parameters, in the form of multinomial distributions over the vocabulary. In this paper, we introduce the Shared Components Topic Model (SCTM), in which each topic is a normalized product of a smaller number of underlying component distributions. Our model learns these component distributions and the structure of how to combine subsets of them into topics. The SCTM can represent topics in a much more compact representation than LDA and achieves better perplexity with fewer parameters.}, author = {Matthew R. Gormley and Mark Dredze and Benjamin Van Durme and Jason Eisner}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, date-added = {2012-03-21 16:27:12 -0400}, date-modified = {2017-08-14 20:48:06 +0000}, file = {naacl2012_sctm.pdf}, pages = {783-792}, title = {Shared Components Topic Models}, year = {2012} } With a few exceptions, extensions to latent Dirichlet allocation (LDA) have focused on the distribution over topics for each document. Much less attention has been given to the underlying structure of the topics themselves. As a result, most topic models generate topics independently from a single underlying distribution and require millions of parameters, in the form of multinomial distributions over the vocabulary. 
In this paper, we introduce the Shared Components Topic Model (SCTM), in which each topic is a normalized product of a smaller number of underlying component distributions. Our model learns these component distributions and the structure of how to combine subsets of them into topics. The SCTM can represent topics in a much more compact representation than LDA and achieves better perplexity with fewer parameters.

		Ariya Rastrow, Mark Dredze, Sanjeev Khudanpur. Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining. Association for Computational Linguistics (ACL), 2012. [PDF] [Bibtex] [Close] @inproceedings{Rastrow:2012fk, abstract = {Long-span features, such as syntax, can improve language models for tasks such as speech recognition and machine translation. However, these language models can be difficult to use in practice because of the time required to generate features for rescoring a large hypothesis set. In this work, we propose substructure sharing, which saves duplicate work in processing hypothesis sets with redundant hypothesis structures. We apply substructure sharing to a dependency parser and part of speech tagger to obtain significant speedups, and further improve the accuracy of these tools through up-training. When using these improved tools in a language model for speech recognition, we obtain significant speed improvements with both N-best and hill climbing rescoring, and show that up-training leads to WER reduction.}, author = {Ariya Rastrow and Mark Dredze and Sanjeev Khudanpur}, booktitle = {Association for Computational Linguistics (ACL)}, date-added = {2012-03-11 21:58:29 -0500}, date-modified = {2017-08-14 20:46:12 +0000}, file = {acl2012_substructure.pdf}, pages = {175-183}, title = {Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining}, year = {2012} } Long-span features, such as syntax, can improve language models for tasks such as speech recognition and machine translation. However, these language models can be difficult to use in practice because of the time required to generate features for rescoring a large hypothesis set. In this work, we propose substructure sharing, which saves duplicate work in processing hypothesis sets with redundant hypothesis structures. 
We apply substructure sharing to a dependency parser and part of speech tagger to obtain significant speedups, and further improve the accuracy of these tools through up-training. When using these improved tools in a language model for speech recognition, we obtain significant speed improvements with both N-best and hill climbing rescoring, and show that up-training leads to WER reduction.

		Koby Crammer, Alex Kulesza, Mark Dredze. New H-∞ Bounds for the Recursive Least Squares Algorithm Exploiting Input Structure. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2012. [PDF] [Bibtex] [Close] @inproceedings{Crammer:2012fk, abstract = {The well known recursive least squares (RLS) algorithm has been widely used for many years. Most analyses of RLS have assumed statistical properties of the data or noise process, but recent robust H∞ analyses have been used to bound the ratio of the performance of the algorithm to the total noise. In this paper, we provide the first additive analysis bounding the difference between performance and noise. Our analysis provides additional convergence guarantees in general, and particularly with structured input data. We illustrate the analysis using human speech and white noise.}, author = {Koby Crammer and Alex Kulesza and Mark Dredze}, booktitle = {International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, date-added = {2011-12-22 21:18:35 -0500}, date-modified = {2017-08-14 20:45:18 +0000}, file = {icassp12_rls.pdf}, pages = {2017-2020}, title = {New H-∞ Bounds for the Recursive Least Squares Algorithm Exploiting Input Structure}, year = {2012} } The well known recursive least squares (RLS) algorithm has been widely used for many years. Most analyses of RLS have assumed statistical properties of the data or noise process, but recent robust H∞ analyses have been used to bound the ratio of the performance of the algorithm to the total noise. In this paper, we provide the first additive analysis bounding the difference between performance and noise. Our analysis provides additional convergence guarantees in general, and particularly with structured input data. We illustrate the analysis using human speech and white noise.

		Koby Crammer, Mark Dredze, Fernando Pereira. Confidence-Weighted Linear Classification for Text Categorization. Journal of Machine Learning Research (JMLR), 2012;13(Jun):1891-1926. [PDF] [Bibtex] [Close] @article{Pereira:2011fk, abstract = {Confidence-weighted online learning is a generalization of margin-based learning of linear classifiers in which the margin constraint is replaced by a probabilistic constraint based on a distribution over classifier weights that is updated online as examples are observed. The distribution captures a notion of confidence on classifier weights, and in some cases it can also be interpreted as replacing a single learning rate by adaptive per-weight rates. Confidence-weighted learning was motivated by the statistical properties of natural language classification tasks, where most of the informative features are relatively rare. We investigate several versions of confidence-weighted learning that use a Gaussian distribution over weight vectors, updated at each observed example to achieve high probability of correct classification for the example. 
Empirical evaluation on a range of text-categorization tasks shows that our algorithms improve over other state-of-the-art online and batch methods, learn faster in the online setting, and lead to better classifier combination for a type of distributed training commonly used in cloud computing.}, author = {Koby Crammer and Mark Dredze and Fernando Pereira}, date-added = {2011-12-15 19:31:13 -0500}, date-modified = {2014-01-12 02:08:25 +0000}, file = {http://jmlr.org/papers/volume13/crammer12a/crammer12a.pdf}, journal = {Journal of Machine Learning Research (JMLR)}, number = {Jun}, pages = {1891-1926}, title = {Confidence-Weighted Linear Classification for Text Categorization}, volume = {13}, year = {2012} } Confidence-weighted online learning is a generalization of margin-based learning of linear classifiers in which the margin constraint is replaced by a probabilistic constraint based on a distribution over classifier weights that is updated online as examples are observed. The distribution captures a notion of confidence on classifier weights, and in some cases it can also be interpreted as replacing a single learning rate by adaptive per-weight rates. Confidence-weighted learning was motivated by the statistical properties of natural language classification tasks, where most of the informative features are relatively rare. We investigate several versions of confidence-weighted learning that use a Gaussian distribution over weight vectors, updated at each observed example to achieve high probability of correct classification for the example. Empirical evaluation on a range of text-categorization tasks shows that our algorithms improve over other state-of-the-art online and batch methods, learn faster in the online setting, and lead to better classifier combination for a type of distributed training commonly used in cloud computing.

		2011 (12 Publications)
		Joshua T Vogelstein, William R Gray, Jason G Martin, Glen A Coppersmith, Mark Dredze, J Bogovic, J L Prince, S M Resnick, Carey E Priebe, R Jacob Vogelstein. Connectome Classification using Statistical Graph Theory and Machine Learning. Society for Neuroscience (Poster), 2011. [Bibtex] [Close] @inproceedings{Vogelstein:2011rw, author = {Joshua T. Vogelstein and William R. Gray and Jason G. Martin and Glen A. Coppersmith and Mark Dredze and J. Bogovic and J. L. Prince and S. M. Resnick and Carey E. Priebe and R. Jacob Vogelstein}, booktitle = {Society for Neuroscience (Poster)}, date-added = {2014-09-24 04:01:20 +0000}, date-modified = {2016-02-02 14:00:34 +0000}, keywords = {abstract}, title = {Connectome Classification using Statistical Graph Theory and Machine Learning}, year = {2011} }

		Spence Green, Nicholas Andrews, Matthew R Gormley, Mark Dredze, Christopher D Manning. Cross-lingual Coreference Resolution: A New Task for Multilingual Comparable Corpora. Technical Report 6, Human Language Technology Center of Excellence, Johns Hopkins University, 2011. [Bibtex] [Close] @techreport{Green:2011lr, abstract = {We introduce cross-lingual coreference resolution, the task of grouping entity mentions with a common referent in a multilingual corpus. Information, especially on the web, is increasingly multilingual. We would like to track entity references across languages without machine translation, which is expensive and unavailable for many language pairs. Therefore, we develop a set of models that rely on decreasing levels of parallel resources: a bitext, a bilingual lexicon, and a parallel name list. We propose baselines, provide experimental results, and analyze sources of error. Across a range of metrics, we find that even our lowest resource model gives a 2.5% F1 absolute improvement over the strongest baseline. Our results present a positive outlook for crosslingual coreference resolution even in low resource languages. We are releasing our crosslingual annotations for the ACE2008 ArabicEnglish evaluation corpus.}, author = {Spence Green and Nicholas Andrews and Matthew R. Gormley and Mark Dredze and Christopher D Manning}, date-added = {2013-02-01 10:04:25 -0500}, date-modified = {2013-06-06 11:31:49 -0400}, institution = {Human Language Technology Center of Excellence, Johns Hopkins University}, number = {6}, title = {Cross-lingual Coreference Resolution: A New Task for Multilingual Comparable Corpora}, year = {2011} } We introduce cross-lingual coreference resolution, the task of grouping entity mentions with a common referent in a multilingual corpus. Information, especially on the web, is increasingly multilingual. 
We would like to track entity references across languages without machine translation, which is expensive and unavailable for many language pairs. Therefore, we develop a set of models that rely on decreasing levels of parallel resources: a bitext, a bilingual lexicon, and a parallel name list. We propose baselines, provide experimental results, and analyze sources of error. Across a range of metrics, we find that even our lowest-resource model gives a 2.5% F1 absolute improvement over the strongest baseline. Our results present a positive outlook for cross-lingual coreference resolution even in low-resource languages. We are releasing our cross-lingual annotations for the ACE2008 Arabic-English evaluation corpus.

		Matthew R Gormley, Mark Dredze, Benjamin Van Durme, Jason Eisner. Shared Components Topic Models with Application to Selectional Preference. NIPS Workshop on Learning Semantics, 2011. [PDF] [Bibtex] [Close] @inproceedings{Gormley:2011fk, author = {Matthew R. Gormley and Mark Dredze and Benjamin Van Durme and Jason Eisner}, booktitle = {NIPS Workshop on Learning Semantics}, date-added = {2011-10-27 14:58:38 -0400}, date-modified = {2011-10-27 14:59:38 -0400}, file = {2011_nips_workshop_gormley.pdf}, keywords = {workshop}, title = {Shared Components Topic Models with Application to Selectional Preference}, year = {2011} }

		Damianos Karakos, Mark Dredze, Kenneth Church, Aren Jansen, Sanjeev Khudanpur. Estimating Document Frequencies in a Speech Corpus. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2011. [PDF] [Bibtex] [Close] @inproceedings{Krakos:2011, abstract = {Inverse Document Frequency (IDF) is an important quantity in many applications, including Information Retrieval. IDF is defined in terms of document frequency, df(w), the number of documents that mention w at least once. This quantity is relatively easy to compute over textual documents, but spoken documents are more challenging. This paper considers two baselines: (1) an estimate based on the 1-best ASR output and (2) an estimate based on expected term frequencies computed from the lattice. We improve over these baselines by taking advantage of repetition. Whatever the document is about is likely to be repeated, unlike ASR errors, which tend to be more random (Poisson). In addition, we find it helpful to consider an ensemble of language models. There is an opportunity for the ensemble to reduce noise, assuming that the errors across language models are relatively uncorrelated. The opportunity for improvement is larger when WER is high. This paper considers a pairing task application that could benefit from improved estimates of df. The pairing task inputs conversational sides from the English Fisher corpus and outputs estimates of which sides were from the same conversation. 
Better estimates of df lead to better performance on this task.}, author = {Damianos Karakos and Mark Dredze and Kenneth Church and Aren Jansen and Sanjeev Khudanpur}, booktitle = {IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)}, date-added = {2011-08-22 10:26:39 -0400}, date-modified = {2011-08-22 10:27:23 -0400}, file = {2011_asru_idf.pdf}, title = {Estimating Document Frequencies in a Speech Corpus}, year = {2011} } Inverse Document Frequency (IDF) is an important quantity in many applications, including Information Retrieval. IDF is defined in terms of document frequency, df(w), the number of documents that mention w at least once. This quantity is relatively easy to compute over textual documents, but spoken documents are more challenging. This paper considers two baselines: (1) an estimate based on the 1-best ASR output and (2) an estimate based on expected term frequencies computed from the lattice. We improve over these baselines by taking advantage of repetition. Whatever the document is about is likely to be repeated, unlike ASR errors, which tend to be more random (Poisson). In addition, we find it helpful to consider an ensemble of language models. There is an opportunity for the ensemble to reduce noise, assuming that the errors across language models are relatively uncorrelated. The opportunity for improvement is larger when WER is high. This paper considers a pairing task application that could benefit from improved estimates of df. The pairing task inputs conversational sides from the English Fisher corpus and outputs estimates of which sides were from the same conversation. Better estimates of df lead to better performance on this task.

		Ariya Rastrow, Mark Dredze, Sanjeev Khudanpur. Adapting N-Gram Maximum Entropy Language Models with Conditional Entropy Regularization. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2011. [PDF] [Bibtex] [Close] @inproceedings{Rastrow:2011fl, abstract = {Accurate estimates of language model parameters are critical for building quality text generation systems, such as automatic speech recognition. However, text training data for a domain of interest is often unavailable. Instead, we use semi-supervised model adaptation; parameters are estimated using both unlabeled in-domain data (raw speech audio) and labeled out-of-domain data (text). In this work, we present a new semi-supervised language model adaptation procedure for Maximum Entropy models with n-gram features. We augment the conventional maximum likelihood training criterion on out-of-domain text data with an additional term to minimize conditional entropy on in-domain audio. Additionally, we demonstrate how to compute conditional entropy efficiently on speech lattices using first- and second-order expectation semirings. We demonstrate improvements in terms of word error rate over other adaptation techniques when adapting a maximum entropy language model from broadcast news to MIT lectures.}, author = {Ariya Rastrow and Mark Dredze and Sanjeev Khudanpur}, booktitle = {IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)}, date-added = {2011-08-22 10:24:06 -0400}, date-modified = {2011-08-22 10:24:40 -0400}, file = {2011_asru_semisupervised_adaptation.pdf}, title = {Adapting N-Gram Maximum Entropy Language Models with Conditional Entropy Regularization}, year = {2011} } Accurate estimates of language model parameters are critical for building quality text generation systems, such as automatic speech recognition. However, text training data for a domain of interest is often unavailable. 
Instead, we use semi-supervised model adaptation; parameters are estimated using both unlabeled in-domain data (raw speech audio) and labeled out-of-domain data (text). In this work, we present a new semi-supervised language model adaptation procedure for Maximum Entropy models with n-gram features. We augment the conventional maximum likelihood training criterion on out-of-domain text data with an additional term to minimize conditional entropy on in-domain audio. Additionally, we demonstrate how to compute conditional entropy efficiently on speech lattices using first- and second-order expectation semirings. We demonstrate improvements in terms of word error rate over other adaptation techniques when adapting a maximum entropy language model from broadcast news to MIT lectures.

		Ariya Rastrow, Mark Dredze, Sanjeev Khudanpur. Efficient Discriminative Training of Long-Span Language Models. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2011. [PDF] [Bibtex] [Close] @inproceedings{Rastrow:2011fk, abstract = {Long-span language models, such as those involving syntactic dependencies, produce more coherent text than their n-gram counterparts. However, evaluating the large number of sentence-hypotheses in a packed representation such as an ASR lattice is intractable under such long-span models both during decoding and discriminative training. The accepted compromise is to rescore only the N-best hypotheses in the lattice using the long-span LM. We present discriminative hill climbing, an efficient and effective discriminative training procedure for long-span LMs based on a hill climbing rescoring algorithm. We empirically demonstrate significant computational savings as well as error-rate reduction over N-best training methods in a state-of-the-art ASR system for Broadcast News transcription.}, author = {Ariya Rastrow and Mark Dredze and Sanjeev Khudanpur}, booktitle = {IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)}, date-added = {2011-08-22 10:19:46 -0400}, date-modified = {2011-08-22 10:21:23 -0400}, file = {2011_asru_discrimnative_hill_climbing.pdf}, title = {Efficient Discriminative Training of Long-Span Language Models}, year = {2011} } Long-span language models, such as those involving syntactic dependencies, produce more coherent text than their n-gram counterparts. However, evaluating the large number of sentence-hypotheses in a packed representation such as an ASR lattice is intractable under such long-span models both during decoding and discriminative training. The accepted compromise is to rescore only the N-best hypotheses in the lattice using the long-span LM. 
We present discriminative hill climbing, an efficient and effective discriminative training procedure for long-span LMs based on a hill climbing rescoring algorithm. We empirically demonstrate significant computational savings as well as error-rate reduction over N-best training methods in a state-of-the-art ASR system for Broadcast News transcription.

		Ann Irvine, Mark Dredze, Geraldine Legendre, Paul Smolensky. Optimality Theory Syntax Learnability: An Empirical Exploration of the Perceptron and GLA. CogSci Workshop on OT as a General Cognitive Architecture, 2011. [Bibtex] [Close] @inproceedings{Irvine:2011lr, author = {Ann Irvine and Mark Dredze and Geraldine Legendre and Paul Smolensky}, booktitle = {CogSci Workshop on OT as a General Cognitive Architecture}, date-added = {2011-06-01 12:52:35 -0400}, date-modified = {2011-06-01 12:54:13 -0400}, keywords = {workshop}, title = {Optimality Theory Syntax Learnability: An Empirical Exploration of the Perceptron and GLA}, year = {2011} }

		Carolina Parada, Mark Dredze, Frederick Jelinek. OOV Sensitive Named-Entity Recognition in Speech. International Speech Communication Association (INTERSPEECH), 2011. [PDF] [Bibtex] [Close] @inproceedings{Parada:2011fk, abstract = {Named Entity Recognition (NER), an information extraction task, is typically applied to spoken documents by cascading a large vocabulary continuous speech recognizer (LVCSR) and a named entity tagger. Recognizing named entities in automatically decoded speech is difficult since LVCSR errors can confuse the tagger. This is especially true of out-of-vocabulary (OOV) words, which are often named entities and always produce transcription errors. In this work, we improve speech NER by including features indicative of OOVs based on a OOV detector, allowing for the identification of regions of speech containing named entities, even if they are incorrectly transcribed. We construct a new speech NER data set and demonstrate significant improvements for this task.}, annote = {[<a href="https://github.com/mdredze/speech_ner_entity_linking_data/"><span class="pub_link">Data</span></a>]}, author = {Carolina Parada and Mark Dredze and Frederick Jelinek}, booktitle = {International Speech Communication Association (INTERSPEECH)}, date-added = {2011-05-28 22:07:38 -0400}, date-modified = {2011-05-28 22:10:48 -0400}, file = {2011_interspeech_oov_ner.pdf}, title = {OOV Sensitive Named-Entity Recognition in Speech}, year = {2011} } [Data] Named Entity Recognition (NER), an information extraction task, is typically applied to spoken documents by cascading a large vocabulary continuous speech recognizer (LVCSR) and a named entity tagger. Recognizing named entities in automatically decoded speech is difficult since LVCSR errors can confuse the tagger. This is especially true of out-of-vocabulary (OOV) words, which are often named entities and always produce transcription errors. 
In this work, we improve speech NER by including features indicative of OOVs based on an OOV detector, allowing for the identification of regions of speech containing named entities, even if they are incorrectly transcribed. We construct a new speech NER data set and demonstrate significant improvements for this task.

		Michael J Paul, Mark Dredze. A Model for Mining Public Health Topics from Twitter. Technical Report -, Johns Hopkins University, 2011. [PDF] [Bibtex] [Close] @techreport{Paul:2011lr, abstract = {We present the Ailment Topic Aspect Model (ATAM), a new topic model for Twitter that associates symptoms, treatments and general words with diseases (ailments). We train ATAM on a new collection of 1.6 million tweets discussing numerous health related topics. ATAM isolates more coherent ailments, such as influenza, infections, obesity, as compared to standard topic models. Furthermore, ATAM matches influenza tracking results produced by Google Flu Trends and previous influenza specialized Twitter models compared with government public health data.}, annote = {[<a href="http://www.cs.jhu.edu/~mdredze/datasets/health_tweetIDs.txt"><span class="pub_link">Data</span></a>]}, author = {Michael J. Paul and Mark Dredze}, date-added = {2011-05-01 11:58:19 -0400}, date-modified = {2011-05-01 11:58:58 -0400}, file = {2011.tech.twitter_health.pdf}, institution = {Johns Hopkins University}, number = {-}, title = {A Model for Mining Public Health Topics from Twitter}, year = {2011} } [Data] We present the Ailment Topic Aspect Model (ATAM), a new topic model for Twitter that associates symptoms, treatments and general words with diseases (ailments). We train ATAM on a new collection of 1.6 million tweets discussing numerous health related topics. ATAM isolates more coherent ailments, such as influenza, infections, obesity, as compared to standard topic models. Furthermore, ATAM matches influenza tracking results produced by Google Flu Trends and previous influenza specialized Twitter models compared with government public health data.

		Michael J Paul, Mark Dredze. You Are What You Tweet: Analyzing Twitter for Public Health. International Conference on Weblogs and Social Media (ICWSM), 2011. [PDF] [Bibtex] [Close] @inproceedings{Paul:2011fk, abstract = {Analyzing user messages in social media can measure different population characteristics, including public health measures. For example, recent work has correlated Twitter messages with influenza rates in the United States; but this has largely been the extent of mining Twitter for public health. In this work, we consider a broader range of public health applications for Twitter. We apply the recently introduced Ailment Topic Aspect Model to over one and a half million health-related tweets and discover mentions of over a dozen ailments, including allergies, obesity and insomnia. We introduce extensions to incorporate prior knowledge into this model and apply it to several tasks: tracking illnesses over time (syndromic surveillance), measuring behavioral risk factors, localizing illnesses by geographic region, and analyzing symptoms and medication usage. We show quantitative correlations with public health data and qualitative evaluations of model output. Our results suggest that Twitter has broad applicability for public health research.}, author = {Michael J. Paul and Mark Dredze}, booktitle = {International Conference on Weblogs and Social Media (ICWSM)}, date-added = {2011-03-19 20:51:01 -0400}, date-modified = {2017-08-09 19:18:15 +0000}, file = {twitter_health_icwsm_11.pdf}, keywords = {selected}, pages = {265-272}, title = {You Are What You Tweet: Analyzing Twitter for Public Health}, year = {2011} } Analyzing user messages in social media can measure different population characteristics, including public health measures. For example, recent work has correlated Twitter messages with influenza rates in the United States; but this has largely been the extent of mining Twitter for public health. 
In this work, we consider a broader range of public health applications for Twitter. We apply the recently introduced Ailment Topic Aspect Model to over one and a half million health-related tweets and discover mentions of over a dozen ailments, including allergies, obesity and insomnia. We introduce extensions to incorporate prior knowledge into this model and apply it to several tasks: tracking illnesses over time (syndromic surveillance), measuring behavioral risk factors, localizing illnesses by geographic region, and analyzing symptoms and medication usage. We show quantitative correlations with public health data and qualitative evaluations of model output. Our results suggest that Twitter has broad applicability for public health research.

		Carolina Parada, Mark Dredze, Abhinav Sethy, Ariya Rastrow. Learning Sub-Word Units for Open Vocabulary Speech Recognition. Association for Computational Linguistics (ACL), 2011. [PDF] [Bibtex] [Close] @inproceedings{parada:11, abstract = {Large vocabulary speech recognition systems fail to recognize words beyond their vocabulary, many of which are information rich terms, like named entities or foreign words. Hybrid word/sub-word systems solve this problem by adding sub-word units to large vocabulary word based systems; new words can then be represented by combinations of sub-word units. Previous work heuristically created the sub-word lexicon from phonetic representations of text using simple statistics to select common phone sequences. We propose a probabilistic model to {\em learn} the sub-word lexicon optimized for a given task. We consider the task of out of vocabulary (OOV) word detection, which relies on output from a hybrid model. %We present results on a Broadcast News and MIT Lectures data sets. A hybrid model with our learned sub-word lexicon reduces error by 6.3\% and 7.6\% (absolute) at a 5\% false alarm rate on an English Broadcast News and MIT Lectures task respectively.}, author = {Carolina Parada and Mark Dredze and Abhinav Sethy and Ariya Rastrow}, booktitle = {Association for Computational Linguistics (ACL)}, date-added = {2011-02-12 20:18:18 -0500}, date-modified = {2017-08-09 19:17:39 +0000}, file = {learning_units_acl_11.pdf}, pages = {712-721}, title = {Learning Sub-Word Units for Open Vocabulary Speech Recognition}, year = {2011} } Large vocabulary speech recognition systems fail to recognize words beyond their vocabulary, many of which are information rich terms, like named entities or foreign words. Hybrid word/sub-word systems solve this problem by adding sub-word units to large vocabulary word based systems; new words can then be represented by combinations of sub-word units. 
Previous work heuristically created the sub-word lexicon from phonetic representations of text using simple statistics to select common phone sequences. We propose a probabilistic model to learn the sub-word lexicon optimized for a given task. We consider the task of out-of-vocabulary (OOV) word detection, which relies on output from a hybrid model. A hybrid model with our learned sub-word lexicon reduces error by 6.3% and 7.6% (absolute) at a 5% false alarm rate on an English Broadcast News and MIT Lectures task, respectively.

		Ariya Rastrow, Markus Dreyer, Abhinav Sethy, Sanjeev Khudanpur, Bhuvana Ramabhadran, Mark Dredze. Hill Climbing on Speech Lattices: A New Rescoring Framework. International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011. [PDF] [Bibtex] [Close] @inproceedings{rastrow:11, abstract = {We describe a new approach for rescoring speech lattices - with long-span language models or wide-context acoustic models - that does not entail computationally intensive lattice expansion or limited rescoring of only an N-best list. We view the set of word-sequences in a lattice as a discrete space equipped with the edit-distance metric, and develop a hill climbing technique to start with, say, the 1-best hypothesis under the lattice-generating model(s) and iteratively search a local neighborhood for the highest-scoring hypothesis under the rescoring model(s); such neighborhoods are efficiently constructed via finite state techniques. We demonstrate empirically that to achieve the same reduction in error rate using a better estimated, higher order language model, our technique evaluates fewer utterance-length hypotheses than conventional N-best rescoring by two orders of magnitude. For the same number of hypotheses evaluated, our technique results in a significantly lower error rate.}, author = {Ariya Rastrow and Markus Dreyer and Abhinav Sethy and Sanjeev Khudanpur and Bhuvana Ramabhadran and Mark Dredze}, booktitle = {International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, date-added = {2011-01-17 19:51:48 -0500}, date-modified = {2017-08-09 19:17:14 +0000}, file = {hill_climb_icassp_11.pdf}, pages = {5032-5035}, title = {Hill Climbing on Speech Lattices: A New Rescoring Framework}, year = {2011} } We describe a new approach for rescoring speech lattices - with long-span language models or wide-context acoustic models - that does not entail computationally intensive lattice expansion or limited rescoring of only an N-best list. 
We view the set of word-sequences in a lattice as a discrete space equipped with the edit-distance metric, and develop a hill climbing technique to start with, say, the 1-best hypothesis under the lattice-generating model(s) and iteratively search a local neighborhood for the highest-scoring hypothesis under the rescoring model(s); such neighborhoods are efficiently constructed via finite state techniques. We demonstrate empirically that to achieve the same reduction in error rate using a better estimated, higher order language model, our technique evaluates fewer utterance-length hypotheses than conventional N-best rescoring by two orders of magnitude. For the same number of hypotheses evaluated, our technique results in a significantly lower error rate.

		2010 (12 Publications)
		Mark Dredze, Aren Jansen, Glen A Coppersmith, Kenneth Church. NLP on Spoken Documents without ASR. Empirical Methods in Natural Language Processing (EMNLP), 2010. [PDF] [Bibtex] [Close] @inproceedings{dredze:10d, abstract = {There is considerable interest in interdisciplinary combinations of automatic speech recognition (ASR), machine learning, natural language processing, text classification and information retrieval. Many of these boxes, especially ASR, are often based on considerable linguistic resources. We would like to be able to process spoken documents with few (if any) resources. Moreover, connecting black boxes in series tends to multiply errors, especially when the key terms are out-of-vocabulary (OOV). The proposed alternative applies text processing directly to the speech without a dependency on ASR. The method finds long (~1 sec) repetitions in speech, and clusters them into pseudo-terms (roughly phrases). Document clustering and classification work surprisingly well on pseudo-terms; performance on a Switchboard task approaches a baseline using gold standard manual transcriptions.}, author = {Mark Dredze and Aren Jansen and Glen A Coppersmith and Kenneth Church}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, date-added = {2010-08-15 15:20:49 -0400}, date-modified = {2017-08-09 19:15:50 +0000}, file = {emnlp_2010_nlp_asr.pdf}, pages = {460-470}, title = {NLP on Spoken Documents without ASR}, year = {2010} } There is considerable interest in interdisciplinary combinations of automatic speech recognition (ASR), machine learning, natural language processing, text classification and information retrieval. Many of these boxes, especially ASR, are often based on considerable linguistic resources. We would like to be able to process spoken documents with few (if any) resources. Moreover, connecting black boxes in series tends to multiply errors, especially when the key terms are out-of-vocabulary (OOV). 
The proposed alternative applies text processing directly to the speech without a dependency on ASR. The method finds long (~1 sec) repetitions in speech, and clusters them into pseudo-terms (roughly phrases). Document clustering and classification work surprisingly well on pseudo-terms; performance on a Switchboard task approaches a baseline using gold standard manual transcriptions.

		Mark Dredze, Tim Oates, Christine Piatko. We're Not in Kansas Anymore: Detecting Domain Changes in Streams. Empirical Methods in Natural Language Processing (EMNLP), 2010. [PDF] [Bibtex] [Close] @inproceedings{dredze:10c, abstract = {Domain adaptation, the problem of adapting a natural language processing system trained in one domain to perform well in a different domain, has received significant attention. This paper addresses an important problem for deployed systems that has received little attention -- detecting when such adaptation is needed by a system operating in the wild, i.e., performing classification over a stream of unlabeled examples. Our method uses A-distance, a metric for detecting shifts in data streams, combined with classification margins to detect domain shifts. We empirically show effective domain shift detection on a variety of data sets and shift conditions.}, author = {Mark Dredze and Tim Oates and Christine Piatko}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, date-added = {2010-08-15 15:19:58 -0400}, date-modified = {2017-08-09 19:15:26 +0000}, file = {emnlp_2010_domain_shift.pdf}, pages = {585-595}, title = {We're Not in Kansas Anymore: Detecting Domain Changes in Streams}, year = {2010} } Domain adaptation, the problem of adapting a natural language processing system trained in one domain to perform well in a different domain, has received significant attention. This paper addresses an important problem for deployed systems that has received little attention -- detecting when such adaptation is needed by a system operating in the wild, i.e., performing classification over a stream of unlabeled examples. Our method uses A-distance, a metric for detecting shifts in data streams, combined with classification margins to detect domain shifts. We empirically show effective domain shift detection on a variety of data sets and shift conditions.

		Carolina Parada, Abhinav Sethy, Mark Dredze, Frederick Jelinek. A Spoken Term Detection Framework for Recovering Out-of-Vocabulary Words Using the Web. International Speech Communication Association (INTERSPEECH), 2010.
		@inproceedings{parada:10a, abstract = {Vocabulary restrictions in large vocabulary continuous speech recognition (LVCSR) systems mean that out-of-vocabulary (OOV) words are lost in the output. However, OOV words tend to be information rich terms (often named entities) and their omission from the transcript negatively affects both usability and downstream NLP technologies, such as machine translation or knowledge distillation. We propose a novel approach to OOV recovery that uses a spoken term detection (STD) framework. Given an identified OOV region in the LVCSR output, we recover the uttered OOVs by utilizing contextual information and the vast and constantly updated vocabulary on the Web. Discovered words are integrated into system output, recovering up to 40\% of OOVs and resulting in a reduction in system error.}, author = {Carolina Parada and Abhinav Sethy and Mark Dredze and Frederick Jelinek}, booktitle = {International Speech Communication Association (INTERSPEECH)}, date-added = {2010-07-02 08:54:26 -0400}, date-modified = {2010-09-03 12:40:23 -0400}, file = {interspeech_2010_oovrecovery.pdf}, title = {A Spoken Term Detection Framework for Recovering Out-of-Vocabulary Words Using the Web}, year = {2010} }

		Delip Rao, Paul McNamee, Mark Dredze. Streaming Cross Document Entity Coreference Resolution. Conference on Computational Linguistics (Coling), 2010.
		@inproceedings{delip-rao:10, abstract = {Previous research in cross-document entity coreference has generally been restricted to the offline scenario where the set of documents is provided in advance. As a consequence, the dominant approach is based on greedy agglomerative clustering techniques that utilize pairwise vector comparisons and thus require O(n^2) space and time. In this paper we explore identifying coreferent entity mentions across documents in high-volume streaming text, including methods for utilizing orthographic and contextual information. We test our methods using several corpora to quantitatively measure both the efficacy and scalability of our streaming approach. We show that our approach scales to at least an order of magnitude larger data than previous reported methods.}, author = {Delip Rao and Paul McNamee and Mark Dredze}, booktitle = {Conference on Computational Linguistics (Coling)}, date-added = {2010-05-28 17:16:31 -0400}, date-modified = {2017-08-09 14:22:26 +0000}, file = {streaming_coref_coling.pdf}, pages = {1050-1058}, title = {Streaming Cross Document Entity Coreference Resolution}, year = {2010} }

		Mark Dredze, Paul McNamee, Delip Rao, Adam Gerber, Tim Finin. Entity Disambiguation for Knowledge Base Population. Conference on Computational Linguistics (Coling), 2010.
		@inproceedings{dredze:10a, abstract = {The integration of facts derived from information extraction systems into existing knowledge bases requires a system to disambiguate entity mentions in the text. This is challenging due to issues such as non-uniform variations in entity names, mention ambiguity, and entities absent from a knowledge base. We present a state of the art system for entity disambiguation that not only addresses these challenges but also scales to knowledge bases with several million entries using very little resources. Further, our approach achieves performance of up to 95\% on entities mentioned from newswire and 80\% on a public test set that was designed to include challenging queries.}, author = {Mark Dredze and Paul McNamee and Delip Rao and Adam Gerber and Tim Finin}, booktitle = {Conference on Computational Linguistics (Coling)}, date-added = {2010-05-28 17:15:19 -0400}, date-modified = {2017-08-09 14:22:01 +0000}, file = {entity_linking_coling.pdf}, pages = {277-285}, title = {Entity Disambiguation for Knowledge Base Population}, year = {2010} }

		Chris Callison-Burch, Mark Dredze. Creating Speech and Language Data With Amazon's Mechanical Turk. NAACL-HLT Workshop on Creating Speech and Language Data With Mechanical Turk, 2010.
		@inproceedings{callison-burch:10, abstract = {In this paper we give an introduction to using Amazon's Mechanical Turk crowdsourcing platform for the purpose of collecting data for human language technologies. We survey the papers published in the NAACL-2010 Workshop. 24 researchers participated in the workshop's shared task to create data for speech and language applications with \$100.}, author = {Chris Callison-Burch and Mark Dredze}, booktitle = {NAACL-HLT Workshop on Creating Speech and Language Data With Mechanical Turk}, date-added = {2010-04-22 21:56:56 -0400}, date-modified = {2017-08-09 14:21:38 +0000}, file = {amt_overview.pdf}, keywords = {workshop}, pages = {1-12}, title = {Creating Speech and Language Data With Amazon's Mechanical Turk}, year = {2010} }

		Tim Finin, William Murnane, Anand Karandikar, Nicholas Keller, Justin Martineau, Mark Dredze. Annotating named entities in Twitter data with crowdsourcing. NAACL-HLT Workshop on Creating Speech and Language Data With Mechanical Turk, 2010.
		@inproceedings{tim-finin:10, abstract = {We describe our experience using both Amazon Mechanical Turk (MTurk) and CrowdFlower to collect simple named entity annotations for Twitter status updates. Unlike most genres that have traditionally been the focus of named entity experiments, Twitter is far more informal and abbreviated. The collected annotations and annotation techniques will provide a first step towards the full study of named entity recognition in domains like Facebook and Twitter. We also briefly describe how to use MTurk to collect judgements on the quality of "word clouds." }, annote = {[<a href="http://cs.jhu.edu/~mdredze/datasets/twitter_ner.zip"><span class="pub_link">Data</span></a>]}, author = {Tim Finin and William Murnane and Anand Karandikar and Nicholas Keller and Justin Martineau and Mark Dredze}, booktitle = {NAACL-HLT Workshop on Creating Speech and Language Data With Mechanical Turk}, date-added = {2010-04-02 14:30:48 -0400}, date-modified = {2017-08-09 14:21:16 +0000}, file = {amt_ner.pdf}, keywords = {workshop}, pages = {80-88}, title = {Annotating named entities in Twitter data with crowdsourcing}, year = {2010} }

		Matthew R Gormley, Adam Gerber, Mary Harper, Mark Dredze. Non-Expert Correction of Automatically Generated Relation Annotations. NAACL-HLT Workshop on Creating Speech and Language Data With Mechanical Turk, 2010.
		@inproceedings{gormley:10, abstract = {We explore a new way to collect human annotated relations in text using Amazon Mechanical Turk. Given a knowledge base of relations and a corpus, we identify sentences which mention both an entity and an attribute that have some relation in the knowledge base. Each noisy sentence/relation pair is presented to multiple turkers, who are asked whether the sentence expresses the relation. We describe a design which encourages user efficiency and aids discovery of cheating. We also present results on inter-annotator agreement.}, author = {Matthew R. Gormley and Adam Gerber and Mary Harper and Mark Dredze}, booktitle = {NAACL-HLT Workshop on Creating Speech and Language Data With Mechanical Turk}, date-added = {2010-04-02 14:29:04 -0400}, date-modified = {2017-08-09 14:20:50 +0000}, file = {amt_relations.pdf}, keywords = {workshop}, pages = {204-207}, title = {Non-Expert Correction of Automatically Generated Relation Annotations}, year = {2010} }

		Courtney Napoles, Mark Dredze. Learning Simple Wikipedia: A Cogitation in Ascertaining Abecedarian Language. NAACL-HLT Workshop on Computational Linguistics and Writing: Writing Processes and Authoring Aids, 2010.
		@inproceedings{napoles:10, abstract = {Text simplification is the process of changing vocabulary and grammatical structure to create a more accessible version of the text while maintaining the underlying information and content. Automated tools for text simplification are a practical way to make large corpora of text accessible to a wider audience lacking high levels of fluency in the corpus language. In this work, we investigate the potential of Simple Wikipedia to assist automatic text simplification by building a statistical classification system that discriminates simple English from ordinary English. Most text simplification systems are based on hand-written rules (e.g., PEST and its module SYSTAR), and therefore face limitations scaling and transferring across domains. The potential for using Simple Wikipedia for text simplification is significant; it contains nearly 60,000 articles with revision histories and aligned articles to ordinary English Wikipedia. Using articles from Simple Wikipedia and ordinary Wikipedia, we evaluated different classifiers and feature sets to identify the most discriminative features of simple English for use across domains. These findings help further understanding of what makes text simple and can be applied as a tool to help writers craft simple text.}, author = {Courtney Napoles and Mark Dredze}, booktitle = {NAACL-HLT Workshop on Computational Linguistics and Writing: Writing Processes and Authoring Aids}, date-added = {2010-03-29 10:27:09 -0400}, date-modified = {2017-08-09 14:20:20 +0000}, file = {wiki_simple.pdf}, keywords = {workshop}, pages = {42-50}, title = {Learning Simple Wikipedia: A Cogitation in Ascertaining Abecedarian Language}, year = {2010} }

		Justin Ma, Alex Kulesza, Koby Crammer, Mark Dredze, Lawrence Saul, Fernando Pereira. Exploiting Feature Covariance in High-Dimensional Online Learning. AIStats, 2010.
		@inproceedings{justin-ma:10, abstract = {Some online algorithms for linear classification model the uncertainty in their weights over the course of learning. Modeling the full covariance structure of the weights can provide a significant advantage for classification. However, for high-dimensional, large-scale data, even though there may be many second-order feature interactions, it is computationally infeasible to maintain this covariance structure. To extend second-order methods to high-dimensional data, we develop low-rank approximations of the covariance structure. We evaluate our approach on both synthetic and real-world data sets using the confidence-weighted online learning framework. We show improvements over diagonal covariance matrices for both low and high-dimensional data.}, author = {Justin Ma and Alex Kulesza and Koby Crammer and Mark Dredze and Lawrence Saul and Fernando Pereira}, booktitle = {AIStats}, date-added = {2010-02-16 08:22:33 -0500}, date-modified = {2017-08-09 14:19:59 +0000}, file = {aistats10_diagfull.pdf}, pages = {493-500}, title = {Exploiting Feature Covariance in High-Dimensional Online Learning}, year = {2010} }

		Carolina Parada, Mark Dredze, Denis Filimonov, Frederick Jelinek. Contextual Information Improves OOV Detection in Speech. North American Chapter of the Association for Computational Linguistics (NAACL), 2010.
		@inproceedings{parada:10, abstract = {Out-of-vocabulary (OOV) words represent an important source of error in large vocabulary continuous speech recognition (LVCSR) systems. These words cause recognition failures, which propagate through pipeline systems impacting the performance of downstream applications. The detection of OOV regions in the output of a LVCSR system is typically addressed as a binary classification task, where each region is independently classified using local information. In this paper, we show that jointly predicting OOV regions, and including contextual information from each region, leads to substantial improvement in OOV detection. Compared to the state-of-the-art, we reduce the missed OOV rate from 42.6\% to 28.4\% at 10\% false alarm rate.}, author = {Carolina Parada and Mark Dredze and Denis Filimonov and Frederick Jelinek}, booktitle = {North American Chapter of the Association for Computational Linguistics (NAACL)}, date-added = {2010-02-16 08:21:21 -0500}, date-modified = {2017-08-09 14:19:30 +0000}, file = {oov_crf.pdf}, pages = {216-224}, title = {Contextual Information Improves OOV Detection in Speech}, year = {2010} }

		Mark Dredze, Alex Kulesza, Koby Crammer. Multi-Domain Learning by Confidence-Weighted Parameter Combination. Machine Learning, 2010;79(1-2):123-149.
		@article{dredze-ml-09, abstract = {State-of-the-art statistical NLP systems for a variety of tasks learn from labeled training data that is often domain specific. However, there may be multiple domains or sources of interest on which the system must perform. For example, a spam filtering system must give high quality predictions for many users, each of whom receives emails from different sources and may make slightly different decisions about what is or is not spam. Rather than learning separate models for each domain, we explore systems that learn across multiple domains. We develop a new multi-domain online learning framework based on parameter combination from multiple classifiers. Our algorithms draw from multi-task learning and domain adaptation to adapt multiple source domain classifiers to a new target domain, learn across multiple similar domains, and learn across a large number of disparate domains. We evaluate our algorithms on two popular NLP domain adaptation tasks: sentiment classification and spam filtering.}, annote = {[<a href="http://www.cs.jhu.edu/~mdredze/publications/multi_domain_tech_report.pdf"><span class="pub_link">Tech Report</span></a>]}, author = {Mark Dredze and Alex Kulesza and Koby Crammer}, date-added = {2009-09-05 21:08:35 -0400}, date-modified = {2010-12-29 15:29:59 -0500}, file = {http://www.springerlink.com/content/a78049767680386l/}, journal = {Machine Learning}, number = {1-2}, pages = {123-149}, title = {Multi-Domain Learning by Confidence-Weighted Parameter Combination}, volume = {79}, year = {2010} }

		2009 (6 Publications)
		Mark Dredze. Intelligent Email: Aiding Users with AI. PhD Thesis, Computer and Information Science, University of Pennsylvania, 2009.
		@phdthesis{dredze:09j, author = {Mark Dredze}, date-added = {2010-09-03 12:53:16 -0400}, date-modified = {2010-09-03 12:53:59 -0400}, file = {dredze_thesis.pdf}, school = {Computer and Information Science, University of Pennsylvania}, title = {Intelligent Email: Aiding Users with AI}, year = {2009} }

		Paul McNamee, Mark Dredze, Adam Gerber, Nikesh Garera, Tim Finin, James Mayfield, Christine Piatko, Delip Rao, David Yarowsky, Markus Dreyer. HLTCOE Approaches to Knowledge Base Population at TAC 2009. Text Analysis Conference (TAC), 2009.
		@inproceedings{mcnamee:09, abstract = {The HLTCOE participated in the entity linking and slot filling tasks at TAC 2009. A machine learning-based approach to entity linking, operating over a wide range of feature types, yielded good performance on the entity linking task. Slot-filling based on sentence selection, application of weak patterns and exploitation of redundancy was ineffective in the slot filling task.}, author = {Paul McNamee and Mark Dredze and Adam Gerber and Nikesh Garera and Tim Finin and James Mayfield and Christine Piatko and Delip Rao and David Yarowsky and Markus Dreyer}, booktitle = {Text Analysis Conference (TAC)}, date-added = {2009-10-27 09:20:43 -0400}, date-modified = {2013-06-06 11:32:49 -0400}, keywords = {workshop}, title = {HLTCOE Approaches to Knowledge Base Population at TAC 2009}, year = {2009} }

		Koby Crammer, Alex Kulesza, Mark Dredze. Adaptive Regularization of Weight Vectors. Advances in Neural Information Processing Systems (NIPS), 2009.
		@inproceedings{crammer:09, abstract = {We present AROW, a new online learning algorithm that combines several useful properties: large margin training, confidence weighting, and the capacity to handle non-separable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform especially well in the presence of label noise. We derive a mistake bound, similar in form to the second order perceptron bound, that does not assume separability. We also relate our algorithm to recent confidence-weighted online learning techniques and show empirically that AROW achieves state-of-the-art performance and notable robustness in the case of non-separable data.}, author = {Koby Crammer and Alex Kulesza and Mark Dredze}, booktitle = {Advances in Neural Information Processing Systems (NIPS)}, date-added = {2009-09-05 21:10:28 -0400}, date-modified = {2010-09-03 12:47:28 -0400}, file = {nips09_arow.pdf}, title = {Adaptive Regularization of Weight Vectors}, year = {2009} }

		Mark Dredze, Partha Pratim Talukdar, Koby Crammer. Sequence Learning from Data with Multiple Labels. ECML/PKDD Workshop on Learning from Multi-Label Data, 2009.
		@inproceedings{talukdar:09, abstract = {We present novel algorithms for learning structured predictors from instances with multiple labels in the presence of noise. The proposed algorithms improve performance on two standard NLP tasks when we have a small amount of training data (low quantity) and when the labels are noisy (low quality). In these settings, the methods improve performance over using a single label, in some cases exceeding performance using gold labels. Our methods could be used in a semi-supervised setting, where a limited amount of labeled data could be combined with a rule based automatic labeling of unlabeled data with multiple possible labels.}, author = {Mark Dredze and Partha Pratim Talukdar and Koby Crammer}, booktitle = {ECML/PKDD Workshop on Learning from Multi-Label Data}, date-added = {2009-07-03 09:37:22 -0400}, date-modified = {2010-09-03 12:51:19 -0400}, file = {mld09_ml.pdf}, keywords = {workshop}, title = {Sequence Learning from Data with Multiple Labels}, year = {2009} }

		Koby Crammer, Mark Dredze, Alex Kulesza. Multi-Class Confidence Weighted Algorithms. Empirical Methods in Natural Language Processing (EMNLP), 2009.
		@inproceedings{koby-crammer:09, abstract = {The recently introduced online confidence-weighted (CW) learning algorithm for binary classification performs well on many binary NLP tasks. However, for multi-class problems CW learning updates and inference cannot be computed analytically or solved as convex optimization problems as they are in the binary case. We derive learning algorithms for the multi-class CW setting and provide extensive evaluation using nine NLP datasets, including three derived from the recently released New York Times corpus. Our best algorithm outperforms state-of-the-art online and batch methods on eight of the nine tasks. We also show that the confidence information maintained during learning yields useful probabilistic information at test time.}, annote = {[<a href="http://www.cs.jhu.edu/~mdredze/datasets/amazon_7.zip"><span class="pub_link">Data (Amazon 7)</span></a>]}, author = {Koby Crammer and Mark Dredze and Alex Kulesza}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, date-added = {2009-06-02 06:57:58 -0400}, date-modified = {2017-08-09 14:18:33 +0000}, file = {emlnp09_mccw.pdf}, pages = {496-504}, title = {Multi-Class Confidence Weighted Algorithms}, year = {2009} }

		Mark Dredze, Bill Schilit, Peter Norvig. Suggesting Email View Filters for Triage and Search. International Joint Conference on Artificial Intelligence (IJCAI), 2009.
		@inproceedings{dredze:09, abstract = {Growing email volumes cause flooded inboxes and swelled email archives, making search and new email processing difficult. While emails have rich metadata, such as recipients and folders, suitable for creating filtered views, it is often difficult to choose appropriate filters for new inbox messages without first examining messages. In this work, we consider a system that automatically suggests relevant view filters to the user for the currently viewed messages. We propose several ranking algorithms for suggesting useful filters. Our work suggests that such systems quickly filter groups of inbox messages and find messages more easily during search.}, author = {Mark Dredze and Bill Schilit and Peter Norvig}, booktitle = {International Joint Conference on Artificial Intelligence (IJCAI)}, date-added = {2009-04-07 13:06:23 -0400}, date-modified = {2017-08-09 14:18:07 +0000}, file = {dredze_ijcai_09.pdf}, pages = {1414-1419}, title = {Suggesting Email View Filters for Triage and Search}, year = {2009} }

		2008 (12 Publications)
		Kevin Lerman, Ari Gilder, Mark Dredze, Fernando Pereira. Reading the Markets: Forecasting Public Opinion of Political Candidates by News Analysis. Conference on Computational Linguistics (Coling), 2008.
		@inproceedings{lerman:08, abstract = {Media reporting shapes public opinion which can in turn influence events, particularly in political elections, in which candidates both respond to and shape public perception of their campaigns. We use computational linguistics to automatically predict the impact of news on public perception of political candidates. Our system uses daily newspaper articles to predict shifts in public opinion as reflected in prediction markets. We discuss various types of features designed for this problem. The news system improves market prediction over baseline market systems.}, author = {Kevin Lerman and Ari Gilder and Mark Dredze and Fernando Pereira}, booktitle = {Conference on Computational Linguistics (Coling)}, date-added = {2009-03-30 12:34:34 -0400}, date-modified = {2017-08-09 14:17:38 +0000}, file = {markets_coling08.pdf}, pages = {473-480}, title = {Reading the Markets: Forecasting Public Opinion of Political Candidates by News Analysis}, year = {2008} }

		Mark Dredze, Joel Wallenberg. Further Results and Analysis of Icelandic Part of Speech Tagging. Technical Report MS-CIS-08-13, University of Pennsylvania, Department of Computer and Information Science, 2008. [PDF] [Bibtex] [Close] @techreport{dredze-wallenberg-tech:08, abstract = {Data driven POS tagging has achieved good performance for English, but can still lag behind linguistic rule based taggers for morphologically complex languages, such as Icelandic. We extend a statistical tagger to handle fine grained tagsets and improve over the best Icelandic POS tagger. Additionally, we develop a case tagger for non-local case and gender decisions. An error analysis of our system suggests future directions. This paper presents further results and analysis to the original work.}, author = {Mark Dredze and Joel Wallenberg}, date-added = {2009-03-30 12:33:27 -0400}, date-modified = {2010-09-03 14:07:10 -0400}, file = {http://repository.upenn.edu/cis_reports/878/}, institution = {University of Pennsylvania, Department of Computer and Information Science}, number = {MS-CIS-08-13}, title = {Further Results and Analysis of Icelandic Part of Speech Tagging}, year = {2008} } Data driven POS tagging has achieved good performance for English, but can still lag behind linguistic rule based taggers for morphologically complex languages, such as Icelandic. We extend a statistical tagger to handle fine grained tagsets and improve over the best Icelandic POS tagger. Additionally, we develop a case tagger for non-local case and gender decisions. An error analysis of our system suggests future directions. This paper presents further results and analysis to the original work.

		Mark Dredze, Joel Wallenberg. Icelandic Data-Driven Part of Speech Tagging. Association for Computational Linguistics (ACL) (short paper), 2008. [PDF] [Bibtex] [Close] @inproceedings{dredze-wallenberg-08, abstract = {Data driven POS tagging has achieved good performance for English, but can still lag behind linguistic rule based taggers for morphologically complex languages, such as Icelandic. We extend a statistical tagger to handle fine grained tagsets and improve over the best Icelandic POS tagger. Additionally, we develop a case tagger for non-local case and gender decisions. An error analysis of our system suggests future directions.}, author = {Mark Dredze and Joel Wallenberg}, booktitle = {Association for Computational Linguistics (ACL) (short paper)}, date-added = {2009-03-30 12:32:30 -0400}, date-modified = {2017-08-09 14:17:12 +0000}, file = {acl_icelandic_pos.pdf}, pages = {33-36}, title = {Icelandic Data-Driven Part of Speech Tagging}, year = {2008} } Data driven POS tagging has achieved good performance for English, but can still lag behind linguistic rule based taggers for morphologically complex languages, such as Icelandic. We extend a statistical tagger to handle fine grained tagsets and improve over the best Icelandic POS tagger. Additionally, we develop a case tagger for non-local case and gender decisions. An error analysis of our system suggests future directions.

		Kuzman Ganchev, Mark Dredze. Small Statistical Models by Random Feature Mixing. ACL Workshop on Mobile NLP, 2008. [PDF] [Bibtex] [Close] @inproceedings{ganchev:08, abstract = {The application of statistical NLP systems to resource constrained devices is limited by the need to maintain parameters for a large number of features and an alphabet mapping features to parameters. We introduce random feature mixing to eliminate alphabet storage and reduce the number of parameters without severely impacting model performance.}, author = {Kuzman Ganchev and Mark Dredze}, booktitle = {ACL Workshop on Mobile NLP}, date-added = {2009-03-30 12:31:51 -0400}, date-modified = {2017-08-09 14:16:50 +0000}, file = {mobile_nlp_feature_mixing.pdf}, keywords = {workshop}, pages = {19-20}, title = {Small Statistical Models by Random Feature Mixing}, year = {2008} } The application of statistical NLP systems to resource constrained devices is limited by the need to maintain parameters for a large number of features and an alphabet mapping features to parameters. We introduce random feature mixing to eliminate alphabet storage and reduce the number of parameters without severely impacting model performance.

		Mark Dredze, Hanna Wallach, Danny Puller, Fernando Pereira. Generating Summary Keywords for Emails Using Topics. Intelligent User Interfaces (IUI), 2008. [PDF] [Bibtex] [Close] @inproceedings{dredze-2008c, abstract = {Email summary keywords, used to concisely represent the gist of an email, can help users manage and prioritize large numbers of messages. We develop an unsupervised learning framework for selecting summary keywords from emails using latent representations of the underlying topics in a user's mailbox. This approach selects words that describe each message in the context of existing topics rather than simply selecting keywords based on a single message in isolation. We present and compare four methods for selecting summary keywords based on two well-known models for inferring latent topics: latent semantic analysis and latent Dirichlet allocation. The quality of the summary keywords is assessed by generating summaries for emails from twelve users in the Enron corpus. The summary keywords are then used in place of entire messages in two proxy tasks: automated foldering and recipient prediction. We also evaluate the extent to which summary keywords enhance the information already available in a typical email user interface by repeating the same tasks using email subject lines.}, author = {Mark Dredze and Hanna Wallach and Danny Puller and Fernando Pereira}, booktitle = {Intelligent User Interfaces (IUI)}, date-added = {2009-03-30 12:24:43 -0400}, date-modified = {2017-08-09 14:14:13 +0000}, file = {dredze_summarization_iui08.pdf}, pages = {199-206}, title = {Generating Summary Keywords for Emails Using Topics}, year = {2008} } Email summary keywords, used to concisely represent the gist of an email, can help users manage and prioritize large numbers of messages. We develop an unsupervised learning framework for selecting summary keywords from emails using latent representations of the underlying topics in a user's mailbox. 
This approach selects words that describe each message in the context of existing topics rather than simply selecting keywords based on a single message in isolation. We present and compare four methods for selecting summary keywords based on two well-known models for inferring latent topics: latent semantic analysis and latent Dirichlet allocation. The quality of the summary keywords is assessed by generating summaries for emails from twelve users in the Enron corpus. The summary keywords are then used in place of entire messages in two proxy tasks: automated foldering and recipient prediction. We also evaluate the extent to which summary keywords enhance the information already available in a typical email user interface by repeating the same tasks using email subject lines.

		Mark Dredze, Hanna Wallach, Danny Puller, Tova Brooks, Josh Carroll, Joshua Magarick, John Blitzer, Fernando Pereira. Intelligent Email: Aiding Users with AI. American National Conference on Artificial Intelligence (AAAI) (Nectar), 2008. [PDF] [Bibtex] [Close] @inproceedings{dredze-2008b, abstract = {Email occupies a central role in the modern workplace. This has led to a vast increase in the number of email messages that users are expected to handle daily. Furthermore, email is no longer simply a tool for asynchronous online communication - email is now used for task management, personal archiving, as well as both synchronous and asynchronous online communication. This explosion can lead to "email overload" - many users are overwhelmed by the large quantity of information in their mailboxes. In the human--computer interaction community, there has been much research on tackling email overload. Recently, similar efforts have emerged in the artificial intelligence (AI) and machine learning communities to form an area of research known as intelligent email. In this paper, we take a user-oriented approach to applying AI to email. We identify enhancements to email user interfaces and employ machine learning techniques to support these changes. We focus on three tasks - summary keyword generation, reply prediction and attachment prediction - and summarize recent work in these areas.}, author = {Mark Dredze and Hanna Wallach and Danny Puller and Tova Brooks and Josh Carroll and Joshua Magarick and John Blitzer and Fernando Pereira}, booktitle = {American National Conference on Artificial Intelligence (AAAI) (Nectar)}, date-added = {2009-03-30 12:24:43 -0400}, date-modified = {2017-02-20 17:35:19 +0000}, file = {nectar_intelligent_email.pdf}, title = {Intelligent Email: {A}iding Users with {AI}}, year = {2008} } Email occupies a central role in the modern workplace. This has led to a vast increase in the number of email messages that users are expected to handle daily. 
Furthermore, email is no longer simply a tool for asynchronous online communication - email is now used for task management, personal archiving, as well as both synchronous and asynchronous online communication. This explosion can lead to "email overload" - many users are overwhelmed by the large quantity of information in their mailboxes. In the human-computer interaction community, there has been much research on tackling email overload. Recently, similar efforts have emerged in the artificial intelligence (AI) and machine learning communities to form an area of research known as intelligent email. In this paper, we take a user-oriented approach to applying AI to email. We identify enhancements to email user interfaces and employ machine learning techniques to support these changes. We focus on three tasks - summary keyword generation, reply prediction and attachment prediction - and summarize recent work in these areas.

		Mark Dredze, Tova Brooks, Josh Carroll, Joshua Magarick, John Blitzer, Fernando Pereira. Intelligent Email: Reply and Attachment Prediction. Intelligent User Interfaces (IUI), 2008. [PDF] [Bibtex] [Close] @inproceedings{dredze-2008, abstract = {We present two prediction problems under the rubric of Intelligent Email that are designed to support enhanced email interfaces that relieve the stress of email overload. Reply prediction alerts users when an email requires a response and facilitates email response management. Attachment prediction alerts users when they are about to send an email missing an attachment or triggers a document recommendation system, which can catch missing attachment emails before they are sent. Both problems use the same underlying email classification system and task specific features. Each task is evaluated for both single-user and cross-user settings.}, author = {Mark Dredze and Tova Brooks and Josh Carroll and Joshua Magarick and John Blitzer and Fernando Pereira}, booktitle = {Intelligent User Interfaces (IUI)}, date-added = {2009-03-30 12:24:43 -0400}, date-modified = {2017-08-09 14:15:55 +0000}, file = {dredze_intelligent_email_iui08.pdf}, pages = {321-324}, title = {Intelligent Email: Reply and Attachment Prediction}, year = {2008} } We present two prediction problems under the rubric of Intelligent Email that are designed to support enhanced email interfaces that relieve the stress of email overload. Reply prediction alerts users when an email requires a response and facilitates email response management. Attachment prediction alerts users when they are about to send an email missing an attachment or triggers a document recommendation system, which can catch missing attachment emails before they are sent. Both problems use the same underlying email classification system and task specific features. Each task is evaluated for both single-user and cross-user settings.

		Mark Dredze, Hanna Wallach. User Models for Email Activity Management. IUI Workshop on Ubiquitous User Modeling, 2008. [PDF] [Bibtex] [Close] @inproceedings{dredze-08g, abstract = {A single user activity, such as planning a conference trip, typically involves multiple actions. Although these actions may involve several applications, the central point of co-ordination for any particular activity is usually email. Previous work on email activity management has focused on clustering emails by activity. Dredze et al. accomplished this by combining supervised classifiers based on document similarity, authors and recipients, and thread information. In this paper, we take a different approach and present an unsupervised framework for email activity clustering. We use the same information sources as Dredze et al. - namely, document similarity, message recipients and authors, and thread information - but combine them to form an unsupervised, non-parametric Bayesian user model. This approach enables email activities to be inferred without any user input. Inferring activities from a user's mailbox adapts the model to that user. We next describe the statistical machinery that forms the basis of our user model, and explain how several email properties may be incorporated into the model. We evaluate this approach using the same data as Dredze et al., showing that our model does well at clustering emails by activity.}, author = {Mark Dredze and Hanna Wallach}, booktitle = {IUI Workshop on Ubiquitous User Modeling}, date-added = {2009-03-30 12:24:43 -0400}, date-modified = {2010-12-29 15:25:03 -0500}, file = {dredze_ubiqum_user_model_08.pdf}, keywords = {workshop}, title = {User Models for Email Activity Management}, year = {2008} } A single user activity, such as planning a conference trip, typically involves multiple actions. Although these actions may involve several applications, the central point of co-ordination for any particular activity is usually email. 
Previous work on email activity management has focused on clustering emails by activity. Dredze et al. accomplished this by combining supervised classifiers based on document similarity, authors and recipients, and thread information. In this paper, we take a different approach and present an unsupervised framework for email activity clustering. We use the same information sources as Dredze et al. - namely, document similarity, message recipients and authors, and thread information - but combine them to form an unsupervised, non-parametric Bayesian user model. This approach enables email activities to be inferred without any user input. Inferring activities from a user's mailbox adapts the model to that user. We next describe the statistical machinery that forms the basis of our user model, and explain how several email properties may be incorporated into the model. We evaluate this approach using the same data as Dredze et al., showing that our model does well at clustering emails by activity.

		Koby Crammer, Mark Dredze, Fernando Pereira. Exact Convex Confidence-Weighted Learning. Advances in Neural Information Processing Systems (NIPS), 2008. [PDF] [Bibtex] [Close] @inproceedings{CrammerDrPe08, abstract = {Confidence-weighted (CW) learning, an online learning method for linear classifiers, maintains a Gaussian distribution over weight vectors, with a covariance matrix that represents uncertainty about weights and correlations. Confidence constraints ensure that a weight vector drawn from the hypothesis distribution correctly classifies examples with a specified probability. Within this framework, we derive a new convex form of the constraint and analyze it in the mistake bound model. Empirical evaluation with both synthetic and text data shows our version of CW learning achieves lower cumulative and out-of-sample errors than commonly used first-order and second-order online methods.}, author = {Koby Crammer and Mark Dredze and Fernando Pereira}, booktitle = {Advances in Neural Information Processing Systems (NIPS)}, date-added = {2009-03-30 12:23:36 -0400}, date-modified = {2010-09-03 12:52:41 -0400}, file = {cw_nips_08.pdf}, title = {Exact Convex Confidence-Weighted Learning}, year = 2008 } Confidence-weighted (CW) learning, an online learning method for linear classifiers, maintains a Gaussian distribution over weight vectors, with a covariance matrix that represents uncertainty about weights and correlations. Confidence constraints ensure that a weight vector drawn from the hypothesis distribution correctly classifies examples with a specified probability. Within this framework, we derive a new convex form of the constraint and analyze it in the mistake bound model. Empirical evaluation with both synthetic and text data shows our version of CW learning achieves lower cumulative and out-of-sample errors than commonly used first-order and second-order online methods.

		Mark Dredze, Koby Crammer, Fernando Pereira. Confidence-Weighted Linear Classification. International Conference on Machine Learning (ICML), 2008. [PDF] [Bibtex] [Close] @inproceedings{dredze-08, abstract = {We introduce confidence-weighted linear classifiers, which add parameter confidence information to linear classifiers. Online learners in this setting update both classifier parameters and the estimate of their confidence. The particular online algorithms we study here maintain a Gaussian distribution over parameter vectors and update the mean and covariance of the distribution with each instance. Empirical evaluation on a range of NLP tasks shows that our algorithm improves over other state of the art online and batch methods, learns faster in the online setting, and lends itself to better classifier combination after parallel training.}, author = {Mark Dredze and Koby Crammer and Fernando Pereira}, booktitle = {International Conference on Machine Learning (ICML)}, date-added = {2009-03-30 12:23:36 -0400}, date-modified = {2017-08-09 14:10:54 +0000}, file = {icml_variance.pdf}, pages = {264-271}, title = {Confidence-Weighted Linear Classification}, year = {2008} } We introduce confidence-weighted linear classifiers, which add parameter confidence information to linear classifiers. Online learners in this setting update both classifier parameters and the estimate of their confidence. The particular online algorithms we study here maintain a Gaussian distribution over parameter vectors and update the mean and covariance of the distribution with each instance. Empirical evaluation on a range of NLP tasks shows that our algorithm improves over other state of the art online and batch methods, learns faster in the online setting, and lends itself to better classifier combination after parallel training.

		Mark Dredze, Koby Crammer. Active Learning with Confidence. Association for Computational Linguistics (ACL) (short paper), 2008. [PDF] [Bibtex] [Close] @inproceedings{dredze-08b, abstract = {Active learning is a machine learning approach to achieving high accuracy with a small number of labels by letting the learning algorithm choose instances to be labeled. Most previous approaches based on discriminative learning use the margin for choosing instances. We present a method for incorporating confidence into the margin by using a newly introduced online learning algorithm and show empirically that confidence improves active learning.}, author = {Mark Dredze and Koby Crammer}, booktitle = {Association for Computational Linguistics (ACL) (short paper)}, date-added = {2009-03-30 12:23:36 -0400}, date-modified = {2017-08-09 14:12:59 +0000}, file = {acl_active_confident_learning.pdf}, pages = {233-236}, title = {Active Learning with Confidence}, year = {2008} } Active learning is a machine learning approach to achieving high accuracy with a small number of labels by letting the learning algorithm choose instances to be labeled. Most previous approaches based on discriminative learning use the margin for choosing instances. We present a method for incorporating confidence into the margin by using a newly introduced online learning algorithm and show empirically that confidence improves active learning.

		Mark Dredze, Koby Crammer. Online Methods for Multi-Domain Learning and Adaptation. Empirical Methods in Natural Language Processing (EMNLP), 2008. [PDF] [Bibtex] [Close] @inproceedings{dredze:08c, abstract = {NLP tasks are often domain specific, yet systems can learn behaviors across multiple domains. We develop a new multi-domain online learning framework based on parameter combination from multiple classifiers. Our algorithms draw from multi-task learning and domain adaptation to adapt multiple source domain classifiers to a new target domain, learn across multiple similar domains, and learn across a large number of disparate domains. We evaluate our algorithms on two popular NLP domain adaptation tasks: sentiment classification and spam filtering.}, author = {Mark Dredze and Koby Crammer}, booktitle = {Empirical Methods in Natural Language Processing (EMNLP)}, date-added = {2009-03-30 12:23:36 -0400}, date-modified = {2017-08-09 14:13:42 +0000}, file = {multi_domain_emnlp08.pdf}, pages = {689-697}, title = {Online Methods for Multi-Domain Learning and Adaptation}, year = {2008} } NLP tasks are often domain specific, yet systems can learn behaviors across multiple domains. We develop a new multi-domain online learning framework based on parameter combination from multiple classifiers. Our algorithms draw from multi-task learning and domain adaptation to adapt multiple source domain classifiers to a new target domain, learn across multiple similar domains, and learn across a large number of disparate domains. We evaluate our algorithms on two popular NLP domain adaptation tasks: sentiment classification and spam filtering.

		2007 (11 Publications)
		John Blitzer, Mark Dredze, Fernando Pereira. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. North East Student Colloquium on Artificial Intelligence (NESCAI), 2007. [Bibtex] [Close] @inproceedings{blitzer:07, author = {John Blitzer and Mark Dredze and Fernando Pereira}, booktitle = {North East Student Colloquium on Artificial Intelligence (NESCAI)}, date-added = {2010-09-03 14:20:17 -0400}, date-modified = {2010-09-03 14:20:43 -0400}, keywords = {workshop}, title = {Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification}, year = {2007} }

		Danny Puller, Hanna Wallach, Mark Dredze, Fernando Pereira. Generating Summary Keywords for Emails Using Topics. Women in Machine Learning Workshop (WiML) at Grace Hopper, 2007. [Bibtex] [Close] @inproceedings{puller:07, author = {Danny Puller and Hanna Wallach and Mark Dredze and Fernando Pereira}, booktitle = {Women in Machine Learning Workshop (WiML) at Grace Hopper}, date-added = {2010-09-03 14:16:04 -0400}, date-modified = {2010-09-03 14:16:41 -0400}, keywords = {workshop}, title = {Generating Summary Keywords for Emails Using Topics}, year = {2007} }

		Koby Crammer, Mark Dredze, John Blitzer, Fernando Pereira. Batch Performance for an Online Price. NIPS Workshop on Efficient Machine Learning, 2007. [PDF] [Bibtex] [Close] @inproceedings{crammer:08, abstract = {Batch learning techniques achieve good performance, but at the cost of many (sometimes even hundreds) of passes over the data. For many tasks, such as web-scale ranking of machine translation hypotheses, making many passes over the data is prohibitively expensive, even in parallel over thousands of machines. Online algorithms, which treat data as a stream of examples, are conceptually appealing for these large scale problems. In practice, however, online algorithms tend to underperform batch methods, unless they are themselves run in multiple passes over the data. In this work we explore a new type of online learning algorithm that incorporates a measure of confidence to the algorithm. The model maintains a confidence for each parameter, reflecting previously observed properties of the data. While this requires an additional parameter for each feature of the data, this is a minimal cost when compared to running the algorithm multiple times over the data. The resulting algorithm learns faster, requiring both fewer training instances and fewer passes over the training data, often approaching batch performance with only a single pass through the data.}, author = {Koby Crammer and Mark Dredze and John Blitzer and Fernando Pereira}, booktitle = {NIPS Workshop on Efficient Machine Learning}, date-added = {2009-03-30 12:30:39 -0400}, date-modified = {2017-02-20 17:35:51 +0000}, file = {crammer_batch_online_nips07.pdf}, keywords = {workshop}, title = {Batch Performance for an Online Price}, year = {2007} } Batch learning techniques achieve good performance, but at the cost of many (sometimes even hundreds) of passes over the data. 
For many tasks, such as web-scale ranking of machine translation hypotheses, making many passes over the data is prohibitively expensive, even in parallel over thousands of machines. Online algorithms, which treat data as a stream of examples, are conceptually appealing for these large scale problems. In practice, however, online algorithms tend to underperform batch methods, unless they are themselves run in multiple passes over the data. In this work we explore a new type of online learning algorithm that incorporates a measure of confidence to the algorithm. The model maintains a confidence for each parameter, reflecting previously observed properties of the data. While this requires an additional parameter for each feature of the data, this is a minimal cost when compared to running the algorithm multiple times over the data. The resulting algorithm learns faster, requiring both fewer training instances and fewer passes over the training data, often approaching batch performance with only a single pass through the data.

		Mark Dredze, Krzysztof Czuba. Learning to Admit You're Wrong: Statistical Tools for Evaluating Web QA. NIPS Workshop on Machine Learning for Web Search, 2007. [PDF] [Bibtex] [Close] @inproceedings{dredze:08, abstract = {Web search engines provide specialized results to specific queries, often relying on the output of a QA system. However, targeted answers, while helpful, are embarrassing when wrong. Automated techniques are required to avoid wrong answers and improve system performance. We present the Expected Answer System, a statistical data-driven framework that analyzes the performance of a QA system with the goal of improving system accuracy. Our system is used for wrong answer prediction, missing answer discovery, and question class analysis. An empirical study of a production QA system, one of the first such evaluations presented in the literature, motivates our approach.}, author = {Mark Dredze and Krzysztof Czuba}, booktitle = {NIPS Workshop on Machine Learning for Web Search}, date-added = {2009-03-30 12:29:59 -0400}, date-modified = {2017-02-20 17:35:35 +0000}, file = {dredze_gqa_nips07.pdf}, keywords = {workshop}, title = {Learning to Admit You're Wrong: Statistical Tools for Evaluating Web QA}, year = {2007} } Web search engines provide specialized results to specific queries, often relying on the output of a QA system. However, targeted answers, while helpful, are embarrassing when wrong. Automated techniques are required to avoid wrong answers and improve system performance. We present the Expected Answer System, a statistical data-driven framework that analyzes the performance of a QA system with the goal of improving system accuracy. Our system is used for wrong answer prediction, missing answer discovery, and question class analysis. An empirical study of a production QA system, one of the first such evaluations presented in the literature, motivates our approach.

		Kedar Bellare, Partha Pratim Talukdar, Giridhar Kumaran, Fernando Pereira, Mark Liberman, Andrew McCallum, Mark Dredze. Lightly-Supervised Attribute Extraction for Web Search. NIPS Workshop on Machine Learning for Web Search, 2007. [PDF] [Bibtex] [Close] @inproceedings{bellare:08, abstract = {Web search engines can greatly benefit from knowledge about attributes of entities present in search queries. In this paper, we introduce lightly-supervised methods for extracting entity attributes from natural language text. Using these methods, we are able to extract large numbers of attributes of different entities at fairly high precision from a large natural language corpus. We compare our methods against a previously proposed pattern-based relation extractor, showing that the new methods give considerable improvements over that baseline. We also demonstrate that query expansion using extracted attributes improves retrieval performance on underspecified information-seeking queries.}, author = {Kedar Bellare and Partha Pratim Talukdar and Giridhar Kumaran and Fernando Pereira and Mark Liberman and Andrew McCallum and Mark Dredze}, booktitle = {NIPS Workshop on Machine Learning for Web Search}, date-added = {2009-03-30 12:28:57 -0400}, date-modified = {2017-02-20 17:36:27 +0000}, file = {bellare_attributes_nips07.pdf}, keywords = {workshop}, title = {Lightly-Supervised Attribute Extraction for Web Search}, year = {2007} } Web search engines can greatly benefit from knowledge about attributes of entities present in search queries. In this paper, we introduce lightly-supervised methods for extracting entity attributes from natural language text. Using these methods, we are able to extract large numbers of attributes of different entities at fairly high precision from a large natural language corpus. We compare our methods against a previously proposed pattern-based relation extractor, showing that the new methods give considerable improvements over that baseline. 
We also demonstrate that query expansion using extracted attributes improves retrieval performance on underspecified information-seeking queries.

		Neal Parikh, Mark Dredze. Graphical Models for Primarily Unsupervised Sequence Labeling. Technical Report MS-CIS-07-18, University of Pennsylvania, Department of Computer and Information Science, 2007. [PDF] [Bibtex] [Close] @techreport{parikh:07, abstract = {Most models used in natural language processing must be trained on large corpora of labeled text. This tutorial explores a 'primarily unsupervised' approach (based on graphical models) that augments a corpus of unlabeled text with some form of prior domain knowledge, but does not require any fully labeled examples. We survey probabilistic graphical models for (supervised) classification and sequence labeling and then present the prototype-driven approach of Haghighi and Klein (2006) to sequence labeling in detail, including a discussion of the theory and implementation of both conditional random fields and prototype learning. We show experimental results for English part of speech tagging.}, author = {Neal Parikh and Mark Dredze}, date-added = {2009-03-30 12:27:51 -0400}, date-modified = {2010-09-03 14:17:15 -0400}, file = {http://repository.upenn.edu/cis_reports/638/}, institution = {University of Pennsylvania, Department of Computer and Information Science}, number = {MS-CIS-07-18}, title = {Graphical Models for Primarily Unsupervised Sequence Labeling}, year = {2007} } Most models used in natural language processing must be trained on large corpora of labeled text. This tutorial explores a 'primarily unsupervised' approach (based on graphical models) that augments a corpus of unlabeled text with some form of prior domain knowledge, but does not require any fully labeled examples. We survey probabilistic graphical models for (supervised) classification and sequence labeling and then present the prototype-driven approach of Haghighi and Klein (2006) to sequence labeling in detail, including a discussion of the theory and implementation of both conditional random fields and prototype learning. 
We show experimental results for English part of speech tagging.

		Mark Dredze, Reuven Gevaryahu, Ari Elias-Bachrach. Learning Fast Classifiers for Image Spam. Conference on Email and Anti-Spam (CEAS), 2007. @inproceedings{mark-dredze:07, abstract = {Recently, spammers have proliferated image spam, emails which contain the text of the spam message in a human readable image instead of the message body, making detection by conventional content filters difficult. New techniques are needed to filter these messages. Our goal is to automatically classify an image directly as being spam or ham. We present features that focus on simple properties of the image, making classification as fast as possible. Our evaluation shows that they accurately classify spam images in excess of 90\% and up to 99\% on real world data. Furthermore, we introduce a new feature selection algorithm that selects features for classification based on their speed as well as predictive power. This technique produces an accurate system that runs in a tiny fraction of the time. Finally, we introduce Just in Time (JIT) feature extraction, which creates features at classification time as needed by the classifier. We demonstrate JIT extraction using a JIT decision tree that further increases system speed. This paper makes image spam classification practical by providing both high accuracy features and a method to learn fast classifiers.}, annote = {[<a href="http://www.cs.jhu.edu/~mdredze/datasets/image_spam/"><span class="pub_link">Data</span></a>]}, author = {Mark Dredze and Reuven Gevaryahu and Ari Elias-Bachrach}, booktitle = {Conference on Email and Anti-Spam (CEAS)}, date-added = {2009-03-30 12:27:22 -0400}, date-modified = {2010-09-03 14:18:04 -0400}, file = {image_spam_ceas07.pdf}, title = {Learning Fast Classifiers for Image Spam}, year = {2007} }

		Koby Crammer, Mark Dredze, Kuzman Ganchev, Partha Pratim Talukdar, Steven Carroll. Automatic Code Assignment to Medical Text. BioNLP Workshop at ACL, 2007. @inproceedings{koby-crammer:07, abstract = {Code assignment is important for handling large amounts of electronic medical data in the modern hospital. However, only expert annotators with extensive training can assign codes. We present a system for the assignment of ICD-9-CM clinical codes to free text radiology reports. Our system assigns a code configuration, predicting one or more codes for each document. We combine three coding systems into a single learning system for higher accuracy. We compare our system on a real world medical dataset with both human annotators and other automated systems, achieving nearly the maximum score on the Computational Medicine Center's challenge.}, author = {Koby Crammer and Mark Dredze and Kuzman Ganchev and Partha Pratim Talukdar and Steven Carroll}, booktitle = {BioNLP Workshop at ACL}, date-added = {2009-03-30 12:26:39 -0400}, date-modified = {2017-08-14 20:43:58 +0000}, file = {cmc_bionlp07.pdf}, keywords = {workshop}, pages = {129-136}, title = {Automatic Code Assignment to Medical Text}, year = {2007} }

		Mark Dredze, Hanna M Wallach. Email Keyword Summarization and Visualization with Topic Models. North East Student Colloquium on Artificial Intelligence (NESCAI), 2007. @inproceedings{dredze-2007, author = {Mark Dredze and Hanna M. Wallach}, booktitle = {North East Student Colloquium on Artificial Intelligence (NESCAI)}, date-added = {2009-03-30 12:24:43 -0400}, date-modified = {2009-03-30 12:24:43 -0400}, keywords = {workshop}, title = {Email Keyword Summarization and Visualization with Topic Models}, year = {2007} }

		John Blitzer, Mark Dredze, Fernando Pereira. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. Association for Computational Linguistics (ACL), 2007. @inproceedings{blitzer-07, abstract = {Automatic sentiment classification has been extensively studied and applied in recent years. However, sentiment is expressed differently in different domains, and annotating corpora for every possible domain of interest is impractical. We investigate domain adaptation for sentiment classifiers, focusing on online reviews for different types of products. First, we extend to sentiment classification the recently-proposed structural correspondence learning (SCL) algorithm, reducing the relative error due to adaptation between domains by an average of 30\% over the original SCL algorithm and 46\% over a supervised baseline. Second, we identify a measure of domain similarity that correlates well with the potential for adaptation of a classifier from one domain to another. This measure could for instance be used to select a small set of domains to annotate whose trained classifiers would transfer well to many other domains.}, annote = {(<b>Over 1000 citations</b>)}, author = {John Blitzer and Mark Dredze and Fernando Pereira}, booktitle = {Association for Computational Linguistics (ACL)}, date-added = {2009-03-30 12:23:36 -0400}, date-modified = {2017-08-09 14:06:18 +0000}, file = {sentiment_acl07.pdf}, pages = {440-447}, title = {Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification}, year = {2007} }

		Mark Dredze, John Blitzer, Partha Pratim Talukdar, Kuzman Ganchev, Joao Graca, Fernando Pereira. Frustratingly Hard Domain Adaptation for Dependency Parsing. Shared Task - Conference on Natural Language Learning - CoNLL 2007 shared task, 2007. @inproceedings{dredze:07a, abstract = {We describe some challenges of adaptation in the 2007 CoNLL Shared Task on Domain Adaptation. Our error analysis for this task suggests that a primary source of error is differences in annotation guidelines between treebanks. Our suspicions are supported by the observation that no team was able to improve target domain performance substantially over a state of the art baseline.}, author = {Mark Dredze and John Blitzer and Partha Pratim Talukdar and Kuzman Ganchev and Joao Graca and Fernando Pereira}, booktitle = {Shared Task - Conference on Natural Language Learning - CoNLL 2007 shared task}, date-added = {2009-03-30 12:23:36 -0400}, date-modified = {2017-08-09 14:07:23 +0000}, file = {adaptation_conll07.pdf}, pages = {1051-1055}, title = {Frustratingly Hard Domain Adaptation for Dependency Parsing}, year = {2007} }

		2006 (4 Publications)
		Mark Dredze, John Blitzer, Koby Crammer, Fernando Pereira. Feature Design for Transfer Learning. North East Student Colloquium on Artificial Intelligence (NESCAI), 2006. @inproceedings{dredze-2006, author = {Mark Dredze and John Blitzer and Koby Crammer and Fernando Pereira}, booktitle = {North East Student Colloquium on Artificial Intelligence (NESCAI)}, date-added = {2009-03-30 12:24:43 -0400}, date-modified = {2010-09-03 14:23:42 -0400}, file = {transfer_nescai_06.pdf}, keywords = {workshop}, title = {Feature Design for Transfer Learning}, year = {2006} }

		Mark Dredze, John Blitzer, Fernando Pereira. "Sorry, I Forgot the Attachment:" Email Attachment Prediction. Conference on Email and Anti-Spam (CEAS), 2006. @inproceedings{dredze-06b, abstract = {The missing attachment problem: a missing attachment generates a wave of emails from the recipients notifying the sender of the error. We present an attachment prediction system to reduce the volume of missing attachment mail. Our classifier could prompt an alert when an outgoing email is missing an attachment. Additionally, the system could activate an attachment recommendation system, whereby suggested documents are offered once the system determines the user is likely to include an attachment, effectively reminding the user to include the attachment. We present promising initial results and discuss implications of our work.}, author = {Mark Dredze and John Blitzer and Fernando Pereira}, booktitle = {Conference on Email and Anti-Spam (CEAS)}, date-added = {2009-03-30 12:24:43 -0400}, date-modified = {2010-09-03 14:22:25 -0400}, file = {attachment_ceas06.pdf}, title = {``{S}orry, {I} Forgot the Attachment:'' {E}mail Attachment Prediction}, year = {2006} }

		Mark Dredze, Tessa Lau, Nicholas Kushmerick. Automatically classifying emails into activities. Intelligent User Interfaces (IUI), 2006. @inproceedings{dredze-06, abstract = {Email-based activity management systems promise to give users better tools for managing increasing volumes of email, by organizing email according to a user's activities. Current activity management systems do not automatically classify incoming messages by the activity to which they belong, instead relying on simple heuristics (such as message threads), or asking the user to manually classify incoming messages as belonging to an activity. This paper presents several algorithms for automatically recognizing emails as part of an ongoing activity. Our baseline methods are the use of message reply-to threads to determine activity membership and a naive Bayes classifier. Our SimSubset and SimOverlap algorithms compare the people involved in an activity against the recipients of each incoming message. Our SimContent algorithm uses IRR (a variant of latent semantic indexing) to classify emails into activities using similarity based on message contents. An empirical evaluation shows that each of these methods provides a significant improvement to the baseline methods. In addition, we show that a combined approach that votes the predictions of the individual methods performs better than each individual method alone.}, author = {Mark Dredze and Tessa Lau and Nicholas Kushmerick}, booktitle = {Intelligent User Interfaces (IUI)}, date-added = {2009-03-30 12:24:43 -0400}, date-modified = {2017-08-09 14:05:00 +0000}, file = {iui06-dredze.pdf}, pages = {70-77}, title = {Automatically classifying emails into activities}, year = {2006} }

		Nicholas Kushmerick, Tessa Lau, Mark Dredze, Rinat Khoussainov. Activity-Centric Email: A Machine Learning Approach. American National Conference on Artificial Intelligence (AAAI) (Nectar), 2006. @inproceedings{kushmerick-06, author = {Nicholas Kushmerick and Tessa Lau and Mark Dredze and Rinat Khoussainov}, booktitle = {American National Conference on Artificial Intelligence (AAAI) (Nectar)}, date-added = {2009-03-30 12:24:18 -0400}, date-modified = {2017-08-09 14:03:56 +0000}, file = {kushmerick-aaai06-nectar.pdf}, pages = {1634-1637}, title = {Activity-Centric Email: A Machine Learning Approach}, year = {2006} }

		2005 (3 Publications)
		Rie Kubota Ando, Mark Dredze, Tong Zhang. TREC 2005 Genomics Track Experiments at IBM Watson. Text REtrieval Conference (TREC), 2005. @inproceedings{ando:05, annote = {(Group invited talk at TREC 2005, ranked 3rd and 4th out of 53 entries)}, author = {Rie Kubota Ando and Mark Dredze and Tong Zhang}, booktitle = {Text {REtrieval} Conference (TREC)}, date-added = {2009-03-30 12:25:18 -0400}, date-modified = {2010-09-03 14:24:33 -0400}, file = {ibm_trec05_genomics.pdf}, keywords = {workshop}, title = {{TREC} 2005 Genomics Track Experiments at IBM Watson}, year = {2005} }

		Mark Dredze, John Blitzer, Fernando Pereira. Reply Expectation Prediction for Email Management. Conference on Email and Anti-Spam (CEAS), 2005. @inproceedings{dredze-05, abstract = {We reduce email overload by addressing the problem of waiting for a reply to one's email. We predict whether sent and received emails necessitate a reply, enabling the user to both better manage his inbox and to track mail sent to others. We discuss the features used to discriminate emails, show promising initial results with a logistic regression model, and outline future directions for this work.}, author = {Mark Dredze and John Blitzer and Fernando Pereira}, booktitle = {Conference on Email and Anti-Spam (CEAS)}, date-added = {2009-03-30 12:24:43 -0400}, date-modified = {2010-09-03 14:25:25 -0400}, file = {dredze_ceas05.pdf}, title = {Reply Expectation Prediction for Email Management}, year = {2005} }

		Catalina Danis, Wendy Kellogg, Tessa Lau, Mark Dredze, Jeffrey Stylos, Nicholas Kushmerick. Managers Email: Beyond Tasks and To-Dos. Conference on Human Factors in Computing Systems (CHI) (Extended Abstracts), 2005. @inproceedings{danis-2005, abstract = {In this paper, we describe preliminary findings that indicate that managers and non-managers think about their email differently. We asked three research managers and three research non-managers to sort about 250 of their own email messages into categories that "would help them to manage their work." Our analyses indicate that managers create more categories and a more differentiated category structure than non-managers. Our data also suggest that managers create "relationship-oriented" categories more often than non-managers. These results are relevant to research on "email overload" that has highlighted the use of email for activities beyond communication. In particular, our findings suggest that too strong a focus on task management may be incomplete, and that a user's organizational role has an impact on their conceptualization and likely use of email.}, author = {Catalina Danis and Wendy Kellogg and Tessa Lau and Mark Dredze and Jeffrey Stylos and Nicholas Kushmerick}, booktitle = {Conference on Human Factors in Computing Systems (CHI) (Extended Abstracts)}, date-added = {2009-03-30 12:24:43 -0400}, date-modified = {2017-08-09 14:01:25 +0000}, file = {danis_chi2005.pdf}, pages = {1324-1327}, title = {Managers Email: Beyond Tasks and To-Dos}, year = {2005} }

		2004 (1 Publication)
		Mark Dredze, Jeffrey Stylos, Tessa Lau, Wendy Kellogg, Catalina Danis, Nicholas Kushmerick. Taxie: Automatically identifying tasks in email. Unpublished Manuscript, 2004. @unpublished{dredze:04, author = {Mark Dredze and Jeffrey Stylos and Tessa Lau and Wendy Kellogg and Catalina Danis and Nicholas Kushmerick}, date-added = {2010-09-03 14:25:41 -0400}, date-modified = {2010-09-03 14:26:26 -0400}, title = {Taxie: Automatically identifying tasks in email}, year = {2004} }

		2003 (1 Publication)
		Kevin Livingston, Mark Dredze, Kristian Hammond, Larry Birnbaum. Beyond Broadcast. International Conference on Intelligent User Interfaces (IUI), 2003. @inproceedings{livingston:03, abstract = {This research discusses a method for delivering just-in-time information to television viewers to provide more depth and more breadth to television broadcasts. A novel aspect of this research is that it uses broadcast news as a starting point for gathering information regarding specific stories, as opposed to considering the broadcast version to be the end of the viewer's exploration. This work is implemented in Cronkite, a system that provides viewers with expanded coverage of broadcast news stories.}, author = {Kevin Livingston and Mark Dredze and Kristian Hammond and Larry Birnbaum}, booktitle = {International Conference on Intelligent User Interfaces (IUI)}, date-added = {2010-09-03 14:26:39 -0400}, date-modified = {2017-08-09 14:00:00 +0000}, file = {https://dl.acm.org/citation.cfm?id=604093}, pages = {260-262}, title = {Beyond Broadcast}, year = {2003} }