CLPsych 2015 Shared Task Evaluation

Task: Depression and PTSD on Twitter

Task organizers: Glen Coppersmith, Mark Dredze, Craig Harman, Kristy Hollingshead, Margaret Mitchell

Maintained by: Mark Dredze

Last updated: February 7, 2018

The Computational Linguistics and Clinical Psychology (CLPsych) workshop has hosted shared and unshared tasks for several years.

In 2015 the shared task used data from Twitter users who state a diagnosis of depression or post-traumatic stress disorder (PTSD), along with demographically matched community controls. The shared task provided an apples-to-apples comparison of various approaches to modeling language relevant to mental health from social media. It consisted of three binary classification experiments: (1) depression versus control, (2) PTSD versus control, and (3) depression versus PTSD.
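As a rough illustration of how the three binary experiments relate to one set of labeled users, the sketch below splits a user-to-condition mapping into the three tasks. It is not the organizers' code: the label strings ("depression", "ptsd", "control"), the task names, the function build_binary_task, and the toy user IDs are all hypothetical, and the actual data files may encode conditions differently.

```python
from typing import Dict, List, Tuple

# Hypothetical label strings and task names, chosen for illustration only.
TASKS: Dict[str, Tuple[str, str]] = {
    "depression_vs_control": ("depression", "control"),
    "ptsd_vs_control": ("ptsd", "control"),
    "depression_vs_ptsd": ("depression", "ptsd"),
}


def build_binary_task(
    user_labels: Dict[str, str], positive: str, negative: str
) -> List[Tuple[str, int]]:
    """Keep only users in the two named conditions.

    Users with the positive condition get label 1, users with the
    negative condition get label 0, and all other users are dropped.
    """
    examples: List[Tuple[str, int]] = []
    for user_id, label in sorted(user_labels.items()):
        if label == positive:
            examples.append((user_id, 1))
        elif label == negative:
            examples.append((user_id, 0))
    return examples


# Made-up user IDs; these are not drawn from the shared task data.
toy_labels = {"u1": "depression", "u2": "control", "u3": "ptsd", "u4": "control"}

task_examples = {
    name: build_binary_task(toy_labels, pos, neg)
    for name, (pos, neg) in TASKS.items()
}
```

Each entry of task_examples then holds only the users relevant to that experiment, so a PTSD user never appears in the depression versus control task.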

This site provides instructions on how to obtain the data used in the shared task, as well as links to associated resources.

How can I get the data?

You will need to complete the following tasks to obtain the shared task data.

  1. Notify Mark Dredze that you intend to request the data.
  2. Obtain a letter from your Institutional Review Board (IRB), or equivalent ethics board, stating that it has approved your proposed project and use of the data. Your IRB may rule this an exempt study or require a review of the research protocols. Either way, you must produce a letter from the IRB approving the project.
  3. Complete the Data Use and Confidentiality Agreement.

What is an IRB?

An institutional review board (IRB) is a committee that applies research ethics by reviewing the methods proposed for research to ensure that they are ethical. IRB approval is (typically) required for human subjects research in the United States. See the Wikipedia page for more information.

If you are outside the United States, your institution will typically have an equivalent ethics board. See HHS Office for Human Research Protections International Guidelines for more information.

How do I get started with an IRB application?

Your university will have an IRB coordinator or administrator. Start by talking to this person.

Do you have advice for how to write the IRB application?

If this is your first IRB application, you should discuss the proposed project with your IRB contact or administrator. You may also want to ask a colleague for an example IRB application.

For issues specific to social media data and health research, we suggest:

Adrian Benton, Glen Coppersmith, Mark Dredze. Ethical Research Protocols for Social Media Health Research. EACL Workshop on Ethics in Natural Language Processing, 2017.


Where can I find a description of the data?

See the shared task overview paper:

Glen Coppersmith, Mark Dredze, Craig Harman, Kristy Hollingshead, Margaret Mitchell. CLPsych 2015 Shared Task: Depression and PTSD on Twitter. NAACL Workshop on Computational Linguistics and Clinical Psychology, 2015.


Please cite this paper as the reference for the data.

Is there any software available for working with this data?

This GitHub project contains code for working with the data and running evaluations:

Where can I find the original shared task description?

Who else has used the data?

The teams who participated in the original shared task submitted papers, which are available in the official ACL proceedings. They are listed here:

Can I use this data commercially?

The data usage agreement prohibits the use of this data for:

commercial purposes of any kind, including but not limited to algorithm development or evaluation, model development or evaluation, evaluation of features, feature engineering, reports, or visualizations used for any for-profit purpose, where for-profit purposes include but are not limited to prototyping, product development, marketing, public relations, or pursuit of funding.

Please contact us with questions about how the data can be used commercially.