IEEE BigData 2026 Cup: Explainable Suicide Risk Assessment on Social Media

The IEEE BigData 2026 Cup Challenge on Explainable Suicide Risk Assessment on Social Media is part of the annual Big Data Cup series held under the auspices of the IEEE International Conference on Big Data (https://bigdataieee.org/BigData2026/). This competition introduces a dual-objective design that advances both predictive accuracy and clinical interpretability: participants must (1) predict the suicide risk level of a social media post, and (2) extract structured clinical evidence that explains the prediction. The top 8 teams will be invited to submit papers describing their solutions. Accepted papers will be presented at the IEEE BigData 2026 conference (Phoenix, Arizona, USA, December 14–17, 2026).

The topic of this year's competition is suicide risk assessment from Reddit posts, with a focus on both risk-level prediction and clinical evidence extraction. The dataset contains Reddit posts collected from mental health communities (e.g., r/SuicideWatch), annotated by trained annotators following established clinical risk assessment protocols grounded in the Interpersonal Theory of Suicide (IPTS), the Integrated Motivational-Volitional (IMV) Model, and the Fluid Vulnerability Theory (FVT).

This challenge introduces clinical interpretability as a core evaluation dimension. Unlike previous editions that focused solely on classification, the 2026 challenge requires participants to provide evidence for their predictions—spanning risk factors, protective factors, and warning signs present in the post text.

Authors of selected challenge reports will be invited to extend their work for publication in the IEEE BigData 2026 conference proceedings (subject to review by the Organizing Committee) and to present at the ASRAM Workshop (AI for Suicide Risk Assessment on Social Media) co-located with the conference. Invited teams will be selected based on their final rank, the innovativeness of their approach, and the quality of the submitted report.

This challenge consists of two subtasks. Participants may compete in one or both subtasks. The final composite score is weighted 60% on Subtask 1 and 40% on Subtask 2, with Macro F1 used as the evaluation metric for both.

Subtask 1: Suicide Risk Level Prediction (Weight: 60%)
Given a Reddit post, predict the author's suicide risk level as one of four categories: no_risk, low_risk, moderate_risk, or high_risk.

Evaluation (Subtask 1): Macro F1 score across all four risk levels.

Subtask 2: Structured Clinical Evidence Extraction (Weight: 40%)
Given the same post, identify and categorize text spans that serve as clinical evidence for the risk-level assessment. Evidence spans fall into three categories: risk factors, protective factors, and warning signs.

Evaluation (Subtask 2): Macro F1 score across all three evidence categories.

Composite Scoring: The final leaderboard score is computed as:
Final Score = 0.60 × Macro_F1(Subtask1) + 0.40 × Macro_F1(Subtask2)
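The scoring formula above can be sketched in Python. Macro F1 is the unweighted mean of per-class F1 scores; the code below is an illustrative sketch only, not the official scorer, and the label strings mirror the submission format described later on this page.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores over the given label set."""
    f1s = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Subtask 1 labels as given in the submission format.
RISK_LABELS = ["no_risk", "low_risk", "moderate_risk", "high_risk"]

def composite_score(f1_subtask1, f1_subtask2):
    """Final leaderboard score: 60% Subtask 1 + 40% Subtask 2."""
    return 0.60 * f1_subtask1 + 0.40 * f1_subtask2
```

For example, a team scoring 0.70 Macro F1 on Subtask 1 and 0.50 on Subtask 2 would receive a composite score of 0.60 × 0.70 + 0.40 × 0.50 = 0.62.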
Teams that participate in Subtask 1 only will be scored on Subtask 1 alone (no penalty for skipping Subtask 2, but they are ineligible for the composite leaderboard).

Submission format: Please submit your predictions as a .json file with the following structure. The file name must be YourTeamName.json.

Field        Type     Description
post_id      string   The post identifier from the test set
risk_level   string   One of: no_risk, low_risk, moderate_risk, high_risk
evidence     list     List of {span, category} dicts for Subtask 2 (empty list if not participating)

A sample submission file and format validator script will be provided with the dataset release.
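Until the official validator is released, a minimal format check might look like the sketch below. It assumes the top-level structure of YourTeamName.json is a JSON list of records with the fields in the table above; the official validator script shipped with the dataset is authoritative and may enforce additional rules.

```python
import json

VALID_RISK_LEVELS = {"no_risk", "low_risk", "moderate_risk", "high_risk"}

def validate_records(records):
    """Return a list of error strings (empty if the structure looks valid)."""
    if not isinstance(records, list):
        return ["top-level structure must be a list of prediction records"]
    errors = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            errors.append(f"record {i}: must be a dict")
            continue
        if not isinstance(rec.get("post_id"), str):
            errors.append(f"record {i}: 'post_id' must be a string")
        if rec.get("risk_level") not in VALID_RISK_LEVELS:
            errors.append(f"record {i}: 'risk_level' must be one of "
                          f"{sorted(VALID_RISK_LEVELS)}")
        evidence = rec.get("evidence")
        if not isinstance(evidence, list) or any(
            not isinstance(e, dict) or set(e) != {"span", "category"}
            for e in evidence
        ):
            errors.append(f"record {i}: 'evidence' must be a list of "
                          "{span, category} dicts")
    return errors

def validate_submission(path):
    """Validate a YourTeamName.json submission file on disk."""
    with open(path, encoding="utf-8") as f:
        return validate_records(json.load(f))
```

Running such a check before uploading can catch malformed files before they cost one of the day's three submission slots.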

Final evaluation: The final evaluation will be conducted after the submission deadline using a held-out test set. Only teams that submit both their prediction file and a working notes report by the deadline will qualify for the final evaluation. Code and reports will be submitted via Google Drive or Baidu Drive; instructions will be sent by email in advance.

Multiple submissions are permitted during the evaluation phase (up to 3 per day). Submit your predictions as a .json file named YourTeamName.json; scores of uploaded prediction files will be posted to the leaderboard the following day. For a detailed explanation of the submission format, please refer to the 'Task Description' section above.

Based on the submitted work, teams will be evaluated according to the following selection criteria: final rank, innovativeness of the approach, and quality of the submitted report.

Top-performing teams will be invited to submit a paper describing their solution (up to 10 pages, double-column IEEE format) for the IEEE BigData 2026 proceedings and to present at the conference.

Paper submission system: https://bigdataieee.org/BigData2026/ (link to be updated when the chairs open the proceedings submission portal).
Paper format: Same as the main conference paper — 10 pages, double-column IEEE format.

The leaderboard will be updated daily during the evaluation phase (starting June 1, 2026). Results below show the composite score (60% Subtask 1 + 40% Subtask 2, Macro F1).


Rank Team Name Subtask 1 (Macro F1) Subtask 2 (Macro F1) Composite Score
— Leaderboard will be populated once the evaluation phase opens (June 1, 2026) —

Cash prizes will be awarded to the top 3 teams. We will contact the winning teams directly after the final evaluation.

Schedule


Certificates of achievement will be issued to all teams. The top 8 teams are also invited to present at the IEEE BigData 2026 conference.


Enroll

Once you have read and accepted the Data Usage Agreement below, please send your team's information to the registration email address in the following format. We will respond with the dataset download link.

We accept the Competition Data Usage Agreement



Registration email address: ieee.bigdata2026@outlook.com

For registration and general inquiries, contact Alex at hialexlee@hotmail.com

Q: Do I need to submit a formal letter of intent?
Participants do not need to submit a formal letter of intent. To register, simply send us an email by June 1, 2026 with your team information. Once you receive a reply with the dataset link, your registration is confirmed.
Q: Can I participate in only one subtask?
Yes. Subtask 1 (Risk Level Prediction) is the primary task, and you may choose to participate in it alone; such teams are scored on Subtask 1 only and are ineligible for the composite leaderboard. Teams that complete both subtasks are ranked on the 60/40 composite leaderboard.
Q: How many submissions are allowed per day?
During the evaluation phase, teams may submit up to 3 prediction files per day.
Q: Are pre-trained language models and LLMs allowed?
Yes. Participants may use any publicly available pre-trained model, including large language models (LLMs). Use of external labeled data beyond what is provided in the dataset must be declared in the report.
Q: I uploaded predictions but the leaderboard score has not updated yet.
We update the leaderboard each morning. If your score has not appeared after 24 hours, please notify us by email and we will update it immediately.
Q: What is the maximum team size?
There is no strict maximum team size. All team members must be listed at registration and in the submitted report.
Q: Will the test set labels be released after the competition?
Yes. After the final evaluation and announcement of results, the complete annotated test set will be released to all registered participants for research purposes, subject to the Data Usage Agreement.