Natural Language Processing for Clinical Data: Continuous Success at i2b2 Challenges

2011 – This year we again took part in the annual i2b2 shared task, an international text mining challenge in the clinical/health-care domain. The team was composed of members from the University of Novi Sad (Kovacevic, A.) and the University of Manchester (Dehghan, A., Nenadic, G. and Keane, J.). The aim of the challenge (Fifth/i2b2, Track II: Sentiment analysis) was to classify statements in suicide notes, at line level, into 15 categories (i.e., emotions and expressions).

The challenge was, to say the least, most interesting this year. Despite some surprises, we ranked eighth out of 26 participating teams. We were also one of only five teams invited to give a talk at the workshop and to contribute a full-text publication.

2010 – A team of staff from Manchester’s School of Computer Science (Irena Spasic, Farzaneh Sarafraz, John A. Keane and Goran Nenadic) again took part in the Third i2b2 shared task. The challenge was organised by Informatics for Integrating Biology and the Bedside (i2b2).

This year, the aim was the extraction of medication-related information from narrative patient records. For each medication mention, details (such as medication name, dosage, reason for taking, frequency, duration etc.) were provided by the participants and evaluated against a manually extracted gold standard, which was generated through collaborative annotation by all participating teams.
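To make the shape of the task output concrete, one medication mention could be held in a small record and compared field by field with a gold-standard entry. This is a minimal sketch under stated assumptions: the class name `MedicationEntry`, the helper `matching_fields`, the field names and the sample values are all hypothetical, and the exact-match comparison is not the official i2b2 format or scorer, nor our system's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MedicationEntry:
    """One medication mention with the details named in the task description.

    Hypothetical structure for illustration; not the official i2b2 format.
    """
    name: str
    dosage: Optional[str] = None
    frequency: Optional[str] = None
    duration: Optional[str] = None
    reason: Optional[str] = None


def matching_fields(predicted: MedicationEntry, gold: MedicationEntry) -> int:
    """Count the fields whose predicted value exactly equals the gold value."""
    fields = ("name", "dosage", "frequency", "duration", "reason")
    return sum(getattr(predicted, f) == getattr(gold, f) for f in fields)


# Invented sample values, for illustration only.
gold = MedicationEntry("aspirin", dosage="81 mg", frequency="daily", reason="chest pain")
pred = MedicationEntry("aspirin", dosage="81 mg", frequency="daily")

# The prediction misses only the reason; the unset duration matches on both sides.
print(matching_fields(pred, gold))
```

Exact string equality is the simplest possible comparison; a real evaluation would likely need to handle partial or overlapping text spans as well.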

We are pleased to announce that our team repeated last year’s success and was among the top-ranked teams for the second year running. Overall, the team was ranked third out of the 19 teams taking part, at the same significance level as the second-ranked team.

2009 – More information on the 2009 challenge can be found on the i2b2 Web site: the Third Shared Task in Natural Language Processing for Clinical Data: Medication Extraction Challenge.

2008 – Our team was announced as the winner of one of the two tasks in the Second Shared Challenge in Natural Language Processing for Clinical Data: Obesity Challenge: Who’s obese and what co-morbidities do they (definitely/likely) have?

The goal of the 2008 challenge was to evaluate NLP systems on their ability to recognise whether a patient is obese and what co-morbidities they exhibit. The data consisted of hospital discharge summaries, and obesity information and co-morbidities were marked at a document level as present, absent, questionable or unmentioned. For each patient, both textual judgments (what the text explicitly states about obesity and co-morbidities) and intuitive judgments (what the text implies about obesity and co-morbidities) were provided by the participants.

There were 28 teams taking part in the 2008 challenge. Our team was announced as the winner of the textual task (97.2% accuracy) and was ranked 7th in the intuitive judgment task (95.7% accuracy).
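As an illustration of how accuracy over document-level labels of this kind can be computed, the sketch below scores predictions keyed by (document, condition) pairs against a gold standard. It is a simplified example under stated assumptions, not the challenge's official scorer: the names `Judgment` and `accuracy`, the document identifiers and the sample labels are all hypothetical.

```python
from enum import Enum


class Judgment(Enum):
    """The four document-level labels used in the 2008 challenge."""
    PRESENT = "present"
    ABSENT = "absent"
    QUESTIONABLE = "questionable"
    UNMENTIONED = "unmentioned"


def accuracy(predicted: dict, gold: dict) -> float:
    """Fraction of gold (document, condition) pairs labelled identically.

    A pair missing from `predicted` counts as an error.
    """
    if not gold:
        raise ValueError("gold standard must not be empty")
    correct = sum(predicted.get(key) == label for key, label in gold.items())
    return correct / len(gold)


# Invented sample data, for illustration only.
gold = {
    ("doc1", "obesity"): Judgment.PRESENT,
    ("doc1", "diabetes"): Judgment.UNMENTIONED,
    ("doc2", "obesity"): Judgment.ABSENT,
    ("doc2", "diabetes"): Judgment.QUESTIONABLE,
}
pred = dict(gold)
pred[("doc2", "diabetes")] = Judgment.ABSENT  # one deliberate error

print(accuracy(pred, gold))  # 3 of 4 pairs agree
```

The same function applies to both textual and intuitive judgments; only the gold-standard labels differ between the two tasks.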


  • Kovacevic, A., Dehghan, A., Keane, J., Nenadic, G.: Topic Categorisation of Statements in Suicide Notes with Integrated Rules and Machine Learning, Biomedical Informatics Insights, in press, 2012.
  • Spasic, I., Sarafraz, F., Keane, J., Nenadic, G.: Medication Information Extraction with Linguistic Pattern Matching and Semantic Rules, Proceedings of the i2b2 2009 Workshop.
  • Yang, H., Spasic, I., Keane, J., Nenadic, G.: A Text Mining Approach to the Prediction of a Disease Status from Clinical Discharge Summaries, Journal of the American Medical Informatics Association, 16(4):596-600.