CliNER

About CliNER

CliNER is a command-line tool that identifies mentions of four categories of clinically relevant events: Problems, Tests, Treatments, and Clinical Departments. It also recognises and normalises clinical temporal expressions. It was developed for the i2b2 2012 text mining challenge and has therefore been trained and optimised on the i2b2 data.

For example, for input:

He had surgery about 3 weeks ago and had the lining cleaned and a biopsy was performed.

CliNER produces the following output (standoff XML):

<?xml version="1.0" encoding="UTF-8" ?>
<ClinicalNarrativeTemporalAnnotation>
<TEXT><![CDATA[
He had surgery about 3 weeks ago and had the lining cleaned and a biopsy was performed.
]]></TEXT>
<TAGS>
<EVENT id="E1" start="65" end="73" text="a biopsy" modality="FACTUAL" polarity="POS" type="TEST" />
<EVENT id="E1" start="8" end="15" text="surgery" modality="FACTUAL" polarity="POS" type="TREATMENT" />
<TIMEX3 id="T1" start="22" end="33" text="3 weeks ago" type="DATE" val="2014-07-29" mod="NA" />
</TAGS>
</ClinicalNarrativeTemporalAnnotation>
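
As a rough illustration of how this output might be consumed, the Python sketch below parses the example and checks that each start/end pair selects the annotated text. It assumes, based on this single example only, that offsets are counted from the beginning of the CDATA content (including its leading newline) and that end is exclusive. The helper name read_annotations and the reference date are hypothetical and are not part of CliNER.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

# The example output, copied from above, with the root element closed so that it parses.
STANDOFF = """<?xml version="1.0" encoding="UTF-8" ?>
<ClinicalNarrativeTemporalAnnotation>
<TEXT><![CDATA[
He had surgery about 3 weeks ago and had the lining cleaned and a biopsy was performed.
]]></TEXT>
<TAGS>
<EVENT id="E1" start="65" end="73" text="a biopsy" modality="FACTUAL" polarity="POS" type="TEST" />
<EVENT id="E1" start="8" end="15" text="surgery" modality="FACTUAL" polarity="POS" type="TREATMENT" />
<TIMEX3 id="T1" start="22" end="33" text="3 weeks ago" type="DATE" val="2014-07-29" mod="NA" />
</TAGS>
</ClinicalNarrativeTemporalAnnotation>
"""


def read_annotations(xml_string):
    """Return (tag name, type, annotated text, text selected by the offsets) per tag."""
    root = ET.fromstring(xml_string.encode("utf-8"))
    text = root.findtext("TEXT")  # CDATA content, including its leading newline
    rows = []
    for tag in root.find("TAGS"):
        start, end = int(tag.get("start")), int(tag.get("end"))
        rows.append((tag.tag, tag.get("type"), tag.get("text"), text[start:end]))
    return rows


annotations = read_annotations(STANDOFF)
for name, kind, annotated, selected in annotations:
    # The last column is True when the offsets select exactly the annotated text.
    print(name, kind, repr(annotated), annotated == selected)

# Assumption: a hypothetical reference date, chosen so that "3 weeks ago"
# resolves to the val shown above; CliNER's actual anchoring rule is not stated here.
REFERENCE_DATE = date(2014, 8, 19)
three_weeks_ago = (REFERENCE_DATE - timedelta(weeks=3)).isoformat()
print(three_weeks_ago)
```

The val attribute of the TIMEX3 tag is a normalised calendar date, so the relative expression has to be resolved against some reference date; the last lines show only the arithmetic involved.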

System Architecture

  • Implemented in Java, using cTAKES, CRF++ and Clinical NorMA
  • Supports multiple formats, currently:
    • standoff XML
    • character offset-based format

Algorithm details

Conditional random fields (CRFs) with the IO (inside/outside) labelling scheme and five groups of features:

  • Lexical features include the token itself, its lemma, and its POS tag, as well as the lemmas and POS tags of the surrounding tokens. Each token is also assigned features from its associated chunk (phrase): the type of phrase (nominal, verbal, etc.), tense and aspect (if the phrase is verbal), the location of the token within the chunk (beginning or inside), and the presence of negation.
  • Domain features capture mentions of specific clinical/healthcare concepts. Mentions of Problem, Test, and Treatment (as generated by cTAKES) are assigned to the token.
  • Semantic role features model dependencies between the token and its associated verb. Each token is assigned the role, the verb, and their combination (e.g., ‘object+perform’) in order to capture particular verb–role preferences.
  • The section type feature represents the type of section in which the token appears.
  • Temporal expression (TE) features are five features that indicate the presence, in a given token, of each of the five common types of TE constituents.
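
To make the IO scheme and the combined role/verb feature more concrete, here is a small Python sketch. It is not CliNER's implementation (which is written in Java): the whitespace tokenisation, the function names io_labels and semantic_role_features, the "I-" label prefix, and the 0-based, end-exclusive offsets into the bare sentence are all assumptions made for illustration.

```python
import re


def io_labels(text, spans):
    """Label each whitespace-delimited token under the IO scheme.

    spans holds (start, end, type) triples with 0-based, end-exclusive offsets.
    A token inside a span is labelled "I-<type>"; every other token is "O".
    """
    labelled = []
    for match in re.finditer(r"\S+", text):
        label = "O"
        for start, end, span_type in spans:
            if match.start() >= start and match.end() <= end:
                label = "I-" + span_type
                break
        labelled.append((match.group(), label))
    return labelled


def semantic_role_features(role, verb):
    """Return the role, the verb, and their combination as separate features."""
    return {"role": role, "verb": verb, "role+verb": role + "+" + verb}


sentence = ("He had surgery about 3 weeks ago and had the lining cleaned "
            "and a biopsy was performed.")
# Event spans from the example output, re-expressed as offsets into the bare sentence.
event_spans = [(7, 14, "TREATMENT"), (64, 72, "TEST")]

labels = io_labels(sentence, event_spans)
for token, label in labels:
    print(token, label)

print(semantic_role_features("object", "perform"))
```

Unlike BIO-style schemes, IO has no separate label for the first token of a mention, so both "a" and "biopsy" receive the same inside label here.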

More details on the architecture and the performance of the tool can be found in the paper below. Please cite this publication if you use CliNER:

Kovačević, A., Dehghan, A., Filannino, M., Keane, J. A., & Nenadic, G. (2013). Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives. Journal of the American Medical Informatics Association, 20(5), 859-866.

Contact Aleksandar Kovacevic (http://informatika.ftn.uns.ac.rs/AleksandarKovacevic/, kocha78@gmail) with any questions, bug reports, or suggestions.