Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives

Kovačević A, Dehghan A, Filannino M, Keane J, Nenadic G
Journal of the American Medical Informatics Association; 2013

Objective: Identifying clinical events (eg, problems, tests, treatments) and identifying associated temporal expressions (eg, dates and times) are key tasks in extracting and managing data from electronic health records. As part of the i2b2 2012 Natural Language Processing for Clinical Data challenge, we developed and evaluated a system to automatically extract temporal expressions and events from clinical narratives. The extracted temporal expressions were additionally normalized by assigning type, value, and modifier.

Materials and methods: The system combines rule-based and machine learning approaches that rely on morphological, lexical, syntactic, semantic, and domain-specific features. Rule-based components were designed to handle the recognition and normalization of temporal expressions, while conditional random fields models were trained for event and temporal recognition.

Results: The system achieved micro F scores of 90% for the extraction of temporal expressions and 87% for clinical event extraction. The normalization component for temporal expressions achieved accuracies of 84.73% (expression’s type), 70.44% (value), and 82.75% (modifier).

Discussion: Compared to the initial agreement between human annotators (87–89%), the system provided comparable performance for both event and temporal expression mining. While (lenient) identification of such mentions is achievable, finding the exact boundaries proved challenging.

Conclusions: The system provides a state-of-the-art method that can be used to support automated identification of mentions of clinical events and temporal expressions in narratives, either to support the manual review process or as part of large-scale processing of electronic health databases.
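
To make the rule-based recognition and normalization step described in the methods more concrete, the following is a minimal sketch of what a single rule might look like. It is not the authors' implementation: the pattern, the function name `extract_dates`, the assumed MM/DD/YYYY ordering, the ISO-style value format, and the default modifier "NA" are all assumptions chosen for illustration. It recognizes numeric dates and assigns each one a type, value, and modifier.

```python
import re
from typing import Dict, List

# Hypothetical rule (an assumption, not taken from the paper):
# numeric dates written as MM/DD/YYYY or MM-DD-YYYY.
DATE_PATTERN = re.compile(r"\b(\d{1,2})[/-](\d{1,2})[/-](\d{4})\b")


def extract_dates(text: str) -> List[Dict[str, object]]:
    """Recognize numeric dates and normalize each to type, value, modifier."""
    results = []
    for match in DATE_PATTERN.finditer(text):
        month, day, year = (int(g) for g in match.groups())
        # Coarse sanity check only; it does not validate days per month.
        if not (1 <= month <= 12 and 1 <= day <= 31):
            continue
        results.append({
            "text": match.group(0),
            "start": match.start(),   # character offsets of the mention
            "end": match.end(),
            "type": "DATE",           # assumed type label
            "value": f"{year:04d}-{month:02d}-{day:02d}",  # ISO-style value
            "modifier": "NA",         # assumed default: no modifier
        })
    return results


if __name__ == "__main__":
    for expr in extract_dates("Admitted on 03/14/2012 with chest pain."):
        print(expr)
```

A full system would need many such rules (relative dates, durations, frequencies, times of day), and the exact offsets matter because finding exact mention boundaries is the part the abstract reports as challenging.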