The University of Manchester; 2012
The main archive of life sciences literature currently contains more than 18,000,000 references, and it is virtually impossible for any human to stay up to date with this many papers, even in a specific sub-domain.

Not every fact reported in the literature is novel and distinct. Scientists report repeat experiments or refer to previous findings. Given the large number of publications, it is not surprising that information on certain topics is repeated across several publications. From consensus to contradiction, there are all shades of agreement between the facts claimed in the literature, and considering the volume of the corpus, conflicting findings are not unlikely. Finding such claims is particularly interesting for scientists, as they can present opportunities for knowledge consolidation and future investigations.

In this thesis we present a method to extract and contextualise statements about molecular events as expressed in the biomedical literature, and to find those that potentially conflict with each other. The approach uses a system that detects event negation and speculation, and combines these with contextual features (e.g. type of event, species, and anatomical location) to build a representational model for establishing relations between different biological events, including relations concerning conflicts. In the detection of negation and speculation, rich lexical, syntactic, and semantic features have been exploited, including the syntactic command relation.

Different parts of the proposed method have been evaluated in the context of the BioNLP'09 challenge. The average F-measures for event negation and speculation detection were 63% (with precision of 88%) and 48% (with precision of 64%) respectively.
An analysis of a set of 50 extracted event pairs identified as potentially conflicting revealed that 32 of them (64%) showed some degree of conflict; a further 10 event pairs (20%) needed a more complex biological interpretation to decide whether there was a conflict.

We also provide an open source integrated text mining framework for extracting events and their context on a large scale, using a pipeline of tools that are available or have been developed as part of this research, along with 72,314 potentially conflicting molecular event pairs that have been generated by mining the entire body of accessible biomedical literature.

We conclude that, whilst automated conflict mining would need more comprehensive context extraction, it is feasible to provide a support environment for biologists to browse potentially conflicting statements and facilitate data and knowledge consolidation.
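
As an illustration only, the idea of combining negation with contextual features to flag potentially conflicting event pairs can be sketched as follows. This is a minimal sketch under stated assumptions, not the representational model developed in the thesis: the `Event` class, its field names, the `potential_conflicts` function, and the sample events are all hypothetical, and the matching rule (identical context, opposite negation) is a deliberate simplification.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Event:
    """Hypothetical record for one extracted molecular event."""
    event_type: str           # e.g. "Gene_expression"
    theme: str                # the entity the event is about
    species: str              # contextual feature
    location: str             # contextual feature (anatomical location)
    negated: bool = False     # output of a negation detector (assumed)
    speculated: bool = False  # carried along, not used for matching here
    source: str = ""          # identifier of the originating document

    def context(self):
        # Events are only comparable when all contextual features agree.
        return (self.event_type, self.theme, self.species, self.location)


def potential_conflicts(events):
    """Return pairs that share a context but differ in negation."""
    return [
        (a, b)
        for a, b in combinations(events, 2)
        if a.context() == b.context() and a.negated != b.negated
    ]


# Made-up sample events, for illustration only.
events = [
    Event("Gene_expression", "IL-2", "human", "T cell", source="doc-A"),
    Event("Gene_expression", "IL-2", "human", "T cell", negated=True, source="doc-B"),
    Event("Gene_expression", "IL-2", "mouse", "T cell", negated=True, source="doc-C"),
]

pairs = potential_conflicts(events)
print([(a.source, b.source) for a, b in pairs])
```

In this sketch only the first two events are paired, because the third differs in species; this mirrors the point that context is needed before two statements can be treated as potentially conflicting.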