Text Extraction, Analytics and Mining


Our research TEAM investigates methodologies for the extraction of both explicit and implicit knowledge from large collections of textual documents, in particular in the domains of life sciences and health-care. This field is known as text mining, natural language processing (NLP) and/or text analytics. More precisely, we are interested in:

  • Terminology mining (term/entity identification, controlled vocabularies)
  • Relationship extraction from text (linking entities)
  • Architectures for data and text integration (interoperable services, linked-data).

Our research combines methods from computational linguistics (e.g. shallow parsing, local grammar modelling), knowledge representation (ontologies) and intensive data mining (feature selection, classification and clustering).

Our main focus is on the domains of healthcare and medicine (patient/hospital records) and biology (biomedical literature), but we also investigate other domains/genres (e.g. blogs):

  • Health-related information synthesis (synthesis of information from unstructured electronic health-care records, patient narratives and literature to support clinical decision-making; sentiment mining of health-related social media)
  • Large-scale extraction and contextualization of biomolecular events (extraction of host-pathogen interactions; conflicting statements in scientific texts)
  • Mining of scientific methodologies from literature (capturing best and common practice for in-silico experiments)
  • Mining semi-structured reports (data quality in question-answer reports).

We are part of the Text Mining/NLP research group within the School of Computer Science at the University of Manchester, and are based in the IT building (number 40 on the campus map; room IT301). We are affiliated to the Manchester Institute of Biotechnology and collaborate closely with the Bio-Health Informatics Group, NIBHI and the Biomedical DSS team.

The gnTEAM was established in 2004 and is led by Dr Goran Nenadic.


Our most recent projects:


Funding acknowledgements to: