Group resources


Named-entity recognition and normalisation software

  • LINNAEUS: NER software for recognizing and normalizing species names.
  • GNAT: NER software for recognizing and normalizing gene and protein names.
  • bioNerDS: NER software for recognising bioinformatics database and software names.
  • PathNER: NER software for recognizing pathway mentions in the literature.
  • ManTIME: NER software for identifying and normalising temporal information in general-domain texts.
  • NorMA: Temporal expression normalisation software for the general domain.
  • Clinical NorMA: Temporal expression normalisation software tuned for the clinical domain.
  • TERN: NER software for recognizing and normalizing (clinical/medical) temporal expressions.
  • Clinical NERC: NERC software for recognizing and classifying clinical/medical concepts.
  • CliNER: Machine-learning NER software for recognizing mentions of clinical/medical terms, including problems, tests, treatments and clinical departments, and for the recognition and normalisation of temporal expressions.


Literature-extracted data sets

  • BioContext: Data on contextualised biomolecular events, integrated from several tools. Over 36 million event mentions representing 11.4 million distinct events, with over 290,000 distinct genes/proteins that are mentioned more than 80 million times and linked where possible to Entrez Gene identifiers.
  • Wiki-pain: Detailed contextual information on molecular interactions and single events relevant to pain, automatically extracted from all of the biomedical literature.
  • GETM: Dataset aimed at researchers who would like an overview of gene expression for a particular gene, or in a particular anatomical location or cell type. Data interfaces enable powerful searches and visualizations.
  • pubmed2ensembl: An Ensembl BioMart extended with gene-related publication information.
  • Ontogrator: Faceted browser over ontology-integrated data resources.

Event extraction software

  • GETM: Tool for extracting information about the expression of genes/proteins, and linking them to anatomical locations.

Data formats

  • IeXML: An annotation format for named entities in the life science literature that allows the interchange of annotated corpora independently of the underlying technology.