Named-entity recognition and normalisation software
- LINNAEUS: NER software for recognizing and normalizing species names.
- GNAT: NER software for recognizing and normalizing gene and protein names.
- bioNerDS: NER software for recognising bioinformatics database and software names.
- PathNER: NER software for recognizing pathway mentions in the literature.
- ManTIME: NER software for identifying and normalising temporal information in general-domain texts.
- NorMA: Temporal expression normalisation software for the general domain.
- Clinical NorMA: Temporal expression normalisation software tuned for the clinical domain.
- TERN: NER software for recognizing and normalizing (clinical/medical) temporal expressions.
- Clinical NERC: NERC software for recognizing and classifying clinical/medical concepts.
- CliNER: Machine-learning NER software for recognizing mentions of clinical/medical terms, including problems, tests, treatments and clinical departments, and for the recognition and normalisation of temporal expressions.
Literature-extracted data sets
- BioContext: Data on contextualised biomolecular events, integrated from several tools. It contains over 36 million event mentions representing 11.4 million distinct events, and over 290,000 distinct genes/proteins that are mentioned more than 80 million times and linked where possible to Entrez Gene identifiers.
- Wiki-pain: Detailed contextual information on molecular interactions and single events relevant to pain, automatically extracted from the biomedical literature.
- GETM: Dataset aimed at researchers who would like an overview of gene expression for a particular gene, or in a particular anatomical location or cell type. Data interfaces enable powerful searches and visualizations.
- pubmed2ensembl: An Ensembl BioMart extended with gene-related publication information.
- Ontogrator: Faceted browser over ontology-integrated data resources.
Event extraction software
- GETM: Tool for extracting information about the expression of genes/proteins, and linking them to anatomical locations.
Data formats
- IeXML: an annotation format for named entities in the life science literature that allows the interchange of annotated corpora independently of the underlying technology.