LINNAEUS

LINNAEUS is general-purpose dictionary-matching software that can process several document formats used in the biomedical domain (MEDLINE, PMC, BMC, OTMI, text, etc.). It can write its results as XML, HTML, or a tab-separated-value file, or save them to a database. It can also act as a server (including load balancing across several servers), allowing clients to request matching over a network.

A package with files for recognizing and identifying species names is available for LINNAEUS; it shows 94% recall and 97% precision when evaluated against the LINNAEUS-species-corpus.

LINNAEUS is the subject of the following paper: Gerner, M., Nenadic, G. and Bergman, C. M. (2010) LINNAEUS: a species name identification system for biomedical literature. BMC Bioinformatics 11:85.


Dictionary: this can be either a file mapping dictionary identifiers to regular expressions, or a file mapping dictionary identifiers to all possible variations of the term to be matched. The former is better suited to regular expressions covering a very large number of combinations, whereas the latter is better for more restricted patterns (resulting in faster processing times). In both formats, each line holds an identifier followed by its pattern or variants, separated by a tab character. Examples are given below.

file with regular expressions: --regexpMatcher <file>

  • 9606 [Hh]umans?
  • 10090 [Mm]ouse|[Mm]ice

file with all variations: --variantMatcher <file>

  • 9606 Human|human|Humans|humans
  • 10090 Mouse|mouse|Mice|mice
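To make the difference between the two dictionary formats concrete, here is a minimal Python sketch that parses both kinds of file and applies them to a sample sentence. It is only an illustration of the file formats described above, not LINNAEUS's own matching code: the function names, the sample sentence, and the simple word-by-word lookup used for the variant dictionary are all assumptions (in particular, this sketch does not handle multi-word variants).

```python
import re

def load_regexp_dictionary(lines):
    """Parse 'identifier<TAB>regular expression' lines into (id, pattern) pairs."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        identifier, pattern = line.split("\t", 1)
        entries.append((identifier, re.compile(pattern)))
    return entries

def load_variant_dictionary(lines):
    """Parse 'identifier<TAB>variant|variant|...' lines into a {variant: id} map."""
    variants = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        identifier, terms = line.split("\t", 1)
        for term in terms.split("|"):
            variants[term] = identifier
    return variants

def match_regexp(entries, text):
    """Return sorted (start, end, identifier, matched text) tuples for each regex hit."""
    hits = []
    for identifier, pattern in entries:
        for m in pattern.finditer(text):
            hits.append((m.start(), m.end(), identifier, m.group()))
    return sorted(hits)

def match_variants(variants, text):
    """Return the same tuples by looking up each word token in the variant map."""
    hits = []
    for m in re.finditer(r"\w+", text):
        identifier = variants.get(m.group())
        if identifier is not None:
            hits.append((m.start(), m.end(), identifier, m.group()))
    return hits

# The two example dictionaries from above, with an explicit tab separator.
regexp_lines = ["9606\t[Hh]umans?", "10090\t[Mm]ouse|[Mm]ice"]
variant_lines = ["9606\tHuman|human|Humans|humans", "10090\tMouse|mouse|Mice|mice"]

text = "Mice and humans share many genes."  # hypothetical sample sentence
regexp_hits = match_regexp(load_regexp_dictionary(regexp_lines), text)
variant_hits = match_variants(load_variant_dictionary(variant_lines), text)
```

In this sketch the regular-expression dictionary needs one scan of the text per entry, while the variant dictionary needs a single pass with constant-time lookups, which is one plausible reason an exhaustive variant list can be faster for restricted patterns.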

Input document sources:

  • Directory with .txt files: --textDir <directory> [--recursive]
  • A single text file: --text <file>

(For large-scale MEDLINE/PMC processing from a database, contact me.)

Output formats:

  • tab-separated offset-based data: --out <file>
  • HTML output (a .html file containing each document, with entities marked up for visualization): --outHTML <file>
  • XML output: --outXML <file>

LINNAEUS can be downloaded from LINNAEUS, or run directly from ~mqbpgmg2/jars/linnaeus.jar on the Gnode, e.g.:

java -jar ~mqbpgmg2/jars/linnaeus.jar --regexpMatcher <dictionary file> --textDir <document dir> --outHTML <output file>
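When LINNAEUS is launched from a script, the options listed on this page can be assembled programmatically. The following Python sketch builds such a command line from one matcher flag, one input flag, and one output flag. The helper name, the local jar path, and the file names are hypothetical, and the assumption that any matcher, input, and output flag can be combined freely is based only on the example command on this page.

```python
import shlex

# Flags as documented on this page.
MATCHER_FLAGS = {"--regexpMatcher", "--variantMatcher"}
INPUT_FLAGS = {"--textDir", "--text"}
OUTPUT_FLAGS = {"--out", "--outHTML", "--outXML"}

def build_linnaeus_command(jar, matcher_flag, dictionary,
                           input_flag, source,
                           output_flag, output,
                           recursive=False):
    """Return an argument list for running LINNAEUS (hypothetical helper)."""
    if matcher_flag not in MATCHER_FLAGS:
        raise ValueError(f"unknown matcher flag: {matcher_flag}")
    if input_flag not in INPUT_FLAGS:
        raise ValueError(f"unknown input flag: {input_flag}")
    if output_flag not in OUTPUT_FLAGS:
        raise ValueError(f"unknown output flag: {output_flag}")
    argv = ["java", "-jar", jar, matcher_flag, dictionary, input_flag, source]
    if recursive:
        # --recursive is only documented together with --textDir.
        if input_flag != "--textDir":
            raise ValueError("--recursive requires --textDir")
        argv.append("--recursive")
    argv += [output_flag, output]
    return argv

# Hypothetical file names, for illustration only.
argv = build_linnaeus_command(
    "linnaeus.jar", "--variantMatcher", "species.tsv",
    "--textDir", "docs", "--out", "mentions.tsv", recursive=True,
)
command_line = shlex.join(argv)  # shell-quoted string, e.g. for logging
```

The list in `argv` can be passed directly to `subprocess.run(argv, check=True)`, which avoids shell quoting problems with file names that contain spaces.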

Page last modified on April 25, 2011 at 15:16