|
LINNAEUS
LINNAEUS is a general-purpose dictionary-matching program, capable of processing multiple document formats used in the biomedical domain (MEDLINE, PMC, BMC, OTMI, text, etc.). It can produce multiple types of output (XML, HTML, a tab-separated-value file, or a database). It can also act as a server (including load balancing across several servers), allowing clients to request matching over a network.
A package with files for recognizing and identifying species names is available for LINNAEUS, showing 94% recall and 97% precision compared to the LINNAEUS-species-corpus.
LINNAEUS is the subject of the following paper: Gerner, M., Nenadic, G. and Bergman, C. M. (2010) LINNAEUS: a species name identification system for biomedical literature. BMC Bioinformatics 11:85.
Dictionary: this can be either a file mapping dictionary identifiers to regular expressions, or a file mapping dictionary identifiers to all possible variations of the term to be matched. The former is better for regular expressions covering a very large number of combinations, whereas the latter is better for more restricted patterns (resulting in faster processing times). Examples (the separator is a tab character) are given below.
file with regular expressions: --regexpMatcher <file>
file with all variations: --variantMatcher <file>
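The difference between the two dictionary types can be illustrated with a minimal Python sketch. This is not LINNAEUS's own implementation. The identifiers, terms, and function names are hypothetical, and the layout of the variants file (one identifier/variant pair per line) is an assumption; only the tab separator is taken from the description above. The regular-expression matcher scans the text once per pattern, while the variant matcher looks up token sequences in a hash table, which is the kind of lookup that makes restricted term lists faster to process.

```python
import re

# Hypothetical dictionary contents: "identifier<TAB>pattern-or-variant".
# The one-variant-per-line layout is an assumption, not LINNAEUS's documented format.
REGEXP_DICT = "id:1\t[Hh]uman(s)?\nid:2\tfruit fl(y|ies)\n"
VARIANT_DICT = "id:1\thuman\nid:1\thumans\nid:2\tfruit fly\nid:2\tfruit flies\n"


def load_tsv(content):
    """Parse tab-separated lines into (identifier, value) pairs."""
    pairs = []
    for line in content.splitlines():
        if line.strip():
            ident, value = line.split("\t", 1)
            pairs.append((ident, value))
    return pairs


def match_regexps(pairs, text):
    """Regex dictionary: run every pattern over the text (flexible, slower)."""
    hits = []
    for ident, pattern in pairs:
        for m in re.finditer(r"\b(?:%s)\b" % pattern, text):
            hits.append((m.start(), m.end(), ident))
    return sorted(hits)


def match_variants(pairs, text):
    """Variant dictionary: case-insensitive hash lookup of token n-grams."""
    table = {variant.lower(): ident for ident, variant in pairs}
    max_len = max(len(variant.split()) for variant in table)
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    hits = []
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            if i + n > len(tokens):
                break
            start, end = tokens[i][0], tokens[i + n - 1][1]
            # The candidate is the raw text span, so variants only match
            # when tokens are separated exactly as in the dictionary entry.
            ident = table.get(text[start:end].lower())
            if ident is not None:
                hits.append((start, end, ident))
    return sorted(hits)


if __name__ == "__main__":
    sample = "Humans and fruit flies share many genes."
    print(match_regexps(load_tsv(REGEXP_DICT), sample))
    print(match_variants(load_tsv(VARIANT_DICT), sample))
```

Both functions return (start, end, identifier) tuples, so the two dictionary styles can be compared on the same text.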
Input document sources:
(for large-scale MEDLINE/PMC processing from a database, contact me)
Output formats:
LINNAEUS can be downloaded from the LINNAEUS project site, or run directly from ~mqbpgmg2/jars/linnaeus.jar on the Gnode, for example:
java -jar ~mqbpgmg2/jars/linnaeus.jar --regexpMatcher <dictionary file> --textDir <document dir> --outHTML <output file> |