Projects

  • Named-entity recognition and term mining

    Recognizing terms and named entities in research articles and mapping them to unique identifiers is an important first step in most text mining software. This is a challenging task because of ambiguity and variation in how entities and concepts are named and used, particularly in the biological literature. Our…

  • Mining term associations and events from bio-literature

    This is a long-term project that aims at developing text mining methods that can provide efficient and sophisticated knowledge acquisition, offer plausible hypotheses for testing, prevent unnecessary repetition of previous work, and help in experimental design for specific research scenarios. We investigate various text mining approaches to establishing literature-based associations…

  • Mining bioinformatics service descriptions

    There are a number of services and resources available to the bioinformatics community, but the metadata that describes them is typically scarce. This project aims to develop text mining techniques to automatically describe, locate, retrieve and reason about bioinformatics services and resources. We investigate methods that extract descriptions from various document…

  • Integration of text and data mining in life sciences

    There have been numerous efforts to provide tools for storing, extracting and analysing data in life sciences. Interoperability and integration of such efforts are challenging, not only technically (e.g. different formats, protocols, encodings) but also, more importantly, semantically. We are involved in a number of community-driven initiatives to…

  • Blog sentiment analysis

    Sentiment analysis is the extraction of attitudes and opinions from human-authored documents. The capture and analysis of such attitudes and opinions in an automated and structured fashion might offer a powerful technology to a number of problem domains, including business intelligence, marketing, national security, and crime prevention. This project aims…

  • Topic-focused Web crawling

    The ultimate aim of Internet search engines is to index the entire Web by utilising the links found within known pages. Topic-focused crawlers specialise this task by indexing only the subset of the Web which is relevant to some topic or information need. Typically, the user can specify the topic…

  • Document clustering and summarisation

    Document clustering is a generic problem with widespread applications within natural language engineering. Present research focuses on using text summarisation techniques as a pre-processing step for document clustering in the context of automated assessment of student essays. One of the major problems in natural language processing is that a…