gnTEAM » Search Results » “Data Mining”

Temporal expression extraction with extensive feature type selection and a posteriori label adjustment

mbelousov — Mon, 07 Mar 2016 11:11:15 +0000

The post Temporal expression extraction with extensive feature type selection and a posteriori label adjustment appeared first on gnTEAM.

Linked2Safety – a next-generation, secure linked-data medical information space for semantically-interconnecting electronic health records and clinical trials systems

admin — Thu, 02 Jul 2015 10:29:26 +0000

The main aim of the Linked2Safety project is to explore the Semantic Web and Linked Data to facilitate semantic interlinking of electronic health records (EHRs) and clinical trials systems, so that knowledge can be gathered and shared to support decision making in medical and clinical research. The vision is to facilitate early detection of patient safety issues, the identification of adverse events and the identification of a suitable critical mass of patients to participate in small (Phases II and III) or larger scale (Phase IV) clinical trials.

Our role focuses on the design of an interoperable EHR data space and the development of biomarker data mining techniques for the early detection of adverse events. We will also provide several clinical trials showcases and organise the Clinical research and patients safety Special Interest Group.

Integration of text and data mining in life sciences

admin — Fri, 26 Jun 2015 13:57:52 +0000

There have been numerous efforts to provide tools for storing, extracting and analysing data in life sciences. Making these efforts interoperable and integrating them is challenging, not only technically (e.g. different formats, protocols, encodings) but also, more importantly, semantically. We are involved in a number of community-driven initiatives to provide better integration for life science research.
One initiative is to provide harmonised ways for representing and tagging named entities in the life science literature. We are proposing to establish common document formats that facilitate the exchange of annotation results contained in the literature as a complementary approach to the development of interoperable tools. We work towards (a) recommendations for a common syntax to embody entity mentions in publishers’ document formats (e.g., into PMC), and (b) provision of a common way to reference semantic types. The initial results have been implemented in the IeXML proposal, which has already been used in some community-wide projects (e.g. CALBC). The original IeXML paper is available here.
Involved: D. Rebholz (EBI), G. Nenadic
Another initiative is to use ontologies and text mining to integrate and mark up data (both structured and unstructured) and provide semantics-based faceted browsing to help users navigate, query and retrieve data. The Ontogrator platform has been developed by the NERC Environmental Bioinformatics Centre and the University of Manchester, with a pilot implementation developed in collaboration with the Genomic Standards Consortium (GSC) that includes integrated content from the StrainInfo, GOLD, CAMERA, Silva and PubMed databases.
Involved: D. Field (NEBC), N. Morrison (Manchester), D. Hancock, L. Hirschman, G. Nenadic, et al.
As part of the BBSRC-funded pubmed2ensembl project, we have developed a customised and extended version of the Ensembl BioMart by adding gene-related publication information, i.e. PubMed IDs and PubMed Central IDs, including URL link-outs and other information. The pubmed2ensembl BioMart has an enhanced interface that allows users to run interactive full-text search queries via NCBI’s Entrez Utilities (eUtils); the search results are then applied as an additional filter on the mart datasets. The system also provides DAS link-outs into the Ensembl Genome Browser, where a custom DAS track summarises the publication data that have been accumulated on a per-gene basis.
Involved: J. Baran, C. Bergman, G. Nenadic, M. Gerner
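
The filtering idea described for pubmed2ensembl, where the results of a literature search restrict which gene records are shown, can be illustrated with a minimal sketch. This is not the pubmed2ensembl implementation: the function name, the record layout, the gene identifiers and the PubMed IDs below are all hypothetical, and the search results are hard-coded rather than fetched from eUtils.

```python
# Illustrative sketch only: the records and PubMed IDs are made up,
# and this is not how pubmed2ensembl itself is implemented.


def filter_genes_by_pmids(gene_records, matching_pmids):
    """Keep only genes linked to at least one publication in matching_pmids."""
    matching = set(matching_pmids)
    filtered = []
    for record in gene_records:
        # Publications that are both linked to the gene and in the search results.
        hits = sorted(matching.intersection(record["pmids"]))
        if hits:
            filtered.append({"gene_id": record["gene_id"], "pmids": hits})
    return filtered


# Hypothetical mart rows: each gene with the PubMed IDs linked to it.
GENE_RECORDS = [
    {"gene_id": "GENE_A", "pmids": ["1001", "1002"]},
    {"gene_id": "GENE_B", "pmids": ["1003"]},
    {"gene_id": "GENE_C", "pmids": ["1002", "1004"]},
]

# Hypothetical outcome of a literature search (a list of PubMed IDs).
SEARCH_HITS = ["1002", "1004", "9999"]

if __name__ == "__main__":
    for row in filter_genes_by_pmids(GENE_RECORDS, SEARCH_HITS):
        print(row["gene_id"], ",".join(row["pmids"]))
```

In this sketch the search acts purely as a filter: genes without a matching publication are dropped, and the remaining genes keep only the publications that matched.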

Prof John Keane

admin — Thu, 25 Jun 2015 16:12:56 +0000

Information for prospective postgraduate students

admin — Mon, 22 Jun 2015 12:21:04 +0000

General information

We are always keen to have postgraduate research students in various areas of text mining and natural language processing. As a rule of thumb, you will need an excellent first degree in computer science or a related area (e.g. computational linguistics, mathematics, physics, bioinformatics), with very good programming experience and some experience in natural language processing (e.g. a final-year project, a summer internship or an ad-hoc project). An MSc or publications in a related area will also be a distinct advantage.

The main theme of our research is feature engineering from unstructured documents written in natural languages. We investigate methodologies for the extraction of both explicit and implicit features from large collections of textual documents. Features can be terms, names, relations, co-occurrences, events, etc. Once engineered from text, the features can be used to provide understanding and reasoning over knowledge (e.g. by applying machine learning or data mining) – this discipline is referred to as text analytics, text mining or more generally natural language processing (NLP).
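
As a rough illustration of what engineering features from text can mean, the sketch below counts two of the feature types mentioned above: terms and sentence-level co-occurrences. It is a toy example and not a method used by the team; the stop-word list, the sample sentences and the function names are assumptions made for illustration only.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny, hypothetical stop-word list; a real system would use a fuller resource.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "with"}


def tokenize(sentence):
    """Lower-case a sentence and return its alphabetic tokens minus stop words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def extract_features(sentences):
    """Count terms and sentence-level co-occurrences of term pairs."""
    term_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        term_counts.update(tokens)
        # Each unordered pair of distinct terms counts once per sentence.
        for pair in combinations(sorted(set(tokens)), 2):
            pair_counts[pair] += 1
    return term_counts, pair_counts


# Made-up sample sentences, used only to exercise the functions.
SENTENCES = [
    "Aspirin is associated with bleeding.",
    "Bleeding is a known effect of aspirin.",
]

if __name__ == "__main__":
    terms, pairs = extract_features(SENTENCES)
    print(terms.most_common(3))
    print(pairs.most_common(3))
```

The resulting counts could then serve as input to a machine learning or data mining step; richer features such as names, relations or events would need considerably more than this kind of counting.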

Themes

Here are some core text mining themes (please see below for details) that are currently the focus in our TEAM:
- Text analytics and sentiment analysis: identification of subjective opinion and sentiment features from user-generated content (e.g. blog mining, tweets, etc.);
- Extracting negations, contrasts and contradictions: identification of utterances that are negated, or contrast or contradict some other expressions (both explicit and implicit);
- Concept mining and structuring: learning and identification of concepts and terminology from text, including their structuring (internal and external);
- Temporal text analytics: identification of temporal expressions and their scope in text;
- Integrated text and data mining: combining the results from different perspectives using various methods from machine learning;
- Text processing middleware for the Semantic Web: an infrastructure to support the development of text mining solutions for the Semantic Web (identification of concepts, links, etc.);
and these are preferred application areas:
- Biology and biomedicine (molecular interactions, cancer studies, characterisation of molecular events, etc.)
- Bioinformatics and computational biology (tools, services, resources, methods)
- Clinical medicine and health-care (clinical decision support, quality of life monitoring)
- E-science, e-commerce and e-government (e.g. monitoring, tracking, dissemination of information)
- Engineering (knowledge management)
You would typically ‘select’ a topic that consists of a particular theme in a specific application area. I’d also be happy to consider proposals in the areas of multi-lingual text mining and NLP for Serbian.

Application steps

You will be expected to have a passion for text processing, in addition to an excellent first degree in computer science or a related area. Some experience in natural language processing is very useful, whereas very good programming experience (in a combination of programming languages) is a must. If you believe you’ve got all these, send an email to Goran Nenadic (see below) with a full CV and a brief note on why you would like to do a PhD in our TEAM. Please allow some time for us to reply. Contact email: G.Nenadic@manchester.ac.uk.

Funding

PhD studies take between 3 and 4 years, typically closer to 4 than to 3. There is only one route to securing funding: the candidate needs to be outstanding. There are 3 possible sources of funding:
- specific, pre-defined projects (NONE CURRENTLY),
- funding from the School of Computer Science (see here for details) and
- external funding (private, external bodies – e.g. foreign governments, etc.).

Environment

The School of Computer Science is one of the leading Schools in the UK, renowned for the excellence of its research. The world’s first computer with internal memory was built in the School, and Alan Turing laid the foundations of Computer Science and Artificial Intelligence while in Manchester. The international reputation of our research is reflected in its high ranking in the last national Research Assessment Exercise (RAE), which places the School among the best five Computer Science departments in the UK and top in England for research power. The School has a vibrant research environment with more than 150 PhD students, 90 research staff and 70 academic staff.

Our research TEAM is part of the Text Mining/NLP research group, which hosts the UK National Centre for Text Mining. We are also affiliated to the Manchester Interdisciplinary BioCentre. The team is vibrant, diverse and very much international.

Training

admin — Mon, 22 Jun 2015 12:04:38 +0000

gnTEAM provides training in topics related to text mining for undergraduate (BSc final year projects) and postgraduate students (MSc, MPhil, PhD and EngD projects).
Final year undergraduate and MSc projects associated with the team are announced annually as part of the School of Computer Science taught programmes.

The current postgraduate research themes include:

- Integrated and Contrastive Text and Data Mining
- Text Analytics and Blog/Forum Sentiment Analysis
- Extracting negations, contrasts and contradictions from biomedical literature
- Clinical text mining
- Text mining in engineering

More specific post-graduate information is available here. For PhD funding opportunities see CDT in Computer Science.

Selected completed student projects

Student Name	Project Title	Year
E. Hein	EDViC: a web application to visualise and explore epidemiological literature (BSc project)	2013
T. Patel	Analysing Twitter Posts to Discover and Review New Software Tools (BSc project)	2012
B. Dumitru	Mining twitter data to gather information about pharmaceutical drugs (BSc project)	2012
I. Townend	Mapping of Clinical Data between Heterogeneous Terminologies and Classifications (MSc project)	2011
S. Asif	An Analysis of Financial Blogs and Forums (MSc project)	2010
A. Dehghan	A Rule-based Approach to External Context Extraction from Biomedical Literature: URL and Role Extraction (MSc project)	2010
A. Tsoutsoumpi	A question answering system from FAQ pages (MSc project)	2010
D. Yang	Extending Areca with Remote Backup Features (BSc project)	2010
S. Latif	Automatic Summarisation As Pre-Processing For Document Clustering (PhD project)	2010
M. Greenwood	Prioritising links for Topic-focused Web Crawling using Lexical and Terminological Profiling (MPhil project)	2009
H. Afzal	A Literature-Based Framework for Semantic Descriptions of E-Science Resources (PhD project)	2009

LINNAEUS: A species name identification system for biomedical literature

admin — Tue, 10 Nov 2015 11:46:18 +0000
