Document clustering and summarisation

Document clustering is a generic problem with widespread applications within natural language engineering. This research focuses on using text summarisation techniques as a pre-processing step for document clustering, in the context of the automated assessment of student essays. One of the major problems in natural language processing is that documents can contain a very large number of distinct words. If each word is represented as a vector coordinate, the number of dimensions becomes too high for the clustering algorithm to handle efficiently. Hence, it is crucial to apply pre-processing methods (such as summarisation) that reduce the number of dimensions (words) passed to the clustering algorithm, while preserving the information and quality of the original documents.
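The page does not say which summariser or clustering algorithm the project uses. As a minimal sketch, assuming a bag-of-words representation, a hypothetical frequency-based extractive summariser (`summarise`) and spherical k-means (neither is claimed to be the project's method), the Python below shows how summarising each document first shrinks the vocabulary, and with it the number of vector dimensions, that the clustering step has to handle. The four toy "essays" are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude tokenizer: lower-cased runs of ASCII letters (digits are dropped)."""
    return re.findall(r"[a-z]+", text.lower())


def summarise(doc, n_sentences=1):
    """Hypothetical extractive summariser: keep the n sentences whose words are,
    on average, most frequent in the document, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    freq = Counter(tokenize(doc))

    def score(sentence):
        toks = tokenize(sentence)
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    # sorted() is stable, so on a tie the earlier sentence ranks first.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)


def vectorise(docs):
    """L2-normalised bag-of-words vectors: one coordinate per vocabulary word."""
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for t, c in Counter(tokenize(d)).items():
            v[index[t]] = float(c)
        norm = math.sqrt(sum(x * x for x in v))
        vectors.append([x / norm for x in v] if norm else v)
    return vocab, vectors


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def spherical_kmeans(vectors, k, iters=20):
    """k-means on unit vectors using cosine similarity (spherical k-means),
    with a deterministic farthest-first initialisation instead of random seeds."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the vector least similar to every centroid chosen so far.
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[j])
        if updated == centroids:  # converged
            break
        centroids = updated
    return labels


# Invented toy essays: two on photosynthesis, two on the French Revolution.
ESSAYS = [
    "Plants use light for photosynthesis. Plants make sugar from light. The bus was late.",
    "Photosynthesis in plants needs light. Light helps plants grow. I ate lunch early.",
    "The revolution changed France. The revolution ended the monarchy. Plants grew nearby.",
    "France had a revolution. The monarchy fell in the revolution. It rained today.",
]


def run(docs=ESSAYS, k=2):
    """Return (full vocabulary size, summary vocabulary size, cluster labels)."""
    full_vocab, _ = vectorise(docs)
    summaries = [summarise(d) for d in docs]
    summary_vocab, vectors = vectorise(summaries)
    return len(full_vocab), len(summary_vocab), spherical_kmeans(vectors, k)


if __name__ == "__main__":
    full_dims, summary_dims, labels = run()
    print(full_dims, summary_dims, labels)  # 33 13 [0, 0, 1, 1]
```

On these toy essays the vocabulary shrinks from 33 to 13 dimensions and the two topics still fall into separate clusters. Real essays would call for proper tokenisation, stop-word handling and a stronger summariser; the frequency heuristic here only illustrates where summarisation sits in the pipeline.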

People involved

Dr Seemab Latif (PhD student)
Prof Goran Nenadic (co-supervisor)

gnTEAM

Text extraction, analytics, mining