From gnTEAM

Resources: Pattern

Pattern is a web mining module for the Python programming language. It bundles tools for data retrieval (Google, Twitter and Wikipedia APIs, a web spider, an HTML DOM parser), text analysis (a rule-based shallow parser, a WordNet interface, a syntactic and semantic n-gram search algorithm, and tf-idf, cosine similarity and LSA metrics) and data visualization (graph networks). The module comes with more than 30 example scripts.

Modules included: pattern.web, pattern.table, pattern.en, pattern.search, pattern.vector, pattern.graph.

pattern.graph

The pattern.graph module offers a way to represent and analyze networks of linked data. It can be used, for example, to model semantic relationships between words. It comes bundled with a JavaScript generator that creates an aesthetically pleasing visualization of a network in a web page.
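The idea of a network of linked words can be illustrated with a minimal sketch in plain Python. This is not the pattern.graph API: the class `WordGraph` and its methods are hypothetical. Words are nodes, semantic links are undirected edges, and a breadth-first search finds the shortest chain of relations between two words.

```python
from collections import deque


class WordGraph:
    """Hypothetical undirected graph stored as node -> set of neighbours."""

    def __init__(self):
        self.adjacency = {}

    def add_edge(self, a, b):
        # Register both endpoints and link them in both directions.
        self.adjacency.setdefault(a, set()).add(b)
        self.adjacency.setdefault(b, set()).add(a)

    def shortest_path(self, start, goal):
        # Breadth-first search: the first time we reach `goal`,
        # we have used the fewest possible edges. Returns None if unreachable.
        if start not in self.adjacency or goal not in self.adjacency:
            return None
        previous = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                # Walk the predecessor links back to `start`.
                path = []
                while node is not None:
                    path.append(node)
                    node = previous[node]
                return path[::-1]
            for neighbour in sorted(self.adjacency[node]):
                if neighbour not in previous:
                    previous[neighbour] = node
                    queue.append(neighbour)
        return None


g = WordGraph()
g.add_edge("cat", "feline")
g.add_edge("feline", "mammal")
g.add_edge("dog", "canine")
g.add_edge("canine", "mammal")
print(g.shortest_path("cat", "dog"))  # ['cat', 'feline', 'mammal', 'canine', 'dog']
```

An adjacency map keeps neighbour lookups cheap, which is what graph traversals such as breadth-first search spend most of their time on.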

pattern.web

The pattern.web module bundles robust tools for online data mining: asynchronous requests, a uniform API for various web services (Google, Bing, Yahoo, Twitter, Wikipedia, Flickr, RSS, Atom), an HTML DOM parser, HTML tag stripping functions, a web crawler, webmail, caching mechanisms and Unicode support.
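HTML tag stripping, one of the tasks listed above, can be sketched with Python's standard `html.parser` module. This is an illustration of the technique only, not pattern.web's own tag-stripping function; the names `TagStripper` and `strip_tags` are hypothetical. Text content is kept, tags are dropped, and the bodies of `<script>` and `<style>` elements are skipped.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: collects text and drops markup."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_tags(html):
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Collapse the runs of whitespace left behind by removed markup.
    return " ".join("".join(parser.parts).split())


print(strip_tags("<p>Hello <b>web</b> &amp; <i>mining</i>!</p><script>x()</script>"))
# Hello web & mining!
```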

pattern.table

The pattern.table module offers a convenient way to work with tabular data. It can be used to store and analyze data retrieved with the pattern.web module in a uniform way, i.e. as a Unicode CSV file, instead of relying on custom text files.
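Storing results as a Unicode CSV file can be sketched with Python's standard `csv` module. This does not use the pattern.table API; `save_table`, `load_table` and the file name `results.csv` are hypothetical. The sketch writes UTF-8 so that non-ASCII text survives, and lets the `csv` module quote fields that contain commas.

```python
import csv
import os
import tempfile


def save_table(path, rows):
    # UTF-8 keeps non-ASCII text intact; newline="" lets the csv module
    # control line endings, as the csv documentation recommends.
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)


def load_table(path):
    with open(path, encoding="utf-8", newline="") as f:
        return [row for row in csv.reader(f)]


rows = [
    ["query", "title", "lang"],
    ["café", "Über das Wetter, heute", "de"],  # the comma forces quoting
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.csv")
    save_table(path, rows)
    loaded = load_table(path)

print(loaded == rows)  # True
```

Note that every value comes back as a string; numeric columns would need converting after loading.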

pattern.en

The pattern.en module contains a fast, regular expression-based shallow parser (which identifies the nouns, adjectives, verbs, etc. in a sentence), a WordNet interface, and tools for verb conjugation and noun singularization and pluralization.
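Rule-based noun pluralization can be sketched with a handful of regular expressions. This toy rule set is hypothetical and far smaller than what pattern.en ships; it only illustrates the approach of checking irregular forms first and then applying the first matching suffix rule.

```python
import re

# Hypothetical, deliberately tiny rule set for lowercase English nouns.
# Irregular forms are looked up first; then the first matching suffix rule wins.
IRREGULAR = {"child": "children", "man": "men", "woman": "women",
             "mouse": "mice", "person": "people"}
RULES = [
    (r"(s|x|z|ch|sh)$", r"\1es"),  # box -> boxes, church -> churches
    (r"([^aeiou])y$", r"\1ies"),   # city -> cities (but day -> days)
]


def pluralize(noun):
    if noun in IRREGULAR:
        return IRREGULAR[noun]
    for pattern, replacement in RULES:
        if re.search(pattern, noun):
            return re.sub(pattern, replacement, noun, count=1)
    return noun + "s"  # default rule: word -> words


for word in ["box", "church", "city", "day", "child", "word"]:
    print(word, "->", pluralize(word))
```

Real pluralizers need many more rules and exception lists (e.g. knife/knives, sheep/sheep); the ordering of rules matters because the first match wins.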

pattern.search

The pattern.search module offers a pattern matching system, similar to regular expressions, that can be used to search a string syntactically (by word function) or semantically (by word meaning, using a taxonomy).
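The idea of matching on word function or word meaning can be sketched by compiling a word-level pattern into an ordinary regular expression over `word/TAG` tokens. This is not pattern.search's syntax or API: the notation below (uppercase elements match part-of-speech tags, `@category` matches words listed in a tiny hypothetical taxonomy, a trailing `?` marks an element as optional, anything else is a literal word) and the `search` function are invented for illustration.

```python
import re

# Hypothetical toy taxonomy used for semantic matching (word meaning).
TAXONOMY = {"animal": {"cat", "dog", "horse"}}


def _element_regex(element):
    """Translate one pattern element into a regex over 'word/TAG ' tokens."""
    optional = element.endswith("?")
    element = element.rstrip("?")
    if element.startswith("@"):
        # @category: any word the taxonomy lists under that category.
        words = "|".join(re.escape(w) for w in sorted(TAXONOMY[element[1:]]))
        core = rf"(?:{words})/\S+ "
    elif element.isupper():
        # NN, JJ, ...: any word carrying exactly this part-of-speech tag.
        core = rf"[^\s/]+/{re.escape(element)} "
    else:
        # Anything else: a literal word, compared case-insensitively.
        core = rf"{re.escape(element.lower())}/\S+ "
    return f"(?:{core})?" if optional else f"(?:{core})"


def search(pattern, tagged):
    """Return the word sequences in `tagged` [(word, tag), ...] matching `pattern`."""
    text = "".join(f"{word.lower()}/{tag} " for word, tag in tagged)
    # (?<!\S) forces every match to start at a token boundary.
    regex = re.compile(r"(?<!\S)" + "".join(_element_regex(e) for e in pattern.split()))
    return [[tok.rsplit("/", 1)[0] for tok in m.group(0).split()]
            for m in regex.finditer(text)]


tagged = [("The", "DT"), ("big", "JJ"), ("cat", "NN"),
          ("chased", "VBD"), ("a", "DT"), ("dog", "NN")]
print(search("JJ? NN", tagged))      # [['big', 'cat'], ['dog']]
print(search("DT @animal", tagged))  # [['a', 'dog']]
```

In practice the tags would come from a part-of-speech tagger such as the shallow parser described above; here they are supplied by hand.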

pattern.vector

The pattern.vector module contains tools to count the words in a document (e.g. a paragraph or a web page) and to compute tf-idf weights, cosine similarity and latent semantic analysis, in order to discover document keywords, compare similar documents, or search documents with a keyword query.
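The tf-idf and cosine similarity measures can be sketched in plain Python. This is not the pattern.vector API, and its exact weighting may differ: here term frequency is the word count divided by document length, and inverse document frequency is log(N / df), one common variant. Latent semantic analysis is omitted. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters only.
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_vectors(documents):
    """Return one {word: tf-idf weight} dict per document."""
    counts = [Counter(tokenize(d)) for d in documents]
    n = len(documents)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for c in counts for word in c)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({w: (k / total) * math.log(n / df[w]) for w, k in c.items()})
    return vectors


def cosine_similarity(a, b):
    # Dot product over shared words, divided by the product of vector lengths.
    dot = sum(weight * b.get(w, 0.0) for w, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "The cat sat on the mat.",
    "A cat chased a mouse.",
    "Stock markets fell sharply today.",
]
v = tf_idf_vectors(docs)
print(cosine_similarity(v[0], v[1]) > cosine_similarity(v[0], v[2]))  # True
```

The first two documents share the word "cat", which gets a positive idf because it does not occur in every document, so they score higher than the unrelated third document, which shares no words with the first.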

Retrieved from http://gnteam.cs.manchester.ac.uk/wiki/index.php?n=Resources.Pattern
Page last modified on April 26, 2011 at 17:09