|
Pattern

Pattern is a web mining module for the Python programming language. It bundles tools for data retrieval (Google, Twitter and Wikipedia APIs, web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactic and semantic n-gram search algorithm, tf-idf, cosine similarity and LSA metrics) and data visualization (graph networks). The module is bundled with 30+ example scripts.

Modules included: pattern.web, pattern.table, pattern.en, pattern.search, pattern.vector, pattern.graph.

pattern.graph
The pattern.graph module offers a way to represent and analyze networks of linked data. It can be used, for example, to model semantic relationships between words. It comes bundled with a JavaScript generator that creates an aesthetically pleasing visualization of a network in a web page.

pattern.web
The pattern.web module bundles robust tools for online data mining: asynchronous requests, a uniform API for various web services (Google, Bing, Yahoo, Twitter, Wikipedia, Flickr, RSS, Atom), an HTML DOM parser, HTML tag stripping functions, a web crawler, webmail, caching mechanisms and Unicode support.

pattern.table
The pattern.table module offers a convenient way to work with tabular data. It can be used to store and analyze data retrieved with the pattern.web module in a uniform way, i.e. as a Unicode CSV file, instead of relying on custom text files.

pattern.en
The pattern.en module contains a fast, regular expression-based shallow parser (it identifies nouns, adjectives, verbs, etc. in a sentence), a WordNet interface, and tools for verb conjugation and noun singularization and pluralization.

pattern.search
The pattern.search module offers a pattern matching system, similar to regular expressions, that can be used to search a string syntactically (by word function) or semantically (by word meaning) using a taxonomy.

pattern.vector
The pattern.vector module contains tools to count the words in a document (e.g. a paragraph or a web page) and to compute tf-idf, cosine similarity and latent semantic analysis in order to discover document keywords, compare similar documents, or search documents based on a keyword query. |
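To make the pattern.vector description more concrete, here is a minimal sketch of tf-idf weighting and cosine similarity in plain Python 3. It does not use Pattern itself and does not reproduce the pattern.vector API. The helper names (`tokenize`, `tf_idf`, `cosine_similarity`) and the sample sentences are assumptions chosen for illustration, and the weighting shown (relative term frequency times the log of inverse document frequency) is one common tf-idf variant that may differ from the one Pattern implements.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def tf_idf(documents):
    """Return one {word: weight} dict per document.

    tf  = occurrences of the word in the document / words in the document
    idf = log(number of documents / number of documents containing the word)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n = len(tokenized)
    # Document frequency: in how many documents does each word occur?
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append(
            {w: (c / total) * math.log(n / df[w]) for w, c in counts.items()}
        )
    return vectors


def cosine_similarity(v1, v2):
    """Cosine of the angle between two sparse {word: weight} vectors."""
    dot = sum(weight * v2.get(word, 0.0) for word, weight in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


# Hypothetical sample documents, for illustration only.
docs = [
    "The cat sat on the mat.",
    "The cat chased the mouse.",
    "Stock markets fell sharply today.",
]
vectors = tf_idf(docs)
sim_cats = cosine_similarity(vectors[0], vectors[1])       # documents share words
sim_unrelated = cosine_similarity(vectors[0], vectors[2])  # no words in common
```

In this sketch, words that occur in every document get a weight of zero, so the highest-weighted words in each vector act as that document's keywords, and documents with no words in common have a similarity of zero.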