SC Conference - Activity Details

Text Mining on a Grid Environment

Valeriana Gomes Roncero (COPPE UFRJ)
Doctoral Research Showcase Session
Thursday, 03:45PM - 04:00PM
Room 17A/17B
One key difficulty with text classification learning algorithms is that they require many hand-labeled documents to learn accurately. In this study, we propose to combine Expectation-Maximization (EM) with a naive Bayes classifier in a grid environment. This combination is based on a mixture of multinomials, a model commonly used in text classification. Naive Bayes is a probabilistic approach to inductive learning. It estimates the a posteriori probability that a document belongs to a class given the observed feature values of the document, assuming that the features are independent given the class. The document is then assigned the class with the maximum a posteriori probability. Expectation-Maximization (EM) is a class of iterative algorithms for maximum likelihood or maximum a posteriori estimation in problems with incomplete data, such as document collections in which most documents are unlabeled. Because text mining methods for classification are time-consuming, running them on a grid infrastructure can bring significant benefits.
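To make the EM and naive Bayes combination more concrete, here is a minimal single-machine sketch of semi-supervised EM over a multinomial naive Bayes model. It is not the author's implementation and does not model the grid distribution at all. Several choices are assumptions made only for illustration: documents are bags of words (`Counter` objects), Laplace smoothing uses `alpha=1.0`, labeled documents keep fixed hard labels while unlabeled documents receive soft (probabilistic) labels, and EM runs for a fixed number of iterations. All function names (`train_nb`, `posterior`, `em_naive_bayes`, `classify`, `bag`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def bag(text):
    """Hypothetical helper: turn a string into a bag-of-words Counter."""
    return Counter(text.lower().split())


def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """M-step: estimate class priors P(c) and word probabilities P(w | c)
    from (possibly fractional) class memberships, with Laplace smoothing."""
    total_weight = sum(sum(w.values()) for w in weights)
    prior, word_prob = {}, {}
    for c in classes:
        class_weight = sum(w[c] for w in weights)
        prior[c] = (class_weight + alpha) / (total_weight + alpha * len(classes))
        counts = Counter()
        for doc, w in zip(docs, weights):
            for word, n in doc.items():
                counts[word] += w[c] * n  # expected count of word in class c
        denom = sum(counts.values()) + alpha * len(vocab)
        word_prob[c] = {word: (counts[word] + alpha) / denom for word in vocab}
    return prior, word_prob


def posterior(doc, prior, word_prob):
    """E-step for one document: P(c | d) under the multinomial model,
    treating words as independent given the class. Computed in log space."""
    log_scores = {}
    for c in prior:
        s = math.log(prior[c])
        for word, n in doc.items():
            if word in word_prob[c]:  # skip words outside the vocabulary
                s += n * math.log(word_prob[c][word])
        log_scores[c] = s
    m = max(log_scores.values())  # subtract the max to avoid underflow
    exp_scores = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(exp_scores.values())
    return {c: v / z for c, v in exp_scores.items()}


def em_naive_bayes(labeled, unlabeled, classes, iterations=10):
    """Semi-supervised EM: labeled docs keep hard labels; unlabeled docs
    get soft labels that are re-estimated on every iteration."""
    vocab = set()
    for doc, _ in labeled:
        vocab.update(doc)
    for doc in unlabeled:
        vocab.update(doc)
    lab_docs = [doc for doc, _ in labeled]
    lab_weights = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    # Initialise the model from the labeled documents only.
    prior, word_prob = train_nb(lab_docs, lab_weights, classes, vocab)
    for _ in range(iterations):
        # E-step: soft class memberships for the unlabeled documents.
        unl_weights = [posterior(doc, prior, word_prob) for doc in unlabeled]
        # M-step: re-estimate parameters from labeled + soft-labeled documents.
        prior, word_prob = train_nb(
            lab_docs + unlabeled, lab_weights + unl_weights, classes, vocab
        )
    return prior, word_prob


def classify(doc, prior, word_prob):
    """Assign the class with the maximum a posteriori probability."""
    post = posterior(doc, prior, word_prob)
    return max(post, key=post.get)


if __name__ == "__main__":
    labeled = [(bag("cheap pills buy now"), "spam"),
               (bag("meeting agenda for monday"), "ham")]
    unlabeled = [bag("buy cheap watches now"), bag("monday meeting moved"),
                 bag("pills now cheap")]
    prior, word_prob = em_naive_bayes(labeled, unlabeled, ["spam", "ham"])
    print(classify(bag("cheap watches"), prior, word_prob))
```

In this sketch, the E-step and the per-document counting inside the M-step are independent across documents, which is the kind of work one would expect to partition across grid nodes; how the author actually distributes it is not described here.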
IEEE Computer Society / ACM: 20 Years - Unleashing the Power of HPC