Text Clustering Based on Background Knowledge

Publication Details
Stumme, G.; Hotho, A.

Text document clustering plays an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. Standard partitional or agglomerative clustering methods efficiently compute results to this end. However, the bag of words representation used for these clustering methods is often unsatisfactory as it ignores relationships between important terms that do not co-occur literally. Also, it is mostly left to the user to find out why a particular partitioning has been achieved, because it is only specified extensionally. In order to deal with the two problems, we integrate background knowledge into the process of clustering text documents. First, we preprocess the texts, enriching their representations by background knowledge provided in a core ontology (in our application, WordNet). Then, we cluster the documents by a partitional algorithm. Our experimental evaluation on Reuters newsfeeds compares clustering results with pre-categori
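
The two steps named in the abstract, enriching a bag of words with ontology concepts and then running a partitional clustering, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `TOY_ONTOLOGY` is a hypothetical four-entry term-to-concept map standing in for WordNet, `enrich` simply adds one count per matched concept (the abstract does not say which enrichment strategy the authors use), and `kmeans` is a tiny cosine-based k-means seeded with the first k documents, chosen only as one example of a partitional algorithm.

```python
from collections import Counter
import math

# Hypothetical toy ontology (term -> more general concept).
# This is a stand-in for WordNet, not WordNet data.
TOY_ONTOLOGY = {
    "beef": "meat",
    "pork": "meat",
    "wheat": "grain",
    "corn": "grain",
}


def enrich(tokens, ontology=TOY_ONTOLOGY):
    """Return a bag of words with one extra count per matched concept."""
    bag = Counter(tokens)
    for tok in tokens:
        concept = ontology.get(tok)
        if concept is not None:
            bag["concept:" + concept] += 1
    return bag


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kmeans(bags, k, iterations=10):
    """Tiny k-means: assign each bag to its most similar centroid.

    Assumption: centroids are seeded with the first k documents, and
    ties go to the lowest-numbered cluster.
    """
    centroids = [Counter(b) for b in bags[:k]]
    labels = [0] * len(bags)
    for _ in range(iterations):
        labels = [
            max(range(k), key=lambda c: cosine(bag, centroids[c]))
            for bag in bags
        ]
        for c in range(k):
            members = [b for b, lab in zip(bags, labels) if lab == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                # Cosine is scale-invariant, so the sum serves as the mean.
                centroids[c] = total
    return labels


docs = [
    ["beef", "prices"],
    ["wheat", "harvest"],
    ["pork", "exports"],
    ["corn", "yield"],
]

plain_labels = kmeans([Counter(d) for d in docs], k=2)
enriched_labels = kmeans([enrich(d) for d in docs], k=2)
print("plain:   ", plain_labels)
print("enriched:", enriched_labels)
```

In this toy setting, "beef" and "pork" share no literal term, so their plain bags have zero similarity; once both carry the shared `concept:meat` feature, they become comparable and can land in the same cluster. This also hints at the second point of the abstract: the shared concept gives a readable reason for the grouping.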

analysis, background, clustering, concept, fca, formal, knowledge, ontologies, semantic, text, web

Last updated on 2019-25-07 at 18:06