The following describes the INFOS project. The project was defined in 1993 and culminated in 1996 in the dissertation included on this page.
As networked systems have grown in size, the amount of data available to users has increased dramatically. The result is an information overload for the user. This project has investigated the uses of an intelligent information filtering system named INFOS (Intelligent News Filtering Organizational System) to reduce the user's search burden by automatically eliminating data predicted to be irrelevant. Unlike many news readers that require users to explicitly create a user profile to perform filtering, INFOS is capable of learning this profile automatically. These predictions are learned by adapting an internal user model that is based upon user interactions and collaborative actions of other users. The primary domain for the project is the filtering of Usenet news articles.
The filtering predictions are learned automatically based upon features taken from input data articles and collaborative features derived from other users. The actual filtering is performed via a hybrid technique that combines a keyword-based hill climbing method, the knowledge-based conceptual representation of WordNet, and partial parsing via index patterns. A hybrid system integrating all approaches combines the benefits of each while maintaining robustness and scalability. INFOS has been tested upon Usenet news articles and preliminary tests have been performed on WWW pages. Additional tests have been made incorporating genetic algorithms to diversify the search space, and neural networks to perform the classification.
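As a rough illustration of the keyword-based component only, the sketch below hill-climbs a set of keyword weights against a handful of labeled articles: it nudges one weight at a time and keeps a change only when classification accuracy improves. The function names, the step size, the "score above zero means accept" rule, and the toy articles are all assumptions made for this example; the actual INFOS method is described in the dissertation and may differ substantially.

```python
import re

def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def score(weights, text):
    """Sum the weights of all known keywords occurring in the text."""
    return sum(weights.get(word, 0.0) for word in tokenize(text))

def accuracy(weights, labeled):
    """Fraction of articles whose predicted label (score > 0) matches the user's."""
    correct = sum((score(weights, text) > 0) == accepted for text, accepted in labeled)
    return correct / len(labeled)

def hill_climb(labeled, step=1.0, max_passes=10):
    """Greedy hill climbing over keyword weights (hypothetical sketch)."""
    vocab = sorted({word for text, _ in labeled for word in tokenize(text)})
    weights = {word: 0.0 for word in vocab}
    best = accuracy(weights, labeled)
    for _ in range(max_passes):
        improved = False
        for word in vocab:
            for delta in (step, -step):
                weights[word] += delta
                candidate = accuracy(weights, labeled)
                if candidate > best:
                    # Keep the change: it moved us uphill.
                    best = candidate
                    improved = True
                    break
                # Otherwise undo the change and try the other direction.
                weights[word] -= delta
        if not improved:
            break  # local optimum: no single-weight change helps
    return weights, best

# Toy data, invented for illustration: (article text, user accepted?)
examples = [
    ("neural networks for filtering", True),
    ("genetic algorithms and neural nets", True),
    ("cheap watches for sale", False),
    ("sale on used cars", False),
]
weights, best = hill_climb(examples)
```

Hill climbing of this kind can stall at a local optimum, which is one plausible reason for combining it with other knowledge sources in a hybrid design.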
These will be available pending free time (i.e. a long time).
The following datasets and knowledge sources were used in this project for evaluating and developing the classification algorithms.
WordNet lexical database, used to implement the CBR knowledge base.
Time Magazine collection, used for comparing tf-idf to the index pattern and CBR hybrid methods.
UCD Life newsgroup articles. Articles numbered below 200 constitute one thread; articles numbered above 200 constitute a different thread, posted at a later date.
UCD Life newsreading results. Results of a study on newsreading behavior for the UCD Life newsgroup articles. The files indicate which articles were read when browsing and when reading all articles. Each entry gives the article number, followed by "a" for accept, "r" for reject, or "u" for unknown or ambivalent.
Comp.AI newsgroup articles, used for the initial classification experiments in the genetic algorithm work.
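The UCD Life newsreading results pair an article number with a one-letter judgment, so they are easy to load for further analysis. The sketch below assumes each entry looks like "101a" or "101 a"; the exact file layout is not documented here, so the regular expression, the function name, and the sample string are all hypothetical.

```python
import re
from collections import Counter

# Judgment codes as described for the newsreading results files.
LABELS = {"a": "accept", "r": "reject", "u": "unknown"}

# Assumed entry shape: digits, optional whitespace, then one code letter.
ENTRY = re.compile(r"(\d+)\s*([aru])\b")

def parse_results(text):
    """Map article number to judgment for every entry found in the text."""
    return {int(number): LABELS[code] for number, code in ENTRY.findall(text)}

# Invented sample, not taken from the real data files.
sample = "101a\n102r\n103 u\n250a\n"
judgments = parse_results(sample)
counts = Counter(judgments.values())
```

From here the judgments can be grouped by thread using the article numbers, keeping in mind that the description above does not say which thread article 200 itself belongs to.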