| Author: | Arun Abraham Ross |
| Advisors: | Dr. Charles Owen and Dr. Anil Jain |
| Email: | rossarun@cse.msu.edu; http://www.cse.msu.edu/~rossarun |
With the rapid proliferation of websites over the past few years, it has become imperative for sites to improve the quality of service they provide in order to attract and sustain user traffic. Larger websites place immense quantities of content online, and the goal of such a site is to present that content to its users. The average user, however, is interested in only a limited subset of the available content. The emphasis, then, should be on developing tools that help the user select that subset. Such a strategy calls for predicting a user's actions from past user behavior. The prediction can then be used to customize the navigation tools presented to the user: hyperlink presentation order can be modified and search engine results can be weighted. If users find it easier to locate content on a site, their perceived experience improves. This work focuses on clustering a site into groups of documents that are predictive of future user accesses. Two approaches have been developed and tested to a moderate extent on the France 98 World Cup (Soccer) site. The first approach uses the semantic information inherent in the documents to drive the clustering process; user access history, as recorded in server logs, is then used to reorganize the clusters iteratively so that they better reflect access patterns. The second approach clusters the documents using trail information.
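To make the idea of trail-based clustering concrete, the following is a minimal sketch and not the method developed in this work. It assumes a trail is the list of pages one visitor requested in a single session, and it groups documents that appear together in at least `min_shared` trails. The function names, the `min_shared` threshold, and the page paths are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(trails):
    """Count how many trails contain each unordered pair of documents."""
    counts = defaultdict(int)
    for trail in trails:
        # Deduplicate within a trail so repeat visits count once.
        for a, b in combinations(sorted(set(trail)), 2):
            counts[(a, b)] += 1
    return counts


def cluster_by_trails(trails, min_shared=2):
    """Group documents linked by at least `min_shared` common trails.

    Illustrative assumption: two documents belong together when enough
    trails contain both. Groups are merged with a simple union-find.
    """
    parent = {}

    def find(doc):
        parent.setdefault(doc, doc)
        while parent[doc] != doc:
            parent[doc] = parent[parent[doc]]  # path halving
            doc = parent[doc]
        return doc

    # Register every document, including ones that end up alone.
    for trail in trails:
        for doc in trail:
            find(doc)

    # Merge the groups of any pair seen together often enough.
    for (a, b), shared in cooccurrence_counts(trails).items():
        if shared >= min_shared:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for doc in list(parent):
        groups[find(doc)].add(doc)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Hypothetical trails; real ones would be extracted from server logs.
    trails = [
        ["/teams/brazil.html", "/teams/france.html", "/news/final.html"],
        ["/teams/brazil.html", "/teams/france.html"],
        ["/tickets/index.html", "/venues/paris.html"],
        ["/tickets/index.html", "/venues/paris.html", "/news/final.html"],
    ]
    for cluster in cluster_by_trails(trails, min_shared=2):
        print(cluster)
```

In this toy input the two team pages share two trails, as do the ticket and venue pages, so each pair forms a cluster, while the news page co-occurs with each other page only once and stays on its own. A site could use such clusters to reorder the hyperlinks shown to a visitor whose current trail falls inside one of them.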