| Author: | Ryan McFall |
| Advisor: | Dr. Matt W. Mutka |
| Email: | mcfallry@cps.msu.edu; http://www.cps.msu.edu/~mcfallry |
Maintaining data integrity in an information system is critically important. Link integrity is a particular form of data integrity concerned with keeping resources that refer to one another consistent. Perhaps the most common link integrity problem in today's computing world is the broken link. It arises when one HTML document links to another and the referenced resource is subsequently moved or deleted, leaving the reference invalid. As distributed information systems become more sophisticated and support more general types of referential linking, maintaining link integrity will grow more complex, because such links can be broken by more than resource deletion or migration. Our research examines the types of integrity problems that such advanced distributed information systems will present. In particular, we focus on the capabilities of XML (Extensible Markup Language). XML is a derivative of SGML (Standard Generalized Markup Language) and is becoming accepted as the de facto standard in the evolution of the World Wide Web. We have devised algorithms that efficiently detect and repair broken links. Additionally, we have implemented a prototype web proxy server that uses these algorithms and demonstrates how link integrity can be maintained even when the referenced document is changed.
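To make the broken link problem concrete, the following minimal sketch detects and repairs dangling references in a small collection of HTML documents. It is an illustration only and is not the algorithm developed in this work: the in-memory `documents` store, the function names `find_broken_links` and `repair_links`, and the `moved` mapping of old names to new names are all assumptions introduced for the example.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_broken_links(documents):
    """Map each document name to the hrefs that point at no known document.

    `documents` is a hypothetical in-memory store: name -> HTML text.
    """
    broken = {}
    for name, html in documents.items():
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        missing = [href for href in parser.links if href not in documents]
        if missing:
            broken[name] = missing
    return broken


def repair_links(html, moved):
    """Rewrite hrefs using `moved`, a hypothetical old-name -> new-name map."""
    for old, new in moved.items():
        html = html.replace(f'href="{old}"', f'href="{new}"')
    return html


# old.html was renamed to new.html, so index.html now holds a broken link.
documents = {
    "index.html": '<a href="a.html">A</a> <a href="old.html">B</a>',
    "a.html": "<p>No links here.</p>",
    "new.html": "<p>Formerly old.html.</p>",
}
```

Calling `find_broken_links(documents)` reports the dangling reference in `index.html`, and passing that document through `repair_links` with `{"old.html": "new.html"}` removes it. This sketch handles only the deletion or migration case; the more general XML links discussed above can break in ways that a simple existence check would not catch.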