By Daniel Hubbard | January 25, 2015
The Web has a problem, and virtually anyone who has spent time online on two different days has run into it. It’s a problem that is big enough to have recently been the topic of an article in the New Yorker. The problem goes by many names, but my favorite is the most evocative: “link rot.” Link rot is the tendency for links in web pages to stop pointing to their intended targets. Over time, the links on a web page will start to point to pages that have been moved, removed, or rewritten so that they no longer contain the information the original link was meant to reach. Unless you are reading this post not long after I wrote it, don’t be surprised if my link to the New Yorker article doesn’t work for you. Sometimes whole websites die and the information that they held goes with them. Other times the site is still out there but has moved, so that the links to it no longer work. Sometimes it is just an individual page that goes away.
It is no surprise that a great deal of genealogy happens on the web. How do you reference what you find there? If you make a note “found this on Ancestry,” you certainly have a problem with a lack of precision, but there is another problem. No one expects Ancestry to disappear tomorrow, but what will that reference mean to a descendant a century from now? I’m sure there are already genealogists who would not know what “found this on Footnote.com” means. At least the Footnote.com problem is only a matter of a change of names (to Fold3). What about that really interesting page of information that you found years ago on GeoCities? Even if you saved a perfect source citation, how do you look at it now, when GeoCities is long dead? Link rot isn’t just something that happens to websites. It is something that happens to our citations as well.
Our citations are frozen in time. They represent where we found something at the time that we found it. That is as it should be. They are a record of what we did to find our information. On the other hand, they should also lead us back to that information later, and when pages change and disappear, that no longer works.
What do you do if you go back to look at a page only to find that it has changed or been deleted? The first thing to try is the Wayback Machine, the subject of the New Yorker article that got me thinking about this topic. It crawls the web, archiving what it finds. It may not have saved what you want, but it is the best place to look. Looking at the Wayback Machine’s crawl calendar for this site, I see that it was archived for the first time on October 9, 2009, just a few weeks after my first post. Here is the archived version of that post, Boltzmann’s Grave, from August 1 as it was when archived that October. Part of that post was about entropy, a term from physics that involves moving from order to disorder, like moving from a well-crafted citation to one suffering from citation rot. Link rot is the web’s version of entropy.
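If you have many old links to check, the lookup can be scripted. What follows is only a minimal sketch, assuming the Internet Archive's public availability endpoint at archive.org/wayback/available, which returns a small JSON document describing the closest archived snapshot. The function names (`build_query`, `closest_snapshot`, `find_archived_copy`) are my own illustrative choices, not part of any official tool.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Internet Archive's Wayback availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(page_url, timestamp=None):
    """Build the lookup URL for a page, optionally near a given date.

    timestamp is a string of digits such as "20091009" (YYYYMMDD).
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived URL from a JSON response, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(page_url, timestamp=None):
    """Ask the Wayback Machine for the snapshot closest to the timestamp."""
    with urlopen(build_query(page_url, timestamp), timeout=30) as response:
        return closest_snapshot(response.read().decode("utf-8"))
```

Calling `find_archived_copy` with the address from an old citation and the date you originally visited returns the address of the nearest archived copy, or `None` if the page was never crawled. A `None` result is a reminder that the archive is a fallback and not a guarantee.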
Save Your Sources
The Wayback Machine can be a lifesaver for a genealogist with an old web link, but better still is to save the page itself. The information won’t always be where you found it. It may not be there tomorrow. The Wayback Machine might not have archived it. If you captured it, you’re saved. Often it is enough to simply save a document image from a site that shows a scanned image. Saving whole web pages can be tricky, but sometimes it is necessary. Many pages rely on information from somewhere else on the web, and if your saved page contains a link that it uses to fetch information, you can still find yourself with a link rot problem. When I find a bit of information on a web page, I save it as a PDF file. It preserves what I saw as it was when I found it. Link rot can set in just moments later and I still have the page.
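A saved file is most useful when it stays tied to the citation it came from. As a rough sketch of one way to do that (my own assumption, not an established genealogical standard), the hypothetical helper below writes a small companion file next to a saved PDF or image, recording the original address, the date accessed, and a checksum of the saved copy. The name `record_saved_source` and the `.source.json` suffix are invented for illustration.

```python
import hashlib
import json
from datetime import date
from pathlib import Path


def record_saved_source(saved_file, source_url, accessed=None):
    """Write a companion JSON record beside a saved copy of a web source.

    saved_file: path to the PDF or image already saved to disk.
    source_url: the address where the information was found.
    accessed:   a datetime.date; defaults to today.
    """
    saved_file = Path(saved_file)
    content = saved_file.read_bytes()
    record = {
        "file": saved_file.name,
        "source_url": source_url,
        "accessed": (accessed or date.today()).isoformat(),
        # The checksum lets you confirm later that the copy is unchanged.
        "sha256": hashlib.sha256(content).hexdigest(),
    }
    # Hypothetical naming scheme: "page.pdf" gets "page.pdf.source.json".
    sidecar = saved_file.with_name(saved_file.name + ".source.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

With a record like this beside each saved page, the citation and the captured evidence travel together even after the original link has rotted.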