One Document to Rule Them All

By Daniel Hubbard | March 15, 2015

You might recognize the title of this post as being stolen from J. R. R. Tolkien’s The Fellowship of the Ring, but the subject matter is actually stolen from biology.

Scientifically describing a new species is an exacting endeavor. Ideally, one has an entire specimen of the new organism that can be used for study and comparison to other, similar organisms. That specimen is referred to as a “type specimen,” specifically as the holotype:

the single specimen designated by an author as the type of a species or lesser taxon at the time of establishing the group (Merriam-Webster)

That definition might be a mouthful but the idea is pretty clear. The discoverer of a new species puts forth a specimen that others can use for study and comparison to other organisms. Going back to Tolkien, it is one specimen to rule them all.

More and more I find myself applying a similar concept to genealogical research. As you research, you find new individuals who are the parents, siblings, cousins, neighbors, in-laws, friends, and associates of the people you had been researching. Some will be interesting to you and may help you solve a genealogical problem. That all looks great until you try to follow those people. Sometimes the records will be sparse and the names common. It can be tempting to conclude that the few hints you have involving the correct name are all about the correct person. Sometimes there might be plenty of records but the people lived in a city and finding them might feel like finding the proverbial needle in a haystack. It can feel overwhelming with many records that might relate to the person you want. It is easy to feel as if your research is simply drifting aimlessly.

For a while now, I’ve been picking my own “holotypes,” not “type specimens” like a biologist might for species, but “type documents” for individuals. The document becomes that “one document to rule them all.” It can be whatever document I have that connects the new individual to my research problem, or it might be a document that gives information about a person I only suspect of being related to my problem. In either case, from then on, I think of it as defining that individual. For example, if I am researching John Doe, he is not just anyone with that name. He is the John Doe listed as a witness in a specific marriage document, or who lived at a specific address and had a specific occupation in a city directory. From then on I ask the question “Is the document I am now looking at clearly giving me information about the same individual as the individual in that holotype document?” Is there a link, perhaps via other documents, to my one holotype document? If I can’t make the link, I can’t say that the individual in that new document is definitely a different person, but it does mean that I can’t simply accept that it is the same person and use that information.

This way of looking at things means that you always have a definition of that individual to go back to when in doubt. Other documents will add information but only if they are clearly linkable to that one original document that defines the individual. Either a new document you are looking at can be linked to that definition, or it can’t. If it can’t, set it aside. It might be about the right person, but it doesn’t meet the standard without further documentation that can provide that link. Tossing in a document that doesn’t pass muster, just because “it looks pretty good,” runs the risk of allowing the research you’re performing to lump several people together into one individual.

Sometimes it might be that the individual that emerges from linking documents to that one starting-point document turns out not to be the person that I was hoping, but I still have a clear reconstruction of an individual, not a fuzzy situation that I need to keep wondering and worrying about. Having that one holotype document to refer to keeps the research anchored—the one document to rule them all.

Information and Connection

By Daniel Hubbard | March 8, 2015

A few weeks ago my daughter’s biology teacher asked if I could give my presentation on DNA for her honors classes. It required putting a bit of a different spin on things and it got me thinking. The talk is meant to give genealogists a basic understanding of DNA so that they have the background information needed to learn more. With the students, it was almost the reverse. They had been studying DNA for weeks but probably knew nothing about genealogy.

With fellow genealogists, I point out that when we use documents, we need to go out and find them, then ask ourselves if the document is relevant, or if maybe it is only about a person who is somehow similar to the person we want. Once we’re reasonably sure that we have a relevant document, we can extract names, dates, and places from it. The students needed to understand that just because a genealogist has found documents doesn’t mean they know how the documents fit together. Documents are often rich in personal information but poor when it comes to connections. (Not so surprisingly, I got questions about what kinds of documents genealogists use.)

DNA complements those documents. At least when first starting out with DNA testing, we don’t need to go looking for DNA; it is something that we already have. We don’t need to ask if it is relevant; it is ours. What about information? DNA can’t tell us an ancestor’s name. It doesn’t come with a date. DNA doesn’t have a village name encoded into it. DNA doesn’t carry any of that information. Documents can be information rich but connection poor. DNA is all about the connections.

The Forgotten Plague

By Daniel Hubbard | March 1, 2015

By the dawn of the nineteenth century, the disease had killed one in seven of all people that had ever lived.

That is a statement that will catch one’s attention. I heard it at the opening of a recent edition of American Experience. The disease in question is tuberculosis, or, as we often read in genealogical records, phthisis, a Greek word translated into English as consumption—so called because the victim seemed to be consumed from within. I’ve seen it as the cause in many a death record. In fact, at its worst in the nineteenth century, it seems to have been the cause of death for 1 in 4 Europeans.

I grew up in a household where medical discussions were common. My mother was a nurse and every Sunday we traveled a few blocks to have dinner with her parents, and Grandma was a retired doctor. So when I called my folks the other day and my mom answered, it was natural to mention watching a show about TB and ask if she’d seen it. Yes, she’d watched but my dad had refused. As far as I know, my dad has only refused to watch a documentary once before. He’d refused to watch Ken Burns’s The War, saying that he had lived through those years and didn’t feel like reliving them, so I immediately understood why he had refused to watch this documentary.

When my dad was a baby, his grandmother had moved in with the family when she and her second husband had become too ill to care for each other. I’ve heard stories about cloths over mouths and how his mother had boiled the dishes every night to try to prevent the infection from spreading. One of my aunts took her grandmother for nightly walks to get some fresh air when she was unlikely to encounter healthy people. His grandmother died of TB at home when my dad was two.

Years later, my father developed tuberculosis and began a long stay in a sanatorium. The regime at a TB sanatorium was constant bed rest, fresh air no matter the weather and enough food to prevent the wasting away that gave consumption its name. My father has told me several times about how while he was at the sanatorium, they received their first supply of streptomycin, the miracle cure that meant that people with tuberculosis could be cured. He also told me how his initial excitement over the prospect of getting well and leaving the sanatorium changed when it became clear that streptomycin made him so horribly sick that tuberculosis was preferable. Instead of being one of the first to be cured of TB with antibiotics, he was one of the last to recover from tuberculosis the nineteenth century way, month after month of forced rest and fresh air in a sanatorium. So that is a bit of my personal past, brought to mind by my father’s refusal to watch a documentary.

The Genealogists’ Alphabet, part D

By Daniel Hubbard | February 25, 2015

Sometimes the past doesn’t need to be so distant to seem far away. Cleaning out things that the kids have outgrown turned up one of those typical alphabet books that are for kids that can’t yet read. The kind of book whose genealogist version might start—

A is for aunt, who got you interested in family history.

B is for book, which explained a family mystery.

So what might an alphabet book for genealogists look like? I’ve already taken a stab at “A,” “B,” and “C.” So, for genealogists, what might “D” be for?

D is for Documentation

“D” could be for “documentation,” which is obvious I hope, yet if you are just starting out, it might not be. Genealogy isn’t based on “I got an email that told me that…” or “I found a tree that shows that John Doe was his father.” Genealogy is based on documents and what wonderful things they can be.

“D” could be for “deed,” that specific type of document that records ownership. If an older man deeds land to a younger man with the same surname for $1 or some other tiny sum, he may not be saying “This is my son” but he is coming very, very close to that statement.

“D” could be for “date.” Genealogists need to do more than collect names and dates but we can’t do without them either. Dates allow us to place in time the events in our ancestors’ lives. Almost mysteriously, as we go back in time, our ability to easily understand what those dates really mean decreases. The longer ago something occurred, the harder it can be to understand the date a document bears.

“D” could be for “DNA,” the stuff of our genes and the genetic markers, which label our cells as descendants of the cells of our forebears. Only documents come with names, dates, and places but only DNA comes unambiguously from our personal past.

“D” could be for “digitization,” probably a bigger revolution in genealogy than even DNA testing. The ability to access documents, without physically traveling to view them, or having copies made and physically sent, has changed research almost immeasurably. Documents are preserved and available in a way that they never have been before.

“D” could be for “diaspora,” those great dispersions of people that, from time to time, alter our world and shape our family trees.

Those are all fine words, but, if we look deep below the lives we reconstruct, below the documents and DNA, down to the atoms with which we build, we find “data” and we find that “d” must be for “data.”

Rootstech & FGS

By Daniel Hubbard | February 16, 2015

I’m just back from working three days in the ArkivDigital booth at the combined Rootstech / FGS conference in Salt Lake City. Billed as likely to be the biggest genealogy conference ever held, the attendance didn’t disappoint. Already on the first day, one could hear rumors about the attendance in hotel elevators. At first people were in disbelief that there were 15,000 people. Then the rumored number rose to 20,000. The highest number I heard while moving between floors was 25,000. We all wondered if that could really be true. Part way through, I did hear an official figure that was just a hair below 22,000 and that walk-in registrations would be increasing the number.*

A tiny fraction of the exhibit hall taken just before the doors opened on Friday.

At the booth, it was a fun three days of answering people’s Swedish genealogy questions. They ranged from “Which one is Switzerland and which is Sweden?” to “How do I get started?” to “I’ve traced my ancestor to this point, what should I do next?” With a crowd as big as this one, there were plenty of people who wanted to know more, and with nearly 200 vendors, there was something for everyone.

On the flight back to Chicago, I sat next to a man who spent a good deal of the flight editing a single photo on his phone. Every so often when he made some change that I could see out of the corner of my eye, the reflex to see what had happened tore me from my book. The photo was nothing that he had taken with his phone. It must have been scanned from a photo taken perhaps in the 1980s. I noticed at one point that it was a part of a much larger picture. The screen was zoomed in on a single face. It was zoomed so much that it was grainy and the slight imperfection in the original focus was obvious. One sign of the photo’s age was that it was quite yellow and I think he was trying to correct the color. He worked and worked at it and there is only so much you can do editing pictures on a phone. Clearly, there was something deeper going on. There was a story behind that photo. There was only so much that could be done with that photo and he must have done everything many times over. It wasn’t the photo that he was working on so much as the story. A story that he knew and that I could only guess. A family story was there in that photo and that is part of what genealogy is all about. He reminded me of that, even though I was the one returning from the biggest family history conference of all time.

*When rumors about the attendance started to spread in the elevators and hallways, I wondered who would be the first to quip something along the lines of “Man, the Utah State Thruway is closed,” but perhaps that Woodstock reference occurred only to me.

How Odd

By Daniel Hubbard | February 8, 2015

Sometimes it can be good to look at lots of data even when you only want to understand just a little. If you look only at the 55-year-old widower and his 20-year-old son in the 1841 census of the UK, you wouldn’t think twice about the ages. Look beyond the family that interests you and you would notice that the ages reported for adults usually end in a “5” or a “0.” That should seem strange, but it was what the enumerators were told to do. If you didn’t know that, it would have paid to notice the pattern in the census itself. I’ve noticed that pattern in reported ages in other places as well. Years ago, I created a decent-sized spreadsheet to show that the age that would seem to suggest that a man was not my ancestor was actually consistent with him being the right man, given the way that other ages were reported. People seem to like to round to the nearest multiple of 5. At some times and in some places that tendency to simply be in the right ballpark seems strong. Only by looking at a lot of data can one tell.
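For anyone who would rather script that kind of check than build a spreadsheet, here is a minimal sketch in Python. The ages in the sample list are made up for illustration, and the function names are just placeholders. The idea is simply to count how often reported ages end in each digit; if ages were reported exactly, roughly two in ten would end in “0” or “5.”

```python
from collections import Counter


def terminal_digit_shares(ages):
    """Return the fraction of ages ending in each digit, 0 through 9."""
    if not ages:
        raise ValueError("need at least one age")
    counts = Counter(age % 10 for age in ages)
    total = len(ages)
    return {digit: counts.get(digit, 0) / total for digit in range(10)}


def rounded_share(ages, digits=(0, 5)):
    """Return the fraction of ages whose last digit is one of `digits`."""
    shares = terminal_digit_shares(ages)
    return sum(shares[d] for d in digits)


# Hypothetical ages transcribed from a few census pages (not real data).
sample_ages = [20, 25, 30, 30, 35, 40, 40, 45, 50, 55, 60, 23, 37, 41, 65, 70]

print(f"Share ending in 0 or 5: {rounded_share(sample_ages):.2f}")
```

A share far above 0.2 across many households suggests that the ages were rounded, so a single age that looks a few years off may not rule a person out.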

The other day I was looking at immigration years in the U.S. Federal census. That information is known to be less than reliable, but this seemed strange. The years seemed to be almost all even. Some years ended in “5,” a few ended in “1”, “3”, “7”, or “9” but many more were even. That tells you something about the accuracy of the information. Numbers of immigrants that should have smoothly changed from year to year, sometimes increasing over the years, sometimes decreasing, instead went up and down every other year. They showed a tendency to be even, and that is, in fact, odd. Why would that be? People don’t report what is true, they report what they remember. We hope that those two things are the same, but we need to acknowledge that they often differ. When someone is asked about the year, several decades earlier, when their spouse immigrated, can we blame them if some bias toward easy to remember numbers creeps in?
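The same sort of tally makes the up-and-down pattern visible. This is only a sketch with invented immigration years, not figures from any census; it counts the answers per year and the overall share of even years.

```python
from collections import Counter


def counts_by_year(years):
    """Return a dict of year -> number of people reporting it, in year order."""
    return dict(sorted(Counter(years).items()))


def even_share(years):
    """Return the fraction of reported years that are even."""
    if not years:
        raise ValueError("need at least one year")
    return sum(1 for year in years if year % 2 == 0) / len(years)


# Hypothetical reported immigration years (not real data).
sample_years = (
    [1880] * 5 + [1881] * 1 + [1882] * 6 + [1883] * 2 + [1884] * 5 + [1885] * 3
)

for year, count in counts_by_year(sample_years).items():
    print(year, "#" * count)
print(f"Share of even years: {even_share(sample_years):.2f}")
```

If the years were remembered exactly, the bars would change smoothly and the even share would sit near one half. A sawtooth and a share well above one half point to a bias in what people reported.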

History Super Bowl

By Daniel Hubbard | February 1, 2015

Earlier this week I listened to a podcast about Thucydides. Another one of those names that isn’t going to actually appear in anyone’s family tree (he died about 2400 years ago), but what was said about him, and his older contemporary Herodotus, got me thinking about genealogy anyway.

As founders of history, they show us some of the principles we ought to follow. Herodotus has actually been known as the “Father of History” for over two thousand years. If Herodotus was the first historian, then Thucydides, only about twenty years younger, was the second. They even knew, or at least knew of, each other, yet the two of them are very different. On Super Bowl Sunday,* I’m tempted to pit the two of them against each other.

The Big Game

Thucydides commented that it was difficult work to extract the truth from the stories that people told. He took the evidence he could extract, distilled it down to what seemed to him to have been the truth and related that narrative. That sounds like what we should do as family historians, until one realizes that he didn’t let us know what his sources were. Omitting source citations is not good practice. Also, because he gives us only his distillation, we don’t get a view into the contradictions that he resolved. We have no way to know if his resolution was a good one or if some of the evidence that he did not use might have actually been enlightening.

Herodotus told some pretty crazy sounding stories. Not a best practice in genealogy. Yet, he didn’t see it as his duty to believe the stories he related, but he did see it as his duty to preserve them. That actually sounds quite a bit better. I often tell people that just because a family story has been disproved doesn’t mean it shouldn’t be preserved. The belief in that story is, in and of itself, a part of the family’s history. Even family stories that are “wrong” can have a kernel of truth that is a clue to something not yet discovered. Throw away the story and you throw away the clue as well. Because he told those stories, we at least have an idea of where Herodotus found his information.

Thucydides saw history as an effort that concerned itself with politics and armed conflict between real people, not strange tales of quarreling, intervening gods. Sticking to likely explanations and steering clear of flights of fancy sounds like good advice for a genealogist. Touchdown Thucydides.

Herodotus’s history wasn’t immune from the effects of meddling supernatural forces but he also saw history in much broader terms and included geography and ethnology in his writings. Those are things we must often include to make our ancestors’ lives understandable. Touchdown Herodotus.

So who should inspire our genealogical inquiries? Both and neither, I would have to say. Both their styles have things in them we should emulate and things we should avoid.

* If you happen to be reading this post on your phone during the game, remember that it is a ten-yard penalty to try to pronounce Thucydides with a mouth full of chips and salsa. That is, the people in front of you will insist you spend the rest of the evening at least ten yards back from them.


Going, Going…Saved

By Daniel Hubbard | January 25, 2015

The Web has a problem and virtually anyone who has spent time online on two different days has run into it. It’s a problem that is big enough to have recently been the topic of an article in the New Yorker. The problem goes by many names but my favorite is the most evocative: “link rot.” Link rot is the tendency for links in web pages to stop pointing to what they were intended to. Over time the links on a web page will start to point to pages that have been moved, removed, or rewritten to no longer contain the information that was the target of the original link. Unless you are reading this post not long after I wrote it, don’t be surprised if my link to the New Yorker article doesn’t work for you. Sometimes whole websites die and the information that they held goes with them. Other times the site is still out there, just moved so that the links to it no longer work. Sometimes it is just an individual page that goes away.

Citation Rot

It is no surprise that a great deal of genealogy happens on the web. How do you reference what you find there? If you make a note “found this on Ancestry” you certainly have a problem with a lack of precision but there is another problem. No one expects Ancestry to disappear tomorrow but what will that reference mean to a descendant a century from now? I’m sure there are already genealogists who would not know what “found this on Footnote” means. At least there the problem is only a matter of a change of names (to Fold3). What about that really interesting page of information that you found years ago on GeoCities? Even if you saved a perfect source citation, how do you look at it now, when GeoCities is long dead? Link rot isn’t just something that happens to websites. It is something that happens to our citations as well.

Our citations are frozen in time; they represent where we found something at the time that we found it. That is as it should be. They are a record of what we did to find our information. On the other hand, they should also lead us back to that information later, and when pages change and disappear, that no longer works.

What do you do if you go back to look at a page only to find that it has changed or been deleted? The first thing to try is the Wayback Machine, the subject of the New Yorker article that got me thinking about this topic. It crawls the web archiving what it finds. It may not have saved what you want but it is the best place to look. Looking at the Wayback Machine’s crawl calendar for this site, I see that it was archived for the first time on October 9, 2009, just a few weeks after my first post. Here is the archived version of that post, Boltzmann’s Grave, from August 1 as it was when archived that October. Part of that post was about entropy, a term from physics that involves moving from order to disorder, like moving from a well crafted citation to one suffering from citation rot. Link rot is the web’s version of entropy.
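If you have a whole list of old links to check, the lookup can be scripted. What follows is a minimal sketch in Python, assuming the Internet Archive’s public availability endpoint at archive.org/wayback/available still answers with JSON containing an “archived_snapshots” entry, as its documentation describes. The page address in the example is a placeholder, and the function names are mine, not part of any library.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the Internet Archive's documented availability service.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(page_url, timestamp=None):
    """Build the lookup address; timestamp is an optional YYYYMMDD string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(response):
    """Pull the archived copy's address out of a decoded JSON reply, if any."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url, timestamp=None):
    """Ask the service over the network and return a snapshot address or None."""
    with urllib.request.urlopen(build_query(page_url, timestamp), timeout=10) as reply:
        return closest_snapshot(json.load(reply))


if __name__ == "__main__":
    # Placeholder address; substitute the dead link from your own citation.
    print(build_query("http://example.com/old-family-page", "20091009"))
```

Calling lookup() with the dead link and the date from your citation returns the address of the nearest archived copy, or None if nothing was saved. That is one more reason to record the date on which you viewed a page.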

Save Your Sources

The Wayback Machine can be a lifesaver for a genealogist with an old web link, but better still is to save the page itself. The information won’t always be where you found it. It may not be there tomorrow. The Wayback Machine might not have archived it. If you captured it, you’re saved. Often it is enough to simply save a document image from a site that shows a scanned image. Saving whole web pages can be tricky, but sometimes it is necessary. Many pages rely on information from somewhere else on the web and if your saved page contains a link that it uses to fetch information, you can still find yourself with a link rot problem. When I find a bit of information on a web page, I save it as a PDF file. It preserves what I saw as it was when I found it. Link rot can set in just moments later and I still have the page.

Finding Your Way Home

By Daniel Hubbard | January 18, 2015

Often in genealogy we are trying to “find our way home” in a rather poetic sense. It is the broad where-are-my-roots sense. “Home” is our ancestors’ names, ethnicities, religions, and cultures. In short “home” is everything that went into making our ancestors who they were and, in turn, might have contributed something into making us who we are.

Sometimes, though, that “home” we want to find is a much more concrete and specific thing. It  really is the house where an ancestor once lived, the shop where an ancestor once sold cloth, or the land a family once farmed. It is a place that can be found on a map and visited. Standing on that spot, seeing that building, or picking up that soil, is also poetic.

A few weeks ago, I was asked about finding the site of a family farm in Sweden. The family owned it in the 1870s. Would it be possible to find the precise location so that it could be visited? Sometimes it is. I had already found the family in parish records for the 1870s and those told me the village. Land reform came to the area in the 1840s. The map produced for the reform gave me the boundaries of the village lands and the boundaries of the land holdings within the village but the parish records gave me no idea which of the many holdings was the one that was “home” in this case. The land reform protocol was over one hundred pages long but never mentioned the farmer I was looking for. It was written before he owned the land. The reform protocol was not going to help.

But the protocol did help. I found my farmer’s probate. It was very long for a Swedish probate record and part of the reason was that this man owned quite a bit of land beyond the land that he farmed himself. Each of his holdings was mentioned in the probate record and in order to describe the land, the name of the previous owner was given. Eventually, the name of that previous owner turned up in the protocol alongside the label used to indicate his land on the map. Overlaying, scaling and rotating the reform map on a modern satellite image gave the exact outline of the land in relation to modern roads. The land is still being farmed. The field boundaries on the map are the same ones that a satellite sees today. “Home” had been found.


Delete Your Database

By Daniel Hubbard | January 11, 2015

No, not really. Don’t actually delete your database. Think about it instead. Scientists call this a thought experiment. Go ahead, put some effort into it and really imagine it. Imagine that it is gone. Now what?

A few thoughts come to mind.

The first thought might be about backups. If you don’t have one, then you should be imagining the wailing and gnashing of teeth that would be happening if you deleted your database.
If you have a backup, can you imagine using it or are you now realizing that you have no idea what to do with it?

What if we imagine that you have no backup? Do you have copies of the documents you used? Are they organized well enough that you could go through them and reconstruct your database?

What if you didn’t even have copies of your documents? How would you start over? Would you do exactly what you did before? I hope not and here is why.

  1. You’ve learned a few things since you started your genealogy. If you think you haven’t then think some more. There must be a few things that you’d do differently because you know more.
  2. New records are probably available. Do you know what they are? Have you already checked them or have you “finished” with some ancestors and never gone back to learn more about them? If you started over would you look at those records? I hope so. Why not look at them even if you don’t delete your database?
  3. Hopefully, you would make copies of the documents you find this time around. You’d organize them, and keep a record of where you found them. Very few of us have all of that for our earliest work. Maybe you should do those things for those early documents, even if you don’t delete your database.
  4. This time you’ll back up your data. Right?
  5. You’ll see things that you haven’t seen before, even if the information is exactly the same. You should write those things down, and you don’t need to delete your database to do that either, but maybe it helps to imagine it.
