By Daniel Hubbard | May 8, 2011
A few things related to bias and understanding data have recently hinted to me that it is time to return to Mathmagic Land. Bias is something that scientists and historians need to think about. It can be in the data, or it can appear in what we do with the data and how we think about the data. Bias can make the weird seem normal and the normal seem weird. It can make what should be uncertain look definite.
Was it so Strange?
The other day I was at a meeting and we were chatting. Someone brought up a seemingly unusual name that he was researching. It turned out that the name wasn’t so unusual. With only the name and a rough time period to go on, there were hundreds of matching people. That kind of problem can be nothing more than a matter of numbers. If you are looking for one person out of a few tens of millions, even unusual names have a chance to occur many times. On the other hand, the problem can be that our concept of what is unusual does not correspond to what was unusual at the time.
Just to have a few numbers to think about, I checked the Social Security Death Index for a name I suspected would be interesting. I checked for women named Hepzibah with that exact spelling. There was a grand total of nine. Only one of them was born after 1910 and that gives a clue. The name is unusual and is becoming even more unusual. Looking at information from the Social Security Administration, Hepzibah does not seem to have broken into the top thousand girls’ names even once since 1880 (when the top thousand girls’ names included Elzada, Eula and Edward(!?), each of which accounted for 0.0041% of girls born). The 1850 census shows ninety-eight women with the name Hepzibah. Almost all of them were from New England, which is another clue. Looking at pre-1850 marriages in Massachusetts shows 367 women. A name that in other places and times would make a woman stand out was not unusual at all in a certain place and during a specific period. In my own family tree there are several Hepzibahs, all born between 1646 and 1740 and all of them born in Massachusetts. Odd? No, looking at a broad set of data shows it is nothing strange.
At a recent genealogy translation event, when the flow of Swedish documents slowed down, I talked with the Russian translators. They commented that at two different events, held at the same location but months apart, they had worked on several documents from the same obscure area of Russia. The documents were brought independently by different people. Was that strange? It might seem like an odd coincidence, but I suspect what they were seeing was the lasting echo of a chain migration. It may have gone something like this. One person from that area arrived and began writing letters back home. Soon a family member decided to make the same journey, then another and another. Then an unrelated family from the same village decided to emigrate, and then some people in their family from a nearby village. The emigrants from that area were not randomly spread across the U.S. They headed for one specific place. They had a bias. One hundred or so years later, those people’s descendants have decided to get some documents investigated and a chain migration has become a chain translation.
What are the Odds
A brief article in Scientific American from earlier this year discussed some seemingly odd facts. For example, people who follow you on Twitter are likely to have more followers than you do and your friends are probably more popular than you are. That has nothing to do with you; it is just how some biases make the math come out. Here is a genealogical example of how that works and how it affects the way we think about our past.
Imagine three brothers. The first brother has only one child. The second has three sons and three daughters for a total of six. The third brother marries young and has seven children by his first wife. He remarries when his first wife dies. He is still only in his thirties and they have another eight children before she too passes away. Finally, he marries a much younger woman and they have five more children. That brings his total offspring up to twenty. That is a large, but not unheard of, number.
If you descended from one of these three brothers, which one is the most likely to have a place in your pedigree? Clearly, the brother who had twenty children. We are all more likely to be descended from people who had relatively large numbers of offspring than we are to be descended from people who had relatively few offspring. That seems logical, and it is, but it leads to the conclusion that the number of offspring produced by our ancestors was above average. If that seems paradoxical, consider all the people who had no descendants. They bring the overall average down but clearly contribute nothing to your ancestry. So, perhaps those oddly large families we keep finding in our past are not so strange after all.
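To put rough numbers on that intuition, here is a minimal Python sketch. It assumes that you are equally likely to descend from any one of the twenty-seven children, which is a simplification (it ignores how many descendants each child went on to have). The names `children` and `chance_of_descent` are made up for illustration.

```python
from fractions import Fraction

# Children per brother, taken from the example above.
children = {"first": 1, "second": 6, "third": 20}

total = sum(children.values())  # 27 children in all

# Assumption: each of the 27 children is equally likely to be your ancestor,
# so a brother's chance is simply his share of the children.
chance_of_descent = {
    brother: Fraction(count, total) for brother, count in children.items()
}

for brother, chance in chance_of_descent.items():
    print(f"{brother} brother: {chance} (about {float(chance):.0%})")
```

Under that assumption the third brother's share is 20 out of 27, roughly three chances in four, while the brother with one child has only 1 chance in 27.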
What do we do when we need to understand what was normal? We like to think in averages (technically arithmetic means). They are simple and we like to think that they give us that idea of normal. Too far from the average is unusual and, from a genealogist’s point of view, perhaps usefully unusual. In other words, it could be a clue. But what tells us what to put into our calculation of the average, our way of thinking about normal? Hepzibah looks like a really unusual name if we look at all of America. If we look at Massachusetts at the right time, it was not so odd. There is also the question of how to think about the information we have. If we go back to the brothers and their families, the average number of children per brother in that example is nine, but that has little to do with how the children grew up. Most of those children grew up with a father who would eventually have twenty children. In fact, in these brothers’ families a child’s average experience is of a family not of nine offspring but of over sixteen. Many more children experienced the biggest family than experienced the smaller families.
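The two averages for the brothers' families can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are mine. The first mean counts each brother once, and the second counts each family once for every child who grew up in it.

```python
from fractions import Fraction

# Number of children each of the three brothers had.
family_sizes = [1, 6, 20]

# The fathers' point of view: each brother counts once.
mean_per_brother = Fraction(sum(family_sizes), len(family_sizes))

# The children's point of view: a family of n children is experienced
# by n children, so each size n is weighted by n.
mean_per_child = Fraction(
    sum(n * n for n in family_sizes), sum(family_sizes)
)

print(f"Average children per brother: {float(mean_per_brother):.2f}")
print(f"Average family size per child: {float(mean_per_child):.2f}")
```

The per-child figure works out to 437/27, a little over sixteen, because twenty of the twenty-seven children belong to the largest family.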
Usual or Unusual?
If I want to understand whether something was unusual, I can compare my information to actual numbers. Then it will be important to think about what those numbers mean. If some data could somehow help me understand a research problem, if they could help divide the usual from the unusual, I would really need to be careful with those numbers. For example, there are certainly some biases in the numbers I have for the name Hepzibah. If I wanted to dive into the history of the name Hepzibah, I’d have to ask how names enter into the Social Security Administration’s statistics. I’d have to compare the number of Hepzibahs to the actual number of people in their data for any given year, not treat all years the same. I’d need to compare the number of Hepzibahs in the 1850 census to the population. Though it was sufficient to learn that the name wasn’t always rare everywhere, in principle there is much to think about to get an in-depth feeling for what those numbers mean.
When researching, it is important to know what is unusual. A combination of facts might be so distinctive that it is almost certainly a single individual that is being described. That is, when one gets down to it, how we perform the grouping of records that leads to a reconstruction of a life. The records present us with a combination of information that would be very difficult to explain except as one individual. Nevertheless, making that judgement involves more than just the data from those records. It involves background knowledge of the time, the place, the ethnic group and many other things that make up what needs to be known in order to make that judgement. If you found an Ichabod Thorkington in your ancestry, you would probably be pretty excited. After all, an unusual name like that should be easy to handle. It is already so unusual that it will take less effort to make connections between records. Later though, you may find that he came from a county where every other household was a Thorkington household and there seems to have been a family tradition to name a son Ichabod. Suddenly, your ideas about what was unusual need to change. Whenever I find something that seems unusual in a useful sort of way, I find myself asking, “How can I get an idea if this really was unusual or not?” Often finding a record is the quick part of an investigation. Justifying the hypothesis that it is a relevant record can take some hard work and a clever idea or two.