By Daniel Hubbard | August 26, 2012
Today I thought I would write about what is clearly the single most exciting topic in all of genealogy. What would that be? What have I written so much about?
That’s right, checklists and research logs! What, I ask, could possibly be more exciting than those?
Either I’ve gone hopelessly insane or I’m not being completely serious. You may keep your opinions on that to yourself.
A couple of very crude definitions should come first. A research checklist points out resources, lets you put a check mark in front of what you have tried, and shows you what you have not tried. A research log lets you record what resources you have checked, what you found and what you checked without finding anything.
A checklist is, at least partially, a forward-looking thing. It gives you ideas for what you might try next. As you use it, a checklist will also remind you of what you have already examined so that you don’t needlessly repeat yourself.
A research log records what you have done whether it was successful or not. If you keep track of where your information originated, your source citations will give you an idea of where you succeeded, but what about the searches that proved to be fruitless? It is important to record that you tried and a log is a good place for that. It tracks what you have tried whether it succeeded or failed and gives equal weight to each. It lacks the bias toward what worked that other ways of tracking often contain. That said, I don’t find either of them to be perfect.
Why not lists and logs?
I started the last two paragraphs as though lists and logs act on their own but, of course, a checklist does not “give” you ideas. You need to actively use it. A log does not grab a pen and “record”; you do. They are tools you need to use and manage consistently. If you only update them sporadically, you can’t trust them.
You need to remember them, add to them and read information back out of them. The more stuff you put into them the more useful they look but the less useful they might actually be if they become too bloated.
If they aren’t sufficiently manageable, they become a perfect wellspring of that bane of research, procrastination. If you think you’ll make entries “later,” that usually turns into “never” faster than great-aunt Millie could lie about her age.
A problem arises when you try to decide at what level the logs should be kept. Person? Family? Surname? Place? Time Period? Some combination of those? The fewer and more general the logs, the easier it is to make an entry and the harder it becomes to find anything. “My Big Everything I’ve Ever Done” research log quickly becomes a morass that is, for all practical purposes, what computer scientists jokingly call “WOM,” or “write-only memory”: you write the information into it but have neither the will nor a way to get it out. Slice and dice your logs too much and they become inconsistent and impossible to manage or use for an overview.
Checklists have their own problems. There are two types of checklist, or at least there should be, though that sometimes goes unrecognized. One type is the comprehensive variety. It is a list of things that in general might be checked for a certain type of person or problem. Then there is the specific checklist set up for a specific situation. Never substitute the first type for the second. Even if there are convenient check boxes next to the items, don’t check them. Tape it to your wall and use it to inspire you to write a checklist that is actually relevant. A checklist that is so comprehensive that you can’t find the right place to make an entry, and can’t find the entry later either, is useless for work.
Another way to look at it is that a checklist so comprehensive that you’re constantly “reminded” to look for totally irrelevant things is distracting you as much as it is helping you. Think the “hits” in the 1920 census that you get when looking for someone who died in 1905 are annoying? Try a comprehensive checklist. Use a comprehensive checklist as an inspiration for creating, and later perhaps adding to, a checklist tailored to your situation and you will be much happier and have a much better overview of what is done and what remains to try.
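One way to picture the tailoring step is as a filter: start from the comprehensive list and keep only what could plausibly apply. The sketch below is in Python, and everything in it (the `Resource` class, the `tailor` function, the sample list and the years) is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    year: int  # year the record was created (illustrative values only)


# A tiny, hypothetical stand-in for a comprehensive checklist.
COMPREHENSIVE = [
    Resource("1880 census", 1880),
    Resource("1900 census", 1900),
    Resource("1920 census", 1920),
]


def tailor(resources, born, died):
    """Keep only resources created during the person's lifetime."""
    return [r for r in resources if born <= r.year <= died]


# Someone who died in 1905 should never be "reminded" of the 1920 census.
tailored = tailor(COMPREHENSIVE, born=1850, died=1905)
print([r.name for r in tailored])
```

A lifetime filter is deliberately crude. Probate files and obituaries are created after a death, so a real tailored list still needs judgment that no filter supplies.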
Raw, or Perhaps Parboiled, Excitement
Disclaimer: I have a strong bias against paper. It is good for a lot of things but not for managing large amounts of data. Watch a few science documentaries about some data-intensive field like genetics or astronomy and compare the number of times you see a computer in the background recording data and the number of times you see a room full of monks in the background copying gene sequences or star properties into parchment tomes. I think there will be a clear winner. If the data is something a computer can read, it can also be processed and found in ways that would be difficult or impractical for a human to do in their pajamas at midnight when that nagging thought pops up. I couldn’t do what I’m trying to do if I used paper.
Over the last few weeks, I’ve been rethinking the way I track my work, putting together in a better way some thoughts and ways of working that I’ve been using for years. Hence the above-named excitement.
One difficulty I see in general, beyond my thoughts on lists and logs, is that a principle of data handling is typically being violated. That principle says to record information only once. That doesn’t mean don’t back up. The principle originated from two problems that are guaranteed to appear when data is officially entered in multiple formats, in multiple places:
- The first is that it will cost more time. The corollary to that is that if it feels like time is wasted and a corner could be cut, it is human nature to cut it. Soon you won’t know where to look for the information or if it even exists.
- The other thing that happens is that you record something in all the multiple places your system requires but then find a mistake. You fix the mistake “here” but forget to fix it “there” and of course, Murphy’s Law states that when you need the information later you will look at the uncorrected version. People often overlook that information isn’t static; it needs to be maintained and managed. The more times the same things are recorded, the more time and effort that information costs.
- A third thing that sometimes goes wrong is that formats become inconsistent. In this case that means that one tidbit of information about a source will be in one place but not in the others. Even though sources are recorded in multiple places, some information about a source is required in this form while other pieces are required in that form, and to get what you need you end up looking at the original, and your notes, and the log, and your database and…
I’ve started designing a set of combined checklists and logs. They are checklists in the sense that for each individual I can see what I have decided I should try to locate and what I have found. They are logs in the sense that I don’t just put a check mark in a box. If I found nothing, I put in a note about where I looked, when I looked and how I tried to find it. If I succeeded, I enter a link to the digitized source, nothing else. I can look at these lists and see what I need to see about what I intend to do, what I have tried and what I found if the search gave a positive result. Some programs I’ve written can process them and present them in different ways. The necessary record of the search is in one and only one place.
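To make the idea concrete, here is a minimal Python sketch of what one entry in such a combined list might look like. It is not my actual format. The `Item` class, the field names, the `view` function, the person and the file path are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Item:
    person: str
    target: str                         # what I decided to look for
    source_link: Optional[str] = None   # filled in only when the search succeeded
    searched_on: Optional[date] = None  # filled in for a search that found nothing
    note: str = ""                      # where and how I looked, for negative results

    @property
    def status(self) -> str:
        if self.source_link:
            return "found"
        if self.searched_on:
            return "not found"
        return "to do"


# Hypothetical entries for a hypothetical person.
items = [
    Item("Anna Example", "baptism", source_link="sources/anna-baptism.zip"),
    Item("Anna Example", "will", searched_on=date(2012, 8, 3),
         note="County probate index, searched by surname; nothing."),
    Item("Anna Example", "1900 census"),
]


def view(items, status):
    """One of many presentations derived from the single record."""
    return [i.target for i in items if i.status == status]


print(view(items, "to do"))
print(view(items, "not found"))
```

The check mark is never stored. It is worked out from what was recorded, so there is nothing to keep in sync.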
Technically, a log should be in chronological order. I’ve made the conscious decision not to record things in that order. Honestly, I can’t think of the last time that I needed to know exactly in what order I did absolutely everything in a project. On paper, chronological order is the natural thing to do. Whatever it is, it goes on the next line. Simple to make entries and compact. In a computer file that can be updated to include any information in any logical location, is chronological order really sensible? If I want to know if I had found a will with a list of children before I searched a set of baptismal records, I can reconstruct it from timestamps in my logs and sources and do it with the click of a button if I write a little software.
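The “little software” does not have to be much. As a rough Python sketch, assuming each log entry and each source carries a timestamp (the events, dates and the `happened_before` helper below are hypothetical):

```python
from datetime import datetime

# Hypothetical timestamped events, stored wherever they logically belong.
log_entries = [
    (datetime(2012, 8, 3, 21, 15), "searched baptismal records: nothing found"),
]
source_records = [
    (datetime(2012, 8, 1, 19, 40), "found will listing children"),
]

# Chronological order is rebuilt on demand instead of being the storage order.
timeline = sorted(log_entries + source_records)
for when, what in timeline:
    print(when.isoformat(), what)


def happened_before(timeline, first, second):
    """True if the event mentioning `first` precedes the one mentioning `second`."""
    order = [what for _, what in timeline]
    i = next(n for n, what in enumerate(order) if first in what)
    j = next(n for n, what in enumerate(order) if second in what)
    return i < j


print(happened_before(timeline, "will", "baptismal"))
```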
Each digitized source I embed in an archive file that also contains the source citation, formatted so that a computer can process it. It also contains any notes I have about the condition of the source and any transcription I needed to make. Whatever I need to record about the source is actually with the source, not lurking in several papers here and computer files there. The source and everything about the source in one place.
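As an illustration only, here is one way such a bundle could be put together in Python. The choice of a ZIP archive with a JSON citation is an assumption made for this sketch, as are the file names, the citation fields and the two helper functions.

```python
import io
import json
import zipfile


def build_archive(image_bytes, citation, notes="", transcription=""):
    """Bundle a digitized source with everything known about it, in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("source.jpg", image_bytes)
        archive.writestr("citation.json", json.dumps(citation, indent=2))
        archive.writestr("notes.txt", notes)
        archive.writestr("transcription.txt", transcription)
    return buffer.getvalue()


def read_citation(archive_bytes):
    """Pull the machine-readable citation back out of the bundle."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return json.loads(archive.read("citation.json"))


# Hypothetical citation; a real one would carry whatever fields you cite by.
citation = {
    "title": "Parish register, baptisms",
    "repository": "Example Archive",
    "year": 1872,
}
data = build_archive(b"placeholder image bytes", citation,
                     notes="Ink faded on lower half.")
print(read_citation(data)["title"])
```

Because the citation travels inside the same file as the image, copying or backing up the source copies its documentation too.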
How this will work and evolve only time will tell. I haven’t managed to stamp out all the duplicated work in really keeping track but it feels like steps in the right direction.