Separate and unequal

To say I am interested in digital preservation is an understatement, so when I saw a post by Karen Schneider on her Free Range Librarian blog about an idea she had for digital humanities preservation, I was intrigued. I enjoy reading her blog, follow her on Twitter, and respect her opinion.

Her idea of a preservation plan for literary journals, named Bristlecone, has some positive aspects, but I think it misses the mark on many levels. The basic goal of preserving a last copy of these literary journals is a lofty one, although perhaps impractical on a basic level. As pointed out in her posting, these literary journals are not collected widely even by academic libraries. Knowing which copies to withdraw and which to save won't solve the problem if libraries don't subscribe to the journals in the first place. Recent economic times are hitting library budgets hard, and even though these literary journals are generally inexpensive I fear that they may still become casualties of a shrinking budget. Could Bristlecone serve as a clearinghouse to ensure the physical copies survive? That would be a good idea. Could it be funded? Not sure about that.

I think it is more problematic to suggest a LOCKSS scenario to ensure the preservation of the digital content. LOCKSS works great on discrete collections of digital objects. One big question, though, with this particular content is what exactly will be preserved. Would that be the PDF files used for printing these issues? Page images with accompanying OCR? Assorted files that collectively make up the web presence? Archival TIFF images of the pages? These issues could likely be worked out, but the ongoing financial sustainability would be somewhat shakier. Who would commit to this and pay for it? Publishers of the journals who are already on shaky financial ground? Libraries who are being stretched to retain and preserve humanities content already? Not so sure about that either.

I think the bigger philosophical problem I have with this idea is that once again humanities content would be settling for a separate and certainly not equal solution for long-term access and preservation. Digital humanities projects are some of the most innovative and meaningful uses of technology, but most of the funding available for developing cyberinfrastructure to support digital preservation is allocated primarily to the sciences (and to a lesser degree the social sciences). The money available for humanities is minuscule by comparison.

There are many innovative, destined-for-success models in the works that will make real strides toward developing systems to accommodate any type of digital object - be it a photograph, an audio clip, a newspaper page, an immense data set, or a literary journal. To be successful, the solution must scale up and scale down; it must be able to detect bit loss and correct it; and it must be able to use rules applied at the time of ingestion to determine the ultimate disposition of the data.
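To make the bit-loss point a little more concrete, here is a minimal sketch in Python of one way a system might detect and correct damage: record a checksum when an object is ingested, then periodically compare every stored copy against it and overwrite any copy that no longer matches. This is only an illustration under my own assumptions. The function names, the in-memory dictionary of replicas, and the choice of SHA-256 are all hypothetical, and this is not how LOCKSS or any particular repository actually does it.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(replicas: dict, ingest_digest: str) -> list:
    """Compare each replica to the digest recorded at ingest.

    `replicas` maps a (hypothetical) storage location name to the bytes
    held there. Damaged copies are overwritten with a copy that still
    matches. Returns the names of the replicas that were repaired.
    """
    good = [name for name, data in replicas.items()
            if sha256_hex(data) == ingest_digest]
    if not good:
        # No intact copy is left, so there is nothing to repair from.
        raise ValueError("no replica matches the digest recorded at ingest")
    damaged = [name for name in replicas if name not in good]
    for name in damaged:
        replicas[name] = replicas[good[0]]
    return damaged
```

The same check works whether the bytes are a literary journal or a physics data set, which is really the point: the checksum does not care what the content is.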

Those of us with our hearts in the digital humanities must align our projects with the larger solutions that will ultimately receive the funding needed to be successful. There is no reason that literary journal content could not be saved along with a physics data set. It is all made up of bits. We just have to make sure it happens. I respectfully suggest that to have a separate system dedicated simply to literary journals does this content no favors in the long run.


When is a phobia not a phobia?

I was watching an episode of "Obsessed" on A&E online when I realized I really couldn't watch it. While it speaks to my inner OCD tendencies, it hit a little too close to home for me. OK, the lady on the show doesn't like germs - I mean seriously doesn't like germs - but I am actually not that crazy about germs myself. I am a person with hand sanitizer in every vehicle, in my purse, on my desk (you get the picture?). Off went the show.

I was perusing a list of phobias on The Phobia List and was intrigued, to say the least. For instance, Hippopotomonstrosesquipedaliophobia is the fear of long words. You would be afraid even to say the name of your syndrome. How do you get help for that?

I have a friend who confessed to me as we were walking down the street once that she was afraid of clowns - Coulrophobia according to the phobia list - and as we walked along who turned the corner and came toward us but someone in a clown costume. Now I ask you, what are the chances of that!???

Some phobias would be easy enough to avoid: Auroraphobia, the fear of Northern lights, or Consecotaleophobia, the fear of chopsticks, for instance. Just don't go to Alaska or Chinese restaurants. Some would be a real downer: Euphobia, the fear of hearing good news, as an example. This would at the very least be a problem when people are sitting around telling jokes. "I've got some good news and some bad news." "No, no, only give me the bad news" kind of takes the punch out of a joke.

As a person prone to motion sickness I think that Aeronausiphobia, the fear of vomiting secondary to airsickness, is a legitimate concern. One held by many seat mates I have had over the years as well. As a mother I can advise young parents that Ephebiphobia, the fear of teenagers, is not to be underestimated.

When is a phobia not really a phobia? Lilapsophobia, the fear of tornadoes and hurricanes, seems pretty darned normal to me. It's all of those people without Lilapsophobia who are out there chasing big storms with their home video cameras or leaning into the surf in a raincoat on The Weather Channel. These are the people acting all smug about conquering Lilapsophobia and throwing it in our faces.

The mother of all phobias is Phobophobia, the fear of phobias. Well, that is something I need to ponder.


Mowing as an act of love

Note to self: NEVER again minimize Joe's contribution to household labor when it involves mowing.

I mowed the grass today for the first time in many years. This is a task I only take on when Joe is out of town for more than two weeks. Last year he didn't go to Ecuador, and the year before I shamefully got my 79-year-old father to do it for me. I never really thought that much about it. After all, it is just pushing the mower around, right? How hard can it really be?

I think of all the times I complained when Joe mowed down the wild achillea, the clover, the violets, whatever.... OK, I am officially sorry.

My mowing experience began with starting the mower. It is hard to start. I admit I am kind of a girly girl and it takes some strength and persistence to get it going. The whole time I was trying I was thinking about when he tried to show me how to do it before he left. I stupidly said (yes actually said), "How hard can it be? I just pull the string, right?" He told me to hold the lever that pulls the throttle at the same time. "OK, I've got it."

Yeah, right. I held one of the two levers and pulled. Nothing. Repeated several times. Still nothing. Is that the lever to the throttle? What is a throttle anyway? Maybe I have to hold the other lever at the same time? So I held both. It finally started after several pulls. The problem is that the 2nd lever is the one that makes it go - self-propelling it turns out. Whoa!!!! Mower and me going - fast. Sheesh.

I completed the task. But achillea, violets, clover are completely mowed down. Mowing along I could see them coming but avoiding mowing over my precious plants was next to impossible. I now fully appreciate the patch of clover by my beehives that Joe carefully avoided mowing.

No wonder he has a plan for replacing all grass with ground cover and herbs. Great idea! I am fully 150% on board. I vote for clover and other short, bee-friendly plants.

I think that couples in committed long-term relationships should be required to do the other person's tasks at least once or twice. And then shut up. Would make for smoother sailing. Next week is our 34th wedding anniversary, so a special note to the love of my life....I love you, too.


It's all about discovery

These are some of my thoughts after reflecting on a presentation by Dan Clancy, engineering director at Google: Google Book Search Project: Present Status and Next Steps for the Google Book Search Project. Presented at Archiving 2009, May 2009.

Some impressive statistics were revealed at the beginning of this talk – I think these numbers should cause the library world to sit up and take notice (if they haven't done so already). Of the 10 million items included in Google Book Search, every month users preview 81% of the content contributed by the partners and 78% of the public domain content. The daily numbers are equally impressive: users preview 40% of the content contributed by the partners and 17% of the public domain content. That is every day. Most of the traffic comes from Google.com.

That really puts a spin on the notion of the long tail. The backlist is heavily used because of discovery. It is all about discovery. Having full text available and searchable makes it discoverable.

I was interested in what Clancy had to say about quality assessment (QA). This is an issue that has plagued our group since we started our first film-to-digital books project, Beyond the Shelf. We treated those images as if they were precious objects. We scanned and did QA on about 1,000 books. We really wanted the page images to look great. That speaks to our obsessive-compulsive nature, I think. We wanted perfection. Eventually it hit us in the face that there was a huge cost to this approach, and that was quantity. We had fantastic discussions about cost, scalability, and feasibility as they related to the number of searchable page images that we could make available.

Clancy observed that as projects try to get to 99.9% confidence in the quality of the image/OCR, each “9” adds an order of magnitude in cost.
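A quick back-of-the-envelope sketch in Python shows how fast that adds up. It assumes, purely for illustration, that the first "9" (90% confidence) costs one unit and that each additional "9" multiplies the cost by ten. The function name and the baseline are mine, not figures from Clancy's talk.

```python
def relative_cost(nines: int, factor: int = 10) -> int:
    """Cost relative to a one-nine (90%) baseline.

    Assumes each additional "9" multiplies cost by `factor`;
    the baseline of 1 unit is an arbitrary illustration.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return factor ** (nines - 1)


# Compare one, two, and three nines (90%, 99%, 99.9%).
for nines in (1, 2, 3):
    print(nines, "nines ->", relative_cost(nines), "units")
```

Under those assumptions, going from 90% to 99.9% means paying a hundred times as much for every page.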

This reflects our experience as well. While we really want excellence across the board, at some point we have to give in to quantity and develop a good workflow for correcting errors as they are reported.

Clancy said that Google Book Search is doing this…they strive for as good an initial capture as possible and then have developed good QA to catch errors. They also fix problems as they are reported. They have committed to: a. keep making software smarter, b. keep taking user input, and c. fix things as needed. It is cheaper to fix errors as they are reported because the errors are a small part of the whole.

I think this is really the bottom line for those of us creating digital content. It is about quantity and developing methods and processes to create the content faster and cheaper. Here at Kentucky we have learned to do one of the hardest types of content – newspapers. (here is a link to information about our NDNP participation) We are looking at developing efficiencies and balancing quality and quantity – is there a happy medium? Newspapers present the added challenges of small fonts, reading order problems, publishing errors (metadata problems for the most part), etc. If we can make newspaper digitization faster and better it is all to the good.

As we examine what else we choose to digitize it seems to me that we can look in our collection for the unique items and start there. There is no point in us duplicating efforts of those libraries already participating with Google. As we mine our collections I believe that we will discover a great deal of unique material in our special collections. Adding those items to the corpus of digital, discoverable content will be good for everyone. Let’s get going!


Favorite iPhone apps

I do not know what I did before iPhone. Apparently many others feel the same as Apple is approaching a billion app downloads. In honor of the impending milestone I am making a list of my faves. Here they are - not in order...

Tweetie - one of many Twitter apps available. It is straightforward and easy to navigate.

Shazam - not only can it listen to songs (on the radio, CDs, etc.) and identify them, but it then links you to performances on YouTube and to iTunes to purchase them.

SnapTell - take a picture with your phone of any book, DVD, or CD and get a rating, description and links to Google, YouTube, Wikipedia, etc. Handy for comparison shopping and for remembering items that you see.

iNeedStuff - My favorite shopping list. Add items that you need to the list. When you use it to shop the geolocator feature of the iPhone identifies the store where you are and the application learns what items are where in the store. There is a desktop application so you can sync your phone to that master list.

NowPlaying - what movies are on? Also geolocates.

Games - Drop7 and Wurdle - I was completely addicted to Drop7 (described as Tetris meets Sudoku) until I played Wurdle. I have spent countless hours trying to beat my scores on both games.

Amazon Kindle - Book reader for the iPhone. I can access all of my Kindle books from my iPhone and it syncs to the last page I read on my Kindle! And vice versa.

Tipulator - figures tip amount on a check and can split the check, too.

Evernote - Helps you remember anything in your life. Notes, photos, recordings. I use it to capture web snippets, tweets, wine labels, where I left my car at the airport, directions, loyalty program numbers (who really carries all of those cards?), emails, receipts, recipes, restaurants. Makes my life searchable. Syncs to the cloud and to my desktop. Word recognition makes it all searchable. In conjunction with the Griffin Clarifi case you can get good photos of business cards (for instance) and have them all searchable in Evernote.

Apple FTW