02 January 2017

You are invited to decipher Civil War telegrams


As part of a crowdsourced project:
Volunteers are logging on, at decodingthecivilwar.org, to transcribe thousands of long-lost wartime missives using a crowdsourcing platform developed and managed, in part, by the University of Minnesota...

...in 2009, two wooden trunks holding 35 ledger and code books turned up and were sold at auction. The archive had apparently been taken by Thomas T. Eckert, head of the Civil War telegraph program, when he left government service. With almost 16,000 messages, the telegram cache is so dense, so massive, that Einaudi predicted that it would take a full-time staff as long as a decade to painstakingly decipher the messages one by one.

That’s where the crowdsourcing comes in...

Starting this past June, volunteers from across the country have been able to register on the Decoding the Civil War website, take a quick tutorial and begin scrutinizing the scanned copies of the original telegrams.
Note this is not "codebreaking" per se - it's more a matter of deciphering sometimes-illegible handwriting.
Because the handwriting of the 1860s is filled with flourishes and quirks that have long passed from style, volunteers often debate the meaning of the scribblings on the website’s talk boards. “We have people online 24 hours a day,” Einaudi said. “When you increase the eyeballs, you increase consensus, the wisdom of the crowd.”..

“We can’t build an algorithm that can do this kind of work,” said Lucy Fortson, U Zooniverse director. “Humans have developed this beautiful visual cortex that allows them to see complex patterns, distinguish visual information and type what they see.
Machines can do data analysis, but they aren’t any good at reading handwriting.”..

“A certain type of person is predisposed to volunteering on a site like this. We see a hankering to be involved with something meaningful, and research translates as meaningful. With crowdsourcing, they have a bit of ownership in what we might find.”
If you're interested, here is the project's website.

11 comments:

  1. I've been de-coding all summer for Zooniverse. It's really interesting, and when I describe it to friends, they get it: this is something that would take an intern 20 years to do, and the lack of feedback would be disconcerting. But here on line, we have other people to check with, and scholars to help. Some of my own genealogical research helps here, like noticing the written double s. In German script, it looks like fs (as in "necefsary"). A few American telegraphers still used it, especially when they were writing fast. I feel pretty cocky just noticing it ☺ BTW, there are dozens of projects to get involved with there--great fun.

    Replies
    1. Thank you, I found one identifying animals in camera traps for a nearby park system!

  2. Wow, the project is over half-done. The participation has been dropping off since it started, though. They are estimating completion in 180 days.

  3. There are very few words here that I cannot easily read at a glance. My older daughter would struggle mightily, and my youngest would give up immediately, and they were taught cursive! Too bad I don't have the time to volunteer.

  4. There are similarly crowdsourced projects at the University of Iowa as well: Civil War diaries and letters, medieval manuscripts, pioneer diaries, vaudeville theater reports, natural history museum specimen cards. Stuff like this is a great way for museums and other collections to both make up for how understaffed they are and also have more behind-the-scenes sorts of interactions with the public.

  5. I would caution against saying machines "aren't any good at reading handwriting". It is true that humans are quite good at these visual pattern matching tasks, but we must not think we are special. There is simply not the economic incentive to program machines to read this type of handwriting.

    As a counterexample, here is a US Post Office Advanced Facer-Canceler System (AFCS) capable of reading 40,000 handwritten addresses per hour: "As the name suggests, this machine 'faces' the mail – detecting the presence of postage and making sure it faces in the right direction for canceling. Then it cancels the postage, reads the address on the letter, compares it to the database of addresses, and sprays a fluorescent orange barcode on the back of each letter [so simpler & cheaper machines can use the barcode for further sorting all the way down the delivery chain]."

    Replies
    1. I'm not sure your point is valid, Dan, because my understanding of the postal equipment is that it reads only the 5 to 9 digits in the ZIP code and is trained only to recognize the integers 0-9 (and sometimes has difficulty with that). That is all that is needed to get the letter or package routed to the correct post office destination; deciphering of handwriting is left to the (human) postal clerks.

    2. You might be correct; I know it was done that way in the past, but it is my understanding that the newer devices interpret the whole address so that they can look up the correct 11-digit ZIP+4+Delivery Point barcode in a database of street addresses and spray a UV ink version of the barcode on the back.

    3. Follow-up on this: a 2013 NYT article on the last of the USPS's remote encoding centers for reading illegible mail.

      FTA:

      “We get the worst of the worst,” Ms. Batin said. “It used to be that we’d get letters that were somewhat legible but the machines weren’t good enough to read them. Now we get letters and packages with the most awful handwriting you can imagine. Still, it’s our job to make sure it gets to where it’s supposed to go.”

      Over the years, the Postal Service has become the world leader in optical character recognition — software capable of reading computer-generated lettering and handwriting — sinking millions of dollars into equipment that can read nearly 98 percent of all hand-addressed mail and 99.5 percent of machine-addressed pieces.

    4. Interesting. I did not know that. Thanks, Dan.