[LMB] OT: Scanning for Project Gutenberg

Ed Burkhead edburkhead at insightbb.com
Tue Dec 4 17:12:40 GMT 2007


After I'd mentioned the thought of scanning a book as a contribution for
Project Gutenberg, Jacki in Canberra asked:
> Is this easy? Are there any tips for making it easier?
> Last time I tried to do this (1997) I spent such an inordinate amount 
> of time correcting misread characters (but that was on an old HP 
> OfficeJet all-in-one) that I gave it up as a bad joke.


Jacki,

Actually, the last time I scanned anything seriously was seven or eight
years ago.  I got barely one error every two or three pages.  The
errors were highlighted pretty well by Word's spell checker so they were
easy to fix.

I'd say that you should get really good results if:

1.  You have a GOOD original and

2.  You use better than free OCR (Optical Character Recognition) software.
The $100 programs work much better than the free ones that come with the
scanner, though sometimes scanners also come with a time- or copy-limited
version of the good software, too.

Even very cheap flat-bed scanners are way better than you need.  It's the
OCR software and the original that make the difference.  I've tried scanning
cheap text where the type is so fuzzy that even with my eyes I can see that
some letters are bleeding into the next.  Those will be a problem.

Project Gutenberg won't accept a book unless it's out of copyright, has never
been copyrighted, or they have a copyright release.

I think if I were an old author and my books were no longer in print, I'd
send a copyright release to Project Gutenberg for all of them in the hope
someone might scan them and make them part of history, forever.  If I were a
digital-age author and had computer copies, I'd send them the files for
posterity.

There are some really good books that I read when I was young (often from
the library) and have wanted to reread ever since.  I've looked for some of
these books, off and on, when I've browsed used book stores - sometimes for
40 years.

One book I had in mind was a Cold War peacetime jet fighter novel from 1962
that I really liked:  Scramble!  by Mario Cappelli, Major, USAF.  I found a
copy in a used book store last year after a 35-year search and liked it
again.

Just yesterday, my Google search found a website made by one of his kids.
Though the contact e-mail doesn't work, I've sent off snail-mail to them.  I
found their addresses through whois, the web domain lookup tool.

Ed


