You may have noticed that many of the old books on Google Books and elsewhere have been scanned as images.
If you want to extract text from them, you have to re-type the text line by line. This is extremely hard on your eyes, not to mention your time.
But there's a way around this. NAPS2 can turn the original images back into text, and it's very easy to do. This is called OCR (Optical Character Recognition). It also makes the text searchable, which is a huge boon when you have a book hundreds of pages long and you're looking for something specific.
Step 1. Download and install NAPS2. It's open source and free for Windows, Mac and Linux.
Step 2. Make sure the OCR options are turned on.
Step 3. Click Import, browse to your PDF's location, click Open, and wait for all the pages to load.
Step 4. If you only want to OCR one page, select that image and click Save PDF. When that's finished, open the PDF and you'll be able to select, copy and paste the text. If you want more pages or the whole document, it's the same process; it just takes longer.
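If you have a whole stack of PDFs, the same import, OCR and save steps can also be run through NAPS2's command-line interface instead of clicking through them each time. Here's a rough Python sketch of how that might look. The executable name `naps2.console` is an assumption (it differs by platform and install), and the helper names `build_ocr_command` and `run_ocr` are made up for illustration. Check the flags against the NAPS2 command-line documentation for your version before relying on them.

```python
import subprocess


def build_ocr_command(src, dst, lang="eng", exe="naps2.console"):
    """Build the argument list for one import -> OCR -> save-as-PDF run.

    exe is an assumption: adjust it to however the NAPS2 console
    is invoked on your machine.
    """
    return [
        exe,
        "-i", str(src),      # import an existing PDF or image
        "-o", str(dst),      # output file (a searchable PDF)
        "-n", "0",           # zero scans: work only from the imported file
        "--enableocr",       # turn OCR on for this run
        "--ocrlang", lang,   # OCR language code, e.g. "eng"
    ]


def run_ocr(src, dst, lang="eng", exe="naps2.console"):
    """Run NAPS2 on one file and raise an error if it fails."""
    subprocess.run(build_ocr_command(src, dst, lang, exe), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_ocr_command("old_book.pdf", "old_book_ocr.pdf"))
```

Calling `run_ocr("old_book.pdf", "old_book_ocr.pdf")` would then do the equivalent of Steps 3 and 4 for the whole document.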
This is the original image, followed by the OCRed text. In this case, it's dead accurate. That won't always be so; it depends on how clear the type is. You may also have to adjust line breaks depending on where you're pasting the text. Considering the alternative, this is a small matter.
"forge in the early part of 1800, and another distillery in 1811.
They carried on an extensive business. The firm was dissolved
in 1826. Mr. Slocum was Justice of the Peace in 1821 of the
district which included the present Pittston, Providence and Exeter townships. He was successful in business and accumulated
in addition to other property, 1,800 acres of land, all located
within the present limits of Scranton, and nearly all of it was
underlaid with coal. He left thirteen children, nine sons and
four daughters.
V.—Mary, b. 22 Dec. 1768; m. Joseph Towne, a farmer; resided
in Ohio near Circleville; d. 5 April, 1844. Left several children."
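If adjusting the line breaks by hand gets tedious, a few lines of Python can do most of it. This is only a sketch, not part of NAPS2, and the function name `unwrap_ocr_text` is my own invention. It assumes that blank lines separate paragraphs, and that a line ending in a hyphen followed by a line starting with a lowercase letter is one word split across two lines. That guess will sometimes be wrong for genuinely hyphenated words.

```python
import re


def unwrap_ocr_text(text):
    """Join hard-wrapped OCR lines into paragraphs.

    Blank lines are treated as paragraph breaks. A trailing hyphen is
    dropped when the next line starts with a lowercase letter (a
    heuristic for words split across lines).
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    result = []
    for para in paragraphs:
        lines = [line.strip() for line in para.splitlines() if line.strip()]
        merged = ""
        for line in lines:
            if not merged:
                merged = line
            elif merged.endswith("-") and line[0].islower():
                merged = merged[:-1] + line   # rejoin a split word
            else:
                merged += " " + line          # ordinary line wrap
        result.append(merged)
    return "\n\n".join(result)


if __name__ == "__main__":
    sample = (
        "They carried on an extensive business. The firm was dissolved\n"
        "in 1826. Mr. Slocum was Justice of the Peace in 1821 of the\n"
        "district"
    )
    print(unwrap_ocr_text(sample))
```

Paste the copied text into a string, run it through the function, and you get flowing paragraphs back. You still need to proofread the result.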
What I love even more about NAPS2 is using it on obituaries. I can't even tell you how many newspaper obits I've re-typed by hand. Thousands? In NAPS2, import the image and then save it as a PDF. Open the PDF, select the text and copy it wherever you want it. This obit scrolls on and on, so this is just the first paragraph. There are tiny errors in it, but obviously it's easier to look through and make small adjustments than to type the entire thing word by word.
"FORMER CITY
ATTORNEY DIES
AT W. PITTSTON
Jordan Howard Rockefeller,
Esq., practicing attorney in this
citv a half century ago, died of a
sudden heart attack at 7:30 o'clock
this morning in the home of his
<on. Jordan H. Rockefeller, Jr., at West Pittston, where he resided
for the past ten years. He was 80 years of age."
Of course, if you're starting from a paper document, scanning is what NAPS2 is for.