MRIN Filing System+

Saturday, February 15, 2025

OCR Historical Documents

You may have noticed many of the old books scanned by Google Books and elsewhere have been scanned as images.

If you want to extract text from them, you have to re-type the text line by line. This is extremely hard on your eyes, not to mention your time.

BUT, there's a way around this. NAPS2 can turn the original images back into text and it's very easy to do. This is called OCR (Optical Character Recognition). It also makes the text searchable, which is a huge boon when you have a book hundreds of pages long and you're looking for something specific.

Step 1. Download and install NAPS2. It's open source and free for Windows, Mac and Linux.


Step 2. Make sure the OCR options are turned on. 

 

Step 3. Click Import, browse to your PDF's location, click Open, and wait for all the pages to load.

Step 4. If you only want to OCR one page, select that image and click Save PDF. When that's finished and you open your PDF you will be able to select and then copy and paste the text. If you want more pages or the whole document, it's the same process, just longer.

This is the original image, followed by the OCRed text. In this case, it's dead accurate. That won't always be so; it depends on how clear the type is. You may also have to adjust line breaks depending on where you're pasting the text. Considering the alternative, this is a small matter.

"forge in the early part of 1800, and another distillery in 1811.
They carried on an extensive business. The firm was dissolved
in 1826. Mr. Slocum was Justice of the Peace in 1821 of the
district which included the present Pittston, Providence and Exeter townships. He was successful in business and accumulated
in addition to other property, 1,800 acres of land, all located
within the present limits of Scranton, and nearly all of it was
underlaid with coal. He left thirteen children, nine sons and
four daughters.
V.—Mary, b. 22 Dec. 1768; m. Joseph Towne, a farmer; resided
in Ohio near Circleville; d. 5 April, 1844. Left several children."
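If you paste a lot of OCRed text, the line-break adjustment mentioned above can be scripted. What follows is only a minimal sketch in Python, not something NAPS2 does for you. The function name `unwrap_ocr_text` is made up for illustration, and it assumes that a blank line marks a paragraph break and that a line ending in a hyphen followed by a lowercase letter is one word split in two. That last assumption will occasionally be wrong for genuinely hyphenated words, so proofread the result.

```python
import re


def unwrap_ocr_text(text: str) -> str:
    """Join hard-wrapped OCR lines into flowing paragraphs.

    Assumptions (hypothetical helper, adjust to taste):
    - blank lines separate paragraphs and are preserved
    - a trailing hyphen followed by a lowercase letter is a split word
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for para in paragraphs:
        joined = ""
        for line in para.splitlines():
            line = line.strip()
            if not line:
                continue
            if joined.endswith("-") and line[:1].islower():
                # Re-join a word that was hyphenated across the line break.
                joined = joined[:-1] + line
            elif joined:
                joined += " " + line
            else:
                joined = line
        cleaned.append(joined)
    return "\n\n".join(cleaned)


sample = """forge in the early part of 1800, and another distillery in 1811.
They carried on an extensive business. The firm was dissolved
in 1826."""

print(unwrap_ocr_text(sample))
```

Run on the first few lines of the excerpt above, this prints them as one continuous paragraph that you can paste anywhere.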

What I love even more about NAPS2 is using it on obituaries. I can't even tell you how many newspaper obits I've re-typed by hand. Thousands? In NAPS2, import the image and then save it as a PDF. Open the PDF, select the text, and copy and paste it wherever you want. This obit scrolls on and on, so this is just the first paragraph. There are tiny errors in it. Obviously, it's easier to look through and make small corrections than to type the entire thing word by word.

 

"FORMER CITY
ATTORNEY DIES
AT W. PITTSTON
Jordan Howard Rockefeller,
Esq., practicing attorney in this
citv a half century ago, died of a
sudden heart attack at 7:30 o'clock
this morning in the home of his
<on. Jordan H. Rockefeller, Jr., at West Pittston, where he resided
for the past ten years. He was 80 years of age."

Of course, if you're starting from scratch by scanning a document, that's what NAPS2 is for.

Sunday, August 18, 2024

Some History

When I first started collecting family history files, the first thing I tried to do was sort them by individuals or married couples. My next instinct was to nest them but I couldn't see a way to make that happen. I was onto something but it wasn't making sense yet.

I filed the idea in the back of my mind and looked around at what other people were doing: sorting by record types, mostly, but I couldn't see how that would work. How would I find records for a particular person or family group, which is what I, or others, would be most interested in?

It took some time to discover what I intuitively knew at the beginning and what now seems obvious to me: a filing system that reflects the family structure of the database. When I read about an MRIN-based filing system for paper by Karen Clifford, I realized that was the key to it and went back to my initial idea of nesting folders on my computer.

When I first wrote about it on my blog in 2006, someone(s) started publicly spreading bad press that MRINs are not stable in the database. This, according to Legacy's tech support, could be true, in a manner of speaking, if you were using the free version (I think it was v.5 at that time) because they could sometimes need to compact the MRINs when they were fixing something else. I used the paid version and never had a problem myself. But people persisted with the more common storage arrangements. 

Nowadays this is no longer an issue because there's no longer a paid vs. free version. There's only one full version of Legacy 10 and it's free for everyone.

Some people started adding Windows File Properties (WFP) to their filing systems based on records, locations or names. It took a lot of years of me talking about standardized metadata, and trying to wean people off WFP, before alternatives started showing up in other people's blogs.

When I perused my many JLOG blog posts (2006-2016), I realized this filing system was the only thing I'd written about that was worth keeping, so I sat down for a few months and compiled a book out of everything I knew. Then I ran into a couple of things I'd missed and sat some more until I figured them out.

I published the book in 2019. In 2023 I looked through it all again to update broken links and make some small changes to the text where applicable.

So, it's all trickled down to here.

Friday, July 19, 2024

MRIN Filing System+ Is Now Free

The MRIN Filing System+ is now free.

I appreciate the support of past purchasers.

Due to the overgrown, and frankly irrelevant and obscene, list of personal information that I am required to hand over to online payment processors, I am no longer willing to work with them.

Anyone within Canada who would like to send me an Interac e-transfer as a token of gratitude, using only my name and email address, is welcome to do so.  

If you want this book, just download it. It's best in the hands of anyone who finds value in it.

Since its conceptualization in 2003 it's been revolutionary in the world of digital genealogy filing. I think it's still the best genealogy filing system available. And it may still be the only one that's an actual filing system rather than a file-box.

I suggest you read it to get an overview and then again more slowly. Many hundreds of hours of work have gone into developing, testing and fine-tuning every last detail. But, if you get lost in the weeds, for now I'm still here.