Mail Processing with Deep Learning: A Case Study

Separate Handwritten vs. Typed Letters

Hubert’s team decided early on that their deep learning system should handle handwritten and typed letters differently. That meant labeling each letter in a provided training dataset as handwritten, typed, or both. They also needed to separate out anything that was not a letter (envelopes, forms, etc.). The team built a web platform and manually labeled about 2,000 documents into those four categories.
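To make the four-way labeling scheme concrete, here is a minimal sketch of how such labels might be represented and tallied. The names `DocumentLabel` and `label_counts` are hypothetical, and the example labels are made up for illustration; the article does not describe how the team's web platform stored its labels.

```python
from collections import Counter
from enum import Enum


class DocumentLabel(Enum):
    """The four categories described in the article (names are assumptions)."""
    HANDWRITTEN = "handwritten"
    TYPED = "typed"
    BOTH = "both"
    NOT_A_LETTER = "not_a_letter"  # envelopes, forms, etc.


def label_counts(labels):
    """Count documents per label, reporting zero for labels that never occur."""
    counts = Counter(labels)
    return {label: counts.get(label, 0) for label in DocumentLabel}


# Made-up labels for illustration only, not data from the project.
example = [
    DocumentLabel.TYPED,
    DocumentLabel.HANDWRITTEN,
    DocumentLabel.TYPED,
    DocumentLabel.NOT_A_LETTER,
]
for label, count in label_counts(example).items():
    print(f"{label.value}: {count}")
```

Checking the per-category counts is a quick way to spot class imbalance before training a classifier on a manually labeled set.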

Deal with Typed Letters

Hubert said dealing with typed letters was the easiest part of building the mail processing system. Using the Tesseract open source optical character recognition (OCR) engine, the team simply fed in the images and specified their language, and Tesseract returned the fully digitized text.
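As a rough illustration of that step, the sketch below calls the Tesseract command-line tool from Python, passing an image and a language code and reading the recognized text from standard output. It assumes the `tesseract` binary and the relevant language data are installed. The function names, the file name `letter_0001.png`, and the language code `eng` are placeholders; the article does not say how the team invoked Tesseract or which language they used.

```python
import subprocess


def build_tesseract_command(image_path, lang):
    """Build the CLI call: tesseract <image> stdout -l <lang>.

    Using "stdout" as the output base makes Tesseract print the text
    instead of writing it to a file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_typed_letter(image_path, lang="eng"):
    """Run Tesseract on one scanned page and return the recognized text.

    Requires the tesseract binary on PATH; raises CalledProcessError
    if Tesseract exits with an error.
    """
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Placeholder file name and language code, for illustration only.
    print(build_tesseract_command("letter_0001.png", "eng"))
```

Calling `ocr_typed_letter("letter_0001.png", lang="eng")` on a real scan would return the page text as a single string, ready for downstream text processing.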

Detecting Words in Handwritten Letters

In contrast, detecting words in handwritten letters is fairly difficult.

Extracting Words from Images

Deep learning can turn images of words into machine-readable text, which natural language processing techniques can then read and organize.

Positive Results

Of all letters scanned into the system, the deep learning system sorted 78 percent automatically, and 90 percent of those were sent to the proper department. In the remaining 22 percent of cases, the system couldn’t identify a prevailing topic and the letter had to be sorted manually, as did the 10 percent of automatically sorted letters that were sent to the wrong department.
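The percentages above combine as follows. This is a small worked calculation using only the two figures quoted in the article (78 percent sorted, 90 percent of those routed correctly); the function name `routing_breakdown` is hypothetical.

```python
def routing_breakdown(sorted_rate, correct_given_sorted):
    """Split all incoming letters into shares, as fractions of the total."""
    correct_auto = sorted_rate * correct_given_sorted     # sorted and routed correctly
    misrouted = sorted_rate * (1 - correct_given_sorted)  # sorted but sent to the wrong department
    unsorted = 1 - sorted_rate                            # no prevailing topic found
    return {
        "correct_auto": correct_auto,
        "misrouted": misrouted,
        "unsorted": unsorted,
        "manual_total": unsorted + misrouted,  # everything a person must handle
    }


# Figures quoted in the article: 78% sorted, 90% of those routed correctly.
shares = routing_breakdown(0.78, 0.90)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under these figures, roughly 70.2 percent of all letters reach the right department with no human involvement (0.78 × 0.90), while about 29.8 percent still need manual handling (22 percent unsorted plus 7.8 percent misrouted).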

ODSC - Open Data Science