Scanne – Document Scanner, OCR, Create Searchable PDF with Android

Last month, I was looking for a tool to digitize a book I had. What I needed was an app that could create a PDF in which people could easily search for any piece of text, just as they do with their normal ebooks…

The main challenge was that I was too lazy to type out every word of the document. The OCR apps I tried were too buggy, and all they let me do was copy and paste the recognized text from a text box. So I wondered whether there was a tool that could take images of the book's pages and create a PDF that looks exactly the same as the original images, but with text that can be copied. A quick Google search turned up nothing.

That’s where the motivation came from…

So we created an Android app named Scanne that does exactly this. Now I wonder how many potential users my app could benefit 😀

Languages Supported for OCR (Searchable PDF)-

Scanne supports major Indian languages, including Bengali, Assamese, English, Hindi, Gujarati, Kannada, Malayalam, Marathi, Punjabi, Nepali, Urdu, Telugu, and Sanskrit (which can be used for digitizing old textbooks).

Along with these, it supports major international languages too, including Spanish, Danish, Russian, German, Arabic, and Indonesian, among others.

Technology Behind Scanne-

Scanne uses the same kind of technology as other OCR-based apps. Most of those apps, however, use the OCR engine that Google bundles with the device. To support languages like Sanskrit, we couldn’t rely on that, so we trained our own neural network on all of these languages, keeping both accuracy and speed in mind. The network extracts the text from the image and tells us where in the image each piece of text is located, as (x, y) coordinates. We then place that text at the same position in the PDF with zero opacity, so it sits invisibly on top of the page image. When someone searches the PDF, they find the text at the exact place where it appears on the page.
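
To make the last step concrete, here is a minimal sketch of how an invisible text layer could be laid over a page image. It is not Scanne's actual code. The names `Word` and `invisible_text_ops` are hypothetical, the OCR result is assumed to arrive as boxes in image pixels with the origin at the top-left, and instead of a zero-opacity setting the sketch uses the PDF text rendering mode 3 (`3 Tr`, "neither fill nor stroke"), which is another common way to get text that is invisible but still searchable. It also assumes a font is registered as `/F1` on the page and handles only ASCII text; Indic scripts would need an embedded Unicode font.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    """One recognized word (hypothetical OCR output format)."""
    text: str
    x: float       # left edge, in image pixels
    y: float       # top edge, in image pixels (origin at top-left)
    height: float  # box height, in image pixels


def escape_pdf_string(s: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_ops(words: List[Word], image_height_px: float, scale: float) -> str:
    """Build PDF content-stream operators that place each word invisibly.

    scale converts image pixels to PDF points. PDF pages have their origin
    at the bottom-left, so the y coordinate has to be flipped.
    """
    ops = ["BT", "3 Tr"]  # begin text; render mode 3 = invisible
    for w in words:
        font_size = w.height * scale
        pdf_x = w.x * scale
        # Bottom of the word box, measured from the bottom of the page.
        pdf_y = (image_height_px - (w.y + w.height)) * scale
        ops.append(f"/F1 {font_size:.2f} Tf")               # assumed font name
        ops.append(f"1 0 0 1 {pdf_x:.2f} {pdf_y:.2f} Tm")   # move to position
        ops.append(f"({escape_pdf_string(w.text)}) Tj")     # show the text
    ops.append("ET")  # end text
    return "\n".join(ops)


if __name__ == "__main__":
    page_words = [Word("Hello", x=100, y=50, height=20)]
    print(invisible_text_ops(page_words, image_height_px=1000, scale=0.5))
```

The resulting operators would be appended to the page's content stream after the operators that draw the scanned image, so the reader sees only the image while search and copy work on the hidden text.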

