Ask HN: OCR Libraries for Receipt Scanning/Parsing?
7 by selbyk | 2 comments on Hacker News.
I'm interested in keeping tabs on my spending and comparing prices of the items I buy at grocery stores, because I tend not to think about price when I need something. I know there are extreme price discrepancies for the exact same items at stores just blocks apart here in NYC, but it's difficult to keep track of each item's price at various places to optimize shopping.

I want to build a system that keeps a running tab of my purchases by item, price, and store. I need a library that can reliably scan a receipt, recognize the store (usually the name, store number, address, and logo at the top), and separate each item label from its price. I plan to manually tag each item label from a store's receipt with the item's barcode the first time it is seen.

I have been googling sporadically for the past 6 months but am still unsure which OCR library or libraries I should invest my time in, or how low-level I should start. Should I grab a library like Tesseract and do my own feature extraction, or use a library that spits out semi-structured objects with text and hope the output is similar enough across store receipts to make sense of consistently?

I'm OK with this being an extended project, but I would like some input on choosing a solid library with accurate OCR, and advice on how to approach training and parsing from someone with more experience. Other solutions and advice are also welcome.
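To make the lower-level route concrete, here is a minimal sketch of the parsing step that would follow raw OCR. It assumes the receipt text has already been extracted as a plain string, for example with `pytesseract.image_to_string`, and that each item sits on one line with its price at the end. The names `LineItem`, `parse_receipt_text`, and `NON_ITEM_WORDS`, the word list itself, and the "first line is the store name" rule are all hypothetical choices for illustration; real receipts will likely need per-store tweaks.

```python
import re
from dataclasses import dataclass
from decimal import Decimal

# Assumed line layout: "<label>  <price>" with an optional "$" and an
# optional trailing single-letter tax flag, e.g. "WHOLE MILK 1GAL  4.29 F".
PRICE_AT_END = re.compile(
    r"^(?P<label>.+?)\s+\$?(?P<price>\d+\.\d{2})(?:\s+[A-Z])?$"
)

# Hypothetical list of words marking priced lines that are not items.
NON_ITEM_WORDS = frozenset(
    {"TOTAL", "SUBTOTAL", "TAX", "CASH", "CHANGE", "BALANCE"}
)


@dataclass
class LineItem:
    store: str
    label: str
    price: Decimal


def parse_receipt_text(text: str) -> list[LineItem]:
    """Split OCR'd receipt text into (store, label, price) records."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []

    # Assumption: the first non-empty line is the store name.
    store = lines[0]

    items = []
    for line in lines[1:]:
        match = PRICE_AT_END.match(line)
        if match is None:
            continue  # address lines, dates, and other unpriced text
        label = match.group("label").strip()
        # Compare whole words so "CASHEWS" is not mistaken for "CASH".
        if NON_ITEM_WORDS & set(label.upper().split()):
            continue
        items.append(LineItem(store, label, Decimal(match.group("price"))))
    return items
```

Prices are kept as `Decimal` rather than `float` so that later sums and comparisons across stores do not pick up rounding noise. The whole-word filter still drops a genuine item whose label contains a word such as "TOTAL", which is one reason the manual first-time tagging step described above would stay useful.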