Eugene Yan 9/4/2020

Mailbag: Parsing Fields from PDFs—When to Use Machine Learning?

Read Original

A developer seeks guidance on whether to implement machine learning for parsing quote numbers from PDFs, where occasional typos cause errors. The article advises that a 99% success rate is already good and suggests using techniques like Levenshtein distance for text matching and flagging ambiguous cases for human review, rather than immediately jumping to a full ML solution.

Mailbag: Parsing Fields from PDFs—When to Use Machine Learning?

Comments

No comments yet

Be the first to share your thoughts!

Browser Extension

Get instant access to AllDevBlogs from your browser

Top of the Week

1
The Beautiful Web
Jens Oliver Meiert 2 votes
3
LLM Use in the Python Source Code
Miguel Grinberg 1 votes
4
Wagon’s algorithm in Python
John D. Cook 1 votes