Document Parsing & Data Extraction
You have documents in their raw form, such as PDFs, images, emails, spreadsheets, or even web pages, and you need to get the data out of them. The documents all contain the same pieces of data but come in a variety of shapes and formats, which makes it difficult to apply hard-coded rules or simple automation to extract that data. You need the power of artificial intelligence to read and understand these documents the way humans do.
Electric Brain builds deep-learning-based data extraction and classification systems. Deep-learning technology can adapt to the complexity and variety of document formats.
A typical project requires building a custom user interface known as an annotator. This interface is used for collecting data and putting it into your application. Although a few bare-bones open-source tools like this exist, we find that it's often best to tightly integrate your annotator with your larger product. A typical one might look like this:
Then there is an extended data collection phase. You may already have data from your existing operations; this can be put to use and may allow us to go directly to research. Our usual rule of thumb is that a bare minimum of 1,000 examples per pattern is needed, and most AI systems have dozens of patterns they are trying to recognize.
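To make the rule of thumb concrete, here is a minimal sketch of the arithmetic. The function name and the pattern count of 24 are illustrative assumptions, not figures from a real project; only the 1,000-examples-per-pattern minimum comes from the rule above.

```python
def minimum_examples(num_patterns, examples_per_pattern=1_000):
    """Lower bound on dataset size under the 1,000-per-pattern rule of thumb."""
    return num_patterns * examples_per_pattern


# 24 patterns is an assumed stand-in for "dozens of patterns".
print(minimum_examples(24))
```

Treat the result as a floor rather than a target: it is the bare minimum, and harder patterns may need more.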
During the research phase, we try various types of deep neural networks on your data to maximize accuracy. Then we integrate the best-performing neural network into your larger product.
Time & Cost
A typical annotator, assuming we are only doing the annotation part of the user interface, costs $20,000 to $30,000 CDN.
Time: 12 weeks from start
Data Collection / Annotation
Data collection costs can vary a lot depending on how much data you need annotated and how long it takes.
Receipts, for example, are straightforward to extract from, and can be done with around 15,000 examples. If it takes 10 minutes to annotate a single receipt, at an outsourced rate of $10 per hour (including the markup of all the middlemen), then your dataset will take about 2,500 hours of annotation, cost about $25,000, and take 2-3 months.
Compare that to annotating contracts. Each contract may take 1 hour to go through properly. Additionally, it may not be possible to use minimum-wage labor, meaning your hourly rate might be closer to $15 per hour. Assuming the same 15,000 examples, your dataset will cost $225,000 to annotate and take 3-6 months.
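The two estimates above follow from the same simple calculation, sketched below. The function name is hypothetical, and the contract figure assumes the same 15,000 examples as the receipt case, which is what the $225,000 total implies.

```python
def annotation_cost(num_examples, minutes_per_example, hourly_rate):
    """Return (total_hours, total_cost) for annotating a dataset."""
    total_hours = num_examples * minutes_per_example / 60
    return total_hours, total_hours * hourly_rate


# Receipts: 15,000 examples, 10 minutes each, $10 per hour.
receipt_hours, receipt_cost = annotation_cost(15_000, 10, 10)

# Contracts: assumed 15,000 examples, 60 minutes each, $15 per hour.
contract_hours, contract_cost = annotation_cost(15_000, 60, 15)

print(f"Receipts:  {receipt_hours:,.0f} hours, ${receipt_cost:,.0f}")
print(f"Contracts: {contract_hours:,.0f} hours, ${contract_cost:,.0f}")
```

Plugging in your own example count, time per document, and hourly rate gives a rough first estimate. The calendar time depends on how many annotators work in parallel, which this sketch does not model.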
Deep learning research costs can vary quite a bit depending on whether you want something quick and dirty or whether you want all steps taken to maximize accuracy.
Quick and dirty: $15,000 to $30,000, taking 4 to 8 weeks
Typical research phase: $37,500 to $52,500, taking 8 to 12 weeks
Maximize accuracy: $52,500 to $90,000, taking 12 to 20 weeks
This can also vary considerably if you have many different types or forms of data to extract, since they may require different algorithms.