The Challenge of Big Document Analytics

January 3rd, 2017 by Guillermo Fernandez

The concept of utilizing capture technology for analytics and data discovery is relatively new.

“There is a dichotomy in the perceived value of corporate information,” says Tim Dubes, Vice President of Marketing at Ephesoft, Inc., a VAF partner that provides advanced document capture and data extraction solutions. “If it is structured and normalized, it becomes usable data. If it is unstructured information, it is virtually useless.”

Organizations mine data from structured information to gain insight into their operations, to make their processes run more efficiently and to set a strategic direction. Data hidden in unstructured information may be harder to mine, yet it is just as valuable for analytics. Today, the technology for extracting insight from unstructured content is practical and cost-effective.

Difference Between Document Capture & Document Analytics

While document capture and document analytics share the goal of extracting meaning from unstructured content, it is important to understand the difference between them.

  • Document capture grabs images and any information that could be helpful in finding the document again, and extracts the information needed to advance a process, such as a sales order, patient encounter, service request or employment application. Capture is the practical, tactical application of document classification and extraction that helps to accelerate the pace of business transactions.
  • Document analytics is the strategic application of advanced capture technologies to extract value and meaning from these resources. Data and its context are your competitive advantage. As a result, document analytics opens up new applications ranging from fraud detection to legal discovery to medical research.
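
To make the distinction concrete, here is a minimal Python sketch. The sample text, field names and regular expressions are invented for illustration and do not reflect any Ephesoft product or API. The `capture` function pulls out the fields a single document needs to move a transaction along, while `analyze` looks across many captured documents for a pattern.

```python
import re
from collections import Counter

# Hypothetical OCR output for three sales orders (invented sample data).
DOCUMENTS = [
    "Sales Order 1001\nCustomer: Acme Corp\nTotal: $1,250.00",
    "Sales Order 1002\nCustomer: Globex\nTotal: $310.50",
    "Sales Order 1003\nCustomer: Acme Corp\nTotal: $99.99",
]

# Illustrative patterns; real documents would need far more robust rules.
ORDER_RE = re.compile(r"Sales Order\s+(\d+)")
CUSTOMER_RE = re.compile(r"Customer:\s*(.+)")
TOTAL_RE = re.compile(r"Total:\s*\$([\d,]+\.\d{2})")


def capture(text):
    """Capture: extract the fields needed to process one document."""
    order = ORDER_RE.search(text)
    customer = CUSTOMER_RE.search(text)
    total = TOTAL_RE.search(text)
    return {
        "order": order.group(1) if order else None,
        "customer": customer.group(1).strip() if customer else None,
        "total": float(total.group(1).replace(",", "")) if total else None,
    }


def analyze(records):
    """Analytics: aggregate across documents to reveal a pattern."""
    spend = Counter()
    for record in records:
        if record["customer"] and record["total"] is not None:
            spend[record["customer"]] += record["total"]
    return dict(spend)


records = [capture(doc) for doc in DOCUMENTS]
spend_by_customer = analyze(records)
```

Each record on its own is enough to process one order. The total spend per customer only appears once the records are combined, which is the kind of insight the analytics side is after.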

3 V’s in Big Data Analytics

  1. Volume

    Traditional platforms and processing structures are quickly overwhelmed by massive document volumes, so your new analytics solution should scale to meet increased demand.

  2. Variety

    Your new analytics solution must rise to the challenge of normalizing data structures, classifying the data source and interpreting the context to derive meaning.

  3. Velocity

    With traditional document capture solutions, turning a document into meaningful information can take weeks. Your new analytics solution needs to work in almost real time, i.e., days rather than weeks.

Van Ausdall & Farrar and Ephesoft

According to Ephesoft, more than 80% of corporate information is trapped in unstructured content. One of the challenges is finding talented data scientists to conceive and implement solutions to the three V’s of big data analytics.

With Ephesoft Universe from Van Ausdall & Farrar, your new data scientists are browser-based. Ephesoft Universe is the first advanced document analytics platform that leverages Hadoop clusters for document classification, data extraction, analytics and visualization. Ephesoft Intelligent Document Recognition (IDR) software enables businesses to better classify documents by using content, bar code or layout analysis. Ephesoft solutions extract data utilizing technologies like OCR, ICR, free-form extraction, fuzzy database matching and machine learning algorithms. Ephesoft advanced document capture and data extraction solutions help businesses run more efficiently, increase information mobility and respond to changes in a cost-effective manner.
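
For readers unfamiliar with fuzzy database matching, the general idea can be sketched in a few lines of Python. This is not Ephesoft's implementation. It is a generic illustration built on the standard library's `difflib`, and the vendor list, the sample OCR string and the 0.8 threshold are all assumptions chosen for the example. The point is that a value misread by OCR can still be tied to the right database record when it is close enough to a known entry.

```python
from difflib import SequenceMatcher

# Hypothetical reference table, e.g. vendor names held in a database.
KNOWN_VENDORS = ["Van Ausdall & Farrar", "Ephesoft, Inc.", "Acme Supply Co."]


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_match(ocr_value, candidates, threshold=0.8):
    """Return (best_candidate, score), or (None, score) if below threshold."""
    best = max(candidates, key=lambda c: similarity(ocr_value, c))
    score = similarity(ocr_value, best)
    return (best, score) if score >= threshold else (None, score)


# OCR commonly confuses "o" with "0" and "I" with "l"; the match still succeeds.
match, score = fuzzy_match("Ephes0ft, lnc.", KNOWN_VENDORS)
```

The threshold is a trade-off: set it too low and unrelated records get linked, set it too high and minor OCR errors cause missed matches. In practice it would be tuned against real documents.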

Posted in: Insights from VAF Blog